
Conversation

srest2021
Member

@srest2021 srest2021 commented Oct 6, 2025

Previously, the per-project new groups count was being populated by ReleaseProject's new_groups. However, this count becomes incorrect when filtering by environment. Here we populate the per-project new groups count with the counts obtained from ReleaseProjectEnvironment when available. This fix only applies to the old serializer.

We also fix a Django aggregation that calculates the new groups counts when environments are present. Previously the query was grouping once per row, and ordering by first seen, like so:
... GROUP BY "sentry_releaseprojectenvironment"."id" ORDER BY "sentry_releaseprojectenvironment"."first_seen" DESC

But if we try to filter by multiple environments, this won't work because we can have multiple rows for a single (release, project) pairing, and whichever environment is ordered last will "win" and have its values set as the new groups counts.

In this PR, we make a new helper function that skips the unnecessary ordering, and we modify the query to group only by project and release id. We also add test coverage for new groups counts.

@srest2021 srest2021 changed the title fix(releases): Fix environment filtering for new groups count fix(releases): Fix environment filtering for per-project new groups count Oct 6, 2025
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 6, 2025
@srest2021
Member Author

@sentry review

codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #101003      +/-   ##
===========================================
+ Coverage   76.52%    81.14%   +4.62%     
===========================================
  Files        8653      8658       +5     
  Lines      384048    384155     +107     
  Branches    24249     24186      -63     
===========================================
+ Hits       293880    311721   +17841     
+ Misses      89824     72090   -17734     
  Partials      344       344              

@srest2021 srest2021 marked this pull request as ready for review October 6, 2025 20:59
aggregated_new_issues_count=Sum("new_issues_count")
).values_list("project_id", "release_id", "aggregated_new_issues_count"):
for project_id, release_id, new_groups in (
release_project_envs.order_by()
Member

Ordering is not required and specifying order_by with no arguments does not order the results.

).values_list("project_id", "release_id", "aggregated_new_issues_count"):
for project_id, release_id, new_groups in (
release_project_envs.order_by()
.values("project_id", "release_id")
Member

This can be removed since you're calling values_list below.

@srest2021
Member Author

srest2021 commented Oct 7, 2025

Hi @cmanallen here is my explanation for why I changed the query (sorry for the wall of text)!

The old query will sometimes fail when we filter by multiple environments.

Assuming we’re working in release 1.0.0 (id "r123") and we have 4 new groups for project A (id "a123") and 2 new groups for project B (id "b123"):

  • For project A: 3 issues in production, 1 issue in staging (total = 4)
  • For project B: 2 issues in production, 0 issues in staging (total = 2)

This is the same setup as in test_new_groups_environment_filtering. Say we are filtering by the production and staging environments.

The old query will generate SQL like this:
SELECT "sentry_releaseprojectenvironment"."project_id" AS "project_id", "sentry_releaseprojectenvironment"."release_id" AS "release_id", SUM("sentry_releaseprojectenvironment"."new_issues_count") AS "aggregated_new_issues_count" FROM "sentry_releaseprojectenvironment" INNER JOIN "sentry_environment" ON ("sentry_releaseprojectenvironment"."environment_id" = "sentry_environment"."id") WHERE ("sentry_releaseprojectenvironment"."release_id" IN (r123) AND "sentry_environment"."name" IN (production, staging)) GROUP BY "sentry_releaseprojectenvironment"."id" ORDER BY "sentry_releaseprojectenvironment"."first_seen" DESC

Note that there’s an order-by at the very end despite no ordering being included in the code, and that we’re grouping by id. This will essentially group every single row individually.

We get a query set like so: <BaseQuerySet [(b123, r123, 0), (b123, r123, 2), (a123, r123, 1), (a123, r123, 3)]>

And as expected we get one item per row in the ReleaseProjectEnvironment table:

  • (b123, r123, 0) is # new groups in staging for project B
  • (b123, r123, 2) is # new groups in prod for project B
  • (a123, r123, 1) is # new groups in staging for project A
  • (a123, r123, 3) is # new groups in prod for project A

When we then compute group_counts_by_release, we get {r123: {b123: 2, a123: 3}} instead of the correct counts, {r123: {b123: 2, a123: 4}}. Because each row is its own group, the for project_id, release_id, new_groups loop over this queryset overwrites a project's entry once per row, and the last row for that project “wins”. For example, (a123, r123, 3) is the last row for project A, so we report 3 new groups for project A instead of the correct total, 3 + 1 = 4. This is why the old query fails only when there are multiple rows per (project, release) pair, that is, when we filter by multiple environments.
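To make the overwrite concrete, here is a minimal pure-Python sketch. It does not touch the ORM; the `rows` list just mirrors the query set shown above, and `counts_by_assignment` is a hypothetical stand-in for the loop that builds group_counts_by_release, not the actual serializer code.

```python
from collections import defaultdict

# Rows shaped like the (project_id, release_id, new_issues_count) tuples in the
# query set above: one tuple per ReleaseProjectEnvironment row.
rows = [
    ("b123", "r123", 0),  # project B, staging
    ("b123", "r123", 2),  # project B, production
    ("a123", "r123", 1),  # project A, staging
    ("a123", "r123", 3),  # project A, production
]


def counts_by_assignment(rows):
    """Hypothetical stand-in for the old loop: each row overwrites the last."""
    result = defaultdict(dict)
    for project_id, release_id, new_groups in rows:
        result[release_id][project_id] = new_groups  # assignment, not addition
    return {release_id: dict(projects) for release_id, projects in result.items()}


in_query_order = counts_by_assignment(rows)
in_reverse_order = counts_by_assignment(list(reversed(rows)))

print(in_query_order)    # project A ends up with 3, not 4
print(in_reverse_order)  # a different row order gives a different answer
```

The second call shows that the result depends on row order, which is the underlying problem: neither ordering yields the per-project totals.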

So instead of grouping by row we want to group by (project_id, release_id). If we add .values("project_id", "release_id") to the query, we get this SQL:
SELECT "sentry_releaseprojectenvironment"."project_id" AS "project_id", "sentry_releaseprojectenvironment"."release_id" AS "release_id", SUM("sentry_releaseprojectenvironment"."new_issues_count") AS "aggregated_new_issues_count" FROM "sentry_releaseprojectenvironment" INNER JOIN "sentry_environment" ON ("sentry_releaseprojectenvironment"."environment_id" = "sentry_environment"."id") WHERE ("sentry_releaseprojectenvironment"."release_id" IN (r123) AND "sentry_environment"."name" IN (production, staging)) GROUP BY 1, 2, "sentry_releaseprojectenvironment"."first_seen" ORDER BY "sentry_releaseprojectenvironment"."first_seen" DESC

Note that for some reason we’re also grouping by first_seen and we’re still ordering by first_seen as well. This query will give us the same exact query set and group_counts_by_release. I suspect it’s because we’re still not grouping only by project and release id.

Now if we also add .order_by() to the query, in addition to .values("project_id", "release_id"), we get this:
SELECT "sentry_releaseprojectenvironment"."project_id" AS "project_id", "sentry_releaseprojectenvironment"."release_id" AS "release_id", SUM("sentry_releaseprojectenvironment"."new_issues_count") AS "aggregated_new_issues_count" FROM "sentry_releaseprojectenvironment" INNER JOIN "sentry_environment" ON ("sentry_releaseprojectenvironment"."environment_id" = "sentry_environment"."id") WHERE ("sentry_releaseprojectenvironment"."release_id" IN (r123) AND "sentry_environment"."name" IN (production, staging)) GROUP BY 1, 2

Now we’re only grouping by project and release id. I think this is because adding order_by() clears whatever default ordering was being added on top of the original query.

And with this query we finally get the correct query set: <BaseQuerySet [(a123, r123, 4), (b123, r123, 2)]> and the correct group_counts_by_release: {r123: {b123: 2, a123: 4}}
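The difference between the two GROUP BY clauses can be reproduced outside of Django. The sketch below uses an in-memory SQLite database with a simplified, hypothetical `rpe` table (it stores the environment name directly rather than joining to sentry_environment), so it only illustrates the grouping behaviour, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for sentry_releaseprojectenvironment (assumed columns).
conn.execute(
    "CREATE TABLE rpe ("
    "id INTEGER PRIMARY KEY, project_id TEXT, release_id TEXT, "
    "environment TEXT, new_issues_count INTEGER)"
)
conn.executemany(
    "INSERT INTO rpe (project_id, release_id, environment, new_issues_count) "
    "VALUES (?, ?, ?, ?)",
    [
        ("a123", "r123", "production", 3),
        ("a123", "r123", "staging", 1),
        ("b123", "r123", "production", 2),
        ("b123", "r123", "staging", 0),
    ],
)

# Grouping by the primary key: every row is its own group, so SUM is a no-op.
per_row = conn.execute(
    "SELECT project_id, release_id, SUM(new_issues_count) FROM rpe "
    "WHERE environment IN ('production', 'staging') GROUP BY id"
).fetchall()

# Grouping by (project_id, release_id): one row per pair with the real total.
per_pair = conn.execute(
    "SELECT project_id, release_id, SUM(new_issues_count) FROM rpe "
    "WHERE environment IN ('production', 'staging') "
    "GROUP BY project_id, release_id"
).fetchall()

print(len(per_row))      # four groups, one per table row
print(sorted(per_pair))  # sorted only to make the printed output stable
```

No ORDER BY is needed for correctness in the second query; the result is sorted in Python purely for display.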

Let me know if this makes sense to you.

Member

@cmanallen cmanallen left a comment

@srest2021 The .values() method alters the query's result type. It does not modify the query. Using a bare order_by does reset ordering but ask yourself the question, why should ordering matter at all for this function?

Let's abstract the queries into their own functions. Then relentlessly unit test the functions and ensure they are deterministic and that you totally and completely understand what each component of the function is returning and being transformed to. Retrieving counts should not be order dependent otherwise we're making a mistake in calculation.

Radical simplicity should be your goal. Strip away everything. Start from scratch. They are small queries. Re-write them in their most ideal form.
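As a rough illustration of the order-independence property described above, the following sketch separates the aggregation into a small pure function and checks it against every ordering of its input. `aggregate_new_groups` is a hypothetical name and signature, not the helper that was merged; the rows reuse the example data from this thread.

```python
from collections import defaultdict
from itertools import permutations


def aggregate_new_groups(rows):
    """Sum new-group counts per (release_id, project_id).

    `rows` is an iterable of (project_id, release_id, count) tuples.
    Addition is commutative, so the result cannot depend on row order.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for project_id, release_id, count in rows:
        totals[release_id][project_id] += count
    return {release_id: dict(projects) for release_id, projects in totals.items()}


rows = [
    ("a123", "r123", 3),
    ("a123", "r123", 1),
    ("b123", "r123", 2),
    ("b123", "r123", 0),
]

# Run the function on all 24 orderings of the four rows.
results = [aggregate_new_groups(ordering) for ordering in permutations(rows)]
all_equal = all(result == results[0] for result in results)

print(results[0])
print(all_equal)
```

A check like this would fail for any implementation that assigns instead of sums, which makes it a cheap guard against reintroducing an order-dependent calculation.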

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@srest2021 srest2021 changed the title fix(releases): Fix environment filtering for per-project new groups count fix(releases): Fix environment filtering for per-project new groups count in old releases serializer Oct 8, 2025
    if project is not None:
        release_project_envs = release_project_envs.filter(project=project)

    return release_project_envs
Contributor

Bug: N+1 Query Issue in Release Data Retrieval

The new _get_release_project_envs_unordered method, used when environments are specified, omits select_related("project"). This means __get_release_data_with_environments will trigger N+1 database queries when accessing the project relation.

Member Author

Is this not just accessing project id? We don't access any other attributes of release_project_envs.project.

def _get_release_project_envs_unordered(self, item_list, environments, project):
    release_project_envs = ReleaseProjectEnvironment.objects.filter(
        release__in=item_list
    ).select_related("release")
Member

Why are we joining the release model? This call to select_related seems unnecessary.

Member Author

We access release_project_env.release.version in other parts of __get_release_data_with_environments, for example:

release_project_env.release.version not in first_seen

@srest2021 srest2021 merged commit d568d52 into master Oct 8, 2025
65 checks passed
@srest2021 srest2021 deleted the srest2021/fix-new-groups-count-old-serializer branch October 8, 2025 17:35
srest2021 added a commit that referenced this pull request Oct 8, 2025
Previously, we were using release.newGroups to populate the per-project
new issues counts for releases. However, this count is the total number
of new issues for this release. Here, we instead use
release.projects[x].newGroups, which now contains the correct number of
new issues for this release in the selected project.

We switch to using release.projects[x].newGroups in the following
places: releases index, releases drawer, session health

reverts #99555, which switched
from release.projects[x].newGroups to release.newGroups

followup to #101003, which fixed
the backend bug when calculating release.projects[x].newGroups

### Demo Setup 

RELEASE 1.0.0

Project A - **3** total new groups
- Development - **2** new groups (Error & TypeError)
- Production - **1** new group (SyntaxError)

Project B - **4** total new groups
- Development - **3** new groups (ReferenceError & EvalError & URIError)
- Production - **1** new group (SyntaxError)

RELEASE 2.0.0

Project A - **1** total new group
- Development - **1** new group (RangeError)

### Demo

Before: the new issues counts for each project are all equal to the
total number of new issues in that release (eg, 7 new groups for each
project in Release 1.0.0 == 2+1+3+1)


https://github.com/user-attachments/assets/043ceed5-f0cb-4c6a-a954-837830d2c091


After: we get the correct per-project new issues counts, even when
filtering by project or env or both


https://github.com/user-attachments/assets/df5c4057-235d-4b4f-a93f-3c7aea2af656