-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
fix(releases): Fix environment filtering for per-project new groups count in old releases serializer #101003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(releases): Fix environment filtering for per-project new groups count in old releases serializer #101003
Conversation
@sentry review |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #101003 +/- ##
===========================================
+ Coverage 76.52% 81.14% +4.62%
===========================================
Files 8653 8658 +5
Lines 384048 384155 +107
Branches 24249 24186 -63
===========================================
+ Hits 293880 311721 +17841
+ Misses 89824 72090 -17734
Partials 344 344 |
aggregated_new_issues_count=Sum("new_issues_count") | ||
).values_list("project_id", "release_id", "aggregated_new_issues_count"): | ||
for project_id, release_id, new_groups in ( | ||
release_project_envs.order_by() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering is not required and specifying order_by
with no arguments does not order the results.
).values_list("project_id", "release_id", "aggregated_new_issues_count"): | ||
for project_id, release_id, new_groups in ( | ||
release_project_envs.order_by() | ||
.values("project_id", "release_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed since you're calling values_list below.
Hi @cmanallen here is my explanation for why I changed the query (sorry for the wall of text)! The old query will sometimes fail when we filter by multiple environments. Assuming we’re working in release 1.0.0 (id “r123”) and we have 4 new groups for project A (id “a123”) and 2 new groups for project B (id "b123”):
This is the same setup as in The old query will generate SQL like this: Note that there’s an order-by at the very end despite no ordering being included in the code, and that we’re grouping by id. This will essentially group every single row individually. We get a query set like so: And as expected we get one item per row in the ReleaseProjectEnvironment table:
And when we calculate So instead of grouping by row we want to group by (project_id, release_id). If we add Note that for some reason we’re also grouping by Now if we also add Now we’re only grouping by project and release id. I think this is because adding And with this query we finally get the correct query set: Let me know if this makes sense to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srest2021 The .values() method alters the query's result type. It does not modify the query. Using a bare order_by
does reset ordering but ask yourself the question, why should ordering matter at all for this function?
Let's abstract the queries into their own functions. Then relentlessly unit test the functions and ensure they are deterministic and that you totally and completely understand what each component of the function is returning and being transformed to. Retrieving counts should not be order dependent otherwise we're making a mistake in calculation.
Radical simplicity should be your goal. Strip away everything. Start from scratch. They are small queries. Re-write them in their most ideal form.
if project is not None: | ||
release_project_envs = release_project_envs.filter(project=project) | ||
|
||
return release_project_envs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not just accessing project id? We don't access any other attributes of release_project_envs.project.
def _get_release_project_envs_unordered(self, item_list, environments, project): | ||
release_project_envs = ReleaseProjectEnvironment.objects.filter( | ||
release__in=item_list | ||
).select_related("release") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we joining the release model? This call to select_related
seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We access release_project_env.release.version in other parts of __get_release_data_with_environments, for example:
release_project_env.release.version not in first_seen |
Previously, we were using release.newGroups to populate the per-project new issues counts for releases. However, this count is the total number of new issues for this release. Here, we instead use release.projects[x].newGroups, which now contains the correct number of new issues for this release in the selected project. We switch to using release.projects[x].newGroups in the following places: releases index, releases drawer, session health reverts #99555, which switched from release.projects[x].newGroups to release.newGroups followup to #101003, which fixed the backend bug when calculating release.projects[x].newGroups ### Demo Setup RELEASE 1.0.0 Project A - **3** total new groups - Development - **2** new groups (Error & TypeError) - Production - **1** new group (SyntaxError) Project B - **4** total new groups - Development - **3** new groups (ReferenceError & EvalError & URIError - Production - **1** new group (SyntaxError) RELEASE 2.0.0 Project A - **1** total new group - Development - **1** new group (RangeError) ### Demo Before: the new issues counts for each project are all equal to the total number of new issues in that release (eg, 7 new groups for each project in Release 1.0.0 == 2+1+3+1) https://github.com/user-attachments/assets/043ceed5-f0cb-4c6a-a954-837830d2c091 After: we get the correct per-project new issues counts, even when filtering by project or env or both https://github.com/user-attachments/assets/df5c4057-235d-4b4f-a93f-3c7aea2af656
Previously, the per-project new groups count was being populated by ReleaseProject's new_groups. However, this count becomes incorrect when filtering by environment. Here we populate the per-project new groups count with the counts obtained from ReleaseProjectEnvironment when available. This fix only applies to the old serializer.
We also fix a Django aggregation that calculates the new groups counts when environments are present. Previously the query was grouping once per row, and ordering by first seen, like so:
... GROUP BY "sentry_releaseprojectenvironment"."id" ORDER BY "sentry_releaseprojectenvironment"."first_seen" DESC
But if we try to filter by multiple environments, this won't work because we can have multiple rows for a single (release, project) pairing, and whichever environment is ordered last will "win" and have its values set as the new groups counts.
In this PR, we make a new helper function that skips the unnecessary ordering, and we modify the query to group only by project and release id. We also add test coverage for new groups counts.