ref(releases): Remove DISTINCT query modifier and use EXISTS subqueries to join#95229
ref(releases): Remove DISTINCT query modifier and use EXISTS subqueries to join#95229cmanallen wants to merge 7 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #95229 +/- ##
===========================================
+ Coverage 56.19% 87.85% +31.66%
===========================================
Files 10476 10480 +4
Lines 605899 605751 -148
Branches 23711 23626 -85
===========================================
+ Hits 340484 532185 +191701
+ Misses 265057 73206 -191851
- Partials 358 360 +2 |
|
Re Cursor:
This is wrong. If environments can not be fetched the resource aborts 404. Even if it didn't abort environment names are only appended to the filter_params if all the environments are returned.
This is wrong. Previous query pattern checked if the environments key existed prior to applying filters.
This is wrong. Project ids will always be populated in this context. If not an exception is raised: sentry/src/sentry/api/bases/organization.py Lines 566 to 567 in 156f296 |
There was a problem hiding this comment.
Bug: Filter Functions Fail on Empty Parameters
The new filter_releases_by_projects and filter_releases_by_environments functions incorrectly return unfiltered results when their project_ids or environment_ids parameters are empty, instead of the expected no results. This change in behavior means an empty project or environment filter now returns all releases for the organization, potentially exposing unauthorized data. This can occur if environment_objects is not populated or if invalid environment names are provided, leading to an empty environment_ids list.
src/sentry/models/release.py#L824-L857
sentry/src/sentry/models/release.py
Lines 824 to 857 in cf98bc8
src/sentry/api/endpoints/organization_releases.py#L315-L320
sentry/src/sentry/api/endpoints/organization_releases.py
Lines 315 to 320 in cf98bc8
Was this report helpful? Give feedback by reacting with 👍 or 👎
|
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you add the label "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀 |
|
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you add the label "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀 |
Problem statement:
Long running queries are being canceled due to timeouts. Somewhat amazingly DataDog does not report a cost associated with the sort and distinct nodes. I find this hard to believe so I've removed the DISTINCT modifier by using EXISTS subqueries to keep the source dataset de-duplicated. The EXISTS subquery should be a lateral move in terms of performance because both query patterns use a 'Nested Loop Join'. Removing the DISTINCT modifier should be a net benefit all things considered.
Additional Notes:
There are multiple query patterns present in this endpoint. I've document two samples below. This change always lowers the minimum cost to execute the query while mostly not changing the maximum cost. However in some cases the maximum cost is higher. This is not as clear of a win as we saw in #95135. We'll let it ride for a while and monitor its performance. If we see noticeable outliers we may revert but it would have to be in excess of the outliers we see with the current query.
Previous Query Plan Sample (test database):
New Query Plan Sample (test database):