ref(releases): Remove DISTINCT query modifier and use EXISTS subqueries to join#95229

Closed
cmanallen wants to merge 7 commits into master from cmanallen/release-optimize-another-query
@cmanallen

Member

Problem statement:

Long-running queries are being canceled due to timeouts. Somewhat amazingly, DataDog does not report a cost associated with the sort and distinct nodes. I find this hard to believe, so I've removed the DISTINCT modifier and used EXISTS subqueries to keep the source dataset de-duplicated. The EXISTS subquery should be a lateral move in terms of performance because both query patterns use a 'Nested Loop Join'. Removing the DISTINCT modifier should be a net benefit, all things considered.
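
To illustrate why EXISTS makes DISTINCT unnecessary, here is a minimal sketch using Python's built-in sqlite3 module rather than the production Postgres database. The `release` and `release_project` tables and their rows are made up for the example and are not the real Sentry schema. A plain JOIN repeats a release once per matching project, while the EXISTS form returns each release at most once without any de-duplication step.

```python
import sqlite3

# Hypothetical, simplified tables; not the real Sentry schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE release (id INTEGER PRIMARY KEY, organization_id INTEGER);
    CREATE TABLE release_project (release_id INTEGER, project_id INTEGER);
    INSERT INTO release VALUES (1, 10), (2, 10), (3, 10);
    -- Release 1 belongs to two projects, so a plain JOIN yields it twice.
    INSERT INTO release_project VALUES (1, 100), (1, 101), (2, 100);
""")

# Plain JOIN: one output row per matching (release, project) pair.
join_rows = conn.execute("""
    SELECT r.id FROM release r
    JOIN release_project rp ON rp.release_id = r.id
    WHERE r.organization_id = 10 AND rp.project_id IN (100, 101)
    ORDER BY r.id
""").fetchall()

# Old pattern: JOIN plus DISTINCT to collapse the duplicates.
distinct_rows = conn.execute("""
    SELECT DISTINCT r.id FROM release r
    JOIN release_project rp ON rp.release_id = r.id
    WHERE r.organization_id = 10 AND rp.project_id IN (100, 101)
    ORDER BY r.id
""").fetchall()

# New pattern: EXISTS only tests for a match, so no duplicates are produced.
exists_rows = conn.execute("""
    SELECT r.id FROM release r
    WHERE r.organization_id = 10 AND EXISTS (
        SELECT 1 FROM release_project rp
        WHERE rp.release_id = r.id AND rp.project_id IN (100, 101)
    )
    ORDER BY r.id
""").fetchall()

print(join_rows)      # release 1 appears twice
print(distinct_rows)  # duplicates removed by DISTINCT
print(exists_rows)    # same rows as the DISTINCT query
```

Whether the planner treats the two forms identically in Postgres is a separate question; the sketch only shows that the result sets match.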

Additional Notes:

There are multiple query patterns present in this endpoint. I've documented two samples below. This change always lowers the minimum cost to execute the query and, in most cases, leaves the maximum cost unchanged. In some cases, however, the maximum cost is higher. This is not as clear a win as we saw in #95135. We'll let it ride for a while and monitor its performance. If we see noticeable outliers we may revert, but only if they exceed the outliers we see with the current query.

Previous Query Plan Sample (test database):

Unique  (cost=19.01..19.08 rows=1 width=1762)
  Output: sentry_release.id, sentry_release.organization_id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_added, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent, sentry_release.date_added
  ->  Sort  (cost=19.01..19.02 rows=1 width=1762)
        Output: sentry_release.id, sentry_release.organization_id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_added, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent, sentry_release.date_added
        Sort Key: sentry_release.date_added DESC, sentry_release.id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent
        ->  Nested Loop  (cost=0.29..19.00 rows=1 width=1762)
              Output: sentry_release.id, sentry_release.organization_id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_added, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent, sentry_release.date_added
              Inner Unique: true
              ->  Index Scan Backward using sentry_rele_organiz_4ed947_idx on public.sentry_release  (cost=0.14..8.16 rows=1 width=1754)
                    Output: sentry_release.id, sentry_release.organization_id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_added, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent
                    Index Cond: (sentry_release.organization_id = '4556390080839680'::bigint)
                    Filter: ((sentry_release.status = 0) OR (sentry_release.status IS NULL))
              ->  Index Only Scan using sentry_release_project_project_id_release_id_44ff55de_uniq on public.sentry_release_project  (cost=0.15..8.17 rows=1 width=8)
                    Output: sentry_release_project.project_id, sentry_release_project.release_id
                    Index Cond: ((sentry_release_project.project_id = '4556390080905217'::bigint) AND (sentry_release_project.release_id = sentry_release.id))

New Query Plan Sample (test database):

Nested Loop  (cost=0.29..19.00 rows=1 width=1762)
  Output: sentry_release.id, sentry_release.organization_id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_added, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent, sentry_release.date_added
  Inner Unique: true
  ->  Index Scan Backward using sentry_rele_organiz_4ed947_idx on public.sentry_release  (cost=0.14..8.16 rows=1 width=1754)
        Output: sentry_release.id, sentry_release.organization_id, sentry_release.status, sentry_release.version, sentry_release.ref, sentry_release.url, sentry_release.date_added, sentry_release.date_started, sentry_release.date_released, sentry_release.data, sentry_release.owner_id, sentry_release.commit_count, sentry_release.last_commit_id, sentry_release.authors, sentry_release.total_deploys, sentry_release.last_deploy_id, sentry_release.package, sentry_release.major, sentry_release.minor, sentry_release.patch, sentry_release.revision, sentry_release.prerelease, sentry_release.build_code, sentry_release.build_number, sentry_release.user_agent
        Index Cond: (sentry_release.organization_id = '4556390078349312'::bigint)
        Filter: ((sentry_release.status = 0) OR (sentry_release.status IS NULL))
  ->  Index Only Scan using sentry_release_project_project_id_release_id_44ff55de_uniq on public.sentry_release_project u0  (cost=0.15..8.17 rows=1 width=8)
        Output: u0.project_id, u0.release_id
        Index Cond: ((u0.project_id = '4556390078349314'::bigint) AND (u0.release_id = sentry_release.id))
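
For a quick side-by-side of the two samples, the top-node estimates can be pulled out with a small helper. This is only a sketch: `top_node_cost` is a hypothetical name, and it relies on the `cost=startup..total` format that Postgres prints in EXPLAIN output. Note that the two samples were taken against different test organizations, so the comparison is illustrative rather than a benchmark.

```python
import re

# Matches the "cost=<startup>..<total>" estimate printed on each plan node.
COST_RE = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)")


def top_node_cost(plan_line: str) -> tuple[float, float]:
    """Return (startup_cost, total_cost) parsed from one EXPLAIN line."""
    match = COST_RE.search(plan_line)
    if match is None:
        raise ValueError("no cost estimate found in plan line")
    return float(match.group(1)), float(match.group(2))


# Top nodes copied from the two plan samples above.
previous = top_node_cost("Unique  (cost=19.01..19.08 rows=1 width=1762)")
new = top_node_cost("Nested Loop  (cost=0.29..19.00 rows=1 width=1762)")

print("previous:", previous)
print("new:", new)
```

In these samples the startup cost drops because the new plan no longer has to finish the Sort before the Unique node can emit its first row.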

@cmanallen cmanallen requested review from a team as code owners July 10, 2025 13:35
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 10, 2025
@cmanallen cmanallen changed the title from "ref(releases): Optimize slow query" to "ref(releases): Remove DISTINCT query modifier and use EXISTS subqueries to join" Jul 10, 2025
cursor[bot]

This comment was marked as outdated.

@codecov

codecov Bot commented Jul 10, 2025


Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/sentry/models/release.py 90.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #95229       +/-   ##
===========================================
+ Coverage   56.19%   87.85%   +31.66%     
===========================================
  Files       10476    10480        +4     
  Lines      605899   605751      -148     
  Branches    23711    23626       -85     
===========================================
+ Hits       340484   532185   +191701     
+ Misses     265057    73206   -191851     
- Partials      358      360        +2     

cursor[bot]

This comment was marked as outdated.

@cmanallen

Member Author

Re Cursor:

If get_filter_params fails to populate environment_objects correctly, filtering will silently fail, returning unfiltered results.

This is wrong. If environments cannot be fetched, the resource aborts with a 404. Even if it didn't abort, environment names are only appended to the filter_params if all the environments are returned.

This also changes behavior: an empty environment list now skips filtering entirely, whereas previously it would filter out all results.

This is wrong. The previous query pattern checked whether the environments key existed before applying filters.

For the non-flatten case, the new filter_releases_by_projects() function now returns all releases when the project list is empty, instead of correctly returning no results.

This is wrong. Project ids will always be populated in this context. If not, an exception is raised:

    if not projects:
        raise NoProjects
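
The interaction between the caller's guard and the helper's early return can be sketched in plain Python. This is a stand-in that uses lists and dicts instead of Django querysets; `list_releases` and the data shapes are hypothetical, and only `NoProjects` and `filter_releases_by_projects` are names taken from this thread.

```python
class NoProjects(Exception):
    """Stand-in for the exception raised when no projects are resolved."""


def filter_releases_by_projects(releases, project_ids):
    # Mirrors the helper's guard: an empty list means "apply no filter".
    if not project_ids:
        return releases
    wanted = set(project_ids)
    return [r for r in releases if r["project_ids"] & wanted]


def list_releases(releases, project_ids):
    # Hypothetical caller: the guard runs before the helper is reached,
    # so the helper's empty-list branch is never hit on this path.
    if not project_ids:
        raise NoProjects
    return filter_releases_by_projects(releases, project_ids)


releases = [
    {"id": 1, "project_ids": {100}},
    {"id": 2, "project_ids": {200}},
]

filtered_ids = [r["id"] for r in list_releases(releases, [100])]

try:
    list_releases(releases, [])
    raised = False
except NoProjects:
    raised = True

print(filtered_ids, raised)
```

The helper on its own does return everything for an empty list, which is what the bot flagged; the argument here is that the endpoint never calls it that way.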

cursor[bot]

This comment was marked as outdated.

@cursor cursor Bot left a comment

Contributor

Bug: Filter Functions Fail on Empty Parameters

The new filter_releases_by_projects and filter_releases_by_environments functions incorrectly return unfiltered results when their project_ids or environment_ids parameters are empty, instead of the expected no results. This change in behavior means an empty project or environment filter now returns all releases for the organization, potentially exposing unauthorized data. This can occur if environment_objects is not populated or if invalid environment names are provided, leading to an empty environment_ids list.

src/sentry/models/release.py#L824-L857

def filter_releases_by_projects(queryset: QuerySetAny, project_ids: list[int]):
    """Return releases belonging to a project."""
    if not project_ids:
        return queryset
    return queryset.filter(
        Exists(
            ReleaseProject.objects.filter(
                release=OuterRef("pk"),
                project_id__in=project_ids,
            )
        )
    )


def filter_releases_by_environments(
    queryset: QuerySetAny,
    project_ids: list[int],
    environment_ids: list[int],
):
    """Return a release queryset filtered by environments."""
    if not environment_ids:
        return queryset
    return queryset.filter(
        Exists(
            ReleaseProjectEnvironment.objects.filter(
                release=OuterRef("pk"),
                environment_id__in=environment_ids,
                project_id__in=project_ids,
            )
        )
    )

src/sentry/api/endpoints/organization_releases.py#L315-L320

queryset = Release.objects.filter(organization_id=organization.id)
queryset = filter_releases_by_environments(
    queryset,
    filter_params["project_id"],
    [e.id for e in filter_params.get("environment_objects", [])],
)

@getsantry

getsantry Bot commented Aug 2, 2025

Contributor

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you add the label WIP, I will leave it alone unless WIP is removed ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@getsantry getsantry Bot added Stale and removed Stale labels Aug 2, 2025
@getsantry

getsantry Bot commented Aug 25, 2025

Contributor

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you add the label WIP, I will leave it alone unless WIP is removed ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@getsantry getsantry Bot added Stale and removed Stale labels Aug 25, 2025
@cmanallen cmanallen closed this Sep 15, 2025
@github-actions github-actions Bot locked and limited conversation to collaborators Oct 1, 2025
Labels

Scope: Backend Automatically applied to PRs that change backend components

1 participant