
Conversation

@armenzg (Member) commented Nov 7, 2025

Using `order_by()` requires fetching all rows and sorting them, which makes the query slower and causes it to fail when we have millions of rows.

Fixes [SENTRY-5C36](https://sentry.sentry.io/issues/7006347860/).
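From the diff fragments quoted in the review threads below, the change batches GroupHashMetadata IDs without an ORDER BY and sorts them in Python instead. Here is a minimal sketch of that pattern, assuming Django, a hypothetical `BATCH_SIZE`, and guessed import paths (not the exact Sentry code); note that the first review thread below points out a flaw in this cursor scheme:

```python
# Sketch only: batched update without ORDER BY, sorting IDs in Python.
# Import paths, BATCH_SIZE, and the function name are assumptions.
from sentry.models.grouphash_metadata import GroupHashMetadata  # assumed path
from sentry.utils import metrics  # assumed path

BATCH_SIZE = 1000  # assumed

def clear_seer_matched_hashes(hash_ids: list[int]) -> None:
    last_max_id = 0
    while True:
        # No .order_by("id"): Postgres would otherwise sort millions of rows.
        batch_metadata_ids = list(
            GroupHashMetadata.objects.filter(
                seer_matched_grouphash_id__in=hash_ids, id__gt=last_max_id
            ).values_list("id", flat=True)[:BATCH_SIZE]
        )
        if not batch_metadata_ids:
            break
        batch_metadata_ids.sort()  # sort in Python instead of in the database
        updated = GroupHashMetadata.objects.filter(
            id__in=batch_metadata_ids
        ).update(seer_matched_grouphash=None)
        metrics.incr(
            "deletions.group_hash_metadata.rows_updated",
            amount=updated,
            sample_rate=1.0,
        )
        # NB: the review below notes this cursor can skip rows, since an
        # unordered batch need not contain the lowest remaining IDs.
        last_max_id = batch_metadata_ids[-1]  # last element after sorting
```
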
@armenzg self-assigned this Nov 7, 2025
@armenzg requested a review from a team as a code owner Nov 7, 2025 16:25
@github-actions bot added the Scope: Backend label Nov 7, 2025
@armenzg enabled auto-merge (squash) Nov 7, 2025 16:25
```diff
  metrics.incr("deletions.group_hash_metadata.rows_updated", amount=updated, sample_rate=1.0)

- last_max_id = max(batch_metadata_ids)
+ last_max_id = batch_metadata_ids[-1]  # Last element after sorting
```
Contributor

Bug: Cursor-based pagination broken by missing ORDER BY

Removing `.order_by("id")` from the query breaks cursor-based pagination. Without ORDER BY, the database returns rows in arbitrary order, not necessarily the lowest IDs. After sorting in Python and advancing the cursor with `last_max_id = batch_metadata_ids[-1]`, any IDs between the previous cursor and the new max ID that weren't in the arbitrary batch get permanently skipped, leaving orphaned GroupHashMetadata rows that reference deleted GroupHash rows.
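
A toy, plain-Python illustration of the skip (hypothetical IDs, no database involved): with six matching rows and an unordered batch of three, advancing the cursor to the batch's max strands whatever IDs the batch happened to miss.

```python
# Toy illustration of the pagination bug described above (no database involved).
ids_in_table = [1, 2, 3, 4, 5, 6]  # all rows matching the filter

arbitrary_batch = [5, 2, 6]        # without ORDER BY, any 3 rows may come back
batch = sorted(arbitrary_batch)    # sorting in Python afterwards doesn't help
last_max_id = batch[-1]            # cursor advances to 6

remaining = [i for i in ids_in_table if i > last_max_id]
print(remaining)  # [] -> rows 1, 3, and 4 were never processed, yet the loop stops
```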

Member Author

I will follow up on this.

Member Author

The bug mentioned here is real; however, even if we miss some hashes, they should be caught as part of the ORM deletion here:

`GroupHash.objects.filter(id__in=hash_ids).delete()`

If that's not the case, we can add another function that runs the update with sorted IDs after this one completes.

So far, the deletion of a group with over 1M hashes has been humming along for the last 20 minutes.

@armenzg disabled auto-merge Nov 7, 2025 16:36
@armenzg merged commit 4499d36 into master Nov 7, 2025
66 checks passed
@armenzg deleted the 11_07/improve_perf/armenzg branch Nov 7, 2025 17:12
```diff
  metrics.incr("deletions.group_hash_metadata.rows_updated", amount=updated, sample_rate=1.0)

- last_max_id = max(batch_metadata_ids)
+ last_max_id = batch_metadata_ids[-1]  # Last element after sorting
```
Member

Do we need `last_max_id` at all here? Can't we just select rows until nothing is left?

Member Author

You're very right. I will fix it today.

armenzg added a commit that referenced this pull request Nov 10, 2025
In #102960, I added a more efficient query for GroupHashMetadata; however, there's no need to track `last_max_id`, since as we update rows there will be fewer rows to select from.

Fixes [SENTRY-5C2V](https://sentry.sentry.io/issues/7005021677/).
armenzg added a commit that referenced this pull request Nov 10, 2025
In #102960, I added a new query for GroupHashMetadata; however, using `id` together with `seer_matched_grouphash_id__in` requires a composite index, which we don't have.

Even without that, we don't actually need `last_max_id` to keep fetching and updating rows.

Fixes [SENTRY-5C2V](https://sentry.sentry.io/issues/7005021677/).
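
A minimal sketch of the no-cursor loop this commit message describes, reusing the assumed names from the earlier sketch: each `update()` NULLs the matched grouphash, so updated rows drop out of the filter and the loop terminates once the query comes back empty.

```python
# Sketch only; import paths, BATCH_SIZE, and the function name are assumptions.
from sentry.models.grouphash_metadata import GroupHashMetadata  # assumed path
from sentry.utils import metrics  # assumed path

BATCH_SIZE = 1000  # assumed

def clear_seer_matched_hashes(hash_ids: list[int]) -> None:
    while True:
        # No cursor and no composite index needed: rows already updated no
        # longer match the filter, so the result set shrinks on every pass.
        batch_metadata_ids = list(
            GroupHashMetadata.objects.filter(
                seer_matched_grouphash_id__in=hash_ids
            ).values_list("id", flat=True)[:BATCH_SIZE]
        )
        if not batch_metadata_ids:
            break  # every matching row has been updated
        updated = GroupHashMetadata.objects.filter(
            id__in=batch_metadata_ids
        ).update(seer_matched_grouphash=None)
        metrics.incr(
            "deletions.group_hash_metadata.rows_updated",
            amount=updated,
            sample_rate=1.0,
        )
```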
Jesse-Box pushed a commit that referenced this pull request Nov 12, 2025
Using `order_by()` requires fetching all rows and sorting them, which makes the query slower and causes it to fail when we have millions of rows.

Fixes [SENTRY-5C36](https://sentry.sentry.io/issues/7006347860/).
Jesse-Box pushed a commit that referenced this pull request Nov 12, 2025
In #102960, I added a new query for GroupHashMetadata; however, using `id` together with `seer_matched_grouphash_id__in` requires a composite index, which we don't have.

Even without that, we don't actually need `last_max_id` to keep fetching and updating rows.

Fixes [SENTRY-5C2V](https://sentry.sentry.io/issues/7005021677/).
andrewshie-sentry pushed a commit that referenced this pull request Nov 13, 2025
Using `order_by()` requires fetching all rows and sorting them, which makes the query slower and causes it to fail when we have millions of rows.

Fixes [SENTRY-5C36](https://sentry.sentry.io/issues/7006347860/).
andrewshie-sentry pushed a commit that referenced this pull request Nov 13, 2025
In #102960, I added a new query for GroupHashMetadata; however, using `id` together with `seer_matched_grouphash_id__in` requires a composite index, which we don't have.

Even without that, we don't actually need `last_max_id` to keep fetching and updating rows.

Fixes [SENTRY-5C2V](https://sentry.sentry.io/issues/7005021677/).