-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
fix(deletions): Prevent timeouts when deleting GroupHash records #101720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The issues comes from this block: https://github.com/getsentry/sentry/blob/a3a771719d4777bd747d98fb05eb77c20425e3d6/src/sentry/deletions/defaults/group.py#L248-L259 The update is triggered because of this `on_delete`: https://github.com/getsentry/sentry/blob/b1f684a335128dbc74ad3a7fac1d7052df9e8f01/src/sentry/models/grouphashmetadata.py#L116-L118 Currently, when we try to delete all the group hashes, we update the related group hash metadata first. This query ends up failing for taking longer than 30 seconds: > SQL: UPDATE "sentry_grouphashmetadata" SET "seer_matched_grouphash_id" = NULL WHERE "sentry_grouphashmetadata"."seer_matched_grouphash_id" IN (%s, ..., %s) This can be resolved by deleting the group hash _metadata_ rows before trying to delete the group hash rows. This will avoid the update statement altogether. This fix was initially started in #101545, however, the solution has completely changed, thus, starting a new PR. Fixes [SENTRY-5ABJ](https://sentry.io/organizations/sentry/issues/6930113529/).
|
|
||
| iterations += 1 | ||
|
|
||
| if iterations == GROUP_HASH_ITERATIONS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by metric.
| GroupHashMetadata.objects.filter(grouphash_id__in=hash_ids).delete() | ||
| except Exception: | ||
| # XXX: Let's make sure that no issues are caused by this and then remove it | ||
| logger.exception("Error deleting group hash metadata") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I enable the option, I would like to know if any problems are caused by this while falling back to the original behaviour rather than completely aborting the process.
|
|
||
| __repr__ = sane_repr("group_id", "hash") | ||
| __repr__ = sane_repr("group_id", "hash", "metadata") | ||
| __str__ = __repr__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes print statements during debugging actually useful.
| register( | ||
| "deletions.group-hashes-batch-size", | ||
| default=10000, | ||
| default=100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| flags=FLAG_AUTOMATOR_MODIFIABLE, | ||
| ) | ||
| register( | ||
| "deletions.group.delete_group_hashes_metadata_first", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will control the new behaviour.
| assert grouphash_a.metadata is not None | ||
| assert grouphash_a.metadata.seer_matched_grouphash is None | ||
| assert grouphash_b.metadata is not None | ||
| assert grouphash_b.metadata.seer_matched_grouphash == grouphash_a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This verifies that we have the column set to the first group hash. Not that we end up doing the update statement, however, I want to verify that we're testing the same code path.
| """ | ||
| Test that when deleting group hashes, the group hash metadata is deleted first (which will not update the references to the other group hashes) | ||
| """ | ||
| with self.options({"deletions.group.delete_group_hashes_metadata_first": True}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two tests are functionally the same.
This test avoids the update call.
|
bugbot run |
| # gh B -> ghm B -> gh C | ||
| # gh C -> ghm C -> gh A | ||
| # | ||
| # Deleting group hashes A, B & C (since they all point to the same group) will require: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Deleting group hashes A, B & C (since they all point to the same group) will require: | ||
| # * Updating columns ghmB & ghmC to point to None | ||
| # * Deleting the group hash metadata rows | ||
| # * Deleting the group hashes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this PR, the current approach is this:
- Start group hashes deletion
- Which triggers group hash metadata column updating
- Which then triggers deleting the group hash metadata rows
- Now that children group hash metadata are deleted we can delete the group hashes
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #101720 +/- ##
===========================================
+ Coverage 80.85% 80.95% +0.09%
===========================================
Files 8707 8707
Lines 387091 387104 +13
Branches 24524 24524
===========================================
+ Hits 313000 313393 +393
+ Misses 73743 73363 -380
Partials 348 348 |
| # Verify that seer matched event_b to event_a's hash | ||
| assert event_a.group_id == event_b.group_id | ||
| # Make sure it has not changed | ||
| assert grouphash_a.metadata is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The django ORM record won't change even if the underlying db record was changed. You could add grouphash_a.refresh_from_db() to ensure that you have the latest data from postgres.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to push that change. I have it locally. I will make it part of my next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this in #101796
Issues attributed to commits in this pull requestThis pull request was merged and Sentry observed the following issues: |
This is part of the ThreadLeaks project and not part of the CI test runs. I can see my host name in there. |
Last week when I was debugging tests for #101720 it was confusing to find events and groups that had nothing to do with the tests I was working on. This refactor moves the majority of the logic from the setUp function to the first test since it's where its needed.
This was added in #101720 and it's not needed anymore.

The issues comes from this block:
sentry/src/sentry/deletions/defaults/group.py
Lines 248 to 259 in a3a7717
The update is triggered because of this
on_delete:sentry/src/sentry/models/grouphashmetadata.py
Lines 116 to 118 in b1f684a
Currently, when we try to delete all the group hashes, we update the related group hash metadata first. This query ends up failing for taking longer than 30 seconds:
This can be resolved by deleting the group hash metadata rows before trying to delete the group hash rows. This will avoid the update statement altogether.
This fix was initially started in #101545, however, the solution has completely changed, thus, starting a new PR.
Fixes SENTRY-5ABJ.