chore(alerts): Clean up old metric alert rows in NotificationMessage#115647
Conversation
|
This PR has a migration; here is the generated SQL for for --
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL |
wedamija
left a comment
There was a problem hiding this comment.
lgtm, most recent row with these columns null is from 2025-10-08 21:19
| for row in RangeQuerySetWrapperWithProgressBar( | ||
| NotificationMessage.objects.filter(incident__isnull=False, trigger_action__isnull=False) | ||
| ): | ||
| row.delete() |
There was a problem hiding this comment.
This will hopefully work ok, something using RangeQuerySetWrapperWithProgressBar can fail due to there not being an explicit index on incident, trigger_action, id.
Btw, I noticed that we never expire data from this table. Should we be doing that?
There was a problem hiding this comment.
Probably yes, this table is massive and only used for threading message replies in Slack. I can look into getting that set up.
There was a problem hiding this comment.
@wedamija Is there something else I can use besides RangeQuerySetWrapperWithProgressBar? The test I added keeps failing because it's too slow so I'm worried about timeouts when I actually apply this. Maybe I should batch the deletes?
There was a problem hiding this comment.
I doubt that's the reason the test is failing... There aren't that many rows in the table afaik. Does it pass locally? It could also just be worth a retry
There was a problem hiding this comment.
Yeah it passes locally, I'll retry it again.
There was a problem hiding this comment.
Probably yes, this table is massive and only used for threading message replies in Slack. I can look into getting that set up.
To do that we'll need to add an index on date_added, or whatever we should use to ttl these out. I'm not sure if there's any reason we don't do this already... Like if we need a reference to old notification messages for any reason?
| dependencies = [ | ||
| ("notifications", "0008_remove_metric_alert_constraints_notificationmessage"), |
There was a problem hiding this comment.
I wonder if the reason the test is slow is something to do with the dependencies. Maybe we should make sure this depends on the latest sentry migration too? Just a random guess...
There was a problem hiding this comment.
Trying that now, if it still fails I'll try a batched approach
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit cc8fba7. Configure here.
| for row in RangeQuerySetWrapperWithProgressBar( | ||
| NotificationMessage.objects.filter(incident__isnull=False, trigger_action__isnull=False) | ||
| ): | ||
| row.delete() |
There was a problem hiding this comment.
Row-by-row deletion on massive table causes extreme slowness
High Severity
The migration deletes rows one at a time via row.delete() inside a RangeQuerySetWrapperWithProgressBar loop. On a table described as "massive," this issues a separate SQL DELETE statement per row, which is extremely slow and risks timeouts. The codebase already provides bulk_delete_objects in sentry.utils.query that efficiently batch-deletes using subquery-based DELETE ... WHERE id IN (SELECT ...). A batched approach would reduce millions of individual queries to a manageable number of batch operations.
Reviewed by Cursor Bugbot for commit cc8fba7. Configure here.
Backend Test FailuresFailures on
|


We have not written metric alert rows (those with
incidentandtrigger_action) since October of last year and only read the workflow engine rows (those withactionandgroup). To prep for removing the metric alert rows and making the workflow engine ones not nullable, we must first clean up all the old rows.