feat(deletions): Trigger nodestore deletions from group deletions #15065
Conversation
When we stop writing events to postgres, we will no longer be able to trigger nodestore deletions from an event.
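For context, the task added here pages through a group's events via eventstore and deletes the corresponding nodestore rows in chunks. The following is only a simplified sketch of that shape, not the code from this PR: the constructor signature, import paths, chunk-size value, and pagination conditions are assumptions.

from sentry import eventstore, nodestore
from sentry.models import Event  # import path assumed for the sketch

from ..base import BaseDeletionTask


class GroupNodeDeletionTask(BaseDeletionTask):
    """Deletes nodestore data for all events of a group, one chunk at a time."""

    DEFAULT_CHUNK_SIZE = 10000  # illustrative value

    def __init__(self, manager, group_id, project_id, **kwargs):
        self.group_id = group_id
        self.project_id = project_id
        self.last_event = None
        super(GroupNodeDeletionTask, self).__init__(manager, **kwargs)

    def chunk(self):
        conditions = []
        if self.last_event is not None:
            # Page strictly past the last event of the previous chunk;
            # orderby below is descending on (timestamp, event_id).
            conditions.extend(
                [
                    ["timestamp", "<=", self.last_event.timestamp],
                    [
                        ["timestamp", "<", self.last_event.timestamp],
                        ["event_id", "<", self.last_event.id],
                    ],
                ]
            )

        events = eventstore.get_events(
            filter=eventstore.Filter(
                conditions=conditions,
                project_ids=[self.project_id],
                group_ids=[self.group_id],
            ),
            limit=self.DEFAULT_CHUNK_SIZE,
            referrer="deletions.group",
            orderby=["-timestamp", "-event_id"],
        )
        if not events:
            return False  # nothing left to delete

        self.last_event = events[-1]
        node_ids = [Event.generate_node_id(self.project_id, event.id) for event in events]
        nodestore.delete_multi(node_ids)
        return True  # signal that another chunk may remain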
from ..base import BaseDeletionTask, BaseRelation, ModelDeletionTask, ModelRelation


class GroupNodeDeletionTask(BaseDeletionTask):
Will be expanded to handle EventAttachment in the future.
We might also want to handle UserReports this way, since we don't always write a group_id when these are received
src/sentry/deletions/__init__.py
Outdated
@@ -24,6 +24,7 @@
from __future__ import absolute_import

from .base import BulkModelDeletionTask, ModelDeletionTask, ModelRelation  # NOQA
from .defaults.group import GroupNodeDeletionTask  # NOQA
Just for tests :(
Could you please elaborate? What is the issue there?
This was the easiest way I could find to overwrite the DEFAULT_CHUNK_SIZE value, so that the test also runs through the batching logic.
Not following why you cannot just import GroupNodeDeletionTask in the test and overwrite it there.
But if this is a parameter you want to overwrite, why not make it an option? It will be very hard to understand later why this import is here.
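Concretely, importing the task in the test and overriding the value there would look something like this (hypothetical test body; the direct import plus a temporary override of DEFAULT_CHUNK_SIZE is the whole point):

from sentry.deletions.defaults.group import GroupNodeDeletionTask


def test_group_deletion_runs_in_batches(self):
    # Shrink the chunk size so the batching loop is exercised without
    # having to create thousands of events.
    original_chunk_size = GroupNodeDeletionTask.DEFAULT_CHUNK_SIZE
    GroupNodeDeletionTask.DEFAULT_CHUNK_SIZE = 1
    try:
        pass  # create a few events, delete the group, assert nodestore is empty
    finally:
        GroupNodeDeletionTask.DEFAULT_CHUNK_SIZE = original_chunk_size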
Ah sorry, I think I messed up the import path before. Removed.
Could you please provide some details about how the test plan worked?
@@ -36,6 +89,15 @@ def get_child_relations(self, instance):

relations.extend([ModelRelation(m, {"group_id": instance.id}) for m in model_list])

relations.extend(
The event deletion would still run and thus try to delete the same nodes on nodestore. Would that make the nodestore deletion fail?
It is safe to delete the same node twice from nodestore. We are already performing two deletes from nodestore right now (both of which will soon go away): one triggered by event deletion (https://github.com/getsentry/sentry/blob/master/src/sentry/deletions/defaults/event.py#L8-L15) and one by node field deletion (https://github.com/getsentry/sentry/blob/master/src/sentry/db/models/fields/node.py#L184).
    node_id = Event.generate_node_id(self.project_id, event.id)
    node_ids.append(node_id)

nodestore.delete_multi(node_ids)
What happens if this fails halfway through and throws an exception?
Specifically:
- Is delete_multi an atomic operation, or can it fail with half the nodes deleted and half not?
- If this fails, how do we retry the deletion?
delete_multi may result in partial deletes, although it won't raise an exception. For the Bigtable implementation, the SDK we use has some default retry logic that will be applied, but we don't have a way of knowing when it has reached the retry deadline and given up.
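For reference, the "partial delete without an exception" behaviour comes from the bulk-mutation API the Bigtable backend ultimately sits on: Table.mutate_rows reports a status per row instead of raising. A sketch using the google-cloud-bigtable client directly (project, instance, and table names are made up, and this is not Sentry's nodestore wrapper):

from google.cloud import bigtable

client = bigtable.Client(project="my-project")             # hypothetical project
table = client.instance("my-instance").table("nodedata")   # hypothetical names

node_ids = ["node-key-1", "node-key-2"]  # example row keys
rows = []
for node_id in node_ids:
    row = table.row(node_id)
    row.delete()  # queue a "delete entire row" mutation
    rows.append(row)

# mutate_rows returns one status per row rather than raising, so some
# deletes can fail while the rest succeed -- the partial-delete case.
statuses = table.mutate_rows(rows)
failed = [rows[i].row_key for i, status in enumerate(statuses) if status.code != 0]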
OK for delete_multi.
Still, this task can throw for other reasons. Do we retry in those cases?
Yeah, I think we run Celery deletion tasks with infinite retry if any kind of exception is raised.
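Not the actual task definition, but in Celery terms the retry behaviour being described looks roughly like this (task and helper names are hypothetical; max_retries=None is what makes the retries unbounded):

from celery import shared_task


def delete_group_node_chunk(group_id, project_id):
    """Placeholder for one deletion chunk; returns True if more work remains."""
    return False


@shared_task(bind=True, max_retries=None)  # max_retries=None -> retry indefinitely
def run_group_node_deletion(self, group_id, project_id):
    try:
        has_more = delete_group_node_chunk(group_id, project_id)
    except Exception as exc:
        # Any failure (nodestore, Snuba, network, ...) reschedules the task.
        raise self.retry(exc=exc, countdown=15)
    if has_more:
        # Continue with the next chunk in a fresh task.
        run_group_node_deletion.delay(group_id, project_id)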
events = eventstore.get_events(
    filter=eventstore.Filter(
        conditions=conditions, project_ids=[self.project_id], group_ids=[self.group_id]
    ),
    limit=self.DEFAULT_CHUNK_SIZE,
    referrer="deletions.group",
    orderby=["-timestamp", "-event_id"],
)
Are we guaranteed that this operation happens before ClickHouse has dropped the events for retention? What about failures? How does the retry policy work, and is the deletion retried before the events are dropped by Snuba?
I believe nodestore values will expire after the retention period, so we only need to cover the case where groups are deleted via a user action prior to the retention period end.
src/sentry/deletions/__init__.py
Outdated
Since we are only adding a nodestore deletion here (and not yet removing it from the original place where this happens), the test is going to pass regardless of the code being added here. I also ran the tests locally omitting this line, to make sure that deletion is also being performed from group deletions now: https://github.com/getsentry/sentry/blob/master/src/sentry/deletions/defaults/group.py#L34
Let's resolve the import issue; otherwise it seems fine.
node_ids = []
for event in events:
    node_id = Event.generate_node_id(self.project_id, event.id)
    node_ids.append(node_id)
Nit: I think this works well as a list comprehension:
node_ids = [Event.generate_node_id(self.project_id, event.id) for event in events]
I like it
self.store_event(
    data={
        "event_id": event_id2,
        "timestamp": iso_format(before_now(minutes=1)),
        "fingerprint": ["group1"],
    },
    project_id=project.id,
)
I'd advise adding a third event with a different fingerprint and asserting that its node is not removed from nodestore.
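Something like the following, reusing the names from the test snippet above (event_id3 and the nodestore assertions are assumptions, not code from the PR):

# Third event under a different fingerprint -- it belongs to another group
# and must survive the deletion of group1.
self.store_event(
    data={
        "event_id": event_id3,
        "timestamp": iso_format(before_now(minutes=1)),
        "fingerprint": ["group2"],
    },
    project_id=project.id,
)

# ... delete the group that owns event_id and event_id2 ...

assert nodestore.get(Event.generate_node_id(project.id, event_id)) is None
assert nodestore.get(Event.generate_node_id(project.id, event_id2)) is None
# The unrelated group's event is still present in nodestore.
assert nodestore.get(Event.generate_node_id(project.id, event_id3)) is not None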