feat(deletions): Trigger nodestore deletions from group deletions #15065
Changes from 1 commit
@@ -1,6 +1,59 @@

```python
from __future__ import absolute_import, print_function

from sentry import eventstore, nodestore
from sentry.models import Event

from ..base import BaseDeletionTask, BaseRelation, ModelDeletionTask, ModelRelation


class GroupNodeDeletionTask(BaseDeletionTask):
```

Review comment on `GroupNodeDeletionTask`: Will be expanded to handle EventAttachment in the future.

Review comment: We might also want to handle UserReports this way, since we don't always write a group_id when these are received.
```python
    """
    Deletes nodestore data for group
    """

    DEFAULT_CHUNK_SIZE = 10000

    def __init__(self, manager, group_id, project_id, **kwargs):
        self.group_id = group_id
        self.project_id = project_id
        self.last_event = None
        super(GroupNodeDeletionTask, self).__init__(manager, **kwargs)
```
```python
    def chunk(self):
        conditions = []
        if self.last_event is not None:
            conditions.extend(
                [
                    ["timestamp", "<=", self.last_event.timestamp],
                    [
                        ["timestamp", "<", self.last_event.timestamp],
                        ["event_id", "<", self.last_event.event_id],
                    ],
                ]
            )

        events = eventstore.get_events(
            filter=eventstore.Filter(
                conditions=conditions, project_ids=[self.project_id], group_ids=[self.group_id]
            ),
            limit=self.DEFAULT_CHUNK_SIZE,
            referrer="deletions.group",
            orderby=["-timestamp", "-event_id"],
        )
```

Review comment on lines +35 to +42: Are we guaranteed that this operation happens before ClickHouse has dropped the events for retention? What about failures? How does the retry policy work, and is the deletion retried before the events are dropped by Snuba?

Reply: I believe nodestore values will expire after the retention period, so we only need to cover the case where groups are deleted via a user action before the retention period ends.
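The two conditions above implement keyset pagination: an event qualifies only when `ts <= last_ts AND (ts < last_ts OR event_id < last_event_id)`, which is the tuple comparison `(ts, event_id) < (last_ts, last_event_id)` under the descending sort. A minimal, self-contained sketch of that chunking loop (plain tuples stand in for eventstore rows; `delete_in_chunks` and its arguments are names invented for this illustration):

```python
# Sketch of the keyset-pagination chunking above, assuming the nested Snuba
# condition list means OR. Events are plain (timestamp, event_id) tuples here.
def delete_in_chunks(events, chunk_size):
    # orderby=["-timestamp", "-event_id"]: newest first, event_id breaks ties
    ordered = sorted(events, reverse=True)
    last = None
    chunks = []
    while True:
        # ts <= last_ts AND (ts < last_ts OR event_id < last_event_id)
        # is equivalent to the tuple comparison e < last
        page = [e for e in ordered if last is None or e < last][:chunk_size]
        if not page:  # like chunk() returning False when nothing is left
            return chunks
        last = page[-1]  # advance the cursor, like self.last_event
        chunks.append(page)

chunks = delete_in_chunks([(1, "a"), (2, "b"), (2, "c"), (3, "d")], chunk_size=2)
```

With a chunk size of 2, the four events come back in two pages, newest first, and the loop terminates once the cursor has walked past the oldest event.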
```python
        if not events:
            return False

        self.last_event = events[-1]

        node_ids = []
        for event in events:
            node_id = Event.generate_node_id(self.project_id, event.id)
            node_ids.append(node_id)
```

Review comment: Nit: I think this works well as a list comprehension.

Reply: I like it.
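The reviewer's list-comprehension nit would look roughly like this (with a hypothetical stand-in for `Event.generate_node_id`, whose real implementation lives in `sentry.models`):

```python
# Hypothetical stand-in for Event.generate_node_id, for illustration only.
def generate_node_id(project_id, event_id):
    return "%s:%s" % (project_id, event_id)

project_id = 1
events = ["a" * 32, "b" * 32]  # made-up event ids

# The for-loop from the diff, collapsed into the suggested comprehension:
node_ids = [generate_node_id(project_id, event_id) for event_id in events]
```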
```python
        nodestore.delete_multi(node_ids)
```

Review comment: What happens if this fails halfway through and throws an exception?

Reply: ok for delete_multi.

Reply: Yeah, I think we run Celery deletion tasks with infinite retry if any kind of exception is raised.

```python
        return True
```
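The thread above leans on two properties: deleting an already-absent node is a no-op, and the Celery task is retried on any exception. Under those assumptions, a retry that replays the whole batch after a partial failure converges to the same end state. A toy illustration (`FakeNodestore` and `fail_after` are invented for this sketch, not Sentry's API):

```python
class FakeNodestore(object):
    """Toy nodestore: idempotent deletes, optional injected failure."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def delete_multi(self, ids, fail_after=None):
        for i, node_id in enumerate(ids):
            if fail_after is not None and i >= fail_after:
                raise RuntimeError("transient failure mid-batch")
            self.nodes.pop(node_id, None)  # absent key: no-op, not an error

store = FakeNodestore({"n1": 1, "n2": 2, "n3": 3})
try:
    store.delete_multi(["n1", "n2", "n3"], fail_after=1)  # dies after n1
except RuntimeError:
    pass  # Celery would reschedule the task here
store.delete_multi(["n1", "n2", "n3"])  # the retry replays the full batch
```

Because the delete is idempotent, replaying ids that were already removed in the failed attempt does no harm, so the store ends up empty.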
```python
class GroupDeletionTask(ModelDeletionTask):
```

@@ -36,6 +89,15 @@ def get_child_relations(self, instance):

```python
        relations.extend([ModelRelation(m, {"group_id": instance.id}) for m in model_list])

        relations.extend(
            [
                BaseRelation(
                    {"group_id": instance.id, "project_id": instance.project_id},
                    GroupNodeDeletionTask,
                )
            ]
        )
```

Review comment on the new `relations.extend` block: The event deletion would still run and thus try to delete the same nodes in nodestore. Would that make the nodestore deletion fail?

Reply: It is safe to delete the same node twice from nodestore. We are already performing two deletes from nodestore right now (both will soon go away): one triggered by event deletion (https://github.com/getsentry/sentry/blob/master/src/sentry/deletions/defaults/event.py#L8-L15) and one by nodefield deletion (https://github.com/getsentry/sentry/blob/master/src/sentry/db/models/fields/node.py#L184).

```python
        return relations

    def delete_instance(self, instance):
```
@@ -13,32 +13,46 @@

```python
    ScheduledDeletion,
    UserReport,
)
from sentry import nodestore
from sentry.deletions import GroupNodeDeletionTask
from sentry.tasks.deletion import run_deletion
from sentry.testutils import TestCase, SnubaTestCase
from sentry.testutils.helpers.datetime import iso_format, before_now


class DeleteGroupTest(TestCase, SnubaTestCase):
    def test_simple(self):
        GroupNodeDeletionTask.DEFAULT_CHUNK_SIZE = 1  # test chunking logic
        event_id = "a" * 32
        event_id2 = "b" * 32
        project = self.create_project()
        node_id = Event.generate_node_id(project.id, event_id)
        node_id2 = Event.generate_node_id(project.id, event_id2)

        event = self.store_event(
            data={
                "event_id": event_id,
                "tags": {"foo": "bar"},
                "timestamp": iso_format(before_now(minutes=1)),
                "fingerprint": ["group1"],
            },
            project_id=project.id,
        )

        self.store_event(
            data={
                "event_id": event_id2,
                "timestamp": iso_format(before_now(minutes=1)),
                "fingerprint": ["group1"],
            },
            project_id=project.id,
        )
```
Review comment on lines +45 to +53: I'd advise you add a third event with a different fingerprint and assert that event is not removed from nodestore.
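The suggested extra assertion can be sketched independently of the Sentry test helpers (a dict stands in for nodestore, and grouping by fingerprint is simulated directly; all names here are invented):

```python
# Toy model of the suggested test: three events, two sharing a fingerprint.
# Deleting group1 must leave the third event's node untouched.
nodestore = {
    "node-a": "group1",
    "node-b": "group1",
    "node-c": "group2",  # third event, different fingerprint
}

def delete_group_nodes(store, group_id):
    # remove only the nodes that belong to the group being deleted
    for node_id in [k for k, v in store.items() if v == group_id]:
        del store[node_id]

delete_group_nodes(nodestore, "group1")
```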
```python
        group = event.group
        group.update(status=GroupStatus.PENDING_DELETION)

        UserReport.objects.create(group_id=group.id, project_id=event.project_id, name="Jane Doe")
```

@@ -50,6 +64,9 @@ def test_simple(self):

```python
        deletion = ScheduledDeletion.schedule(group, days=0)
        deletion.update(in_progress=True)

        assert nodestore.get(node_id)
        assert nodestore.get(node_id2)

        with self.tasks():
            run_deletion(deletion.id)
```

@@ -58,3 +75,6 @@ def test_simple(self):

```python
        assert not GroupRedirect.objects.filter(group_id=group.id).exists()
        assert not GroupHash.objects.filter(group_id=group.id).exists()
        assert not Group.objects.filter(id=group.id).exists()

        assert not nodestore.get(node_id)
        assert not nodestore.get(node_id2)
```
Review comment on `GroupNodeDeletionTask.DEFAULT_CHUNK_SIZE = 1`: Just for tests :(

Reply: Could you please elaborate? What is the issue there?

Reply: This was the easiest way I could find to overwrite the DEFAULT_CHUNK_SIZE value, so that we would also run through the batching logic in the test.

Reply: Not following why you cannot just import GroupNodeDeletionTask in the test and overwrite. But if this is a parameter you want to overwrite, why not make it an option? It will be very hard to understand (later) why this import is here.

Reply: Ah sorry, I think I messed up the import path before. Removed.
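One scoped alternative to mutating `DEFAULT_CHUNK_SIZE` in place is `unittest.mock.patch.object`, which restores the original value when the patch exits (a stand-in class replaces `GroupNodeDeletionTask` in this sketch):

```python
from unittest import mock

class SomeDeletionTask(object):  # stand-in for GroupNodeDeletionTask
    DEFAULT_CHUNK_SIZE = 10000

# The override lives only inside the with-block, so other tests still see
# the production default and no manual cleanup is needed.
with mock.patch.object(SomeDeletionTask, "DEFAULT_CHUNK_SIZE", 1):
    inside = SomeDeletionTask.DEFAULT_CHUNK_SIZE

after = SomeDeletionTask.DEFAULT_CHUNK_SIZE
```

This avoids the leak where a class attribute set in one test silently affects every later test in the same process.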