ref(reprocessing): Retain issue data by migrating associated models over #21609
Changes from all commits: 39632bc, a12e1a9, 2df6788, 3c76f09, 24de9cf, 8f27081, d96205d, 877408c, 819ac42, 5b1c8c0, 88b1311, 3cbbfda
New migration file (`@@ -0,0 +1,38 @@`):

```python
# -*- coding: utf-8 -*-
# Generated by Django 1.11.28 on 2020-10-27 18:48
from __future__ import unicode_literals

from django.db import migrations
import sentry.db.models.fields.bounded


class Migration(migrations.Migration):
    # This flag is used to mark that a migration shouldn't be automatically run in
    # production. We set this to True for operations that we think are risky and want
    # someone from ops to run manually and monitor.
    # General advice is that if in doubt, mark your migration as `is_dangerous`.
    # Some things you should always mark as dangerous:
    # - Large data migrations. Typically we want these to be run manually by ops so that
    #   they can be monitored. Since data migrations will now hold a transaction open
    #   this is even more important.
    # - Adding columns to highly active tables, even ones that are NULL.
    is_dangerous = False

    # This flag is used to decide whether to run this migration in a transaction or not.
    # By default we prefer to run in a transaction, but for migrations where you want
    # to `CREATE INDEX CONCURRENTLY` this needs to be set to False. Typically you'll
    # want to create an index concurrently when adding one to an existing table.
    atomic = False

    dependencies = [
        ('sentry', '0116_backfill_debug_file_checksum'),
    ]

    operations = [
        migrations.AlterField(
            model_name='activity',
            name='type',
            field=sentry.db.models.fields.bounded.BoundedPositiveIntegerField(choices=[(1, 'set_resolved'), (15, 'set_resolved_by_age'), (13, 'set_resolved_in_release'), (16, 'set_resolved_in_commit'), (21, 'set_resolved_in_pull_request'), (2, 'set_unresolved'), (3, 'set_ignored'), (4, 'set_public'), (5, 'set_private'), (6, 'set_regression'), (7, 'create_issue'), (8, 'note'), (9, 'first_seen'), (10, 'release'), (11, 'assigned'), (12, 'unassigned'), (14, 'merge'), (17, 'deploy'), (18, 'new_processing_issues'), (19, 'unmerge_source'), (20, 'unmerge_destination'), (22, 'reprocess')]),
        ),
    ]
```
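For context on the `atomic` flag the comments describe: Postgres refuses to run `CREATE INDEX CONCURRENTLY` inside a transaction block, which is why such migrations must opt out of Django's per-migration transaction. Below is a minimal sketch of a migration that actually needs the flag; the index, table, and dependency names are hypothetical and not part of this PR:

```python
from django.db import migrations


class Migration(migrations.Migration):
    # CREATE INDEX CONCURRENTLY errors out inside a transaction block,
    # so this migration must run without one.
    atomic = False

    dependencies = [("sentry", "0117_example_previous_migration")]  # hypothetical

    operations = [
        migrations.RunSQL(
            sql=(
                'CREATE INDEX CONCURRENTLY IF NOT EXISTS "sentry_activity_type_idx" '
                'ON "sentry_activity" ("type")'  # hypothetical index
            ),
            reverse_sql='DROP INDEX IF EXISTS "sentry_activity_type_idx"',
        ),
    ]
```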
Changes to the reprocessing module:

```diff
@@ -4,22 +4,27 @@
 import hashlib
 import logging
 import sentry_sdk
+import six
 
 from django.conf import settings
 
 from sentry import nodestore, features, eventstore
 from sentry.attachments import CachedAttachment, attachment_cache
-from sentry.models import EventAttachment
+from sentry import models
 from sentry.utils import snuba
 from sentry.utils.cache import cache_key_for_event
 from sentry.utils.redis import redis_clusters
 from sentry.eventstore.processing import event_processing_store
+from sentry.deletions.defaults.group import GROUP_RELATED_MODELS
 
 logger = logging.getLogger("sentry.reprocessing")
 
 _REDIS_SYNC_TTL = 3600
+
+
+GROUP_MODELS_TO_MIGRATE = GROUP_RELATED_MODELS + (models.Activity,)
 
 
 def _generate_unprocessed_event_node_id(project_id, event_id):
     return hashlib.md5(
         u"{}:{}:unprocessed".format(project_id, event_id).encode("utf-8")
```

Inline comment on the `GROUP_MODELS_TO_MIGRATE` line:

**Author:** After #21611 this addition should not be necessary.
@@ -101,7 +106,7 @@ def reprocess_event(project_id, event_id, start_time): | |
| cache_key = event_processing_store.store(data) | ||
|
|
||
| # Step 2: Copy attachments into attachment cache | ||
| queryset = EventAttachment.objects.filter( | ||
| queryset = models.EventAttachment.objects.filter( | ||
| project_id=project_id, event_id=orig_event_id | ||
| ).select_related("file") | ||
|
|
||
|
|
@@ -202,16 +207,47 @@ def mark_event_reprocessed(data): | |
| _get_sync_redis_client().decr(key) | ||
|
|
||
|
|
||
| def start_group_reprocessing(project_id, group_id, max_events=None): | ||
| from sentry.models.group import Group, GroupStatus | ||
| from sentry.models.grouphash import GroupHash | ||
| def start_group_reprocessing(project_id, group_id, max_events=None, acting_user_id=None): | ||
| from django.db import transaction | ||
|
|
||
| with transaction.atomic(): | ||
|
Inline comments on the `with transaction.atomic():` line:

**Author:** btw this might be a pretty large transaction, it touches a lot of tables at once.

**Reviewer:** Let's follow up on this please. I'm pretty sure we'll want to back this out.

**Author:** The problem is that I would really not like to fail halfway through, because that's quite a mess to clean up. Perhaps we want some shitty kind of "client-side rollback", I think I saw that someplace else already.
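For context, the "client-side rollback" idea amounts to the compensating-actions pattern: apply each step, remember its undo, and replay the undo log in reverse if a later step fails. A hypothetical sketch of that pattern, not code from this PR (all names here are invented):

```python
import logging

logger = logging.getLogger("sentry.reprocessing")


def run_with_compensation(steps):
    """Run (do, undo) callable pairs; on failure, undo completed steps."""
    undo_log = []
    try:
        for do, undo in steps:
            do()
            undo_log.append(undo)
    except Exception:
        # Best-effort "client-side rollback": unwind in reverse order.
        for undo in reversed(undo_log):
            try:
                undo()
            except Exception:
                # Log and keep unwinding so cleanup failures do not
                # mask the original error.
                logger.exception("compensation step failed")
        raise
```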
```diff
-    Group.objects.filter(id=group_id).update(status=GroupStatus.REPROCESSING)
-    # Remove all grouphashes such that new events get sorted into a
-    # different group.
-    GroupHash.objects.filter(group_id=group_id).delete()
+        group = models.Group.objects.get(id=group_id)
+        original_status = group.status
+        original_short_id = group.short_id
+        group.status = models.GroupStatus.REPROCESSING
+        # satisfy unique constraint of (project_id, short_id)
+        # we manually tested that multiple groups with (project_id=1,
+        # short_id=null) can exist in postgres
+        group.short_id = None
+        group.save()
+
+        # Create a duplicate row that has the same attributes by nulling out
+        # the primary key and saving
+        group.pk = group.id = None
```
Inline comments on `group.pk = group.id = None`:

**Reviewer:** Might be worth having a comment here that you're tricking django into doing an INSERT.

**Author:** done
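The `pk = id = None` idiom is a general Django trick rather than anything reprocessing-specific: once the primary key is cleared, `save()` no longer matches an existing row, so it issues an INSERT and the database assigns a fresh id. A minimal sketch with a hypothetical model (assumes a configured Django project):

```python
from django.db import models


class Article(models.Model):  # hypothetical model for illustration
    title = models.CharField(max_length=100)


def duplicate_row(article):
    # Clearing the primary key makes Django treat the instance as unsaved,
    # so save() performs an INSERT instead of an UPDATE and the database
    # hands back a new primary key.
    article.pk = article.id = None
    article.save()
    return article  # now a new row with the same field values
```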
```diff
+        new_group = group  # rename variable just to avoid confusion
+        del group
+        new_group.status = original_status
+        new_group.short_id = original_short_id
+        # this will be incremented by the events that are reprocessed
+        new_group.times_seen = 0
+        new_group.save()
+
+        for model in GROUP_MODELS_TO_MIGRATE:
+            model.objects.filter(group_id=group_id).update(group_id=new_group.id)
+
+        models.GroupRedirect.objects.create(
+            organization_id=new_group.project.organization_id,
+            group_id=new_group.id,
+            previous_group_id=group_id,
+        )
+
+        models.Activity.objects.create(
+            type=models.Activity.REPROCESS,
+            project=new_group.project,
+            ident=six.text_type(group_id),
+            group=new_group,
+            user_id=acting_user_id,
+        )
 
     # Get event counts of issue (for all environments etc). This was copypasted
     # and simplified from groupserializer.
```
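One Django detail that makes the migration loop above practical: `QuerySet.update()` compiles to a single UPDATE statement per model and bypasses `Model.save()` and the `pre_save`/`post_save` signals, so no instances are loaded into memory. A minimal sketch of the same re-pointing pattern with a hypothetical model:

```python
from django.db import models


class Note(models.Model):  # hypothetical group-related model
    group_id = models.IntegerField(db_index=True)
    text = models.TextField()


def repoint_notes(old_group_id, new_group_id):
    # Executes one UPDATE ... WHERE group_id = old_group_id; returns the
    # number of rows affected.
    return Note.objects.filter(group_id=old_group_id).update(group_id=new_group_id)
```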
Inline comments on the `GROUP_RELATED_MODELS` import:

**Reviewer:** nit: Since this is now used outside of deletions, can we move this to a better place, for instance somewhere in `sentry.models`? This could even be on `Group` itself.

**Author:** Despite the name this is still related to group deletion and the order of items is specific to deletion. I was also considering duplicating the list like group merge/unmerge does. Let me know what you prefer.

**Reviewer:** You're right, I'll leave this up to you. I found it interesting that this is being imported from reprocessing to copy over information rather than delete it, that's all.