ref: No longer prune duplicated data before persisting for future refactorings #10683
Conversation
Additionally this forces normalization for all code that uses save, which uncovered a bunch of bad test cases.
```diff
  try:
      manager = EventManager(data)
-     event = manager.save(project_id)
+     event = manager.save(project_id, assume_normalized=True)
```
This is becoming slightly dangerous now, because nothing asserts that only normalized data goes into the queue, and all tests will normalize with this. If there is a bug, we can only find it in production.
This is not a problem with your change, though it is uncovered by it now. I just have no idea how to fix this.
This is already the case.
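To make the concern concrete, here is a minimal sketch of the kind of guard that could assert normalization at the save boundary. The _looks_normalized helper and the EventManager internals shown here are hypothetical illustrations, not Sentry's actual implementation.

```python
# Hypothetical sketch only: a guard that would make un-normalized
# payloads fail loudly in tests rather than surfacing in production.
# _looks_normalized and its key checks are illustrative assumptions.

class EventManager(object):
    def __init__(self, data):
        self.data = data

    def save(self, project_id, assume_normalized=False):
        if assume_normalized and not self._looks_normalized(self.data):
            # Caller claimed the payload was already normalized; fail fast.
            raise AssertionError(
                'assume_normalized=True but payload is not normalized'
            )
        # ... actual persistence logic would follow here ...

    def _looks_normalized(self, data):
        # Illustrative invariant: normalized payloads carry these keys.
        return 'event_id' in data and 'timestamp' in data
```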
```diff
  platform = data.get('platform')

- recorded_timestamp = data.pop('timestamp')
+ recorded_timestamp = data.get('timestamp')
```
Does it matter that the event_id and timestamp getters are becoming safe? They would throw a KeyError before, and I'm not sure whether that would have caught anything of value before it caused problems.
There is no access to these attributes that would put anything bad in.
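For reference, the behavioural difference between the two getters in isolation (plain dict semantics, nothing Sentry-specific):

```python
data = {'timestamp': 1538160000.0}

# pop() without a default removes the key and raises KeyError if absent.
recorded_timestamp = data.pop('timestamp')
assert 'timestamp' not in data

data = {'timestamp': 1538160000.0}

# get() reads the key, returns None if absent, and leaves the dict intact.
recorded_timestamp = data.get('timestamp')
assert 'timestamp' in data
```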
```python
def as_dict(self):
    # We use an OrderedDict to keep elements ordered for a potential JSON serializer
    data = OrderedDict()
    data['id'] = self.event_id
```
I don't fully understand the meaning/repercussions of changing this (and the test); can you explain?
This code is only used in the JSON export. Neither the UI nor plugins depend on this. The idea here is to make that output compatible with ingestion, so we can easily take such a payload and process it again.
Ah, got it.
Thank you @mitsuhiko @jan-auer for changing this. It definitely annoyed me before that we were using id instead of event_id in this serialization, but I had just assumed that changing it would blow up everything.
Inching ever closer to being able to re-ingest off to_json :)
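For illustration, a sketch of the renamed export discussed in this thread. Only the id-to-event_id rename is confirmed by the PR; the standalone function form and the extra fields are assumptions.

```python
from collections import OrderedDict

def as_dict(event):
    # We use an OrderedDict to keep elements ordered for a potential
    # JSON serializer.
    data = OrderedDict()
    data['event_id'] = event.event_id  # previously exported under 'id'
    data['platform'] = event.platform  # illustrative additional fields
    data['timestamp'] = event.timestamp
    return data
```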
jan-auer left a comment:
LGTM, except the tiny bug in as_dict
Small last changes:
Force-pushed from d2587fc to 2233623.
* master: ref: No longer prune duplicated data before persisting for future refactorings (#10683)
We used to remove some data from the data blob before storing it, moving it exclusively onto the group and event. This makes a lot of processing work harder, because some data then needs to be looked up in the original payload.
Keeping the duplicated data is potentially slightly wasteful, but it also gives us the chance to clean up some things going forward. For instance, the culprit in the event export right now is always the group culprit and not the real event culprit.
This also changes id to event_id in the JSON export, which is actually the correct key.
#sync-getsentry
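A hedged sketch of the round trip this rename works toward: because the export now uses the ingestion key, a serialized payload can be handed back to EventManager. The payload contents and project_id are illustrative, and the normalize() step is assumed per the normalization discussion above.

```python
import json

# Illustrative payload; in practice this would come from the JSON export.
exported = {
    'event_id': 'a' * 32,
    'platform': 'python',
    'timestamp': 1538160000.0,
}
project_id = 1  # illustrative project id

# Serialize and re-ingest: the exported keys now match the ingestion keys.
payload = json.loads(json.dumps(exported))
manager = EventManager(payload)   # EventManager as shown in the diffs above
manager.normalize()               # assumed normalization step
event = manager.save(project_id, assume_normalized=True)
```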