Skip to content

Conversation

@mitsuhiko
Copy link
Contributor

@mitsuhiko mitsuhiko commented Nov 20, 2018

We used to remove some data before storing from the data blob and move it onto group
and event exclusively. This is making a lot of processing work harder because some
data needs to be looked up in the original data.

This is slightly wasteful potentially but it also has the chance to clean up some stuff
going forward. For instance culprit in the event export right now is always the
group culprit and not the real event culprit.

This also changes id to event_id in the JSON export which is the actually correct
key.

#sync-getsentry

@mitsuhiko
Copy link
Contributor Author

Additionally this forces normalization for all code that uses save which found a bunch of bad test cases.

try:
manager = EventManager(data)
event = manager.save(project_id)
event = manager.save(project_id, assume_normalized=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is becoming slightly dangerous now because nothing is asserting that only normalized data goes in the queue and all tests will normalize with this. If there is a bug, we can only find it in production.

This is not a problem of your change, though it is uncovered now. I just have no idea how to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already the case.

platform = data.get('platform')

recorded_timestamp = data.pop('timestamp')
recorded_timestamp = data.get('timestamp')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter that the event_id and timestamp getters are becoming safe? They would throw a KeyError before, and I'm not sure if that would catch anything of value before it caused problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no access to these attributes that would put stuff in.

def as_dict(self):
# We use a OrderedDict to keep elements ordered for a potential JSON serializer
data = OrderedDict()
data['id'] = self.event_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the meaning/repercussions of changing this (and the test), can you explain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is only used in the JSON export. Neither UI nor plugins depend on this. The idea here is to make that output compatible to ingestion, so we can easily take such a payload and process it again, etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mitsuhiko @jan-auer for changing this. It definitely annoyed me before that we were using id instead of event_id in this serialization but I had just assumed that changing it would blow up everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Etching ever closer to being able to re-ingest off to_json :)

Copy link
Member

@jan-auer jan-auer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except the tiny bug in as_dict

@mitsuhiko
Copy link
Contributor Author

Small last changes: site and server_name are now deprecated attributes which are pushed into tags in normalization. The other attributes are actually the canonical ones we expect and are just duplicated into tags.

@untitaker untitaker force-pushed the feature/normalization-forced branch from d2587fc to 2233623 Compare November 21, 2018 10:50
@mitsuhiko mitsuhiko merged commit f643bcc into master Nov 21, 2018
@mitsuhiko mitsuhiko deleted the feature/normalization-forced branch November 21, 2018 14:07
jan-auer added a commit that referenced this pull request Nov 21, 2018
* master:
  ref: No longer prune duplicated data before persisting for future refactorings (#10683)
@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants