ref: No longer prune duplicated data before persisting for future refactorings #10683

mitsuhiko · 2018-11-20T14:57:20Z

We used to remove some data from the data blob before storing it, and move it exclusively onto the group
and event. This makes a lot of processing work harder, because some data needs to be looked up in the
original data.

This is potentially slightly wasteful, but it also gives us the chance to clean up some things
going forward. For instance, the culprit in the event export is currently always the
group culprit and not the real event culprit.
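
To make the culprit example concrete, here is a toy model. The `store`, `export_culprit` and `demo` helpers are hypothetical and far simpler than Sentry's event manager. The sketch assumes the group keeps the culprit of the event that created it, and that the export falls back to the group's value when the event blob no longer carries its own.

```python
from typing import Any, Dict

Blob = Dict[str, Any]


def store(event_data: Blob, group: Blob, prune: bool) -> Blob:
    """Return the blob that would be persisted for one event (toy model)."""
    blob = dict(event_data)
    # The group keeps the culprit of the first event that created it.
    group.setdefault("culprit", blob.get("culprit"))
    if prune:
        # Old behaviour: drop data that is duplicated onto the group.
        blob.pop("culprit", None)
    return blob


def export_culprit(blob: Blob, group: Blob) -> Any:
    # Use the event's own culprit if it survived; otherwise the group's.
    return blob.get("culprit", group.get("culprit"))


def demo(prune: bool) -> Any:
    group: Blob = {}
    store({"culprit": "app.views in index"}, group, prune)
    second = store({"culprit": "app.models in save"}, group, prune)
    return export_culprit(second, group)


print(demo(prune=True))   # app.views in index  (group culprit leaks in)
print(demo(prune=False))  # app.models in save  (the real event culprit)
```

Keeping the value in the blob costs some duplicated storage, but it lets per-event consumers read per-event data.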

This also renames id to event_id in the JSON export, which is the actually correct
key.

#sync-getsentry

mitsuhiko · 2018-11-20T14:58:03Z

Additionally, this forces normalization for all code that uses save, which uncovered a bunch of bad test cases.

src/sentry/models/event.py

jan-auer · 2018-11-20T16:50:49Z

src/sentry/tasks/store.py

    try:
        manager = EventManager(data)
-        event = manager.save(project_id)
+        event = manager.save(project_id, assume_normalized=True)


This is becoming slightly dangerous now, because nothing asserts that only normalized data goes into the queue, and all tests will normalize with this. If there is a bug, we will only find it in production.

This is not a problem caused by your change, though it is uncovered now. I just have no idea how to fix this.

This is already the case.
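
One hypothetical way to address the concern above is a debug-only guard: when `assume_normalized=True` is passed, re-run normalization and fail if it changes anything. This sketch assumes normalization is idempotent. `normalize`, `save` and `CHECK_NORMALIZED` are illustrative stand-ins, not Sentry's `EventManager` API.

```python
import copy
from typing import Any, Dict

Data = Dict[str, Any]

# Hypothetical switch: on in tests and CI, off on the hot production path.
CHECK_NORMALIZED = True


def normalize(data: Data) -> Data:
    """Toy normalizer, far simpler than the real one (an assumption)."""
    out = copy.deepcopy(data)
    out["level"] = str(out.get("level", "error")).lower()
    out.setdefault("tags", [])
    return out


def save(data: Data, assume_normalized: bool = False) -> Data:
    if not assume_normalized:
        data = normalize(data)
    elif CHECK_NORMALIZED and normalize(data) != data:
        # normalize() is idempotent, so any change on a second pass means
        # un-normalized data reached this point.
        raise ValueError("save() called with un-normalized data")
    return data  # actual persistence is out of scope for this sketch


ok = save(normalize({"level": "ERROR"}), assume_normalized=True)
print(ok)  # {'level': 'error', 'tags': []}
try:
    save({"level": "ERROR"}, assume_normalized=True)
except ValueError as exc:
    print(exc)  # save() called with un-normalized data
```

Because the check runs normalization a second time, it would presumably only be enabled in tests or CI, which is where such bugs should surface rather than in production.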

bretthoerner · 2018-11-20T16:12:05Z

src/sentry/event_manager.py

+        platform = data.get('platform')

-        recorded_timestamp = data.pop('timestamp')
+        recorded_timestamp = data.get('timestamp')


Does it matter that the event_id and timestamp getters are becoming safe? They would throw a KeyError before, and I'm not sure if that would catch anything of value before it caused problems.

Nothing accesses these attributes in a way that would put data in.
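
For reference, the difference the question is about comes from plain Python `dict` semantics: `pop` without a default raises `KeyError` on a missing key and removes the key when it exists, while `get` returns `None` and leaves the data untouched. The two helper names below are made up for illustration.

```python
from typing import Any, Dict, Optional


def read_with_pop(data: Dict[str, Any]) -> Any:
    # Old style: raises KeyError if the key is missing and, if present,
    # removes it so later processing can no longer look it up.
    return data.pop("timestamp")


def read_with_get(data: Dict[str, Any]) -> Optional[Any]:
    # New style: returns None if the key is missing and leaves data intact.
    return data.get("timestamp")


blob = {"timestamp": 1542725840.0}
print(read_with_get(blob), "timestamp" in blob)  # 1542725840.0 True
print(read_with_pop(blob), "timestamp" in blob)  # 1542725840.0 False

print(read_with_get({}))  # None
try:
    read_with_pop({})
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'timestamp'
```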

bretthoerner · 2018-11-20T16:14:07Z

src/sentry/models/event.py

    def as_dict(self):
        # We use a OrderedDict to keep elements ordered for a potential JSON serializer
        data = OrderedDict()
-        data['id'] = self.event_id


I don't fully understand the meaning/repercussions of changing this (and the test), can you explain?

This code is only used in the JSON export; neither the UI nor plugins depend on it. The idea here is to make that output compatible with ingestion, so we can easily take such a payload and process it again, etc.
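
A minimal sketch of the round trip this enables, assuming a re-ingestion step that reads the payload's `event_id` key. The `Event` class and the `ingest` function are hypothetical stand-ins for the real model and store pipeline.

```python
import json
from collections import OrderedDict
from typing import Any, Dict


class Event:
    """Hypothetical stand-in for the Event model with just two fields."""

    def __init__(self, event_id: str, message: str) -> None:
        self.event_id = event_id
        self.message = message

    def as_dict(self) -> Dict[str, Any]:
        # An OrderedDict keeps keys ordered for a potential JSON serializer.
        data = OrderedDict()
        # Use the key ingestion expects ("event_id"), not "id".
        data["event_id"] = self.event_id
        data["message"] = self.message
        return data


def ingest(payload: Dict[str, Any]) -> Event:
    # Hypothetical re-ingestion that reads the canonical "event_id" key.
    return Event(payload["event_id"], payload["message"])


original = Event("a" * 32, "Hello World")
exported = json.dumps(original.as_dict())
restored = ingest(json.loads(exported))
print(restored.event_id == original.event_id)  # True
```

With the old `data['id']` key, the same `ingest` call would fail with a `KeyError` on `"event_id"`.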

Ah, got it.

Thank you @mitsuhiko @jan-auer for changing this. It definitely annoyed me before that we were using id instead of event_id in this serialization, but I had just assumed that changing it would blow up everything.

Inching ever closer to being able to re-ingest off to_json :)

jan-auer

LGTM, except the tiny bug in as_dict

mitsuhiko · 2018-11-20T23:38:11Z

Small last changes: site and server_name are now deprecated attributes which are pushed into tags in normalization. The other attributes are actually the canonical ones we expect and are just duplicated into tags.
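
A rough sketch of what pushing `site` and `server_name` into tags during normalization could look like, assuming tags are a list of `(key, value)` pairs. The function name is hypothetical, and two behaviours are guesses not stated in the comment: the legacy keys are removed from the top level, and a tag of the same name that the client sent explicitly wins.

```python
from typing import Any, Dict, List, Tuple

# Legacy top-level attributes that normalization moves into tags.
DEPRECATED_TAG_ATTRS = ("site", "server_name")


def move_legacy_attrs_to_tags(data: Dict[str, Any]) -> Dict[str, Any]:
    out = dict(data)
    tags: List[Tuple[str, Any]] = list(out.get("tags") or [])
    existing = {key for key, _ in tags}
    for attr in DEPRECATED_TAG_ATTRS:
        # Assumption: the deprecated attribute is dropped from the top level.
        value = out.pop(attr, None)
        # Assumption: an explicitly sent tag is not overridden.
        if value is not None and attr not in existing:
            tags.append((attr, value))
    out["tags"] = tags
    return out


print(move_legacy_attrs_to_tags(
    {"server_name": "web-1", "site": "us", "tags": [("site", "eu")]}
))
# {'tags': [('site', 'eu'), ('server_name', 'web-1')]}
```

Attributes not listed in `DEPRECATED_TAG_ATTRS` pass through unchanged in this sketch.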

ref: No longer prune duplicated data before persisting for future refactorings

fab2a2e

mitsuhiko requested review from bretthoerner and jan-auer November 20, 2018 14:57

jan-auer reviewed Nov 20, 2018

bretthoerner reviewed Nov 20, 2018

jan-auer requested changes Nov 20, 2018

mitsuhiko added 2 commits November 21, 2018 00:35

ref: Separate out clear legacy attributes

f66b018

Merge branch 'master' into feature/normalization-forced

000c740

bretthoerner approved these changes Nov 20, 2018

jan-auer approved these changes Nov 21, 2018

mitsuhiko and others added 3 commits November 21, 2018 11:23

feat: Chop off sentry: to json payload in tags

d034ac5

ref: Support non integer levels in store

9a0d6ea

build: empty commit to trigger sync bot

2233623

untitaker force-pushed the feature/normalization-forced branch from d2587fc to 2233623 November 21, 2018 10:50

ref: Fix a bad code path in normalize

983d01c

mitsuhiko merged commit f643bcc into master Nov 21, 2018

mitsuhiko deleted the feature/normalization-forced branch November 21, 2018 14:07

jan-auer added a commit that referenced this pull request Nov 21, 2018

Merge branch 'master' into ref/interface-meta

417624a

* master: ref: No longer prune duplicated data before persisting for future refactorings (#10683)

github-actions bot locked and limited conversation to collaborators Dec 21, 2020

mitsuhiko commented Nov 20, 2018 • edited by untitaker

6 participants
