Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(canonical): Add a shim for canonical keys #8789

Merged
merged 26 commits into from
Jul 24, 2018
Merged

Conversation

jan-auer
Copy link
Member

@jan-auer jan-auer commented Jun 20, 2018

This PR migrates the legacy "sentry.interfaces.*" event data keys to their new canonical representation, i.e. short names. Only canonical names will be stored, but access is still possible via the legacy names in the entire codebase. This includes ingestion and processing.

A full list of renames is attached at the bottom.

I tried to keep this PR to a minimum, with more refactoring to happen afterwards. Especially EventManager code needs some serious clean up and I'd like to walk through the code base to remove every mention of "sentry.interfaces.*".

Mapping Types

The conversion is mostly handled by two shim types:

  • CanonicalKeyDict: A mutable mapping based a dict that will only store canonical keys internally. Its constructor already converts and deduplicates all legacy keys. It does not synchronize with its input data.
  • CanonicalKeyView: A read-only view on an underlying mapping. When iterating or accessing keys, it internally checks for the legacy name as well as the canonical name in both directions. Additionally, it guarantees to preserve order for ordered data structures. Length and iteration operate on deduplicated canonical keys. For performance reasons, __len__ is computed in the constructor, all other methods will operate on the underlying data.

Both wrappers prefer the canonical key in case both canonical and legacy keys are present in the input, regardless of the order. This also holds for OrderedDicts.

Event Model Changes

The event model has been changed to now have two properties:

  • event.node_data: An instance of NodeData that was originally stored at event.data. It contains the raw mutable data dict. Apart from event creation and the bind_nodes utility, it should never be used directly.
  • event.data: A memoized read-only CanonicalKeyView on node_data. Since this is used by most of the Sentry code base, including plugins that operate on Event models, they continue to work with their legacy keys. Mutation of the event data should not have been allowed before, with the only exception of the EventManager itself.

Warning: Since event.data is memoized, all calls to Event.objects.bind_nodes(event, 'node_data') must be made before the first access of event.objects. Otherwise, len(event) will return an invalid value. At the moment, this is the case in the entire codebase.

API and EventManager Changes

The EventManager used to rely on mutation of the data dict passed into its constructor. This is no longer the case, with the caveat that one must use the return value of normalize now. Apart from tests, there was only one affected location inside LazyData which was already updated in #8774.

manager = EventManager(data)
data = manager.normalize()

The EventManager now uses a mutable CanonicalKeyDict to store data internally and returns it from normalize. This value is assumed directly by LazyData and thus carries through the entire store API code, allowing it and plugins to continue using legacy keys. Finally, the EventManager sets it directly into the Event's NodeData. This is the only place where node_data does not contain a raw dict.

Also, some code like get_path or the schema validation had to be updated to accommodate a non-dict data, either by using isinstance(data, collections.Mapping) or by accessing data.data (the internal raw dict).

Processing and Store Changes

These jobs load event data without the Event model from the cache. Their entry points have been updated to wrap the returned data dict in a mutable CanonicalKeyDict. Since none of these jobs operates on Event models, no further changes were necessary.

Note that data no longer passes isinstance(data, dict).

Warning: To ensure a smooth rollout, save and processing workers should be deployed before web. Otherwise, incompatible data with canonical keys might reach old workers.

Next Steps

I'd not like this to go out right away because it has a high chance of breaking a lot. Ideally, we could deploy a small number of workers first before making the full change.

  • Validate assumptions and implications described above
  • Verify that plugins continue to work
  • Deploy save / process workers
  • Deploy web
  • Follow-up: Clean up / refactor all uses of "sentry.interfaces.*"

Renamed Keys

For reference, this is the mapping from legacy keys to canonical keys:

  • sentry.interfaces.Exception -> exception,
  • sentry.interfaces.Message -> logentry,
  • sentry.interfaces.Stacktrace -> stacktrace,
  • sentry.interfaces.Template -> template,
  • sentry.interfaces.Query -> query,
  • sentry.interfaces.Http -> request,
  • sentry.interfaces.User -> user,
  • sentry.interfaces.Csp -> csp,
  • sentry.interfaces.AppleCrashReport -> applecrashreport,
  • sentry.interfaces.Breadcrumbs -> breadcrumbs,
  • sentry.interfaces.Contexts -> contexts,
  • sentry.interfaces.Threads -> threads,
  • sentry.interfaces.DebugMeta -> debug_meta,

@jan-auer
Copy link
Member Author

@mattrobenolt @mitsuhiko I know it's kind of messy. If you're looking for a place to start reviewing, have a look at coreapi, EventManager, canonical.py, or store.py. Please also double-check if my view code matches your expectations of performance.

@jan-auer jan-auer requested a review from JTCunning June 20, 2018 15:04
@@ -182,7 +184,7 @@ def load_data(platform, default=None, timestamp=None, sample_name=None):

# Make breadcrumb timestamps relative to right now so they make sense
breadcrumbs = data.get('sentry.interfaces.Breadcrumbs')
if breadcrumbs is not None:
if breadcrumbs is not None and 'values' in breadcrumbs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this a change that creeped in or does this PR change behavior WRT to values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a defensive change to keep behavior of the cocoa ingestion test the same. cocoa.json contains a key breadcrumbs: [ ... ], which was not touched by the old code and would error with the new one. So instead of fixing this code, I chose to rather keep the test behavior the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cocoa dump the only one that does not have canonical data for value use? If that's the case I rather have the data fixed than special handle this here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Will change the visual diff though.

@@ -36,7 +36,7 @@ def get(self, request, event_id):

self.check_object_permissions(request, event.group)

Event.objects.bind_nodes([event], 'data')
Event.objects.bind_nodes([event], 'node_data')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be an issue. I much rather change bind_nodes so that it accepts the db column in addition to the descriptor and not change all calls to bind_nodes. Otherwise we need to touch some other stuff as well (hipchat-ac plugin though we can delete that, getsentry commands).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean with db column vs descriptor please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the other comment, I think I get it now. That would mean that everyone accessing data would have to change to a new field then, instead of continuing to use event.data. I'm afraid that'd be a higher risk of breaking something.

We could probably also patch this in Event.objects.bind_nodes and add a special case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jan-auer i was thinking of letting bind_nodes also take the db_column into account as a fallback.

@@ -106,8 +112,9 @@ def get_event_metadata(self):
etype = self.data.get('type', 'default')
if 'metadata' not in self.data:
# TODO(dcramer): remove after Dec 1 2016
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for @mattrobenolt: the comment is a lie. Cannot be removed yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove it? cc @dcramer - right now the handling of event.message is quite messy and hard to understand. @mitsuhiko and I were thinking about cleaning this up soon, but we'd need some input on which way to go.

@@ -19,11 +21,12 @@ class RawEvent(Model):
project = FlexibleForeignKey('sentry.Project')
event_id = models.CharField(max_length=32, null=True)
datetime = models.DateTimeField(default=timezone.now)
data = NodeField(
node_data = NodeField(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to mirror the comment above: we might want to keep the field under the original name since it affects bind_nodes. Additionally if we do this change I believe we need to reflect this in a dummy db migration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, forgot to run migrations. I'll wait on a decision whether we should rename this or not. Regarding bind_nodes: If we do not rename this field, we'll have to come up with a new name for the accessor attribute and change every access to it. If we keep renaming it to node_data but don't want to change calls to bind_nodes, we'd have to look up the model's attribute name from the db_column (i.e. 'data' -> 'node_data'). There's probably a mapping somewhere in the meta model, but it sounds a bit strange to me tbh.

assert data['sentry.interfaces.User'] == {'id': '1'}
assert 'user' not in data
assert data['user'] == {'id': '1'}
assert 'sentry.interfaces.User' not in data.keys()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is keys() necessary here? This is an odd statement and if it has a purpose a comment should clarify what it checks against the new list instead of the container itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because __contains__ will say true if you ask data directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment clarifying this? This will likely to be picked up by a linter sooner or later and someone is going to "fix" it.

@jan-auer
Copy link
Member Author

This diff shows an example where the old undefined behavior and the new defined behavior are different: The example JSON contains both "user" and "sentry.interfaces.User": https://percy.io/getsentry/sentry/builds/804476

@mitsuhiko mitsuhiko removed the request for review from JTCunning June 20, 2018 15:51
jan-auer referenced this pull request Jun 27, 2018
@bretthoerner
Copy link
Contributor

bretthoerner commented Jun 27, 2018

Only canonical names will be stored, but access is still possible via the legacy names in the entire codebase.

I don't have any review feedback, but I'm glad this came up (in the geo PR)... because this will effect Snuba and we need to update to be forwards compatible before this is merged since it consumes from Kafka and lives in its own codebase. If I'm reading correctly the interface keys in the JSON will actually be changed.

It should only take me a minute to update Snuba, but we probably need some way to ensure we inform downstream consumers (since we'll probably have more in the future).

@mitsuhiko
Copy link
Member

mitsuhiko commented Jun 27, 2018

@bretthoerner which are the downstream consumers?

@bretthoerner
Copy link
Contributor

bretthoerner commented Jun 27, 2018

Snuba is the only existing downstream consumer, but if the keys suddenly changed to (for example) exception our code would have stopped working and inserted nulls into ClickHouse.

I just think we'll have more Kafka consumers in the future, but even if it's just Snuba... I don't think anyone knew I didn't know to update our Kafka consumer for this. Maybe someone else knew, but I don't see any Snuba tickets related to this and I've done most of the consumer code.

@dcramer
Copy link
Member

dcramer commented Jun 27, 2018

This is probably a good reason to say we lock in versions and dont change these keys in the future.

@mitsuhiko
Copy link
Member

mitsuhiko commented Jun 27, 2018 via email

@bretthoerner
Copy link
Contributor

Maybe I'm misunderstanding and the keys aren't changing?

We publish the event.data.data to Kafka and do a plain json.loads of whatever that is. Are you maintaining the old keys in there forever?

https://github.com/getsentry/getsentry/blob/5121257164ea45873cf271d5a80e61c63ca0237c/getsentry/processing.py#L55-L66

@mitsuhiko
Copy link
Member

mitsuhiko commented Jun 27, 2018 via email

bretthoerner added a commit to getsentry/snuba that referenced this pull request Jun 27, 2018
Related to getsentry/sentry#8789

This is the bare minimum we need so Snuba is forward-compatible with
events it may see going forward. There may be other coordination work or
schemas we share in the future, but this is here so we don't block
process on canonicalization.
bretthoerner added a commit to getsentry/snuba that referenced this pull request Jun 27, 2018
Related to getsentry/sentry#8789

This is the bare minimum we need so Snuba is forward-compatible with
events it may see going forward. There may be other coordination work or
schemas we share in the future, but this is here so we don't block
process on canonicalization.
@jan-auer
Copy link
Member Author

@bretthoerner You could use something like CanonicalKeyDict on your side, too.

@mitsuhiko
Copy link
Member

I changed this back to bind_nodes('data') by giving the event a custom manager.

@mitsuhiko mitsuhiko force-pushed the feat/canonical-keys branch 4 times, most recently from 3470fde to c223cff Compare July 18, 2018 17:08
@mitsuhiko
Copy link
Member

@jan-auer there is absolutely no way to make the column rename work. As a result i had to go back to event.data being the NodeField itself. The tests pass but I did not actually verify that the data persisted is the correct one. I want to verify this tomorrow.

Copy link
Contributor

@mattrobenolt mattrobenolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷‍♂️

@jan-auer jan-auer merged commit 4204ff5 into master Jul 24, 2018
@jan-auer jan-auer deleted the feat/canonical-keys branch July 24, 2018 16:53
@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants