Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): Add processing metadata to scope and event #4252

Merged
merged 9 commits into from Jan 11, 2022

Conversation

lobsterkatie
Copy link
Member

@lobsterkatie lobsterkatie commented Dec 9, 2021

There are various situations where it's helpful to be able to provide data to the event processing pipeline, without that data necessarily being sent to Sentry. (See here for one example where we're already doing this. It also comes up in the as-yet-unmerged dynamic sampling code, and has come up once more in nextjs work I'm doing.)

Rather than piggyback that data in debug_meta, and then have to do a "is there still anything in it, or should I delete it" dance after that data is pulled back out, it would simpler if we just had a spot where we could chuck anything we wanted with the knowledge that all of the data there will get filtered out of the final event.

This PR introduces such a spot, called sdkProcessingMetadata, both on the scope (so that the data can get set at any point prior to the event getting captured) and in the event. No use is (yet) made of this - that will come in future PRs.

Note: While there is a public setter for this data on the scope, I'm thinking of this as an internal API, and therefore am not planning to document it for users.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2021

size-limit report

Path Base Size (e5ede4c) Current Size Change
@sentry/browser - CDN Bundle (gzipped) 19.73 KB 19.77 KB +0.21% 🔺
@sentry/browser - CDN Bundle (minified) 62.78 KB 62.94 KB +0.27% 🔺
@sentry/browser - Webpack 22.26 KB 22.31 KB +0.22% 🔺
@sentry/browser - Webpack - gzip = false 75.93 KB 76.19 KB +0.34% 🔺
@sentry/react - Webpack 22.29 KB 22.34 KB +0.22% 🔺
@sentry/nextjs Client - Webpack 46.42 KB 46.48 KB +0.14% 🔺
@sentry/browser + @sentry/tracing - CDN Bundle (gzipped) 28.39 KB 28.43 KB +0.16% 🔺

Copy link
Member

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No use is (yet) made of this - that will come in future PRs.

What is the end goal (use case)?

(Didn't review tests since CI is failing).

return this._notifyEventProcessors([...getGlobalEventProcessors(), ...this._eventProcessors], event, hint);
}

/**
* Add data which will be accessible during event processing but won't get sent to Sentry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be docs about the behavior of other places: hard to maintain and easy to have it wrong with new changes.

Suggested change
* Add data which will be accessible during event processing but won't get sent to Sentry
* Add data which will be accessible during event processing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the entire point of this field, though - it's the scratch pad, where you can stick anything you want without worrying about having to pluck it (and only it) out, maybe deleting the whole container, etc. We just delete this entire field before the event is sent, and we know we haven't accidentally deleted too much or too little. (That's why it's called processingMetadata - it's data that's only useful for the SDK's event processing pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but I think that information should live where the container is removed. Not blocking, your call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree - otherwise, how does someone reading the code know it's safe to stick stuff there, without worrying about it getting sent to Sentry? Its entire raison d'etre is to be a local (and local only) datastore, which doesn't actually turn into anything. Kinda like os.tmpdir() in node.

The fact that we're even having this conversation makes me wonder if there's a better name for it, though. Can you think of a name which would make it clearer what it's there for? localProcessingMetadata? Maybe it would help to stick a _ at the front? WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming sounds good. The names I can think of are probably too long; for example, sdkEventProcessingMetadata. I rather sdk over local since IMO it's more accurate and easier to understand. I don't like eventProcessingMetadata since it doesn't reference it happening in the SDK. I've been some time thinking about this and can't get a good name (good also including short), so choose what you think fits best. I'm only opinionated against adding _ to the prefix, it's easy to relate it to a private class attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing to consider is that the name must be very clear and should not be mixed with debug_meta.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with sdkProcessingMetadata. Slightly shorter than sdkEventProcessingMetadata, but hopefully specific enough to get the point across.

packages/hub/src/scope.ts Outdated Show resolved Hide resolved
packages/types/src/event.ts Outdated Show resolved Hide resolved
Copy link
Member

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this is the approach we should take.

It adds bundle size without a huge amount of value. I’d rather we just use event processors in conjugation with the _extra field on the scope instead of making a new field.

@lobsterkatie
Copy link
Member Author

lobsterkatie commented Dec 10, 2021

It adds bundle size without a huge amount of value. I’d rather we just use event processors in conjugation with the _extra field on the scope instead of making a new field.

Actually, once this gets applied just in the example I linked in the PR description (UPDATE: also quoted below), it ends up decreasing bundle size - combining this PR and that change, it ends up replacing 542 characters with 238 characters in bundle.tracing.min.js. But even if those numbers were reversed, given how small that 304-character difference is compared to the 100K total characters in the tracing bundle, I don't think that's how we should be making this decision.

If we don't make this change, the data in question can remain in debug_meta, the way you can see we're doing in the example currently. (Event processors won't work for that example or the dynamic sampling one, because in those cases we need the data to persist all the way to eventToSentryRequest, so it can be put into the envelope header, and that happens after all event processors have run.) The reasons I see to move it are:

  • Right now we do a bunch of work to specifically pull some, but not all, data out:

      const { transactionSampling, ...metadata } = event.debug_meta || {};
      const { method: samplingMethod, rate: sampleRate } = transactionSampling || {};
      if (Object.keys(metadata).length === 0) {
        delete event.debug_meta;
      } else {
        event.debug_meta = metadata;
      }
    

    whereas if we have a dedicated spot we could just delete it entirely after we pull data out of it, which is both easier to reason about and the reason for the character savings mentioned above.

  • In previous iterations, data was getting passed in tags, and there ended up being a bug wherein the data wasn't cleared before the event was sent, which is how it ended up in debug_meta (because at least that only appears in the JSON, not the UI). This just takes that a step further. (Moving it to _extra opens the possibility for that to regress, since it's also visible in the UI.)

Thoughts?

@AbhiPrasad
Copy link
Member

See here for one example where we're already doing this. It also comes up in the as-yet-unmerged dynamic sampling code, and has come up once more in nextjs work I'm doing

Ok I understand your reasoning. Do you mind clearly outlining each of these examples and what they are setting that needs to be removed?

@AbhiPrasad
Copy link
Member

Ok I spent my lunch break thinking about this and you have convinced me - we need a hook right before request basically.

This would be solved if we had a hooks systems though, but it's unrealistic to depend on that coming to fruition.

I would like to choose a better name (though I understand the implicit difficulty of that statement :P) - and maybe discuss with the other sdk teams to see if we it's worthwhile publishing in develop as a "hey not every sdk needs this, but maybe if you find it useful?" thing. The example list would be useful as well for future readers of this PR.

@AbhiPrasad AbhiPrasad self-requested a review December 10, 2021 18:56
@AbhiPrasad AbhiPrasad dismissed their stale review December 10, 2021 18:56

Abhi's mind changed

@lobsterkatie
Copy link
Member Author

Do you mind clearly outlining each of these examples and what they are setting that needs to be removed?

Sure, here's each, along with the alternative:

  1. (and 1a). The example I keep talking about, linked and quoted above, where the point is for data to be able to travel through the SDK's processing pipeline with the event and eventually land in the envelope header. My use of it in the dynamic sampling branch is the same idea, just with a different piece of data.

    The current flow is like this:

    • event is created -> add data to debug_meta
    • eventToSentryRequest -> add data to envelope header, pull it from debug_meta, leave the rest of debug_meta intact (as long as there's still stuff in it; otherwise, yank it, too)

    Using this change, it would look like this:

    • event is created -> add data to processingMetadata
    • eventToSentryRequest -> add data to envelope header, delete processingMetadata

    Not a huge change, but one which does simplify things a bit. Doesn't use scope.

  2. The nextjs case, in which one (new) event processor needs access to the request data, but runs before the event processor which adds that data to the event.

    The flow without this change would look like this:

    • Sentry.init -> add helper data to global
    • individual request -> add existing request-data-adding event processor, make it also do the new thing
    • event is created -> (nothing)
    • event is processed -> lone event processor uses global helper data and existing request data, does both things

    Using this change, it would look like this:

    • Sentry.init -> add event processor which needs both request and helper data
    • individual request -> add existing request-data-adding event processor, add request data to scope._processingMetadata
    • event is created -> request data moved to event's processingMetadata
    • event is processed -> first event processor uses helper data and request data from processingMetadata, second event processor is unchanged from current

    These are roughly equivalent in complexity. I went for this change to solve it mostly because it was the third problem I'd seen which could be solved by a change I'd never gotten around the making, so I finally go around to making it. :-)

Given the above, there's also a hybrid option of just making the internal event schema change and leaving the scope out of it. (I can't guarantee that there might not be an example in the future which needs the scope piece, though.)

@lobsterkatie
Copy link
Member Author

we need a hook right before request basically.

Huh. Interesting. I see how you're getting that from this, but that's a much broader change than this PR was ever intended to make. All I'm really trying to do here is split data which should be deleted away from data which shouldn't be deleted in a simpler way (that deleted data being stuff used by the processing pipeline but not meant for the actual event).

I would like to choose a better name

If this is about a hook, I totally agree with you. But given that the point of this is to have a processing pipeline scratchpad, I feel like the current name works. (I'm not wedded to exactly that name, but my point is that some variation on the current name seems right to me.)

@lobsterkatie lobsterkatie force-pushed the kmclb-add-processing-metadata-to-event branch from b8be386 to 40b6815 Compare December 13, 2021 05:58
Copy link
Member

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change. +1 to have hooks, but I think we can move forward with this.

Should we add the feature of hooks to the backlog? cc @smeubank.

return this._notifyEventProcessors([...getGlobalEventProcessors(), ...this._eventProcessors], event, hint);
}

/**
* Add data which will be accessible during event processing but won't get sent to Sentry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but I think that information should live where the container is removed. Not blocking, your call.

packages/core/src/request.ts Show resolved Hide resolved
packages/hub/test/scope.test.ts Outdated Show resolved Hide resolved
packages/core/test/lib/base.test.ts Show resolved Hide resolved
@lobsterkatie
Copy link
Member Author

+1 to have hooks, but I think we can move forward with this.

Should we add the feature of hooks to the backlog? cc @smeubank.

I'm all for hooks, but again, that's totally separate from what this PR is about. If there were a beforeACTUALSendLikeForRealWhenWeActuallySendItToTheInternet hook, it's true that I could use it to do the deletion here, but that's neither here nor there in terms of the idea of having a temporary local datastore for use in event processing.

@iker-barriocanal
Copy link
Member

I'm all for hooks, but again, that's totally separate from what this PR is about.

Yes, that's why I'm suggesting to move forward with this PR and add the hook system to the backlog as a feature request.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2022

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@lobsterkatie lobsterkatie force-pushed the kmclb-add-processing-metadata-to-event branch from 40b6815 to 7b6cb1c Compare January 11, 2022 05:22
@lobsterkatie lobsterkatie merged commit 38ac85a into master Jan 11, 2022
@lobsterkatie lobsterkatie deleted the kmclb-add-processing-metadata-to-event branch January 11, 2022 05:44
lobsterkatie added a commit that referenced this pull request Jan 13, 2022
… data (#4388)

In #4252, a new `sdkProcessingMetadata` field was added to the `Scope` and `Event` interfaces, to provide a place to put data which is needed for in-SDK event processing but shouldn't be sent to Sentry. This refactors one step in that processing... process... to use the new field, eliminating the need to pull the data out of `debug_meta` before the event is sent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants