feat(node-experimental): Use native OTEL Spans #9161

Merged
merged 3 commits into from
Oct 9, 2023

@mydea mydea commented Oct 3, 2023

This PR changes the performance handling of the node-experimental package fundamentally, aligning it even more with the OpenTelemetry model.

Tasks

  1. Store breadcrumbs as TimedEvent on otel spans
  2. Update startSpan / startInactiveSpan to create & expose OTEL spans (instead of sentry spans)
  3. Update span processing to avoid global map & use exporter
  4. Add unit tests
  5. Add E2E test

The core changes are:

  • startSpan / startInactiveSpan now create OTEL spans, not Sentry spans/transactions
  • Sentry spans/transactions should be completely gone from a user's POV
  • Splits the functionality that was previously in the SpanProcessor into a new SpanProcessor + SpanExporter
  • Replaces the global state map with WeakMaps, so we no longer need to clean up our references manually
  • Breadcrumbs are stored as events on the OTEL spans

How does transaction/span creation work now

In the old model, we created and finished transactions/spans in the span processor's onStart and onEnd hooks. This required us to keep track of the parent Sentry span, since we need it to call parentSpan.startSpan() in onStart. Because it can be tricky to know when a span is no longer needed as a parent, this made garbage collection harder and messier, and it also required us to keep sprinkling Sentry spans/transactions throughout our code.

In the new model, only minimal processing is done in the span processor, and importantly, we do not create any Sentry spans yet. We store some additional data we need later in a WeakMap keyed by the (OTEL) span.
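To illustrate the WeakMap idea, here is a minimal sketch. It is not the code from this PR: `SpanLike`, `SpanSideData` and both helper functions are hypothetical names, and a plain interface stands in for the real OTEL span type so the snippet is self-contained.

```typescript
// Hypothetical stand-in for an OTEL span; the real interface comes from @opentelemetry/api.
interface SpanLike {
  name: string;
}

// Hypothetical shape of the extra data we want to look up later.
interface SpanSideData {
  scopeTags: Record<string, string>;
  source?: string;
}

// Keys are held weakly: once a span is unreachable, its entry can be garbage collected.
const spanSideData = new WeakMap<SpanLike, SpanSideData>();

function setSpanSideData(span: SpanLike, data: SpanSideData): void {
  spanSideData.set(span, data);
}

function getSpanSideData(span: SpanLike): SpanSideData | undefined {
  return spanSideData.get(span);
}

// Usage: attach data when the span starts, read it back when the span is exported.
const exampleSpan: SpanLike = { name: 'GET /users' };
setSpanSideData(exampleSpan, { scopeTags: { route: '/users' }, source: 'route' });
```

Because lookups go by object identity, nothing has to be deleted by hand when a span is no longer referenced.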

Then we leverage the underlying BatchSpanProcessor from OTEL, which collects finished spans into batches and sends them to a SpanExporter for processing. So only finished spans end up in our span exporter. Our custom span exporter does the following:

  • Builds a tree hierarchy of the spans
  • Picks the root spans that are found, and builds transactions from there - adding children down the tree
  • Every span/transaction is immediately finished (with the correct end time) and then sent
  • Note that we store the current scope when the transaction was created and apply this scope.
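The tree-building step can be sketched roughly as follows. This is an illustration under assumptions, not the exporter from this PR: `FinishedSpan`, `SpanNode`, `buildSpanTrees` and `flattenTree` are hypothetical names, and the span shape is reduced to the fields needed for grouping.

```typescript
// Hypothetical minimal view of a finished span; real OTEL spans carry more fields.
interface FinishedSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

interface SpanNode {
  span: FinishedSpan;
  children: SpanNode[];
}

// Builds one tree per root span (a span without a parent) found in the batch.
function buildSpanTrees(spans: FinishedSpan[]): SpanNode[] {
  const nodesById = new Map<string, SpanNode>();
  spans.forEach(span => nodesById.set(span.spanId, { span, children: [] }));

  const roots: SpanNode[] = [];
  nodesById.forEach(node => {
    const parentId = node.span.parentSpanId;
    if (parentId === undefined) {
      roots.push(node);
      return;
    }
    const parent = nodesById.get(parentId);
    if (parent) {
      parent.children.push(node);
    }
    // Spans whose parent is missing from the batch are not attached to any tree here.
  });
  return roots;
}

// Flattens a tree depth-first, mirroring "adding children down the tree".
function flattenTree(node: SpanNode): string[] {
  const names = [node.span.name];
  node.children.forEach(child => names.push(...flattenTree(child)));
  return names;
}
```

In this sketch each returned root would become one transaction, with the flattened descendants added as its child spans.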

For now, I copied most of the stuff over from opentelemetry-node. Eventually we can probably merge most of this together and export the shared parts from opentelemetry-node.

How do breadcrumbs work

We now pick all events added to spans and add them as breadcrumbs.
For this, we walk the span tree up to the root and collect all breadcrumbs along the way. For now, we use a special JSON field to store the breadcrumb data (TODO: Maybe there is a better way to do this...).
When we add a breadcrumb, we currently always add it to the root span, not the active span. The reason is that this fits our mental model of breadcrumbs better, where anything that happens in this root span is relevant. But we'll also pick up any other events added by otel instrumentation along the way.
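A rough sketch of this breadcrumb flow, under assumptions: `CrumbSpan`, `CrumbEvent` and the `breadcrumb.data` attribute key are hypothetical and only stand in for the real OTEL span events and the JSON field mentioned above.

```typescript
// Hypothetical minimal span model; real OTEL spans expose events differently.
interface CrumbEvent {
  name: string;
  time: number;
  attributes?: Record<string, string>;
}

interface CrumbSpan {
  name: string;
  parent?: CrumbSpan;
  events: CrumbEvent[];
}

// Hypothetical attribute key standing in for the "special JSON field".
const BREADCRUMB_DATA_KEY = 'breadcrumb.data';

function getRootSpan(span: CrumbSpan): CrumbSpan {
  let current = span;
  while (current.parent) {
    current = current.parent;
  }
  return current;
}

// Breadcrumbs always go onto the root span, not the active one.
function addBreadcrumb(activeSpan: CrumbSpan, message: string, time: number): void {
  getRootSpan(activeSpan).events.push({
    name: message,
    time,
    attributes: { [BREADCRUMB_DATA_KEY]: JSON.stringify({ message }) },
  });
}

// Walks from the given span up to the root and gathers every event on the way.
function collectBreadcrumbs(span: CrumbSpan): CrumbEvent[] {
  const collected: CrumbEvent[] = [];
  let current: CrumbSpan | undefined = span;
  while (current) {
    collected.push(...current.events);
    current = current.parent;
  }
  return collected.sort((a, b) => a.time - b.time);
}
```

Events that instrumentation added to intermediate spans are picked up by the same walk, which matches the last sentence above.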

Open questions

  • Should we apply the scope when a span is started, or when it is finished? OTEL uses the context from when the span was started, and the model makes this easier to implement, but if we prefer, we can probably also pick the current context/scope in onEnd. I'm not 100% sure how this would work with parallel spans, though; that needs to be tested.

@mydea mydea requested review from Lms24 and AbhiPrasad October 3, 2023 07:39
@mydea mydea self-assigned this Oct 3, 2023

github-actions bot commented Oct 3, 2023

size-limit report 📦

| Path | Size |
| --- | --- |
| @sentry/browser (incl. Tracing, Replay) - Webpack (gzipped) | 84.24 KB (0%) |
| @sentry/browser (incl. Tracing) - Webpack (gzipped) | 31.41 KB (0%) |
| @sentry/browser - Webpack (gzipped) | 22 KB (0%) |
| @sentry/browser (incl. Tracing, Replay) - ES6 CDN Bundle (gzipped) | 78.76 KB (-0.01% 🔽) |
| @sentry/browser (incl. Tracing) - ES6 CDN Bundle (gzipped) | 28.59 KB (-0.01% 🔽) |
| @sentry/browser - ES6 CDN Bundle (gzipped) | 21 KB (-0.01% 🔽) |
| @sentry/browser (incl. Tracing, Replay) - ES6 CDN Bundle (minified & uncompressed) | 254.38 KB (0%) |
| @sentry/browser (incl. Tracing) - ES6 CDN Bundle (minified & uncompressed) | 86.66 KB (0%) |
| @sentry/browser - ES6 CDN Bundle (minified & uncompressed) | 62.35 KB (0%) |
| @sentry/browser (incl. Tracing) - ES5 CDN Bundle (gzipped) | 31.45 KB (-0.01% 🔽) |
| @sentry/react (incl. Tracing, Replay) - Webpack (gzipped) | 84.27 KB (0%) |
| @sentry/react - Webpack (gzipped) | 22.05 KB (0%) |
| @sentry/nextjs Client (incl. Tracing, Replay) - Webpack (gzipped) | 102.23 KB (0%) |
| @sentry/nextjs Client - Webpack (gzipped) | 50.99 KB (0%) |

@Lms24 Lms24 left a comment

Ok, so I took a look at the PR and while I didn't review every single detail (frankly, I think I lack the context for a lot of the more otel-specific APIs), the general concept makes sense to me. Had some questions around the exporter and types but nothing blocking.

Should we apply the scope when a span is started, or when it is finished

I think it's fine (for now) to use the scope from when the span was started. Especially because it's how Otel does it and I think we generally want to stick with Otel in this package.

Another, only tangentially related thought while reviewing: We definitely need to add proper docs for the package (whatever it ends up being called) that we release/maintain alongside @sentry/node during v8.

@@ -27,6 +30,18 @@ export function initOtel(): () => void {
diag.setLogger(otelLogger, DiagLogLevel.DEBUG);
}

if (client) {
Member

Just wondering: If client is undefined, should we even do anything in this function?

Member Author

good question 😅 I would say right now that is more of a theoretical question, as that is just called by init() right after we called initNode(), so there should always be a client. But once we eventually split this up into more easily consumable parts, we'll need to handle these cases better.

Member

Makes sense!

packages/node-experimental/src/sdk/scope.ts (outdated, resolved)
function finishSpan(): void {
span.end();
}

_initSpan(span as OtelSpan, spanContext);
Member

q: I'm seeing quite a lot of type casting when dealing with otel spans. Is this because Otel types are somehow wrong/too broad?

Member Author

this is because @opentelemetry/api and everything related to that passes a basic Span type around, while we often need the spans that are actually generated by @opentelemetry/sdk-trace-base for certain things (because that has some more fields with things we need). But it's something we should look into when we stabilize this/split this up into better reusable parts, ideally we can avoid as much of this as possible 😬

Member Author

actually here specifically I'll update it and just pass the regular 'spans' around. that's safer and prob. more correct anyhow!

@@ -136,6 +100,22 @@ function getTracer(): Tracer | undefined {
return client && client.tracer;
}

function isTransaction(span: Span): span is Transaction {
return span instanceof Transaction;
function _initSpan(span: OtelSpan, spanContext: NodeExperimentalSpanContext): void {
Member

l: wdyt about calling this something around applySentryAttributesToSpan or similar?

Member Author

sounds good to me! 👍

/**
* This is a fork of the base Transaction with OTEL specific stuff added.
*/
export class NodeExperimentalTransaction extends Transaction {
Member

l: iiuc, Transaction::finish shouldn't do anything anymore, right? Should we override it here to print a warning that it noops if it's called? I guess chances are low that users would call finish given that they'll only work with spans, so feel free to disregard.

Member Author

actually users should not get a hold of a transaction ever 😅 startTransaction is not exported, and only used in the span exporter.


this._finishedSpans.push(...spans);

const remainingSpans = maybeSend(this._finishedSpans);
Member

q: When would there be remaining spans and what happens with them? Is it when we have multiple concurrent root spans?

Member Author

There are two scenarios, one is "normal" and will happen, and one should not happen but may happen (who knows):

  1. When a root span is not finished yet, all the child spans will remain in there. E.g. think one http request in a transaction may be finished and go to the exporter, while the overall transaction is still running. In this case, the http span will remain here until the root span is completed.

  2. Somewhere along the way some span was dropped (for whatever reason), so the parent span of this span never gets to the exporter. This should not happen, but 🤷 So in this case we'll eventually clean this up and just discard the span.
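The two scenarios can be sketched like this. It is only an illustration and not the actual maybeSend implementation: `BufferedSpan`, `partitionFinishedSpans`, `hasFinishedRoot` and the `maxAgeMs` cutoff are assumptions made for the example.

```typescript
// Hypothetical minimal view of a span waiting in the exporter's buffer.
interface BufferedSpan {
  spanId: string;
  parentSpanId?: string;
  receivedAt: number; // ms timestamp when the exporter received the span
}

interface PartitionResult {
  toSend: BufferedSpan[];
  remaining: BufferedSpan[];
}

// True if following parent links inside the buffer reaches a root span.
function hasFinishedRoot(span: BufferedSpan, byId: Map<string, BufferedSpan>): boolean {
  let current: BufferedSpan | undefined = span;
  // Bound the walk by the buffer size so malformed parent links cannot loop forever.
  for (let steps = 0; current && steps <= byId.size; steps++) {
    if (current.parentSpanId === undefined) {
      return true;
    }
    current = byId.get(current.parentSpanId);
  }
  return false;
}

function partitionFinishedSpans(spans: BufferedSpan[], now: number, maxAgeMs: number): PartitionResult {
  const byId = new Map<string, BufferedSpan>();
  spans.forEach(span => byId.set(span.spanId, span));

  const result: PartitionResult = { toSend: [], remaining: [] };
  spans.forEach(span => {
    if (hasFinishedRoot(span, byId)) {
      result.toSend.push(span);
    } else if (now - span.receivedAt <= maxAgeMs) {
      // Scenario 1: the root is probably still running, so keep waiting.
      result.remaining.push(span);
    }
    // Scenario 2: too old, the parent likely never arrives; the span is discarded.
  });
  return result;
}
```

The `remaining` list would be carried over to the next export call, so a child span is sent as soon as its root finishes.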

Member Author

I also added a comment to explain this a bit better!

Member

thanks for explaining!

mydea commented Oct 9, 2023

I've updated this based on feedback from @Lms24 , thanks a lot.
I also went through and actually aligned the Span type used. Wherever possible, I try to avoid type casting it and just use import { Span } from '@opentelemetry/api', so the most generic span type. However, in some places this is not possible, as we expect/need more fields 😢 But I try to narrow this down as much as possible and use instance checks where possible to actually ensure this works as robustly as possible.
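As a sketch of what "instance checks instead of casts" can look like: the `ApiSpan` interface and `SdkSpan` class below are hypothetical stand-ins for the generic Span from @opentelemetry/api and the richer span class from @opentelemetry/sdk-trace-base, defined locally so the example is self-contained.

```typescript
// Hypothetical stand-ins: ApiSpan mimics the generic Span from @opentelemetry/api,
// SdkSpan mimics the richer span class created by @opentelemetry/sdk-trace-base.
interface ApiSpan {
  end(): void;
}

class SdkSpan implements ApiSpan {
  public readonly name: string;
  public ended = false;

  public constructor(name: string) {
    this.name = name;
  }

  public end(): void {
    this.ended = true;
  }
}

// Type guard: an instance check narrows the type without an `as` cast.
function isSdkSpan(span: ApiSpan): span is SdkSpan {
  return span instanceof SdkSpan;
}

// Accepts the generic type and only reads the extra fields when they really exist.
function getSpanName(span: ApiSpan): string | undefined {
  return isSdkSpan(span) ? span.name : undefined;
}
```

With a guard like this, a span that is not an SDK span simply yields `undefined` instead of failing at runtime after an unchecked cast.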

Use regular `Span` type from `@opentelemetry/api` wherever possible.

@Lms24 Lms24 left a comment

Thanks for applying my feedback and answering my questions!

@mydea mydea merged commit aeb4462 into develop Oct 9, 2023
83 checks passed
@mydea mydea deleted the fn/potel-native-spans branch October 9, 2023 10:19