
Implement new POTel span processor #3223

Merged · 14 commits · Jul 9, 2024
Conversation

sl0thentr0py (Member)

  • only acts on `on_end` instead of both `on_start`/`on_end` as before
  • stores child spans in a dict mapping `span_id -> children`
  • the new dict stores only OTel span objects, no Sentry transaction/span objects, so we save a bit of useless memory allocation
  • I'm not using our current Transaction/Span classes at all to build the event, because when we add our APIs later we'll need to rip these out, and we also avoid having to deal with the instrumenter problem
  • if we get a root span (one without a parent), we recursively walk the dict to find its children, package up the transaction event, and send it (sketched after this list)
    • I didn't do it like JS because I think this way is better
    • they group an array of finished_spans every time a root span ends, and I think that uses more CPU than what I did
    • the dict as I use it doesn't take more space than the array either
  • if we get a span with a parent, we just update the dict so we can find the span later
  • moved the common is_sentry_span logic to utils
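
A minimal sketch of that `on_end`-only flow, for illustration. The class and helper names here (`PotelSpanProcessorSketch`, `_children_spans`, `_flush_root_span`) are mine and not necessarily the PR's exact code:

```python
from collections import defaultdict

from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor


class PotelSpanProcessorSketch(SpanProcessor):
    def __init__(self):
        # parent span_id -> list of finished child spans
        self._children_spans = defaultdict(list)

    def on_start(self, span, parent_context=None):
        # deliberately a no-op: we only act on on_end
        pass

    def on_end(self, span):
        # type: (ReadableSpan) -> None
        if span.parent is None:
            # root span: collect the whole tree and send one transaction event
            self._flush_root_span(span)
        else:
            # child span: remember it so the root span can find it later
            self._children_spans[span.parent.span_id].append(span)

    def _collect_children(self, span):
        # recursively walk the dict to gather all descendants of `span`
        children = []
        for child in self._children_spans.pop(span.context.span_id, []):
            children.append(child)
            children.extend(self._collect_children(child))
        return children

    def _flush_root_span(self, span):
        children = self._collect_children(span)
        # build a raw transaction event dict from `span` and `children`
        # and hand it off to the client (event construction omitted here)
```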

sl0thentr0py and others added 12 commits June 11, 2024 13:43
* create a new otel context key `_SCOPES_KEY` that will hold a tuple of
  `(current_scope, isolation_scope)`
* the `current_scope` will always be forked (in practice, on every span creation/context update)
  * note that this happens on `attach`, so not on every copy-on-write context
    object creation but only on APIs such as
    [`trace.use_span`](https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-api/src/opentelemetry/trace/__init__.py#L547)
    or [`tracer.start_as_current_span`](https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-api/src/opentelemetry/trace/__init__.py#L329)
  * basically, every otel `context` fork corresponds to a fork of our `current_scope`
* the `isolation_scope` will not be forked for now
  * this will be updated later; for instance, when we update our top-level
    scope APIs that fork the isolation scope, there will be a corresponding
    change in this `attach` function (see the sketch after this message)
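
A rough sketch of the attach-time forking this commit message describes. `_SCOPES_KEY` matches the message; the runtime-context subclass and its exact wiring are my assumptions, not the committed code:

```python
import sentry_sdk
from opentelemetry.context import Context, create_key, set_value
from opentelemetry.context.contextvars_context import ContextVarsRuntimeContext

_SCOPES_KEY = create_key("sentry_scopes")


class SentryContextVarsRuntimeContext(ContextVarsRuntimeContext):
    def attach(self, context):
        # type: (Context) -> object
        scopes = context.get(_SCOPES_KEY)
        if scopes is not None:
            (current_scope, isolation_scope) = scopes
        else:
            current_scope = sentry_sdk.get_current_scope()
            isolation_scope = sentry_sdk.get_isolation_scope()

        # every otel context fork corresponds to a current_scope fork;
        # the isolation_scope is passed through unforked for now
        forked_scopes = (current_scope.fork(), isolation_scope)
        return super().attach(set_value(_SCOPES_KEY, forked_scopes, context))
```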
@sl0thentr0py (Member Author)

Tested with a simple nested trace:

```python
import sentry_sdk
from opentelemetry import trace
from time import sleep

sentry_sdk.init(
    debug=True,
    traces_sample_rate=1.0,
    _experiments={"otel_powered_performance": True},
)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("request"):
    sleep(0.1)
    with tracer.start_as_current_span("db"):
        sleep(0.5)
        with tracer.start_as_current_span("redis"):
            sleep(0.2)
    with tracer.start_as_current_span("http"):
        sleep(1)
```

[screenshot: the resulting trace in the Sentry UI]

@sentrivana (Contributor) left a comment

Cool ✨

Comment on lines +52 to +53
# if we have a root span ending, we build a transaction and send it
self._flush_root_span(span)
Contributor

I'm wondering what happens if there's a child span that, for whatever reason (weird async cases? wonky instrumentation?), hasn't finished before the root span. Since we're not using on_start at all, there'll be virtually no record of the span at that point, and the parent transaction will be sent. Once the child finishes, we'll run on_end with the now-orphaned span, and it'll be saved in _children_spans but never sent, since the parent transaction is already gone. So we might have a potential memory leak there.

Do we need some sort of cleanup for orphaned spans? Should we wait a bit before flushing the transaction, to account for child spans ending very close to the parent span but slightly late? IIRC JS also has a small sleep in place for this.

TBH I'm not sure how much of a real-world problem late child spans are, but I can imagine they happen.

Contributor

Another option: also use on_start to create some record of the span, and then in on_end, if a transaction comes in and we detect that it has unfinished child spans, wait for a grace period to give them time to finish?

Member Author

Yes, some heuristic cleanup logic will be done in a follow-up PR. I haven't thought through exactly how we'll do it yet, but JS just has a cutoff of 5 minutes (see the sketch below).
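
For illustration only, a 5-minute cutoff of the kind JS uses might look like this (hypothetical helper, not the follow-up PR's actual code):

```python
import time

# hypothetical cutoff matching JS's 5 minutes
SPAN_MAX_AGE_SECONDS = 5 * 60


def cleanup_old_spans(children_spans):
    # `children_spans` maps parent span_id -> list of finished OTel spans;
    # ReadableSpan.end_time is in nanoseconds since the epoch
    cutoff_ns = (time.time() - SPAN_MAX_AGE_SECONDS) * 1e9
    for span_id in list(children_spans):
        kept = [s for s in children_spans[span_id] if s.end_time >= cutoff_ns]
        if kept:
            children_spans[span_id] = kept
        else:
            del children_spans[span_id]
```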

Comment on lines +101 to +102
# we construct the event from scratch here
# and not use the current Transaction class for easier refactoring
Contributor

This is cool, haven't thought of completely bypassing this. I'll have to think about the implications for the granular instrumenter if we isolate OTel and our instrumentation like this.

Member Author

So for now I'm just creating a raw dict, but eventually I'm thinking of a TypedDict for Span too, and the current Span/Transaction classes will basically be removed completely. Roughly along these lines:
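
Purely a sketch of the idea; the field names below are guesses based on the transaction event payload, not an existing class in the SDK:

```python
from typing import Any, Dict, Optional, TypedDict  # TypedDict: Python 3.8+


class SpanDict(TypedDict, total=False):
    # hypothetical typed shape for the raw span dicts in the event
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    op: str
    description: str
    status: str
    start_timestamp: float
    timestamp: float
    tags: Dict[str, Any]
    data: Dict[str, Any]
```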

codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 48.71795% with 80 lines in your changes missing coverage. Please review.

Project coverage is 78.97%. Comparing base (436626b) to head (2c29711).

Additional details and impacted files
@@                           Coverage Diff                           @@
##           neel/potel/initial-scope-management    #3223      +/-   ##
=======================================================================
- Coverage                                79.27%   78.97%   -0.30%     
=======================================================================
  Files                                      134      135       +1     
  Lines                                    14255    14335      +80     
  Branches                                  2990     3009      +19     
=======================================================================
+ Hits                                     11300    11321      +21     
- Misses                                    2107     2170      +63     
+ Partials                                   848      844       -4     
| Files | Coverage | Δ |
|---|---|---|
| sentry_sdk/integrations/opentelemetry/consts.py | 100.00% <100.00%> | (ø) |
| ...y_sdk/integrations/opentelemetry/span_processor.py | 76.31% <77.77%> | (+0.14%) ⬆️ |
| sentry_sdk/integrations/opentelemetry/utils.py | 80.00% <80.00%> | (ø) |
| ...integrations/opentelemetry/potel_span_processor.py | 24.17% <14.66%> | (-37.73%) ⬇️ |

... and 1 file with indirect coverage changes

@sl0thentr0py (Member Author)
@sentrivana moved the op/description/status logic to utils, but the status mapping will be done separately (a rough analogue is sketched after the links below). Maybe @szokeasaurusrex can look into making this logic as similar to JS as possible, but for now this is good enough for this PR.

https://github.com/getsentry/sentry-javascript/blob/master/packages/opentelemetry/src/utils/mapStatus.ts
https://github.com/getsentry/sentry-javascript/blob/master/packages/opentelemetry/src/utils/parseSpanDescription.ts
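
For reference, a rough Python analogue of what JS's `mapStatus` does. This is illustrative only; the eventual mapping may differ:

```python
from opentelemetry.trace import StatusCode


def map_otel_status(status_code):
    # type: (StatusCode) -> str
    # UNSET and OK both count as a successful span
    if status_code in (StatusCode.UNSET, StatusCode.OK):
        return "ok"
    # StatusCode.ERROR: a real mapping would also inspect attributes
    # such as http.status_code to pick a more specific Sentry status
    return "internal_error"
```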

Base automatically changed from neel/potel/initial-scope-management to potel-base July 9, 2024 12:35
@sl0thentr0py sl0thentr0py marked this pull request as ready for review July 9, 2024 12:37
@sl0thentr0py sl0thentr0py merged commit f0c1a84 into potel-base Jul 9, 2024
117 of 118 checks passed
@sl0thentr0py sl0thentr0py deleted the neel/potel/span-processor branch July 9, 2024 12:37