OpenTelemetry FastAPI and Requests #1940

Closed
zukwung opened this issue Mar 2, 2023 · 9 comments · Fixed by #2235

zukwung commented Mar 2, 2023

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.16.0

Steps to Reproduce

  1. Manually instrument a FastAPI application that uses requests as an HTTP client, using the following code snippet (imports added here for context; get_sentry_dsn(), get_environment(), traces_sample_rate, and resource are app-specific):

    import sentry_sdk
    from sentry_sdk.integrations.opentelemetry import SentryPropagator, SentrySpanProcessor

    from opentelemetry import trace
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    from opentelemetry.instrumentation.requests import RequestsInstrumentor
    from opentelemetry.propagate import set_global_textmap
    from opentelemetry.sdk.trace import TracerProvider

    # Initialize Sentry with the OTel instrumenter so spans come from OpenTelemetry.
    sentry_sdk.init(
        dsn=get_sentry_dsn(),
        traces_sample_rate=traces_sample_rate,
        instrumenter="otel",
        environment=get_environment(),
    )

    # Route OTel spans to Sentry and use Sentry's propagator for trace headers.
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(SentrySpanProcessor())
    trace.set_tracer_provider(tracer_provider)
    set_global_textmap(SentryPropagator())
    ...
    RequestsInstrumentor().instrument(tracer_provider=tracer_provider)
    ...
    FastAPIInstrumentor.instrument_app(app, tracer_provider=trace.get_tracer_provider())
  2. Create two instances of the application.
  3. Using the requests library, have the first instance call the second instance in a way that causes an error; in this case the first instance's endpoint calls the second with an invalid auth token, resulting in a 403 (a hypothetical sketch of this call follows the list).
  4. Review the trace in sentry.io.
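
For illustration, a hypothetical endpoint in the first instance that triggers the failing call (the route paths, port, and token value are made up to roughly match the carrier dump further below):

    import requests
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/api/v1/example/test")
    def call_second_instance():
        # Call the second instance with an invalid token so it responds with a 403.
        resp = requests.get(
            "http://localhost:8081/api/v1/example/test2",
            headers={"Authorization": "Bearer token"},
        )
        return {"status_code": resp.status_code}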

Expected Result

A single trace with both requests

[screenshot: a single trace containing both requests]

Actual Result

A separate trace for each request

[screenshots: two separate traces, one per request]

Additional Info

I spent a day or two debugging this issue, and it seems to boil down to how the trace-id is extracted in the propagator. I believe the ASGI middleware is generating a span somewhere that I haven't been able to pinpoint, since the scope here already contains the new span; see a sample below.

Because of the ASGI middleware, sentry_trace = getter.get(carrier, SENTRY_TRACE_HEADER_NAME) becomes sentry_trace: [['ecf45d41418a4ab8a5723e6c24b598d3-9cdd27733f0c0d37-', 'bd345b67708c6000341cdd61eeeb5583-3b4d0fe494390136-1']], which means Sentry uses the first trace id when sending to Sentry, causing the behaviour above. I tried changing the selection to pick the last element, which works, but I'm not sure how this would affect usage in other places, otherwise I would open a PR. I believe there is no issue with the outgoing request, because looking at the Sentry span in inject in the propagator, the current span seems to be correct; see below for more info.

ASGI carrier info

carrier: [{'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8081), 'client': ('127.0.0.1', 59687), 'scheme': 'http', 'method': 'GET', 'root_path': '', 'path': '/api/v1/example/test2', 'raw_path': b'/api/v1/example/test2', 'query_string': b'', 'headers': [(b'host', b'localhost:8081'), (b'sentry-trace', b'ecf45d41418a4ab8a5723e6c24b598d3-9cdd27733f0c0d37-'), (b'user-agent', b'python-requests/2.28.2'), (b'accept-encoding', b'gzip, deflate'), (b'accept', b'*/*'), (b'connection', b'keep-alive'), (b'content-type', b'application/json'), (b'authorization', b'Bearer token'), (b'sentry-trace', b'bd345b67708c6000341cdd61eeeb5583-3b4d0fe494390136-1'), (b'baggage', b'sentry-trace_id=bd345b67708c6000341cdd61eeeb5583,sentry-environment=dev,sentry-release=<sentry_release>,sentry-public_key=<public_key>,sentry-transaction=/api/v1,sentry-sample_rate=1.0')], 'app': <fastapi.applications.FastAPI object at 0x106070fa0>}]

Sentry span in propagator inject

sentry_span: <Span(op=None, description:'HTTP GET', trace_id='bd345b67708c6000341cdd61eeeb5583', span_id='3b4d0fe494390136', parent_span_id='46cb3b9f78354054', sampled=True)>
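
A minimal sketch of the workaround described above, assuming the propagator's extract step looks roughly like this simplified form (the real SentryPropagator code differs):

    # Simplified sketch, not the actual SentryPropagator source: when the carrier
    # contains the sentry-trace header more than once (one value injected by the
    # ASGI middleware, one from the upstream service), prefer the last value
    # instead of the first.
    sentry_trace = getter.get(carrier, SENTRY_TRACE_HEADER_NAME)
    if sentry_trace:
        sentrytrace = extract_sentrytrace_data(sentry_trace[-1])  # was sentry_trace[0]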
@antonpirker (Member)

Hey @zukwung

Thanks for reporting this! I put this on our backlog.

I think you are right: the problem is that the second instance does not get the trace information right and thus cannot attach its transaction to the trace.

How did you create the first, correct screenshot?

antonpirker self-assigned this Mar 3, 2023

zukwung commented Mar 3, 2023

@antonpirker I modified this line to sentrytrace = extract_sentrytrace_data(sentry_trace[-1]), but I don't know what implications changing this line has.

@antonpirker (Member)

Hm. In the line above, sentry_trace = getter.get(carrier, SENTRY_TRACE_HEADER_NAME), we fetch the sentry-trace header, and this sentry_trace is an array.

Instead of picking the first element to pass to extract_sentrytrace_data(), you pick the last one.

I was assuming the trace would always be in the first element. But could it be that, because we have two services with the same implementation, the header is in there twice and the last one is the correct one?

I will probably look at this at the beginning of next week.

Great debugging, @zukwung!


zukwung commented Mar 3, 2023

Great, thanks @antonpirker. Let me know if you need more info.

@antonpirker (Member)

@zukwung sorry for still not tackling this. I am kind of swamped with other things. It will take some time until I can get to this.


zukwung commented Apr 24, 2023

No worries @antonpirker. For our use cases I went ahead and just used the usual Sentry SDK, but I really appreciate the effort Sentry is putting into OpenTelemetry, keep it up :)

@nambrosini-codes

I think I'm facing a similar issue. Sentry swallows the trace header from baggage and conflicts with the regular OTel mechanism of propagating attributes via headers. When I disable Sentry, the attributes propagate across services; when I enable it, propagation fails. I will debug the Sentry code to try to find specifically where this happens so I can be more helpful, but I wanted to write in the issue to note that more than one user is affected by this.


kristofferlanno commented May 29, 2023

Facing the same issue. We have service A (sampling rate 1.0) calling service B (a sampling function that samples if parent_sampled=True). When using SentryPropagator, the trace is broken across the two services and sampling doesn't seem to work properly in service B (parent_sampled is always None). Printing the headers in service B gives the following:

'Sentry-Trace': 'cd429eb592d8422183b699e2b9c6e157-b4f74d4df648ecf7-,240b9aa69cd6e8dae92d6494a855dc0e-b125fadd945d8599-1'
'Baggage': 'sentry-trace_id=240b9aa69cd6e8dae92d6494a855dc0e,sentry-environment=dev,sentry-public_key=(redacted),sentry-transaction=status/,sentry-sample_rate=1.0'

Span information as printed by OTel in service A:

"context": {
      "trace_id": "0x240b9aa69cd6e8dae92d6494a855dc0e",
      "span_id": "0xb125fadd945d8599",
      "trace_state": "[]"
}

Completely commenting out the line below fixes the broken traces (I get a complete trace) and everything looks great in the Sentry UI. We then also get the traceparent header in service B, which looks correct (and which we would actually like to have). In this case, however, sampling still doesn't work (parent_sampled is always None):

# set_global_textmap(SentryPropagator())

What we want to achieve is to use OTel with Sentry as the tracing backend while having the W3C TraceContext headers propagated, so that we can use them with our logging to correlate log messages by trace id.
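
One way to get both header formats on outgoing requests could be a composite propagator along these lines (just a sketch of the idea; we haven't verified that it plays well with the Sentry span processor or fixes the sampling problem):

    from opentelemetry.propagate import set_global_textmap
    from opentelemetry.propagators.composite import CompositePropagator
    from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
    from sentry_sdk.integrations.opentelemetry import SentryPropagator

    # Inject/extract both the W3C traceparent/tracestate headers and Sentry's
    # sentry-trace/baggage headers on every outgoing/incoming request.
    set_global_textmap(
        CompositePropagator([TraceContextTextMapPropagator(), SentryPropagator()])
    )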

Let me know if there's anything we can help out with.

Update: Although the issue looks very similar, our issue might not be exactly the same (it could be). In our case the sentry-trace header value only has a single item, which sentry_trace[0] extracts. But for some reason the string looks like this: cd429eb592d8422183b699e2b9c6e157-b4f74d4df648ecf7-,240b9aa69cd6e8dae92d6494a855dc0e-b125fadd945d8599-1, which extract_sentrytrace_data cannot parse (it returns None). I have confirmed that service A sets the header to only the 240b9aa69cd6e8dae92d6494a855dc0e-b125fadd945d8599-1 part, which seems correct to me.


kristofferlanno commented May 30, 2023

I followed @zukwung's idea of modifying the extract_sentrytrace_data call to get the correct trace context out, and similarly it then worked: traces are whole and show up in the Sentry UI. My change was not identical; I had to change it like so to account for the string I got:

extract_sentrytrace_data(sentry_trace[0].split(",")[-1])

However, parent_sampled is still always None, which I've tracked down to:

  1. Being properly extracted and added to the context by SentryPropagator (here)
  2. Being correctly assigned to trace_data inside the SpanProcessor (here)
  3. But then never being passed down to start_transaction inside the SpanProcessor (here), and thus:
  4. Never being added to the sampling_context inside the hub (here).

Which means that even though the context and trace_data say parent_sampled: True, that info is lost by the time we want to make decisions based on the data. Is this a bug? Are we supposed to be able to use parent_sampled in the sampling function?
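
For reference, the kind of sampling function we are trying to use in service B looks roughly like this (a minimal sketch; the DSN and the 0.1 fallback rate are just placeholders):

    import sentry_sdk

    def traces_sampler(sampling_context):
        # When the incoming request carried trace headers, sampling_context
        # should include "parent_sampled"; follow the parent's decision.
        parent_sampled = sampling_context.get("parent_sampled")
        if parent_sampled is not None:
            return 1.0 if parent_sampled else 0.0
        # No parent decision available: fall back to a default rate.
        return 0.1

    sentry_sdk.init(
        dsn="...",  # placeholder
        traces_sampler=traces_sampler,
        instrumenter="otel",
    )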

antonpirker changed the title from "OpenTelemetry FastAPI and Request" to "OpenTelemetry FastAPI and Requests" on Jul 6, 2023
antonpirker added a commit that referenced this issue Jul 11, 2023
If OpenTelemetry is enabled, the sentry-trace headers should not be applied by the Sentry integration, but only by the OTel propagator.

Fixes #1940