proxy: OpenTelemetry tracing. #3458
Conversation
Force-pushed from 4effe37 to 28064b5
I don't understand why this can't be based on main instead of #3433. Adding opentelemetry tracing is one thing, but tuning the spans to your liking is another. We could add the support right now as is and then do other refactorings.
Without having the spans right, the opentelemetry traces will be less useful. But sure, we could make these changes the other way round, too.
Agreed. Luckily, #3331 has been merged, so we now have much better spans. I'll gladly merge this PR if you rebase it on top of main.
Force-pushed from 28064b5 to f0c9e35
LGTM
Saw the new env var OTEL_EXPORTER_OTLP_ENDPOINT and started thinking again about different config sources: some parameters come from CLI flags, some from env vars. It would be nice to eventually merge the different sources into one.
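Purely as an illustration of that wish (not something this PR does), clap's derive API can make a single parameter accept either a CLI flag or an environment variable, so both sources land in one config struct. The struct and flag name below are hypothetical, and the `env` attribute requires clap's `env` feature:

```rust
use clap::Parser;

#[derive(Parser)]
struct ProxyArgs {
    /// OTLP endpoint; if the flag is omitted, the standard env var is used.
    #[arg(long = "otlp-endpoint", env = "OTEL_EXPORTER_OTLP_ENDPOINT")]
    otlp_endpoint: Option<String>,
}

fn main() {
    let args = ProxyArgs::parse();
    println!("OTLP endpoint: {:?}", args.otlp_endpoint);
}
```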
Force-pushed from f0c9e35 to fcec052
This commit sets up OpenTelemetry tracing and an exporter, so that the existing tracing spans can be exported as OpenTelemetry traces as well. All outgoing HTTP requests are traced: a separate (child) span is created for each outgoing HTTP request, and the tracing context is also propagated to the server in the HTTP headers. If tracing is enabled in the control plane and compute node too, you can now get an end-to-end distributed trace of what happens when a new connection is established, starting from the handshake with the client, through creating the 'start_compute' operation in the control plane and starting the compute node, all the way down to fetching the base backup and the availability checks in compute_ctl.
Co-authored-by: Dmitry Ivanov <dima@neon.tech>
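For orientation, here is a rough sketch (not the PR's actual code) of how an OTLP exporter is typically wired into a `tracing` subscriber with the 'opentelemetry-otlp' and 'tracing-opentelemetry' crates. The function name and the fallback endpoint are made up for this example:

```rust
use opentelemetry_otlp::WithExportConfig;
use tracing_subscriber::prelude::*;

fn init_tracing() -> anyhow::Result<()> {
    // Export spans over OTLP/HTTP; read the endpoint from the standard env var,
    // falling back to a local collector.
    let exporter = opentelemetry_otlp::new_exporter().http().with_endpoint(
        std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT")
            .unwrap_or_else(|_| "http://localhost:4318".to_string()),
    );

    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .install_batch(opentelemetry::runtime::Tokio)?;

    // Send `tracing` spans to OpenTelemetry in addition to the usual log output.
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    Ok(())
}
```

With `install_batch`, spans are buffered and exported asynchronously, which is why flushing them on shutdown (see below) matters.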
Force-pushed from fcec052 to c0f6cd4
On the surface, this doesn't add much, but there are some benefits:
* We can do graceful shutdowns and thus record more code coverage data.
* We now have a foundation for the more interesting behaviors, e.g. "stop accepting new connections after SIGTERM but keep serving the existing ones".
* We give the otel machinery a chance to flush trace events before finally shutting down.
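As a hedged sketch of the last point (illustrative only; `wait_for_shutdown` and the exact signal set are not taken from the PR), the shutdown path can wait for a signal and then flush the OpenTelemetry batch exporter before the process exits:

```rust
use tokio::signal::unix::{signal, SignalKind};

async fn wait_for_shutdown() -> anyhow::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;

    // Block until either signal arrives.
    tokio::select! {
        _ = sigterm.recv() => tracing::info!("received SIGTERM, shutting down"),
        _ = sigint.recv() => tracing::info!("received SIGINT, shutting down"),
    }

    // Flush any spans still buffered by the batch exporter before exit.
    opentelemetry::global::shutdown_tracer_provider();
    Ok(())
}
```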
Force-pushed from c0f6cd4 to f575909
Ok, so now we can easily capture traces for the proxy's tests. How to reproduce:
```sh
# https://www.jaegertracing.io/docs/1.42/getting-started/
docker run -d --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-e COLLECTOR_OTLP_ENABLED=true \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 14250:14250 \
-p 14268:14268 \
-p 14269:14269 \
-p 9411:9411 \
jaegertracing/all-in-one:1.42
# important: set otel endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# run proxy's tests
poetry run pytest test_runner/regress/test_proxy.py
# open jaeger's UI in a web browser
xdg-open http://localhost:16686
```
Nice! Worked for me
Nice. @lassizci I haven't followed the traces story closely, but I see that we have a Tempo data source in Grafana now. What are the steps to look at these traces in Grafana?
OpenTelemetry tracing support was added to proxy in neondatabase/neon#3458
Thanks to commit XXXX ("Refactor common parts of handle_client and handle_ws_client to function."), we now have separate tracing spans for the connection establishment phase and for the forwarding phase of each connection. This commit sets up OpenTelemetry tracing and an exporter, so that the spans can be exported as OpenTelemetry traces as well. This adds tracing to all outgoing HTTP requests: a separate (child) span is created for each outgoing HTTP request, and the tracing context is also propagated to the server in the HTTP headers. The 'reqwest-middleware' crate is used to do that.
If tracing is enabled in the control plane and compute node too, you can now get an end-to-end distributed trace of what happens when a new connection is established, starting from the handshake with the client, through creating the 'start_compute' operation in the control plane and starting the compute node, all the way down to fetching the base backup and the availability checks in compute_ctl.
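Since the commit message mentions 'reqwest-middleware', here is a minimal sketch (assumed wiring, not necessarily the PR's exact code) of how that crate, together with 'reqwest-tracing', wraps a client so that each outgoing request gets its own span and carries the trace context in its headers:

```rust
use reqwest_middleware::{ClientBuilder, ClientWithMiddleware};
use reqwest_tracing::TracingMiddleware;

/// Build an HTTP client whose requests each get a child span and propagate
/// the current trace context via HTTP headers.
fn traced_client() -> ClientWithMiddleware {
    ClientBuilder::new(reqwest::Client::new())
        .with(TracingMiddleware::default())
        .build()
}
```

Any request made through such a client shows up as a child span under the span that was active when the request was issued.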
Note: this is on top of PR #3433