Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy: OpenTelemetry tracing. #3458

Merged
merged 3 commits into from
Feb 17, 2023
Merged

proxy: OpenTelemetry tracing. #3458

merged 3 commits into from
Feb 17, 2023

Conversation

hlinnaka
Copy link
Contributor

@hlinnaka hlinnaka commented Jan 26, 2023

Thanks to commit XXXX ("Refactor common parts of handle_client and handle_ws_client to function."), we now have separate tracing spans for the connection establishment phase and for the forwarring phase of each connection. This commit sets up OpenTelemetry tracing and exporter, so that they can be exported as OpenTelemetry traces as well.

This adds tracing to all outgoing HTTP requests. A separate (child) span is created for each outgoing HTTP request, and the tracing context is also propagated to the server in the HTTP headers. Use the 'reqwest-middleware' crate to do that.

If tracing is enabled in the control plane and compute node too, you can now get an end-to-end distributed trace of what happens when a new connection is established, starting from the handshake with the client, creating the 'start_compute' operation in the control plane, starting the compute node, all the way to down to fetching the base backup and the availability checks in compute_ctl.

Note: this is on top of PR #3433

@hlinnaka hlinnaka requested a review from a team as a code owner January 26, 2023 13:57
@hlinnaka hlinnaka requested review from chaporgin, funbringer and koivunej and removed request for a team January 26, 2023 13:57
Copy link
Contributor

@funbringer funbringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this can't be based on main instead of #3433. Adding opentelemetry tracing is one thing, but tuning the spans to your liking is the other. We could add the support right now as is and then do other refactorings.

proxy/src/main.rs Outdated Show resolved Hide resolved
proxy/src/main.rs Outdated Show resolved Hide resolved
@hlinnaka
Copy link
Contributor Author

hlinnaka commented Feb 1, 2023

I don't understand why this can't be based on main instead of #3433. Adding opentelemetry tracing is one thing, but tuning the spans to your liking is the other. We could add the support right now as is and then do other refactorings.

Without having the spans right, the opentelemetry traces will be less useful. But sure, we could make these changes the other way round, too.

@funbringer
Copy link
Contributor

Without having the spans right, the opentelemetry traces will be less useful.

Agreed. Luckily, #3331 has been merged, so we now have much better spans. I'll gladly merge this PR if you rebase it on top of main.

@hlinnaka hlinnaka mentioned this pull request Feb 6, 2023
2 tasks
@funbringer funbringer requested review from a team as code owners February 16, 2023 18:48
@funbringer funbringer changed the base branch from proxy-refactorings to main February 16, 2023 18:48
Copy link
Contributor

@LizardWizzard LizardWizzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Saw new env var OTEL_EXPORTER_OTLP_ENDPOINT and started thinking again about different config sources, some parameters come from cli flags, some from env. Would be nice to eventually merge different sources into one

This commit sets up OpenTelemetry tracing and exporter, so that they
can be exported as OpenTelemetry traces as well.

All outgoing HTTP requests will be traced. A separate (child)
span is created for each outgoing HTTP request, and the tracing
context is also propagated to the server in the HTTP headers.

If tracing is enabled in the control plane and compute node too, you
can now get an end-to-end distributed trace of what happens when a new
connection is established, starting from the handshake with the
client, creating the 'start_compute' operation in the control plane,
starting the compute node, all the way to down to fetching the base
backup and the availability checks in compute_ctl.

Co-authored-by: Dmitry Ivanov <dima@neon.tech>
On the surface, this doesn't add much, but there are some benefits:

* We can do graceful shutdowns and thus record more code coverage data.

* We now have a foundation for the more interesting behaviors, e.g. "stop
  accepting new connections after SIGTERM but keep serving the existing ones".

* We give the otel machinery a chance to flush trace events before
  finally shutting down.
@funbringer
Copy link
Contributor

funbringer commented Feb 17, 2023

Ok, so now we can easily capture traces for the proxy's tests.

изображение

How to reproduce:

# https://www.jaegertracing.io/docs/1.42/getting-started/
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.42

# important: set otel endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# run proxy's tests
poetry run pytest test_runner/regress/test_proxy.py

# open jaeger's UI in a web browser
xdg-open http://localhost:16686

@LizardWizzard
Copy link
Contributor

LizardWizzard commented Feb 17, 2023

Ok, so now we can easily capture traces for the proxy's tests.

Nice! Worked for me

@funbringer funbringer merged commit d90cd36 into main Feb 17, 2023
@funbringer funbringer deleted the proxy-telemetry branch February 17, 2023 12:32
@kelvich
Copy link
Contributor

kelvich commented Feb 17, 2023

Nice.

@lassizci I haven't followed closely story about traces, but I see that we have tempo source in Grafana now. What are the steps to looks at that traces in grafana?

hlinnaka added a commit to neondatabase/helm-charts that referenced this pull request Apr 24, 2023
OpenTelemetry tracing support was added to proxy in
neondatabase/neon#3458
hlinnaka added a commit to neondatabase/helm-charts that referenced this pull request Apr 25, 2023
OpenTelemetry tracing support was added to proxy in
neondatabase/neon#3458
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants