proxy: OpenTelemetry tracing. #3458
Conversation
Force-pushed from 4effe37 to 28064b5
I don't understand why this can't be based on main instead of #3433. Adding opentelemetry tracing is one thing, but tuning the spans to your liking is another. We could add the support right now as is and then do other refactorings.
Without having the spans right, the opentelemetry traces will be less useful. But sure, we could make these changes the other way round, too.
Agreed. Luckily, #3331 has been merged, so we now have much better spans. I'll gladly merge this PR if you rebase it on top of main.
Force-pushed from 28064b5 to f0c9e35
LGTM
Saw the new env var OTEL_EXPORTER_OTLP_ENDPOINT and started thinking again about different config sources: some parameters come from CLI flags, some from env vars. It would be nice to eventually merge the different sources into one.
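Purely as an illustration of that wish (not something this PR does), clap's derive API can make a single parameter accept either a CLI flag or an environment variable, so both sources land in one config struct. The struct and flag name below are hypothetical, and the `env` attribute requires clap's `env` feature:

```rust
use clap::Parser;

#[derive(Parser)]
struct ProxyArgs {
    /// OTLP endpoint; if the flag is omitted, the standard env var is used.
    #[arg(long = "otlp-endpoint", env = "OTEL_EXPORTER_OTLP_ENDPOINT")]
    otlp_endpoint: Option<String>,
}

fn main() {
    let args = ProxyArgs::parse();
    println!("OTLP endpoint: {:?}", args.otlp_endpoint);
}
```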
Force-pushed from f0c9e35 to fcec052
This commit sets up OpenTelemetry tracing and an exporter, so that the existing tracing spans can be exported as OpenTelemetry traces as well. All outgoing HTTP requests are traced: a separate (child) span is created for each outgoing HTTP request, and the tracing context is also propagated to the server in the HTTP headers. If tracing is enabled in the control plane and compute node too, you can now get an end-to-end distributed trace of what happens when a new connection is established, starting from the handshake with the client, through creating the 'start_compute' operation in the control plane and starting the compute node, all the way down to fetching the base backup and the availability checks in compute_ctl.
Co-authored-by: Dmitry Ivanov <dima@neon.tech>
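For orientation, here is a rough sketch (not the PR's actual code) of how an OTLP exporter is typically wired into a `tracing` subscriber with the 'opentelemetry-otlp' and 'tracing-opentelemetry' crates. The function name and the fallback endpoint are made up for this example:

```rust
use opentelemetry_otlp::WithExportConfig;
use tracing_subscriber::prelude::*;

fn init_tracing() -> anyhow::Result<()> {
    // Export spans over OTLP/HTTP; read the endpoint from the standard env var,
    // falling back to a local collector.
    let exporter = opentelemetry_otlp::new_exporter().http().with_endpoint(
        std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT")
            .unwrap_or_else(|_| "http://localhost:4318".to_string()),
    );

    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .install_batch(opentelemetry::runtime::Tokio)?;

    // Send `tracing` spans to OpenTelemetry in addition to the usual log output.
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    Ok(())
}
```

With `install_batch`, spans are buffered and exported asynchronously, which is why flushing them on shutdown (see below) matters.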
Force-pushed from fcec052 to c0f6cd4
On the surface, this doesn't add much, but there are some benefits:
* We can do graceful shutdowns and thus record more code coverage data.
* We now have a foundation for the more interesting behaviors, e.g. "stop accepting new connections after SIGTERM but keep serving the existing ones".
* We give the otel machinery a chance to flush trace events before finally shutting down.
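As a hedged sketch of the last point (illustrative only; `wait_for_shutdown` and the exact signal set are not taken from the PR), the shutdown path can wait for a signal and then flush the OpenTelemetry batch exporter before the process exits:

```rust
use tokio::signal::unix::{signal, SignalKind};

async fn wait_for_shutdown() -> anyhow::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;

    // Block until either signal arrives.
    tokio::select! {
        _ = sigterm.recv() => tracing::info!("received SIGTERM, shutting down"),
        _ = sigint.recv() => tracing::info!("received SIGINT, shutting down"),
    }

    // Flush any spans still buffered by the batch exporter before exit.
    opentelemetry::global::shutdown_tracer_provider();
    Ok(())
}
```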
Force-pushed from c0f6cd4 to f575909
Ok, so now we can easily capture traces for the proxy's tests. How to reproduce:
```sh
# https://www.jaegertracing.io/docs/1.42/getting-started/
docker run -d --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-e COLLECTOR_OTLP_ENABLED=true \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 14250:14250 \
-p 14268:14268 \
-p 14269:14269 \
-p 9411:9411 \
jaegertracing/all-in-one:1.42
# important: set otel endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# run proxy's tests
poetry run pytest test_runner/regress/test_proxy.py
# open jaeger's UI in a web browser
xdg-open http://localhost:16686
```
Nice! Worked for me
Nice. @lassizci I haven't followed the traces story closely, but I see that we have a Tempo data source in Grafana now. What are the steps to look at these traces in Grafana?
OpenTelemetry tracing support was added to proxy in neondatabase/neon#3458
Thanks to commit XXXX ("Refactor common parts of handle_client and handle_ws_client to function."), we now have separate tracing spans for the connection establishment phase and for the forwarding phase of each connection. This commit sets up OpenTelemetry tracing and an exporter, so that the spans can be exported as OpenTelemetry traces as well. This adds tracing to all outgoing HTTP requests: a separate (child) span is created for each outgoing HTTP request, and the tracing context is also propagated to the server in the HTTP headers. The 'reqwest-middleware' crate is used to do that.
If tracing is enabled in the control plane and compute node too, you can now get an end-to-end distributed trace of what happens when a new connection is established, starting from the handshake with the client, through creating the 'start_compute' operation in the control plane and starting the compute node, all the way down to fetching the base backup and the availability checks in compute_ctl.
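Since the commit message mentions 'reqwest-middleware', here is a minimal sketch (assumed wiring, not necessarily the PR's exact code) of how that crate, together with 'reqwest-tracing', wraps a client so that each outgoing request gets its own span and carries the trace context in its headers:

```rust
use reqwest_middleware::{ClientBuilder, ClientWithMiddleware};
use reqwest_tracing::TracingMiddleware;

/// Build an HTTP client whose requests each get a child span and propagate
/// the current trace context via HTTP headers.
fn traced_client() -> ClientWithMiddleware {
    ClientBuilder::new(reqwest::Client::new())
        .with(TracingMiddleware::default())
        .build()
}
```

Any request made through such a client shows up as a child span under the span that was active when the request was issued.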
Note: this is on top of PR #3433