Enable selective tracing with Jaeger and update Jaeger site config schema #9330
Codecov Report
@@ Coverage Diff @@
## master #9330 +/- ##
==========================================
- Coverage 41.54% 41.49% -0.05%
==========================================
Files 1333 1333
Lines 72658 72664 +6
Branches 6582 6583 +1
==========================================
- Hits 30184 30151 -33
- Misses 39675 39713 +38
- Partials 2799 2800 +1
web/src/backend/graphql.tsx
@sourcegraph/web this look kosher to you?
One initial thought: could this affect CORS? See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Headers
keegancsmith
left a comment
LGTM, but there is an approach here that is much more minimal than updating every call site: implement your own opentracing.Tracer that either does a no-op or uses Jaeger, then set that as the GlobalTracer.
@keegancsmith this is the approach I tried first. However, I got hung up, because I am using a context item to toggle tracing on/off and the |
Just took a look. You are right, the API is pretty poor here. You could make it such that your GlobalTracer does a no-op for all "root" spans. Then the part that creates the "real" root span uses the correct decision logic. That way the decision propagates down to child spans. That seems like a reasonable approach and easy to reason about (i.e., only create a span if there is a parent, with a special method for root spans). But this also works, to be fair. Also, I see opentelemetry is now a thing, which we may want to migrate to, so having an intermediate pkg makes that easier.
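The "no-op for root spans" idea could be sketched as follows. This is only an illustration with simplified stand-in types: `span`, `selectiveTracer`, `StartRootSpan`, and `path` are hypothetical names, not the opentracing API or the code in this PR.

```go
package main

import "fmt"

// span is a simplified stand-in for a tracing span.
type span struct {
	name   string
	parent *span
}

// path renders the chain of span names from the root down to s.
// A nil span means the request is not being traced.
func (s *span) path() string {
	if s == nil {
		return "<not traced>"
	}
	if s.parent == nil {
		return s.name
	}
	return s.parent.path() + " > " + s.name
}

// selectiveTracer only records spans that descend from an approved root.
type selectiveTracer struct{}

// StartSpan creates a child span only when a recorded parent exists.
// Without a parent it is a no-op, so ordinary call sites can never
// start a trace by accident.
func (selectiveTracer) StartSpan(name string, parent *span) *span {
	if parent == nil {
		return nil
	}
	return &span{name: name, parent: parent}
}

// StartRootSpan is the single place that applies the sampling decision.
func (selectiveTracer) StartRootSpan(name string, shouldTrace bool) *span {
	if !shouldTrace {
		return nil
	}
	return &span{name: name}
}

func main() {
	var tr selectiveTracer

	root := tr.StartRootSpan("http.request", true)
	fmt.Println(tr.StartSpan("db.query", root).path())

	untraced := tr.StartRootSpan("http.request", false)
	fmt.Println(tr.StartSpan("db.query", untraced).path())
}
```

The appeal of this shape is that the on/off decision is made once, at the root, and every descendant inherits it through the parent link rather than through a separate context flag.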
CHANGELOG.md
How does this relate to the retention policy in Jaeger? How long is data kept there, and what is the default storage capacity in our Kubernetes deployment?
It has no bearing on the data retention policy in Jaeger, which must be configured in Jaeger itself. This only affects behavior of the Jaeger client. This point will be addressed elsewhere, in the Jaeger install docs we provide in the Sourcegraph docs. I've added it to the TODO in this other PR: sourcegraph/deploy-sourcegraph#559.
@keegancsmith Yeah, we should just migrate to opentelemetry. I will open an issue to do that after this is merged. It might be a little tricky given we use some intermediary packages that depend on opentracing.
Addresses #9300
From the updated CHANGELOG:
The following changes have been made with the goal of making it easier to use distributed tracing with Sourcegraph:
- The site configuration field "tracing.distributedTracing": { "sampling" } allows a site admin to control which requests generate tracing data.
  - "all" will trace all requests.
  - "selective" will trace all requests initiated from an end-user URL with ?trace=1. Non-end-user-initiated requests can set an HTTP header X-Sourcegraph-Should-Trace: true. This is the recommended setting, as "all" can generate large amounts of tracing data that may cause network and memory resource contention in the Sourcegraph instance.
  - "none" turns off tracing.
- Jaeger is now the officially supported distributed tracer. The following is the recommended site configuration to connect Sourcegraph to a Jaeger agent (which must be deployed on the same host and listening on the default ports):
- The site configuration field, useJaeger, is deprecated in favor of "tracing.distributedTracing": { "type": "jaeger" }.
- The site configuration field "experimentalFeatures": { "debug.log": { "opentracing" } } toggles debug logging that logs every call initiated from the opentracing (Jaeger) client.
- Support for configuring Lightstep as a distributed tracer is deprecated and will be removed in a subsequent release. Because most Sourcegraph instances are deployed on-prem and Lightstep is only available "in the Cloud", usage of Lightstep was very low or non-existent. If you are a paying customer and would like us to maintain support, please email support@sourcegraph.com.
Other notes:
- "selective" setting and toggle on ?trace=1 in the URL to notice Jaeger trace collection turn on/off for the given request tree.
- internal/trace/ot package (which implements the "selective" tracing behavior described in the CHANGELOG).
- tracing.distributedTracing was made, because I anticipate wanting to add tracing.nettrace shortly. If anyone prefers a different naming scheme or site config structure, please comment.
TODO
- debug (toggle doesn't work)
Following merge, I will do the following: