Solve panic due to concurrent access to ExportSpans #3058
Conversation
I'm not sure this is really correct.

If your claim that `ExportSpans` can only be called on a single thread is correct, then this is not the only place where it gets called. It will also get called by regular context traces from buildkitd (and maybe by container forwarding as well). In that case, the only way I can see is that all the exporter implementations (https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L78) would need to be wrapped with something that makes `ExportSpans` safe to call, before the exporter is used to create a trace provider.

Tbh, if the description is correct, I don't really understand the logic of this API design. An exporter is supposed to be the input to `sdktrace.NewBatchSpanProcessor` in the regular flow. But if everything breaks down when two span processor/trace provider instances share the same exporter, I don't understand what the point of defining an exporter interface is at all.
I've mentioned this thread in the original issue in the OpenTelemetry SDK; hopefully they will respond. Without knowing too much about why this architecture was chosen, there is a benefit to the exporter interface, as it allows for different implementations of the exporter. This is used even in

Going through all calls to `ExportSpans` in the codebase:
I see what you mean about

https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L78 will return, in this case for example, the Jaeger exporter. Then this exporter is called, for example, in control, where you added the mutex. But it also goes to https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L99, and from there all the local traces will end up in the exporter. Even if the control path has a mutex and
Force-pushed from 1503aa6 to 8c7c731
Sorry about the delay, I've changed the implementation to create a thread-safe exporter instance and use that instead. Let me know if you'd like me to change anything! :)
@tonistiigi - can you re-review? Thanks!
Hi @tonistiigi - can you re-review this please? I'd like to stop having to maintain my fork :)
@gsaraf Could you rebase? GH doesn't show a conflict, but this code has actually changed in the latest version.
Force-pushed from 3cae6c6 to fe649ce
Thanks for the review! Rebased and made the requested changes. |
Force-pushed from fe649ce to f81f662
Signed-off-by: Gahl Saraf <saraf.gahl@gmail.com>
Force-pushed from f81f662 to afb01a7
Background: After enabling OpenTelemetry export to Jaeger using the `JAEGER_TRACE` env var, we started seeing occasional panics across our fleet of buildkitd containers. A helpful pointer from the people at the OpenTelemetry Go repo indicated a possible concurrency problem. Wrapping `ExportSpans` with a mutex solved the problem completely.

Issue: #3004

Testing: Before this fix, we would get several panics a day. After it, we have gone several weeks without any panics.
I've also run the suggested tests: `./hack/test integration gateway dockerfile`.