Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve panic due to concurrent access to ExportSpans #3058

Merged
merged 1 commit into from Dec 29, 2022

Conversation

gsaraf
Copy link
Contributor

@gsaraf gsaraf commented Aug 23, 2022

Background:
After enabling Open Telemetry export to Jaeger using the JAEGER_TRACE env var, started seeing occasional panics across our fleet of buildkitd containers. A helpful pointer from the people at the OpenTelemetry Go repo indicated a possible concurrency problem. Wrapping the ExportSpans with a mutex solved the problem completely.

Issue: #3004

Testing: Before this fix, would get several panics a day. After, have gone for several weeks without any panics.
I've also run the suggested tests: ./hack/test integration gateway dockerfile.

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is really correct.

If your claim that ExportSpans can only be called on a single thread is correct then this is not the only place where it gets called. It will get called by regular context traces from buildkitd (maybe container forwarding as well).

In that case the only way I would see is that all the exporter implementations https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L78 would need to be wrapped with something that makes ExportSpans safe to call before this exporter is used to create a traceprovider.

Tbh, if the description is correct I don't really understand the logic of this API design. Exporter is supposed to be input to sdktrace.NewBatchSpanProcessor in the regular flow. But if everything breaks down if there are two spanprocessor/traceprovider instances with the same exporter I don't understand what is the point of defining an exporter interface at all.

@gsaraf
Copy link
Contributor Author

gsaraf commented Aug 24, 2022

I've mentioned this thread in the original issue in the opentelemetry SDK, hopefully they will respond.

Without knowing too much about the reason this architecture was chosen, there is a benefit to the exporter interface, as it allows for different implementations of the exporter. This is used even in buildkit, for creating a delegated exporter.

Going through all calls to ExportSpans in the codebase:

  • in util/tracing/detect/delegated/delegated.go the export is guarded by a mutex, though this may have been intended to guard the exporters list. Still safe.
  • in control/control.go I add the mutex in this PR.
  • in cmd/buildkitd/main.go maybe used for uploading traces from the client? I'm not 100% sure. Would potentially need to be guarded as well. I'm happy to add that change.

I see what you mean about detectExporter returning a safe exporter - that sounds fine as well. Do you prefer that change?

@tonistiigi
Copy link
Member

https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L78 will return for example in this case the Jaeger exporter. Then this exporter is called for example in control where you added mutex. But it also goes to the https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L99 and from there all the local traces will end up in the exporter. Even if the control path has a mutex and SpanProcessor implementation is single-threaded they still don't know about each other at all.

@gsaraf
Copy link
Contributor Author

gsaraf commented Dec 4, 2022

Sorry about the delay, I've changed the implementation to create a thread-safe exporter instance and use that instead. Let me know if you'd like me to change anything! :)

@gsaraf
Copy link
Contributor Author

gsaraf commented Dec 14, 2022

@tonistiigi - can you re-review? Thanks!

@gsaraf
Copy link
Contributor Author

gsaraf commented Dec 29, 2022

Hi @tonistiigi - can you re-review this please? I'd like to stop having to maintaining my fork :)

@tonistiigi
Copy link
Member

@gsaraf Could you rebase. GH doesn't show conflict but this code has actually changed in latest version.

@gsaraf gsaraf force-pushed the fix_ExportSpans_panic branch 2 times, most recently from 3cae6c6 to fe649ce Compare December 29, 2022 20:37
@gsaraf
Copy link
Contributor Author

gsaraf commented Dec 29, 2022

Thanks for the review! Rebased and made the requested changes.

Signed-off-by: Gahl Saraf <saraf.gahl@gmail.com>
@tonistiigi tonistiigi merged commit fff089f into moby:master Dec 29, 2022
@tonistiigi tonistiigi mentioned this pull request Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants