Add clarification for Span under multithreading scenarios. #120
Add clarification for Span under multithreading scenarios. #120
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If SkyWalking is going to support span across thread, I hope could be some explicit way to declare it. Otherwise, the auto instrumentation agent can't prepare for that. It is major difference between SkyWalking and manual instrumentation, we must propagate context in ThreadLocal(Java), because no one will do that, and we don't want people to do so.
|
||
The `Span` interface adds one or more paragraph detailing the expected behavior of the objects under multithreading scenarios, specially when a single operation may take place through different threads (maybe using a thread pool). `Span` objects should then either: | ||
|
||
* Being thread-safe in all its operations (logs, tags, baggage). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be a performance concern for span's OPs. Many existed APM/tracers expected span don't face multiple threads, that is also based on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without nitpicking on specific wording, I am in favor of clarifying that span has to be thread-safe
@wu-sheng It was my understanding that the skywalking agent code is not using OpenTracing, it is coded directly against skyWalking spans. The only part that is OpenTracing compatible is if users interact with the spans directly from the application, via Tracer. Would that not provide sufficient indication that you need to expose a thread-safe version of the span if accessed via OT API, while the internal API used by the agent can be single-threaded? |
@yurishkuro Yes, for the bridge, we can do that. But, my concern is more about that. You could just consider this as the disadvantage of auto-instrumentation agents. Java program also runs in a single thread(exclude distributed), except for thread pool or job assignment modules, so nearly all Java trace agent(.NET likely too), store context in Thread(ThreadLocal for Java), and also maintain an active span stack in context for help the plugin(instrumentation) developer easier to process parent/child relationship. New span's parent is also the last active span. Such as: A->B->C, when C created, B is its parent span as auto and default. If you finish C, then create D, then D is also B's child too. All these are automatic, that is core of auto instrumentation for many languages. So you can find out, when span(like C) can cross thread, the stack structure fails, because, you can't guarantee the C has been finished before D created, but you may expect that. That is a deep conflict, and also why I argue the across thread span all the time. Although, I think you understand, SkyWalking always did instrumentation in auto way, so we don't relied on OT instrumentation very much. If we consider a limited support, such as providing a way to create some span manually, then SkyWalking can say, we only support in that way. |
ScopeManager also keeps spans in thread local in a stack-like manner. @carlosalberto I'm curious, with the async frameworks using ScopeManager, has anyone looked at what happens with that stack? In Jaeger's previous context manager impl we had a |
Could you elaborate on the "Application using a thread pool" use case? Why don't you have one Span per thread/callback? Ensuring visibility across threads is, of course, important. Especially when the spans are reported asynchronously in a different thread than in the one they have been recorded. But that can be assured using cheaper mechanisms, like memory barriers. Not quite sure why requiring full thread safety is needed. Did you do some benchmarks to get a feel for the costs of this? This should go into the risk assessment IMHO. |
SkyWalking did this too.
Please consider the benchmark scenario should be collecting more than 6000 calls per second, each call has at least 15 spans. That is at least what SkyWalking need and promise to support in product use. That is also my performance concern comes. |
Sure, this is the route taking by some applications or tracing designs (a new So the point of this PR is precisely to get this discussion going, and get some feedback ;)
I know other tracing systems (Jaeger, LightStep for a start) do thread-safe Spans. Wondering about the performance scenarios (if any, and if so, which ones, etc). |
Hey @yurishkuro
Oh, there was some basic experimentation with Akka (from my side at least), which ended up me being re-introducing the |
@carlosalberto would be useful to run Akka etc. examples under profiler to check for memory leaks, because as far as I can tell a naive way of starting non-auto-finish span/scope might leave the scope on the stack forever. |
Hey all! We would like to have a Cross-Language group call on July 13th (next week) and discuss this item as part of it. Please let me know if you guys can make it. I'm specially interested, if possible, in having @wu-sheng and @felixbarny if possible ;) We will also be probably discussing it over Gitter this week, prior to the call ;) |
@felixbarny I agree that "Thread safety" should be clarified. In this case, it should just mean "safety," in the sense that it exceptions will not occur, memory will not be corrupted, and functionality will continue to work. There is no need to guarantee any ordering of operations between threads, etc. In practice there could be more efficient safety guarantees than locks. |
Multithreading Span usage.
Current State: Draft
Author: carlosalberto
In the OpenTracing specification, there is no mention of Span behavior under multithreaded applications, nor clarification of any thread-safety requirement. This proposal is thus intended to be explicit about this specific scenario.
Background
Different tracing systems have historically taken different decisions regarding this, with many of them making Spans (and their operations) thread-safe. Yet, it has been shown lately that a few frameworks have decided to not make Spans thread-safe, and instead provide their own ways to handle this scenario. This could in the near future create confusion among both users and tracing authors, as well as incompatibilities among implementations.
Specification Changes
The
Span
interface adds one or more paragraph detailing the expected behavior of the objects under multithreading scenarios, specially when a single operation may take place through different threads (maybe using a thread pool).Span
objects should then either:Use Cases
Application using a thread pool
Multithreaded applications using a thread pool, with the user creating a
Span
for a given operation, which would be executed as a series of callbacks, each one running on a different, unknown thread.For many frameworks, this could be approached with the
Span
being passed between the threads/callbacks during its execution, to add logs, set tags or modify its baggage.Risk Assessment
The following risks have been identified:
This change may affect tracing systems that have taken for granted that there is no need to provide any form of thread-safety, as they would need to update or change its own API or code.
The clarification may stay too vague or be insufficient in the long term, regarding the frameworks that decide to not provide thread-safety. Thus a proper, clear addition needs to happen for this case.