Add clarification for Span under multithreading scenarios. #120

carlosalberto · 2018-05-30T15:38:47Z

Multithreading Span usage.

Current State: Draft
Author: carlosalberto

In the OpenTracing specification, there is no mention of Span behavior under multithreaded applications, nor clarification of any thread-safety requirement. This proposal is thus intended to be explicit about this specific scenario.

Background

Different tracing systems have historically taken different decisions regarding this, with many of them making Spans (and their operations) thread-safe. Yet, it has been shown lately that a few frameworks have decided to not make Spans thread-safe, and instead provide their own ways to handle this scenario. This could in the near future create confusion among both users and tracing authors, as well as incompatibilities among implementations.

Specification Changes

The Span interface adds one or more paragraph detailing the expected behavior of the objects under multithreading scenarios, specially when a single operation may take place through different threads (maybe using a thread pool). Span objects should then either:

Being thread-safe in all its operations (logs, tags, baggage).
Guaranteed to work fine and correctly under the described scenarios.

Use Cases

Application using a thread pool

Multithreaded applications using a thread pool, with the user creating a Span for a given operation, which would be executed as a series of callbacks, each one running on a different, unknown thread.

For many frameworks, this could be approached with the Span being passed between the threads/callbacks during its execution, to add logs, set tags or modify its baggage.

Risk Assessment

The following risks have been identified:

This change may affect tracing systems that have taken for granted that there is no need to provide any form of thread-safety, as they would need to update or change its own API or code.
The clarification may stay too vague or be insufficient in the long term, regarding the frameworks that decide to not provide thread-safety. Thus a proper, clear addition needs to happen for this case.

wu-sheng

If SkyWalking is going to support span across thread, I hope could be some explicit way to declare it. Otherwise, the auto instrumentation agent can't prepare for that. It is major difference between SkyWalking and manual instrumentation, we must propagate context in ThreadLocal(Java), because no one will do that, and we don't want people to do so.

wu-sheng · 2018-05-31T01:09:37Z

rfc/span_multithreading.md

+
+The `Span` interface adds one or more paragraph detailing the expected behavior of the objects under multithreading scenarios, specially when a single operation may take place through different threads (maybe using a thread pool). `Span` objects should then either:
+
+* Being thread-safe in all its operations (logs, tags, baggage).


That will be a performance concern for span's OPs. Many existed APM/tracers expected span don't face multiple threads, that is also based on this.

yurishkuro

without nitpicking on specific wording, I am in favor of clarifying that span has to be thread-safe

yurishkuro · 2018-05-31T03:07:52Z

@wu-sheng It was my understanding that the skywalking agent code is not using OpenTracing, it is coded directly against skyWalking spans. The only part that is OpenTracing compatible is if users interact with the spans directly from the application, via Tracer. Would that not provide sufficient indication that you need to expose a thread-safe version of the span if accessed via OT API, while the internal API used by the agent can be single-threaded?

wu-sheng · 2018-05-31T03:40:26Z

It was my understanding that the skywalking agent code is not using OpenTracing, it is coded directly against skyWalking spans. The only part that is OpenTracing compatible is if users interact with the spans directly from the application, via Tracer. Would that not provide sufficient indication that you need to expose a thread-safe version of the span if accessed via OT API, while the internal API used by the agent can be single-threaded?

@yurishkuro Yes, for the bridge, we can do that. But, my concern is more about that. You could just consider this as the disadvantage of auto-instrumentation agents.

Java program also runs in a single thread(exclude distributed), except for thread pool or job assignment modules, so nearly all Java trace agent(.NET likely too), store context in Thread(ThreadLocal for Java), and also maintain an active span stack in context for help the plugin(instrumentation) developer easier to process parent/child relationship. New span's parent is also the last active span.

Such as: A->B->C, when C created, B is its parent span as auto and default. If you finish C, then create D, then D is also B's child too. All these are automatic, that is core of auto instrumentation for many languages.

So you can find out, when span(like C) can cross thread, the stack structure fails, because, you can't guarantee the C has been finished before D created, but you may expect that.

That is a deep conflict, and also why I argue the across thread span all the time.

Although, I think you understand, SkyWalking always did instrumentation in auto way, so we don't relied on OT instrumentation very much. If we consider a limited support, such as providing a way to create some span manually, then SkyWalking can say, we only support in that way.

yurishkuro · 2018-05-31T03:47:30Z

ScopeManager also keeps spans in thread local in a stack-like manner.

@carlosalberto I'm curious, with the async frameworks using ScopeManager, has anyone looked at what happens with that stack? In Jaeger's previous context manager impl we had a clear() method that would always be called by the server endpoint, but ScopeManager doesn't have it.

felixbarny · 2018-05-31T07:58:22Z

Could you elaborate on the "Application using a thread pool" use case? Why don't you have one Span per thread/callback?

Ensuring visibility across threads is, of course, important. Especially when the spans are reported asynchronously in a different thread than in the one they have been recorded. But that can be assured using cheaper mechanisms, like memory barriers. Not quite sure why requiring full thread safety is needed. Did you do some benchmarks to get a feel for the costs of this? This should go into the risk assessment IMHO.

wu-sheng · 2018-05-31T08:50:31Z

Why don't you have one Span per thread/callback?

SkyWalking did this too.

Did you do some benchmarks to get a feel for the costs of this?

Please consider the benchmark scenario should be collecting more than 6000 calls per second, each call has at least 15 spans. That is at least what SkyWalking need and promise to support in product use.

That is also my performance concern comes.

carlosalberto · 2018-05-31T13:19:58Z

Why don't you have one Span per thread/callback?

Sure, this is the route taking by some applications or tracing designs (a new Span per thread) - but some other times you will end up creating a Span in a thread, pass it to next thread (maybe through a callback), one at a time, and finish it when the last point is reached.

So the point of this PR is precisely to get this discussion going, and get some feedback ;)

Please consider the benchmark scenario should be collecting more than 6000 calls per second, each call has at least 15 spans.

I know other tracing systems (Jaeger, LightStep for a start) do thread-safe Spans. Wondering about the performance scenarios (if any, and if so, which ones, etc).

carlosalberto · 2018-05-31T13:22:14Z

Hey @yurishkuro

with the async frameworks using ScopeManager, has anyone looked at what happens with that stack? In Jaeger's previous context manager impl we had a clear() method that would always be called by the server endpoint

Oh, there was some basic experimentation with Akka (from my side at least), which ended up me being re-introducing the AutoFinishScopeManager intoopentracing-util. That being said, maybe it's time to re-check it (the clear() method would make lots of sense to me, btw).

yurishkuro · 2018-05-31T14:54:50Z

@carlosalberto would be useful to run Akka etc. examples under profiler to check for memory leaks, because as far as I can tell a naive way of starting non-auto-finish span/scope might leave the scope on the stack forever.

carlosalberto · 2018-07-09T14:00:53Z

Hey all!

We would like to have a Cross-Language group call on July 13th (next week) and discuss this item as part of it. Please let me know if you guys can make it. I'm specially interested, if possible, in having @wu-sheng and @felixbarny if possible ;)

We will also be probably discussing it over Gitter this week, prior to the call ;)

tedsuo · 2018-07-12T23:55:46Z

@felixbarny I agree that "Thread safety" should be clarified.

In this case, it should just mean "safety," in the sense that it exceptions will not occur, memory will not be corrupted, and functionality will continue to work. There is no need to guarantee any ordering of operations between threads, etc. In practice there could be more efficient safety guarantees than locks.

Add a RFC for Span under multithreading scenarios.

32705a0

carlosalberto requested review from tedsuo, yurishkuro and wu-sheng May 30, 2018 15:39

carlosalberto changed the title ~~Add a RFC for Span under multithreading scenarios.~~ Add clarification for Span under multithreading scenarios. May 30, 2018

wu-sheng reviewed May 31, 2018

View reviewed changes

yurishkuro reviewed May 31, 2018

View reviewed changes

carlosalberto mentioned this pull request Jun 7, 2018

Feedbacks from Apache SkyWalking project opentracing/opentracing-java#257

Closed

carlosalberto mentioned this pull request Jun 22, 2018

Clarify thread-safety of Span #115

Open

yurishkuro closed this May 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add clarification for Span under multithreading scenarios. #120

Add clarification for Span under multithreading scenarios. #120

carlosalberto commented May 30, 2018

wu-sheng left a comment

wu-sheng May 31, 2018

yurishkuro left a comment

yurishkuro commented May 31, 2018

wu-sheng commented May 31, 2018

yurishkuro commented May 31, 2018

felixbarny commented May 31, 2018

wu-sheng commented May 31, 2018

carlosalberto commented May 31, 2018 •

edited

carlosalberto commented May 31, 2018

yurishkuro commented May 31, 2018

carlosalberto commented Jul 9, 2018

tedsuo commented Jul 12, 2018


		The `Span` interface adds one or more paragraph detailing the expected behavior of the objects under multithreading scenarios, specially when a single operation may take place through different threads (maybe using a thread pool). `Span` objects should then either:

		* Being thread-safe in all its operations (logs, tags, baggage).

Add clarification for Span under multithreading scenarios. #120

Add clarification for Span under multithreading scenarios. #120

Conversation

carlosalberto commented May 30, 2018

Multithreading Span usage.

Background

Specification Changes

Use Cases

Application using a thread pool

Risk Assessment

wu-sheng left a comment

Choose a reason for hiding this comment

wu-sheng May 31, 2018

Choose a reason for hiding this comment

yurishkuro left a comment

Choose a reason for hiding this comment

yurishkuro commented May 31, 2018

wu-sheng commented May 31, 2018

yurishkuro commented May 31, 2018

felixbarny commented May 31, 2018

wu-sheng commented May 31, 2018

carlosalberto commented May 31, 2018 • edited

carlosalberto commented May 31, 2018

yurishkuro commented May 31, 2018

carlosalberto commented Jul 9, 2018

tedsuo commented Jul 12, 2018

carlosalberto commented May 31, 2018 •

edited