-
Notifications
You must be signed in to change notification settings - Fork 182
Scope Manager should use continuation passing #126
Comments
I've added an implementation of CPS-style JS span activation opentracing/opentracing-javascript#113. For example, the implementation using the Zone TC39 proposal: const propertyKey = 'opentracing';
const zoneName = 'opentracing';
interface PropertyData {
owner: ZoneSpanManager;
span: Span | null;
}
export class ZoneSpanManager implements SpanManager {
active(): Span | null {
for (let zone = Zone.current; zone = zone.getZoneWith(propertyKey), zone; zone = zone.parent) {
const { owner, span } = zone.get('opentracing') as PropertyData;
if (owner === this) {
return span;
}
}
return null;
}
activate<A>(span: Span | null, f: () => A): A {
const zone = Zone.current.fork({
name: zoneName,
properties: {
[propertyKey]: {owner: this, span}
}
});
return zone.run(f);
}
} I also tried implementing the current API. Didn't wind up nearly as clean and I still had questions about edge cases for the implementation. https://github.com/rivethealth/opentracing-javascript/blob/pauldraper/wip/src/zone_scope_manager.ts |
Drive-by comment by a naive reader here: the datadog client has its own implementation https://github.com/DataDog/dd-trace-js/tree/master/src/scope, which may be of interest to you. (I’m no author of that code, I have only been diving into this code as a user for the past few days and was looking for details on how scoping worked.) |
@alloy, that is of great interest, thanks! Until last month, Datadog used a CPS-style CLS library. I didn't know they had replaced it. The current imperative RFC can be implemented , though that library is a good example of how non-trivial and non-obvious the implementation is in JavaScript. |
@rochdev cc ^ |
We replaced CLS with a scope manager that is as close to the specification as possible as we planned to bring that work to Basically we had similar thoughts about how this should work, and here were our findings so far: We used a combination of
We then ended up using only As far a closing the scope, the approach we took currently is to automatically close the scope only when the entire asynchronous context has ended, including any child contexts. This approach has proven difficult but not impossible to implement. This is effectively I think it would be possible to actually have the 2 APIs available. There has been talks of a higher level API, and I think this could be a great use case for this. Basically the scope manager would be the low-level construct, and the CPS approach that was proposed by @pauldraper could be the high-level API that abstracts the complexity of the scope manager. This would however come with a few restrictions of how the scope manager is implemented, especially related to when the scope should be closed. |
I think that's how Java works, with its AutoFinishScope. My question was really about manually closing the scope. At what point in this code should what scope(s) be manually closed to have the commented effect? // ES 2017
const scope = scopeManager.activate(span, false);
const promise = f1(); // active
f2(); // active
await promise;
f3(); // active
f4(); // not active
Or in Java // Java 8
Scope scope = scopeManager.activate(span, false);
CompletableFuture promise = CompletableFuture.supplyAsync(f1, hookedExecutor); // active
f2(); // active
promise.thenApply(result -> {
f3(); // active
f4(); // not active
}); I'd say in between f3() and f4(), but can it be closed from another thread? Or is a new scope created and you have to call |
Hey @pauldraper Thanks a lot for the detailed explanation. Sorry for the delayed answer - wanted to gather together some information before providing the proper information ;)
Quick answers on those:
So my feeling is that we need to do an even better effort at making things clear - we are probably in a state where insiders know pretty well the expected API behavior, thus we have been forgetting to make those details widely public. Hopefully I can put some more love into #122 to solve this (and later, mention that in the README of each language implementation).
Regarding this, @yurishkuro had previously mentioned the possibility to add a
At the moment both the Python and C# implementations take this into account (the former when instrumenting # A Scope is created, kept around, automatically restored after run_coroutine_a()
# and finally closed/deactivated.
# For the children, it is expected that Span is automatically propagated and set as active,
# probably with the help of some additional instrumentation code.
with tracer.start_active_span('foo') as scope:
run_something()
await run_coroutine_a()
run_something_else() For C# it's even simpler: // A Scope is created and automatically automatically restored after
// the task (coroutine-alike) finishes. `
# AsyncLocal` is used to store the active `Span`, so it gets *automatically* propagated.
using (IScope scope = tracer.BuildSpan("foo").StartActive()) {
RunSomething();
await RunATask();
RunSomethingElse();
}
My impression, after the iterations on this topic, is that most developers want more detailed, manual control over things (take, for example, the desire to remove automatic finish of However, what has worked for other languages may not work for the rest (Go is an example of that too). And my impression, given the nature of Javascript and and the challenges I knew about its context propagation (even before hearing of DDog's effort), is that you may be on the right track for it (and obviously, if somebody thinks somebody else, feel free to jump in and add feedback ;) ) |
Hey @rochdev
Oh, indeed, I was thinking the same - @tedsuo had proposed a high level API sitting on top of what we have now, so it's definitely something worth trying out (eventually, that is). |
On the latest @pauldraper comment:
So with the current API you'd need to get a
(Another reminder we need to work better at the spec and documentation ;) ) |
@carlosalberto I haven't tried if scope managers are easy to extend for other functionality (e.g. Brave has a bunch of hooks into its scope managers), but one thing that would be super useful as extension is an optional verifier that would check that a scope is never accessed from another thread aside from the one that created it. I think we should make it clear that scopes must always be "scoped" to a single thread. |
So far I have an implementation that we use in our tracer since we went GA. I'm currently reworking it to be simpler and more efficient. We've had many iterations and at this point are confident that it should be implemented like this:
For example: const scope = tracer.scopeManager().activate(tracer.startSpan('A'), true)
const promise = Promise.resolve()
setImmediate(() => {
// active scope is "A"
tracer.scopeManager().activate(tracer.startSpan('B'))
promise
.then(() => {
// active scope is "B"
tracer.scopeManager().activate(tracer.startSpan('C'))
// scope C will be automatically closed here since it wasn't closed explicitly before the end of the context
})
.then(() => {
// active scope is "B"
// span of scope A will be finished after this context ends since there are no more descendant asynchronous contexts and finishSpanOnClose was set to true
})
// scope B will be automatically closed here since it wasn't closed explicitly before the end of the context
})
scope.close()
// there is no active scope at this point Please let me know if you need clarification or disagree with any of this as we are actively refactoring our scope manager, and would like to start work to reintegrate it in |
An alternative would be to require the user to manually finish the scope in these edge cases. This would make the implementation much simpler and make it a lot easier to support other older Node versions and browsers. From opentracing/opentracing-javascript#113:
I would like to point out however that if you look at the code of for example
|
@pauldraper After discussing this in Gitter, it seems the API doesn't have to be exactly the same as the spec, but the semantics should be the same. So in theory it should be possible to use an API similar you proposed but with the constructs from the spec: interface ScopeManager {
active(): Scope;
activate(span: Span, f: (scope: Scope) => void): void;
}
Not sure if it makes sense to still expose |
Logically speaking, the callback is a child of the execution at the time the callback was created (not whatever execution completed the Promise). Zone.js chooses the same thing. I would think java.concurrent.CompletionStage instrumentation follows the same principle.
If you use CPS, there's no point to having a Scope to determine span activation. The callback provides the scoping already. The raw simplicity is one of the advantages of CPS here. interface ScopeManger {
active(): Span;
execute(Span span, f: () => void): void;
} Honest question: Does any one have an actual concrete example where you want to do something other than scope = scopeManager.activate(span)
... some synchronous work ...
scope.close() If there is a more complicated API, there should be some very tangible, real-world examples to justify it. As @rochdev has out, the real difficult of the RFC, CPS or not, is the automatic close. That is, closing the span one every scope descendant has closed. It complicates things because it requires more extensive monitoring of async tasks. E.g. function f() {
...
}
const scope = scopeManager.active(span, true);
// https://github.com/tc39/proposal-observable
// https://github.com/ReactiveX/rxjs
const observable = ...
observable.subscribe(f);
...
scope.close(); I now have to track when the observable has finished, or the when the subscription has been unsubscribed, so that I know when all child scopes have ended. And there are common APIs that make that even harder to do like DOM EventTarget which deduplicates listeners. Question: What is the concrete example where this is useful? Again, making very complex-to-implement standards needs to have some real, tangible, compensory justification. What is it? |
Just to be clear this is exactly what I'm proposing (wasn't sure if this was an argument or validation of the point).
Totally agree. I was just trying to stick to the spec constructs but since it's unnecessary it can probably be the span directly. However, the context that lives inside the callback should definitely be called the "scope" as a concept (i.e. in docs).
scope = scopeManager.activate(span)
... some synchronous work ...
scope.close() Technically, for this specific imperative example I'd say the most common one is async/await: const scope = scopeManager.activate(span)
await somePromise()
scope.close() // this no longer runs synchronously We could then consider that this is yet another use case that is better served by using CPS: scopeManager.activate(span, async span => {
await somePromise()
// scope can be closed synchronously in the wrapper without waiting
}) Maybe there are other use cases, but I can't think of one at a glance.
Not many to be honest :) And you're absolutely right that it comes with not only additional complexity but also performance implications. In my most recent implementation testing I've removed this tracking. |
Sorry for the ambiguity. I was further justifying that.
Yes. This goes back to some of my original questions, but I believe that you'd want the imperative one to be. scope = scopeManager.active(span)
promise = somePromise()
scope.close()
await promise or equivalently scope = scopeManager.active(span)
somePromise().then(() => ...)
scope.close() But I don't believe there is a correct way to close the scope without breaking apart lines scope = scopeManage.active(span)
// scope.close() -- wrong
await somePromise()
// scope.close() -- wrong
:/ |
Simple implementation using CPS: https://github.com/DataDog/dd-trace-js/blob/cps-scope-manager/src/scope/new/scope_manager.js Sorry if the semantics are a bit weird in a few places as I didn't update the naming from the previous implementation. |
Meanwhile, more than a year later.... I think it should be obvious by this point that there is an absurd amount of complexity around what is essentially already a solved problem. (https://github.com/angular/zone.js/, https://github.com/royalrover/threadlocal, https://github.com/othiym23/node-continuation-local-storage) How this works is so trivially obvious:
Whereas implementing the monstrosity of Opentracing context manager is so difficult that many months later no one has been able to do it in opentracing-javascript (opentracing/opentracing-javascript#112 aws/aws-xray-sdk-node#60). I wonder why OpenTracing even tries to pretend to bring multiple languages under one standard; the decisions seem so obvious in their disregard of other languages. Originally, it started with Go programmers. Go has no thread locals, so OpenTracing ignores context propagation altogether. OpenTracing subsequently fails over and over again to be adopted into Java or Python stacks because it lacks the feature that APMs have had since day 1. After lengthy discussion, OpenTracing finally adds context propagation but then gears it very specifically towards a certain concurrency model, and adds all sorts of noise like auto closing. And so in the end, I'm stuck writing my own thing because everyone is handicapped by a couple special languages that the OpenTracing authors happened to know. |
@pauldraper One of the reasons the scope manager didn't happen in OpenTracing is that most of the effort has shifted to OpenTelemetry as it will replace OpenTracing soon. While the language bias towards Java has unfortunately not been addressed in OpenTelemetry, the scope manager is at least already merged and is actively being improved. I would recommend to move any discussion related to the scope manager over there. |
Hey @pauldraper This is possibly a good time to join the OpenTelemetry Javascript effort to provide feedback and insight into the in-process propagation part. Btw, agree that the |
https://github.com/opentracing/specification/blob/master/rfc/scope_manager.md
Problems
1. Scope Manager is hard to understand and easy to misuse
APIs should be easy to understand and hard to misuse. This is neither. opentracing-java discussion has been a barrage of variations on this idea with edge cases, competing priorities, awkward APIs, unintuitive descriptions, and general confusion. When the conversation is muddled, it's a sign the best answer lies along a different direction.
2. Scope Manager causes memory leaks where there were none before
Last I checked, Java auto-reactivates the last active span. This was a problem with opentracing-java and caused crashes at Lucid Software in a couple messy parts of the code.
Unbounded async stacks are not a new problem, but (for Java, at least), this is an entirely avoidable problem for Scope Manager.
3. In continuation-based languages, it extremely hard to understand
I've spent the past several days on implementations of the RFC for JavaScript, with both Node.js async_hooks and Zone.js.
i. Does the scope last until A? Does it automatically end, or do I have to close it?
ii. Or does it last until through B and needs to be closed there?
iii. Or the scope last until A and another second scope is implicitly created at B and that second scope needs to be closed?
I believe (iii) is the "sensible" solution .
Solutions require either (1) leaking memory creating Scope Managers (2) an explicit enable/disable step (3) WeakMaps of Scope Managers, which is non-trivial.
Every continuation local storage solution is instead is based on callbacks, where the storage is scope to a callback and it's transitive calls.
https://github.com/othiym23/node-continuation-local-storage
https://github.com/angular/zone.js/
https://nodejs.org/api/domain.html
Even the Go hack for thread locals uses a CPS API: https://godoc.org/github.com/jtolds/gls.
Solution
Similar to my proposal Feb 2017, #23 (comment) in-process propagation should have a continuation-based API.
Proposal
The entire API could be essentially as simple as:
That's it.
There's no open pits for users to injure themselves, no complicated questions. It unambiguously shadows and restores stacks. And it lends itself to "obviously correct" implementations.
Downsides:
It typically adds at least one and probably two frames to the stack for each activated span.
Fitting in some imperative boxes become hard or even impossible.
IDK how often that pattern comes up. FWIW, the Java servlet standard uses CPS.
It would be possible to have imperative exceptions in certain cases, e.g. Java supports a raw set(Span).
But for 90% of the time and most languages, CPS is a really straightforward, robust, and broadly applicable way to augment scoping/flow control.
The text was updated successfully, but these errors were encountered: