Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DanglingSpansShutdownHandler for cleaning up dangling spans / contexts #1140

Closed
wants to merge 4 commits into from

Conversation

huggsboson
Copy link

No description provided.

@huggsboson huggsboson requested a review from a team November 13, 2023 20:57
Copy link

welcome bot commented Nov 13, 2023

Thanks for opening your first pull request! If you haven't yet signed our Contributor License Agreement (CLA), then please do so that we can accept your contribution. A link should appear shortly in this PR if you have not already signed one.

Copy link

linux-foundation-easycla bot commented Nov 13, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@brettmc
Copy link
Collaborator

brettmc commented Nov 13, 2023

In https://github.com/open-telemetry/opentelemetry-php/blob/main/src/Context/ContextInterface.php#L37 it's documented that you should use try-catch-finally when working inside active scopes.
There's also a helper function that can do this for you.

This covers your use-case, doesn't it? I think it could be better documented though, perhaps in here: https://opentelemetry.io/docs/instrumentation/php/manual/#create-nested-spans ?

@huggsboson
Copy link
Author

No unfortunately even with try {} finally {} php's exit() can be called leaving spans / contexts unended/detached.

@huggsboson
Copy link
Author

Maybe I should package this as an optional shutdown handler instead of a utility function?

@brettmc
Copy link
Collaborator

brettmc commented Nov 13, 2023

php's exit() can be called leaving spans / contexts unended/detached

Things are becoming clearer. So your application just randomly exits in the middle of code, and that's why dangling context. You said this was happening in tests, though - die/exit would terminate phpunit, which renders your application untestable anyway, right? Do you have any use-cases that aren't die/exit based?

If we were to adopt something like this, I agree that it would have to be opt-in - the warnings are important and are most likely alerting developers to bugs, and I don't want those to just be helpfully disappeared since that defeats the purpose of having the checking there in the first place.

I'm not convinced that this belongs in our core repository. Perhaps something like a contrib module, then opting-in could be composer require opentelemetry/die-handler (which can then use autoload->files to register its shutdown handler)?

@huggsboson
Copy link
Author

Yeah sorry I've merged 2 things into this 1 PR that are unrelated:

  1. The primary problem is that unfortunately we have some older code that renders redirect responses and exits in certain cases and we don't want to lose the open spans (nor error) in those cases.
  2. I also noticed while writing a test for my fix to the early exit behavior that some of our tests were accumulating context into the statics (in this case a Phake mocked version of a ContextInterface) that had been left over from earlier runs. With any of our stuff internally that uses statics they have a way of resetting the static in-between tests that we call, I was emulating that same mode here with the reset method.

@huggsboson
Copy link
Author

I don't mind structuring this however you want, one of the challenges are that you have marked some things in Context "internal" (DebugScope and getRoot) and resetting storage impossible (by making the type non-nullable). If you wanted to keep them like that in core I was worried about landing this behavior outside.

@brettmc
Copy link
Collaborator

brettmc commented Nov 14, 2023

some of our tests were accumulating context into the statics (in this case a Phake mocked version of a ContextInterface) that had been left over from earlier runs

That sounds like a bug, and you're hiding it so that the tests pass. Why are earlier runs leaving context in storage, and not detaching them? We've established that die/exit is probably unavoidable without rewriting your framework/application, but that's probably not what's happening in this situation since die/exit would also exit your test framework.

@huggsboson
Copy link
Author

I refactored this as a ShutdownHandler

@huggsboson
Copy link
Author

I also removed the Context::resetStorage. But note: In general statics like this can be hard to deal with, but will try to figure something out.

@huggsboson huggsboson changed the title Add ContextHelper for cleaning up dangling spans / contexts Add DanglingSpansShutdownHandler for cleaning up dangling spans / contexts Nov 14, 2023
while ($ctx->current() !== Context::getRoot()) {
$span = Span::fromContext($ctx->current());
$span->end();
$ds = new DebugScope($ctx->scope());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the benefit of using debugscope compared to just detaching the active scope?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't wrap it in a debugScope it won't set the correct properties and will trigger_error when the original DebugScope goes out of... scope.

Copy link
Collaborator

@brettmc brettmc Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem right. We don't need to do that ordinarily (eg how all/most auto-instrumentation post hooks end the active span).

edit: actually, auto-instrumentation doesn't use DebugScope, because it's guaranteed to detach since observer post hooks always run, even after die/exit. Hmmm...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a bit more, and I see the issue. DebugScope keeps a reference to a scope, but doesn't store it anywhere. That means currently the expectation is that you'd keep a reference to the debugscope, and deactivate that, rather than the "real" scope.
I think that's actually also the issue in #1138

TBH I'm surprised that just creating a new debug scope and then deactivating it actually works, it's surely by happy accident. I'm now wondering whether DebugScope can be improved somewhat, since in this case it's complaining that a scope hasn't been detached when in fact it has. I'll have a play with this - perhaps a weak reference to the real scope, or the debugscope itself could be stored?

/**
* @covers \OpenTelemetry\SDK\Common\Util\DanglingSpansShutdownHandler
*/
class DanglingSpansShutdownHelperTest extends TestCase
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class DanglingSpansShutdownHelperTest extends TestCase
class DanglingSpansShutdownHandlerTest extends TestCase

*
* @return void
*/
public static function shutdown(callable $fn): void
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like $fn is used now, can it be removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I thought I had. will fix.

// Intentionally no detach call

// Will trigger_error without this
DanglingSpansShutdownHandler::shutdown();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another assertion here that context now is root?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do... I wanted to add an assertion for the span being ended, but the NoopTracers spans don't really do anything. I can try to mock out the various things to include the span end assertion as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get really lost when I try to figure out how to create a span that sets the context. I added your assert but haven't figured out how to simulate the span so I can detect the end call. I can't even really tell how this code is creating a new context given the nonRecordingSpans involved

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd look at mockery here, eg mocking a tracer, then having its spanBuilder method return a partial mock of a span: https://docs.mockery.io/en/latest/reference/partial_mocks.html - you should then be able to assert on the span's end() method.

@brettmc
Copy link
Collaborator

brettmc commented Nov 14, 2023

Somebody is going to want to use this in conjunction with SDK autoloading (from env vars). This is where we need to think about whether it can be enabled by an env var, and thence plumb it in to the sdk autoloading code.
The alternative is a small self-contained package in contrib that's installable via composer. But, if loading order is important (as the comments suggest), then we don't have full control over that via composer's autoload->files feature.

@huggsboson
Copy link
Author

Somebody is going to want to use this in conjunction with SDK autoloading (from env vars). This is where we need to think about whether it can be enabled by an env var, and thence plumb it in to the sdk autoloading code. The alternative is a small self-contained package in contrib that's installable via composer. But, if loading order is important (as the comments suggest), then we don't have full control over that via composer's autoload->files feature.

Happy to take your lead on this one. I hear you about wanting to have it as a separate package, though I worry a bit about it not being considered if DebugScope or Context::getRoot get refactored.

@huggsboson
Copy link
Author

Sorry for going dark here, I was trying to get my local env up. I ended up needing to run make build as the make install would fail due to "exec format error". I'm using colima on an M1 mac. The arm image it pulled seemed to be 32bit (strangely as it was marked arm v8). Doing the make build has allowed me to make progress.

@huggsboson
Copy link
Author

I've fixed the docs / tests / etc. I'm not fully sure what the right move with DebugScope is. deptrac correctly calls out I shouldn't be using it. Ideally there'd be a way to allow access from certain classes that may need it instead of making it non-internal.

@huggsboson
Copy link
Author

huggsboson commented Nov 16, 2023 via email

@brettmc
Copy link
Collaborator

brettmc commented Nov 17, 2023

@huggsboson a cleaner way to solve your issue would be via auto-instrumentation. Is that an option for you? post hooks fire even on exit/die, so all these shenanigans to clean up danging contexts are not required.

@huggsboson
Copy link
Author

huggsboson commented Nov 17, 2023

I think the instrumentation hook method is going to be a hard thing to switch to at this point for us given how some of the flows work. I'm gonna step back from this for a few.

  1. I'm wondering if your suggested plan to put the DebugScope itself in the context would mean we could easily install our own exit handler in the code without using internal classes.
  2. I'm wondering if using a destructor on our side wrapping the span and context is the cleanest path in our own utility code as destructors are guaranteed to be called (though not cleanly ordered).

@Nevay
Copy link
Contributor

Nevay commented Nov 29, 2023

There are two parts to this problem:
a ) warnings triggered by DebugScope on exit()
b) spans not being ended on exit()

(a) IMO the cleanest solution is to suppress the missing call to Scope::detach() warnings by using a custom set_error_handler() that is registered before calling exit().
(Using a shutdown function should not work as the stack is unwound before shutdown functions are invoked.)

(b) can be solved by using a span processor wrapper that keeps track of all SpanProcessor::onStart()ed and not yet SpanProcessor::onEnd()ed spans, Span::end()ing these spans on SpanProcessor::shutdown() before calling ::shutdown() on the inner span processor.

@huggsboson
Copy link
Author

huggsboson commented Nov 29, 2023 via email

@Nevay
Copy link
Contributor

Nevay commented Nov 29, 2023

We don't just want to suppress the warnings.

Scopes do not affect span export.

We want to actually close (and export) any spans that are sitting open in the parent path so we still get those in our trace.

Untested implementation for (b) that exports all open spans on ::shutdown() (relies on the fact that ::onStart() and ::onEnd() receive the same span instance, afair not guaranteed by spec):

use OpenTelemetry\Context\ContextInterface;
use OpenTelemetry\SDK\Common\Future\CancellationInterface;
use OpenTelemetry\SDK\Trace\ReadableSpanInterface;
use OpenTelemetry\SDK\Trace\ReadWriteSpanInterface;
use OpenTelemetry\SDK\Trace\SpanProcessorInterface;
use WeakMap;

final class EndOpenSpansOnShutdownSpanProcessor implements SpanProcessorInterface {

    private readonly SpanProcessorInterface $spanProcessor;
    /** @var WeakMap<ReadWriteSpanInterface, true> $spans */
    private readonly WeakMap $spans;

    public function __construct(SpanProcessorInterface $spanProcessor) {
        $this->spanProcessor = $spanProcessor;
        $this->spans = new WeakMap();
    }

    public function onStart(ReadWriteSpanInterface $span, ContextInterface $parentContext): void {
        $this->spans[$span] = true;
        $this->spanProcessor->onStart($span, $parentContext);
    }

    public function onEnd(ReadableSpanInterface $span): void {
        unset($this->spans[$span]);
        $this->spanProcessor->onEnd($span);
    }

    public function forceFlush(?CancellationInterface $cancellation = null): bool {
        return $this->spanProcessor->forceFlush($cancellation);
    }

    public function shutdown(?CancellationInterface $cancellation = null): bool {
        $spans = [];
        foreach ($this->spans as $span => $_) {
            $spans[] = $span;
        }
        foreach ($spans as $span) {
            $span->end();
        }

        return $this->spanProcessor->shutdown($cancellation);
    }
}

@huggsboson
Copy link
Author

huggsboson commented Jan 3, 2024

But the scopes remaining open will trigger an fatal error in our exit handlers. Closing the scopes and the spans in one go fixed both issues.

Happy to do your span processor solve for closing open spans, but we still don't have a good solver for the fact that the DebugScope error behavior is currently non-optional.

@Nevay
Copy link
Contributor

Nevay commented Jan 4, 2024

I don't think that we can handle exit on library side:

  • we cannot hook into exit (shutdown handler trigger too late, see below)
  • even if we could hook into exit: any exception that is thrown during stack unwinding will start execution of finally blocks -> simply detaching all scopes is not a solution

Would overriding the error_handler before calling exit be sufficient for now? Alternatively we could add an option to disable the debug scope notices completely.

$errorHandler = set_error_handler(static function(int $errno, string $errstr, string $errfile, int $errline) use (&$errorHandler): bool {
    if (str_starts_with($errstr, 'Scope: missing call to Scope::detach()')) {
        return true;
    }
    if (str_starts_with($errstr, 'Scope: unexpected call to Scope::detach()')) {
        return true;
    }

    return $errorHandler($errno, $errstr, $errfile, $errline);
}, /* $errorLevels ? */);
Using a shutdown handler to detach the scopes and/or to suppress the notices will not work, some scope destructors will be called before the shutdown handler is invoked.
register_shutdown_function(static function(): void {
    $ctx = Context::storage();
    while ($scope = $ctx->scope()) {
        $ds = new DebugScope($scope);
        $ds->detach();
    }
});

set_error_handler(static function(int $errno, string $errstr): bool {
    throw new Exception($errstr, $errno);
});

function someFunctionThatMayExit(): void {
    exit;
}

function someIntermediateFunction(): void {
    // Stack is unwound before shutdown handlers are invoked
    // -> this scope will throw before the shutdown handler is invoked
    $scope = Context::getCurrent()->activate();
    try {
        someFunctionThatMayExit();
    } finally {
        $scope->detach();
    }
}

// This scope will be detached by the shutdown handler
$scope = Context::getCurrent()->activate();
try {
    someIntermediateFunction();
} finally {
    $scope->detach();
}

@huggsboson
Copy link
Author

I'm trying to process the fact that the DebugScope only lives in the call stack of the activate caller vs gets stored off into ContextStorage so we can close it later. I guess that gives nice deterministic life span so its finalizer gets called when exit goes out of scope. The downside is I can't close it in cases where exit() gets called as there's no reference to it anywhere. The error handler string thing feels like a potentially brittle hack, say you update the eerror message?

@brettmc
Copy link
Collaborator

brettmc commented Jan 8, 2024

Alternatively we could add an option to disable the debug scope notices completely

I think this sounds like a good idea. The feature is meant to help users to detect incorrect context usage, but in this case it seems to be getting in the way. Being able to disable via config, ie "I know what I'm doing" mode, seems like it would help.

@brettmc
Copy link
Collaborator

brettmc commented Jan 9, 2024

@huggsboson could you test out whether the linked PR helps your use-case? You'll need to install context dev-main

@brettmc brettmc reopened this Jan 9, 2024
@huggsboson
Copy link
Author

That should likely work for us. I'll let you know when we confirm.

@huggsboson
Copy link
Author

closing due to alternative fix.

@huggsboson huggsboson closed this Jan 11, 2024
@huggsboson
Copy link
Author

@brettmc @Nevay I just wanted to follow up and say thank you guys for working through this with me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants