Provide a way to debug a context leak #4100
Comments
Hi @minwoox, I'm interested in this issue. Can I work on it? After reading through the code, here is my initial idea about the implementation.
On points 4 and 5, I'm not quite sure whether there are other places that also need this logic. I checked by tracing where |
@klurpicolo Thanks a lot for your interest in this issue. 😄
We also need to add this logic to |
Ah, I was looking into this a few months ago but forgot to continue after a preparation PR. I think you should be able to implement this as an implementation of context storage without any other API changes, similar to this code that probably provides a lot of copy pasta to use |
Thanks, @anuraaga for the link. I didn't know that otel had a similar implementation.
If we use the context storage via a JVM flag, we cannot enable the tracking only for a specific service context. |
Stack traces are so expensive that even for a given service it's probably too much for production. Sampling is one approach, but I think in practice enabling the flag on a staging server tends to catch 99+% of leaks anyway, and it's easy for sampling to not help much if the problem is in a path with low testing coverage (if tests hit it, a staging server is enough). Instead of picking out possibly problematic services, often after an issue has already made it to production, I believe it is a more robust approach to have a flag that users always enable on staging servers and generally keep disabled on production servers. And in general, I strongly recommend not requiring a code change for detecting context leaks: any code change can cause an even bigger problem than a context leak. A follow-up to the above could be a flag to allow enabling stack traces based on some simple HTTP properties like path (perhaps the matcher can be code that is loaded by SPI, this is to avoid hard coupling like |
That makes sense. 😄 @klurpicolo Let's use the storage as @anuraaga suggested. |
Ok, Noted. |
…4232)

Motivation:

- Context leaks are hard to find because the exception does not tell where or which context was pushed without being popped. `TraceAbleRequestContextStorage` helps report the source of context leaks.
- Details are mentioned in #4100.

By the way, thanks to @anuraaga for giving a reference to read on [opentelemetry](https://github.com/open-telemetry/opentelemetry-java).

Modifications:

- Add `TraceAbleRequestContextStorage` that stores the `RequestContext` stack trace and reports to the user where it happens.
- Add a `requestContextLeakDetectionSampler` flag that users can use to enable leak detection. Users can enable it by either a system property or an SPI flags provider.

Result:

- Closes #4100
- `TraceAbleRequestContextStorage` is added, so users can use it to report where context leaks happen.

How to enable:

1) By system property `-Dcom.linecorp.armeria.requestContextLeakDetectionSampler=<sampler-spec>`
2) By providing FlagsProvider SPI
```java
public final class EnableLeakDetectionFlagsProvider implements FlagsProvider {
    @Override
    public Sampler<? super RequestContext> requestContextLeakDetectionSampler() {
        return Sampler.always();
    }
    ...
}
```
3) By providing RequestContextStorageProvider SPI (not recommended since the RequestContextStorageProvider SPI will be removed as mentioned in #4211)
```java
public final class CustomRequestContextStorageProvider implements RequestContextStorageProvider {
    @Override
    public RequestContextStorage newStorage() {
        return new TraceAbleRequestContextStorage(delegate);
    }
}
```

Use case:

The user's problematic code:
```java
executor.execute(() -> {
    SafeCloseable leaked = fooCtx.push(); // This causes a request context leak!
    ...
});
executor.execute(() -> {
    try (SafeCloseable ignored = barCtx.push()) { // The exception happens here
        ...
    }
});
```
The above code produces an error like the one below, so users can check the stack trace to see which line causes the context leak.
```
java.lang.IllegalStateException: Trying to call object wrapped with context [%New RequestContext%], but context is currently set to TraceableServiceRequestContext[%Previous RequestContext%]
com.linecorp.armeria.internal.common.LeakTracingRequestContextStorage$PendingRequestContextStackTrace: At thread [armeria-testing-eventloop-nio-1-1] previous RequestContext is pushed at the following stacktrace
	at com.linecorp.armeria.internal.common.LeakTracingRequestContextStorage$TraceableServiceRequestContext.<init>(LeakTracingRequestContextStorage.java:111)
	at com.linecorp.armeria.internal.common.LeakTracingRequestContextStorage$TraceableServiceRequestContext.<init>(LeakTracingRequestContextStorage.java:105)
	at com.linecorp.armeria.internal.common.LeakTracingRequestContextStorage.warpRequestContext(LeakTracingRequestContextStorage.java:82)
	at com.linecorp.armeria.internal.common.LeakTracingRequestContextStorage.push(LeakTracingRequestContextStorage.java:62)
	at com.linecorp.armeria.internal.common.RequestContextUtil.getAndSet(RequestContextUtil.java:149)
	at com.linecorp.armeria.server.ServiceRequestContext.push(ServiceRequestContext.java:221)
	at com.linecorp.armeria.internal.common.TraceRequestContextLeakTest.lambda$singleThreadContextLeak$2(TraceRequestContextLeakTest.java:101) <- This is the line where the leaked RequestContext is pushed
	...
. This means the callback was called from unexpected thread or forgetting to close previous context.
	at com.linecorp.armeria.internal.common.RequestContextUtil.newIllegalContextPushingException(RequestContextUtil.java:100)
	at com.linecorp.armeria.server.ServiceRequestContext.push(ServiceRequestContext.java:237)
	at com.linecorp.armeria.internal.common.TraceRequestContextLeakTest.lambda$singleThreadContextLeak$3(TraceRequestContextLeakTest.java:107)
	...
```
TL;DR
We can store the stacktrace of a `RequestContext` when it's pushed via `RequestContext.push()` and use the information to debug a context leak.

We use the `RequestContext` to store and convey the information of a request. If a `RequestContext` is pushed into thread-local and a thread executes lines of code with the context in its thread-local, we say that those lines of code are request-scoped. For example, a block that runs inside try-with-resources on the handle returned by `ctx.push()` is request-scoped.
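A minimal, stand-alone sketch of such a request-scoped block may help here. It does not use Armeria: `Ctx`, `SafeCloseable`, `CURRENT`, and `log` are hypothetical stand-ins that only mimic how pushing a context into a thread-local and popping it with try-with-resources behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of a request-scoped block. `Ctx` is a hypothetical
// stand-in for RequestContext, not the real Armeria class.
final class RequestScopeSketch {
    static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();

    // Local stand-in: an AutoCloseable whose close() throws no checked exception.
    interface SafeCloseable extends AutoCloseable {
        @Override
        void close();
    }

    static final class Ctx {
        final String id;

        Ctx(String id) {
            this.id = id;
        }

        // Makes this context the thread's current one. Closing the returned
        // handle restores whatever was current before the push.
        SafeCloseable push() {
            Ctx previous = CURRENT.get();
            CURRENT.set(this);
            return () -> CURRENT.set(previous);
        }
    }

    // Mimics a log line that is prefixed with the current request ID.
    static String log(String message) {
        Ctx ctx = CURRENT.get();
        return "[" + (ctx == null ? "no-request" : ctx.id) + "] " + message;
    }

    static List<String> run() {
        List<String> lines = new ArrayList<>();
        Ctx ctx = new Ctx("req-1");
        try (SafeCloseable ignored = ctx.push()) {
            // Request-scoped: the thread-local holds ctx inside this block.
            lines.add(log("handling"));
        }
        // try-with-resources has popped the context again.
        lines.add(log("idle"));
        return lines;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model the first line is tagged with `req-1` and the second with `no-request`, because the context is only visible while the try block is running.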
Because it's request-scoped, we can use the information of a request in the block.
Let's say that we set up logging so that each message is logged with the request ID, using `RequestContextExportingAppender`. Because it prints the request ID, which is unique for each request, with the message, a user can easily track the call flow of a request in an async server.
The `RequestContext` should be pushed with try-with-resources to avoid a context leak. Let's say `fooCtx` is pushed but, by mistake, not popped. Users then get confused because a debug log written later on the same thread also prints the ID of `fooCtx` even though it's irrelevant. We call that a "context leak".
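The leak can be reproduced in a small stand-alone model. This is only a sketch under assumptions: `Ctx`, `SafeCloseable`, and `log` are hypothetical stand-ins for `RequestContext`, its push handle, and a logger that exports the request ID; none of it is Armeria code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a context leak; `Ctx` stands in for RequestContext.
final class ContextLeakSketch {
    static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();

    interface SafeCloseable extends AutoCloseable {
        @Override
        void close();
    }

    static final class Ctx {
        final String id;

        Ctx(String id) {
            this.id = id;
        }

        SafeCloseable push() {
            Ctx previous = CURRENT.get();
            CURRENT.set(this);
            return () -> CURRENT.set(previous);
        }
    }

    // Mimics a log line that is prefixed with the current request ID.
    static String log(String message) {
        Ctx ctx = CURRENT.get();
        return "[" + (ctx == null ? "no-request" : ctx.id) + "] " + message;
    }

    static List<String> run() {
        List<String> lines = new ArrayList<>();
        Ctx fooCtx = new Ctx("foo");
        SafeCloseable leaked = fooCtx.push(); // never closed: fooCtx leaks
        lines.add(log("handling foo"));
        // Unrelated work that runs later on the same thread still sees fooCtx.
        lines.add(log("unrelated task"));
        CURRENT.remove(); // clean up so the sketch leaves no state behind
        return lines;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Both lines come out tagged with `foo`, which is exactly the misleading output described above.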
In order to prevent it, we raise an `IllegalStateException` when another context is pushed while the thread already has a context, so we can notice that `fooCtx` is not popped after its use. However, because the exception is raised when `barCtx` is pushed, we don't have any stacktrace of `fooCtx`. If we knew where `fooCtx` was pushed by looking at the stacktrace, we could easily fix the context leak. Of course, we can scour through all lines of code to find the leak, but it's especially hard to find when Armeria is used with third-party libraries or in Kotlin.
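The check, and its blind spot, can be sketched as follows. This is a simplified model, not Armeria's actual rule for which pushes are allowed: `Ctx`, `leakFoo`, and the message text are hypothetical, and the model only permits re-pushing the same context.

```java
// Hypothetical model of the "illegal push" check; not Armeria's code.
final class IllegalPushSketch {
    static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();

    interface SafeCloseable extends AutoCloseable {
        @Override
        void close();
    }

    static final class Ctx {
        final String id;

        Ctx(String id) {
            this.id = id;
        }

        SafeCloseable push() {
            Ctx current = CURRENT.get();
            if (current != null && current != this) {
                // Thrown at the push site of the NEW context; nothing here
                // says where the context that is still set was pushed.
                throw new IllegalStateException(
                        "Trying to push " + id + ", but context is currently set to " + current.id);
            }
            CURRENT.set(this);
            return () -> CURRENT.set(current);
        }
    }

    static void leakFoo() {
        new Ctx("foo").push(); // leak: the returned handle is never closed
    }

    // Returns the message of the exception raised by the second push.
    static String run() {
        leakFoo();
        try (SafeCloseable ignored = new Ctx("bar").push()) {
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        } finally {
            CURRENT.remove(); // clean up so the sketch leaves no state behind
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The exception names both contexts, but its stack trace starts at the `bar` push inside `run()`; `leakFoo()`, where the real bug lives, appears nowhere in it.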
So if we can make a `RequestContext` record the stacktrace when it's pushed, we can easily fix the context leak:

1. `RequestContext` stores the stacktrace when `RequestContext.push()` is called.
2. `RequestContext` removes the stored stacktrace when `RequestContext.pop()` is called.
3. Because a `RequestContext` can be pushed multiple times before it's popped, we need a stack to store the stacktraces.
4. When an `IllegalStateException` is raised, it also prints the stacktrace of the current `RequestContext` in thread-local.
5. Because generating a stacktrace costs a lot, we can introduce different detection levels, as Netty does (https://netty.io/wiki/reference-counted-objects.html#leak-detection-levels):
   - DISABLED: Do not record stacktrace
   - ADVANCED: Record only samples of `RequestContext`s
   - PARANOID: Record all `RequestContext`s

We can add a flag to enable this feature or setters to `ServerBuilder` and `ClientBuilder`.