Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction metadata (spans, Custom) does not get created/transferred (I think) when incoming request > threshold, and when the Expect HTTP header present #972

Closed
pbalwall opened this issue Oct 9, 2020 · 10 comments · Fixed by #992
Assignees
Labels
bug Something isn't working

Comments

@pbalwall
Copy link

pbalwall commented Oct 9, 2020

I am using Elastic.Apm/Elastic.Apm.AspNetFullFramework 1.6.1 in a asp.net WebApi service.

I have noticed cases where a transaction is visible in the Kibana APM app, and does include the standard metadata (HTTP headers, etc), but not the API produced metadata such as the Spans, and the Custom items. From some testing, I think I have identified a couple of conditions that appear to effect this disappearance:

  • the presence of the Expect header (which seems to occur by default in HttpClient .. I think) .. e.g. 'Expect: 100-continue'
  • when the Content-Length exceeds some threshold, though not very clear on this threshold because it seemed to vary in my test between http and https. In my test, increasing the size of the request from 16K to 18K appeared to have this behavior (with no other mis-behavior in terms of web app function, nor the APM transaction being sent to the APMServer (other than the missing metadata)).

I have also seen other strangeness such as when Labels were populated that the transaction was not visible in Kibana. [Because of this issue, I am using the Custom items rather than Labels]

To Reproduce
This mis-behavior occured when the client was using the Expect header, and when the requests exceeded this threshold. I have subsequent reproduced this (in the context of the web app) this misbehavior using curl .. e.g

"c:\program files\curl\bin\curl" -v -H "Expect: 100-continue" -H "Content-Type: application/json" -H "Authorization : Basic xx=" --data-binary "@c:/temp/searchentity.json" http://xxx.xx.net/..

Expected behavior
Its very possible some aspect of my APM configuration or usage pattern is not ideal, but the fact that the transaction is registered in the APMServer, with the default metadata, and occasionally with this additional metadata (contingent on the non-presence of the conditions listed above) lead me to think this is primarily a issue with the APM agent.

Is there any logging available from the APM agent? I have not seen much at in the NT event logs.

@pbalwall pbalwall added the bug Something isn't working label Oct 9, 2020
@pbalwall
Copy link
Author

After some more experimentation, I believe this issue is not really related to the size of the request. Instead I think it is a important issue with how this agent manages the operative CurrentTransaction and CurrentSpan .. with the root issue being that this NetFramework agent assumes that the managed thread executing the module (which is where CurrentTransaction is associated with thread-local-storage) is the same thread used in subsequent processing of the request. In my asp.net mvc app, subsequent request processing, such as invocation of Controller methods in fact could occur on a different managed thread. The size of the request merely increases the likelihood that the thread performing the module OnBeginRequest method is different from the thread performing subsequent processing (for the request). [I have seen this happen (I think) on both Windows Server 2012R2 and Windows Server 2016, and in the 4.7 and 4.8 versions of asp.net/.net framework.

The root cause of the reason for the missing spans/Custom was that my Controller (since it was running on a different thread) got back a null from Agent.Trace.CurrentTransaction .. and proceeded to create its own new ITransaction using Tracer.StartTransaction(), into which it put the spans/Custom. The module though on the OnEndRequest, was still operating on the ITransaction it itself had created, and which did get sent to the APMServer .. but without any spans/Custom.

If I am correct, the main issue is the implementation of Elastic.Apm.CurrentExecutionSegmentsContainer, and its reliance on thread local storage. I downloaded the latest master branch and created a implementation of ICurrentExecutionSegmentsContainer (in the AspNetFullFramework project) that relies instead on HttpContext.Items. [I had to change the interface to use ISpan and ITransaction.] This seemed to fix the issue.

I have not though looked into the reason for the issue with the Expect header .. though I think its very possible that it is the same issue .. and perhaps for that even HttpContext.Items might not suffice.

@pbalwall
Copy link
Author

My fix (ie using HttpContext.Items) seems to work when the http header 'Expect: 100-continue' is present as well. I thinking this header always causes a difference in threads between the module and the subsequent Controller processing.

@russcam
Copy link
Contributor

russcam commented Oct 26, 2020

Thanks for opening @pbalwall, and for the detailed analysis. I'm looking at a fix for this

@russcam russcam self-assigned this Oct 26, 2020
russcam added a commit to russcam/apm-agent-dotnet that referenced this issue Oct 27, 2020
This commit adds an implementation of ICurrentExecutionSegmentsContainer
that gets/sets the current transaction and span in HttpContext.Items,
in addition to AsyncLocal<T>, making them available within the
ExecutionContext of IIS.

AsyncLocal<T> are not propagated across the ASP.NET ExecutionContext
running in IIS, resulting in the current transaction set in Application_BeginRequest
returning as null from Agent.Tracer.CurrentTransaction in later subsequent
IIS events, when such events execute asynchronously. ASP.NET does
however propagate the current HttpContext through HttpContext.Current,
allowing values to be stored in HttpContext.Items hashtable and shared
across IIS events. The HttpContextCurrentExecutionSegmentsContainer
uses both AsyncLocal<T> and HttpContext.Items to get/set the current
transaction and span, allowing them to be available in both IIS
events and async contexts. AsyncLocal<T> is still needed for places
where HttpContext.Current is null, for example, DiagnosticSourceListeners
running on a separate thread.

It seems it may be possible for the current transaction and current
span instances stored in AsyncLocal<T> and HttpContext.Items to be different
in such scenarios where one is available but not the other on beginning
the transaction/span, and where availability is reversed on end. Unsure
how likely this is in practice however, and using both AsyncLocal<T> and
HttpContext.Items is the common approach in resolving this issue in < .NET 4.7.1.

For .NET 4.7.1, a new OnExecuteRequestStep method has been added to
HttpApplication, to allow the execution context to be restored/saved:

https://devblogs.microsoft.com/dotnet/net-framework-4-7-1-asp-net-and-configuration-features/#asp-net-execution-step-feature

Closes elastic#934
Closes elastic#972
russcam added a commit to russcam/apm-agent-dotnet that referenced this issue Oct 27, 2020
This commit adds an implementation of ICurrentExecutionSegmentsContainer
that gets/sets the current transaction and span in HttpContext.Items,
in addition to AsyncLocal<T>, making them available within the
ExecutionContext of IIS.

AsyncLocal<T> are not propagated across the ASP.NET ExecutionContext
running in IIS, resulting in the current transaction set in Application_BeginRequest
returning as null from Agent.Tracer.CurrentTransaction in later subsequent
IIS events, when such events execute asynchronously. ASP.NET does
however propagate the current HttpContext through HttpContext.Current,
allowing values to be stored in HttpContext.Items hashtable and shared
across IIS events. The HttpContextCurrentExecutionSegmentsContainer
uses both AsyncLocal<T> and HttpContext.Items to get/set the current
transaction and span, allowing them to be available in both IIS
events and async contexts. AsyncLocal<T> is still needed for places
where HttpContext.Current is null, for example, DiagnosticSourceListeners
running on a separate thread.

It seems it may be possible for the current transaction and current
span instances stored in AsyncLocal<T> and HttpContext.Items to be different
in such scenarios where one is available but not the other on beginning
the transaction/span, and where availability is reversed on end. Unsure
how likely this is in practice however, and using both AsyncLocal<T> and
HttpContext.Items is the common approach in resolving this issue in < .NET 4.7.1.

For .NET 4.7.1, a new OnExecuteRequestStep method has been added to
HttpApplication, to allow the execution context to be restored/saved:

https://devblogs.microsoft.com/dotnet/net-framework-4-7-1-asp-net-and-configuration-features/#asp-net-execution-step-feature

Closes elastic#934
Closes elastic#972
russcam added a commit to russcam/apm-agent-dotnet that referenced this issue Oct 27, 2020
This commit adds an implementation of ICurrentExecutionSegmentsContainer
that gets/sets the current transaction and span in HttpContext.Items,
in addition to AsyncLocal<T>, making them available within the
ExecutionContext of IIS.

AsyncLocal<T> are not propagated across the ASP.NET ExecutionContext
running in IIS, resulting in the current transaction set in Application_BeginRequest
returning as null from Agent.Tracer.CurrentTransaction in later subsequent
IIS events, when such events execute asynchronously. ASP.NET does
however propagate the current HttpContext through HttpContext.Current,
allowing values to be stored in HttpContext.Items hashtable and shared
across IIS events. The HttpContextCurrentExecutionSegmentsContainer
uses both AsyncLocal<T> and HttpContext.Items to get/set the current
transaction and span, allowing them to be available in both IIS
events and async contexts. AsyncLocal<T> is still needed for places
where HttpContext.Current is null, for example, DiagnosticSourceListeners
running on a separate thread.

It seems it may be possible for the current transaction and current
span instances stored in AsyncLocal<T> and HttpContext.Items to be different
in such scenarios where one is available but not the other on beginning
the transaction/span, and where availability is reversed on end. Unsure
how likely this is in practice however, and using both AsyncLocal<T> and
HttpContext.Items is the common approach in resolving this issue in < .NET 4.7.1.

For .NET 4.7.1, a new OnExecuteRequestStep method has been added to
HttpApplication, to allow the execution context to be restored/saved:

https://devblogs.microsoft.com/dotnet/net-framework-4-7-1-asp-net-and-configuration-features/#asp-net-execution-step-feature

Closes elastic#934
Closes elastic#972
@pbalwall
Copy link
Author

@russcam Thank you Russ for you work on this. I did want to ask a question, voice a concern .. though I could be wrong in my understanding. Your implementation of HttpContextCurrentExecutionSegmentsContainer seems to prefer a value in thread-local-storage if available (ie in its get methods). But I fear that if the processing in a http request crosses threads, that your implementation could be accessing the wrong transaction. Ie if the sequence of events was something like

  1. HttpRequest#1 , ApmModule.ProcessBeginRequest : handled by ManagedThread#1
  2. HttpRequest#2, ApmModule.ProcessBeingRequest : handled by ManagedThread#2
  3. HttpRequest#1, Controller.xxx() : handled by ManagedThread#2 - I fear that the thread-local-storage set in event 2 for HttpRequest#2 will be accessed by this event.

I thus am wondering whether the HttpContext.Items should be given priority and only if not present use thread-local-storage.

russcam added a commit to russcam/apm-agent-dotnet that referenced this issue Oct 28, 2020
This commit adds an implementation of ICurrentExecutionSegmentsContainer
that gets/sets the current transaction and span in HttpContext.Items,
in addition to AsyncLocal<T>, making them available within the
ExecutionContext of IIS.

AsyncLocal<T> are not propagated across the ASP.NET ExecutionContext
running in IIS, resulting in the current transaction set in Application_BeginRequest
returning as null from Agent.Tracer.CurrentTransaction in later subsequent
IIS events, when such events execute asynchronously. ASP.NET does
however propagate the current HttpContext through HttpContext.Current,
allowing values to be stored in HttpContext.Items hashtable and shared
across IIS events. The HttpContextCurrentExecutionSegmentsContainer
uses both AsyncLocal<T> and HttpContext.Items to get/set the current
transaction and span, allowing them to be available in both IIS
events and async contexts. AsyncLocal<T> is still needed for places
where HttpContext.Current is null, for example, DiagnosticSourceListeners
running on a separate thread.

It seems it may be possible for the current transaction and current
span instances stored in AsyncLocal<T> and HttpContext.Items to be different
in such scenarios where one is available but not the other on beginning
the transaction/span, and where availability is reversed on end. Unsure
how likely this is in practice however, and using both AsyncLocal<T> and
HttpContext.Items is the common approach in resolving this issue in < .NET 4.7.1.

For .NET 4.7.1, a new OnExecuteRequestStep method has been added to
HttpApplication, to allow the execution context to be restored/saved:

https://devblogs.microsoft.com/dotnet/net-framework-4-7-1-asp-net-and-configuration-features/#asp-net-execution-step-feature

Closes elastic#934
Closes elastic#972
@russcam
Copy link
Contributor

russcam commented Oct 28, 2020

@pbalwall Please voice your concerns!

Your implementation of HttpContextCurrentExecutionSegmentsContainer seems to prefer a value in thread-local-storage if available (ie in its get methods). But I fear that if the processing in a http request crosses threads, that your implementation could be accessing the wrong transaction. Ie if the sequence of events was something like

HttpRequest#1 , ApmModule.ProcessBeginRequest : handled by ManagedThread#1
HttpRequest#2, ApmModule.ProcessBeingRequest : handled by ManagedThread#2
HttpRequest#1, Controller.xxx() : handled by ManagedThread#2 - I fear that the thread-local-storage set in event 2 for HttpRequest#2 will be accessed by this event.
I thus am wondering whether the HttpContext.Items should be given priority and only if not present use thread-local-storage.

Note that the current span and transaction are stored in AsyncLocal<T> and not ThreadLocal<T>. Your concern is valid for ThreadLocal<T> but I don't think it is for AsyncLocal<T>. Let's dissect it:

  1. HttpRequest#1 , ApmModule.ProcessBeginRequest : handled by ManagedThread#1

    Current transaction#1 stored in AsyncLocal<Transaction> and HttpContext.Current.Items

  2. HttpRequest#2, ApmModule.ProcessBeingRequest : handled by ManagedThread#2

    Current transaction#2 stored in AsyncLocal<Transaction> and HttpContext.Current.Items

  3. HttpRequest#1, Controller.xxx() : handled by ManagedThread#2

    Current transaction retrieved. AsyncLocal<Transaction> is tried first, but it's value is null because ASP.NET pipeline events do not propagate this context. HttpContext.Current.Items is tried next and returns transaction#1 set in point 1.

  4. HttpRequest#1, Controller.xxx() performs some async flow e.g. HttpClient call

    Span is started by diagnostic listener for Begin HttpClient call and stored in AsyncLocal<Span> and HttpContext.Current. The HttpContext.Current is null in the async flow if it runs on a separate thread.

    Span is retrieved by diagnostic listener for End HttpClient call. AsyncLocal<Span> is tried first, and returns the value.

So I think the above flow is fine. Where I share a concern is noted in #992;

It seems it may be possible for the current transaction and current
span instances stored in AsyncLocal<T> and HttpContext.Items to be different
in such scenarios where one is available but not the other on beginning
the transaction/span, and where availability is reversed on end. Unsure
how likely this is in practice however, and using both AsyncLocal<T> and
HttpContext.Items is the common approach in resolving this issue in < .NET 4.7.1.

I think it'd be possible for this to happen if tracing an async flow was started in a context where HttpContext.Current is null and crossed across a traced ASP.NET pipeline event. I don't believe this to be typical though; a user's async flow is typically contained inside the handler execution in ExecuteRequestHandler.

@pbalwall
Copy link
Author

@russcam Thank you Russ, I was indeed not aware of the difference between AsyncLocal and ThreadLocal. I cant pretend to understand all the subtlety here, but it definitely seems you have a better handle on this than me!

In our case, our Controller.xxx method was not an async method, and perhaps that made us susceptible to this .. ie if it was an async method, then accessing the transaction stored in AsyncLocal (as in the previous implementation) would have worked fine. Though maybe not, because I recall the managed-thread-id was different in this Controller.xxx() call vs the managed-thread-id in the APM module execution (for the same http request) .. ie if asp.net MVC does not guarantee the same managed-thread-id across all execution for a http request, then perhaps neither AsyncLocal nor ThreadLocal would ever reliably work, and HttpContext (which asp.net does seem to guarantee to be consistent across all execution on behalf of a http request) would be the only way to reliably do this. [Other than your example with use if httpClient if it did run in a different thread where HttpContext was not available.]

Using both though does seem the right way to do this, especially if that is the standard pattern.

@russcam
Copy link
Contributor

russcam commented Oct 29, 2020

if asp.net MVC does not guarantee the same managed-thread-id across all execution for a http request, then perhaps neither AsyncLocal nor ThreadLocal would ever reliably work, and HttpContext (which asp.net does seem to guarantee to be consistent across all execution on behalf of a http request) would be the only way to reliably do this. [Other than your example with use if httpClient if it did run in a different thread where HttpContext was not available.]

The ASP.NET (full framework) pipeline does not guarantee that the same thread is used across the request pipeline events. It's very typical for different threads to be involved in the pipeline of a specific incoming request. HttpContext.Current is stored in CallContext.HostContext and flowed across thread contexts by the framework, which is why HttpContext.Current.Items can be used across pipeline events.

In contrast, I believe that the issue with AsyncLocal<T> is that this ASP.NET framework context flow does not flow the thread ExecutionContext across ASP.NET pipeline events, which includes the async ambient data. So, if an AsyncLocal<T> value is set for thread#1 in an ASP.NET pipeline event e.g. BeginRequest, this is set in the thread ExecutionContext. Now, if thread#2 handles AuthenticateRequest, the next step in the pipeline for this specific incoming request, the ASP.NET pipeline does not flow the ExecutionContext of thread#1 to thread#2.

@russcam
Copy link
Contributor

russcam commented Oct 29, 2020

@pbalwall Once #992 is merged, would you like to try it out and see if it resolves the issue for you?

russcam added a commit that referenced this issue Oct 29, 2020
* Add HttpContextCurrentExecutionSegmentsContainer

This commit adds an implementation of ICurrentExecutionSegmentsContainer
that gets/sets the current transaction and span in HttpContext.Items,
in addition to AsyncLocal<T>, making them available within the
ExecutionContext of IIS.

AsyncLocal<T> are not propagated across the ASP.NET ExecutionContext
running in IIS, resulting in the current transaction set in Application_BeginRequest
returning as null from Agent.Tracer.CurrentTransaction in later subsequent
IIS events, when such events execute asynchronously. ASP.NET does
however propagate the current HttpContext through HttpContext.Current,
allowing values to be stored in HttpContext.Items hashtable and shared
across IIS events. The HttpContextCurrentExecutionSegmentsContainer
uses both AsyncLocal<T> and HttpContext.Items to get/set the current
transaction and span, allowing them to be available in both IIS
events and async contexts. AsyncLocal<T> is still needed for places
where HttpContext.Current is null, for example, DiagnosticSourceListeners
running on a separate thread.

It seems it may be possible for the current transaction and current
span instances stored in AsyncLocal<T> and HttpContext.Items to be different
in such scenarios where one is available but not the other on beginning
the transaction/span, and where availability is reversed on end. Unsure
how likely this is in practice however, and using both AsyncLocal<T> and
HttpContext.Items is the common approach in resolving this issue in < .NET 4.7.1.

For .NET 4.7.1, a new OnExecuteRequestStep method has been added to
HttpApplication, to allow the execution context to be restored/saved:

https://devblogs.microsoft.com/dotnet/net-framework-4-7-1-asp-net-and-configuration-features/#asp-net-execution-step-feature

Closes #934
Closes #972

* tidy after rebase

* Add concurrent requests test

This commit adds an integration test for
HttpContextCurrentExecutionSegmentsContainer
that makes multiple concurrent requests
and asserts that the captured transactions
and span ids align with expectations.
@pbalwall
Copy link
Author

@pbalwall Once #992 is merged, would you like to try it out and see if it resolves the issue for you?

@russcam Definitely!

@pbalwall
Copy link
Author

pbalwall commented Nov 2, 2020

@pbalwall Once #992 is merged, would you like to try it out and see if it resolves the issue for you?

Hello @russcam , from some initial testing yes it indeed does seem to now work better, though due to the sporadic nature of the issue, I will monitor, but things definitely seem better.

Question .. I see this new version is 1.7 .. do you anticipate that your fix will be part of the 1.7 release, and do you have any idea when that will be (afaik there is a beta ongoing for this version)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants