Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.NET 8 Upgrade #104

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

chrisb92
Copy link

Updating library to support .NET 8

@@ -142,6 +142,7 @@ public async Task Span_has_status_404()
var client = CreateClient();

await client.GetAsync("/not-found");
await Task.Delay(50);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this particular test being flaky in the .NET 8 collection only. It's successful when run on its own or in debug, but will always fail when running all tests (the whole suite or .NET 8 only). Adding a delay between client.GetAsync and fetching the finished spans "fixed" it. If anyone has a better suggestion on how to resolve this, I'm all ears!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have a solution for you, but the problem is the handling of the diagnostics events.

If you compare the output trace between a single run and multipl runs, the working one does log
OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Stopping AllListeners subscription

So basically, the tests are trying to get the finishedSpans, but the listeners have not handled or received them.
I think, per test, you need to ensure the host shuts down and handles all listeners.

I noticed the same problem btw happens sometimes on .net6 tests

Copy link

@jbroumels jbroumels Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisb92 I did some more tests to figure out where this comes from.

It is also reproducable for the older framework versions, when you go into debug mode:
When you debug from breakpoints in AspNetCoreDiagnostics HandleEvent, you will reach the Unittest "_tracer.FinishedSpans()" before the Stop event is handled on the server, and therefore the Span is not finished and available in FinishedSpans.

I think in .net8 the performance has been inproved in such a way that when there is no endpoint being hit and there is no content on the response, it closes even faster compared to earlier versions. Therefore this now pops up, but it actually was a possible scenario that could happen before as well.

The AspNetCoreDiagnostics, when handling "Microsoft.AspNetCore.Hosting.HttpRequestIn.Stop" event is happening at the moment the HttpContext is being disposed. This means the request and response is fully handled and cleaning up on the server side still. But the client is also already able to process and in the unittest the flow thus continues.

I think the solution here is to make sure the server has completed all work (and cleanup / disposals), before you fetch _tracer.FinishedSpans()

The main problem is not specifically the timing (Task.Delay will also not always work), but there is no synchronisation between the client and the server.
From the client side, you only want to get the spans when the server has completed.
But not just completed the request, but also the deposal of all request related stuff (Diagnostics going to Tracing spans)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that great write up, I'll take a look at the test. :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbroumels @chrisb92 Hi)

How about creating TestMockTracer, inheriting it from MockTracer, and overriding OnSpanFinished? Inside this span, we can use a TaskCompletionSource and set the result when OnSpanFinished is called. Then, we can wait for the task from TCS inside tests.

Copy link

@jbroumels jbroumels Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such would solve the problem of this specific test, but I actually believe the problem is a bit deeper.
You can fix this test with another Tracer, but in actual production code, it also means that this library sometimes will be missing some of the Spans getting completed.

So I believe that is the actual issue and the services handling and dispatching should wait for span completions, not the test itself. The HostedService being used, probably shouldn't shut down when there's still open Spans.

Actually... the problem might even be in the .net framework.
The Host allows for shutdown, while the Diagnostics missing in this case are happening on disposal of the HttpContext. Somehow in .net8 this maybe is so much faster that that the Stop is never hit while the host is still active. So basically the host shuts down while the request is completed, but it's not cleaned up yet

What do you think ?

Copy link

@jbroumels jbroumels Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working Trace:
`

| OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Starting AllListeners subscription
| OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Subscriber 'AspNetCoreDiagnostics' returned subscription for 'Microsoft.AspNetCore'
ApplicationStarted
| Microsoft.AspNetCore.Hosting.Diagnostics Information: Request starting HTTP/1.1 GET http://localhost/not-found - - -
| Microsoft.AspNetCore.Hosting.Diagnostics Information: Request finished HTTP/1.1 GET http://localhost/not-found - 404 - - 19.3370ms
| Microsoft.AspNetCore.Hosting.Diagnostics Information: Request reached the end of the middleware pipeline without being handled by application code. Request path: GET http://localhost/not-found, Response status code: 404
ApplicationStopping
| OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Stopping AllListeners subscription
ApplicationStopped
`

Broken Trace
`

| OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Starting AllListeners subscription
| OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Subscriber 'AspNetCoreDiagnostics' returned subscription for 'Microsoft.AspNetCore'
ApplicationStarted
| Microsoft.AspNetCore.Hosting.Diagnostics Information: Request starting HTTP/1.1 GET http://localhost/not-found - - -
| Microsoft.AspNetCore.Hosting.Diagnostics Information: Request finished HTTP/1.1 GET http://localhost/not-found - 404 - - 0.6302ms
ApplicationStopping
| OpenTracing.Contrib.NetCore.Internal.DiagnosticManager Trace: Stopping AllListeners subscription
ApplicationStopped
| Microsoft.AspNetCore.Hosting.Diagnostics Information: Request reached the end of the middleware pipeline without being handled by application code. Request path: GET http://localhost/not-found, Response status code: 404
`

The latter shows that the internal diagnostics of .net (Request reached the end of the middlware.....)
after the ApplicationStopped executed and thus the Listeners are already shut down.

@chrisb92 chrisb92 marked this pull request as ready for review November 26, 2023 21:30
@agkeskin
Copy link

agkeskin commented Jan 16, 2024

when will be dot net 8 version published? This package is blocked us to upgrade project to .Net8 because of it's dependent packages.

@jbroumels
Copy link

jbroumels commented Feb 5, 2024

@chrisb92 What do you think is best ? Add the delay and make the tests work, or should this be further checked with the main .net teams ? (because I personally believe that their current setup has some glitch that causes certain diagnostics signals to not get processed. It's like it doesn't gracefully shut down. I say this because the app can shut down while there is still a HttpContext to be disposed that triggers these diagnostics)

I've submit this to MS runtime lib: dotnet/aspnetcore#53820

@EngRajabi
Copy link

Is this not resolved?
We have to use OpenTelemetry, which is still rc

@jbroumels
Copy link

#104 (comment)
@chrisb92 What do you want to do?
The problem only seems to occur on 404 request diagnostics handling.
Maybe merge it with the delay patch ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants