Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traceparent Header Propagation #3562

Closed
KirolosShehata opened this issue Aug 7, 2022 · 17 comments
Closed

Traceparent Header Propagation #3562

KirolosShehata opened this issue Aug 7, 2022 · 17 comments
Labels
question Further information is requested

Comments

@KirolosShehata
Copy link

KirolosShehata commented Aug 7, 2022

Open Telemetry version: 1.0.0-rc9.3
.NET version: 6.0.7

I was trying different things using OTel in .NET and it worked as expected while having the AddHttpClientInstrumentation and AddAspNetCoreInstrumentation enabled. visualization was as below

image

but when i disabled the AddHttpClientInstrumentation from the caller , I found that the parent span ID isn't set right and therefore the visualization is not right.
image

same thing happens if i disable the AddAspNetCoreInstrumentation from the callee.
not sure if this is a bug in .NET OTel SDK or what but I think devs should have the option to not use the auto instrumentation

Another question is that I tried using "activity.SetParentId(parentSpanId); or Activity.Current.SetParentId(parentSpanId);"
but it's not overwriting the current value of the parent span id.

@KirolosShehata KirolosShehata added the question Further information is requested label Aug 7, 2022
@utpilla
Copy link
Contributor

utpilla commented Aug 9, 2022

@KirolosShehata Could you please describe your expectation in more detail? Do you have two ASP.NET Core apps with one calling the other and you want the traceId to be propagated?

Please check the remarks for Activity.SetParentId(): https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.activity.setparentid?view=net-6.0

  • This method is only intended for use with Activities created from the Activity constructor. Activities created by calling CreateActivity or StartActivity already have the parent ID set, and invoking this method has no effect.
  • This method should only be used before starting the Activity object. This method has no effect if you call it after the Activity object has started.

@KirolosShehata
Copy link
Author

KirolosShehata commented Aug 10, 2022

@utpilla I have two apps. First scenario:

  • App B (caller)
    • Has AddHttpClientInstrumentation enabled
    • Has AddAspNetCoreInstrumentation enabled
  • App A (callee)
    • Has AddHttpClientInstrumentation enabled
    • Has AddAspNetCoreInstrumentation enabled

The tracing of the two apps in this case is a expected (shown below)
image

Second scenario:

  • App B (caller)
    • Has AddHttpClientInstrumentation disabled
    • Has AddAspNetCoreInstrumentation enabled
  • App A (callee)
    • Has AddHttpClientInstrumentation enabled
    • Has AddAspNetCoreInstrumentation enabled

The tracing of the two apps in this case is not as expected (shown below)
This happened due to propagating an invalid parent span id in the traceparent header (done by OTel)
image

However, My expectation is that disabling the AddHttpClientInstrumentation in App B (caller) will only result in:

  • Removing the second span (HTTP GET) in the screenshot below
  • the parent of the third span (/test) should be the parent of the HTTP GET span [so the parent of /test should be root span]
    image

@utpilla
Copy link
Contributor

utpilla commented Aug 11, 2022

@KirolosShehata Could you please share a minimal repro app? netcoreapp3.1 onwards, HttpClient automatically injects traceparent into the request headers and the receiving Asp .NET Core app should automatically extract it, so your expectation is correct. I tried the "second scenario" that you mentioned, and it works fine as expected.

@KirolosShehata
Copy link
Author

KirolosShehata commented Aug 15, 2022

@utpilla we created a sample repo for you.

[@OmniaSalehSaad]
(https://github.com/OmniaSalehSaad)

The issue is that we might need to disable HttpClient in Project B. when we do so, the traceparent header is still automatically propagated but with a wrong ParentID value.

@KirolosShehata
Copy link
Author

@utpilla any updates?

@cijothomas
Copy link
Member

Can you give a minimal repro app, along with readme.md instruction to run it?
From a quick glance, the one shared is not a minimal repro, and likely has your custom code. It'd be a lot easier to debug, if you make it minimal.
(dotnet new webapi, add nugets, enable otel..) https://stackoverflow.com/help/minimal-reproducible-example

@KirolosShehata
Copy link
Author

@cijothomas thanks for the shared link.
We updated the repo to match the minimal repro guidelines: https://github.com/OmniaSalehSaad/ProjectB
Let us know if anything else is needed

@cijothomas
Copy link
Member

we discussed this in yesterday's community meeting as well, and what you see is the expected behavior.(agree its a bit awkward and not really desirable. See the details below)

  1. When ServiceB makes a call to ServiceA, the context must be propagated from B to A. It is the responsibility of the caller ServiceB to propagate the context. And it is the responsibility of the callee (ServiceA), to restore the context and continue the trace. If this propagation/restore is not done., then it'll result in broken traces. i.e. ServiceA starts its own traceid.

  2. OpenTelemetry SDK does not do the out of process progation/restoration automatically. User has to do it. How user does that depends.. One option is to do it manually using the propagators API. (example here: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/examples/MicroserviceExample). Another option (the most common one) is to leverage instrumentation libraries. Instrumentation libraries for those libraries doing cross-proc calls are responsible for correct propagation and restore of context. In your example, those libraries are Asp.Net Core and HttpClient, and one must enable instrumentation libraries for both of them to correctly do the propagation/restore.

  3. HttpClient and Asp.NetCore are a bit special cases. Even without OpenTelemetry, these libraries natively create spans(activity), and also takes care of propagation and restore. And this is what causes a surprising behavior. Though you did not enable HttpClientInstrumentation, HttpClient went ahead and created its own Activity, and propagated its context to the next service. As this Activity is not exported, it results in broken trace. It must be noted that, even though this resulting in a incomplete trace, the traceid is still propagated correctly. If this was some other library, then you'd have seen a totally different results (with 2 traceids from A and B).

  4. I can see why this special behavior of HttpClient can be confusing. We can explore some ways to fix this issue. Note that the fix would be in the HttpClient library itself, and hence it'll be some time before any changes can be done there. (The earliest would be .NET 8 coming next year).

Until the httpclient behaviour can be addressed, you need to enable HttpClient instrumentation (same for any library doing out of proc communication), to ensure correlation is done correctly.

Let us know if this makes sense.

@KirolosShehata
Copy link
Author

KirolosShehata commented Aug 28, 2022

@cijothomas thanks Cijo for your detailed reply.
I believe OpenTelemetry SDK does the out of process propagation/restoration automatically even without using http client instrumentation as the trace id is propagated correctly. The problem is only in the parent span id which is due to a bug in http client as you mentioned.

We have found a workaround for this which is to overwrite the "parent span id" part in the traceparent header with the span id of the actual last activity in Project B (in the cases that we are not using HTTP client instrumentation).
This workaround solves the problem however I wonder if this can be a feature in the OTel SDK itself.

In the meantime, I would like to know what you think about the proposed workaround and how we can implement it the best way (in a single location not after each leaf span). I thought about implementing the TextMapPropagator to a custom one and overwrite the parent span id in it, but I do not know how to set my propagator as the global one to be used automatically in all outgoing http calls. I tried: Sdk.SetDefaultTextMapPropagator(new TestMapPropagator()); but I think it's not working

UPDATE:
When setting the propagator using: Sdk.SetDefaultTextMapPropagator(new TestMapPropagator()), the Inject method works ONLY IF I'm using httpclientinstrumentation which should not be the case.
Updated the repo with the work around implementation: https://github.com/OmniaSalehSaad/ProjectB

@KirolosShehata
Copy link
Author

@cijothomas & @utpilla any updates around the issue mentioned above?

@cijothomas
Copy link
Member

I believe OpenTelemetry SDK does the out of process propagation/restoration automatically even without using http client instrumentation as the trace id is propagated correctly.

OTel SDK never does context propagation. Its always done by libraries doing cross proc communication. (either the library natively does it, like HttpClient. Or the instrumentation library does it using Propagtors API.)

@cijothomas
Copy link
Member

implementing the TextMapPropagator to a custom one

Propagators are global in nature. i.e we have no ability to have a propagator for httpclient (doing the special propagation), and another propagator for another library.

@cijothomas
Copy link
Member

cijothomas commented Aug 31, 2022

When setting the propagator using: Sdk.SetDefaultTextMapPropagator(new TestMapPropagator()), the Inject method works ONLY IF I'm using httpclientinstrumentation which should not be the case.

See my response above. Propagation is never done by the OTel SDK. It is always the responsibility of the libraries doing cross proc communication.

@rcdailey
Copy link

these libraries natively create spans(activity), and also takes care of propagation and restore

@cijothomas I'm sorry to ping you in an old issue. I am on a quest to learn more about how I should go about doing header propagation for the correlation ID in ASP.NET Web API in .NET 7. Can you share a link to official Microsoft docs that talk about its automatic header propagation? Despite my best efforts, I cannot find anything. The best I have found is the Microsoft.AspNetCore.HeaderPropagation package, which still requires some manual set up.

Thank you.

@cijothomas
Copy link
Member

Can you share a link to official Microsoft docs that talk about its automatic header propagation?

I do not think its documented.
Here's an official (old) blog post when Asp.Net Core 1st added support for this : https://devblogs.microsoft.com/dotnet/improvements-in-net-core-3-0-for-troubleshooting-and-monitoring-distributed-apps/

@sunildatla
Copy link

sunildatla commented May 30, 2023

@cijothomas , Sorry for messaging on a closed issue , but because there is context here i am messaging here .

I have a couple of questions on the trace propagation using these instrumentation libraries which i am struggling to get in a sample asp net core app

  1. Will adding these instrumentation libraries AddAspNetCoreInstrumentation() and AddHttpClientInstrumentation() also make sure that the logs get the traceid and spanid propagated and stitched automatically?
  2. I was working on logs to try and understand how to make sure the trace is propogated for logs as well and unable to get traceparent header or maintain the same trace in the sample asp net core app
  3. The nugets have no extensions to add aspnetcoreinstrumentation and httpclientinstrumentation to the logging extensions below
  4. How can we ensure that the traceid remains same for logs as well
  5. Also for some reason i do not see traceparent header for the external httpclient call from below sample , i am missing something fundamental with my understanding regarding this i suppose.

Below is the startup

builder.Services.AddOpenTelemetry()
             .WithTracing(
            builder =>
            {
                builder.AddAspNetCoreInstrumentation();
                builder.AddHttpClientInstrumentation(); 
            }
            );
        builder.Services.AddLogging(loggingbuilder =>
        {            loggingbuilder.AddOpenTelemetry(options =>
            {
                options.AddConsoleExporter();
            });
        });

Below is the simple controller which redirects to another action

public class HomeController : Controller
    {
        private readonly ILogger<HomeController> _logger;

        public HomeController(ILogger<HomeController> logger)
        {
            _logger = logger;
        }

        public IActionResult Index()
        {
            var activity = new Activity("StartingIindex");
            activity.Start();
            _logger.LogInformation("I am in {viewname} View and {trace}", "Index",Activity.Current?.Id);

            HttpClient httpClient = new HttpClient();
           HttpResponseMessage responseMessage= httpClient.GetAsync("https://api.publicapis.org/entries").GetAwaiter().GetResult();
            var response=responseMessage.Content.ReadAsStringAsync().Result;

            return RedirectToAction("Privacy");
            activity.Stop();
    
        }

        public IActionResult Privacy()
        {
            _logger.LogInformation("I am in {viewname} View", "Privacy");
            return View();
        }
        ```

@cijothomas
Copy link
Member

@sunildatla Please open a new issue describing the problem, so we can take a look.
The code snippet shown abode does not seem to follow the official guidelines (for eg: I can see new Activity("StartingIndex") which is not correct.)

Please start with https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/examples/AspNetCore example app always, and modify it to show us the incorrect/unexpected behavior you are experiencing. That'll make it super easy to troubelshoot.

Also correlation is automatically done for Logs. See doc : https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/logs/correlation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants