Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add W3C propagator #102

Merged
merged 13 commits into from
Apr 8, 2021
Merged

Conversation

dszmigielski
Copy link
Contributor

This PR introduces W3C propagator. One thing to have in mind is that this feature will work only for app configured to use OpenTelemetry TraceId convention. Because of that there is a need for an iffy if in ContextPropagatorBuilder. Please have a closer look at that when reviewing, maybe someone will come up with a better idea how to handle that.

@dszmigielski dszmigielski requested a review from a team as a code owner April 1, 2021 11:28
{
return getter(traceIdConvention);
// W3C propagator requires Otel TraceId convention as it's specification clearly states lengths of traceId and spanId values in the header.
return IsW3CButNotOtelTraceId(propagator, traceIdConvention)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would throw an unsupported exception here. Cause if config is asking for W3C and gets default, it might cause more misunderstanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that actually sounds reasonable 👍

src/Datadog.Trace/Tracer.cs Outdated Show resolved Hide resolved
Copy link
Member

@nrcventura nrcventura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though OTel doesn't have any additional data to add to the tracestate header, should we continue to propagate the tracestate header value that we initially received to provide better interoperability with systems that are not fully monitored by OTel?

Comment on lines 29 to 42
if (context == null)
{
throw new ArgumentNullException(nameof(context));
}

if (carrier == null)
{
throw new ArgumentNullException(nameof(carrier));
}

if (setter == null)
{
throw new ArgumentNullException(nameof(setter));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you "collate" it into 3 lines, please 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is just a matter of styling, but I hate one line ifs. I would rather leave it as it is if we don't have strict rules on such things.

Copy link
Member

@pellared pellared Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I see no value of null checks for internals. It just adds noise :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I did it to be consistent with other propagators. TBH if nobody opposes I would love to get rid of that :)

}

var spanIdString = traceParentHeader.Substring(VersionAndTraceIdLength, SpanIdLength);
if (ulong.TryParse(spanIdString, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out var spanId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to invert the if statement so that in "main" flow we have the "positive" scenario

/// <summary>
/// Gets the trace state for W3C propagation
/// </summary>
internal string TraceState { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have redone how this is parsed.

@@ -100,6 +116,11 @@ private SpanContext(TraceId? traceId, string serviceName)
/// </summary>
internal string Origin { get; }

/// <summary>
/// Gets the trace state for W3C propagation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like:

TraceState carries vendor-specific trace identification data, represented as a list of key-value pairs. TraceState allows multiple tracing systems to participate in the same trace. It is fully described in the W3C Trace Context specification: https://www.w3.org/TR/trace-context/#tracestate-header.

// W3C propagator requires Otel TraceId convention as it's specification clearly states lengths of traceId and spanId values in the header.
if (propagator == PropagatorType.W3C && traceIdConvention is not OtelTraceIdConvention)
{
throw new NotSupportedException($"'{PropagatorType.W3C}' propagator requires '{ConventionType.OpenTelemetry}' convention to be set");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pjanotti @zacharycmontoya @nrcventura
I am not sure if we should throw an exception and crash the instrumented application in case of improper configuration. As far as I see in all other places there is a fallback to defaults in case of improper value.

Besides, I think we should have some docs to describe the strategies for error handling e.g. when we should log vs crash etc.

Copy link
Contributor

@RassK RassK Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it seems very confusing if the application does not respect the values and starts behaving differently.

  1. Either fail fast (+ log the reason of the failure)
  2. Settings should be consistent for the whole system. So instead of creating default propagator, the setting should be reverting back to the default value (+ log the reason). Otherwise other parts of the system will still see the invalid value but don't know about the actual behavior. Which means it could lead to weird bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with the 1. here, but after giving it a second thought that would make any .NET app crash on the machine which can be disturbing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I am also in favour of 1.
However, maybe there are reasons that we should not go that way. Especially that currently it does not work that way if I follow the code correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dszmigielski I think it's very important point - already forgot that the default behavior is to instruct every dotnet application.

Wondering why the internals of tracer are not protected? If there is a fatal failure it should put the tracer to offline mode (+ log the reason) but leave the original application alive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This time it seems ok

TBH now I am not sure. We would need to check what happens if we attach the profiler e.g. to an app that only prints Hello world to the console.

Copy link
Contributor

@zacharycmontoya zacharycmontoya Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @lucaspimentel

In this case, it makes sense to bubble up an exception through the Tracer constructor because some combination of configuration was invalid. However, I don't think we account for this currently. We currently use a lazy initializer and get the default Tracer instance:

BTW this is called at startup when the IL rewriting aggressively finds the first JIT-compiled method and makes a call to this:
var type = assembly.GetType("Datadog.Trace.ClrProfiler.Instrumentation", throwOnError: false);

Inside Instrumentation.Initialize would be a good place to

  1. Catch the exception from the constructor
  2. Log the exception from the constructor
  3. Maybe set Tracer.Instance = new Tracer(new TracerSettings { TraceEnabled = false } ); so it can't do anymore harm, even though the IL rewriting will still occur?

Copy link
Member

@pellared pellared Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ad. 3
I think it would be better to additionally "log" to the console (stdout) and then close the application. Otherwise, the user may not be aware that the auto-instrumentation is not working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment this will not fail until the first instrumentation attempt is made, so it's no near perfect. Even though I would close this PR and create an issue for fixing this behaviour separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this issue can be handled separately, and we can discuss what the side effect of bad configuration is. At Datadog, we have a strong opinion that we should absolutely minimize changes to the application's behavior when attaching the tracer, so in our tracer we would not choose to stop the application because of bad configuration.


public SpanContext Extract<T>(T carrier, Func<T, string, IEnumerable<string>> getter)
{
var traceStateCollection = getter(carrier, W3CHeaderNames.TraceState);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It maybe slightly more efficient to read the tracestate right before returning the final SpanContext so you don't spend time reading headers you may not use.

var traceStateCollection = getter(carrier, W3CHeaderNames.TraceState);
var traceState = ExtractTraceState(traceStateCollection);

var traceParentValues = getter(carrier, W3CHeaderNames.TraceParent).ToList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the SDK, they get the Count() of the resulting enumerable and then access the traceparent by getting the First() of the resulting enumerable. That seems slightly more efficient so we don't allocate a new List when we don't really need to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this could cause multiple enumeration, so I would rather not do that.

Copy link
Contributor

@RassK RassK Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Zach's point is valid. string.Join(",", traceParentValues) could be just traceParentHeader, cause traceParentValues.Count != 1 will not let you past anyway. So you are left with Count() and First() operations - which means full list will be never needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or even better, FirstOrDefault() single enumerable operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FirstOrDefault would be cool but then we have no way of checking if there was more than one value, and for now in this case we return null SpanContext.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I meant SingleOrDefault(), my bad.

@pellared
Copy link
Member

pellared commented Apr 7, 2021

Just make GH issue for leftovers, please.

Copy link
Contributor

@zacharycmontoya zacharycmontoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some of the validation / error handling can be addressed separately

@dszmigielski dszmigielski reopened this Apr 7, 2021
@nrcventura nrcventura merged commit da57e3b into open-telemetry:main Apr 8, 2021
@dszmigielski dszmigielski deleted the add-w3c-propagation branch April 12, 2021 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants