Skip to content

Conversation

@sgryphon
Copy link
Contributor

@sgryphon sgryphon commented Nov 16, 2020

This adds the span-id and parent-id distributed trace correlation fields as custom fields.

One of the fields, span.id, is included in ECS 1.6, and can simply be replaced when upgrading.

This implements it as a custom field, Span.id, using uppercase conventions as specified in the ECS documentation.

The Parent.id field is also included (this has been mentioned for ECS, but not yet included).

The values (TraceId, SpanId, and ParentId), based on W3C Trace Context, are support in .NET (via the Activity class), and automatically populated in modern ASP.NET applications (e.g. via the traceparent header). ASP.NET also sets a logging Scope with the three values.

--

This Pull Request is on top of the bug fix to get custom fields supported again, as while implementing I noticed that had been broken.

It would be nice, but not essential, to merge this before Extensions GA #95.

@ghost
Copy link

ghost commented Nov 16, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Started by user Russ Cam

  • Start Time: 2021-06-02T01:19:20.221+0000

  • Duration: 15 min 28 sec

  • Commit: 46d9493

Test stats 🧪

Test Results
Failed 0
Passed 78
Skipped 0
Total 78

Trends 🧪

Image of Build Times

Image of Tests

@Mpdreamz
Copy link
Member

@sgryphon mind rebasing this against master now that #107 is finally merged, thanks!

@sgryphon
Copy link
Contributor Author

TL; DR; this is wrong, I'm going to change to lower case.

Longer discussion:

I was thinking about compatibility, and this is actually the wrong thing to do. If we add custom fields with upper case, e.g. Span.id, then when the real span.id (ECS 1.6) is implemented we face a breaking change by removing Span.id (or, probably worse, supporting both).

More broader, for compatibility: if we are running logger 1.5, and pushing to a system that supports ECS 1.4, then backwards compatibility should mean we are okay (as it is only a minor version difference). We are pushing extra fields, but they would be ignored (like custom fields). The other side is only going to report / use the 1.4 fields.

In the reverse direction, if we are running logger 1.5, and push to a system that wants ECS 1.7, then it also should not be a problem. We can't trigger anything based on a 1.7 field (e.g. email alert), because we don't send them. But it should be able to at least handle the missing field (as fields can be blank).

One of the reasons I was thinking about this was whether we needed to have one release of the logger support potentially configurable versions of ECS, e.g. select which version of ECS to use. Seeing as versions are backwards compatible this doesn't make sense. It is the first scenario above. There is not point having the ability to downgrade what we sent to 1.4; we just sent 1.5 and the target system will ignore the extra fields.

So, for span.id, we also have the situation that implementing new fields that match those in future versions is also not a problem.

If we release version 1.5.1, that adds span.id (lowercase), then 1.4 destinations will continue to ignore it, and 1.7 destinations will start using it, even though we aren't filling in other 1.7 fields (they just assume the fields are blank).

In fact, the whole notion of supporting 1.5 is a bit weird for the logger; sure the ECS library has placeholders for all the 1.5 fields, but none of the loggers fill in every single field, or even have the potential to fill in all fields (there are many that are simply ignored).

From that point of view the logger could simply claim it is 1.7, just not sending any of the new fields. Although I agree the actual library itself should have every 1.7 field before it claims it supports version 1.7.

But we could still add partial support, e.g. 1.5.1 with the span.id field, and then bump to claim 1.6 once the rest are added.

Note that for fields that really are custom, like Scope, and won't be part of ECS, using title case is correct.

So, for span.id, this is fine, as it will be in a future version.

parent.id is a bit more tricky. I am going to put up a proposal to add it to ECS, which probably wouldn't be until at least 1.9, and I'm not even sure it would be approved (there was a discussion on an issue).

Maybe the best is to leave parent.id until the change to ECS is either approved or rejected. If approved add it as parent.id, for future compatibility.

If not, then consider adding Parent.id (like Scope, etc). If these ever got added, e.g. scope got added, then we would have to consider duplicating vs breaking change.

@sgryphon
Copy link
Contributor Author

Updated as per my longer comments to be forward compatible with ECS 1.6, as we know the field span.id is in the newer versions. Removed parent.id (until it appears in the standard).

Also updated for the changes in the custom field fix branch, to support deserialisation as well. i.e. this PR includes everything the bug fix PR does, plus the span.id extra bit.

@sgryphon
Copy link
Contributor Author

@russcam I have fixed the merge conflicts on this one as well, if you can review please.

@russcam
Copy link
Contributor

russcam commented May 28, 2021

@sgryphon I'm in the process of updating to ECS 1.6.0 version (and to follow with subsequent versions). I hope to have this ready to do a release very soon, including GA'ing the great work you've put into Elasticsearch.Extensions.Logging.

Since 1.6 onwards has the span.id, and with the plan to release 1.6.0 as soon as possible, I'm thinking this PR might be moot. What are your thoughts?

@sgryphon
Copy link
Contributor Author

sgryphon commented May 28, 2021

We still need the code to pull out span from Activity and insert it into the log event, i.e. the code in ExtractW3cSpanIdFromActivityId and AddTracing. Even the line logEvent.Span.Id would be the same.

What would change would be Span property on LogEvent, and the custom serialisation, which would be replaced by the same property on Base. It may actually still work, as LogEvent.Span will simply hide the inherited member Base.Span (compiler warnings, etc).

So, you still want to merge to get all the rest of the code, then when combining with the 1.6 branch simply remove LogEvent.Span and recompile.

@sgryphon
Copy link
Contributor Author

@russcam I'm happy to do the patch to version 1.6, but half of this PR is still required (to get the value), just the setting half changes. So the PR is still required (even for 1.6)

@russcam
Copy link
Contributor

russcam commented May 31, 2021

@sgryphon I'm about to open a PR that updates to ECS 1.6.0 👍

Pulling the trace data out of Activity I think is a reasonable default. I think it should be possible to override the default to pull trace data out of Elastic APM's transaction, if present, similar to Elastic.Apm.SerilogEnricher and Elastic.Apm.NLog

@sgryphon
Copy link
Contributor Author

Sounds like a good idea; I'll have a look at the other code. Are you okay if I do that as a separate pull request?

P.S. Pulling it out of Activity.Id is for backwards compatibility so it can be compiled with Standard 2.0, and work correctly with later versions. If we want to do multiple targeting for the library we can add conditional compilation for later version to just grab details directly. (Again, as a separate pull request though).

@russcam
Copy link
Contributor

russcam commented May 31, 2021

Separate PR is good; this one already contains a monster amount of your effort, which is hugely appreciated 😃

logEvent.Trace = new Trace() { Id = activity.RootId };

// Use field span.id forward compatible with ECS 1.6
var spanId = ExtractW3cSpanIdFromActivityId(activity.Id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could check if the Activity is not null and activity.IdFormat == ActivityIdFormat.W3C, and pull the trace id and span id from activity.TraceId.ToHexString() and activity.SpanId.ToHexString(), respectively. This would remove the need to do the W3C Trace Context parsing. If there is a desire to also handle the older hierarchical format, that could be handled in a separate if/else branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Activity.IdFormat, Activity.Span, etc are all netcore 3.0 features, and not available in netstandard2.0; hence the clunky pulling Activity.Id and seeing if the string matches the expected format (because the library is being used in a 3.0+ application).

We could do multi-targeting for the library packages, and then conditional compilation #if version >= 3.0 then grab SpanId, but I suggest maybe as a separate pull request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using them in Elastic.Apm, which targets netstandard2.0. Am I misunderstanding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so Elasticsearch.Net/7.6.1 has a dependency on System.Diagnostics.DiagnosticSource/5.0.0; which from looking is not actually part of netstandard proper, just a library built on top of it.

The documentation says you need Core 3.0 or above, https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.activity.idformat

But that's not actually correct, all you need is a nuget reference to System.Diagnostics.DiagnosticSource, which runs on top of netstandard 1.1 or above, https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/5.0.0

It seems to all compile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use Activity.Trace/.Span, if W3C format, otherwise fallback (similar to older fallback to TraceContext.Activity)

@russcam
Copy link
Contributor

russcam commented Jun 1, 2021

I'm about to push a commit to this branch that updates in line with ECS 1.6 changes

@sgryphon
Copy link
Contributor Author

sgryphon commented Jun 1, 2021

I'm about to push a commit to this branch that updates in line with ECS 1.6 changes

Yes. Just removing the custom span and using base; the rest of the code (to set the value) is the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants