Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epic - Span API -> Activity API #684

Closed
27 of 32 tasks
cijothomas opened this issue May 19, 2020 · 7 comments
Closed
27 of 32 tasks

Epic - Span API -> Activity API #684

cijothomas opened this issue May 19, 2020 · 7 comments
Labels
area:api area:sdk release:required-for-ga To be resolved before GA release

Comments

@cijothomas
Copy link
Member

cijothomas commented May 19, 2020

With the end goal is replacing OT Span API with .NET Activity API (part of System.Diagnostics.DiagnosticSouce nuget package), opening this Epic to track overall state of this effort.

Notes: For each heading, supporting the most basic (a being most basic, b being next, and so on) is current priority. This would allow us to validate Activity APIs and provide sufficient and timely feedback to .NET Team. (DiagnosticSouce v5 is expected to ship Nov 2020, but getting bigger changes becomes increasing difficult as we reach closer to ship date.)

Current known gaps for Activity API are listed here: #675. This will be updated as new fixes are made available by .NET, and as new issues are discovered.

1. Support Activity in processor/exporter pipeline.

2. Sampling

3. Adapters (previously called collectors, now called Instrumentation)- Must be modified to the new API guidelines.

4. Exporter

5. General

6. Examples

7. Shim/Wrapper

  • a - Decide if this repo should publish a wrapper - It was decided on SIG meeting on 6/23/2020 that we'd ship a wrapper over Activity/ActivitySource. it'll be a very light wrapper internally using Activity/ActivitySource.
  • b - Actual implementation

8. Benchmarking

@Aaronontheweb
Copy link
Contributor

Aaronontheweb commented May 26, 2020

@cijothomas

With the end goal is replacing OT Span API with .NET Activity API (part of System.Diagnostics.DiagnosticSouce nuget package), opening this Epic to track overall state of this effort.

.... Why is this even necessary? Why should OT for .NET adopt a totally different terminology and API surface area compared to all of the other implementations? How many years is it going to take for any of this to be accessible to library developers like myself who want to use OT? And are these API changes to Activity going to be .NET 5-only going forward?

@cijothomas
Copy link
Member Author

Clarifying 2 important things here:

  1. The .NET Activity API is shipped out-of-band. It works for .NET Core 2.x or higher, .NET Framework 4.5.2 or higher.
  2. The .NET Activity API will support everything from OT Span API. The names will ofcourse be different, (like Activity instead of Span) - This is where item 7 comes into picture - the shim/wrapper will make everything matching OT SPan API.

@Aaronontheweb if you can join the Gitter and/or SIG meetings, we'll be talking about this in detail. (I can get recordings from previous ones as well)

@SergeyKanzhelev
Copy link
Member

.... Why is this even necessary? Wouldn't should OT for .NET adopt a totally different terminology and API surface area compared to all of the other implementations? How many years is it going to take for any of this to be accessible to library developers like myself who want to use OT? And are these API changes to Activity going to be .NET 5-only going forward?

Activity is a part of DiagnosticSource package which is build for all platforms down to NET45. So it must be accessible as soon as we GA. As for terminology - we need more feedback on what's OK and what is broken.

@Aaronontheweb
Copy link
Contributor

Very glad that this isn't going to be a .NET5-only thing. That would have been a big problem for us. .NET Standard 2.0 and up is our current target.

Per the advice of @davidfowl on Twitter I wanted to include feedback from my experience as an OpenTracing user / contributor / library developer looking to replace that work with OpenTelemetry in the coming years:

Customization and Extensibility

One concern I have around wrapping everything inside of Activity is that it's not subclassable. Why would I need that to do that? For instance, in Akka.NET, used by the likes of big banks, healthcare, insurance, gaming, airlines, governments, etc, we can't propagate context using AsyncLocal because the actor concurrency model doesn't execute that way. Everything is done, for performance reasons among other things, by inlining as much actor activity onto as few quantums as possible.

To help preserve our trace context correctly in this environment we used an OpenTracing IScopeManager in combination with a custom IScope implementation. This took maybe about 30 lines of code. I could correctly preserve incoming context entering an actor from an external environment (i.e. SignalR hub, ASP.NET HTTP request, etc...) and within / out of our actor-based environment without having to expose any of that to our users. How would I go about accomplishing that without the ability to create my own TelemetrySpan or DistributedContextCarrier if needed?

I'm still working on my OpenTelemetry POC using the latest alphas, but I think it's possible for me to achieve this today. If we remove the above tooling, I don't know if it will be feasible.

Terminology

Having a shared technical lexicon is useful particularly when trying to communicate across language and experience barriers. Anyone coming from the Node.JS world who's worked with Zipkin knows what a span is. Ditto with Java developers who've worked with Jaeger. I think there's a lot of benefit in maintaining the current terminology in the .NET library, as there's a corpus of work in the distributed tracing space goes back years.

Moving to a model of activity, activity source, diagnostic trace listeners - I've been doing .NET for 15 years, worked at Microsoft for two of them, and I've now launched two different businesses that deal with application telemetry for .NET developers and I couldn't give you a clear definition of what these terms mean or how to use them in an application.

People switch jobs, languages, and runtimes often - giving them a standard to target for something as crucial as systems observability will go a long away towards seeing its successful adoption for years to come.

I think adapting Span to Activity is much more preferable to dropping Span in its entirety. The current API more or less does this today.

Preserving that layer of abstraction helps resolve both of these issues.

@cijothomas sure, I can see about joining the Gitter at least. I'm pretty busy but I'll do my best to contribute.

@austinlparker
Copy link
Member

I'm not entirely sure if this is aligned with the spec in terms of API re-implementation (see https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md). I think having a translation layer between span/activity is fine, but I don't think we can just replace things in the API outright, especially since the existing OTel API is pretty well-defined. I'm empathetic to the goals of seamless interop with existing .NET stuff, but as someone that does a lot with the community, I'd rather see .NET being more aligned with open standards rather than less.

@MikeGoldsmith
Copy link
Member

Copying this from Gitter where we originally discussed:

Mike Goldsmith @MikeGoldsmith May 27 18:41
I was very much in the camp of preferring OTel being a separate entity to the basr .NET Activity & ActivitySource. I expected the current API to be the main programming interface and be able to subscribe / capture multiple forms of diagnostic trace forms via instrumentation.
I also understand Microsoft's position of wanting to closely tie OpenTelemetry with .NET libraries as possible. I'm not certain shifting span/tracer to activity/activitysource will work, and that's why we're doing this work alongside the existing API.
If the activity/activitysource presents enough and we can achieve everything we need, I'd still expect the library to present an Otel spec compliant interface that sits on top to make moving between languages trivial.
I'm not 100% sold on the migration but also want to give it a good chance to succeed.

Mike Goldsmith @MikeGoldsmith May 27 18:49
Regarding span context in the actor model, I don't expect the OTel span context API to change too much and will allow for a swappable context carrier, which defaults to async local. It should be possible to add a thread static version of a carrier to work on the same way you had it working in OpenTracing.

I think it's fair to investigate whether Activity can support what is needed to comply with the OpenTelemetry spec. It will ease integrations and widen the net of libraries & frameworks that are automatically instrumented.

I'm personally unsure whether Activity/ActivitySource can do all what we need and we'll only commit to it once it is proven to. We have also agreed to provide an OpenTelemetry style shim so working across language implementations still feels consistent.

My perspective is one of OpenTelemetry's primary goals is to present a consistent and cross-language API, and that includes naming conventions. However, we also need to utilse as much of each languages features and be idiomatic possible. The .NET SIG is currently lucky enough to be able to engage with the language maintainer (Microsoft) directly on trying to make telemetry as accessible as possible.

@cijothomas
Copy link
Member Author

Closing this as the major work is complete.
There will be smaller items when .NET releases the next preview. Will track them separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api area:sdk release:required-for-ga To be resolved before GA release
Projects
None yet
Development

No branches or pull requests

6 participants