
RFC: Integrate with a distributed tracing system like OpenTracing or Zipkin #1436

Closed
serkangunes opened this issue Aug 10, 2018 · 15 comments

@serkangunes

User Stories

  • As an operator, I would like to be able to trace calls to see what is happening within my microservices ecosystem. This will help me measure the latency between service calls and identify which services are inefficient.
  • As a developer, I would like to be able to see which service calls are failing so that I can debug my services more quickly.

Rationale

The distributed nature of microservices makes it difficult to find the points of failure. Supporting distributed tracing within the service mesh would free the microservices from integration code and keep them cleaner.

@jakerobb

It would be really nice to have the proxy add details about things only it knows when it records its spans.

Examples (many of these are forward-looking to features not yet implemented):

  • when an outgoing proxy initiates a retry after a failure
  • when a proxy injects a fault
  • if there are any custom/configurable routing rules in place, any details that help someone understand later how the routing decision was made
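To make the list above more concrete, here is a rough sketch of a span that carries proxy-only details, built in the shape of Zipkin's v2 JSON span model (traceId, id, parentId, kind, timestamp and duration in microseconds, annotations, string-valued tags). The function `proxy_span` and every `proxy.*` tag and annotation name are hypothetical illustrations, not an existing Linkerd or Zipkin convention.

```python
import json


def proxy_span(trace_id, span_id, parent_id, name, start_us, duration_us,
               retry_times_us=(), fault=None, route=None):
    """Build a Zipkin v2-style span dict enriched with proxy-only details.

    The "proxy.*" tag and annotation names are hypothetical examples,
    not an existing Linkerd or Zipkin convention.
    """
    span = {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "kind": "CLIENT",          # the outbound proxy is the client side of the call
        "timestamp": start_us,     # epoch microseconds
        "duration": duration_us,   # microseconds
        "localEndpoint": {"serviceName": "linkerd-proxy"},  # hypothetical service name
        "annotations": [],
        "tags": {},
    }
    if parent_id is not None:
        span["parentId"] = parent_id
    # Record each retry as a timestamped event on the span.
    for t in retry_times_us:
        span["annotations"].append({"timestamp": t, "value": "proxy.retry"})
    # Zipkin v2 tag values are strings, so counts are stringified.
    if retry_times_us:
        span["tags"]["proxy.retry_count"] = str(len(retry_times_us))
    if fault is not None:
        span["tags"]["proxy.fault_injected"] = fault
    if route is not None:
        span["tags"]["proxy.route"] = route
    return span


span = proxy_span(
    trace_id="463ac35c9f6413ad48485a3953bb6124",
    span_id="a2fb4a1d1a96d312",
    parent_id="0020000000000001",
    name="get /users",
    start_us=1_533_900_000_000_000,
    duration_us=25_000,
    retry_times_us=[1_533_900_000_010_000],
    route="users-v2",
)
print(json.dumps([span], indent=2))
```

Using annotations for retries keeps each attempt's timing visible on the timeline, while tags summarize the routing decision in a form that is easy to search.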

@wmorgan
Member

wmorgan commented Aug 23, 2018

@jakerobb Great list, thanks!

@zparnold

zparnold commented Sep 4, 2018

@jakerobb @serkangunes Is there any active development on this? I'm trying to figure out where it is on the backlog and whether it's worth taking a look if I have time.

@wmorgan
Member

wmorgan commented Sep 5, 2018

Not afaik. Go for it!

@bourquep
Contributor

I have started looking at OpenTracing and experimented a little bit on my dev cluster. I've done a small internal demo and we all agree that implementing this throughout our services is a must.

The good thing about it is that we can add instrumentation in an incremental/iterative way, leveraging automagic instrumentation/span creation tools already available to us, such as:

  • Enabling OpenTracing in our nginx-ingress controller.
  • Injecting OpenTracing.Contrib.NetCore into our ASP.NET Core services to get free instrumentation for ASP.NET and Entity Framework.
  • Having spans created automatically by Linkerd2 for our cross-service interactions. (wouldn't that be nice! 😉)

And then gradually adding manual instrumentation where we need it.

To me, the immediate and indispensable value of Linkerd2 is in how it gives me visibility into my system "for free" - without any complicated/invasive setup. The very simple act of injecting the proxy into my deployments gives me tremendous value in terms of knowing what's going on inside my cluster.

Therefore, having automagic cross-service OpenTracing trace data seems like a perfect and natural fit for Linkerd2, where it would augment/complement the visibility already provided by Linkerd2.

It would allow us to go from zero to minimally useful tracing in no time. Obviously, manual instrumentation will always be required to get even more out of the traces, but it would set a solid foundation on top of which manual instrumentation could be added.

Anyways, I realize that I am not adding much value to this RFC in terms of the "what/how", but I felt compelled to add my voice to the "why". 🤓

@moderation

moderation commented Sep 23, 2018

One approach to consider is OpenCensus. In theory, this could allow traces (and metrics) to be exported to multiple backends, including Jaeger and Zipkin. The glaring issue right now is that there is no support for Rust.

There doesn't appear to be community-recognized Rust work for Zipkin, but I did find palantir/rust-zipkin. On the OpenTracing front, I found opentracing/opentracing-rust, opentracingrust and rustracing.

@balchua

balchua commented Oct 26, 2018

We are looking at Linkerd2. I like its lightweight install process. This is one of the features we hope Linkerd2 will support. What I like about Linkerd 1 is its out-of-the-box support for distributed tracing and circuit breaking. I hope Linkerd2 will reach feature parity with Linkerd 1.

@codefromthecrypt

FYI, OpenTracing is not a tracing system, so the subject line is a bit confused. It's like saying JDBC is the same as Oracle.

@codefromthecrypt

Also, "OpenTracing trace data" implies another confusion, as the project defines no data format. If this is about a programming API, then it would be about how you link extra data, such as adding more instrumentation to the proxy. OpenTracing defines nothing useful for a service abstraction: no headers, no data format, nothing. People often misunderstand this and think OpenTracing (an API with no data format or headers) is the same as Zipkin (which defines both). Whatever comes out of this, it's better not to add to the confusion.

Census is a more accurate fit, as it explicitly supports header propagation formats, including B3, and you also get a consistent implementation regardless of which backend processes the data. In fact, they have defined an intermediate data format intended to be processed similarly to how most people process Zipkin data.

If it is intentional not to use the most widely used format (Zipkin data and B3), for whatever reason :P, then at the moment the only alternative for this abstraction is Census.
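The distinction drawn above is that B3 is a concrete wire format with named headers, which OpenTracing leaves unspecified. As an illustration, this sketch converts the single-header `b3` form (`{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}`, with the last two parts optional, or a bare sampling state such as `0`) into the equivalent multi-header `X-B3-*` set, following my reading of the openzipkin b3-propagation description. The function names are hypothetical, and it does not validate hex lengths.

```python
def _sampling_headers(state):
    """Map a B3 single-header sampling state to multi-header form."""
    if state == "d":
        return {"X-B3-Flags": "1"}  # debug; implies sampled
    if state in ("0", "1"):
        return {"X-B3-Sampled": state}
    raise ValueError(f"unknown b3 sampling state: {state!r}")


def b3_single_to_multi(value):
    """Convert a `b3` single-header value into X-B3-* headers."""
    parts = value.strip().split("-")
    if len(parts) == 1:
        # Sampling decision only, e.g. "b3: 0" to deny sampling.
        return _sampling_headers(parts[0])
    if len(parts) > 4:
        raise ValueError(f"malformed b3 header: {value!r}")
    headers = {"X-B3-TraceId": parts[0], "X-B3-SpanId": parts[1]}
    if len(parts) >= 3:
        headers.update(_sampling_headers(parts[2]))
    if len(parts) == 4:
        headers["X-B3-ParentSpanId"] = parts[3]
    return headers


headers = b3_single_to_multi(
    "80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-1-05e3ac9a4f6e3b90"
)
print(headers)
```

Nothing comparable can be written against OpenTracing alone, because it specifies the API calls for injecting and extracting context but not the header names or encoding.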

@jakerobb

jakerobb commented Dec 9, 2018

Whoa, my worlds are colliding. Hi Adrian!

It's also important to note that you can't have tracing "for free". Even if all instrumentation is offloaded to the proxy, you still have to update your services so that any outgoing requests propagate the tracing headers (e.g. Zipkin's B3 headers) that came in with the triggering request or message.

That's not a lot of work, but it's strictly necessary.
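A minimal sketch of that propagation step, assuming the service uses B3 headers and that requests are represented as plain header dictionaries (real code would hook this into its HTTP server and client middleware). The function `propagate_trace_headers` and the header set are illustrative, not part of any Linkerd API.

```python
from collections.abc import Mapping

# B3 header names, lowercased; HTTP header names are case-insensitive.
B3_HEADERS = frozenset({
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags", "b3",
})


def propagate_trace_headers(incoming: Mapping[str, str],
                            outgoing: dict[str, str]) -> dict[str, str]:
    """Copy tracing headers from an inbound request onto an outbound one."""
    for name, value in incoming.items():
        if name.lower() in B3_HEADERS:
            outgoing[name] = value
    return outgoing


inbound = {
    "Host": "frontend",
    "Accept": "application/json",
    "X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7",
    "X-B3-SpanId": "e457b5a2e4d86bd1",
    "X-B3-Sampled": "1",
}
outbound = propagate_trace_headers(inbound, {"Host": "users"})
print(outbound)
```

Only the tracing headers are copied; everything else about the outbound request stays under the service's control, which is why this small change is still needed even when the proxy records the spans.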

@codefromthecrypt

codefromthecrypt commented Dec 9, 2018 via email


@renannprado
Copy link

What is the status of this?
I'm at KubeCon and can clearly see how amazing Linkerd is. Good job!

@wmorgan
Member

wmorgan commented May 31, 2019

@renannprado On the big list of things we want to do, but haven't found anyone to implement yet. Want to give it a try?

@renannprado

@wmorgan I actually want to help, but I wonder if I can.
I've never programmed in Rust, and I suppose this implementation goes into linkerd2-proxy, correct? But I can learn Rust, no problem.
Also, time is a big challenge.
I'm not making excuses, just describing my situation. Since nobody is implementing this at the moment, maybe time isn't a big problem anyway.
I also assume there are clear guides on how to "hack" on the things I would have to modify, correct?

Besides all of that, what does Linkerd2 provide or lack in terms of distributed tracing? Looking at this thread, I couldn't really tell what exactly is missing. Of course, this also requires some cooperation from the applications to work, but beyond that I would like to understand what's missing today. Is there no support whatsoever, i.e. would this start from scratch?

I won't promise anything, but if I understand all of that, I might give it a try and show you guys if there's progress.

Thanks!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2021

10 participants