Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support new OpenTracing API #1196

Closed
tedsuo opened this issue Jun 30, 2017 · 26 comments
Closed

Support new OpenTracing API #1196

tedsuo opened this issue Jun 30, 2017 · 26 comments
Labels
area/tracing enhancement Feature requests. Not bugs or questions.
Milestone

Comments

@tedsuo
Copy link

tedsuo commented Jun 30, 2017

A new version of the OpenTracing API for C++11 is almost ready (opentracing/opentracing-cpp#11). Once it has been merged, we will be changing the LightStep tracer so that it is OT-compatible. It would be great if we could then do the following with Envoy:

  • extract the ZipKin tracer and use it as the basis for a new OT-compatible ZipKin C++ tracer
  • reimplement the tracing instrumentation in Envoy to use the OpenTracing API.
  • support linking to 3rd-party OT tracers beyond lightstep and zipkin.

We're happy to put the work in to make this happen. Any recommendations or objections?

@mattklein123 mattklein123 added enhancement Feature requests. Not bugs or questions. area/tracing labels Jun 30, 2017
@mattklein123
Copy link
Member

This sounds good to me.

cc @adriancole @rshriram @RomanDzhabarov @fabolive for further comment.

@moderation
Copy link
Contributor

I was doing some very initial testing with @RomanDzhabarov to upgrade to lightstep-tracer-cpp 0.38 (currently on 0.36). We should probably put this on hold and wait for the implementation suggested by @tedsuo?

@mattklein123
Copy link
Member

@moderation yes I would probably just skip the upgrade at this point.

@codefromthecrypt
Copy link
Contributor

There are usually mapping problems with OpenTracing efforts, surfacing as poorly encoded data in Zipkin or spans not appearing as client/server etc. These add support burden to our gitter channel and issues list. These mapping problems happen because usually OT is tested as a mock, as opposed to a micro-integration test (ex to the degree tests exist, they don't relate directly to data emitted). In other words, adding any layer of indirection is likely to undo the clean integration currently in envoy. Solving problems will become harder and not possible directly in envoy code.

TL;DR; I would highly discourage ripping out the existing Zipkin support until there are tests in permanently to show that envoy works with Zipkin via OT, not just that OT variants work.

@mattklein123
Copy link
Member

TL;DR; I would highly discourage ripping out the existing Zipkin support until there are tests in permanently to show that envoy works with Zipkin via OT, not just that OT variants work.

Agreed. Nothing will change until we have confidence that any new solution is supported by the community and works. I'm mainly saying that I have no objection to people attempting to tackle this if they are willing to do all the work.

@tedsuo
Copy link
Author

tedsuo commented Jun 30, 2017

Yes, @adriancole this is also part of an effort to validate that the new API is correct, we want to prove that we can bind a ZipKin C++ implementation to it and have it work in real systems like Envoy and NGINX. (I'd also like to talk to you about test harnesses and other compatibility guarantees for OT, but probably a different thread). We're also fine supporting the C++ integration w/ZipKin and ensuring it is stable until the API has reached maturity.

We don't plan on making a PR until we're confident that there is solid integration that has parity with the existing instrumentation.

@codefromthecrypt
Copy link
Contributor

Thanks for the offer @tedsuo, and glad you are on board with making the first version of this portable with zipkin. Having an organization of mostly volunteers, I'm more concerned about future versions. Don't read this as skepticism about you personally, as your statement sounds great.

What I'm generally concerned with is a scenario where OT is inserted then Zipkin later removed or degraded due to api decisions not made in the interest of the OpenZipkin community of end users. We've had problems with this in the past where apis crucial to Zipkin were on the chopping board and would have been removed if @basvanbeek or I didn't notice.

It integration tests are here and someone in OT makes a decision not in favor of Zipkin users (regardless of intent), the build breaks. Fixing the build is either sacrificing these users or making the OT -> Zipkin story right again.

TL;DR; if the goal includes removing the code that just went in for Zipkin support, let's make sure integration tests happen here, and are left in for the duration Envoy wants to support the OpenZipkin community. I think this "tests here" part is the only thing different than your plan.

@tedsuo
Copy link
Author

tedsuo commented Jun 30, 2017

Sounds good @adriancole, we'll include Zipkin-based integration tests when we make our PR.

@rnburn
Copy link
Contributor

rnburn commented Jul 21, 2017

Hello Everyone, I extracted out the Zipkin tracing code from envoy into a stand-alone library and added a bridge to the proposed C++ OpenTracing API. I also wrote a version of LightStep’s tracer that supports the new API.

I’m starting to look into how best to integrate envoy with the OpenTracing API. @mattklein123 One thing I was wondering — Would you be willing to use the recorders that come with the Zipkin and LightStep libraries instead of having custom ones built into envoy, provided that Zipkin and LightStep support a way to get access to reporting statistics so you can continue recording things like the dropped span counts?

Doing so could simplify envoy's tracing code and make it easier to support features like Kafka collection since you could just turn on the functionality provided by Zipkin's tracing library.

@mattklein123
Copy link
Member

@rnburn my primary requirement for the tracing libraries are that:

  • We don't lose any stats (as you point out).
  • We decouple framing vs. transport. Meaning, the libraries should be written in a way that we can still use Envoy per thread networking and not have any connection pools/threading/etc. inside the libraries. If the libraries implement a default transport that's fine, but they should be written in a way that allows us to use our own transport.

As long as the previous two points are handled, the more we can push into the libraries the better IMO.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Jul 22, 2017 via email

@moderation
Copy link
Contributor

Lightstep C++ tracer 0.4.0 released - https://github.com/lightstep/lightstep-tracer-cpp/releases/tag/v0.4.0. Envoy still using 0.36 from February 2017. /cc @tedsuo

@rshriram
Copy link
Member

rshriram commented Oct 5, 2017

I am fine with this, as long as functionality is maintained, and the output remains consistent with what we have in Istio today. I am mostly concerned about the steps that we take today to ensure that the tags are not duplicated between client and server [some of these steps have been undone, as me and @fabolive noticed yesterday].

Put more generally, (this might be a dumb question): do all tracing systems have to make similar decisions like what we do in Zipkin (delineating cs/sr/ss/cr, having to decide whether to do treat a new trace in front-envoy [gateway] should have server-receive/ client-send tag, etc)? If not, then with an opentracing implementation, will the zipkin-specific instrumentation points in code go away?

@rshriram
Copy link
Member

rshriram commented Oct 5, 2017

cc @objectiser

@rnburn
Copy link
Contributor

rnburn commented Oct 5, 2017

Hey @rshriram, not all tracing systems use the cs/sr/ss/cr fields -- LightStep, for example, doesn't have them.

I'm not a Zipkin expert, but I believe they can controlled in Zipkin via OpenTracing by specifying the span.kind tag.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Oct 6, 2017 via email

@mattklein123
Copy link
Member

All: I'm going to put it in my GH bio. We are not deleting direct Zipkin format, probably ever.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Oct 6, 2017 via email

@mattklein123
Copy link
Member

re: performance, all of the tracers need perf work. There are quite a few issues in both cases. There are not currently any integrated microbenchmarks. Would love for someone to work on this.

@tedsuo
Copy link
Author

tedsuo commented Oct 17, 2017

We'll definitely be benchmarking this work, and the OT C++ API in general. Related to that, we're discussing adding key lookup interface for Carriers on Extract, to avoid iterating over all of the keys. This is the only place so far where we have seen the current API cause an implementation difference. Please have a look if you are interested: opentracing/opentracing-cpp#25. Once that is resolved, we will have a PR ready for review.

@rshriram not all tracing systems have zipkin-like functionality around cs/sr/ss/cr, but OpenTracing does want standard conventions around tagging RPC calls, and those standards should allow the OT-zipkin bridge to create the correct zipkin constructs: https://github.com/opentracing/specification/blob/master/semantic_conventions.md#rpcs

@moderation
Copy link
Contributor

moderation commented Nov 8, 2017

Lightstep 0.5.0 C++ tracer released - https://github.com/lightstep/lightstep-tracer-cpp/releases/tag/v0.5.0. Envoy still using 0.36 from February 2017. /cc @tedsuo.

I suggest we keep this issue focused on upgrading Lightstep and not about Zipkin (@mattklein123 has it in his Github bio that Zipkin will not be deleted).

opentracing/opentracing-cpp#25 merged 2 days ago.

@mattklein123
Copy link
Member

I will defer to @tedsuo on how to proceed. I think the focus is to deprecate the existing LS tracer and use the new OT one.

@rnburn
Copy link
Contributor

rnburn commented Nov 8, 2017

I just put in #2017. It adds common OpenTracing instrumentation that can be shared across tracers and switches the LightStep tracer to use it.

@tedsuo
Copy link
Author

tedsuo commented Nov 8, 2017

Thanks Ryan! Yes, I would like us to use lightstep behind OT, and vet the performance.

@tedsuo
Copy link
Author

tedsuo commented Nov 8, 2017

Also, while we can continue to compile tracers directly into envoy, I would encourage Envoy maintainers to consider some form of dynamic loading to allow users to install instrumentation of their choosing. There's a discussion on this issue here: opentracing/opentracing-cpp#28

@mattklein123 mattklein123 added this to the 1.6.0 milestone Nov 26, 2017
@abbi-gaurav
Copy link

Does it also provides the capability to use jaeger as a tracing solution since it is based on OpenTracing standard?

jpsim pushed a commit that referenced this issue Nov 28, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tracing enhancement Feature requests. Not bugs or questions.
Projects
None yet
Development

No branches or pull requests

7 participants