Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROPOSAL: Add tracing backend support #4509

Open
gouthamve opened this Issue Aug 16, 2018 · 16 comments

Comments

Projects
None yet
6 participants
@gouthamve
Copy link
Member

gouthamve commented Aug 16, 2018

Now that we have some spans in prometheus now, I think it's time we added some backend support to prometheus. This would help people see how each of their query performs. We run cortex-querier with sampling rate as 1 and make sure to get a trace for every query.

As the number of queries hitting prometheus would be fairly low, it works and also gives us an insight into which queries take how long. I'm sure people would love the same information for the prometheus server.

We know users want it and ideally would love it in logs: #1315 but this could be an alternate avenue for them until we figure out the logging situation.

/cc @brancz also expressed interest in this.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 16, 2018

I think it'd be useful to have an in-memory backend so users could view both recent and long queries. I'm not sure at all about remote backends, as that seems like we'd end up having to support a pile of systems.

@gouthamve

This comment has been minimized.

Copy link
Member Author

gouthamve commented Aug 16, 2018

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 16, 2018

Why should we support jaeger and not anything else? Is there some generic protocol we can support?

Given that OpenTracing is a code-level API, we may have to leave this up to users to compile in themselves as needed.

@tomwilkie

This comment has been minimized.

Copy link
Member

tomwilkie commented Aug 16, 2018

@tomwilkie , you were looking into this last, how complex is the in-mem backend?

Non-trivial, but not hard. We really just want enough to render a simple /tracez style page. I have a branch somewhere, but realistically I won't have time to see this through.

We could also support OpenCensus, which should solve the too many backends problem, but would require instrumenting everything again with it along side open-tracing. Sigh.

There was some movement a while back on an opentracing to opencensus bridge, that would allow us to stick with opentracing and get the z pages from opencensus. Don't know where it went though: census-instrumentation/opencensus-go#505

Is there some generic protocol we can support?

There is not, see opentracing/specification#64

Given that OpenTracing is a code-level API, we may have to leave this up to users to compile in themselves as needed.

This kinda sucks as an end user experience. I'd be inclined to say we should have a batteries included approach here, and jaeger seems reasonable - its part of the CNCF after all. I raised the idea of using binary plugins here too, but that seems a way off: opentracing/opentracing-go#133

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 16, 2018

that would allow us to stick with opentracing and get the z pages from opencensus.

That'd be handy, I'd like requestz in particular as it solves #1315.

This kinda sucks as an end user experience.

It does, unfortunately the tracing world doesn't seem to be in a place where we can do much better in a sane way. Not offering a generic interface would also suck, and given our experience with SD and notifiers I'd be for only supporting a generic interface.

I raised the idea of using binary plugins here too, but that seems a way off:

I'd presume that doesn't help for the same reasons it doesn't help with SD, we'd still be stuck maintaining all the integrations.

@sipian

This comment has been minimized.

Copy link
Contributor

sipian commented Aug 29, 2018

What's the status of this proposal? I would like to take this up.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 29, 2018

We seem okay with an in-memory only backend to serve up traces on our web UI.

Full opentracing backend support does not seem viable at this time.

@sipian

This comment has been minimized.

Copy link
Contributor

sipian commented Aug 29, 2018

@brian-brazil
From the discussion, it looks like jaeger is preferred.
@tomwilkie
You mentioned that you have a branch somewhere. It'll be great if you could point that out.
Will help in getting things started.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 29, 2018

No, one person so far has requested that we support one particular backend with no generic integration. I believe this is not acceptable or sustainable, and is not in line with decisions and experience elsewhere in the project.

Separately we're talking about how we could use the tracing instrumentation we have, but this would be purely internal to the Prometheus process and not involve something like Jaeger. That is something that could be worked on.

@gouthamve

This comment has been minimized.

Copy link
Member Author

gouthamve commented Aug 29, 2018

Ideally, we would need an in-mem backend which uses something like a ring-buffer to store the traces and shows them via a UI.

I was talking to Tom yesterday about this. This is the branch he is referring to: https://github.com/weaveworks-experiments/loki/commits/protos mainly this commit: weaveworks-experiments/loki@0d0f2b3

He thinks the /pkg/model and /pkg/client could be useful as a base to build upon. It's totally upto you how you implement it though and a small and basic design doc would be good. One of the good things about Tom's code is that all this stored as protos and that makes it easy to send it out to other systems, for example.

Having said that, we should also take a look at open-census, specially because they provide these pages already: https://opencensus.io/core-concepts/z-pages/ and see if we should duplicate the functionalities.

Further, another idea is to build a opentracing multiplexer too, so that the traces get written to multiple backends, i.e, both in-mem and jaeger, as we (Cortex) uses jaeger. It'll be a cool thing to write, but again, it'll mostly duplicate the functionality of the OpenCensus zPages.

It'll be great if you do a comparison of OpenTracing and OpenMetrics.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 29, 2018

If we can re-use OpenCensus to produce zPages that'd be handy, and save us work.

so that the traces get written to multiple backends, i.e, both in-mem and jaeger, as we (Cortex) uses jaeger.

That wouldn't help with us having to compile in all possible backends.

It'll be great if you do a comparison of OpenTracing and OpenMetrics.

These are orthogonal. As are OpenTracing and OpenCensus.

@gouthamve

This comment has been minimized.

Copy link
Member Author

gouthamve commented Aug 29, 2018

That wouldn't help with us having to compile in all possible backends.

What I meant was that if someone was using promql as a library, they should be able to hookup their tracing backend too. And if it allows to have both the in-mem backend and whatever people want to hook up (only when using it as a library), it'll be better.

These are orthogonal. As are OpenTracing and OpenCensus.

If we're going to move to OpenCensus from OpenTracing, isn't doing a comparison ideal to know what it offers us beyond just the zPages and what we'll lose?

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 29, 2018

What I meant was that if someone was using promql as a library, they should be able to hookup their tracing backend too.

Isn't that already the case today? I'd presume we'd be hooking this in in main rather than in promql itself.

If we're going to move to OpenCensus from OpenTracing, isn't doing a comparison ideal to know what it offers us beyond just the zPages and what we'll lose?

As far as I'm aware OpenCensus implements OpenTracing, so we shouldn't lose anything there.

@sipian

This comment has been minimized.

Copy link
Contributor

sipian commented Aug 31, 2018

@gouthamve
I did some study about opencensus and opentracing.
The way opentracing is used in prometheus can be easily transformed to opencensus.

@sipian

This comment has been minimized.

Copy link
Contributor

sipian commented Sep 25, 2018

I have opened a proposal for this project. Feedback is welcome :)
https://docs.google.com/document/d/1NpfEkXRswmhcuh-BsmtfOVRJ0r7Ft7S9_JIo0NWYoGk/edit?usp=sharing

@sipian sipian referenced a pull request that will close this issue Nov 8, 2018

Open

Add in-memory tracing back-end support #4837

@jcchavezs

This comment has been minimized.

Copy link

jcchavezs commented Mar 26, 2019

I might be late to join the discussion. Since the purpose of this feature is to provide visibility to the latency prone operations in prometheus I don't think support for different backends is a crucial need mostly because users are not correlating those traces with other services.

Zipkin just got a new UI (and we are just discussing whether to publish it as npm package) and writing an in-memory backend is not too hard (more so, we have been playing with voltb https://github.com/adriancole/zipkin-voltdb). We are up to collaborate on this, I can even come up with a PR as a POC.

It is of course possible to support different UIs but since there is no shared data model in tracing, every UI will require an own implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.