Adaptive Sampling #365

Open
yurishkuro opened this Issue Sep 1, 2017 · 9 comments

@yurishkuro
Member

yurishkuro commented Sep 1, 2017

Problem

The most common way to use Jaeger client libraries is with probabilistic sampling, which decides whether each new trace should be sampled. Sampling is necessary to control the amount of tracing data reaching the storage backend. The current approach has two issues:

  1. Individual microservices have little insight into what the appropriate sampling rate should be. For example, a probability of 0.001 (one trace per second per service instance, assuming about 1,000 requests per second per instance) might seem reasonable, but if the fanout in some downstream services is very high, it might flood the tracing backend.
  2. Sampling rates are defined on a per-service basis. If a service has two endpoints with vastly different throughputs, its sampling rate will be driven by the high-QPS endpoint, which may leave the low-QPS endpoint effectively never sampled. For example, if the endpoints' QPS differ by a factor of 100 and the probability is set to 0.001, then only about 1 in 100,000 of the service's requests will produce a sampled trace from the low-QPS endpoint.
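
To make the arithmetic in the second point concrete, here is a small sketch. The QPS figures (1,000 and 10) are hypothetical values chosen only to give a factor-of-100 difference; they are not taken from any real service.

```python
def sampled_per_second(qps: float, probability: float) -> float:
    """Expected number of sampled traces per second under probabilistic sampling."""
    return qps * probability


# Hypothetical traffic: two endpoints whose QPS differ by a factor of 100.
high_qps = 1000.0
low_qps = 10.0
probability = 0.001  # single per-service sampling probability

high_rate = sampled_per_second(high_qps, probability)  # about 1 trace per second
low_rate = sampled_per_second(low_qps, probability)    # about 1 trace every 100 seconds

# Fraction of all requests to the service that end up as a sampled
# trace from the low-QPS endpoint (roughly 1 in 100,000).
low_share_of_all_requests = low_rate / (high_qps + low_qps)
```

With a single per-service probability, the low-QPS endpoint yields a sampled trace only about once every 100 seconds in this example, which is the gap a per-endpoint rate is meant to close.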

Proposed Solution

Adaptive sampling addresses these issues by:

  1. Assigning sampling probabilities on a service + endpoint basis rather than per service only.
  2. Using a lower-bound rate limiter to ensure that every endpoint is sampled at a certain minimum rate.
  3. Observing the impact of sampling rates on the overall number of traces sampled from a service and dynamically adjusting the per-endpoint sampling rates to meet certain target rates.
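
The three points above could be sketched roughly as follows. This is a minimal illustration, not the Jaeger implementation: the class names, the token-bucket limiter, and the proportional `adjust_probability` rule are all assumptions, since this issue does not specify the adjustment algorithm.

```python
import random
import time


class LowerBoundRateLimiter:
    """Hypothetical token bucket granting at most `rate` samples per second."""

    def __init__(self, rate, clock=time.monotonic):
        self.rate = rate
        self.clock = clock
        self.balance = 1.0  # at most one token is ever banked
        self.last = clock()

    def allow(self):
        now = self.clock()
        self.balance = min(1.0, self.balance + (now - self.last) * self.rate)
        self.last = now
        if self.balance >= 1.0:
            self.balance -= 1.0
            return True
        return False


class EndpointSampler:
    """Probabilistic sampler for one endpoint with a guaranteed minimum rate."""

    def __init__(self, probability, lower_bound, rng=random.random,
                 clock=time.monotonic):
        self.probability = probability
        self.limiter = LowerBoundRateLimiter(lower_bound, clock)
        self.rng = rng

    def is_sampled(self):
        if self.rng() < self.probability:
            # Spend a token if one is available, so the lower bound acts as
            # a floor rather than adding to the probabilistic rate.
            self.limiter.allow()
            return True
        return self.limiter.allow()


class ServiceSampler:
    """Holds one EndpointSampler per endpoint (point 1)."""

    def __init__(self, default_probability, lower_bound):
        self.default_probability = default_probability
        self.lower_bound = lower_bound
        self.endpoints = {}

    def is_sampled(self, endpoint):
        sampler = self.endpoints.get(endpoint)
        if sampler is None:
            sampler = EndpointSampler(self.default_probability, self.lower_bound)
            self.endpoints[endpoint] = sampler
        return sampler.is_sampled()


def adjust_probability(current, observed_rate, target_rate,
                       min_p=1e-5, max_p=1.0):
    """Assumed adjustment rule (point 3): scale the probability by
    target/observed traces per second, clamped to [min_p, max_p]."""
    if observed_rate <= 0:
        return min(max_p, current * 2)  # nothing observed yet: probe upward
    return max(min_p, min(max_p, current * target_rate / observed_rate))
```

In this sketch the backend would periodically call something like `adjust_probability` for each service + endpoint pair and push the result to clients, while the lower-bound limiter keeps rarely hit endpoints visible even when their probability is very small.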

Status

Client work is done. The backend functionality is pending open-sourcing.

@robdefeo

robdefeo commented Apr 19, 2018

Any idea when the backend functionality will be open-sourced?

@billowqiu

billowqiu commented May 15, 2018

Is "Adaptive Sampling" now available in the backend?

@yurishkuro

Member

yurishkuro commented May 15, 2018

It's coming soon. @black-adder just finished rolling it out internally to all services, so it just needs a bit of clean-up (to remove internal deps) before moving to open source.

@billowqiu

billowqiu commented May 15, 2018

Thanks @yurishkuro, I am evaluating Jaeger and Zipkin.

@trtg

trtg commented Jun 19, 2018

@yurishkuro any progress on this being released?

@yurishkuro

Member

yurishkuro commented Jun 19, 2018

Question for @black-adder, "he is a-cooking something up".

@sergeyklay

sergeyklay commented Aug 7, 2018

@black-adder any news?

@black-adder

Collaborator

black-adder commented Aug 7, 2018

Sorry all, I just started to move the pieces over; hopefully we'll have the whole thing in OSS this week.

@agxp

agxp commented Oct 11, 2018

What's the status on this? I would like to configure Jaeger to sample all traces under low load and to sample at a certain probability under high load. It doesn't seem possible currently. Thanks.
@black-adder
