Introduce means of producing metrics from all Spans regardless of sampling decision. #3145

Open
thpierce opened this issue Jan 25, 2023 · 3 comments
@thpierce

What are you trying to achieve?

We wish to generate key metrics for customer applications, with the following requirements:

  • Metrics should be generated from Spans produced by OTEL auto-instrumentation.
  • Metrics are flexible w.r.t. schema and content.
  • Metrics should be generated from 100% of Spans, regardless of sampling decision.
  • Span sampling rules should still be respected w.r.t:
    • Exporting Spans from the Collector, and
    • Propagating sampling decisions to child spans.
  • Metrics generation should have minimal impact on the application being monitored (e.g. w.r.t. CPU/Memory/Network/etc.).

Currently we have considered two solutions for this problem that could be contributed back to the OTEL community, but are open to alternative solutions:

  1. Deferred Sampling
    1. The majority of this solution is described in this issue. In summary, the sampling decision would be made in the SDK, but all Spans would be sent to the Collector, where the decision would be executed.
    2. Once this is done, author a processor module for the Collector that produces the desired metrics from Spans (akin to, or an augmentation of, SpanMetricsProcessor).
  2. SDK Metrics Span Processing
    1. First, author a new aggregate Sampler in each OTEL SDK that takes the sampling decision made by a provided root sampler (i.e. parentbased_traceidratio) and converts all DROP sampling decisions to RECORD_ONLY sampling decisions.
    2. Then, author a new SpanProcessor in each OTEL SDK that extracts and exports metrics from spans.
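
To make option 2 concrete, here is a minimal sketch of the two pieces, written against stand-in types rather than any real OTEL SDK. `Decision` mirrors the three decisions in the trace spec; `trace_id_ratio`, `always_record`, `Span`, and `MetricsSpanProcessor` are hypothetical names invented for illustration, and the toy ratio check is an assumption, not the spec's algorithm.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    # Mirrors the three sampling decisions named in the OTEL trace spec.
    DROP = 0
    RECORD_ONLY = 1
    RECORD_AND_SAMPLE = 2


# Hypothetical stand-in for a root sampler: maps a trace id to a Decision.
RootSampler = Callable[[int], Decision]


def trace_id_ratio(ratio: float) -> RootSampler:
    """Toy ratio sampler: samples when the low 64 bits fall under a bound."""
    bound = int(ratio * (1 << 64))

    def sample(trace_id: int) -> Decision:
        low = trace_id & ((1 << 64) - 1)
        return Decision.RECORD_AND_SAMPLE if low < bound else Decision.DROP

    return sample


def always_record(root: RootSampler) -> RootSampler:
    """Aggregate sampler: keep the root's decision, but upgrade DROP to RECORD_ONLY."""

    def sample(trace_id: int) -> Decision:
        decision = root(trace_id)
        return Decision.RECORD_ONLY if decision is Decision.DROP else decision

    return sample


@dataclass
class Span:
    name: str
    duration_ms: float
    decision: Decision


@dataclass
class MetricsSpanProcessor:
    """Hypothetical processor: tracks (call count, total latency) per span name."""

    stats: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def on_end(self, span: Span) -> None:
        count, total = self.stats.get(span.name, (0, 0.0))
        self.stats[span.name] = (count + 1, total + span.duration_ms)


sampler = always_record(trace_id_ratio(0.5))
processor = MetricsSpanProcessor()
exported: List[Span] = []

for trace_id, latency in [(1, 12.0), ((1 << 64) - 1, 30.0)]:
    span = Span("GET /orders", latency, sampler(trace_id))
    processor.on_end(span)  # every recorded span feeds the metrics
    if span.decision is Decision.RECORD_AND_SAMPLE:
        exported.append(span)  # only sampled spans reach the exporter
```

Both spans contribute to the metrics, but only the one the root sampler chose is exported. Since RECORD_ONLY leaves the sampled flag unset, child spans should still see the original "not sampled" decision.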

What did you expect to see?

A means to produce flexible metrics from 100% of span data regardless of sampling decision, while still supporting arbitrary sampling rules.
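
As a rough illustration of that outcome under the Deferred Sampling option, the sketch below shows a Collector-side step that counts every received span and then applies the SDK's decision before export. It is not Collector code: `ReceivedSpan`, its `sdk_sampled` field, and `process_batch` are hypothetical, and how the decision would actually travel with the span is an open question.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class ReceivedSpan(NamedTuple):
    # Hypothetical wire shape: the SDK's decision travels with the span.
    name: str
    is_error: bool
    sdk_sampled: bool


def process_batch(
    spans: Iterable[ReceivedSpan],
) -> Tuple[Counter, List[ReceivedSpan]]:
    """Count every span, then execute the SDK's sampling decision."""
    metrics: Counter = Counter()
    to_export: List[ReceivedSpan] = []
    for span in spans:
        # Metrics see 100% of spans, keyed by name and status.
        metrics[(span.name, "error" if span.is_error else "ok")] += 1
        # Export only what the SDK decided to sample.
        if span.sdk_sampled:
            to_export.append(span)
    return metrics, to_export


batch = [
    ReceivedSpan("checkout", False, True),
    ReceivedSpan("checkout", False, False),
    ReceivedSpan("checkout", True, False),
]
metrics, to_export = process_batch(batch)
```

Here all three spans are counted, while only the first is forwarded to the tracing backend.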

This may require an OTEP, but we are starting with this issue to raise awareness and get feedback on the approach.

Additional context.

Related Issues:

thpierce added the spec:trace (Related to the specification/trace directory) label on Jan 25, 2023
@jmacd
Contributor

jmacd commented Jan 31, 2023

This was discussed in today's Spec SIG. One comment by @jsuereth stands out: Can we "do this" without Sampling? I see the answer being Yes, but we still run into problems with the Sampler API. As @thpierce notes, option 2 involves a new Sampler.

Here are some related issues:

It is difficult to compose probability samplers that could be used in a span-to-metrics pipeline:
#2179

It is difficult to construct spans whose links are not known, which could be solved with a new span state similar to the one discussed in this issue. #2918

@jsuereth
Contributor

My suggestion was actually that we could change the behavior of the Tracer such that:

  • The Sampler does its current job and responds with a sampling decision
  • If a "MetricSpanProcessor", an "AllSpanProcessor", or whatever hook we have is present, the Sampler decision is overridden such that:
    • Sampled traces remain sampled.
    • RECORD_ONLY traces remain recorded.
    • Dropped traces turn into "METRICS_ONLY" or some other denomination.

We could be more flexible with a level-based approach here, but I don't think we need to impact the existing sampler API; we only need to impact the "types of spans" we need Tracer to interact with.
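
A minimal sketch of that override, assuming it lives in the Tracer and leaves the Sampler untouched. `SpanMode`, `resolve_span_mode`, and the `has_metrics_hook` flag are hypothetical names for illustration; "METRICS_ONLY" is just the placeholder denomination from the list above.

```python
from enum import Enum


class SpanMode(Enum):
    DROP = "drop"
    METRICS_ONLY = "metrics_only"  # placeholder name, not part of any spec
    RECORD_ONLY = "record_only"
    RECORD_AND_SAMPLE = "record_and_sample"


def resolve_span_mode(sampler_decision: SpanMode, has_metrics_hook: bool) -> SpanMode:
    """Apply the Tracer-side override to the Sampler's decision.

    Only DROP changes, and only when a metrics hook is registered.
    """
    if sampler_decision is SpanMode.METRICS_ONLY:
        # In this sketch the Sampler API is unchanged, so it never returns this.
        raise ValueError("samplers do not return METRICS_ONLY")
    if has_metrics_hook and sampler_decision is SpanMode.DROP:
        return SpanMode.METRICS_ONLY
    return sampler_decision
```

With no hook registered, every decision passes through unchanged, so existing behavior would be preserved.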

@weyert

weyert commented Jan 31, 2023

This sounds interesting. I have been experimenting with the idea of letting traces generate metrics and having otel-collector send only a subset of the traces to the tracing backend. I was thinking of sending only a small percentage, plus all traces with errors, to the tracing backend.
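
Roughly, the keep/drop rule I have in mind looks like the sketch below. It is only an illustration: `SpanSummary` and `keep_trace` are made-up names, and the trace-id ratio check is a simple stand-in rather than how any existing collector processor decides.

```python
from typing import List, NamedTuple


class SpanSummary(NamedTuple):
    trace_id: int
    is_error: bool


def keep_trace(spans: List[SpanSummary], ratio: float) -> bool:
    """Keep a trace if any span errored; otherwise keep a fixed fraction by trace id."""
    if not spans:
        return False
    if any(s.is_error for s in spans):
        return True
    # Deterministic ratio check on the low 64 bits of the trace id.
    bound = int(ratio * (1 << 64))
    return (spans[0].trace_id & ((1 << 64) - 1)) < bound
```

Metrics would still be computed from every trace before this rule is applied.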
