Introduce means of producing metrics from all Spans regardless of sampling decision. #3145

thpierce · 2023-01-25T17:58:01Z

What are you trying to achieve?

We wish to generate key metrics for customer applications, with the following requirements:

- Metrics should be generated from Spans produced by OTEL auto-instrumentation.
- Metrics should be flexible w.r.t. schema and content.
- Metrics should be generated from 100% of Spans, regardless of sampling decision.
- Span sampling rules should still be respected w.r.t.:
  - exporting Spans from the Collector, and
  - propagating sampling decisions to child spans.
- Metrics generation should have minimal impact on the application being monitored (e.g., CPU, memory, network).

So far we have considered two solutions to this problem that could be contributed back to the OTEL community, but we are open to alternatives:

Deferred Sampling
1. Most of this solution is described in the linked issue (see Related Issues below). In summary, the sampling decision would still be made in the SDK, but all Spans would be sent to the Collector, where the decision would be enforced.
2. Once this is done, author a Collector processor module that produces the desired metrics from Spans (akin to, or an augmentation of, SpanMetricsProcessor).
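To make the Deferred Sampling flow concrete, here is a minimal, stdlib-only sketch of the Collector-side stage: every incoming span updates the metrics, and only then is the SDK's sampling decision enforced. `SpanRecord`, `DeferredSamplingPipeline`, and the chosen metrics (count, error count, total latency per span name) are hypothetical illustrations, not the SpanMetricsProcessor API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpanRecord:
    # Hypothetical, simplified span as received by the Collector.
    name: str
    duration_ms: float
    is_error: bool
    sampled: bool  # decided in the SDK, but not yet enforced


class DeferredSamplingPipeline:
    """Hypothetical Collector stage: derive metrics first, then enforce sampling."""

    def __init__(self):
        self.request_count = defaultdict(int)
        self.error_count = defaultdict(int)
        self.total_latency_ms = defaultdict(float)
        self.exported = []

    def process(self, spans):
        for span in spans:
            # Every span contributes to metrics, regardless of its sampled flag.
            self.request_count[span.name] += 1
            self.total_latency_ms[span.name] += span.duration_ms
            if span.is_error:
                self.error_count[span.name] += 1
            # Only spans the SDK marked as sampled continue to the trace exporter.
            if span.sampled:
                self.exported.append(span)


pipeline = DeferredSamplingPipeline()
pipeline.process([
    SpanRecord("GET /cart", 12.0, False, sampled=True),
    SpanRecord("GET /cart", 30.0, True, sampled=False),
    SpanRecord("GET /cart", 18.0, False, sampled=False),
])
print(pipeline.request_count["GET /cart"], len(pipeline.exported))
```

The trade-off this illustrates is network cost: unsampled spans still cross the wire from the SDK to the Collector, which is what the minimal-impact requirement above has to weigh.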
SDK Metrics Span Processing
1. First, author a new aggregate Sampler in each OTEL SDK that takes the decision made by a provided root sampler (e.g., parentbased_traceidratio) and converts every DROP decision into a RECORD_ONLY decision.
2. Then, author a new SpanProcessor in each OTEL SDK that extracts and exports metrics from spans.
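A minimal sketch of the SDK-side option, using stdlib-only stand-ins rather than a real SDK. The `Decision` values mirror the three sampling decisions in the OpenTelemetry spec; `TraceIdRatioSampler`, `AlwaysRecordSampler`, `FinishedSpan`, and `MetricsSpanProcessor` are hypothetical names. The ratio sampler here is simplified (it ignores parent context and is not the spec's exact algorithm). Because RECORD_ONLY spans are recording but do not carry the sampled flag, they reach the metrics processor without being exported, and the unsampled flag still propagates to children.

```python
import enum
from dataclasses import dataclass, field


class Decision(enum.Enum):
    # Mirrors the three sampling decisions named in the OpenTelemetry spec.
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


class TraceIdRatioSampler:
    """Simplified, hypothetical stand-in for a ratio-based root sampler."""

    def __init__(self, ratio):
        self.bound = int(ratio * (1 << 64))

    def should_sample(self, trace_id):
        # Compare the low 64 bits of the trace ID against a threshold.
        low = trace_id & ((1 << 64) - 1)
        return Decision.RECORD_AND_SAMPLE if low < self.bound else Decision.DROP


class AlwaysRecordSampler:
    """Hypothetical wrapper: turns DROP into RECORD_ONLY, keeps other decisions."""

    def __init__(self, root):
        self.root = root

    def should_sample(self, trace_id):
        decision = self.root.should_sample(trace_id)
        return Decision.RECORD_ONLY if decision is Decision.DROP else decision


@dataclass
class FinishedSpan:
    name: str
    sampled: bool


@dataclass
class MetricsSpanProcessor:
    """Hypothetical processor: counts every recorded span by name."""
    counts: dict = field(default_factory=dict)

    def on_end(self, span):
        self.counts[span.name] = self.counts.get(span.name, 0) + 1


sampler = AlwaysRecordSampler(TraceIdRatioSampler(0.25))
metrics = MetricsSpanProcessor()
exported = []
for trace_id in range(0, 1 << 64, 1 << 62):  # four evenly spaced trace IDs
    decision = sampler.should_sample(trace_id)
    span = FinishedSpan("checkout", decision is Decision.RECORD_AND_SAMPLE)
    metrics.on_end(span)  # every recorded span feeds metrics
    if span.sampled:
        exported.append(span)  # only sampled spans reach the trace exporter

print(metrics.counts, len(exported))
```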

What did you expect to see?

A means to produce flexible metrics from 100% of span data regardless of sampling decision, while still supporting arbitrary sampling rules.

This may require an OTEP, but I am starting with this issue to raise awareness and get feedback on the approach.

Additional context.

Related Issues:

Deferred Sampling:
- Let SDKs export all the spans regardless of their sampled flag #2986
- Proposal: Span Stats processor opentelemetry-collector-contrib#403
SDK Metrics Span Processing:
- Some of the discussion in Making tracing SDK metrics aware #381

jmacd · 2023-01-31T16:49:55Z

This was discussed in today's Spec SIG. One comment by @jsuereth stands out: can we "do this" without sampling? I think the answer is yes, but we still run into problems with the Sampler API. As @thpierce notes, option 2 involves a new Sampler.

Here are some related issues:

- It is difficult to compose probability samplers that could be used in a span-to-metrics pipeline: #2179
- It is difficult to construct spans whose links are not known; this could be solved with a new span state similar to the one discussed in this issue: #2918

jsuereth · 2023-01-31T17:03:50Z

My suggestion was actually that we could change the behavior of the Tracer such that:

1. The Sampler does its current job and responds with a sampling decision.
2. If a "MetricSpanProcessor", an "AllSpanProcessor", or whatever hook we settle on is present, the Sampler decision is overridden such that:
   - sampled traces remain sampled,
   - RECORD_ONLY traces remain recorded, and
   - dropped traces turn into "METRICS_ONLY" or some other denomination.

We could be more flexible with a level-based approach here, but I don't think we need to change the existing Sampler API; we only need to change the "types of spans" the Tracer interacts with.
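A minimal sketch of the Tracer-side override described above, assuming the Sampler API is left as-is and the Tracer alone maps its decision to a span state. `SpanState`, `METRICS_ONLY`, and `resolve_span_state` are hypothetical names; the `has_metrics_hook` flag stands in for whatever "MetricSpanProcessor"/"AllSpanProcessor" hook is registered.

```python
import enum


class Decision(enum.Enum):
    # Existing sampler decisions; the Sampler API itself is unchanged.
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


class SpanState(enum.Enum):
    # Hypothetical span states the Tracer would work with.
    NON_RECORDING = "NON_RECORDING"
    METRICS_ONLY = "METRICS_ONLY"
    RECORDING = "RECORDING"
    SAMPLED = "SAMPLED"


def resolve_span_state(decision, has_metrics_hook):
    """Map a sampler decision to a span state inside the Tracer (hypothetical)."""
    if decision is Decision.RECORD_AND_SAMPLE:
        return SpanState.SAMPLED  # sampled traces remain sampled
    if decision is Decision.RECORD_ONLY:
        return SpanState.RECORDING  # RECORD_ONLY traces remain recorded
    # DROP: only materialize a span if a metrics hook wants to observe it.
    return SpanState.METRICS_ONLY if has_metrics_hook else SpanState.NON_RECORDING


for d in Decision:
    print(d.name, resolve_span_state(d, True).name, resolve_span_state(d, False).name)
```

The design point is that without a registered hook, behavior is identical to today, so SDKs that never install such a processor pay no extra cost.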

weyert · 2023-01-31T22:41:50Z

This sounds interesting. I have been experimenting with the idea of generating metrics from traces and sending only a subset of the traces from the otel-collector to the tracing backend. I was thinking of sending only a small percentage of traces, plus all traces with errors, to the tracing backend.
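A minimal sketch of that policy, assuming whole traces are available at decision time (as with the Collector's tail sampling processor, which offers status-code and probabilistic policies). Here `keep_trace`, `process`, and the `(trace_id, span_names, has_error)` tuple shape are hypothetical: metrics count spans from every trace, while only error traces plus roughly `percentage`% of the rest are kept for the backend.

```python
import hashlib
from collections import Counter


def keep_trace(trace_id, has_error, percentage):
    """Hypothetical rule: keep every error trace plus ~percentage% of the rest."""
    if has_error:
        return True
    # Hash the trace ID so every component makes the same decision for a trace.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percentage * 100


def process(traces, percentage=5.0):
    """traces: list of (trace_id, span_names, has_error). Returns (counts, kept)."""
    span_counts = Counter()
    kept = []
    for trace_id, span_names, has_error in traces:
        span_counts.update(span_names)  # metrics see 100% of traces
        if keep_trace(trace_id, has_error, percentage):
            kept.append(trace_id)  # only a subset reaches the tracing backend
    return span_counts, kept


traces = [("a1", ["GET /", "db.query"], False), ("b2", ["GET /"], True)]
print(process(traces, percentage=0.0))
```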

thpierce added the spec:trace label (Related to the specification/trace directory) Jan 25, 2023

github-actions bot assigned jmacd Jan 25, 2023

thpierce mentioned this issue Mar 16, 2023 in open-telemetry/opentelemetry-java-contrib#789: Add new Sampler and SpanProcessor to allow for generating metrics from 100% of spans without impacting sampling (Closed)

thpierce mentioned this issue Mar 30, 2023 in open-telemetry/opentelemetry-java-contrib#802: Add new components to allow for generating metrics from 100% of spans without impacting sampling (Merged)

jmacd mentioned this issue Apr 24, 2024 in #4012: Project Tracking: Sampling (Open)

kalyanaj mentioned this issue May 10, 2024 in #4044: Sampler API V2: What, Why, and How? (Open)
