Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Metrics Processor APIs and SDK requirements #1116

Closed
as-polyakov opened this issue Oct 20, 2020 · 12 comments
Closed

Document Metrics Processor APIs and SDK requirements #1116

as-polyakov opened this issue Oct 20, 2020 · 12 comments
Assignees
Labels
area:sdk Related to the SDK priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory

Comments

@as-polyakov
Copy link

(moving from open-telemetry/opentelemetry-java#1810)
Is your feature request related to a problem? Please describe.
Similar to SpanProcessor, we need a hook for metrics which would allow users to intercept metrics lifecycle eventы with their own logic.

Describe the solution you'd like

public interface MetricsProcessor {

  /**
   * Called when bind() method is called. Allows to manipulate labels which this
   * instrument is bound to. Particular use case includes enriching lables and/or adding more labels
   * depending on the Context
   * @param ctx context of the operation
   * @param descriptor instrument descriptor
   * @param labels immutable labels. When processors are chained output labels of the previous one is passed as
   * an input to the next one. Last labels returned by a chain of processors are used for bind() operation.
   * @return labels to be used as an input to the next processor in chain or bind() operation if this is the last processor
   */
    Labels onLabelsBound(Context ctx, InstrumentDescriptor descriptor, Labels labels);

  /**
   *  Called when .record() method is called. Allows to manipulate recorded value. When chained input of the next
   *  call is the output of the previous call. Final output is recorded
   * @param ctx context of the operation
   * @param descriptor instrument descriptor
   * @param labels immutable labels. When processors are chained output labels of the previous one is passed as
   * @param value recorded value
   * @return value to be used as an input to the next processor in chain or record() operation if this is the last processor
   */

    long onLongMeasurement(Context ctx, InstrumentDescriptor descriptor, Labels labels, long value);

    double onDoubleMeasurement(Context ctx, InstrumentDescriptor descriptor, Labels labels, double value);
}

Describe alternatives you've considered
Main problem I am trying to address - is to be able to add certain labels to metrics depending on Context (i.e. dynamically). One alternative can be to just do that and make it a standalone feature aka "sticky labels". Had a simple PR for this here. However MetricsProcessor is a more generic solution which can address other use cases.
As for the implementation, instead of using immutable Labels we can pass ReadWriteLables object, this would make it follow the same patter as in SpanProcessor

Additional context
There's a problem with this approach - statically bound instruments will not be able to dynamically add labels during record operation. Since we only intercept onLabelsBound() in case one statically binds an instrument and then uses that to record multiple measurements with different contexts, only first bind call will be intercepted

Example PR
https://github.com/parallelstream/opentelemetry-java/pull/3/commits

@as-polyakov as-polyakov added the spec:metrics Related to the specification/metrics directory label Oct 20, 2020
@andrewhsu andrewhsu added area:sdk Related to the SDK priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Oct 20, 2020
@jmacd jmacd changed the title MetricsProcessor Document Metrics Processor APIs and SDK requirements Oct 20, 2020
@jmacd
Copy link
Contributor

jmacd commented Oct 20, 2020

Hi @parallelstream
Please take a look at the WIP SDK specification: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/metrics/sdk.md#processor

There is already a metrics Processor concept, although it is not completely documented. @jkwatson and @bogdandrutu have been working with me to resolve terminology here -- I am under the impression that there is already something similar to a Processor component in the Java SDK, but the API is not quite like the one you gave.

In the OTel-Go SDK, I believe your stated goal can be realized by implementing one of the APIs in sdk/export/metric: https://github.com/open-telemetry/opentelemetry-go/blob/master/sdk/export/metric/metric.go

I believe onLabelsBound could be achieved by implementing the Accumulator API, whereas onLongMeasurement would be an Aggregator API, also note there is a separate Processor API oriented around whole collection intervals, so all three of these APIs can be implemented to control certain behavior. I believe the overall goal is to move toward a configurable SDK, since setting up some of the behavior you're after requires modifying behavior of the Accumulator and new Aggregators. @bogdandrutu mentioned that there may be missing APIs for supporting distributed context, i.e., aggregation over correlation context keys. Let's discuss this here.

As a requirement for the OpenTelemetry specification, this issue should remain open until we clarify:

(1) which exporter APIs are available for processing metrics
(2) which standard processor components and functionality are available in the default SDKs.

@andrewhsu andrewhsu added this to Required for GA or Allowed for GA. Add label required-for-ga or allowed-for-ga to issues when moving to this column. in GA Spec Burndown Oct 20, 2020
@as-polyakov
Copy link
Author

yeah, re distributed context - this is spot on what we're doing. We store common labels in the Baggage and want a standard way of writing a custom logic to extract it and attach as KV tags to metrics and spans produced. With Spans we do it in SpanProcessor, so smith similar for metrics was envisaged.

@hstan
Copy link

hstan commented Oct 21, 2020

👋 @jmacd
I've been working with @parallelstream on this, I just drafted a PR in OTel-Go SDK open-telemetry/opentelemetry-go#1271 to address this discussion. Could you please take a look and suggest if it's an appropriate approach? Thanks.

@jmacd
Copy link
Contributor

jmacd commented Oct 27, 2020

I'm interested in working on terminology a bit. We've been discussing an existing metrics Processor that is similar in ways to the same term in Tracing. The thing you're introducing in open-telemetry/opentelemetry-go#1271 is something we need, but I was initially confused because of the term.

There has been a bit of discussion about sampling in the metrics API, as part of a discussion about raw metric values and exemplars.

In the closed draft PR linked above I modified the metric Accumulator to allow for reducing label dimensions before building label sets. The connection to sampling is that if you want to sample raw metric events at full dimensionality but not accumulate them at full dimensionality, you need to capture these dimensions before they're accumulated. I'd like to find a better term for processing that happens during the early stages of the accumulator as in open-telemetry/opentelemetry-go#1023.

Your application here injects distributed context label values, also during the early stages of the accumulator. I see these applications as very similar. How does the term "AccumulatorProcessor" sound? It's a lot of letters but helps distinguish from the current processor concept, which is post-accumulator.

@jmacd
Copy link
Contributor

jmacd commented Nov 12, 2020

The linked-to PR in Go looks great. It's a short and sweet interface for enriching metric events with labels from the context. My only question is whether we should adopt the term "Processor" for this new interface and return the existing Processor component to using a term such as "Batcher" or "Integrator", both of which we have used in the past. I'm looking for others' input.

@as-polyakov
Copy link
Author

as-polyakov commented Nov 24, 2020

Hi @jmacd thanks for pointers. After reading the above I see the confusion and generally agree with your assessment - we need to get terminology right.
Let's step back for a while from baggage enricher Hanshuo illustrated and look at the problem broader. What we need to do is to have an extension point where we can intercept and potentially modify labels for recordings at a very early stage of the export process - before they reach Accumulator. This is what I proposed in the very beginning but naming there was confusing as I didn't know that Processor is smith we already reserve for a much later stage of the export pipeline.

AccumulatorProcessor seems reasonable to me. It needs to expose same methods I outlined and needs to be called right after the record() method itself, before metric gets into the accumulator. So if I just modify the name of the interface, rest stays intact:

public interface AccumulatorProcessor {


    Labels onLabelsBound(Context ctx, InstrumentDescriptor descriptor, Labels labels);


    long onLongMeasurement(Context ctx, InstrumentDescriptor descriptor, Labels labels, long value);

    double onDoubleMeasurement(Context ctx, InstrumentDescriptor descriptor, Labels labels, double value);
}

If we have this extension point, then baggage-based labels population which Hanshuo shows becomes just an application-level implementation of the AccumulatorProcessor and might not even live in OTel repo, it can be smith we keep for ourselves (we can consider making it part of standard OTel though if people find it useful).

Implementation wise I agree it needs to go before building the label set for an instrument, I thought bind() method of AbtractSynchrousInstrument can be a good place per my PR

thoughts?

@jmacd
Copy link
Contributor

jmacd commented Dec 4, 2020

@parallelstream I'm very supportive of this idea. As far as terminology, @bogdandrutu implied he thinks we should use the name "Processor" for this API you are describing--which sits between the API and the Accumulator, and we should adopt another term for what is today "Processor" which sits between the Accumulator and Exporter (e.g., #1198). Prior terms we've used for the component between Accumulator and Exporter: "Batcher", "Integrator".

As far as the interface itself (independent of name), I agree with the following:

baggage-based labels population which Hanshuo shows becomes just an application-level implementation of the AccumulatorProcessor

I want to challenge you to find more than one application of the API as a starting point, and I hinted at one above. See the note here in a draft PR, which I had developed only as a proof-of-concept for label reduction. Label reduction can be performed in either of these locations: between the API and the Accumulator, or between the Accumulator and the Exporter, i.e., before or after aggregation of an interval.

OTLP has support for RawValue exemplars, and at the time this was added we also discussed support for a sample_count field that would convey the effective sampling rate for metric exemplars. (In Statsd, this is expressed as @sample_rate, the inverse of sample_count.) The idea expressed in the code linked above is that we can sample metric events at full dimensionality while reducing dimensionality before the Accumulator. Done correctly, the stream of metric exemplars approximates the full-dimensionality timeseries, and I'm suggesting that it can be calculated in an AccumulatorProcessor. The challenge is to (somehow) output the selected set of raw values, which requires a different export interface. Does this interest you?

@jmacd
Copy link
Contributor

jmacd commented Dec 4, 2020

@bogdandrutu I have a related question. In the past you have mentioned that because of the cost of extracting correlations from baggage, that you have seen the work of processing labels out of the critical path into a queue of metric events. Apparently this is being proposed in OTel-Java (@jkwatson, @carlosalberto), and I wonder if we should be talking about any notions of metric event queuing at the specification level.

If you follow the discussion above (#1116 (comment)), it shows that we can replace a stream of metric-API events with a stream of sampled metric-API events using a hypothetical AccumulatorProcessor API. Should we try to formalize a kind of metric-event-stream Processor that might have other applications, such as to offload expensive metric operations into the background? This could generalize even further than the AccumulatorProcessor interface, maybe?

@jkwatson
Copy link
Contributor

jkwatson commented Dec 4, 2020

FYI, I don't know of any proposal to do this in otel-java. Can you point me at an issue or conversation I might have missed?

@as-polyakov
Copy link
Author

@parallelstream I'm very supportive of this idea. As far as terminology, @bogdandrutu implied he thinks we should use the name "Processor" for this API you are describing--which sits between the API and the Accumulator, and we should adopt another term for what is today "Processor" which sits between the Accumulator and Exporter (e.g., #1198). Prior terms we've used for the component between Accumulator and Exporter: "Batcher", "Integrator".

This makes ton of sense to me. I would be happy with either, but as a user "Processor" is naturally closer and understandable.

I want to challenge you to find more than one application of the API as a starting point, and I hinted at one above.

Sure thing. I can think of a few:

  • Common tags is a widely used feature which we also need (https://micrometer.io/docs/concepts#_common_tags). Even though one might argue we can use Resource for that, not everything suits Resource well plus sometimes it is great to have it dynamically defined
  • tags baggage enrichment @hstan is talking about
  • various tags translation/enrichment when using third party instrumentations. Imagine you pull an SQS client which comes instrumented with OT and you want certain tags to be added and/or modified (an example could be adding a queue specific metadata). Changing the original library is not an option while having this processor interface solves all of these problems.

The challenge is to (somehow) output the selected set of raw values, which requires a different export interface. Does this interest you?

Not at this stage. We are adopting a different approach - we export all data as raw from the SDK and do cardinality reduction as well as roll ups and throttling in a streaming processing pipeline

@as-polyakov
Copy link
Author

@jmacd do you want me to go ahead and create a PR for the changes proposed along the lines of an example PR here https://github.com/parallelstream/opentelemetry-java/pull/3/commits? Do we have enough understanding now about what we are trying to achieve?

@jmacd jmacd removed their assignment Feb 4, 2021
@reyang reyang self-assigned this Mar 3, 2021
@reyang reyang removed this from Required/Allowed for GA, todo. in GA Spec Burndown Mar 10, 2021
@reyang reyang added this to To do in Spec - Metrics API/SDK Experimental Release via automation Mar 10, 2021
@reyang reyang moved this from To do to In progress in Spec - Metrics API/SDK Experimental Release Sep 2, 2021
@reyang
Copy link
Member

reyang commented Sep 14, 2021

The Metrics SDK has changed a lot and the processor that takes raw measurements will not be part of the scope for the initial stable release.

@reyang reyang closed this as completed Sep 14, 2021
Spec - Metrics API/SDK Experimental Release automation moved this from In progress to Done Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory
Projects
No open projects
Development

No branches or pull requests

6 participants