
[receiver/prometheus + processor/cumulativetodelta] Handling unknown start timestamps on cumulative streams #17190

Closed
danielgblanco opened this issue Dec 21, 2022 · 7 comments
Labels
bug (Something isn't working) · data:metrics (Metric related issues) · priority:p2 (Medium) · processor/cumulativetodelta (Cumulative To Delta processor) · Stale

Comments

@danielgblanco

danielgblanco commented Dec 21, 2022

Component(s)

processor/cumulativetodelta, receiver/prometheus

What happened?

Description

When using a prometheus receiver to scrape targets that don't expose timestamps on the metrics they export (e.g. another collector), the first scrape of a cumulative counter results in a data point with the same StartTimestamp and Timestamp and the current value of the counter. For example, a collector accepting 20 spans per minute that has been running for a while would report the following metric when the scraping collector restarts and performs its first scrape:

Metric #1
Descriptor:
     -> Name: otelcol_receiver_accepted_spans
     -> Description: Number of spans successfully pushed into the pipeline.
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> receiver: Str(otlp)
     -> service_instance_id: Str(2be3b72c-51b4-4e97-9f34-df7f462d9833)
     -> service_name: Str(otelcol-contrib)
     -> service_version: Str(0.64.1)
     -> transport: Str(grpc)
StartTimestamp: 2022-12-21 12:14:41.041 +0000 UTC
Timestamp: 2022-12-21 12:14:41.041 +0000 UTC
Value: 1256.000000

This is the intended behaviour if these metrics are exported with cumulative temporality and the exporter disregards the StartTimestamp (as other Prometheus exporters would). However, when combined with a cumulativetodelta processor and exporting OTLP data points with delta temporality, it seems to violate the advice in https://opentelemetry.io/docs/reference/specification/metrics/data-model/#cumulative-streams-handling-unknown-start-time.
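
As I read it, that guidance boils down to a simple check: if a cumulative point's StartTimestamp equals its Timestamp, the start time is unknown and the point should only establish a baseline rather than be turned into a delta. A minimal sketch of that check (plain Go with a stand-in struct, not the collector's pdata types):

```go
package main

import (
	"fmt"
	"time"
)

// dataPoint is a stand-in for a cumulative data point, for illustration only;
// the collector itself works on pdata types.
type dataPoint struct {
	startTimestamp time.Time
	timestamp      time.Time
	value          float64
}

// hasUnknownStart reports whether the point carries an unknown start time,
// i.e. its start and observation timestamps coincide, as on the first scrape
// of a target that exposes no timestamps on its metrics.
func hasUnknownStart(dp dataPoint) bool {
	return dp.startTimestamp.Equal(dp.timestamp)
}

func main() {
	now := time.Now()
	first := dataPoint{startTimestamp: now, timestamp: now, value: 1256}
	fmt.Println(hasUnknownStart(first)) // true: use as a baseline only, emit no delta
}
```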

Clearly, the prometheus receiver reporting a 0 value here wouldn't be a good solution, because the second data point would then have a different StartTimestamp and Timestamp and be treated as a true counter reset, and I don't believe it's the prometheus receiver's responsibility to keep state and maintain a "new" count for cumulative counters.

I believe, according to the spec, that it should be the cumulativetodelta processor's responsibility to handle the case where StartTimestamp equals Timestamp: treat that data point as a "start" value and drop it, while storing its value. Although, looking at the code, I'm unsure whether the DeltaValue here should be considered valid (I'm not sure in what cases the first value of a monotonic sum should be considered valid; I'd expect at least two values to be needed to calculate a delta):

And the assumption here that only non-cumulative values should not report the first value: does it take the case in question into account?
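
To make this concrete, here is a rough, self-contained sketch of the behaviour I'd expect (plain Go, not the processor's actual tracking code; the types, the string stream key and the 2960 baseline value are made up for illustration): the first observation of a stream with an unknown start time is stored and dropped, later observations produce deltas, and a decreasing value is treated as a counter reset.

```go
package main

import (
	"fmt"
	"time"
)

// cumulativePoint is a simplified cumulative data point; field names are illustrative.
type cumulativePoint struct {
	timestamp time.Time
	value     float64
}

// deltaPoint is the emitted delta data point.
type deltaPoint struct {
	start, end time.Time
	value      float64
}

// tracker keeps the last observed cumulative value per stream identity
// (metric name plus attributes, reduced to a string key for this sketch).
type tracker struct {
	last map[string]cumulativePoint
}

// convert returns the delta for a stream, or ok=false when the point must be
// dropped: the first observation of a stream with an unknown start time only
// establishes the baseline, per the data-model advice.
func (t *tracker) convert(stream string, p cumulativePoint) (d deltaPoint, ok bool) {
	prev, seen := t.last[stream]
	t.last[stream] = p
	if !seen {
		return deltaPoint{}, false
	}
	if p.value < prev.value {
		// Counter reset: the new cumulative value is itself the delta since the reset.
		return deltaPoint{start: prev.timestamp, end: p.timestamp, value: p.value}, true
	}
	return deltaPoint{start: prev.timestamp, end: p.timestamp, value: p.value - prev.value}, true
}

func main() {
	t := tracker{last: map[string]cumulativePoint{}}
	t0 := time.Date(2022, 12, 21, 13, 34, 2, 0, time.UTC)
	t1 := t0.Add(30 * time.Second)

	// First scrape (unknown start time): assumed cumulative value 2960 is stored and dropped.
	if _, ok := t.convert("otelcol_receiver_accepted_spans", cumulativePoint{timestamp: t0, value: 2960}); !ok {
		fmt.Println("first point dropped, baseline stored")
	}
	// Second scrape: 2970 - 2960 = 10 over the 30s interval, as in the Expected Result below.
	d, _ := t.convert("otelcol_receiver_accepted_spans", cumulativePoint{timestamp: t1, value: 2970})
	fmt.Printf("delta %.0f over [%s, %s]\n", d.value, d.start.Format(time.RFC3339), d.end.Format(time.RFC3339))
}
```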

Steps to Reproduce

Run a collector that scrapes another collector (with the scrape port controlled via the prometheus.io/scrape_port annotation) which is receiving approximately 20 spans per minute. To see the behaviour clearly, the collector scraping the Prometheus metrics with the config below must be restarted a while after the collector being scraped has started.

Expected Result

The following metric data point should be reported with Timestamp: 2022-12-21 13:34:32.866 +0000 UTC, and no data point should be reported with Timestamp: 2022-12-21 13:34:02.866 +0000 UTC:

Metric #4
Descriptor:
     -> Name: otelcol_receiver_accepted_spans
     -> Description: Number of spans successfully pushed into the pipeline.
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
     -> receiver: Str(otlp)
     -> service_instance_id: Str(2be3b72c-51b4-4e97-9f34-df7f462d9833)
     -> service_name: Str(otelcol-contrib)
     -> service_version: Str(0.64.1)
     -> transport: Str(grpc)
StartTimestamp: 2022-12-21 13:34:02.866 +0000 UTC
Timestamp: 2022-12-21 13:34:32.866 +0000 UTC
Value: 10.000000

Actual Result

The following data point is reported:

Metric #4
Descriptor:
     -> Name: otelcol_receiver_accepted_spans
     -> Description: Number of spans successfully pushed into the pipeline.
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
     -> receiver: Str(otlp)
     -> service_instance_id: Str(2be3b72c-51b4-4e97-9f34-df7f462d9833)
     -> service_name: Str(otelcol-contrib)
     -> service_version: Str(0.64.1)
     -> transport: Str(grpc)
StartTimestamp: 2022-12-21 13:34:02.866 +0000 UTC
Timestamp: 2022-12-21 13:34:32.866 +0000 UTC
Value: 2970.000000

Collector version

v0.67.0

Environment information

Environment

otel/opentelemetry-collector-contrib Docker image running on Kubernetes v1.25.3

OpenTelemetry Collector configuration

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: opentelemetry-collector
        kubernetes_sd_configs:
        - role: pod
        metric_relabel_configs:
        - action: keep
          regex: otelcol_receiver_accepted_spans
          source_labels:
          - __name__
        relabel_configs:
        - action: keep
          regex: opentelemetry-collector
          source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $$1:$$2
          source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape_port
          target_label: __address__
        scrape_interval: 30s
exporters:
  logging:
    verbosity: detailed
processors:
  cumulativetodelta: null
service:
  pipelines:
    metrics:
      exporters:
      - logging
      processors:
      - cumulativetodelta
      receivers:
      - prometheus

Log output

No response

Additional context

No response

@danielgblanco added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Dec 21, 2022
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@Aneurysm9 added the priority:p2 (Medium), data:metrics (Metric related issues) and processor/cumulativetodelta (Cumulative To Delta processor) labels and removed the needs triage (New item requiring triage) label on Dec 28, 2022
@github-actions
Contributor

Pinging code owners for processor/cumulativetodelta: @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Feb 27, 2023
@danielgblanco
Author

After #18224 and #18298 I'll test this out in the example above and will close this ticket as soon as possible if fixed.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Jul 26, 2023
@mx-psi
Member

mx-psi commented Jul 26, 2023

After #18224 and #18298 I'll test this out in the example above and will close this ticket as soon as possible if fixed.

Any updates @danielgblanco ?

@danielgblanco
Author

Thanks for the reminder @mx-psi. I've just tested the above steps with otel/opentelemetry-collector-contrib:0.81.0 and I'm seeing the expected behaviour when the prometheus receiver is used with the cumulativetodelta processor:

  1. On the first scrape, only Gauge type metrics are exported. These have StartTimestamp: 1970-01-01 00:00:00 +0000 UTC and the current timestamp as Timestamp.
  2. On the second scrape, all metrics are exported. Gauge type metrics still have StartTimestamp: 1970-01-01 00:00:00 +0000 UTC and the current timestamp as Timestamp (as expected), and Sum type metrics have a StartTimestamp and Timestamp corresponding to the interval considered for the deltas.

As such, I'll close this ticket and consider it fixed 🎉
