[Prometheus Receiver] Histogram and Summary metric count value outputs minInt64 for non-numerical input values #6376

Closed
PaurushGarg opened this issue Nov 18, 2021 · 2 comments · Fixed by #7043
Labels
comp:prometheus Prometheus related issues comp: receiver Receiver

Comments

@PaurushGarg
Member

PaurushGarg commented Nov 18, 2021

Describe the bug

We want to ensure OpenMetrics / Prometheus compatibility in the OpenTelemetry Collector. We have been building compatibility tests to verify the OpenMetrics spec is fully supported on both the OpenTelemetry Collector Prometheus receiver and PRW exporter as well as in Prometheus itself.

The Prometheus receiver should assign a metric the staleNaN value if the metric is missing from the current scrape but was present in the previous one. However, the metric builder currently does not assign staleNaN to the histogram and summary values that the Prometheus scrape loop passes for failed scrapes.
Histogram and Summary count values are int64, and casting non-numerical float64 values (staleNaN, normalNaN, and +-Inf) to int64 assigns minInt64 (-9223372036854775808) to the count.

Steps to reproduce

  • Run func TestEndToEnd(t *testing.T)
  • The validate loop is currently skipped for these tests. Re-enable it by removing or commenting out the following lines in func testEndToEnd(...) (lines 1442-1445):
if true {
   t.Log(`Skipping the "up" metric checks as they seem to be spuriously failing after staleness marker insertions`)
   return
}

  • Note: the test fails in getValidScrapes due to staleness. Inspect the metrics and find the first failed scrape. The Histogram and Summary count values in the failed scrapes are minInt64 (-9223372036854775808) instead of non-numerical values (staleNaN, normalNaN, and +-Inf).

What did you see instead?

Scraping endpoints that expose histogram/summary metrics, with a failed scrape in between, produces the following graph in the Prometheus web UI. The histogram/summary count value is plotted as the peak in the graph below:
[Screenshot: Screen Shot 2021-11-01 at 9 40 08 PM]

However, if the same data is passed directly to the Prometheus server, the Prometheus web UI produces the following graph:
[Screenshot: Screen Shot 2021-11-03 at 7 29 30 AM]

Possible Solution

Since the count value (int64) for a histogram/summary cannot be assigned float64 values, one possible solution is to use the OTLP format directly in the Prometheus receiver metric builder and set the data point flag (MetricDataPointFlagNoRecordedValue) on the metric as the staleness marker. See linked issue: #6400

What version did you use?

Collector-Contrib: v0.37.1

Additional context
Related to open-telemetry/wg-prometheus#57
Linked Issue: #6400 #6000 #6087

cc @alolita @Aneurysm9

@bogdandrutu
Member

@PaurushGarg I agree, the prometheus receiver should not set that "NaN" value but instead should use the OTLP native no-value present for that (for all metrics not just for histograms).

@PaurushGarg
Member Author

> @PaurushGarg I agree, the prometheus receiver should not set that "NaN" value but instead should use the OTLP native no-value present for that (for all metrics not just for histograms).

@bogdandrutu thanks. Is there a tracking issue for Prometheus Receiver to directly use OTLP format in metric builder? If not, do we need to create one?
