
Prometheus Receiver - "up" metric not used as expected, and not recorded. #3089

Closed
gillg opened this issue May 4, 2021 · 4 comments
Labels: area:receiver, bug (Something isn't working)

Comments


gillg commented May 4, 2021

Describe the bug

Maybe I don't understand something, but the usual auto-generated metric "up" does not seem to be handled correctly by the receiver.
As I understand it, there is a special case when the scrape discovers a metric named "up" in a Prometheus metrics page:
https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/prometheusreceiver/internal/metricsbuilder.go#L96

It's also visible in the tests here:

"up": {Metric: "up", Type: textparse.MetricTypeCounter, Help: "", Unit: ""},

The receiver is completely dropping all internal Prometheus metrics! Why?

This could make sense if the metric emitter itself created the "up" metric, but that is not the normal Prometheus use case.
As explained here: https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
The scraper (here, our receiver) is supposed to generate the new metrics "up" and "scrape_xxxxx", applying labels from the scrape config.
We can also imagine another collector on top of the first collector (with a Prometheus exporter). In that case, some "up" metrics could already exist when the second receiver scrapes it, but it shouldn't drop those metrics. It should add a new "up" metric with new labels (because its config will necessarily be different), or, in the worst case, rename the original "up" metric to something like "exported_up" to follow the Prometheus convention.

For example, the honor_labels parameter of a Prometheus scrape config controls whether existing job and instance labels coming from the target are kept as-is, or renamed to exported_job and exported_instance before the new labels are applied.
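As an illustrative sketch only (the job name and the first collector's exporter port 8889 are assumptions on my side, not taken from a real setup), a second collector scraping the first one through its prometheus exporter could rely on that standard behaviour:

scrape_configs:
  - job_name: 'upstream-collector'           # hypothetical name for the first collector
    honor_labels: false                      # Prometheus default: "job"/"instance" labels coming
                                             # from the scraped target are renamed to
                                             # exported_job / exported_instance
    scrape_interval: 15s
    static_configs:
      - targets: ['first-collector:8889']    # assumed prometheus exporter endpoint of the first collector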

I can help make changes and test, but I haven't found exactly where the HTTP call to the target is made.

Steps to reproduce

If you add a scrape config with an invalid endpoint, you will see logs like Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1620121899828, "target_labels": "map[instance:172.17.0.1:xxxx job:xxxxxxx otel_job:xxxxxx]"}
But no metric is recorded to reflect it.

What did you expect to see?

We should record a metric "up" for each scrape.

Metric name: "up"
Metric type: UpDownCounter (basically a boolean)
Metric labels: instance + job + [relabel_configs]
Metric description: whether the scrape succeeded or not

If a metric with the same labels already exists, override it (the user should be able to create relabel_configs to deal with it).
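Concretely, with the config further down in this issue, I would expect the receiver to produce something like this on each scrape (exposition-format style; the values are only an example and depend on whether the target answered):

up{instance="otel-collector:8888", job="otel-collector", otel_job="otel-collector"} 1
up{instance="fake-domain:9045", job="fake-target", otel_job="fake-target"} 0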

What did you see instead?

There is no way to know whether your otel collector receiver was able to scrape a target or not.

What version did you use?
v0.25.0

What config did you use?

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

  prometheus:
    buffer_period: 234
    buffer_count: 45
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['otel-collector:8888']
          relabel_configs:
          # Trick because the otel collector does not expose the job
          - action: replace
            replacement: otel-collector
            target_label: otel_job
        - job_name: fake-target
          scrape_interval: 5s
          static_configs:
            - targets: ['fake-domain:9045']
          relabel_configs:
          # Trick because the otel collector does not expose the job
          - action: replace
            replacement: fake-target
            target_label: otel_job
gillg added the "bug" label on May 4, 2021

gillg commented May 4, 2021

In the meantime, generating scrape_duration_seconds, scrape_samples_post_metric_relabeling, scrape_samples_scraped, and scrape_series_added would be a good step toward a fully Prometheus-compatible scraper.
To discuss
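
For reference, a plain Prometheus server records this family of series for every target, next to "up" (the label values mirror my config above and the sample values are invented):

scrape_duration_seconds{instance="otel-collector:8888", job="otel-collector"} 0.004
scrape_samples_scraped{instance="otel-collector:8888", job="otel-collector"} 42
scrape_samples_post_metric_relabeling{instance="otel-collector:8888", job="otel-collector"} 42
scrape_series_added{instance="otel-collector:8888", job="otel-collector"} 0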


gillg commented May 4, 2021

In fact... I have the feeling that removing this one line could resolve the problem:

https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/prometheusreceiver/internal/metricsbuilder.go#L99

The scraping library already seems to add the metric? Am I wrong?


odeke-em commented May 5, 2021

@gillg thank you for filing this bug and for debugging! I have a PR in progress to address this, per #2918, which I am going to update shortly; hopefully this week we can merge it in and have the up metric exported.


gillg commented Jun 30, 2021

Closed by PR #3116

gillg closed this as completed on Jun 30, 2021