
Prometheus Receiver - "up" metric not used as expected, and not recorded. #3089

Closed
gillg opened this issue May 4, 2021 · 4 comments
Labels: area:receiver, bug (Something isn't working)

Comments


gillg commented May 4, 2021

Describe the bug

Maybe I don't understand something, but the usual auto-generated metric "up" does not seem to be handled correctly by the receiver.
As I understand it, there is a special case when the scrape discovers a metric named "up" in a Prometheus metrics page:
https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/prometheusreceiver/internal/metricsbuilder.go#L96

It's also visible in the tests here:

"up": {Metric: "up", Type: textparse.MetricTypeCounter, Help: "", Unit: ""},

The receiver is completely dropping all internal Prometheus metrics! Why?

This could make sense if the metric emitter itself created the "up" metric, but that is not the normal Prometheus use case.
As explained here: https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
The scraper (here, our receiver) is supposed to generate the new metrics "up" and "scrape_xxxxx", applying labels from the scrape config.
We can also imagine another collector on top of the first collector (with a Prometheus exporter). In that case, some "up" metrics could already exist when the second receiver scrapes it, but it shouldn't drop those metrics. It should add a new "up" metric with new labels (because its config will necessarily be different), or, in the worst case, rename the original "up" metric to something like "exported_up" to follow the Prometheus convention.

For example, the honor_labels parameter of a Prometheus scrape config controls whether existing job and instance labels coming from the target are kept as-is, or renamed to exported_job and exported_instance before the new labels are applied.
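As an illustrative sketch only (the job name and the first collector's exporter port 8889 are assumptions on my side, not taken from a real setup), a second collector scraping the first one through its prometheus exporter could rely on that standard behaviour:

scrape_configs:
  - job_name: 'upstream-collector'           # hypothetical name for the first collector
    honor_labels: false                      # Prometheus default: "job"/"instance" labels coming
                                             # from the scraped target are renamed to
                                             # exported_job / exported_instance
    scrape_interval: 15s
    static_configs:
      - targets: ['first-collector:8889']    # assumed prometheus exporter endpoint of the first collector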

I can help make changes and test, but I haven't found exactly where the HTTP call to the target is made.

Steps to reproduce

If you add a scrape config with an invalid endpoint, you will see logs like Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1620121899828, "target_labels": "map[instance:172.17.0.1:xxxx job:xxxxxxx otel_job:xxxxxx]"}
But no metric is recorded to reflect it.

What did you expect to see?

We should record a metric "up" for each scrape.

Metric name: "up"
Metric type: UpDownCounter (basically a boolean)
Metric labels: instance + job + [relabel_configs]
Metric description: whether the scrape succeeded or not

If a metric with the same labels already exists, override it (the user should be able to create relabel_configs to deal with it).
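Concretely, with the config further down in this issue, I would expect the receiver to produce something like this on each scrape (exposition-format style; the values are only an example and depend on whether the target answered):

up{instance="otel-collector:8888", job="otel-collector", otel_job="otel-collector"} 1
up{instance="fake-domain:9045", job="fake-target", otel_job="fake-target"} 0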

What did you see instead?

There is no way to know whether your otel collector receiver was able to scrape a target or not.

What version did you use?
v0.25.0

What config did you use?

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

  prometheus:
    buffer_period: 234
    buffer_count: 45
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['otel-collector:8888']
          relabel_configs:
          # Trick because the otel collector does not expose the job
          - action: replace
            replacement: otel-collector
            target_label: otel_job
        - job_name: fake-target
          scrape_interval: 5s
          static_configs:
            - targets: ['fake-domain:9045']
          relabel_configs:
          # Trick because the otel collector does not expose the job
          - action: replace
            replacement: fake-target
            target_label: otel_job
gillg added the "bug" label on May 4, 2021

gillg commented May 4, 2021

In the meantime, generating scrape_duration_seconds, scrape_samples_post_metric_relabeling, scrape_samples_scraped, and scrape_series_added would be a good step toward a fully Prometheus-compatible scraper.
To discuss
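
For reference, a plain Prometheus server records this family of series for every target, next to "up" (the label values mirror my config above and the sample values are invented):

scrape_duration_seconds{instance="otel-collector:8888", job="otel-collector"} 0.004
scrape_samples_scraped{instance="otel-collector:8888", job="otel-collector"} 42
scrape_samples_post_metric_relabeling{instance="otel-collector:8888", job="otel-collector"} 42
scrape_series_added{instance="otel-collector:8888", job="otel-collector"} 0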


gillg commented May 4, 2021

In fact... I have the feeling that removing this one line could resolve the problem:

https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/prometheusreceiver/internal/metricsbuilder.go#L99

The scraping library already seems to add the metric? Am I wrong?


odeke-em commented May 5, 2021

@gillg thank you for filing this bug and for debugging! I have a PR in progress to address this, per #2918, which I am going to update shortly; hopefully this week we can merge it in and have the up metric exported.


gillg commented Jun 30, 2021

Closed by PR #3116

gillg closed this as completed on Jun 30, 2021