Scraper hits same target URL multiple times in each interval #3768

Closed
jkohen opened this Issue Jan 30, 2018 · 4 comments

jkohen (Contributor) commented Jan 30, 2018

This happens because target deduplication considers both the target URL and the target labels. I have a Kubernetes SD configuration that renames __meta_kubernetes_container_name to container_name, so the scraper treats multiple targets pointing at the same URL as distinct, scrapes them all, gets duplicate data back, and exports it as different time series.
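
The rename comes from a relabel rule along these lines (a minimal sketch; my actual rule may differ slightly):

        - source_labels: [__meta_kubernetes_container_name]
          action: replace
          target_label: container_name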

This happens when multiple containers end up sharing a single scrape address, in at least two situations (the second is sketched after the list):

  1. containers with no ports specified, in which case Prometheus adds the default port number (e.g. :80);
  2. the prometheus.io/port annotation in use, which overrides the port of all containers with that value.
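
As a sketch of the second situation (a purely hypothetical pod spec, just to illustrate): with the annotation in use, both containers below become targets with the same __address__ but different container_name labels, so they aren't deduplicated.

        apiVersion: v1
        kind: Pod
        metadata:
          name: example
          annotations:
            prometheus.io/port: "9102"
        spec:
          containers:
            - name: app
              image: example/app
            - name: sidecar
              image: example/sidecar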

I've verified this by changing Target.String() to print the labels in addition to the URL.

I could be missing something, but I can't come up with a good reason why the labels contribute to the uniqueness of the target: the labels aren't part of the scrape request, and I can't see how a single Prometheus process could see two different targets using the same URL. From some archeology in the code, the logic has been around for ~2 years, and the unit tests only populate labels for the components of the URL, so I can see how this case would have been missed.

I'm happy to take a stab at a fix. Can the maintainers comment on this approach?

  • Change Target.hash() to only consider the URL.
  • Update the test target in the unit tests to use the URL object instead of labels.
brian-brazil (Member) commented Jan 30, 2018

It's perfectly valid to have differently labelled targets all hitting the same URL, as otherwise you'd find your targets merged with other targets and some of your time series missing. Labels are the primary identity of a target, and affect what happens to the data that is scraped before it is ingested.
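
For example (a purely hypothetical static config, just to illustrate): the same address scraped under two different label sets must stay two separate targets, otherwise one set of series would silently disappear.

        scrape_configs:
          - job_name: example
            static_configs:
              - targets: ['10.0.0.1:9100']
                labels:
                  tenant: a
              - targets: ['10.0.0.1:9100']
                labels:
                  tenant: b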

It sounds like you need to adjust your relabelling.

jkohen (Contributor, Author) commented Jan 30, 2018

Thanks, Brian. I see your point. I do need the container_name, so I can't just drop it. I expected that the port annotation would behave as a selector, not as an override for the label value. What I want doesn't seem possible with the relabel language, but that's a different issue, and I agree with you that this isn't a bug as written (even though I'm not sure there's a solution for my problem).

For future reference, I'm using this rule from the example prometheus-service.yml:

        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
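
If I read the regex right, this takes an __address__ such as 10.0.0.1:80 plus an annotation value of 9102 and rewrites the address to 10.0.0.1:9102, so every container in the pod ends up with the same rewritten address.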

jkohen closed this Jan 30, 2018

brian-brazil (Member) commented Jan 31, 2018

That doesn't sound quite right; I'd suggest posting your full scrape config to the users list.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
