
Recycled IP causes wrong metrics #2262

Closed
NoumanSaleem opened this Issue Dec 7, 2016 · 5 comments


NoumanSaleem commented Dec 7, 2016

What did you do?
During a load test, new instances launched and were assigned IPs that Prometheus had previously seen and scraped.

Our scrape configuration uses Consul service discovery and sets the name label from __meta_consul_service. The issue is that the name label was not updated to reflect the new service running at that scrape target, which caused inconsistencies in our metrics.

What did you expect to see?
I'm not sure; perhaps the name label should have been updated? I'm not 100% positive this issue resides with Prometheus; possibly Consul was at fault for not updating the service. I just wanted to open this issue in case others have encountered it.

Environment

  • System information:
    Linux 3.13.0-93-generic x86_64
  • Prometheus version:
    prometheus, version 1.1.1 (branch: master, revision: 24db241)
    build user: root@90d3f69e2d67
    build date: 20160907-09:42:10
    go version: go1.6.3
  • Prometheus configuration file:
global:
  scrape_interval:     1m
  evaluation_interval: 1m

scrape_configs:
  - job_name: 'service'
    consul_sd_configs:
    - server: 172.17.42.1:8500
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ',((api|data)-service),'
        action: keep
      - source_labels: [__meta_consul_service_address, __meta_consul_service_port]
        separator: ':'
        target_label: __address__
      - source_labels: [__meta_consul_service]
        target_label: name
      - source_labels: [__meta_consul_tags]
        target_label: tags
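
For illustration only, a sketch of how a target produced by the relabel_configs above might appear in a query; the job name comes from the config, while the address, port, and service name are hypothetical examples, not values taken from this report:

# Hypothetical target after relabeling: __address__ becomes the instance
# label, and "name" is copied from __meta_consul_service.
up{job="service", instance="10.0.0.5:8080", name="api-service"}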

brian-brazil commented Dec 7, 2016

Did this all happen in the space of 5 minutes? If so, you're hitting staleness; wait a bit.


NoumanSaleem commented Dec 7, 2016

@brian-brazil it sounds like Prometheus handles this, then? Good to hear. I wish I had collected more info for this issue, including monitoring it to see whether it resolved itself.

Does that mean the metrics for those 5 minutes will be lost? Is it preferable to include the name label in the exported metrics themselves instead of setting it in the scrape config?


brian-brazil commented Dec 7, 2016

Does that mean the metrics for those 5 minutes will be lost?

No, though certain queries will return both series for a while.
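
To illustrate the staleness behaviour mentioned above, a hedged PromQL sketch (the address, port, and choice of metric are hypothetical examples): in Prometheus 1.x an instant query looks back roughly 5 minutes, so for a short window after the IP is recycled a selector over that address can match both the series carrying the old name label and the one carrying the new name.

# Hypothetical selector: shortly after the IP is recycled this can return
# two series for the same instance, one with the old "name" and one with
# the new "name", until the old series falls out of the ~5m lookback.
up{instance="10.0.0.5:8080"}

# Aggregations keyed on "name" may therefore briefly see both:
sum by (name) (up{instance="10.0.0.5:8080"})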

beorn7 added a commit that referenced this issue Jan 6, 2017

Retrieval: Avoid copying Target
retrieval.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.

It might even have caused the issues reported as #2266 and #2262.

beorn7 commented Feb 16, 2017

#2323 fixed the case for K8s SD. If this wasn't just a staleness issue as suggested by @brian-brazil, I would assume it is fixed in the same way. Please re-open with further evidence if you think otherwise.

beorn7 closed this Feb 16, 2017

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
