
Prometheus 2.0.0-beta.0 gaps in metrics #2982

Closed
hanikesn opened this Issue Jul 24, 2017 · 8 comments


hanikesn commented Jul 24, 2017

I'm running Prometheus 2.0.0-beta.0 in our Kubernetes 1.7.1 cluster and it's having trouble with gaps in the metrics scraped from cAdvisor. Prometheus 1.7.0 scrapes them fine:

2.0.0-beta.0:
[screenshot: prometheus2]

1.7.0:
[screenshot: promtheus]

Scrape rule:

```yaml
- job_name: kubernetes-cadvisor
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
```

There's nothing suspicious in the logs.
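For reference, one way to quantify the gaps is to count how many samples each cAdvisor series actually has over a window via the Prometheus HTTP API; at the 15s scrape interval above, a series without gaps should have roughly 240 samples per hour. A minimal sketch, assuming a locally reachable Prometheus and `container_cpu_usage_seconds_total` as the metric to check:

```python
# Sketch: count samples per cAdvisor series over the last hour via the
# Prometheus HTTP API. At a 15s scrape interval a series without gaps
# should have ~240 samples/hour. Address and metric name are assumptions.
import requests

PROM = "http://localhost:9090"  # assumed Prometheus address
QUERY = 'count_over_time(container_cpu_usage_seconds_total{job="kubernetes-cadvisor"}[1h])'

resp = requests.get(f"{PROM}/api/v1/query", params={"query": QUERY})
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    labels = result["metric"]
    samples = int(float(result["value"][1]))
    if samples < 200:  # noticeably fewer than the ~240 expected
        print(samples, labels.get("pod_name") or labels.get("id"), sep="\t")
```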


brian-brazil commented Jul 24, 2017

The gaps are probably real, and this is the new staleness handling. Are you sure that the containers in question were returned at every scrape?
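One way to check that (a sketch, not something from this thread): scrape the endpoint twice, 15s apart, and diff the sets of exposed series; if the sets differ substantially, the target itself is flapping. The node address, port, and token path below are assumptions for a typical cluster:

```python
# Sketch: fetch the kubelet/cAdvisor metrics endpoint twice and diff the
# exposed series to see whether containers come and go between scrapes.
# URL and service-account token path are assumptions.
import time
import requests

URL = "https://NODE:10250/metrics"  # assumed kubelet address and port
TOKEN = open("/var/run/secrets/kubernetes.io/serviceaccount/token").read().strip()

def series_names(text):
    names = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        # A series is "name{labels}" (or just "name"); drop value/timestamp.
        if "}" in line:
            names.add(line[:line.rindex("}") + 1])
        else:
            names.add(line.split(" ", 1)[0])
    return names

def scrape():
    resp = requests.get(URL, headers={"Authorization": f"Bearer {TOKEN}"},
                        verify=False, timeout=10)
    resp.raise_for_status()
    return series_names(resp.text)

first = scrape()
time.sleep(15)  # matches the 15s scrape_interval above
second = scrape()

print("only in first scrape: ", len(first - second))
print("only in second scrape:", len(second - first))
```

If both difference counts stay near zero across several runs, the gaps are more likely on the Prometheus side; large counts point at the exporter.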


hanikesn commented Jul 24, 2017

> Are you sure that the containers in question were returned at every scrape?

I need to dig into this deeper.

> The gaps are probably real, and this is the new staleness handling.

You can't always control what you're scraping, so I'm wondering whether it makes sense to allow overriding this behavior per target.


brian-brazil commented Jul 24, 2017

> I'm wondering whether it makes sense to allow overriding this behavior per target.

It's all or nothing.


hanikesn commented Jul 24, 2017

The /metrics endpoint actually randomly alternates between showing systemd cgroups and Docker containers, so this is clearly a bug in kubernetes/cadvisor.

See:
#2916
kubernetes/kubernetes#48483

hanikesn closed this Jul 24, 2017


fabxc commented Aug 1, 2017

The linked k8s issue does not resolve this in general, I think.
Even after the fix, the cAdvisor endpoint exposes a lot of metrics only occasionally.


dadux commented Aug 3, 2017

Just upgraded to 2.0.0-beta.0 and I'm seeing the same "gaps" with the cAdvisor target.

@fabxc - I'm not familiar with the staleness handling, but my understanding is that in that case the Prometheus 1.7 metrics would be inaccurate too, and therefore there's no point in us rolling back?


fabxc commented Aug 3, 2017

Yea, so cAdvisor metrics are flapping on and off; 2.0 just happens to detect this and set staleness markers.
Practically, it's just a reduced sampling frequency, so what 1.7 shows is not totally wrong.

Overall it's definitely a cAdvisor issue though: google/cadvisor#1704
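If rolling back isn't an option, a common stop-gap (not from this thread) is to bridge the gaps at query time with a range function whose look-back window is wider than the flapping, e.g. max_over_time over 5m. A minimal sketch against the HTTP API; the metric name, window, and Prometheus address are assumptions:

```python
# Sketch: paper over gaps in a flapping cAdvisor series at query time by
# taking the maximum over a 5m look-back window (assumed wider than the
# flapping). Metric, selector, and Prometheus address are assumptions.
import requests

PROM = "http://localhost:9090"  # assumed Prometheus address
BRIDGED = ('max_over_time('
           'container_memory_usage_bytes{job="kubernetes-cadvisor"}[5m])')

resp = requests.get(f"{PROM}/api/v1/query", params={"query": BRIDGED})
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("pod_name"), result["value"][1])
```

This only masks the symptom at read time; the actual fix still has to happen on the cAdvisor side.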

gouthamve referenced this issue Nov 27, 2017: omit data #3514 (closed)
