
Kubernetes discovery is reporting pods that are long gone #2398

Closed
checketts opened this Issue Feb 6, 2017 · 10 comments


checketts commented Feb 6, 2017

What did you do?
Navigated to the /targets page of Prometheus, which is running with Kubernetes discovery, and viewed the kubernetes-pods job.

What did you expect to see?
I expected to see only pods that currently exist.

What did you see instead? Under which circumstances?
I am instead seeing 2 pods listed as targets that haven't existed for over a month (only one is shown in the screenshot).

I'm unable to locate the pods using kubectl or any other means. There are 2 current pods from the same deployment that are being tracked, so the 2 old ones are from a previous version of the deployment.

[Screenshot: the /targets page showing the stale pod as a target]

Environment
Prometheus is running in a Kubernetes pod and using Kubernetes discovery.

  • System information:

Linux 4.4.0-45-generic x86_64

  • Prometheus version:
prometheus, version 1.5.0 (branch: master, revision: d840f2c400629a846b210cf58d65b9fbae0f1d5c)
  build user:       root@a04ed5b536e3
  build date:       20170123-13:56:24
  go version:       go1.7.4
  • Prometheus configuration file:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod

  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_annotation_appVersion]
    action: replace
    target_label: app_version
  • Logs:
time="2017-02-06T06:50:44Z" level=info msg="Starting prometheus (version=1.5.0, branch=master, revision=d840f2c400629a846b210cf58d65b9fbae0f1d5c)" source="main.go:75"
time="2017-02-06T06:50:44Z" level=info msg="Build context (go=go1.7.4, user=root@a04ed5b536e3, date=20170123-13:56:24)" source="main.go:76"
time="2017-02-06T06:50:44Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
time="2017-02-06T06:50:44Z" level=info msg="Loading series map and head chunks..." source="storage.go:373"
time="2017-02-06T06:50:44Z" level=info msg="0 series loaded." source="storage.go:378"
time="2017-02-06T06:50:44Z" level=info msg="Listening on :9090" source="web.go:259"
time="2017-02-06T06:50:44Z" level=info msg="Starting target manager..." source="targetmanager.go:61"
time="2017-02-06T06:50:44Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
time="2017-02-06T06:50:44Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
time="2017-02-06T06:50:44Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
time="2017-02-06T06:50:44Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
time="2017-02-06T06:50:44Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
time="2017-02-06T06:50:44Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"

brancz commented Feb 7, 2017

Does this persist across restarts? I'm trying to figure out whether this information is being returned incorrectly by the Kubernetes API or whether it's an issue in Prometheus (kubectl may not show a pod even though the API still exposes it).


checketts commented Feb 7, 2017

Yes, it persists across Prometheus restarts.


brancz commented Feb 7, 2017

Interesting, that sounds like Kubernetes is returning those pods through the API. Let's have a look at the payload the Kubernetes API returns when Prometheus discovers the targets. Could you start kubectl proxy, curl localhost:8001/api/v1/pods, and paste the result in a gist? Then I can analyze that response.
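For reference, a minimal sketch of that sequence (assuming kubectl proxy can use its default port 8001):

  # Serve the Kubernetes API on localhost:8001 and dump the current pod list to a file.
  kubectl proxy &
  curl -s http://localhost:8001/api/v1/pods > pods.json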


checketts commented Feb 7, 2017

Here is the gist: https://gist.github.com/checketts/285135b6fd6de0c192a22934d53fb7af

drydock-2582274429-3q0yg is one of the current pods, and it appears in the output. I don't see drydock-2604689587-itpsi, which is one of the old pods that is long gone.
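A quick way to double-check that (a sketch, assuming the API response from the proxy was saved as pods.json as above; the pod names are the ones mentioned in this thread):

  grep -c drydock-2582274429-3q0yg pods.json   # current pod: expect at least 1 match
  grep -c drydock-2604689587-itpsi pods.json   # stale pod: expect 0 matches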


checketts commented Feb 7, 2017

Well, I just did another restart (changing the storage path at the same time, for storage reasons), and now the phantom pod is gone. Sorry I wasn't able to provide more useful data.
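(For reference, in Prometheus 1.x the storage location is set with the -storage.local.path flag; a sketch with an illustrative path:)

  prometheus -config.file=/etc/prometheus/prometheus.yml \
             -storage.local.path=/prometheus/data-new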


brancz commented Feb 9, 2017

Interesting. If @fabxc doesn't have an idea of what may have happened, I'd say we close this here and reopen if we see it occur again and can analyze the underlying issue further.


checketts commented Feb 9, 2017

Thanks. I'll keep an eye on it and reopen if needed.

checketts closed this Feb 9, 2017


andrewhowdencom commented Feb 9, 2017

Interestingly, I am also seeing expired pods; I was initially confused because their resource usage was all 0. I'm happy to look further into this if required.


fabxc commented Feb 10, 2017


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
