
Kubernetes: Pods don't have their deployment as a label #2073

Closed
EdSchouten opened this Issue Oct 10, 2016 · 10 comments

EdSchouten (Contributor) commented Oct 10, 2016

What did you do?
We're using the latest version of the prom/prometheus Docker image, so that we can run Prometheus on Kubernetes 1.3.6. Our Prometheus configuration is almost entirely based on a stock version of:

https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml

What did you expect to see?
We expected the name of a pod's Deployment to also be available as a label attached to metrics related to pods. That way you could just write something like this:

ALERT MyDeploymentDown
  IF sum(up{kubernetes_deployment_name="apache"}) / count(up{kubernetes_deployment_name="apache"}) < 0.5
  ...

In other words: fire an alert if more than half of the pods in the deployment are down.

What did you see instead? Under which circumstances?

When pods start up, they get discovered and scraped by Prometheus. They properly have their namespace, pod name, etc. associated with them, but no deployment name. For example:

up{instance="10.101.241.142:80",job="kubernetes-pods",k8s_app="alertmanager",kubernetes_namespace="kube-system",kubernetes_pod_name="alertmanager-3154633503-mwx0t",pod_template_hash="3154633503",version="v1"}    

Environment

  • System information:

    Linux 4.6.3-coreos x86_64

  • Prometheus version:

prometheus, version 1.2.0 (branch: master, revision: 522c93361459686fe3687f5ffe68c2ee34ea5c8e)
  build user:       root@c8088ddaf2a8
  build date:       20161007-12:53:55
  go version:       go1.6.3

brancz (Member) commented Oct 10, 2016

As a heads-up, we are currently in the process of rewriting the Kubernetes service discovery.

As far as I understand, your problem can be solved by pointing a Service at the pods created through the Deployment and then using endpoint discovery. You can then adapt your query to select the up metric based on the service name.
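
A rough sketch of what that could look like (the Service name apache, the app label, and the port are placeholders; the prometheus.io/scrape annotation follows the convention of the stock example config):

  apiVersion: v1
  kind: Service
  metadata:
    name: apache
    labels:
      app: apache
    annotations:
      prometheus.io/scrape: "true"   # the stock example config only scrapes annotated services
  spec:
    selector:
      app: apache                    # must match the Deployment's pod template labels
    ports:
    - port: 80

With the stock kubernetes-service-endpoints job, the scraped series should then carry kubernetes_namespace and kubernetes_name="apache" labels, so a query like up{kubernetes_name="apache"} selects all pods behind that Service.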

EdSchouten (Contributor, Author) commented Oct 11, 2016

Hi Frederic,

Thanks for the pointers! I just gave the service endpoint discovery a try, and it looks like it also has its own set of limitations:

  • With the default config, only the kubernetes_name and kubernetes_namespace labels seem to be set. This means that if I start a service with more than one replica, the replicas will all end up with the same set of labels.
  • Looking at the full set of labels available, the only label that seems unique across replicas is the IP address. This is a bit inconvenient, as IP addresses are more likely to be reused across restarts than, say, the pod name.

brancz (Member) commented Oct 11, 2016

If that's what you want, you can use relabelling rules to map, for example, the pod name to a label. However, I don't see how

they will both use the same set of labels

is a problem. (All time series include the instance label, indicating where they were scraped from.)
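
For instance, a minimal relabelling sketch (assuming the __meta_kubernetes_pod_name meta label is available for endpoint targets in your version of the service discovery):

  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name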

For example: I have a Deployment of Prometheus in Kubernetes and run this query in the console:

up{job="prometheus"}

Then the result includes the instance label:

[screenshot: query result in the expression browser, with each series carrying an instance label]

So what you could do is this query:

count(up{job="prometheus"} == 0) > count(up{job="prometheus"} == 1)

Which has the same semantics as your query: as soon as there are more instances down than up, it triggers.

You can also have a look at kube-state-metrics, as it lets you combine your queries with additional Kubernetes state information.
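
For example, a hedged sketch (the metric and label names assume what kube-state-metrics exposes for Deployments in recent versions; older releases used different names):

  kube_deployment_status_replicas_available{deployment="apache"}
    / kube_deployment_spec_replicas{deployment="apache"} < 0.5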

EdSchouten (Contributor, Author) commented Oct 11, 2016

Oh, wow. This is interesting. There is indeed an instance label that's also attached to the metrics. If I run the query you've shown above, I get exactly the same results. Sorry for the noise, and thanks a lot for your help!

Now I'm trying to work out why I didn't notice this earlier. Instead of running queries, I was merely looking at the /targets page of Prometheus to figure out how the data was going to be exposed. This is what I get when I hover over the labels:

[screenshot: label tooltip on the Prometheus /targets page]

Would it make sense to extend this page to simply show all labels that will be attached instead of making some of these (job, instance) implicit?

brancz (Member) commented Oct 11, 2016

No problem, happy I could help!

I can see how implicit labels might be confusing. I wouldn't mind displaying them; however, this was worked on way before my time 🙂 and since they are explicitly not shown, I'm guessing there was a thought behind it. Maybe @fabxc, @brian-brazil, @beorn7 or @juliusv have an opinion or insight to share.

brian-brazil (Member) commented Oct 11, 2016

Would it make sense to extend this page to simply show all labels that will be attached instead of making some of these (job, instance) implicit?

This is already the case. The tooltip labels are those available to relabelling.

count(up{job="prometheus"} == 0) > count(up{job="prometheus"} == 1)

avg(up{job="prometheus"}) < .5 is safer, as if all hosts are down the above query will return nothing.

brancz (Member) commented Oct 11, 2016

Thanks for sharing, Brian! That makes sense, and thanks for correcting my statement.

brancz (Member) commented Oct 12, 2016

Actually, since #2062 has just been merged, all existing labels are now shown on the targets page. The job label is still implicit, but it is conveyed through the grouping of targets. Hope it's all clarified, @EdSchouten.

EdSchouten (Contributor, Author) commented Oct 12, 2016

That makes sense. Thanks a lot!

EdSchouten closed this Oct 12, 2016

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
