
Should allow aggregation of pod/container metrics by deployment #70

Closed

ghost opened this issue Jan 18, 2017 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


ghost commented Jan 18, 2017

Pods and containers export important metrics like health or container restart count. These metrics are most useful when viewed at the Deployment aggregation level (i.e., summed over all pods belonging to the same Deployment) or at the ReplicaSet level. Metrics for individual pods are less useful, because a pod might go away for benign reasons.

To do the aggregation, I need labels that reference the Deployment. For example, for a standard pod named "foo-12345-fpgj" created by a Deployment, I'd need a label with the value "foo" that doesn't include the ReplicaSet identifier ("12345") or the pod identifier ("fpgj").

This bug is for tracking. We're already in touch with the Stackdriver and Kubernetes folks at Google who are hopefully making this happen.
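
For illustration, with such a label in place the aggregation could be as simple as the query below (the `deployment` label is hypothetical; it does not exist today, which is exactly the point of this request):

```promql
# Hypothetical: assumes kube-state-metrics exposed a `deployment` label on pod/container metrics
sum by (namespace, deployment) (kube_pod_container_status_restarts_total)
```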


brancz commented Jan 18, 2017

kube-state-metrics intends to mirror what the Kubernetes API exposes. If there is a clear connection between objects, I'm happy to expose that as a label on the kube_pod_info metric. A candidate for that could be an ownerReference, which, if a Pod belongs to a ReplicaSet, points at that ReplicaSet.

Alternatively, if you are using Prometheus, you can use relabelling rules to parse these things out of the Pod's name.
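
Roughly along these lines — an untested sketch that assumes pod names follow the usual `<deployment>-<replicaset hash>-<pod suffix>` convention, and that uses a placeholder scrape target:

```yaml
# Untested sketch: derive a `deployment` label from the `pod` label on series
# scraped from kube-state-metrics. Relies on the <deployment>-<hash>-<suffix>
# naming convention, which is a convention, not a guarantee.
scrape_configs:
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics:8080']   # placeholder address
    metric_relabel_configs:
      - source_labels: [pod]
        regex: '(.+)-[a-z0-9]+-[a-z0-9]+'      # strip the last two name segments
        target_label: deployment
        replacement: '$1'
```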

More generally, regarding metric and label exposure, these rules apply:


juliusv commented Nov 6, 2017

So we have the ReplicaSet information in the kube_pod_info metric, but as also mentioned in #27, one is usually not interested in the ReplicaSet directly, but the Deployment / DaemonSet / ... that created it. That is, the creator of the creator. Though including this two-level info could be seen as somewhat arbitrary (what if there is nothing above the ReplicaSet or a directly created pod, or what if even the Deployment was created by something higher up that you want to track?), it seems that 99% of pods are either directly rooted at something exactly 2 creation levels apart (Deployment, DaemonSet, ...) or started as standalone pods.

So I think it would be useful and not too problematic to include that information in the kube_pod_info metric somehow. The names of those objects should also not change during a pod's lifetime, meaning there shouldn't be a concern about denormalization changing all pod series here.

The question would be what to name the labels for this. We already have created_by_kind and created_by_name labels for the first parent, but what would the labels be called for the grandparent?


brancz commented Nov 6, 2017

Generally I'm all for this if we can get this information presented in a reasonable way.

The created_by_* labels are deprecated, as the underlying annotation on upstream objects is as well, in favor of OwnerReferences. The problem with that is that an object can have multiple owners, and then it's hard to create a single "owner's owner" label, as that can be a list rather than a single value.
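
For reference, this is roughly what the ownerReferences field looks like on a Pod created by a Deployment (illustrative values only). Note that it is a list, and that it points at the intermediate ReplicaSet rather than the Deployment itself:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: foo-12345-fpgj
  namespace: default
  ownerReferences:            # a list: a Pod can have more than one owner
    - apiVersion: apps/v1
      kind: ReplicaSet        # the direct owner; the Deployment is the owner's owner
      name: foo-12345
      uid: 00000000-0000-0000-0000-000000000000   # placeholder
      controller: true
      blockOwnerDeletion: true
```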


juliusv commented Nov 6, 2017

Ah damn, wasn't aware of multiple owners. That makes the whole thing harder indeed. Do you think multiple owners will actually be common, or an exceptional thing?


brancz commented Nov 7, 2017

We're already seeing that today, unfortunately, so we can't just make assumptions. While maybe not as simple as it should be, we can still solve this with a couple of recording rules in Prometheus using joins and group_left statements. I'm guessing that's what you were trying to avoid though 🙂 .
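
Something along these lines, as a rough and untested sketch — it leans on the created_by_* labels discussed above and on the `<deployment>-<hash>` ReplicaSet naming convention, so treat the derived `deployment` label and the regex as approximations rather than a recommendation:

```yaml
groups:
  - name: pod-aggregation-by-deployment
    rules:
      # Untested sketch: attach an approximate `deployment` label to restart
      # counts by joining against kube_pod_info, then aggregate.
      # kube_pod_info has the value 1, so multiplying preserves the counter.
      - record: namespace_deployment:kube_pod_container_status_restarts:sum
        expr: |
          sum by (namespace, deployment) (
            kube_pod_container_status_restarts_total
            * on (namespace, pod) group_left(deployment)
              label_replace(
                kube_pod_info{created_by_kind="ReplicaSet"},
                "deployment", "$1", "created_by_name", "(.+)-[a-z0-9]+"
              )
          )
```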

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Feb 7, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

k8s-ci-robot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label on Mar 9, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
