New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistent Prometheus cAdvisor metrics #51473

Merged
merged 1 commit into from Aug 29, 2017

Conversation

@bboreham
Contributor

bboreham commented Aug 28, 2017

What this PR does / why we need it:

We need this because otherwise kubelet is exposing different sets of Prometheus metrics that randomly include or do not include containers.

See also google/cadvisor#1704; quoting here:

Prometheus requires that all metrics in the same family have the same labels, so we arrange to supply blank strings for missing labels

The function containerPrometheusLabels() conditionally adds various metric labels from container labels - pod name, image, etc. However, when it receives the metrics, Prometheus checks that all metrics in the same family have the same label set, and rejects those that do not.

Since containers are collected in (somewhat) random order, depending on which kind is seen first you get one set of metrics or the other.

Changing the container labels function to always add the same set of labels, adding "" when it doesn't have a real value, eliminates the issue in my testing.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Fixes #50151

Special notes for your reviewer:

I have made the same fix in two places. I am 98% sure the one in cadvisor_linux.go isn't used and indeed cannot be used, but have not gone fully down that rabbit-hole.

Release note:

Fix inconsistent Prometheus cAdvisor metrics
Make Prometheus cAdvisor metrics labels consistent
Prometheus requires that all metrics in the same family have the same
labels, so we arrange to supply blank strings for missing labels

See google/cadvisor#1704
@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot Aug 28, 2017

Contributor

Hi @bboreham. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

k8s-ci-robot commented Aug 28, 2017

Hi @bboreham. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dashpole

This comment has been minimized.

Show comment
Hide comment
@dashpole

dashpole Aug 28, 2017

Contributor

/ok-to-test

Contributor

dashpole commented Aug 28, 2017

/ok-to-test

@dashpole

This comment has been minimized.

Show comment
Hide comment
@dashpole

dashpole Aug 28, 2017

Contributor

/test pull-kubernetes-federation-e2e-gce

Contributor

dashpole commented Aug 28, 2017

/test pull-kubernetes-federation-e2e-gce

@dashpole

This comment has been minimized.

Show comment
Hide comment
@dashpole

dashpole Aug 28, 2017

Contributor

This change looks good, although it doesnt appear to fix all issues. At least for me, it looks like diskio metrics are still not consistent. I'll have to look into that...
/lgtm

Contributor

dashpole commented Aug 28, 2017

This change looks good, although it doesnt appear to fix all issues. At least for me, it looks like diskio metrics are still not consistent. I'll have to look into that...
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 28, 2017

@dashpole

This comment has been minimized.

Show comment
Hide comment
@dashpole

dashpole Aug 28, 2017

Contributor

/assign @Random-Liu
for approval. This fixes google/cadvisor#1704

Contributor

dashpole commented Aug 28, 2017

/assign @Random-Liu
for approval. This fixes google/cadvisor#1704

@dashpole

This comment has been minimized.

Show comment
Hide comment
@dashpole

dashpole Aug 28, 2017

Contributor

/test pull-kubernetes-federation-e2e-gce

Contributor

dashpole commented Aug 28, 2017

/test pull-kubernetes-federation-e2e-gce

@Random-Liu

This comment has been minimized.

Show comment
Hide comment
@Random-Liu

Random-Liu Aug 28, 2017

Member

/approve

Member

Random-Liu commented Aug 28, 2017

/approve

@k8s-merge-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-merge-robot

k8s-merge-robot Aug 28, 2017

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Random-Liu, bboreham, dashpole

Associated issue: 50151

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

Contributor

k8s-merge-robot commented Aug 28, 2017

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Random-Liu, bboreham, dashpole

Associated issue: 50151

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

Show comment
Hide comment
@fejta-bot

fejta-bot Aug 29, 2017

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

fejta-bot commented Aug 29, 2017

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot

This comment has been minimized.

Show comment
Hide comment
@fejta-bot

fejta-bot Aug 29, 2017

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

fejta-bot commented Aug 29, 2017

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-merge-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-merge-robot

k8s-merge-robot Aug 29, 2017

Contributor

Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)

Contributor

k8s-merge-robot commented Aug 29, 2017

Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)

@k8s-merge-robot k8s-merge-robot merged commit cc557e6 into kubernetes:master Aug 29, 2017

14 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation bboreham authorized
Details
pull-kubernetes-bazel Job succeeded.
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-e2e-gce-bazel Job succeeded.
Details
pull-kubernetes-e2e-gce-etcd3 Jenkins job succeeded.
Details
pull-kubernetes-e2e-gce-gpu Jenkins job succeeded.
Details
pull-kubernetes-e2e-kops-aws Jenkins job succeeded.
Details
pull-kubernetes-federation-e2e-gce Jenkins job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Jenkins job succeeded.
Details
pull-kubernetes-node-e2e Jenkins job succeeded.
Details
pull-kubernetes-unit Jenkins job succeeded.
Details
pull-kubernetes-verify Jenkins job succeeded.
Details
@sylr

This comment has been minimized.

Show comment
Hide comment
@sylr

sylr Aug 29, 2017

Contributor

This needs to be cherry picked into release-1.7 before the next release.

Contributor

sylr commented Aug 29, 2017

This needs to be cherry picked into release-1.7 before the next release.

@sylr

This comment has been minimized.

Show comment
Hide comment
@sylr

sylr Sep 7, 2017

Contributor

@Random-Liu or anyone could launch the release-1.7 cherry-pick process ?

Contributor

sylr commented Sep 7, 2017

@Random-Liu or anyone could launch the release-1.7 cherry-pick process ?

@luxas

This comment has been minimized.

Show comment
Hide comment
@luxas

luxas Sep 11, 2017

Member

@sylr if you want to cherrypick this; just propose doing so.
Install https://github.com/github/hub and run

GITHUB_USER=sylr hack/cherry-pick-pull.sh upstream/release-1.7 51473

(also see the doc: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md)

No need to wait for others to do this.
Thanks!

Member

luxas commented Sep 11, 2017

@sylr if you want to cherrypick this; just propose doing so.
Install https://github.com/github/hub and run

GITHUB_USER=sylr hack/cherry-pick-pull.sh upstream/release-1.7 51473

(also see the doc: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md)

No need to wait for others to do this.
Thanks!

@k8s-cherrypick-bot

This comment has been minimized.

Show comment
Hide comment
@k8s-cherrypick-bot

k8s-cherrypick-bot Sep 11, 2017

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

k8s-cherrypick-bot commented Sep 11, 2017

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@sylr

This comment has been minimized.

Show comment
Hide comment
@sylr

sylr Sep 11, 2017

Contributor

@luxas Thanks

Note to self:

GITHUB_USER=sylr UPSTREAM_REMOTE=origin FORK_REMOTE=sylr hack/cherry_pick_pull.sh sylr/release-1.7 51473
Contributor

sylr commented Sep 11, 2017

@luxas Thanks

Note to self:

GITHUB_USER=sylr UPSTREAM_REMOTE=origin FORK_REMOTE=sylr hack/cherry_pick_pull.sh sylr/release-1.7 51473
@luxas

This comment has been minimized.

Show comment
Hide comment
@luxas

luxas Sep 11, 2017

Member

yeah, if that are the git remote names you're using

Member

luxas commented Sep 11, 2017

yeah, if that are the git remote names you're using

@AndreaGiardini

This comment has been minimized.

Show comment
Hide comment
@AndreaGiardini

AndreaGiardini Sep 11, 2017

@luxas I think you need to add a release milestone to get this cherry-pick go on

AndreaGiardini commented Sep 11, 2017

@luxas I think you need to add a release milestone to get this cherry-pick go on

@luxas

This comment has been minimized.

Show comment
Hide comment
@luxas

luxas Sep 11, 2017

Member

@AndreaGiardini AFAIK, that's not totally mandatory, but makes it much more clear and the normal flow, so will do. Thanks!

Member

luxas commented Sep 11, 2017

@AndreaGiardini AFAIK, that's not totally mandatory, but makes it much more clear and the normal flow, so will do. Thanks!

@luxas luxas added this to the v1.7 milestone Sep 11, 2017

k8s-merge-robot added a commit that referenced this pull request Sep 17, 2017

Merge pull request #52268 from sylr/automated-cherry-pick-of-#51473-s…
…ylr-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #51473

Cherry pick of #51473 on release-1.7.

#51473: Make Prometheus cAdvisor metrics labels consistent
@k8s-cherrypick-bot

This comment has been minimized.

Show comment
Hide comment
@k8s-cherrypick-bot

k8s-cherrypick-bot Sep 17, 2017

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

k8s-cherrypick-bot commented Sep 17, 2017

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment