Work-around for missing memory metrics on CRI-O exited containers #88734

joelsmith · 2020-03-02T15:09:20Z

What type of PR is this?
/kind bug

What this PR does / why we need it:

HPA needs metrics for exited init containers before it will take action. By setting memory and CPU usage to zero for any containers that cAdvisor didn't provide statistics for, we are assured that HPA will be able to correctly calculate pod resource usage.

This PR essentially makes the kubelet's cAdvisor stats provider behave the same as its CRI stats provider (see #74336).

Which issue(s) this PR fixes:
No issue filed.

Special notes for your reviewer:

N/A

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A

joelsmith · 2020-03-05T20:38:34Z

/cc @dashpole
This is the issue that I brought up in the sig-node meeting this week.

dashpole

Should we do this for all metrics (e.g. disk as well) or do you think we should only do so for cpu and memory?

dashpole · 2020-03-05T23:30:40Z

pkg/kubelet/stats/helper.go

+	} else {
+		memoryStats = &statsapi.MemoryStats{
+			Time:            metav1.NewTime(cstat.Timestamp),
+			WorkingSetBytes: uint64Ptr(0),


Should we default other fields to 0 to have consistent behavior?

Perhaps. I was just trying to match the behavior in the CRI stats provider, here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cri_stats_provider.go#L537

joelsmith · 2020-03-05T23:40:43Z

As to the question of populating other missing metrics like disk and network, I'm not sure. It's certainly worth thinking about.

dashpole · 2020-03-06T00:11:29Z

/priority important-soon
This meets my bar for a bug-fix. Feel free to follow-up with other changes discussed above if you are interested.
/lgtm
/approve

k8s-ci-robot · 2020-03-06T00:11:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, joelsmith

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/stats/OWNERS~~ [dashpole]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rphillips · 2020-03-10T20:17:33Z

@dashpole @derekwaynecarr thoughts on getting this into 1.18?

dashpole · 2020-03-10T20:24:36Z

@rphillips I am in favor.

derekwaynecarr · 2020-03-10T21:57:18Z

It's a bug fix that impacts users.

/milestone v1.18

joelsmith · 2020-03-10T22:37:37Z

/retest

k8s-ci-robot · 2020-03-10T23:09:59Z

@joelsmith: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-conformance-kind-ga-only-parallel | `da98829` | link | `/test pull-kubernetes-conformance-kind-ga-only-parallel` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

matthias50 · 2020-05-08T13:46:13Z

@joelsmith, @dashpole, @derekwaynecarr, we are running into this and are about to upgrade from 1.15 to 1.16. Can we get this cherry-picked back to 1.17 and 1.16?

joelsmith · 2020-05-08T14:31:28Z

I've opened #90900 (1.16) and #90901 (1.17).
Edit: both of these have merged, so in addition to being fixed in 1.18.0, it should also be in 1.16.9 and 1.17.6.

k8s-ci-robot requested review from feiskyer and yujuhong March 2, 2020 15:11

joelsmith force-pushed the master branch from 4c4a21a to ae3f00e Compare March 3, 2020 18:05

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 3, 2020

joelsmith force-pushed the master branch from ae3f00e to da98829 Compare March 5, 2020 20:20

joelsmith changed the title ~~WIP: Work-around for missing memory metrics on CRI-O exited containers~~ Work-around for missing memory metrics on CRI-O exited containers Mar 5, 2020

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 5, 2020

k8s-ci-robot requested a review from dashpole March 5, 2020 20:38

dashpole reviewed Mar 5, 2020

View reviewed changes

k8s-ci-robot assigned dashpole Mar 6, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2020

joelsmith mentioned this pull request Mar 6, 2020

Bug 1749468: UPSTREAM: 88734: Work-around for missing metrics on CRI-O exited containers openshift/origin#24653

Merged

k8s-ci-robot added this to the v1.18 milestone Mar 10, 2020

k8s-ci-robot merged commit 7989ca4 into kubernetes:master Mar 10, 2020

This was referenced May 8, 2020

Automated cherry pick of #88734: Work-around for missing metrics on CRI-O exited containers #90900

Merged

Automated cherry pick of #88734: Work-around for missing metrics on CRI-O exited containers #90901

Merged

k8s-ci-robot added a commit that referenced this pull request May 15, 2020

Merge pull request #90900 from joelsmith/automated-cherry-pick-of-#88…

cca04b2

…734-upstream-release-1.16 Automated cherry pick of #88734: Work-around for missing metrics on CRI-O exited containers

k8s-ci-robot added a commit that referenced this pull request May 15, 2020

Merge pull request #90901 from joelsmith/automated-cherry-pick-of-#88…

1596ecd

…734-upstream-release-1.17 Automated cherry pick of #88734: Work-around for missing metrics on CRI-O exited containers

github-actions bot mentioned this pull request May 27, 2020

Week Ending May 17, 2020 dev-obs/actus#156

Closed
