Work-around for missing memory metrics on CRI-O exited containers #88734

Merged (1 commit) on Mar 10, 2020

@joelsmith (Contributor) commented Mar 2, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:

HPA needs metrics for exited init containers before it will take action. By reporting zero memory and CPU usage for any container that cAdvisor provides no statistics for, we ensure that HPA can correctly calculate pod resource usage.
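The failure mode can be illustrated with a minimal Go sketch, offered under stated assumptions. `ContainerStats`, `podWorkingSet`, and `defaultMissingToZero` are hypothetical names and are not kubelet or HPA code. The sketch simply assumes, as the description says, that a consumer like HPA will not compute a pod's usage while any container (such as an exited init container) lacks the metric. It then shows how reporting zero for the missing containers lets the pod total be computed.

```go
package main

import "fmt"

// ContainerStats is a hypothetical, simplified stand-in for per-container
// stats; a nil pointer means cAdvisor reported nothing for that metric.
type ContainerStats struct {
	Name              string
	CPUUsageNanoCores *uint64
	WorkingSetBytes   *uint64
}

func uint64Ptr(v uint64) *uint64 { return &v }

// podWorkingSet models a consumer (like HPA) that refuses to compute a pod
// total if any container lacks the metric. This is a simplification, not
// the HPA source.
func podWorkingSet(containers []ContainerStats) (uint64, bool) {
	var sum uint64
	for _, c := range containers {
		if c.WorkingSetBytes == nil {
			return 0, false // one missing container makes the whole pod unusable
		}
		sum += *c.WorkingSetBytes
	}
	return sum, true
}

// defaultMissingToZero reports zero CPU and memory usage for any container
// without stats, which is the idea behind the work-around.
func defaultMissingToZero(containers []ContainerStats) {
	for i := range containers {
		if containers[i].CPUUsageNanoCores == nil {
			containers[i].CPUUsageNanoCores = uint64Ptr(0)
		}
		if containers[i].WorkingSetBytes == nil {
			containers[i].WorkingSetBytes = uint64Ptr(0)
		}
	}
}

func main() {
	pod := []ContainerStats{
		{Name: "init"}, // exited init container: no stats reported
		{Name: "app", CPUUsageNanoCores: uint64Ptr(5000000), WorkingSetBytes: uint64Ptr(64 << 20)},
	}

	_, ok := podWorkingSet(pod)
	fmt.Println("before:", ok) // before: false

	defaultMissingToZero(pod)
	total, ok := podWorkingSet(pod)
	fmt.Println("after:", total, ok) // after: 67108864 true
}
```

Filling with zero rather than dropping the container seems reasonable in this model because an exited container is genuinely using no CPU or memory, so the pod sum is unchanged.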

This PR essentially makes the kubelet's cAdvisor stats provider behave the same as its CRI stats provider (see #74336).

Which issue(s) this PR fixes:
No issue filed.

Special notes for your reviewer:

N/A

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A

@k8s-ci-robot added the do-not-merge/work-in-progress, size/XS, do-not-merge/release-note-label-needed, cncf-cla: yes, needs-kind, needs-sig, needs-priority, area/kubelet, and sig/node labels and removed the needs-sig label Mar 2, 2020
@k8s-ci-robot added the size/S label and removed the size/XS label Mar 3, 2020
@k8s-ci-robot added the size/M, release-note-none, and kind/bug labels and removed the size/S, do-not-merge/release-note-label-needed, and needs-kind labels Mar 5, 2020
@joelsmith changed the title from "WIP: Work-around for missing memory metrics on CRI-O exited containers" to "Work-around for missing memory metrics on CRI-O exited containers" Mar 5, 2020
@k8s-ci-robot removed the do-not-merge/work-in-progress label Mar 5, 2020
@joelsmith (Contributor Author) commented Mar 5, 2020

/cc @dashpole
This is the issue that I brought up in the sig-node meeting this week.

@dashpole (Contributor) left a comment

Should we do this for all metrics (e.g. disk as well) or do you think we should only do so for cpu and memory?

```go
} else {
	memoryStats = &statsapi.MemoryStats{
		Time:            metav1.NewTime(cstat.Timestamp),
		WorkingSetBytes: uint64Ptr(0),
```

Contributor:

Should we default other fields to 0 to have consistent behavior?

Contributor Author:

Perhaps. I was just trying to match the behavior in the CRI stats provider, here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cri_stats_provider.go#L537

@joelsmith (Contributor Author):

As to the question of populating other missing metrics like disk and network, I'm not sure. It's certainly worth thinking about.

@dashpole (Contributor) commented Mar 6, 2020

/priority important-soon
This meets my bar for a bug-fix. Feel free to follow-up with other changes discussed above if you are interested.
/lgtm
/approve

@k8s-ci-robot added the priority/important-soon and lgtm labels and removed the needs-priority label Mar 6, 2020
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, joelsmith

@k8s-ci-robot added the approved label Mar 6, 2020
@rphillips (Member):

@dashpole @derekwaynecarr thoughts on getting this into 1.18?

@dashpole (Contributor):

@rphillips I am in favor.

@derekwaynecarr (Member):

It's a bug fix that impacts users.

/milestone v1.18

@k8s-ci-robot added this to the v1.18 milestone Mar 10, 2020
@joelsmith (Contributor Author):

/retest

@k8s-ci-robot (Contributor) commented Mar 10, 2020

@joelsmith: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Rerun command |
|---|---|---|
| pull-kubernetes-conformance-kind-ga-only-parallel | da98829 | /test pull-kubernetes-conformance-kind-ga-only-parallel |

@k8s-ci-robot merged commit 7989ca4 into kubernetes:master Mar 10, 2020
@matthias50 (Contributor):

@joelsmith, @dashpole, @derekwaynecarr: we are running into this and are about to upgrade from 1.15 to 1.16. Can we get this cherry-picked back to 1.17 and 1.16?

@joelsmith (Contributor Author) commented May 8, 2020

I've opened #90900 (1.16) and #90901 (1.17).
Edit: both of these have merged, so in addition to being fixed in 1.18.0, it should also be in 1.16.9 and 1.17.6.

k8s-ci-robot added a commit that referenced this pull request May 15, 2020
…734-upstream-release-1.16

Automated cherry pick of #88734: Work-around for missing metrics on CRI-O exited containers
k8s-ci-robot added a commit that referenced this pull request May 15, 2020
…734-upstream-release-1.17

Automated cherry pick of #88734: Work-around for missing metrics on CRI-O exited containers