Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet: Pods created and rapidly terminated get stuck #98424

Merged
merged 1 commit into from Feb 4, 2021

Conversation

rphillips
Copy link
Member

@rphillips rphillips commented Jan 26, 2021

What type of PR is this?
/kind bug
/sig node

What this PR does / why we need it:
This PR fixes the race when a pod gets terminated but crio has not had the chance to clean up the cgroup yet. The podKiller will register all pods into a map and bypass the cgroup cleanup in cleanupOrphanedPodCgroups if the kill is still pending.

In Openshift, we are seeing an issue where pods can be deleted from the API and the cgroup is torn down before the pod is completely killed. This is due to the Kubelet seeing the terminated pod from the API and the housekeeping routine (runs every 2 seconds) starts cleaning up the cgroup (cleanupOrphanedPodCgroups) if cgroupPerQOS is enabled.

The fix skips over the cgroup cleanup if the pod is being killed by podKiller. Once the pod exits the podKiller then pcm.Destroy is called within cleanupOrphanedPodCgroups.

  • Adds mock for FakeContainerManager
  • Adds mock for PodContainerManager
  • Unit test

Which issue(s) this PR fixes:
Fixes #98142

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 26, 2021
@k8s-ci-robot
Copy link
Contributor

@rphillips: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 26, 2021
@rphillips
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 26, 2021
@rphillips rphillips changed the title kubelet: only filter out terminated pods if the resources have been reclaimed WIP: kubelet: only filter out terminated pods if the resources have been reclaimed Jan 26, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 26, 2021
@ehashman ehashman added this to Waiting on Author in SIG Node PR Triage Jan 26, 2021
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 26, 2021
@rphillips rphillips changed the title WIP: kubelet: only filter out terminated pods if the resources have been reclaimed WIP: kubelet: add pods to termination map Jan 26, 2021
@ehashman
Copy link
Member

/cc @SergeyKanzhelev

Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks like an improvement to me (yay thread safety!!)

I'd love to see what this looks like against some of the e2es that are frequently failing. @rphillips I think some of them are OpenShift-only, do you want me to try porting them back to k8s? Maybe I can run them against this patch in another DNM PR?

pkg/kubelet/kubelet_pods.go Outdated Show resolved Hide resolved
@rphillips rphillips force-pushed the fixes/98142 branch 2 times, most recently from 3215c07 to a87e5b7 Compare January 28, 2021 01:18
@rphillips
Copy link
Member Author

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2021
do not delete the cgroup from a pod when it is being killed
@rphillips
Copy link
Member Author

/retest

@rphillips
Copy link
Member Author

@sjenning
Copy link
Contributor

sjenning commented Feb 4, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2021
Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@rphillips
Copy link
Member Author

/test pull-kubernetes-e2e-kind

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Feb 4, 2021

@rphillips: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-bazel-test f918e11 link /test pull-kubernetes-bazel-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jingxu97
Copy link
Contributor

I think this PR could consider cherrypick?

@jingxu97
Copy link
Contributor

@rphillips Just want to confirm, without this fix, the issue is pod is sometimes stuck in terminating phase due to cgroup clean up issue? Thanks!

@ehashman
Copy link
Member

@jingxu97 it's a pretty big change, I'm not sure if we want to cherry-pick this. There are some other improvements like #98933 going in that would be good to land together with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

error reading container (probably exited) json message: EOF