provide active pods to cgroup cleanup #42585
Conversation
Opening this for the moment, but I think this should help the referenced flake.
fyi @sjenning
I need to send an update to this PR; it should probably not look at runningPods at all.
This is an interesting bug.
Running pods is confusing, and we should probably not have that exposed within the kubelet object at all.
When implementing this for volumes, I found the best way to get the set of running pods is to get all pods from kl.podManager and then check each pod's status from the status manager to see if they are terminated, which is what this PR does. So this LGTM!
Yeah, @vishh, I am not even sure we need to use runningPods here. But for the purpose of fixing flakes, this looks like the correct change.
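A minimal sketch of the pattern described above (all pods from the pod manager, filtered by a terminated-status check); the helper names and the v1 import path are illustrative, not the PR's literal code:

```go
package kubeletsketch

import v1 "k8s.io/api/core/v1"

// podIsTerminatedFn reports whether the status manager considers a pod terminal.
type podIsTerminatedFn func(pod *v1.Pod) bool

// activePods sketches an "active pod lister": start from every pod the pod
// manager knows about (allPods) and drop those whose status says they are
// terminated, rather than trusting the runtime's runningPods cache.
func activePods(allPods []*v1.Pod, isTerminated podIsTerminatedFn) []*v1.Pod {
	var active []*v1.Pod
	for _, p := range allPods {
		if isTerminated(p) {
			continue
		}
		active = append(active, p)
	}
	return active
}
```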
Force-pushed from b4b9199 to a297daf
Given the potential for flakes here, this should probably be a P0.
/lgtm
Force-pushed from 7e3312e to 5ce298c
The SimpleMount e2e_node test passed for me with the latest PR (it failed before).
The inline review comments below attach to this change in `podIsTerminated`:

```diff
-	return false
+	return status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded || (pod.DeletionTimestamp != nil && notRunning(status.ContainerStatuses))
```
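For context, a sketch of how the surrounding helper plausibly reads after this change; the statusManager lookup mirrors how the kubelet usually resolves pod status, but this body is reconstructed for illustration, not quoted from the diff:

```go
// Sketch (assumed context: method on the kubelet, with kl.statusManager available).
func (kl *Kubelet) podIsTerminated(pod *v1.Pod) bool {
	// Prefer the status manager's view; fall back to the status on the pod object.
	status, ok := kl.statusManager.GetPodStatus(pod.UID)
	if !ok {
		status = pod.Status
	}
	return status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded ||
		(pod.DeletionTimestamp != nil && notRunning(status.ContainerStatuses))
}
```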
@dashpole can we clean up the volume code to use this active pod lister method instead? Not necessary for this PR.
yes. Nothing would give me more joy :)
@derekwaynecarr your change to podIsTerminated may cause other race conditions (e.g., the pod worker still trying to start new containers). Could you remove this?
This shouldn't have any effect during pod startup, as this change only makes a difference when the deletion timestamp is set. I agree that we should do a more thorough review of what effects this could have, though. In general, I think this is a better definition of podIsTerminated: a pod that has been deleted and has no containers running is definitely in a terminal state.
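A sketch of the notRunning check this definition leans on: a pod has "no containers running" when every container status is either Terminated or Waiting. Reconstructed for illustration, not quoted from the PR:

```go
// notRunning returns true only if no container in the list is running,
// i.e. every container's state is Terminated or Waiting.
func notRunning(statuses []v1.ContainerStatus) bool {
	for _, status := range statuses {
		if status.State.Terminated == nil && status.State.Waiting == nil {
			return false
		}
	}
	return true
}
```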
/lgtm again
[APPROVALNOTIFIER] This PR is APPROVED. The following people have approved this PR: derekwaynecarr, vishh. Needs approval from an approver in each of these OWNERS Files:
We suggest the following people:
Having one function for termination filtering is preferred.
The GCI GCE failure is known: #42597
@k8s-bot gce etcd3 e2e test this
The bot didn't apply the tag from #42585 (comment), so I am applying it to get this critical fix into the queue.
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)
Even if the deletion timestamp is set and no containers are currently running, the pod worker may still be in the process of recreating those pods. Continue dispatching work to the pod worker to make sure the deletion of containers is completed.
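A hedged sketch of that dispatch-side behavior: work keeps flowing to the pod worker unless the pod is fully terminated, so container deletion can finish. The types here are illustrative stand-ins; the real kubelet dispatches via podWorkers with richer options:

```go
// podWorker is an illustrative stand-in for the kubelet's pod workers.
type podWorker interface {
	UpdatePod(pod *v1.Pod)
}

func dispatchWork(workers podWorker, isTerminated func(*v1.Pod) bool, pod *v1.Pod) {
	if isTerminated(pod) {
		// Phase is Failed/Succeeded, or the pod is deleted with no containers
		// running: nothing is left for the pod worker to do.
		return
	}
	// A deletion timestamp alone is not enough to stop dispatching: the
	// worker may still need to finish deleting containers.
	workers.UpdatePod(pod)
}
```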
What this PR does / why we need it:
This PR provides more information for deciding when a pod cgroup is considered orphaned. The running pods cache is based on the runtime's view of the world; we create pod cgroups before containers, so we should just be looking at activePods.
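A minimal sketch of cleanup keyed off activePods instead of the runtime cache; the PodContainerManager interface, CgroupName type, and function signature are illustrative stand-ins for the kubelet's cm package, not the PR's literal code:

```go
package cgroupcleanup

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/sets"
)

// CgroupName and PodContainerManager stand in for the kubelet cm package types.
type CgroupName string

type PodContainerManager interface {
	Destroy(name CgroupName) error
}

// cleanupOrphanedPodCgroups destroys the cgroup of any pod that still has a
// cgroup on disk (cgroupPods) but no longer appears in activePods. Keying off
// activePods avoids racing the runtime: pod cgroups are created before any
// container, so the runtime's runningPods view can lag behind.
func cleanupOrphanedPodCgroups(pcm PodContainerManager, cgroupPods map[types.UID]CgroupName, activePods []*v1.Pod) {
	active := sets.NewString()
	for _, pod := range activePods {
		active.Insert(string(pod.UID))
	}
	for uid, name := range cgroupPods {
		if active.Has(string(uid)) {
			continue
		}
		// Orphaned: no active pod owns this cgroup anymore.
		_ = pcm.Destroy(name)
	}
}
```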
Which issue this PR fixes
Fixes #42431