provide active pods to cgroup cleanup #42585

derekwaynecarr · 2017-03-06T18:23:14Z

What this PR does / why we need it:
This PR provides more information for when a pod cgroup is considered orphaned. The running pods cache is based on the runtime's view of the world. We create pod cgroups before containers, so we should just be looking at activePods.

Which issue this PR fixes
Fixes #42431
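
The orphan check described above can be sketched roughly as follows. This is a minimal illustration, not the kubelet's code: `PodUID` and `orphanedPodCgroups` are hypothetical names, and the real cleanup works on cgroup paths rather than plain UID slices. It only shows why the comparison should be against active pods: a pod whose cgroup exists but whose containers have not started yet is still active and must not be treated as orphaned.

```go
package main

import (
	"fmt"
	"sort"
)

// PodUID is a hypothetical stand-in for the kubelet's pod UID type.
type PodUID string

// orphanedPodCgroups returns the UIDs that have a pod cgroup on the node but
// do not belong to any active pod. The pod cgroup is created before the pod's
// containers, so the check uses active pods, not the pods the runtime
// currently reports as running.
func orphanedPodCgroups(cgroupPods, activePods []PodUID) []PodUID {
	active := make(map[PodUID]struct{}, len(activePods))
	for _, uid := range activePods {
		active[uid] = struct{}{}
	}
	var orphans []PodUID
	for _, uid := range cgroupPods {
		if _, ok := active[uid]; !ok {
			orphans = append(orphans, uid)
		}
	}
	// Sort for a stable, predictable result.
	sort.Slice(orphans, func(i, j int) bool { return orphans[i] < orphans[j] })
	return orphans
}

func main() {
	// pod-b has a cgroup but no containers yet; it is still active.
	cgroups := []PodUID{"pod-a", "pod-b", "pod-c"}
	active := []PodUID{"pod-a", "pod-b"}
	fmt.Println(orphanedPodCgroups(cgroups, active)) // prints [pod-c]
}
```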

derekwaynecarr · 2017-03-06T18:24:27Z

Opening this for the moment, but I think this should help the referenced flake.

derekwaynecarr · 2017-03-06T18:26:57Z

fyi @sjenning

derekwaynecarr · 2017-03-06T18:28:15Z

I need to send an update to this PR so that it probably does not look at runningPods at all.

vishh · 2017-03-06T18:29:03Z

This is an interesting bug.

vishh · 2017-03-06T18:30:35Z

Running pods is confusing and we should probably not have that exposed within the kubelet object at all.
Ping once you get rid of using Running Pods. Otherwise this LGTM.

dashpole · 2017-03-06T18:30:46Z

When implementing this for volumes, I found the best way to get the set of running pods is to get all pods from kl.podManager, and then check each pod's status from the status manager to see if they are terminated, which is what this PR does. So this LGTM!
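
The approach in the comment above (list every pod, then drop the ones whose latest status is terminal) might look roughly like this. All types here are simplified, hypothetical stand-ins: `statusLookup` plays the role of the status manager, and falling back to the phase stored on the pod object when no status is cached is an assumption of this sketch.

```go
package main

import "fmt"

// Phase is a simplified stand-in for v1.PodPhase.
type Phase string

const (
	Running   Phase = "Running"
	Failed    Phase = "Failed"
	Succeeded Phase = "Succeeded"
)

// Pod is a simplified, hypothetical stand-in for v1.Pod.
type Pod struct {
	UID   string
	Phase Phase // phase recorded on the pod object itself
}

// statusLookup is a hypothetical stand-in for the status manager: it returns
// the most recently recorded phase for a pod, if one is cached.
type statusLookup func(uid string) (Phase, bool)

// activePods keeps every pod whose latest known phase is not terminal.
func activePods(all []Pod, lookup statusLookup) []Pod {
	var active []Pod
	for _, p := range all {
		phase, ok := lookup(p.UID)
		if !ok {
			// Assumption: with no cached status, use the pod object's phase.
			phase = p.Phase
		}
		if phase == Failed || phase == Succeeded {
			continue
		}
		active = append(active, p)
	}
	return active
}

func main() {
	pods := []Pod{{UID: "a", Phase: Running}, {UID: "b", Phase: Running}, {UID: "c", Phase: Failed}}
	// The cached status says pod "b" has already succeeded.
	lookup := func(uid string) (Phase, bool) {
		if uid == "b" {
			return Succeeded, true
		}
		return "", false
	}
	active := activePods(pods, lookup)
	fmt.Println(len(active), active[0].UID) // prints 1 a
}
```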

dashpole · 2017-03-06T18:31:57Z

Yeah, @vishh I am not even sure we need to use runningPods here. But for the purpose of fixing flakes, this looks like the correct change.

derekwaynecarr · 2017-03-06T18:33:49Z

@vishh @dashpole -- updated.

derekwaynecarr · 2017-03-06T18:40:11Z

given the potential for flakes here, this should probably be a p0.

vishh · 2017-03-06T18:40:30Z

/lgtm

dashpole · 2017-03-06T22:57:20Z

SimpleMount e2e_node test passed for me with the latest PR (and it failed before).

vishh · 2017-03-06T23:21:24Z

pkg/kubelet/kubelet_pods.go

-	}
-
-	return false
+	return status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded || (pod.DeletionTimestamp != nil && notRunning(status.ContainerStatuses))


@dashpole can we clean up the volume code to use this active pod lister method instead? Not necessary for this PR.

dashpole · 2017-03-06

Yes. Nothing would give me more joy :)

yujuhong · 2017-03-09

@derekwaynecarr your change to podIsTerminated may cause other race conditions (e.g., the pod worker still trying to start new containers). Could you remove this?

/cc @Random-Liu @dchen1107

dashpole · 2017-04-04

This shouldn't have any effect during pod startup, as this change only makes a difference when the deletion timestamp is set. I agree that we should do a more thorough review of what effects this could have, though. In general, I think this is a better definition of podIsTerminated, as a pod that has been deleted and has no containers running is definitely in a terminal state.
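
To make the predicate under discussion concrete, here is a self-contained sketch of the condition from the diff. The types are simplified stand-ins for the v1 API types, `DeletionTimestamp` is reduced to a Unix timestamp pointer, and this `notRunning` (a plain boolean per container) is an assumption rather than the kubelet's helper.

```go
package main

import "fmt"

// ContainerStatus is a simplified stand-in for v1.ContainerStatus.
type ContainerStatus struct {
	Name    string
	Running bool // simplification: the real type carries a richer state
}

// PodStatus is a simplified stand-in for v1.PodStatus.
type PodStatus struct {
	Phase             string
	ContainerStatuses []ContainerStatus
}

// Pod is a simplified stand-in for v1.Pod; a nil DeletionTimestamp means the
// pod has not been marked for deletion.
type Pod struct {
	DeletionTimestamp *int64
}

// notRunning reports whether none of the listed containers is running.
// An empty list also counts as "not running".
func notRunning(statuses []ContainerStatus) bool {
	for _, s := range statuses {
		if s.Running {
			return false
		}
	}
	return true
}

// podIsTerminated mirrors the condition added in the diff: a terminal phase,
// or a deletion timestamp combined with no running containers.
func podIsTerminated(pod *Pod, status PodStatus) bool {
	return status.Phase == "Failed" || status.Phase == "Succeeded" ||
		(pod.DeletionTimestamp != nil && notRunning(status.ContainerStatuses))
}

func main() {
	deletedAt := int64(1488842484) // arbitrary example timestamp
	deleted := &Pod{DeletionTimestamp: &deletedAt}
	live := &Pod{}
	stopped := PodStatus{Phase: "Running", ContainerStatuses: []ContainerStatus{{Name: "app"}}}
	fmt.Println(podIsTerminated(deleted, stopped)) // prints true
	fmt.Println(podIsTerminated(live, stopped))    // prints false
}
```

In this sketch a deleted pod with an empty container status list also counts as terminated, which is the kind of window the race-condition concern in the thread is about.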

vishh · 2017-03-06T23:21:44Z

/lgtm again

k8s-github-robot · 2017-03-06T23:21:46Z

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: derekwaynecarr, vishh

Needs approval from an approver in each of these OWNERS Files:

~~pkg/kubelet/OWNERS~~ [derekwaynecarr,vishh]

We suggest the following people:
cc @timstclair
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

derekwaynecarr · 2017-03-06T23:24:11Z

Having one function for termination filtering is preferred

derekwaynecarr · 2017-03-06T23:39:38Z

@k8s-bot gce etcd3 e2e test this
@k8s-bot gci gce e2e test this

dashpole · 2017-03-06T23:39:46Z

The GCI GCE failure is known: #42597

dashpole · 2017-03-07T00:49:09Z

@k8s-bot gce etcd3 e2e test this

derekwaynecarr · 2017-03-07T01:41:15Z

The bot didn't apply the tag from #42585 (comment), so I am applying it to get this critical fix into the queue.

k8s-github-robot · 2017-03-07T06:20:10Z

Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)

Even if the deletion timestamp is set and no containers are currently running, the pod worker may still be in the process of recreating those pods. Continue dispatching work to the pod worker to make sure deletion of the containers is completed.

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 6, 2017

derekwaynecarr added this to the v1.6 milestone Mar 6, 2017

k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2017

k8s-github-robot assigned yujuhong Mar 6, 2017

k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Mar 6, 2017

k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 6, 2017

derekwaynecarr assigned dashpole and vishh Mar 6, 2017

derekwaynecarr requested a review from sjenning March 6, 2017 18:23

derekwaynecarr changed the title ~~provide active pods to cgroup cleanup~~ WIP: provide active pods to cgroup cleanup Mar 6, 2017

derekwaynecarr mentioned this pull request Mar 6, 2017

ci-kubernetes-node-kubelet broken since 16:00pm PST yesterday (03/01) #42431

Closed

derekwaynecarr force-pushed the cgroup-flake branch from b4b9199 to a297daf Compare March 6, 2017 18:32

derekwaynecarr changed the title ~~WIP: provide active pods to cgroup cleanup~~ provide active pods to cgroup cleanup Mar 6, 2017

derekwaynecarr force-pushed the cgroup-flake branch from 7e3312e to 5ce298c Compare March 6, 2017 22:37

k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 6, 2017

vishh reviewed Mar 6, 2017

derekwaynecarr mentioned this pull request Mar 6, 2017

[k8s.io] Kubelet Cgroup Manager Pod containers On scheduling a Guaranteed Pod Pod containers should have been created under the cgroup-root {E2eNode Suite} #42563

Closed

derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2017

k8s-github-robot merged commit a1c5d1b into kubernetes:master Mar 7, 2017

derekwaynecarr mentioned this pull request Mar 10, 2017

[k8s.io] ConfigMap should be consumable via environment variable [Conformance] {E2eNode Suite} #42526

Closed

dashpole mentioned this pull request Mar 11, 2017

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs #42942

Merged

gnufied mentioned this pull request Mar 15, 2017

[k8s.io] Projected should project all components that make up the projection API [Conformance] [Volume] [Projection] {E2eNode Suite} #42945

Closed

liaoyw mentioned this pull request May 4, 2017

Pod cgroup removed(process killed) before containers stopped, cause cni gets empty network namespace when TearDownPod #45334

Closed

yujuhong mentioned this pull request May 15, 2019

Prevent Kubelet from clearing terminated pod status #77870

Closed
