provide active pods to cgroup cleanup #42585

Merged 1 commit on Mar 7, 2017

@derekwaynecarr derekwaynecarr commented Mar 6, 2017

What this PR does / why we need it:
This PR provides more information for deciding when a pod cgroup is considered orphaned. The running pods cache is based on the runtime's view of the world. We create pod cgroups before containers, so cleanup should look only at activePods.
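
To illustrate the reasoning in the description, here is a minimal sketch of orphan detection keyed on active pods. It is not the kubelet's code: `UID` and `orphanedPodCgroups` are hypothetical names, and the sketch assumes the cleanup step is given the list of pod cgroups on the node plus the list of active pods.

```go
package main

import "fmt"

// UID is a hypothetical stand-in for the pod UID type.
type UID string

// orphanedPodCgroups returns the UIDs of pod cgroups that do not belong
// to any active pod. Only these are candidates for cleanup.
func orphanedPodCgroups(cgroupPods []UID, activePods []UID) []UID {
	active := make(map[UID]struct{}, len(activePods))
	for _, uid := range activePods {
		active[uid] = struct{}{}
	}
	var orphans []UID
	for _, uid := range cgroupPods {
		if _, ok := active[uid]; !ok {
			orphans = append(orphans, uid)
		}
	}
	return orphans
}

func main() {
	// pod-b already has a cgroup but no containers yet, so a
	// runtime-based view would not list it. It is still active.
	cgroups := []UID{"pod-a", "pod-b", "pod-c"}
	active := []UID{"pod-a", "pod-b"}
	fmt.Println(orphanedPodCgroups(cgroups, active)) // prints [pod-c]
}
```

Because the pod cgroup exists before any container does, a check based on the runtime's running pods would wrongly flag `pod-b` as orphaned in this example.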

Which issue this PR fixes
Fixes #42431

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 6, 2017
@derekwaynecarr derekwaynecarr added this to the v1.6 milestone Mar 6, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Mar 6, 2017
@derekwaynecarr derekwaynecarr added release-note-none Denotes a PR that doesn't merit a release note. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note-label-needed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 6, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 6, 2017

@derekwaynecarr
Member Author

Opening this for the moment, but I think this should help with the referenced flake.

@derekwaynecarr derekwaynecarr changed the title provide active pods to cgroup cleanup WIP: provide active pods to cgroup cleanup Mar 6, 2017
@derekwaynecarr
Member Author

fyi @sjenning

@derekwaynecarr
Member Author

I probably need to update this PR so that it does not look at runningPods at all.

@vishh
Contributor

vishh commented Mar 6, 2017

This is an interesting bug.

@vishh
Contributor

vishh commented Mar 6, 2017

Running pods is confusing and we should probably not have that exposed within the kubelet object at all.
Ping once you get rid of the use of Running Pods. Otherwise this LGTM.

@dashpole
Contributor

dashpole commented Mar 6, 2017

When implementing this for volumes, I found the best way to get the set of running pods is to get all pods from kl.podManager, and then check each pod's status from the status manager to see whether it is terminated, which is what this PR does. So this LGTM!
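
The approach described here can be sketched roughly as follows. This is a simplified illustration, not the kubelet's implementation: `Pod`, `PodPhase`, and `filterActivePods` are hypothetical stand-ins, the `cached` map stands in for the status manager, and the fallback to the pod object's own phase when there is no cached status is an assumption.

```go
package main

import "fmt"

// PodPhase is a hypothetical, trimmed-down stand-in for v1.PodPhase.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a hypothetical, trimmed-down stand-in for v1.Pod.
type Pod struct {
	UID   string
	Phase PodPhase // phase as recorded on the pod object
}

// isTerminal reports whether a phase is terminal.
func isTerminal(p PodPhase) bool {
	return p == PodFailed || p == PodSucceeded
}

// filterActivePods keeps pods that are not terminated. A cached status
// (standing in for the status manager) takes precedence over the phase
// on the pod object; that precedence is an assumption of this sketch.
func filterActivePods(all []Pod, cached map[string]PodPhase) []Pod {
	var active []Pod
	for _, pod := range all {
		phase, ok := cached[pod.UID]
		if !ok {
			phase = pod.Phase
		}
		if !isTerminal(phase) {
			active = append(active, pod)
		}
	}
	return active
}

func main() {
	all := []Pod{
		{UID: "a", Phase: PodRunning},
		{UID: "b", Phase: PodRunning}, // the cache knows it has since succeeded
		{UID: "c", Phase: PodFailed},
	}
	cached := map[string]PodPhase{"b": PodSucceeded}
	for _, pod := range filterActivePods(all, cached) {
		fmt.Println(pod.UID) // prints only "a"
	}
}
```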

@dashpole
Contributor

dashpole commented Mar 6, 2017

Yeah, @vishh I am not even sure we need to use runningPods here. But for the purpose of fixing flakes, this looks like the correct change.

@derekwaynecarr derekwaynecarr changed the title WIP: provide active pods to cgroup cleanup provide active pods to cgroup cleanup Mar 6, 2017
@derekwaynecarr
Member Author

@vishh @dashpole -- updated.

@derekwaynecarr
Member Author

given the potential for flakes here, this should probably be a p0.

@vishh
Contributor

vishh commented Mar 6, 2017

/lgtm

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 6, 2017
@dashpole
Contributor

dashpole commented Mar 6, 2017

SimpleMount e2e_node test passed for me with the latest PR (and it failed before).

}

-return false
+return status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded || (pod.DeletionTimestamp != nil && notRunning(status.ContainerStatuses))

@dashpole can we cleanup the volume code to use this active pod lister method instead? Not necessary for this PR

yes. Nothing would give me more joy :)

@derekwaynecarr your change to podIsTerminated may cause other race conditions (e.g., pod worker still trying to start new containers). Could you remove this?

/cc @Random-Liu @dchen1107

This shouldn't have any effects during pod startup, as this change only makes a difference when the deletion timestamp is set. I agree that we should do a more thorough review of what effects this could have, though. In general, I think this is a better definition of podIsTerminated, as a pod that has been deleted, and has no containers running is definitely in a terminal state.

@vishh
Contributor

vishh commented Mar 6, 2017

/lgtm again

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: derekwaynecarr, vishh

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @timstclair
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@derekwaynecarr
Member Author

Having one function for termination filtering is preferred

@derekwaynecarr
Member Author

@k8s-bot gce etcd3 e2e test this
@k8s-bot gci gce e2e test this

@dashpole
Contributor

dashpole commented Mar 6, 2017

The GCI GCE failure is known: #42597

@dashpole
Contributor

dashpole commented Mar 7, 2017

@k8s-bot gce etcd3 e2e test this

@derekwaynecarr
Member Author

The bot didn't apply the tag from #42585 (comment), so I am applying it to get this critical fix into the queue.

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2017
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)

@k8s-github-robot k8s-github-robot merged commit a1c5d1b into kubernetes:master Mar 7, 2017
yujuhong added a commit to yujuhong/kubernetes that referenced this pull request Mar 10, 2017
Even if the deletion timestamp is set and no containers are currently
running, the pod worker may still be in the process of recreating those
pods. Continue dispatching work to the pod worker to make sure deletion of
containers is completed.
Development

Successfully merging this pull request may close these issues.

ci-kubernetes-node-kubelet broken since 16:00pm PST yesterday (03/01)
8 participants