Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet: don't reject pods without adding them to the pod manager #37661

Merged
merged 1 commit into from
Dec 1, 2016

Conversation

yujuhong
Copy link
Contributor

@yujuhong yujuhong commented Nov 29, 2016

kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources) . The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.

This should fix #37658

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 29, 2016
@k8s-oncall
Copy link

This change is Reviewable

@yujuhong yujuhong added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Nov 29, 2016
existingPods := kl.podManager.GetPods()
// Always add the pod to the pod manager. Kubelet relies on the pod
// manager as the source of truth for the desired state. If a pod does
// not esist in the pod manager, it means that it has been deleted in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/esist/exist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Random-Liu
Copy link
Member

LGTM. Thanks for the fix. And sorry for not catching this when review that PR. :p

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Nov 29, 2016
@yujuhong yujuhong added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Nov 30, 2016
@yujuhong yujuhong added this to the v1.5 milestone Nov 30, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GKE smoke e2e failed for commit 66690bd0df8a941fe3999da512c7e60d384cc8a9. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@Random-Liu Random-Liu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2016
@derekwaynecarr
Copy link
Member

@sjenning - I think this may also fix the issue we saw today where pods that were stuck terminating even when containers were all terminated remained stuck after a Kubelet restart. It seems those pods were never getting their status updated which would happen if they had been filtered out of pod manager. Need to verify this more though .

@dims
Copy link
Member

dims commented Nov 30, 2016

@k8s-bot gci gke e2e test this

@calebamiles
Copy link
Contributor

should this get a cherry pick label @dchen1107, @Random-Liu?

cc: @saad-ali, @dims, @kubernetes/sig-node

@yujuhong
Copy link
Contributor Author

should this get a cherry pick label

Yes. Just added.

@yujuhong
Copy link
Contributor Author

FWIW, I reproduced this issue in my own cluster using a somewhat extreme test case (>1000 pods assigned to one node), and verified the fix. However, it's good to keep in mind that the kubelet's apiserver client has limited QPS and if the replication control keeps creating new pods assigned to the node, kubelet's status update throughout would never be able to catch up.

@yujuhong yujuhong removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2016
@yujuhong
Copy link
Contributor Author

@dchen1107 @Random-Liu I added a check to see whether the pod in question has terminated or not. PTAL again, thanks!

/cc @dashpole

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE etcd3 e2e failed for commit a71e6b780b789149f4fd7de1fe98127fc61bd0e4. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@yujuhong
Copy link
Contributor Author

@k8s-bot gce etcd3 e2e test this

kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources) . The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.

Also check if the pod to-be-admitted has terminated or not. In the case where
it has terminated, skip the admission process completely.
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 1, 2016
@Random-Liu Random-Liu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2016
@Random-Liu
Copy link
Member

LGTM

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit c4b33f3 into kubernetes:master Dec 1, 2016
@yujuhong yujuhong added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 1, 2016
@saad-ali saad-ali added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 1, 2016
saad-ali added a commit that referenced this pull request Dec 1, 2016
…61-upstream-release-1.5

Automated cherry pick of #37661
@yujuhong yujuhong deleted the always_add_pods branch February 2, 2017 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pods rejected by Kubelet might stay at pending phase forever
10 participants