kubelet: don't reject pods without adding them to the pod manager #37661
Conversation
```go
existingPods := kl.podManager.GetPods()
// Always add the pod to the pod manager. Kubelet relies on the pod
// manager as the source of truth for the desired state. If a pod does
// not esist in the pod manager, it means that it has been deleted in
```
s/esist/exist
fixed
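For illustration, here is a minimal, self-contained sketch of the ordering this PR establishes. The real kubelet types are far richer; names like `podManager`, `canAdmitPod`, and `rejectPod` mirror kubelet conventions but are simplified stand-ins here.

```go
// A toy sketch of the fix: add the pod to the pod manager *before*
// the admission decision, so a rejected pod's status update is never
// silently dropped. Types are simplified, not the real kubelet source.
package main

import "fmt"

type Pod struct {
	UID  string
	Name string
}

// podManager is a stand-in for the kubelet's pod manager, the source
// of truth for the desired state of pods on the node.
type podManager struct {
	pods map[string]*Pod
}

func (pm *podManager) AddPod(p *Pod) { pm.pods[p.UID] = p }

func (pm *podManager) GetPods() []*Pod {
	out := make([]*Pod, 0, len(pm.pods))
	for _, p := range pm.pods {
		out = append(out, p)
	}
	return out
}

// canAdmitPod is a placeholder admission check; the real kubelet
// evaluates resource fit, node conditions, and more.
func canAdmitPod(existing []*Pod, p *Pod) (bool, string) {
	if len(existing) >= 2 {
		return false, "OutOfCapacity"
	}
	return true, ""
}

// rejectPod stands in for the kubelet recording a Failed status; that
// status update only reaches the apiserver if the pod is in the pod
// manager.
func rejectPod(p *Pod, reason string) {
	fmt.Printf("rejected %s: %s\n", p.Name, reason)
}

func handlePodAdditions(pm *podManager, pods []*Pod) {
	for _, pod := range pods {
		existingPods := pm.GetPods()
		// Always add the pod to the pod manager first (the fix).
		pm.AddPod(pod)
		if ok, reason := canAdmitPod(existingPods, pod); !ok {
			rejectPod(pod, reason)
			continue
		}
		fmt.Printf("admitted %s\n", pod.Name)
	}
}

func main() {
	pm := &podManager{pods: map[string]*Pod{}}
	handlePodAdditions(pm, []*Pod{{"1", "a"}, {"2", "b"}, {"3", "c"}})
}
```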
LGTM. Thanks for the fix. And sorry for not catching this when reviewing that PR. :p
Force-pushed from 79b97ce to 66690bd (Compare)
Jenkins GCI GKE smoke e2e failed for commit 66690bd0df8a941fe3999da512c7e60d384cc8a9. Full PR test history. The magic incantation to run this job again is `@k8s-bot gci gke e2e test this`.
@sjenning - I think this may also fix the issue we saw today where pods that were stuck terminating, even when all their containers had terminated, remained stuck after a kubelet restart. It seems those pods were never getting their status updated, which would happen if they had been filtered out of the pod manager. Need to verify this more, though.
@k8s-bot gci gke e2e test this
Should this get a cherry-pick label @dchen1107, @Random-Liu?
Yes. Just added.
FWIW, I reproduced this issue in my own cluster using a somewhat extreme test case (>1000 pods assigned to one node), and verified the fix. However, it's good to keep in mind that the kubelet's apiserver client has limited QPS, and if the replication controller keeps creating new pods assigned to the node, the kubelet's status update throughput would never be able to catch up.
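As a hedged illustration of that QPS limit, the sketch below configures a client-go `rest.Config` with explicit QPS and Burst. `QPS` and `Burst` are real client-go fields, but the values here are arbitrary examples, not kubelet defaults.

```go
// Sketch: a rate-limited apiserver client via client-go. Every status
// PATCH/PUT counts against this budget; if pods are created faster
// than roughly QPS per second, status updates queue up and never
// catch up, which is the scenario described above.
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func newRateLimitedClient() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.QPS = 5    // sustained requests per second (example value)
	cfg.Burst = 10 // short-term burst allowance (example value)
	return kubernetes.NewForConfig(cfg)
}

func main() {
	if _, err := newRateLimitedClient(); err != nil {
		fmt.Println("not running in a cluster:", err)
	}
}
```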
Force-pushed from 66690bd to bddcdf7 (Compare)
@dchen1107 @Random-Liu I added a check to see whether the pod in question has terminated or not. PTAL again, thanks! /cc @dashpole
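A minimal sketch of what such a terminated check typically looks like, assuming the usual definition that a pod is terminated once its phase is Failed or Succeeded; the helper name `podIsTerminated` is illustrative, not necessarily the exact kubelet code.

```go
// Sketch of the terminated-pod check: terminated pods consume no
// resources, so admission can be skipped entirely, per this PR.
package main

import "fmt"

type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

type PodStatus struct{ Phase PodPhase }

type Pod struct {
	Name   string
	Status PodStatus
}

func podIsTerminated(p *Pod) bool {
	return p.Status.Phase == PodFailed || p.Status.Phase == PodSucceeded
}

func main() {
	p := &Pod{Name: "done", Status: PodStatus{Phase: PodSucceeded}}
	if podIsTerminated(p) {
		// Skip the admission process completely for terminated pods.
		fmt.Println("skipping admission for", p.Name)
	}
}
```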
Force-pushed from bddcdf7 to a71e6b7 (Compare)
Jenkins GCE etcd3 e2e failed for commit a71e6b780b789149f4fd7de1fe98127fc61bd0e4. Full PR test history. The magic incantation to run this job again is `@k8s-bot gce etcd3 e2e test this`.
@k8s-bot gce etcd3 e2e test this
kubelet relies on the pod manager as a cache of the pods in the apiserver (and other sources). The cache should be kept up-to-date even when rejecting pods. Without this, kubelet may decide at any point to drop the status update (request to the apiserver) for the rejected pod, since it would think the pod no longer exists in the apiserver. Also check whether the pod to be admitted has terminated; if it has, skip the admission process completely.
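To make the failure mode concrete, here is a simplified sketch of why a status update gets dropped, under the assumption stated above that the status path consults the pod manager before syncing to the apiserver; the `statusManager` type and its methods here are illustrative only.

```go
// Sketch: status updates for pods missing from the pod manager are
// dropped, because the kubelet treats absence as "deleted from the
// apiserver". Before this fix, rejected pods were never added to the
// pod manager, so their Failed status was silently discarded.
package main

import "fmt"

type Pod struct{ UID, Name string }

type podManager struct{ pods map[string]*Pod }

func (pm *podManager) GetPodByUID(uid string) (*Pod, bool) {
	p, ok := pm.pods[uid]
	return p, ok
}

type statusManager struct{ pm *podManager }

func (sm *statusManager) syncPodStatus(uid string, status string) {
	// Pod not in the pod manager: assume it was deleted and drop
	// the update rather than sending it to the apiserver.
	if _, ok := sm.pm.GetPodByUID(uid); !ok {
		fmt.Printf("dropping status update for %s: pod not in pod manager\n", uid)
		return
	}
	fmt.Printf("sending status %q for %s to apiserver\n", status, uid)
}

func main() {
	pm := &podManager{pods: map[string]*Pod{"1": {"1", "a"}}}
	sm := &statusManager{pm: pm}
	sm.syncPodStatus("1", "Running") // known pod: update sent
	sm.syncPodStatus("2", "Failed")  // rejected pod pre-fix: dropped
}
```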
Force-pushed from a71e6b7 to 69caf53 (Compare)
LGTM
Automatic merge from submit-queue
…61-upstream-release-1.5 Automated cherry pick of #37661
This should fix #37658