Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync the status of static Pods #84951

Merged
merged 1 commit into from Dec 12, 2019

Conversation

@tedyu
Copy link
Contributor

tedyu commented Nov 8, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
This refines the change introduced by #77661.
This PR sync's status for static pods.

Which issue(s) this PR fixes:
Fixes #84931
#85389
#82405

Fixed a regression where the kubelet would fail to update the ready status of pods.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@tedyu

This comment has been minimized.

Copy link
Contributor Author

tedyu commented Nov 8, 2019

/priority important-soon

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 8, 2019

@tedyu: GitHub didn't allow me to request PR reviews from the following users: mfpierre.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @mfpierre

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yutedz yutedz force-pushed the yutedz:status-mgr-sync-static branch from 1b7d07a to 45f6634 Nov 8, 2019
@tedyu

This comment has been minimized.

Copy link
Contributor Author

tedyu commented Nov 8, 2019

/test pull-kubernetes-e2e-gce

@hvaara

This comment has been minimized.

Copy link
Member

hvaara commented Nov 9, 2019

/retest

@tedyu

This comment has been minimized.

Copy link
Contributor Author

tedyu commented Nov 10, 2019

/assign @Random-Liu

@mfpierre

This comment has been minimized.

Copy link
Contributor

mfpierre commented Nov 12, 2019

Hey @tedyu thanks for refining the PR, the change looks good to me, however I'm not sure I get why this would cause the mentionned bugs, do we understand the root cause of the issue?

@kayrus

This comment has been minimized.

Copy link
Contributor

kayrus commented Dec 4, 2019

I confirm, PR works. It would be also great to have it in 1.15.x (kayrus@e2a18d9)

@databus23

This comment has been minimized.

Copy link
Contributor

databus23 commented Dec 4, 2019

@mfpierre From my understanding what is happening sometimes (after a reconnect?) the kubelet does not notice that the pod state in the apiserver (Ready: false) differs from its local state (Ready: true) and therefore it does not perform an update to correct the state in the apiserver.
Looking at the proposed fix it seems to me that the additional update for pod status of static pods corrupts the cache for non-static pods causing this bug.

@haosdent

This comment has been minimized.

Copy link
Member

haosdent commented Dec 4, 2019

Verified in our cluster, +1 for @tedyu 's change.

@databus23

This comment has been minimized.

Copy link
Contributor

databus23 commented Dec 4, 2019

@tedyu I think this should be rather a kind/bug and probably there should also be a release note that says something like "Fixed a regression where the kubelet would fail to update the ready status of pods". I also think the current title does not communicate the urgency/importance of the PR.

@yutedz yutedz force-pushed the yutedz:status-mgr-sync-static branch from 45f6634 to 3678433 Dec 4, 2019
@tedyu

This comment has been minimized.

Copy link
Contributor Author

tedyu commented Dec 6, 2019

@k8s-ci-robot k8s-ci-robot requested a review from dchen1107 Dec 6, 2019
@databus23

This comment has been minimized.

Copy link
Contributor

databus23 commented Dec 11, 2019

Can we pretty please make some progress here? This bug is hitting us on a regular basis.

@dchen1107

This comment has been minimized.

Copy link
Member

dchen1107 commented Dec 12, 2019

Ok, let me think this loudly since I am not the original reviewer of #77661 which causes the regression. #77661 addressed the status update / sync issue for static pods, but introduce a new regression to non-static pods. This fix is minimal by limiting the extra status update to only static pods.

/lgtm
/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 12, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, tedyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dchen1107

This comment has been minimized.

Copy link
Member

dchen1107 commented Dec 12, 2019

@tedyu Thanks for fixing that regression. We might want to cherrypick this fix back to previous releases later.

@tedyu

This comment has been minimized.

Copy link
Contributor Author

tedyu commented Dec 12, 2019

Created cherry pick for 1.17 branch.

For 1.16, I got:

+++ Conflicts detected:

UU pkg/kubelet/kubelet_getters.go

+++ Please resolve the conflicts in another window (and remember to 'git add / git am --continue')
@k8s-ci-robot k8s-ci-robot merged commit 010291d into kubernetes:master Dec 12, 2019
14 of 15 checks passed
14 of 15 checks passed
tide Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
Details
cla/linuxfoundation yutedz authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kind Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 12, 2019
@jmcmeek

This comment has been minimized.

Copy link
Contributor

jmcmeek commented Dec 12, 2019

@tedyu What's the 1.17 cherry pick? I don't see it - just 1.15 and 1.16 PRs above.

Sorry - just answered my own question - #86189

@tedyu

This comment has been minimized.

Copy link
Contributor Author

tedyu commented Dec 12, 2019

Cherry pick for 1.17 is #86189

@haosdent haosdent mentioned this pull request Jan 21, 2020
6 of 6 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.