
Revert "Ensure there is one running static pod with the same full name" #107734

Conversation

rphillips
Member

@rphillips rphillips commented Jan 24, 2022

Reverts #104743 #107695

Issue: #107733

Reverts handling of static pod fullnames that may sometimes cause static pods to fail to restart.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 24, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 24, 2022
@rphillips
Member Author

/kind regression
/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added kind/regression Categorizes issue or PR as related to a regression from a prior release. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 24, 2022
@ehashman
Member

I think the revert makes sense.

I noticed when I was trying to add coverage for #107695 that we added newStaticPod here in the test helpers, but not a newStaticPodWithPhase variant to match the existing mocks:

We lack coverage for this:

if !isRuntimePod && (pod.Status.Phase == v1.PodFailed || pod.Status.Phase == v1.PodSucceeded) {
	// check to see if the pod is not running and the pod is terminal.
	// If this succeeds then record in the podWorker that it is terminated.
	if statusCache, err := p.podCache.Get(pod.UID); err == nil {
		if isPodStatusCacheTerminal(statusCache) {

So @rphillips tried to add in the missing pod fullname, but I'm not 100% sure we have a reproducer that can catch that.

I started adding a case:

diff --git a/pkg/kubelet/pod_workers_test.go b/pkg/kubelet/pod_workers_test.go
index 4028c06c292..fc5f975acfd 100644
--- a/pkg/kubelet/pod_workers_test.go
+++ b/pkg/kubelet/pod_workers_test.go
@@ -164,6 +164,21 @@ func newStaticPod(uid, name string) *v1.Pod {
        }
 }
 
+func newStaticPodWithPhase(uid, name string, phase v1.PodPhase) *v1.Pod {
+       return &v1.Pod{
+               ObjectMeta: metav1.ObjectMeta{
+                       UID:  types.UID(uid),
+                       Name: name,
+                       Annotations: map[string]string{
+                               kubetypes.ConfigSourceAnnotationKey: kubetypes.FileSource,
+                       },
+               },
+               Status: v1.PodStatus{
+                       Phase: phase,
+               },
+       }
+}
+
 // syncPodRecord is a record of a sync pod call
 type syncPodRecord struct {
        name       string
@@ -856,6 +871,24 @@ func Test_allowPodStart(t *testing.T) {
                        },
                        allowed: true,
                },
+               {
+                       desc: "static pod if the static pod is terminated and rebooting",
+                       pod:  newStaticPodWithPhase("uid-0", "foo", v1.PodSucceeded),
+                       podSyncStatuses: map[types.UID]*podSyncStatus{
+                               "uid-0": {
+                                       fullname: "foo_",
+                               },
+                       },
+                       // waitingToStartStaticPodsByFullname: map[string][]types.UID{
+                       //      "foo_": {
+                       //              types.UID("uid-0"),
+                       //      },
+                       // },
+                       startedStaticPodsByFullname: map[string]types.UID{
+                               "foo_": types.UID("uid-0"),
+                       },
+                       allowed: true,
+               },
        }
 
        for _, tc := range testCases {

But then I realized that we don't have anything mocking the worker podCache, which will be necessary to give proper coverage, and so we probably have a whole matrix of missing test cases. Hence, I think it might be easiest to revert for now.
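
A minimal, hypothetical sketch of the kind of map-backed status cache such a test would need to seed, so that the podCache.Get(pod.UID) lookup above returns a terminal status — simplified stand-in types, not the real kubecontainer.Cache interface:

// Hypothetical sketch: a map-backed status cache a test could seed so that
// the pod worker's podCache lookup sees a terminal status for the pod under
// test. Types are simplified stand-ins, not the real kubecontainer.Cache.
package main

import "fmt"

// podStatus is a stand-in for kubecontainer.PodStatus.
type podStatus struct {
	terminal bool // stand-in for "all containers have exited"
}

// fakeStatusCache maps pod UID -> status, mimicking the worker podCache.
type fakeStatusCache struct {
	statuses map[string]*podStatus
}

func (c *fakeStatusCache) Get(uid string) (*podStatus, error) {
	if s, ok := c.statuses[uid]; ok {
		return s, nil
	}
	return nil, fmt.Errorf("status for %q not found", uid)
}

func main() {
	// Seed the UID used by the new test case so the terminal-phase branch
	// quoted above would actually be exercised.
	cache := &fakeStatusCache{statuses: map[string]*podStatus{
		"uid-0": {terminal: true},
	}}
	if s, err := cache.Get("uid-0"); err == nil && s.terminal {
		fmt.Println("pod worker would record uid-0 as terminated")
	}
}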

/lgtm
/cc @gjkim42
/triage accepted

/release-note-edit

Reverts handling of static pod fullnames that may sometimes cause static pods to fail to restart.

@k8s-ci-robot
Contributor

@ehashman: /release-note-edit must be used with a single release note block.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2022
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 24, 2022
@gjkim42
Member

gjkim42 commented Jan 31, 2022

Found a reproducer here.
#107733 (comment)
e2e test: #107855

And made a fix for it.
#107854

Maybe the fix could be an alternative.

@ehashman
Member

First, I can't dig into this issue as there is no clear reproducer.

I am not sure what this CI job was doing, but I guess the CI job was trying to restart an etcd pod, which is a static pod. Am I right? Does the job always fail, or only sometimes?

I believe the reproducer is to delete and recreate the static pod multiple times, such that we hit a race between teardown and setup. We are seeing that sometimes the new static pod never comes up, such as in the linked run above.
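
A hypothetical reproduction loop along those lines — it assumes a node whose kubelet watches /etc/kubernetes/manifests for static pod manifests; the path, manifest, and timings are illustrative, not taken from the CI job:

// Hypothetical reproducer: repeatedly create and delete a static pod manifest
// so that teardown of the old instance can race with setup of the next one.
package main

import (
	"os"
	"time"
)

const manifestPath = "/etc/kubernetes/manifests/static-web.yaml" // illustrative path

const manifest = `apiVersion: v1
kind: Pod
metadata:
  name: static-web
  namespace: default
spec:
  containers:
  - name: web
    image: nginx
`

func main() {
	for i := 0; i < 10; i++ {
		// Create the static pod and give it time to start.
		if err := os.WriteFile(manifestPath, []byte(manifest), 0644); err != nil {
			panic(err)
		}
		time.Sleep(30 * time.Second)
		// Delete it again; the failure mode is that a later instance never comes up.
		if err := os.Remove(manifestPath); err != nil {
			panic(err)
		}
		time.Sleep(5 * time.Second)
	}
}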

@rphillips has confirmed through some local testing that a revert fixes the issue we're seeing, whereas other patches still seem to run into it (e.g. #107854 (comment))

Are we sure that this patch fixes the bug in question? We had a lot of other patches go in from Clayton in this release; it's possible that the bug this was intended to fix was already fixed elsewhere.

I think right now the best path forward is to revert this, add test coverage that demonstrates the problem, and then add a patch that fixes the problem. The original PR here only added unit tests and did not add an e2e loop with static pods.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2022
@rphillips
Member Author

/hold cancel

+1 to @ehashman's comment ... We need to revisit the original PR and get more testing around it.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 31, 2022
@ehashman
Member

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@smarterclayton
Contributor

smarterclayton commented Jan 31, 2022

/hold

So one serious concern here is that the fix was addressing a demonstrable hard break in static pods. If we've regressed elsewhere, reverting just moves the problem around (doesn't result in fewer problems). I'd prefer to spend a small amount of time and find a fix, while at the same time improving the test coverage based on our new failure mode.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2022
@ehashman
Member

So one serious concern here is that the fix was addressing a demonstrable hard break in static pods. If we've regressed elsewhere, reverting just moves the problem around (doesn't result in fewer problems). I'd prefer to spend a small amount of time and find a fix, while at the same time improving the test coverage based on our new failure mode.

I believe the demonstrable break was the one your PR #104847 fixed (#104648). I think there may have been some miscommunication around bugs and patches, which caused this patch to land; it is not clear to me what this is actually fixing, and it is definitely causing a regression.

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@rphillips
Member Author

+1 — the original PR clearly regresses static pods, and the issue it was meant to address may have already been fixed by the follow-up pod lifecycle PRs we landed last year.

@smarterclayton
Contributor

#104743 (comment) and #104743 (comment) summarize why that fix is necessary - we don't want people to use static UIDs with static pods, and we want the "obvious correct behavior". All static pod updates look to the kubelet like a force deletion on the apiserver - the old version disappears, and the new version is held on the kubelet until the old version completes. Therefore, reverting #104743 would allow static pods without static UIDs to be reentrant, which is surprising.
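
A simplified, self-contained approximation of that gating (not the actual pod_workers.go code): a new instance with the same fullname but a different UID is held until the previously started instance releases the fullname.

// Illustrative approximation of fullname-based start gating for static pods.
package main

import "fmt"

type podWorkers struct {
	// fullname -> UID of the static pod instance currently allowed to run.
	startedStaticPodsByFullname map[string]string
}

// allowStaticPodStart reports whether the instance identified by uid may
// start, and claims the fullname if it is free.
func (p *podWorkers) allowStaticPodStart(fullname, uid string) bool {
	if startedUID, started := p.startedStaticPodsByFullname[fullname]; started {
		return startedUID == uid // only the already-started instance may proceed
	}
	p.startedStaticPodsByFullname[fullname] = uid
	return true
}

// releaseStaticPod frees the fullname once the old instance has fully terminated.
func (p *podWorkers) releaseStaticPod(fullname string) {
	delete(p.startedStaticPodsByFullname, fullname)
}

func main() {
	p := &podWorkers{startedStaticPodsByFullname: map[string]string{}}
	fmt.Println(p.allowStaticPodStart("etcd_kube-system", "uid-old")) // true
	fmt.Println(p.allowStaticPodStart("etcd_kube-system", "uid-new")) // false: old instance not done
	p.releaseStaticPod("etcd_kube-system")
	fmt.Println(p.allowStaticPodStart("etcd_kube-system", "uid-new")) // true
}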

The attached fix looks correct, and after reviewing the test it also seems to reproduce the problem described. I had some additional feedback on the issue, but I don't think a revert is appropriate until we determine the fix has further issues or there is another issue buried underneath it.

@rphillips
Member Author

rphillips commented Jan 31, 2022

@smarterclayton Prior to #104743 there was logic to prevent static pods with the same name from being started. The name is derived from the [pod name]_[namespace] tuple. That check would prevent re-entrancy regardless of the UID, since the blocking is keyed on the tuple rather than the UID.
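
For illustration, a tiny sketch of that tuple, mirroring the "foo_" value used in the test case above (illustrative helper, not necessarily the kubelet's exact function):

// The fullname is built from the pod's name and namespace only, so two static
// pod instances with different UIDs still collide on the same fullname.
package main

import "fmt"

func buildPodFullName(name, namespace string) string {
	return name + "_" + namespace
}

func main() {
	fmt.Println(buildPodFullName("foo", ""))              // "foo_", as in the test fixture
	fmt.Println(buildPodFullName("etcd", "kube-system"))  // "etcd_kube-system"
}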

@gjkim42
Member

gjkim42 commented Jan 31, 2022

@smarterclayton Prior to #104743 there was logic to prevent static pods with the same name from being started. The name is derived from the [pod name]_[namespace] tuple. That check would prevent re-entrancy regardless of the UID, since the blocking is keyed on the tuple rather than the UID.

I guess that the logic has not been enough to prevent the new static pod from being reentrant, and #104743 was intended to guarantee the graceful termination of a static pod.

I can understand that this bug is more urgent to fix. However, the graceful termination of static pods that #104743 wanted to guarantee should still be addressed someday, unless we decide not to support it.

@rphillips
Member Author

@gjkim42 I'm confused as to the ordering of events. #97722 was opened prior to the pod lifecycle refactor, and I think we handled this case. I believe #97722 should have been closed. Is there a known test case that failed?

}, 19*time.Second, 200*time.Millisecond).Should(gomega.BeNil())
})

ginkgo.It("mirror pod termination should satisfy grace period when static pod is updated [NodeConformance]", func() {

I guess the e2e test of this PR demonstrates the failing case, or you can reproduce it using the original issue's description in #97722 (comment).

@gjkim42
Member

gjkim42 commented Jan 31, 2022

@rphillips
#97722 had been fixed by #98103, and the issue reappeared after the pod lifecycle refactor.
#104743 fixed it again later.

@rphillips
Member Author

@gjkim42 Didn't this patch fix it? d571980

@gjkim42
Member

gjkim42 commented Jan 31, 2022

@rphillips
Not to my knowledge. They are different issues.

One handles the recreation of a static pod that has the same UID and the same fullname (technically one pod).
The other (#97722) handles the recreation (terminate one and create the other) of static pods that have different UIDs but the same fullname (technically two pods).

@rphillips
Member Author

rphillips commented Jan 31, 2022

@gjkim42 After reviewing the original PR, I noticed this change in logic: completeTerminating does not clean up startedStaticPodsByFullname, whereas it previously cleaned up terminatingStaticPodFullnames. I tested a build with similar cleanup logic and it does seem to help the issue.
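
A hypothetical sketch of the shape of that extra cleanup (simplified types; not the actual completeTerminating implementation or the exact build that was tested):

// Sketch: when a static pod finishes terminating, also release its fullname so
// a replacement instance is allowed to start. Illustrative only.
package main

import "fmt"

type podWorkers struct {
	isStaticPod                 map[string]bool   // uid -> static?
	fullnameByUID               map[string]string // uid -> fullname
	startedStaticPodsByFullname map[string]string // fullname -> uid
}

func (p *podWorkers) completeTerminating(uid string) {
	// ... existing termination bookkeeping would happen here ...
	if p.isStaticPod[uid] {
		fullname := p.fullnameByUID[uid]
		if p.startedStaticPodsByFullname[fullname] == uid {
			delete(p.startedStaticPodsByFullname, fullname) // the experimental cleanup
		}
	}
}

func main() {
	p := &podWorkers{
		isStaticPod:                 map[string]bool{"uid-old": true},
		fullnameByUID:               map[string]string{"uid-old": "foo_"},
		startedStaticPodsByFullname: map[string]string{"foo_": "uid-old"},
	}
	p.completeTerminating("uid-old")
	fmt.Println(len(p.startedStaticPodsByFullname)) // 0: "foo_" is free for the new UID
}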

@gjkim42
Member

gjkim42 commented Feb 1, 2022

@gjkim42 After reviewing the original PR, I noticed this change in logic: completeTerminating does not clean up startedStaticPodsByFullname, whereas it previously cleaned up terminatingStaticPodFullnames. I tested a build with similar cleanup logic and it does seem to help the issue.

@rphillips
I think the logic is technically the same as it was; this logic complements the change.
Removing a static pod from startedStaticPodsByFullname in the completeTerminating function is not what we want to do.

@rphillips
Member Author

closing in favor of #107900

@rphillips rphillips closed this Feb 2, 2022
SIG Node CI/Test Board automation moved this from Issues - In progress to Done Feb 2, 2022
SIG Node PR Triage automation moved this from Triage to Done Feb 2, 2022