New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubelet: fix create sandbox delete pod race #98933
kubelet: fix create sandbox delete pod race #98933
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/triage accepted
/priority important-soon
/lgtm
84eca53
to
27f6f83
Compare
/retest |
27f6f83
to
4940c71
Compare
rebased with current master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we've broken the assumptions here, causing this test to fail:
kubernetes/test/e2e_node/mirror_pod_grace_period_test.go
Lines 67 to 78 in 838bb6a
for { | |
if time.Now().Sub(start).Seconds() > 19 { | |
break | |
} | |
pod, err := f.ClientSet.CoreV1().Pods(ns).Get(context.TODO(), mirrorPodName, metav1.GetOptions{}) | |
framework.ExpectNoError(err) | |
if pod.Status.Phase != v1.PodRunning { | |
framework.Failf("expected the mirror pod %q to be running, got %q", mirrorPodName, pod.Status.Phase) | |
} | |
// have some pause in between the API server queries to avoid throttling | |
time.Sleep(time.Duration(200) * time.Millisecond) | |
} |
The mirror pod has been deleted by the time we hit that code block, so I'm not sure why the test is asserting it will still be there.
The other failures look like possible flakes,
/retest
|
||
// GetPodStatus and the following SyncPod will not return errors in the | ||
// case where the pod has been deleted. We are not adding any pods into | ||
// the fakePodProvider so they are 'deleted'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
4940c71
to
f989ada
Compare
/retest |
... pushed the wrong button on the retest |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ehashman, mrunalp, rphillips The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
2 similar comments
/retest |
/retest |
/retest Review the full test history for this PR. Silence the bot with an |
What type of PR is this?
/kind bug
/sig node
/kind flake
What this PR does / why we need it:
This PR addresses a race condition where createPodSandbox can return an error from various external modules (CNI, CRI, CSI) when the pod has been deleted. This is not an error case, so we should not log or record an event. I have created a log message at log level 4 if anyone would like to trace this code path.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: