New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch feature flag to beta for pod replacement policy and add e2e test #121491
Switch feature flag to beta for pod replacement policy and add e2e test #121491
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @dejanzele. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
You should add a release note for this. |
test/e2e/apps/job.go
Outdated
pods, err := e2ejob.GetJobPods(ctx, f.ClientSet, f.Namespace.Name, job.Name) | ||
framework.ExpectNoError(err, "failed to get pod list for job in namespace: %s", f.Namespace.Name) | ||
|
||
framework.ExpectNoError(e2epod.DeletePodsWithGracePeriod(ctx, f.ClientSet, pods.Items, 0), "failed to delete pods in namespace: %s", f.Namespace.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually let's redesign this test:
- Set
-termination-grace-period
to something long (1min should be fine). - Change
DeletePodsWithGracePeriod
to have a grace period (not zero). - At this point (line 364, ), check that the Job status has one terminating pod and no active pods.
- Once we have verified the desired status, issue another
DeletePodsWithGracePeriod
, but this time with zero grace period. This will cause a SIGKILL and immediately delete the pods. - Verify pods recreated.
fcaafc0
to
32d9375
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/retitle Switch feature flag to beta for pod replacement policy and add e2e test
LGTM label has been added. Git tree hash: e96ce45c0275349a0d1c0d68dd6116896b4a23e5
|
if job.Status.Active == 0 && job.Status.Failed == 0 && *job.Status.Terminating == 1 { | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what prevents the pod from finishing too fast, so that it's in a terminating state before reaching Failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm good question, kubelet needs some time to transition it, but maybe there is a small chance it happens super fast and test fails. I don't have a good answer, do you have ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why we had the grace period in the pod itself... the tricky part is forcing a delete after sending a SIGTERM.
/lgtm |
LGTM label has been added. Git tree hash: 16e1a0a28f9962a65ecf1c8efe4b654d1db9317a
|
Can you squash? |
update pod replacement policy feature flag comment and refactor the e2e test for pod replacement policy minor fixes for pod replacement policy and e2e test fix wrong assertions for pod replacement policy e2e test more fixes to pod replacement policy e2e test refactor PodReplacementPolicy e2e test to use finalizers fix unit tests when pod replacement policy feature flag is promoted to beta fix podgc controller unit tests when pod replacement feature is enabled fix lint issue in pod replacement policy e2e test assert no error in defer function for removing finalizer in pod replacement policy e2e test implement test using a sh trap for pod replacement policy reduce sleep after SIGTERM in pod replacement policy e2e test to 5s
f095ca7
to
e98c33b
Compare
done |
Thanks! |
job := e2ejob.NewTestJob("", "pod-recreate-failed", v1.RestartPolicyNever, 1, 1, nil, 1) | ||
job.Spec.PodReplacementPolicy = ptr.To(batchv1.Failed) | ||
job.Spec.Template.Spec.Containers[0].Command = []string{"/bin/sh", "-c", `_term(){ | ||
sleep 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary if we follow the design proposed here: #121491 (comment)? I think we could use the "notTerminate" image, and another call for DeletePodsWithGracePeriod
with 0
. Then, we don't need the 5s penalty. But I'm happy to do this improvement later in the test freeze not to risk the code freeze.
EDIT: Actually, I tested locally with kubectl
, that when the first delete for a sleep pod has grace-period=30, then the next executed right after with grace-period=0 does not terminate the pod quickly (still takes ~30s). This was surprising, but in that case I don't see any easy improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, dejanzele, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
What type of PR is this?
/sig apps
/kind feature
What this PR does / why we need it:
Promotes feature flag for PodReplacementPolicy to beta and adds e2e test for the Failed PodReplacementPolicy.
Which issue(s) this PR fixes:
Part of kubernetes/enhancements#3939
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: