Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update for second Beta with GA criteria for "KEP-3329: Retriable and non-retriable Pod failures for Jobs" #3757

Merged
merged 2 commits into from
Feb 7, 2023

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Jan 19, 2023

  • One-line PR description: Update graduation criteria for GA
  • Main changes:
    • proposal to modify Kubelet to transition into Failed scheduled Pending pods which are Terminating (and with finalizer)
    • refreshed GA graduation criteria

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 19, 2023
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 19, 2023
@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch 2 times, most recently from b9b7791 to 6aa4aa7 Compare January 23, 2023 11:12
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 23, 2023
@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch 2 times, most recently from ad51a0a to d9271eb Compare January 23, 2023 12:42
@mimowo mimowo changed the title WIP: Update KEP with criteria for GA Update KEP with criteria for GA Jan 23, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2023
@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch 5 times, most recently from 078daec to 0736384 Compare January 26, 2023 10:53
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 26, 2023
@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch 8 times, most recently from d71f1e8 to c3df52a Compare January 26, 2023 11:46
@mimowo
Copy link
Contributor Author

mimowo commented Feb 6, 2023

@bobbypage @SergeyKanzhelev I've updated the PR, addressing the comments. PTAL and let me know if there is something more requiring an update.

@mimowo mimowo requested review from bobbypage and SergeyKanzhelev and removed request for kow3ns and bobbypage February 6, 2023 11:58
Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good from a PRR perspective, except we have added one question to the PRR this cycle:

https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md?plain=1#L723

Can you answer that please?

Comment on lines +1082 to +1087
This scenario is a little bit different, the config is correct, but the
pod is deleted by a user while in the `Pending` phase. In that case, the
pods transition into the `Running` phase and fail soon after. With the proposed
change the transition will happen earlier, thus saving resources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not super clear why this scenario will result in phase=Failed being applied, but not the previous scenario. Not needed for the KEP doc itself, but I think we should understand what is going on here...

Copy link
Contributor Author

@mimowo mimowo Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this scenario the pod actually transitions to the Running phase, but the container is killed, respending the grace period. If the container completes with exitCode 0 within the graceful period it actually ends the entire pod in Succeed state, otherwise it ends with exitCode 137.

EDIT: updated the scenario text to mention the relationship with the graceful period.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks for clarifying this in the doc.

@bobbypage
Copy link
Member

I have one more comment about the terminology used in the KEP (terminating vs deleting), but LGTM on the content, thank you for all of the updates and clarifying the scenarios.

@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch 2 times, most recently from 250f603 to 570d0f3 Compare February 7, 2023 10:23
@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch from 570d0f3 to 439c237 Compare February 7, 2023 10:37
@mimowo
Copy link
Contributor Author

mimowo commented Feb 7, 2023

Thanks, this looks good from a PRR perspective, except we have added one question to the PRR this cycle:
https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md?plain=1#L723
Can you answer that please?

Done, PTAL. The feature only introduces an additional API PATCH call, what is answered in the other point.

@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch 4 times, most recently from aca8704 to 0527245 Compare February 7, 2023 14:43
Signed-off-by: Michal Wozniak <mw219725@gmail.com>
@mimowo mimowo force-pushed the ga-update-pod-failure-policy branch from 0527245 to 8ce4cde Compare February 7, 2023 14:46
@johnbelamaric
Copy link
Member

Ok, PRR looks good and is approved (no prow command needed this time, but want to make it explicit for the enhancements team).

@bobbypage
Copy link
Member

bobbypage commented Feb 7, 2023

Thanks for all the updates and clarifications in the KEP!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit 10c2d18 into kubernetes:master Feb 7, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 7, 2023
@dchen1107
Copy link
Member

Thanks @bobbypage

I also reviewed it.

/lgtm
/approve from SIG Node perspective.

@mimowo
Copy link
Contributor Author

mimowo commented Feb 8, 2023

@bobbypage @dchen1107 thank you for completing the review!

@mimowo mimowo deleted the ga-update-pod-failure-policy branch February 8, 2023 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

9 participants