Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2008: Provide details about additional checkpoint/restore use cases #4305

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adrianreber
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 19, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @adrianreber. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 19, 2023
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Oct 19, 2023
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 19, 2023
@bart0sh bart0sh added this to Triage in SIG Node PR Triage Oct 20, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Oct 20, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 20, 2023
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Oct 20, 2023
@adrianreber
Copy link
Contributor Author

@mikebrow as discussed I tried to update the KEP with different possible use cases. PTAL.

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional future stories read well... not obvious if they are to be officially supported cases or additional will not be supported cases

Did you want to add more details to the forensics story? Including a scenario how/why kubelet needs to provide garbage collection / manage which are garbage collected..

Similarly, for cli enablement kubernetes/kubernetes#120898 (comment) do you want to go over the expansion of this kubelet endpoint to cover kubectl enablement, mention how a user would use kubectl in these stories.. and probably add the management (manual and automatic .. CRUD) options?

@haircommander haircommander moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Oct 26, 2023
@adrianreber
Copy link
Contributor Author

@mikebrow Thanks

additional future stories read well... not obvious if they are to be officially supported cases or additional will not be supported cases

Added a paragraph that future stories currently already can be used with the existing implementation, but that additional support for a more user friendly implementations can be implemented. Not very concrete, but acknowledged that additional support can be added.

Did you want to add more details to the forensics story? Including a scenario how/why kubelet needs to provide garbage collection / manage which are garbage collected..

I added a paragraph about checkpoint archive management. That is might be different depending on the use case but I gave three examples how it could be implemented.

Similarly, for cli enablement kubernetes/kubernetes#120898 (comment) do you want to go over the expansion of this kubelet endpoint to cover kubectl enablement, mention how a user would use kubectl in these stories.. and probably add the management (manual and automatic .. CRUD) options?

I mentioned my kubectl PR in the Future Enhancements section.

PTAL.

@adrianreber
Copy link
Contributor Author

@rst0git Thanks for the suggestions and the patch. Applied.

@adrianreber
Copy link
Contributor Author

@mikebrow do you have any additional comments? Any recommendations how to move this forward?

@bart0sh bart0sh moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Dec 8, 2023
Co-Authored-By: Radostin Stoyanov <radostin.stoyanov@eng.ox.ac.uk>
Signed-off-by: Adrian Reber <areber@redhat.com>
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianreber
Once this PR has been reviewed and has the lgtm label, please assign mrbobbytables for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@soltysh
Copy link
Contributor

soltysh commented Feb 6, 2024

@adrianreber any particular reason this is separate from #4288, can we get both PRs merged as one?

@adrianreber
Copy link
Contributor Author

@adrianreber any particular reason this is separate from #4288, can we get both PRs merged as one?

This is again complicated. My initial approach to get this feature in Kubernetes around 2019 was to focus on container migration. Then we came up with a much simpler way using the "Forensic Container Checkpointing" story. So we worked on that and it got merged. One of the steps to get it from Alpha to Beta at that time was at least one container engine should implement the new CRI API calls. CRI-O did.

When trying to graduate from Alpha to Beta the feedback from SIG-Node was that containerd should also implement it. From containerd's side I was told that the current KEP focussing on "Forensic Container Checkpointing" feels incomplete and all possible use cases should be described in the KEP.

So now I have a Alpha to Beta graduation KEP which depends on containerd support. containerd wants to merge the CheckpointContainer CRI implementation PR after this one here with the complete story is merged.

The Alpha to Beta graduation is full of discussions and adding additional discussions about use cases seems to complicate everything even more.

If everybody is happy with the changes of this PR I can include it in the other PR. I don't know.

@soltysh
Copy link
Contributor

soltysh commented Feb 7, 2024

I'll defer to sig-node leads about that decision, it's theirs call about merging, I'm asking only from PRR review perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
SIG Node PR Triage
Needs Reviewer
Development

Successfully merging this pull request may close these issues.

None yet

7 participants