Add 'checkpoint' command to kubectl #120898

Open
wants to merge 1 commit into
base: master

adrianreber
Contributor

@adrianreber adrianreber commented Sep 26, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Kubernetes 1.25 introduced the possibility to checkpoint a container.

For details, please see KEP 2008, Forensic Container Checkpointing: kubernetes/enhancements#2008

The initial implementation only provided a kubelet API endpoint to trigger a checkpoint. The main reason for not extending it to the API server and kubectl was that checkpointing is a completely new concept.
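
For readers unfamiliar with the kubelet endpoint mentioned above, here is a minimal Go sketch of how such a request could be constructed. It assumes the path layout `/checkpoint/{namespace}/{pod}/{container}` described in the Kubernetes documentation for the kubelet checkpoint API and the default kubelet port 10250. The helper names `checkpointURL` and `newCheckpointRequest` are hypothetical and not part of this PR. The request is only built, not sent; a real call would additionally need kubelet client credentials.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// checkpointURL builds the kubelet checkpoint endpoint for one container.
// Assumption: the path layout is /checkpoint/{namespace}/{pod}/{container}.
func checkpointURL(kubeletHost, namespace, pod, container string) string {
	u := url.URL{
		Scheme: "https",
		Host:   kubeletHost,
		Path:   "/checkpoint/" + namespace + "/" + pod + "/" + container,
	}
	return u.String()
}

// newCheckpointRequest prepares (but does not send) the POST request that
// would ask the kubelet to checkpoint the given container.
func newCheckpointRequest(kubeletHost, namespace, pod, container string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, checkpointURL(kubeletHost, namespace, pod, container), nil)
}

func main() {
	// Pod and container names are taken from the example output in this PR.
	req, err := newCheckpointRequest("127.0.0.1:10250", "default", "test-pod-1", "container-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Having to address a specific node's kubelet directly, as this sketch does, is the inconvenience that a kubectl command would remove.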

Although the resulting checkpoint archive is accessible only by root, it is important to remember that it contains all memory pages of the container and thus potentially passwords, private keys, and random numbers. Because the archive is accessible only by root, it does not directly make this potentially confidential information easier to obtain: root would be able to retrieve it anyway.
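
As an illustration of the access restriction described above, the following Go sketch checks whether a file's permission bits grant any access to group or others. It is an assumption-laden example, not code from this PR: the helper `ownerOnly` is hypothetical, the path is copied from the example output in this description, and the check looks only at permission bits, not at file ownership.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// ownerOnly reports whether the permission bits grant no access at all
// to group or others (for example 0600 or 0700).
func ownerOnly(mode fs.FileMode) bool {
	return mode.Perm()&0o077 == 0
}

func main() {
	// Path copied from the example output in this PR description;
	// real archive names may differ.
	const archive = "/var/lib/kubelet/checkpoints/checkpoint-archive.tar"

	info, err := os.Stat(archive)
	if err != nil {
		fmt.Println("cannot inspect archive:", err)
		return
	}
	fmt.Printf("%s owner-only: %v\n", archive, ownerOnly(info.Mode()))
}
```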

Now, at least three Kubernetes releases later, we have not heard any negative feedback about the checkpoint archive and its data. There were, however, many requests to be able to create a checkpoint via kubectl and not just via the kubelet API endpoint.

This commit adds 'checkpoint' support to kubectl. The 'checkpoint' command is heavily influenced by the code of the 'exec' and 'logs' commands. The checkpoint command is only available behind the 'alpha' sub-command, as the "Forensic Container Checkpointing" KEP is still marked as Alpha.

Example output:

 $ kubectl alpha checkpoint test-pod-1 -c container-2
 Node:                  127.0.0.1/127.0.0.1
 Namespace:             default
 Pod:                   test-pod-1
 Container:             container-2
 Checkpoint Archive:    /var/lib/kubelet/checkpoints/checkpoint-archive.tar
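
A script consuming this output might want the archive path as a value. The following Go sketch parses the `Key: value` lines shown above into a map. It assumes the layout of the example output, which belongs to an alpha command and may change; the function name `parseCheckpointOutput` is hypothetical and not part of this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseCheckpointOutput splits each "Key: value" line at the first colon
// and returns the trimmed pairs. Lines without a colon (such as the shell
// prompt line) are ignored.
func parseCheckpointOutput(out string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, value, found := strings.Cut(sc.Text(), ":")
		if !found {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return fields
}

func main() {
	// Sample text copied from the example output in this PR description.
	sample := ` $ kubectl alpha checkpoint test-pod-1 -c container-2
 Node:                  127.0.0.1/127.0.0.1
 Namespace:             default
 Pod:                   test-pod-1
 Container:             container-2
 Checkpoint Archive:    /var/lib/kubelet/checkpoints/checkpoint-archive.tar
`
	fields := parseCheckpointOutput(sample)
	fmt.Println(fields["Checkpoint Archive"])
}
```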

The tests are written so that they handle CRI implementations both with and without an implementation of the CRI RPC call 'ContainerCheckpoint'.
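
One way to make a test tolerate both kinds of runtime is to treat "not implemented" as a reason to skip rather than fail. The Go sketch below shows that pattern only; it is not the test code from this PR. The sentinel `errNotImplemented`, the `checkpointer` type, and `runCheckpointTest` are hypothetical stand-ins for whatever error and client a real runtime exposes.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented is a hypothetical sentinel standing in for the error a
// runtime returns when it lacks the ContainerCheckpoint RPC.
var errNotImplemented = errors.New("ContainerCheckpoint not implemented")

// checkpointer abstracts the checkpoint call; it returns the archive path.
type checkpointer func(container string) (string, error)

type outcome string

const (
	outcomePassed  outcome = "passed"
	outcomeSkipped outcome = "skipped"
	outcomeFailed  outcome = "failed"
)

// runCheckpointTest skips when the runtime does not implement the RPC,
// fails on any other error or an empty archive path, and passes otherwise.
func runCheckpointTest(cp checkpointer, container string) outcome {
	archive, err := cp(container)
	switch {
	case errors.Is(err, errNotImplemented):
		return outcomeSkipped
	case err != nil:
		return outcomeFailed
	case archive == "":
		return outcomeFailed
	default:
		return outcomePassed
	}
}

func main() {
	supported := func(string) (string, error) {
		return "/var/lib/kubelet/checkpoints/checkpoint-archive.tar", nil
	}
	unsupported := func(string) (string, error) {
		// Wrapped errors are still recognized through errors.Is.
		return "", fmt.Errorf("runtime: %w", errNotImplemented)
	}
	fmt.Println(runCheckpointTest(supported, "container-2"))
	fmt.Println(runCheckpointTest(unsupported, "container-2"))
}
```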

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added `checkpoint` command to kubectl.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/2008

@k8s-ci-robot added the following labels on Sep 26, 2023:

  • `do-not-merge/work-in-progress`: Indicates that a PR should not merge because it is a work in progress.
  • `release-note`: Denotes a PR that will be considered when it comes time to generate release notes.
  • `kind/feature`: Categorizes issue or PR as related to a new feature.
  • `size/XXL`: Denotes a PR that changes 1000+ lines, ignoring generated files.
  • `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
  • `do-not-merge/needs-sig`: Indicates an issue or PR lacks a `sig/foo` label and requires one.
  • `needs-triage`: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • `needs-ok-to-test`: Indicates a PR that requires an org member to verify it is safe to test.

@k8s-ci-robot
Contributor

Hi @adrianreber. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the following labels on Sep 26, 2023:

  • `needs-priority`: Indicates a PR lacks a `priority/foo` label and requires one.
  • `area/code-generation`
  • `area/kubectl`
  • `area/test`
  • `kind/api-change`: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
  • `sig/api-machinery`: Categorizes an issue or PR as relevant to SIG API Machinery.
  • `sig/apps`: Categorizes an issue or PR as relevant to SIG Apps.
  • `sig/cli`: Categorizes an issue or PR as relevant to SIG CLI.
  • `sig/testing`: Categorizes an issue or PR as relevant to SIG Testing.

and removed the following label:

  • `do-not-merge/needs-sig`: Indicates an issue or PR lacks a `sig/foo` label and requires one.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianreber
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adrianreber
Contributor Author

@mikebrow PTAL

@adrianreber adrianreber marked this pull request as ready for review September 28, 2023 06:55
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 28, 2023
@jiahuif
Member

jiahuif commented Sep 28, 2023

/assign @mikebrow
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 28, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 20, 2023
@dims dims added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 24, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

@mochizuki875
Member

Hi, I'm watching this PR and have high hopes for this feature.
I think it would be convenient to let Kubernetes users checkpoint a specific container using kubectl.
Will work on this PR be resumed?

@adrianreber
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 5, 2024
@adrianreber
Contributor Author

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Feb 5, 2024
@k8s-ci-robot
Contributor

@adrianreber: Reopened this PR.

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 5, 2024
@bart0sh bart0sh added this to Triage in SIG Node PR Triage Feb 6, 2024
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Archive-it in SIG Node CI/Test Board Feb 7, 2024
@bart0sh
Contributor

bart0sh commented Feb 8, 2024

@adrianreber please, rebase the PR, thanks!

@bart0sh bart0sh moved this from Triage to Waiting on Author in SIG Node PR Triage Feb 8, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2024
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2024
Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber adrianreber force-pushed the 2023-09-26-kubectl-checkpoint branch from c43a966 to bd19044 Compare March 26, 2024 09:24
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2024
Labels
  • `area/code-generation`
  • `area/kubectl`
  • `area/test`
  • `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
  • `kind/api-change`: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
  • `kind/feature`: Categorizes issue or PR as related to a new feature.
  • `needs-ok-to-test`: Indicates a PR that requires an org member to verify it is safe to test.
  • `needs-priority`: Indicates a PR lacks a `priority/foo` label and requires one.
  • `release-note`: Denotes a PR that will be considered when it comes time to generate release notes.
  • `sig/api-machinery`: Categorizes an issue or PR as relevant to SIG API Machinery.
  • `sig/apps`: Categorizes an issue or PR as relevant to SIG Apps.
  • `sig/cli`: Categorizes an issue or PR as relevant to SIG CLI.
  • `sig/node`: Categorizes an issue or PR as relevant to SIG Node.
  • `sig/testing`: Categorizes an issue or PR as relevant to SIG Testing.
  • `size/XXL`: Denotes a PR that changes 1000+ lines, ignoring generated files.
  • `triage/accepted`: Indicates an issue or PR is ready to be actively worked on.
Projects
Status: In Progress
Status: In Progress
SIG Node PR Triage
Waiting on Author
8 participants