Switch 'ContainerCheckpoint' from Alpha to Beta #123215
Hi @adrianreber. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/test
/test pull-crio-cgroupv1-node-e2e-features
Make sure you trigger You get:
@kannon92 Thanks. That is a good starting point.
Force-pushed from 280797f to 0fdb396.
/test pull-crio-cgroupv1-node-e2e-features
Force-pushed from 0fdb396 to f897249.
/test pull-crio-cgroupv1-node-e2e-features
LGTM label has been added. Git tree hash: 9bcc632b046445672359e82fb4ad2dc4a0a2d49f
/triage accepted
// if the container engine returns that it explicitly has disabled support for it.
// or
// '(rpc error: code = Unknown desc = checkpoint/restore support not available)'
// if the container engine explicitly disabled the checkpoint/restore support
if (int(statusError.ErrStatus.Code)) == http.StatusInternalServerError {
My preference would be to fail rather than skip.
We should only be running these tests when we know checkpointing is set up.
We've seen cases where these jobs get skipped and no one really looks at them, since they are not failing.
But I'm happy to do this as a followup.
I am not sure how to handle this without skipping. As long as we test against CRI implementations that do not support the `CheckpointContainer` RPC, either because they do not implement it or because it is explicitly disabled, we need a way to ignore the test result. Skipping seems to be the only mechanism that allows this.
Do you have another idea how to distinguish between a failure and the not-implemented state? That is the reason I am checking for the error message. These error messages are the known responses when an engine has not implemented the RPC or explicitly disables support. If any other failure is returned, it should appear as a test error.
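One way to express the check described above is a small classifier that treats an HTTP 500 as "unsupported" only when the message contains a known marker, and reports everything else as a failure. This is a minimal sketch, not the e2e test's code: `classifyCheckpointError`, `checkpointOutcome`, and `unsupportedMarkers` are hypothetical names, the "checkpoint/restore support not available" marker is taken from the comment in the snippet under review, and the gRPC `Unimplemented` marker is an assumption based on how gRPC reports an unknown method.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// checkpointOutcome is a hypothetical result type for the e2e check.
type checkpointOutcome int

const (
	outcomeFail        checkpointOutcome = iota // unexpected error: report a test failure
	outcomeUnsupported                          // known "not implemented / disabled" reply
)

// unsupportedMarkers lists message fragments that indicate the container
// engine does not support (or has disabled) the CheckpointContainer RPC.
var unsupportedMarkers = []string{
	// Assumption: gRPC's standard reply for a method the server does not implement.
	"rpc error: code = Unimplemented desc = unknown method CheckpointContainer",
	// Taken from the comment in the test snippet under review.
	"rpc error: code = Unknown desc = checkpoint/restore support not available",
}

// classifyCheckpointError treats an error from the kubelet checkpoint endpoint
// as "unsupported" only if it is an HTTP 500 whose message contains a known
// marker; any other status or message is a genuine failure.
func classifyCheckpointError(status int, message string) checkpointOutcome {
	if status != http.StatusInternalServerError {
		return outcomeFail
	}
	for _, marker := range unsupportedMarkers {
		if strings.Contains(message, marker) {
			return outcomeUnsupported
		}
	}
	return outcomeFail
}

func main() {
	msg := "(rpc error: code = Unknown desc = checkpoint/restore support not available)"
	fmt.Println(classifyCheckpointError(http.StatusInternalServerError, msg) == outcomeUnsupported)
}
```

Keeping the marker list explicit means an unrecognized 500 still surfaces as a test error instead of being silently skipped.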
So it's not clear to me how you configured these jobs.
Are all CRI-O jobs able to run this?
I'd expect that if the runtime is CRI-O, these tests should work. Currently, with containerd, they are not working. I'd explicitly add that you expect this to fail for containerd.
That way you are making sure you know the expected behavior. Skipping is not great, IMO, because, as I said, we will not report that anything went wrong.
This can be done as a followup, I think. But I just want to call it out.
It definitely makes sense to think about this. Skipping is indeed not optimal as it might hide real errors. Let's find a better solution in a followup.
cc: @mikebrow
/assign @mrunalp @dchen1107 @deads2k
If you could find some time, could you confirm that we are correctly securing the checkpoint endpoint?
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: adrianreber, mrunalp
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/test pull-kubernetes-verify
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest
Changelog suggestion:
-Graduate "Forensic Container Checkpointing" (KEP #2008) from Alpha to Beta.
+Graduated _forensic container checkpointing_ [KEP #2008](https://kep.k8s.io/2008) from Alpha to Beta.
@sftim I updated the PR description with your suggestion. Is that enough?
Yep, that's fine.
Hi @adrianreber, so will it be available as a beta feature in v1.30?
Yes. The KEP and the code have been merged.
Forensic Container Checkpointing as described in KEP 2008 moves from Alpha to Beta. This is the corresponding code change.
The KEP describes the following changes for the Alpha to Beta graduation:
- `ContainerCheckpoint`
- CRI RPC.
- `checkpoint` API endpoint, at https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/auth.go#L101-L108
- names once they exist
- `kubelet_runtime_operations_errors_total`
- `checkpoint`

What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR makes the code changes discussed in kubernetes/enhancements#4288, graduating KEP 2008 "Forensic Container Checkpointing" from Alpha to Beta. All changes have been discussed in the corresponding KEP.
Which issue(s) this PR fixes:
Special notes for your reviewer:
This PR contains the code changes for graduating KEP 2008 "Forensic Container Checkpointing" from Alpha to Beta.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: