New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
migrations: verify source and target containerdisks using checksums #10747
migrations: verify source and target containerdisks using checksums #10747
Conversation
e8ee571
to
edfdf6e
Compare
/cc @brianmcarey @xpivarc |
Unit tests are still missing, I'll be adding some shortly. |
edfdf6e
to
fac50ef
Compare
@xpivarc @fossedihelm the PR is ready for review |
fac50ef
to
5bc966d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @acardace! Some small nit below.
IIUC if the checksum is not present, it is like a checksum match.
So, in case of upgrade with LiveMigrate
workload update strategy, it can be possible that the containerdisk differ from the original one. Am I right?
Thank you!
pkg/virt-handler/vm.go
Outdated
@@ -2683,6 +2722,18 @@ func (d *VirtualMachineController) vmUpdateHelperMigrationTarget(origVMI *v1.Vir | |||
return nil | |||
} | |||
|
|||
// Verify container disks checksum | |||
err = d.containerDiskMounter.VerifyChecksums(vmi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this will change the existing behavior. In the case when floating tags are used for containerdisk images, with this change the migration will fail if a tag points to a newer version. The current implementation is sort of agnostic to that, since it refers to the image by its digest. Probably the topic 'whether the usage of floating tags is a good practice or not' is debatable, but I think that if we decide to go with this approach, it should be clearly documented that floating tags will likely cause migration failure with crio >= 1.28. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately yes, we will break some users because Kubernetes doesn't require to report digest. I believe this is the only reasonable handling. Documenting this and discouraging users from using floating tags is a must. Note: This will be backported to 1.0 and 1.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a thought: would it make sense to mention that right in the release notes?
Isn't cri-o breaking a long-standing promise doing this, without providing an alternative? |
Well, it depends. Some time ago I faced a similar issue with an old containerdisk image and |
I see. I still think that during that time a lot of consumers have accumulated dependencies on this assumption, so an alternative should have at least been provided. https://www.hyrumslaw.com/ |
I personally tend to agree. We can potentially at least initiate a discussion, perhaps, with crio folks... or even try to bring this up to a broader k8s community (though, looking at the previously mentioned issue, which is still open after >5 years... I am not very optimistic on that). |
5bc966d
to
9d46b67
Compare
9d46b67
to
6c776fd
Compare
/retest-required |
pkg/virt-handler/vm.go
Outdated
@@ -2683,6 +2722,18 @@ func (d *VirtualMachineController) vmUpdateHelperMigrationTarget(origVMI *v1.Vir | |||
return nil | |||
} | |||
|
|||
// Verify container disks checksum | |||
err = d.containerDiskMounter.VerifyChecksums(vmi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a thought: would it make sense to mention that right in the release notes?
/unhold |
hey folks, sorry for chiming in late here. I do agree that we ar technically breaking behavior we've kept for a while, despite not promising that's how it would always work. The problem is really in the Kubelet relying on ImageRef when it should be thinking about ImageID. We could potentially pursue a fix there and then revert cri-o/cri-o#7149 to have imageRef refer to the image originally pulled, but until we can fix the underlying issue in cri-o/cri-o#7143 we have to choose kubelet compatibility over kubevirt cc @saschagrunert |
@vasiliy-ul @xpivarc please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Just added one suggestion inline.
…cksum Since CRIO 1.28 doesn't post a pullable imageID in the containerStatus field inside the Pod status, this patch makes virt-handler compute and verify CRC32 checksums of the source and target pod containerdisks. If the checksum do not match virt-handler will fail the migration. Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Refactor to re-use `getRootDiskPath()` and `getKernelArtifactPaths` to reduce code duplication. Signed-off-by: Antonio Cardace <acardace@redhat.com>
to account for the 32Mi buffer used to verify containerdisk checksums Signed-off-by: Antonio Cardace <acardace@redhat.com>
4121b74
to
8de93af
Compare
I am going to review this today. |
/retest-required |
@dhiller's review-bot says:
I found suspicious hunks:
Files:
This PR does not satisfy at least one automated review criteria. This PR does not require further manual action. Note: botreview (kubevirt/project-infra#3100) is a Work In Progress! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
func (m *mounter) getContainerDiskPath(vmi *v1.VirtualMachineInstance, volume *v1.Volume, volumeIndex int) (*safepath.Path, error) { | ||
sock, err := m.socketPathGetter(vmi, volumeIndex) | ||
if err != nil { | ||
return nil, ErrDiskContainerGone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The caller of getContainerDiskPath
or caller of the caller could just check errors.Is(err, os.ErrNotExist)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: xpivarc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherrypick release-1.1 |
@fossedihelm: new pull request created: #10873 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherrypick release-1.0 |
@acardace: #10747 failed to apply on top of branch "release-1.0":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Starting from version 1.28 CRIO doesn't post a pullable imageID in the containerStatus field inside the Pod status.
Therefore virt-controller cannot guarantee that containerdisks pulled on the target pod will be equal to what was pulled on the source virt-launcher.
This patch makes virt-handler compute and verify CRC32 checksums of the source and target pod containerdisks to make sure they did not change.
If the checksums do not match virt-handler will fail the migration.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #10616
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: