Add Vsock support to KubeVirt #8546
Conversation
Hi @zhuchenwang. Thanks for your PR. I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
one small nit, otherwise LGTM
tests/vmi_vsock_test.go
tests.RunMigrationAndExpectCompletion(virtClient, migration, tests.MigrationWaitTime)

By("Ensuring the Vsock is still present")
Expect(console.SafeExpectBatch(vmi, []expect.Batcher{
After considering the comment from @vladikr, I am not sure if libvirt would pick up the new vsock CID on the new server. There is a good chance that it kept the old CID (which in this case accidentally matches the newly assigned CID).
From my understanding after reading the vsock man page, libvirt will try to retain the CID; if it is unavailable on the host after live migration, the guest will be assigned a new CID and notified about it.
I think this only means that live migration is supported and that the hypervisor may have to assign a different CID. It does not mean that the kernel gives you another CID without involving the hypervisor (at least I am pretty sure).
Libvirt does retain the CID if it is available, which is demonstrated in the test case.
This email thread also confirms the behavior: https://www.spinics.net/lists/netdev/msg631998.html.
> Also the email thread confirms the behavior https://www.spinics.net/lists/netdev/msg631998.html.

Perfect. That is how it should be. What I did not think about is that the source libvirt defines the destination domain XML during migration. This is a little bit cumbersome for us, since we have to transfer additional info (like the new CID here) back to the source handler, so that we can set it on the target.
We already do that for some info, and the final domain modification happens at https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/live-migration-source.go#L257 (you can find it by following the MigrateVMI gRPC call in virt-launcher).
I think our best option is to add the CID to the VMI status and let virt-controller hand out the CIDs. We don't have a back-flow channel between target-handler and source-handler. It would be similar to the target CPU set.
@rmohr yeah, we will just connect from the VM to the host (CID value 2).
It may just be much simpler for you to do it in virt-controller. The status would be your persistent storage, and you have an initial sync "guarantee" after the cache sync, where you can warm up your in-memory view of the allocated CIDs. What you propose would work too, but going with virt-controller just feels more robust and means no possible file leftovers or file-cleanup steps. You also don't have to change the CID between migrations and don't have to send that info back and forth.
Thanks for the suggestion. This makes sense to me and I also want to simplify things. I will make the corresponding change.
Updated to have virt-controller allocate the CIDs.
Could we use the network namespace support in vsock to simplify the CID allocation? From the example, you can see that the two VMs are using the same CID.
No, that never got merged into the kernel. We initially had the same thought 💭
Hey. Would it be possible to then get rid of today's guest-agent channel and use vsock there as well? IOW: can't we avoid introducing a second channel type and instead replace the existing one?
Hm, vsock has security implications (not sure if you read the other PR discussion), so at this moment it is not for everyone. Probably at some point. I would not touch the QEMU guest agent at this stage. Nothing speaks against extending vsock later on; the implementation is neutral in that regard. Also, we do not automatically attach vsock devices, so as not to break the API. What do you think?
Thanks for the summary Roman. I can follow your reasoning, yet it is important to find points of alignment/deduplication imo.
/retest
@fabiand could you explain what you mean? I am not sure I understand your concern or thought. |
/retest |
Compare: 99a7126 to f31d29f
LGTM
@PiotrProkop: changing LGTM is restricted to collaborators.
/retest
One last tiny request. Then I think we are done.
virt-controller is responsible for allocating the CIDs for VMIs in the cluster. The CIDs are unique cluster-wide, so that migrated VMs keep the same CIDs. Signed-off-by: Zhuchen Wang <zcwang@google.com>
/lgtm
/hold Let's wait for the proposal to be approved and merged. Then we can unhold.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rmohr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/unhold proposal is merged: kubevirt/community#191 (comment)
/test all
/retest |
/retest |
/retest |
@zhuchenwang: The following test failed; say /retest to rerun all failed tests.
This syncs the seccomp-profile with the latest changes in containerd's profile, applying the same changes as containerd/containerd@17a9324. Some background from the associated ticket:

> We want to use vsock for guest-host communication on KubeVirt (https://github.com/kubevirt/kubevirt). In KubeVirt we run VMs in pods.
>
> However, since anyone can just connect from any pod to any VM with the default seccomp settings, we cannot limit connection attempts to our privileged node-agent.
>
> ### Describe the solution you'd like
> We want to deny the `socket` syscall for the `AF_VSOCK` family by default.
>
> I see in [1] and [2] that AF_VSOCK was actually already blocked for some time, but that got reverted since some architectures support the `socketcall` syscall, which can't be restricted properly. However, we are mostly interested in `arm64` and `amd64`, where limiting `socket` would probably be enough.
>
> ### Additional context
> I know that in theory we could use our own seccomp profiles, but we want to provide security for as many users as possible who use KubeVirt, and there it would be very helpful if this protection could be added as part of the DefaultRuntime profile, to easily ensure that it is active for all pods [3].
>
> Impact on existing workloads: it is unlikely that this will disturb any existing workload, because VSOCK is almost exclusively used for host-guest communication. However, if someone did still use it: privileged pods would still be able to use `socket` for `AF_VSOCK`, and custom seccomp policies could be applied too. Further, it was already blocked for quite some time, and the blockade got lifted due to reasons not related to AF_VSOCK.
>
> The PR in KubeVirt which adds VSOCK support, for additional context: [4]
>
> [1]: moby#29076 (comment)
> [2]: moby@dcf2632
> [3]: https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads
> [4]: kubevirt/kubevirt#8546

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
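As a rough illustration (a hedged sketch, not the exact containerd/moby diff), a Docker/containerd-style seccomp profile entry that rejects `socket(2)` when the first argument is `AF_VSOCK` (40 on Linux) could look like:

```json
{
  "names": ["socket"],
  "action": "SCMP_ACT_ERRNO",
  "args": [
    {
      "index": 0,
      "value": 40,
      "op": "SCMP_CMP_EQ"
    }
  ]
}
```

With such a rule in the default profile, unprivileged pods get `EPERM`-style failures when creating vsock sockets, while privileged pods or pods with a custom profile remain unaffected.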
Signed-off-by: Zhuchen Wang <zcwang@google.com>
What this PR does / why we need it:
We will have our guest agent running in the VM, listening on the vsock channel. We want to expose a REST endpoint on virt-handler so that our controller can simply call the REST API on virt-handler, and virt-handler will forward the stream to the guest agent, similar to VNC and serial.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
This PR implements the same feature as #8531. The major difference is that virt-handler is responsible for allocating the CID in this PR. We had a lot of internal discussion about the attack surface of this feature, and that is why we held off on upstreaming the change. It is good to see that other people have the same requirement. I hope this PR helps the reviewers understand a slightly different use case and decide which way works for both cases. Also, credit to PiotrProkop, because I took the webhook code from #8531 and integrated it.
Release note: