
Add Vsock support to KubeVirt #8546

Merged
merged 1 commit into kubevirt:main on Oct 18, 2022

Conversation

zhuchenwang
Contributor

Signed-off-by: Zhuchen Wang zcwang@google.com

What this PR does / why we need it:

We will have our guest agent running in the VM, listening on the vsock channel. We want to expose a REST endpoint on virt-handler so that our controller can simply call the REST API on virt-handler, and virt-handler will forward the stream to the guest agent, similar to the VNC and serial console endpoints.
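For illustration only, here is a minimal sketch of what the guest-side vsock listener could look like, using AF_VSOCK via golang.org/x/sys/unix; the port number, logging, and connection handling are placeholders and not part of this PR:

```go
// Minimal sketch of a guest-side vsock listener; port 1234 is a placeholder.
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	fd, err := unix.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0)
	if err != nil {
		log.Fatalf("socket: %v", err)
	}
	defer unix.Close(fd)

	// Accept on any CID assigned to this guest.
	if err := unix.Bind(fd, &unix.SockaddrVM{CID: unix.VMADDR_CID_ANY, Port: 1234}); err != nil {
		log.Fatalf("bind: %v", err)
	}
	if err := unix.Listen(fd, 1); err != nil {
		log.Fatalf("listen: %v", err)
	}
	for {
		connFd, _, err := unix.Accept(fd)
		if err != nil {
			log.Printf("accept: %v", err)
			continue
		}
		// The accepted connection would carry the stream proxied by virt-handler.
		unix.Close(connFd)
	}
}
```

In this sketch, virt-handler would dial the guest's allocated CID from the host side and proxy the REST client's stream over the accepted connection.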

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
This PR implements the same feature as #8531. The major difference is that in this PR, virt-handler is responsible for allocating the CID. We had a lot of internal discussion about the attack surface of this feature, which is why we held off on upstreaming the change. It's great to see that other people have the same requirement. I hope this PR helps the reviewers understand a slightly different use case and decide which approach works for both. Also, credit to PiotrProkop, because I took and integrated the webhook code from #8531.

Release note:

Provides the Vsock feature for KubeVirt VMs.

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Sep 29, 2022
@kubevirt-bot kubevirt-bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL labels Sep 29, 2022
@kubevirt-bot
Contributor

Hi @zhuchenwang. Thanks for your PR.

I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rmohr
Member

rmohr commented Sep 29, 2022

/ok-to-test

@kubevirt-bot kubevirt-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 29, 2022
@PiotrProkop
Contributor

one small nit, otherwise LGTM

tests.RunMigrationAndExpectCompletion(virtClient, migration, tests.MigrationWaitTime)

By("Ensuring the Vsock is still present")
Expect(console.SafeExpectBatch(vmi, []expect.Batcher{
Member

After considering the comment from @vladikr, I am not sure if libvirt would pick up the new vsock CID on the new host. There is a good chance that it kept the old CID (which in this case accidentally matches the newly assigned CID).

Contributor

From my understanding after reading the vsock man page, libvirt will try to retain the CID; if it is unavailable on the host after live migration, a new CID will be assigned and the guest will be notified about it.

Member

I think this only means that live migration is supported and that the hypervisor may have to give the guest a different CID. It does not mean that the kernel hands out another CID without involving the hypervisor (at least I'm pretty sure).

Contributor Author

Libvirt does retain the CID if available, which is demonstrated in the test case.

Also the email thread confirms the behavior https://www.spinics.net/lists/netdev/msg631998.html.

Member

Also the email thread confirms the behavior https://www.spinics.net/lists/netdev/msg631998.html.

Perfect. That is how it should be. What I did not think about is that the source libvirt defines the destination domain XML during migration. This is a little bit cumbersome for us, since we have to transfer additional info (like the new CID here) back to the source handler so that we can set it on the target ...

We already do that for some info, and the final domain modification happens at https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/live-migration-source.go#L257 (you can find it by following the MigrateVMI gRPC call in virt-launcher).

I think our best option is to add the CID to the VMI status and let virt-controller hand out the CIDs. We don't have a back-flow channel between the target handler and the source handler. It would be similar to the target CPU set.
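As a hedged sketch of what such cluster-wide allocation could look like (illustrative names only, not the actual virt-controller code), the allocated CID would be persisted in the VMI status and looked up again after a cache resync or a migration:

```go
// Illustrative sketch only; the real allocator lives in virt-controller
// and persists the CID in the VMI status.
package cid

import (
	"fmt"
	"sync"
)

// Allocator hands out cluster-unique vsock CIDs, starting above the
// reserved values (0, 1, and 2 = host).
type Allocator struct {
	mu    sync.Mutex
	used  map[uint32]string // CID -> VMI UID
	byVMI map[string]uint32 // VMI UID -> CID
	next  uint32
}

func NewAllocator() *Allocator {
	return &Allocator{used: map[uint32]string{}, byVMI: map[string]uint32{}, next: 3}
}

// Allocate returns the existing CID for a VMI (e.g. after a cache resync)
// or assigns the next free one.
func (a *Allocator) Allocate(vmiUID string) (uint32, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if cid, ok := a.byVMI[vmiUID]; ok {
		return cid, nil
	}
	for cid := a.next; cid > 2; cid++ {
		if _, taken := a.used[cid]; !taken {
			a.used[cid] = vmiUID
			a.byVMI[vmiUID] = cid
			a.next = cid + 1
			return cid, nil
		}
	}
	return 0, fmt.Errorf("no free CID")
}

// Release frees the CID when the VMI is deleted.
func (a *Allocator) Release(vmiUID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if cid, ok := a.byVMI[vmiUID]; ok {
		delete(a.used, cid)
		delete(a.byVMI, vmiUID)
	}
}
```

A controller would call Allocate with the VMI UID and write the result into the VMI status before the domain is created, and call Release when the VMI is deleted, so migrated VMs keep the same CID.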

Contributor

@rmohr yeah, we will just connect from the VM to the host (CID value 2).
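For context, a hedged sketch of that guest-to-host direction, again using golang.org/x/sys/unix; the port is a placeholder:

```go
package guest

import "golang.org/x/sys/unix"

// dialHost connects from inside the guest to the host side of the vsock
// (well-known CID 2, exposed as unix.VMADDR_CID_HOST). Port 1234 is a
// placeholder for whatever the host-side agent listens on.
func dialHost() (int, error) {
	fd, err := unix.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0)
	if err != nil {
		return -1, err
	}
	sa := &unix.SockaddrVM{CID: unix.VMADDR_CID_HOST, Port: 1234}
	if err := unix.Connect(fd, sa); err != nil {
		unix.Close(fd)
		return -1, err
	}
	return fd, nil
}
```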

Contributor Author

It may just be much simpler for you to do it in virt-controller. The status would be your persistent storage, and you have an initial sync "guarantee" after the cache sync where you can warm up your in-memory view of the allocated CIDs. But what you propose will work too. Going with virt-controller just feels more robust and means no possible file leftovers or failing cleanup steps. You also don't have to change the CID between migrations and don't have to send that info back and forth.

Thanks for the suggestion. This makes sense to me and I also want to simplify things. I will make the corresponding change.

Contributor Author

Updated to have virt-controller allocate the CIDs.

Member
@alicefr alicefr Oct 12, 2022

Could we use the network namespace support in vsock to simplify the CID allocation? From the example, you can see that the 2 VMs are using the same CID.

Member

No, it never got merged into the kernel. We initially had the same thought 💭

@fabiand
Member

fabiand commented Sep 30, 2022

Hey. Would it then be possible to get rid of today's guest agent channel and use vsock there as well?

IOW: Can't we avoid introducing a second channel type and instead replace the existing one?

@rmohr
Member

rmohr commented Sep 30, 2022

Hey. Would it then be possible to get rid of today's guest agent channel and use vsock there as well?

IOW: Can't we avoid introducing a second channel type and instead replace the existing one?

Hm, vsock has security implications (not sure if you read the other PR discussion), so at this moment it is not for everyone. Probably at some point. I would not touch the QEMU guest agent at this stage. Nothing speaks against extending vsock later on; the implementation is neutral in that regard. Also, we do not automatically attach vsock devices, so as not to break the API. What do you think?

@fabiand
Member

fabiand commented Oct 5, 2022

Thanks for the summary Roman.

I can follow your reasoning, yet it is important to find points of alignment/deduplication imo.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2022
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2022
@rmohr
Member

rmohr commented Oct 10, 2022

/retest

@rmohr
Member

rmohr commented Oct 10, 2022

Thanks for the summary Roman.

I can follow your reasoning, yet it is important to find points of alignment/deduplication imo.

@fabiand could you explain what you mean? I am not sure I understand your concern or thought.

@rmohr
Member

rmohr commented Oct 11, 2022

/retest

Contributor
@PiotrProkop PiotrProkop left a comment

LGTM

@kubevirt-bot
Contributor

@PiotrProkop: changing LGTM is restricted to collaborators

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rmohr
Member

rmohr commented Oct 14, 2022

/retest

Member
@rmohr rmohr left a comment

One last tiny request. Then I think we are done.

pkg/virt-controller/watch/vsock_test.go
virt-controller is responsible for allocating the CID for VMIs in the cluster.
The CIDs are unique cluster wide, so that the migrated VMs will have the
same CIDs.

Signed-off-by: Zhuchen Wang <zcwang@google.com>
@rmohr
Member

rmohr commented Oct 14, 2022

/lgtm
/approve

/hold

Let's wait for the proposal to be approved and merged. Then we can unhold.

@kubevirt-bot kubevirt-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Oct 14, 2022
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2022
@rmohr
Member

rmohr commented Oct 18, 2022

/unhold

proposal is merged: kubevirt/community#191 (comment)

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2022
@rmohr
Member

rmohr commented Oct 18, 2022

/test all

@rmohr
Member

rmohr commented Oct 18, 2022

/retest

2 similar comments
@PiotrProkop
Contributor

/retest

@rmohr
Member

rmohr commented Oct 18, 2022

/retest

@kubevirt-bot
Contributor

kubevirt-bot commented Oct 18, 2022

@zhuchenwang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubevirt-check-tests-for-flakes | f63babd | link | false | /test pull-kubevirt-check-tests-for-flakes |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-bot kubevirt-bot merged commit cc2f93d into kubevirt:main Oct 18, 2022
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Dec 1, 2022
This syncs the seccomp-profile with the latest changes in containerd's
profile, applying the same changes as containerd/containerd@17a9324

Some background from the associated ticket:

> We want to use vsock for guest-host communication on KubeVirt
> (https://github.com/kubevirt/kubevirt). In KubeVirt we run VMs in pods.
>
> However since anyone can just connect from any pod to any VM with the
> default seccomp settings, we cannot limit connection attempts to our
> privileged node-agent.
>
> ### Describe the solution you'd like
> We want to deny the `socket` syscall for the `AF_VSOCK` family by default.
>
> I see in [1] and [2] that AF_VSOCK was actually already blocked for some
> time, but that got reverted since some architectures support the `socketcall`
> syscall which can't be restricted properly. However we are mostly interested
> in `arm64` and `amd64` where limiting `socket` would probably be enough.
>
> ### Additional context
> I know that in theory we could use our own seccomp profiles, but we would want
> to provide security for as many users as possible which use KubeVirt, and there
> it would be very helpful if this protection could be added by being part of the
> DefaultRuntime profile to easily ensure that it is active for all pods [3].
>
> Impact on existing workloads: It is unlikely that this will disturb any existing
> workload, because VSOCK is almost exclusively used for host-guest communication.
> However if someone would still use it: Privileged pods would still be able to
> use `socket` for `AF_VSOCK`, custom seccomp policies could be applied too.
> Further it was already blocked for quite some time and the blockade got lifted
> due to reasons not related to AF_VSOCK.
>
> The PR in KubeVirt which adds VSOCK support for additional context: [4]
>
> [1]: moby#29076 (comment)
> [2]: moby@dcf2632
> [3]: https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads
> [4]: kubevirt/kubevirt#8546

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Dec 1, 2022
This syncs the seccomp-profile with the latest changes in containerd's
profile, applying the same changes as containerd/containerd@17a9324

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 57b2290)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Dec 1, 2022
This syncs the seccomp-profile with the latest changes in containerd's
profile, applying the same changes as containerd/containerd@17a9324

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 57b2290)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>