
Proposal: Add vsock support to KubeVirt #191

Merged: 3 commits merged into kubevirt:main on Oct 18, 2022
Conversation

@rmohr (Member) commented Oct 14, 2022

Proposal to add vsock support to KubeVirt.

Google doc used for the initial discussions: https://docs.google.com/document/d/1KfFsO6jKvie9EGMl9FOMtfqAYAMUFOTl2nIup14besA/edit

Signed-off-by: Roman Mohr <rmohr@google.com>
@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Oct 14, 2022
@rmohr (Member Author) commented Oct 14, 2022

/cc @rthallisey
/cc @vladikr

Signed-off-by: Roman Mohr <rmohr@google.com>
@rthallisey (Contributor) commented
/lgtm
Reviewed in the gdoc. Proposal lgtm.

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2022

#### API changes

An `autoattachVsock` flag will be added to the VMI spec at `spec.domain.devices`, defaulting to `false` to stay backward compatible. In addition, a `VsockCID` field will be added to the VMI status as `status.vsockCID` for CID allocation and lookup.
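For illustration only, a minimal sketch of how these two additions could be expressed in Go API types; the field names follow the text above, while the surrounding struct shapes and json tags are assumptions rather than the merged implementation:

```go
package v1

// Sketch only: simplified stand-ins for the real KubeVirt API structs.

type Devices struct {
	// AutoattachVsock attaches a virtio-vsock device to the VMI when true.
	// It defaults to false to stay backward compatible.
	AutoattachVsock *bool `json:"autoattachVsock,omitempty"`
}

type VirtualMachineInstanceStatus struct {
	// VsockCID holds the cluster-allocated context ID (CID) of the VMI's
	// vsock device, used for CID allocation bookkeeping and lookup.
	VsockCID *uint32 `json:"vsockCID,omitempty"`
}
```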
Member commented

Is there any chance that we'll need more fields associated with vsock other than "true|false" for attachment? If so, maybe we shouldn't use a boolean.

@rmohr (Member Author) commented Oct 14, 2022

libvirt has three fields: one for auto-CID selection, one for the virtio model and one for manual CID assignment. The model has to be in sync with other virtio devices, and auto-assignment of the CID does not make sense in our design. We could not come up with additional use-cases where more fields would have to be on the VMI. For confidential computing, for example, I would expect that vsock as a whole has to be disabled, but that's about it ...

@rmohr (Member Author) commented

Can you think of something?

Member commented
Could it be that we will need more than one vsock device per guest at some point?

@rmohr (Member Author) commented Oct 14, 2022

Hm, it is technically possible to attach multiple vsock devices, but guest drivers don't have a concept of multiple vsock devices. See for instance https://bugzilla.redhat.com/show_bug.cgi?id=1455015.

Member commented
Ah, okay. I thought I read about multiple transports with vsock, but I might be wrong.
I guess we won't be able to have both the qemu guest agent and another agent using vsock.

@rmohr (Member Author) commented

The vsock device has ports, like TCP/IP. We can run a lot of agents over the single device. That is probably what you read about.
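To illustrate that point, a small host-side sketch using golang.org/x/sys/unix; the CID (3) and the ports (1234, 5678) are made-up values and not part of the proposal:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// dialVsock opens a stream connection to one port of the guest's single
// vsock device; different agents simply listen on different ports.
func dialVsock(cid, port uint32) (int, error) {
	fd, err := unix.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0)
	if err != nil {
		return -1, err
	}
	if err := unix.Connect(fd, &unix.SockaddrVM{CID: cid, Port: port}); err != nil {
		unix.Close(fd)
		return -1, err
	}
	return fd, nil
}

func main() {
	// Two hypothetical agents in the same guest, reachable over one device.
	for _, port := range []uint32{1234, 5678} {
		fd, err := dialVsock(3, port)
		if err != nil {
			fmt.Printf("port %d: %v\n", port, err)
			continue
		}
		fmt.Printf("port %d: connected\n", port)
		unix.Close(fd)
	}
}
```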

@vladikr (Member) commented Oct 14, 2022

/lgtm

@davidvossel (Member) commented
/approve
/hold

The community had already been discussing this design (in Google doc form) for a few days before this PR was created, and no strong objections have been raised. I put a hold on the PR simply to give more people time to comment on it before it's merged. If there are no new open discussions by next Tuesday (Oct 18th), then I think it's reasonable to remove the hold.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2022
@kubevirt-bot commented
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2022

* The whole `vsock` feature will live behind a feature gate
* By default the first 1024 ports of a vsock device are privileged. Services trying to connect to those require `CAP_NET_BIND_SERVICE`.
* We aim to block the `AF_VSOCK` `socket` syscall in containerd (https://github.com/containerd/containerd/issues/7442), but it is not certain that this will happen. For now it is therefore the responsibility of the vendor to apply the required seccomp change to all unprivileged pods (a rough sketch follows below), or to verify that the `CAP_NET_BIND_SERVICE` privilege is handed out carefully enough
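A rough sketch of what such a vendor-applied seccomp change could look like, expressed with the OCI runtime-spec Go types; the rule below and the numeric AF_VSOCK value (40 on Linux) are illustrative assumptions, not a profile defined by this proposal:

```go
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// Deny socket(2) whenever the first argument (the address family)
	// equals AF_VSOCK (40 on Linux); ActErrno makes the call fail with an
	// errno instead of succeeding.
	rule := specs.LinuxSyscall{
		Names:  []string{"socket"},
		Action: specs.ActErrno,
		Args: []specs.LinuxSeccompArg{
			{Index: 0, Value: 40, Op: specs.OpEqualTo}, // AF_VSOCK
		},
	}
	out, _ := json.MarshalIndent(rule, "", "  ")
	fmt.Println(string(out))
}
```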
Member commented
Why did you choose containerd? I think containers/common would also benefit and might even reach a broader spectrum of users. (KubeVirt itself hasn't used containerd for some time...)

@rmohr (Member Author) commented Oct 17, 2022

Yes, KubeVirt uses CRI-O in CI, but we use containerd, and KubeVirt is not designed to work only with a specific CRI. Securing containerd for our purpose is an effort independent of this proposal. I think it makes a lot of sense to also push this to other popular CRIs, but that is out of scope of this proposal :). I explicitly wanted to mention it so that it can easily be picked up for other CRIs if there is interest.

Member commented
I see. I read the proposal as coming from the community (meaning we == community), and in that sense it would make more sense to aim for changes in CRI-O (and therefore containers/common). I can pick this up.

@rmohr (Member Author) commented

Ah, I see the confusion now regarding the usage of "we". 👍

### Possible next steps

* Allow running `qemu-guest-agent` over virtio-vsock too
* Introduce a general-purpose mechanism via gRPC in virt-handler which allows efficient and scalable collection of the version and readiness of registered third-party agents, without having to alter virt-handler or run additional node-local services (a rough sketch follows below). Also move the `qemu-guest-agent` polling behind this mechanism.
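Purely illustrative, since the proposal only names this as a possible next step: a minimal Go sketch of the kind of interface such a mechanism could expose. All names here (AgentInfo, Collector, ListAgents) are hypothetical:

```go
package agentinfo

import "context"

// AgentInfo is what a registered in-guest agent would report back over the
// per-VM vsock connection.
type AgentInfo struct {
	Name    string // e.g. "qemu-guest-agent" or a third-party agent
	Version string
	Ready   bool
}

// Collector is the kind of interface virt-handler could serve (for example
// backed by gRPC) so that version and readiness of arbitrary registered
// agents can be collected without agent-specific code paths or additional
// node-local services.
type Collector interface {
	ListAgents(ctx context.Context, vmiName string) ([]AgentInfo, error)
}
```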
Member commented
Nit: the qemu-guest-agent use case is mentioned above.

@rmohr (Member Author) commented

Could you explain what you mean? Running qemu-guest-agent over vsock is an explicit non-goal, but I wanted to outline what a possible future integration into a more general guest-agent mechanism could look like.

Member commented

There are two points:

  1. "Allow running qemu-guest-agent over virtio-vsock too"
  2. The end of the second point, "Also move the qemu-guest-agent polling behind this mechanism.", feels redundant with it.

@rmohr (Member Author) commented Oct 17, 2022

Hm, they are two different points though:

  • This proposal does not change code-paths in virt-handler to allow running the qemu-guest-agent via vsock; that is the first point.
  • Another point would be introducing a common polling mechanism, with no extra handling for the qemu-guest-agent; it would use a common, extensible mechanism.

Those are just some rough ideas to highlight that while qemu-guest-agent is not tackled in this proposal, this proposal does not stand in the way of doing so later.

Member commented

Now I got you. I still feel like the last part of the second point is redundant. Your comment explained it very well; maybe we can incorporate it into the points?

 * Remove ambiguity about who is meant by "We" in the security section
 * Better explain what future steps beyond this proposal could look like

Signed-off-by: Roman Mohr <rmohr@google.com>
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2022
@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2022
@rmohr (Member Author) commented Oct 18, 2022

/unhold

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2022
@kubevirt-bot kubevirt-bot merged commit 45220b7 into kubevirt:main Oct 18, 2022
Labels: approved, dco-signoff: yes, lgtm, size/M
6 participants