Added Vhostuser implementation #3208
Conversation
Hi @krsacme. Thanks for your PR. I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test all
@krsacme Great to see this PR, Saravanan, thanks.
@krsacme: The following tests failed, say /retest to rerun all failed tests:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Thanks a lot for the contribution @krsacme! The code looks okay, but we need to make sure this feature is understood, tested and maintainable. In general, we don't ship code that does not have test coverage. I'd like to ask you to add the following:
Please also remember the user-guide.
thanks @vladikr @dankenigsberg @phoracek @fabiand for the inputs. Now that I have consensus on the approach, I will add the required items specified above and move changes like uint >> uint32 into a separate PR.
Hey,
There's a lot going on in this PR. I generally understand the basic intent of what is being accomplished here, but the approach has a couple of concerning elements in it. Specifically, the exposing of the VMI pod to the /var/lib/vhost_sockets/ and /var/run/openvswitch/ host paths.
Can we get some more information about why this is necessary, and why a CNI plugin can't perform this privileged access on the pod's behalf in a more controlled way?
Now that I have consensus on the approach, I will add the required items specified above and move changes like uint >> uint32 into a separate PR.
Is there some sort of design document around this? It's unclear to me what the consensus is you're talking about. Maybe I'm missing a conversation?
@davidvossel thanks for your comments. Here are the reasons for these two host directories:
I was referring to the consensus on the direction of the approach; I didn't want to continue in the wrong direction. I have added the doc now with detailed steps. @phoracek I have added the steps in the docs directory, which I have been using on the baremetal deployment.
OvS is acting as a client to the vhost socket set up by the qemu process, right? If we mount an EmptyDir named
What we're trying to avoid here is a shared hostpath volume containing vhost sockets accessible to all VMI pods on the host. The technique above would avoid hostpath entirely, which is ideal if possible.
Yes,
This looks feasible. I will try this approach and will update accordingly. Thanks.
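For concreteness, here is a minimal sketch of the per-VMI EmptyDir idea above, written with the Kubernetes core/v1 Go types. The volume name and mount path are hypothetical, not taken from the PR; the point is that the socket directory becomes per-pod instead of a host-wide hostpath, and OvS can still reach it through the kubelet pods directory.

Go sketch: per-VMI EmptyDir for vhost sockets
---------------------------------------------
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// vhostSocketVolume returns a per-VMI EmptyDir and its mount. Every
// container in the VMI pod sees the same directory, and it disappears
// with the pod, so sockets are not shared across all VMIs on the host.
func vhostSocketVolume() (corev1.Volume, corev1.VolumeMount) {
	vol := corev1.Volume{
		Name: "vhost-sockets", // hypothetical name
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	}
	mount := corev1.VolumeMount{
		Name:      "vhost-sockets",
		MountPath: "/var/run/vhost_sockets", // hypothetical in-pod path
	}
	return vol, mount
}

func main() {
	v, m := vhostSocketVolume()
	fmt.Println(v.Name, "mounted at", m.MountPath)
}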
I had trouble using the complete path as it is for the vhost socket. As per the man pages, the size for storing the path in sun_path of struct sockaddr_un is limited to 108 bytes, so the full kubelet volume path does not fit. I am going to try to create a link from the above mentioned directory to a shorter local directory.
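For reference, the limit being hit here is the size of sun_path in struct sockaddr_un, 108 bytes on Linux per unix(7). A toy check, with a hypothetical helper name and an example pod UID:

Go sketch: unix socket path length check
----------------------------------------
package main

import "fmt"

// sunPathMax is the size of sockaddr_un.sun_path on Linux (see unix(7));
// the path plus its NUL terminator must fit in it.
const sunPathMax = 108

// checkSocketPath is a hypothetical helper, not from the PR.
func checkSocketPath(path string) error {
	if len(path) >= sunPathMax {
		return fmt.Errorf("socket path is %d bytes, exceeds the %d-byte sun_path limit",
			len(path), sunPathMax)
	}
	return nil
}

func main() {
	// A kubelet volume path easily blows past the limit (example UID):
	long := "/var/lib/kubelet/pods/0a1b2c3d-4e5f-6789-abcd-ef0123456789" +
		"/volumes/kubernetes.io~empty-dir/vhost-sockets/6390c723f122-net1"
	fmt.Println(checkSocketPath(long))
}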
That's really frustrating. I've hit the same limit a few times as well.
If OvS is managed as a Pod, you can expose /var/lib/kubelet/pods as a HostPath volume to OvS in the Pod definition and specify the volumeMount's mountPath as something simple. Maybe that's what you're already doing?
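That suggestion could look roughly like the following sketch, again with the core/v1 Go types; the volume name and the /pods mount path are hypothetical. The long /var/lib/kubelet/pods prefix collapses to a short one inside the OvS pod, keeping socket paths under the limit.

Go sketch: short mount path for kubelet pods
--------------------------------------------
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// kubeletPodsVolume exposes the host's kubelet pods directory to the
// OvS pod under a deliberately short mount path.
func kubeletPodsVolume() (corev1.Volume, corev1.VolumeMount) {
	dirType := corev1.HostPathDirectory
	vol := corev1.Volume{
		Name: "kubelet-pods", // hypothetical name
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				Path: "/var/lib/kubelet/pods",
				Type: &dirType,
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:      "kubelet-pods",
		MountPath: "/pods", // short prefix instead of /var/lib/kubelet/pods
	}
	return vol, mount
}

func main() {
	v, m := kubeletPodsVolume()
	fmt.Println(v.Name, "->", m.MountPath)
}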
The end goal is to run OvS with DPDK inside a pod, but that is not achieved yet with this PR. Running OvS on the host or in a pod does not have any impact on this PR. Currently, with this PR in userspace cni, an intermediate directory
@davidvossel Is this PR OK to merge now?
There is no code related to the following changes:
libvirt container image changes
* Added openvswitch package
* Removed explicit user and group config in qemu.conf, as it is the default, so that the group can be overridden
@wavezhang: changing LGTM is restricted to collaborators. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Saravanan KR <skramaja@redhat.com>
Integrated userspace cni for ovsdpdk type. The vhostuser
socket will be shared between kubevirt VM and openvswitch
with DPDK running on the host. In order to support
vhostuser, shared memory is required. NUMA cells are
added with shared memory support. Added app-netutil
in order to get the vhostuser interface details from
the annotations, which will be updated by userspace
cni.
Run openvswitch on the host as openvswitch:hugetlbfs (default)
Create user qemu with uid 107 for the vhost socket directory
Create the vhost socket directory:
mkdir -p /var/lib/vhost_sockets
touch /var/lib/vhost_sockets/touched
chown qemu:hugetlbfs -R /var/lib/vhost_sockets/
chmod g+r -R /var/lib/vhost_sockets/
Build userspace cni
libvirt container image changes
Added openvswitch package
Removed explicit user and group config in qemu.conf,
as it is the default, so that the group can be overridden
Create VMI with interface of type vhostuser
VM XML with vhostuser
---------------------
<interface type='vhostuser'>
<mac address='52:54:00:df:73:da'/>
<source type='unix' path='/var/lib/cni/usrspcni/6390c723f122-net1' mode='server'/>
<target dev='6390c723f122-net1'/>
<model type='virtio'/>
<driver rx_queue_size='1024' tx_queue_size='1024'/>
<alias name='ua-vhost-user-net-2'/>
<address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</interface>
VM XML with sharedmem
---------------------
<cpu mode='custom' match='exact' check='full'>
...
<numa>
<cell id='0' cpus='0-7' memory='8388608' unit='KiB' memAccess='shared'/>
</numa>
</cpu>
Signed-off-by: Saravanan KR <skramaja@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. The full list of commands accepted by this bot can be found here.
} else if iface.Vhostuser != nil {
	domainIface.Type = "vhostuser"
	interfaceName := GetPodInterfaceName(networks, cniNetworks, iface.Name)
	vhostPath, vhostMode, err := getVhostuserInfo(interfaceName, c)
I think it is better to set MAC address here if CNI returns a mac.
userspace cni does not provide a mac address, and it is not possible to validate one without userspace cni. In this iteration, the mac address is left to be auto-generated by libvirt. In the next iteration, support will be added in both the cni and kubevirt for custom mac addresses.
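If a CNI did report a MAC one day, the change could look roughly like this sketch. The types are stand-ins (the real converter uses kubevirt's own domain schema structs) and the helper name is hypothetical:

Go sketch: applying a CNI-provided MAC
--------------------------------------
package main

import "fmt"

// Stand-in types; the real converter uses kubevirt's domain schema.
type MAC struct{ Address string }
type DomainInterface struct {
	Type string
	MAC  *MAC
}

// applyCNIMac sets the MAC only when the CNI reported one; leaving it
// nil lets libvirt auto-generate an address, which is the current
// behavior described above.
func applyCNIMac(iface *DomainInterface, cniMac string) {
	if cniMac != "" {
		iface.MAC = &MAC{Address: cniMac}
	}
}

func main() {
	iface := &DomainInterface{Type: "vhostuser"}
	applyCNIMac(iface, "") // userspace cni reports no MAC today
	fmt.Println("MAC set:", iface.MAC != nil)
}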
@vladikr @davidvossel @phoracek - I know this is a big patch to review; do you suggest that it be broken into smaller patches for easier review?
@krsacme: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@krsacme please assign/cc me on your future PRs.
@davidvossel @wavezhang @phoracek @SchSeba @krsacme @cijohnson Hi,
Could you please confirm that after doing these things you don't see a reason not to merge the functionality? Before I consider investing effort, I want to know whether there are any other known (but not written down) problems preventing a merge (i.e. some qemu incompatibility, a developer decision not to have it, ...).
@krsacme If I can ask, do you have any particular reasons why you stopped implementing this feature? Any kind of stoppers/blockers that I could eventually hit too? Also, did you get the topology working locally at any point (i.e. before the PR split)?
PS: sorry if asking in a closed PR is not the right place (do you want me to open a question issue or something?). Feel free also to add people to the discussion/reply if needed.
AFAIU, you're right. You're welcome to bring this topic to the KubeVirt community call; I'd love to hear about it there.
@fgschwan There is no technical reason for stopping it other than the traction and effort required to integrate it with kubevirtci. I was focusing on using vhostuser ports as extra networks; the default network is not changed. I was able to get it working on the bare-metal servers end to end. And I had a partly working version on kubevirtci (deployment works, but ping was not working in the kubevirtci local deployment; I need to dig further to find the exact reason). We all know that this feature will primarily be used on bare-metal servers, but making it work in the kubevirtci environment (a kubevirt VM running in a container inside a worker VM node), which is mandatory to get the changes merged, was challenging and requires effort to complete. Let me know if any specific things need to be discussed regarding it.
@fgschwan I just want to point out that I do see value in this proposal, and in re-starting this effort. This is extremely valuable for single-node clusters, or for traffic patterns where intra-node east-west traffic is more common. Did you summarize your use cases?
@maiqueb I provided some use case info in the kubevirt-dev mailing thread (https://groups.google.com/g/kubevirt-dev/c/opZPTUNsoMQ/m/nHgRMrsKAgAJ). However, I can add some more information here. I want to implement service chaining (also called service function chaining/SFC - https://www.sdxcentral.com/data-center/virtualization/definitions/what-is-network-service-chaining/). This is a standard solution for dynamically creating a path between chosen network functions (1 container / 1 VM container = 1 network function) in an SDN (software-defined network). Ideally all functions would use containers natively; however, some software is hard to containerize and therefore runs as a VM. Hence the usage of KubeVirt. And I need high-performance networking for the dataplane (for the control/management plane it is OK to use any interface that KubeVirt already supports). So the main one-node use case should look like this:
There can be multiple variations of this use case, including but not limited to using various combinations of container-native and VM-container services, or using multiple k8s nodes (ideally keeping as many related services as possible on one node). From the perspective of the KubeVirt implementation, this is not that important, because KubeVirt will not interact with all those components. It will interact only with VPP (the userspace cni that will export the vhost socket). Therefore the kubevirtci test should probably only ping between VPP and the KubeVirt container VM using the vhost interface. However, I understand that from an overall perspective, the use case is really relevant.
I had missed your email!! Thanks for kicking this forward :)
Closed: it's being reworked as smaller PRs. Tracking list:
What this PR does / why we need it:
Add the implementation to use OvS-DPDK with a kubevirt VM
Special notes for your reviewer:
Integrated userspace cni for ovsdpdk type. The vhostuser
socket will be shared between kubevirt VM and openvswitch
with DPDK running on the host. In order to support
vhostuser, shared memory is required. NUMA cells are
added with shared memory support. Added app-netutil
in order to get the vhostuser interface details from
the annotations, which will be updated by userspace
cni.
* mkdir -p /var/lib/vhost_sockets
* touch /var/lib/vhost_sockets/touched
* chown qemu:hugetlbfs -R /var/lib/vhost_sockets/
* chmod g+r -R /var/lib/vhost_sockets/
* Added openvswitch package
* Removed explicit user and group config in qemu.conf,
as it is the default, so that the group can be overridden
VM XML with vhostuser
VM XML with sharedmem