Fine tune selector based on VF PCI address needed #157

Closed · ark-g opened this issue Aug 8, 2019 · 20 comments

ark-g (Contributor) commented Aug 8, 2019

I have a situation where the existing selectors are not enough to define groups of VFs on a Mellanox NIC, so I think it is worth adding a selector that allows assigning specific VFs from each physical port to a given group/configuration.
Issue:

  1. For each PF I need to allocate one VF (together these VFs form a bond interface inside the container).
  2. There are two types of containers:
    • The first type uses regular VFs to create a regular bond interface inside the container.
    • The second type uses VFs for DPDK (again two VFs, one from each PF, per container).

For Intel NICs I bind the VFs to different drivers (igb_uio and i40evf) before running sriov-network-device-plugin, and then use the "drivers" selector to split the VFs into different groups.
For Mellanox NICs this is not possible, since the same driver is used for both DPDK and regular interfaces.
So I suggest adding a 'fine-tune' selector that chooses specific VF PCI addresses for each group/configuration, but any other solution/advice is also welcome.
Thanks.

zshi-redhat (Collaborator) commented:

What about using the pfNames selector? For example (see the config sketch below):

  1. use the pfNames selector to group the VFs from the same PF into a separate resource
  2. have the pod request two resources, getting VFs from two PFs
  3. bind the VFs inside the pod.
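A minimal sketch of that approach, assuming two PFs named netpf0 and netpf1 (resource names are placeholders):

{
   "resourceList":
   [
      {
         "resourceName": "sriov_mlnx_netpf0",
         "selectors": {
            "pfNames": ["netpf0"]
         }
      },
      {
         "resourceName": "sriov_mlnx_netpf1",
         "selectors": {
            "pfNames": ["netpf1"]
         }
      }
   ]
}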

ark-g (Contributor, Author) commented Aug 8, 2019

@zshi-redhat,
Thanks for the response.

Please find my configuration for an Intel NIC below; I do not see how to build the equivalent configuration for a Mellanox NIC.
AFAIK, when you specify pfNames without an additional selector, the plugin will take all of that PF's VFs into the group.

{
   "resourceList":
   [
      {
         "resourceName": "sriov_netpf0",
         "isRdma": false,
         "sriovMode": true,
         "deviceType": "netdevice",
         "selectors": {
            "drivers": ["i40evf"],
            "pfNames": ["netpf0"]
         }
      },
      {
         "resourceName": "sriov_netpf1",
         "isRdma": false,
         "sriovMode": true,
         "deviceType": "netdevice",
         "selectors": {
            "drivers": ["i40evf"],
            "pfNames": ["netpf1"]
         }
      },
      {
         "resourceName": "sriov_dpdk_netpf0",
         "isRdma": false,
         "sriovMode": false,
         "deviceType": "uio",
         "selectors": {
            "drivers": ["igb_uio"],
            "pfNames": ["netpf0"]
         }
      },
      {
         "resourceName": "sriov_dpdk_netpf1",
         "isRdma": false,
         "sriovMode": false,
         "deviceType": "uio",
         "selectors": {
            "drivers": ["igb_uio"],
            "pfNames": ["netpf1"]
         }
      }

   ]
}

moshe010 (Contributor) commented Aug 8, 2019

@zshi-redhat, they are using the driver of the VFs to select which resource a VF belongs to. If you have one PF and want half the VFs for DPDK and half as netdevices, with Intel it is simple because there are separate DPDK and netdevice VF drivers. With Mellanox, DPDK and netdevice use the same driver, so we don't have a way to differentiate between them. I think we should have a selector which is a regex on VF interface names, something like vfName: eth[0-4]
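A sketch of how such a selector might look in the plugin config (vfNames is the proposed, hypothetical selector, not an existing one; the names are illustrative):

{
   "resourceList":
   [
      {
         "resourceName": "mlnx_dpdk",
         "selectors": {
            "pfNames": ["netpf0"],
            "vfNames": ["eth[0-4]"]
         }
      }
   ]
}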

ark-g (Contributor, Author) commented Aug 8, 2019

@moshe010
There are many schemes for interface naming, and the naming itself depends on system configuration, so I think it is more robust to use a selector based on PCI addresses,
like "pciAddresses": ["18:02.*", "18:0c.*"]; but your suggestion would also work for my case.

moshe010 (Contributor) commented Aug 8, 2019

@ark-g, we don't want PCI-address-based config, because the same config is used across all servers. I would expect you to add a udev rule to make sure the VF names are consistent across all servers.
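For illustration, a udev rule of roughly this shape could pin a VF's interface name based on its PCI address (a sketch; the file path, PCI address, and name are placeholders):

# /etc/udev/rules.d/70-sriov-vf-names.rules (hypothetical path)
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:18:02.0", NAME="pf0vf0"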

ark-g (Contributor, Author) commented Aug 8, 2019

@moshe010, you are right that this is possible, but it requires additional system configuration in udev, which I would like to avoid.

zshi-redhat (Collaborator) commented Aug 9, 2019

@moshe010 thanks for the explanation!
@ark-g may I know why you need to split VFs from the same PF if all the VFs can be used equally?
From the pod side, it can always request the same Mellanox NIC resource, no matter whether the application running inside the pod uses DPDK or a kernel device.
Is it because the application uses the resource name to know which mode (kernel, DPDK) it should run in?

ark-g (Contributor, Author) commented Aug 9, 2019

@zshi-redhat
You are right: from the Mellanox NIC's point of view there is no difference between the use cases, DPDK or a regular interface. The issue arose when the same code had to support both NICs (Intel and Mellanox).
The Intel approach forces me to split the VFs into two groups, since once an interface is used for DPDK it is no longer a network interface from the Linux perspective, so I would need either complex logic for managing the VFs (bind/unbind) or two static pools of VFs plus two types of resources.

moshe010 (Contributor) commented Aug 9, 2019

@ark-g, why does the Intel approach force you to split them? You can just specify "drivers": ["igb_uio", "i40evf"]. Also, you don't need "sriovMode" and "deviceType"; those were part of the old config.
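In other words, the four resource entries above could collapse into one pool per PF that matches VFs under either driver, e.g. (a sketch):

{
   "resourceList":
   [
      {
         "resourceName": "sriov_netpf0",
         "selectors": {
            "drivers": ["igb_uio", "i40evf"],
            "pfNames": ["netpf0"]
         }
      }
   ]
}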

ark-g (Contributor, Author) commented Aug 9, 2019

@moshe010
Interesting idea!
But in that case, how will the internal logic of sriov-network-device-plugin know that pods of type A must be allocated only the VFs bound to i40evf, while pods of type B must be allocated the VFs bound to the igb_uio driver?

moshe010 (Contributor) commented:

I think the allocation is done by the kubelet, so this won't help you (I guess).

zshi-redhat (Collaborator) commented:

@ahalim-intel any thoughts on this?

moshe010 (Contributor) commented:

Just to provide more context: we talked about extending the pfNames selector to also support the following:
"pfNames": ["enp0s0f0:0-7", "enp2s2f1:8-15"]. What do you think?

ark-g (Contributor, Author) commented Aug 20, 2019

@moshe010
I think this is simple and flexible enough for declaring VF pools.
Please note that in your example "enp0s0f0:0" could itself be a valid alias of an interface name.
I think the separator between the <PFName> and <VFIndex> fields should be '@', '#', or '$' to avoid
confusion with interface names, e.g. <PFName>#<VFIndexStart>-<VFIndexEnd>,
where <VFIndexStart> is less than or equal to <VFIndexEnd>.
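With a '#' separator, the earlier example would read, e.g.:

"pfNames": ["enp0s0f0#0-7", "enp2s2f1#8-15"]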

moshe010 (Contributor) commented:

Yes, that makes sense.

ahalimx86 (Collaborator) commented:

@ark-g @zshi-redhat

> Interesting idea!
> But in that case, how will the internal logic of sriov-network-device-plugin know that pods of type A must be allocated only the VFs bound to i40evf, while pods of type B must be allocated the VFs bound to the igb_uio driver?

If I understand the use case correctly, the combination of "pfNames" and "drivers" will let you define your resource pools. You may have VFs under the same PF bound to either the kernel driver or the DPDK driver; the device plugin will pick the right VFs based on the selector criteria.

Once you have your resource pools configured correctly, it's up to you to decide which pods get what, through resource requests or net CRDs.
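For instance, on an Intel NIC, two pools over the same PF can be distinguished purely by driver (a sketch along the lines of the config earlier in the thread; resource names are placeholders):

{
   "resourceList":
   [
      {
         "resourceName": "netpf0_kernel",
         "selectors": {
            "drivers": ["i40evf"],
            "pfNames": ["netpf0"]
         }
      },
      {
         "resourceName": "netpf0_dpdk",
         "selectors": {
            "drivers": ["igb_uio"],
            "pfNames": ["netpf0"]
         }
      }
   ]
}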

ark-g (Contributor, Author) commented Aug 20, 2019

@ahalim-intel
Yes, absolutely correct; in the case of Intel NICs I have the "drivers" selector for that.

ark-g (Contributor, Author) commented Aug 21, 2019

If there are no conceptual objections, should I start preparing POC code?

zshi-redhat (Collaborator) commented:

> @ark-g @zshi-redhat
>
> > Interesting idea!
> > But in that case, how will the internal logic of sriov-network-device-plugin know that pods of type A must be allocated only the VFs bound to i40evf, while pods of type B must be allocated the VFs bound to the igb_uio driver?
>
> If I understand the use case correctly, the combination of "pfNames" and "drivers" will let you define your resource pools. You may have VFs under the same PF bound to either the kernel driver or the DPDK driver; the device plugin will pick the right VFs based on the selector criteria.

@ahalim-intel I understand this is to split VFs from the same PF into separate resource groups even when the VFs are bound to the same driver.

> Once you have your resource pools configured correctly, it's up to you to decide which pods get what, through resource requests or net CRDs.

ahalimx86 (Collaborator) commented:

Closing; the feature has been implemented in #165.
