
Pass DeviceID to each plugin in configuration list #448

Merged

Conversation

adrianchiris
Contributor

Until now, it was hardcoded that DeviceID would only be
injected for the first CNI in the chain.

This commit modifies Multus to pass DeviceID to each network
configuration element in a network configuration list.
This allows multiple CNIs to act on DeviceID when CNI
plugins are chained for a specific network.

The change is required to allow CNIs to ensure network
isolation (introduced in kernel >= 5.2.0, see [1]) for RDMA devices
when they exist.

e.g., for an SR-IOV network:
sriov-cni moves the network device associated with the provided DeviceID
to the container's network namespace.
An "RDMA CNI" would do the same for the corresponding RDMA device when
RDMA traffic is desired on the network.

[1] https://patchwork.kernel.org/cover/10810451/
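
For illustration, a minimal, self-contained Go sketch of the behavior described above (this is not the actual Multus code): it takes a configuration list that chains sriov-cni with a hypothetical RDMA CNI and injects deviceID into every plugin element rather than only the first. The plugin names, IPAM settings, PCI address, and the injectDeviceID helper are all illustrative.

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// A chained SR-IOV + RDMA network configuration list.
// Plugin names and IPAM settings are illustrative only.
const conflist = `{
  "cniVersion": "0.3.1",
  "name": "sriov-rdma-net",
  "plugins": [
    {"type": "sriov", "ipam": {"type": "host-local", "subnet": "10.56.0.0/16"}},
    {"type": "rdma"}
  ]
}`

// injectDeviceID adds the deviceID key to every element of the "plugins"
// array -- the behavior this PR introduces (previously only the first
// element received it).
func injectDeviceID(rawList []byte, deviceID string) ([]byte, error) {
	var list map[string]interface{}
	if err := json.Unmarshal(rawList, &list); err != nil {
		return nil, err
	}
	plugins, ok := list["plugins"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("configuration list has no plugins array")
	}
	for _, p := range plugins {
		if plugin, ok := p.(map[string]interface{}); ok {
			plugin["deviceID"] = deviceID
		}
	}
	return json.MarshalIndent(list, "", "  ")
}

func main() {
	// PCI address of the allocated VF; illustrative value.
	out, err := injectDeviceID([]byte(conflist), "0000:03:02.0")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}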

@dougbtv
Member

dougbtv commented Mar 5, 2020

General direction looks good to me, and makes sense -- looks like this fills a gap that we're missing by only catching the first one.

Any input from @zshi-redhat or @ahalim-intel from an SR-IOV point of view?

@zshi-redhat
Contributor

zshi-redhat commented Mar 10, 2020

I agree with Doug that this is a reasonable change.

I got one question regarding the example:

e.g., for an SR-IOV network:
sriov-cni moves the network device associated with the provided DeviceID
to the container's network namespace.
An "RDMA CNI" would do the same for the corresponding RDMA device when
RDMA traffic is desired on the network.

When using RoCE mode, we only need one VF device in the pod/container, which is moved to the pod's network namespace by sriov-cni. How does the above example differ from RoCE usage? I assume the application will need two device specs or descriptors in the pod, one for the normal VF net device and the other for the InfiniBand device?

@adrianchiris
Contributor Author

adrianchiris commented Mar 10, 2020

To send RDMA traffic you need an RDMA device (regardless of the underlying protocol, e.g. RoCE, InfiniBand, iWARP).

Today the RDMA subsystem's network namespace awareness mode is set to shared, so moving an RDMA device to the container's network namespace is not needed (essentially all RDMA devices are visible).

If the kernel RDMA subsystem's network namespace awareness mode is set to exclusive, you will need to move the relevant RDMA device to the container's network namespace to perform RDMA traffic.
For RoCE you would then chain sriov-cni with rdma-cni.

The mounts under /dev/infiniband are still needed but are not related to CNI.
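
As an illustration of what such an "RDMA CNI" might do in exclusive mode, here is a hedged Go sketch. It assumes the github.com/vishvananda/netlink package exposes the RDMA helpers used below (RdmaSystemGetNetnsMode, RdmaLinkByName, RdmaLinkSetNsFd); the device name and netns path are made up for the example, and this is not taken from any existing plugin.

package main

import (
	"log"
	"os"

	"github.com/vishvananda/netlink"
)

func main() {
	// Check the kernel's RDMA subsystem namespace-awareness mode
	// (the equivalent of `rdma system show`). Assumed netlink helper.
	mode, err := netlink.RdmaSystemGetNetnsMode()
	if err != nil {
		log.Fatalf("failed to read RDMA netns mode: %v", err)
	}
	if mode != "exclusive" {
		// In shared mode RDMA devices are visible from every namespace,
		// so there is nothing to move.
		log.Printf("RDMA netns mode is %q, nothing to do", mode)
		return
	}

	// Look up the RDMA device backing the VF (device name is illustrative)
	// and move it into the container's network namespace.
	link, err := netlink.RdmaLinkByName("mlx5_2")
	if err != nil {
		log.Fatalf("failed to find RDMA device: %v", err)
	}
	netnsFile, err := os.Open("/var/run/netns/container-ns") // illustrative path
	if err != nil {
		log.Fatalf("failed to open container netns: %v", err)
	}
	defer netnsFile.Close()

	if err := netlink.RdmaLinkSetNsFd(link, uint32(netnsFile.Fd())); err != nil {
		log.Fatalf("failed to move RDMA device: %v", err)
	}
}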

@zshi-redhat
Contributor

To send RDMA traffic you need an RDMA device (regardless of the underlying protocol, e.g. RoCE, InfiniBand, iWARP).

Today the RDMA subsystem's network namespace awareness mode is set to shared, so moving an RDMA device to the container's network namespace is not needed (essentially all RDMA devices are visible).

If the kernel RDMA subsystem's network namespace awareness mode is set to exclusive, you will need to move the relevant RDMA device to the container's network namespace to perform RDMA traffic.
For RoCE you would then chain sriov-cni with rdma-cni.

So when using RDMA in exclusive mode, there will be one additional device moved to the container namespace by the RDMA CNI? I'm curious what this device looks like.

@adrianchiris
Contributor Author

adrianchiris commented Mar 10, 2020

You can just run the rdma tool on a setup with RDMA-capable devices:

root# rdma dev show
0: mlx5_0: node_type ca fw 12.27.0396 node_guid e41d:2d03:0061:f5fa sys_image_guid e41d:2d03:0061:f5fa
1: mlx5_1: node_type ca fw 12.27.0396 node_guid e41d:2d03:0061:f5fb sys_image_guid e41d:2d03:0061:f5fa
2: mlx4_0: node_type ca fw 2.42.5000 node_guid e41d:2d03:002d:8d90 sys_image_guid e41d:2d03:002d:8d93
67: mlx5_2: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
68: mlx5_3: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
69: mlx5_4: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
70: mlx5_5: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
71: mlx5_6: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
72: mlx5_7: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
73: mlx5_8: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa
74: mlx5_9: node_type ca fw 12.27.0396 node_guid 0000:0000:0000:0000 sys_image_guid e41d:2d03:0061:f5fa

@adrianchiris
Contributor Author

adrianchiris commented Mar 10, 2020

So when using RDMA in exclusive mode, there will be one additional device moved to the container namespace by the RDMA CNI?

exactly

@zshi-redhat
Contributor

So when using RDMA in exclusive mode, there will be one additional device moved to the container namespace by the RDMA CNI?

exactly

Thanks! This looks good to me.

@s1061123
Member

@zshi-redhat thank you for the review! LGTM, so let's merge it tomorrow (to allow time for more questions).

@adrianchiris
Contributor Author

adrianchiris commented Mar 12, 2020

@s1061123 @dougbtv can we merge it? Are there any open questions?

@s1061123 s1061123 merged commit 5577822 into k8snetworkplumbingwg:master Mar 12, 2020
@s1061123
Member

@adrianchiris merged, thank you so much for that!

@adrianchiris
Contributor Author

Thank you :)

@adrianchiris
Contributor Author

@dougbtv @s1061123 is there a release scheduled for Multus any time soon? (I'm not sure about the release schedule for this project.)

Would you like me to backport this change to a release branch?
