Support for ibverbs

Receiver performance can be significantly improved by using the InfiniBand Verbs API instead of the BSD sockets API. This is currently only tested on Linux with ConnectX® NICs. It depends on device-managed flow steering (DMFS).

There are a number of limitations in the current implementation:

  • Only IPv4 is supported.
  • VLAN tagging, IP optional headers, and IP fragmentation are not supported.
  • For sending, only multicast is supported.

Within these limitations, it is quite easy to take advantage of this faster code path. The main difficulties are that one must specify the IP address of the interface that will send or receive the packets, and that the CAP_NET_RAW capability may be needed. The netifaces2 module can help find the IP address for an interface by name, and the spead2_net_raw tool simplifies the process of getting the CAP_NET_RAW capability.
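
Where installing netifaces2 is not convenient, the address lookup can also be done with the Python standard library on Linux. The following is a minimal sketch using the SIOCGIFADDR ioctl rather than netifaces2; the helper name interface_ipv4 is hypothetical and is not part of spead2.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl: get the IPv4 address of an interface
IFNAMSIZ = 16         # Linux limit on interface names, including the NUL


def interface_ipv4(name: str) -> str:
    """Return the primary IPv4 address of a Linux network interface.

    Hypothetical helper (not part of spead2). Raises ValueError for a
    name that cannot be valid, and OSError if the interface does not
    exist or has no IPv4 address.
    """
    raw = name.encode()
    if not raw or len(raw) >= IFNAMSIZ:
        raise ValueError(f"invalid interface name: {name!r}")
    # struct ifreq starts with the NUL-padded interface name.
    request = struct.pack("256s", raw)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
    # Bytes 20..23 hold sin_addr of the returned struct sockaddr_in.
    return socket.inet_ntoa(reply[20:24])
```

Calling interface_ipv4 with the name of the capture interface (for example "eth0", which is only an illustration) returns a dotted-quad string suitable for use as the interface address.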

System configuration

ConnectX®-3

Add the following to /etc/modprobe.d/mlnx.conf:

options ib_uverbs disable_raw_qp_enforcement=1
options mlx4_core fast_drop=1
options mlx4_core log_num_mgm_entry_size=-1

Note

Setting log_num_mgm_entry_size to -7 instead of -1 will activate faster static device-managed flow steering. This has some limitations (refer to the manual for details), but can improve performance when capturing a large number of multicast groups.

ConnectX®-4+, MLNX OFED up to 4.9

Add the following to /etc/modprobe.d/mlnx.conf:

options ib_uverbs disable_raw_qp_enforcement=1

All other cases

No system configuration is needed, but the CAP_NET_RAW capability is required. Running as root will achieve this; a full discussion of Linux capabilities is beyond the scope of this manual. The spead2_net_raw utility can also be used to give users access to this capability without exposing full root access. For more information, see the libvma documentation.

Multicast loopback

By default, multicast traffic sent using ibverbs can also be received on the same port. While convenient, this is a slow path in the NIC, and can limit performance. To disable this loopback, write 1 to /sys/class/net/{interface}/settings/force_local_lb_disable (note that the setting does not persist across reboots).
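
Because the setting is lost on reboot, it may be convenient to apply it from a start-up script. The sketch below writes the value from Python; the function name and the sysfs_root parameter are hypothetical (the latter exists only so the function can be exercised against a scratch directory), and the write normally requires root.

```python
from pathlib import Path


def disable_multicast_loopback(interface: str, sysfs_root: str = "/sys") -> Path:
    """Write 1 to the interface's force_local_lb_disable setting.

    Hypothetical helper. Returns the path that was written. Raises
    OSError if the file is missing (NIC or driver without this setting)
    or if the caller lacks permission.
    """
    path = (
        Path(sysfs_root)
        / "class"
        / "net"
        / interface
        / "settings"
        / "force_local_lb_disable"
    )
    path.write_text("1\n")
    return path
```

The function must be called again after each reboot, since the setting does not persist.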

Receiving

The ibverbs API can be used programmatically through an extra method of spead2.recv.Stream.

The configuration is specified using a spead2.recv.UdpIbvConfig.
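
As a rough illustration of how the pieces fit together, the sketch below builds the configuration and attaches a reader. It assumes the spead2 3.x Python API, in which UdpIbvConfig accepts endpoints, interface_address and max_size keyword arguments and is passed to Stream.add_udp_ibv_reader; check this against the installed version. The helper names, multicast groups, port and address are all illustrative.

```python
def ibv_reader_kwargs(groups, port, interface_address, max_size=None):
    """Collect keyword arguments for spead2.recv.UdpIbvConfig.

    Hypothetical helper: one (group, port) endpoint is created per
    multicast group, all on the same UDP port.
    """
    kwargs = {
        "endpoints": [(group, port) for group in groups],
        "interface_address": interface_address,
    }
    if max_size is not None:
        # Otherwise the library default is left in place.
        kwargs["max_size"] = max_size
    return kwargs


def add_ibv_reader(stream, **kwargs):
    """Attach an ibverbs reader to an existing spead2.recv.Stream."""
    # Imported lazily so the helper above can be used without spead2.
    import spead2.recv

    # Assumed API: UdpIbvConfig(...) passed to add_udp_ibv_reader.
    stream.add_udp_ibv_reader(spead2.recv.UdpIbvConfig(**kwargs))
```

A receiver might then call add_ibv_reader(stream, **ibv_reader_kwargs(["239.1.2.3", "239.1.2.4"], 8888, "192.168.1.10")) on a stream it has already constructed.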

If supported by the NIC and the drivers, the receive code will automatically use a "multi-packet receive queue", which allows each packet to consume only the amount of space needed in the buffer. This is currently only supported on ConnectX®-4+ with MLNX OFED drivers 5.0 or later (or upstream rdma-core). When in use, the max_size parameter has little impact on performance, and is used only to reject larger packets.

When multi-packet receive queues are not supported, performance can be improved by making max_size as small as possible for the intended data stream. This will increase the number of packets that can be buffered (because the buffer is divided into fixed-size slots), and also improve memory efficiency by keeping data more-or-less contiguous.
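
The effect of max_size on buffering can be estimated with simple arithmetic. The sketch below uses a simplified model in which every slot is exactly max_size bytes with no per-slot overhead; the real implementation may differ, and the sizes used are illustrative only.

```python
def buffered_packets(buffer_size: int, max_size: int) -> int:
    """Estimate how many packets fit when the buffer is cut into
    fixed-size slots of max_size bytes (simplified model)."""
    if buffer_size <= 0 or max_size <= 0:
        raise ValueError("buffer_size and max_size must be positive")
    return buffer_size // max_size


buffer_size = 16 * 1024 * 1024  # 16 MiB, an illustrative figure
print(buffered_packets(buffer_size, 9200))  # 1823
print(buffered_packets(buffer_size, 4200))  # 3994
```

Under this model, reducing max_size from 9200 to 4200 bytes roughly doubles the number of packets the same buffer can hold.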

Environment variables

An existing application can be forced to use ibverbs for all IPv4 readers by setting the environment variable SPEAD2_IBV_INTERFACE to the IP address of the interface that will receive the packets. Note that calls to spead2.recv.Stream.add_udp_reader that pass an explicit interface will use that interface, overriding SPEAD2_IBV_INTERFACE; in this case, SPEAD2_IBV_INTERFACE serves only to enable the override.

It is also possible to specify SPEAD2_IBV_COMP_VECTOR to override the completion channel vector from the default.

Note that this environment variable currently has no effect on senders.
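
When launching an existing receiver from a wrapper script, the variables can be set in the child's environment only. The following sketch prepares such an environment; the function name is hypothetical, and the address and completion vector are placeholders.

```python
import os


def ibv_environment(interface_address, comp_vector=None, base_env=None):
    """Return a copy of the environment with the ibverbs variables set.

    Hypothetical helper. base_env defaults to the current process
    environment and is never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SPEAD2_IBV_INTERFACE"] = interface_address
    if comp_vector is not None:
        # Environment values must be strings.
        env["SPEAD2_IBV_COMP_VECTOR"] = str(comp_vector)
    return env
```

The result can be passed as the env argument of subprocess.run when starting the receiver, for example subprocess.run(["python", "receiver.py"], env=ibv_environment("192.168.1.10")), where receiver.py stands in for the existing application.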

Sending

Sending is done by using the class spead2.send.UdpIbvStream instead of spead2.send.UdpStream. It has a different constructor, but the same methods. There is also a spead2.send.asyncio.UdpIbvStream class, analogous to spead2.send.asyncio.UdpStream.

There is an additional configuration class for ibverbs-specific configuration: