Closed
86 commits
dc3b25f
verbs: Allow query only device support for QP data in order
dkkranz Jul 13, 2025
a4f3f44
efa: Add single sub CQ poll variant
YonatanNachum Jul 3, 2025
fbd0b88
efa: Add option to create single threaded CQ
YonatanNachum Jul 3, 2025
92ad54b
libibverbs: Rename ibv_reg_mr_in to ibv_mr_init_attr
shefty Aug 12, 2025
091ddb5
Update library version to be 60.0
rleon Aug 13, 2025
003724a
providers/mana: Add error code mappings for retry and rnr timeouts
Aug 12, 2025
3735e62
tests: Fix RDMA transport domain test capability validation
Jul 23, 2025
648c951
tests: Update PCIE mapping flag of mlx5 DMABUF
ShacharKagan Oct 29, 2024
d1669a9
tests: Update CmdHcaCap in mlx5 PRM struct
Feb 10, 2025
5f6cefd
pyverbs: Add DevX events API
Mar 19, 2024
e304dfe
tests: Add tests for DevX events
Apr 1, 2024
ebc4822
pyverbs: Extend mlx5dv_flow
Jan 5, 2025
348a32e
tests: Add flow counter test
Feb 19, 2025
1dc3e8d
mlx5: Add support for bulk flow counters in mlx5dv_create_flow
msanalla Jul 8, 2025
ec3d7f5
mlx5: Implement UAR fallback for td allocation
msanalla Jul 28, 2025
55ee455
tests: Add CX9 to MLX5_DEVS list
Feb 26, 2025
513106c
pyverbs: Add DevX async command completion support
lganti-nbu May 5, 2025
ee8b3b3
tests: Add test for async command completion in DevX
lganti-nbu May 5, 2025
e65fabe
tests: Cover different RDMA matcher priorities
May 6, 2025
947b4a1
tests: Refactor requires_root
s-assaf Jun 19, 2025
fd734dd
tests: Add test for privileged QKEY functionality
s-assaf Jun 25, 2025
3a288e4
mlx5: Add support for bulk flow counters in mlx5dv_create_flow
msanalla Jul 8, 2025
f291e45
pyverbs: Add support to flow counters with offset
Jul 16, 2025
5bb2523
tests: Add test for flow counter action with offset
Jul 20, 2025
a15caa0
pyverbs: Add support to MREX and DMA Handle
Jun 8, 2025
5e4b222
tests: Add tests for MREX and DMAHandle
Jun 8, 2025
bc5b068
Update kernel headers
shefty Aug 18, 2025
7a04b9e
Merge pull request #1636 from yishaih/mlx5_misc
yishaih Aug 21, 2025
4181700
libibverbs: Fix the issue of ibv_ud_pingpong failing in RDMA communic…
Aug 22, 2025
5efc181
bnxt_re/lib: Add support for flow create/destroy
Aug 18, 2025
0f09871
bnxt_re/lib: Dont allow unsupported qp type creation
Aug 18, 2025
7b1a686
librdmacm: Provide interfaces to resolve IB services
MarkZhang81 Aug 25, 2025
c14fa9a
Merge pull request #1606 from dkkranz/query_data_in_order_fix
jgunthorpe Sep 2, 2025
a9019e1
Merge pull request #1629 from amzn/cq_optimizations
jgunthorpe Sep 2, 2025
e8097fe
Merge pull request #1631 from TaranovK/kotaranov/new_err_codes
jgunthorpe Sep 2, 2025
cb74af3
Merge pull request #1635 from EdwardSro/pr-tests
rleon Sep 2, 2025
8eaec4f
Merge pull request #1637 from YanLongDai/fix_ud_pingpong_bug
jgunthorpe Sep 2, 2025
179b646
librdmacm/cmtime: Drop unused 's' option
vladum Sep 2, 2025
6d71f1c
librdmacm/cmtime: Update man page
vladum Sep 2, 2025
8efa975
efa: Make base_ops static
jgunthorpe Sep 9, 2025
95b47f1
Merge pull request #1645 from jgunthorpe/efa_static
rleon Sep 10, 2025
318a889
Merge pull request #1633 from EdwardSro/pr-tests-dmabuf
rleon Sep 10, 2025
b62669d
Merge pull request #1632 from EdwardSro/pr-tests-fixes
rleon Sep 10, 2025
9ac596e
Merge pull request #1634 from EdwardSro/pr-tests-update-prm
rleon Sep 10, 2025
5df6832
Merge pull request #1642 from vladum/dist-cmtime-man
rleon Sep 10, 2025
f75db8c
infiniband-diags: Fix sa_get_handle to use smi/gsi API
mazorasaf Sep 4, 2025
26480ca
ibqueryerrors: Fix SMP call to use correct port
mazorasaf Sep 4, 2025
7d7ca52
librdmacm: Provide an interface to write an event into a CM
MarkZhang81 Jan 23, 2025
30828be
librdmacm: Support DNS in resolve and query addrinfo
MarkZhang81 Jan 27, 2025
dfc0afd
librdmacm: Support IB SA resolve in rdma_getaddrinfo()
MarkZhang81 Feb 3, 2025
4b83401
librdmacm: Document new address resolution APIs
MarkZhang81 Feb 10, 2025
c307ee5
Merge pull request #1641 from mazorasaf/sa_get_handle_fix
rleon Sep 15, 2025
1f9a5e3
Merge pull request #1628 from shefty/svcrec
rleon Sep 15, 2025
3f9d0f9
Merge pull request #1638 from selvintxavier/flow_steering
rleon Sep 15, 2025
93e224f
efa: Extend DV query CQ to return doorbell
mrgolin Sep 15, 2025
4962844
dracut: unify and improve dracut rdma module
jozzsi Sep 9, 2025
11d1baa
providers/erdma: Add SEND_WITH_INV support
hz-cheng Sep 17, 2025
949e4a3
providers/erdma: Fix typo
hz-cheng Sep 17, 2025
359bc2f
proviers/erdma: Fix wrong length passed to ibv_dofork_range
hz-cheng Sep 17, 2025
c7370e0
Merge pull request #1643 from jozzsi/1
jgunthorpe Sep 17, 2025
4010460
pyverbs: Release Python GIL when calling blocking CMID functions
Timon-Kruiper Sep 18, 2025
247966c
Merge pull request #1647 from hz-cheng/upstream/fixes-20250917
rleon Sep 21, 2025
e23cb0a
mlx5: Fix byte_count type in umr_sg_list_create
dstaay-fb Oct 11, 2025
5321d80
Merge pull request #1646 from amzn/cq_doorbell
rleon Oct 19, 2025
15adbcf
libhns: Fix wrong WQE data when QP wraps around
Oct 22, 2025
19bd0c8
libhns: Clean up an extra blank line
Oct 22, 2025
2671a14
Update library version to be 61.0
rleon Oct 26, 2025
754e098
Merge pull request #1652 from hginjgerx/fix
rleon Oct 27, 2025
01341ca
Merge pull request #1650 from dstaay-fb/fix-byte-count-type
rleon Oct 27, 2025
7bb1297
Merge pull request #1648 from Timon-Kruiper/pyverbs_gil
rleon Oct 29, 2025
55841ca
libibverbs: Document verbs semantic model
shefty Oct 12, 2025
a8a8b66
libibverbs: Introduce ultra ethernet transport support
shefty Dec 10, 2024
f03837f
libibverbs: Add support for UET QPs
shefty Dec 10, 2024
d8f141a
libibverbs: Add job id support
shefty Apr 14, 2025
8cea5c1
libibverbs: Add job key support
shefty Apr 15, 2025
7a77e43
libibverbs: Allow posting WRs for RU QPs
shefty Dec 11, 2024
387ed8e
libibverbs: Report UET transport details in completions
shefty Dec 11, 2024
7279497
libibverbs: Support memory registrations for UET
shefty Oct 15, 2025
7794cae
libibverbs: Support adjustable QP msg and data semantics
shefty Jul 30, 2025
388704e
libibverbs: Allow provider to describe immediate data limits
shefty Jul 29, 2025
a3223c2
libibverbs: Define attaching a MR to a QP
shefty Oct 15, 2025
0451f91
libibverbs: Add support for user to select the rkey
shefty Oct 15, 2025
2d3fca9
libibverbs: Add support for 'derived' MRs
shefty Oct 16, 2025
bba2936
libibverbs: Add UET initiator setting
shefty Oct 16, 2025
85cc0e7
libibverbs: Extend ibv_wq to support UET resource index
shefty Oct 16, 2025
942bee0
libibverbs: Update API documentation with UET job concepts
shefty Oct 24, 2025
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -79,7 +79,7 @@ endif()
set(PACKAGE_NAME "RDMA")

# See Documentation/versioning.md
-set(PACKAGE_VERSION "59.0")
+set(PACKAGE_VERSION "61.0")
# When this is changed the values in these files need changing too:
# debian/control
# debian/libibverbs1.symbols
311 changes: 306 additions & 5 deletions Documentation/libibverbs.md
@@ -1,10 +1,9 @@
# Introduction

-libibverbs is a library that allows programs to use RDMA "verbs" for
-direct access to RDMA (currently InfiniBand and iWARP) hardware from
-userspace. For more information on RDMA verbs, see the InfiniBand
-Architecture Specification vol. 1, especially chapter 11, and the RDMA
-Consortium's RDMA Protocol Verbs Specification.
+libibverbs is a library that allows userspace programs direct
+access to high-performance network hardware. See the Verbs
+Semantics section at the end of this document for details
+on RDMA and verbs constructs.

# Using libibverbs

@@ -74,3 +73,305 @@ The following table describes the expected behavior when VERBS_LOG_LEVEL is set:
|-----------------|---------------------------------|------------------------------------------------|
| Regular prints | Output to VERBS_LOG_FILE if set | Output to VERBS_LOG_FILE, or stderr if not set |
| Datapath prints | Compiled out, no output | Output to VERBS_LOG_FILE, or stderr if not set |


# Verbs Semantics

Verbs is defined by the InfiniBand Architecture Specification
(vol. 1, chapter 11) as an abstract definition of the functionality
provided by an InfiniBand NIC. libibverbs was designed as a formal
software API aligned with that abstraction. As a result, API names,
including the library name, are closely aligned with those defined
for InfiniBand.

However, the library and API have evolved to support additional
high-performance transports and NICs. libibverbs constructs have
expanded beyond their traditional roles and definitions, although
the original InfiniBand naming has been kept for backward
compatibility.

Today, verbs can be viewed as defining software primitives for
network hardware supporting one or more of the following:

- Network queues are directly accessible from user space.
- Network hardware can directly access application memory buffers.
- The transport supports RDMA operations.

The following sections describe select libibverbs constructs in terms
of their current semantics and, where appropriate, historical context.
Items are ordered conceptually.

*RDMA*
: RDMA takes on several different meanings based on context,
which are further described below. RDMA stands for remote direct memory
access. Historically, RDMA referred to network operations which could
directly read or write application data buffers at the target.
The use of the term RDMA has since evolved to encompass not just
network operations, but also the key features of such devices:

- Zero-copy: no intermediate buffering
- Low CPU utilization: transport offload
- High bandwidth and low latency

*RDMA Verbs*
: RDMA verbs is the more generic name given to the libibverbs API,
as it implies support for other transports beyond InfiniBand.
A device which supports RDMA verbs is accessible through this library.

A common but narrower industry use of the term RDMA verbs refers only to
the subset of libibverbs APIs and semantics focused on
reliable-connected communication. This document uses the term RDMA verbs
as a synonym for the libibverbs API as a whole.

*RDMA-Core*
: The rdma-core is a set of libraries for interfacing with the Linux
kernel RDMA subsystem. Two key rdma-core libraries are this one,
libibverbs, and the librdmacm, which is used to establish connections.

The rdma-core is considered an essential component of Linux RDMA.
It is used to ensure that the kernel ABI is stable and implements the
user space portion of the kernel RDMA IOCTL API.

*RDMA Device / Verbs Device / NIC*
: An RDMA or verbs device is one which is accessible through the Linux
RDMA subsystem, and as a result, plugs into the libibverbs and rdma-core
framework. NICs plug into the RDMA subsystem to expose hardware
primitives supported by verbs (described above) or RDMA-like features.

NICs do not necessarily need to support RDMA operations or transports
in order to leverage the rdma-core infrastructure. It is sufficient for
a NIC to expose features similar to those found in RDMA devices.

*RDMA Operation*
: RDMA operations refer to network transport functions that read or write
data buffers at the target without host CPU intervention. RDMA reads
copy data from a remote memory region to the network and return the data
to the initiator of the request. RDMA writes copy data from a local
memory region to the network and place it directly into a memory region
at the target.

*RDMA Transport*
: An RDMA transport can be considered any transport that supports RDMA
operations. Common RDMA transports include InfiniBand,
RoCE (RDMA over Converged Ethernet), RoCE version 2, and iWARP. RoCE
and RoCEv2 are InfiniBand transports over the Ethernet link layer, with
differences only in their lower-level addressing.
However, the term InfiniBand usually refers to the InfiniBand transport
over the InfiniBand link layer. RoCE is used when explicitly
referring to Ethernet-based solutions. RoCE version 2 is often included
or implied by references to RoCE.

*Device Node*
: The original intent of device node type was to identify whether an InfiniBand
device was a NIC, switch, or router. InfiniBand NICs were labeled as
channel adapters (CA). Node type was extended to identify the transport
being manipulated by verb primitives. Devices which implemented other
transports were assigned new node types. As a result, applications which
targeted a specific transport, such as Infiniband or RoCE, relied on node
type to indirectly identify the transport.

*Protection Domain (PD)*
: A protection domain provides process-level isolation of resources and is
considered a fundamental security construct for Linux RDMA devices.
A PD defines a boundary between memory regions and queue pairs. A
network data transfer is associated with a single queue pair. That queue
pair may only access a memory region that shares the same protection
domain as itself. This prevents a user space process from accessing
memory buffers outside of its address space.

Protection domains provide security for regions accessed
by both local and remote operations. Local access includes work requests
posted to HW command queues which reference memory regions. Remote
access includes RDMA operations which read or write memory regions.

A queue pair is associated with a single PD. The PD verifies that hardware
access to a given lkey or rkey is valid for the specified QP and that the
initiating or targeted process has permission to use the lkey or rkey. Vendors
may implement a PD using a variety of mechanisms, but are required to meet
the defined security isolation requirements.
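
The key validation described above can be pictured with a small, self-contained model. This is only a sketch: the `model_*` types and functions are hypothetical names invented for illustration, not libibverbs APIs, and a real NIC performs the equivalent check in hardware or firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; none of these exist in libibverbs. */
struct model_pd { int owner_pid; };
struct model_mr { uint32_t key; const struct model_pd *pd; };
struct model_qp { const struct model_pd *pd; };

/* Find the registered region that a key refers to, or NULL if unknown. */
static const struct model_mr *model_find_mr(const struct model_mr *table,
                                            size_t count, uint32_t key)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].key == key)
            return &table[i];
    }
    return NULL;
}

/* A transfer on qp may use key only if the region shares the QP's PD. */
static bool model_key_allowed(const struct model_qp *qp,
                              const struct model_mr *table,
                              size_t count, uint32_t key)
{
    assert(qp != NULL && qp->pd != NULL);
    const struct model_mr *mr = model_find_mr(table, count, key);
    return mr != NULL && mr->pd == qp->pd;
}
```

In this model, a key that belongs to another PD is rejected in the same way as a key that was never registered, so a process learns nothing about regions outside its own protection domain.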

*Memory Region (MR)*
: A memory region identifies a virtual address range known to the NIC.
MRs are registered address ranges accessible by the NIC for local and
remote operations. The process of creating a MR associates the given
virtual address range with a protection domain, in order to ensure
process-level isolation.

Once allocated, data transfers reference the MR using a key value (lkey
and/or rkey). When accessing a MR as part of a data transfer, an offset
into the memory region is specified. The offset is relative to the start
of the region and may either be 0-based or based on the region’s starting
virtual address.

*lkey*
: The lkey is designed as a hardware identifier for a locally accessed data
buffer. Because work requests are formatted by user space software and
may be written directly to hardware queues, hardware must validate
that the memory buffers being referenced are accessible to the application.

NIC hardware may not have access to the operating system's
virtual address translation table. Instead, hardware can use the lkey to
identify the registered memory region, which in turn identifies a protection
domain, which finally identifies the calling process. The protection domain
of the processing queue pair must match that of the accessed memory region.
This prevents an application from sending data from buffers outside of its
virtual address space.

*rkey*
: The rkey is designed as a transport identifier for remotely accessed data
buffers. It's conceptually like an lkey, but the value is
shared across the network. An rkey is associated with transport
permissions.

*Completion Queue (CQ)*
: A completion queue is designed to represent a hardware queue where the
status of asynchronous operations is reported. Each asynchronous
operation (i.e. data transfer) is expected to write a single entry
into the completion queue.

*Queue Pair (QP)*
: A queue pair was originally defined as a transport addressable set of
hardware queues, with a QP consisting of send and receive queues (defined
below). The evolved definition of a QP refers only to the transport
addressability of an endpoint. A QP's address is identified as a
queue pair number (QPN), which is conceptually like a transport
port number. In networking stack models, a QP is considered a transport
layer object.

The internal structure of the QP is not constrained to a pair of queues.
The number of hardware queues and their purpose may vary based on how
the QP is configured. A QP may have 0 or more command queues used for
posting data transfer requests (send queues) and 0 or more command queues
for posting data buffers used to receive incoming messages (receive queues).

*Receive Queue (RQ)*
: Receive queues are command queues belonging to queue pairs. Receive
commands post application buffers to receive incoming data.

Receive queues are configured as part of queue pair setup. A RQ is
accessed indirectly through the QP when submitting receive work requests.

*Shared Receive Queue (SRQ)*
: A shared receive queue is a single hardware command queue for posting
buffers to receive incoming data. This command queue may be shared
among multiple QPs, such that data that arrives on any associated QP
may retrieve a previously posted buffer from the SRQ. QPs that share
the same SRQ coordinate their access to posted buffers such that a
single posted operation is matched with a single incoming message.

Unlike receive queues, SRQs are accessed directly by applications to
submit receive work requests.

*Send Queue (SQ)*
: More generically, a send queue is a transmit queue. It
represents a command queue for operations that initiate a network operation.
A send queue may also be used to submit commands that update hardware
resources, such as updating memory regions. Network operations submitted
through the send queue include message sends, RDMA reads, RDMA writes, and
atomic operations, among others.

Send queues are configured as part of queue pair setup. A SQ is
accessed indirectly through the QP when submitting send work requests.

*Send Message*
: A send message refers to a specific type of transport data transfer.
A send message operation copies data from a local buffer to the network
and transfers the data as a single transport unit. The receiving NIC
copies the data from the network into one or more user-posted receive
buffers.

Like the term RDMA, the meaning of send is context dependent. Send could
refer to the transmit command queue, any operation posted to the transmit
(send) queue, or a send message operation.

*Work Request (WR)*
: A work request is a command submitted to a queue pair, work queue, or
shared receive queue. Work requests define the type of network operation
to perform, including references to any memory regions the operation will
access.

A send work request is a transmit operation that is directed to the send
queue of a queue pair. A receive work request is an operation posted
to either a shared receive queue or a QP's receive queue.

*Address Handle (AH)*
: An address handle identifies the link and/or network layer addressing to
a network port or multicast group.

With legacy InfiniBand, an address handle is a link layer object. For other
transports, including RoCE, the address handle is a network layer object.

*Global Identifier (GID)*
: InfiniBand defines a GID as an optional network-layer or multicast address.
Because GIDs are large enough to store an IPv6 address, their use has evolved
to support other transports. A GID identifies a network port, with the most
well-known GIDs being IPv4 and IPv6 addresses.
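
As an illustration of how a 16-byte GID can carry an IP address, the sketch below fills a GID-sized buffer from a textual IPv4 or IPv6 address. It assumes that IPv4 addresses are stored in the IPv4-mapped IPv6 form (`::ffff:a.b.c.d`), the layout commonly used for RoCE. `struct model_gid` and the `model_gid_from_*` helpers are hypothetical names, not part of libibverbs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a 16-byte GID. */
struct model_gid { uint8_t raw[16]; };

/* Store an IPv4 address as an IPv4-mapped IPv6 address (assumed layout). */
static int model_gid_from_ipv4(const char *dotted, struct model_gid *gid)
{
    struct in_addr v4;

    assert(dotted != NULL && gid != NULL);
    if (inet_pton(AF_INET, dotted, &v4) != 1)
        return -1;                          /* not a valid IPv4 string */

    memset(gid->raw, 0, 10);                /* bytes 0-9: zero */
    gid->raw[10] = 0xff;                    /* bytes 10-11: 0xffff */
    gid->raw[11] = 0xff;
    memcpy(&gid->raw[12], &v4.s_addr, 4);   /* bytes 12-15: IPv4 address */
    return 0;
}

/* An IPv6 address fills all 16 bytes directly. */
static int model_gid_from_ipv6(const char *text, struct model_gid *gid)
{
    struct in6_addr v6;

    assert(text != NULL && gid != NULL);
    if (inet_pton(AF_INET6, text, &v6) != 1)
        return -1;                          /* not a valid IPv6 string */

    memcpy(gid->raw, v6.s6_addr, sizeof(gid->raw));
    return 0;
}
```

With this layout, `192.0.2.1` and `::ffff:192.0.2.1` produce the same 16 bytes, which is why a single GID field can represent either address family.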

*GID Type*
: The GID type determines the specific type of GID address being referenced.
Additionally, it identifies the set of addressing headers underneath the
transport header.

An RDMA transport protocol may be layered over different networking stacks.
An RDMA transport may layer directly over a link layer (like InfiniBand or
Ethernet), over the network layer (such as IP), or another transport
layer (such as TCP or UDP). The GID type conveys how the RDMA transport
stack is constructed, as well as how the GID address is interpreted.

*GID Index*
: RDMA addresses are securely managed to ensure that unprivileged
applications do not inject arbitrary source addresses into the network.
Transport addresses are injected by the queue pair. Network addresses
are selected from a set of addresses stored in a source addressing table.

The source addressing table is referred to as a GID table. The GID index
identifies an entry in that table. The GID table exposed to a user
space process contains only those addresses usable by that process.
Queue pairs are frequently assigned a specific GID index to use for their
source network address when initially configured.

*Device Context*
: Identifies an instance of an opened RDMA device.

*command fd - cmd_fd*
: File descriptor used to communicate with the kernel device driver.
Associated with the device context and opened by the library.
The cmd_fd communicates with the kernel via ioctls and is used
to allocate, configure, and release device resources.

Applications interact with the cmd_fd indirectly by calling libibverbs
functions.

*async_fd*
: File descriptor used to report asynchronous events.
Associated with the device context and opened by the library.

Applications may interact directly with the async_fd, such as waiting
on the fd via select/poll, to receive notifications when an async event
has been reported.
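
Waiting on the fd can be sketched with plain `poll()`. The helper below is a generic, hypothetical function (`model_wait_readable` is not a libibverbs call) that works on any file descriptor. In a verbs program the descriptor would be the `async_fd` member of the opened device context, and a readable result would normally be followed by `ibv_get_async_event()` and then `ibv_ack_async_event()`.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd becomes readable or the timeout expires.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * timeout_ms follows poll() semantics: -1 blocks, 0 returns immediately.
 */
static int model_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(timeout_ms >= -1);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (rc <= 0)
        return rc;                         /* 0: timeout, -1: poll failed */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

The same descriptor can instead be added to an existing event loop alongside other file descriptors, so async events do not require a dedicated thread.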

*Job ID*
: A job ID identifies a single distributed application. The job object
is a device-level object that maps to a job ID and may be shared between
processes. The configuration of a job object, such as assigning its
job ID value, is considered a privileged operation.

Multiple job objects, each assigned the same job ID value, may be needed
to represent a single, higher-level logical job running on the network.
This may be necessary for jobs that span multiple RDMA devices, for
example, where each job object may be configured for different source
addressing.

*Job Key*
: A job key associates a job object with a specific protection domain. This
provides secure access to the actual job ID value stored with the job
object, while restricting which memory regions data transfers to / from
that job may access.

*Address Table*
: An address table is a virtual address array associated with a job object.
The address table allows local processes that belong to the same job to
share addressing and scalable encryption information to peer QPs.

The address table is an optional but integrated component of a job
object.
2 changes: 1 addition & 1 deletion debian/changelog
@@ -1,4 +1,4 @@
-rdma-core (59.0-1) unstable; urgency=medium
+rdma-core (61.0-1) unstable; urgency=medium

* New upstream release.

4 changes: 4 additions & 0 deletions debian/librdmacm1.symbols
@@ -4,6 +4,7 @@ librdmacm.so.1 librdmacm1 #MINVER#
RDMACM_1.1@RDMACM_1.1 16
RDMACM_1.2@RDMACM_1.2 23
RDMACM_1.3@RDMACM_1.3 31
RDMACM_1.4@RDMACM_1.4 60
raccept@RDMACM_1.0 1.0.16
rbind@RDMACM_1.0 1.0.16
rclose@RDMACM_1.0 1.0.16
@@ -43,12 +44,15 @@ librdmacm.so.1 librdmacm1 #MINVER#
rdma_listen@RDMACM_1.0 1.0.15
rdma_migrate_id@RDMACM_1.0 1.0.15
rdma_notify@RDMACM_1.0 1.0.15
rdma_query_addrinfo@RDMACM_1.4 60
rdma_reject@RDMACM_1.0 1.0.15
rdma_reject_ece@RDMACM_1.3 31
rdma_resolve_addr@RDMACM_1.0 1.0.15
rdma_resolve_addrinfo@RDMACM_1.4 60
rdma_resolve_route@RDMACM_1.0 1.0.15
rdma_set_local_ece@RDMACM_1.3 31
rdma_set_option@RDMACM_1.0 1.0.15
rdma_write_cm_event@RDMACM_1.4 60
rfcntl@RDMACM_1.0 1.0.16
rgetpeername@RDMACM_1.0 1.0.16
rgetsockname@RDMACM_1.0 1.0.16