
call ibv_reg_mr failed using mapped memory #266

Open

tangrc99 opened this issue Jun 20, 2023 · 7 comments
    // g_t is a gdr_t obtained from gdr_open(); gpu_mem_handle_t and
    // gpu_mem_alloc() come from gdrcopy/test/common.h
    gpu_mem_handle_t m_t;
    int ret;
    if ((ret = gpu_mem_alloc(&m_t, 10000, 1, 1)) != CUDA_SUCCESS) {
        return -1;
    }
    gdr_mh_t handle;
    char *gpu_mapped_mem = NULL;

    if ((ret = gdr_pin_buffer(g_t, m_t.ptr, m_t.allocated_size, 0, 0, &handle)) != 0) {
        return -1;
    }
    if ((ret = gdr_map(g_t, handle, (void **)&gpu_mapped_mem, m_t.allocated_size)) != 0) {
        return -1;
    }

    char *gdr_mem = gpu_mapped_mem;  // the pointer I try to register

I try to register gdr_mem using ibv_reg_mr, but it fails with errno EFAULT.
I am using an A10 GPU on CentOS 8.5.


drossetti commented Jun 20, 2023

@tangrc99 This is expected, as the implementation of ibv_reg_mr in the Linux kernel requires the virtual address range to be backed by CPU memory pages.

More precisely, pin_user_pages() does not work on CPU mappings of PCIe resources created via io_remap_pfn_range().

The official way of enabling RDMA on GPU memory is the dma-buf path: export the GPU virtual address range as a dma-buf file descriptor with cuMemGetHandleForAddressRange() and register that descriptor with ibv_reg_dmabuf_mr(), instead of registering the gdrcopy CPU mapping.

For a full deployment case, see for example https://github.com/openucx/ucx/blob/1308d2055ab0ba948eac213c8cfcd92776c34a53/src/uct/cuda/cuda_copy/cuda_copy_md.c#L410 and https://github.com/openucx/ucx/blob/1308d2055ab0ba948eac213c8cfcd92776c34a53/src/uct/ib/base/ib_md.c#L480.
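The dma-buf path above can be sketched roughly as follows. This is a sketch, not the UCX code: it assumes rdma-core with ibv_reg_dmabuf_mr(), a CUDA 11.7+ driver API, an existing protection domain `pd`, and a suitably page-aligned range; the helper name `register_gpu_dmabuf` is mine.

```c
#include <cuda.h>
#include <infiniband/verbs.h>
#include <stdint.h>
#include <unistd.h>

struct ibv_mr *register_gpu_dmabuf(struct ibv_pd *pd, CUdeviceptr ptr, size_t size)
{
    int dmabuf_fd = -1;

    /* Export the GPU VA range as a dma-buf file descriptor (CUDA 11.7+).
     * ptr/size should be host-page aligned. */
    if (cuMemGetHandleForAddressRange(&dmabuf_fd, ptr, size,
                                      CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD,
                                      0) != CUDA_SUCCESS)
        return NULL;

    /* Register the dma-buf with the RDMA device instead of calling
     * ibv_reg_mr() on a gdr_map() CPU mapping. */
    struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd, 0 /* offset */, size,
                                          (uint64_t)ptr /* iova */, dmabuf_fd,
                                          IBV_ACCESS_LOCAL_WRITE |
                                          IBV_ACCESS_REMOTE_READ |
                                          IBV_ACCESS_REMOTE_WRITE);
    close(dmabuf_fd); /* the MR keeps its own reference to the dma-buf */
    return mr;
}
```

The fd can be closed once registration succeeds, since the MR pins the underlying buffer for its lifetime.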


tangrc99 commented Jun 20, 2023

Thanks. It seems the A10 doesn't support dma-buf file descriptors. Can I use GDR on the A10 with other methods?

@drossetti
Member

It should support it. Are you using the openrm (open source) variant of the GPU kernel-mode driver? See https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/.


tangrc99 commented Jun 21, 2023

The function cuMemGetHandleForAddressRange requires CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD, and the dma-buf support attribute reports 0 on my A10. nv_peer_mem and nvidia-peermem are already loaded; is there any other requirement?


pakmarkthub commented Jun 21, 2023

Hi @tangrc99,

Neither nvidia-peermem nor nv_peer_mem is involved in dmabuf. The A10 should support dmabuf. Could you check whether your SW stack is new enough to support dmabuf?

  • NVIDIA driver with the open variant version 515 or later.
  • CUDA 11.7 or later.
  • Linux kernel version 5.12 or later. This is for the NIC stack. The GPU stack does not have this requirement.
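Given those requirements, dma-buf support can be probed from the CUDA driver API with the CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED attribute (added in CUDA 11.7). A minimal sketch, which needs a GPU and a driver new enough to define the attribute:

```c
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    int supported = 0;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS)
        return 1;

    /* Reports 1 when the driver/kernel stack can export GPU memory
     * as a dma-buf, 0 otherwise. */
    cuDeviceGetAttribute(&supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, dev);
    printf("dma-buf supported: %d\n", supported);
    return 0;
}
```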

@tangrc99
Author

Thanks. My Linux kernel (4.18.0) is too old.


drossetti commented Jul 13, 2023

In that case you can use the legacy RDMA memory registration path, i.e. ibv_reg_mr, which involves the peer-direct kernel infrastructure (for example provided by MLNX_OFED) and nvidia-peermem.
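A rough sketch of that legacy path, assuming MLNX_OFED with the peer-direct infrastructure and the nvidia-peermem module loaded: the cudaMalloc'd device pointer itself is passed to ibv_reg_mr(), rather than a gdr_map() CPU mapping (the helper name `register_gpu_legacy` is mine).

```c
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_gpu_legacy(struct ibv_pd *pd, size_t size)
{
    void *dptr = NULL;

    if (cudaMalloc(&dptr, size) != cudaSuccess)
        return NULL;

    /* With peer-direct + nvidia-peermem, the kernel recognizes the GPU
     * virtual address range and obtains the DMA mappings from the GPU
     * driver, so plain ibv_reg_mr() succeeds on the device pointer. */
    return ibv_reg_mr(pd, dptr, size,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
```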
