(degradation) VMware *ERROR*: Host (Initiator) is not allowed to use RDMA operations (responder_resources 1) #3115

Closed
Dante4 opened this issue Sep 2, 2023 · 5 comments

@Dante4

Dante4 commented Sep 2, 2023

Sighting report

When I try to connect a device over NVMe over RDMA from VMware, I receive this error:

2023-09-02T21:29:09.289Z cpu11:2097768)nvmerdma:2805 [ctlr 272, queue 0] status: Failure, param.conn: data 0x45e90d3f972c, dataLen 148, responderResources 0, initiatorDepth 0, flowControl 0, retryCount 0, rnrRetryCount 0, srq 0, qpNum 0
2023-09-02T21:29:09.289Z cpu11:2097768)nvmerdma:2811 [ctlr 272, queue 0] Reject data: recfmt 0, sts 0x8
2023-09-02T21:29:09.289Z cpu6:2099683 opID=6f4e1902)nvmerdma:593 [ctlr 272, queue 0] Failed to establish RDMA connection, state 4: Failure
2023-09-02T21:29:09.294Z cpu6:2099683 opID=6f4e1902)nvmerdma:322 [ctlr 272, queue 0] Failed to connect to CM for queue: Failure
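
For triage, the connection parameters can be pulled out of an nvmerdma log line like the first one above. The following is a minimal sketch in C. `log_field` is a hypothetical helper written for this note, not part of ESXi or SPDK, and it assumes each field is printed as a name followed by a space and a decimal number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "<key> <decimal>" in a log line and store the
 * number in *out. Returns 0 on success, -1 if the key is missing or is not
 * followed by a decimal number. The first occurrence of key wins, so pass
 * a key that is not a prefix of another field name.
 */
int log_field(const char *line, const char *key, unsigned int *out)
{
	const char *p;

	assert(line != NULL && key != NULL && out != NULL);

	p = strstr(line, key);
	if (p == NULL) {
		return -1;
	}
	p += strlen(key);
	/* " %u" skips the separating whitespace, then reads the value. */
	return sscanf(p, " %u", out) == 1 ? 0 : -1;
}
```

Called on the first log line with the key "responderResources" or "initiatorDepth", it stores 0 in both cases, which is what the initiator side reports for the rejected connection.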

Expected Behavior

The NVMe-oF device appears in VMware.

Current Behavior

I receive this error on the target:

[2023-09-02 21:29:15.497641] rdma.c:1332:nvmf_rdma_connect: *ERROR*: Host (Initiator) is not allowed to use RDMA operations (responder_resources 1)
[2023-09-02 21:29:15.497720] rdma.c:3555:nvmf_process_cm_event: *ERROR*: Unable to process connect event. rc: -1

Steps to Reproduce

  1. Create the target with the following commands (I have also tried using scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 1c:00.0 -t pcie):
screen 
build/bin/nvmf_tgt
ctrl+A + d
scripts/rpc.py nvmf_create_transport -t RDMA -u 8192 -i 131072 -c 8192
scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller
scripts/rpc.py bdev_aio_create /dev/md0 md0 
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 md0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -a 10.20.0.5 -s 4420
  2. Create a VMware NVMe over RDMA storage adapter
  3. Try to connect to the SPDK target
  4. The error appears in nvmf_tgt stdout

Context (Environment including OS version, SPDK version, etc.)

Target -
OS: Ubuntu 22.04
Kernel: 6.2.0-31
SPDK version: v23.09-pre git sha1 5ec4a06

Initiator:
VMware ESXi 7.0.3 build-21930508
VMware ESXi 7.0.3 build-21422485

Hardware:
ConnectX-3 Pro

Everything works if I use TCP instead of RDMA. iSER also works fine, and I can discover the controller over RDMA when using the built-in kernel NVMe target, so the problem is not RDMA itself :(

It also works if I change
if (rdma_param->initiator_depth == 0)
in https://github.com/spdk/spdk/blob/master/lib/nvmf/rdma.c
to
if (rdma_param->initiator_depth == 1)

Version v23.05 also works as intended.

Dante4 added the Sighting label Sep 2, 2023
Dante4 changed the title from "VMware *ERROR*: Host (Initiator) is not allowed to use RDMA operations (responder_resources 1)" to "(degradation) VMware *ERROR*: Host (Initiator) is not allowed to use RDMA operations (responder_resources 1)" Sep 3, 2023
@tomzawadzki
Contributor

And if I change
if (rdma_param->initiator_depth == 0)
in https://github.com/spdk/spdk/blob/master/lib/nvmf/rdma.c
to
if (rdma_param->initiator_depth == 1)

And version
v23.05 also working as intended

This seems to point to patch 6bc8d26, and the change in the quote suggests that initiator_depth is set to 0, which per the commit message is not acceptable.

@AlekseyMarchuk would you be able to take a look?

@AlekseyMarchuk
Member

I didn't test this patch with VMware. From the log provided by @Dante4 we can see that VMware sent an rdma_conn_param that was all zeroes. That is somewhat incorrect, and it seems that the IB driver sets some defaults based on HW capabilities.

2023-09-02T21:29:09.289Z cpu11:2097768)nvmerdma:2805 [ctlr 272, queue 0] status: Failure, param.conn: data 0x45e90d3f972c, dataLen 148, responderResources 0, initiatorDepth 0, flowControl 0, retryCount 0, rnrRetryCount 0, srq 0, qpNum 0

I'll push a patch to make this check less strict.
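
As an illustration of what a less strict check could look like, here is a minimal sketch in C. It assumes the check in question rejects any non-zero responder_resources, which is the field named in the target's error message. The struct `conn_param_sketch` and both function names are hypothetical stand-ins for this note. They are not SPDK code, and the actual patch may be structured differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two rdma_conn_param fields discussed here. */
struct conn_param_sketch {
	uint8_t responder_resources;
	uint8_t initiator_depth;
};

/* Strict variant: refuse the connection. Returns 0 to accept, -1 to reject. */
int check_strict(const struct conn_param_sketch *p)
{
	assert(p != NULL);
	if (p->responder_resources != 0) {
		fprintf(stderr, "rejecting: responder_resources %u\n",
			(unsigned int)p->responder_resources);
		return -1;
	}
	return 0;
}

/* Relaxed variant: log a warning but accept anyway. Always returns 0. */
int check_relaxed(const struct conn_param_sketch *p)
{
	assert(p != NULL);
	if (p->responder_resources != 0) {
		fprintf(stderr, "warning: unexpected responder_resources %u, accepting\n",
			(unsigned int)p->responder_resources);
	}
	return 0;
}
```

With responder_resources set to 1, as in the target log, the strict variant rejects the connection while the relaxed variant accepts it and only logs.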

@AlekseyMarchuk
Member

@Dante4 could you try this patch https://review.spdk.io/gerrit/c/spdk/spdk/+/19732?
I'd like to clarify whether this change helped you work around the problem. You changed the check of initiator_depth, while in the logs we can see that the error happened due to responder_resources.

And if I change
if (rdma_param->initiator_depth == 0)
in https://github.com/spdk/spdk/blob/master/lib/nvmf/rdma.c
to
if (rdma_param->initiator_depth == 1)

@Dante4
Author

Dante4 commented Sep 5, 2023

@Dante4 could you try this patch https://review.spdk.io/gerrit/c/spdk/spdk/+/19732 ? I'd like to clarify if this change helped you to work around this problem. You changed check of initiator_depth while in logs we can see that error happened due to responder_resources

And if I change
if (rdma_param->initiator_depth == 0)
in https://github.com/spdk/spdk/blob/master/lib/nvmf/rdma.c
to
if (rdma_param->initiator_depth == 1)

Oops, I miscopied; it should be responder_resources == 1 instead of initiator_depth.

The patch works, but it floods the log too much: 17 messages per connected host.


@AlekseyMarchuk
Member

Thank you for the verification. I'll reduce the number of log messages.

spdk-bot pushed a commit that referenced this issue Sep 12, 2023
Initiator must not set initiator_depth since
according to the NVMf spec it can't issue RDMA
operations. But some drivers set it to an incorrect
value. We can allow such connections, just
print a warning when admin qpair is connected

Fixes issue #3115

Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com>
Change-Id: I006d8bb609819cb97b3b57051ce9ffdcb80796a6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/19732
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <jim.harris@gmail.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
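
The commit message above describes warning only when the admin qpair connects. A minimal sketch of that idea in C follows. It relies on the admin queue being queue ID 0, and it assumes the earlier flood came from one warning per queue pair, which the thread does not confirm. `warn_on_connect` and `warnings_per_host` are hypothetical names for this note, not SPDK functions, and how the target obtains the queue ID is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ADMIN_QID 0 /* NVMe queue ID 0 is the admin queue */

/* Warn about a non-zero initiator_depth, but only for the admin queue.
 * Returns true if a warning was printed. */
bool warn_on_connect(uint16_t qid, uint8_t initiator_depth)
{
	if (initiator_depth == 0 || qid != ADMIN_QID) {
		return false;
	}
	fprintf(stderr, "warning: initiator set initiator_depth %u\n",
		(unsigned int)initiator_depth);
	return true;
}

/* Count the warnings produced by one host that connects the admin queue
 * plus n_io_queues I/O queues (queue IDs 1..n_io_queues). */
unsigned int warnings_per_host(uint16_t n_io_queues, uint8_t initiator_depth)
{
	unsigned int count = 0;
	uint32_t qid; /* wider than uint16_t so the loop cannot wrap around */

	for (qid = 0; qid <= n_io_queues; qid++) {
		if (warn_on_connect((uint16_t)qid, initiator_depth)) {
			count++;
		}
	}
	assert(count <= 1); /* at most one warning per connected host */
	return count;
}
```

Under these assumptions, a host that opens the admin queue and 16 I/O queues produces a single warning instead of one per queue.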