nvmf_tgt seg fault #500

@jrgruher

I'm running a dual-socket Skylake server with P4510 NVMe drives and a 100Gb Mellanox CX4 NIC. The OS is Ubuntu 18.04 with kernel 4.18.16. SPDK version is 18.10 and FIO version is 3.12. I'm running the SPDK NVMeoF target and exercising it from an initiator system (similar config to the target, but with a 50Gb NIC) using FIO with the bdev plugin. 128K sequential workloads reliably and immediately seg fault nvmf_tgt. 4KB random workloads run without a seg fault, so the problem seems tied to the block size and/or IO pattern. The same IO pattern runs without a problem against a local PCIe device using SPDK; I only see the failure when running the NVMeoF target with FIO driving the IO pattern from an SPDK initiator system.

Steps to reproduce and seg fault output follow below.

Start the target:
sudo ~/install/spdk/app/nvmf_tgt/nvmf_tgt -m 0x0000F0 -r /var/tmp/spdk1.sock

Configure the target:
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d1 -t pcie -a 0000:1a:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d2 -t pcie -a 0000:1b:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d3 -t pcie -a 0000:1c:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d4 -t pcie -a 0000:1d:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d5 -t pcie -a 0000:3d:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d6 -t pcie -a 0000:3e:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d7 -t pcie -a 0000:3f:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d8 -t pcie -a 0000:40:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_raid_bdev -n raid1 -s 4 -r 0 -b "d1n1 d2n1 d3n1 d4n1 d5n1 d6n1 d7n1 d8n1"
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_store raid1 store1
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l1 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l2 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l3 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l4 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l5 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l6 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l7 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l8 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l9 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l10 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l11 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l12 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn1 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn2 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn3 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn4 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn5 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn6 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn7 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn8 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn9 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn10 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn11 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn12 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn1 store1/l1
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn2 store1/l2
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn3 store1/l3
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn4 store1/l4
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn5 store1/l5
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn6 store1/l6
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn7 store1/l7
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn8 store1/l8
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn9 store1/l9
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn10 store1/l10
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn11 store1/l11
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn12 store1/l12
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn1 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn2 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn3 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn4 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn5 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn6 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn7 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn8 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn9 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn10 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn11 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn12 -t rdma -a 10.5.0.202 -s 4420
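
For anyone reproducing this, the twelve repeated lvol, subsystem, namespace, and listener commands above can be generated with a loop. This is only a sketch: the function name `gen_target_cmds` is hypothetical, the eight `construct_nvme_bdev` calls and the raid/lvol store setup are not covered, and the script prints the commands (a dry run) rather than executing them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the per-subsystem rpc.py commands from the report,
# in the same grouped order, instead of running them.
# gen_target_cmds is a hypothetical helper name, not part of SPDK.
RPC="sudo ./rpc.py -s /var/tmp/spdk1.sock"

gen_target_cmds() {
    local count=$1 i
    # One logical volume per subsystem, same size argument as in the report.
    for i in $(seq 1 "$count"); do
        echo "$RPC construct_lvol_bdev -l store1 l${i} 1200000"
    done
    # Create the subsystems.
    for i in $(seq 1 "$count"); do
        echo "$RPC nvmf_subsystem_create nqn.2018-11.io.spdk:nqn${i} -a"
    done
    # Attach one lvol to each subsystem as a namespace.
    for i in $(seq 1 "$count"); do
        echo "$RPC nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn${i} store1/l${i}"
    done
    # Add the RDMA listener to each subsystem.
    for i in $(seq 1 "$count"); do
        echo "$RPC nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn${i} -t rdma -a 10.5.0.202 -s 4420"
    done
}

gen_target_cmds 12
```

Piping the output to `bash` would execute the commands; reviewing the printed list first is the safer choice.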

FIO file on initiator:
[global]
rw=rw
rwmixread=100
numjobs=1
iodepth=32
bs=128k
direct=1
thread=1
time_based=1
ramp_time=10
runtime=10
ioengine=spdk_bdev
spdk_conf=/home/don/fio/nvmeof.conf
group_reporting=1
unified_rw_reporting=1
exitall=1
randrepeat=0
norandommap=1
cpus_allowed_policy=split
cpus_allowed=1-2
[job1]
filename=b0n1

Config file on initiator:
[Nvme]
TransportID "trtype:RDMA traddr:10.5.0.202 trsvcid:4420 subnqn:nqn.2018-11.io.spdk:nqn1 adrfam:IPv4" b0

Run FIO on the initiator and nvmf_tgt seg faults immediately:
sudo LD_PRELOAD=/home/don/install/spdk/examples/bdev/fio_plugin/fio_plugin fio sr.ini

Seg fault looks like this:
mlx5: donsl202: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000001 00000000 00000000 00000000
00000000 9d005304 0800011b 0008d0d2
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660105216 (4): local protection error
rdma.c: 501:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660105216 (5): Work Request Flushed Error
rdma.c: 501:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660106280 (5): Work Request Flushed Error
rdma.c: 501:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660106280 (5): Work Request Flushed Error
Segmentation fault

The following is added to dmesg:
[71561.859644] nvme nvme1: Connect rejected: status 8 (invalid service ID).
[71561.866466] nvme nvme1: rdma connection establishment failed (-104)
[71567.805288] reactor_7[9166]: segfault at 88 ip 00005630621e6580 sp 00007f07af5fc400 error 4 in nvmf_tgt[563062194000+df000]
[71567.805293] Code: 48 8b 30 e8 82 f7 ff ff e9 7d fe ff ff 0f 1f 44 00 00 41 81 f9 80 00 00 00 75 37 49 8b 07 4c 8b 70 40 48 c7 40 50 00 00 00 00 <49> 8b 96 88 00 00 00 48 89 50 58 49 8b 96 88 00 00 00 48 89 02 48
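
The segfault line in dmesg gives the faulting instruction pointer and the base of the nvmf_tgt mapping, so the offset into the binary can be computed by subtraction. The sketch below does that arithmetic. It assumes the mapping shown in dmesg starts at file offset 0 of the executable, which may not hold for every build. The fault address of 88 would also be consistent with a NULL pointer being dereferenced at a field offset of 0x88, though that is an inference from the log and not something confirmed here.

```shell
#!/usr/bin/env bash
# Values copied from the dmesg segfault line in this report.
ip=0x00005630621e6580      # faulting instruction pointer
base=0x563062194000        # start of the nvmf_tgt mapping

# Offset of the faulting instruction inside the mapping.
offset=$(printf '0x%x' $((ip - base)))
echo "$offset"

# With a binary built with debug info, the offset could then be resolved, e.g.:
#   addr2line -f -i -e ~/install/spdk/app/nvmf_tgt/nvmf_tgt "$offset"
# (left commented out because it needs the exact binary that crashed)
```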

Labels

High (High Priority Bug), bug
