NVMeofTCP SPDK host abort test #2048

Closed · gp2017 opened this issue Jul 16, 2021 · 18 comments
Labels: bug, Patches out for review, Waiting on Submitter (currently waiting on input from submitter)

@gp2017 (Author) commented Jul 16, 2021

I am using the SPDK NVMe/TCP host and running the IO abort test (from the build/examples folder) against my NVMe/TCP target. I am running the abort test with different queue depths: 4, 8, 16, and 32.
The test completes fine for queue depths 4, 8, and 16, but it hangs and appears to be stuck in a continuous loop at queue depth 32.

Expected Behavior

The abort test should not hang with the "-q 32" parameter; it should complete in the same manner as it does with "-q 4", "-q 8", and "-q 16".

Current Behavior

The abort test hangs with the "-q 32" parameter, while it completes as expected with "-q 4" through "-q 16".

Possible Solution

From the trace, it looks like the host sends a write command, the target responds with an R2T, and the host then never sends the requested data to the target. The host needs to send the requested data in response to the target's R2T, and only then issue the abort request as part of the abort test; instead, it is not responding to the R2T at all.

Steps to Reproduce

  1. Command with queue depth 32 (failing case)
    ./abort -q 32 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.cdw:nvme.1'
  2. Command with queue depth 4 (passing case)
    ./abort -q 4 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.cdw:nvme.1'
  3. Command with queue depth 8 (passing case)
    ./abort -q 8 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.cdw:nvme.1'
  4. Command with queue depth 16 (passing case)
    ./abort -q 16 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.cdw:nvme.1'

Console output with queue depth 4 (passed)

./abort -q 4 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.cdw:nvme.1'
[2021-07-16 09:55:04.024076] Starting SPDK v21.07-pre git sha1 b73d3e6 / DPDK 21.02.0 initialization...
[2021-07-16 09:55:04.024156] [ DPDK EAL parameters: [2021-07-16 09:55:04.024169] abort [2021-07-16 09:55:04.024180] --no-shconf [2021-07-16 09:55:04.024190] -c 0x1 [2021-07-16 09:55:04.024198] -m 4096 [2021-07-16 09:55:04.024207] --no-pci [2021-07-16 09:55:04.024217] --log-level=lib.eal:6 [2021-07-16 09:55:04.024226] --log-level=lib.cryptodev:5 [2021-07-16 09:55:04.024235] --log-level=user1:6 [2021-07-16 09:55:04.024247] --iova-mode=pa [2021-07-16 09:55:04.024257] --base-virtaddr=0x200000000000 [2021-07-16 09:55:04.024268] --match-allocations [2021-07-16 09:55:04.024279] --file-prefix=spdk_pid126833 [2021-07-16 09:55:04.024289] ]
EAL: No available 1048576 kB hugepages reported
EAL: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
Attached to NVMe over Fabrics controller at 10.10.10.167:4420: nqn.2015-09.com.cdw:nvme.1
Associating TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) NSID 1 with lcore 0
Initialization complete. Launching workers.
NS: TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) NSID 1 I/O completed: 30102, failed: 15
CTRLR: TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) abort submitted 45, failed to submit 30072
success 15, unsuccess 30, failed 0

Console output with queue depth 16 (passed)

./abort -q 16 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.c:nvme.1'
[2021-07-16 09:55:32.400777] Starting SPDK v21.07-pre git sha1 b73d3e6 / DPDK 21.02.0 initialization...
[2021-07-16 09:55:32.400855] [ DPDK EAL parameters: [2021-07-16 09:55:32.400868] abort [2021-07-16 09:55:32.400876] --no-shconf [2021-07-16 09:55:32.400886] -c 0x1 [2021-07-16 09:55:32.400896] -m 4096 [2021-07-16 09:55:32.400905] --no-pci [2021-07-16 09:55:32.400915] --log-level=lib.eal:6 [2021-07-16 09:55:32.400924] --log-level=lib.cryptodev:5 [2021-07-16 09:55:32.400934] --log-level=user1:6 [2021-07-16 09:55:32.400944] --iova-mode=pa [2021-07-16 09:55:32.400953] --base-virtaddr=0x200000000000 [2021-07-16 09:55:32.400963] --match-allocations [2021-07-16 09:55:32.400971] --file-prefix=spdk_pid126846 [2021-07-16 09:55:32.400980] ]
EAL: No available 1048576 kB hugepages reported
EAL: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
Attached to NVMe over Fabrics controller at 10.10.10.167:4420: nqn.2015-09.com.cdw:nvme.1
controller IO queue size 16 less than required
Consider using lower queue depth or small IO size because IO requests may be queued at the NVMe driver.
Associating TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) NSID 1 with lcore 0
Initialization complete. Launching workers.
NS: TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) NSID 1 I/O completed: 49920, failed: 16
CTRLR: TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) abort submitted 43, failed to submit 49893
success 16, unsuccess 27, failed 0

Console output with queue depth 32 (failed)

./abort -q 32 -s 4096 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.10.10.167 trsvcid:4420 subnqn:nqn.2015-09.com.cdw:nvme.1'
[2021-07-16 09:55:49.649538] Starting SPDK v21.07-pre git sha1 b73d3e6 / DPDK 21.02.0 initialization...
[2021-07-16 09:55:49.649617] [ DPDK EAL parameters: [2021-07-16 09:55:49.649630] abort [2021-07-16 09:55:49.649641] --no-shconf [2021-07-16 09:55:49.649652] -c 0x1 [2021-07-16 09:55:49.649660] -m 4096 [2021-07-16 09:55:49.649669] --no-pci [2021-07-16 09:55:49.649680] --log-level=lib.eal:6 [2021-07-16 09:55:49.649691] --log-level=lib.cryptodev:5 [2021-07-16 09:55:49.649702] --log-level=user1:6 [2021-07-16 09:55:49.649713] --iova-mode=pa [2021-07-16 09:55:49.649724] --base-virtaddr=0x200000000000 [2021-07-16 09:55:49.649736] --match-allocations [2021-07-16 09:55:49.649746] --file-prefix=spdk_pid126855 [2021-07-16 09:55:49.649758] ]
EAL: No available 1048576 kB hugepages reported
EAL: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
Attached to NVMe over Fabrics controller at 10.10.10.167:4420: nqn.2015-09.com.cdw:nvme.1
controller IO queue size 16 less than required
Consider using lower queue depth or small IO size because IO requests may be queued at the NVMe driver.
Associating TCP (addr:10.10.10.167 subnqn:nqn.2015-09.com.cdw:nvme.1) NSID 1 with lcore 0
Initialization complete. Launching workers.
[2021-07-16 09:55:52.997426] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997463] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997474] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22110 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997483] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997490] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997497] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997503] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22120 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997510] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997515] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997523] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997529] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22130 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997534] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997542] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997549] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997557] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22140 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997565] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997571] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997577] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997583] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22150 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997590] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997595] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997600] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997611] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22160 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997619] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997627] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997634] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997642] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22170 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997650] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997658] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997666] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997674] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22180 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997681] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997689] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997697] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997704] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22190 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997711] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997719] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997726] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997733] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22200 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997740] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997747] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997754] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997762] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22210 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997768] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997775] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997782] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997789] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22220 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997796] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997803] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997810] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997817] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22230 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997826] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997834] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997840] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997848] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22240 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997855] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997862] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997870] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997877] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22250 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997885] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997892] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997898] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997905] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22260 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997912] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997919] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997926] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997937] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22270 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997945] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997953] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997959] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997967] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22280 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.997974] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.997981] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.997989] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.997996] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22290 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.998003] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.998010] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.998017] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.998024] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: READ sqid:1 cid:0 nsid:1 lba:22300 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.998032] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.998040] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.998047] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.998055] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22310 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.998062] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.998070] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.998076] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.998083] nvme_qpair.c: 272:nvme_io_qpair_print_command: NOTICE: WRITE sqid:1 cid:0 nsid:1 lba:22320 len:10 PRP1 0x0 PRP2 0x0
[2021-07-16 09:55:52.998090] nvme_qpair.c: 455:spdk_nvme_print_completion: NOTICE: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-07-16 09:55:52.998097] nvme_qpair.c: 594:nvme_qpair_abort_queued_reqs: ERROR: aborting queued i/o
[2021-07-16 09:55:52.998103] nvme_qpair.c: 536:nvme_qpair_manual_complete_request: NOTICE: Command completed manually:
[2021-07-16 09:55:52.998110] nvme_qpair.c: 272:nvme_io_qpair_print_command: *N

Context (Environment including OS version, SPDK version, etc.)

Ubuntu 18.04:
cat /etc/issue
Ubuntu 18.04.5 LTS \n \l

Kernel:
uname -sr
Linux 5.12.17-051217-generic

SPDK Version:
SPDK v21.07-pre

@jimharris (Member) commented:

> I am using the SPDK NVMe/TCP host and running the IO abort test (from the build/examples folder) against my NVMe/TCP target. I am running the abort test with different queue depths: 4, 8, 16, and 32.

Hi @gp2017,

Can you clarify "my NVMe/TCP target"? Is this something different from the Linux kernel NVMe/TCP target or the SPDK NVMe/TCP target?

Thanks,

Jim

@gp2017 (Author) commented Jul 16, 2021

Hello Jim,
I am running this abort test with my hardware NVMe/TCP target.
Yes, my target is different from both the Linux kernel NVMe/TCP target and the SPDK NVMe/TCP target.
Thanks,
GP

@gp2017 (Author) commented Jul 16, 2021 via email

@jimharris (Member) commented:

I am suspicious that this fails at QD=32 given that the default admin queue size is also 32. We certainly should make sure the admin queue is big enough to hold all of the abort commands that the test app is sending.

Could you try this patch? https://review.spdk.io/gerrit/c/spdk/spdk/+/8812

The patch has only been compile-tested; I've run no actual tests with it. If the patch works for the QD=32 case, please try higher queue depths as well (64 and 128).
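
For illustration only, the general shape of such a change might look like the sketch below. This is not the actual Gerrit patch; the g_queue_depth variable name and the +8 headroom are assumptions made for the example.

#include <stdbool.h>
#include <stdint.h>
#include "spdk/nvme.h"

/* The -q value from the command line; the variable name is illustrative only. */
static uint32_t g_queue_depth = 32;

/*
 * Sketch: grow the admin queue in the probe callback so it can hold every
 * outstanding abort command plus some headroom for other admin commands
 * such as Keep Alive.  The real patch may size this differently.
 */
static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
         struct spdk_nvme_ctrlr_opts *opts)
{
        uint32_t needed = g_queue_depth + 8;    /* +8 headroom is an assumption */

        if (opts->admin_queue_size < needed) {
                opts->admin_queue_size = needed;
        }
        return true;    /* attach to this controller */
}

The key point is the one noted above: the admin queue, which defaults to 32 entries, has to be at least as large as the number of abort commands the test can have outstanding at once, plus whatever other admin commands (such as Keep Alive) may be in flight.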

If the patch does not work, please try reproducing this with the Linux kernel or SPDK target and confirm the issue happens there as well. It will make it easier for someone else to reproduce and root cause.

Thanks,

Jim

@gp2017 (Author) commented Jul 16, 2021 via email

@tomzawadzki tomzawadzki added the Waiting on Submitter currently waiting on input from submitter label Jul 20, 2021
@tomzawadzki tomzawadzki added this to To do on next bug scrub in Bug scrub Jul 27, 2021
@tomzawadzki (Contributor) commented:

[Bug scrub] @mikeBashStuff will take a look at adding a functional test case covering this issue.

@tomzawadzki tomzawadzki moved this from To do on next bug scrub to 27.07 in Bug scrub Jul 27, 2021
@mikeBashStuff (Contributor) commented:

I am afraid I am not able to reproduce the exact same issue with the SPDK or kernel target. I do see some other issues, though.

With or without the patch, I randomly see something like the following for the SPDK target, regardless of the queue depth (though most often it happens when it is >= 64):

 10:34:33       -- nvmf/target/abort_hw.sh@32 -- # /tmp/spdk/build/examples/abort -c 0xf -q 128 -w rw -M 50 -o 4096 -r 'trtype:tcp adrfam:IPv4 traddr:10.0.0.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:spdk_target'
[2021-07-28 10:34:33.940237] Starting SPDK v21.10-pre git sha1 ea0aaf5e8 / DPDK 21.05.0 initialization...
[2021-07-28 10:34:33.940985] [ DPDK EAL parameters: [2021-07-28 10:34:33.941119] abort [2021-07-28 10:34:33.941256] --no-shconf [2021-07-28 10:34:33.941382] -c 0xf [2021-07-28 10:34:33.941504] --no-pci [2021-07-28 10:34:33.941651] --log-level=lib.eal:6 [2021-07-28 10:34:33.941776] --log-level=lib.cryptodev:5 [2021-07-28 10:34:33.941916] --log-level=user1:6 [2021-07-28 10:34:33.942040] --iova-mode=pa [2021-07-28 10:34:33.942155] --base-virtaddr=0x200000000000 [2021-07-28 10:34:33.942274] --match-allocations [2021-07-28 10:34:33.942378] --file-prefix=spdk_pid38657 [2021-07-28 10:34:33.942496] ]
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
Attached to NVMe over Fabrics controller at 10.0.0.2:4420: nqn.2016-06.io.spdk:spdk_target
controller IO queue size 128 less than required
Consider using lower queue depth or small IO size because IO requests may be queued at the NVMe driver.
Associating TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 with lcore 0
Associating TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 with lcore 1
Associating TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 with lcore 2
Associating TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 with lcore 3
Initialization complete. Launching workers.
[2021-07-28 10:34:34.153703] nvme_ctrlr.c: 334:nvme_ctrlr_create_io_qpair: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] get_cc failed
spdk_nvme_ctrlr_alloc_io_qpair failed
[2021-07-28 10:34:34.153973] nvme_ctrlr.c: 334:nvme_ctrlr_create_io_qpair: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] get_cc failed
spdk_nvme_ctrlr_alloc_io_qpair failed
[2021-07-28 10:34:34.165643] nvme_ctrlr.c: 334:nvme_ctrlr_create_io_qpair: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] get_cc failed
spdk_nvme_ctrlr_alloc_io_qpair failed
NS: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 I/O completed: 0, failed: 0
CTRLR: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) abort submitted 14, failed to submit 44824
         success 0, unsuccess 0, failed 0
NS: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 I/O completed: 44838, failed: 0
NS: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 I/O completed: 0, failed: 0
NS: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 I/O completed: 0, failed: 0
/tmp/spdk/build/examples/abort: errors occured

With the patch, the issue above can still be seen for qd < 128, popping up in a random fashion, but for qd=128 I get this:

10:38:46       -- nvmf/target/abort_hw.sh@32 -- # /tmp/spdk/build/examples/abort -c 0xf -q 128 -w rw -M 50 -o 4096 -r 'trtype:tcp adrfam:IPv4 traddr:10.0.0.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:spdk_target'
[2021-07-28 10:38:46.812695] Starting SPDK v21.10-pre git sha1 c0db2e1b6 / DPDK 21.05.0 initialization...
[2021-07-28 10:38:46.812785] [ DPDK EAL parameters: [2021-07-28 10:38:46.812813] abort [2021-07-28 10:38:46.812831] --no-shconf [2021-07-28 10:38:46.812845] -c 0xf [2021-07-28 10:38:46.812860] --no-pci [2021-07-28 10:38:46.812870] --log-level=lib.eal:6 [2021-07-28 10:38:46.812880] --log-level=lib.cryptodev:5 [2021-07-28 10:38:46.812894] --log-level=user1:6 [2021-07-28 10:38:46.812904] --iova-mode=pa [2021-07-28 10:38:46.812924] --base-virtaddr=0x200000000000 [2021-07-28 10:38:46.812938] --match-allocations [2021-07-28 10:38:46.812948] --file-prefix=spdk_pid44621 [2021-07-28 10:38:46.812957] ]
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
[2021-07-28 10:38:46.978721] ctrlr.c: 702:_nvmf_ctrlr_connect: *ERROR*: Invalid SQSIZE for admin queue 135 (min 1, max 127)
[2021-07-28 10:38:46.978821] nvme_fabric.c: 452:nvme_fabric_qpair_connect: *ERROR*: Connect command failed, rc -5, trtype:TCP adrfam:IPv4 traddr:10.0.0.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:spdk_target
[2021-07-28 10:38:46.978899] nvme_fabric.c: 460:nvme_fabric_qpair_connect: *ERROR*: Connect command completed with error: sct 1, sc 130
[2021-07-28 10:38:46.978949] nvme_tcp.c:1940:nvme_tcp_ctrlr_connect_qpair: *ERROR*: Failed to send an NVMe-oF Fabric CONNECT command
[2021-07-28 10:38:46.979018] nvme.c: 705:nvme_ctrlr_poll_internal: *ERROR*: Failed to initialize SSD: 10.0.0.2
[2021-07-28 10:38:46.979049] nvme_ctrlr.c: 937:nvme_ctrlr_fail: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] in failed state.
[2021-07-28 10:38:46.979084] nvme_ctrlr.c: 965:nvme_ctrlr_shutdown_async: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] get_cc() failed
No valid NVMe controllers found

But the abort itself doesn't report an actual failure here, i.e., its exit status is 0.

For the kernel target (configured locally) it seems like everything works as expected, but the abort's logs look a bit peculiar to me:

 10:49:17       -- nvmf/target/abort_hw.sh@32 -- # /tmp/spdk/build/examples/abort -q 128 -w rw -M 50 -o 4096 -r 'trtype:tcp adrfam:IPv4 traddr:10.0.0.1 trsvcid:4420 subnqn:kernel_target'
[2021-07-28 10:49:17.340207] Starting SPDK v21.10-pre git sha1 c0db2e1b6 / DPDK 21.05.0 initialization...
[2021-07-28 10:49:17.340317] [ DPDK EAL parameters: [2021-07-28 10:49:17.340354] abort [2021-07-28 10:49:17.340370] --no-shconf [2021-07-28 10:49:17.340384] -c 0x1 [2021-07-28 10:49:17.340398] --no-pci [2021-07-28 10:49:17.340411] --log-level=lib.eal:6 [2021-07-28 10:49:17.340425] --log-level=lib.cryptodev:5 [2021-07-28 10:49:17.340443] --log-level=user1:6 [2021-07-28 10:49:17.340457] --iova-mode=pa [2021-07-28 10:49:17.340470] --base-virtaddr=0x200000000000 [2021-07-28 10:49:17.340483] --match-allocations [2021-07-28 10:49:17.340507] --file-prefix=spdk_pid45968 [2021-07-28 10:49:17.340523] ]
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
Attached to NVMe over Fabrics controller at 10.0.0.1:4420: kernel_target
Associating TCP (addr:10.0.0.1 subnqn:kernel_target) NSID 1 with lcore 0
Initialization complete. Launching workers.
NS: TCP (addr:10.0.0.1 subnqn:kernel_target) NSID 1 I/O completed: 57980, failed: 0
CTRLR: TCP (addr:10.0.0.1 subnqn:kernel_target) abort submitted 14486, failed to submit 43494
         success 0, unsuccess 14486, failed 0

So it succeeds, but unsuccess == submitted for each qd size. Is this normal? And what is the difference between unsuccess and failed?
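
For reference, these counters appear to line up with the NVMe Abort command semantics, where CDW0 bit 0 of the Abort completion is set when the targeted command was not aborted. The sketch below shows how a completion could be classified; it is based on the spec rather than on the actual abort.c code, and the counter names are made up.

#include <stdint.h>
#include "spdk/nvme.h"

/* Illustrative counters; abort.c keeps its equivalents in a per-controller context. */
static uint64_t g_aborts_success, g_aborts_unsuccess, g_aborts_failed;

/* Classify an Abort completion per NVMe semantics (a sketch, not abort.c itself). */
static void
abort_cpl_cb(void *ctx, const struct spdk_nvme_cpl *cpl)
{
        if (spdk_nvme_cpl_is_error(cpl)) {
                g_aborts_failed++;      /* the Abort command itself failed */
        } else if (cpl->cdw0 & 0x1u) {
                g_aborts_unsuccess++;   /* Abort completed, but the target did not abort the I/O */
        } else {
                g_aborts_success++;     /* the targeted I/O was aborted */
        }
}

Under that reading, unsuccess == submitted would simply mean that every abort arrived after its target I/O had already completed, which seems plausible for a fast local kernel target.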

I added these tests, https://review.spdk.io/gerrit/c/spdk/spdk/+/8962, on top of https://review.spdk.io/gerrit/c/spdk/spdk/+/8812 to see how they behave under CI (considering the above, I expect them to fail). I am not sure if they properly cover the actual use case here, so any comments or suggestions on Gerrit would be appreciated. :)

@mikeBashStuff (Contributor) commented:

Here's one build run: https://ci.spdk.io/results/autotest-per-patch/builds/54449/archive/nvmf-tcp-phy-autotest/build.log

Surprisingly, it was successful; however, for the SPDK target with qd=128 the following can be seen (as described above):

00:10:39.171   14:54:20	-- nvmf/target/abort_hw.sh@32 -- # /var/jenkins/workspace/nvmf-tcp-phy-autotest/spdk/build/examples/abort -q 128 -w rw -M 50 -o 4096 -r 'trtype:tcp adrfam:IPv4 traddr:10.0.0.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:spdk_target'
00:10:39.183  [2021-07-28 14:54:20.867532] Starting SPDK v21.10-pre git sha1 05291afb5 / DPDK 21.05.0 initialization...
00:10:39.183  [2021-07-28 14:54:20.867643] [ DPDK EAL parameters: [2021-07-28 14:54:20.867659] abort [2021-07-28 14:54:20.867670] --no-shconf [2021-07-28 14:54:20.867681] -c 0x1 [2021-07-28 14:54:20.867691] --no-pci [2021-07-28 14:54:20.867701] --log-level=lib.eal:6 [2021-07-28 14:54:20.867711] --log-level=lib.cryptodev:5 [2021-07-28 14:54:20.867721] --log-level=user1:6 [2021-07-28 14:54:20.867730] --base-virtaddr=0x200000000000 [2021-07-28 14:54:20.867740] --match-allocations [2021-07-28 14:54:20.867750] --file-prefix=spdk_pid22742 [2021-07-28 14:54:20.867760] ]
00:10:39.197  EAL: No free 2048 kB hugepages reported on node 1
00:10:39.197  EAL: No available 1048576 kB hugepages reported
00:10:39.213  TELEMETRY: No legacy callbacks, legacy socket not created
00:10:39.236  [2021-07-28 14:54:20.921323] ctrlr.c: 702:_nvmf_ctrlr_connect: *ERROR*: Invalid SQSIZE for admin queue 135 (min 1, max 127)
00:10:39.237  [2021-07-28 14:54:20.921482] nvme_fabric.c: 452:nvme_fabric_qpair_connect: *ERROR*: Connect command failed, rc -5, trtype:TCP adrfam:IPv4 traddr:10.0.0.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:spdk_target
00:10:39.237  [2021-07-28 14:54:20.921525] nvme_fabric.c: 460:nvme_fabric_qpair_connect: *ERROR*: Connect command completed with error: sct 1, sc 130
00:10:39.237  [2021-07-28 14:54:20.921542] nvme_tcp.c:1940:nvme_tcp_ctrlr_connect_qpair: *ERROR*: Failed to send an NVMe-oF Fabric CONNECT command
00:10:39.237  [2021-07-28 14:54:20.921582] nvme.c: 705:nvme_ctrlr_poll_internal: *ERROR*: Failed to initialize SSD: 10.0.0.2
00:10:39.238  [2021-07-28 14:54:20.921603] nvme_ctrlr.c: 937:nvme_ctrlr_fail: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] in failed state.
00:10:39.238  [2021-07-28 14:54:20.921627] nvme_ctrlr.c: 965:nvme_ctrlr_shutdown_async: *ERROR*: [nqn.2016-06.io.spdk:spdk_target] get_cc() failed
00:10:39.238  No valid NVMe controllers found
00:10:39.239  Initializing NVMe Controllers

@jimharris (Member) commented:

Thanks @mikeBashStuff!

> 00:10:39.236  [2021-07-28 14:54:20.921323] ctrlr.c: 702:_nvmf_ctrlr_connect: *ERROR*: Invalid SQSIZE for admin queue 135 (min 1, max 127)

This error would go away if we specified -a 256 (a larger max admin queue depth) to the nvmf_create_transport RPC. But I think for now let's just not worry about QD=128; QD=64 is sufficient.
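
Working backwards from that error message as a sanity check (the queue-depth-plus-headroom split is inferred, not quoted from the patch): the fabrics CONNECT command carries SQSIZE as a 0's-based value, so SQSIZE 135 means the patched host asked for a 136-entry admin queue (128 for the queue depth plus 8 of headroom), while the target in these runs caps the admin queue at 128 entries, i.e. SQSIZE at most 127. Raising that cap with -a lets the CONNECT through.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t qd = 128;                      /* abort -q value */
        uint32_t requested = qd + 8;            /* host admin queue size implied by the log */
        uint32_t sqsize = requested - 1;        /* CONNECT SQSIZE is 0's based -> 135 */
        uint32_t target_max_entries = 128;      /* admin queue cap seen in the error; -a 256 would raise it */

        printf("SQSIZE %u vs target max %u -> %s\n", sqsize, target_max_entries - 1,
               sqsize > target_max_entries - 1 ? "Invalid SQSIZE" : "accepted");
        return 0;
}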

I just realized you're using -c 0xF, while the original submitter passed no -c option (meaning just a single core). Do you have any luck reproducing the submitter's issue if you omit the -c option? I think the -c option is causing the get_cc errors, since my patch calculates the admin queue depth based on a single worker only.

Also @gp2017, could you try the patch provided above?

-Jim

@mikeBashStuff (Contributor) commented:

@jimharris Initially I did use -c 0xf, but then I dropped it and also excluded it from the final version of the patch I pushed; the results posted in my second comment include abort running without the -c argument (the main issue didn't pop up without it either, I'm afraid). That said, I just noticed I've been using a much smaller IO size in these tests (4k instead of the 40960 seen in the description). I'm not sure if this is a major factor, but I will run additional tests tomorrow and keep them as close to the initial report as possible.

@jimharris (Member) commented:

> @jimharris Initially I did use -c 0xf, but then I dropped it and also excluded it from the final version of the patch I pushed; the results posted in my second comment include abort running without the -c argument (the main issue didn't pop up without it either, I'm afraid). That said, I just noticed I've been using a much smaller IO size in these tests (4k instead of the 40960 seen in the description). I'm not sure if this is a major factor, but I will run additional tests tomorrow and keep them as close to the initial report as possible.

Sounds good @mikeBashStuff. Thanks!

@mikeBashStuff (Contributor) commented Jul 30, 2021

@jimharris I ran the very same tests but with the IO size bumped up to 40960. I still can't reproduce the main issue; however, with or without the patch, qd=64 now hits a SEGV. The core file is attached (abort.zip) and the backtrace is below:

gdb: warning: Couldn't determine a path for the index cache directory.
[New LWP 6301]
[New LWP 6302]
[New LWP 6303]

warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.

warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/lib64/libthread_db-1.0.so
line to your configuration file "$HOME/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "$HOME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.

warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/home/vagrant/spdk_repo/spdk/build/examples/abort -q 64 -w rw -M 50 -o 40960 -r'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  nvme_request_remove_child (child=0x2000003e7600, parent=0x0) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1246
1246            parent->num_children--;
[Current thread is 1 (LWP 6301)]

Thread 3 (LWP 6303):
#0  0x00007f30a2a60a7f in accept () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000000004b81a3 in socket_listener ()
No symbol table info available.
#2  0x00007f30a2a57299 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007f30a297f353 in ?? () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (LWP 6302):
#0  0x00007f30a297f69e in ?? () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000477374 in eal_intr_thread_main ()
No symbol table info available.
#2  0x00007f30a2a57299 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007f30a297f353 in ?? () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (LWP 6301):
#0  nvme_request_remove_child (child=0x2000003e7600, parent=0x0) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1246
No locals.
#1  nvme_complete_abort_request (ctx=0x2000003e7600, cpl=0x2000003e76a4) at nvme_ctrlr_cmd.c:671
        req = 0x2000003e7600
        parent = 0x0
        ctrlr = <optimized out>
#2  0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e7600, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4094080, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#3  nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e7600
        tmp = 0x0
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#4  0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#5  nvme_complete_abort_request (ctx=0x2000003e7880, cpl=0x2000003e7924) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e7880
        parent = 0x0
        ctrlr = <optimized out>
#6  0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e7880, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4094720, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#7  nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e7880
        tmp = 0x2000003e7600
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#8  0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#9  nvme_complete_abort_request (ctx=0x2000003e7b00, cpl=0x2000003e7ba4) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e7b00
        parent = 0x0
        ctrlr = <optimized out>
#10 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e7b00, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4095360, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#11 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e7b00
        tmp = 0x2000003e7880
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#12 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#13 nvme_complete_abort_request (ctx=0x2000003e7d80, cpl=0x2000003e7e24) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e7d80
        parent = 0x0
        ctrlr = <optimized out>
#14 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e7d80, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4096640, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#15 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e7d80
        tmp = 0x2000003e7b00
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#16 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#17 nvme_complete_abort_request (ctx=0x2000003e8280, cpl=0x2000003e8324) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8280
        parent = 0x0
        ctrlr = <optimized out>
#18 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8280, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4096000, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#19 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8280
        tmp = 0x2000003e7d80
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#20 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#21 nvme_complete_abort_request (ctx=0x2000003e8000, cpl=0x2000003e80a4) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8000
        parent = 0x0
        ctrlr = <optimized out>
#22 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8000, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4097280, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#23 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8000
        tmp = 0x2000003e8280
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#24 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#25 nvme_complete_abort_request (ctx=0x2000003e8500, cpl=0x2000003e85a4) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8500
        parent = 0x0
        ctrlr = <optimized out>
#26 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8500, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4097920, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#27 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8500
        tmp = 0x2000003e8000
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#28 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#29 nvme_complete_abort_request (ctx=0x2000003e8780, cpl=0x2000003e8824) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8780
        parent = 0x0
        ctrlr = <optimized out>
#30 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8780, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4099200, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#31 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8780
        tmp = 0x2000003e8500
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#32 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#33 nvme_complete_abort_request (ctx=0x2000003e8c80, cpl=0x2000003e8d24) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8c80
        parent = 0x0
        ctrlr = <optimized out>
#34 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8c80, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4098560, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#35 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8c80
        tmp = 0x2000003e8780
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#36 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#37 nvme_complete_abort_request (ctx=0x2000003e8a00, cpl=0x2000003e8aa4) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8a00
        parent = 0x0
        ctrlr = <optimized out>
#38 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8a00, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 4099840, cdw1 = 8192, sqhd = 424, sqid = 89, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#39 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8a00
        tmp = 0x2000003e8c80
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#40 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#41 nvme_complete_abort_request (ctx=0x2000003e8f00, cpl=0x2000003e8fa4) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e8f00
        parent = 0x0
        ctrlr = <optimized out>
#42 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e8f00, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 2726688727, cdw1 = 32560, sqhd = 0, sqid = 0, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#43 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e8f00
        tmp = 0x2000003e8a00
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#44 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#45 nvme_complete_abort_request (ctx=0x2000003e9180, cpl=0x2000003e9224) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e9180
        parent = 0x0
        ctrlr = <optimized out>
#46 0x000000000041f7dc in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e9180, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 1, cdw1 = 0, sqhd = 16384, sqid = 41636, cid = 32560, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#47 nvme_ctrlr_retry_queued_abort (ctrlr=0x1dffcd0) at nvme_ctrlr_cmd.c:589
        next = 0x2000003e9180
        tmp = 0x2000003e8f00
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#48 0x000000000041fbb4 in nvme_ctrlr_retry_queued_abort (ctrlr=<optimized out>) at nvme_ctrlr_cmd.c:575
        next = <optimized out>
        tmp = <optimized out>
        rc = <optimized out>
        __func__ = "nvme_ctrlr_retry_queued_abort"
#49 nvme_complete_abort_request (ctx=0x2000003e9cc0, cpl=0x7ffca053c170) at nvme_ctrlr_cmd.c:669
        req = 0x2000003e9cc0
        parent = 0x2000003e9680
        ctrlr = <optimized out>
#50 0x0000000000437545 in nvme_complete_request (qpair=<optimized out>, qpair=<optimized out>, cpl=<optimized out>, req=0x2000003e9cc0, cb_arg=<optimized out>, cb_fn=<optimized out>) at /home/vagrant/spdk_repo/spdk/lib/nvme/nvme_internal.h:1204
        err_cpl = {cdw0 = 2097152, cdw1 = 0, sqhd = 5680, sqid = 480, cid = 0, {status_raw = 0, status = {p = 0, sc = 0, sct = 0, crd = 0, m = 0, dnr = 0}}}
        cmd = <optimized out>
        err_cpl = {cdw0 = <optimized out>, cdw1 = <optimized out>, sqhd = <optimized out>, sqid = <optimized out>, cid = <optimized out>, {status_raw = <optimized out>, status = {p = <optimized out>, sc = <optimized out>, sct = <optimized out>, crd = <optimized out>, m = <optimized out>, dnr = <optimized out>}}}
        cmd = <optimized out>
#51 nvme_tcp_req_complete (tcp_req=tcp_req@entry=0x1e01e60, rsp=rsp@entry=0x7ffca053c170) at nvme_tcp.c:764
        req = 0x2000003e9cc0
#52 0x0000000000437740 in nvme_tcp_qpair_abort_reqs (qpair=0x1e01630, dnr=<optimized out>) at nvme_tcp.c:780
        tcp_req = 0x1e01e60
        tmp = 0x1e01780
        cpl = {cdw0 = 5, cdw1 = 0, sqhd = 5680, sqid = 480, cid = 0, {status_raw = 16, status = {p = 0, sc = 8, sct = 0, crd = 0, m = 0, dnr = 0}}}
        tqpair = 0x1e01630
#53 0x0000000000435d86 in nvme_transport_ctrlr_disconnect_qpair (ctrlr=0x1dffcd0, qpair=qpair@entry=0x1e01630) at nvme_transport.c:410
        transport = 0x60e480 <g_spdk_transports+352>
#54 0x0000000000439b82 in nvme_tcp_qpair_process_completions (qpair=0x1e01630, max_completions=<optimized out>) at nvme_tcp.c:1757
        tqpair = 0x1e01630
        reaped = 0
        rc = <optimized out>
#55 0x00000000004328e1 in spdk_nvme_qpair_process_completions (qpair=0x1e01630, max_completions=max_completions@entry=0) at nvme_qpair.c:718
        ret = <optimized out>
        req = <optimized out>
        tmp = <optimized out>
        __func__ = "spdk_nvme_qpair_process_completions"
#56 0x0000000000424312 in spdk_nvme_ctrlr_process_admin_completions (ctrlr=0x1dffcd0) at nvme_ctrlr.c:3794
        num_completions = 0
        rc = 0
#57 0x000000000041cc41 in work_fn (arg=arg@entry=0x1dd4950) at abort.c:521
        worker = 0x1dd4950
        ns_ctx = <optimized out>
        ctrlr_ctx = 0x1e0ce90
        ns_entry = <optimized out>
        opts = {qprio = SPDK_NVME_QPRIO_URGENT, io_queue_size = 128, io_queue_requests = 512, {delay_cmd_submit = false, delay_pcie_doorbell = false}, sq = {vaddr = 0x0, paddr = 0, buffer_size = 0}, cq = {vaddr = 0x0, paddr = 0, buffer_size = 0}, create_only = false}
        tsc_end = 3154912778977
        unfinished_ctx = 0
#58 0x0000000000419b93 in main (argc=<optimized out>, argv=0x7ffca053c4f8) at abort.c:1098
        rc = <optimized out>
        worker = 0x0
        main_worker = 0x1dd4950
        opts = {name = 0x58f1dc "abort", core_mask = 0x59b0f5 "0x1", shm_id = -1, mem_channel = -1, main_core = -1, mem_size = -1, no_pci = true, hugepage_single_segments = false, unlink_hugepage = false, num_pci_addr = 0, hugedir = 0x0, pci_blocked = 0x0, pci_allowed = 0x0, iova_mode = 0x0, base_virtaddr = 35184372088832, env_context = 0x0}

From time to time (but quite rarely) I also see the following for any abort run (I noticed it when abort was built without the patch; not sure if it's related in any way, though):

 08:52:25       -- nvmf/target/abort_hw.sh@32 -- # /home/vagrant/spdk_repo/spdk/build/examples/abort -q 16 -w rw -M 50 -o 40960 -r 'trtype:tcp adrfam:IPv4 traddr:10.0.0.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:spdk_target'
[2021-07-30 08:52:25.565233] Starting SPDK v21.10-pre git sha1 05291afb5 / DPDK 21.05.0 initialization...
[2021-07-30 08:52:25.565336] [ DPDK EAL parameters: [2021-07-30 08:52:25.565366] abort [2021-07-30 08:52:25.565387] --no-shconf [2021-07-30 08:52:25.565401] -c 0x1 [2021-07-30 08:52:25.565414] --no-pci [2021-07-30 08:52:25.565427] --log-level=lib.eal:6 [2021-07-30 08:52:25.565440] --log-level=lib.cryptodev:5 [2021-07-30 08:52:25.565454] --log-level=user1:6 [2021-07-30 08:52:25.565467] --iova-mode=pa [2021-07-30 08:52:25.565480] --base-virtaddr=0x200000000000 [2021-07-30 08:52:25.565494] --match-allocations [2021-07-30 08:52:25.565507] --file-prefix=spdk_pid12086 [2021-07-30 08:52:25.565530] ]
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
Initializing NVMe Controllers
Attached to NVMe over Fabrics controller at 10.0.0.2:4420: nqn.2016-06.io.spdk:spdk_target
Associating TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 with lcore 0
Initialization complete. Launching workers.
[2021-07-30 08:52:26.565978] nvme_tcp.c:1000:nvme_tcp_c2h_term_req_dump: *ERROR*: Error info of pdu(0x200000372440): Invalid PDU Header Field
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
I/O submission failed
[2021-07-30 08:52:26.566195] nvme_qpair.c: 720:spdk_nvme_qpair_process_completions: *ERROR*: CQ transport error -6 (No such device or address) on qpair id 1
[2021-07-30 08:52:26.566258] tcp.c:1235:nvmf_tcp_qpair_set_recv_state: *ERROR*: The recv state of tqpair=0x281a0d0 is same with the state(5) to be set
NS: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) NSID 1 I/O completed: 5267, failed: 16
CTRLR: TCP (addr:10.0.0.2 subnqn:nqn.2016-06.io.spdk:spdk_target) abort submitted 106, failed to submit 5177
         success 37, unsuccess 69, failed 0

This doesn't force abort to exit with a non-zero status, though, so the tests continue (and the next qds work fine until qd=64 is run).

@jimharris (Member) commented:

Thanks @mikeBashStuff. I'll take a look at this some more on my system. I was hoping this would be a relatively straightforward fix, but you've clearly found some other issues in the abort path with this testing.

Let's still wait to see if my patch fixes the original issue for @gp2017. If so, we can merge it and then open a new issue for some of these other issues you've found.

@jimharris (Member) commented:

@mikeBashStuff - can you describe in more detail how you hit these most recent issues?

I've been using the abort.sh test script with a few modifications.

  1. If the delay bdev is in place, I can't hit any issues at all.
  2. If I just run the abort test to a malloc bdev, and modify the abort parameters in the script, I can hit the "I/O submission failed" issue.
  3. I'm not able to hit the core dump issue at all.

@mikeBashStuff (Contributor) commented:

@jimharris I've been using this instead: https://review.spdk.io/gerrit/c/spdk/spdk/+/8962. My idea was to use an actual NVMe controller for the bdev, since I reckoned such a setup would be a bit closer to what was reported. So with this patch you can simply run test/nvmf/target/abort_hw.sh (and, just to note, I see the very same behavior regardless of whether it's run inside a VM or on a bare-metal system), provided you have at least one NVMe device around.

@jimharris (Member) commented:

That is a really good test @mikeBashStuff.

I think what I'd like to do is move this new testing approach, and the other failures/dumps you've found, to a new issue. The stuff you are finding seems unrelated to what @gp2017 reported, but is still something we need to continue working on.

@mikeBashStuff (Contributor) commented:

@jimharris Moved what I found to #2063.

@tomzawadzki tomzawadzki added this to To do on next bug scrub in Bug scrub Aug 10, 2021
@tomzawadzki tomzawadzki moved this from To do on next bug scrub to 10.08 in Bug scrub Aug 10, 2021
spdk-bot pushed a commit that referenced this issue Aug 11, 2021
This abort test app will send a lot of abort commands on
the admin queue.  The default admin queue size is
relatively small (32) so increase it if necessary to
account for the expected number of outstanding abort
commands as well as any extra admin commands that may
be sent during test execution such as Keep Alive.

Fixes issue #2048.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5f64b7fc72a028299b860f09e30d430a64c95d2a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8812
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
@jimharris (Member) commented:

I believe patch a595959 fixes the specific issue described here. @gp2017, please confirm, and reopen if you see otherwise.

@jimharris jimharris self-assigned this Aug 21, 2021
@jimharris jimharris added bug and removed Sighting labels Aug 21, 2021