perf: INTERNAL DEVICE ERROR (00/06) nvme_set_num_queues failed! #18

Closed
hlitz opened this issue Mar 23, 2016 · 11 comments

@hlitz

hlitz commented Mar 23, 2016

$ sudo ./perf -s 4096 -q 64 -w randrw -M 0 -c 1 -t 60
...
Created task_pool
Initializing NVMe Controllers
SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:00000007 cdw11:ffffffff
INTERNAL DEVICE ERROR (00/06) sqid:0 cid:15 cdw0:0 sqhd:0002 p:1 m:0 dnr:0
nvme_set_num_queues failed!
nvme_attach failed for controller at pci bdf 4:0:0

$ sudo ./identify
...
EAL: TSC frequency is ~2299998 KHz
EAL: Master lcore 0 is ready (tid=feab4940;cpuset=[0])
SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:00000007 cdw11:ffffffff
INTERNAL DEVICE ERROR (00/06) sqid:0 cid:15 cdw0:0 sqhd:0002 p:1 m:0 dnr:0
nvme_set_num_queues failed!
failed to attach to NVMe controller at PCI BDF 4:0:0

$ sudo ./nvme-cli/nvme error-log /dev/nvme0
NVMe Status:INTERNAL(6)

@ghost

ghost commented Mar 23, 2016

SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:00000007 cdw11:ffffffff

This is a Set Features command with feature = 7, Number of Queues, which is normal during startup.

However, the value of CDW11 here means we are requesting 65536 submission queues and 65536 completion queues, which is invalid; the maximum allowed number of queues is 65535, which translates to 65534 since this is a 0-based value, and that would be encoded as 0xfffefffe. Even more confusingly, the SPDK driver requests only 1024 queues by default, which means CDW11 should be 0x03ff03ff, unless the app overrides the default num_io_queues.
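
To make the encoding concrete, here is a minimal Python sketch of how the Number of Queues CDW11 value is packed and unpacked, assuming the layout described in this thread (0-based submission queue count in the low word, 0-based completion queue count in the high word). The function names are made up for illustration and are not SPDK APIs.

```python
def encode_num_queues_cdw11(num_sq: int, num_cq: int) -> int:
    """Pack requested I/O queue counts into Set Features CDW11 (feature 7)."""
    for n in (num_sq, num_cq):
        # Counts are sent as 0-based 16-bit values and 0xFFFF is not allowed,
        # so the largest valid request is 65535 queues (encoded as 0xFFFE).
        if not 1 <= n <= 65535:
            raise ValueError(f"queue count {n} out of range 1..65535")
    # Low word: submission queues; high word: completion queues.
    return ((num_cq - 1) << 16) | (num_sq - 1)


def decode_num_queues_cdw11(cdw11: int) -> tuple:
    """Return (submission, completion) queue counts implied by a raw CDW11."""
    return (cdw11 & 0xFFFF) + 1, ((cdw11 >> 16) & 0xFFFF) + 1


print(hex(encode_num_queues_cdw11(1024, 1024)))    # 0x3ff03ff
print(hex(encode_num_queues_cdw11(65535, 65535)))  # 0xfffefffe
print(decode_num_queues_cdw11(0xFFFFFFFF))         # (65536, 65536)
```

Decoding the logged `cdw11:ffffffff` this way gives 65536 queues of each type, one more than the maximum the encoder accepts.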

Is this an unmodified, up-to-date SPDK library + perf example?

What NVMe device(s) are attached?

There is a possible bug here which I've just noticed: if num_io_queues is overridden to 0, the nvme_ctrlr_set_num_queues() function will subtract 1 and get the noted 0xffffffff value; however, this shouldn't be happening unless something has been modified (perf does not override the default num_io_queues).
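
The suspected underflow is easy to reproduce outside the driver. The sketch below assumes the subtraction is done in 32-bit unsigned arithmetic (the actual SPDK source is not quoted in this thread), and `cdw11_unchecked` and `cdw11_checked` are hypothetical names, not SPDK functions.

```python
MASK32 = 0xFFFFFFFF  # emulate C uint32_t wraparound (an assumption)


def cdw11_unchecked(num_io_queues: int) -> int:
    """Build CDW11 the naive way: subtract 1 with no range check."""
    zero_based = (num_io_queues - 1) & MASK32  # 0 - 1 wraps to 0xFFFFFFFF
    return ((zero_based << 16) | zero_based) & MASK32


def cdw11_checked(num_io_queues: int) -> int:
    """Same encoding, but reject values that cannot be made 0-based."""
    if not 1 <= num_io_queues <= 65535:
        raise ValueError("num_io_queues must be in 1..65535")
    zero_based = num_io_queues - 1
    return (zero_based << 16) | zero_based


print(hex(cdw11_unchecked(0)))     # 0xffffffff
print(hex(cdw11_unchecked(1024)))  # 0x3ff03ff
```

With the check in place, a request for 0 queues fails early instead of reaching the device as 0xffffffff.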

@hlitz
Author

hlitz commented Mar 24, 2016

I just did a git pull on the Intel SPDK repo and ran an unmodified perf test. This is my output:

EAL: Master lcore 0 is ready (tid=68e22940;cpuset=[0])
Initializing NVMe Controllers
Attaching to 0000:04:00.00
SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:00000007 cdw11:03ff03ff
INTERNAL DEVICE ERROR (00/06) sqid:0 cid:15 cdw0:0 sqhd:0002 p:1 m:0 dnr:0
nvme_set_num_queues failed!

cdw11 is no longer ffffffff, but it’s still much larger than expected and the same error occurs.

From print statements with this Intel perf test, I can tell that ctrlr->ops.num_io_queues is 1024 (which sounds like what it should be according to the thread on github). It’s the spdk_nvme_qpair_process_completions() function, which checks the admin queue, that fails. Perhaps something is wrong with the admin queue?

@ghost

ghost commented Mar 24, 2016

The new CDW11 value looks correct - it is actually two copies of 1023 in the high and low words (number of I/O completion queues and number of I/O submission queues - these should always be the same since SPDK allocates submission and completion queues in pairs).

However, I am not sure what could be causing the command to fail; if the device supports fewer than the requested number of queues, it is supposed to choose its maximum value and report it back via CDW0.
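
For reference, here is a sketch of how a driver might interpret the CDW0 it gets back when the command succeeds, assuming the completion uses the same 0-based layout as the request (submission queues in the low word, completion queues in the high word). The helper names and the sample value `0x001F001F` are illustrative only; the value does not come from the logs in this issue, where the command failed and CDW0 carries no queue counts.

```python
def allocated_queues_from_cdw0(cdw0: int) -> tuple:
    """Return (submission, completion) queue counts granted by the device."""
    nsqa = cdw0 & 0xFFFF          # 0-based submission queues allocated
    ncqa = (cdw0 >> 16) & 0xFFFF  # 0-based completion queues allocated
    return nsqa + 1, ncqa + 1


def usable_io_queue_pairs(requested: int, cdw0: int) -> int:
    """Queues are used in pairs, so the usable count is the smallest value."""
    sq, cq = allocated_queues_from_cdw0(cdw0)
    return min(requested, sq, cq)


# Hypothetical device that grants 32 queues of each type.
print(allocated_queues_from_cdw0(0x001F001F))   # (32, 32)
print(usable_io_queue_pairs(1024, 0x001F001F))  # 32
```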

The failure is noticed in the admin queue checking function, since that is where we poll for completions on admin commands, but if we are actually receiving completions here, I doubt anything is wrong with the admin queue.
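
The completion line in the log can be picked apart to see what was actually received. This is a small parsing sketch, assuming the two numbers in parentheses are the status code type and status code printed as hex; the remaining `key:value` fields are kept as raw strings because their radix is not stated in this thread. `parse_completion_line` is a name invented for this example.

```python
import re

# Matches e.g. "INTERNAL DEVICE ERROR (00/06) sqid:0 cid:15 ..."
CPL_RE = re.compile(
    r"^(?P<text>.+?) \((?P<sct>[0-9a-fA-F]{2})/(?P<sc>[0-9a-fA-F]{2})\) (?P<fields>.*)$"
)


def parse_completion_line(line: str) -> dict:
    """Split a printed completion into status text, SCT/SC, and raw fields."""
    m = CPL_RE.match(line.strip())
    if m is None:
        raise ValueError("not a completion line: " + line)
    result = {
        "text": m.group("text"),
        "sct": int(m.group("sct"), 16),  # status code type (assumed hex)
        "sc": int(m.group("sc"), 16),    # status code (assumed hex)
    }
    for token in m.group("fields").split():
        key, _, value = token.partition(":")
        result[key] = value  # left as strings on purpose
    return result


cpl = parse_completion_line(
    "INTERNAL DEVICE ERROR (00/06) sqid:0 cid:15 cdw0:0 sqhd:0002 p:1 m:0 dnr:0"
)
print(cpl["sct"], cpl["sc"], cpl["sqid"], cpl["dnr"])  # 0 6 0 0
```

Here `sqid:0` confirms the completion arrived on the admin queue, and the 00/06 status is what the device itself returned.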

Can you verify that this behavior still occurs after e.g. power cycling the host machine and/or NVMe device (especially if the kernel NVMe driver is also not working)?

@hlitz
Author

hlitz commented Apr 29, 2016

So I looked into this issue again. Rebooted the machine but the error stays the same. Is there any way to hard reset the device? Or is there a hardware defect?

@benlwalker
Member

We believe this was a hardware error. Please let us know if there is still a problem, but closing the issue for now.

@hradl

hradl commented May 18, 2017

Hi - I have a similar problem - was the problem resolved?

@benlwalker
Member

We believe the SSD in question failed, but we weren't able to confirm. You're welcome to open a new issue with a log of the problem you are seeing and we'll take a look.

@hradl

hradl commented May 18, 2017

I have 4 cards in one server - 3 of them went offline at the same time and return the same error.
I have not been able to regain control of these cards, but I think it's unlikely that 3 cards went bad; no hardware reboot or power off/on has resolved the problem. Can you help me get the necessary logs/debug info?

@hradl

hradl commented May 18, 2017

Firmware version is 8EV101F0.
From the kernel log:
nvme 020d:01:00.0: rtas_msi: allocated virq 37
nvme 020d:01:00.0: Could not set queue count (6)
nvme nvme1: IO queues not created

@benlwalker
Member

Reboot and then dump the output from dmesg (and filter it down to the messages with 'nvme' in them maybe). But if the Linux kernel driver isn't loading (which is what the above log indicates) then it definitely isn't an SPDK problem.

Can you switch which PCIe slots the cards are in, maybe?
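
If it helps, the filtering can be scripted. The sketch below keeps only lines mentioning `nvme` and pulls out the number in parentheses from "Could not set queue count" messages; the sample text is the kernel log excerpt posted above, and the function names are made up for this example.

```python
import re

QUEUE_COUNT_RE = re.compile(r"Could not set queue count \((\d+)\)")


def nvme_lines(log_text: str) -> list:
    """Keep only the log lines that mention nvme (case-insensitive)."""
    return [line for line in log_text.splitlines() if "nvme" in line.lower()]


def queue_count_failures(log_text: str) -> list:
    """Return the numeric code from each 'Could not set queue count' line."""
    codes = []
    for line in nvme_lines(log_text):
        m = QUEUE_COUNT_RE.search(line)
        if m:
            codes.append(int(m.group(1)))
    return codes


sample = """nvme 020d:01:00.0: rtas_msi: allocated virq 37
nvme 020d:01:00.0: Could not set queue count (6)
nvme nvme1: IO queues not created
usb 1-1: new high-speed USB device"""

print(len(nvme_lines(sample)))       # 3
print(queue_count_failures(sample))  # [6]
```

The 6 reported by the kernel here matches the 06 status seen in the SPDK output earlier in this thread.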

@hradl

hradl commented May 19, 2017

I found other articles describing a similar problem on Intel support. I don't think it's an SPDK problem.
Thanks anyway.
https://communities.intel.com/thread/109279

mgerdts pushed a commit to mgerdts/spdk that referenced this issue Dec 13, 2021
Signed-off-by: Andrii Holovchenko <andriih@mellanox.com>

Co-authored-by: Andrii Holovchenko <andriih@mellanox.com>
tmakatos pushed a commit to tmakatos/spdk that referenced this issue Feb 8, 2022
The controller data structure may be freed before subsystem resume done
callback, we can take endpoint as the input parameter to avoid this issue.

AddressSanitizer: heap-use-after-free on address 0x625000046100 at pc 0x00000082818f bp 0x7fff7b09bd10 sp 0x7fff7b09bd00
READ of size 8 at 0x625000046100 thread T0 (reactor_0)
    #0 0x82818e in vfio_user_dev_quiesce_resume_done /spdk/lib/nvmf/vfio_user.c:2147
    #1 0x782cc0 in subsystem_state_change_done /spdk/lib/nvmf/subsystem.c:634
    #2 0xad047b in _call_completion /spdk/lib/thread/thread.c:2344
    #3 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710
    #4 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926
    #5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    #6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    #7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    #8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    #9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    #10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    #11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
    #12 0x407abd in _start (/spdk/build/bin/nvmf_tgt+0x407abd)

0x625000046100 is located 0 bytes inside of 8320-byte region [0x625000046100,0x625000048180)
freed by thread T0 (reactor_0) here:
    #0 0x7f82219ff91f in __interceptor_free (/lib64/libasan.so.5+0x10d91f)
    #1 0x837059 in _free_ctrlr /spdk/lib/nvmf/vfio_user.c:2976
    #2 0x837327 in free_ctrlr /spdk/lib/nvmf/vfio_user.c:2996
    #3 0x843541 in nvmf_vfio_user_close_qpair /spdk/lib/nvmf/vfio_user.c:3742
    #4 0x7d1d91 in nvmf_transport_qpair_fini /spdk/lib/nvmf/transport.c:604
    #5 0x7ad922 in _nvmf_qpair_destroy /spdk/lib/nvmf/nvmf.c:1055
    #6 0x761362 in nvmf_qpair_request_cleanup /spdk/lib/nvmf/ctrlr.c:4026
    #7 0x761906 in spdk_nvmf_request_free /spdk/lib/nvmf/ctrlr.c:4041
    #8 0x75a931 in nvmf_qpair_free_aer /spdk/lib/nvmf/ctrlr.c:3576
    #9 0x7ae626 in spdk_nvmf_qpair_disconnect /spdk/lib/nvmf/nvmf.c:1127
    #10 0x83db36 in _vfio_user_qpair_disconnect /spdk/lib/nvmf/vfio_user.c:3433
    #11 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710
    #12 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926
    #13 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    #14 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    #15 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    #16 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    #17 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    #18 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    #19 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

previously allocated by thread T0 (reactor_0) here:
    #0 0x7f82219fff16 in __interceptor_calloc (/lib64/libasan.so.5+0x10df16)
    #1 0x837413 in nvmf_vfio_user_create_ctrlr /spdk/lib/nvmf/vfio_user.c:3010
    #2 0x83bc68 in nvmf_vfio_user_accept /spdk/lib/nvmf/vfio_user.c:3313
    #3 0xabfbd8 in thread_execute_timed_poller /spdk/lib/thread/thread.c:872
    #4 0xac0c75 in thread_poll /spdk/lib/thread/thread.c:960
    #5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    #6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    #7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    #8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    #9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    #10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    #11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

SUMMARY: AddressSanitizer: heap-use-after-free /spdk/lib/nvmf/vfio_user.c:2147 in vfio_user_dev_quiesce_resume_done

Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
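
The idea in this commit message (hand the callback the endpoint rather than the controller) can be illustrated with a toy lifetime model. This is only a sketch under stated assumptions: Python has no use-after-free, so a freed controller is modeled as `None`, and `Controller`, `Endpoint`, and `resume_done` are hypothetical stand-ins rather than the vfio_user.c code.

```python
from collections import deque


class Controller:
    def __init__(self, name):
        self.name = name


class Endpoint:
    """Long-lived object; its controller may be created and freed over time."""

    def __init__(self):
        self.ctrlr = None


def resume_done(endpoint):
    # Look the controller up when the callback runs instead of holding a
    # reference captured when the callback was queued.
    ctrlr = endpoint.ctrlr
    if ctrlr is None:
        return "controller already freed, nothing to resume"
    return "resumed " + ctrlr.name


endpoint = Endpoint()
endpoint.ctrlr = Controller("ctrlr0")

pending = deque()
pending.append(lambda: resume_done(endpoint))  # queued "resume done" callback
endpoint.ctrlr = None                          # disconnect frees the controller first
results = [callback() for callback in pending]
print(results)  # ['controller already freed, nothing to resume']
```

Because the callback only depends on the endpoint, it can detect that the controller is gone instead of reading freed memory.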
jlevon pushed a commit to jlevon/spdk that referenced this issue Feb 9, 2022
The controller data structure may be freed before subsystem resume done
callback, we can take endpoint as the input parameter to avoid this issue.

Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
spdk-bot pushed a commit that referenced this issue Feb 10, 2022
The controller data structure may be freed before subsystem resume done
callback, we can take endpoint as the input parameter to avoid this issue.

Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11420
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: John Levon <levon@movementarian.org>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
jlevon pushed a commit to jlevon/spdk that referenced this issue May 12, 2022
The controller data structure may be freed before subsystem resume done
callback, we can take endpoint as the input parameter to avoid this issue.

Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11420
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: John Levon <levon@movementarian.org>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
(cherry picked from commit 6f0ff37)
mgerdts pushed a commit to mgerdts/spdk that referenced this issue Mar 22, 2023
Signed-off-by: Andrii Holovchenko <andriih@mellanox.com>

Co-authored-by: Andrii Holovchenko <andriih@mellanox.com>