Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix address resolution issue with -s option #30

Closed
wants to merge 1 commit into from
Closed

Fix address resolution issue with -s option #30

wants to merge 1 commit into from

Conversation

quantumdude836
Copy link

In qemu_chr_parse_socket, port is copied to addr, but has_port is not set, so port isn't copied in future qapi_copy_SocketAddress calls. This results in an "address resolution failed for ::" message when using the -s option (or using -gdb tcp::<port>, i.e. not specifying a hostname).

`port` is set in `addr`, but `has_port` is not, so `port` isn't copied in
  `qapi_copy_SocketAddress`
@up2wing
Copy link
Contributor

up2wing commented Oct 26, 2015

Well, as I know, here is just a mirror of qemu.git. It seemed that Qemu only accept patches from mailing list:
http://wiki.qemu.org/Contribute/SubmitAPatch

Thx

@quantumdude836
Copy link
Author

Ah ok, thanks

lalrae pushed a commit to lalrae/qemu that referenced this pull request Apr 18, 2016
target-mips: fix hflags modified in delay / forbidden slot

Fixes qemu#29

Signed-off-by: Leon Alrae leon.alrae@imgtec.com
aik referenced this pull request in open-power-host-os/qemu Jun 15, 2016
When a previously removed CPU is hotplugged, ensure that ICP is associated
with it. This is required to correctly remove that CPU on the target side
after migration.

This fixes the following softlock up when CPU removal is attempted on the
target side after migration:

NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [drmgr:1755]
CPU: 5 PID: 1755 Comm: drmgr Not tainted 4.1.0-rc8+ #30
task: c0000000f3fbaa00 ti: c0000000f9ee0000 task.ti: c0000000f9ee0000
NIP: c000000000145d6c LR: c000000000145d34 CTR: c0000000000b9730
REGS: c0000000f9ee30c0 TRAP: 0901   Not tainted  (4.1.0-rc8+)
MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 42008482  XER: 00000000
CFAR: c000000000145d7c SOFTE: 1
00145d34 c0000000f9ee3340 c0000000014d0100 0000000000000000
00000040 0000000000000004 c0000000ffe1fab0 c000000001670100
00000001 0000000000000003 0000000000003800 c0000000008f0760
22000484 c00000000ff21180
NIP [c000000000145d6c] .smp_call_function_single+0x16c/0x1d0
LR [c000000000145d34] .smp_call_function_single+0x134/0x1d0
Call Trace:
[c0000000f9ee3340] [c000000000145d34] .smp_call_function_single+0x134/0x1d0 (unreliable)
[c0000000f9ee33f0] [c0000000001d1940] .perf_event_exit_cpu+0xd0/0x1a0
[c0000000f9ee34e0] [c0000000001d1c88] .perf_cpu_notify+0x58/0x90
[c0000000f9ee3550] [c0000000000cccfc] .notifier_call_chain+0x8c/0x100
[c0000000f9ee35f0] [c0000000001e4848] ._cpu_down+0xc8/0x3f0
[c0000000f9ee36c0] [c0000000001e4bc0] .cpu_down+0x50/0x90
[c0000000f9ee3740] [c000000000566494] .cpu_subsys_offline+0x24/0x40
[c0000000f9ee37c0] [c00000000055e6e8] .device_offline+0xf8/0x140
[c0000000f9ee3850] [c00000000007ead8] .dlpar_cpu_release+0x278/0x2e0
[c0000000f9ee3940] [c00000000001b214] .arch_cpu_release+0x54/0x80
[c0000000f9ee39c0] [c000000000566980] .cpu_release_store+0x40/0x70
[c0000000f9ee3a40] [c00000000055a754] .dev_attr_store+0x64/0xa0
[c0000000f9ee3ad0] [c00000000032096c] .sysfs_kf_write+0x7c/0xb0
[c0000000f9ee3b60] [c00000000031f900] .kernfs_fop_write+0x190/0x1f0
[c0000000f9ee3c00] [c00000000027afd8] .__vfs_write+0x68/0x190
[c0000000f9ee3cf0] [c00000000027bca8] .vfs_write+0xb8/0x200
[c0000000f9ee3d90] [c00000000027cc44] .SyS_write+0x64/0x110

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
qemu-deploy pushed a commit that referenced this pull request Jul 31, 2018
With a Spice port chardev, it is possible to reenter
monitor_qapi_event_queue() (when the client disconnects for
example). This will dead-lock on monitor_lock.

Instead, use some TLS variables to check for recursion and queue the
events.

Fixes:
 (gdb) bt
 #0  0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
 #1  0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
 #2  0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
 #3  0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
 #4  0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
 #5  0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
 #6  0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
 #7  0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
 #8  0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
 #9  0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
 #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
 #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
 #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
 #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
 #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
 #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
 #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
 #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
 #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
 #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
 #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
 #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
 #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
 #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
 #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
 #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
 #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
 #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
 #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644

Note that error report is now moved to the first caller, which may
receive an error for a recursed event. This is probably fine (95% of
callers use &error_abort, the rest have NULL error and ignore it)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180731150144.14022-1-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[*_no_recurse renamed to *_no_reenter, local variables reordered]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
qemu-deploy pushed a commit that referenced this pull request Sep 25, 2018
In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which evidences that it didn't lock it before:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 #2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 #3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 #4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 #5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 #6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 #9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 #2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 #3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 #4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 #5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 #6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 #9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 #30 0x00005631fbd81412 in main_loop () at vl.c:1866
 #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
famz pushed a commit to famz/qemu that referenced this pull request Sep 28, 2018
Both virtio-blk and virtio-scsi use virtio_queue_empty() as the
loop condition in VQ handlers (virtio_blk_handle_vq,
virtio_scsi_handle_cmd_vq). When a device is marked broken in
virtqueue_pop, for example if a vIOMMU address translation failed, we
want to break out of the loop.

This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64
kernel:

  $ qemu-system-x86_64 \
    ... \
    -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \
    -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0

The dead loop happens immediately when the kernel boots and initializes
the device, where virtio_scsi_data_plane_handle_cmd will not return:

    > ...
    > qemu#13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
    > qemu#14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
    > qemu#15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
    > qemu#16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
    > qemu#17 0x00005586607885da in run_poll_handlers_once
    > qemu#18 0x000055866078880e in try_poll_mode
    > qemu#19 0x00005586607888eb in aio_poll
    > qemu#20 0x0000558660784561 in aio_wait_bh_oneshot
    > qemu#21 0x00005586602b9582 in virtio_scsi_dataplane_stop
    > qemu#22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
    > qemu#23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
    > qemu#24 0x00005586605ab808 in virtio_pci_common_write
    > qemu#25 0x0000558660242396 in memory_region_write_accessor
    > qemu#26 0x00005586602425ab in access_with_adjusted_size
    > qemu#27 0x0000558660245281 in memory_region_dispatch_write
    > qemu#28 0x00005586601e008e in flatview_write_continue
    > qemu#29 0x00005586601e01d8 in flatview_write
    > qemu#30 0x00005586601e04de in address_space_write
    > qemu#31 0x00005586601e052f in address_space_rw
    > qemu#32 0x00005586602607f2 in kvm_cpu_exec
    > qemu#33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
    > qemu#34 0x000055866078bde7 in qemu_thread_start
    > qemu#35 0x00007f5784906594 in start_thread
    > qemu#36 0x00007f5784639e6f in clone

With this patch, virtio_queue_empty will now return 1 as soon as the
vdev is marked as broken, after a "virtio: zero sized buffers are not
allowed" error.

To be consistent, update virtio_queue_empty_rcu as well.

Signed-off-by: Fam Zheng <famz@redhat.com>

---

v2: - Drop ATS condition from the patch description since it is not
      essential.
    - Drop patch 2. [Paolo]
qemu-deploy pushed a commit that referenced this pull request Oct 3, 2018
Both virtio-blk and virtio-scsi use virtio_queue_empty() as the
loop condition in VQ handlers (virtio_blk_handle_vq,
virtio_scsi_handle_cmd_vq). When a device is marked broken in
virtqueue_pop, for example if a vIOMMU address translation failed, we
want to break out of the loop.

This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64
kernel with ATS enabled:

  $ qemu-system-x86_64 \
    ... \
    -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \
    -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0

The dead loop happens immediately when the kernel boots and initializes
the device, where virtio_scsi_data_plane_handle_cmd will not return:

    > ...
    > #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
    > #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
    > #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
    > #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
    > #17 0x00005586607885da in run_poll_handlers_once
    > #18 0x000055866078880e in try_poll_mode
    > #19 0x00005586607888eb in aio_poll
    > #20 0x0000558660784561 in aio_wait_bh_oneshot
    > #21 0x00005586602b9582 in virtio_scsi_dataplane_stop
    > #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
    > #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
    > #24 0x00005586605ab808 in virtio_pci_common_write
    > #25 0x0000558660242396 in memory_region_write_accessor
    > #26 0x00005586602425ab in access_with_adjusted_size
    > #27 0x0000558660245281 in memory_region_dispatch_write
    > #28 0x00005586601e008e in flatview_write_continue
    > #29 0x00005586601e01d8 in flatview_write
    > #30 0x00005586601e04de in address_space_write
    > #31 0x00005586601e052f in address_space_rw
    > #32 0x00005586602607f2 in kvm_cpu_exec
    > #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
    > #34 0x000055866078bde7 in qemu_thread_start
    > #35 0x00007f5784906594 in start_thread
    > #36 0x00007f5784639e6f in clone

With this patch, virtio_queue_empty will now return 1 as soon as the
vdev is marked as broken, after a "virtio: zero sized buffers are not
allowed" error.

To be consistent, update virtio_queue_empty_rcu as well.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180910145616.8598-2-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
pranith pushed a commit to pranith/qemu that referenced this pull request Oct 8, 2018
Both virtio-blk and virtio-scsi use virtio_queue_empty() as the
loop condition in VQ handlers (virtio_blk_handle_vq,
virtio_scsi_handle_cmd_vq). When a device is marked broken in
virtqueue_pop, for example if a vIOMMU address translation failed, we
want to break out of the loop.

This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64
kernel with ATS enabled:

  $ qemu-system-x86_64 \
    ... \
    -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \
    -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0

The dead loop happens immediately when the kernel boots and initializes
the device, where virtio_scsi_data_plane_handle_cmd will not return:

    > ...
    > qemu#13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
    > qemu#14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
    > qemu#15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
    > qemu#16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
    > qemu#17 0x00005586607885da in run_poll_handlers_once
    > qemu#18 0x000055866078880e in try_poll_mode
    > qemu#19 0x00005586607888eb in aio_poll
    > qemu#20 0x0000558660784561 in aio_wait_bh_oneshot
    > qemu#21 0x00005586602b9582 in virtio_scsi_dataplane_stop
    > qemu#22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
    > qemu#23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
    > qemu#24 0x00005586605ab808 in virtio_pci_common_write
    > qemu#25 0x0000558660242396 in memory_region_write_accessor
    > qemu#26 0x00005586602425ab in access_with_adjusted_size
    > qemu#27 0x0000558660245281 in memory_region_dispatch_write
    > qemu#28 0x00005586601e008e in flatview_write_continue
    > qemu#29 0x00005586601e01d8 in flatview_write
    > qemu#30 0x00005586601e04de in address_space_write
    > qemu#31 0x00005586601e052f in address_space_rw
    > qemu#32 0x00005586602607f2 in kvm_cpu_exec
    > qemu#33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
    > qemu#34 0x000055866078bde7 in qemu_thread_start
    > qemu#35 0x00007f5784906594 in start_thread
    > qemu#36 0x00007f5784639e6f in clone

With this patch, virtio_queue_empty will now return 1 as soon as the
vdev is marked as broken, after a "virtio: zero sized buffers are not
allowed" error.

To be consistent, update virtio_queue_empty_rcu as well.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180910145616.8598-2-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
deltova pushed a commit to EPITA-GISTRE/qemu that referenced this pull request Oct 8, 2018
In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which evidences that it didn't lock it before:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 qemu#1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 qemu#2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 qemu#3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 qemu#4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 qemu#5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 qemu#6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 qemu#10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 qemu#11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 qemu#12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 qemu#13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 qemu#1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 qemu#2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 qemu#3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 qemu#4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 qemu#5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 qemu#6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 qemu#10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 qemu#11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 qemu#12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 qemu#13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 qemu#14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 qemu#15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 qemu#16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 qemu#17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 qemu#18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 qemu#19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 qemu#20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 qemu#21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 qemu#22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 qemu#23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 qemu#24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 qemu#25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 qemu#26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 qemu#27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 qemu#28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 qemu#29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 qemu#30 0x00005631fbd81412 in main_loop () at vl.c:1866
 qemu#31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
yashkmankad pushed a commit to yashkmankad/qemu that referenced this pull request Dec 5, 2018
RH-Author: Kevin Wolf <kwolf@redhat.com>
Message-id: <20181010202213.7372-14-kwolf@redhat.com>
Patchwork-id: 82603
O-Subject: [RHEL-8 qemu-kvm PATCH 23/44] block/linux-aio: acquire AioContext before qemu_laio_process_completions
Bugzilla: 1637976
RH-Acked-by: Max Reitz <mreitz@redhat.com>
RH-Acked-by: John Snow <jsnow@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>

From: Sergio Lopez <slp@redhat.com>

In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which evidences that it didn't lock it before:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 qemu#1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 qemu#2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 qemu#3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 qemu#4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 qemu#5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 qemu#6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 qemu#10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 qemu#11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 qemu#12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 qemu#13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 qemu#1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 qemu#2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 qemu#3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 qemu#4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 qemu#5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 qemu#6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 qemu#10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 qemu#11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 qemu#12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 qemu#13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 qemu#14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 qemu#15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 qemu#16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 qemu#17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 qemu#18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 qemu#19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 qemu#20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 qemu#21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 qemu#22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 qemu#23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 qemu#24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 qemu#25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 qemu#26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 qemu#27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 qemu#28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 qemu#29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 qemu#30 0x00005631fbd81412 in main_loop () at vl.c:1866
 qemu#31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e091f0e)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
@repo-lockdown
Copy link

repo-lockdown bot commented Apr 8, 2020

Thank you for your interest in the QEMU project.

This repository is a read-only mirror of the project's master
repostories hosted on https://git.qemu.org/git/qemu.git.
The project does not process merge requests filed on GitHub.

QEMU welcomes contributions of code (either fixing bugs or adding new
functionality). However, we get a lot of patches, and so we have some
guidelines about contributing on the project website:
https://www.qemu.org/contribute/

@repo-lockdown repo-lockdown bot locked and limited conversation to collaborators Apr 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants