tcmu: Add Read_capacity_10 and Start/Stop #1
base: qemu-tcmu
Conversation
libtcmu is a Linux library for userspace programs that handle the TCMU protocol, a SCSI transport between target_core_user.ko and a userspace backend implementation. The former can be seen as a kernel SCSI low-level driver (LLD) that forwards SCSI requests to userspace via a UIO ring buffer. By linking to libtcmu, a program can serve as a SCSI HBA.

This patch adds a qemu-tcmu utility that serves as a LIO userspace backend. Apart from how it interacts with the rest of the world, the role of qemu-tcmu is much like that of qemu-nbd: it can export any format/protocol that QEMU supports (over iSCSI or loopback instead of NBD), for local or remote access. The main advantage of qemu-tcmu lies in the more flexible command protocol (SCSI) and the more flexible iSCSI and loopback frontends; the latter can theoretically allow zero-copy I/O on the local machine, which could let qemu-tcmu serve data with better performance.

RFC because only a minimal set of SCSI commands is handled for now. The plan is to refactor and reuse the scsi-disk emulation code. There is also no test code yet. Similar to NBD, there will be QMP commands to start built-in targets so that guest images can be exported as well.

Usage example script (tested on Fedora 24):

set -e
# For targetcli integration
sudo systemctl start tcmu-runner
qemu-img create test-img 1G
# libtcmu needs to open UIO, give it access
sudo chmod 777 /dev/uio0
qemu-tcmu test-img &
sleep 1
IQN=iqn.2003-01.org.linux-iscsi.lemon.x8664:sn.4939fc29108f
# Check that we have initialized correctly
sudo targetcli ls | grep -q user:qemu
# Create iscsi target
sudo targetcli ls | grep -q $IQN || sudo targetcli /iscsi create wwn=$IQN
sudo targetcli /iscsi/$IQN/tpg1 set attribute \
    authentication=0 generate_node_acls=1 demo_mode_write_protect=0 \
    prod_mode_write_protect=0
# Create the qemu-tcmu target
sudo targetcli ls | grep -q d0 || \
    sudo targetcli /backstores/user:qemu create d0 1G @drive
# Export it as an iscsi LUN
sudo targetcli ls | grep -q lun0 || \
    sudo targetcli /iscsi/$IQN/tpg1/luns create storage_object=/backstores/user:qemu/d0
# Inspect the nodes again
sudo targetcli ls
# Test that the LIO export works
qemu-img info iscsi://127.0.0.1/$IQN/0
qemu-io iscsi://127.0.0.1/$IQN/0 \
    -c 'read -P 1 4k 1k' \
    -c 'write -P 2 4k 1k' \
    -c 'read -P 2 4k 1k'

Signed-off-by: Fam Zheng <famz@redhat.com>
---
v2: Two small fixes to help proof testing of this prototype:
- Fix READ CAPACITY. [Prasanna]
- Fix qiov. [Stefan]
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
@famz I added the stuff I sent you in the email; also, can you update this branch to your v2?
On Fri, 11/11 09:31, Bryant G. Ly wrote:
For now I'm not inclined to integrate your start/stop emulation patch, because the plan has been to completely reuse the scsi-disk code in QEMU for all SCSI commands.
QEMU crashes on the source side while migrating, after starting the ipmi service inside the VM.

./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 \
    -drive file=/work/suse/suse11_sp3_64_vt,format=raw,if=none,id=drive-virtio-disk0,cache=none \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
    -vnc :99 -monitor vc -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-kcs,bmc=bmc0,ioport=0xca2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffec4268700 (LWP 7657)]
__memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2757
(gdb) bt
#0 __memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2757
#1 0x00005555559ef775 in memcpy (__len=3, __src=0xc1421c, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2 qemu_put_buffer (f=0x555557a97690, buf=0xc1421c <Address 0xc1421c out of bounds>, size=3) at migration/qemu-file.c:346
#3 0x00005555559eef66 in vmstate_save_state (f=f@entry=0x555557a97690, vmsd=0x555555f8a5a0 <vmstate_ISAIPMIKCSDevice>, opaque=0x555557231160, vmdesc=vmdesc@entry=0x55555798cc40) at migration/vmstate.c:333
#4 0x00005555557cfe45 in vmstate_save (f=f@entry=0x555557a97690, se=se@entry=0x555557231de0, vmdesc=vmdesc@entry=0x55555798cc40) at /mnt/sdb/zyy/qemu/migration/savevm.c:720
#5 0x00005555557d2be7 in qemu_savevm_state_complete_precopy (f=0x555557a97690, iterable_only=iterable_only@entry=false) at /mnt/sdb/zyy/qemu/migration/savevm.c:1128
#6 0x00005555559ea102 in migration_completion (start_time=<synthetic pointer>, old_vm_running=<synthetic pointer>, current_active_state=<optimized out>, s=0x5555560eaa80 <current_migration.44078>) at migration/migration.c:1707
#7 migration_thread (opaque=0x5555560eaa80 <current_migration.44078>) at migration/migration.c:1855
#8 0x00007ffff3900dc5 in start_thread (arg=0x7ffec4268700) at pthread_create.c:308
#9 0x00007fffefc6c71d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Current code that handles Tx buffer descriptor ring scanning employs the following algorithm:

1. Restore current buffer descriptor pointer from TBPTRn
2. Process current descriptor
3. If current descriptor has the BD_WRAP flag set, set current descriptor pointer to the start of the descriptor ring
4. If current descriptor points to the start of the ring, exit the loop; otherwise increment current descriptor pointer and go to #2
5. Store current descriptor in TBPTRn

The way the code is implemented results in the buffer descriptor ring always being scanned starting at offset/descriptor #0. While covering 99% of the cases, this algorithm becomes problematic for a number of edge cases. Consider the following scenario: the guest OS driver initializes the descriptor ring to N individual descriptors and starts sending data out. Depending on the volume of traffic and probably the guest OS driver implementation, it is possible to hit an edge case where a packet spread across 2 descriptors is placed in descriptors N - 1 and 0, in that order (it is easy to imagine similar examples involving more than 2 descriptors). What happens then is that the aforementioned algorithm starts at descriptor 0, sees a descriptor marked as BD_LAST, which it happily sends out as a separate packet (very much malformed at this point), then the iteration continues and the first part of the original packet is tacked onto the next transmission, which ends up being bogus as well. This behaviour can be pretty reliably observed when scp'ing data from a guest OS via a TAP interface for files larger than 160K (every time for 700K+).

This patch changes the scanning algorithm to do the following (a sketch of the loop follows below):

1. Restore "current" buffer descriptor pointer from TBPTRn
2. If "current" descriptor does not have BD_TX_READY set, goto #6
3. Process current descriptor
4. If "current" descriptor has the BD_WRAP flag, set "current" descriptor pointer to the start of the descriptor ring; otherwise increment "current" by the size of one descriptor
5. Goto #1
6. Save "current" buffer descriptor in TBPTRn

This way we preserve the information about which descriptor was processed last and always start where we left off, avoiding the original problem. On top of that, judging by the following excerpt from MPC8548ERM (p. 14-48):

"... When the end of the TxBD ring is reached, eTSEC initializes TBPTRn to the value in the corresponding TBASEn. The TBPTR register is internally written by the eTSEC’s DMA controller during transmission. The pointer increments by eight (bytes) each time a descriptor is closed successfully by the eTSEC..."

the revised algorithm might also be a more correct way of emulating this aspect of the eTSEC peripheral.

Cc: Alexander Graf <agraf@suse.de>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
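A minimal sketch of the revised scan described above, assuming made-up descriptor types and guest-memory accessors (tx_bd_t, read_bd, process_bd) rather than the actual QEMU eTSEC code; it loops back to the BD_TX_READY check each iteration and only writes TBPTRn once scanning stops, which is the property the commit message relies on.

    /* Sketch of the revised Tx BD ring scan; types and accessors are
     * illustrative stand-ins, not the real eTSEC emulation code. */
    #include <stdint.h>

    #define BD_TX_READY 0x8000
    #define BD_WRAP     0x2000
    #define BD_SIZE     8              /* one descriptor is 8 bytes */

    typedef struct {
        uint16_t flags;
        /* buffer pointer/length omitted for brevity */
    } tx_bd_t;

    /* Hypothetical guest-memory accessors. */
    extern tx_bd_t read_bd(uint32_t addr);
    extern void process_bd(tx_bd_t *bd, uint32_t addr);

    void scan_tx_ring(uint32_t tbase, uint32_t *tbptr)
    {
        uint32_t cur = *tbptr;              /* restore "current" from TBPTRn  */

        for (;;) {
            tx_bd_t bd = read_bd(cur);
            if (!(bd.flags & BD_TX_READY)) {
                break;                      /* nothing ready: stop right here */
            }
            process_bd(&bd, cur);           /* process the descriptor         */
            if (bd.flags & BD_WRAP) {       /* wrap to ring start or advance  */
                cur = tbase;
            } else {
                cur += BD_SIZE;
            }
        }
        *tbptr = cur;                       /* save "current" back to TBPTRn  */
    }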
QEMU will crash with the following backtrace if the newly created thread exits before we call qemu_thread_set_name() for it:

(gdb) bt
#0 0x00007f9a68b095d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f9a68b0acc8 in __GI_abort () at abort.c:90
#2 0x00007f9a69cda389 in PAT_abort () from /usr/lib64/libuvpuserhotfix.so
#3 0x00007f9a69cdda0d in patchIllInsHandler () from /usr/lib64/libuvpuserhotfix.so
#4 <signal handler called>
#5 pthread_setname_np (th=140298470549248, name=name@entry=0x8cc74a "io-task-worker") at ../nptl/sysdeps/unix/sysv/linux/pthread_setname.c:49
#6 0x00000000007f5f20 in qemu_thread_set_name (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker") at util/qemu_thread_posix.c:459
#7 0x00000000007f679e in qemu_thread_create (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker",start_routine=start_routine@entry=0x7c1300 <qio_task_thread_worker>, arg=arg@entry=0x7f99b8001720, mode=mode@entry=1) at util/qemu_thread_posix.c:498
#8 0x00000000007c15b6 in qio_task_run_in_thread (task=task@entry=0x7f99b80033d0, worker=worker@entry=0x7bd920 <qio_channel_socket_connect_worker>, opaque=0x7f99b8003370, destroy=0x7c6220 <qapi_free_SocketAddress>) at io/task.c:133
#9 0x00000000007bda04 in qio_channel_socket_connect_async (ioc=0x7f99b80014c0, addr=0x37235d0, callback=callback@entry=0x54ad00 <qemu_chr_socket_connected>, opaque=opaque@entry=0x38118b0, destroy=destroy@entry=0x0) at io/channel_socket.c:191
#10 0x00000000005487f6 in socket_reconnect_timeout (opaque=0x38118b0) at qemu_char.c:4402
#11 0x00007f9a6a1533b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x00007f9a6a15299a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#13 0x0000000000747386 in glib_pollfds_poll () at main_loop.c:227
#14 0x0000000000747424 in os_host_main_loop_wait (timeout=404000000) at main_loop.c:272
#15 0x0000000000747575 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:520
#16 0x0000000000557d31 in main_loop () at vl.c:2170
#17 0x000000000041c8b7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5083

Let's detach the new thread after calling qemu_thread_set_name().

Signed-off-by: Caoxinhua <caoxinhua@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <1483493521-9604-1-git-send-email-zhang.zhanghailiang@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
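The idea of the ordering fix, shown with plain pthreads rather than the actual qemu_thread_create() code: keep the thread joinable while it is being named, and only detach it afterwards, so the handle passed to pthread_setname_np() cannot have been reclaimed yet. This is only a sketch of the ordering, not the QEMU implementation.

    /* Ordering sketch with plain pthreads (not the real qemu_thread_create). */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>

    static void *io_task_worker(void *arg)
    {
        (void)arg;          /* the worker may exit immediately */
        return NULL;
    }

    int main(void)
    {
        pthread_t th;

        if (pthread_create(&th, NULL, io_task_worker, NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
        /* Name the thread while the handle is guaranteed to be valid
         * (a joinable thread's resources stay around until join/detach)... */
        pthread_setname_np(th, "io-task-worker");
        /* ...and only then give up ownership of the handle. */
        pthread_detach(th);
        return 0;
    }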
AioHandlers marked ->is_external must be skipped when aio_node_check() fails. bdrv_drained_begin() needs this to prevent dataplane from submitting new I/O requests while another thread accesses the device and relies on it being quiesced.

This patch fixes the following segfault:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
2650 bdrv_io_plug(child->bs);
[Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))]
(gdb) bt
#0 0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
#1 0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:1561
#2 0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589
#3 0x00005577f5d4240d in virtio_blk_data_plane_handle_output (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/dataplane/virtio-blk.c:158
#4 0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at qemu/hw/virtio/virtio.c:1304
#5 0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134
#6 0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at qemu/aio-posix.c:493
#7 0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:569
#8 0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:601
#9 0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at qemu/block/io.c:2403
#10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at qemu/block.c:2322
#11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at qemu/block.c:2465
#12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at qemu/block.c:3425
#13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at qemu/block.c:1361
#14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:491
#15 0x00005577f6111b1b in blk_remove_all_bs () at qemu/block/block-backend.c:245
#16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382
#17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, envp=0x7ffea6eb8440) at qemu/vl.c:4684

The key thing is that bdrv_close() uses bdrv_drained_begin() and virtio_queue_host_notifier_aio_poll() must not be called.

Thanks to Fam Zheng <famz@redhat.com> for identifying the root cause of this crash.

Reported-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170124095350.16679-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Running postcopy-test with ASAN produces the following error:

QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test ...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
#0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
#1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
#2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
#3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
#4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
#5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)

0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000) allocated by thread T0 here:
#0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
#1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
#2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
#3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
#4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
#5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
#6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
#7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T6 created by T0 here:
#0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
#1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
#2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
#3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
#4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
#5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
#6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
#7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass

index seems to be wrongly incremented, unless I miss something that would be worth a comment.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which causes trouble, such as in this code path:

qmp_block_commit
  commit_active_start
    bdrv_reopen
      bdrv_reopen_multiple
        bdrv_reopen_prepare
          bdrv_flush
            aio_poll
              aio_bh_poll
                aio_bh_call
                  block_job_defer_to_main_loop_bh
                    stream_complete
                      bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH.

Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH in BDRV_POLL_WHILE.

As a side effect this fixes a hang in block_job_detach_aio_context during system_reset when a block job is ready:

#0 0x0000555555aa79f3 in bdrv_drain_recurse
#1 0x0000555555aa825d in bdrv_drained_begin
#2 0x0000555555aa8449 in bdrv_drain
#3 0x0000555555a9c356 in blk_drain
#4 0x0000555555aa3cfd in mirror_drain
#5 0x0000555555a66e11 in block_job_detach_aio_context
#6 0x0000555555a62f4d in bdrv_detach_aio_context
#7 0x0000555555a63116 in bdrv_set_aio_context
#8 0x0000555555a9d326 in blk_set_aio_context
#9 0x00005555557e38da in virtio_blk_data_plane_stop
#10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
#11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
#12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
#13 0x00005555559f6a18 in virtio_pci_reset
#14 0x00005555559139a9 in qdev_reset_one
#15 0x0000555555916738 in qbus_walk_children
#16 0x0000555555913318 in qdev_walk_children
#17 0x0000555555916738 in qbus_walk_children
#18 0x00005555559168ca in qemu_devices_reset
#19 0x000055555581fcbb in pc_machine_reset
#20 0x00005555558a4d96 in qemu_system_reset
#21 0x000055555577157a in main_loop_should_exit
#22 0x000055555577157a in main_loop
#23 0x000055555577157a in main

The rationale is that the loop in block_job_detach_aio_context cannot make any progress in pausing/completing the job, because bs->in_flight is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop BH. With this patch, it does.

Reported-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170418143044.12187-3-famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
ASAN detects an "unknown-crash" when running pxe-test:

/ppc64/pxe/spapr-vlan:
=================================================================
==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0
READ of size 128 at 0x7f6dcd298d30 thread T2
#0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73
#1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289
#2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446
#3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82
#4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67

Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame
#0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13

This frame has 3 object(s):
[32, 48) '<unknown>'
[96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable
[160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable

The sockaddr_storage pointer is the sockaddr_in6 lhost on the stack. Copy only the source addr size.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
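The idea, sketched with a hypothetical sockaddr_size() helper rather than the slirp API: when a "struct sockaddr_storage *" may actually point at a smaller sockaddr_in6 on the caller's stack, copy only the bytes that the real address family occupies, never sizeof(struct sockaddr_storage).

    /* Sketch of copying only the valid part of a socket address. */
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Hypothetical helper: number of valid bytes behind the pointer. */
    static size_t sockaddr_size(const struct sockaddr_storage *ss)
    {
        switch (ss->ss_family) {
        case AF_INET:
            return sizeof(struct sockaddr_in);
        case AF_INET6:
            return sizeof(struct sockaddr_in6);
        default:
            return sizeof(struct sockaddr_storage);
        }
    }

    /* Store the client address of a new session without reading past the
     * end of a sockaddr_in6 that lives on the caller's stack. */
    static void session_set_client(struct sockaddr_storage *dst,
                                   const struct sockaddr_storage *srcsas)
    {
        memcpy(dst, srcsas, sockaddr_size(srcsas));
    }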
If the trace backend is set to TRACE_NOP, trace_get_vcpu_event_count() returns 0, causing bitmap_new() to abort. The abort can be triggered as follows:

$ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu
$ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G
(gdb) bt
#0 0x00007ffff04e25f7 in raise () from /lib64/libc.so.6
#1 0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6
#2 0x00005555559de905 in bitmap_new (nbits=<optimized out>) at /home/root/git/qemu2.git/include/qemu/bitmap.h:96
#3 cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399
#4 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341
#5 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341
#6 0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560, type=type@entry=0x55555656bd30) at qom/object.c:376
#7 0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484
#8 0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu") at qom/object.c:494
#9 0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0, errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101
#10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc.c:1184
#11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121
#12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Spotted by ASAN:

/x86_64/hmp/pc-0.12:
=================================================================
==22538==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 224 byte(s) in 1 object(s) allocated from:
#0 0x7f0f63cdee60 in malloc (/lib64/libasan.so.3+0xc6e60)
#1 0x556f11ff32d7 in tcp_newtcpcb /home/elmarco/src/qemu/slirp/tcp_subr.c:250
#2 0x556f11fdb1d1 in tcp_listen /home/elmarco/src/qemu/slirp/socket.c:688
#3 0x556f11fca9d5 in slirp_add_hostfwd /home/elmarco/src/qemu/slirp/slirp.c:1052
#4 0x556f11f8db41 in slirp_hostfwd /home/elmarco/src/qemu/net/slirp.c:506
#5 0x556f11f8dd83 in hmp_hostfwd_add /home/elmarco/src/qemu/net/slirp.c:535

There might be a better way to fix this, but calling slirp tcp_close() doesn't work.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Since commit d4c19cd ("virtio-serial: add missing virtio_detach_element() call") the following commands may cause QEMU to segfault:

$ qemu -M accel=kvm -cpu host -m 1G \
    -drive if=virtio,file=test.img,format=raw \
    -device virtio-serial-pci,id=virtio-serial0 \
    -chardev socket,id=channel1,path=/tmp/chardev.sock,server,nowait \
    -device virtserialport,chardev=channel1,bus=virtio-serial0.0,id=port1
$ nc -U /tmp/chardev.sock
^C
(guest)$ cat /dev/zero >/dev/vport0p1

The segfault is non-deterministic: if the event loop notices the socket has been closed then there is no crash. The disconnect has to happen right before QEMU attempts to write data to the socket.

The backtrace is as follows:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x00005555557e0698 in do_flush_queued_data (port=0x5555582cedf0, vq=0x7fffcc854290, vdev=0x55555807b1d0) at hw/char/virtio-serial-bus.c:180
180 for (i = port->iov_idx; i < port->elem->out_num; i++) {
#1 0x000055555580d363 in virtio_queue_notify_vq (vq=0x7fffcc854290) at hw/virtio/virtio.c:1524
#2 0x000055555580d363 in virtio_queue_host_notifier_read (n=0x7fffcc8542f8) at hw/virtio/virtio.c:2430
#3 0x0000555555b3482c in aio_dispatch_handlers (ctx=ctx@entry=0x5555566b8c80) at util/aio-posix.c:399
#4 0x0000555555b350d8 in aio_dispatch (ctx=0x5555566b8c80) at util/aio-posix.c:430
#5 0x0000555555b3212e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#6 0x00007fffde71de52 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#7 0x0000555555b34353 in glib_pollfds_poll () at util/main-loop.c:213
#8 0x0000555555b34353 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#9 0x0000555555b34353 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#10 0x0000555555773207 in main_loop () at vl.c:1917
#11 0x0000555555773207 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4751

The do_flush_queued_data() function does not anticipate chardev close events during vsc->have_data(). It expects port->elem to remain non-NULL for the duration of its for loop.

The fix is simply to return from do_flush_queued_data() if the port closes, because the close event already frees port->elem and drains the virtqueue - there is nothing left for do_flush_queued_data() to do.

Reported-by: Sitong Liu <siliu@redhat.com>
Reported-by: Min Deng <mdeng@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
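A sketch of the early-return idea with simplified stand-in types, not the actual virtio-serial code: the per-element flush loop must notice that the have_data callback closed the port (and freed the element) and stop, instead of dereferencing port->elem again.

    /* Flush-loop sketch; Port, Element and have_data() are illustrative. */
    #include <stddef.h>

    typedef struct Element {
        int out_num;
        /* iovecs omitted */
    } Element;

    typedef struct Port {
        Element *elem;      /* cleared when the port is closed/drained */
        int iov_idx;
    } Port;

    /* Hypothetical callback: pushing data may trigger a chardev close event
     * that frees port->elem and clears the pointer. */
    extern int have_data(Port *port, int iov_idx);

    void flush_queued_data(Port *port)
    {
        while (port->elem && port->iov_idx < port->elem->out_num) {
            if (have_data(port, port->iov_idx) <= 0) {
                break;
            }
            /* The callback may have closed the port; nothing left to flush. */
            if (!port->elem) {
                return;
            }
            port->iov_idx++;
        }
    }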
Submission of requests on linux-aio is a bit tricky and can lead to request completions on the submission path:

44713c9 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d8 ("linux-aio: process completions from ioq_submit()")

That means that any coroutine which has been yielded in order to wait for completion can be resumed from the submission path and be eventually terminated (freed).

The following use-after-free crash was observed when IO throttling was enabled:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5813dff700 (LWP 56417)]
virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
(gdb) bt
#0 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
   ^^^^^^^^^^^^^^ remember the address
#1 virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
#2 virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
#3 virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
#4 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
#5 blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
#6 coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78

(gdb) p *elem
$8 = {index = 77, out_num = 2, in_num = 1, in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0, in_sg = 0x0, out_sg = 0x7f5804009a50}
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 'in_sg' and 'out_sg' are invalid.

E.g. it is impossible that 'in_sg' is zero; instead its value must be equal to:

(gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
$26 = 0x7f5804009af0

Seems 'elem' was corrupted. Meanwhile another thread raised an abort:

Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
#0 raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
#3 qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
#4 qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
   ^^^^^^^^^^^^^^^^^^ WTF?? this is equal to elem from crashed thread
#5 qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
#6 qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
#7 qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
#8 qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
#9 qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
#10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
#11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
#12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
#13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
#14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419

The crash can be explained by the access to the 'co' object from the loop inside qemu_co_queue_run_restart():

    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
        ^^^^^^^^^^^^^^^^^^^^ on each iteration 'co' is accessed, but 'co' can be already freed
        qemu_coroutine_enter(next);
    }

When the 'next' coroutine is resumed (entered) it can in its turn resume 'co', and eventually free it. That's why we see 'co' (which was freed) has the same address as 'elem' from the first backtrace.

The fix is obvious: use a temporary queue and do not touch the coroutine after the first qemu_coroutine_enter() is invoked.

The issue is quite rare and happens every ~12 hours on very high IO and CPU load (building a Linux kernel with -j512 inside the guest) when IO throttling is enabled. With the fix applied the guest has been running for ~35 hours and is still alive so far.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
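A sketch of the "temporary queue" idea using a minimal singly linked list instead of the actual QSIMPLEQ/Coroutine types: detach the wake-up list from the coroutine first, then walk the private copy, so the coroutine that owns the list is never dereferenced again after another coroutine has been entered (and may have freed it).

    /* Temporary-queue sketch; Waiter/Coroutine are simplified stand-ins. */
    #include <stddef.h>

    typedef struct Waiter {
        struct Waiter *next;
    } Waiter;

    typedef struct Coroutine {
        Waiter *wakeup_head;            /* coroutines waiting on this one */
    } Coroutine;

    extern void coroutine_enter(Waiter *w);   /* may free any coroutine */

    void queue_run_restart(Coroutine *co)
    {
        /* Steal the whole list up front... */
        Waiter *pending = co->wakeup_head;
        co->wakeup_head = NULL;

        /* ...then never touch 'co' again: entering a waiter can recursively
         * resume and free 'co'. */
        while (pending) {
            Waiter *next = pending->next;
            coroutine_enter(pending);
            pending = next;
        }
    }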
V flag for subtraction is:

    v = (res ^ src1) & (src1 ^ src2)

(see COMPUTE_CCR() in target/m68k/helper.c)

But gen_flush_flags() uses:

    v = (res ^ src2) & (src1 ^ src2)

The problem has been found with the following program:

        .global _start
    _start:
        move.l  #-2147483648,%d0
        subq.l  #1,%d0
        jvc     1f
        move.l  #1,%d1
        move.l  #1,%d0
        trap    #0
    1:
        move.l  #0,%d1
        move.l  #1,%d0
        trap    #0

It works fine (exit(1)) on real hardware, and with "-singlestep". "-singlestep" uses gen_helper_flush_flags(), whereas without "-singlestep", V flag is computed directly in gen_flush_flags(). This patch updates gen_flush_flags() to have the same result as with gen_helper_flush_flags().

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20170614203905.19657-1-laurent@vivier.eu>
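A quick standalone check of the two formulas with the commit's test value (subtracting 1 from -2147483648). The variable naming assumes res = src1 - src2, which matches how the formulas are quoted above; the program only illustrates why the two expressions disagree for this case.

    /* Compare the correct and buggy V-flag formulas for a 32-bit subtract. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t src1 = 0x80000000u;   /* -2147483648 */
        uint32_t src2 = 1;
        uint32_t res  = src1 - src2;   /* wraps to 0x7fffffff */

        /* Correct: operands have different signs and the result's sign
         * differs from the minuend's. */
        uint32_t v_ok  = (res ^ src1) & (src1 ^ src2);
        /* Variant previously used by gen_flush_flags(). */
        uint32_t v_bad = (res ^ src2) & (src1 ^ src2);

        printf("V (correct formula) = %d\n", (int)(v_ok  >> 31)); /* 1: overflow   */
        printf("V (buggy formula)   = %d\n", (int)(v_bad >> 31)); /* 0: missed it  */
        return 0;
    }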
Commit e987c37 ("hw/i386: check if nvdimm is enabled before plugging") introduced a check to reject nvdimm hotplug if -machine pc,nvdimm=on was not given. This check executes after pc_dimm_memory_plug() has already completed and does not reverse the effect of this function in the case of failure.

Perform the check before calling pc_dimm_memory_plug(). This fixes the following abort:

$ qemu -M accel=kvm -m 1G,slots=4,maxmem=8G \
    -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G
(qemu) device_add nvdimm,memdev=mem1
nvdimm is not enabled: missing 'nvdimm' in '-M'
(qemu) device_add nvdimm,memdev=mem1
Core dumped

The backtrace is:

#0 0x00007fffdb5b191f in raise () at /lib64/libc.so.6
#1 0x00007fffdb5b351a in abort () at /lib64/libc.so.6
#2 0x00007fffdb5a9da7 in __assert_fail_base () at /lib64/libc.so.6
#3 0x00007fffdb5a9e52 in () at /lib64/libc.so.6
#4 0x000055555577a5fa in qemu_ram_set_idstr (new_block=0x555556747a00, name=<optimized out>, dev=dev@entry=0x555556705590) at qemu/exec.c:1709
#5 0x0000555555a0fe86 in vmstate_register_ram (mr=mr@entry=0x55555673a0e0, dev=dev@entry=0x555556705590) at migration/savevm.c:2293
#6 0x0000555555965088 in pc_dimm_memory_plug (dev=dev@entry=0x555556705590, hpms=hpms@entry=0x5555566bb0e0, mr=mr@entry=0x555556705630, align=<optimized out>, errp=errp@entry=0x7fffffffc660) at hw/mem/pc-dimm.c:110
#7 0x000055555581d89b in pc_dimm_plug (errp=0x7fffffffc6c0, dev=0x555556705590, hotplug_dev=<optimized out>) at qemu/hw/i386/pc.c:1713
#8 0x000055555581d89b in pc_machine_device_plug_cb (hotplug_dev=<optimized out>, dev=0x555556705590, errp=0x7fffffffc6c0) at qemu/hw/i386/pc.c:2004
#9 0x0000555555914da6 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffffffc7e8) at hw/core/qdev.c:926

Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
"nc" is freed after hotplug vhost-user, but the watcher is not removed. The QEMU crash when the watcher access the "nc" when socket disconnects. Program received signal SIGSEGV, Segmentation fault. #0 object_get_class (obj=obj@entry=0x2) at qom/object.c:750 #1 0x00007f9bb4180da1 in qemu_chr_fe_disconnect (be=<optimized out>) at chardev/char-fe.c:372 qemu#2 0x00007f9bb40d1100 in net_vhost_user_watch (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:188 qemu#3 0x00007f9baf97f99a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 qemu#4 0x00007f9bb41d7ebc in glib_pollfds_poll () at util/main-loop.c:213 qemu#5 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261 qemu#6 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515 qemu#7 0x00007f9bb3e266a7 in main_loop () at vl.c:1917 qemu#8 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4786 Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The following segfault is encountered if the NBD server closes the UNIX domain socket immediately after negotiation:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
441 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
(gdb) bt
#0 0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
#1 0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207
#2 0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237
#3 0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836
#4 0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:1086
#5 0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182
#6 0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032
#7 0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079
#8 0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#9 0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6

The problem is that nbd_client_init() uses nbd_client_attach_aio_context() -> aio_co_schedule(new_context, client->read_reply_co). Execution of read_reply_co is deferred to a BH which doesn't run until later.

In the mean time blk_co_preadv() can be called and nbd_coroutine_end() calls aio_wake() on read_reply_co. At this point in time read_reply_co's ctx isn't set because it has never been entered yet.

This patch simplifies the nbd_co_send_request() -> nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just nbd_co_send_request() -> nbd_co_receive_reply(). The request is "ended" if an error occurs at any point. Callers no longer have to invoke nbd_coroutine_end().

This cleanup also eliminates the segfault because we don't call aio_co_schedule() to wake up s->read_reply_co if sending the request failed. It is only necessary to wake up s->read_reply_co if a reply was received.

Note this only happens with UNIX domain sockets on Linux. It doesn't seem possible to reproduce this with TCP sockets.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20170829122745.14309-2-stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently for sh4 the cpu_model argument for the '-cpu' option could be either a 'cpu model' name or a cpu_typename. However, typically '-cpu' takes a 'cpu model' name, and the cpu type for the sh4 target isn't advertised publicly ('-cpu help' prints only 'cpu model' names), so we shouldn't care about this use case (it's more of a bug).

1. Drop '-cpu cpu_typename' to align with the rest of the targets.
2. Compose the searched-for typename from the cpu model and use it with object_class_by_name() directly instead of the over-complicated object_class_get_list() + g_slist_find_custom() + superh_cpu_name_compare()

With #1 dropped, #2 could be used for both lookups, which simplifies superh_cpu_class_by_name() quite a bit.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-23-git-send-email-imammedo@redhat.com>
[ehabkost: Include fixup sent by Igor]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Migration of pseries is broken with TCG because QEMU tries to restore KVM MMU state unconditionally. The result is a SIGSEGV in kvm_vm_ioctl():

#0 kvm_vm_ioctl (s=0x0, type=-2146390353) at qemu/accel/kvm/kvm-all.c:2032
#1 0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>, radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>) at qemu/target/ppc/kvm.c:396
#2 0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0, version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578
#3 0x000000010059e4cc in vmstate_load_state (f=0x106230000, vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0, version_id=<optimized out>) at qemu/migration/vmstate.c:165
#4 0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>) at qemu/migration/savevm.c:748

This patch fixes the problem by not calling the KVM function when running in TCG mode.

Fixes: d39c90f ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When QEMU is started with the '-no-acpi' CLI option, an attempt to unplug a CPU using device_del results in a null pointer dereference at:

#0 object_get_class
#1 pc_machine_device_unplug_request_cb
#2 qmp_marshal_device_del

which is caused by pcms->acpi_dev == NULL due to ACPI support being disabled.

Considering that ACPI support is necessary for unplug to work, check that it's enabled and fail the unplug request gracefully if no ACPI device is found.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The passed-through device might be an express device. In this case the old code allocated a too small emulated config space in pci_config_alloc() since pci_config_size() returned the size for a non-express device. This leads to an out-of-bound write in xen_pt_config_reg_init(), which sometimes results in crashes. So set is_express as already done for KVM in vfio-pci.

Shortened ASan report:

==17512==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000041648 at pc 0x55e0fdac51ff bp 0x7ffe4af07410 sp 0x7ffe4af07408
WRITE of size 2 at 0x611000041648 thread T0
#0 0x55e0fdac51fe in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
#1 0x55e0fdac51fe in stw_he_p include/qemu/bswap.h:330
#2 0x55e0fdac51fe in stw_le_p include/qemu/bswap.h:379
#3 0x55e0fdac51fe in pci_set_word include/hw/pci/pci.h:490
#4 0x55e0fdac51fe in xen_pt_config_reg_init hw/xen/xen_pt_config_init.c:1991
#5 0x55e0fdac51fe in xen_pt_config_init hw/xen/xen_pt_config_init.c:2067
#6 0x55e0fdabcf4d in xen_pt_realize hw/xen/xen_pt.c:830
#7 0x55e0fdf59666 in pci_qdev_realize hw/pci/pci.c:2034
#8 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

0x611000041648 is located 8 bytes to the right of 256-byte region [0x611000041540,0x611000041640) allocated by thread T0 here:
#0 0x7ff596a94bb8 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9bb8)
#1 0x7ff57da66580 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x50580)
#2 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

Signed-off-by: Simon Gaiser <hw42@ipsumj.de>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
bdrv_unref() requires the AioContext lock because bdrv_flush() uses BDRV_POLL_WHILE(), which assumes the AioContext is currently held. If BDRV_POLL_WHILE() runs without AioContext held the pthread_mutex_unlock() call in aio_context_release() fails.

This patch moves bdrv_unref() into the AioContext locked region to solve the following pthread_mutex_unlock() failure:

#0 0x00007f566181969b in raise () at /lib64/libc.so.6
#1 0x00007f566181b3b1 in abort () at /lib64/libc.so.6
#2 0x00005592cd590458 in error_exit (err=<optimized out>, msg=msg@entry=0x5592cdaf6d60 <__func__.23977> "qemu_mutex_unlock") at util/qemu-thread-posix.c:36
#3 0x00005592cd96e738 in qemu_mutex_unlock (mutex=mutex@entry=0x5592ce9505e0) at util/qemu-thread-posix.c:96
#4 0x00005592cd969b69 in aio_context_release (ctx=ctx@entry=0x5592ce950580) at util/async.c:507
#5 0x00005592cd8ead78 in bdrv_flush (bs=bs@entry=0x5592cfa87210) at block/io.c:2478
#6 0x00005592cd89df30 in bdrv_close (bs=0x5592cfa87210) at block.c:3207
#7 0x00005592cd89df30 in bdrv_delete (bs=0x5592cfa87210) at block.c:3395
#8 0x00005592cd89df30 in bdrv_unref (bs=0x5592cfa87210) at block.c:4418
#9 0x00005592cd6b7f86 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=<optimized out>, errp=errp@entry=0x7ffe4a1fc9d8) at blockdev.c:2308

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171206144550.22295-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There is a small chance that iothread_stop() hangs as follows:

Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
#0 0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
#1 0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
#4 0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
#5 0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
#6 0x00007f64012cce6f in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
#0 0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
#1 0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
#2 0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
#3 0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
#4 0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
#5 0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
#6 0x0000559599680a6e in iothread_stop_all () at iothread.c:341
#7 0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913

The relevant code from iothread_run() is:

    while (!atomic_read(&iothread->stopping)) {
        aio_poll(iothread->ctx, true);

and iothread_stop():

    iothread->stopping = true;
    aio_notify(iothread->ctx);
    ...
    qemu_thread_join(&iothread->thread);

The following scenario can occur:

1. IOThread: while (!atomic_read(&iothread->stopping)) -> stopping=false
2. Main loop: iothread->stopping = true; aio_notify(iothread->ctx);
3. IOThread: aio_poll(iothread->ctx, true); -> hang

The bug is explained by the AioContext->notify_me doc comments: "If this field is 0, everything (file descriptors, bottom halves, timers) will be re-evaluated before the next blocking poll(), thus the event_notifier_set call can be skipped." The problem is that "everything" does not include checking iothread->stopping. This means iothread_run() will block in aio_poll() if aio_notify() was called just before aio_poll().

This patch fixes the hang by replacing aio_notify() with aio_bh_schedule_oneshot(). This makes aio_poll() or g_main_loop_run() return.

Implementing this properly required a new bool running flag. The new flag prevents races that are tricky if we try to use iothread->stopping. Now iothread->stopping is purely for iothread_stop() and iothread->running is purely for the iothread_run() thread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171207201320.19284-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault with low probability. The backtrace is:

#0 0x00007f46c60291d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f46c602a8c8 in __GI_abort () at abort.c:90
#2 0x00000000008543c9 in PAT_abort ()
#3 0x000000000085140d in patchIllInsHandler ()
#4 <signal handler called>
#5 pthread_detach (th=139933037614848) at pthread_detach.c:50
#6 0x0000000000829759 in qemu_thread_create (thread=thread@entry=0x7ffdaa8205e0, name=name@entry=0x94d94a "io-task-worker", start_routine=start_routine@entry=0x7eb9a0 <qio_task_thread_worker>, arg=arg@entry=0x3f5cf70, mode=mode@entry=1) at util/qemu_thread_posix.c:512
#7 0x00000000007ebc96 in qio_task_run_in_thread (task=0x31db2c0, worker=worker@entry=0x7e7e40 <qio_channel_socket_connect_worker>, opaque=0xcd23380, destroy=0x7f1180 <qapi_free_SocketAddress>) at io/task.c:141
#8 0x00000000007e7f33 in qio_channel_socket_connect_async (ioc=ioc@entry=0x626c0b0, addr=<optimized out>, callback=callback@entry=0x55e080 <qemu_chr_socket_connected>, opaque=opaque@entry=0x42862c0, destroy=destroy@entry=0x0) at io/channel_socket.c:194
#9 0x000000000055bdd1 in socket_reconnect_timeout (opaque=0x42862c0) at qemu_char.c:4744
#10 0x00007f46c72483b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
#11 0x00007f46c724799a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x000000000076c646 in glib_pollfds_poll () at main_loop.c:228
#13 0x000000000076c6eb in os_host_main_loop_wait (timeout=348000000) at main_loop.c:273
#14 0x000000000076c815 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:521
#15 0x000000000056a511 in main_loop () at vl.c:2076
#16 0x0000000000420705 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4940

The cause of this problem is a glibc bug; for more information, see https://sourceware.org/bugzilla/show_bug.cgi?id=19951. The solution for this bug is to use pthread_attr_setdetachstate.

There is a similar issue with pthread_setname_np, which is moved from the creating thread to the created thread.

Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-Id: <20171128044656.10592-1-linzhecheng@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Simplify the code by removing qemu_thread_set_name, and free the arguments before invoking the start routine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
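A sketch of the pthread_attr_setdetachstate() approach with plain pthreads, not the actual qemu_thread_create() code: the thread is created detached from the start, so no separate pthread_detach() call on a possibly already-exited thread is needed.

    /* Create a detached thread via attributes instead of detaching later. */
    #include <pthread.h>
    #include <stdio.h>

    static void *io_task_worker(void *arg)
    {
        (void)arg;
        /* ... do the work ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t th;
        pthread_attr_t attr;

        pthread_attr_init(&attr);
        /* Ask for a detached thread up front. */
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

        if (pthread_create(&th, &attr, io_task_worker, NULL) != 0) {
            perror("pthread_create");
            pthread_attr_destroy(&attr);
            return 1;
        }
        pthread_attr_destroy(&attr);
        return 0;
    }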
The find_desc_by_name() from util/qemu-option.c relies on the .name not being NULL to call strcmp(). This check becomes unsafe when the list is not NULL-terminated, which is the case of nbd_runtime_opts in block/nbd.c, and can result in a segmentation fault when strcmp() tries to access invalid memory:

#0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
#1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
#2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
#3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
#4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
#5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395

From gdb, the desc[i].name was not NULL and resulted in strcmp() accessing invalid memory:

>>> p desc[5]
$8 = {
  name = 0x1037f098 "R27A",
  type = 1561964883,
  help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
  def_value_str = 0x2 <error: Cannot access memory at address 0x2>
}
>>> p desc[6]
$9 = {
  name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
  type = 272101528,
  help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
  def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
}

This patch fixes the segmentation fault in strcmp() by adding a NULL element at the end of the nbd_runtime_opts.desc list, which is the common practice in most other structs like runtime_opts in block/null.c. Thus, the desc[i].name != NULL check becomes safe because it will not evaluate to true when the .desc list has reached its end.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44f
Signed-off-by: Eric Blake <eblake@redhat.com>
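A sketch of why the sentinel matters, with simplified stand-ins for QemuOptDesc and the option list rather than the real QEMU types: the lookup loop stops only when it sees an entry whose name is NULL, so an array without that terminating entry makes the loop walk off the end.

    /* Sentinel-terminated option descriptor list and its lookup loop. */
    #include <stddef.h>
    #include <string.h>

    typedef struct {
        const char *name;
        const char *help;
    } OptDesc;

    static const OptDesc nbd_opts_desc[] = {
        { .name = "host",   .help = "TCP host" },
        { .name = "port",   .help = "TCP port" },
        { .name = "export", .help = "NBD export name" },
        { /* end of list: .name == NULL stops the lookup below */ },
    };

    static const OptDesc *find_desc_by_name(const OptDesc *desc, const char *name)
    {
        for (int i = 0; desc[i].name != NULL; i++) {
            if (strcmp(desc[i].name, name) == 0) {
                return &desc[i];
            }
        }
        return NULL;
    }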
/public/qobject_is_equal_conversion: OK
=================================================================
==14396==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x7f07682c5850 in malloc (/lib64/libasan.so.4+0xde850)
#1 0x7f0767d12f0c in g_malloc ../glib/gmem.c:94
#2 0x7f0767d131cf in g_malloc_n ../glib/gmem.c:331
#3 0x562bd767371f in do_test_equality /home/elmarco/src/qq/tests/check-qobject.c:49
#4 0x562bd7674a35 in qobject_is_equal_dict_test /home/elmarco/src/qq/tests/check-qobject.c:267
#5 0x7f0767d37b04 in test_case_run ../glib/gtestutils.c:2237
#6 0x7f0767d37ec4 in g_test_run_suite_internal ../glib/gtestutils.c:2321
#7 0x7f0767d37f6d in g_test_run_suite_internal ../glib/gtestutils.c:2333
#8 0x7f0767d38184 in g_test_run_suite ../glib/gtestutils.c:2408
#9 0x7f0767d36e0d in g_test_run ../glib/gtestutils.c:1674
#10 0x562bd7674e75 in main /home/elmarco/src/qq/tests/check-qobject.c:327
#11 0x7f0766009039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-9-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Note that data_dir[] will now point to allocated strings.

Fixes:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x7f1448181850 in malloc (/lib64/libasan.so.4+0xde850)
#1 0x7f1446ed8f0c in g_malloc ../glib/gmem.c:94
#2 0x7f1446ed91cf in g_malloc_n ../glib/gmem.c:331
#3 0x7f1446ef739a in g_strsplit ../glib/gstrfuncs.c:2364
#4 0x55cf276439d7 in main /home/elmarco/src/qq/vl.c:4311
#5 0x7f143dfad039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-10-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Leak found thanks to ASAN:

Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x55995789ac90 in __interceptor_malloc (/home/elmarco/src/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x1510c90)
#1 0x7f0a91190f0c in g_malloc /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:94
#2 0x5599580a281c in v9fs_path_copy /home/elmarco/src/qemu/hw/9pfs/9p.c:196:17
#3 0x559958f9ec5d in coroutine_trampoline /home/elmarco/src/qemu/util/coroutine-ucontext.c:116:9
#4 0x7f0a8766ebbf (/lib64/libc.so.6+0x50bbf)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Fixes the following ASAN report:

Direct leak of 128 byte(s) in 8 object(s) allocated from:
#0 0x7fefce311850 in malloc (/lib64/libasan.so.4+0xde850)
#1 0x7fefcdd5ef0c in g_malloc ../glib/gmem.c:94
#2 0x559b976faff0 in create_ahci_io_test /home/elmarco/src/qemu/tests/ahci-test.c:1810

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180215212552.26997-6-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix the following ASAN reports:

==20125==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f0faea03a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
#1 0x7f0fae450f75 in g_malloc0 ../glib/gmem.c:124
#2 0x562fffd526fc in machine_start /home/elmarco/src/qemu/tests/sdhci-test.c:180

Indirect leak of 152 byte(s) in 1 object(s) allocated from:
#0 0x7f0faea03850 in malloc (/lib64/libasan.so.4+0xde850)
#1 0x7f0fae450f0c in g_malloc ../glib/gmem.c:94
#2 0x562fffd5d21d in qpci_init_pc /home/elmarco/src/qemu/tests/libqos/pci-pc.c:122

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180215212552.26997-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The way we determine if we can start the incoming migration was changed to use migration_has_all_channels() in:

commit 428d890
Author: Juan Quintela <quintela@redhat.com>
Date: Mon Jul 24 13:06:25 2017 +0200

    migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created() which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug...

...activated in a following commit where that return result ends up acting as the flag to indicate whether it is possible to start processing the migration:

commit 36c2f8b
Author: Juan Quintela <quintela@redhat.com>
Date: Wed Mar 7 08:40:52 2018 +0100

    migration: Delay start of migration main routines

This means that if channel initialization fails with normal migration, it'll never notice and attempt to start the incoming migration regardless and crash on a NULL pointer.

This can be seen, for example, if a client connects to a server requiring TLS, but has an invalid x509 certificate:

qemu-system-x86_64: The certificate hasn't got a known issuer
qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.

#0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6
#1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6
#2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3 0x00007fffebd1d692 in () at /lib64/libc.so.6
#4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
#5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
#6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
#7 0x0000000000000000 in ()

To handle the non-multifd case, we check whether mis->from_src_file is non-NULL. With this in place, the migration server drops the rejected client and stays around waiting for another, hopefully valid, client to arrive.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180619163552.18206-1-berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
With a Spice port chardev, it is possible to reenter monitor_qapi_event_queue() (when the client disconnects for example). This will dead-lock on monitor_lock. Instead, use some TLS variables to check for recursion and queue the events. Fixes: (gdb) bt #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 qemu#2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 qemu#3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 qemu#4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 qemu#5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 qemu#6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 qemu#7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 qemu#8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 qemu#9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 qemu#10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 qemu#11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 qemu#12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 qemu#13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 qemu#14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 qemu#15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 qemu#16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 qemu#17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 qemu#18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 qemu#19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 qemu#20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 qemu#21 0x0000563302fa6334 in 
monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 qemu#22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 qemu#23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 qemu#24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 qemu#25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 qemu#26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 qemu#27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 qemu#28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 qemu#29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 qemu#30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 Note that error report is now moved to the first caller, which may receive an error for a recursed event. This is probably fine (95% of callers use &error_abort, the rest have NULL error and ignore it) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180731150144.14022-1-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [*_no_recurse renamed to *_no_reenter, local variables reordered] Signed-off-by: Markus Armbruster <armbru@redhat.com>
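A rough sketch of the TLS-based reentrancy guard described above, with illustrative names (the actual patch renames the original function to monitor_qapi_event_queue_no_reenter and differs in details such as error reporting):

/* The original implementation, renamed; emits while holding monitor_lock. */
static void monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict);

typedef struct MonQAPIEventPending {
    QAPIEvent event;
    QDict *qdict;
    QSLIST_ENTRY(MonQAPIEventPending) entry;
} MonQAPIEventPending;

static __thread bool monitor_event_in_progress;
static __thread QSLIST_HEAD(, MonQAPIEventPending) monitor_event_pending
    = QSLIST_HEAD_INITIALIZER(monitor_event_pending);

void monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
{
    /* errp handling elided; see the note about error reporting above. */
    if (monitor_event_in_progress) {
        /* Re-entered (e.g. from a Spice chardev callback while emitting):
         * queue the event instead of taking monitor_lock again. */
        MonQAPIEventPending *p = g_new0(MonQAPIEventPending, 1);
        p->event = event;
        p->qdict = qobject_ref(qdict);
        QSLIST_INSERT_HEAD(&monitor_event_pending, p, entry);
        return;
    }

    monitor_event_in_progress = true;
    monitor_qapi_event_queue_no_reenter(event, qdict);

    /* Drain anything queued by recursive calls made while emitting. */
    while (!QSLIST_EMPTY(&monitor_event_pending)) {
        MonQAPIEventPending *p = QSLIST_FIRST(&monitor_event_pending);
        QSLIST_REMOVE_HEAD(&monitor_event_pending, entry);
        monitor_qapi_event_queue_no_reenter(p->event, p->qdict);
        qobject_unref(p->qdict);
        g_free(p);
    }
    monitor_event_in_progress = false;
}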
Spotted by ASAN: ================================================================= ==27907==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4120 byte(s) in 1 object(s) allocated from: #0 0x7f913458ce50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f9133fd641d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) qemu#2 0x5561c6643c95 in qdict_crumple_test_recursive /home/elmarco/src/qq/tests/check-block-qdict.c:438 qemu#3 0x7f9133ff7c49 (/lib64/libglib-2.0.so.0+0x73c49) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180809114417.28718-2-marcandre.lureau@redhat.com> [Screwed up in commit 2860b2b] Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
Spotted by ASAN, during make check... Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f8e27262c48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7f8e26a5f3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) qemu#2 0x555ab67078a8 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:67 qemu#3 0x555ab67071e4 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:24 qemu#4 0x555ab6713fbf in qstring_from_escaped_str /home/elmarco/src/qq/qobject/json-parser.c:144 qemu#5 0x555ab671738c in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:506 qemu#6 0x555ab67179c3 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:569 qemu#7 0x555ab6715123 in parse_pair /home/elmarco/src/qq/qobject/json-parser.c:306 qemu#8 0x555ab6715483 in parse_object /home/elmarco/src/qq/qobject/json-parser.c:357 qemu#9 0x555ab671798b in parse_value /home/elmarco/src/qq/qobject/json-parser.c:561 qemu#10 0x555ab6717a6b in json_parser_parse_err /home/elmarco/src/qq/qobject/json-parser.c:592 qemu#11 0x555ab4fd4dcf in handle_qmp_command /home/elmarco/src/qq/monitor.c:4257 qemu#12 0x555ab6712c4d in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105 qemu#13 0x555ab67e01e2 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323 qemu#14 0x555ab67e0af6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373 qemu#15 0x555ab6713010 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124 qemu#16 0x555ab4fd58ec in monitor_qmp_read /home/elmarco/src/qq/monitor.c:4337 qemu#17 0x555ab6559df2 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175 qemu#18 0x555ab6559e95 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187 qemu#19 0x555ab6560127 in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66 qemu#20 0x555ab65d9c73 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84 qemu#21 0x7f8e26a598ac in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4c8ac) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180809114417.28718-4-marcandre.lureau@redhat.com> [Screwed up in commit b273145] Cc: qemu-stable@nongnu.org Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
The ast2500 SDRAM training routine busy waits on the 'init cycle busy state' bit in DDR PHY Control/Status register #1 (MCR60). This ensures the bit always reads zero, and allows training to complete with upstream u-boot on the ast2500-evb. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180807075757.7242-5-joel@jms.id.au Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination QEMU crashes. The backtrace is: (gdb) bt #0 0x0000000000000000 in ?? () #1 0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080, io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c: qemu#2 0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438 qemu#3 0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47 qemu#4 0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327 at migration/qemu-file-channel.c:83 qemu#5 0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299 qemu#6 0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562 qemu#7 0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575 qemu#8 0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655 qemu#9 0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126 qemu#10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366 qemu#11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1 qemu#12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6 qemu#13 0x00007f96fe858760 in ?? () qemu#14 0x0000000000000000 in ?? () The RDMA QIOChannel does not implement io_set_aio_fd_handler, so qio_channel_set_aio_fd_handler dereferences a NULL pointer. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source QEMU crashes. The backtrace is: (gdb) bt #0 0x00007fb20aba91d7 in raise () from /lib64/libc.so.6 #1 0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6 qemu#2 0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6 qemu#3 0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6 qemu#4 0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460 qemu#5 0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768) at migration/qemu-file-channel.c:83 qemu#6 0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299 qemu#7 0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562 qemu#8 0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575 qemu#9 0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647 qemu#10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794 qemu#11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504 qemu#12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0 qemu#13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6 This patch fixes it by invoking qio_channel_yield only when qemu_in_coroutine() is true. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
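A sketch of the guard this describes, modeled on the channel_get_buffer() frame in the backtrace; treat it as illustrative rather than the exact patch (the fallback via qio_channel_wait() is an assumption):

static ssize_t channel_get_buffer(void *opaque, uint8_t *buf,
                                  int64_t pos, size_t size)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);
    ssize_t ret;

    do {
        ret = qio_channel_read(ioc, (char *)buf, size, NULL);
        if (ret == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                /* Yielding is only legal from coroutine context. */
                qio_channel_yield(ioc, G_IO_IN);
            } else {
                /* Threads such as the return path block instead. */
                qio_channel_wait(ioc, G_IO_IN);
            }
        } else if (ret < 0) {
            return -EIO;
        }
    } while (ret == QIO_CHANNEL_ERR_BLOCK);

    return ret;
}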
Because the RDMA QIOChannel does not implement a shutdown function, if to_dst_file has an error set, the return path thread will wait forever, and the migration thread will wait for the return path thread to exit. The backtrace of the return path thread is: (gdb) bt #0 0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6 #1 0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000) at qemu-timer.c:325 qemu#2 0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000) at migration/rdma.c:1501 qemu#3 0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000, byte_len=0x7ef7091d0640) at migration/rdma.c:1580 qemu#4 0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000, head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726 qemu#5 0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720, expecting=3) at migration/rdma.c:1903 qemu#6 0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8, size=32768) at migration/rdma.c:2714 qemu#7 0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232 qemu#8 0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0) at migration/qemu-file.c:502 qemu#9 0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515 qemu#10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591 qemu#11 0x00000000006a46d3 in source_return_path_thread ( opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331 qemu#12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 qemu#13 0x00007f372a77635d in clone () from /lib64/libc.so.6 The backtrace of the migration thread is: (gdb) bt #0 0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0 #1 0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>) at util/qemu-thread-posix.c:504 qemu#2 0x00000000006a4bc5 in await_return_path_close_on_source ( ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460 qemu#3 0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>, current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980) at migration/migration.c:1695 qemu#4 0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1837 qemu#5 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 qemu#6 0x00007f372a77635d in clone () from /lib64/libc.so.6 Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
tests/cdrom-test -p /x86_64/cdrom/boot/megasas Produces the following ASAN leak. ==25700==ERROR: LeakSanitizer: detected memory leaks Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) qemu#2 0x55a729f17738 in pci_dma_sglist_init /home/elmarco/src/qq/include/hw/pci/pci.h:818 qemu#3 0x55a729f2a706 in megasas_map_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:698 qemu#4 0x55a729f39421 in megasas_handle_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:1574 qemu#5 0x55a729f3f70d in megasas_handle_frame /home/elmarco/src/qq/hw/scsi/megasas.c:1955 qemu#6 0x55a729f40939 in megasas_mmio_write /home/elmarco/src/qq/hw/scsi/megasas.c:2119 qemu#7 0x55a729f41102 in megasas_port_write /home/elmarco/src/qq/hw/scsi/megasas.c:2170 qemu#8 0x55a729220e60 in memory_region_write_accessor /home/elmarco/src/qq/memory.c:527 qemu#9 0x55a7292212b3 in access_with_adjusted_size /home/elmarco/src/qq/memory.c:594 qemu#10 0x55a72922cf70 in memory_region_dispatch_write /home/elmarco/src/qq/memory.c:1473 qemu#11 0x55a7290f5907 in flatview_write_continue /home/elmarco/src/qq/exec.c:3255 qemu#12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294 qemu#13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384 qemu#14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395 qemu#15 0x55a72929ecb0 in kvm_handle_io /home/elmarco/src/qq/accel/kvm/kvm-all.c:1729 qemu#16 0x55a7292a0db5 in kvm_cpu_exec /home/elmarco/src/qq/accel/kvm/kvm-all.c:1969 qemu#17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn /home/elmarco/src/qq/cpus.c:1215 qemu#18 0x55a72a966a6c in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:504 qemu#19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593) Move the qemu_sglist_destroy() from megasas_complete_command() to megasas_unmap_frame(), so map/unmap are balanced. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180814141247.32336-1-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Spotted by ASAN: ================================================================= ==5378==ERROR: LeakSanitizer: detected memory leaks Direct leak of 65536 byte(s) in 1 object(s) allocated from: #0 0x7f788f83bc48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7f788c9923c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) qemu#2 0x5622a1fe37bc in coroutine_trampoline /home/elmarco/src/qq/util/coroutine-ucontext.c:116 qemu#3 0x7f788a15d75f in __correctly_grouped_prefixwc (/lib64/libc.so.6+0x4c75f) (Broken in commit 4c8158e.) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20180809114417.28718-3-marcandre.lureau@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
We need to set cs->halted to 1 before calling ppc_set_compat. The reason is that ppc_set_compat kicks up the new thread created to manage the hotplugged KVM virtual CPU and the code drives directly to KVM_RUN ioctl. When cs->halted is 1, the code: int kvm_cpu_exec(CPUState *cpu) ... if (kvm_arch_process_async_events(cpu)) { atomic_set(&cpu->exit_request, 0); return EXCP_HLT; } ... returns before it reaches KVM_RUN, giving time to the main thread to finish its job. Otherwise we can fall in a deadlock because the KVM thread will issue the KVM_RUN ioctl while the main thread is setting up KVM registers. Depending on how these jobs are scheduled we'll end up freezing QEMU. The following output shows kvm_vcpu_ioctl sleeping because it cannot get the mutex and never will. PS: kvm_vcpu_ioctl was triggered kvm_set_one_reg - compat_pvr. STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL PID: 61564 TASK: c000003e981e0780 CPU: 48 COMMAND: "qemu-system-ppc" #0 [c000003e982679a0] __schedule at c000000000b10a44 #1 [c000003e98267a60] schedule at c000000000b113a8 qemu#2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910 qemu#3 [c000003e98267ab0] __mutex_lock at c000000000b132ec qemu#4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm] qemu#5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30 qemu#6 [c000003e98267dc0] ksys_ioctl at c000000000408674 qemu#7 [c000003e98267e10] sys_ioctl at c0000000004086f8 qemu#8 [c000003e98267e30] system_call at c00000000000b488 crash> struct -x kvm.vcpus 0xc000003da0000000 vcpus = {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc000003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...} crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000 mutex.owner = { counter = 0xc000003a23a5c881 <- flag 1: waiters }, crash> bt 0xc000003a23a5c880 PID: 61579 TASK: c000003a23a5c880 CPU: 9 COMMAND: "CPU 4/KVM" (active) crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000 mutex.wait_list = { next = 0xc000003e98267b10, prev = 0xc000003e98267b10 }, crash> struct -x mutex_waiter.task 0xc000003e98267b10 task = 0xc000003e981e0780 The following command-line was used to reproduce the problem (note: gdb and trace can change the results). $ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \ -enable-kvm -m 4096 \ -smp 4,maxcpus=8,sockets=1,cores=2,threads=4 \ -display none -nographic \ -drive file=disk1.qcow2,format=qcow2 ... (qemu) device_add host-spapr-cpu-core,core-id=4 [no interaction is possible after it, only SIGKILL to take the terminal back] Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
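A sketch of the ordering fix; the hotplug helper name here is hypothetical, only the cs->halted-before-ppc_set_compat ordering is the point:

/* Hypothetical helper name; the real change lives in the spapr CPU
 * hotplug path. */
static void spapr_hotplug_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr,
                                     Error **errp)
{
    CPUState *cs = CPU(cpu);

    /* Mark the new vCPU halted *before* ppc_set_compat() kicks its KVM
     * thread, so kvm_cpu_exec() returns EXCP_HLT instead of racing into
     * KVM_RUN while the main thread is still setting registers. */
    cs->halted = 1;
    ppc_set_compat(cpu, compat_pvr, errp);
}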
Spotted by ASAN doing some manual testing: Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f5fcdc75e50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f5fcd47241d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) qemu#2 0x55f989be92ce in timer_new /home/elmarco/src/qq/include/qemu/timer.h:561 qemu#3 0x55f989be92ff in timer_new_ms /home/elmarco/src/qq/include/qemu/timer.h:630 qemu#4 0x55f989c0219d in hmp_migrate /home/elmarco/src/qq/hmp.c:2038 qemu#5 0x55f98955927b in handle_hmp_command /home/elmarco/src/qq/monitor.c:3498 qemu#6 0x55f98955fb8c in monitor_command_cb /home/elmarco/src/qq/monitor.c:4371 qemu#7 0x55f98ad40f11 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:393 qemu#8 0x55f98955fa4f in monitor_read /home/elmarco/src/qq/monitor.c:4354 qemu#9 0x55f98aae30d7 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175 qemu#10 0x55f98aae317a in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187 qemu#11 0x55f98aae940c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66 qemu#12 0x55f98ab63018 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84 qemu#13 0x7f5fcd46c8ac in g_main_dispatch gmain.c:3177 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180901134652.25884-1-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
In qemu_laio_process_completions_and_submit, the AioContext is acquired before the ioq_submit iteration and after qemu_laio_process_completions, but the latter is not thread safe either. This change avoids a number of random crashes when the Main Thread and an IO Thread collide processing completions for the same AioContext. This is an example of such crash: - The IO Thread is trying to acquire the AioContext at aio_co_enter, which evidences that it didn't lock it before: Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)): #0 0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0 qemu#2 0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0) at ../nptl/pthread_mutex_lock.c:114 qemu#3 0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66 qemu#4 0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493 qemu#5 0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478 qemu#6 0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104 qemu#7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 qemu#8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 qemu#9 0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406 qemu#10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true) at util/aio-posix.c:693 qemu#11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64 qemu#12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308 qemu#13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 - The Main Thread is also processing completions from the same AioContext, and crashes due to failed assertion at util/iov.c:78: Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)): #0 0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90 qemu#2 0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:92 qemu#3 0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101 qemu#4 0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78 qemu#5 0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410 qemu#6 0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88 qemu#7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 qemu#8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) 
at block/linux-aio.c:237 qemu#9 0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272 qemu#10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497 qemu#11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574 qemu#12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false) at util/aio-posix.c:604 qemu#13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273 qemu#14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390 qemu#15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590 qemu#16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774 qemu#17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401 qemu#18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449 qemu#19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080) at block/commit.c:92 qemu#20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973 qemu#21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90 qemu#22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118 qemu#23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436 qemu#24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261 qemu#25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201 qemu#26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854 qemu#27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215 qemu#28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238 qemu#29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497 qemu#30 0x00005631fbd81412 in main_loop () at vl.c:1866 qemu#31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4647 - A closer examination shows that s->io_q.in_flight appears to have gone backwards: (gdb) frame 7 qemu#7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 222 qemu_laio_process_completion(laiocb); (gdb) p s $2 = (LinuxAioState *) 0x7fdfc0297670 (gdb) p *s $3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0, in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0, sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241} (gdb) p/x s->io_q.in_flight $4 = 0xfffffff0 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
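A sketch of qemu_laio_process_completions_and_submit() with the widened critical section this change describes (field names taken from the gdb dump above; not necessarily the verbatim patch):

static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
{
    aio_context_acquire(s->aio_context);
    /* Completion processing is not thread safe either, so keep it under
     * the AioContext lock together with the submission loop below. */
    qemu_laio_process_completions(s);

    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
        ioq_submit(s);
    }
    aio_context_release(s->aio_context);
}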
Spotted by ASAN while running: $ tests/migration-test -p /x86_64/migration/postcopy/recovery ================================================================= ==18034==ERROR: LeakSanitizer: detected memory leaks Direct leak of 33864 byte(s) in 1 object(s) allocated from: #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) qemu#2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183 qemu#3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159 qemu#4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78 qemu#5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295 qemu#6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111 qemu#7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91 qemu#8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108 qemu#9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158 qemu#10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89 qemu#11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Recently, the test case has started failing because some job related functions want to drop the AioContext lock even though it hasn't been taken: (gdb) bt #0 0x00007f51c067c9fb in raise () from /lib64/libc.so.6 #1 0x00007f51c067e77d in abort () from /lib64/libc.so.6 qemu#2 0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36 qemu#3 0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 qemu#4 0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516 qemu#5 0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738 qemu#6 0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986 qemu#7 0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941 qemu#8 0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148 qemu#9 0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420 qemu#10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629 qemu#11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685 qemu#12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783 qemu#13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402 qemu#14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457 qemu#15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478 qemu#16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 qemu#17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 qemu#18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 qemu#19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0 qemu#20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0 qemu#21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581 It is yet unclear whether this should really be considered a bug in the test case or whether blk_unref() should work for callers that haven't taken the AioContext lock, but in order to fix the build tests quickly, just take the AioContext lock around blk_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
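A sketch of the quick fix in the test's teardown, assuming the BlockBackend's own context is the one the job code wants to drop (blk_get_aio_context() is an existing helper, the surrounding locals are the test's):

/* In test_secondary_stop() teardown: */
AioContext *ctx = blk_get_aio_context(blk);

aio_context_acquire(ctx);
blk_unref(blk);              /* may cancel the replication job, which
                                releases/re-acquires the context */
aio_context_release(ctx);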
A socket chardev may not have an associated address (for example, when a client fd is added manually). But on disconnect, updating the socket filename expects an address and may lead to this crash: Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388 388 switch (addr->type) { (gdb) bt #0 0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388 #1 0x0000555555d8c8aa in update_disconnected_filename (s=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:419 qemu#2 0x0000555555d8c959 in tcp_chr_disconnect (chr=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:438 qemu#3 0x0000555555d8cba1 in tcp_chr_hup (channel=0x555556b75690, cond=G_IO_HUP, opaque=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:482 qemu#4 0x0000555555da596e in qio_channel_fd_source_dispatch (source=0x555556bb68b0, callback=0x555555d8cb58 <tcp_chr_hup>, user_data=0x555556b1ed00) at /home/elmarco/src/qq/io/channel-watch.c:84 Replace the filename with a generic "disconnected:socket" in this case. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
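A sketch of the fallback in update_disconnected_filename(); struct and field names follow the backtrace and may not match the actual code exactly:

static void update_disconnected_filename(SocketChardev *s)
{
    Chardev *chr = CHARDEV(s);

    g_free(chr->filename);
    if (s->addr) {
        chr->filename = SocketAddress_to_str("disconnected:", s->addr,
                                             s->is_listen, s->is_telnet);
    } else {
        /* No address (client fd was added manually): use a generic name. */
        chr->filename = g_strdup("disconnected:socket");
    }
}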
Spotted by ASAN: ================================================================= ==11893==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1120 byte(s) in 28 object(s) allocated from: #0 0x7fd0515b0c48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7fd050ffa3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) qemu#2 0x559e708b56a4 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:66 qemu#3 0x559e708b4fe0 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:23 qemu#4 0x559e708bda7d in parse_string /home/elmarco/src/qq/qobject/json-parser.c:143 qemu#5 0x559e708c1009 in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:484 qemu#6 0x559e708c1627 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:547 qemu#7 0x559e708c1c67 in json_parser_parse /home/elmarco/src/qq/qobject/json-parser.c:573 qemu#8 0x559e708bc0ff in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:92 qemu#9 0x559e708d1655 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:292 qemu#10 0x559e708d1fe1 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:339 qemu#11 0x559e708bc856 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:121 qemu#12 0x559e708b8b4b in qobject_from_jsonv /home/elmarco/src/qq/qobject/qjson.c:69 qemu#13 0x559e708b8d02 in qobject_from_json /home/elmarco/src/qq/qobject/qjson.c:83 qemu#14 0x559e708a74ae in from_json_str /home/elmarco/src/qq/tests/check-qjson.c:30 qemu#15 0x559e708a9f83 in utf8_string /home/elmarco/src/qq/tests/check-qjson.c:781 qemu#16 0x7fd05101bc49 in test_case_run gtestutils.c:2255 qemu#17 0x7fd05101bc49 in g_test_run_suite_internal gtestutils.c:2339 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180901211917.10372-1-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
A quick coredump on an incomplete command line: ./x86_64-softmmu/qemu-system-x86_64 -mon mode=control,pretty=on #0 0x00007ffff723d9e4 in g_str_hash () at /lib64/libglib-2.0.so.0 #1 0x00007ffff723ce38 in g_hash_table_lookup () at /lib64/libglib-2.0.so.0 qemu#2 0x0000555555cc0073 in object_class_property_find (klass=0x5555566a94b0, name=0x0, errp=0x0) at qom/object.c:1135 qemu#3 0x0000555555cc004b in object_class_property_find (klass=0x5555566a9440, name=0x0, errp=0x0) at qom/object.c:1129 qemu#4 0x0000555555cbfe6e in object_property_find (obj=0x5555568348c0, name=0x0, errp=0x0) at qom/object.c:1080 qemu#5 0x0000555555cc183d in object_resolve_path_component (parent=0x5555568348c0, part=0x0) at qom/object.c:1762 qemu#6 0x0000555555d82071 in qemu_chr_find (name=0x0) at chardev/char.c:802 qemu#7 0x00005555559d77cb in mon_init_func (opaque=0x0, opts=0x5555566b65a0, errp=0x0) at vl.c:2291 Fix it to instead fail gracefully. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20181023213600.364086-1-eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
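A sketch of the graceful check in mon_init_func(); the option name and error wording are assumptions, and the exact handling in vl.c may differ:

const char *chardev = qemu_opt_get(opts, "chardev");

if (!chardev) {
    /* Assumed wording; the point is to fail cleanly instead of passing
     * a NULL name to qemu_chr_find(). */
    error_setg(errp, "chardev is required");
    return -1;
}
chr = qemu_chr_find(chardev);
if (!chr) {
    error_setg(errp, "chardev \"%s\" not found", chardev);
    return -1;
}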
When using the 9P2000.u version of the protocol, the following shell command line in the guest can cause QEMU to crash: while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done With 9P2000.u, file renaming is handled by the WSTAT command. The v9fs_wstat() function calls v9fs_complete_rename(), which calls v9fs_fix_path() for every fid whose path is affected by the change. The involved calls to v9fs_path_copy() may race with any other access to the fid path performed by some worker thread, causing a crash like shown below: Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 59 while (*path && fd != -1) { (gdb) bt #0 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 #1 0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8, path=0x0) at hw/9pfs/9p-local.c:92 qemu#2 0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8, fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185 qemu#3 0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498, path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53 qemu#4 0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498) at hw/9pfs/9p.c:1083 qemu#5 0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767) at util/coroutine-ucontext.c:116 qemu#6 0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6 qemu#7 0x0000000000000000 in () (gdb) The fix is to take the path write lock when calling v9fs_complete_rename(), like in v9fs_rename(). Impact: DoS triggered by unprivileged guest users. Fixes: CVE-2018-19489 Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>
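A sketch of the locking described above, inside the rename path of v9fs_wstat(); the surrounding locals and the size check are assumed context, and the lock helpers mirror what v9fs_rename() already uses:

if (v9stat.name.size != 0) {
    /* Take the path write lock so no worker thread reads a fid path
     * while v9fs_fix_path() rewrites it. */
    v9fs_path_write_lock(s);
    err = v9fs_complete_rename(pdu, fidp, -1, &v9stat.name);
    v9fs_path_unlock(s);
    if (err < 0) {
        goto out;
    }
}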
After iov_discard_front(), the iov may be smaller than its initial size. Fixes the heap-buffer-overflow spotted by ASAN: ==9036==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060000001e0 at pc 0x7fe632eca3f0 bp 0x7ffddc4a05a0 sp 0x7ffddc49fd48 WRITE of size 32 at 0x6060000001e0 thread T0 #0 0x7fe632eca3ef (/lib64/libasan.so.5+0x773ef) #1 0x7fe632ecad23 in __interceptor_recvmsg (/lib64/libasan.so.5+0x77d23) qemu#2 0x561e7491936b in vubr_backend_recv_cb /home/elmarco/src/qemu/tests/vhost-user-bridge.c:333 qemu#3 0x561e74917711 in dispatcher_wait /home/elmarco/src/qemu/tests/vhost-user-bridge.c:160 qemu#4 0x561e7491c3b5 in vubr_run /home/elmarco/src/qemu/tests/vhost-user-bridge.c:725 qemu#5 0x561e7491c85c in main /home/elmarco/src/qemu/tests/vhost-user-bridge.c:806 qemu#6 0x7fe631a6c412 in __libc_start_main (/lib64/libc.so.6+0x24412) qemu#7 0x561e7491667d in _start (/home/elmarco/src/qemu/build/tests/vhost-user-bridge+0x3967d) 0x6060000001e0 is located 0 bytes to the right of 64-byte region [0x6060000001a0,0x6060000001e0) allocated by thread T0 here: #0 0x7fe632f42848 in __interceptor_malloc (/lib64/libasan.so.5+0xef848) #1 0x561e7493acd8 in virtqueue_alloc_element /home/elmarco/src/qemu/contrib/libvhost-user/libvhost-user.c:1848 qemu#2 0x561e7493c2a8 in vu_queue_pop /home/elmarco/src/qemu/contrib/libvhost-user/libvhost-user.c:1954 qemu#3 0x561e749189bf in vubr_backend_recv_cb /home/elmarco/src/qemu/tests/vhost-user-bridge.c:297 qemu#4 0x561e74917711 in dispatcher_wait /home/elmarco/src/qemu/tests/vhost-user-bridge.c:160 qemu#5 0x561e7491c3b5 in vubr_run /home/elmarco/src/qemu/tests/vhost-user-bridge.c:725 qemu#6 0x561e7491c85c in main /home/elmarco/src/qemu/tests/vhost-user-bridge.c:806 qemu#7 0x7fe631a6c412 in __libc_start_main (/lib64/libc.so.6+0x24412) SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib64/libasan.so.5+0x773ef) Shadow bytes around the buggy address: 0x0c0c7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c0c7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c0c7fff8000: fa fa fa fa 00 00 00 00 00 00 05 fa fa fa fa fa 0x0c0c7fff8010: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd 0x0c0c7fff8020: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd =>0x0c0c7fff8030: fa fa fa fa 00 00 00 00 00 00 00 00[fa]fa fa fa 0x0c0c7fff8040: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd 0x0c0c7fff8050: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd 0x0c0c7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c0c7fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c0c7fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20181109173028.3372-1-marcandre.lureau@redhat.com> Signed-off-by: Paolo BOnzini <pbonzini@redhat.com>
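A sketch of the corrected pattern in vubr_backend_recv_cb(): reuse the iovec pointer and count that iov_discard_front() updates instead of the element's original ones (hdrlen, vubr and elem are assumed locals from the test):

struct iovec *sg = elem->in_sg;
unsigned int num = elem->in_num;
struct msghdr msg = { 0 };
ssize_t ret;

/* Skip the space reserved for the virtio-net header; this may shrink
 * both the iovec array and its element count. */
iov_discard_front(&sg, &num, hdrlen);

msg.msg_iov = sg;
msg.msg_iovlen = num;        /* the updated count, not elem->in_num */
ret = recvmsg(vubr->backend_udp_sock, &msg, 0);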
Let's start from the beginning: Commit b9e413d (in 2.9) "block: explicitly acquire aiocontext in aio callbacks that need it" added pairs of aio_context_acquire/release to mirror_write_complete and mirror_read_complete, when they were aio callbacks for blk_aio_* calls. Then, commit 2e1990b (in 3.0) "block/mirror: Convert to coroutines" dropped these blk_aio_* calls, so mirror_write_complete and mirror_read_complete are no longer callbacks and don't need the additional aiocontext acquisition. Furthermore, mirror_read_complete calls blk_co_pwritev inside this pair of aio_context_acquire/release, which leads to the following dead-lock with mirror: (gdb) info thr Id Target Id Frame 3 Thread (LWP 145412) "qemu-system-x86" syscall () 2 Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait () * 1 Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait () (gdb) bt #0 __lll_lock_wait () #1 _L_lock_812 () qemu#2 __GI___pthread_mutex_lock qemu#3 qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66 qemu#4 qemu_mutex_lock_iothread_impl qemu#5 os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236 qemu#6 main_loop_wait (nonblocking=0) at util/main-loop.c:497 qemu#7 main_loop () at vl.c:1892 qemu#8 main Printing contents of qemu_global_mutex, I see that "__owner = 145416", so thr1 is the main loop, and now it wants the BQL, which is owned by thr2. (gdb) thr 2 (gdb) bt #0 __lll_lock_wait () #1 _L_lock_870 () qemu#2 __GI___pthread_mutex_lock qemu#3 qemu_mutex_lock_impl (mutex=0x561034d25dc0, ... qemu#4 aio_context_acquire (ctx=0x561034d25d60) qemu#5 dma_blk_cb qemu#6 dma_blk_io qemu#7 dma_blk_read qemu#8 ide_dma_cb qemu#9 bmdma_cmd_writeb qemu#10 bmdma_write qemu#11 memory_region_write_accessor qemu#12 access_with_adjusted_size qemu#15 flatview_write qemu#16 address_space_write qemu#17 address_space_rw qemu#18 kvm_handle_io qemu#19 kvm_cpu_exec qemu#20 qemu_kvm_cpu_thread_fn qemu#21 qemu_thread_start qemu#22 start_thread qemu#23 clone () Printing the mutex in fr 2, I see "__owner = 145411", so thr2 wants the aio context mutex, which is owned by thr1. Classic dead-lock. Then, let's check that the aio context is held by the mirror coroutine: just print the coroutine stack of the first tracked request in the mirror job target: (gdb) [...] (gdb) qemu coroutine 0x561035dd0860 #0 qemu_coroutine_switch #1 qemu_coroutine_yield qemu#2 qemu_co_mutex_lock_slowpath qemu#3 qemu_co_mutex_lock qemu#4 qcow2_co_pwritev qemu#5 bdrv_driver_pwritev qemu#6 bdrv_aligned_pwritev qemu#7 bdrv_co_pwritev qemu#8 blk_co_pwritev qemu#9 mirror_read_complete () at block/mirror.c:232 qemu#10 mirror_co_read () at block/mirror.c:370 qemu#11 coroutine_trampoline qemu#12 __start_context Yes, it is mirror_read_complete calling blk_co_pwritev after acquiring the aio context. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
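For reference, a simplified sketch of mirror_read_complete() after dropping the acquire/release pair; error handling is elided and the MirrorOp fields are assumed from context, so this is not the verbatim patch:

static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
{
    MirrorBlockJob *s = op->s;

    if (ret < 0) {
        mirror_iteration_done(op, ret);
        return;
    }

    /* No aio_context_acquire()/release() here: this runs as part of the
     * mirror coroutine, not as an aio callback, and blk_co_pwritev() may
     * yield. */
    ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov, 0);
    mirror_write_complete(op, ret);
}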
When a monitor is connected to a Spice chardev, the monitor cleanup can dead-lock: #0 0x00007f43446637fd in __lll_lock_wait () at /lib64/libpthread.so.0 #1 0x00007f434465ccf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 qemu#2 0x0000556dd79f22ba in qemu_mutex_lock_impl (mutex=0x556dd81c9220 <monitor_lock>, file=0x556dd7ae3648 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 qemu#3 0x0000556dd7431bd5 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x556dd9abc850, errp=0x7fffb7bbddd8) at /home/elmarco/src/qq/monitor.c:645 qemu#4 0x0000556dd79d476b in qapi_event_send_spice_disconnected (server=0x556dd98ee760, client=0x556ddaaa8560, errp=0x556dd82180d0 <error_abort>) at qapi/qapi-events-ui.c:149 qemu#5 0x0000556dd7870fc1 in channel_event (event=3, info=0x556ddad1b590) at /home/elmarco/src/qq/ui/spice-core.c:235 qemu#6 0x00007f434560a6bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x556ddad1b590) at reds.c:316 qemu#7 0x00007f43455f393b in main_dispatcher_self_handle_channel_event (info=0x556ddad1b590, event=3, self=0x556dd9a7d8c0) at main-dispatcher.c:197 qemu#8 0x00007f43455f393b in main_dispatcher_channel_event (self=0x556dd9a7d8c0, event=event@entry=3, info=0x556ddad1b590) at main-dispatcher.c:197 qemu#9 0x00007f4345612833 in red_stream_push_channel_event (s=s@entry=0x556ddae2ef40, event=event@entry=3) at red-stream.c:414 qemu#10 0x00007f434561286b in red_stream_free (s=0x556ddae2ef40) at red-stream.c:388 qemu#11 0x00007f43455f9ddc in red_channel_client_finalize (object=0x556dd9bb21a0) at red-channel-client.c:347 qemu#12 0x00007f434b5f9fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 qemu#13 0x00007f43455fc212 in red_channel_client_push (rcc=0x556dd9bb21a0) at red-channel-client.c:1341 qemu#14 0x0000556dd76081ba in spice_port_set_fe_open (chr=0x556dd9925e20, fe_open=0) at /home/elmarco/src/qq/chardev/spice.c:241 qemu#15 0x0000556dd796d74a in qemu_chr_fe_set_open (be=0x556dd9a37c00, fe_open=0) at /home/elmarco/src/qq/chardev/char-fe.c:340 qemu#16 0x0000556dd796d4d9 in qemu_chr_fe_set_handlers (b=0x556dd9a37c00, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true) at /home/elmarco/src/qq/chardev/char-fe.c:280 qemu#17 0x0000556dd796d359 in qemu_chr_fe_deinit (b=0x556dd9a37c00, del=false) at /home/elmarco/src/qq/chardev/char-fe.c:233 qemu#18 0x0000556dd7432240 in monitor_data_destroy (mon=0x556dd9a37c00) at /home/elmarco/src/qq/monitor.c:786 qemu#19 0x0000556dd743b968 in monitor_cleanup () at /home/elmarco/src/qq/monitor.c:4683 qemu#20 0x0000556dd75ce776 in main (argc=3, argv=0x7fffb7bbe458, envp=0x7fffb7bbe478) at /home/elmarco/src/qq/vl.c:4660 Because spice code tries to emit a "disconnected" signal on the monitors. Fix this dead-lock by releasing the monitor lock for flush/destroy. monitor_lock protects mon_list, monitor_qapi_event_state and monitor_destroyed. monitor_flush() and monitor_data_destroy() don't access any of those variables. monitor_cleanup()'s loop is safe because it uses QTAILQ_FOREACH_SAFE(), and no further monitor can be added after calling monitor_cleanup() thanks to monitor_destroyed check in monitor_list_append(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181205203737.9011-8-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
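A sketch of monitor_cleanup() with the lock dropped around flush/destroy, following the reasoning above (simplified; names match the message, but the real function has more to tear down):

void monitor_cleanup(void)
{
    Monitor *mon, *next;

    qemu_mutex_lock(&monitor_lock);
    monitor_destroyed = true;     /* monitor_list_append() refuses new ones */
    QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
        QTAILQ_REMOVE(&mon_list, mon, entry);
        /* monitor_flush()/monitor_data_destroy() don't touch state guarded
         * by monitor_lock, but may emit QAPI events that need it. */
        qemu_mutex_unlock(&monitor_lock);
        monitor_flush(mon);
        monitor_data_destroy(mon);
        g_free(mon);
        qemu_mutex_lock(&monitor_lock);
    }
    qemu_mutex_unlock(&monitor_lock);
}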
Currently the log backend prints the process id of QEMU at the start of each output line, but since threads share the same PID there is no clear distinction between their outputs. Having the thread id present in the log makes it easier to see when output comes from different threads. E.g.: 12423@1538597569.672527:qemu_mutex_lock waiting on mutex 0x1103ee60 (/root/qemu/util/main-loop.c:236) ... 12430@1538597569.503928:qemu_mutex_unlock released mutex 0x1103ee60 (/root/qemu/cpus.c:1238) 12431@1538597569.503937:qemu_mutex_locked taken mutex 0x1103ee60 (/root/qemu/cpus.c:1257) ^here In the above, 12423 is the main process id and 12430 & 12431 are the two vcpu threads. (qemu) info cpus * CPU #0: thread_id=12430 CPU #1: thread_id=12431 Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The JSON parser happily accepts duplicate object member names. The last value wins. Reproducer #1: $ qemu-system-x86_64 -qmp stdio {"QMP": {"version": {"qemu": {"micro": 93, "minor": 0, "major": 3}, "package": "v3.1.0-rc3-7-g87a45d86ed"}, "capabilities": []}} {'execute':'qmp_capabilities'} {"return": {}} {'execute':'blockdev-add','arguments':{'driver':'null-co', 'node-name':'foo','node-name':'bar'}} {"return": {}} {'execute':'query-named-block-nodes'} {"return": [{ [...] "node-name": "bar" [...] }]} Reproducer qemu#2 is iotest 229. Fix the parser to reject duplicates, and fix iotest 229 not to use them. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181206121743.20762-1-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Trailing whitespace tidied up] Signed-off-by: Markus Armbruster <armbru@redhat.com>
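A sketch of the rejection at the member-insertion point of the parser (parse_pair() in qobject/json-parser.c), assuming its existing locals (dict, key, value, token) and helpers; the exact error plumbing may differ:

if (qdict_haskey(dict, qstring_get_str(key))) {
    /* Reject the duplicate instead of letting the last value win. */
    parse_error(ctxt, token, "duplicate key '%s' in object",
                qstring_get_str(key));
    goto out;
}
qdict_put_obj(dict, qstring_get_str(key), value);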
When it encounters an error, multifd_send_thread should always notify whoever is waiting on it before exiting. Otherwise it may block migration_thread at multifd_send_sync_main forever. The error is as follows: ------------------------------------------------------------------------------- (gdb) bt #0 0x00007f4d669dfa0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0 #1 0x00007f4d669dfa9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0 qemu#2 0x00007f4d669dfb3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0 qemu#3 0x0000562ccf0a5614 in qemu_sem_wait (sem=sem@entry=0x562cd1b698e8) at util/qemu-thread-posix.c:319 qemu#4 0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099 qemu#5 0x0000562ccecb95f4 in ram_save_iterate (f=0x562cd0ecc000, opaque=<optimized out>) at /qemu/migration/ram.c:3550 qemu#6 0x0000562ccef43c23 in qemu_savevm_state_iterate (f=0x562cd0ecc000, postcopy=false) at migration/savevm.c:1189 qemu#7 0x0000562ccef3dcf3 in migration_iteration_run (s=0x562cd09fabf0) at migration/migration.c:3131 qemu#8 migration_thread (opaque=opaque@entry=0x562cd09fabf0) at migration/migration.c:3258 qemu#9 0x0000562ccf0a4c26 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502 qemu#10 0x00007f4d669d9e25 in start_thread () from /lib64/libpthread.so.0 qemu#11 0x00007f4d6670635d in clone () from /lib64/libc.so.6 (gdb) f 4 qemu#4 0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099 1099 qemu_sem_wait(&p->sem_sync); (gdb) list 1094 } 1095 for (i = 0; i < migrate_multifd_channels(); i++) { 1096 MultiFDSendParams *p = &multifd_send_state->params[i]; 1097 1098 trace_multifd_send_sync_main_wait(p->id); 1099 qemu_sem_wait(&p->sem_sync); 1100 } 1101 trace_multifd_send_sync_main(multifd_send_state->packet_num); 1102 } 1103 (gdb) p i $1 = 0 (gdb) p multifd_send_state->params[0].pending_job $2 = 2 //It means the job before MULTIFD_FLAG_SYNC has already failed (gdb) p multifd_send_state->params[0].quit $3 = true Signed-off-by: Ivan Ren <ivanren@tencent.com> Message-Id: <1567044996-2362-1-git-send-email-ivanren@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
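A sketch of the error exit in multifd_send_thread() this implies: post the semaphores the migration thread waits on even when bailing out. The sem_sync field is visible in the gdb output above; the other names (channels_ready, running, multifd_send_terminate_threads) are assumed from context:

out:
    if (local_err) {
        multifd_send_terminate_threads(local_err);
    }
    /* Make sure multifd_send_sync_main() never waits for a dead thread. */
    qemu_sem_post(&p->sem_sync);
    qemu_sem_post(&multifd_send_state->channels_ready);

    qemu_mutex_lock(&p->mutex);
    p->running = false;
    qemu_mutex_unlock(&p->mutex);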
The 'blockdev-create' QMP command was introduced as experimental feature in commit b0292b8, using the assert() debug call. It got promoted to 'stable' command in 3fb588a, but the assert call was not removed. Some block drivers are optional, and bdrv_find_format() might return a NULL value, triggering the assertion. Stable code is not expected to abort, so return an error instead. This is easily reproducible when libnfs is not installed: ./configure [...] module support no Block whitelist (rw) Block whitelist (ro) libiscsi support yes libnfs support no [...] Start QEMU: $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait Send the 'blockdev-create' with the 'nfs' driver: $ ( cat << 'EOF' {'execute': 'qmp_capabilities'} {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} EOF ) | socat STDIO UNIX:/tmp/qemu.qmp {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}} {"return": {}} QEMU crashes: $ gdb qemu-system-x86_64 core Program received signal SIGSEGV, Segmentation fault. (gdb) bt #0 0x00007ffff510957f in raise () at /lib64/libc.so.6 #1 0x00007ffff50f3895 in abort () at /lib64/libc.so.6 qemu#2 0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 qemu#3 0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6 qemu#4 0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69 qemu#5 0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314 qemu#6 0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131 qemu#7 0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174 With this patch applied, QEMU returns a QMP error: {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}} Cc: qemu-stable@nongnu.org Reported-by: Xu Tian <xutian@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
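A sketch of the check replacing the assert in qmp_blockdev_create(), matching the error text shown in the transcript (the fmt/drv locals are assumed from context):

drv = bdrv_find_format(fmt);
if (!drv) {
    /* Optional driver not compiled in: report it instead of aborting. */
    error_setg(errp, "Block driver '%s' not found or not supported", fmt);
    return;
}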
…uid. without kvm commit 412a3c41, CPUID(EAX=0xd,ECX=0).EBX always equal to 0 even through guest update xcr0, this will crash legacy guest(e.g., CentOS 6). Below is the call trace on the guest. [ 0.000000] kernel BUG at mm/bootmem.c:469! [ 0.000000] invalid opcode: 0000 [#1] SMP [ 0.000000] last sysfs file: [ 0.000000] CPU 0 [ 0.000000] Modules linked in: [ 0.000000] [ 0.000000] Pid: 0, comm: swapper Tainted: G --------------- H 2.6.32-279#2 Red Hat KVM [ 0.000000] RIP: 0010:[<ffffffff81c4edc4>] [<ffffffff81c4edc4>] alloc_bootmem_core+0x7b/0x29e [ 0.000000] RSP: 0018:ffffffff81a01cd8 EFLAGS: 00010046 [ 0.000000] RAX: ffffffff81cb1748 RBX: ffffffff81cb1720 RCX: 0000000001000000 [ 0.000000] RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffffffff81cb1720 [ 0.000000] RBP: ffffffff81a01d38 R08: 0000000000000000 R09: 0000000000001000 [ 0.000000] R10: 02008921da802087 R11: 00000000ffff8800 R12: 0000000000000000 [ 0.000000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001000000 [ 0.000000] FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 [ 0.000000] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033 [ 0.000000] CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000001406b0 [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020) [ 0.000000] Stack: [ 0.000000] 0000000000000002 81a01dd881eaf060 000000007e5fe227 0000000000001001 [ 0.000000] <d> 0000000000000040 0000000000000001 0000006cffffffff 0000000001000000 [ 0.000000] <d> ffffffff81cb1720 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff81c4f074>] ___alloc_bootmem_nopanic+0x8d/0xca [ 0.000000] [<ffffffff81c4f0cf>] ___alloc_bootmem+0x11/0x39 [ 0.000000] [<ffffffff81c4f172>] __alloc_bootmem+0xb/0xd [ 0.000000] [<ffffffff814d42d9>] xsave_cntxt_init+0x249/0x2c0 [ 0.000000] [<ffffffff814e0689>] init_thread_xstate+0x17/0x25 [ 0.000000] [<ffffffff814e0710>] fpu_init+0x79/0xaa [ 0.000000] [<ffffffff814e27e3>] cpu_init+0x301/0x344 [ 0.000000] [<ffffffff81276395>] ? sort+0x155/0x230 [ 0.000000] [<ffffffff81c30cf2>] trap_init+0x24e/0x25f [ 0.000000] [<ffffffff81c2bd73>] start_kernel+0x21c/0x430 [ 0.000000] [<ffffffff81c2b33a>] x86_64_start_reservations+0x125/0x129 [ 0.000000] [<ffffffff81c2b438>] x86_64_start_kernel+0xfa/0x109 [ 0.000000] Code: 03 48 89 f1 49 c1 e8 0c 48 0f af d0 48 c7 c6 00 a6 61 81 48 c7 c7 00 e5 79 81 31 c0 4c 89 74 24 08 e8 f2 d7 89 ff 4d 85 e4 75 04 <0f> 0b eb fe 48 8b 45 c0 48 83 e8 01 48 85 45 c0 74 04 0f 0b eb Signed-off-by: Bingsong Si <owen.si@ucloud.cn> Message-Id: <20190822042901.16858-1-owen.si@ucloud.cn> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>