tcmu: Add Read_capacity_10 and Start/Stop #1

Open · wants to merge 2 commits into base: qemu-tcmu

Conversation

@bgly bgly commented Nov 11, 2016

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Fam Zheng and others added 2 commits November 4, 2016 16:35
libtcmu is a Linux library for userspace programs to handle TCMU
protocol, which is a SCSI transport between the target_core_user.ko and
a userspace backend implementation. The former can be seen as a kernel
SCSI Low-Level-Driver that forwards scsi requests to userspace via a
UIO ring buffer. By linking to libtcmu, a program can serve as a scsi
HBA.
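
By way of illustration, the consumer side of libtcmu looks roughly like the sketch below. This is a hedged sketch only: the tcmulib_* names follow tcmu-runner's libtcmu.h as of late 2016 from memory, and handle_scsi_command() is a hypothetical stand-in for the backend's own dispatch, so check the header you actually build against.

    #include <libtcmu.h>  /* from tcmu-runner; tcmulib_* API assumed, verify locally */

    /* Drain and answer all queued SCSI commands for one TCMU device. */
    static void handle_device_events(struct tcmu_device *dev)
    {
        struct tcmulib_cmd *cmd;

        tcmulib_processing_start(dev);               /* ack the UIO doorbell */
        while ((cmd = tcmulib_get_next_command(dev)) != NULL) {
            int sam_stat = handle_scsi_command(dev, cmd); /* hypothetical dispatch */
            tcmulib_command_complete(dev, cmd, sam_stat); /* post SAM status */
        }
        tcmulib_processing_complete(dev);            /* notify the kernel */
    }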

This patch adds a qemu-tcmu utility that serves as a LIO userspace
backend.  Apart from how it interacts with the rest of the world, the
role of qemu-tcmu is much like that of qemu-nbd: it can export any
format/protocol that QEMU supports (over iSCSI/loopback instead of
NBD), for local or remote access. The main advantage of qemu-tcmu lies
in the more flexible command protocol (SCSI) and the more flexible
iSCSI and loopback frontends; the latter can theoretically allow
zero-copy I/O on the local machine, which may let qemu-tcmu serve data
with better performance.

RFC because only a minimal set of SCSI commands is handled so far. The plan
is to refactor and reuse the scsi-disk emulation code. Also, there is no
test code yet.

Similar to NBD, there will be QMP commands to start built-in targets so
that guest images can be exported as well.

Usage example script (tested on Fedora 24):

    set -e

    # For targetcli integration
    sudo systemctl start tcmu-runner

    qemu-img create test-img 1G

    # libtcmu needs to open UIO, give it access
    sudo chmod 777 /dev/uio0

    qemu-tcmu test-img &

    sleep 1

    IQN=iqn.2003-01.org.linux-iscsi.lemon.x8664:sn.4939fc29108f

    # Check that we have initialized correctly
    sudo targetcli ls | grep -q user:qemu

    # Create iscsi target
    sudo targetcli ls | grep -q $IQN || sudo targetcli /iscsi create wwn=$IQN
    sudo targetcli /iscsi/$IQN/tpg1 set attribute \
        authentication=0 generate_node_acls=1 demo_mode_write_protect=0 \
        prod_mode_write_protect=0

    # Create the qemu-tcmu target
    sudo targetcli ls | grep -q d0 || \
        sudo targetcli /backstores/user:qemu create d0 1G @drive

    # Export it as an iscsi LUN
    sudo targetcli ls | grep -q lun0 || \
        sudo targetcli /iscsi/$IQN/tpg1/luns create storage_object=/backstores/user:qemu/d0

    # Inspect the nodes again
    sudo targetcli ls

    # Test that the LIO export works
    qemu-img info iscsi://127.0.0.1/$IQN/0
    qemu-io iscsi://127.0.0.1/$IQN/0 \
        -c 'read -P 1 4k 1k' \
        -c 'write -P 2 4k 1k' \
        -c 'read -P 2 4k 1k'

Signed-off-by: Fam Zheng <famz@redhat.com>

---

v2: Two small fixes to help proof testing of this prototype:
    Fix READ CAPACITY. [Prasanna]
    Fix qiov. [Stefan]
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
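
For context on the READ CAPACITY fix: READ CAPACITY (10) returns an 8-byte, big-endian payload holding the last LBA and the block size, with the LBA field clamped to 0xFFFFFFFF when the last LBA does not fit in 32 bits, which tells the initiator to retry with READ CAPACITY (16). A minimal sketch of building that payload (illustrative only, not the actual qemu-tcmu code):

    #include <stdint.h>
    #include <string.h>

    /* Illustrative READ CAPACITY (10) payload per SBC; not the qemu-tcmu code. */
    static void build_read_capacity_10(uint8_t buf[8],
                                       uint64_t num_blocks, uint32_t blk_size)
    {
        /* Returned LBA is the *last* LBA; 0xFFFFFFFF means "use RC(16)". */
        uint64_t last_lba = num_blocks - 1;
        uint32_t lba32 = last_lba > 0xFFFFFFFFull ? 0xFFFFFFFFu
                                                  : (uint32_t)last_lba;
        uint8_t out[8] = {
            lba32 >> 24, lba32 >> 16, lba32 >> 8, lba32,             /* BE LBA  */
            blk_size >> 24, blk_size >> 16, blk_size >> 8, blk_size  /* BE size */
        };
        memcpy(buf, out, sizeof(out));
    }
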
bgly commented Nov 11, 2016

@famz I added the stuff I sent you in the email. Also, can you update this branch to your v2?

famz commented Nov 12, 2016

On Fri, 11/11 09:31, Bryant G. Ly wrote:

@famz I added the stuff I sent you in the email. Also, can you update this
branch to your v2?

For now I'm not inclined to integrate your start/stop emulation patch because
that will further bump the required libtcmu version to build qemu-tcmu, though
it's not a big deal given we are already requiring a pretty recent version.

The plan has been to totally reuse the scsi-disk code in QEMU for all SCSI
command emulation, instead of using libtcmu-provided helpers, and that will
solve the start/stop and read capacity 16 issues you see. It will be
implemented in v3, I think.

famz pushed a commit that referenced this pull request Nov 29, 2016
QEMU crashes on the source side while migrating, after starting the ipmi service inside the VM.

./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 \
-drive file=/work/suse/suse11_sp3_64_vt,format=raw,if=none,id=drive-virtio-disk0,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
-vnc :99 -monitor vc -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-kcs,bmc=bmc0,ioport=0xca2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffec4268700 (LWP 7657)]
__memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2757
(gdb) bt
 #0  __memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2757
 #1  0x00005555559ef775 in memcpy (__len=3, __src=0xc1421c, __dest=<optimized out>)
     at /usr/include/bits/string3.h:51
 #2  qemu_put_buffer (f=0x555557a97690, buf=0xc1421c <Address 0xc1421c out of bounds>, size=3)
     at migration/qemu-file.c:346
 #3  0x00005555559eef66 in vmstate_save_state (f=f@entry=0x555557a97690,
     vmsd=0x555555f8a5a0 <vmstate_ISAIPMIKCSDevice>, opaque=0x555557231160,
     vmdesc=vmdesc@entry=0x55555798cc40) at migration/vmstate.c:333
 #4  0x00005555557cfe45 in vmstate_save (f=f@entry=0x555557a97690, se=se@entry=0x555557231de0,
     vmdesc=vmdesc@entry=0x55555798cc40) at /mnt/sdb/zyy/qemu/migration/savevm.c:720
 #5  0x00005555557d2be7 in qemu_savevm_state_complete_precopy (f=0x555557a97690,
     iterable_only=iterable_only@entry=false) at /mnt/sdb/zyy/qemu/migration/savevm.c:1128
 #6  0x00005555559ea102 in migration_completion (start_time=<synthetic pointer>,
     old_vm_running=<synthetic pointer>, current_active_state=<optimized out>,
     s=0x5555560eaa80 <current_migration.44078>) at migration/migration.c:1707
 #7  migration_thread (opaque=0x5555560eaa80 <current_migration.44078>) at migration/migration.c:1855
 #8  0x00007ffff3900dc5 in start_thread (arg=0x7ffec4268700) at pthread_create.c:308
 #9  0x00007fffefc6c71d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
famz pushed a commit that referenced this pull request Jan 23, 2017
Current code that handles Tx buffer descriptor ring scanning employs the
following algorithm:

	1. Restore current buffer descriptor pointer from TBPTRn

	2. Process current descriptor

	3. If the current descriptor has the BD_WRAP flag set, set the
	   current descriptor pointer to the start of the descriptor ring

	4. If the current descriptor points to the start of the ring, exit
	   the loop; otherwise increment the current descriptor pointer and
	   go to step #2

	5. Store current descriptor in TBPTRn

The way the code is implemented results in buffer descriptor ring being
scanned starting at offset/descriptor #0. While covering 99% of the
cases, this algorithm becomes problematic for a number of edge cases.

Consider the following scenario: the guest OS driver initializes the
descriptor ring with N individual descriptors and starts sending data out.
Depending on the volume of traffic and probably the guest OS driver
implementation, it is possible to hit an edge case where a packet spread
across 2 descriptors is placed in descriptors N - 1 and 0, in that order
(it is easy to imagine similar examples involving more than 2 descriptors).

What happens then is that the aforementioned algorithm starts at
descriptor 0, sees a descriptor marked as BD_LAST, which it happily sends
out as a separate packet (very much malformed at this point); then the
iteration continues and the first part of the original packet is tacked
onto the next transmission, which ends up being bogus as well.

This behaviour can be pretty reliably observed when scp'ing data from a
guest OS via a TAP interface for files larger than 160K (every time for
700K+).

This patch changes the scanning algorithm to do the following:

	1. Restore "current" buffer descriptor pointer from
	   TBPTRn

	2. If the "current" descriptor does not have BD_TX_READY set, go to step #6

	3. Process current descriptor

	4. If the "current" descriptor has the BD_WRAP flag set, set the
	   "current" descriptor pointer to the start of the descriptor
	   ring; otherwise increment "current" by the size of one descriptor

	5. Goto #1

	6. Save "current" buffer descriptor in TBPTRn

This way we preserve the information about which descriptor was
processed last and always start where we left off avoiding the original
problem. On top of that, judging by the following excerpt from
MPC8548ERM (p. 14-48):

"... When the end of the TxBD ring is reached, eTSEC initializes TBPTRn
to the value in the corresponding TBASEn. The TBPTR register is
internally written by the eTSEC’s DMA controller during
transmission. The pointer increments by eight (bytes) each time a
descriptor is closed successfully by the eTSEC..."

the revised algorithm might also be a more correct way of emulating this
aspect of the eTSEC peripheral.
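
Sketched in C, the revised scan reads roughly as follows. This is a schematic only: the struct layout, the flag bit values, and the read_bd()/process_bd() helpers are stand-ins for the real eTSEC register and DMA accessors, not the actual hw/net code.

    #include <stdint.h>

    enum { BD_TX_READY = 1 << 15, BD_WRAP = 1 << 13 };  /* illustrative bits */

    struct tx_bd { uint16_t flags; uint16_t len; uint32_t buf; };

    /* Returns the descriptor address to write back to TBPTRn. */
    static uint32_t scan_tx_ring(uint32_t tbptr, uint32_t tbase,
                                 struct tx_bd (*read_bd)(uint32_t),
                                 void (*process_bd)(const struct tx_bd *))
    {
        uint32_t cur = tbptr;                      /* 1. restore from TBPTRn  */

        for (;;) {
            struct tx_bd bd = read_bd(cur);
            if (!(bd.flags & BD_TX_READY)) {       /* 2. not ready: stop      */
                break;
            }
            process_bd(&bd);                       /* 3. process descriptor   */
            cur = (bd.flags & BD_WRAP)             /* 4. wrap or advance      */
                      ? tbase
                      : cur + sizeof(struct tx_bd);
        }                                          /* 5. loop                 */
        return cur;                                /* 6. store back to TBPTRn */
    }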

Cc: Alexander Graf <agraf@suse.de>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
famz pushed a commit that referenced this pull request Feb 24, 2017
QEMU will crash with the following backtrace if the newly created thread
exits before we call qemu_thread_set_name() for it.

  (gdb) bt
  #0 0x00007f9a68b095d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
  #1 0x00007f9a68b0acc8 in __GI_abort () at abort.c:90
  #2 0x00007f9a69cda389 in PAT_abort () from /usr/lib64/libuvpuserhotfix.so
  #3 0x00007f9a69cdda0d in patchIllInsHandler () from /usr/lib64/libuvpuserhotfix.so
  #4 <signal handler called>
  #5 pthread_setname_np (th=140298470549248, name=name@entry=0x8cc74a "io-task-worker") at ../nptl/sysdeps/unix/sysv/linux/pthread_setname.c:49
  #6 0x00000000007f5f20 in qemu_thread_set_name (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker") at util/qemu_thread_posix.c:459
  #7 0x00000000007f679e in qemu_thread_create (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker",start_routine=start_routine@entry=0x7c1300 <qio_task_thread_worker>, arg=arg@entry=0x7f99b8001720, mode=mode@entry=1) at util/qemu_thread_posix.c:498
  #8 0x00000000007c15b6 in qio_task_run_in_thread (task=task@entry=0x7f99b80033d0, worker=worker@entry=0x7bd920 <qio_channel_socket_connect_worker>, opaque=0x7f99b8003370, destroy=0x7c6220 <qapi_free_SocketAddress>) at io/task.c:133
  #9 0x00000000007bda04 in qio_channel_socket_connect_async (ioc=0x7f99b80014c0, addr=0x37235d0, callback=callback@entry=0x54ad00 <qemu_chr_socket_connected>, opaque=opaque@entry=0x38118b0, destroy=destroy@entry=0x0) at io/channel_socket.c:191
  #10 0x00000000005487f6 in socket_reconnect_timeout (opaque=0x38118b0) at qemu_char.c:4402
  #11 0x00007f9a6a1533b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
  #12 0x00007f9a6a15299a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
  #13 0x0000000000747386 in glib_pollfds_poll () at main_loop.c:227
  #14 0x0000000000747424 in os_host_main_loop_wait (timeout=404000000) at main_loop.c:272
  #15 0x0000000000747575 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:520
  #16 0x0000000000557d31 in main_loop () at vl.c:2170
  #17 0x000000000041c8b7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5083

Let's detach the new thread after calling qemu_thread_set_name().

Signed-off-by: Caoxinhua <caoxinhua@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <1483493521-9604-1-git-send-email-zhang.zhanghailiang@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Feb 24, 2017
AioHandlers marked ->is_external must be skipped when aio_node_check()
fails.  bdrv_drained_begin() needs this to prevent dataplane from
submitting new I/O requests while another thread accesses the device and
relies on it being quiesced.

This patch fixes the following segfault:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
  2650            bdrv_io_plug(child->bs);
  [Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))]
  (gdb) bt
  #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
  #1  0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:1561
  #2  0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589
  #3  0x00005577f5d4240d in virtio_blk_data_plane_handle_output (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/dataplane/virtio-blk.c:158
  #4  0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at qemu/hw/virtio/virtio.c:1304
  #5  0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134
  #6  0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at qemu/aio-posix.c:493
  #7  0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:569
  #8  0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:601
  #9  0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at qemu/block/io.c:2403
  #10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at qemu/block.c:2322
  #11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at qemu/block.c:2465
  #12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at qemu/block.c:3425
  #13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at qemu/block.c:1361
  #14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:491
  #15 0x00005577f6111b1b in blk_remove_all_bs () at qemu/block/block-backend.c:245
  #16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382
  #17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, envp=0x7ffea6eb8440) at qemu/vl.c:4684

The key thing is that bdrv_close() uses bdrv_drained_begin() and
virtio_queue_host_notifier_aio_poll() must not be called.

Thanks to Fam Zheng <famz@redhat.com> for identifying the root cause of
this crash.

Reported-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170124095350.16679-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
famz pushed a commit that referenced this pull request Apr 20, 2017
Running postcopy-test with ASAN produces the following error:

QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64  tests/postcopy-test
...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
    #0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
    #1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
    #2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
    #3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
    #4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
    #5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)

0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000)
allocated by thread T0 here:
    #0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
    #1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
    #2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
    #3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
    #4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
    #5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
    #6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
    #7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T6 created by T0 here:
    #0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
    #1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
    #2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
    #3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
    #4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
    #5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
    #6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
    #7 0x7f15823e38e6  (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass

index seems to be wrongly incremented, unless I miss something that
would be worth a comment.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
famz pushed a commit that referenced this pull request Apr 20, 2017
During block job completion, nothing is preventing
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which is a trouble, such as in this code path:

    qmp_block_commit
      commit_active_start
        bdrv_reopen
          bdrv_reopen_multiple
            bdrv_reopen_prepare
              bdrv_flush
                aio_poll
                  aio_bh_poll
                    aio_bh_call
                      block_job_defer_to_main_loop_bh
                        stream_complete
                          bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but it is not done because it's in the form of a
main loop BH.

Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well.  To achieve
this, this patch forces draining the BH in BDRV_POLL_WHILE.

As a side effect this fixes a hang in block_job_detach_aio_context
during system_reset when a block job is ready:

    #0  0x0000555555aa79f3 in bdrv_drain_recurse
    #1  0x0000555555aa825d in bdrv_drained_begin
    #2  0x0000555555aa8449 in bdrv_drain
    #3  0x0000555555a9c356 in blk_drain
    #4  0x0000555555aa3cfd in mirror_drain
    #5  0x0000555555a66e11 in block_job_detach_aio_context
    #6  0x0000555555a62f4d in bdrv_detach_aio_context
    #7  0x0000555555a63116 in bdrv_set_aio_context
    #8  0x0000555555a9d326 in blk_set_aio_context
    #9  0x00005555557e38da in virtio_blk_data_plane_stop
    #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
    #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
    #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
    #13 0x00005555559f6a18 in virtio_pci_reset
    #14 0x00005555559139a9 in qdev_reset_one
    #15 0x0000555555916738 in qbus_walk_children
    #16 0x0000555555913318 in qdev_walk_children
    #17 0x0000555555916738 in qbus_walk_children
    #18 0x00005555559168ca in qemu_devices_reset
    #19 0x000055555581fcbb in pc_machine_reset
    #20 0x00005555558a4d96 in qemu_system_reset
    #21 0x000055555577157a in main_loop_should_exit
    #22 0x000055555577157a in main_loop
    #23 0x000055555577157a in main

The rationale is that the loop in block_job_detach_aio_context cannot
make any progress in pausing/completing the job, because bs->in_flight
is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
BH. With this patch, it does.

Reported-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170418143044.12187-3-famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
famz pushed a commit that referenced this pull request May 31, 2017
ASAN detects an "unknown-crash" when running pxe-test:

/ppc64/pxe/spapr-vlan: =================================================================
==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0
READ of size 128 at 0x7f6dcd298d30 thread T2
    #0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73
    #1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289
    #2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446
    #3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82
    #4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67

Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame
    #0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13

  This frame has 3 object(s):
    [32, 48) '<unknown>'
    [96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable
    [160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable

The sockaddr_storage pointer is the sockaddr_in6 lhost on the
stack. Copy only the source addr size.
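
In other words, size the copy by the source address family rather than by sizeof(struct sockaddr_storage). A schematic of that idea (illustrative helper only; the real slirp patch differs in detail):

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Illustrative: how many bytes of the source sockaddr are valid. */
    static size_t src_addr_size(const struct sockaddr_storage *ss)
    {
        return ss->ss_family == AF_INET6 ? sizeof(struct sockaddr_in6)
                                         : sizeof(struct sockaddr_in);
    }

    /* ...then copy only that much, e.g.:
     *   memcpy(&session->client_addr, srcsas, src_addr_size(srcsas));
     * instead of a full sizeof(struct sockaddr_storage). */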

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
famz pushed a commit that referenced this pull request May 31, 2017
If the trace backend is set to TRACE_NOP, trace_get_vcpu_event_count()
returns 0, causing bitmap_new() to abort.

The abort can be triggered as follows:

  $ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu
  $ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G
  (gdb) bt
  #0  0x00007ffff04e25f7 in raise () from /lib64/libc.so.6
  #1  0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6
  #2  0x00005555559de905 in bitmap_new (nbits=<optimized out>)
      at /home/root/git/qemu2.git/include/qemu/bitmap.h:96
  #3  cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399
  #4  0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341
  #5  0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341
  #6  0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560,
      type=type@entry=0x55555656bd30) at qom/object.c:376
  #7  0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484
  #8  0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu")
      at qom/object.c:494
  #9  0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0,
      errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101
  #10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690)
      at /home/root/git/qemu2.git/hw/i386/pc.c:1184
  #11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121
  #12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
famz pushed a commit that referenced this pull request May 31, 2017
Spotted by ASAN:

/x86_64/hmp/pc-0.12:
=================================================================
==22538==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 224 byte(s) in 1 object(s) allocated from:
    #0 0x7f0f63cdee60 in malloc (/lib64/libasan.so.3+0xc6e60)
    #1 0x556f11ff32d7 in tcp_newtcpcb /home/elmarco/src/qemu/slirp/tcp_subr.c:250
    #2 0x556f11fdb1d1 in tcp_listen /home/elmarco/src/qemu/slirp/socket.c:688
    #3 0x556f11fca9d5 in slirp_add_hostfwd /home/elmarco/src/qemu/slirp/slirp.c:1052
    #4 0x556f11f8db41 in slirp_hostfwd /home/elmarco/src/qemu/net/slirp.c:506
    #5 0x556f11f8dd83 in hmp_hostfwd_add /home/elmarco/src/qemu/net/slirp.c:535

There might be a better way to fix this, but calling slirp tcp_close()
doesn't work.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
famz pushed a commit that referenced this pull request Jun 22, 2017
Since commit d4c19cd ("virtio-serial:
add missing virtio_detach_element() call") the following commands may
cause QEMU to segfault:

  $ qemu -M accel=kvm -cpu host -m 1G \
         -drive if=virtio,file=test.img,format=raw \
         -device virtio-serial-pci,id=virtio-serial0 \
         -chardev socket,id=channel1,path=/tmp/chardev.sock,server,nowait \
         -device virtserialport,chardev=channel1,bus=virtio-serial0.0,id=port1
  $ nc -U /tmp/chardev.sock
  ^C

  (guest)$ cat /dev/zero >/dev/vport0p1

The segfault is non-deterministic: if the event loop notices the socket
has been closed then there is no crash.  The disconnect has to happen
right before QEMU attempts to write data to the socket.

The backtrace is as follows:

  Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  0x00005555557e0698 in do_flush_queued_data (port=0x5555582cedf0, vq=0x7fffcc854290, vdev=0x55555807b1d0) at hw/char/virtio-serial-bus.c:180
  180           for (i = port->iov_idx; i < port->elem->out_num; i++) {
  #1  0x000055555580d363 in virtio_queue_notify_vq (vq=0x7fffcc854290) at hw/virtio/virtio.c:1524
  #2  0x000055555580d363 in virtio_queue_host_notifier_read (n=0x7fffcc8542f8) at hw/virtio/virtio.c:2430
  #3  0x0000555555b3482c in aio_dispatch_handlers (ctx=ctx@entry=0x5555566b8c80) at util/aio-posix.c:399
  #4  0x0000555555b350d8 in aio_dispatch (ctx=0x5555566b8c80) at util/aio-posix.c:430
  #5  0x0000555555b3212e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
  #6  0x00007fffde71de52 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
  #7  0x0000555555b34353 in glib_pollfds_poll () at util/main-loop.c:213
  #8  0x0000555555b34353 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
  #9  0x0000555555b34353 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
  #10 0x0000555555773207 in main_loop () at vl.c:1917
  #11 0x0000555555773207 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4751

The do_flush_queued_data() function does not anticipate chardev close
events during vsc->have_data().  It expects port->elem to remain
non-NULL for the duration of its for loop.

The fix is simply to return from do_flush_queued_data() if the port
closes because the close event already frees port->elem and drains the
virtqueue - there is nothing left for do_flush_queued_data() to do.
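
Schematically, the guard described above sits right after the have_data() call (a sketch of the shape of the fix, not the verbatim patch):

    /* Inside do_flush_queued_data()'s loop -- sketch, not the verbatim fix. */
    for (i = port->iov_idx; i < port->elem->out_num; i++) {
        /* ... hand the buffer to the backend ... */
        ret = vsc->have_data(port, buf, len);  /* may trigger a close event */
        if (!port->elem) {
            /* The close event freed port->elem and drained the virtqueue;
             * there is nothing left to flush, so stop touching it. */
            return;
        }
        /* ... */
    }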

Reported-by: Sitong Liu <siliu@redhat.com>
Reported-by: Min Deng <mdeng@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
famz pushed a commit that referenced this pull request Jun 22, 2017
Submission of requests on linux aio is a bit tricky and can lead to
requests completions on submission path:

44713c9 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d8 ("linux-aio: process completions from ioq_submit()")

That means that any coroutine which has been yielded in order to wait
for completion can be resumed from submission path and be eventually
terminated (freed).

The following use-after-free crash was observed when IO throttling
was enabled:

 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7f5813dff700 (LWP 56417)]
 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
 (gdb) bt
 #0  virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
                              ^^^^^^^^^^^^^^
                              remember the address

 #1  virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
 #2  virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
 #3  virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
 #4  virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
 #5  blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
 #6  coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78

 (gdb) p * elem
 $8 = {index = 77, out_num = 2, in_num = 1,
       in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0,
       in_sg = 0x0, out_sg = 0x7f5804009a50}
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       'in_sg' and 'out_sg' are invalid.
       e.g. it is impossible that 'in_sg' is zero,
       instead its value must be equal to:

       (gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
       $26 = 0x7f5804009af0

Seems 'elem' was corrupted.  Meanwhile another thread raised an abort:

 Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
 #0  raise () from /lib/x86_64-linux-gnu/libc.so.6
 #1  abort () from /lib/x86_64-linux-gnu/libc.so.6
 #2  qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
 #3  qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
 #4  qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
                           ^^^^^^^^^^^^^^^^^^
                           WTF?? this is equal to elem from crashed thread

 #5  qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
 #6  qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
 #7  qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
 #8  qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
 #9  qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
 #10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
 #11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
 #12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
 #13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
 #14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419

Crash can be explained by access of 'co' object from the loop inside
qemu_co_queue_run_restart():

  while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
      QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
                           ^^^^^^^^^^^^^^^^^^^^
                           on each iteration 'co' is accessed,
                           but 'co' can be already freed

      qemu_coroutine_enter(next);
  }

When 'next' coroutine is resumed (entered) it can in its turn resume
'co', and eventually free it.  That's why we see 'co' (which was freed)
has the same address as 'elem' from the first backtrace.

The fix is obvious: use a temporary queue and do not touch the coroutine
after the first qemu_coroutine_enter() is invoked.
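
A minimal sketch of that idea, mirroring the description above (not necessarily the verbatim patch):

    /* Detach the wakeup list into a local queue while 'co' is still known
     * to be alive, then only ever touch the local queue.  Entering 'next'
     * may resume and even free 'co', which is now harmless. */
    void qemu_co_queue_run_restart(Coroutine *co)
    {
        Coroutine *next;
        QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
            QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);

        QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);

        while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
            QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
            qemu_coroutine_enter(next);
        }
    }

After the QSIMPLEQ_CONCAT, nothing in the loop dereferences 'co' anymore, which is exactly the property the crash analysis above calls for.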

The issue is quite rare and happens every ~12 hours under very high IO
and CPU load (building a Linux kernel with -j512 inside the guest) when IO
throttling is enabled.  With the fix applied the guest has been running for
~35 hours and is still alive so far.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
famz pushed a commit that referenced this pull request Jun 22, 2017
V flag for subtraction is:

   v = (res ^ src1) & (src1 ^ src2)

(see COMPUTE_CCR() in target/m68k/helper.c)

But gen_flush_flags() uses:

   v = (res ^ src2) & (src1 ^ src2)

The problem has been found with the following program:

        .global _start
_start:
        move.l  #-2147483648,%d0
        subq.l  #1,%d0
        jvc     1f
        move.l #1,%d1
        move.l #1,%d0
        trap #0
1:
        move.l #0,%d1
        move.l #1,%d0
        trap #0

It works fine (exit(1)) on real hardware, and with "-singlestep".

"-singlestep" uses gen_helper_flush_flags(), whereas
without "-singlestep", V flag is computed directly in
gen_flush_flags().

This patch updates gen_flush_flags() to have the same result
as with gen_helper_flush_flags().
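
To see the difference concretely, plug in the values from the test program (0x80000000 - 1). A small standalone check (not part of the patch):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t src1 = 0x80000000u;   /* d0 = -2147483648 */
        uint32_t src2 = 1u;            /* subq.l #1 */
        uint32_t res  = src1 - src2;   /* 0x7fffffff: signed overflow */

        uint32_t v_correct = (res ^ src1) & (src1 ^ src2); /* helper.c formula */
        uint32_t v_buggy   = (res ^ src2) & (src1 ^ src2); /* old gen_flush_flags */

        /* Prints "correct V=1 buggy V=0": the old formula misses the overflow. */
        printf("correct V=%d buggy V=%d\n",
               !!(v_correct & 0x80000000u), !!(v_buggy & 0x80000000u));
        return 0;
    }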

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20170614203905.19657-1-laurent@vivier.eu>
famz pushed a commit that referenced this pull request Jun 22, 2017
Commit e987c37 ("hw/i386: check if
nvdimm is enabled before plugging") introduced a check to reject nvdimm
hotplug if -machine pc,nvdimm=on was not given.

This check executes after pc_dimm_memory_plug() has already completed
and does not reverse the effect of this function in the case of failure.

Perform the check before calling pc_dimm_memory_plug().  This fixes the
following abort:

  $ qemu -M accel=kvm -m 1G,slots=4,maxmem=8G \
         -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G
  (qemu) device_add nvdimm,memdev=mem1
  nvdimm is not enabled: missing 'nvdimm' in '-M'
  (qemu) device_add nvdimm,memdev=mem1
  Core dumped

The backtrace is:

  #0  0x00007fffdb5b191f in raise () at /lib64/libc.so.6
  #1  0x00007fffdb5b351a in abort () at /lib64/libc.so.6
  #2  0x00007fffdb5a9da7 in __assert_fail_base () at /lib64/libc.so.6
  #3  0x00007fffdb5a9e52 in  () at /lib64/libc.so.6
  #4  0x000055555577a5fa in qemu_ram_set_idstr (new_block=0x555556747a00, name=<optimized out>, dev=dev@entry=0x555556705590) at qemu/exec.c:1709
  #5  0x0000555555a0fe86 in vmstate_register_ram (mr=mr@entry=0x55555673a0e0, dev=dev@entry=0x555556705590) at migration/savevm.c:2293
  #6  0x0000555555965088 in pc_dimm_memory_plug (dev=dev@entry=0x555556705590, hpms=hpms@entry=0x5555566bb0e0, mr=mr@entry=0x555556705630, align=<optimized out>, errp=errp@entry=0x7fffffffc660)
      at hw/mem/pc-dimm.c:110
  #7  0x000055555581d89b in pc_dimm_plug (errp=0x7fffffffc6c0, dev=0x555556705590, hotplug_dev=<optimized out>) at qemu/hw/i386/pc.c:1713
  #8  0x000055555581d89b in pc_machine_device_plug_cb (hotplug_dev=<optimized out>, dev=0x555556705590, errp=0x7fffffffc6c0) at qemu/hw/i386/pc.c:2004
  #9  0x0000555555914da6 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffffffc7e8) at hw/core/qdev.c:926

Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
famz pushed a commit that referenced this pull request Aug 15, 2017
"nc" is freed after hotplug vhost-user, but the watcher is not removed.
The QEMU crash when the watcher access the "nc" when socket disconnects.

    Program received signal SIGSEGV, Segmentation fault.
    #0  object_get_class (obj=obj@entry=0x2) at qom/object.c:750
    #1  0x00007f9bb4180da1 in qemu_chr_fe_disconnect (be=<optimized out>) at chardev/char-fe.c:372
    #2  0x00007f9bb40d1100 in net_vhost_user_watch (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:188
    #3  0x00007f9baf97f99a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #4  0x00007f9bb41d7ebc in glib_pollfds_poll () at util/main-loop.c:213
    #5  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
    #6  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
    #7  0x00007f9bb3e266a7 in main_loop () at vl.c:1917
    #8  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4786

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
famz pushed a commit that referenced this pull request Oct 9, 2017
The following segfault is encountered if the NBD server closes the UNIX
domain socket immediately after negotiation:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
  441       QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
  (gdb) bt
  #0  0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
  #1  0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207
  #2  0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237
  #3  0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836
  #4  0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:1086
  #5  0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182
  #6  0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032
  #7  0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079
  #8  0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
  #9  0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6

The problem is that nbd_client_init() uses
nbd_client_attach_aio_context() -> aio_co_schedule(new_context,
client->read_reply_co).  Execution of read_reply_co is deferred to a BH
which doesn't run until later.

In the mean time blk_co_preadv() can be called and nbd_coroutine_end()
calls aio_wake() on read_reply_co.  At this point in time
read_reply_co's ctx isn't set because it has never been entered yet.

This patch simplifies the nbd_co_send_request() ->
nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just
nbd_co_send_request() -> nbd_co_receive_reply().  The request is "ended"
if an error occurs at any point.  Callers no longer have to invoke
nbd_coroutine_end().

This cleanup also eliminates the segfault because we don't call
aio_co_schedule() to wake up s->read_reply_co if sending the request
failed.  It is only necessary to wake up s->read_reply_co if a reply was
received.

Note this only happens with UNIX domain sockets on Linux.  It doesn't
seem possible to reproduce this with TCP sockets.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20170829122745.14309-2-stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
Currently for sh4, the cpu_model argument of the '-cpu' option
could be either a 'cpu model' name or a cpu typename.

However, '-cpu' typically takes a 'cpu model' name, and the
cpu typenames for the sh4 target aren't advertised publicly
('-cpu help' prints only 'cpu model' names), so we
shouldn't care about this use case (it's more of a bug).

1. Drop '-cpu cpu_typename' to align with the rest of the
   targets.
2. Compose the searched-for typename from the cpu model and use
   it with object_class_by_name() directly instead of the
   over-complicated
       object_class_get_list()
       g_slist_find_custom() + superh_cpu_name_compare()

With #1 dropped, #2 can be used for both lookups, which
simplifies superh_cpu_class_by_name() quite a bit.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-23-git-send-email-imammedo@redhat.com>
[ehabkost: Include fixup sent by Igor]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
Migration of pseries is broken with TCG because
QEMU tries to restore KVM MMU state unconditionally.

The result is a SIGSEGV in kvm_vm_ioctl():

  #0  kvm_vm_ioctl (s=0x0, type=-2146390353)
      at qemu/accel/kvm/kvm-all.c:2032
  #1  0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>,
      radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>)
      at qemu/target/ppc/kvm.c:396
  #2  0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578
  #3  0x000000010059e4cc in vmstate_load_state (f=0x106230000,
      vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/migration/vmstate.c:165
  #4  0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>)
      at qemu/migration/savevm.c:748

This patch fixes the problem by not calling the KVM function in TCG mode.
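
The shape of the fix is simply a guard around the call in spapr_post_load() (a sketch; kvm_enabled() is QEMU's usual predicate, and the argument list is abbreviated from the backtrace):

    /* Sketch: only push the MMU configuration into KVM when KVM is in use;
     * under TCG there is no kernel state to restore. */
    if (kvm_enabled()) {
        kvmppc_configure_v3_mmu(cpu, radix, gtse, proc_tbl);
    }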

Fixes: d39c90f ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
famz pushed a commit that referenced this pull request Jan 25, 2018
when qemu is started with '-no-acpi' CLI option, an attempt
to unplug a CPU using device_del results in null pointer
dereference at:

  #0 object_get_class
  #1 pc_machine_device_unplug_request_cb
  #2 qmp_marshal_device_del

which is caused by pcms->acpi_dev == NULL due to ACPI support
being disabled.

Considering that ACPI support is necessary for unplug to work,
check that it's enabled and fail the unplug request gracefully
if no ACPI device is found.
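
A minimal sketch of the guard in the unplug-request callback (assuming the callback has the usual Error **errp to report through; not the verbatim patch):

    /* Sketch: refuse CPU unplug gracefully when ACPI is disabled. */
    if (!pcms->acpi_dev) {
        error_setg(errp,
                   "CPU hot unplug requires ACPI, which is disabled (-no-acpi)");
        return;
    }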

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
The passed-through device might be an express device. In this case the
old code allocated a too-small emulated config space in
pci_config_alloc(), since pci_config_size() returned the size for a
non-express device. This leads to an out-of-bounds write in
xen_pt_config_reg_init(), which sometimes results in crashes. So set
is_express, as is already done for KVM in vfio-pci.

Shortened ASan report:

==17512==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000041648 at pc 0x55e0fdac51ff bp 0x7ffe4af07410 sp 0x7ffe4af07408
WRITE of size 2 at 0x611000041648 thread T0
    #0 0x55e0fdac51fe in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x55e0fdac51fe in stw_he_p include/qemu/bswap.h:330
    #2 0x55e0fdac51fe in stw_le_p include/qemu/bswap.h:379
    #3 0x55e0fdac51fe in pci_set_word include/hw/pci/pci.h:490
    #4 0x55e0fdac51fe in xen_pt_config_reg_init hw/xen/xen_pt_config_init.c:1991
    #5 0x55e0fdac51fe in xen_pt_config_init hw/xen/xen_pt_config_init.c:2067
    #6 0x55e0fdabcf4d in xen_pt_realize hw/xen/xen_pt.c:830
    #7 0x55e0fdf59666 in pci_qdev_realize hw/pci/pci.c:2034
    #8 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

0x611000041648 is located 8 bytes to the right of 256-byte region [0x611000041540,0x611000041640)
allocated by thread T0 here:
    #0 0x7ff596a94bb8 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9bb8)
    #1 0x7ff57da66580 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x50580)
    #2 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

Signed-off-by: Simon Gaiser <hw42@ipsumj.de>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
famz pushed a commit that referenced this pull request Jan 25, 2018
bdrv_unref() requires the AioContext lock because bdrv_flush() uses
BDRV_POLL_WHILE(), which assumes the AioContext is currently held.  If
BDRV_POLL_WHILE() runs without AioContext held the
pthread_mutex_unlock() call in aio_context_release() fails.

This patch moves bdrv_unref() into the AioContext locked region to solve
the following pthread_mutex_unlock() failure:

  #0  0x00007f566181969b in raise () at /lib64/libc.so.6
  #1  0x00007f566181b3b1 in abort () at /lib64/libc.so.6
  #2  0x00005592cd590458 in error_exit (err=<optimized out>, msg=msg@entry=0x5592cdaf6d60 <__func__.23977> "qemu_mutex_unlock") at util/qemu-thread-posix.c:36
  #3  0x00005592cd96e738 in qemu_mutex_unlock (mutex=mutex@entry=0x5592ce9505e0) at util/qemu-thread-posix.c:96
  #4  0x00005592cd969b69 in aio_context_release (ctx=ctx@entry=0x5592ce950580) at util/async.c:507
  #5  0x00005592cd8ead78 in bdrv_flush (bs=bs@entry=0x5592cfa87210) at block/io.c:2478
  #6  0x00005592cd89df30 in bdrv_close (bs=0x5592cfa87210) at block.c:3207
  #7  0x00005592cd89df30 in bdrv_delete (bs=0x5592cfa87210) at block.c:3395
  #8  0x00005592cd89df30 in bdrv_unref (bs=0x5592cfa87210) at block.c:4418
  #9  0x00005592cd6b7f86 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=<optimized out>, errp=errp@entry=0x7ffe4a1fc9d8) at blockdev.c:2308
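
The reordering itself is small, sketched here in patch shape (illustrative context; the surrounding qmp_transaction() cleanup code is elided):

    /* Sketch: keep bdrv_unref() inside the locked region, since the final
     * bdrv_close() -> bdrv_flush() uses BDRV_POLL_WHILE(), which expects
     * the AioContext to be held. */
    aio_context_acquire(aio_context);
    /* ... per-BDS transaction cleanup ... */
    bdrv_unref(bs);
    aio_context_release(aio_context);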

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171206144550.22295-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
There is a small chance that iothread_stop() hangs as follows:

  Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
  #0  0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
  #1  0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
  #3  0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
  #4  0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
  #5  0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
  #6  0x00007f64012cce6f in clone () at /lib64/libc.so.6

  Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
  #0  0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
  #1  0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
  #2  0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
  #3  0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
  #4  0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
  #5  0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
  #6  0x0000559599680a6e in iothread_stop_all () at iothread.c:341
  #7  0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913

The relevant code from iothread_run() is:

  while (!atomic_read(&iothread->stopping)) {
      aio_poll(iothread->ctx, true);

and iothread_stop():

  iothread->stopping = true;
  aio_notify(iothread->ctx);
  ...
  qemu_thread_join(&iothread->thread);

The following scenario can occur:

1. IOThread:
  while (!atomic_read(&iothread->stopping)) -> stopping=false

2. Main loop:
  iothread->stopping = true;
  aio_notify(iothread->ctx);

3. IOThread:
  aio_poll(iothread->ctx, true); -> hang

The bug is explained by the AioContext->notify_me doc comments:

  "If this field is 0, everything (file descriptors, bottom halves,
  timers) will be re-evaluated before the next blocking poll(), thus the
  event_notifier_set call can be skipped."

The problem is that "everything" does not include checking
iothread->stopping.  This means iothread_run() will block in aio_poll()
if aio_notify() was called just before aio_poll().

This patch fixes the hang by replacing aio_notify() with
aio_bh_schedule_oneshot().  This makes aio_poll() or g_main_loop_run()
to return.

Implementing this properly required a new bool running flag.  The new
flag prevents races that are tricky if we try to use iothread->stopping.
Now iothread->stopping is purely for iothread_stop() and
iothread->running is purely for the iothread_run() thread.
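
Put together, the stop path looks roughly like this (a sketch of the mechanism described above, ignoring the GMainLoop case the real code also handles):

    /* Runs in the IOThread's AioContext, so aio_poll() must return and
     * re-evaluate the loop condition before blocking again. */
    static void iothread_stop_bh(void *opaque)
    {
        IOThread *iothread = opaque;

        iothread->running = false;      /* tells iothread_run() to exit */
    }

    void iothread_stop(IOThread *iothread)
    {
        iothread->stopping = true;
        aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
        qemu_thread_join(&iothread->thread);
    }

iothread_run() then loops on the new flag, while (iothread->running) { aio_poll(iothread->ctx, true); }, instead of !atomic_read(&iothread->stopping).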

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171207201320.19284-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault with low probability.

The backtrace is:
   #0  0x00007f46c60291d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
   #1  0x00007f46c602a8c8 in __GI_abort () at abort.c:90
   #2  0x00000000008543c9 in PAT_abort ()
   #3  0x000000000085140d in patchIllInsHandler ()
   #4  <signal handler called>
   #5  pthread_detach (th=139933037614848) at pthread_detach.c:50
   #6  0x0000000000829759 in qemu_thread_create (thread=thread@entry=0x7ffdaa8205e0, name=name@entry=0x94d94a "io-task-worker", start_routine=start_routine@entry=0x7eb9a0 <qio_task_thread_worker>,
       arg=arg@entry=0x3f5cf70, mode=mode@entry=1) at util/qemu_thread_posix.c:512
   #7  0x00000000007ebc96 in qio_task_run_in_thread (task=0x31db2c0, worker=worker@entry=0x7e7e40 <qio_channel_socket_connect_worker>, opaque=0xcd23380, destroy=0x7f1180 <qapi_free_SocketAddress>)
       at io/task.c:141
   #8  0x00000000007e7f33 in qio_channel_socket_connect_async (ioc=ioc@entry=0x626c0b0, addr=<optimized out>, callback=callback@entry=0x55e080 <qemu_chr_socket_connected>, opaque=opaque@entry=0x42862c0,
       destroy=destroy@entry=0x0) at io/channel_socket.c:194
   #9  0x000000000055bdd1 in socket_reconnect_timeout (opaque=0x42862c0) at qemu_char.c:4744
   #10 0x00007f46c72483b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
   #11 0x00007f46c724799a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
   #12 0x000000000076c646 in glib_pollfds_poll () at main_loop.c:228
   #13 0x000000000076c6eb in os_host_main_loop_wait (timeout=348000000) at main_loop.c:273
   #14 0x000000000076c815 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:521
   #15 0x000000000056a511 in main_loop () at vl.c:2076
   #16 0x0000000000420705 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4940

The cause of this problem is a glibc bug; for more information, see
https://sourceware.org/bugzilla/show_bug.cgi?id=19951.
The solution for this bug is to use pthread_attr_setdetachstate.

There is a similar issue with pthread_setname_np(), whose call is moved
from the creating thread to the created thread.
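
A minimal sketch of the pthread_attr_setdetachstate() approach (plain POSIX, independent of QEMU's wrappers):

    #include <pthread.h>

    /* Create the thread detached from the start instead of calling
     * pthread_detach() on it afterwards, avoiding the race above. */
    static int create_detached(pthread_t *tid, void *(*fn)(void *), void *arg)
    {
        pthread_attr_t attr;
        int err;

        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
        err = pthread_create(tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
        return err;
    }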

Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-Id: <20171128044656.10592-1-linzhecheng@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Simplify the code by removing qemu_thread_set_name, and free the arguments
 before invoking the start routine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
The find_desc_by_name() from util/qemu-option.c relies on .name not being
NULL when it calls strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case for nbd_runtime_opts in block/nbd.c, and can
result in a segmentation fault when strcmp() tries to access invalid memory:

    #0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
    #1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
    #2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
    #3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
    #4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
    #5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395

From gdb, desc[i].name was not NULL and resulted in strcmp() accessing
invalid memory:

    >>> p desc[5]
    $8 = {
      name = 0x1037f098 "R27A",
      type = 1561964883,
      help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
      def_value_str = 0x2 <error: Cannot access memory at address 0x2>
    }
    >>> p desc[6]
    $9 = {
      name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
      type = 272101528,
      help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
      def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
    }

This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of the nbd_runtime_opts.desc list, which is the common practice in most
other structs, like runtime_opts in block/null.c. Thus, the desc[i].name != NULL
check becomes safe because it will not evaluate to true when the .desc list
reaches its end.
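
The shape of the fix, sketched (entries abbreviated; not the full nbd_runtime_opts):

    /* A QemuOptsList's .desc array must end with an all-zero sentinel so
     * find_desc_by_name() knows where to stop. */
    static QemuOptsList nbd_runtime_opts = {
        .name = "nbd",
        .head = QTAILQ_HEAD_INITIALIZER(nbd_runtime_opts.head),
        .desc = {
            {
                .name = "host",
                .type = QEMU_OPT_STRING,
                .help = "TCP host to connect to",
            },
            /* ... remaining options ... */
            { /* end of list */ }   /* the missing NULL terminator */
        },
    };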

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44f
Signed-off-by: Eric Blake <eblake@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
/public/qobject_is_equal_conversion: OK

=================================================================
==14396==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7f07682c5850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f0767d12f0c in g_malloc ../glib/gmem.c:94
    #2 0x7f0767d131cf in g_malloc_n ../glib/gmem.c:331
    #3 0x562bd767371f in do_test_equality /home/elmarco/src/qq/tests/check-qobject.c:49
    #4 0x562bd7674a35 in qobject_is_equal_dict_test /home/elmarco/src/qq/tests/check-qobject.c:267
    #5 0x7f0767d37b04 in test_case_run ../glib/gtestutils.c:2237
    #6 0x7f0767d37ec4 in g_test_run_suite_internal ../glib/gtestutils.c:2321
    #7 0x7f0767d37f6d in g_test_run_suite_internal ../glib/gtestutils.c:2333
    #8 0x7f0767d38184 in g_test_run_suite ../glib/gtestutils.c:2408
    #9 0x7f0767d36e0d in g_test_run ../glib/gtestutils.c:1674
    #10 0x562bd7674e75 in main /home/elmarco/src/qq/tests/check-qobject.c:327
    #11 0x7f0766009039 in __libc_start_main (/lib64/libc.so.6+0x21039)
    qemu#11 0x7f0766009039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-9-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Jan 25, 2018
Note that data_dir[] will now point to allocated strings.

Fixes:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f1448181850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f1446ed8f0c in g_malloc ../glib/gmem.c:94
    qemu#2 0x7f1446ed91cf in g_malloc_n ../glib/gmem.c:331
    qemu#3 0x7f1446ef739a in g_strsplit ../glib/gstrfuncs.c:2364
    qemu#4 0x55cf276439d7 in main /home/elmarco/src/qq/vl.c:4311
    qemu#5 0x7f143dfad039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-10-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Mar 7, 2018
Leak found thanks to ASAN:

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x55995789ac90 in __interceptor_malloc (/home/elmarco/src/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x1510c90)
    #1 0x7f0a91190f0c in g_malloc /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:94
    qemu#2 0x5599580a281c in v9fs_path_copy /home/elmarco/src/qemu/hw/9pfs/9p.c:196:17
    qemu#3 0x559958f9ec5d in coroutine_trampoline /home/elmarco/src/qemu/util/coroutine-ucontext.c:116:9
    qemu#4 0x7f0a8766ebbf  (/lib64/libc.so.6+0x50bbf)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
famz pushed a commit that referenced this pull request Mar 7, 2018
Fixes the following ASAN report:

Direct leak of 128 byte(s) in 8 object(s) allocated from:
    #0 0x7fefce311850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7fefcdd5ef0c in g_malloc ../glib/gmem.c:94
    qemu#2 0x559b976faff0 in create_ahci_io_test /home/elmarco/src/qemu/tests/ahci-test.c:1810

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180215212552.26997-6-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Mar 7, 2018
Fix the following ASAN reports:

==20125==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f0faea03a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7f0fae450f75 in g_malloc0 ../glib/gmem.c:124
    qemu#2 0x562fffd526fc in machine_start /home/elmarco/src/qemu/tests/sdhci-test.c:180

Indirect leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x7f0faea03850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f0fae450f0c in g_malloc ../glib/gmem.c:94
    qemu#2 0x562fffd5d21d in qpci_init_pc /home/elmarco/src/qemu/tests/libqos/pci-pc.c:122

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180215212552.26997-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Aug 15, 2018
The way we determine if we can start the incoming migration was
changed to use migration_has_all_channels() in:

  commit 428d890
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Mon Jul 24 13:06:25 2017 +0200

    migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created()
which is hardcoded to always return 'true' when multifd is
not in use. This is a latent bug...

...activated in a following commit where that return result
ends up acting as the flag to indicate whether it is possible
to start processing the migration:

  commit 36c2f8b
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Wed Mar 7 08:40:52 2018 +0100

    migration: Delay start of migration main routines

This means that if channel initialization fails with normal migration, the
failure is never noticed, and the incoming migration is started regardless,
crashing on a NULL pointer.

This can be seen, for example, if a client connects to a server
requiring TLS, but has an invalid x509 certificate:

qemu-system-x86_64: The certificate hasn't got a known issuer
qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.

 #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
 #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
 qemu#2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
 qemu#3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
 qemu#4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
 qemu#5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
 qemu#6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
 qemu#7  0x0000000000000000 in  ()

To handle the non-multifd case, we check whether mis->from_src_file
is non-NULL. With this in place, the migration server drops the
rejected client and stays around waiting for another, hopefully
valid, client to arrive.
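
A minimal sketch of that check, assuming the surrounding migration code (not
necessarily the verbatim patch):

    bool migration_has_all_channels(void)
    {
        MigrationIncomingState *mis = migration_incoming_get_current();
        bool all_channels = multifd_recv_all_channels_created();

        /* mis->from_src_file stays NULL until the main channel has been
         * accepted, so a failed TLS handshake no longer counts as
         * "all channels created". */
        return all_channels && mis->from_src_file != NULL;
    }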

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180619163552.18206-1-berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
famz pushed a commit that referenced this pull request Aug 15, 2018
With a Spice port chardev, it is possible to reenter
monitor_qapi_event_queue() (when the client disconnects, for example). This
will deadlock on monitor_lock.

Instead, use thread-local (TLS) variables to detect recursion and queue the
events.
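
A minimal sketch of the idea with hypothetical names (the bracketed note at
the end of this message refers to the real *_no_reenter helpers):

    /* Per-thread recursion flag, plus a queue of events that arrived
     * while this thread was already emitting one. */
    static __thread bool reentered;
    static __thread GQueue *pending_events;           /* hypothetical name */

    static void event_emit_no_reenter(QDict *qdict);  /* takes monitor_lock */

    static void event_emit(QDict *qdict)              /* hypothetical wrapper */
    {
        if (reentered) {
            /* Re-entered from within emission (e.g. a Spice disconnect
             * triggered by the chardev write): defer instead of
             * re-taking monitor_lock and deadlocking. */
            if (!pending_events) {
                pending_events = g_queue_new();
            }
            g_queue_push_tail(pending_events, qdict);
            return;
        }
        reentered = true;
        event_emit_no_reenter(qdict);
        while (pending_events && !g_queue_is_empty(pending_events)) {
            event_emit_no_reenter(g_queue_pop_head(pending_events));
        }
        reentered = false;
    }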

Fixes:
 (gdb) bt
 #0  0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
 #1  0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
 qemu#2  0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
 qemu#3  0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
 qemu#4  0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
 qemu#5  0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
 qemu#6  0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
 qemu#7  0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
 qemu#8  0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
 qemu#9  0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
 qemu#10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
 qemu#11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
 qemu#12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
 qemu#13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
 qemu#14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 qemu#15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 qemu#16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
 qemu#17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
 qemu#18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
 qemu#19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
 qemu#20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
 qemu#21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
 qemu#22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
 qemu#23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
 qemu#24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
 qemu#25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
 qemu#26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
 qemu#27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
 qemu#28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
 qemu#29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
 qemu#30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644

Note that the error report is now moved to the first caller, which may
receive an error for a recursed event. This is probably fine (95% of callers
use &error_abort; the rest have a NULL error and ignore it).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180731150144.14022-1-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[*_no_recurse renamed to *_no_reenter, local variables reordered]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Aug 24, 2018
Spotted by ASAN:

=================================================================
==27907==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4120 byte(s) in 1 object(s) allocated from:
    #0 0x7f913458ce50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f9133fd641d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    qemu#2 0x5561c6643c95 in qdict_crumple_test_recursive /home/elmarco/src/qq/tests/check-block-qdict.c:438
    qemu#3 0x7f9133ff7c49  (/lib64/libglib-2.0.so.0+0x73c49)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180809114417.28718-2-marcandre.lureau@redhat.com>
[Screwed up in commit 2860b2b]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Aug 24, 2018
Spotted by ASAN, during make check...

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f8e27262c48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f8e26a5f3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    qemu#2 0x555ab67078a8 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:67
    qemu#3 0x555ab67071e4 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:24
    qemu#4 0x555ab6713fbf in qstring_from_escaped_str /home/elmarco/src/qq/qobject/json-parser.c:144
    qemu#5 0x555ab671738c in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:506
    qemu#6 0x555ab67179c3 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:569
    qemu#7 0x555ab6715123 in parse_pair /home/elmarco/src/qq/qobject/json-parser.c:306
    qemu#8 0x555ab6715483 in parse_object /home/elmarco/src/qq/qobject/json-parser.c:357
    qemu#9 0x555ab671798b in parse_value /home/elmarco/src/qq/qobject/json-parser.c:561
    qemu#10 0x555ab6717a6b in json_parser_parse_err /home/elmarco/src/qq/qobject/json-parser.c:592
    qemu#11 0x555ab4fd4dcf in handle_qmp_command /home/elmarco/src/qq/monitor.c:4257
    qemu#12 0x555ab6712c4d in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105
    qemu#13 0x555ab67e01e2 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323
    qemu#14 0x555ab67e0af6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373
    qemu#15 0x555ab6713010 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124
    qemu#16 0x555ab4fd58ec in monitor_qmp_read /home/elmarco/src/qq/monitor.c:4337
    qemu#17 0x555ab6559df2 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175
    qemu#18 0x555ab6559e95 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187
    qemu#19 0x555ab6560127 in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    qemu#20 0x555ab65d9c73 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    qemu#21 0x7f8e26a598ac in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4c8ac)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180809114417.28718-4-marcandre.lureau@redhat.com>
[Screwed up in commit b273145]
Cc: qemu-stable@nongnu.org
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Aug 24, 2018
The ast2500 SDRAM training routine busy waits on the 'init cycle busy
state' bit in DDR PHY Control/Status register #1 (MCR60).

This ensures the bit always reads zero, and allows training to
complete with upstream u-boot on the ast2500-evb.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180807075757.7242-5-joel@jms.id.au
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
famz pushed a commit that referenced this pull request Sep 6, 2018
If qio_channel_rdma_readv() returns QIO_CHANNEL_ERR_BLOCK, the destination
QEMU crashes.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    qemu#2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    qemu#3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    qemu#4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    qemu#5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    qemu#6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    qemu#7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    qemu#8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    qemu#9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    qemu#10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    qemu#11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    qemu#12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    qemu#13 0x00007f96fe858760 in ?? ()
    qemu#14 0x0000000000000000 in ?? ()

The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
qio_channel_set_aio_fd_handler() dereferences a NULL pointer.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
famz pushed a commit that referenced this pull request Sep 6, 2018
When qio_channel_read() returns QIO_CHANNEL_ERR_BLOCK, the source QEMU
crashes.

The backtrace is:
    (gdb) bt
    #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
    #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
    qemu#2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
    qemu#3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
    qemu#4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
    qemu#5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
        at migration/qemu-file-channel.c:83
    qemu#6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
    qemu#7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
    qemu#8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
    qemu#9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
    qemu#10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
    qemu#11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
    qemu#12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
    qemu#13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6

This patch fixes the crash by invoking qio_channel_yield() only when running
inside a coroutine (qemu_in_coroutine()).
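
A minimal sketch of the guard in channel_get_buffer()
(migration/qemu-file-channel.c, per the backtraces above); the else branch
using qio_channel_wait() is an assumption:

    if (qemu_in_coroutine()) {
        /* Coroutine context (e.g. the incoming migration coroutine):
         * yield and let the event loop resume us when readable. */
        qio_channel_yield(ioc, G_IO_IN);
    } else {
        /* Plain thread (e.g. the return path thread): block
         * synchronously instead of asserting in qio_channel_yield(). */
        qio_channel_wait(ioc, G_IO_IN);
    }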

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
famz pushed a commit that referenced this pull request Sep 6, 2018
Because the RDMA QIOChannel does not implement a shutdown function, if
to_dst_file is set to an error state, the return path thread will wait
forever, and the migration thread will wait for the return path thread to
exit.

the backtrace of return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    qemu#2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    qemu#3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    qemu#4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    qemu#5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    qemu#6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    qemu#7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    qemu#8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    qemu#9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    qemu#10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    qemu#11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    qemu#12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    qemu#13 0x00007f372a77635d in clone () from /lib64/libc.so.6

the backtrace of migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    qemu#2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    qemu#3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    qemu#4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    qemu#5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    qemu#6  0x00007f372a77635d in clone () from /lib64/libc.so.6
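
A hedged sketch of the shape of the fix, an io_shutdown implementation for
the RDMA channel; the struct and field names are assumptions based on the
backtraces:

    static int qio_channel_rdma_shutdown(QIOChannel *ioc,
                                         QIOChannelShutdown how,
                                         Error **errp)
    {
        QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

        /* Mark the RDMA context as errored so a thread blocked in
         * qemu_rdma_block_for_wrid() gives up polling the completion
         * channel instead of waiting forever. */
        if (rioc->rdma) {
            rioc->rdma->error_state = -EIO;
        }
        return 0;
    }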

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
famz pushed a commit that referenced this pull request Sep 6, 2018
tests/cdrom-test -p /x86_64/cdrom/boot/megasas

Produces the following ASAN leak.

==25700==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    qemu#2 0x55a729f17738 in pci_dma_sglist_init /home/elmarco/src/qq/include/hw/pci/pci.h:818
    qemu#3 0x55a729f2a706 in megasas_map_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:698
    qemu#4 0x55a729f39421 in megasas_handle_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:1574
    qemu#5 0x55a729f3f70d in megasas_handle_frame /home/elmarco/src/qq/hw/scsi/megasas.c:1955
    qemu#6 0x55a729f40939 in megasas_mmio_write /home/elmarco/src/qq/hw/scsi/megasas.c:2119
    qemu#7 0x55a729f41102 in megasas_port_write /home/elmarco/src/qq/hw/scsi/megasas.c:2170
    qemu#8 0x55a729220e60 in memory_region_write_accessor /home/elmarco/src/qq/memory.c:527
    qemu#9 0x55a7292212b3 in access_with_adjusted_size /home/elmarco/src/qq/memory.c:594
    qemu#10 0x55a72922cf70 in memory_region_dispatch_write /home/elmarco/src/qq/memory.c:1473
    qemu#11 0x55a7290f5907 in flatview_write_continue /home/elmarco/src/qq/exec.c:3255
    qemu#12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294
    qemu#13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384
    qemu#14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395
    qemu#15 0x55a72929ecb0 in kvm_handle_io /home/elmarco/src/qq/accel/kvm/kvm-all.c:1729
    qemu#16 0x55a7292a0db5 in kvm_cpu_exec /home/elmarco/src/qq/accel/kvm/kvm-all.c:1969
    qemu#17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn /home/elmarco/src/qq/cpus.c:1215
    qemu#18 0x55a72a966a6c in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:504
    qemu#19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593)

Move the qemu_sglist_destroy() from megasas_complete_command() to
megasas_unmap_frame(), so map/unmap are balanced.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180814141247.32336-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
famz pushed a commit that referenced this pull request Sep 28, 2018
Spotted by ASAN:

=================================================================
==5378==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x7f788f83bc48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f788c9923c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    qemu#2 0x5622a1fe37bc in coroutine_trampoline /home/elmarco/src/qq/util/coroutine-ucontext.c:116
    qemu#3 0x7f788a15d75f in __correctly_grouped_prefixwc (/lib64/libc.so.6+0x4c75f)

(Broken in commit 4c8158e.)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20180809114417.28718-3-marcandre.lureau@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
famz pushed a commit that referenced this pull request Sep 28, 2018
We need to set cs->halted to 1 before calling ppc_set_compat. The reason is
that ppc_set_compat kicks the new thread created to manage the hotplugged KVM
virtual CPU, and that code goes straight to the KVM_RUN ioctl. When
cs->halted is 1, the code:

int kvm_cpu_exec(CPUState *cpu)
...
     if (kvm_arch_process_async_events(cpu)) {
         atomic_set(&cpu->exit_request, 0);
         return EXCP_HLT;
     }
...

returns before it reaches KVM_RUN, giving the main thread time to finish its
job. Otherwise we can fall into a deadlock, because the KVM thread will issue
the KVM_RUN ioctl while the main thread is still setting up KVM registers.
Depending on how these jobs are scheduled, we'll end up freezing QEMU.
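
A minimal sketch of the resulting hotplug ordering (variable names assumed
from the surrounding spapr hotplug code):

    /* Park the new vCPU first: with cs->halted set,
     * kvm_arch_process_async_events() makes kvm_cpu_exec() return
     * EXCP_HLT before reaching KVM_RUN, so the main thread can finish
     * setting up the KVM registers without racing the vCPU thread. */
    cs->halted = 1;
    ppc_set_compat(cpu, compat_pvr, &local_err); /* kicks the vCPU thread */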

The following output shows kvm_vcpu_ioctl sleeping because it cannot get
the mutex and never will.
PS: kvm_vcpu_ioctl was triggered by kvm_set_one_reg - compat_pvr.

STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL

PID: 61564  TASK: c000003e981e0780  CPU: 48  COMMAND: "qemu-system-ppc"
 #0 [c000003e982679a0] __schedule at c000000000b10a44
 #1 [c000003e98267a60] schedule at c000000000b113a8
 qemu#2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910
 qemu#3 [c000003e98267ab0] __mutex_lock at c000000000b132ec
 qemu#4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm]
 qemu#5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30
 qemu#6 [c000003e98267dc0] ksys_ioctl at c000000000408674
 qemu#7 [c000003e98267e10] sys_ioctl at c0000000004086f8
 qemu#8 [c000003e98267e30] system_call at c00000000000b488

crash> struct -x kvm.vcpus 0xc000003da0000000
vcpus = {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc000003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...}

crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000
  mutex.owner = {
    counter = 0xc000003a23a5c881 <- flag 1: waiters
  },

crash> bt 0xc000003a23a5c880
PID: 61579  TASK: c000003a23a5c880  CPU: 9   COMMAND: "CPU 4/KVM"
(active)

crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000
  mutex.wait_list = {
    next = 0xc000003e98267b10,
    prev = 0xc000003e98267b10
  },

crash> struct -x mutex_waiter.task 0xc000003e98267b10
  task = 0xc000003e981e0780

The following command-line was used to reproduce the problem (note: gdb
and trace can change the results).

 $ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \
     -enable-kvm -m 4096 \
     -smp 4,maxcpus=8,sockets=1,cores=2,threads=4 \
     -display none -nographic \
     -drive file=disk1.qcow2,format=qcow2
 ...
 (qemu) device_add host-spapr-cpu-core,core-id=4
[no interaction is possible after it, only SIGKILL to take the terminal
back]

Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
famz pushed a commit that referenced this pull request Sep 28, 2018
Spotted by ASAN doing some manual testing:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f5fcdc75e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f5fcd47241d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    qemu#2 0x55f989be92ce in timer_new /home/elmarco/src/qq/include/qemu/timer.h:561
    qemu#3 0x55f989be92ff in timer_new_ms /home/elmarco/src/qq/include/qemu/timer.h:630
    qemu#4 0x55f989c0219d in hmp_migrate /home/elmarco/src/qq/hmp.c:2038
    qemu#5 0x55f98955927b in handle_hmp_command /home/elmarco/src/qq/monitor.c:3498
    qemu#6 0x55f98955fb8c in monitor_command_cb /home/elmarco/src/qq/monitor.c:4371
    qemu#7 0x55f98ad40f11 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:393
    qemu#8 0x55f98955fa4f in monitor_read /home/elmarco/src/qq/monitor.c:4354
    qemu#9 0x55f98aae30d7 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175
    qemu#10 0x55f98aae317a in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187
    qemu#11 0x55f98aae940c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    qemu#12 0x55f98ab63018 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    qemu#13 0x7f5fcd46c8ac in g_main_dispatch gmain.c:3177

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180901134652.25884-1-marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
famz pushed a commit that referenced this pull request Sep 28, 2018
In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which shows that it did not lock it beforehand:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 qemu#2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 qemu#3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 qemu#4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 qemu#5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 qemu#6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 qemu#10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 qemu#11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 qemu#12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 qemu#13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 qemu#2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 qemu#3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 qemu#4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 qemu#5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 qemu#6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 qemu#10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 qemu#11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 qemu#12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 qemu#13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 qemu#14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 qemu#15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 qemu#16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 qemu#17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 qemu#18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 qemu#19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 qemu#20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 qemu#21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 qemu#22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 qemu#23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 qemu#24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 qemu#25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 qemu#26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 qemu#27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 qemu#28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 qemu#29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 qemu#30 0x00005631fbd81412 in main_loop () at vl.c:1866
 qemu#31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0
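
A sketch of the fixed ordering in block/linux-aio.c, assuming the function
from the backtraces: the AioContext lock now also covers completion
processing, not just the submission loop.

    void qemu_laio_process_completions_and_submit(LinuxAioState *s)
    {
        aio_context_acquire(s->aio_context);
        qemu_laio_process_completions(s);   /* now under the lock, too */

        if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
            ioq_submit(s);
        }
        aio_context_release(s->aio_context);
    }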

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
famz pushed a commit that referenced this pull request Oct 31, 2018
Spotted by ASAN while running:

$ tests/migration-test -p /x86_64/migration/postcopy/recovery

=================================================================
==18034==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 33864 byte(s) in 1 object(s) allocated from:
    #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    qemu#2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183
    qemu#3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159
    qemu#4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78
    qemu#5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295
    qemu#6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111
    qemu#7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91
    qemu#8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108
    qemu#9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158
    qemu#10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89
    qemu#11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
famz pushed a commit that referenced this pull request Oct 31, 2018
Recently, the test case has started failing because some job related
functions want to drop the AioContext lock even though it hasn't been
taken:

    (gdb) bt
    #0  0x00007f51c067c9fb in raise () from /lib64/libc.so.6
    #1  0x00007f51c067e77d in abort () from /lib64/libc.so.6
    qemu#2  0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
    qemu#3  0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
    qemu#4  0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516
    qemu#5  0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738
    qemu#6  0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986
    qemu#7  0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941
    qemu#8  0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148
    qemu#9  0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
    qemu#10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
    qemu#11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
    qemu#12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783
    qemu#13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402
    qemu#14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
    qemu#15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478
    qemu#16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    qemu#17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    qemu#18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    qemu#19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0
    qemu#20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
    qemu#21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581

It is yet unclear whether this should really be considered a bug in the
test case or whether blk_unref() should work for callers that haven't
taken the AioContext lock, but in order to fix the build tests quickly,
just take the AioContext lock around blk_unref().
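
A minimal sketch of that workaround in the test teardown, assuming the
BlockBackend pointer from the backtrace:

    AioContext *ctx = blk_get_aio_context(blk);

    /* blk_unref() may drain, and the drain code wants to be able to
     * drop the AioContext lock, so the lock must actually be held. */
    aio_context_acquire(ctx);
    blk_unref(blk);
    aio_context_release(ctx);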

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
famz pushed a commit that referenced this pull request Oct 31, 2018
A socket chardev may not have an associated address (when adding a client fd
manually, for example). But on disconnect, updating the socket filename
expects an address, which may lead to this crash:

  Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
  388	    switch (addr->type) {
  (gdb) bt
  #0  0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
  #1  0x0000555555d8c8aa in update_disconnected_filename (s=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:419
  qemu#2  0x0000555555d8c959 in tcp_chr_disconnect (chr=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:438
  qemu#3  0x0000555555d8cba1 in tcp_chr_hup (channel=0x555556b75690, cond=G_IO_HUP, opaque=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:482
  qemu#4  0x0000555555da596e in qio_channel_fd_source_dispatch (source=0x555556bb68b0, callback=0x555555d8cb58 <tcp_chr_hup>, user_data=0x555556b1ed00) at /home/elmarco/src/qq/io/channel-watch.c:84

Replace the filename with a generic "disconnected:socket" in this case.
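
A sketch of update_disconnected_filename() with that fallback (types and
fields per the backtrace; not necessarily the verbatim patch):

    static void update_disconnected_filename(SocketChardev *s)
    {
        Chardev *chr = CHARDEV(s);

        g_free(chr->filename);
        if (s->addr) {
            chr->filename = SocketAddress_to_str("disconnected:", s->addr,
                                                 s->is_listen, s->is_telnet);
        } else {
            /* No address to stringify (manually added client fd). */
            chr->filename = g_strdup("disconnected:socket");
        }
    }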

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
famz pushed a commit that referenced this pull request Oct 31, 2018
Spotted by ASAN:
=================================================================
==11893==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1120 byte(s) in 28 object(s) allocated from:
    #0 0x7fd0515b0c48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7fd050ffa3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    qemu#2 0x559e708b56a4 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:66
    qemu#3 0x559e708b4fe0 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:23
    qemu#4 0x559e708bda7d in parse_string /home/elmarco/src/qq/qobject/json-parser.c:143
    qemu#5 0x559e708c1009 in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:484
    qemu#6 0x559e708c1627 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:547
    qemu#7 0x559e708c1c67 in json_parser_parse /home/elmarco/src/qq/qobject/json-parser.c:573
    qemu#8 0x559e708bc0ff in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:92
    qemu#9 0x559e708d1655 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:292
    qemu#10 0x559e708d1fe1 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:339
    qemu#11 0x559e708bc856 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:121
    qemu#12 0x559e708b8b4b in qobject_from_jsonv /home/elmarco/src/qq/qobject/qjson.c:69
    qemu#13 0x559e708b8d02 in qobject_from_json /home/elmarco/src/qq/qobject/qjson.c:83
    qemu#14 0x559e708a74ae in from_json_str /home/elmarco/src/qq/tests/check-qjson.c:30
    qemu#15 0x559e708a9f83 in utf8_string /home/elmarco/src/qq/tests/check-qjson.c:781
    qemu#16 0x7fd05101bc49 in test_case_run gtestutils.c:2255
    qemu#17 0x7fd05101bc49 in g_test_run_suite_internal gtestutils.c:2339

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180901211917.10372-1-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Jan 2, 2019
A quick coredump on an incomplete command line:
./x86_64-softmmu/qemu-system-x86_64 -mon mode=control,pretty=on

 #0  0x00007ffff723d9e4 in g_str_hash () at /lib64/libglib-2.0.so.0
 #1  0x00007ffff723ce38 in g_hash_table_lookup () at /lib64/libglib-2.0.so.0
 qemu#2  0x0000555555cc0073 in object_class_property_find (klass=0x5555566a94b0, name=0x0, errp=0x0) at qom/object.c:1135
 qemu#3  0x0000555555cc004b in object_class_property_find (klass=0x5555566a9440, name=0x0, errp=0x0) at qom/object.c:1129
 qemu#4  0x0000555555cbfe6e in object_property_find (obj=0x5555568348c0, name=0x0, errp=0x0) at qom/object.c:1080
 qemu#5  0x0000555555cc183d in object_resolve_path_component (parent=0x5555568348c0, part=0x0) at qom/object.c:1762
 qemu#6  0x0000555555d82071 in qemu_chr_find (name=0x0) at chardev/char.c:802
 qemu#7  0x00005555559d77cb in mon_init_func (opaque=0x0, opts=0x5555566b65a0, errp=0x0) at vl.c:2291

Fix it to instead fail gracefully.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181023213600.364086-1-eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Jan 2, 2019
When using the 9P2000.u version of the protocol, the following shell
command line in the guest can cause QEMU to crash:

    while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done

With 9P2000.u, file renaming is handled by the WSTAT command. The
v9fs_wstat() function calls v9fs_complete_rename(), which calls
v9fs_fix_path() for every fid whose path is affected by the change.
The involved calls to v9fs_path_copy() may race with any other access to the
fid path performed by some worker thread, causing a crash like the one shown
below:

Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0,
 flags=65536, mode=0) at hw/9pfs/9p-local.c:59
59          while (*path && fd != -1) {
(gdb) bt
#0  0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8,
 path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
#1  0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8,
 path=0x0) at hw/9pfs/9p-local.c:92
qemu#2  0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8,
 fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
qemu#3  0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498,
 path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
qemu#4  0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498)
 at hw/9pfs/9p.c:1083
qemu#5  0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767)
 at util/coroutine-ucontext.c:116
qemu#6  0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6
qemu#7  0x0000000000000000 in  ()
(gdb)

The fix is to take the path write lock when calling v9fs_complete_rename(),
like in v9fs_rename().
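
A minimal sketch of that change in the v9fs_wstat() rename branch (condition
and arguments assumed from the surrounding 9p code):

    if (v9stat.name.size != 0) {
        v9fs_path_write_lock(s);   /* serialize against path readers */
        err = v9fs_complete_rename(pdu, fidp, -1, &v9stat.name);
        v9fs_path_unlock(s);
        if (err < 0) {
            goto out;
        }
    }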

Impact:  DoS triggered by unprivileged guest users.

Fixes: CVE-2018-19489
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
famz pushed a commit that referenced this pull request Jan 2, 2019
After iov_discard_front(), the iov may be smaller than its initial
size. Fixes the heap-buffer-overflow spotted by ASAN:

==9036==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060000001e0 at pc 0x7fe632eca3f0 bp 0x7ffddc4a05a0 sp 0x7ffddc49fd48
WRITE of size 32 at 0x6060000001e0 thread T0
    #0 0x7fe632eca3ef  (/lib64/libasan.so.5+0x773ef)
    #1 0x7fe632ecad23 in __interceptor_recvmsg (/lib64/libasan.so.5+0x77d23)
    qemu#2 0x561e7491936b in vubr_backend_recv_cb /home/elmarco/src/qemu/tests/vhost-user-bridge.c:333
    qemu#3 0x561e74917711 in dispatcher_wait /home/elmarco/src/qemu/tests/vhost-user-bridge.c:160
    qemu#4 0x561e7491c3b5 in vubr_run /home/elmarco/src/qemu/tests/vhost-user-bridge.c:725
    qemu#5 0x561e7491c85c in main /home/elmarco/src/qemu/tests/vhost-user-bridge.c:806
    qemu#6 0x7fe631a6c412 in __libc_start_main (/lib64/libc.so.6+0x24412)
    qemu#7 0x561e7491667d in _start (/home/elmarco/src/qemu/build/tests/vhost-user-bridge+0x3967d)

0x6060000001e0 is located 0 bytes to the right of 64-byte region [0x6060000001a0,0x6060000001e0)
allocated by thread T0 here:
    #0 0x7fe632f42848 in __interceptor_malloc (/lib64/libasan.so.5+0xef848)
    #1 0x561e7493acd8 in virtqueue_alloc_element /home/elmarco/src/qemu/contrib/libvhost-user/libvhost-user.c:1848
    qemu#2 0x561e7493c2a8 in vu_queue_pop /home/elmarco/src/qemu/contrib/libvhost-user/libvhost-user.c:1954
    qemu#3 0x561e749189bf in vubr_backend_recv_cb /home/elmarco/src/qemu/tests/vhost-user-bridge.c:297
    qemu#4 0x561e74917711 in dispatcher_wait /home/elmarco/src/qemu/tests/vhost-user-bridge.c:160
    qemu#5 0x561e7491c3b5 in vubr_run /home/elmarco/src/qemu/tests/vhost-user-bridge.c:725
    qemu#6 0x561e7491c85c in main /home/elmarco/src/qemu/tests/vhost-user-bridge.c:806
    qemu#7 0x7fe631a6c412 in __libc_start_main (/lib64/libc.so.6+0x24412)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib64/libasan.so.5+0x773ef)
Shadow bytes around the buggy address:
  0x0c0c7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff8000: fa fa fa fa 00 00 00 00 00 00 05 fa fa fa fa fa
  0x0c0c7fff8010: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd
  0x0c0c7fff8020: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c0c7fff8030: fa fa fa fa 00 00 00 00 00 00 00 00[fa]fa fa fa
  0x0c0c7fff8040: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff8050: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181109173028.3372-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
famz pushed a commit that referenced this pull request Jan 2, 2019
Let's start from the beginning:

Commit b9e413d (in 2.9)
"block: explicitly acquire aiocontext in aio callbacks that need it"
added pairs of aio_context_acquire/release to mirror_write_complete and
mirror_read_complete, when they were aio callbacks for blk_aio_* calls.

Then, commit 2e1990b (in 3.0) "block/mirror: Convert to coroutines" dropped
these blk_aio_* calls, so mirror_write_complete and mirror_read_complete are
no longer callbacks and don't need the additional AioContext acquisition.
Furthermore, mirror_read_complete calls blk_co_pwritev inside this pair of
aio_context_acquire/release, which leads to the following deadlock with
mirror:

 (gdb) info thr
   Id   Target Id         Frame
   3    Thread (LWP 145412) "qemu-system-x86" syscall ()
   2    Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait ()
 * 1    Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait ()

 (gdb) bt
 #0  __lll_lock_wait ()
 #1  _L_lock_812 ()
 qemu#2  __GI___pthread_mutex_lock
 qemu#3  qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>,
     file=0x5610327d8654 "util/main-loop.c", line=236) at
     util/qemu-thread-posix.c:66
 qemu#4  qemu_mutex_lock_iothread_impl
 qemu#5  os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
 qemu#6  main_loop_wait (nonblocking=0) at util/main-loop.c:497
 qemu#7  main_loop () at vl.c:1892
 qemu#8  main

Printing the contents of qemu_global_mutex, I see "__owner = 145416"; so
thr1 is the main loop, and it now wants the BQL, which is owned by thr2.

 (gdb) thr 2
 (gdb) bt
 #0  __lll_lock_wait ()
 #1  _L_lock_870 ()
 qemu#2  __GI___pthread_mutex_lock
 qemu#3  qemu_mutex_lock_impl (mutex=0x561034d25dc0, ...
 qemu#4  aio_context_acquire (ctx=0x561034d25d60)
 qemu#5  dma_blk_cb
 qemu#6  dma_blk_io
 qemu#7  dma_blk_read
 qemu#8  ide_dma_cb
 qemu#9  bmdma_cmd_writeb
 qemu#10 bmdma_write
 qemu#11 memory_region_write_accessor
 qemu#12 access_with_adjusted_size
 qemu#15 flatview_write
 qemu#16 address_space_write
 qemu#17 address_space_rw
 qemu#18 kvm_handle_io
 qemu#19 kvm_cpu_exec
 qemu#20 qemu_kvm_cpu_thread_fn
 qemu#21 qemu_thread_start
 qemu#22 start_thread
 qemu#23 clone ()

Printing the mutex in frame 2, I see "__owner = 145411", so thr2 wants the
AioContext mutex, which is owned by thr1. A classic deadlock.

Then, let's check that the AioContext is held by the mirror coroutine: just
print the coroutine stack of the first tracked request in the mirror job
target:

 (gdb) [...]
 (gdb) qemu coroutine 0x561035dd0860
 #0  qemu_coroutine_switch
 #1  qemu_coroutine_yield
 qemu#2  qemu_co_mutex_lock_slowpath
 qemu#3  qemu_co_mutex_lock
 qemu#4  qcow2_co_pwritev
 qemu#5  bdrv_driver_pwritev
 qemu#6  bdrv_aligned_pwritev
 qemu#7  bdrv_co_pwritev
 qemu#8  blk_co_pwritev
 qemu#9  mirror_read_complete () at block/mirror.c:232
 qemu#10 mirror_co_read () at block/mirror.c:370
 qemu#11 coroutine_trampoline
 qemu#12 __start_context

Yes, it is mirror_read_complete calling blk_co_pwritev after acquiring the
AioContext.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
famz pushed a commit that referenced this pull request Jan 2, 2019
When a monitor is connected to a Spice chardev, the monitor cleanup can
deadlock:

 #0  0x00007f43446637fd in __lll_lock_wait () at /lib64/libpthread.so.0
 #1  0x00007f434465ccf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
 qemu#2  0x0000556dd79f22ba in qemu_mutex_lock_impl (mutex=0x556dd81c9220 <monitor_lock>, file=0x556dd7ae3648 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
 qemu#3  0x0000556dd7431bd5 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x556dd9abc850, errp=0x7fffb7bbddd8) at /home/elmarco/src/qq/monitor.c:645
 qemu#4  0x0000556dd79d476b in qapi_event_send_spice_disconnected (server=0x556dd98ee760, client=0x556ddaaa8560, errp=0x556dd82180d0 <error_abort>) at qapi/qapi-events-ui.c:149
 qemu#5  0x0000556dd7870fc1 in channel_event (event=3, info=0x556ddad1b590) at /home/elmarco/src/qq/ui/spice-core.c:235
 qemu#6  0x00007f434560a6bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x556ddad1b590) at reds.c:316
 qemu#7  0x00007f43455f393b in main_dispatcher_self_handle_channel_event (info=0x556ddad1b590, event=3, self=0x556dd9a7d8c0) at main-dispatcher.c:197
 qemu#8  0x00007f43455f393b in main_dispatcher_channel_event (self=0x556dd9a7d8c0, event=event@entry=3, info=0x556ddad1b590) at main-dispatcher.c:197
 qemu#9  0x00007f4345612833 in red_stream_push_channel_event (s=s@entry=0x556ddae2ef40, event=event@entry=3) at red-stream.c:414
 qemu#10 0x00007f434561286b in red_stream_free (s=0x556ddae2ef40) at red-stream.c:388
 qemu#11 0x00007f43455f9ddc in red_channel_client_finalize (object=0x556dd9bb21a0) at red-channel-client.c:347
 qemu#12 0x00007f434b5f9fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
 qemu#13 0x00007f43455fc212 in red_channel_client_push (rcc=0x556dd9bb21a0) at red-channel-client.c:1341
 qemu#14 0x0000556dd76081ba in spice_port_set_fe_open (chr=0x556dd9925e20, fe_open=0) at /home/elmarco/src/qq/chardev/spice.c:241
 qemu#15 0x0000556dd796d74a in qemu_chr_fe_set_open (be=0x556dd9a37c00, fe_open=0) at /home/elmarco/src/qq/chardev/char-fe.c:340
 qemu#16 0x0000556dd796d4d9 in qemu_chr_fe_set_handlers (b=0x556dd9a37c00, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true) at /home/elmarco/src/qq/chardev/char-fe.c:280
 qemu#17 0x0000556dd796d359 in qemu_chr_fe_deinit (b=0x556dd9a37c00, del=false) at /home/elmarco/src/qq/chardev/char-fe.c:233
 qemu#18 0x0000556dd7432240 in monitor_data_destroy (mon=0x556dd9a37c00) at /home/elmarco/src/qq/monitor.c:786
 qemu#19 0x0000556dd743b968 in monitor_cleanup () at /home/elmarco/src/qq/monitor.c:4683
 qemu#20 0x0000556dd75ce776 in main (argc=3, argv=0x7fffb7bbe458, envp=0x7fffb7bbe478) at /home/elmarco/src/qq/vl.c:4660

This happens because the Spice code tries to emit a "disconnected" event on
the monitors. Fix this deadlock by releasing the monitor lock around
flush/destroy.

monitor_lock protects mon_list, monitor_qapi_event_state and
monitor_destroyed. monitor_flush() and monitor_data_destroy() don't
access any of those variables.

monitor_cleanup()'s loop is safe because it uses
QTAILQ_FOREACH_SAFE(), and no further monitor can be added after
calling monitor_cleanup() thanks to monitor_destroyed check in
monitor_list_append().
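
A sketch of the resulting monitor_cleanup() loop (simplified; details may
differ from the actual patch):

    void monitor_cleanup(void)
    {
        Monitor *mon, *next;

        qemu_mutex_lock(&monitor_lock);
        QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
            QTAILQ_REMOVE(&mon_list, mon, entry);
            /* Flush/destroy without monitor_lock held: they may
             * re-enter monitor code, e.g. via the Spice disconnect
             * event above. */
            qemu_mutex_unlock(&monitor_lock);
            monitor_flush(mon);
            monitor_data_destroy(mon);
            g_free(mon);
            qemu_mutex_lock(&monitor_lock);
        }
        qemu_mutex_unlock(&monitor_lock);
    }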

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181205203737.9011-8-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Jan 2, 2019
Currently the log backend prints the process id of QEMU at the start
of each output line, but since threads share the same PID there is no
clear distinction between their outputs.

Having the thread id present in the log makes it easier to see when
output comes from different threads. E.g.:

12423@1538597569.672527:qemu_mutex_lock waiting on mutex 0x1103ee60 (/root/qemu/util/main-loop.c:236)
...
12430@1538597569.503928:qemu_mutex_unlock released mutex 0x1103ee60 (/root/qemu/cpus.c:1238)
12431@1538597569.503937:qemu_mutex_locked taken mutex 0x1103ee60 (/root/qemu/cpus.c:1257)
^here

In the above, 12423 is the main process id and 12430 & 12431 are the
two vcpu threads.

 (qemu) info cpus
 * CPU #0: thread_id=12430
   CPU #1: thread_id=12431
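
A minimal sketch of producing such a prefix (illustrative, not QEMU's
actual logging code; it assumes Linux, where gettid() may need the raw
syscall):

    #include <stdarg.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Prefix each line with "tid@seconds.microseconds:" so output from
     * different threads can be told apart even though they share a PID. */
    static void log_with_tid(const char *fmt, ...)
    {
        struct timeval tv;
        va_list ap;

        gettimeofday(&tv, NULL);
        fprintf(stderr, "%ld@%ld.%06ld:", (long)syscall(SYS_gettid),
                (long)tv.tv_sec, (long)tv.tv_usec);

        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }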

Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
famz pushed a commit that referenced this pull request Jan 2, 2019
The JSON parser happily accepts duplicate object member names.  The
last value wins.  Reproducer #1:

    $ qemu-system-x86_64 -qmp stdio
    {"QMP": {"version": {"qemu": {"micro": 93, "minor": 0, "major": 3},
    "package": "v3.1.0-rc3-7-g87a45d86ed"}, "capabilities": []}}
    {'execute':'qmp_capabilities'}
    {"return": {}}
    {'execute':'blockdev-add','arguments':{'driver':'null-co',
     'node-name':'foo','node-name':'bar'}}
    {"return": {}}
    {'execute':'query-named-block-nodes'}
    {"return": [{ [...] "node-name": "bar" [...] }]}

Reproducer #2 is iotest 229.

Fix the parser to reject duplicates, and fix iotest 229 not to use
them.
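
A standalone illustration of the behavior change, rejecting a
duplicate member instead of letting the last value win (illustrative
code, not the actual parser):

    #include <stdio.h>
    #include <string.h>

    #define MAX_KEYS 16
    static const char *keys[MAX_KEYS];
    static int nkeys;

    /* Fail when a member name is already present instead of silently
     * overwriting the earlier value. */
    static int object_insert(const char *key)
    {
        for (int i = 0; i < nkeys; i++) {
            if (strcmp(keys[i], key) == 0) {
                fprintf(stderr, "parse error: duplicate key '%s'\n", key);
                return -1;
            }
        }
        keys[nkeys++] = key;
        return 0;
    }

    int main(void)
    {
        object_insert("node-name");    /* accepted */
        object_insert("node-name");    /* rejected */
        return 0;
    }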

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181206121743.20762-1-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Trailing whitespace tidied up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
famz pushed a commit that referenced this pull request Oct 23, 2019
When it encounters an error, multifd_send_thread should always notify
whoever is waiting on it before exiting. Otherwise it may block
migration_thread in multifd_send_sync_main forever.
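
The idea, sketched with plain POSIX semaphores (do_send() and the
channel struct are illustrative stand-ins, not the migration code
itself):

    #include <semaphore.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct channel {
        sem_t sem_sync;
        bool quit;
    };

    /* Stand-in for the real send work; here it always fails. */
    static int do_send(struct channel *c) { (void)c; return -1; }

    static void *send_thread(void *opaque)
    {
        struct channel *c = opaque;

        for (;;) {
            if (do_send(c) < 0) {
                c->quit = true;
                sem_post(&c->sem_sync);  /* wake the sync waiter, or it
                                          * blocks in sem_wait() forever */
                break;
            }
            sem_post(&c->sem_sync);      /* normal completion */
        }
        return NULL;
    }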

The error looks as follows:
-------------------------------------------------------------------------------
 (gdb) bt
 #0  0x00007f4d669dfa0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
 #1  0x00007f4d669dfa9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
 #2  0x00007f4d669dfb3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
 #3  0x0000562ccf0a5614 in qemu_sem_wait (sem=sem@entry=0x562cd1b698e8) at util/qemu-thread-posix.c:319
 #4  0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099
 #5  0x0000562ccecb95f4 in ram_save_iterate (f=0x562cd0ecc000, opaque=<optimized out>) at /qemu/migration/ram.c:3550
 #6  0x0000562ccef43c23 in qemu_savevm_state_iterate (f=0x562cd0ecc000, postcopy=false) at migration/savevm.c:1189
 #7  0x0000562ccef3dcf3 in migration_iteration_run (s=0x562cd09fabf0) at migration/migration.c:3131
 #8  migration_thread (opaque=opaque@entry=0x562cd09fabf0) at migration/migration.c:3258
 #9  0x0000562ccf0a4c26 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502
 #10 0x00007f4d669d9e25 in start_thread () from /lib64/libpthread.so.0
 #11 0x00007f4d6670635d in clone () from /lib64/libc.so.6
 (gdb) f 4
 #4  0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099
 1099	        qemu_sem_wait(&p->sem_sync);
 (gdb) list
 1094	    }
 1095	    for (i = 0; i < migrate_multifd_channels(); i++) {
 1096	        MultiFDSendParams *p = &multifd_send_state->params[i];
 1097
 1098	        trace_multifd_send_sync_main_wait(p->id);
 1099	        qemu_sem_wait(&p->sem_sync);
 1100	    }
 1101	    trace_multifd_send_sync_main(multifd_send_state->packet_num);
 1102	}
 1103
 (gdb) p i
 $1 = 0
 (gdb)  p multifd_send_state->params[0].pending_job
 $2 = 2    //It means the job before MULTIFD_FLAG_SYNC has already failed
 (gdb)  p multifd_send_state->params[0].quit
 $3 = true

Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1567044996-2362-1-git-send-email-ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
famz pushed a commit that referenced this pull request Oct 23, 2019
The 'blockdev-create' QMP command was introduced as an experimental
feature in commit b0292b8, using the assert() debug call.
It was promoted to a 'stable' command in 3fb588a, but the
assert call was not removed.

Some block drivers are optional, and bdrv_find_format() might
return a NULL value, triggering the assertion.

Stable code is not expected to abort, so return an error instead.
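
The shape of that conversion, as a standalone sketch (find_format() is
an illustrative stand-in for bdrv_find_format(); the message matches
the one shown in the QMP output below):

    #include <stdio.h>
    #include <string.h>

    static const char *builtin[] = { "qcow2", "raw", NULL };

    /* Optional drivers such as 'nfs' may simply not be built in, so a
     * NULL result is legitimate, not a programming error. */
    static const char *find_format(const char *fmt)
    {
        for (int i = 0; builtin[i]; i++) {
            if (strcmp(builtin[i], fmt) == 0) {
                return builtin[i];
            }
        }
        return NULL;
    }

    static int create_image(const char *fmt)
    {
        /* Previously: assert(find_format(fmt)) -- aborts in stable code. */
        if (!find_format(fmt)) {
            fprintf(stderr,
                    "Block driver '%s' not found or not supported\n", fmt);
            return -1;    /* report an error instead of aborting */
        }
        return 0;
    }

    int main(void)
    {
        create_image("nfs");    /* takes the error path when nfs is absent */
        return 0;
    }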

This is easily reproducible when libnfs is not installed:

  ./configure
  [...]
  module support    no
  Block whitelist (rw)
  Block whitelist (ro)
  libiscsi support  yes
  libnfs support    no
  [...]

Start QEMU:

  $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait

Send the 'blockdev-create' with the 'nfs' driver:

  $ ( cat << 'EOF'
  {'execute': 'qmp_capabilities'}
  {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
  EOF
  ) | socat STDIO UNIX:/tmp/qemu.qmp
  {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}}
  {"return": {}}

QEMU crashes:

  $ gdb qemu-system-x86_64 core
  Program received signal SIGSEGV, Segmentation fault.
  (gdb) bt
  #0  0x00007ffff510957f in raise () at /lib64/libc.so.6
  #1  0x00007ffff50f3895 in abort () at /lib64/libc.so.6
  #2  0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
  #3  0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6
  #4  0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69
  #5  0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314
  #6  0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131
  #7  0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174

With this patch applied, QEMU returns a QMP error:

  {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
  {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}}

Cc: qemu-stable@nongnu.org
Reported-by: Xu Tian <xutian@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
famz pushed a commit that referenced this pull request Oct 23, 2019
…uid.

Without kvm commit 412a3c41, CPUID(EAX=0xd,ECX=0).EBX is always equal to 0 even
though the guest updates xcr0; this crashes legacy guests (e.g., CentOS 6).
Below is the call trace on the guest.

[    0.000000] kernel BUG at mm/bootmem.c:469!
[    0.000000] invalid opcode: 0000 [#1] SMP
[    0.000000] last sysfs file:
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Tainted: G           --------------- H  2.6.32-279#2 Red Hat KVM
[    0.000000] RIP: 0010:[<ffffffff81c4edc4>]  [<ffffffff81c4edc4>] alloc_bootmem_core+0x7b/0x29e
[    0.000000] RSP: 0018:ffffffff81a01cd8  EFLAGS: 00010046
[    0.000000] RAX: ffffffff81cb1748 RBX: ffffffff81cb1720 RCX: 0000000001000000
[    0.000000] RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffffffff81cb1720
[    0.000000] RBP: ffffffff81a01d38 R08: 0000000000000000 R09: 0000000000001000
[    0.000000] R10: 02008921da802087 R11: 00000000ffff8800 R12: 0000000000000000
[    0.000000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001000000
[    0.000000] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000001406b0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
[    0.000000] Stack:
[    0.000000]  0000000000000002 81a01dd881eaf060 000000007e5fe227 0000000000001001
[    0.000000] <d> 0000000000000040 0000000000000001 0000006cffffffff 0000000001000000
[    0.000000] <d> ffffffff81cb1720 0000000000000000 0000000000000000 0000000000000000
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81c4f074>] ___alloc_bootmem_nopanic+0x8d/0xca
[    0.000000]  [<ffffffff81c4f0cf>] ___alloc_bootmem+0x11/0x39
[    0.000000]  [<ffffffff81c4f172>] __alloc_bootmem+0xb/0xd
[    0.000000]  [<ffffffff814d42d9>] xsave_cntxt_init+0x249/0x2c0
[    0.000000]  [<ffffffff814e0689>] init_thread_xstate+0x17/0x25
[    0.000000]  [<ffffffff814e0710>] fpu_init+0x79/0xaa
[    0.000000]  [<ffffffff814e27e3>] cpu_init+0x301/0x344
[    0.000000]  [<ffffffff81276395>] ? sort+0x155/0x230
[    0.000000]  [<ffffffff81c30cf2>] trap_init+0x24e/0x25f
[    0.000000]  [<ffffffff81c2bd73>] start_kernel+0x21c/0x430
[    0.000000]  [<ffffffff81c2b33a>] x86_64_start_reservations+0x125/0x129
[    0.000000]  [<ffffffff81c2b438>] x86_64_start_kernel+0xfa/0x109
[    0.000000] Code: 03 48 89 f1 49 c1 e8 0c 48 0f af d0 48 c7 c6 00 a6 61 81 48 c7 c7 00 e5 79 81 31 c0 4c 89 74 24 08 e8 f2 d7 89 ff 4d 85 e4 75 04 <0f> 0b eb fe 48 8b 45 c0 48 83 e8 01 48 85 45
c0 74 04 0f 0b eb
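
For reference, a sketch of the semantics the guest relies on (per the
CPUID specification, not QEMU's or kvm's code): CPUID(EAX=0xd,ECX=0).EBX
must report the XSAVE area size for the features currently enabled in
xcr0, so it cannot stay 0 once the guest sets xcr0 bits.

    /* xsave_off[i]/xsave_size[i] stand in for CPUID(0xd, i).EBX/EAX,
     * i.e. the offset and size of state component i (i >= 2). */
    static unsigned xsave_area_size(unsigned long long xcr0,
                                    const unsigned *xsave_off,
                                    const unsigned *xsave_size,
                                    int ncomps)
    {
        unsigned ebx = 512 + 64;    /* legacy region plus XSAVE header */

        for (int i = 2; i < ncomps; i++) {
            if (xcr0 & (1ULL << i)) {
                unsigned end = xsave_off[i] + xsave_size[i];
                if (end > ebx) {
                    ebx = end;
                }
            }
        }
        return ebx;    /* value reported in CPUID(EAX=0xd,ECX=0).EBX */
    }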

Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
Message-Id: <20190822042901.16858-1-owen.si@ucloud.cn>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>