Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Earlgrey emulation based on qemu 8.0.2 #4

Merged
merged 129 commits into from
Jul 6, 2023

Conversation

loiclefort
Copy link

No description provided.

Weiwei Li and others added 15 commits June 22, 2023 10:40
PMP entries before (including) the matched PMP entry may only cover partial
of the TLB page, and this may split the page into regions with different
permissions. Such as for PMP0 (0x80000008~0x8000000F, R) and PMP1 (0x80000000~
0x80000FFF, RWX), write access to 0x80000000 will match PMP1. However we cannot
cache the translation result in the TLB since this will make the write access
to 0x80000008 bypass the check of PMP0. So we should check all of them instead
of the matched PMP entry in pmp_get_tlb_size() and set the tlb_size to 1 in
this case.
Set tlb_size to TARGET_PAGE_SIZE if PMP is not support or there is no PMP rules.

Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230517091519.34439-2-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

(cherry picked from commit dc7b599
https://github.com/alistair23/qemu.git riscv-to-apply.next)
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
when `--without-default-features` is used and `--enable-cocoa` is not,
configure script fails as meson does not search for objc compiler but
later attempts to report objc information.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
There are two dscratch registers in current specifications (0.13.2 and 1.0.0)

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
…ion.

From RISC-V Privileged Architecture,

3.1.9. Machine Interrupt Registers (mip and mie)

"Bits 15:0 are allocated to standard interrupt causes only, while bits
16 and above are designated for platform or custom use."

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
semihosting should only be handled as an exception, not as an interrupt

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
from riscv-privileged / 3.1.7 MTVEC Base-Address Register:

"The mtvec register must always be implemented, but can contain a read-only value."

i.e. there should be a way to configure mtvec w/o relying on a CSR write instruction.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Warning: this change impacts all hpmcounters: mcycle, minstret, ...

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Fix issue when enabling an already pending interrupt: the interrupt would not be
triggered properly.

Signed-off-by: Loïc Lefort <loic@rivosinc.com>
loiclefort and others added 14 commits June 26, 2023 18:25
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Fix several issues with ePMP checks on writes to pmpaddr/pmpcfg.

For pmpaddr: Rule Locking Bypass (mseccfg.RLB) was not implemented for writes
to locked entries/rules.

For pmpcfg: with Machine Mode Lockdown (mseccfg.MML) set (and RLB not set),
writes to pmpcfg would be allowed for the wrong cases of ePMP truth table.
The existing code restricts writes like so:
- L=1, X=1: cases 8, 10, 12, 14
- L=0, RWX!=WX: cases 0-2, 4-6
From the ePMP specification: "Adding a rule with executable privileges that
either is M-mode-only or a locked Shared-Region is not possible (...)"
This description matches cases 9-11, 13 of the truth table. This commit
implements the check by using pmp_get_epmp_operation to convert between PMP
configuration and ePMP truth table cases.

Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Although Zbr extension once part of bitmanip specification up to v0.94
as not been ratified and even entirely removed from the final v1.0
specification, it is nevertheless used by the lowrisc-ibex core as an
optional ISA, which is enabled in the OpenTitan project.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Follow vendor-device syntax used with other RISCV cores

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Define a generic version of lowrisc-ibex core that can be used in several machines:

 - leave MISA empty so that generic properties can be defined for this core
 - remove all arbitrary default properties but ISA I,C,U which are mandatory for ibex
 - define default mtvec which is only support vectored mode
 - update privilege version (1.12) according to the Ibex documentation
 - define ibex architecture identifier

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
 - remove hart array (mostly useless, its definition is incoherent and
   prevent from applying properties to CPU cores)
 - remove kernel filename option, as it doubles up with firmware option

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Use a separate Kconfig symbol for Ibex UART device:
having an Ibex CPU does not imply usage of this UART implementation.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Use a separate Kconfig symbol for Ibex Timer device:
having an Ibex CPU does not imply usage of this Timer implementation.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Use a separate Kconfig symbol for Ibex SPI Host device:
having an Ibex CPU does not imply usage of this SPI Host implementation.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Legacy OpenTitan machine devices are no longer compatible with current OpenTitan HW.
opentitan machine name is too generic, ot-earlgrey implements the standalone
version for OpenTitan based on CW310 FPGA hardware description.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
rivos-eblot and others added 20 commits July 5, 2023 10:50
* add a new QEMU device (ot_otbn)
* add a proxy between the QEMU device (C) and the OTBN emulator (Rust)
* import OTBN emulator based on G. Chadwick RSS (RISC-V simulator)

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Dummy cycles should be expressed in bytes, not bits.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
…cmd. completion

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Loïc Lefort <loic@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
For now, it is implemented as a dummy backend.

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
@loiclefort
Copy link
Author

Pushed a new version:

  • fixed license headers for scripts
  • updated OTBN emulation license to latest changes from RRS (Apache 2 + LLVM exception).

Copy link

@mundaym mundaym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've not done a thorough review of the code (obviously!) but I'm happy for this to be merged when you (@loiclefort) and @rivos-eblot are ready.

@loiclefort loiclefort merged commit 447498e into lowRISC:ot-earlgrey-8.0.2 Jul 6, 2023
@loiclefort loiclefort deleted the rivos/opentitan branch July 6, 2023 12:16
pamaury pushed a commit to pamaury/qemu-ot that referenced this pull request Aug 8, 2023
This reverts commit b320e21,
which accidentally broke TCG, because it made the TCG -cpu max
report the presence of MTE to the guest even if the board hadn't
enabled MTE by wiring up the tag RAM. This meant that if the guest
then tried to use MTE QEMU would segfault accessing the
non-existent tag RAM:

    ==346473==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x55f328952a4a bp 0x00000213a400 sp 0x7f7871859b80 T346476)
    ==346473==The signal is caused by a READ memory access.
    ==346473==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
        #0 0x55f328952a4a in address_space_to_flatview /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:1108:12
        lowRISC#1 0x55f328952a4a in address_space_translate /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:2797:31
        lowRISC#2 0x55f328952a4a in allocation_tag_mem /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:176:10
        lowRISC#3 0x55f32895366c in helper_stgm /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:461:15
        lowRISC#4 0x7f782431a293  (<unknown module>)

It's also not clear that the KVM logic is correct either:
MTE defaults to on there, rather than being only on if the
board wants it on.

Revert the whole commit for now so we can sort out the issues.

(We didn't catch this in CI because we have no test cases in
avocado that use guests with MTE support.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230519145808.348701-1-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
pamaury pushed a commit to pamaury/qemu-ot that referenced this pull request Aug 8, 2023
…moryRegions

Currently when portio_list MemoryRegions are freed using portio_list_destroy() the RCU
thread segfaults generating a backtrace similar to that below:

    #0 0x5555599a34b6 in phys_section_destroy ../softmmu/physmem.c:996
    lowRISC#1 0x5555599a37a3 in phys_sections_free ../softmmu/physmem.c:1011
    lowRISC#2 0x5555599b24aa in address_space_dispatch_free ../softmmu/physmem.c:2430
    lowRISC#3 0x55555996a283 in flatview_destroy ../softmmu/memory.c:292
    lowRISC#4 0x55555a2cb9fb in call_rcu_thread ../util/rcu.c:284
    lowRISC#5 0x55555a29b71d in qemu_thread_start ../util/qemu-thread-posix.c:541
    lowRISC#6 0x7ffff4a0cea6 in start_thread nptl/pthread_create.c:477
    lowRISC#7 0x7ffff492ca2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfca2e)

The problem here is that portio_list_destroy() unparents the portio_list
MemoryRegions causing them to be freed immediately, however the flatview
still has a reference to the MemoryRegion and so causes a use-after-free
segfault when the RCU thread next updates the flatview.

Solve the lifetime issue by making MemoryRegionPortioList the owner of the
portio_list MemoryRegions, and then reparenting them to the portio_list
owner. This ensures that they can be accessed as QOM children via the
portio_list owner, yet the MemoryRegionPortioList owns the refcount.

Update portio_list_destroy() to unparent the MemoryRegion from the
portio_list owner (while keeping mrpio->mr live until finalization of the
MemoryRegionPortioList), so that the portio_list MemoryRegions remain
allocated until flatview_destroy() removes the final refcount upon the
next flatview update.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230419151652.362717-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
mundaym pushed a commit that referenced this pull request Aug 18, 2023
in order to avoid requests being stuck in a BlockBackend's request
queue during cleanup. Having such requests can lead to a deadlock [0]
with a virtio-scsi-pci device using iothread that's busy with IO when
initiating a shutdown with QMP 'quit'.

There is a race where such a queued request can continue sometime
(maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1].
The completion will hold the AioContext lock and wait for the BQL
during SCSI completion, but the main thread will hold the BQL and
wait for the AioContext as part of bdrv_root_unref_child(), leading to
the deadlock [0].

[0]:

> Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"):
> #0  __lll_lock_wait (futex=futex@entry=0x564183365f00 <qemu_global_mutex>, private=0) at lowlevellock.c:52
> #1  0x00007f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 <qemu_global_mutex>, file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../util/qemu-thread-posix.c:94
> #3  0x000056418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504
> #4  0x00005641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at ../softmmu/physmem.c:2593
> #5  0x00005641826d6fe7 in address_space_stl_internal (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318
> #6  0x00005641826d7154 in address_space_stl_le (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0) at /home/febner/repos/qemu/memory_ldst.c.inc:357
> #7  0x0000564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at ../hw/pci/pci.c:359
> #8  0x000056418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) at ../hw/pci/msi.c:379
> #9  0x0000564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at ../hw/pci/msix.c:542
> #10 0x000056418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at ../hw/virtio/virtio-pci.c:77
> #11 0x00005641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, vector=8) at ../hw/virtio/virtio.c:1985
> #12 0x00005641826948d6 in virtio_irq (vq=0x5641867ac078) at ../hw/virtio/virtio.c:2461
> #13 0x0000564182694978 in virtio_notify (vdev=0x5641867a34a0, vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473
> #14 0x0000564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:115
> #15 0x00005641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:641
> #16 0x000056418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, resid=0) at ../hw/scsi/virtio-scsi.c:712
> #17 0x000056418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at ../hw/scsi/scsi-bus.c:1526
> #18 0x000056418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:242
> #19 0x000056418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265
> #20 0x000056418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:340
> #21 0x000056418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:371
> #22 0x00005641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:107
> #23 0x0000564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:127
> #24 0x00005641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at ../block/block-backend.c:1563
> #25 0x00005641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at ../block/block-backend.c:1630
> #26 0x000056418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at ../util/coroutine-ucontext.c:177
> #27 0x00007f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007f3bbd8757f0 in ?? ()
> #29 0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"):
> #0  __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at lowlevellock.c:52
> #1  0x00007f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at ../nptl/pthread_mutex_lock.c:115
> #2  0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #3  0x000056418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #4  0x00005641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at ../util/async.c:728
> #5  0x000056418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) at ../block.c:7493
> #6  0x000056418294e288 in tran_commit (tran=0x56418630bfe0) at ../util/transactions.c:87
> #7  0x000056418279d880 in bdrv_try_change_aio_context (bs=0x5641856f7130, ctx=0x56418548f810, ignore_child=0x0, errp=0x0) at ../block.c:7626
> #8  0x0000564182793f39 in bdrv_root_unref_child (child=0x5641856f47d0) at ../block.c:3242
> #9  0x00005641827be137 in blk_remove_bs (blk=0x564185709880) at ../block/block-backend.c:914
> #10 0x00005641827bd689 in blk_remove_all_bs () at ../block/block-backend.c:583
> #11 0x0000564182798699 in bdrv_close_all () at ../block.c:5117
> #12 0x000056418248a5b2 in qemu_cleanup () at ../softmmu/runstate.c:821
> #13 0x0000564182738603 in qemu_default_main () at ../softmmu/main.c:38
> #14 0x0000564182738631 in main (argc=30, argv=0x7ffd675a8a48) at ../softmmu/main.c:48
>
> (gdb) p *((QemuMutex*)0x5641856f2a00)
> $1 = {lock = {__data = {__lock = 2, __count = 2, __owner = 135952, ...
> (gdb) p *((QemuMutex*)0x564183365f00)
> $2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 135944, ...

[1]:

> Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_drain_all_end () at ../block/io.c:551
> #0  bdrv_drain_all_end () at ../block/io.c:551
> #1  0x00005569810f0376 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:156
> #2  0x00005569810bd3e0 in bdrv_replace_child_noperm (child=0x556982e2d7d0, new_bs=0x0) at ../block.c:2897
> #3  0x00005569810bdef2 in bdrv_root_unref_child (child=0x556982e2d7d0) at ../block.c:3227
> #4  0x00005569810e8137 in blk_remove_bs (blk=0x556982e42880) at ../block/block-backend.c:914
> #5  0x00005569810e7689 in blk_remove_all_bs () at ../block/block-backend.c:583
> #6  0x00005569810c2699 in bdrv_close_all () at ../block.c:5117
> #7  0x0000556980db45b2 in qemu_cleanup () at ../softmmu/runstate.c:821
> #8  0x0000556981062603 in qemu_default_main () at ../softmmu/main.c:38
> #9  0x0000556981062631 in main (argc=30, argv=0x7ffd7a82a418) at ../softmmu/main.c:48
> [Switching to Thread 0x7fe76dab2700 (LWP 103649)]
>
> Thread 3 "qemu-system-x86" hit Breakpoint 4, blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
> #0  blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
> #1  0x00005569810e8f36 in blk_wait_while_drained (blk=0x556982e42880) at ../block/block-backend.c:1312
> #2  0x00005569810e9231 in blk_co_do_pwritev_part (blk=0x556982e42880, offset=3422961664, bytes=4096, qiov=0x556983028060, qiov_offset=0, flags=0) at ../block/block-backend.c:1402
> #3  0x00005569810e9a4b in blk_aio_write_entry (opaque=0x556982e2cfa0) at ../block/block-backend.c:1628
> #4  0x000055698128038a in coroutine_trampoline (i0=-2090057872, i1=21865) at ../util/coroutine-ucontext.c:177
> #5  0x00007fe770f50d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x00007ffd7a829570 in ?? ()
> #7  0x0000000000000000 in ?? ()

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20230706131418.423713-1-f.ebner@proxmox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
loiclefort pushed a commit that referenced this pull request Dec 5, 2023
virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.

However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().

Without this fix, the following crash was reported:

  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  #1  0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
  #2  0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007efc738477f3 in __GI_abort () at abort.c:79
  #4  0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
     file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
  #5  0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
     function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
  #6  0x0000560aebcd8dd6 in bdrv_register_buf ()
  #7  0x0000560aeb97ed97 in ram_block_added.llvm ()
  #8  0x0000560aebb8303f in ram_block_add.llvm ()
  #9  0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
  #10 0x0000560aebb2ac98 in vfio_region_mmap ()
  #11 0x0000560aebb3ea0f in vfio_bars_register ()
  #12 0x0000560aebb3c628 in vfio_realize ()
  #13 0x0000560aeb90f0c2 in pci_qdev_realize ()
  #14 0x0000560aebc40305 in device_set_realized ()
  #15 0x0000560aebc48e07 in property_set_bool.llvm ()
  #16 0x0000560aebc46582 in object_property_set ()
  #17 0x0000560aebc4cd58 in object_property_set_qobject ()
  #18 0x0000560aebc46ba7 in object_property_set_bool ()
  #19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
  #20 0x0000560aebb1fbaf in virtio_net_set_features ()
  #21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
  #22 0x0000560aebb47107 in virtio_load ()
  #23 0x0000560aeb9ae7ce in vmstate_load_state ()
  #24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
  #25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
  #26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
  #27 0x0000560aebeace56 in coroutine_trampoline.llvm ()

Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905145002.46391-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 92e2e6a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
loiclefort pushed a commit that referenced this pull request Dec 5, 2023
Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
 #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
 #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
 #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
 #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
 #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
 #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
			    qmp_migrate_pause()
			     shutdown(ms->to_dst_file)
			      f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
			    qmp_migrate(resume)
			    migrate_fd_connect()
			     resume = state == PAUSED
			     open_return_path <-- TOO SOON!
			     set_state(RECOVER)
			     post(postcopy_pause_sem)
							(incoming closes to_src_file)
							res = qemu_file_get_error(rp)
							migration_release_dst_files()
							ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
							postcopy_pause_return_path_thread()
							  wait(postcopy_pause_rp_sem)
							rp = ms->rp_state.from_dst_file
							goto retry
							qemu_file_get_error(rp)
							SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-8-farosas@suse.de>
(cherry picked from commit ef796ee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants