Initial Earlgrey emulation based on qemu 8.0.2 #4

loiclefort · 2023-06-26T13:27:10Z

No description provided.

PMP entries before (and including) the matched PMP entry may cover only part of the TLB page, which may split the page into regions with different permissions. For example, with PMP0 (0x80000008~0x8000000F, R) and PMP1 (0x80000000~0x80000FFF, RWX), a write access to 0x80000000 matches PMP1. However, we cannot cache the translation result in the TLB, since this would let a write access to 0x80000008 bypass the check of PMP0. So pmp_get_tlb_size() should check all of these entries instead of only the matched one, and set tlb_size to 1 in this case. Set tlb_size to TARGET_PAGE_SIZE if PMP is not supported or there are no PMP rules. Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230517091519.34439-2-liweiwei@iscas.ac.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> (cherry picked from commit dc7b599 https://github.com/alistair23/qemu.git riscv-to-apply.next)
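The rule above can be sketched as follows. This is a simplified model, not QEMU's pmp_get_tlb_size(): each PMP rule is reduced to an inclusive address range, TLB_PAGE_SIZE stands in for TARGET_PAGE_SIZE, and pmp_range_t and pmp_tlb_size_sketch() are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TLB_PAGE_SIZE 4096u /* assumed stand-in for TARGET_PAGE_SIZE */

/* Hypothetical, simplified PMP rule: inclusive address range only */
typedef struct {
    uint64_t sa; /* start address (inclusive) */
    uint64_t ea; /* end address (inclusive) */
} pmp_range_t;

/*
 * Return the size that may be cached in the TLB for the page holding addr.
 * Every rule up to and including the matched one must either cover the whole
 * page or not touch it at all; any partial overlap forces a size of 1.
 */
static uint64_t pmp_tlb_size_sketch(const pmp_range_t *rules, size_t nrules,
                                    size_t matched, uint64_t addr)
{
    uint64_t tlb_sa = addr & ~(uint64_t)(TLB_PAGE_SIZE - 1);
    uint64_t tlb_ea = tlb_sa + TLB_PAGE_SIZE - 1;

    if (nrules == 0) {
        return TLB_PAGE_SIZE; /* no PMP rules: the whole page can be cached */
    }
    for (size_t i = 0; i <= matched && i < nrules; i++) {
        int disjoint = rules[i].ea < tlb_sa || rules[i].sa > tlb_ea;
        int covers = rules[i].sa <= tlb_sa && rules[i].ea >= tlb_ea;
        if (!disjoint && !covers) {
            return 1; /* partial overlap: do not cache the full page */
        }
    }
    return TLB_PAGE_SIZE;
}
```

With the two rules from the example, a write to 0x80000000 matches PMP1, but PMP0 only partially covers the page, so the sketch returns 1 and the translation is not cached for the whole page.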

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

…ion. From RISC-V Privileged Architecture, 3.1.9. Machine Interrupt Registers (mip and mie) "Bits 15:0 are allocated to standard interrupt causes only, while bits 16 and above are designated for platform or custom use." Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
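The allocation rule quoted above amounts to a split of the mip/mie bit space. The sketch below illustrates it; the macro and helper names and the 64-bit register width are assumptions for illustration, not the code changed by this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* mip/mie bits 15:0 are allocated to standard interrupt causes only */
#define MIP_STANDARD_MASK UINT64_C(0xFFFF)

/* Hypothetical helper: true if the interrupt cause is a standard one */
static bool irq_cause_is_standard(unsigned cause)
{
    return cause < 16u;
}

/* Hypothetical helper: pending and enabled platform/custom interrupts (bits 16+) */
static uint64_t mip_platform_pending(uint64_t mip, uint64_t mie)
{
    return mip & mie & ~MIP_STANDARD_MASK;
}
```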

From riscv-privileged, 3.1.7 Machine Trap-Vector Base-Address Register (mtvec): "The mtvec register must always be implemented, but can contain a read-only value." In other words, there should be a way to configure mtvec without relying on a CSR write instruction. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
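One way to picture this is a hart whose mtvec value comes from configuration at reset and whose guest CSR writes can be ignored when the register is read-only. This is only a sketch: HartCsrState, mtvec_reset, mtvec_ro and both functions are hypothetical names, not the QEMU properties added by this commit.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t mtvec;
    uint64_t mtvec_reset; /* hypothetical CPU property: initial mtvec value */
    bool mtvec_ro;        /* hypothetical CPU property: mtvec is read-only */
} HartCsrState;

/* Apply the configured value at reset; no CSR write instruction is needed */
static void hart_reset_mtvec(HartCsrState *s)
{
    s->mtvec = s->mtvec_reset;
}

/* Guest CSR write: has no effect when mtvec is read-only */
static void csr_write_mtvec(HartCsrState *s, uint64_t val)
{
    if (!s->mtvec_ro) {
        s->mtvec = val;
    }
}
```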

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

Fix several issues with ePMP checks on writes to pmpaddr/pmpcfg.
For pmpaddr: Rule Locking Bypass (mseccfg.RLB) was not implemented for writes to locked entries/rules.
For pmpcfg: with Machine Mode Lockdown (mseccfg.MML) set (and RLB not set), writes to pmpcfg were allowed for the wrong cases of the ePMP truth table. The existing code restricts writes like so:
- L=1, X=1: cases 8, 10, 12, 14
- L=0, RWX!=WX: cases 0-2, 4-6
From the ePMP specification: "Adding a rule with executable privileges that either is M-mode-only or a locked Shared-Region is not possible (...)". This description matches cases 9-11 and 13 of the truth table. This commit implements the check by using pmp_get_epmp_operation() to convert between the PMP configuration and the ePMP truth table cases. Signed-off-by: Loïc Lefort <loic@rivosinc.com>
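A minimal sketch of the conversion and check described above follows. It assumes the standard pmpNcfg bit layout (R=bit 0, W=bit 1, X=bit 2, L=bit 7) and numbers truth-table cases as the 4-bit value L:R:W:X; the macro and function names are hypothetical, and this is not a copy of QEMU's pmp_get_epmp_operation().

```c
#include <stdbool.h>
#include <stdint.h>

/* pmpNcfg permission and lock bits (RISC-V privileged specification layout) */
#define CFG_R 0x01u
#define CFG_W 0x02u
#define CFG_X 0x04u
#define CFG_L 0x80u

/* Map a pmpcfg byte to its ePMP truth-table case (0..15), encoded as L:R:W:X */
static unsigned epmp_case(uint8_t cfg)
{
    return ((cfg & CFG_L) ? 8u : 0u) | ((cfg & CFG_R) ? 4u : 0u) |
           ((cfg & CFG_W) ? 2u : 0u) | ((cfg & CFG_X) ? 1u : 0u);
}

/*
 * With MML set and RLB clear, reject executable rules that are M-mode-only
 * or locked Shared-Regions (cases 9-11 and 13); allow everything else.
 */
static bool epmp_cfg_write_allowed(uint8_t cfg, bool mml, bool rlb)
{
    if (!mml || rlb) {
        return true;
    }
    switch (epmp_case(cfg)) {
    case 9: case 10: case 11: case 13:
        return false;
    default:
        return true;
    }
}
```

Going through a single case number keeps the check aligned with the way the specification presents the truth table, instead of re-deriving it from individual bit tests.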

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Although the Zbr extension, part of the bitmanip specification up to v0.94, has not been ratified and was even entirely removed from the final v1.0 specification, it is nevertheless used by the lowrisc-ibex core as an optional ISA extension, which is enabled in the OpenTitan project. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Define a generic version of the lowrisc-ibex core that can be used in several machines:
- leave MISA empty so that generic properties can be defined for this core
- remove all arbitrary default properties except ISA I, C, U, which are mandatory for Ibex
- define a default mtvec, since Ibex only supports vectored mode
- update the privilege version (1.12) according to the Ibex documentation
- define the Ibex architecture identifier
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
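For reference, the RISC-V privileged specification defines vectored mode (mtvec.MODE = 1) so that asynchronous interrupts jump to BASE + 4 × cause, while synchronous exceptions always jump to BASE. The sketch below models that dispatch; the function and macro names are hypothetical and not taken from the QEMU sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTVEC_MODE_MASK     0x3u
#define MTVEC_MODE_VECTORED 0x1u

/* Compute the trap handler address selected by mtvec (hypothetical helper) */
static uint32_t trap_target(uint32_t mtvec, bool is_interrupt, uint32_t cause)
{
    uint32_t base = mtvec & ~MTVEC_MODE_MASK;

    if (is_interrupt && (mtvec & MTVEC_MODE_MASK) == MTVEC_MODE_VECTORED) {
        return base + 4u * cause; /* vectored: one entry per interrupt cause */
    }
    return base; /* exceptions, or direct mode */
}
```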

- remove the hart array (mostly useless; its definition is incoherent and prevents applying properties to CPU cores)
- remove the kernel filename option, as it duplicates the firmware option
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

The legacy OpenTitan machine devices are no longer compatible with current OpenTitan HW, and the opentitan machine name is too generic. ot-earlgrey implements the standalone version of OpenTitan based on the CW310 FPGA hardware description. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

loiclefort · 2023-07-05T09:10:36Z

Pushed a new version:

- fixed license headers for scripts
- updated OTBN emulation license to latest changes from RRS (Apache 2 + LLVM exception).

mundaym

Thanks. I've not done a thorough review of the code (obviously!) but I'm happy for this to be merged when you (@loiclefort) and @rivos-eblot are ready.

This reverts commit b320e21, which accidentally broke TCG, because it made the TCG -cpu max report the presence of MTE to the guest even if the board hadn't enabled MTE by wiring up the tag RAM. This meant that if the guest then tried to use MTE, QEMU would segfault accessing the non-existent tag RAM:
==346473==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x55f328952a4a bp 0x00000213a400 sp 0x7f7871859b80 T346476)
==346473==The signal is caused by a READ memory access.
==346473==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used.
#0 0x55f328952a4a in address_space_to_flatview /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:1108:12
#1 0x55f328952a4a in address_space_translate /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:2797:31
#2 0x55f328952a4a in allocation_tag_mem /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:176:10
#3 0x55f32895366c in helper_stgm /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:461:15
#4 0x7f782431a293 (<unknown module>)
It's also not clear that the KVM logic is correct either: MTE defaults to on there, rather than being only on if the board wants it on. Revert the whole commit for now so we can sort out the issues. (We didn't catch this in CI because we have no test cases in avocado that use guests with MTE support.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20230519145808.348701-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

…moryRegions
Currently when portio_list MemoryRegions are freed using portio_list_destroy() the RCU thread segfaults generating a backtrace similar to that below:
#0 0x5555599a34b6 in phys_section_destroy ../softmmu/physmem.c:996
#1 0x5555599a37a3 in phys_sections_free ../softmmu/physmem.c:1011
#2 0x5555599b24aa in address_space_dispatch_free ../softmmu/physmem.c:2430
#3 0x55555996a283 in flatview_destroy ../softmmu/memory.c:292
#4 0x55555a2cb9fb in call_rcu_thread ../util/rcu.c:284
#5 0x55555a29b71d in qemu_thread_start ../util/qemu-thread-posix.c:541
#6 0x7ffff4a0cea6 in start_thread nptl/pthread_create.c:477
#7 0x7ffff492ca2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfca2e)
The problem here is that portio_list_destroy() unparents the portio_list MemoryRegions causing them to be freed immediately, however the flatview still has a reference to the MemoryRegion and so causes a use-after-free segfault when the RCU thread next updates the flatview. Solve the lifetime issue by making MemoryRegionPortioList the owner of the portio_list MemoryRegions, and then reparenting them to the portio_list owner. This ensures that they can be accessed as QOM children via the portio_list owner, yet the MemoryRegionPortioList owns the refcount. Update portio_list_destroy() to unparent the MemoryRegion from the portio_list owner (while keeping mrpio->mr live until finalization of the MemoryRegionPortioList), so that the portio_list MemoryRegions remain allocated until flatview_destroy() removes the final refcount upon the next flatview update. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230419151652.362717-4-mark.cave-ayland@ilande.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

in order to avoid requests being stuck in a BlockBackend's request queue during cleanup. Having such requests can lead to a deadlock [0] with a virtio-scsi-pci device using iothread that's busy with IO when initiating a shutdown with QMP 'quit'. There is a race where such a queued request can continue sometime (maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1]. The completion will hold the AioContext lock and wait for the BQL during SCSI completion, but the main thread will hold the BQL and wait for the AioContext as part of bdrv_root_unref_child(), leading to the deadlock [0]. [0]: > Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"): > #0 __lll_lock_wait (futex=futex@entry=0x564183365f00 <qemu_global_mutex>, private=0) at lowlevellock.c:52 > #1 0x00007f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80 > #2 0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 <qemu_global_mutex>, file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../util/qemu-thread-posix.c:94 > #3 0x000056418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504 > #4 0x00005641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at ../softmmu/physmem.c:2593 > #5 0x00005641826d6fe7 in address_space_stl_internal (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318 > #6 0x00005641826d7154 in address_space_stl_le (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0) at /home/febner/repos/qemu/memory_ldst.c.inc:357 > #7 0x0000564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at ../hw/pci/pci.c:359 > #8 0x000056418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) 
at ../hw/pci/msi.c:379 > #9 0x0000564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at ../hw/pci/msix.c:542 > #10 0x000056418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at ../hw/virtio/virtio-pci.c:77 > #11 0x00005641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, vector=8) at ../hw/virtio/virtio.c:1985 > #12 0x00005641826948d6 in virtio_irq (vq=0x5641867ac078) at ../hw/virtio/virtio.c:2461 > #13 0x0000564182694978 in virtio_notify (vdev=0x5641867a34a0, vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473 > #14 0x0000564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:115 > #15 0x00005641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:641 > #16 0x000056418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, resid=0) at ../hw/scsi/virtio-scsi.c:712 > #17 0x000056418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at ../hw/scsi/scsi-bus.c:1526 > #18 0x000056418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:242 > #19 0x000056418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265 > #20 0x000056418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:340 > #21 0x000056418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:371 > #22 0x00005641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:107 > #23 0x0000564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:127 > #24 0x00005641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at ../block/block-backend.c:1563 > #25 0x00005641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at ../block/block-backend.c:1630 > #26 0x000056418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at ../util/coroutine-ucontext.c:177 > #27 
0x00007f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #28 0x00007f3bbd8757f0 in ?? () > #29 0x0000000000000000 in ?? () > > Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"): > #0 __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at lowlevellock.c:52 > #1 0x00007f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at ../nptl/pthread_mutex_lock.c:115 > #2 0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94 > #3 0x000056418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149 > #4 0x00005641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at ../util/async.c:728 > #5 0x000056418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) at ../block.c:7493 > #6 0x000056418294e288 in tran_commit (tran=0x56418630bfe0) at ../util/transactions.c:87 > #7 0x000056418279d880 in bdrv_try_change_aio_context (bs=0x5641856f7130, ctx=0x56418548f810, ignore_child=0x0, errp=0x0) at ../block.c:7626 > #8 0x0000564182793f39 in bdrv_root_unref_child (child=0x5641856f47d0) at ../block.c:3242 > #9 0x00005641827be137 in blk_remove_bs (blk=0x564185709880) at ../block/block-backend.c:914 > #10 0x00005641827bd689 in blk_remove_all_bs () at ../block/block-backend.c:583 > #11 0x0000564182798699 in bdrv_close_all () at ../block.c:5117 > #12 0x000056418248a5b2 in qemu_cleanup () at ../softmmu/runstate.c:821 > #13 0x0000564182738603 in qemu_default_main () at ../softmmu/main.c:38 > #14 0x0000564182738631 in main (argc=30, argv=0x7ffd675a8a48) at ../softmmu/main.c:48 > > (gdb) p *((QemuMutex*)0x5641856f2a00) > $1 = {lock = {__data = {__lock = 2, __count = 2, __owner = 135952, ... > (gdb) p *((QemuMutex*)0x564183365f00) > $2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 135944, ... 
[1]: > Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_drain_all_end () at ../block/io.c:551 > #0 bdrv_drain_all_end () at ../block/io.c:551 > #1 0x00005569810f0376 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:156 > #2 0x00005569810bd3e0 in bdrv_replace_child_noperm (child=0x556982e2d7d0, new_bs=0x0) at ../block.c:2897 > #3 0x00005569810bdef2 in bdrv_root_unref_child (child=0x556982e2d7d0) at ../block.c:3227 > #4 0x00005569810e8137 in blk_remove_bs (blk=0x556982e42880) at ../block/block-backend.c:914 > #5 0x00005569810e7689 in blk_remove_all_bs () at ../block/block-backend.c:583 > #6 0x00005569810c2699 in bdrv_close_all () at ../block.c:5117 > #7 0x0000556980db45b2 in qemu_cleanup () at ../softmmu/runstate.c:821 > #8 0x0000556981062603 in qemu_default_main () at ../softmmu/main.c:38 > #9 0x0000556981062631 in main (argc=30, argv=0x7ffd7a82a418) at ../softmmu/main.c:48 > [Switching to Thread 0x7fe76dab2700 (LWP 103649)] > > Thread 3 "qemu-system-x86" hit Breakpoint 4, blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505 > #0 blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505 > #1 0x00005569810e8f36 in blk_wait_while_drained (blk=0x556982e42880) at ../block/block-backend.c:1312 > #2 0x00005569810e9231 in blk_co_do_pwritev_part (blk=0x556982e42880, offset=3422961664, bytes=4096, qiov=0x556983028060, qiov_offset=0, flags=0) at ../block/block-backend.c:1402 > #3 0x00005569810e9a4b in blk_aio_write_entry (opaque=0x556982e2cfa0) at ../block/block-backend.c:1628 > #4 0x000055698128038a in coroutine_trampoline (i0=-2090057872, i1=21865) at ../util/coroutine-ucontext.c:177 > #5 0x00007fe770f50d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #6 0x00007ffd7a829570 in ?? () > #7 0x0000000000000000 in ?? () Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20230706131418.423713-1-f.ebner@proxmox.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

virtio_load() as a whole should run in coroutine context because it reads from the migration stream and we don't want this to block. However, it calls virtio_set_features_nocheck() and devices don't expect their .set_features callback to run in a coroutine and therefore call functions that may not be called in coroutine context. To fix this, drop out of coroutine context for calling virtio_set_features_nocheck(). Without this fix, the following crash was reported: #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 #2 0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x00007efc738477f3 in __GI_abort () at abort.c:79 #4 0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()", file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92 #5 0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275, function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101 #6 0x0000560aebcd8dd6 in bdrv_register_buf () #7 0x0000560aeb97ed97 in ram_block_added.llvm () #8 0x0000560aebb8303f in ram_block_add.llvm () #9 0x0000560aebb834fa in qemu_ram_alloc_internal.llvm () #10 0x0000560aebb2ac98 in vfio_region_mmap () #11 0x0000560aebb3ea0f in vfio_bars_register () #12 0x0000560aebb3c628 in vfio_realize () #13 0x0000560aeb90f0c2 in pci_qdev_realize () #14 0x0000560aebc40305 in device_set_realized () #15 0x0000560aebc48e07 in property_set_bool.llvm () #16 0x0000560aebc46582 in object_property_set () #17 0x0000560aebc4cd58 in object_property_set_qobject () #18 0x0000560aebc46ba7 in 
object_property_set_bool () #19 0x0000560aeb98b3ca in qdev_device_add_from_qdict () #20 0x0000560aebb1fbaf in virtio_net_set_features () #21 0x0000560aebb46b51 in virtio_set_features_nocheck () #22 0x0000560aebb47107 in virtio_load () #23 0x0000560aeb9ae7ce in vmstate_load_state () #24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main () #25 0x0000560aeb9d45e1 in qemu_loadvm_state () #26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm () #27 0x0000560aebeace56 in coroutine_trampoline.llvm () Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-832 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230905145002.46391-3-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 92e2e6a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Replace the return path retry logic with finishing and restarting the thread. This fixes a race when resuming the migration that leads to a segfault. Currently when doing postcopy we consider that an IO error on the return path file could be due to a network intermittency. We then keep the thread alive but have it do cleanup of the 'from_dst_file' and wait on the 'postcopy_pause_rp' semaphore. When the user issues a migrate resume, a new return path is opened and the thread is allowed to continue. There's a race condition in the above mechanism. It is possible for the new return path file to be setup *before* the cleanup code in the return path thread has had a chance to run, leading to the *new* file being closed and the pointer set to NULL. When the thread is released after the resume, it tries to dereference 'from_dst_file' and crashes: Thread 7 "return path" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffd1dbf700 (LWP 9611)] 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154 154 return f->last_error; (gdb) bt #0 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154 #1 0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206 #2 0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876 #3 0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541 #4 0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477 #5 0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Here's the race (important bit is open_return_path happening before migration_release_dst_files): migration | qmp | return path --------------------------+-----------------------------+--------------------------------- qmp_migrate_pause() shutdown(ms->to_dst_file) f->last_error = -EIO migrate_detect_error() postcopy_pause() set_state(PAUSED) 
wait(postcopy_pause_sem) qmp_migrate(resume) migrate_fd_connect() resume = state == PAUSED open_return_path <-- TOO SOON! set_state(RECOVER) post(postcopy_pause_sem) (incoming closes to_src_file) res = qemu_file_get_error(rp) migration_release_dst_files() ms->rp_state.from_dst_file = NULL post(postcopy_pause_rp_sem) postcopy_pause_return_path_thread() wait(postcopy_pause_rp_sem) rp = ms->rp_state.from_dst_file goto retry qemu_file_get_error(rp) SIGSEGV ------------------------------------------------------------------------------------------- We can keep the retry logic without having the thread alive and waiting. The only piece of data used by it is the 'from_dst_file' and it is only allowed to proceed after a migrate resume is issued and the semaphore released at migrate_fd_connect(). Move the retry logic to outside the thread by waiting for the thread to finish before pausing the migration. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-8-farosas@suse.de> (cherry picked from commit ef796ee) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Weiwei Li and others added 15 commits June 22, 2023 10:40

[ot] docs/devel: fix resettable API documentation function name

ce52526

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] net: fix warning on MacOS platforms

b8e227b

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] ui: fix warning on MacOS platforms

f2b40bc

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] meson.build: fix objc handling

9c2901a

When `--without-default-features` is used and `--enable-cocoa` is not, the configure script fails because meson does not search for the objc compiler but later attempts to report objc information. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] target/riscv: fix dscratch debug CSRs definitions

7919489

There are two dscratch registers (dscratch0 and dscratch1) in the current debug specifications (0.13.2 and 1.0.0). Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] disas: fix decode of dscratch CSRs

d7758e9

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] disas: decode tinfo trigger CSR

b4a56ff

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] disas: fix invalid RISC-V pmpcfg CSR definitions

2566290

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] target/riscv: do not handle semi hosting as a first-class citizen

43405fb

Semihosting should only be handled as an exception, not as an interrupt. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] target/riscv: get_ticks should not use time conversion

53d3b0f

Warning: this change impacts all hpmcounters: mcycle, minstret, ... Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] exec: poison RISC-V target-specific definitions

dbab5c4

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/intc: sifive_plic: fix handling of pending interrupts

373ee46

Fix issue when enabling an already pending interrupt: the interrupt would not be triggered properly. Signed-off-by: Loïc Lefort <loic@rivosinc.com>

loiclefort mentioned this pull request Jun 26, 2023

Initial Earlgrey emulation based on qemu 8.0.2 #3

Closed

loiclefort and others added 14 commits June 26, 2023 18:25

[ot] target/riscv: remove useless check in pmp_is_locked

f4abf86

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] target/riscv: move ePMP operation conversion into a function

57209c6

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] target/riscv: add basic support for MSECCFGH CSR

2eb8847

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] target/riscv: add support for impl-defined initial PMP config

a050fce

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] disas: add Zbr instruction disassembling

c27a36c

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] target/riscv: rename ibex hart as lowrisc-ibex

bf3d4fb

Follow the vendor-device naming syntax used by other RISC-V cores. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/char: fix Ibex UART device definition

7a57aa7

Use a separate Kconfig symbol for Ibex UART device: having an Ibex CPU does not imply usage of this UART implementation. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/timer: fix Ibex Timer device definition

79adb5b

Use a separate Kconfig symbol for Ibex Timer device: having an Ibex CPU does not imply usage of this Timer implementation. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/ssi: fix Ibex SPI Host device definition

c0edc63

Use a separate Kconfig symbol for Ibex SPI Host device: having an Ibex CPU does not imply usage of this SPI Host implementation. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

rivos-eblot and others added 20 commits July 5, 2023 10:50

[ot] hw/opentitan: ot_otbn: add OpenTitan OTBN emulator

19f66f5

* add a new QEMU device (ot_otbn)
* add a proxy between the QEMU device (C) and the OTBN emulator (Rust)
* import the OTBN emulator based on G. Chadwick's RRS (RISC-V simulator)
Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/riscv: ot_earlgrey: add OTBN device

46654e2

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/opentitan: ot_hmac: add OpenTitan HMAC emulator

571c837

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] hw/riscv: ot_earlgrey: add HMAC device

97a3937

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] hw/opentitan: ot_spi_host: add OpenTitan SPI host implementation

df7b6a0

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/block: m25p80: add SFDP data block for data flash ISSI IS25WP128

1bdf939

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/block: m25p80: fix dummy cycles for FASTREAD

8dbb157

Dummy cycles should be expressed in bytes, not bits. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>
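As an illustration of the unit change, the sketch below converts a dummy-cycle count into dummy bytes, assuming each dummy clock cycle carries one bit per active data lane and rounding up to whole bytes. The helper name and the lane model are assumptions, not the m25p80 code itself.

```c
/* Hypothetical helper: dummy clock cycles -> dummy bytes on the bus */
static unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned lanes)
{
    /* bits clocked = cycles * lanes; round up to a whole byte */
    return (cycles * lanes + 7u) / 8u;
}
```

Under this model, the 8 dummy cycles commonly used by FASTREAD on a single-lane bus amount to a single dummy byte, not 8.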

[ot] hw/block: m25p80: write enable should be reset after each erase cmd. completion

b1cc629

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/riscv: ot_earlgrey: add SPI_HOST device

51b41a1

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/opentitan: ot_timer: add OpenTitan Timer emulator

486cc83

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] hw/riscv: ot_earlgrey: add Timer device

52cfd5f

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] hw/opentitan: ot_aon_timer: add OpenTitan AON Timer implementation

fe41509

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] hw/riscv: ot_earlgrey: add AON Timer device

8f66f81

Signed-off-by: Loïc Lefort <loic@rivosinc.com>

[ot] scripts/opentitan: add a script to boot ROM/ROM_EXT/BL0

1948fb2

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] scripts/opentitan: add new script to run test app w/ test ROM.

3e35060

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] scripts/opentitan: add a script to run OpenTitan unit tests

3bcca3a

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] doc/opentitan: document Earlgrey platform

feb7af4

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] update README file to reflect QEMU fork content.

d346c07

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/opentitan: ot_sram_ctrl: add SRAM controller

6c38523

For now, it is implemented as a dummy backend. Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

[ot] hw/riscv: ot_earlgrey: add SRAM controllers

f03e907

Signed-off-by: Emmanuel Blot <eblot@rivosinc.com>

loiclefort force-pushed the rivos/opentitan branch from 101197e to f03e907 July 5, 2023 09:06

mundaym approved these changes Jul 6, 2023

loiclefort merged commit 447498e into lowRISC:ot-earlgrey-8.0.2 Jul 6, 2023

loiclefort deleted the rivos/opentitan branch July 6, 2023 12:16
