-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using qemu-hppa-static leads to some hanging commands. #1
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We test with both gcc and clang in order to detect cases where clang issues warnings that gcc misses. To achieve this though we don't need to build QEMU in multiple different configurations. Just a single clang-on-linux build will be sufficient, if we have an "all enabled" config. This cuts the number of build jobs from 21 to 16, reducing the load imposed on shared Travis CI infra. This will make it practical to enable jobs for other interesting & useful configurations without DOS'ing Travis to much. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Although we've reduced the matrix to avoid repeating clang builds we can still add an additional clang build to use the latest stable version of clang which will typically be available on current distros. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
This documents the current design for upgrading TCG emulation to take advantage of modern CPUs by running a thread-per-CPU. The document goes through the various areas of the code affected by such a change and proposes design requirements for each part of the solution. The text marked with (Current solution[s]) to document what the current approaches being used are. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- v1 - initial version v2 - update discussion on locks - bit more detail on vCPU scheduling - explicitly mention Translation Blocks - emulated hardware state already covered by iomutex - a few minor rewords v3 - mention this covers system-mode - describe main main-loop and lookup hot-path - mention multi-concurrent-reader lookups - enumerate reasons for invalidation - add more details on lookup structures - describe the softmmu hot-path better - mention store-after-load barrier problem v4 - mention some cross-over between linux-user/system emulation - various minor grammar and scanning fixes - fix reference to tb_ctx.htbale - describe the solution for hot-path - more detail on TB flushing and invalidation - add (Current solution) following design requirements - more detail on iothread/BQL mutex - mention implicit memory barriers - add links to current LL/SC and cmpxchg patch sets - add TLB flag setting as an additional requirement v6 - remove DRAFTING, update copyright dates - document current solutions to each design requirement - tb_lock() serialisation for codegen/patch - cputlb changes to defer cross-vCPU flushes - cputlb atomic updates for slow-path - BQL usage for hardware serialisation - cmpxchg as initial atomic/synchronisation support mechanism
We know there will be cases where MTTCG won't work until additional work is done in the front/back ends to support. It will however be useful to be able to turn it on. As a result MTTCG will default to off unless the combination is supported. However the user can turn it on for the sake of testing. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [AJB: move to -accel tcg,thread=multi|single, defaults] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> --- v1: - merge with add mttcg option. - update commit message v2: - machine_init->opts_init v3: - moved from -tcg to -accel tcg,thread=single|multi - fix checkpatch warnings v4: - make mttcg_enabled extern, qemu_tcg_mttcg_enabled() now just macro - qemu_tcg_configure now propagates Error instead of exiting - better error checking of thread=foo - use CONFIG flags for default_mttcg_enabled() - disable mttcg with icount, error if both forced on
Currently we rely on the side effect of the main loop grabbing the iothread_mutex to give any long running basic block chains a kick to ensure the next vCPU is scheduled. As this code is being re-factored and rationalised we now do it explicitly here. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- v2 - re-base fixes - get_ticks_per_sec() -> NANOSECONDS_PER_SEC v3 - add define for TCG_KICK_FREQ - fix checkpatch warning v4 - wrap next calc in inline qemu_tcg_next_kick() instead of macro v5 - move all kick code into own section - use global for timer - add helper functions to start/stop timer - stop timer when all cores paused
..and make the definition local to cpus. In preparation for MTTCG the concept of a global tcg_current_cpu will no longer make sense. However we still need to keep track of it in the single-threaded case to be able to exit quickly when required. qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as well as qemu_kick_rr_cpu() which will become a no-op in MTTCG. For the time being the setting of the global exit_request remains. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> --- v4: - keep global exit_request setting for now - fix merge conflicts v5: - merge conflicts with kick changes
This finally allows TCG to benefit from the iothread introduction: Drop the global mutex while running pure TCG CPU code. Reacquire the lock when entering MMIO or PIO emulation, or when leaving the TCG loop. We have to revert a few optimization for the current TCG threading model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not kicking it in qemu_cpu_kick. We also need to disable RAM block reordering until we have a more efficient locking mechanism at hand. Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. These numbers demonstrate where we gain something: 20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm 20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm The guest CPU was fully loaded, but the iothread could still run mostly independent on a second core. Without the patch we don't get beyond 32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm 32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm We don't benefit significantly, though, when the guest is not fully loading a host CPU. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex] Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [EGC: fixed iothread lock for cpu-exec IRQ handling] Signed-off-by: Emilio G. Cota <cota@braap.org> [AJB: -smp single-threaded fix, rm old info from commit msg, review updates] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- v5 (ajb, base patches): - added an assert to BQL unlock/lock functions instead of hanging - ensure all cpu->interrupt_requests *modifications* protected by BQL - add a re-read on cpu->interrupt_request for correctness - BQL fixes for: - assert BQL held for PPC hypercalls (emulate_spar_hypercall) - SCLP service calls on s390x - merge conflict with kick timer patch v4 (ajb, base patches): - protect cpu->interrupt updates with BQL - fix wording io_mem_notdirty calls - s/we/with/ v3 (ajb, base-patches): - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals with it in the longjmp). - fix re-base conflicts v2 (ajb): - merge with tcg: grab iothread lock in cpu-exec interrupt handling - use existing fns for tracking lock state - lock iothread for mem_region - add assert on mem region modification - ensure smm_helper holds iothread - Add JK s-o-b - Fix-up FK s-o-b annotation v1 (ajb, base-patches): - SMP failure now fixed by previous commit Changes from Fred Konrad (mttcg-v7 via paolo): * Rebase on the current HEAD. * Fixes a deadlock in qemu_devices_reset(). * Remove the mutex in address_space_*
There are now only two uses of the global exit_request left. The first ensures we exit the run_loop when we first start to process pending work and in the kick handler. This is just as easily done by setting the first_cpu->exit_request flag. The second use is in the round robin kick routine. The global exit_request ensured every vCPU would set its local exit_request and cause a full exit of the loop. Now the iothread isn't being held while running we can just rely on the kick handler to push us out as intended. We lightly re-factor the main vCPU thread to ensure cpu->exit_requests cause us to exit the main loop and process any IO requests that might come along. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> --- v5 - minor merge conflict with kick patch v4 - moved to after iothread unlocking patch - needed to remove kick exit_request as well. - remove extraneous cpu->exit_request check - remove stray exit_request setting - remove needless atomic operation
tb_lock() has long been used for linux-user mode to protect code generation. By enabling it now we prepare for MTTCG and ensure all code generation is serialised by this lock. The other major structure that needs protecting is the l1_map and its PageDesc structures. For the SoftMMU case we also use tb_lock() to protect these structures instead of linux-user mmap_lock() which as the name suggests serialises updates to the structure as a result of guest mmap operations. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> --- v4 - split from main tcg: enable thread-per-vCPU patch
There are a couple of changes that occur at the same time here: - introduce a single vCPU qemu_tcg_cpu_thread_fn One of these is spawned per vCPU with its own Thread and Condition variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old single threaded function. - the TLS current_cpu variable is now live for the lifetime of MTTCG vCPU threads. This is for future work where async jobs need to know the vCPU context they are operating in. The user to switch on multi-thread behaviour and spawn a thread per-vCPU. For a simple test kvm-unit-test like: ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi Will now use 4 vCPU threads and have an expected FAIL (instead of the unexpected PASS) as the default mode of the test has no protection when incrementing a shared variable. We enable the parallel_cpus flag to ensure we generate correct barrier and atomic code if supported by the front and backends. As each back end and front end is updated they can add CONFIG_MTTCG_TARGET and CONFIG_MTTCG_HOST to their respective make configurations so default_mttcg_enabled does the right thing. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [AJB: Some fixes, conditionally, commit rewording] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- v1 (ajb): - fix merge conflicts - maintain single-thread approach v2 - re-base fixes (no longer has tb_find_fast lock tweak ahead) - remove bogus break condition on cpu->stop/stopped - only process exiting cpus exit_request - handle all cpus idle case (fixes shutdown issues) - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up) - move icount timer into helper v3 - update the commit message - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks) - ensure linux-user clears cpu->exit_request in loop - purging of global exit_request and tcg_current_cpu in earlier patches - fix checkpatch warnings v4 - don't break loop on stopped, we may never schedule next in RR mode - make sure we flush iorequests of current cpu if we exited on one - add tcg_cpu_exec_start/end wraps for async work functions - stop killing of current_cpu on loop exit - set current_cpu in the single thread function - remove sleep special case, add qemu_tcg_should_sleep() for mttcg - no need to atomic set cpu->exit_request going into the loop - removed extraneous setting of exit_request - split tb_lock() part of patch - rename single thread fn to qemu_tcg_rr_cpu_thread_fn v5 - enable parallel_cpus for MTTCG (for barriers/atomics) - expand on CONFIG_ flags in commit message v7 - move parallel_cpus down into the mttcg leg
The patch enables handling atomic code in the guest. This should be preferably done in cpu_handle_exception(), but the current assumptions regarding when we can execute atomic sections cause a deadlock. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> [AJB: tweak title] Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
For SoftMMU the TLB flushes are an example of a task that can be triggered on one vCPU by another. To deal with this properly we need to use safe work to ensure these changes are done safely. The new assert can be enabled while debugging to catch these cases. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Some architectures allow to flush the tlb of other VCPUs. This is not a problem when we have only one thread for all VCPUs but it definitely needs to be an asynchronous work when we are in true multithreaded work. We take the tb_lock() when doing this to avoid racing with other threads which may be invalidating TB's at the same time. The alternative would be to use proper atomic primitives to clear the tlb entries en-mass. This patch doesn't do anything to protect other cputlb function being called in MTTCG mode making cross vCPU changes. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [AJB: remove need for g_malloc on defer, make check fixes, tb_lock] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- v6 (base patches) - don't use cmpxchg_bool (we drop it later anyway) - use RUN_ON_CPU macros instead of inlines - bug out of tlb_flush if !tcg_enabled() (MacOSX make check failure) v5 (base patches) - take tb_lock() for memset - ensure tb_flush_page properly asyncs work for other vCPUs - use run_on_cpu_data v4 (base_patches) - brought forward from arm enabling series - restore pending_tlb_flush flag v1 - Remove tlb_flush_all just do the check in tlb_flush. - remove the need to g_malloc - tlb_flush calls direct if !cpu->created fixup! cputlb: introduce tlb_flush_* async work.
This moves the helper function closer to where it is called and updates the error message to report via error_report instead of the deprecated fprintf. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags in TLB entries to force the slow-path on writes. This is used to mark page ranges containing code which has been translated so it can be invalidated if written to. To do this safely we need to ensure the TLB entries in question for all vCPUs are updated before we attempt to run the code otherwise a race could be introduced. To achieve this we atomically set the flag in tlb_reset_dirty_range and take care when setting it when the TLB entry is filled. The helper function is made static as it isn't used outside of cputlb. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- v6 - use TARGET_PAGE_BITS_MIN - use run_on_cpu helpers v7 - fix tlb_debug fmt for 32bit build
This introduces support to the cputlb API for flushing all CPUs TLBs with one call. This avoids the need for target helpers to iterate through the vCPUs themselves. Additionally these functions provide a "wait" argument which will cause the work to be scheduled and the calling vCPU to exit its loop. The result will be all the flush operations will be complete by the time the originating vCPU starts executing again. It is up to the caller to ensure enough state has been saved so execution can be restarted at the next appropriate instruction. Some guest architectures can defer completion of flush operations until later and are free to set wait to false. If they later schedule work using the async_safe_work mechanism they can be sure other vCPUs will have flushed their TLBs by the point execution returns from the safe work. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
When switching a new vCPU on we want to complete a bunch of the setup work before we start scheduling the vCPU thread. To do this cleanly we defer vCPU setup to async work which will run the vCPUs execution context as the thread is woken up. The scheduling of the work will kick the vCPU awake. This avoids potential races in MTTCG system emulation. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
cputlb owns the TLB entries and knows how to safely update them in MTTCG. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Most ARMCPRegInfo structures just allow updating of the CPU field. However some have more complex operations that *may* be have cross vCPU effects therefor need to be serialised. The most obvious examples at the moment are things that affect the GICv3 IRQ controller. To avoid applying this requirement to all registers with custom access functions we check for if the type is marked ARM_CP_IO. By default all MMIO access to devices already takes the BQL to serialise hardware emulation. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
As the arm_call_el_change_hook may affect global state (for example with updating the global GIC state) we need to assert/take the BQL. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
The WFE and YIELD instructions are really only hints and in TCG's case they were useful to move the scheduling on from one vCPU to the next. In the parallel context (MTTCG) this just causes an unnecessary cpu_exit and contention of the BQL. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
This is a purely mechanical change to make the ARM_CP flags neatly align and use a consistent format so it is easier to see which bit each flag is. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Some helpers may trigger an immediate exit of the cpu_loop. If this happens the PC need to be rectified to ensure the restart will begin on the next instruction. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Previously flushes on other vCPUs would only get serviced when they exited their TranslationBlocks. While this isn't overly problematic it violates the semantics of TLB flush from the point of view of source vCPU. To solve this we call the cputlb *_all_cpus() functions to do the flushes and ask it to ensure all flushes are completed before we start the next instruction. As this involves exiting the cpu_loop we need to ensure the PC is saved before the tlb helper functions are called. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
This enables the multi-threaded system emulation by default for ARMv7 and ARMv8 guests using the x86_64 TCG backend. This means: - The x86_64 TCG backend supports cmpxchg based atomic ops - The x86_64 TCG backend emits barriers for barrier ops And on the guest side: - The ARM translate.c/translate-64.c have been converted to - use MTTCG safe atomic primitives - emit the appropriate barrier ops - The ARM machine has been updated to - hold the BQL when modifying shared cross-vCPU state - defer cpu_reset to async safe work Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of fixed position bitfields, much like we already have for deposit. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
Assert that len is not 0. Since we have asserted that ofs + len <= N, a later check for len == N implies that ofs == 0. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
While we don't require a new opcode, it is handy to have an expander that knows the first source is zero. Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
This allows us to use this detection within the TCG_TARGET_HAS_* macros, instead of requiring a function call into tcg-target.inc.c. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
rth7680
pushed a commit
that referenced
this pull request
Apr 5, 2021
Incoming enabled bitmaps are busy, because we do bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps being migrated are not marked busy, and user can remove them during the incoming migration. Then we may crash in cancel_incoming_locked() when try to remove the bitmap that was already removed by user, like this: #0 qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20 "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77 #1 bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9) at ../block/dirty-bitmap.c:64 #2 bdrv_release_dirty_bitmap (bitmap=0x5596810e9570) at ../block/dirty-bitmap.c:362 #3 cancel_incoming_locked (s=0x559680be8208 <dbm_state+40>) at ../migration/block-dirty-bitmap.c:918 qemu#4 dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0 <dbm_state>, version_id=1) at ../migration/block-dirty-bitmap.c:1194 qemu#5 vmstate_load (f=0x559681d02b10, se=0x559680fb5810) at ../migration/savevm.c:908 qemu#6 qemu_loadvm_section_part_end (f=0x559681d02b10, mis=0x559680fb4a30) at ../migration/savevm.c:2473 qemu#7 qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30) at ../migration/savevm.c:2626 qemu#8 postcopy_ram_listen_thread (opaque=0x0) at ../migration/savevm.c:1871 qemu#9 qemu_thread_start (args=0x5596817ccd10) at ../util/qemu-thread-posix.c:521 qemu#10 start_thread () at /lib64/libpthread.so.0 qemu#11 clone () at /lib64/libc.so.6 Note bs pointer taken from bitmap: it's definitely bad aligned. That's because we are in use after free, bitmap is already freed. So, let's make disabled bitmaps (being migrated) busy during incoming migration. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210322094906.5079-2-vsementsov@virtuozzo.com>
rth7680
pushed a commit
that referenced
this pull request
Apr 5, 2021
When building with --enable-sanitizers we get: Direct leak of 32 byte(s) in 2 object(s) allocated from: #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf) #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958) #2 0x561847f02ca2 in usb_packet_init hw/usb/core.c:531:5 #3 0x561848df4df4 in usb_ehci_init hw/usb/hcd-ehci.c:2575:5 qemu#4 0x561847c119ac in ehci_sysbus_init hw/usb/hcd-ehci-sysbus.c:73:5 qemu#5 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9 qemu#6 0x56184a5bd955 in object_init_with_type qom/object.c:371:9 qemu#7 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5 qemu#8 0x56184a5a24d5 in object_initialize qom/object.c:536:5 qemu#9 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5 qemu#10 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10 qemu#11 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5 qemu#12 0x561849542d18 in npcm7xx_init hw/arm/npcm7xx.c:427:5 Similarly to commit d710e1e ("usb: ehci: fix memory leak in ehci"), fix by calling usb_ehci_finalize() to free the USBPacket. Fixes: 7341ea0 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210323183701.281152-1-f4bug@amsat.org> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Apr 5, 2021
When building with --enable-sanitizers we get: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf) #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958) #2 0x561847c2dcc9 in xlnx_dp_init hw/display/xlnx_dp.c:1259:5 #3 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9 qemu#4 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5 qemu#5 0x56184a5a24d5 in object_initialize qom/object.c:536:5 qemu#6 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5 qemu#7 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10 qemu#8 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5 qemu#9 0x5618495aa431 in xlnx_zynqmp_init hw/arm/xlnx-zynqmp.c:273:5 The RX/TX FIFOs are created in xlnx_dp_init(), add xlnx_dp_finalize() to destroy them. Fixes: 58ac482 ("introduce xlnx-dp") Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210323182958.277654-1-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
rth7680
pushed a commit
that referenced
this pull request
Apr 5, 2021
g_hash_table_add always retains ownership of the pointer passed in as the key. Its return status merely indicates whether the added entry was new, or replaced an existing entry. Thus key must never be freed after this method returns. Spotted by ASAN: ==2407186==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020003ac4f0 at pc 0x7ffff766659c bp 0x7fffffffd1d0 sp 0x7fffffffc980 READ of size 1 at 0x6020003ac4f0 thread T0 #0 0x7ffff766659b (/lib64/libasan.so.6+0x8a59b) #1 0x7ffff6bfa843 in g_str_equal ../glib/ghash.c:2303 #2 0x7ffff6bf8167 in g_hash_table_lookup_node ../glib/ghash.c:493 #3 0x7ffff6bf9b78 in g_hash_table_insert_internal ../glib/ghash.c:1598 qemu#4 0x7ffff6bf9c32 in g_hash_table_add ../glib/ghash.c:1689 qemu#5 0x5555596caad4 in module_load_one ../util/module.c:233 qemu#6 0x5555596ca949 in module_load_one ../util/module.c:225 qemu#7 0x5555596ca949 in module_load_one ../util/module.c:225 qemu#8 0x5555596cbdf4 in module_load_qom_all ../util/module.c:349 Typical C bug... Fixes: 9062912 ("module: use g_hash_table_add()") Cc: qemu-stable@nongnu.org Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210316134456.3243102-1-marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
May 6, 2021
Refactor qemu_xkeymap_mapping_table() to have a single exit point, so we can easily free the memory allocated by XGetAtomName(). This fixes when running a binary configured with --enable-sanitizers: Direct leak of 22 byte(s) in 1 object(s) allocated from: #0 0x561344a7473f in malloc (qemu-system-x86_64+0x1dab73f) #1 0x7fa4d9dc08aa in XGetAtomName (/lib64/libX11.so.6+0x2a8aa) Fixes: 2ec7870 ("ui: convert GTK and SDL1 frontends to keycodemapdb") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210430155009.259755-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
rth7680
pushed a commit
that referenced
this pull request
May 20, 2021
Running the WDR opcode triggers a segfault: $ cat > foo.S << EOF > __start: > wdr > EOF $ avr-gcc -nostdlib -nostartfiles -mmcu=avr6 foo.S -o foo.elf $ qemu-system-avr -serial mon:stdio -nographic -no-reboot \ -M mega -bios foo.elf -d in_asm --singlestep IN: 0x00000000: WDR Segmentation fault (core dumped) (gdb) bt #0 0x00005555add0b23a in gdb_get_cpu_pid (cpu=0x5555af5a4af0) at ../gdbstub.c:718 #1 0x00005555add0b2dd in gdb_get_cpu_process (cpu=0x5555af5a4af0) at ../gdbstub.c:743 #2 0x00005555add0e477 in gdb_set_stop_cpu (cpu=0x5555af5a4af0) at ../gdbstub.c:2742 #3 0x00005555adc99b96 in cpu_handle_guest_debug (cpu=0x5555af5a4af0) at ../softmmu/cpus.c:306 qemu#4 0x00005555adcc66ab in rr_cpu_thread_fn (arg=0x5555af5a4af0) at ../accel/tcg/tcg-accel-ops-rr.c:224 qemu#5 0x00005555adefaf12 in qemu_thread_start (args=0x5555af5d9870) at ../util/qemu-thread-posix.c:521 qemu#6 0x00007f692d940ea5 in start_thread () from /lib64/libpthread.so.0 qemu#7 0x00007f692d6699fd in clone () from /lib64/libc.so.6 Since the watchdog peripheral is not implemented, simply log the opcode as unimplemented and keep going. Reported-by: Fred Konrad <konrad@adacore.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com> Message-Id: <20210502190900.604292-1-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
rth7680
pushed a commit
that referenced
this pull request
May 20, 2021
This is a partial revert of commits 77542d4 and bc79c87. Usually, an error during initialisation means that the configuration was wrong. Reconnecting won't make the error go away, but just turn the error condition into an endless loop. Avoid this and return errors again. Additionally, calling vhost_user_blk_disconnect() from the chardev event handler could result in use-after-free because none of the initialisation code expects that the device could just go away in the middle. So removing the call fixes crashes in several places. For example, using a num-queues setting that is incompatible with the backend would result in a crash like this (dereferencing dev->opaque, which is already NULL): #0 0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313 #1 0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84 #2 0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #3 0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0 qemu#4 0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0 qemu#5 0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402 qemu#6 0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133 qemu#7 0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566 qemu#8 0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510 qemu#9 0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660 Note that this removes the ability to reconnect during initialisation (but not during operation) when there is no permanent error, but the backend restarts, as the implementation was buggy. This feature can be added back in a follow-up series after changing error paths to distinguish cases where retrying could help from cases with permanent errors. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210429171316.162022-3-kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
May 30, 2021
…accept destination side: $ build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.1.10:8888 (qemu) migrate_set_capability postcopy-ram on (qemu) dest_init RDMA Device opened: kernel name rocep1s0f0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/rocep1s0f0, transport: (2) Ethernet Segmentation fault (core dumped) (gdb) bt #0 qemu_rdma_accept (rdma=0x0) at ../migration/rdma.c:3272 #1 rdma_accept_incoming_migration (opaque=0x0) at ../migration/rdma.c:3986 #2 0x0000563c9e51f02a in aio_dispatch_handler (ctx=ctx@entry=0x563ca0606010, node=0x563ca12b2150) at ../util/aio-posix.c:329 #3 0x0000563c9e51f752 in aio_dispatch_handlers (ctx=0x563ca0606010) at ../util/aio-posix.c:372 qemu#4 aio_dispatch (ctx=0x563ca0606010) at ../util/aio-posix.c:382 qemu#5 0x0000563c9e4f4d9e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 qemu#6 0x00007fe96ef3fa9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 qemu#7 0x0000563c9e4ffeb8 in glib_pollfds_poll () at ../util/main-loop.c:231 qemu#8 os_host_main_loop_wait (timeout=12188789) at ../util/main-loop.c:254 qemu#9 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530 qemu#10 0x0000563c9e3c7211 in qemu_main_loop () at ../softmmu/runstate.c:725 qemu#11 0x0000563c9dfd46fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50 The rdma return path will not be created when qemu incoming is starting since migrate_copy() is false at that moment, then a NULL return path rdma was referenced if the user enabled postcopy later. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-3-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jun 14, 2021
…ier() In the failover case configuration, virtio_net_device_realize() uses an add_migration_state_change_notifier() to add a state notifier, but this notifier is not removed by the unrealize function when the virtio-net card is unplugged. If the card is unplugged and a migration is started, the notifier is called and as it is not valid anymore QEMU crashes. This patch fixes the problem by adding the remove_migration_state_change_notifier() in virtio_net_device_unrealize(). The problem can be reproduced with: $ qemu-system-x86_64 -enable-kvm -m 1g -M q35 \ -device pcie-root-port,slot=4,id=root1 \ -device pcie-root-port,slot=5,id=root2 \ -device virtio-net-pci,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root1 \ -monitor stdio disk.qcow2 (qemu) device_del net1 (qemu) migrate "exec:gzip -c > STATEFILE.gz" Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in () #1 0x0000555555d726d7 in notifier_list_notify (...) at .../util/notify.c:39 #2 0x0000555555842c1a in migrate_fd_connect (...) at .../migration/migration.c:3975 #3 0x0000555555950f7d in migration_channel_connect (...) error@entry=0x0) at .../migration/channel.c:107 qemu#4 0x0000555555910922 in exec_start_outgoing_migration (...) at .../migration/exec.c:42 Reported-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jun 14, 2021
Commit 3ca1f32 "block: BdrvChildClass: add .get_parent_aio_context handler" introduced new handler and commit 228ca37 "block: drop ctx argument from bdrv_root_attach_child" made a generic use of it. But 3ca1f32 didn't update child_vvfat_qcow. Fix that. Before that fix the command ./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \ -drive file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none crashes: 1 bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 2 bdrv_attach_child_common (child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, perm=3, shared_perm=4, opaque=0x559d62445690, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2795 3 bdrv_attach_child_noperm (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2855 4 bdrv_attach_child (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, errp=0x7ffc74c2ae60) at ../block.c:2953 5 bdrv_open_child (filename=0x559d62464b80 "/var/tmp/vl.h3TIS4", options=0x559d6246ec20, bdref_key=0x559d606f9e3d "write-target", parent=0x559d62445690, child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, allow_none=false, errp=0x7ffc74c2ae60) at ../block.c:3351 6 enable_write_target (bs=0x559d62445690, errp=0x7ffc74c2ae60) at ../block/vvfat.c:3176 7 vvfat_open (bs=0x559d62445690, options=0x559d6244adb0, flags=155650, errp=0x7ffc74c2ae60) at ../block/vvfat.c:1236 8 bdrv_open_driver (bs=0x559d62445690, drv=0x559d60d4f7e0 <bdrv_vvfat>, node_name=0x0, options=0x559d6244adb0, open_flags=155650, errp=0x7ffc74c2af70) at ../block.c:1557 9 bdrv_open_common (bs=0x559d62445690, file=0x0, options=0x559d6244adb0, errp=0x7ffc74c2af70) at ... (gdb) fr 1 #1 0x0000559d603ea3bf in bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 1440 return c->klass->get_parent_aio_context(c); (gdb) p c->klass $1 = (const BdrvChildClass *) 0x559d60c58d20 <child_vvfat_qcow> (gdb) p c->klass->get_parent_aio_context $2 = (AioContext *(*)(BdrvChild *)) 0x0 Fixes: 3ca1f32 Fixes: 228ca37 Reported-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210524101257.119377-2-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jun 14, 2021
This patch fixes the following: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f6ae4559859 in __GI_abort () at abort.c:79 #2 0x0000559aaa386720 in error_exit (err=16, msg=0x559aaa5973d0 <__func__.16227> "qemu_mutex_destroy") at util/qemu-thread-posix.c:36 #3 0x0000559aaa3868c5 in qemu_mutex_destroy (mutex=0x559aabffe828) at util/qemu-thread-posix.c:69 qemu#4 0x0000559aaa2f93a8 in char_finalize (obj=0x559aabffe800) at chardev/char.c:285 qemu#5 0x0000559aaa23318a in object_deinit (obj=0x559aabffe800, type=0x559aabfd7d20) at qom/object.c:606 qemu#6 0x0000559aaa2331b8 in object_deinit (obj=0x559aabffe800, type=0x559aabfd9060) at qom/object.c:610 qemu#7 0x0000559aaa233200 in object_finalize (data=0x559aabffe800) at qom/object.c:620 qemu#8 0x0000559aaa234202 in object_unref (obj=0x559aabffe800) at qom/object.c:1074 qemu#9 0x0000559aaa2356b6 in object_finalize_child_property (obj=0x559aac0dac10, name=0x559aac778760 "compare0-0", opaque=0x559aabffe800) at qom/object.c:1584 qemu#10 0x0000559aaa232f70 in object_property_del_all (obj=0x559aac0dac10) at qom/object.c:557 qemu#11 0x0000559aaa2331ed in object_finalize (data=0x559aac0dac10) at qom/object.c:619 qemu#12 0x0000559aaa234202 in object_unref (obj=0x559aac0dac10) at qom/object.c:1074 qemu#13 0x0000559aaa2356b6 in object_finalize_child_property (obj=0x559aac0c75c0, name=0x559aac0dadc0 "chardevs", opaque=0x559aac0dac10) at qom/object.c:1584 qemu#14 0x0000559aaa233071 in object_property_del_child (obj=0x559aac0c75c0, child=0x559aac0dac10, errp=0x0) at qom/object.c:580 qemu#15 0x0000559aaa233155 in object_unparent (obj=0x559aac0dac10) at qom/object.c:599 qemu#16 0x0000559aaa2fb721 in qemu_chr_cleanup () at chardev/char.c:1159 qemu#17 0x0000559aa9f9b110 in main (argc=54, argv=0x7ffeb62fa998, envp=0x7ffeb62fab50) at vl.c:4539 When chardev is cleaned up, chr_write_lock needs to be destroyed. But the colo-compare module is not cleaned up normally before it when the guest poweroff. It is holding chr_write_lock at this time. This will cause qemu crash.So we add the function of colo_compare_cleanup() before qemu_chr_cleanup() to fix the bug. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jun 18, 2021
While the SB16 seems to work up to 48000 Hz, the "Sound Blaster Series Hardware Programming Guide" limit the sampling range from 4000 Hz to 44100 Hz (Section 3-9, 3-10: Digitized Sound I/O Programming, tables 3-2 and 3-3). Later, section 6-15 (DSP Commands) is more specific regarding the 41h / 42h registers (Set digitized sound output sampling rate): Valid sampling rates range from 5000 to 45000 Hz inclusive. There is no comment regarding error handling if the register is filled with an out-of-range value. (See also section 3-28 "8-bit or 16-bit Auto-initialize Transfer"). Assume limits are enforced in hardware. This fixes triggering an assertion in audio_calloc(): #1 abort #2 audio_bug audio/audio.c:119:9 #3 audio_calloc audio/audio.c:154:9 qemu#4 audio_pcm_sw_alloc_resources_out audio/audio_template.h:116:15 qemu#5 audio_pcm_sw_init_out audio/audio_template.h:175:11 qemu#6 audio_pcm_create_voice_pair_out audio/audio_template.h:410:9 qemu#7 AUD_open_out audio/audio_template.h:503:14 qemu#8 continue_dma8 hw/audio/sb16.c:216:20 qemu#9 dma_cmd8 hw/audio/sb16.c:276:5 qemu#10 command hw/audio/sb16.c:0 qemu#11 dsp_write hw/audio/sb16.c:949:13 qemu#12 portio_write softmmu/ioport.c:205:13 qemu#13 memory_region_write_accessor softmmu/memory.c:491:5 qemu#14 access_with_adjusted_size softmmu/memory.c:552:18 qemu#15 memory_region_dispatch_write softmmu/memory.c:0:13 qemu#16 flatview_write_continue softmmu/physmem.c:2759:23 qemu#17 flatview_write softmmu/physmem.c:2799:14 qemu#18 address_space_write softmmu/physmem.c:2891:18 qemu#19 cpu_outw softmmu/ioport.c:70:5 [*] http://www.baudline.com/solutions/full_duplex/sb16_pci/index.html OSS-Fuzz Report: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29174 Fixes: 85571bc ("audio merge (malc)") Buglink: https://bugs.launchpad.net/bugs/1910603 Tested-by: Qiang Liu <cyruscyliu@gmail.com> Reviewed-by: Qiang Liu <cyruscyliu@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210616104349.2398060-1-f4bug@amsat.org> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jul 14, 2021
backtrace: '0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 478 void *addr = mr->addr; (gdb) bt #0 0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 #1 0x0000555555891fcc in rdma_delete_block (block=<optimized out>, rdma=0x7fff38176010) at ../migration/rdma.c:691 #2 qemu_rdma_cleanup (rdma=0x7fff38176010) at ../migration/rdma.c:2365 #3 0x00005555558925b0 in qio_channel_rdma_close_rcu (rcu=0x555556b8b6c0) at ../migration/rdma.c:3073 qemu#4 0x0000555555d652a3 in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:281 qemu#5 0x0000555555d5edf9 in qemu_thread_start (args=0x7fffe88bb4d0) at ../util/qemu-thread-posix.c:541 qemu#6 0x00007ffff54c73f9 in start_thread () at /lib64/libpthread.so.0 qemu#7 0x00007ffff53f3b03 in clone () at /lib64/libc.so.6 ' Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210708144521.1959614-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jul 26, 2021
When building the Pegasos2 machine stand-alone we get: $ qemu-system-ppc -M pegasos2 -bios pegasos2.rom ERROR:qom/object.c:714:object_new_with_type: assertion failed: (type != NULL) Bail out! ERROR:qom/object.c:714:object_new_with_type: assertion failed: (type != NULL) Looking at the backtraces: Thread 1 "qemu-system-ppc" received signal SIGABRT, Aborted. (gdb) bt #0 0x00007ffff53877d5 in raise () at /lib64/libc.so.6 #1 0x00007ffff5370895 in abort () at /lib64/libc.so.6 #2 0x00007ffff6dc4b6c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0 #3 0x00007ffff6e229ff in g_assertion_message_expr () at /lib64/libglib-2.0.so.0 qemu#4 0x0000555555a0c8f4 in object_new_with_type (type=0x0) at qom/object.c:714 qemu#5 0x0000555555a0c9d5 in object_new (typename=0x555555c7afe4 "isa-pit") at qom/object.c:747 qemu#6 0x0000555555a053b8 in qdev_new (name=0x555555c7afe4 "isa-pit") at hw/core/qdev.c:153 qemu#7 0x00005555557cdd05 in isa_new (name=0x555555c7afe4 "isa-pit") at hw/isa/isa-bus.c:160 qemu#8 0x00005555557cf518 in i8254_pit_init (bus=0x55555603d140, base=64, isa_irq=0, alt_irq=0x0) at include/hw/timer/i8254.h:54 qemu#9 0x00005555557d12f9 in vt8231_realize (d=0x5555563d9770, errp=0x7fffffffcc28) at hw/isa/vt82c686.c:704 (gdb) bt #0 0x00007ffff54bd7d5 in raise () at /lib64/libc.so.6 #1 0x00007ffff54a6895 in abort () at /lib64/libc.so.6 #2 0x00005555558f7796 in object_new (typename=0x555555ad4889 "isa-parallel") at qom/object.c:749 #3 object_new (typename=type0x555555ad4889 "isa-parallel") at qom/object.c:743 qemu#4 0x00005555558f0d46 in qdev_new (name=0x555555ad4889 "isa-parallel") at hw/core/qdev.c:153 qemu#5 0x000055555576b669 in isa_new (name=0x555555ad4889 "isa-parallel") at hw/isa/isa-bus.c:160 qemu#6 0x000055555576bbe8 in isa_superio_realize (dev=0x555555f15910, errp=<optimized out>) at hw/isa/isa-superio.c:54 qemu#7 0x000055555576d5ed in via_superio_realize (d=0x555555f15910, errp=0x7fffffffcb30) at hw/isa/vt82c686.c:292 qemu#8 0x00005555558f12c1 in device_set_realized (obj=<optimized out>, ...) at hw/core/qdev.c:761 qemu#9 0x00005555558f5066 in property_set_bool (obj=0x555555f15910, ..., errp=0x7fffffffcbb0) at qom/object.c:2262 qemu#10 0x00005555558f7f38 in object_property_set (obj=0x555555f15910, name=0x555555b1b1e3 "realized", ...) at qom/object.c:1407 qemu#11 0x00005555558fb2d0 in object_property_set_qobject (obj=0x555555f15910, name=0x555555b1b1e3 "realized", ...) at qom/qom-qobject.c:28 qemu#12 0x00005555558f8525 in object_property_set_bool (obj=0x555555f15910, name=0x555555b1b1e3 "realized", ...) at qom/object.c:1477 qemu#13 0x00005555558f18ee in qdev_realize (dev=0x555555f15910, bus=0x55555602a610, errp=<optimized out>) at hw/core/qdev.c:389 qemu#14 0x00005555558f197f in qdev_realize_and_unref (dev=0x555555f15910, bus=0x55555602a610, errp=<optimized out>) at hw/core/qdev.c:396 qemu#15 0x000055555576b709 in isa_realize_and_unref (errp=<optimized out>, bus=0x55555602a610, dev=0x555555f15910) at hw/isa/isa-bus.c:179 qemu#16 isa_create_simple (bus=0x55555602a610, name=0x555555adc33b "vt8231-superio") at hw/isa/isa-bus.c:173 qemu#17 0x000055555576d9b7 in vt8231_realize (d=0x555556186a50, errp=<optimized out>) at hw/isa/vt82c686.c:706 The "isa-pit" type (TYPE_I8254) and "isa-parallel" are missing. Add them. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210515173716.358295-12-philmd@redhat.com> [PMD: Added "isa-parallel" later]
rth7680
pushed a commit
that referenced
this pull request
Sep 14, 2021
Instantiate SAI1/2/3 and ASRC as unimplemented devices to avoid random Linux kernel crashes, such as Unhandled fault: external abort on non-linefetch (0x808) at 0xd1580010 pgd = (ptrval) [d1580010] *pgd=8231b811, *pte=02034653, *ppte=02034453 Internal error: : 808 [#1] SMP ARM ... [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54) [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0) [<c09580f4>] (_regmap_write) from [<c095837c>] (_regmap_update_bits+0xe4/0xec) [<c095837c>] (_regmap_update_bits) from [<c09599b4>] (regmap_update_bits_base+0x50/0x74) [<c09599b4>] (regmap_update_bits_base) from [<c0d3e9e4>] (fsl_asrc_runtime_resume+0x1e4/0x21c) [<c0d3e9e4>] (fsl_asrc_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108) [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64) [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808) [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0) [<c0942dfc>] (__pm_runtime_resume) from [<c0d3ecc4>] (fsl_asrc_probe+0x2a8/0x708) [<c0d3ecc4>] (fsl_asrc_probe) from [<c0935b08>] (platform_probe+0x58/0xb8) [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334) [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138) [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8) [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130) [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8) [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8) [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118) [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4) [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c) [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128) [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38) or Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000 pgd = (ptrval) [d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453 Internal error: : 808 [#1] SMP ARM ... [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54) [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0) [<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60) [<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec) [<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108) [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64) [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808) [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0) [<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c) [<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8) [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334) [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138) [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8) [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130) [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8) [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8) [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118) [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4) [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c) [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128) [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38) Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Message-id: 20210810160318.87376-1-linux@roeck-us.net Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
rth7680
pushed a commit
that referenced
this pull request
Sep 14, 2021
Instantiate SAI1/2/3 as unimplemented devices to avoid Linux kernel crashes such as the following. Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000 pgd = (ptrval) [d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453 Internal error: : 808 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5 #1 ... [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54) [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0) [<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60) [<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec) [<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108) [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64) [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808) [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0) [<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c) [<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8) [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334) [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138) [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8) [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130) [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8) [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8) [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118) [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4) [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c) [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128) [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38) Signed-off-by: Guenter Roeck <linux@roeck-us.net> Message-id: 20210810175607.538090-1-linux@roeck-us.net Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
rth7680
pushed a commit
that referenced
this pull request
Sep 17, 2021
In mirror_iteration() we call mirror_wait_on_conflicts() with `self` parameter set to NULL. Starting from commit d44dae1 we dereference `self` pointer in mirror_wait_on_conflicts() without checks if it is not NULL. Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 172 self->waiting_for_op = op; [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))] (gdb) bt #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 #1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491 #2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917 #3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173 qemu#4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /usr/lib64/libc.so.6 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404 Fixes: d44dae1 ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210910124533.288318-1-sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Oct 11, 2021
Both qemu and qemu-img use writeback cache mode by default, which is already documented in qemu(1). qemu-nbd uses writethrough cache mode by default, and the default cache mode is not documented. According to the qemu-nbd(8): --cache=CACHE The cache mode to be used with the file. See the documentation of the emulator's -drive cache=... option for allowed values. qemu(1) says: The default mode is cache=writeback. So users have no reason to assume that qemu-nbd is using writethough cache mode. The only hint is the painfully slow writing when using the defaults. Looking in git history, it seems that qemu used writethrough in the past to support broken guests that did not flush data properly, or could not flush due to limitations in qemu. But qemu-nbd clients can use NBD_CMD_FLUSH to flush data, so using writethrough does not help anyone. Change the default cache mode to writback, and document the default and available values properly in the online help and manual. With this change converting image via qemu-nbd is 3.5 times faster. $ qemu-img create dst.img 50g $ qemu-nbd -t -f raw -k /tmp/nbd.sock dst.img Before this change: $ hyperfine -r3 "./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock" Benchmark #1: ./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock Time (mean ± σ): 83.639 s ± 5.970 s [User: 2.733 s, System: 6.112 s] Range (min … max): 76.749 s … 87.245 s 3 runs After this change: $ hyperfine -r3 "./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock" Benchmark #1: ./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock Time (mean ± σ): 23.522 s ± 0.433 s [User: 2.083 s, System: 5.475 s] Range (min … max): 23.234 s … 24.019 s 3 runs Users can avoid the issue by using --cache=writeback[1] but the defaults should give good performance for the common use case. [1] https://bugzilla.redhat.com/1990656 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-Id: <20210813205519.50518-1-nsoffer@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Nov 17, 2021
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago) multifd with unsupported protocol will cause a segment fault. (gdb) bt #0 0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at ../io/channel-socket.c:145 #2 0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168 #3 0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124 qemu#4 0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541 qemu#5 0x00007f7fe9b5b3f9 in ?? () qemu#6 0x0000000000000000 in ?? () It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and multifd setup() though there are so many other places using migrate_use_multifd(). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Nov 17, 2021
PCI resource reserve capability should use LE format as all other PCI things. If we don't then seabios won't boot: === PCI new allocation pass #1 === PCI: check devices PCI: QEMU resource reserve cap: size 10000000000000 type io PCI: secondary bus 1 size 10000000000000 type io PCI: secondary bus 1 size 00200000 type mem PCI: secondary bus 1 size 00200000 type prefmem === PCI new allocation pass #2 === PCI: out of I/O address space This became more important since we started reserving IO by default, previously no one noticed. Fixes: e2a6290 ("hw/pcie-root-port: Fix hotplug for PCI devices requiring IO") Cc: marcel.apfelbaum@gmail.com Fixes: 226263f ("hw/pci: add QEMU-specific PCI capability to the Generic PCI Express Root Port") Cc: zuban32s@gmail.com Fixes: 6755e61 ("hw/pci: add PCI resource reserve capability to legacy PCI bridge") Cc: jing2.liu@linux.intel.com Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Nov 17, 2021
This patch fixed as follows: Thread 1 (Thread 0x7f34ee738d80 (LWP 11212)): #0 __pthread_clockjoin_ex (threadid=139847152957184, thread_return=0x7f30b1febf30, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145 #1 0x0000563401998e36 in qemu_thread_join (thread=0x563402d66610) at util/qemu-thread-posix.c:587 #2 0x00005634017a79fa in process_incoming_migration_co (opaque=0x0) at migration/migration.c:502 #3 0x00005634019b59c9 in coroutine_trampoline (i0=63395504, i1=22068) at util/coroutine-ucontext.c:115 qemu#4 0x00007f34ef860660 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6 qemu#5 0x00007f30b21ee730 in ?? () qemu#6 0x0000000000000000 in ?? () Thread 13 (Thread 0x7f30b3dff700 (LWP 11747)): #0 __lll_lock_wait (futex=futex@entry=0x56340218ffa0 <qemu_global_mutex>, private=0) at lowlevellock.c:52 #1 0x00007f34efa000a3 in _GI__pthread_mutex_lock (mutex=0x56340218ffa0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80 #2 0x0000563401997f99 in qemu_mutex_lock_impl (mutex=0x56340218ffa0 <qemu_global_mutex>, file=0x563401b7a80e "migration/colo.c", line=806) at util/qemu-thread-posix.c:78 #3 0x0000563401407144 in qemu_mutex_lock_iothread_impl (file=0x563401b7a80e "migration/colo.c", line=806) at /home/workspace/colo-qemu/cpus.c:1899 qemu#4 0x00005634017ba8e8 in colo_process_incoming_thread (opaque=0x563402d664c0) at migration/colo.c:806 qemu#5 0x0000563401998b72 in qemu_thread_start (args=0x5634039f8370) at util/qemu-thread-posix.c:519 qemu#6 0x00007f34ef9fd609 in start_thread (arg=<optimized out>) at pthread_create.c:477 qemu#7 0x00007f34ef924293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The QEMU main thread is holding the lock: (gdb) p qemu_global_mutex $1 = {lock = {_data = {lock = 2, __count = 0, __owner = 11212, __nusers = 9, __kind = 0, __spins = 0, __elision = 0, __list = {_prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\314+\000\000\t", '\000' <repeats 26 times>, __align = 2}, file = 0x563401c07e4b "util/main-loop.c", line = 240, initialized = true} >From the call trace, we can see it is a deadlock bug. and the QEMU main thread holds the global mutex to wait until the COLO thread ends. and the colo thread wants to acquire the global mutex, which will cause a deadlock. So, we should release the qemu_global_mutex before waiting colo thread ends. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Nov 17, 2021
When trying to use the pc-dimm device on a non-NUMA machine, we get: $ qemu-system-arm -M none -cpu max -S \ -object memory-backend-file,id=mem1,size=1M,mem-path=/tmp/1m \ -device pc-dimm,id=dimm1,memdev=mem1 Segmentation fault (core dumped) (gdb) bt #0 pc_dimm_realize (dev=0x555556da3e90, errp=0x7fffffffcd10) at hw/mem/pc-dimm.c:184 #1 0x0000555555fe1f8f in device_set_realized (obj=0x555556da3e90, value=true, errp=0x7fffffffce18) at hw/core/qdev.c:531 #2 0x0000555555feb4a9 in property_set_bool (obj=0x555556da3e90, v=0x555556e54420, name=0x5555563c3c41 "realized", opaque=0x555556a704f0, errp=0x7fffffffce18) at qom/object.c:2257 To avoid that crash, restrict the pc-dimm NUMA check to machines supporting NUMA, and do not allow the use of 'node' property on non-NUMA machines. Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20211106145016.611332-1-f4bug@amsat.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Dec 14, 2021
Without the previous commit, when running 'make check-qtest-i386' with QEMU configured with '--enable-sanitizers' we get: AddressSanitizer:DEADLYSIGNAL ================================================================= ==287878==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000344 ==287878==The signal is caused by a WRITE memory access. ==287878==Hint: address points to the zero page. #0 0x564b2e5bac27 in blk_inc_in_flight block/block-backend.c:1346:5 #1 0x564b2e5bb228 in blk_pwritev_part block/block-backend.c:1317:5 #2 0x564b2e5bcd57 in blk_pwrite block/block-backend.c:1498:11 #3 0x564b2ca1cdd3 in fdctrl_write_data hw/block/fdc.c:2221:17 qemu#4 0x564b2ca1b2f7 in fdctrl_write hw/block/fdc.c:829:9 qemu#5 0x564b2dc49503 in portio_write softmmu/ioport.c:201:9 Add the reproducer for CVE-2021-20196. Suggested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20211124161536.631563-4-philmd@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Feb 23, 2022
e.g. 1109 15:16:20.151506 Uninitialized bytes in ioctl_common_pre at offset 0 inside [0x7ffc516af9b8, 4) 1109 15:16:20.151659 ==588974==WARNING: MemorySanitizer: use-of-uninitialized-value 1109 15:16:20.312923 #0 0x5639b88acb21 in tap_probe_vnet_hdr_len third_party/qemu/net/tap-linux.c:183:9 1109 15:16:20.312952 #1 0x5639b88afd66 in net_tap_fd_init third_party/qemu/net/tap.c:409:9 1109 15:16:20.312954 #2 0x5639b88b2d1b in net_init_tap_one third_party/qemu/net/tap.c:681:19 1109 15:16:20.312956 #3 0x5639b88b16a8 in net_init_tap third_party/qemu/net/tap.c:912:13 1109 15:16:20.312957 qemu#4 0x5639b8890175 in net_client_init1 third_party/qemu/net/net.c:1110:9 1109 15:16:20.312958 qemu#5 0x5639b888f912 in net_client_init third_party/qemu/net/net.c:1208:15 1109 15:16:20.312960 qemu#6 0x5639b8894aa5 in net_param_nic third_party/qemu/net/net.c:1588:11 1109 15:16:20.312961 qemu#7 0x5639b900cd18 in qemu_opts_foreach third_party/qemu/util/qemu-option.c:1135:14 1109 15:16:20.312962 qemu#8 0x5639b889393c in net_init_clients third_party/qemu/net/net.c:1612:9 1109 15:16:20.312964 qemu#9 0x5639b717aaf3 in qemu_create_late_backends third_party/qemu/softmmu/vl.c:1962:5 1109 15:16:20.312965 qemu#10 0x5639b717aaf3 in qemu_init third_party/qemu/softmmu/vl.c:3694:5 1109 15:16:20.312967 qemu#11 0x5639b71083b8 in main third_party/qemu/softmmu/main.c:49:5 1109 15:16:20.312968 qemu#12 0x7f464de1d8d2 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x628d2) 1109 15:16:20.312969 qemu#13 0x5639b6bbd389 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 1109 15:16:20.312970 1109 15:16:20.312975 Uninitialized value was stored to memory at 1109 15:16:20.313393 #0 0x5639b88acbee in tap_probe_vnet_hdr_len third_party/qemu/net/tap-linux.c 1109 15:16:20.313396 #1 0x5639b88afd66 in net_tap_fd_init third_party/qemu/net/tap.c:409:9 1109 15:16:20.313398 #2 0x5639b88b2d1b in net_init_tap_one third_party/qemu/net/tap.c:681:19 1109 15:16:20.313399 #3 0x5639b88b16a8 in net_init_tap third_party/qemu/net/tap.c:912:13 1109 15:16:20.313400 qemu#4 0x5639b8890175 in net_client_init1 third_party/qemu/net/net.c:1110:9 1109 15:16:20.313401 qemu#5 0x5639b888f912 in net_client_init third_party/qemu/net/net.c:1208:15 1109 15:16:20.313403 qemu#6 0x5639b8894aa5 in net_param_nic third_party/qemu/net/net.c:1588:11 1109 15:16:20.313404 qemu#7 0x5639b900cd18 in qemu_opts_foreach third_party/qemu/util/qemu-option.c:1135:14 1109 15:16:20.313405 qemu#8 0x5639b889393c in net_init_clients third_party/qemu/net/net.c:1612:9 1109 15:16:20.313407 qemu#9 0x5639b717aaf3 in qemu_create_late_backends third_party/qemu/softmmu/vl.c:1962:5 1109 15:16:20.313408 qemu#10 0x5639b717aaf3 in qemu_init third_party/qemu/softmmu/vl.c:3694:5 1109 15:16:20.313409 qemu#11 0x5639b71083b8 in main third_party/qemu/softmmu/main.c:49:5 1109 15:16:20.313410 qemu#12 0x7f464de1d8d2 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x628d2) 1109 15:16:20.313412 qemu#13 0x5639b6bbd389 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 1109 15:16:20.313413 1109 15:16:20.313417 Uninitialized value was stored to memory at 1109 15:16:20.313791 #0 0x5639b88affbd in net_tap_fd_init third_party/qemu/net/tap.c:400:26 1109 15:16:20.313826 #1 0x5639b88b2d1b in net_init_tap_one third_party/qemu/net/tap.c:681:19 1109 15:16:20.313829 #2 0x5639b88b16a8 in net_init_tap third_party/qemu/net/tap.c:912:13 1109 15:16:20.313831 #3 0x5639b8890175 in net_client_init1 third_party/qemu/net/net.c:1110:9 1109 15:16:20.313836 qemu#4 0x5639b888f912 in net_client_init third_party/qemu/net/net.c:1208:15 1109 15:16:20.313838 qemu#5 0x5639b8894aa5 in net_param_nic third_party/qemu/net/net.c:1588:11 1109 15:16:20.313839 qemu#6 0x5639b900cd18 in qemu_opts_foreach third_party/qemu/util/qemu-option.c:1135:14 1109 15:16:20.313841 qemu#7 0x5639b889393c in net_init_clients third_party/qemu/net/net.c:1612:9 1109 15:16:20.313843 qemu#8 0x5639b717aaf3 in qemu_create_late_backends third_party/qemu/softmmu/vl.c:1962:5 1109 15:16:20.313844 qemu#9 0x5639b717aaf3 in qemu_init third_party/qemu/softmmu/vl.c:3694:5 1109 15:16:20.313845 qemu#10 0x5639b71083b8 in main third_party/qemu/softmmu/main.c:49:5 1109 15:16:20.313846 qemu#11 0x7f464de1d8d2 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x628d2) 1109 15:16:20.313847 qemu#12 0x5639b6bbd389 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 1109 15:16:20.313849 1109 15:16:20.313851 Uninitialized value was created by an allocation of 'ifr' in the stack frame of function 'tap_probe_vnet_hdr' 1109 15:16:20.313855 #0 0x5639b88ac680 in tap_probe_vnet_hdr third_party/qemu/net/tap-linux.c:151 1109 15:16:20.313856 1109 15:16:20.313878 SUMMARY: MemorySanitizer: use-of-uninitialized-value third_party/qemu/net/tap-linux.c:183:9 in tap_probe_vnet_hdr_len Fixes: dc69004 ("net: move tap_probe_vnet_hdr() to tap-linux.c") Reviewed-by: Hao Wu <wuhaotsh@google.com> Reviewed-by: Patrick Venture <venture@google.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Peter Foley <pefoley@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Feb 23, 2022
`struct dirent' returned from readdir(3) could be shorter (or longer) than `sizeof(struct dirent)', thus memcpy of sizeof length will overread into unallocated page causing SIGSEGV. Example stack trace: #0 0x00005555559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed) #1 0x00005555559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9) #2 0x0000555555eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983) #3 0x00007ffff73e0be0 n/a (n/a + 0x0) While fixing this, provide a helper for any future `struct dirent' cloning. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841 Cc: qemu-stable@nongnu.org Co-authored-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Acked-by: Greg Kurz <groug@kaod.org> Tested-by: Vitaly Chikunov <vt@altlinux.org> Message-Id: <20220216181821.3481527-1-vt@altlinux.org> [C.S. - Fix typo in source comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
rth7680
pushed a commit
that referenced
this pull request
Jan 16, 2023
This is below memleak detected when to quit the qemu-system-x86_64 (with vhost-scsi-pci). (qemu) quit ================================================================= ==15568==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f00aec57917 in __interceptor_calloc (/lib64/libasan.so.6+0xb4917) #1 0x7f00ada0d7b5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x517b5) #2 0x5648ffd38bac in vhost_scsi_start ../hw/scsi/vhost-scsi.c:92 #3 0x5648ffd38d52 in vhost_scsi_set_status ../hw/scsi/vhost-scsi.c:131 qemu#4 0x5648ffda340e in virtio_set_status ../hw/virtio/virtio.c:2036 qemu#5 0x5648ff8de281 in virtio_ioport_write ../hw/virtio/virtio-pci.c:431 qemu#6 0x5648ff8deb29 in virtio_pci_config_write ../hw/virtio/virtio-pci.c:576 qemu#7 0x5648ffe5c0c2 in memory_region_write_accessor ../softmmu/memory.c:493 qemu#8 0x5648ffe5c424 in access_with_adjusted_size ../softmmu/memory.c:555 qemu#9 0x5648ffe6428f in memory_region_dispatch_write ../softmmu/memory.c:1515 qemu#10 0x5648ffe8613d in flatview_write_continue ../softmmu/physmem.c:2825 qemu#11 0x5648ffe86490 in flatview_write ../softmmu/physmem.c:2867 qemu#12 0x5648ffe86d9f in address_space_write ../softmmu/physmem.c:2963 qemu#13 0x5648ffe86e57 in address_space_rw ../softmmu/physmem.c:2973 qemu#14 0x5648fffbfb3d in kvm_handle_io ../accel/kvm/kvm-all.c:2639 qemu#15 0x5648fffc0e0d in kvm_cpu_exec ../accel/kvm/kvm-all.c:2890 qemu#16 0x5648fffc90a7 in kvm_vcpu_thread_fn ../accel/kvm/kvm-accel-ops.c:51 qemu#17 0x56490042400a in qemu_thread_start ../util/qemu-thread-posix.c:505 qemu#18 0x7f00ac3b6ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4) Free the vsc->inflight at the 'stop' path. Fixes: b82526c ("vhost-scsi: support inflight io track") Cc: Joe Jin <joe.jin@oracle.com> Cc: Li Feng <fengli@smartx.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20230104160433.21353-1-dongli.zhang@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
May 25, 2023
This reverts commit b320e21, which accidentally broke TCG, because it made the TCG -cpu max report the presence of MTE to the guest even if the board hadn't enabled MTE by wiring up the tag RAM. This meant that if the guest then tried to use MTE QEMU would segfault accessing the non-existent tag RAM: ==346473==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x55f328952a4a bp 0x00000213a400 sp 0x7f7871859b80 T346476) ==346473==The signal is caused by a READ memory access. ==346473==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used. #0 0x55f328952a4a in address_space_to_flatview /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:1108:12 #1 0x55f328952a4a in address_space_translate /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:2797:31 #2 0x55f328952a4a in allocation_tag_mem /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:176:10 #3 0x55f32895366c in helper_stgm /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:461:15 qemu#4 0x7f782431a293 (<unknown module>) It's also not clear that the KVM logic is correct either: MTE defaults to on there, rather than being only on if the board wants it on. Revert the whole commit for now so we can sort out the issues. (We didn't catch this in CI because we have no test cases in avocado that use guests with MTE support.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20230519145808.348701-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
rth7680
pushed a commit
that referenced
this pull request
May 25, 2023
…moryRegions Currently when portio_list MemoryRegions are freed using portio_list_destroy() the RCU thread segfaults generating a backtrace similar to that below: #0 0x5555599a34b6 in phys_section_destroy ../softmmu/physmem.c:996 #1 0x5555599a37a3 in phys_sections_free ../softmmu/physmem.c:1011 #2 0x5555599b24aa in address_space_dispatch_free ../softmmu/physmem.c:2430 #3 0x55555996a283 in flatview_destroy ../softmmu/memory.c:292 qemu#4 0x55555a2cb9fb in call_rcu_thread ../util/rcu.c:284 qemu#5 0x55555a29b71d in qemu_thread_start ../util/qemu-thread-posix.c:541 qemu#6 0x7ffff4a0cea6 in start_thread nptl/pthread_create.c:477 qemu#7 0x7ffff492ca2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfca2e) The problem here is that portio_list_destroy() unparents the portio_list MemoryRegions causing them to be freed immediately, however the flatview still has a reference to the MemoryRegion and so causes a use-after-free segfault when the RCU thread next updates the flatview. Solve the lifetime issue by making MemoryRegionPortioList the owner of the portio_list MemoryRegions, and then reparenting them to the portio_list owner. This ensures that they can be accessed as QOM children via the portio_list owner, yet the MemoryRegionPortioList owns the refcount. Update portio_list_destroy() to unparent the MemoryRegion from the portio_list owner (while keeping mrpio->mr live until finalization of the MemoryRegionPortioList), so that the portio_list MemoryRegions remain allocated until flatview_destroy() removes the final refcount upon the next flatview update. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230419151652.362717-4-mark.cave-ayland@ilande.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jun 5, 2023
blk_set_aio_context() is not fully transactional because blk_do_set_aio_context() updates blk->ctx outside the transaction. Most of the time this goes unnoticed but a BlockDevOps.drained_end() callback that invokes blk_get_aio_context() fails assert(ctx == blk->ctx). This happens because blk->ctx is only assigned after BlockDevOps.drained_end() is called and we're in an intermediate state where BlockDrvierState nodes already have the new context and the BlockBackend still has the old context. Making blk_set_aio_context() fully transactional solves this assertion failure because the BlockBackend's context is updated as part of the transaction (before BlockDevOps.drained_end() is called). Split blk_do_set_aio_context() in order to solve this assertion failure. This helper function actually serves two different purposes: 1. It drives blk_set_aio_context(). 2. It responds to BdrvChildClass->change_aio_ctx(). Get rid of the helper function. Do #1 inside blk_set_aio_context() and do #2 inside blk_root_set_aio_ctx_commit(). This simplifies the code. The only drawback of the fully transactional approach is that blk_set_aio_context() must contend with blk_root_set_aio_ctx_commit() being invoked as part of the AioContext change propagation. This can be solved by temporarily setting blk->allow_aio_context_change to true. Future patches call blk_get_aio_context() from BlockDevOps->drained_end(), so this patch will become necessary. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
rth7680
pushed a commit
that referenced
this pull request
Jun 20, 2023
Command "qemu-system-riscv64 -machine virt -m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G" would trigger this problem.Backtrace with: #0 0x0000555555b5b1a4 in riscv_numa_get_default_cpu_node_id at ../hw/riscv/numa.c:211 #1 0x00005555558ce510 in machine_numa_finish_cpu_init at ../hw/core/machine.c:1230 #2 0x00005555558ce9d3 in machine_run_board_init at ../hw/core/machine.c:1346 #3 0x0000555555aaedc3 in qemu_init_board at ../softmmu/vl.c:2513 qemu#4 0x0000555555aaf064 in qmp_x_exit_preconfig at ../softmmu/vl.c:2609 qemu#5 0x0000555555ab1916 in qemu_init at ../softmmu/vl.c:3617 qemu#6 0x000055555585463b in main at ../softmmu/main.c:47 This commit fixes the issue by adding parameter checks. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> Signed-off-by: Yin Wang <yin.wang@intel.com> Message-Id: <20230519023758.1759434-1-yin.wang@intel.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When debootstrapping a Debian Stretch hppa chroot with a
qemu-hppa-static it looks sometimes as it hangs or timeouts.
Also networking seems not possible.
Running inside a Debian Stretch hppa chroot following
command "chroot stretch su -l root" seems to hang and
fails later. This seems also to be an issue when
debootstrap creates the chroot.
Running with QEMU_STRACE shows that a socket is opened,
something gets sent to it and then some answer is expected:
Debugging shows that the sending happens in following stack:
The socket was opened a little bit earlier in
But the problem is that the message sent in audit_send gets translated
into an empty buffer in do_sendto, because fd_trans_target_to_host_data
has no entry for this fd stored:
When the stocket got opened it did initially call fd_trans_register:
But it gets unregistered again, just a line after the call to do_socket:
When commenting this line, this NETLINK_AUDIT operation and
a "apt-get update" started to work.
The "fd_trans_unregister" seems to be included since 2015:
e36800c (linux-user: add signalfd/signalfd4 syscalls)
Tried to reproduce it with qemu-x86_64-static on amd64, but could
not see this issue. Up to now I see this just with hppa.
Therefore I do not know if this is just my configuration,
because of some "hppa" changes or if this
fd_trans_unregister is really wrong.
What do you think?
Can you reproduce it?