Commits on May 18, 2023

  1. acpi: pcihp: allow repeating hot-unplug requests

    With Q35 using ACPI PCI hotplug by default, a user's request to unplug
    a device is ignored when it is issued before the guest OS has booted.
    Any additional attempt to request device hot-unplug afterwards
    results in the following error:
    
      "Device XYZ is already in the process of unplug"
    
    Arguably this can be considered a regression introduced by [2],
    before which it was possible to issue an unplug request multiple
    times.
    
    Accept new unplug requests after a timeout (1ms). This brings ACPI PCI
    hotplug on par with native PCIe unplug behavior [1] and allows the user
    to repeat unplug requests at proper times.
    Set the expire timeout to an arbitrary 1msec so the user won't be able to
    flood the guest with SCI interrupts by calling device_del in a tight loop.
    
    PS:
    ACPI spec doesn't mandate what OSPM can do with GPEx.status
    bits set before it's booted, so it's implementation dependent.
    Status bits may be retained (I tested with one Windows version)
    or cleared (Linux since 2.6 kernel times) during the guest's ACPI
    subsystem initialization.
    Clearing status bits (though not wrong per se) hides the unplug
    event from the guest, and it's up to the user to repeat device_del later
    when the guest is able to handle unplug requests.
    
    1) 18416c6 ("pcie: expire pending delete")
    2)
    Fixes: cce8944 ("qdev-monitor: Forbid repeated device_del")
    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Acked-by: Gerd Hoffmann <kraxel@redhat.com>
    CC: mst@redhat.com
    CC: anisinha@redhat.com
    CC: jusual@redhat.com
    CC: kraxel@redhat.com
    Message-Id: <20230418090449.2155757-1-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Ani Sinha <anisinha@redhat.com>
    (cherry picked from commit 0f689cf)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Igor Mammedov authored and Michael Tokarev committed May 18, 2023
    e557055
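The expiring-request behaviour described in the commit above can be modelled roughly as follows. This is a minimal Python sketch, not QEMU code: the `UnplugTracker` class, its method names and the millisecond clock values are all hypothetical and exist only to illustrate how a short expiry window rejects a flood of requests while still allowing a later retry.

```python
class UnplugTracker:
    """Hypothetical model: at most one live unplug request per device."""

    def __init__(self, expire_ms=1):
        self.expire_ms = expire_ms
        self.pending_until = {}  # device id -> time (ms) at which the request expires

    def request_unplug(self, dev, now_ms):
        """Return True if the request would be forwarded to the guest."""
        expires = self.pending_until.get(dev)
        if expires is not None and now_ms < expires:
            # Still inside the window: reject, in the spirit of the
            # "already in the process of unplug" error.
            return False
        # No request pending, or the previous one expired: accept and re-arm.
        self.pending_until[dev] = now_ms + self.expire_ms
        return True


tracker = UnplugTracker(expire_ms=1)
first = tracker.request_unplug("nic0", now_ms=0)    # accepted
flood = tracker.request_unplug("nic0", now_ms=0.5)  # rejected, window still open
retry = tracker.request_unplug("nic0", now_ms=5)    # accepted again after expiry
```

The window only bounds how often a request reaches the guest; it does not tell the caller whether the guest acted on it.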
  2. qemu-options: finesse the recommendations around -blockdev

    We are a bit premature in recommending -blockdev/-device as the best
    way to configure block devices. It seems there are times the more
    human friendly -drive still makes sense especially when -snapshot is
    involved.
    
    Improve the language to hopefully make things clearer.
    
    Suggested-by: Michael Tokarev <mjt@tls.msk.ru>
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Cc: Kevin Wolf <kwolf@redhat.com>
    Message-Id: <20230424092249.58552-7-alex.bennee@linaro.org>
    (cherry picked from commit c1654c3)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    stsquad authored and Michael Tokarev committed May 18, 2023
    4e35bd8
  3. docs/about/deprecated.rst: Add "since 7.1" tag to dtb-kaslr-seed deprecation

    In commit 5242876 we deprecated the dtb-kaslr-seed property of
    the virt board, but forgot the "since n.n" tag in the documentation
    of this in deprecated.rst.
    
    This deprecation note first appeared in the 7.1 release, so
    retrospectively add the correct "since 7.1" annotation to it.
    
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Message-id: 20230420122256.1023709-1-peter.maydell@linaro.org
    (cherry picked from commit ac64ebb)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    0a65c45
  4. target/arm: Initialize debug capabilities only once

    kvm_arm_init_debug() used to be called several times on an SMP system as
    kvm_arch_init_vcpu() calls it. Move the call to kvm_arch_init() to make
    sure it will be called only once; otherwise it will overwrite pointers
    to memory allocated with the previous call and leak it.
    
    Fixes: e4482ab ("target-arm: kvm - add support for HW assisted debug")
    Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
    Message-id: 20230405153644.25300-1-akihiko.odaki@daynix.com
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    (cherry picked from commit ad5c6dd)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    akihikodaki authored and Michael Tokarev committed May 18, 2023
    32900bf
  5. hw/net/msf2-emac: Don't modify descriptor in-place in emac_store_desc()

    The msf2-emac ethernet controller has functions emac_load_desc() and
    emac_store_desc() which read and write the in-memory descriptor
    blocks and handle conversion between guest and host endianness.
    
    As currently written, emac_store_desc() does the endianness
    conversion in-place; this means that it effectively consumes the
    input EmacDesc struct, because on a big-endian host the fields will
    be overwritten with the little-endian versions of their values.
    Unfortunately, in all the callsites the code continues to access
    fields in the EmacDesc struct after it has called emac_store_desc()
    -- specifically, it looks at the d.next field.
    
    The effect of this is that on a big-endian host networking doesn't
    work because the address of the next descriptor is corrupted.
    
    We could fix this by making the callsite avoid using the struct; but
    it's more robust to have emac_store_desc() leave its input alone.
    
    (emac_load_desc() also does an in-place conversion, but here this is
    fine, because the function is supposed to be initializing the
    struct.)
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-id: 20230424151919.1333299-1-peter.maydell@linaro.org
    (cherry picked from commit d565f58)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    e96dc26
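The in-place conversion problem from the msf2-emac commit above can be illustrated with a small Python model. It is only a sketch: the dictionary stands in for the EmacDesc struct, the field names and values are illustrative, and `cpu_to_le32` here is a simplified stand-in that swaps bytes only when a big-endian host is being modelled.

```python
import struct


def bswap32(v):
    """Reverse the byte order of a 32-bit value."""
    return struct.unpack("<I", struct.pack(">I", v))[0]


def cpu_to_le32(v, host_big_endian):
    """Model of a host-to-little-endian conversion: a swap only on big-endian hosts."""
    return bswap32(v) if host_big_endian else v


def store_desc_in_place(desc, host_big_endian):
    """Buggy shape: converts the caller's descriptor itself."""
    for key in desc:
        desc[key] = cpu_to_le32(desc[key], host_big_endian)
    return dict(desc)  # the values that would be written to guest memory


def store_desc_copy(desc, host_big_endian):
    """Fixed shape: converts a private copy and leaves the input alone."""
    return {k: cpu_to_le32(v, host_big_endian) for k, v in desc.items()}


d = {"pktaddr": 0x20001000, "pktsize": 0x40, "next": 0x20000010}
store_desc_in_place(d, host_big_endian=True)
corrupted_next = d["next"]  # the caller now sees a byte-swapped address

d2 = {"pktaddr": 0x20001000, "pktsize": 0x40, "next": 0x20000010}
store_desc_copy(d2, host_big_endian=True)
intact_next = d2["next"]    # unchanged, safe to follow
```

On a modelled little-endian host both variants leave the caller's values untouched, which is why the problem only shows up on big-endian hosts.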
  6. hw/arm/boot: Make write_bootloader() public as arm_write_bootloader()

    The arm boot.c code includes a utility function write_bootloader()
    which assists in writing a boot-code fragment into guest memory,
    including handling endianness and fixing it up with entry point
    addresses and similar things.  This is useful not just for the boot.c
    code but also in board model code, so rename it to
    arm_write_bootloader() and make it globally visible.
    
    Since we are making it public, make its API a little neater: move the
    AddressSpace* argument to be next to the hwaddr argument, and allow
    the fixupcontext array to be const, since we never modify it in this
    function.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Tested-by: Cédric Le Goater <clg@kaod.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Message-id: 20230424152717.1333930-2-peter.maydell@linaro.org
    [PMM: Split out from another patch by Cédric, added doc comment]
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    (cherry picked from commit 0fe43f0)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    legoater authored and Michael Tokarev committed May 18, 2023
    8e9c265
  7. hw/arm/aspeed: Use arm_write_bootloader() to write the bootloader

    When writing the secondary-CPU stub boot loader code to the guest,
    use arm_write_bootloader() instead of directly calling
    rom_add_blob_fixed().  This fixes a bug on big-endian hosts, because
    arm_write_bootloader() will correctly byte-swap the host-byte-order
    array values into the guest-byte-order to write into the guest
    memory.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Cédric Le Goater <clg@kaod.org>
    Tested-by: Cédric Le Goater <clg@kaod.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Message-id: 20230424152717.1333930-3-peter.maydell@linaro.org
    [PMM: Moved the "make arm_write_bootloader() function public" part
     to its own patch; updated commit message to note that this fixes
     an actual bug; adjust to the API changes noted in previous commit]
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    (cherry picked from commit 902bba5)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    legoater authored and Michael Tokarev committed May 18, 2023
    5ebe440
  8. hw/arm/raspi: Use arm_write_bootloader() to write boot code

    When writing the secondary-CPU stub boot loader code to the guest,
    use arm_write_bootloader() instead of directly calling
    rom_add_blob_fixed().  This fixes a bug on big-endian hosts, because
    arm_write_bootloader() will correctly byte-swap the host-byte-order
    array values into the guest-byte-order to write into the guest
    memory.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Tested-by: Cédric Le Goater <clg@kaod.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-id: 20230424152717.1333930-4-peter.maydell@linaro.org
    (cherry picked from commit 0acbdb4)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    d46d403
  9. hw/intc/allwinner-a10-pic: Don't use set_bit()/clear_bit()

    The Allwinner PIC model uses set_bit() and clear_bit() to update the
    values in its irq_pending[] array when an interrupt arrives.  However
    it is using these functions wrongly: they work on an array of type
    'long', and it is passing an array of type 'uint32_t'.  Because the
    code manually figures out the right array element, this works on
    little-endian hosts and on 32-bit big-endian hosts, where bits 0..31
    in a 'long' are in the same place as they are in a 'uint32_t'.
    However it breaks on 64-bit big-endian hosts.
    
    Remove the use of set_bit() and clear_bit() in favour of using
    deposit32() on the array element.  This fixes a bug where on
    big-endian 64-bit hosts the guest kernel would hang early on in
    bootup.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-id: 20230424152833.1334136-1-peter.maydell@linaro.org
    (cherry picked from commit 2c5fa07)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    5eb742f
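Why treating a 'uint32_t' array as an array of 'long' only breaks on 64-bit big-endian hosts, as the allwinner-a10-pic commit above describes, can be checked with a byte-level model. This is a hedged Python sketch: `set_bit_via_long` imitates the memory effect of setting bit `irq % 32` in a 'long' located at the address of element `irq // 32`, and `deposit32` is a simplified re-implementation written for this example. The four-element array is chosen so that an 8-byte access at the first element stays in bounds.

```python
import struct


def set_bit_via_long(irq_pending, irq, byte_order, long_size):
    """Set bit (irq % 32) in a 'long' overlaid on element irq // 32.

    irq_pending: list of uint32 values; byte_order: '<' or '>';
    long_size: modelled sizeof(long) in bytes, 4 or 8.
    """
    fmt32 = byte_order + "I"
    raw = bytearray(b"".join(struct.pack(fmt32, w) for w in irq_pending))
    off = (irq // 32) * 4
    fmt_long = byte_order + ("Q" if long_size == 8 else "I")
    (val,) = struct.unpack_from(fmt_long, raw, off)
    struct.pack_into(fmt_long, raw, off, val | (1 << (irq % 32)))
    return [struct.unpack_from(fmt32, raw, i * 4)[0] for i in range(len(irq_pending))]


def deposit32(value, start, length, fieldval):
    """Replace `length` bits of a 32-bit value, starting at bit `start`."""
    mask = ((1 << length) - 1) << start
    return ((value & ~mask) | ((fieldval << start) & mask)) & 0xFFFFFFFF


def set_irq_fixed(irq_pending, irq):
    """Host-independent update that works on the uint32 element directly."""
    words = list(irq_pending)
    words[irq // 32] = deposit32(words[irq // 32], irq % 32, 1, 1)
    return words


empty = [0, 0, 0, 0]
le64 = set_bit_via_long(empty, 5, "<", 8)  # bit lands in element 0
be32 = set_bit_via_long(empty, 5, ">", 4)  # bit lands in element 0
be64 = set_bit_via_long(empty, 5, ">", 8)  # bit lands in element 1
fixed = set_irq_fixed(empty, 5)            # element 0 on any host
```

In the 64-bit big-endian case the low 32 bits of the overlaid 'long' occupy the following array element, so the pending bit is recorded for the wrong interrupt bank.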
  10. target/arm: Define and use new load_cpu_field_low32()

    In several places in the 32-bit Arm translate.c, we try to use
    load_cpu_field() to load from a CPUARMState field into a TCGv_i32
    where the field is actually 64-bit. This works on little-endian
    hosts, but gives the wrong half of the register on big-endian.
    
    Add a new load_cpu_field_low32() which loads the low 32 bits
    of a 64-bit field into a TCGv_i32. The new macro includes a
    compile-time check against accidentally using it on a field
    of the wrong size. Use it to fix the two places in the code
    where we were using load_cpu_field() on a 64-bit field.
    
    This fixes a bug where on big-endian hosts the guest would
    crash after executing an ERET instruction, and a more corner
    case one where some UNDEFs for attempted accesses to MSR
    banked registers from Secure EL1 might go to the wrong EL.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-id: 20230424153909.1419369-2-peter.maydell@linaro.org
    (cherry picked from commit 7f3a3d3)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    e4e79c8
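The "wrong half of the register" effect from the load_cpu_field_low32() commit above comes down to which bytes a 32-bit load at the start of a 64-bit field reads. The Python sketch below models that with the `struct` module; the function names are hypothetical and the real macro works on TCG values and struct offsets rather than byte strings.

```python
import struct


def load32_at_field_start(value64, byte_order):
    """Read 4 bytes at the start of a 64-bit field, as a plain 32-bit load would."""
    raw = struct.pack(byte_order + "Q", value64)
    return struct.unpack_from(byte_order + "I", raw, 0)[0]


def load_low32(value64, byte_order):
    """Read the low half: offset 0 on little-endian, offset 4 on big-endian."""
    raw = struct.pack(byte_order + "Q", value64)
    offset = 4 if byte_order == ">" else 0
    return struct.unpack_from(byte_order + "I", raw, offset)[0]


field = 0x11223344_AABBCCDD
naive_le = load32_at_field_start(field, "<")  # low half, by luck of layout
naive_be = load32_at_field_start(field, ">")  # high half
low_le = load_low32(field, "<")
low_be = load_low32(field, ">")
```

Adjusting the offset by the host byte order gives the same low 32 bits on both kinds of host.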
  11. hw/sd/allwinner-sdhost: Correctly byteswap descriptor fields

    In allwinner_sdhost_process_desc() we just read directly from
    guest memory into a host TransferDescriptor struct and back.
    This only works on little-endian hosts. Abstract the reading
    and writing of descriptors into functions that handle the
    byte-swapping so that TransferDescriptor structs as seen by
    the rest of the code are always in host-order.
    
    This fixes a failure of one of the avocado tests on s390.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-id: 20230424165053.1428857-2-peter.maydell@linaro.org
    (cherry picked from commit 3e20d90)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    4b02ac7
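The idea of funnelling descriptor access through helpers that do the byte-swapping, as in the allwinner-sdhost commit above, can be sketched in Python. The four-field layout and the field names below are assumptions made for illustration, as are the sample bytes; the point is only that decoding with an explicit little-endian format yields the same values on any host, while a raw copy on a big-endian host does not.

```python
import struct

DESC_FORMAT = "<4I"  # four 32-bit words, little-endian in guest memory (assumed layout)


def read_desc(guest_bytes):
    """Decode a descriptor from guest memory into host-order integers."""
    status, size, addr, nxt = struct.unpack(DESC_FORMAT, guest_bytes)
    return {"status": status, "size": size, "addr": addr, "next": nxt}


def write_desc(desc):
    """Encode host-order integers back into the guest's little-endian layout."""
    return struct.pack(DESC_FORMAT, desc["status"], desc["size"],
                       desc["addr"], desc["next"])


guest = bytes.fromhex("00000080" "00020000" "00100040" "10000040")
desc = read_desc(guest)

# What a raw copy into a struct would look like on a big-endian host.
misread = struct.unpack(">4I", guest)
```

With the helpers in place, the rest of the device model only ever sees host-order values and never needs to know the guest layout.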
  12. hw/net/allwinner-sun8i-emac: Correctly byteswap descriptor fields

    In allwinner-sun8i-emac we just read directly from guest memory into
    a host FrameDescriptor struct and back.  This only works on
    little-endian hosts.  Reading and writing of descriptors is already
    abstracted into functions; make those functions also handle the
    byte-swapping so that FrameDescriptor structs as seen by the rest
    of the code are always in host-order, and fix two places that were
    doing ad-hoc descriptor reading without using the functions.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-id: 20230424165053.1428857-3-peter.maydell@linaro.org
    (cherry picked from commit a4ae17e)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    ec979ff
  13. softfloat: Fix the incorrect computation in float32_exp2

    The float32_exp2 function computes the wrong power of 2.
    
    For example, with the following set of values {0.1, 2.0, 2.0, -1.0},
    the expected output would be {1.071773, 4.000000, 4.000000, 0.500000}.
    Instead, the function is computing {1.119102, 3.382044, 3.382044, -0.191022}
    
    Looking at the code, the float32_exp2() attempts to do this
    
                      2     3     4     5           n
      x        x     x     x     x     x           x
     e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
               1!    2!    3!    4!    5!          n!
    
    But because of the typo it ends up doing
    
      x        x     x     x     x     x           x
     e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
               1!    2!    3!    4!    5!          n!
    
    This is because instead of the xnp which holds the numerator, parts_muladd
    is using the xp which is just 'x'.  Commit '572c4d862ff2' refactored this
    function, and mistakenly used xp instead of xnp.
    
    Cc: qemu-stable@nongnu.org
    Fixes: 572c4d8 "softfloat: Convert float32_exp2 to FloatParts"
    Partially-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1623
    Reported-By: Luca Barbato (https://gitlab.com/lu-zero)
    Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
    Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
    Message-Id: <168304110865.537992.13059030916325018670.stgit@localhost.localdomain>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    (cherry picked from commit 1098cc3)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    ShivaprasadGBhat authored and Michael Tokarev committed May 18, 2023
    9b300a1
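The effect of the typo described in the softfloat commit above can be reproduced with ordinary floating-point arithmetic. The Python sketch below is not the softfloat code: it uses native floats, and the term count `N_TERMS` is an assumption chosen for illustration. It computes 2**x as e**(x * ln 2) with a truncated series, once with the running numerator x**n and once with plain x in every term.

```python
import math

N_TERMS = 15  # assumed number of series terms, for illustration only


def exp2_taylor(x, buggy=False):
    """Approximate 2**x as e**(x * ln 2) using a truncated Taylor series."""
    x = x * math.log(2.0)
    xn = x   # running numerator, x**n
    r = 1.0
    for n in range(1, N_TERMS + 1):
        # The buggy variant uses x itself instead of the accumulated power.
        numerator = x if buggy else xn
        r += numerator / math.factorial(n)
        xn *= x
    return r


inputs = [0.1, 2.0, 2.0, -1.0]
good = [exp2_taylor(v) for v in inputs]
bad = [exp2_taylor(v, buggy=True) for v in inputs]
```

With these assumptions the buggy variant collapses to 1 + x * ln 2 * (1/1! + 1/2! + ...), which is linear in x and matches the wrong values quoted in the commit message to the printed precision.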
  14. meson: leave unnecessary modules out of the build

    meson.build files choose whether to build modules based on foo.found()
    expressions.  If a feature is enabled (e.g. --enable-gtk), these expressions
    are true even if the code is not used by any emulator, and this results
    in an unexpected difference between modular and non-modular builds.
    
    For non-modular builds, the files are not included in any binary, and
    therefore the source files are never processed.  For modular builds,
    however, all .so files are unconditionally built by default, and therefore
    a normal "make" tries to build them.  However, the corresponding trace-*.h
    files are absent due to this conditional:
    
    if have_system
      trace_events_subdirs += [
        ...
        'ui',
        ...
      ]
    endif
    
    which was added to avoid wasting time running tracetool on unused trace-events
    files.  This causes a compilation failure; fix it by skipping module builds
    entirely if (depending on the module directory) have_block or have_system
    are false.
    
    Reported-by: Michael Tokarev <mjt@tls.msk.ru>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    (cherry picked from commit ef70986)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    bonzini authored and Michael Tokarev committed May 18, 2023
    e3074f6
  15. block: Fix use after free in blockdev_mark_auto_del()

    job_cancel_locked() drops the job list lock temporarily and it may call
    aio_poll(). We must assume that the list has changed after this call.
    Also, with unlucky timing, it can end up freeing the job during
    job_completed_txn_abort_locked(), making the job pointer invalid, too.
    
    For both reasons, we can't just continue at block_job_next_locked(job).
    Instead, start at the head of the list again after job_cancel_locked()
    and skip those jobs that we already cancelled (or that are completing
    anyway).
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Message-Id: <20230503140142.474404-1-kwolf@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    (cherry picked from commit e262687)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Kevin Wolf authored and Michael Tokarev committed May 18, 2023
    89640e0
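The "start at the head of the list again" pattern from the blockdev_mark_auto_del() commit above can be sketched generically. This Python model is hypothetical throughout: `Job`, `cancel_all` and the `cancel` callback are stand-ins, and the callback's side effect imitates a cancel call that changes the list behind the caller's back.

```python
class Job:
    def __init__(self, name):
        self.name = name
        self.cancelled = False


def cancel_all(jobs, cancel):
    """Cancel every job, restarting from the head after each cancel.

    `cancel(job)` may add or remove entries in `jobs`, so no iterator or
    "next" pointer is kept across the call.
    """
    order = []
    while True:
        # Rescan from the head, skipping jobs that were already handled.
        job = next((j for j in jobs if not j.cancelled), None)
        if job is None:
            return order
        job.cancelled = True
        order.append(job.name)
        cancel(job)


jobs = [Job("a"), Job("b"), Job("c")]


def cancel(job):
    # Side effect standing in for the list changing during the call:
    # cancelling "a" also makes "b" disappear from the list.
    if job.name == "a":
        jobs[:] = [j for j in jobs if j.name != "b"]


order = cancel_all(jobs, cancel)
```

The rescan costs more than following a saved "next" pointer, but it never touches an entry that may have been removed or freed during the callback.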
  16. target/riscv: Fix itrigger when icount is used

    When I boot an Ubuntu image, QEMU outputs a "Bad icount read" message and exits.
    The reason is that executing helper_mret or helper_sret causes
    a call to icount_get_raw_locked(), which needs the can_do_io flag
    to be set on the CPU state.
    
    Thus we set this flag when executing these two instructions.
    
    Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
    Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>
    Acked-by: Alistair Francis <alistair.francis@wdc.com>
    Message-Id: <20230324064011.976-1-zhiwei_liu@linux.alibaba.com>
    Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
    (cherry picked from commit df3ac6d)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    romanheros authored and Michael Tokarev committed May 18, 2023
    666e6bb
  17. accel/tcg: Fix atomic_mmu_lookup for reads

    A copy-paste bug had us looking at the victim cache for writes.
    
    Cc: qemu-stable@nongnu.org
    Reported-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    Fixes: 08dff43 ("tcg: Probe the proper permissions for atomic ops")
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Message-Id: <20230505204049.352469-1-richard.henderson@linaro.org>
    (cherry picked from commit 8c31325)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    rth7680 authored and Michael Tokarev committed May 18, 2023
    0e262ee
  18. ui: Fix pixel colour channel order for PNG screenshots

    When we take a PNG screenshot the ordering of the colour channels in
    the data is not correct, resulting in the image having weird
    colouring compared to the actual display.  (Specifically, on a
    little-endian host the blue and red channels are swapped; on
    big-endian everything is wrong.)
    
    This happens because the pixman idea of the pixel data and the libpng
    idea differ.  PIXMAN_a8r8g8b8 defines that pixels are 32-bit values,
    with A in bits 24-31, R in bits 16-23, G in bits 8-15 and B in bits
    0-7.  This means that on little-endian systems the bytes in memory
    are
       B G R A
    and on big-endian systems they are
       A R G B
    
    libpng, on the other hand, thinks of pixels as being a series of
    values for each channel, so its format PNG_COLOR_TYPE_RGB_ALPHA
    always wants bytes in the order
       R G B A
    
    This isn't the same as the pixman order for either big or little
    endian hosts.
    
    The alpha channel is also unnecessary bulk in the output PNG file,
    because there is no alpha information in a screenshot.
    
    To handle the endianness issue, we already define in ui/qemu-pixman.h
    various PIXMAN_BE_* and PIXMAN_LE_* values that give consistent
    byte-order pixel channel formats.  So we can use PIXMAN_BE_r8g8b8 and
    PNG_COLOR_TYPE_RGB, which both have an in-memory byte order of
        R G B
    and 3 bytes per pixel.
    
    (PPM format screenshots get this right; they already use the
    PIXMAN_BE_r8g8b8 format.)
    
    Cc: qemu-stable@nongnu.org
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1622
    Fixes: 9a0a119 ("Added parameter to take screenshot with screendump as PNG")
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-id: 20230502135548.2451309-1-peter.maydell@linaro.org
    (cherry picked from commit cd22a0f)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    pm215 authored and Michael Tokarev committed May 18, 2023
    379a05f
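The byte orders listed in the PNG screenshot commit above can be verified with a short Python model. It is only an illustration: the two helper functions are hypothetical, and the manual reordering shown here stands in for the pixel format conversion, which the commit performs by choosing a different pixman format rather than by shuffling bytes by hand.

```python
import struct


def a8r8g8b8_bytes(a, r, g, b, byte_order):
    """In-memory bytes of one 32-bit a8r8g8b8 pixel for the given host byte order."""
    pixel = (a << 24) | (r << 16) | (g << 8) | b
    return struct.pack(byte_order + "I", pixel)


def to_rgb_bytes(pixel_bytes, byte_order):
    """Reorder one 32-bit pixel into the 3-byte R G B sequence, dropping alpha."""
    (pixel,) = struct.unpack(byte_order + "I", pixel_bytes)
    return bytes([(pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF])


le = a8r8g8b8_bytes(0xFF, 0x11, 0x22, 0x33, "<")  # B G R A in memory
be = a8r8g8b8_bytes(0xFF, 0x11, 0x22, 0x33, ">")  # A R G B in memory

rgb_from_le = to_rgb_bytes(le, "<")
rgb_from_be = to_rgb_bytes(be, ">")
```

Neither in-memory layout matches R G B A, which is why passing the raw buffer to an encoder that expects per-channel byte order gives wrong colours on both kinds of host.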
  19. async: Suppress GCC13 false positive in aio_bh_poll()

    GCC13 reports an error:
    
    ../util/async.c: In function ‘aio_bh_poll’:
    include/qemu/queue.h:303:22: error: storing the address of local variable ‘slice’ in ‘*ctx.bh_slice_list.sqh_last’ [-Werror=dangling-pointer=]
      303 |     (head)->sqh_last = &(elm)->field.sqe_next;                          \
          |     ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
    ../util/async.c:169:5: note: in expansion of macro ‘QSIMPLEQ_INSERT_TAIL’
      169 |     QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next);
          |     ^~~~~~~~~~~~~~~~~~~~
    ../util/async.c:161:17: note: ‘slice’ declared here
      161 |     BHListSlice slice;
          |                 ^~~~~
    ../util/async.c:161:17: note: ‘ctx’ declared here
    
    But the local variable 'slice' is removed from the global context list
    in the following loop of the same routine. Add a pragma to silence GCC.
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Daniel P. Berrangé <berrange@redhat.com>
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
    Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
    Message-Id: <20230420202939.1982044-1-clg@kaod.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    (cherry picked from commit d66ba6d)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    (Mjt: cherry-picked to stable-7.2 to eliminate CI failures on win*)
    legoater authored and Michael Tokarev committed May 18, 2023
    c94d55f
  20. tcg: ppc64: Fix mask generation for vextractdm

    In function do_extractm() the mask is calculated as
    dup_const(1 << (element_width - 1)). '1' being a signed int
    works fine for MO_8,16,32. For MO_64, on a PPC64 host
    this ends up becoming 0 on compilation. The vextractdm
    uses MO_64, and it ends up having a mask of 0.
    
    Explicitly use 1ULL instead of signed int 1, as is
    done everywhere else.
    
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1536
    Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Message-Id: <168319292809.1159309.5817546227121323288.stgit@ltc-boston1.aus.stglabs.ibm.com>
    Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
    (cherry picked from commit 6a5d81b)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    ShivaprasadGBhat authored and Michael Tokarev committed May 18, 2023
    afc11df
  21. hw/virtio/vhost-user: avoid using unitialized errp

    During protocol negotiation, when the QEMU
    stub does not support a backend with F_CONFIG,
    it throws a warning and suppresses the
    VHOST_USER_PROTOCOL_F_CONFIG bit.
    
    However, the warning uses the warn_reportf_err macro
    and passes an uninitialized errp pointer. The macro
    then tries to edit the 'msg' member of the
    uninitialized Error and segfaults.
    
    Instead, just use warn_report, which prints a
    warning message directly to the output.
    
    Fixes: 5653493 ("hw/virtio/vhost-user: don't suppress F_CONFIG when supported")
    Signed-off-by: Albert Esteve <aesteve@redhat.com>
    Message-Id: <20230302121719.9390-1-aesteve@redhat.com>
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit 90e3123)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    aesteve-rh authored and Michael Tokarev committed May 18, 2023
    a641521
  22. virtio: fix reachable assertion due to stale value of cached region size

    In virtqueue_{split,packed}_get_avail_bytes() descriptors are read
    in a loop via MemoryRegionCache regions and calls to
    vring_{split,packed}_desc_read() - these take a region cache and the
    index of the descriptor to be read.
    
    For direct descriptors we use a cache provided by the caller, whose
    size matches that of the virtqueue vring. We limit the number of
    descriptors we can read by the size of that vring:
    
        max = vq->vring.num;
        ...
        MemoryRegionCache *desc_cache = &caches->desc;
    
    For indirect descriptors, we initialize a new cache and limit the
    number of descriptors by the size of the intermediate descriptor:
    
        len = address_space_cache_init(&indirect_desc_cache,
                                       vdev->dma_as,
                                       desc.addr, desc.len, false);
        desc_cache = &indirect_desc_cache;
        ...
        max = desc.len / sizeof(VRingDesc);
    
    However, the first initialization of `max` is done outside the loop
    where we process guest descriptors, while the second one is done
    inside. This means that a sequence of an indirect descriptor followed
    by a direct one will leave a stale value in `max`. If the second
    descriptor's `next` field is smaller than the stale value, but
    greater than the size of the virtqueue ring (and thus the cached
    region), a failed assertion will be triggered in
    address_space_read_cached() down the call chain.
    
    Fix this by initializing `max` inside the loop in both functions.
    
    Fixes: 9796d0a ("virtio: use address_space_map/unmap to access descriptors")
    Signed-off-by: Carlos López <clopez@suse.de>
    Message-Id: <20230302100358.3613-1-clopez@suse.de>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit bbc1c32)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    00xc authored and Michael Tokarev committed May 18, 2023
    2a0afe1
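The stale `max` value described in the virtio commit above is a loop-initialization problem that can be shown in isolation. The Python sketch below is a simplified model, not the virtqueue code: `limits_used`, the dictionary-based descriptors and the ring size of 256 are assumptions, and the function only records which limit is in effect for each head descriptor instead of walking descriptor chains.

```python
DESC_SIZE = 16  # size of one descriptor in bytes (stands in for sizeof(VRingDesc))


def limits_used(heads, ring_size, init_inside_loop):
    """Return the `max` value in effect while each head descriptor is processed."""
    used = []
    max_descs = ring_size              # initialization outside the loop
    for desc in heads:
        if init_inside_loop:
            max_descs = ring_size      # the fix: reset for every head
        if desc["indirect"]:
            # Indirect tables are limited by the length of the table itself.
            max_descs = desc["len"] // DESC_SIZE
        used.append(max_descs)
    return used


heads = [
    {"indirect": True, "len": 1024 * DESC_SIZE},  # indirect table with 1024 entries
    {"indirect": False, "len": 64},               # direct descriptor
]

stale = limits_used(heads, ring_size=256, init_inside_loop=False)
fixed = limits_used(heads, ring_size=256, init_inside_loop=True)
```

In the stale case the direct descriptor is checked against 1024 instead of 256, so an index between those two values would pass the check and then fall outside the cached ring region.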
  23. block/monitor: Fix crash when executing HMP commit

    hmp_commit() calls blk_is_available() from a non-coroutine context (and
    in the main loop). blk_is_available() is a co_wrapper_mixed_bdrv_rdlock
    function, and in the non-coroutine context it calls AIO_WAIT_WHILE(),
    which crashes if the aio_context lock is not taken before.
    
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1615
    Signed-off-by: Wang Liang <wangliangzz@inspur.com>
    Message-Id: <20230424103902.45265-1-wangliangzz@126.com>
    Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Reviewed-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    (cherry picked from commit 8c1e8fb)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Wang Liang authored and Michael Tokarev committed May 18, 2023
    b7b814c
  24. target/s390x: Fix EXECUTE of relative branches

    Fix a problem similar to the one fixed by commit 703d03a
    ("target/s390x: Fix EXECUTE of relative long instructions"), but now
    for relative branches.
    
    Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20230426235813.198183-2-iii@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
    (cherry picked from commit e8ecdfe)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    iii-i authored and Michael Tokarev committed May 18, 2023
    Commit 6b71859
  25. s390x/tcg: Fix LDER instruction format

    It's RRE, not RXE.
    
    Found by running valgrind's none/tests/s390x/bfp-2.
    
    Fixes: 86b5962 ("s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP")
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Message-Id: <20230511134726.469651-1-iii@linux.ibm.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
    (cherry picked from commit 970641d)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    (Mjt: context tweak)
    iii-i authored and Michael Tokarev committed May 18, 2023
    Commit 00acdd8
  26. 9pfs/xen: Fix segfault on shutdown

    xen_9pfs_free can't use gnttabdev since it is already closed and NULL-ed
    out when free is called.  Do the teardown in _disconnect().  This
    matches the setup done in _connect().
    
    trace-events are also added for the XenDevOps functions.
    
    Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    Message-Id: <20230502143722.15613-1-jandryuk@gmail.com>
    [C.S.: - Remove redundant return in xen_9pfs_free().
           - Add comment to trace-events. ]
    Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
    (cherry picked from commit 92e667f)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    (Mjt: minor context conflict in hw/9pfs/xen-9p-backend.c)
    jandryuk authored and Michael Tokarev committed May 18, 2023
    Commit de6596a
  27. xen/pt: reserve PCI slot 2 for Intel igd-passthru

    Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
    as noted in docs/igd-assign.txt in the Qemu source code.
    
    Currently, when the xl toolstack is used to configure a Xen HVM guest with
    Intel IGD passthrough to the guest with the Qemu upstream device model,
    a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
    a different slot. This problem often prevents the guest from booting.
    
    The only available workarounds are not good: Configure Xen HVM guests to
    use the old and no longer maintained Qemu traditional device model
    available from xenbits.xen.org which does reserve slot 2 for the Intel
    IGD or use the "pc" machine type instead of the "xenfv" machine type and
    add the xen platform device at slot 3 using a command line option
    instead of patching qemu to fix the "xenfv" machine type directly. The
    second workaround causes some degradation in startup performance such as
    a longer boot time and reduced resolution of the grub menu that is
    displayed on the monitor. This patch avoids that reduced startup
    performance when using the Qemu upstream device model for Xen HVM guests
    configured with the igd-passthru=on option.
    
    To implement this feature in the Qemu upstream device model for Xen HVM
    guests, introduce the following new functions, types, and macros:
    
    * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
    * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
    * typedef XenPTQdevRealize function pointer
    * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
    * xen_igd_reserve_slot and xen_igd_clear_slot functions
    
    Michael Tsirkin:
    * Introduce XEN_PCI_IGD_DOMAIN, XEN_PCI_IGD_BUS, XEN_PCI_IGD_DEV, and
      XEN_PCI_IGD_FN - use them to compute the value of XEN_PCI_IGD_SLOT_MASK
    
    The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
    member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
    the xl toolstack with the gfx_passthru option enabled, which sets the
    igd-passthru=on option to Qemu for the Xen HVM machine type.
    
    The new xen_igd_reserve_slot function also needs to be implemented in
    hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
    when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
    in which case it does nothing.
    
    The new xen_igd_clear_slot function overrides qdev->realize of the parent
    PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
    since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
    created in hw/i386/pc_piix.c for the case when igd-passthru=on.
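    
    The reserve/clear idea described above can be sketched with a toy bus
    model. This is only an illustration under stated assumptions, not the
    patch itself: ToyBus, TOY_IGD_DEV, TOY_IGD_SLOT_MASK and the toy_*
    functions are hypothetical stand-ins for PCIBus, XEN_PCI_IGD_DEV,
    XEN_PCI_IGD_SLOT_MASK, xen_igd_reserve_slot and xen_igd_clear_slot, and
    the mask is assumed to be a plain one-bit-per-slot bitmap.

```c
#include <stdint.h>

/* Assumed: slot 2, as stated in the commit message. */
#define TOY_IGD_DEV        2
/* Assumed layout: one bit per slot in slot_reserved_mask. */
#define TOY_IGD_SLOT_MASK  (UINT32_C(1) << TOY_IGD_DEV)

/* Hypothetical stand-in for the one PCIBus member that matters here. */
typedef struct {
    uint32_t slot_reserved_mask;
} ToyBus;

/* Called when the bus is created with igd-passthru=on: keep slot 2 free. */
void toy_igd_reserve_slot(ToyBus *bus)
{
    bus->slot_reserved_mask |= TOY_IGD_SLOT_MASK;
}

/* Called just before the IGD itself is realized: release slot 2 for it. */
void toy_igd_clear_slot(ToyBus *bus)
{
    bus->slot_reserved_mask &= ~TOY_IGD_SLOT_MASK;
}

/* Pick the lowest slot that is neither reserved nor already used. */
int toy_pick_free_slot(const ToyBus *bus, uint32_t used_mask)
{
    for (unsigned slot = 0; slot < 32; slot++) {
        uint32_t bit = UINT32_C(1) << slot;
        if (!(bus->slot_reserved_mask & bit) && !(used_mask & bit)) {
            return (int)slot;
        }
    }
    return -1;
}
```

    With slots 0 and 1 in use, an emulated device added while the
    reservation is active lands in slot 3; once the reservation is cleared,
    slot 2 becomes available again.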
    
    Move the call to xen_host_pci_device_get, and the associated error
    handling, from xen_pt_realize to the new xen_igd_clear_slot function to
    initialize the device class and vendor values which enables the checks for
    the Intel IGD to succeed. The verification that the host device is an
    Intel IGD to be passed through is done by checking the domain, bus, slot,
    and function values as well as by checking that gfx_passthru is enabled,
    the device class is VGA, and the device vendor is Intel.
    
    Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    Message-Id: <b1b4a21fe9a600b1322742dda55a40e9961daa57.1674346505.git.brchuckz@aol.com>
    Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
    (cherry picked from commit 4f67543)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    zmudc authored and Michael Tokarev committed May 18, 2023
    Commit 6bebd3f
  28. Revert "vhost-user: Monitor slave channel in vhost_user_read()"

    This reverts commit db8a377.
    
    Motivation : this is breaking vhost-user with DPDK as reported in [0].
    
    Received unexpected msg type. Expected 22 received 40
    Fail to update device iotlb
    Received unexpected msg type. Expected 40 received 22
    Received unexpected msg type. Expected 22 received 11
    Fail to update device iotlb
    Received unexpected msg type. Expected 11 received 22
    vhost VQ 1 ring restore failed: -71: Protocol error (71)
    Received unexpected msg type. Expected 22 received 11
    Fail to update device iotlb
    Received unexpected msg type. Expected 11 received 22
    vhost VQ 0 ring restore failed: -71: Protocol error (71)
    unable to start vhost net: 71: falling back on userspace virtio
    
    The failing sequence that leads to the first error is:
    - QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master
      socket
    - QEMU starts a nested event loop in order to wait for the
      VHOST_USER_GET_STATUS response and to be able to process messages from
      the slave channel
    - DPDK sends a couple of legitimate IOTLB miss messages on the slave
      channel
    - QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22)
      updates on the master socket
    - QEMU expects to receive a response for the latest VHOST_USER_IOTLB_MSG
      but it gets the response for the VHOST_USER_GET_STATUS instead
    
    The subsequent errors have the same root cause: the nested event loop
    breaks the order by design. It leads QEMU to expect the response to the
    latest message sent on the master socket to arrive first.
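    
    The ordering problem can be reproduced with a toy model. The sketch
    below is not QEMU code: ToySocket and the toy_* functions are
    hypothetical, and the only assumption is that replies on the master
    socket come back in the order the requests were sent. The message
    numbers 22 and 40 are the ones from the log above.

```c
#include <stddef.h>

/* Message type numbers as they appear in the error log. */
enum { TOY_IOTLB_MSG = 22, TOY_GET_STATUS = 40 };

/* Hypothetical FIFO standing in for the master socket. */
typedef struct {
    int pending[8];
    size_t head, tail;
} ToySocket;

static void toy_send(ToySocket *s, int type)
{
    s->pending[s->tail++] = type;
}

/* Replies arrive strictly in send order. */
static int toy_recv_reply(ToySocket *s)
{
    return s->pending[s->head++];
}

/* Returns 1 on "Received unexpected msg type", 0 otherwise. */
static int toy_wait_reply(ToySocket *s, int expected)
{
    return toy_recv_reply(s) != expected;
}

/* Nested loop: a second request is sent before the first reply is read. */
int toy_replay_nested(void)
{
    ToySocket s = {0};
    int errors = 0;

    toy_send(&s, TOY_GET_STATUS);                 /* outer request */
    toy_send(&s, TOY_IOTLB_MSG);                  /* sent from nested loop */
    errors += toy_wait_reply(&s, TOY_IOTLB_MSG);  /* expected 22, got 40 */
    errors += toy_wait_reply(&s, TOY_GET_STATUS); /* expected 40, got 22 */
    return errors;
}

/* Serialized: each reply is read before the next request is sent. */
int toy_replay_serialized(void)
{
    ToySocket s = {0};
    int errors = 0;

    toy_send(&s, TOY_GET_STATUS);
    errors += toy_wait_reply(&s, TOY_GET_STATUS);
    toy_send(&s, TOY_IOTLB_MSG);
    errors += toy_wait_reply(&s, TOY_IOTLB_MSG);
    return errors;
}
```

    In this model the nested variant produces two mismatches that mirror
    the first and third log lines, while the serialized variant produces
    none.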
    
    Since this was only needed for DAX enablement, which is still not merged
    upstream, just drop the code for now. A working solution will have to
    be merged later on. It will likely protect the master socket with a mutex
    and service the slave channel with a separate thread, as discussed with
    Maxime in the mail thread below.
    
    [0] https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de395@redhat.com/
    
    Reported-by: Yanghang Liu <yanghliu@redhat.com>
    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173
    Signed-off-by: Greg Kurz <groug@kaod.org>
    Message-Id: <20230119172424.478268-2-groug@kaod.org>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
    Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    (cherry picked from commit f340a59)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    gkurz authored and Michael Tokarev committed May 18, 2023
    Commit 7620c12
  29. Revert "vhost-user: Introduce nested event loop in vhost_user_read()"

    This reverts commit a7f523c.
    
    The nested event loop is broken by design. Its only user was removed.
    Drop the code as well so that nobody ever tries to use it again.
    
    I had to fix a couple of trivial conflicts around return values because
    of 025faa8 ("vhost-user: stick to -errno error return convention").
    
    Signed-off-by: Greg Kurz <groug@kaod.org>
    Message-Id: <20230119172424.478268-3-groug@kaod.org>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    (cherry picked from commit 4382138)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    gkurz authored and Michael Tokarev committed May 18, 2023
    Commit 0c6e954
  30. target/ppc: Fix helper_pminsn() prototype

    GCC13 reports an error:
    
    ../target/ppc/excp_helper.c:2625:6: error: conflicting types for ‘helper_pminsn’ due to enum/integer mismatch; have ‘void(CPUPPCState *, powerpc_pm_insn_t)’ {aka ‘void(struct CPUArchState *, powerpc_pm_insn_t)’} [-Werror=enum-int-mismatch]
     2625 | void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn)
          |      ^~~~~~~~~~~~~
    In file included from /home/legoater/work/qemu/qemu.git/include/qemu/osdep.h:49,
                     from ../target/ppc/excp_helper.c:19:
    /home/legoater/work/qemu/qemu.git/include/exec/helper-head.h:23:27: note: previous declaration of ‘helper_pminsn’ with type ‘void(CPUArchState *, uint32_t)’ {aka ‘void(CPUArchState *, unsigned int)’}
       23 | #define HELPER(name) glue(helper_, name)
          |                           ^~~~~~~
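    
    The class of error reported above can be shown in isolation. In this
    minimal sketch, toy_pm_insn_t and toy_helper_pminsn are hypothetical
    names, and the approach shown (keeping the integer type in the
    definition and converting to the enum inside the body) is one way to
    make a declaration and its definition agree, not necessarily the exact
    change made by this commit.

```c
#include <stdint.h>

/* Hypothetical enum standing in for powerpc_pm_insn_t. */
typedef enum {
    TOY_PM_DOZE,
    TOY_PM_NAP,
    TOY_PM_SLEEP
} toy_pm_insn_t;

/*
 * Declaration as a helper-generating macro would emit it: the parameter
 * is a plain 32-bit integer, not the enum.
 */
int toy_helper_pminsn(uint32_t insn);

/*
 * The definition uses the same integer type as the declaration, so GCC 13
 * has nothing to flag under -Wenum-int-mismatch. The enum is only used
 * inside the function body.
 */
int toy_helper_pminsn(uint32_t insn)
{
    toy_pm_insn_t pm = (toy_pm_insn_t)insn;

    switch (pm) {
    case TOY_PM_DOZE:
        return 1;
    case TOY_PM_NAP:
        return 2;
    case TOY_PM_SLEEP:
        return 3;
    default:
        return -1;   /* unknown power-management instruction */
    }
}
```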
    
    Fixes: 7778a57 ("ppc: Add P7/P8 Power Management instructions")
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
    Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
    Message-Id: <20230321161609.716474-4-clg@kaod.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
    (cherry picked from commit 07e4804)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    legoater authored and Michael Tokarev committed May 18, 2023
    Commit 273147b
  31. tests/docker: bump the xtensa base to debian:11-slim

    Stretch is going out of support so things like security updates will
    fail. As the toolchain itself is binary it hopefully won't mind the
    underlying OS being updated.
    
    Message-Id: <20230503091244.1450613-3-alex.bennee@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Reported-by: Richard Henderson <richard.henderson@linaro.org>
    (cherry picked from commit 3217b84)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    stsquad authored and Michael Tokarev committed May 18, 2023
    Commit e7f1150
  32. linux-user: Fix mips fp64 executables loading

    If a program requires fr1, we should set the FR bit of the CP0 control
    status register and add the F64 hardware flag. The corresponding
    `else if` branch statement is copied from the Linux kernel sources (see
    the `arch_check_elf` function in linux/arch/mips/kernel/elf.c).
    
    Signed-off-by: Daniil Kovalev <dkovalev@compiler-toolchain-for.me>
    Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Message-Id: <20230404052153.16617-1-dkovalev@compiler-toolchain-for.me>
    Signed-off-by: Laurent Vivier <laurent@vivier.eu>
    (cherry picked from commit a0f8d27)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Daniil Kovalev authored and Michael Tokarev committed May 18, 2023
    Commit 95cb7a7
  33. linux-user: fix getgroups/setgroups allocations

    linux-user getgroups(), setgroups(), getgroups32() and setgroups32()
    used alloca() to allocate grouplist arrays, with an unchecked gidsetsize
    coming from the "guest".  With NGROUPS_MAX being 65536 on Linux (and it
    is common for an application to allocate NGROUPS_MAX entries for
    getgroups()), a typical allocation is about half a megabyte on the
    stack.  This overflows the stack and leads to an immediate SIGSEGV in
    the actual system getgroups() implementation.
    
    An example of such an issue is aptitude, e.g.
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=811087#72
    
    Cap gidsetsize to NGROUPS_MAX (return EINVAL if it is larger than that),
    and use heap allocation for grouplist instead of alloca().  While at it,
    fix coding style and make all 4 implementations identical.
    
    Try to not impose random limits - for example, allow gidsetsize to be
    negative for getgroups() - just do not allocate negative-sized grouplist
    in this case but still do actual getgroups() call.  But do not allow
    negative gidsetsize for setgroups() since its argument is unsigned.
    
    Capping by NGROUPS_MAX seems a bit arbitrary, - we can do more, it is
    not an error if set size will be NGROUPS_MAX+1. But we should not allow
    integer overflow for the array being allocated. Maybe it is enough to
    just call g_try_new() and return ENOMEM if it fails.
    
    Maybe there's also no need to convert setgroups() since this one is
    usually smaller and known beforehand (KERN_NGROUPS_MAX is actually 63, -
    this is apparently a kernel-imposed limit for runtime group set).
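    
    The validation and allocation policy described above can be sketched
    as follows. This is an illustration only: toy_alloc_grouplist and
    TOY_NGROUPS_MAX are hypothetical names, plain calloc() stands in for
    whatever heap allocator the patch uses, and the list element type is
    assumed to be unsigned int.

```c
#include <errno.h>
#include <stdlib.h>

/* Value quoted in the commit message for Linux. */
#define TOY_NGROUPS_MAX 65536

/*
 * Validate a guest-supplied count and heap-allocate the group list.
 * Returns 0 on success, or a negative errno value on failure.
 * On success *out is NULL when there is nothing to allocate.
 */
int toy_alloc_grouplist(int gidsetsize, unsigned int **out)
{
    *out = NULL;

    if (gidsetsize > TOY_NGROUPS_MAX) {
        return -EINVAL;          /* cap instead of trusting the guest */
    }
    if (gidsetsize <= 0) {
        return 0;                /* no buffer; the caller still makes the call */
    }

    /* Heap allocation: a large count can no longer overflow the stack. */
    *out = calloc((size_t)gidsetsize, sizeof(**out));
    return *out ? 0 : -ENOMEM;
}
```

    A caller would free() the list after converting it for the guest. A
    setgroups() wrapper would additionally reject negative counts, as the
    message notes.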
    
    The patch fixes aptitude segfault mentioned above.
    
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Message-Id: <20230409105327.1273372-1-mjt@msgid.tls.msk.ru>
    Signed-off-by: Laurent Vivier <laurent@vivier.eu>
    (cherry picked from commit 1e35d32)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Michael Tokarev committed May 18, 2023
    Commit 89bf901
  34. migration: Handle block device inactivation failures better

    Consider what happens when performing a migration between two host
    machines connected to an NFS server serving multiple block devices to
    the guest, when the NFS server becomes unavailable.  The migration
    attempts to inactivate all block devices on the source (a necessary
    step before the destination can take over); but if the NFS server is
    non-responsive, the attempt to inactivate can itself fail.  When that
    happens, the destination fails to get the migrated guest (good,
    because the source wasn't able to flush everything properly):
    
      (qemu) qemu-kvm: load of migration failed: Input/output error
    
    at which point, our only hope for the guest is for the source to take
    back control.  With the current code base, the host outputs a message,
    but then appears to resume:
    
      (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
    
      (src qemu)info status
       VM status: running
    
    but a second migration attempt now asserts:
    
      (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
    
    Whether the guest is recoverable on the source after the first failure
    is debatable, but what we do not want is to have qemu itself fail due
    to an assertion.  It looks like the problem is as follows:
    
    In migration.c:migration_completion(), the source sets 'inactivate' to
    true (since COLO is not enabled), then tries
    savevm.c:qemu_savevm_state_complete_precopy() with a request to
    inactivate block devices.  In turn, this calls
    block.c:bdrv_inactivate_all(), which fails when flushing runs up
    against the non-responsive NFS server.  With savevm failing, we are
    now left in a state where some, but not all, of the block devices have
    been inactivated; but migration_completion() then jumps to 'fail'
    rather than 'fail_invalidate' and skips an attempt to reclaim those
    disks by calling bdrv_activate_all().  Even if we do attempt to
    reclaim disks, we aren't taking note of failure there, either.
    
    Thus, we have reached a state where the migration engine has forgotten
    all state about whether a block device is inactive, because we did not
    set s->block_inactive in enough places; so migration allows the source
    to reach vm_start() and resume execution, violating the block layer
    invariant that the guest CPUs should not be restarted while a device
    is inactive.  Note that the code in migration.c:migrate_fd_cancel()
    will also try to reactivate all block devices if s->block_inactive was
    set, but because we failed to set that flag after the first failure,
    the source assumes it has reclaimed all devices, even though it still
    has remaining inactivated devices and does not try again.  Normally,
    qmp_cont() will also try to reactivate all disks (or correctly fail if
    the disks are not reclaimable because NFS is not yet back up), but the
    auto-resumption of the source after a migration failure does not go
    through qmp_cont().  And because we have left the block layer in an
    inconsistent state with devices still inactivated, the later migration
    attempt is hitting the assertion failure.
    
    Since it is important to not resume the source with inactive disks,
    this patch marks s->block_inactive before attempting inactivation,
    rather than after succeeding, in order to prevent any vm_start() until
    it has successfully reactivated all devices.
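    
    The effect of setting the flag before the attempt, rather than after
    success, can be sketched with a toy state machine. Everything here is
    hypothetical except the field name block_inactive, which comes from the
    text above: ToyMigState, the device counter and the toy_* functions are
    stand-ins, and reactivation is assumed to always succeed.

```c
#include <stdbool.h>

typedef struct {
    bool block_inactive;   /* "some device may be inactive" */
    int devices_inactive;  /* hypothetical count of inactive devices */
} ToyMigState;

/* Inactivate n devices; fail when reaching index fail_at (-1: never). */
static int toy_inactivate_all(ToyMigState *s, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        if (i == fail_at) {
            return -1;           /* partial failure: earlier devices stay inactive */
        }
        s->devices_inactive++;
    }
    return 0;
}

static int toy_activate_all(ToyMigState *s)
{
    s->devices_inactive = 0;
    return 0;
}

/* Completion step: record the intent before attempting inactivation. */
int toy_migration_complete(ToyMigState *s, int n, int fail_at)
{
    s->block_inactive = true;
    return toy_inactivate_all(s, n, fail_at);
}

/* Resume is only allowed once every device has been reactivated. */
bool toy_try_resume(ToyMigState *s)
{
    if (s->block_inactive) {
        if (toy_activate_all(s) < 0) {
            return false;
        }
        s->block_inactive = false;
    }
    return s->devices_inactive == 0;
}
```

    Because the flag is already set when inactivation fails halfway, the
    resume path knows it has to reactivate devices first. Had the flag been
    set only on success, the partially inactivated state would be invisible.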
    
    See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
    
    Signed-off-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Acked-by: Lukas Straub <lukasstraub2@web.de>
    Tested-by: Lukas Straub <lukasstraub2@web.de>
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    (cherry picked from commit 403d18a)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    ebblake authored and Michael Tokarev committed May 18, 2023
    Commit 08fd840
  35. migration: Minor control flow simplification

    No need to declare a temporary variable.
    
    Suggested-by: Juan Quintela <quintela@redhat.com>
    Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better")
    Signed-off-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    (cherry picked from commit 5d39f44)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    ebblake authored and Michael Tokarev committed May 18, 2023
    Commit b514d5a
  36. migration: Attempt disk reactivation in more failure scenarios

    Commit fe904ea added a fail_inactivate label, which tries to
    reactivate disks on the source after a failure while s->state ==
    MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
    qemu_savevm_state_complete_precopy() failed.  This failure to
    reactivate is also present in commit 6039dd5 (also covering the new
    s->state == MIGRATION_STATUS_DEVICE state) and 403d18a (ensuring
    s->block_inactive is set more reliably).
    
    Consolidate the two labels back into one - no matter HOW migration is
    failed, if there is any chance we can reach vm_start() after having
    attempted inactivation, it is essential that we have tried to restart
    disks before then.  This also makes the cleanup more like
    migrate_fd_cancel().
    
    Suggested-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Eric Blake <eblake@redhat.com>
    Message-Id: <20230502205212.134680-1-eblake@redhat.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Reviewed-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    (cherry picked from commit 6dab4c9)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    (Mjt: minor context tweak near added comment in migration/migration.c)
    ebblake authored and Michael Tokarev committed May 18, 2023
    Commit 7405624
  37. target/arm: Fix vd == vm overlap in sve_ldff1_z

    If vd == vm, copy vm to scratch, so that we can pre-zero
    the output and still access the gather indices.
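    
    The aliasing hazard and the scratch-copy remedy can be sketched on
    plain arrays. This is not the SVE helper: toy_gather and TOY_LANES are
    hypothetical, and the model is reduced to "zero the destination, then
    load the first `active` lanes through an index vector".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_LANES 4

/*
 * dst[i] = mem[idx[i]] for the first `active` lanes; remaining lanes of
 * dst read as zero. dst is allowed to alias idx (the "vd == vm" case).
 */
void toy_gather(uint32_t *dst, const uint32_t *idx,
                const uint32_t *mem, size_t active)
{
    uint32_t scratch[TOY_LANES];

    if (dst == idx) {
        /* Preserve the indices before the destination is zeroed. */
        memcpy(scratch, idx, sizeof(scratch));
        idx = scratch;
    }

    memset(dst, 0, TOY_LANES * sizeof(*dst));   /* pre-zero the output */

    for (size_t i = 0; i < active && i < TOY_LANES; i++) {
        dst[i] = mem[idx[i]];
    }
}
```

    Without the copy, the memset() would wipe the indices when both
    operands name the same register, and every lane would load from
    mem[0].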
    
    Cc: qemu-stable@nongnu.org
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1612
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    Message-id: 20230504104232.1877774-1-richard.henderson@linaro.org
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    (cherry picked from commit a6771f2)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    rth7680 authored and Michael Tokarev committed May 18, 2023
    Commit d68a13c
  38. scsi-generic: fix buffer overflow on block limits inquiry

    With a Linux 6.x guest, at boot time, an inquiry on a scsi-generic
    device makes qemu crash.  This is caused by a buffer overflow when
    scsi-generic patches the block limits VPD page.
    
    Do the operations on a temporary on-stack buffer that is guaranteed
    to be large enough.
    
    Reported-by: Théo Maillart <tmaillart@freebox.fr>
    Analyzed-by: Théo Maillart <tmaillart@freebox.fr>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    (cherry picked from commit 9bd634b)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    bonzini authored and Michael Tokarev committed May 18, 2023
    Commit 61f6b12
  39. target/i386: fix operand size for VCOMI/VUCOMI instructions

    Compared to other SSE instructions, VUCOMISx and VCOMISx are different:
    the single and double precision versions are distinguished through a
    prefix, however they use no-prefix and 0x66 for SS and SD respectively.
    Scalar values usually are associated with 0xF2 and 0xF3.
    
    Because of this, they incorrectly perform a 128-bit memory load instead
    of a 32- or 64-bit load.  Fix this by writing a custom decoding function.
    
    I tested that the reproducer is fixed and the test-avx output does not
    change.
    
    Reported-by: Gabriele Svelto <gsvelto@mozilla.com>
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1637
    Fixes: f8d19ee ("target/i386: reimplement 0x0f 0x28-0x2f, add AVX", 2022-10-18)
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    (cherry picked from commit 2b55e47)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    bonzini authored and Michael Tokarev committed May 18, 2023
    Commit eee0666
  40. target/i386: fix avx2 instructions vzeroall and vpermdq

    vzeroall: xmm_regs should be used instead of xmm_t0
    vpermdq: bit 3 and 7 of imm should be considered
    
    Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn>
    Message-Id: <20230510145222.586487-1-lixinyu20s@ict.ac.cn>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    (cherry picked from commit 056d649)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Xinyu Li authored and Michael Tokarev committed May 18, 2023
    Commit 48b60eb

Commits on May 19, 2023

  1. vhost: fix possible wrap in SVQ descriptor ring

    QEMU invokes vhost_svq_add() when adding a guest's element
    into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots()
    to check whether QEMU can add the element into SVQ. If there is
    enough space, then QEMU combines some out descriptors and some
    in descriptors into one descriptor chain, and adds it into
    `svq->vring.desc` by vhost_svq_vring_write_descs().
    
    Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx`
    in vhost_svq_available_slots() returns the number of occupied elements,
    or the number of descriptor chains, instead of the number of occupied
    descriptors, which may cause wrapping in SVQ descriptor ring.
    
    Here is an example. In vhost_handle_guest_kick(), QEMU forwards
    as many available buffers as possible to the device by virtqueue_pop() and
    vhost_svq_add_element(). virtqueue_pop() returns a guest's element,
    and then this element is added into SVQ by vhost_svq_add_element(),
    a wrapper to vhost_svq_add(). If QEMU invokes virtqueue_pop() and
    vhost_svq_add_element() `svq->vring.num` times,
    vhost_svq_available_slots() thinks QEMU just ran out of slots and
    everything should work fine. But in fact, virtqueue_pop() returns
    `svq->vring.num` elements or descriptor chains, more than
    `svq->vring.num` descriptors due to guest memory fragmentation,
    and this causes wrapping in SVQ descriptor ring.
    
    This bug is valid even before marking the descriptors used.
    If the guest memory is fragmented, SVQ must add chains
    so it can try to add more descriptors than possible.
    
    This patch solves it by adding `num_free` field in
    VhostShadowVirtqueue structure and updating this field
    in vhost_svq_add() and vhost_svq_get_buf(), to record
    the number of free descriptors.
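    
    The difference between counting chains and counting descriptors can be
    sketched with a toy ring. Only the field name num_free comes from the
    text above. ToySvq and the toy_* functions are hypothetical and model
    just the bookkeeping, not the real vring layout.

```c
#include <stdbool.h>

typedef struct {
    unsigned num;        /* ring size, in descriptors */
    unsigned num_free;   /* descriptors currently free */
} ToySvq;

void toy_svq_init(ToySvq *svq, unsigned num)
{
    svq->num = num;
    svq->num_free = num;
}

/* Add one chain made of ndescs descriptors; refuse if it does not fit. */
bool toy_svq_add(ToySvq *svq, unsigned ndescs)
{
    if (ndescs > svq->num_free) {
        return false;            /* would wrap the descriptor ring */
    }
    svq->num_free -= ndescs;
    return true;
}

/* The device has used a chain: its descriptors become free again. */
void toy_svq_get_buf(ToySvq *svq, unsigned ndescs)
{
    svq->num_free += ndescs;
}
```

    With a ring of 4 descriptors, one 3-descriptor chain leaves a single
    free descriptor. A chain-based count would still report 3 free slots
    and accept another 3-descriptor chain, whereas the descriptor-based
    count rejects it until the first chain is returned.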
    
    Fixes: 100890f ("vhost: Shadow virtqueue buffers forwarding")
    Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
    Acked-by: Eugenio Pérez <eperezma@redhat.com>
    Message-Id: <20230509084817.3973-1-yin31149@gmail.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Tested-by: Lei Yang <leiyang@redhat.com>
    (cherry picked from commit 5d41055)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    JiaweiHawk authored and Michael Tokarev committed May 19, 2023
    Commit 6f4dc62
  2. hw/cxl: cdat: Fix open file not closed in ct3_load_cdat()

    An open file descriptor is not closed in error paths. Fix by replacing
    the open-coded read of the whole file into a buffer with
    g_file_get_contents().
    
    Fixes: aba578b ("hw/cxl: CDAT Data Object Exchange implementation")
    Signed-off-by: Zeng Hao <zenghao@kylinos.cn>
    Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Suggested-by: Peter Maydell <peter.maydell@linaro.org>
    Suggested-by: Jonathan Cameron via <qemu-devel@nongnu.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    
    --
    Changes since v5:
    - Drop if guard on g_free() as per checkpatch warning.
    Message-Id: <20230421132020.7408-2-Jonathan.Cameron@huawei.com>
    Reviewed-by: Fan Ni <fan.ni@samsung.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit 71ba92f)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Hao Zeng authored and Michael Tokarev committed May 19, 2023
    Commit e3ae58b
  3. virtio-net: not enable vq reset feature unconditionally

    The commit 93a97dc ("virtio-net: enable vq reset feature") unconditionally
    enables the vq reset feature as long as the device is emulated.
    This makes it impossible to actually disable the feature, and it causes
    migration problems from QEMU versions earlier than 7.2.
    
    The entire commit is unneeded, as the device system already enables or
    disables the feature properly.
    
    This reverts commit 93a97dc.
    Fixes: 93a97dc ("virtio-net: enable vq reset feature")
    Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
    
    Message-Id: <20230504101447.389398-1-eperezma@redhat.com>
    Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit 1fac00f)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    eugpermar authored and Michael Tokarev committed May 19, 2023
    Commit 311dce0
  4. virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request
    
    Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM algtype.
    
    Fixes: 0e660a6 ("crypto: Introduce RSA algorithm")
    Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
    Reported-by: Yiming Tao <taoym@zju.edu.cn>
    Message-Id: <20230509075317.1132301-1-mcascell@redhat.com>
    Reviewed-by: Gonglei <arei.gonglei@huawei.com>
    Reviewed-by: zhenwei pi <pizhenwei@bytedance.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit 3e69908)
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    (Mjt: context tweak after 999c789 cryptodev: Introduce cryptodev alg type in QAPI)
    Mauro Matteo Cascella authored and Michael Tokarev committed May 19, 2023
    Commit 486e00b