Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: qemu/qemu
base: afa351fe36d4
Choose a base ref
...
head repository: qemu/qemu
compare: b52daaf2c868
Choose a head ref
  • 18 commits
  • 11 files changed
  • 4 contributors

Commits on Jun 5, 2023

  1. util/iov: Make qiov_slice() public

    We want to inline qemu_iovec_init_extended() in block/io.c for padding
    requests, and having access to qiov_slice() is useful for this.  As a
    public function, it is renamed to qemu_iovec_slice().
    
    (We will need to count the number of I/O vector elements of a slice
    there, and then later process this slice.  Without qiov_slice(), we
    would need to call qemu_iovec_subvec_niov(), and all further
    IOV-processing functions may need to skip prefixing elements to
    accomodate for a qiov_offset.  Because qemu_iovec_subvec_niov()
    internally calls qiov_slice(), we can just have the block/io.c code call
    qiov_slice() itself, thus get the number of elements, and also create an
    iovec array with the superfluous prefixing elements stripped, so the
    following processing functions no longer need to skip them.)
    
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    Message-Id: <20230411173418.19549-2-hreitz@redhat.com>
    XanClic committed Jun 5, 2023
    Copy the full SHA
    3d06cea View commit details
    Browse the repository at this point in the history
  2. block: Collapse padded I/O vecs exceeding IOV_MAX

    When processing vectored guest requests that are not aligned to the
    storage request alignment, we pad them by adding head and/or tail
    buffers for a read-modify-write cycle.
    
    The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
    with this padding, the vector can exceed that limit.  As of
    4c002ce ("util/iov: make
    qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
    limit, instead returning an error to the guest.
    
    To the guest, this appears as a random I/O error.  We should not return
    an I/O error to the guest when it issued a perfectly valid request.
    
    Before 4c002ce, we just made the vector
    longer than IOV_MAX, which generally seems to work (because the guest
    assumes a smaller alignment than we really have, file-posix's
    raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
    so emulate the request, so that the IOV_MAX does not matter).  However,
    that does not seem exactly great.
    
    I see two ways to fix this problem:
    1. We split such long requests into two requests.
    2. We join some elements of the vector into new buffers to make it
       shorter.
    
    I am wary of (1), because it seems like it may have unintended side
    effects.
    
    (2) on the other hand seems relatively simple to implement, with
    hopefully few side effects, so this patch does that.
    
    To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request()
    is effectively replaced by the new function bdrv_create_padded_qiov(),
    which not only wraps the request IOV with padding head/tail, but also
    ensures that the resulting vector will not have more than IOV_MAX
    elements.  Putting that functionality into qemu_iovec_init_extended() is
    infeasible because it requires allocating a bounce buffer; doing so
    would require many more parameters (buffer alignment, how to initialize
    the buffer, and out parameters like the buffer, its length, and the
    original elements), which is not reasonable.
    
    Conversely, it is not difficult to move qemu_iovec_init_extended()'s
    functionality into bdrv_create_padded_qiov() by using public
    qemu_iovec_* functions, so that is what this patch does.
    
    Because bdrv_pad_request() was the only "serious" user of
    qemu_iovec_init_extended(), the next patch will remove the latter
    function, so the functionality is not implemented twice.
    
    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    Message-Id: <20230411173418.19549-3-hreitz@redhat.com>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    XanClic committed Jun 5, 2023
    Copy the full SHA
    1874331 View commit details
    Browse the repository at this point in the history
  3. util/iov: Remove qemu_iovec_init_extended()

    bdrv_pad_request() was the main user of qemu_iovec_init_extended().
    HEAD^ has removed that use, so we can remove qemu_iovec_init_extended()
    now.
    
    The only remaining user is qemu_iovec_init_slice(), which can easily
    inline the small part it really needs.
    
    Note that qemu_iovec_init_extended() offered a memcpy() optimization to
    initialize the new I/O vector.  qemu_iovec_concat_iov(), which is used
    to replace its functionality, does not, but calls qemu_iovec_add() for
    every single element.  If we decide this optimization was important, we
    will need to re-implement it in qemu_iovec_concat_iov(), which might
    also benefit its pre-existing users.
    
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    Message-Id: <20230411173418.19549-4-hreitz@redhat.com>
    XanClic committed Jun 5, 2023
    Copy the full SHA
    cc63f6f View commit details
    Browse the repository at this point in the history
  4. iotests/iov-padding: New test

    Test that even vectored IO requests with 1024 vector elements that are
    not aligned to the device's request alignment will succeed.
    
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    Message-Id: <20230411173418.19549-5-hreitz@redhat.com>
    XanClic committed Jun 5, 2023
    Copy the full SHA
    d7e1905 View commit details
    Browse the repository at this point in the history
  5. parallels: Out of image offset in BAT leads to image inflation

    data_end field in BDRVParallelsState is set to the biggest offset present
    in BAT. If this offset is outside of the image, any further write will
    create the cluster at this offset and/or the image will be truncated to
    this offset on close. This is definitely not correct.
    
    Raise an error in parallels_open() if data_end points outside the image
    and it is not a check (let the check to repaire the image). Set data_end
    to the end of the cluster with the last correct offset.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Message-Id: <20230424093147.197643-2-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    f5e715d View commit details
    Browse the repository at this point in the history
  6. parallels: Fix high_off calculation in parallels_co_check()

    Don't let high_off be more than the file size even if we don't fix the
    image.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Message-Id: <20230424093147.197643-3-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    ab2d739 View commit details
    Browse the repository at this point in the history
  7. parallels: Fix image_end_offset and data_end after out-of-image check

    Set data_end to the end of the last cluster inside the image. In such a
    way we can be sure that corrupted offsets in the BAT can't affect on the
    image size. If there are no allocated clusters set image_end_offset by
    data_end.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Message-Id: <20230424093147.197643-4-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    679749c View commit details
    Browse the repository at this point in the history
  8. parallels: create parallels_set_bat_entry_helper() to assign BAT value

    This helper will be reused in next patches during parallels_co_check
    rework to simplify its code.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Message-Id: <20230424093147.197643-5-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    b64b29b View commit details
    Browse the repository at this point in the history
  9. parallels: Use generic infrastructure for BAT writing in parallels_co…

    …_check()
    
    BAT is written in the context of conventional operations over the image
    inside bdrv_co_flush() when it calls parallels_co_flush_to_os() callback.
    Thus we should not modify BAT array directly, but call
    parallels_set_bat_entry() helper and bdrv_co_flush() further on. After
    that there is no need to manually write BAT and track its modification.
    
    This makes code more generic and allows to split parallels_set_bat_entry()
    for independent pieces.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Message-Id: <20230424093147.197643-6-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    3569cb7 View commit details
    Browse the repository at this point in the history
  10. parallels: Move check of unclean image to a separate function

    We will add more and more checks so we need a better code structure
    in parallels_co_check. Let each check performs in a separate loop
    in a separate helper.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Message-Id: <20230424093147.197643-7-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    96de69c View commit details
    Browse the repository at this point in the history
  11. parallels: Move check of cluster outside image to a separate function

    We will add more and more checks so we need a better code structure in
    parallels_co_check. Let each check performs in a separate loop in a
    separate helper.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Message-Id: <20230424093147.197643-8-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    6d416e5 View commit details
    Browse the repository at this point in the history
  12. parallels: Fix statistics calculation

    Exclude out-of-image clusters from allocated and fragmented clusters
    calculation.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Message-Id: <20230424093147.197643-9-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    9616f7a View commit details
    Browse the repository at this point in the history
  13. parallels: Move check of leaks to a separate function

    We will add more and more checks so we need a better code structure
    in parallels_co_check. Let each check performs in a separate loop
    in a separate helper.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Message-Id: <20230424093147.197643-10-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    09a21ed View commit details
    Browse the repository at this point in the history
  14. parallels: Move statistic collection to a separate function

    We will add more and more checks so we need a better code structure
    in parallels_co_check. Let each check performs in a separate loop
    in a separate helper.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Message-Id: <20230424093147.197643-11-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    7e259e2 View commit details
    Browse the repository at this point in the history
  15. parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

    Replace the way we use mutex in parallels_co_check() for simplier
    and less error prone code.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Message-Id: <20230424093147.197643-12-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    c0fc051 View commit details
    Browse the repository at this point in the history
  16. parallels: Incorrect condition in out-of-image check

    All the offsets in the BAT must be lower than the file size.
    Fix the check condition for correct check.
    
    Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
    Reviewed-by: Denis V. Lunev <den@openvz.org>
    Message-Id: <20230424093147.197643-13-alexander.ivanov@virtuozzo.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    AlexanderIvanov-Virtuozzo authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    029136f View commit details
    Browse the repository at this point in the history
  17. qcow2: add discard-no-unref option

    When we for example have a sparse qcow2 image and discard: unmap is enabled,
    there can be a lot of fragmentation in the image after some time. Especially on VM's
    that do a lot of writes/deletes.
    This causes the qcow2 image to grow even over 110% of its virtual size,
    because the free gaps in the image get too small to allocate new
    continuous clusters. So it allocates new space at the end of the image.
    
    Disabling discard is not an option, as discard is needed to keep the
    incremental backup size as low as possible. Without discard, the
    incremental backups would become large, as qemu thinks it's just dirty
    blocks but it doesn't know the blocks are unneeded.
    So we need to avoid fragmentation but also 'empty' the unneeded blocks in
    the image to have a small incremental backup.
    
    In addition, we also want to send the discards further down the stack, so
    the underlying blocks are still discarded.
    
    Therefor we introduce a new qcow2 option "discard-no-unref".
    When setting this option to true, discards will no longer have the qcow2
    driver relinquish cluster allocations. Other than that, the request is
    handled as normal: All clusters in range are marked as zero, and, if
    pass-discard-request is true, it is passed further down the stack.
    The only difference is that the now-zero clusters are preallocated
    instead of being unallocated.
    This will avoid fragmentation on the qcow2 image.
    
    Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
    Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
    Message-Id: <20230605084523.34134-2-jean-louis@dupond.be>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    dupondje authored and XanClic committed Jun 5, 2023
    Copy the full SHA
    42a2890 View commit details
    Browse the repository at this point in the history
  18. Merge tag 'pull-block-2023-06-05' of https://gitlab.com/hreitz/qemu i…

    …nto staging
    
    Block patches
    
    - Fix padding of unaligned vectored requests to match the host alignment
      for vectors with 1023 or 1024 buffers
    - Refactor and fix bugs in parallels's image check functionality
    - Add an option to the qcow2 driver to retain (qcow2-level) allocations
      on discard requests from the guest (while still forwarding the discard
      to the lower level and marking the range as zero)
    
    # -----BEGIN PGP SIGNATURE-----
    #
    # iQJGBAABCAAwFiEEy2LXoO44KeRfAE00ofpA0JgBnN8FAmR+AT4SHGhyZWl0ekBy
    # ZWRoYXQuY29tAAoJEKH6QNCYAZzfnboQAKD6YrreZLoseomRfqOAoApSf6yOdcHk
    # 6kfsvzwzjosomsF1Pkzm4851vX5PyDqTdeu0iViM+pxanVO1b494q1P4VcAERqMB
    # iZVs68R6M0l6HV9btWFGm+ibHJf4FapdntkIdwog1ka5TIhw5oDWCVNLigjhIoRv
    # sM37Bgf14kC3sFTR++0HESsyU1eUP5gJjwJbPZ2IgJBmzYay0is1z5nHA/3VUswu
    # 8dKnGQDsv62EtlK7PK8cU2BhLOeNi6Wr3bAb6Wf2QLB5e0qRb7oAkqNx5/UcTznk
    # a3XMC1aiWhYvM/+DaYIpQUcIPgA8xQ1KHKeD6WjbGfLgZBqseX0aGWMByUsiY8Bo
    # +BPIBnUDrbiPnAKB/XLQfnzlE+s7121/JpEbB7AkZqVFRGuw8Wur4tbc2fzvy8Pw
    # x/uQfv3ZPi/2Lf6u7hv/TVHubXi8jucVgx3Ubu5Jeo3901S4/KOQBQ4BQ/GYIGQX
    # 38ijSROcEd0eQJ1mTKPEctouxjSZCghNSbrn9DfsL1V3VWqWNKKGCU3hM+RQ1SJT
    # 688qvnyYt8QZfTsiDSHR/GfKsufG0DkoqE7c9IhSEPohecAH8Rrc3HcLut7fuwD2
    # gCFQhm68CPwwRmBjPCY6Zi1RDzeOyFBSWN31T6t0yTb4OHJ/3/cSZVBJtwwkOVbx
    # zwabHDNdY5Kw
    # =GuoL
    # -----END PGP SIGNATURE-----
    # gpg: Signature made Mon 05 Jun 2023 08:37:34 AM PDT
    # gpg:                using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
    # gpg:                issuer "hreitz@redhat.com"
    # gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [unknown]
    # gpg: WARNING: This key is not certified with a trusted signature!
    # gpg:          There is no indication that the signature belongs to the owner.
    # Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00  4D34 A1FA 40D0 9801 9CDF
    
    * tag 'pull-block-2023-06-05' of https://gitlab.com/hreitz/qemu:
      qcow2: add discard-no-unref option
      parallels: Incorrect condition in out-of-image check
      parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD
      parallels: Move statistic collection to a separate function
      parallels: Move check of leaks to a separate function
      parallels: Fix statistics calculation
      parallels: Move check of cluster outside image to a separate function
      parallels: Move check of unclean image to a separate function
      parallels: Use generic infrastructure for BAT writing in parallels_co_check()
      parallels: create parallels_set_bat_entry_helper() to assign BAT value
      parallels: Fix image_end_offset and data_end after out-of-image check
      parallels: Fix high_off calculation in parallels_co_check()
      parallels: Out of image offset in BAT leads to image inflation
      iotests/iov-padding: New test
      util/iov: Remove qemu_iovec_init_extended()
      block: Collapse padded I/O vecs exceeding IOV_MAX
      util/iov: Make qiov_slice() public
    
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    rth7680 committed Jun 5, 2023
    Copy the full SHA
    b52daaf View commit details
    Browse the repository at this point in the history