Commits · Anuj-Gupta/blo…

Commits on Mar 29, 2023

  1. null_blk: add support for copy offload

    Implementation is based on the existing read and write infrastructure.
    
    Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit a0c6da8
  2. dm: Enable copy offload for dm-linear target

    Set the copy_offload_supported flag to enable offload.
    
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit 201e098
  3. dm: Add support for copy offload.

    Before enabling copy for a dm target, check whether the underlying
    devices and the dm target support copy. Avoid splits inside the dm
    target: fail early if the request needs a split, since splitting a
    copy request is currently not supported.
    
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit 28e2820
  4. nvmet: add copy command support for bdev and file ns

    Add support for handling the copy command on the target.
    For bdev-ns we call into blkdev_issue_copy, which the block layer
    completes either with an offloaded copy request to the backend bdev or
    by emulating the request.
    
    For file-ns we call vfs_copy_file_range to service the request.
    
    Currently the target always advertises copy capability by setting
    NVME_CTRL_ONCS_COPY in the controller ONCS.
    
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit f846a8a
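    A minimal kernel-style sketch of the file-ns path described above, assuming
    the namespace's backing struct file and the offsets are available (the
    helper name and arguments here are illustrative, not the patch itself):
    
        #include <linux/fs.h>
        
        /* Illustrative only: copy len bytes from src_off to dst_off within
         * the namespace's backing file using the existing VFS helper. */
        static ssize_t example_file_ns_copy(struct file *ns_file, loff_t src_off,
                                            loff_t dst_off, size_t len)
        {
                return vfs_copy_file_range(ns_file, src_off, ns_file, dst_off,
                                           len, 0);
        }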
  5. nvme: add copy offload support

    For devices supporting native copy, the nvme driver receives read and
    write requests with the BLK_COPY op flag.
    For the read request, the driver populates the payload with source
    information.
    For the write request, the driver converts it to an nvme copy command
    using the source information in the payload and submits it to the
    device. The current design supports only a single source range.
    This design is courtesy of Mikulas Patocka's token-based copy.
    
    Add trace event support for nvme_copy_cmd and set the device copy
    limits as queue limits.
    
    Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    Signed-off-by: Javier González <javier.gonz@samsung.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit 11efed3
  6. fs, block: copy_file_range for def_blk_ops for direct block device.

    For a block device opened with O_DIRECT, use copy_file_range to issue
    device copy offload, and fall back to generic_copy_file_range in case
    device copy offload capability is absent.
    Modify checks to allow bdevs to use copy_file_range.
    
    Suggested-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit 61819d2
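    A userspace sketch of what this enables, assuming a kernel with this series
    and hypothetical device paths; with offload absent the call should still
    succeed via the generic fallback:
    
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>
        
        int main(void)
        {
                /* Example (hypothetical) source and destination block devices. */
                int in = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
                int out = open("/dev/nvme1n1", O_WRONLY | O_DIRECT);
        
                if (in < 0 || out < 0)
                        return 1;
        
                /* Copy 1 MiB using the fds' own file offsets. */
                ssize_t ret = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
                printf("copied %zd bytes\n", ret);
        
                close(in);
                close(out);
                return ret < 0;
        }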
  7. block: add emulation for copy

    For devices that do not support copy, copy emulation is added.
    It is required for in-kernel users like fabrics, where a file
    descriptor is not available and hence copy_file_range cannot be used.
    Copy emulation is implemented by reading from the source into memory
    and writing to the corresponding destination asynchronously.
    Emulation is also used if copy offload fails or completes partially.
    
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit ba285db
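    A simplified, synchronous userspace analogue of the emulation described
    above (the in-kernel path works on bios and completes asynchronously; the
    helper below is illustration only):
    
        #include <stdlib.h>
        #include <unistd.h>
        
        /* Read a chunk from the source into a buffer, then write it to the
         * destination. Returns 0 on success, -1 on error. */
        static int emulate_copy(int src_fd, off_t src_off,
                                int dst_fd, off_t dst_off, size_t len)
        {
                int ret = -1;
                char *buf = malloc(len);
        
                if (!buf)
                        return -1;
                if (pread(src_fd, buf, len, src_off) == (ssize_t)len &&
                    pwrite(dst_fd, buf, len, dst_off) == (ssize_t)len)
                        ret = 0;
                free(buf);
                return ret;
        }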
  8. block: Add copy offload support infrastructure

    Introduce blkdev_issue_copy, which takes arguments similar to
    copy_file_range and performs copy offload between two bdevs.
    Introduce the REQ_COPY copy offload operation flag. A read-write bio
    pair is created with a token as payload and submitted to the device in
    order: the read request populates the token with source-specific
    information, which is then passed along with the write request.
    This design is courtesy of Mikulas Patocka's token-based copy.
    
    Larger copies are divided based on the max_copy_sectors limit.
    
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit f1e6711
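    A purely illustrative sketch of the kind of information the token payload
    conceptually carries between the paired REQ_COPY read and write bios (the
    field names are assumptions, not the patch's actual layout):
    
        #include <linux/blk_types.h>
        
        struct copy_token_example {
                struct block_device *src_bdev;  /* device the read bio ran on */
                sector_t src_sector;            /* source start sector */
                sector_t nr_sectors;            /* length described by the token */
        };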
  9. block: Introduce queue limits for copy-offload support

    Add device limits as sysfs entries:
            - copy_offload (RW)
            - copy_max_bytes (RW)
            - copy_max_bytes_hw (RO)
    
    The above limits help split the copy payload in the block layer.
    copy_offload: selects copy offload (1) or emulation (0).
    copy_max_bytes: maximum total length of a copy in a single payload.
    copy_max_bytes_hw: reflects the maximum limit supported by the device.
    
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
    Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
    Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
    nj-shetty authored and intel-lab-lkp committed Mar 29, 2023
    Commit 2b36aee
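    A small sketch of inspecting these limits from userspace, assuming the
    attributes appear under /sys/block/<disk>/queue/ like other request queue
    limits (the disk name is hypothetical):
    
        #include <stdio.h>
        
        int main(void)
        {
                char buf[64];
                FILE *f = fopen("/sys/block/nvme0n1/queue/copy_max_bytes_hw", "r");
        
                if (f && fgets(buf, sizeof(buf), f))
                        printf("device copy limit: %s", buf);
                if (f)
                        fclose(f);
                return 0;
        }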

Commits on Mar 28, 2023

  1. Merge branch 'iter-ubuf' into for-next

    * iter-ubuf:
      iov_iter: import single vector iovecs as ITER_UBUF
      iov_iter: convert import_single_range() to ITER_UBUF
      IB/qib: make qib_write_iter() deal with ITER_UBUF iov_iter
      IB/hfi1: make hfi1_write_iter() deal with ITER_UBUF iov_iter
      snd: make snd_map_bufs() deal with ITER_UBUF
      snd: move mapping an iov_iter to user bufs into a helper
      iov_iter: add iovec_nr_user_vecs() helper
      iov_iter: teach iov_iter_iovec() to deal with ITER_UBUF
    axboe committed Mar 28, 2023
    Commit a9ea7aa
  2. Merge branch 'for-6.4/block' into for-next

    * for-6.4/block:
      block: open code __blk_account_io_done()
      block: open code __blk_account_io_start()
    axboe committed Mar 28, 2023
    Commit 6d37a4c
  3. iov_iter: import single vector iovecs as ITER_UBUF

    Add a special case to __import_iovec(), which imports a single segment
    iovec as an ITER_UBUF rather than an ITER_IOVEC. ITER_UBUF is cheaper
    to iterate than ITER_IOVEC, and for a single segment iovec, there's no
    point in using a segmented iterator.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit d7ab4c3
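    A simplified sketch of the described fast path, using the existing
    iov_iter_ubuf() helper (the wrapper name below is hypothetical and the
    real patch body may differ):
    
        #include <linux/uio.h>
        
        /* Import a single-segment iovec as ITER_UBUF instead of ITER_IOVEC. */
        static void example_import_single_iovec(struct iov_iter *i,
                                                unsigned int direction,
                                                const struct iovec *iov)
        {
                iov_iter_ubuf(i, direction, iov->iov_base, iov->iov_len);
        }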
  4. iov_iter: convert import_single_range() to ITER_UBUF

    Since we're just importing a single vector, we don't have to turn it
    into an ITER_IOVEC. Instead turn it into an ITER_UBUF, which is cheaper
    to iterate.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit 8e5b52b
  5. IB/qib: make qib_write_iter() deal with ITER_UBUF iov_iter

    Don't assume that a user backed iterator is always of the type
    ITER_IOVEC. Handle the single segment case separately, then we can
    use the same logic for ITER_UBUF and ITER_IOVEC.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit c3b880c
  6. IB/hfi1: make hfi1_write_iter() deal with ITER_UBUF iov_iter

    Don't assume that a user backed iterator is always of the type
    ITER_IOVEC. Handle the single segment case separately, then we can
    use the same logic for ITER_UBUF and ITER_IOVEC.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit cf0e2c5
  7. snd: make snd_map_bufs() deal with ITER_UBUF

    This probably doesn't make any sense, as it's reliant on passing in
    different things in multiple segments. Most likely we can just make
    this go away as it's already checking for ITER_IOVEC upon entry, and
    it looks like nr_segments == 2 is the smallest legal value. IOW, any
    attempt to readv/writev with 1 segment would fail with -EINVAL already.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit cddb236
  8. snd: move mapping an iov_iter to user bufs into a helper

    snd_pcm_{readv,writev} both do the same mapping of a struct iov_iter
    into an array of buffers. Move this into a helper.
    
    No functional changes intended in this patch.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit afa254a
  9. iov_iter: add iovec_nr_user_vecs() helper

    This returns the number of user segments in an iov_iter. The input can
    either be an ITER_IOVEC, in which case it returns the number of iovecs,
    or an ITER_UBUF, in which case the number of segments is always 1.
    
    Outside of those two, no user backed iterators exist. Just return 0 for
    those.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 28, 2023
    Commit 128fc83
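    A minimal sketch matching the behavior described above (the in-tree helper
    body may differ):
    
        #include <linux/uio.h>
        
        static inline unsigned long iovec_nr_user_vecs(const struct iov_iter *i)
        {
                if (iter_is_iovec(i))
                        return i->nr_segs;      /* one entry per user iovec */
                if (iter_is_ubuf(i))
                        return 1;               /* ITER_UBUF is a single segment */
                return 0;                       /* no other user backed iterators */
        }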

Commits on Mar 27, 2023

  1. block: open code __blk_account_io_done()

    There is only one caller of __blk_account_io_done(), and the function
    is small enough to fit in its caller blk_account_io_done().
    
    Remove the function and open-code it in its caller
    blk_account_io_done().
    
    Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20230327073427.4403-2-kch@nvidia.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Chaitanya Kulkarni authored and axboe committed Mar 27, 2023
    Commit 0696503
  2. block: open code __blk_account_io_start()

    There is only one caller of __blk_account_io_start(), and the function
    is small enough to fit in its caller blk_account_io_start().
    
    Remove the function and open-code it in its caller
    blk_account_io_start().
    
    Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20230327073427.4403-2-kch@nvidia.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Chaitanya Kulkarni authored and axboe committed Mar 27, 2023
    Commit e165fb4
  3. Merge branch 'for-6.4/io_uring' into for-next

    * for-6.4/io_uring:
      io_uring: encapsulate task_work state
      io_uring: remove extra tw trylocks
      io_uring/io-wq: drop outdated comment
      io_uring: kill unused notif declarations
      io_uring/rw: transform single vector readv/writev into ubuf
      io-wq: Drop struct io_wqe
      io-wq: Move wq accounting to io_wq
      io_uring/kbuf: disallow mapping a badly aligned provided ring buffer
      io_uring: Add KASAN support for alloc_caches
      io_uring: Move from hlist to io_wq_work_node
      io_uring: One wqe per wq
      io_uring: add support for user mapped provided buffer ring
      io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to 'flags'
      io_uring/kbuf: add buffer_list->is_mapped member
      io_uring/kbuf: move pinning of provided buffer ring into helper
      io_uring: Adjust mapping wrt architecture aliasing requirements
      io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it
      fs: add FMODE_DIO_PARALLEL_WRITE flag
    axboe committed Mar 27, 2023
    Commit 505fc44
  4. Merge branch 'for-6.4/block' into for-next

    * for-6.4/block:
      blk-mq: remove hybrid polling
      blk-crypto: drop the NULL check from blk_crypto_put_keyslot()
      blk-mq: return actual keyslot error in blk_insert_cloned_request()
      blk-crypto: remove blk_crypto_insert_cloned_request()
      blk-crypto: make blk_crypto_evict_key() more robust
      blk-crypto: make blk_crypto_evict_key() return void
      blk-mq: release crypto keyslot before reporting I/O complete
      nbd: use the structured req attr check
      nbd: allow genl access outside init_net
    axboe committed Mar 27, 2023
    Commit 0d6664d
  5. Merge branch 'for-6.4/splice' into for-next

    * for-6.4/splice:
      block: convert bio_map_user_iov to use iov_iter_extract_pages
      block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
      block: Add BIO_PAGE_PINNED and associated infrastructure
      block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic
      block: Fix bio_flagged() so that gcc can better optimise it
      iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
      iov_iter: Kill ITER_PIPE
      cifs: Use generic_file_splice_read()
      splice: Do splice read from a file without using ITER_PIPE
      tty, proc, kernfs, random: Use direct_splice_read()
      coda: Implement splice-read
      overlayfs: Implement splice-read
      shmem: Implement splice-read
      splice: Make do_splice_to() generic and export it
      splice: Clean up direct_splice_read() a bit
    axboe committed Mar 27, 2023
    Commit da6c606
  6. io_uring: encapsulate task_work state

    For task works we're passing around a bool pointer indicating whether
    the current ring is locked or not. Wrap it in a structure: that makes
    it more opaque, preventing abuse, and will also help us pass more info
    in the future if needed.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Mar 27, 2023
    Commit f14d589
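    A minimal sketch of the wrapper structure described above (field name
    assumed from the description; task_work callbacks would then take a
    pointer to it instead of a bare bool *):
    
        struct io_tw_state {
                bool locked;    /* is the ring's uring_lock currently held? */
        };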
  7. io_uring: remove extra tw trylocks

    Before cond_resched()'ing in handle_tw_list() we also drop the current
    ring context, and so the next loop iteration will need to pick/pin a new
    context and do trylock.
    
    The chunk removed by this patch was intended to be an optimisation
    covering exactly this case, i.e. retaking the lock after reschedule, but
    in reality it's skipped for the first iteration after resched as
    described and will keep hammering the lock if it's contended.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Mar 27, 2023
    Commit 8281650
  8. io_uring/io-wq: drop outdated comment

    Since the move to PF_IO_WORKER, we don't juggle memory context manually
    anymore. Remove that outdated part of the comment for __io_worker_idle().
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 27, 2023
    Commit ba46d13
  9. io_uring: kill unused notif declarations

    There are two leftover structures from the notification registration
    mechanism that was never released; kill them.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/f05f65aebaf8b1b5bf28519a8fdb350e3e7c9ad0.1679924536.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Mar 27, 2023
    Commit e3546d2
  10. io_uring/rw: transform single vector readv/writev into ubuf

    It's very common to have applications that use vectored reads or writes,
    even if they only pass in a single segment. Obviously they should be
    using read/write at that point, but...
    
    Vectored IO comes with the downside of needing to retain iovec state,
    and hence it requires an allocation and a state copy if the request
    ends up getting deferred. Additionally, it also requires extra cleanup
    when completed, as the allocated state memory has to be freed.
    
    Automatically transform single segment IORING_OP_{READV,WRITEV} into
    IORING_OP_{READ,WRITE}, and hence into an ITER_UBUF. Outside of being
    more efficient if needing deferral, ITER_UBUF is also more efficient
    for normal processing compared to ITER_IOVEC, as they don't require
    iteration. The latter is apparent when running peak testing, where
    using IORING_OP_READV to randomly read 24 drives previously scored:
    
    IOPS=72.54M, BW=35.42GiB/s, IOS/call=32/31
    IOPS=73.35M, BW=35.81GiB/s, IOS/call=32/31
    IOPS=72.71M, BW=35.50GiB/s, IOS/call=32/31
    IOPS=73.29M, BW=35.78GiB/s, IOS/call=32/32
    IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
    IOPS=73.19M, BW=35.74GiB/s, IOS/call=31/32
    IOPS=72.89M, BW=35.59GiB/s, IOS/call=32/31
    IOPS=73.07M, BW=35.68GiB/s, IOS/call=32/32
    
    and after the change we get:
    
    IOPS=77.31M, BW=37.75GiB/s, IOS/call=32/31
    IOPS=77.32M, BW=37.75GiB/s, IOS/call=32/32
    IOPS=77.45M, BW=37.81GiB/s, IOS/call=31/31
    IOPS=77.47M, BW=37.83GiB/s, IOS/call=32/32
    IOPS=77.14M, BW=37.67GiB/s, IOS/call=32/32
    IOPS=77.14M, BW=37.66GiB/s, IOS/call=31/31
    IOPS=77.37M, BW=37.78GiB/s, IOS/call=32/32
    IOPS=77.25M, BW=37.72GiB/s, IOS/call=32/32
    
    which is a nice win as well.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 27, 2023
    Commit 613eb95
  11. io-wq: Drop struct io_wqe

    Since commit 0654b05 ("io_uring: One wqe per wq"), we have just a
    single io_wqe instance embedded per io_wq.  Drop the extra structure in
    favor of accessing struct io_wq directly, cleaning up quite a bit of
    dereferences and backpointers.
    
    No functional changes intended.  Tested with liburing's testsuite
    and mmtests performance microbenchmarks.  I didn't observe any
    performance regressions.
    
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Gabriel Krisman Bertazi authored and axboe committed Mar 27, 2023
    Commit 0ea9518
  12. io-wq: Move wq accounting to io_wq

    Since we now have a single io_wqe per io_wq instead of one per node,
    and in preparation for its removal, move the accounting into the
    parent structure.
    
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Gabriel Krisman Bertazi authored and axboe committed Mar 27, 2023
    Commit f8db24f
  13. io_uring/kbuf: disallow mapping a badly aligned provided ring buffer

    On at least parisc, we have strict requirements on how we virtually map
    an address that is shared between the application and the kernel. On
    these platforms, IOU_PBUF_RING_MMAP should be used when setting up a
    shared ring buffer for provided buffers. If the application is mapping
    these pages and asking the kernel to pin+map them as well, then we have
    no control over what virtual address we get in the kernel.
    
    For that case, do a sanity check if SHM_COLOUR is defined, and disallow
    the mapping request. The application must fall back to using
    IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently
    with the set of helpers that it has.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 27, 2023
    Commit 8c1578f
  14. io_uring: Add KASAN support for alloc_caches

    Add support for KASAN in the alloc_caches (apoll and netmsg_cache).
    Thus, if something touches an unused cache entry, it will raise a
    KASAN warning/exception.
    
    The object is poisoned when it is put into the cache, and unpoisoned
    when it is retrieved or freed.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    leitao authored and axboe committed Mar 27, 2023
    Commit 16afed1
  15. io_uring: Move from hlist to io_wq_work_node

    Having cache entries linked using the hlist format brings no benefit, and
    also requires an unnecessary extra pointer address per cache entry.
    
    Use the internal io_wq_work_node singly linked list for the internal
    alloc caches (async_msghdr and async_poll).
    
    This is required to be able to use KASAN on cache entries, since we do
    not need to touch unused (and poisoned) cache entries when adding more
    entries to the list.
    
    Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    leitao authored and axboe committed Mar 27, 2023
    Commit 66eb95a
  16. io_uring: One wqe per wq

    Right now io_wq allocates one io_wqe per NUMA node.  As io_wq is now
    bound to a task, the task basically uses only the NUMA local io_wqe, and
    almost never changes NUMA nodes, thus, the other wqes are mostly
    unused.
    
    Allocate just one io_wqe embedded into io_wq, and use all possible CPUs
    (cpu_possible_mask) in io_wqe->cpumask.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    leitao authored and axboe committed Mar 27, 2023
    Commit 7fbf6d7
  17. io_uring: add support for user mapped provided buffer ring

    The ring mapped provided buffer rings rely on the application allocating
    the memory for the ring, and then the kernel will map it. This generally
    works fine, but runs into issues on some architectures where we need
    to be able to ensure that the kernel and application virtual address for
    the ring play nicely together. This at least impacts architectures that
    set SHM_COLOUR, but potentially also anyone setting SHMLBA.
    
    To use this variant of ring provided buffers, the application need not
    allocate any memory for the ring. Instead the kernel will do so, and
    the application must subsequently call mmap(2) on the ring with the
    offset set to:
    
    	IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)
    
    to get a virtual address for the buffer ring. Normally the application
    would allocate a suitable piece of memory (and correctly aligned) and
    simply pass that in via io_uring_buf_reg.ring_addr and the kernel would
    map it.
    
    Outside of the setup differences, the kernel-allocated + user-mapped
    provided buffer ring works exactly the same.
    
    Acked-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Mar 27, 2023
    Commit 0c4831d
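    A userspace sketch of the mmap described above, assuming the
    IORING_OFF_PBUF_RING/IORING_OFF_PBUF_SHIFT constants from this series'
    uapi header; liburing's helpers would normally hide this:
    
        #include <sys/mman.h>
        #include <linux/io_uring.h>
        
        static void *map_pbuf_ring(int ring_fd, unsigned int bgid, unsigned int entries)
        {
                off_t off = IORING_OFF_PBUF_RING |
                            ((off_t)bgid << IORING_OFF_PBUF_SHIFT);
                size_t len = entries * sizeof(struct io_uring_buf);
        
                /* Map the kernel-allocated buffer ring into the application. */
                return mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_POPULATE, ring_fd, off);
        }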