Commits

Commits on Nov 14, 2022

  1. blk-crypto: move internal only declarations to blk-crypto-internal.h

     blk_crypto_get_keyslot, blk_crypto_put_keyslot, __blk_crypto_evict_key
    and __blk_crypto_cfg_supported are only used internally by the
    blk-crypto code, so move them out of blk-crypto-profile.h, which is
    included by drivers that supply blk-crypto functionality.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Christoph Hellwig authored and intel-lab-lkp committed Nov 14, 2022
    Commit 981a7e9
  2. blk-crypto: add a blk_crypto_config_supported_natively helper

    Add a blk_crypto_config_supported_natively helper that wraps
    __blk_crypto_cfg_supported to retrieve the crypto_profile from the
    request queue.  With this fscrypt can stop including
    blk-crypto-profile.h and rely on the public consumer interface in
    blk-crypto.h.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Christoph Hellwig authored and intel-lab-lkp committed Nov 14, 2022
    Commit 03fa384
  3. blk-crypto: don't use struct request_queue for public interfaces

    Switch all public blk-crypto interfaces to use struct block_device
    arguments to specify the device they operate on instead of the
    request_queue, which is a block layer implementation detail.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Christoph Hellwig authored and intel-lab-lkp committed Nov 14, 2022
    Commit 02def9f
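
The three blk-crypto patches above leave blk-crypto-profile.h to drivers that provide inline encryption, while consumers such as fscrypt use the block_device-based API in blk-crypto.h. A minimal sketch of what the resulting wrapper could look like, assuming it only resolves the queue's crypto_profile internally (illustrative only, not the upstream source):

    /* Sketch only; the names follow the commit messages above. */
    bool blk_crypto_config_supported_natively(struct block_device *bdev,
                                              const struct blk_crypto_config *cfg)
    {
            /*
             * The request_queue and its crypto_profile remain a block layer
             * implementation detail; callers only ever pass a block_device.
             */
            return __blk_crypto_cfg_supported(bdev_get_queue(bdev)->crypto_profile,
                                              cfg);
    }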

Commits on Nov 12, 2022

  1. Merge branch 'for-6.2/io_uring' into for-next

    * for-6.2/io_uring:
      io_uring: split tw fallback into a function
      io_uring: inline io_req_task_work_add()
    axboe committed Nov 12, 2022
    Commit 939f803
  2. io_uring: split tw fallback into a function

    When the target process is dying and so task_work_add() is not allowed,
    we push all task_work items to the fallback workqueue. Move the part
    responsible for moving tw items out of __io_req_task_work_add() into
    a separate function. This makes it a bit cleaner and gives the compiler a
    bit of extra info.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/e503dab9d7af95470ca6b214c6de17715ae4e748.1668162751.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Nov 12, 2022
    Commit 79fde04
  3. io_uring: inline io_req_task_work_add()

    __io_req_task_work_add() is huge but marked inline, which makes compilers
    generate lots of garbage. Inline the wrapper caller
    io_req_task_work_add() instead.
    
    before:
       text    data     bss     dec     hex filename
      47347   16248       8   63603    f873 io_uring/io_uring.o

    after:
       text    data     bss     dec     hex filename
      45303   16248       8   61559    f077 io_uring/io_uring.o
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/26dc8c28ca0160e3269ef3e55c5a8b917c4d4450.1668162751.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Nov 12, 2022
    Commit 912f3f5
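
The inlining change in the last io_uring commit above is easier to see with a generic sketch: keep the large body out of line and inline only the thin wrapper, so every call site emits a plain call instead of a copy of the big function. This is an illustrative user-space example, not the io_uring code itself; the names and the extra parameter are made up:

    #include <stdbool.h>

    struct req;

    /* Large body: compiled once, out of line (defined elsewhere). */
    void __req_task_work_add(struct req *req, bool allow_local);

    /* Thin wrapper: cheap to inline, shrinking overall text size. */
    static inline void req_task_work_add(struct req *req)
    {
            __req_task_work_add(req, true);
    }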

Commits on Nov 11, 2022

  1. Merge branch 'for-6.2/block' into for-next

    * for-6.2/block:
      sbitmap: Use single per-bitmap counting to wake up queued tags
    axboe committed Nov 11, 2022
    Commit 008cb91
  2. sbitmap: Use single per-bitmap counting to wake up queued tags

    sbitmap suffers from code complexity, as demonstrated by recent fixes,
    and occasional lost wake-ups on nested I/O completion.  The latter happens,
    from what I understand, due to the non-atomic nature of the updates to
    wait_cnt, which needs to be subtracted and eventually reset when equal
    to zero.  This two-step process can eventually miss an update when a
    nested completion happens to interrupt the CPU in between the wait_cnt
    updates.  This is very hard to fix, as shown by the recent changes to
    this code.
    
    The code complexity arises mostly from the corner cases to avoid missed
    wakes in this scenario.  In addition, the handling of wake_batch
    recalculation plus the synchronization with sbq_queue_wake_up is
    non-trivial.
    
    This patchset implements the idea originally proposed by Jan [1], which
    removes the need for the two-step updates of wait_cnt.  This is done by
    tracking the number of completions and wakeups in always increasing,
    per-bitmap counters.  Instead of having to reset the wait_cnt when it
    reaches zero, we simply keep counting, and attempt to wake up N threads
    in a single wait queue whenever there is enough space for a batch.
    Waking up fewer than wake_batch shouldn't be a problem, because we
    haven't changed the conditions for wake up, and the existing batch
    calculation guarantees at least enough remaining completions to wake up
    a batch for each queue at any time.
    
    Performance-wise, one should expect very similar performance to the
    original algorithm for the case where there is no queueing.  In both the
    old algorithm and this implementation, the first thing is to check
    ws_active, which bails out if there is no queueing to be managed. In the
    new code, we took care to avoid accounting completions and wakeups when
    there is no queueing, so we do not pay the cost of atomic operations
    unnecessarily; skipping the accounting there does not skew the numbers.
    
    For more interesting cases, where there is queueing, we need to take
    into account the cross-communication of the atomic operations.  I've
    been benchmarking by running parallel fio jobs against a single hctx
    nullb in different hardware queue depth scenarios, and verifying both
    IOPS and queueing.
    
    Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
    jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
    varying only the hardware queue length per test.
    
    queue size 2                 4                 8                 16                 32                 64
    6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
    patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)
    
    The following is a similar experiment, run against a nullb with a single
    bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
    parallel fio jobs operating on the same device.
    
    queue size 2                 4                 8                 16                  32                 64
    6.1-rc2    1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K)    6178.2K (124.6K)    12227.9K (37.7K)   13286.6K (92.9K)
    patched    1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K)    6151.4K (20.0K)     11893.6K (17.5K)   12385.6K (18.4K)
    
    It has also survived blktests and a 12h-stress run against nullb. I also
    ran the code against nvme and a scsi SSD, and I didn't observe
    performance regression in those. If there are other tests you think I
    should run, please let me know and I will follow up with results.
    
    [1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/
    
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Keith Busch <kbusch@kernel.org>
    Cc: Liu Song <liusong@linux.alibaba.com>
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Gabriel Krisman Bertazi authored and axboe committed Nov 11, 2022
    Commit 4f8126b
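
A minimal user-space sketch of the counting scheme described above, using C11 atomics; the names and structure are illustrative only and do not mirror the actual sbitmap code:

    #include <stdatomic.h>
    #include <stdbool.h>

    struct waitq_state {
            atomic_uint completions;  /* total completions observed, only grows */
            atomic_uint wakeups;      /* total wake-ups claimed, only grows */
            unsigned int wake_batch;  /* wake this many waiters at a time */
    };

    /* Called on each completion; returns true if the caller should wake a batch. */
    static bool should_wake_batch(struct waitq_state *ws)
    {
            unsigned int done = atomic_fetch_add(&ws->completions, 1) + 1;
            unsigned int woken = atomic_load(&ws->wakeups);

            /* Not enough new completions since the last claimed batch? */
            if (done - woken < ws->wake_batch)
                    return false;

            /* Claim the batch; only the CPU that wins the cmpxchg wakes waiters. */
            return atomic_compare_exchange_strong(&ws->wakeups, &woken,
                                                  woken + ws->wake_batch);
    }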

Commits on Nov 10, 2022

  1. Merge branch 'for-6.2/block' into for-next

    * for-6.2/block:
      blk-mq: simplify blk_mq_realloc_tag_set_tags
      blk-mq: remove blk_mq_alloc_tag_set_tags
    axboe committed Nov 10, 2022
    Commit 32befda
  2. blk-mq: simplify blk_mq_realloc_tag_set_tags

    Use set->nr_hw_queues for the current number of tags, and remove the
    duplicate set->nr_hw_queues update in the caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221109100811.2413423-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Nov 10, 2022
    Commit ee9d552
  3. blk-mq: remove blk_mq_alloc_tag_set_tags

    There is no point in trying to share any code with the realloc case when
    all that is needed by the initial tagset allocation is a simple
    kcalloc_node.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221109100811.2413423-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Nov 10, 2022
    Commit 5ee2029
  4. Merge branch 'for-6.2/io_uring' into for-next

    * for-6.2/io_uring:
      io_uring: update outdated comment of callbacks
      io_uring/poll: remove outdated comments of caching
    axboe committed Nov 10, 2022
    Commit d3a94a6
  5. io_uring: update outdated comment of callbacks

    The previous commit ebc11b6 ("io_uring: clean io-wq callbacks") renamed
    io_free_work() to io_wq_free_work() for consistency. This patch updates
    the relevant comment accordingly to avoid misunderstanding.
    
    Fixes: ebc11b6 ("io_uring: clean io-wq callbacks")
    Signed-off-by: Lin Ma <linma@zju.edu.cn>
    Link: https://lore.kernel.org/r/20221110122103.20120-1-linma@zju.edu.cn
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    f0rm2l1n authored and axboe committed Nov 10, 2022
    Commit 55acb79
  6. io_uring/poll: remove outdated comments of caching

    The previous commit 13a9901 ("io_uring: remove events caching
    atavisms") entirely removed the events caching optimization introduced
    by commit 8145935 ("io_uring: cache req->apoll->events in
    req->cflags"). Hence the related comment should also be removed to avoid
    misunderstanding.
    
    Fixes: 13a9901 ("io_uring: remove events caching atavisms")
    Signed-off-by: Lin Ma <linma@zju.edu.cn>
    Link: https://lore.kernel.org/r/20221110060313.16303-1-linma@zju.edu.cn
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    f0rm2l1n authored and axboe committed Nov 10, 2022
    Commit ff611c8
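
For the two blk-mq tag-set patches earlier in this group, the simplification comes down to the initial allocation being a single kcalloc_node() of the tag pointer array, with nothing shared with the realloc path. Roughly along these lines (a sketch; the function name is made up and this is not the exact upstream diff):

    /* Illustrative: allocate the per-hw-queue tag pointer array once. */
    static int example_alloc_tag_array(struct blk_mq_tag_set *set)
    {
            set->tags = kcalloc_node(set->nr_hw_queues,
                                     sizeof(struct blk_mq_tags *),
                                     GFP_KERNEL, set->numa_node);
            if (!set->tags)
                    return -ENOMEM;
            return 0;
    }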

Commits on Nov 9, 2022

  1. Merge branch 'for-6.2/block' into for-next

    * for-6.2/block:
      bfq: ignore oom_bfqq in bfq_check_waker
      bfq: fix waker_bfqq inconsistency crash
      drbd: Store op in drbd_peer_request
      drbd: disable discard support if granularity > max
      drbd: use blk_queue_max_discard_sectors helper
    axboe committed Nov 9, 2022
    Commit dac6a87
  2. bfq: ignore oom_bfqq in bfq_check_waker

    oom_bfqq is just a fallback bfqq, so it shouldn't be used with waker
    detection.
    
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20221108181030.1611703-2-khazhy@google.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Khazhismel Kumykov authored and axboe committed Nov 9, 2022
    Commit 99771d7
  3. bfq: fix waker_bfqq inconsistency crash

    This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
    but woken_list_node still being hashed. This would happen when
    bfq_init_rq() expects a brand new allocated queue to be returned from
    bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
    without resetting woken_list_node. Since we can always return oom_bfqq
    when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
    
    Avoid setting woken_bfqq for oom_bfqq entirely, as it's not useful.
    
    Crashes would have a stacktrace like:
    [160595.656560]  bfq_add_bfqq_busy+0x110/0x1ec
    [160595.661142]  bfq_add_request+0x6bc/0x980
    [160595.666602]  bfq_insert_request+0x8ec/0x1240
    [160595.671762]  bfq_insert_requests+0x58/0x9c
    [160595.676420]  blk_mq_sched_insert_request+0x11c/0x198
    [160595.682107]  blk_mq_submit_bio+0x270/0x62c
    [160595.686759]  __submit_bio_noacct_mq+0xec/0x178
    [160595.691926]  submit_bio+0x120/0x184
    [160595.695990]  ext4_mpage_readpages+0x77c/0x7c8
    [160595.701026]  ext4_readpage+0x60/0xb0
    [160595.705158]  filemap_read_page+0x54/0x114
    [160595.711961]  filemap_fault+0x228/0x5f4
    [160595.716272]  do_read_fault+0xe0/0x1f0
    [160595.720487]  do_fault+0x40/0x1c8
    
    Tested by injecting random failures into bfq_get_queue; the crashes go
    away completely.
    
    Fixes: 8ef3fc3 ("block, bfq: make shared queues inherit wakers")
    Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20221108181030.1611703-1-khazhy@google.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Khazhismel Kumykov authored and axboe committed Nov 9, 2022
    Commit a1795c2
  4. drbd: Store op in drbd_peer_request

    (Sort of) cherry-picked from the out-of-tree drbd9 branch. Original
    commit message by Joel Colledge:
    
        This simplifies drbd_submit_peer_request by removing most of the
        arguments. It also makes the treatment of the op better aligned with
        that in struct bio.
    
        Determine fault_type dynamically using information which is already
        available instead of passing it in as a parameter.
    
    Note: The opf in receive_rs_deallocated was changed from
    REQ_OP_WRITE_ZEROES to REQ_OP_DISCARD. This was required in the
    out-of-tree module, and does not matter in-tree. The opf is ignored
    anyway in drbd_submit_peer_request, since the discard/zero-out is
    decided by the EE_TRIM flag.
    
    Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
    Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
    Link: https://lore.kernel.org/r/20221109133453.51652-4-christoph.boehmwalder@linbit.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    chrboe authored and axboe committed Nov 9, 2022
    Commit ce668b6
  5. drbd: disable discard support if granularity > max

    The discard_granularity describes the minimum unit of a discard.
    If that is larger than the maximum discard size, we need to disable
    discards completely.
    
    Reviewed-by: Joel Colledge <joel.colledge@linbit.com>
    Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
    Link: https://lore.kernel.org/r/20221109133453.51652-3-christoph.boehmwalder@linbit.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Philipp-Reisner authored and axboe committed Nov 9, 2022
    Commit 21b87a7
  6. drbd: use blk_queue_max_discard_sectors helper

    We currently only set q->limits.max_discard_sectors, but that is not
    enough. Another field, max_hw_discard_sectors, was introduced in
    commit 0034af0 ("block: make /sys/block/<dev>/queue/discard_max_bytes
    writeable").
    
    The difference is that max_discard_sectors can be changed from user
    space via sysfs, while max_hw_discard_sectors is the "hardware" upper
    limit.
    
    So use this helper, which sets both.
    
    This is also a fixup for commit 998e9cb ("drbd: cleanup
    decide_on_discard_support"): if discards are not supported, that does
    not necessarily mean we also want to disable write_zeroes.
    
    Fixes: 998e9cb ("drbd: cleanup decide_on_discard_support")
    Reviewed-by: Joel Colledge <joel.colledge@linbit.com>
    Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
    Link: https://lore.kernel.org/r/20221109133453.51652-2-christoph.boehmwalder@linbit.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    chrboe authored and axboe committed Nov 9, 2022
    Commit 258bea6
  7. Merge branches 'for-6.2/block' and 'for-next' into for-next

    * for-6.2/block:
      ABI: sysfs-bus-pci: add documentation for p2pmem allocate
      PCI/P2PDMA: Allow userspace VMA allocations through sysfs
      block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
      block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
      lib/scatterlist: add check when merging zone device pages
      block: add check when merging zone device pages
      iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
      mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
      mm: allow multiple error returns in try_grab_page()
    
    * for-next:
    axboe committed Nov 9, 2022
    Commit a2c4b08
  8. ABI: sysfs-bus-pci: add documentation for p2pmem allocate

    Add documentation for the p2pmem/allocate binary file which allows
    for allocating p2pmem buffers in userspace for passing to drivers
    that support them. (Currently only O_DIRECT to NVMe devices.)
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-10-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 6d4338c
  9. PCI/P2PDMA: Allow userspace VMA allocations through sysfs

    Create a sysfs bin attribute called "allocate" under the existing
    "p2pmem" group. The only allowable operation on this file is the mmap()
    call.
    
    When mmap() is called on this attribute, the kernel allocates a chunk of
    memory from the genalloc and inserts the pages into the VMA. The
    dev_pagemap .page_free callback will indicate when these pages are no
    longer used and they will be put back into the genalloc.
    
    On device unbind, remove the sysfs file before the memremap_pages are
    cleaned up. This ensures unmap_mapping_range() is called on the file's
    inode and no new mappings can be created.
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-9-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 7e9c7ef
  10. block: set FOLL_PCI_P2PDMA in bio_map_user_iov()

    When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
    iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be
    passed from userspace and enables the NVMe passthru requests to
    use P2PDMA pages.
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-8-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 7ee4ccf
  11. block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()

    When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
    iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be passed
    from userspace and enables the O_DIRECT path in iomap based filesystems
    and direct to block devices.
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-7-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 5e3e3f2
  12. lib/scatterlist: add check when merging zone device pages

    Consecutive zone device pages should not be merged into the same sgl
    or bvec segment with other types of pages or if they belong to different
    pgmaps. Otherwise getting the pgmap of a given segment is not possible
    without scanning the entire segment. The helper returns true if either
    both pages are not zone device pages or both are zone device pages
    with the same pgmap.
    
    Factor out the check for page mergeability into a pages_are_mergable()
    helper and add a check with zone_device_pages_are_mergeable().
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-6-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 1567b49
  13. block: add check when merging zone device pages

    Consecutive zone device pages should not be merged into the same sgl
    or bvec segment with other types of pages or if they belong to different
    pgmaps. Otherwise getting the pgmap of a given segment is not possible
    without scanning the entire segment. The helper returns true if either
    both pages are not zone device pages or both are zone device pages
    with the same pgmap.
    
    Add a helper to determine if zone device pages are mergeable and use
    this helper in page_is_mergeable().
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-5-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 49580e6
  14. iov_iter: introduce iov_iter_get_pages_[alloc_]flags()

    Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
    which take a flags argument that is passed to get_user_pages_fast().
    
    This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20221021174116.7200-4-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit d820764
  15. mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages

    GUP callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
    allow obtaining P2PDMA pages. If GUP is called without the flag and a
    P2PDMA page is found, it will return an error in try_grab_page() or
    try_grab_folio().
    
    The check is safe to do before taking the reference to the page in both
    cases, since the page should be protected by either the appropriate
    ptl or mmap_lock, or by the gup-fast guarantees preventing TLB flushes.
    
    try_grab_folio() has one call site that WARNs on failure and cannot
    actually deal with the failure of this function (it seems it will
    get into an infinite loop). Expand the comment there to document a
    couple more conditions on why it will not fail.
    
    FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set. This is to copy
    fsdax until pgmap refcounts are fixed (see the link below for more
    information).
    
    Link: https://lkml.kernel.org/r/Yy4Ot5MoOhsgYLTQ@ziepe.ca
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20221021174116.7200-3-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 4003f10
  16. mm: allow multiple error returns in try_grab_page()

    In order to add checks for P2PDMA memory into try_grab_page(), expand
    the error return from a bool to an int/error code. Update all the
    callsites to handle the change in usage.
    
    Also remove the WARN_ON_ONCE() call at the callsites, since there
    already is a WARN_ON_ONCE() inside the function if it fails.
    
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20221021174116.7200-2-logang@deltatee.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    lsgunth authored and axboe committed Nov 9, 2022
    Commit 0f08923
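
For the two drbd discard patches earlier in this group, the intent is that the helper updates both the user-visible and the hardware discard limit, and that discard is disabled outright when the granularity exceeds the maximum. A rough sketch of that logic (the function name is made up; this is not the drbd code):

    /* Illustrative only. */
    static void example_setup_discard(struct request_queue *q,
                                      unsigned int max_discard_sectors,
                                      unsigned int granularity_sectors)
    {
            if (granularity_sectors > max_discard_sectors) {
                    /* Minimum discard unit larger than the maximum: disable it. */
                    blk_queue_max_discard_sectors(q, 0);
                    return;
            }
            /* The helper sets both max_discard_sectors and max_hw_discard_sectors. */
            blk_queue_max_discard_sectors(q, max_discard_sectors);
    }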
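
The p2pmem sysfs patches above expose the allocation to userspace purely through mmap() of the "allocate" attribute. A hedged user-space example of how such a buffer might be obtained; the PCI address is a placeholder and the path follows the ABI documentation added above:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* Placeholder device address; substitute the actual p2pmem provider. */
            const char *path = "/sys/bus/pci/devices/0000:01:00.0/p2pmem/allocate";
            size_t len = 2 * 1024 * 1024;

            int fd = open(path, O_RDWR);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* mmap() is the only operation this attribute supports. */
            void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (buf == MAP_FAILED) {
                    perror("mmap");
                    close(fd);
                    return 1;
            }

            /* The mapping can now back O_DIRECT I/O to a supporting NVMe device. */
            munmap(buf, len);
            close(fd);
            return 0;
    }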
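
The merge checks added by the scatterlist and block patches above encode a single rule: two pages may share a segment only if neither is a zone device page, or both are and they come from the same pgmap. An illustrative predicate (the name is made up; the real helpers are the ones named in the commit messages):

    /* Illustrative only. */
    static bool example_zone_device_pages_mergeable(const struct page *a,
                                                    const struct page *b)
    {
            /* A zone device page never merges with an ordinary page. */
            if (is_zone_device_page(a) != is_zone_device_page(b))
                    return false;
            /* Two ordinary pages pass this check unconditionally. */
            if (!is_zone_device_page(a))
                    return true;
            /* Both are zone device pages: require the same pgmap. */
            return a->pgmap == b->pgmap;
    }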

Commits on Nov 7, 2022

  1. Merge branch 'for-6.2/io_uring' into for-next

    * for-6.2/io_uring:
      io_uring: remove allow_overflow parameter
      io_uring: allow multishot recv CQEs to overflow
      io_uring: revert "io_uring: fix multishot poll on overflow"
      io_uring: revert "io_uring fix multishot accept ordering"
      io_uring: do not always force run task_work in io_uring_register
    axboe committed Nov 7, 2022
    Commit 72cc2f8
  2. io_uring: remove allow_overflow parameter

    It is now always true, so just remove it.
    
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221107125236.260132-5-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dylan Yudaken authored and axboe committed Nov 7, 2022
    Commit 6488182
  3. io_uring: allow multishot recv CQEs to overflow

    With commit aa1df3a ("io_uring: fix CQE reordering"), there are
    stronger guarantees for overflow ordering. Specifically, it ensures that
    userspace will not receive out-of-order receive CQEs. Therefore this is
    no longer needed for recv/recvmsg.
    
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221107125236.260132-4-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dylan Yudaken authored and axboe committed Nov 7, 2022
    Commit beecb96
  4. io_uring: revert "io_uring: fix multishot poll on overflow"

    This is no longer needed after commit aa1df3a ("io_uring: fix CQE
    reordering"), since all reordering is now taken care of.
    
    This reverts commit a2da676 ("io_uring: fix multishot poll on
    overflow").
    
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221107125236.260132-3-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dylan Yudaken authored and axboe committed Nov 7, 2022
    Commit 7bf3f5a
  5. io_uring: revert "io_uring fix multishot accept ordering"

    This is no longer needed after commit aa1df3a ("io_uring: fix CQE
    reordering"), since all reordering is now taken care of.
    
    This reverts commit cbd2574 ("io_uring: fix multishot accept
    ordering").
    
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dylan Yudaken authored and axboe committed Nov 7, 2022
    Commit 0166128