Commits

Commits on Feb 15, 2023

  1. blk-mq: enforce op-specific segment limits in blk_insert_cloned_request

    The block layer might merge together discard requests up until the
    max_discard_segments limit is hit, but blk_insert_cloned_request checks
    the segment count against max_segments regardless of the req op. This
    can result in errors like the following when discards are issued through
    a DM device and max_discard_segments exceeds max_segments for the queue
    of the chosen underlying device.
    
    blk_insert_cloned_request: over max segments limit. (256 > 129)
    
    Fix this by looking at the req_op and enforcing the appropriate segment
    limit - max_discard_segments for REQ_OP_DISCARDs and max_segments for
    everything else.
    
    Signed-off-by: Uday Shankar <ushankar@purestorage.com>
    Uday Shankar authored and intel-lab-lkp committed Feb 15, 2023
    2cd958b
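
    The op-specific check described in this commit can be sketched in plain user-space C. This is only an illustration under stated assumptions: the `sketch_*` types and functions are hypothetical stand-ins, not the kernel's request, queue-limit, or blk-mq definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's request ops and queue limits. */
enum sketch_req_op { SKETCH_OP_READ, SKETCH_OP_WRITE, SKETCH_OP_DISCARD };

struct sketch_queue_limits {
	unsigned short max_segments;
	unsigned short max_discard_segments;
};

/* Pick the segment limit that applies to the given op. */
static unsigned int sketch_max_segments(const struct sketch_queue_limits *lim,
					enum sketch_req_op op)
{
	assert(lim != NULL);
	if (op == SKETCH_OP_DISCARD)
		return lim->max_discard_segments;
	return lim->max_segments;
}

/* Return true when a cloned request fits within the op-specific limit. */
static bool sketch_segments_ok(const struct sketch_queue_limits *lim,
			       enum sketch_req_op op, unsigned int nr_segs)
{
	return nr_segs <= sketch_max_segments(lim, op);
}
```

    With `max_segments` set to 129 as in the error above and `max_discard_segments` assumed to be 256, a 256-segment discard passes the check while a 256-segment write is still rejected.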

Commits on Feb 14, 2023

  1. Merge branch 'for-6.3/iov-extract' into for-next

    * for-6.3/iov-extract: (177 commits)
      mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM
      Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
      Revert "blk-cgroup: pass a gendisk to blkg_lookup"
      Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
      Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
      Revert "blk-cgroup: move the cgroup information to struct gendisk"
      block: convert bio_map_user_iov to use iov_iter_extract_pages
      block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
      block: Add BIO_PAGE_PINNED and associated infrastructure
      block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic
      block: Fix bio_flagged() so that gcc can better optimise it
      iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
      iov_iter: Add a function to extract a page list from an iterator
      iov_iter: Define flags to qualify page extraction.
      iov_iter: Kill ITER_PIPE
      splice: Do splice read from a file without using ITER_PIPE
      tty, proc, kernfs, random: Use direct_splice_read()
      coda: Implement splice-read
      overlayfs: Implement splice-read
      shmem: Implement splice-read
      ...
    axboe committed Feb 14, 2023
    6bea9ac
  2. mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM

    Using FOLL_PIN for mapping user pages caused a performance regression of
    about 2.7%. Looking at profiles, we see:
    
    +2.71%  [kernel.vmlinux]  [k] mod_node_page_state
    
    which wasn't there before. The node page state counters are percpu, but
    with a very low threshold. On my setup, every 108th update ends up
    needing to punt to two atomic_long_add() calls, which is causing the
    above regression.
    
    As these counters are purely for debug purposes, move them under
    CONFIG_DEBUG_VM rather than do them unconditionally. Note that this
    commit does not fix a real bug with the commits identified as being
    fixed, rather it ensures that we don't regress on performance due to
    those commits moving to using FOLL_PIN rather than FOLL_GET.
    
    Fixes: 33f4320 ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
    Fixes: b699de6 ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
    Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
    Link: https://lore.kernel.org/all/54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk/
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Feb 14, 2023
    a8fcff6
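
    A simplified user-space model of the effect described in this commit: a local delta is folded into a shared atomic counter whenever it exceeds a threshold, and the whole accounting call compiles away unless a debug option is defined. All `sketch_*` names and the `SKETCH_DEBUG_VM` macro are hypothetical, and the threshold comparison is an assumption rather than the kernel's vmstat code.

```c
#include <assert.h>
#include <stdatomic.h>

struct sketch_counter {
	atomic_long global;	/* shared counter, costly to update */
	long local;		/* stand-in for a per-CPU delta */
	long threshold;		/* fold into 'global' once exceeded */
	unsigned long spills;	/* number of atomic updates performed */
};

static void sketch_counter_init(struct sketch_counter *c, long threshold)
{
	assert(threshold >= 0);
	atomic_init(&c->global, 0);
	c->local = 0;
	c->threshold = threshold;
	c->spills = 0;
}

/* Accumulate locally; touch the shared atomic only past the threshold. */
static void sketch_counter_add(struct sketch_counter *c, long delta)
{
	long x = c->local + delta;

	if (x > c->threshold || x < -c->threshold) {
		atomic_fetch_add(&c->global, x);
		c->spills++;
		x = 0;
	}
	c->local = x;
}

/* Debug-only accounting: a no-op unless SKETCH_DEBUG_VM is defined. */
#ifdef SKETCH_DEBUG_VM
#define sketch_debug_account(c, d) sketch_counter_add((c), (d))
#else
#define sketch_debug_account(c, d) do { (void)(c); (void)(d); } while (0)
#endif
```

    In this model a threshold of 107 makes every 108th unit update reach the atomic counter, which matches the frequency quoted above; the real thresholds depend on the system.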
  3. Merge branch 'iov-extract' of https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-6.3/iov-extract
    
    Pull iov-extract from David:
    
    "Here are patches to provide support for extracting pages from an iov_iter
     and to use this in the extraction functions in the block layer bio code.
    
     The patches make the following changes:
    
     (1) Change generic_file_splice_read() to no longer use ITER_PIPE for doing
         a read from an O_DIRECT file fd, but rather load up an ITER_BVEC
         iterator with sufficient pages and use that rather than using an
         ITER_PIPE.  This avoids a problem[2] when __iomap_dio_rw() calls
         iov_iter_revert() to shorten an iterator when it races with
         truncation.  The reversion causes the pipe iterator to prematurely
         release the pages it was retaining - despite the read still being in
         progress.  This caused memory corruption.
    
     (2) Change generic_file_splice_read() to no longer use ITER_PIPE for doing
         a read from a buffered file fd, but rather get pages directly from the
         pagecache using filemap_get_pages() to do all the readahead, reading,
         waiting and extraction, and then feed the pages directly into the
         pipe.
    
     (3) filemap_get_pages() is altered so that it doesn't take an iterator
         (which we don't have in (2)), but rather the count and a flag
         indicating if we can handle partially uptodate pages are passed in and
         down to its subsidiary functions.
    
     (4) Remove ITER_PIPE and its paraphernalia as generic_file_splice_read()
         was the only user.
    
     (5) Add a function, iov_iter_extract_pages() to replace
         iov_iter_get_pages*() that gets refs, pins or just lists the pages as
         appropriate to the iterator type.
    
         Add a function, iov_iter_extract_will_pin() that will indicate from
         the iterator type how the cleanup is to be performed, returning true
         if the pages will need unpinning, false otherwise.
    
     (6) Make the bio struct carry a pair of flags to indicate the cleanup
         mode.  BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (indicating
         FOLL_GET was used) and BIO_PAGE_PINNED (indicating FOLL_PIN was used)
         is added.
    
         BIO_PAGE_REFFED will go away, but at the moment fs/direct-io.c sets it
         and this series does not fully address that file.
    
     (7) Add a function, bio_release_page(), to release a page appropriately to
         the cleanup mode indicated by the BIO_PAGE_* flags.
    
     (8) Make the iter-to-bio code use iov_iter_extract_pages() to retain the
         pages appropriately and clean them up later.
    
     (9) Fix bio_flagged() so that it doesn't prevent a gcc optimisation."
    
    * 'iov-extract' of https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (299 commits)
      block: convert bio_map_user_iov to use iov_iter_extract_pages
      block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
      block: Add BIO_PAGE_PINNED and associated infrastructure
      block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic
      block: Fix bio_flagged() so that gcc can better optimise it
      iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
      iov_iter: Add a function to extract a page list from an iterator
      iov_iter: Define flags to qualify page extraction.
      iov_iter: Kill ITER_PIPE
      splice: Do splice read from a file without using ITER_PIPE
      tty, proc, kernfs, random: Use direct_splice_read()
      coda: Implement splice-read
      overlayfs: Implement splice-read
      shmem: Implement splice-read
      splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE
      splice: Add a func to do a splice from a buffered file without ITER_PIPE
      mm: Pass info, not iter, into filemap_get_pages()
      Linux 6.2-rc7
      fbcon: Check font dimension limits
      efi: fix potential NULL deref in efi_mem_reserve_persistent
      ...
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Feb 14, 2023
    651d77d
  4. Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"

    This reverts commit 84d7d46.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 14, 2023
    a06377c
  5. Revert "blk-cgroup: pass a gendisk to blkg_lookup"

    This reverts commit 821e840c08ad83736eced4037cdad864e95e2584.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20230214183308.1658775-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 14, 2023
    9a9c261
  6. Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"

    This reverts commit 178fa7d.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20230214183308.1658775-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 14, 2023
    b6553be
  7. Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"

    This reverts commit c43332f as it is not
    needed without moving to disk references in the blkg.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20230214183308.1658775-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 14, 2023
    b4e94f9
  8. Revert "blk-cgroup: move the cgroup information to struct gendisk"

    This reverts commit 3f13ab7 as a patch
    it depends on caused a few problems.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 14, 2023
    1231039
  9. block: convert bio_map_user_iov to use iov_iter_extract_pages

    This will pin pages or leave them unaltered rather than getting a ref on
    them as appropriate to the iterator.
    
    The pages need to be pinned for DIO rather than having refs taken on them
    to prevent VM copy-on-write from malfunctioning during a concurrent fork()
    (the result of the I/O could otherwise end up being visible to/affected by
    the child process).
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Jan Kara <jack@suse.cz>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Logan Gunthorpe <logang@deltatee.com>
    cc: linux-block@vger.kernel.org
    dhowells committed Feb 14, 2023
    07a6467
  10. block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages

    This will pin pages or leave them unaltered rather than getting a ref on
    them as appropriate to the iterator.
    
    The pages need to be pinned for DIO rather than having refs taken on them to
    prevent VM copy-on-write from malfunctioning during a concurrent fork() (the
    result of the I/O could otherwise end up being affected by/visible to the
    child process).
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Jan Kara <jack@suse.cz>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Logan Gunthorpe <logang@deltatee.com>
    cc: linux-block@vger.kernel.org
    dhowells committed Feb 14, 2023
    6bc4b6f
  11. block: Add BIO_PAGE_PINNED and associated infrastructure

    Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned
    (FOLL_PIN) and that the pin will need removing.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Jan Kara <jack@suse.cz>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Logan Gunthorpe <logang@deltatee.com>
    cc: linux-block@vger.kernel.org
    dhowells committed Feb 14, 2023
    05ff102
  12. block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic

    Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
    meaning: it is only set when a page reference has been acquired that needs to
    be released by bio_release_pages().
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Jan Kara <jack@suse.cz>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Logan Gunthorpe <logang@deltatee.com>
    cc: linux-block@vger.kernel.org
    Christoph Hellwig authored and dhowells committed Feb 14, 2023
    c0ba657
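
    The inversion described in this commit can be illustrated with a small user-space sketch. The bit values and the `sketch_*` helpers are hypothetical; only the flag names come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments; the real values are not given here. */
#define SKETCH_OLD_NO_PAGE_REF	(1U << 0)
#define SKETCH_NEW_PAGE_REFFED	(1U << 0)

/* Old scheme: a reference is dropped unless the flag says otherwise. */
static bool sketch_old_must_put(unsigned int old_flags)
{
	return !(old_flags & SKETCH_OLD_NO_PAGE_REF);
}

/* New scheme: a reference is dropped only when the flag is set. */
static bool sketch_new_must_put(unsigned int new_flags)
{
	return new_flags & SKETCH_NEW_PAGE_REFFED;
}

/* Translate the old flag word to the new one (this one flag only). */
static unsigned int sketch_convert_flags(unsigned int old_flags)
{
	assert((old_flags & ~SKETCH_OLD_NO_PAGE_REF) == 0);
	return sketch_old_must_put(old_flags) ? SKETCH_NEW_PAGE_REFFED : 0;
}
```

    One consequence visible in the sketch is that a zeroed flag word now means "nothing to release", so the release path has to be opted into explicitly.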
  13. block: Fix bio_flagged() so that gcc can better optimise it

    Fix bio_flagged() so that multiple instances of it, such as:
    
    	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
    	    bio_flagged(bio, BIO_PAGE_PINNED))
    
    can be combined by the gcc optimiser into a single test in assembly
    (arguably, this is a compiler optimisation issue[1]).
    
    The missed optimisation stems from bio_flagged() comparing the result of
    the bitwise-AND to zero.  This results in an out-of-line bio_release_page()
    being compiled to something like:
    
       <+0>:     mov    0x14(%rdi),%eax
       <+3>:     test   $0x1,%al
       <+5>:     jne    0xffffffff816dac53 <bio_release_pages+11>
       <+7>:     test   $0x2,%al
       <+9>:     je     0xffffffff816dac5c <bio_release_pages+20>
       <+11>:    movzbl %sil,%esi
       <+15>:    jmp    0xffffffff816daba1 <__bio_release_pages>
       <+20>:    jmp    0xffffffff81d0b800 <__x86_return_thunk>
    
    However, the test is superfluous as the return type is bool.  Removing it
    results in:
    
       <+0>:     testb  $0x3,0x14(%rdi)
       <+4>:     je     0xffffffff816e4af4 <bio_release_pages+15>
       <+6>:     movzbl %sil,%esi
       <+10>:    jmp    0xffffffff816dab7c <__bio_release_pages>
       <+15>:    jmp    0xffffffff81d0b7c0 <__x86_return_thunk>
    
    instead.
    
    Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
    're-booling' the mark_dirty parameter.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: linux-block@vger.kernel.org
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1]
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2]
    Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6
    dhowells committed Feb 14, 2023
    758a53d
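
    The two forms of the flag test discussed in this commit can be written side by side in user-space C. This is a sketch with hypothetical `sketch_*` names and assumed bit positions (0 and 1, chosen to match the masks in the assembly above); it shows that both forms return the same result, not what any particular compiler emits for them.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit positions; which flag owns which bit is a guess. */
enum { SKETCH_BIO_PAGE_PINNED = 0, SKETCH_BIO_PAGE_REFFED = 1 };

struct sketch_bio {
	unsigned short flags;
};

/* Before: explicit comparison of the masked value against zero. */
static inline bool sketch_flagged_old(const struct sketch_bio *bio,
				      unsigned int bit)
{
	assert(bit < 16);
	return (bio->flags & (1U << bit)) != 0;
}

/* After: rely on the implicit conversion of the masked value to bool. */
static inline bool sketch_flagged_new(const struct sketch_bio *bio,
				      unsigned int bit)
{
	assert(bit < 16);
	return bio->flags & (1U << bit);
}

/* The combined test that an optimiser may fold into one mask check. */
static bool sketch_needs_release(const struct sketch_bio *bio)
{
	return sketch_flagged_new(bio, SKETCH_BIO_PAGE_REFFED) ||
	       sketch_flagged_new(bio, SKETCH_BIO_PAGE_PINNED);
}
```

    Whether the two tests are merged into a single instruction depends on the compiler and its version, so the generated code should be inspected rather than assumed.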
  14. iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing

    ZERO_PAGE can't go away, no need to hold an extra reference.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: David Hildenbrand <david@redhat.com>
    cc: linux-fsdevel@vger.kernel.org
    dhowells committed Feb 14, 2023
    cc2d046
  15. iov_iter: Add a function to extract a page list from an iterator

    Add a function, iov_iter_extract_pages(), to extract a list of pages from
    an iterator.  The pages may be returned with a pin added or nothing,
    depending on the type of iterator.
    
    Add a second function, iov_iter_extract_will_pin(), to determine how the
    cleanup should be done.
    
    There are two cases:
    
     (1) ITER_IOVEC or ITER_UBUF iterator.
    
         Extracted pages will have pins (FOLL_PIN) obtained on them so that a
         concurrent fork() will forcibly copy the page so that DMA is done
         to/from the parent's buffer and is unavailable to/unaffected by the
         child process.
    
         iov_iter_extract_will_pin() will return true for this case.  The
         caller should use something like unpin_user_page() to dispose of the
         page.
    
     (2) Any other sort of iterator.
    
         No refs or pins are obtained on the page, the assumption is made that
         the caller will manage page retention.
    
         iov_iter_extract_will_pin() will return false.  The pages don't need
         additional disposal.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    dhowells committed Feb 14, 2023
    0fff5a3
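
    The two cases above can be modelled in user-space C as follows. The `sketch_*` types and functions are hypothetical and only mimic the decision the message describes (pin for user-backed iterators, nothing otherwise); they are not the kernel's iov_iter API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator kinds; the first two are user-backed. */
enum sketch_iter_type {
	SKETCH_ITER_IOVEC,
	SKETCH_ITER_UBUF,
	SKETCH_ITER_BVEC,
	SKETCH_ITER_KVEC,
};

struct sketch_page {
	int pincount;
};

/* Case (1) pins, case (2) leaves the pages untouched. */
static bool sketch_extract_will_pin(enum sketch_iter_type type)
{
	return type == SKETCH_ITER_IOVEC || type == SKETCH_ITER_UBUF;
}

/* "Extract" pages: take a pin only when the iterator type calls for it. */
static void sketch_extract_pages(enum sketch_iter_type type,
				 struct sketch_page *pages, size_t n)
{
	if (!sketch_extract_will_pin(type))
		return;
	for (size_t i = 0; i < n; i++)
		pages[i].pincount++;
}

/* Cleanup mirrors extraction: unpin only what was pinned. */
static void sketch_release_pages(enum sketch_iter_type type,
				 struct sketch_page *pages, size_t n)
{
	if (!sketch_extract_will_pin(type))
		return;
	for (size_t i = 0; i < n; i++) {
		assert(pages[i].pincount > 0);
		pages[i].pincount--;
	}
}
```

    The point of the sketch is that the caller never inspects the pages to decide on cleanup; the iterator type alone determines it.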
  16. iov_iter: Define flags to qualify page extraction.

    Define flags to qualify page extraction to pass into iov_iter_*_pages*()
    rather than passing in FOLL_* flags.
    
    For now only a flag to allow peer-to-peer DMA is supported.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Logan Gunthorpe <logang@deltatee.com>
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-block@vger.kernel.org
    dhowells committed Feb 14, 2023
    51e851b
  17. iov_iter: Kill ITER_PIPE

    The ITER_PIPE-type iterator was only used for generic_file_splice_read(),
    but that has now been switched to either pull pages directly from the
    pagecache for buffered file splice-reads or to use ITER_BVEC instead for
    O_DIRECT file splice-reads.  This leaves ITER_PIPE unused - so remove it.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: David Hildenbrand <david@redhat.com>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: linux-mm@kvack.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    dhowells committed Feb 14, 2023
    45b9487
  18. splice: Do splice read from a file without using ITER_PIPE

    Make generic_file_splice_read() use filemap_splice_read() and
    direct_splice_read() rather than using an ITER_PIPE and call_read_iter().
    
    With this, ITER_PIPE is no longer used.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: David Hildenbrand <david@redhat.com>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: linux-mm@kvack.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    dhowells committed Feb 14, 2023
    8dfb4a0
  19. tty, proc, kernfs, random: Use direct_splice_read()

    Use direct_splice_read() for tty, procfs, kernfs and random files rather
    than going through generic_file_splice_read() as they just copy the file
    into the output buffer and don't splice pages.  This avoids the need for
    them to have a ->read_folio() to satisfy filemap_splice_read().
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Miklos Szeredi <miklos@szeredi.hu>
    cc: Arnd Bergmann <arnd@arndb.de>
    cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    dhowells committed Feb 14, 2023
    5c93cea
  20. coda: Implement splice-read

    Implement splice-read for coda by passing the request down a layer rather
    than going through generic_file_splice_read() which is going to be changed
    to assume that ->read_folio() is present on buffered files.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Jan Harkes <jaharkes@cs.cmu.edu>
    cc: coda@cs.cmu.edu
    cc: codalist@coda.cs.cmu.edu
    cc: linux-unionfs@vger.kernel.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    dhowells committed Feb 14, 2023
    508918e
  21. overlayfs: Implement splice-read

    Implement splice-read for overlayfs by passing the request down a layer
    rather than going through generic_file_splice_read() which is going to be
    changed to assume that ->read_folio() is present on buffered files.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Miklos Szeredi <miklos@szeredi.hu>
    cc: linux-unionfs@vger.kernel.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    dhowells committed Feb 14, 2023
    02d1dac
  22. shmem: Implement splice-read

    The new filemap_splice_read() has an implicit expectation via
    filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
    fully populate the pagecache of the file it is reading from[1], potentially
    leading to a jump to NULL if this doesn't exist.  shmem, however, (and by
    extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio().
    
    Work around this by equipping shmem with its own splice-read
    implementation, based on filemap_splice_read(), but able to paste in
    zero_page when there's a page missing.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Daniel Golle <daniel@makrotopia.org>
    cc: Guenter Roeck <groeck7@gmail.com>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Hugh Dickins <hughd@google.com>
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    Link: https://lore.kernel.org/r/Y+pdHFFTk1TTEBsO@makrotopia.org/ [1]
    dhowells committed Feb 14, 2023
    cc606dc
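
    The "paste in a zero page when a page is missing" idea can be shown with a toy page cache in user-space C. Everything here (`sketch_zero_page`, `sketch_page_for_splice`, the array-of-pointers cache) is a hypothetical model, not shmem's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096

/* One shared, read-only page of zeroes, standing in for the zero page. */
static const unsigned char sketch_zero_page[SKETCH_PAGE_SIZE] = { 0 };

/*
 * Return the page to splice for 'index'. A NULL slot models a page that
 * is absent from the cache; instead of trying to read it in, hand back
 * the shared zero page.
 */
static const unsigned char *
sketch_page_for_splice(const unsigned char *const *cache, size_t nr_pages,
		       size_t index)
{
	assert(cache != NULL);
	assert(index < nr_pages);
	if (cache[index] != NULL)
		return cache[index];
	return sketch_zero_page;
}
```

    Because a hole is satisfied from the shared zero page, no read-in callback is ever needed in this model, which is the property the workaround relies on.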
  23. splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE
    
    Implement a function, direct_file_splice(), that deals with this by using
    an ITER_BVEC iterator instead of an ITER_PIPE iterator as the former won't
    free its buffers when reverted.  The function bulk allocates all the
    buffers it thinks it is going to use in advance, does the read
    synchronously and only then trims the buffer down.  The pages we did use
    get pushed into the pipe.
    
    This fixes a problem with the upcoming iov_iter_extract_pages() function,
    whereby pages extracted from a non-user-backed iterator such as ITER_PIPE
    aren't pinned.  __iomap_dio_rw(), however, calls iov_iter_revert() to
    shorten the iterator to just the bufferage it is going to use - which has
    the side-effect of freeing the excess pipe buffers, even though they're
    attached to a bio and may get written to by DMA (thanks to Hillf Danton for
    spotting this[1]).
    
    This then causes memory corruption that is particularly noticeable when the
    syzbot test[2] is run.  The test boils down to:
    
    	out = creat(argv[1], 0666);
    	ftruncate(out, 0x800);
    	lseek(out, 0x200, SEEK_SET);
    	in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
    	sendfile(out, in, NULL, 0x1dd00);
    
    run repeatedly in parallel.  What I think is happening is that ftruncate()
    occasionally shortens the DIO read that's about to be made by sendfile's
    splice core by reducing i_size.
    
    This should be more efficient for DIO read by virtue of doing a bulk page
    allocation, but slightly less efficient by ignoring any partial page in the
    pipe.
    
    Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: David Hildenbrand <david@redhat.com>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: linux-mm@kvack.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1]
    Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
    dhowells committed Feb 14, 2023
    f2aa2c5
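
    The hazard described in this commit, where shortening a pipe-backed iterator frees buffers that I/O may still target, can be reduced to a toy model in user-space C. The `sketch_*` names are hypothetical and the model tracks only whether each buffer is still allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NBUFS 4

enum sketch_kind { SKETCH_PIPE_LIKE, SKETCH_BVEC_LIKE };

struct sketch_iter {
	enum sketch_kind kind;
	bool live[SKETCH_NBUFS];	/* true while buffer i is allocated */
	size_t used;			/* buffers covered by the iterator */
};

static void sketch_iter_init(struct sketch_iter *it, enum sketch_kind kind)
{
	it->kind = kind;
	it->used = SKETCH_NBUFS;
	for (size_t i = 0; i < SKETCH_NBUFS; i++)
		it->live[i] = true;
}

/* Shorten the iterator by 'n' buffers. */
static void sketch_iter_revert(struct sketch_iter *it, size_t n)
{
	assert(n <= it->used);
	it->used -= n;
	if (it->kind != SKETCH_PIPE_LIKE)
		return;
	/* Pipe-like behaviour: excess buffers are released immediately. */
	for (size_t i = it->used; i < SKETCH_NBUFS; i++)
		it->live[i] = false;
}

/* Would a DMA write into buffer 'i' land in freed memory? */
static bool sketch_dma_target_freed(const struct sketch_iter *it, size_t i)
{
	assert(i < SKETCH_NBUFS);
	return !it->live[i];
}
```

    In the model, reverting the pipe-like iterator leaves its trailing buffers freed while an in-flight transfer could still reference them; the bvec-like iterator keeps them allocated until the caller trims them.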
  24. splice: Add a func to do a splice from a buffered file without ITER_PIPE

    Provide a function to do splice read from a buffered file, pulling the
    folios out of the pagecache directly by calling filemap_get_pages() to do
    any required reading and then pasting the returned folios into the pipe.
    
    A helper function is provided to do the actual folio pasting and will
    handle multipage folios by splicing as many of the relevant subpages as
    will fit into the pipe.
    
    The code is loosely based on filemap_read() and might belong in
    mm/filemap.c with that as it needs to use filemap_get_pages().
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: David Hildenbrand <david@redhat.com>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: linux-mm@kvack.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    dhowells committed Feb 14, 2023
    a53cad0
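
    The "splice as many subpages as will fit" behaviour of the helper can be sketched as simple arithmetic in user-space C. The function name, the result struct, and the fixed 4096-byte page size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

struct sketch_splice_result {
	unsigned int bufs;	/* pipe slots consumed */
	size_t bytes;		/* bytes actually spliced */
};

/*
 * Walk a byte range inside a multipage folio one subpage at a time,
 * stopping when the range is exhausted or the pipe has no free slots.
 */
static struct sketch_splice_result
sketch_splice_folio(size_t offset, size_t len, unsigned int free_slots)
{
	struct sketch_splice_result r = { 0, 0 };

	assert(offset + len >= offset);	/* guard against wrap-around */
	while (len > 0 && r.bufs < free_slots) {
		size_t in_page = offset % SKETCH_PAGE_SIZE;
		size_t part = SKETCH_PAGE_SIZE - in_page;

		if (part > len)
			part = len;
		offset += part;
		len -= part;
		r.bytes += part;
		r.bufs++;
	}
	return r;
}
```

    For example, a 10000-byte range starting 512 bytes into a folio needs three slots (3584 + 4096 + 2320 bytes); with only two free slots the sketch stops after 7680 bytes and the caller would have to continue later.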
  25. mm: Pass info, not iter, into filemap_get_pages()

    filemap_get_pages() and a number of functions that it calls take an
    iterator to provide two things: the number of bytes to be got from the file
    specified and whether partially uptodate pages are allowed.  Change these
    functions so that this information is passed in directly.  This allows it
    to be called without having an iterator to hand.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: David Hildenbrand <david@redhat.com>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: linux-mm@kvack.org
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    dhowells committed Feb 14, 2023
    78e11ab

Commits on Feb 13, 2023

  1. block: ublk: check IO buffer based on flag need_get_data

    Currently, a uring_cmd with UBLK_IO_FETCH_REQ or
    UBLK_IO_COMMIT_AND_FETCH_REQ is always checked for whether the
    userspace server has provided an IO buffer, even if the flag
    UBLK_F_NEED_GET_DATA is configured.
    
    This is an excessive check. If UBLK_F_NEED_GET_DATA is
    configured, FETCH_REQ doesn't need to provide an IO buffer;
    COMMIT_AND_FETCH_REQ also doesn't need to do that if
    the IO type is not READ.
    
    Check ub_cmd->addr together with ublk_need_get_data()
    and IO type in ublk_ch_uring_cmd().
    
    With this fix, a userspace server no longer needs to reserve a
    buffer for every ublk_io when the flag UBLK_F_NEED_GET_DATA
    is configured, which saves memory.
    
    Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
    Fixes: c86019f ("ublk_drv: add support for UBLK_IO_NEED_GET_DATA")
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20230210141356.112321-1-xiaodong.liu@intel.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    dong-liuliu authored and axboe committed Feb 13, 2023
    2f1e07d
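
    The relaxed buffer check described in this commit reduces to a small predicate. The sketch below is user-space C with hypothetical `sketch_*` names; it encodes only the rules stated in the message and is not the code in ublk_ch_uring_cmd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two ublk commands discussed above. */
enum sketch_ublk_cmd { SKETCH_FETCH_REQ, SKETCH_COMMIT_AND_FETCH_REQ };

/* Must the server supply an IO buffer address with this command? */
static bool sketch_buffer_required(enum sketch_ublk_cmd cmd,
				   bool need_get_data, bool io_is_read)
{
	assert(cmd == SKETCH_FETCH_REQ || cmd == SKETCH_COMMIT_AND_FETCH_REQ);
	if (!need_get_data)
		return true;		/* no GET_DATA step: always needed */
	if (cmd == SKETCH_FETCH_REQ)
		return false;		/* buffer can be supplied later */
	return io_is_read;		/* COMMIT_AND_FETCH: only for READ */
}

/* Accept the command if an address was given or none is required. */
static bool sketch_cmd_addr_ok(enum sketch_ublk_cmd cmd, uint64_t addr,
			       bool need_get_data, bool io_is_read)
{
	return addr != 0 || !sketch_buffer_required(cmd, need_get_data,
						    io_is_read);
}
```

    The `io_is_read` argument is ignored for the fetch command in this sketch, since the message ties the READ condition only to COMMIT_AND_FETCH_REQ.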

Commits on Feb 10, 2023

  1. Merge branch 'for-6.3/io_uring' into for-next

    * for-6.3/io_uring: (50 commits)
      io_uring,audit: don't log IORING_OP_MADVISE
      io_uring: mark task TASK_RUNNING before handling resume/task work
      io_uring: always go async for unsupported open flags
      io_uring: always go async for unsupported fadvise flags
      io_uring: for requests that require async, force it
      io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async
      io_uring: add reschedule point to handle_tw_list()
      io_uring: add a conditional reschedule to the IOPOLL cancelation loop
      io_uring: return normal tw run linking optimisation
      io_uring: refactor tctx_task_work
      io_uring: refactor io_put_task helpers
      io_uring: refactor req allocation
      io_uring: improve io_get_sqe
      io_uring: kill outdated comment about overflow flush
      io_uring: use user visible tail in io_uring_poll()
      io_uring: pass in io_issue_def to io_assign_file()
      io_uring: Enable KASAN for request cache
      io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
      io_uring/msg-ring: ensure flags passing works for task_work completions
      io_uring: Split io_issue_def struct
      ...
    axboe committed Feb 10, 2023
    6938b81
  2. io_uring,audit: don't log IORING_OP_MADVISE

    fadvise and madvise both provide hints for caching or access pattern for
    file and memory respectively.  Skip them.
    
    Fixes: 5bd2182 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Link: https://lore.kernel.org/r/b5dfdcd541115c86dbc774aa9dd502c964849c5f.1675282642.git.rgb@redhat.com
    Acked-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    rgbriggs authored and axboe committed Feb 10, 2023
    fbe870a
  3. Merge branch 'for-6.3/iter-ubuf' into for-next

    * for-6.3/iter-ubuf:
      block: use iter_ubuf for single range
      iov_iter: move iter_ubuf check inside restore WARN
      io_uring: use iter_ubuf for single range imports
      io_uring: switch network send/recv to ITER_UBUF
      iov: add import_ubuf()
    axboe committed Feb 10, 2023
    84dadf6
  4. Merge branch 'for-6.3/dio' into for-next

    * for-6.3/dio:
      fs: build the legacy direct I/O code conditionally
      fs: move sb_init_dio_done_wq out of direct-io.c
    axboe committed Feb 10, 2023
    e2499c3
  5. Merge tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
    
    Pull ARM SoC fixes from Arnd Bergmann:
     "All the changes this time are minor devicetree corrections, the
      majority being for 64-bit Rockchip SoC support. These are a couple of
      corrections for properties that are in violation of the binding, some
      that put the machine into safer operating points for the eMMC and
      thermal settings, and missing properties that prevented rk356x PCIe
      and ethernet from working correctly.
    
      The changes for amlogic and mediatek address incorrect properties that
      were preventing the display support on MT8195 and the MMC support on
      various Meson SoCs from working correctly.
    
      The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner to
      allow this to be used correctly after a future driver change, though it
      has no effect on older kernels"
    
    * tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
      arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
      arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
      arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
      ARM: dts: stihxxx-b2120: fix polarity of reset line of tsin0 port
      arm64: dts: mediatek: mt8195: Fix vdosys* compatible strings
      arm64: dts: rockchip: align rk3399 DMC OPP table with bindings
      arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a
      arm64: dts: rockchip: fix probe of analog sound card on rock-3a
      arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1
      arm64: dts: rockchip: fix input enable pinconf on rk3399
      ARM: dts: rockchip: add power-domains property to dp node on rk3288
      arm64: dts: rockchip: add io domain setting to rk3566-box-demo
      arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a
      arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
      arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro
      arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
    torvalds committed Feb 10, 2023
    4f72a26
  6. Merge tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
    
    Pull RISC-V fixes from Palmer Dabbelt:
     "This is a little bigger than I'd hope for this late in the cycle, but
      they're all pretty concrete fixes and the only one that's bigger than
      a few lines is pmdp_collapse_flush() (which is almost all
      boilerplate/comment). It's also all bug fixes for issues that have
      been around for a while.
    
      So I think it's not all that scary, just bad timing.
    
       - avoid partial TLB fences for huge pages, which are disallowed by
         the ISA
    
       - avoid missing a frame when dumping stacks
    
       - avoid misaligned accesses (and possibly overflows) in kprobes
    
       - fix a race condition in tracking page dirtiness"
    
    * tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
      riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
      riscv: kprobe: Fixup misaligned load text
      riscv: stacktrace: Fix missing the first frame
      riscv: mm: Implement pmdp_collapse_flush for THP
    torvalds committed Feb 10, 2023
    8e9a842
  7. Merge tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client

    Pull ceph fix from Ilya Dryomov:
     "A fix for a pretty embarrassing omission in the session flush handler
      from Xiubo, marked for stable"
    
    * tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client:
      ceph: flush cap releases when the session is flushed
    torvalds committed Feb 10, 2023
    3647d2d
  8. Merge tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux

    Pull block fix from Jens Axboe:
     "A single fix for a smatch regression introduced in this merge window"
    
    * tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux:
      nvme-auth: mark nvme_auth_wq static
    torvalds committed Feb 10, 2023
    2971668