Commits
Uday-Shankar/b…
Name already in use
Commits on Feb 15, 2023
-
blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
The block layer might merge together discard requests up until the max_discard_segments limit is hit, but blk_insert_cloned_request checks the segment count against max_segments regardless of the req op. This can result in errors like the following when discards are issued through a DM device and max_discard_segments exceeds max_segments for the queue of the chosen underlying device. blk_insert_cloned_request: over max segments limit. (256 > 129) Fix this by looking at the req_op and enforcing the appropriate segment limit - max_discard_segments for REQ_OP_DISCARDs and max_segments for everything else. Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Commits on Feb 14, 2023
-
Merge branch 'for-6.3/iov-extract' into for-next
* for-6.3/iov-extract: (177 commits) mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM Revert "blk-cgroup: pin the gendisk in struct blkcg_gq" Revert "blk-cgroup: pass a gendisk to blkg_lookup" Revert "blk-cgroup: delay blk-cgroup initialization until add_disk" Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release" Revert "blk-cgroup: move the cgroup information to struct gendisk" block: convert bio_map_user_iov to use iov_iter_extract_pages block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages block: Add BIO_PAGE_PINNED and associated infrastructure block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic block: Fix bio_flagged() so that gcc can better optimise it iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing iov_iter: Add a function to extract a page list from an iterator iov_iter: Define flags to qualify page extraction. iov_iter: Kill ITER_PIPE splice: Do splice read from a file without using ITER_PIPE tty, proc, kernfs, random: Use direct_splice_read() coda: Implement splice-read overlayfs: Implement splice-read shmem: Implement splice-read ...
-
mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM
Using FOLL_PIN for mapping user pages caused a performance regression of about 2.7%. Looking at profiles, we see: +2.71% [kernel.vmlinux] [k] mod_node_page_state which wasn't there before. The node page state counters are percpu, but with a very low threshold. On my setup, every 108th update ends up needing to punt to two atomic_lond_add()'s, which is causing this above regression. As these counters are purely for debug purposes, move them under CONFIG_DEBUG_VM rather than do them unconditionally. Note that this commit does not fix a real bug with the commits identified as being fixed, rather it ensures that we don't regress on performance due to those commits moving to using FOLL_PIN rather than FOLL_GET. Fixes: 33f4320 ("block: convert bio_map_user_iov to use iov_iter_extract_pages") Fixes: b699de6 ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages") Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/ Link: https://lore.kernel.org/all/54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk/ Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Merge branch 'iov-extract' of https://git.kernel.org/pub/scm/linux/ke…
…rnel/git/dhowells/linux-fs into for-6.3/iov-extract Pull iov-extract from David: "Here are patches to provide support for extracting pages from an iov_iter and to use this in the extraction functions in the block layer bio code. The patches make the following changes: (1) Change generic_file_splice_read() to no longer use ITER_PIPE for doing a read from an O_DIRECT file fd, but rather load up an ITER_BVEC iterator with sufficient pages and use that rather than using an ITER_PIPE. This avoids a problem[2] when __iomap_dio_rw() calls iov_iter_revert() to shorten an iterator when it races with truncation. The reversion causes the pipe iterator to prematurely release the pages it was retaining - despite the read still being in progress. This caused memory corruption. (2) Change generic_file_splice_read() to no longer use ITER_PIPE for doing a read from a buffered file fd, but rather get pages directly from the pagecache using filemap_get_pages() do all the readahead, reading, waiting and extraction, and then feed the pages directly into the pipe. (3) filemap_get_pages() is altered so that it doesn't take an iterator (which we don't have in (2)), but rather the count and a flag indicating if we can handle partially uptodate pages are passed in and down to its subsidiary functions. (4) Remove ITER_PIPE and its paraphernalia as generic_file_splice_read() was the only user. (5) Add a function, iov_iter_extract_pages() to replace iov_iter_get_pages*() that gets refs, pins or just lists the pages as appropriate to the iterator type. Add a function, iov_iter_extract_will_pin() that will indicate from the iterator type how the cleanup is to be performed, returning true if the pages will need unpinning, false otherwise. (6) Make the bio struct carry a pair of flags to indicate the cleanup mode. BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (indicating FOLL_GET was used) and BIO_PAGE_PINNED (indicating FOLL_PIN was used) is added. BIO_PAGE_REFFED will go away, but at the moment fs/direct-io.c sets it and this series does not fully address that file. (7) Add a function, bio_release_page(), to release a page appropriately to the cleanup mode indicated by the BIO_PAGE_* flags. (8) Make the iter-to-bio code use iov_iter_extract_pages() to retain the pages appropriately and clean them up later. (9) Fix bio_flagged() so that it doesn't prevent a gcc optimisation." * 'iov-extract' of https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (299 commits) block: convert bio_map_user_iov to use iov_iter_extract_pages block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages block: Add BIO_PAGE_PINNED and associated infrastructure block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic block: Fix bio_flagged() so that gcc can better optimise it iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing iov_iter: Add a function to extract a page list from an iterator iov_iter: Define flags to qualify page extraction. iov_iter: Kill ITER_PIPE splice: Do splice read from a file without using ITER_PIPE tty, proc, kernfs, random: Use direct_splice_read() coda: Implement splice-read overlayfs: Implement splice-read shmem: Implement splice-read splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE splice: Add a func to do a splice from a buffered file without ITER_PIPE mm: Pass info, not iter, into filemap_get_pages() Linux 6.2-rc7 fbcon: Check font dimension limits efi: fix potential NULL deref in efi_mem_reserve_persistent ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
This reverts commit 84d7d46. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Revert "blk-cgroup: pass a gendisk to blkg_lookup"
This reverts commit 821e840c08ad83736eced4037cdad864e95e2584. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
This reverts commit 178fa7d. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
This reverts commit c43332f as it is not needed without moving to disk references in the blkg. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Revert "blk-cgroup: move the cgroup information to struct gendisk"
This reverts commit 3f13ab7 as a patch it depends on caused a few problems. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
block: convert bio_map_user_iov to use iov_iter_extract_pages
This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O could otherwise end up being visible to/affected by the child process). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org
-
block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O could otherwise end up being affected by/visible to the child process). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org
-
block: Add BIO_PAGE_PINNED and associated infrastructure
Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org
-
block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic
Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org
-
block: Fix bio_flagged() so that gcc can better optimise it
Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) || bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <bio_release_pages+11> <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <bio_release_pages+20> <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <bio_release_pages+15> <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6
-
iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
ZERO_PAGE can't go away, no need to hold an extra reference. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: linux-fsdevel@vger.kernel.org
-
iov_iter: Add a function to extract a page list from an iterator
Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a pin added or nothing, depending on the type of iterator. Add a second function, iov_iter_extract_will_pin(), to determine how the cleanup should be done. There are two cases: (1) ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins (FOLL_PIN) obtained on them so that a concurrent fork() will forcibly copy the page so that DMA is done to/from the parent's buffer and is unavailable to/unaffected by the child process. iov_iter_extract_will_pin() will return true for this case. The caller should use something like unpin_user_page() to dispose of the page. (2) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_will_pin() will return false. The pages don't need additional disposal. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org -
iov_iter: Define flags to qualify page extraction.
Define flags to qualify page extraction to pass into iov_iter_*_pages*() rather than passing in FOLL_* flags. For now only a flag to allow peer-to-peer DMA is supported. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org
-
The ITER_PIPE-type iterator was only used for generic_file_splice_read(), but that has now been switched to either pull pages directly from the pagecache for buffered file splice-reads or to use ITER_BVEC instead for O_DIRECT file splice-reads. This leaves ITER_PIPE unused - so remove it. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
-
splice: Do splice read from a file without using ITER_PIPE
Make generic_file_splice_read() use filemap_splice_read() and direct_splice_read() rather than using an ITER_PIPE and call_read_iter(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
-
tty, proc, kernfs, random: Use direct_splice_read()
Use direct_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into the output buffer and don't splice pages. This avoids the need for them to have a ->read_folio() to satisfy filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Arnd Bergmann <arnd@arndb.de> cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
-
Implement splice-read for coda by passing the request down a layer rather than going through generic_file_splice_read() which is going to be changed to assume that ->read_folio() is present on buffered files. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Harkes <jaharkes@cs.cmu.edu> cc: coda@cs.cmu.edu cc: codalist@coda.cs.cmu.edu cc: linux-unionfs@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
-
overlayfs: Implement splice-read
Implement splice-read for overlayfs by passing the request down a layer rather than going through generic_file_splice_read() which is going to be changed to assume that ->read_folio() is present on buffered files. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: linux-unionfs@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
-
The new filemap_splice_read() has an implicit expectation via filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't fully populate the pagecache of the file it is reading from[1], potentially leading to a jump to NULL if this doesn't exist. shmem, however, (and by extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio(), Work around this by equipping shmem with its own splice-read implementation, based on filemap_splice_read(), but able to paste in zero_page when there's a page missing. Signed-off-by: David Howells <dhowells@redhat.com> cc: Daniel Golle <daniel@makrotopia.org> cc: Guenter Roeck <groeck7@gmail.com> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Hugh Dickins <hughd@google.com> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/Y+pdHFFTk1TTEBsO@makrotopia.org/ [1]
-
splice: Add a func to do a splice from an O_DIRECT file without ITER_…
…PIPE Implement a function, direct_file_splice(), that deals with this by using an ITER_BVEC iterator instead of an ITER_PIPE iterator as the former won't free its buffers when reverted. The function bulk allocates all the buffers it thinks it is going to use in advance, does the read synchronously and only then trims the buffer down. The pages we did use get pushed into the pipe. This fixes a problem with the upcoming iov_iter_extract_pages() function, whereby pages extracted from a non-user-backed iterator such as ITER_PIPE aren't pinned. __iomap_dio_rw(), however, calls iov_iter_revert() to shorten the iterator to just the bufferage it is going to use - which has the side-effect of freeing the excess pipe buffers, even though they're attached to a bio and may get written to by DMA (thanks to Hillf Danton for spotting this[1]). This then causes memory corruption that is particularly noticable when the syzbot test[2] is run. The test boils down to: out = creat(argv[1], 0666); ftruncate(out, 0x800); lseek(out, 0x200, SEEK_SET); in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW); sendfile(out, in, NULL, 0x1dd00); run repeatedly in parallel. What I think is happening is that ftruncate() occasionally shortens the DIO read that's about to be made by sendfile's splice core by reducing i_size. This should be more efficient for DIO read by virtue of doing a bulk page allocation, but slightly less efficient by ignoring any partial page in the pipe. Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1] Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
-
splice: Add a func to do a splice from a buffered file without ITER_PIPE
Provide a function to do splice read from a buffered file, pulling the folios out of the pagecache directly by calling filemap_get_pages() to do any required reading and then pasting the returned folios into the pipe. A helper function is provided to do the actual folio pasting and will handle multipage folios by splicing as many of the relevant subpages as will fit into the pipe. The code is loosely based on filemap_read() and might belong in mm/filemap.c with that as it needs to use filemap_get_pages(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
-
mm: Pass info, not iter, into filemap_get_pages()
filemap_get_pages() and a number of functions that it calls take an iterator to provide two things: the number of bytes to be got from the file specified and whether partially uptodate pages are allowed. Change these functions so that this information is passed in directly. This allows it to be called without having an iterator to hand. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Matthew Wilcox <willy@infradead.org> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
Commits on Feb 13, 2023
-
block: ublk: check IO buffer based on flag need_get_data
Currently, uring_cmd with UBLK_IO_FETCH_REQ or UBLK_IO_COMMIT_AND_FETCH_REQ is always checked whether userspace server has provided IO buffer even flag UBLK_F_NEED_GET_DATA is configured. This is a excessive check. If UBLK_F_NEED_GET_DATA is configured, FETCH_RQ doesn't need to provide IO buffer; COMMIT_AND_FETCH_REQ also doesn't need to do that if the IO type is not READ. Check ub_cmd->addr together with ublk_need_get_data() and IO type in ublk_ch_uring_cmd(). With this fix, userspace server doesn't need to preserve buffers for every ublk_io when flag UBLK_F_NEED_GET_DATA is configured, in order to save memory. Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com> Fixes: c86019f ("ublk_drv: add support for UBLK_IO_NEED_GET_DATA") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230210141356.112321-1-xiaodong.liu@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commits on Feb 10, 2023
-
Merge branch 'for-6.3/io_uring' into for-next
* for-6.3/io_uring: (50 commits) io_uring,audit: don't log IORING_OP_MADVISE io_uring: mark task TASK_RUNNING before handling resume/task work io_uring: always go async for unsupported open flags io_uring: always go async for unsupported fadvise flags io_uring: for requests that require async, force it io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async io_uring: add reschedule point to handle_tw_list() io_uring: add a conditional reschedule to the IOPOLL cancelation loop io_uring: return normal tw run linking optimisation io_uring: refactor tctx_task_work io_uring: refactor io_put_task helpers io_uring: refactor req allocation io_uring: improve io_get_sqe io_uring: kill outdated comment about overflow flush io_uring: use user visible tail in io_uring_poll() io_uring: pass in io_issue_def to io_assign_file() io_uring: Enable KASAN for request cache io_uring: handle TIF_NOTIFY_RESUME when checking for task_work io_uring/msg-ring: ensure flags passing works for task_work completions io_uring: Split io_issue_def struct ...
-
io_uring,audit: don't log IORING_OP_MADVISE
fadvise and madvise both provide hints for caching or access pattern for file and memory respectively. Skip them. Fixes: 5bd2182 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Link: https://lore.kernel.org/r/b5dfdcd541115c86dbc774aa9dd502c964849c5f.1675282642.git.rgb@redhat.com Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Merge branch 'for-6.3/iter-ubuf' into for-next
* for-6.3/iter-ubuf: block: use iter_ubuf for single range iov_iter: move iter_ubuf check inside restore WARN io_uring: use iter_ubuf for single range imports io_uring: switch network send/recv to ITER_UBUF iov: add import_ubuf()
-
Merge branch 'for-6.3/dio' into for-next
* for-6.3/dio: fs: build the legacy direct I/O code conditionally fs: move sb_init_dio_done_wq out of direct-io.c
-
Merge tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/ker…
…nel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "All the changes this time are minor devicetree corrections, the majority being for 64-bit Rockchip SoC support. These are a couple of corrections for properties that are in violation of the binding, some that put the machine into safer operating points for the eMMC and thermal settings, and missing properties that prevented rk356x PCIe and ethernet from working correctly. The changes for amlogic and mediatek address incorrect properties that were preventing the display support on MT8195 and the MMC support on various Meson SoCs from working correctly. The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner to allow this to be used correctly after a futre driver change, though it has no effect on older kernels" * tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive ARM: dts: stihxxx-b2120: fix polarity of reset line of tsin0 port arm64: dts: mediatek: mt8195: Fix vdosys* compatible strings arm64: dts: rockchip: align rk3399 DMC OPP table with bindings arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a arm64: dts: rockchip: fix probe of analog sound card on rock-3a arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1 arm64: dts: rockchip: fix input enable pinconf on rk3399 ARM: dts: rockchip: add power-domains property to dp node on rk3288 arm64: dts: rockchip: add io domain setting to rk3566-box-demo arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
-
Merge tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/l…
…inux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "This is a little bigger that I'd hope for this late in the cycle, but they're all pretty concrete fixes and the only one that's bigger than a few lines is pmdp_collapse_flush() (which is almost all boilerplate/comment). It's also all bug fixes for issues that have been around for a while. So I think it's not all that scary, just bad timing. - avoid partial TLB fences for huge pages, which are disallowed by the ISA - avoid missing a frame when dumping stacks - avoid misaligned accesses (and possibly overflows) in kprobes - fix a race condition in tracking page dirtiness" * tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte riscv: kprobe: Fixup misaligned load text riscv: stacktrace: Fix missing the first frame riscv: mm: Implement pmdp_collapse_flush for THP -
Merge tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov: "A fix for a pretty embarrassing omission in the session flush handler from Xiubo, marked for stable" * tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client: ceph: flush cap releases when the session is flushed
-
Merge tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe: "A single fix for a smatch regression introduced in this merge window" * tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux: nvme-auth: mark nvme_auth_wq static