Branch: v5.0-blk-for-b…
Commits on Mar 4, 2019
  1. block: don't check if adjacent bvecs in one bio can be mergeable

    Ming Lei
    Ming Lei committed Mar 1, 2019
    Now both passthrough and FS IO support multi-page bvecs, and bvec
    merging is already handled when a page is added to a bio, so adjacent
    bvecs can no longer be merged if they belong to the same bio.
    
    So only try to merge bvecs if they come from different bios.
    
    Cc: Omar Sandoval <osandov@fb.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
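
    Roughly, the rule after this change (a sketch with illustrative names,
    not the exact kernel code): only the boundary pair of bvecs between two
    bios remains a merge candidate, because in-bio merging already happened
    when the pages were added.

        /* sketch: bvecs inside one bio were merged at bio_add_page() time,
         * so only the last bvec of the previous bio and the first bvec of
         * the next bio need to be checked for physical merging */
        struct bio_vec *last  = bio_last_bvec_all(prev_bio);
        struct bio_vec *first = bio_first_bvec_all(next_bio);

        if (biovec_phys_mergeable(q, last, first))
                nr_phys_segs--;         /* the two bios share one segment */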
  2. block: enable multi-page bvec for passthrough IO

    Ming Lei
    Ming Lei committed Mar 1, 2019
    Now the block IO stack is basically ready to support multi-page bvecs;
    however, they aren't enabled for passthrough IO yet.
    
    One reason is that passthrough IO is dispatched to the LLD directly and
    bio splitting is bypassed, so the bio has to be built correctly for
    dispatch to the LLD from the beginning.
    
    Implement multi-page support for passthrough IO by limiting each bvec
    to a block device segment and applying all the queue limits in
    blk_add_pc_page(). Then we no longer need to calculate segments for
    passthrough IO, which simplifies the code considerably.
    
    Cc: Omar Sandoval <osandov@fb.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
  3. block: put the same page when adding it to bio

    Ming Lei
    Ming Lei committed Mar 4, 2019
    When the added page is merged into the bio's last page in
    bio_add_pc_page(), the caller may need to put this page to avoid a
    page leak.
    
    bio_map_user_iov() needs this kind of handling, and it currently deals
    with it itself in a hacky way.
    
    Move the put_page() handling into __bio_add_pc_page(), so
    bio_map_user_iov() can be simplified a bit, and more users may benefit
    from this change.
    
    Cc: Omar Sandoval <osandov@fb.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
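
    A minimal sketch of the idea (variable names here are assumptions, not
    the actual diff): once the page has been merged into the existing bvec,
    the extra reference the caller took for it must be dropped.

        /* sketch: the page was merged into the last bvec instead of being
         * stored in the bio again, so drop the caller's extra reference */
        if (same_page)
                put_page(page);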
Commits on Mar 3, 2019
  1. block: check if page is mergeable in one helper

    Ming Lei
    Ming Lei committed Mar 1, 2019
    The check for deciding whether a page can be merged into the current
    bvec has become a bit complicated, and we need to reuse the code before
    adding a pc page.
    
    So move the check into one dedicated helper.
    
    No functional change.
    
    Cc: Omar Sandoval <osandov@fb.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
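
    Conceptually, the check boils down to physical contiguity with the end
    of the current bvec; a simplified sketch (not the exact helper added by
    the patch):

        /* sketch: a page/offset can be merged into bv if it starts exactly
         * where the current bvec ends in physical memory */
        static bool page_mergeable_to_bvec(const struct bio_vec *bv,
                                           struct page *page, unsigned int off)
        {
                phys_addr_t bv_end = page_to_phys(bv->bv_page) +
                                     bv->bv_offset + bv->bv_len;

                return bv_end == page_to_phys(page) + off;
        }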
  2. block: don't merge adjacent bvecs to one segment in bio blk_queue_split

    Ming Lei
    Ming Lei committed Mar 1, 2019
    For normal filesystem IO, each page is added via bio_add_page(), in
    which the bvec (page) merge is handled already, so it is basically
    impossible to merge two adjacent bvecs in one bio.
    
    So don't try to merge two adjacent bvecs in blk_queue_split(); also add
    a check for whether a page can be merged into the current bvec in
    bio_add_page(), to avoid breaking Xen.
    
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: xen-devel@lists.xenproject.org
    Cc: Omar Sandoval <osandov@fb.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
  3. block: pass page to xen_biovec_phys_mergeable

    Ming Lei
    Ming Lei committed Mar 1, 2019
    xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec for
    checking whether the two bvecs can be merged, so pass the page to
    xen_biovec_phys_mergeable() directly.
    
    No functional change.
    
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: xen-devel@lists.xenproject.org
    Cc: Omar Sandoval <osandov@fb.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
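
    The resulting prototype is roughly the following (a hedged sketch based
    on the description above, not necessarily the exact declaration):

        bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
                                       struct page *page);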
  4. Revert "nvme: add support for the Write Zeroes command"

    Ming Lei
    Ming Lei committed Mar 2, 2019
    This reverts commit 6e02318.
  5. block: fix segment calculation for passthrough IO

    Ming Lei
    Ming Lei committed Mar 3, 2019
    blk_recount_segments() can be called from bio_add_pc_page() to
    calculate how many segments the bio will have after one page is added
    to it. If the resulting segment count is beyond the queue limit, the
    added page is removed again.
    
    This try-and-fix policy requires blk_recount_segments()
    (__blk_recalc_rq_segments) not to consider the segment count limit.
    Unfortunately bvec_split_segs() does check this limit, which causes a
    too-small segment count to be returned to bio_add_pc_page(), so the
    page may still be added to the bio even though the segment count limit
    is already broken.
    
    Fix this issue by not considering the segment count limit when
    calculating the bio's segment count.
    
    Fixes: dcebd75 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Omar Sandoval <osandov@fb.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
  6. block: fix updating bio's front segment size

    Ming Lei
    Ming Lei committed Mar 2, 2019
    When the current bvec can be merged into the 1st segment, the bio's
    front segment size has to be updated.
    
    However, dcebd75 doesn't consider that case, so the bio's front segment
    size may not be correct.
    
    This patch fixes the issue.
    
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Omar Sandoval <osandov@fb.com>
    Fixes: dcebd75 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
Commits on Feb 28, 2019
  1. block: Replace function name in string with __func__

    iamkeyur authored and axboe committed Feb 17, 2019
    Replace the hard-coded function name register_blkdev with __func__, to
    improve robustness and to conform to the Linux kernel coding style.
    Issue found using checkpatch.
    
    Signed-off-by: Keyur Patel <iamkeyur96@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
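
    A minimal illustration of the pattern (the message text is illustrative,
    not the exact hunk):

        /* before: the string breaks silently if the function is renamed */
        printk("register_blkdev: failed to get major for %s\n", name);

        /* after: __func__ always expands to the current function name */
        printk("%s: failed to get major for %s\n", __func__, name);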
  2. nbd: propagate genlmsg_reply return code

    Li RongQing authored and axboe committed Feb 19, 2019
    genlmsg_reply can fail, so propagate its return code
    
    Signed-off-by: Li RongQing <lirongqing@baidu.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
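
    The pattern, roughly (illustrative variable names, not the exact hunk):

        /* before: a genlmsg_reply() failure was silently ignored */
        genlmsg_reply(reply, info);
        return 0;

        /* after: hand the error back to the caller */
        return genlmsg_reply(reply, info);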
  3. floppy: remove set but not used variable 'q'

    YueHaibing authored and axboe committed Feb 18, 2019
    Fixes gcc '-Wunused-but-set-variable' warning:
    
    drivers/block/floppy.c: In function 'request_done':
    drivers/block/floppy.c:2233:24: warning:
     variable 'q' set but not used [-Wunused-but-set-variable]
    
    It's never used and can be removed.
    
    Acked-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. null_blk: fix checking for REQ_FUA

    mauelsha authored and axboe committed Feb 22, 2019
    null_handle_bio() erroneously uses the bio_op() macro, which masks out
    the request flag bits, including REQ_FUA, so the check always fails.
    
    Fix this by checking bio->bi_opf directly.
    
    Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
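
    Why the original check could never succeed (a sketch, not the exact
    diff): bio_op() returns bi_opf masked with REQ_OP_MASK, which strips
    all flag bits, including REQ_FUA.

        /* broken: bio_op() keeps only the operation bits, so REQ_FUA has
         * already been masked away and this is always false */
        if (bio_op(bio) & REQ_FUA)
                fua = true;

        /* fixed: test the full bi_opf word instead */
        if (bio->bi_opf & REQ_FUA)
                fua = true;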
  5. block: fix NULL pointer dereference in register_disk

    zhengbin13 authored and axboe committed Feb 20, 2019
    If __device_add_disk-->bdi_register_owner-->bdi_register-->
    bdi_register_va-->device_create_vargs fails, bdi->dev is still
    NULL, and __device_add_disk-->register_disk will then dereference
    bdi->dev->kobj. This patch fixes that.
    
    Signed-off-by: zhengbin <zhengbin13@huawei.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. fs: fix guard_bio_eod to check for real EOD errors

    cmaiolino authored and axboe committed Feb 26, 2019
    guard_bio_eod() can truncate a segment in a bio to allow it to do IO on
    the odd last sectors of a device.
    
    It already checks whether the IO starts past EOD, but it does not
    consider the possibility that an IO request starting within the device
    boundaries can contain more than one segment past EOD.
    
    In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
    underflow bvec->bv_len.
    
    Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
    
    This situation has been found on filesystems such as isofs and vfat,
    which don't check the device size before mount. If the device is
    smaller than the filesystem itself, a readahead on such a filesystem
    that spans EOD can trigger this situation, leading to a call to
    zero_user() with a wrong size and possibly corrupting memory.
    
    I didn't see any crash, and didn't let the system run long enough to
    check whether memory corruption would be hit somewhere, but adding
    instrumentation to guard_bio_eod() to check the truncated_bytes size
    was enough to see the error.
    
    The following script can trigger the error.
    
    MNT=/mnt
    IMG=./DISK.img
    DEV=/dev/loop0
    
    mkfs.vfat $IMG
    mount $IMG $MNT
    cp -R /etc $MNT &> /dev/null
    umount $MNT
    
    losetup -D
    
    losetup --find --show --sizelimit 16247280 $IMG
    mount $DEV $MNT
    
    find $MNT -type f -exec cat {} + >/dev/null
    
    Kudos to Eric Sandeen for coming up with the reproducer above.
    
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
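
    A rough sketch of the guard described above (the exact condition in the
    patch may differ):

        /* sketch: if more than one segment lies past EOD, truncated_bytes
         * exceeds PAGE_SIZE and would underflow bv_len - bail out and let
         * the block layer fail the IO instead */
        if (truncated_bytes > PAGE_SIZE)
                return;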
  7. blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map

    Dongli Zhang authored and axboe committed Feb 27, 2019
    Replace set->map[0] with set->map[HCTX_TYPE_DEFAULT] to avoid hardcoding.
    
    Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. block: optimize bvec iteration in bvec_iter_advance

    Christoph Hellwig authored and axboe committed Feb 28, 2019
    There is no need to only iterate in chunks of PAGE_SIZE or less in
    bvec_iter_advance, given that the callers pass in the chunk length that
    they are operating on - either that already is less than PAGE_SIZE
    because they do classic page-based iteration, or it is larger because
    the caller operates on multi-page bvecs.
    
    This should help shave off a few cycles in the I/O hot path.
    
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. block: introduce mp_bvec_for_each_page() for iterating over page

    Ming Lei authored and axboe committed Feb 27, 2019
    mp_bvec_for_each_segment() is a bit heavy for this iteration, so
    introduce a lightweight helper for iterating over pages; this saves 32
    bytes of stack space.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commits on Feb 27, 2019
  1. block: optimize blk_bio_segment_split for single-page bvec

    Ming Lei authored and axboe committed Feb 27, 2019
    Introduce a fast path for single-page bvec IO, so we can avoid calling
    bvec_split_segs() unnecessarily.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. block: optimize __blk_segment_map_sg() for single-page bvec

    Ming Lei authored and axboe committed Feb 27, 2019
    Introduce a fast path for single-page bvec IO, so blk_bvec_map_sg() can
    be avoided.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. block: introduce bvec_nth_page()

    Ming Lei authored and axboe committed Feb 27, 2019
    Single-page bvecs are common in small block-size workloads, so
    introduce bvec_nth_page() to avoid calling nth_page() unnecessarily,
    which is not cheap.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
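
    The helper amounts to skipping nth_page() for the common index-0 case;
    a minimal sketch under that assumption:

        static inline struct page *bvec_nth_page(struct page *page, int idx)
        {
                /* most bvecs are single-page: avoid nth_page() for idx 0 */
                return idx == 0 ? page : nth_page(page, idx);
        }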
Commits on Feb 24, 2019
  1. iomap: wire up the iopoll method

    Christoph Hellwig authored and axboe committed Dec 4, 2018
    Store the request queue the last bio was submitted to in the iocb
    private data in addition to the cookie so that we find the right block
    device.  Also refactor the common direct I/O bio submission code into a
    nice little helper.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    
    Modified to use bio_set_polled().
    
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. block: add bio_set_polled() helper

    axboe committed Dec 21, 2018
    For the upcoming async polled IO, we can't sleep while allocating
    requests. If we do, we introduce a deadlock where the submitter already
    has async polled IO in flight but can't wait for it to complete, since
    polled requests must be actively found and reaped.
    
    Utilize the helper in the blockdev DIRECT_IO code.
    
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
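
    A sketch of what such a helper looks like (hedged; see the commit for
    the final form): mark the bio as high-priority polled IO, and for async
    submitters also set REQ_NOWAIT so request allocation can't sleep.

        static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
        {
                bio->bi_opf |= REQ_HIPRI;
                /* async polled IO must not block in request allocation,
                 * since the submitter itself has to find and reap it */
                if (!is_sync_kiocb(kiocb))
                        bio->bi_opf |= REQ_NOWAIT;
        }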
  3. block: wire up block device iopoll method

    Christoph Hellwig authored and axboe committed Nov 30, 2018
    Just call blk_poll on the iocb cookie; we can derive the block device
    from the inode trivially.
    
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. fs: add an iopoll method to struct file_operations

    Christoph Hellwig authored and axboe committed Nov 22, 2018
    This new method is used to explicitly poll for I/O completion for an
    iocb.  It must be called for any iocb submitted asynchronously (that
    is, with a non-NULL ki_complete) which has the IOCB_HIPRI flag set.
    
    The method is assisted by a new ki_cookie field in struct iocb to store
    the polling cookie.
    
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
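
    Roughly, the new hook and cookie look like this (a sketch based on the
    description above, not necessarily the exact declarations):

        /* sketch: new member in struct kiocb */
        u32 ki_cookie;          /* cookie returned at submission, used for polling */

        /* sketch: new method in struct file_operations */
        int (*iopoll)(struct kiocb *kiocb, bool spin);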
Commits on Feb 22, 2019
  1. loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()

    Dongli Zhang authored and axboe committed Feb 22, 2019
    Commit 0da03ca
    ("loop: Fix deadlock when calling blkdev_reread_part()") moves
    blkdev_reread_part() out of the loop_ctl_mutex. However,
    GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
    __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
    will not rescan the loop device to delete all partitions.
    
    Below are steps to reproduce the issue:
    
    step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
    step2 # losetup -P /dev/loop0 tmp.raw
    step3 # parted /dev/loop0 mklabel gpt
    step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
    step5 # losetup -d /dev/loop0
    
    Step 5 will not be able to delete /dev/loop0p1 (created in step 4), and
    the following kernel warning message is printed:
    
    [  464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)
    
    This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().
    
    Fixes: 0da03ca ("loop: Fix deadlock when calling blkdev_reread_part()")
    Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. loop: do not print warn message if partition scan is successful

    Dongli Zhang authored and axboe committed Feb 22, 2019
    Do not print a warning message when the partition scan returns 0.
    
    Fixes: d57f337 ("loop: Move special partition reread handling in loop_clr_fd()")
    Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commits on Feb 21, 2019
  1. block: bounce: make sure that bvec table is updated

    Ming Lei authored and axboe committed Feb 21, 2019
    Block bounce needs to allocate a new page for doing IO, and the new
    page has to be written back into the bvec table.
    
    Commit 6dc4f10 switches __blk_queue_bounce() to use the new
    bio_for_each_segment_all() interface. Unfortunately the new
    bio_for_each_segment_all() can't be used to update the bvec table.
    
    This patch fixes the issue by retrieving the bvec from the table
    directly, so the newly allocated page can be stored in the bio. This is
    safe because the cloned bio contains only single-page bvecs.
    
    Fixes: 6dc4f10 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Omar Sandoval <osandov@fb.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
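
    The shape of the fix, roughly (a sketch with an assumed helper, not the
    exact hunk): walk the bvec table by index so the bounce page is stored
    into the bio's own table rather than into a per-iteration copy.

        /* sketch: the multi-page bio_for_each_segment_all() hands out a
         * per-page copy of the bvec, so writes through it never reach the
         * table; index bi_io_vec directly instead */
        for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) {
                if (!needs_bounce(to))          /* assumed predicate */
                        continue;

                to->bv_page = mempool_alloc(pool, GFP_NOIO);
        }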
  2. Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-5.1/block

    axboe committed Feb 21, 2019
    
    Pull NVMe changes for 5.1 from Christoph
    
    * 'nvme-5.1' of git://git.infradead.org/nvme: (22 commits)
      nvme-rdma: use nr_phys_segments when map rq to sgl
      nvmet: convert to SPDX identifiers
      nvmet-rdma: convert to SPDX identifiers
      nvme-loop: convert to SPDX identifiers
      nvmet-fcloop: convert to SPDX identifiers
      nvmet-fc: convert to SPDX identifiers
      nvme: convert to SPDX identifiers
      nvme-pci: convert to SPDX identifiers
      nvme-lightnvm: convert to SPDX identifiers
      nvme-rdma: convert to SPDX identifiers
      nvme-fc: convert to SPDX identifiers
      nvme-fabrics: convert to SPDX identifiers
      nvme-tcp.h: fix SPDX header
      nvme_ioctl.h: remove duplicate GPL boilerplate
      nvme: return error from nvme_alloc_ns()
      nvme: avoid that deleting a controller triggers a circular locking complaint
      nvme: introduce a helper function for controller deletion
      nvme: unexport nvme_delete_ctrl_sync()
      nvme-pci: check kstrtoint() return value in queue_count_set()
      nvme-fabrics: document the poll function argument
      ...
  3. nvme-rdma: use nr_phys_segments when map rq to sgl

    ChaitanayaKulkarni authored and Christoph Hellwig committed Feb 21, 2019
    Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
    if a command contains data to be mapped.  This fixes the case where a
    struct request contains LBAs but has no payload, such as a Write Zeroes
    command.
    
    Fixes: 6e02318 ("nvme: add support for the Write Zeroes command")
    Reported-by: Ming Lei <tom.leiming@gmail.com>
    Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
    Tested-by: Ming Lei <tom.leiming@gmail.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
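
    The gist, as a sketch (not the exact diff): a Write Zeroes request
    reports payload bytes (the LBA range to zero) but has zero physical
    segments, so the data-mapping decision must key off the segment count.

        /* sketch: an empty command (e.g. Write Zeroes) has no segments to
         * map even though blk_rq_payload_bytes() is non-zero */
        if (!blk_rq_nr_phys_segments(rq))
                return nvme_rdma_set_sg_null(c);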
Commits on Feb 20, 2019
  1. nvmet: convert to SPDX identifiers

    Christoph Hellwig
    Christoph Hellwig committed Feb 18, 2019
    Update license to use SPDX-License-Identifier instead of verbose license
    text.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
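
    The same pattern applies to all of the SPDX conversions below; the
    typical shape of the change (identifier shown as an example, the actual
    one depends on each file's license) is to replace the verbose license
    block at the top of the file with a single tag:

        // SPDX-License-Identifier: GPL-2.0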
  2. nvmet-rdma: convert to SPDX identifiers

    Christoph Hellwig
    Christoph Hellwig committed Feb 18, 2019
    Update license to use SPDX-License-Identifier instead of verbose license
    text.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
  3. nvme-loop: convert to SPDX identifiers

    Christoph Hellwig
    Christoph Hellwig committed Feb 18, 2019
    Update license to use SPDX-License-Identifier instead of verbose license
    text.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
  4. nvmet-fcloop: convert to SPDX identifiers

    Christoph Hellwig
    Christoph Hellwig committed Feb 18, 2019
    Update license to use SPDX-License-Identifier instead of verbose license
    text.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
  5. nvmet-fc: convert to SPDX identifiers

    Christoph Hellwig
    Christoph Hellwig committed Feb 18, 2019
    Update license to use SPDX-License-Identifier instead of verbose license
    text.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>