Commits (branch: my_v5.18-dm-bi…)

Commits on Mar 10, 2022

  1. blk-mq: kill warning when building block/blk-mq-debugfs-zoned.c

    Fix the following warning when building block/blk-mq-debugfs-zoned.c:
    
    In file included from block/blk-mq-debugfs-zoned.c:7:
    block/blk-mq-debugfs.h:24:14: warning: ‘struct blk_mq_hw_ctx’ declared inside parameter list will not be visible outside of this definition or declaration
       24 |       struct blk_mq_hw_ctx *hctx);
          |              ^~~~~~~~~~~~~
    
    Cc: Christoph Hellwig <hch@lst.de>
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 10, 2022
    Commit: d891aa3

Commits on Mar 7, 2022

  1. dm: support bio polling

    Support bio (REQ_POLLED) polling with the following approach:
    
    1) only support IO polling on normal READ/WRITE, while other abnormal
    IOs still fall back to IRQ mode, so the target io is exactly inside
    the dm io.
    
    2) hold one refcnt on io->io_count after submitting this dm bio with
    REQ_POLLED
    
    3) support dm-native bio splitting: every dm io instance associated
    with the current bio is added to a list whose head is stored in
    bio->bi_private, which is restored before ending this bio
    
    4) implement the .poll_bio() callback: call bio_poll() on the single
    target bio inside the dm io, retrieved via bio->bi_bio_drv_data, and
    call dm_io_dec_pending() once the target io is done in .poll_bio()
    
    5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
    which is based on Jeffle's previous patch.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Ming Lei committed Mar 7, 2022
    Commit: 633c4d1
  2. block: add ->poll_bio to block_device_operations

    Prepare for supporting IO polling for bio-based drivers.
    
    Add a ->poll_bio callback so that bio-based drivers can provide their
    own bio polling logic.
    
    Also fix ->submit_bio_bio typo in comment block above
    __submit_bio_noacct.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Ming Lei committed Mar 7, 2022
    Commit: 5294681

Commits on Mar 4, 2022

  1. blk-mq: manage hctx map via xarray

    First, the code becomes cleaner by switching from a plain array to an
    xarray.
    
    Second, the use-after-free on q->queue_hw_ctx can be fixed, because
    queue_for_each_hw_ctx() may run while an update of nr_hw_queues is in
    progress. With this patch, q->hctx_table is defined as an xarray that
    shares its lifetime with the request queue, so queue_for_each_hw_ctx()
    can use q->hctx_table to look up hctxs reliably.
    
    Reported-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 4, 2022
    Commit: 807558c
  2. blk-mq: prepare for implementing hctx table via xarray

    A use-after-free on q->queue_hw_ctx is unavoidable in the race between
    queue_for_each_hw_ctx() and blk_mq_update_nr_hw_queues(), and
    converting to an xarray fixes the UAF while also making the code
    cleaner.
    
    Prepare for converting q->queue_hw_ctx into an xarray: since
    xa_for_each() only accepts 'unsigned long' as its index, change the
    type of the hctx index in queue_for_each_hw_ctx() to 'unsigned long'.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 4, 2022
    Commit: f49d27c
  3. block: mtip32xx: don't touch q->queue_hw_ctx

    q->queue_hw_ctx is a blk-mq internal structure for retrieving an hctx
    by index and is not supposed to be used by drivers. Meanwhile, drivers
    can easily get the tags structure from the tagset.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 4, 2022
    Commit: 99f8757
  4. blk-mq: reconfigure poll after queue map is changed

    The queue map can change when nr_hw_queues is updated, so the queue's
    poll capability needs to be reconfigured. Add a helper for doing this
    job.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 4, 2022
    Commit: 400b8d1
  5. blk-mq: simplify reallocation of hw ctxs a bit

    blk_mq_alloc_and_init_hctx() has already taken reuse into account, so
    there is no need to do so in the caller; this lets us simplify
    blk_mq_realloc_hw_ctxs().
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 4, 2022
    Commit: 25a06e4
  6. blk-mq: figure out correct numa node for hw queue

    The current code always uses the default queue map and hw queue index
    to figure out the NUMA node for a hw queue. This isn't correct,
    because blk-mq supports three queue maps, and the queue map matching
    the specified hw queue should be used.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei committed Mar 4, 2022
    Commit: 748bc00

Commits on Feb 27, 2022

  1. block: remove redundant semicolon

    Remove redundant semicolon from block/bdev.c
    
    Signed-off-by: Nian Yanchuan <yanchuan@nfschina.com>
    Link: https://lore.kernel.org/r/20220227170124.GA14658@localhost.localdomain
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Nian Yanchuan authored and axboe committed Feb 27, 2022
    Commit: 483546c
  2. block: default BLOCK_LEGACY_AUTOLOAD to y

    As Luis reported, losetup currently doesn't properly create the loop
    device without this if the device node already exists because old
    scripts created it manually.  So default to y for now and remove the
    aggressive removal schedule.
    
    Reported-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220225181440.1351591-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 27, 2022
    Commit: 451f0b6

Commits on Feb 22, 2022

  1. block: update io_ticks when io hang

    When inflight IOs are slow and no new IOs are issued, we expect iostat
    to surface the IO hang. However, since
    commit 5b18b5a ("block: delete part_round_stats and switch to less
    precise counting"), io_ticks and time_in_queue are not updated until
    the end of an IO, so the avgqu-sz and %util columns of iostat read
    zero.
    
    Because time_in_queue is derived from stat.nsecs accumulation, which
    is not suitable to change, and %util already expresses the hang state
    well, fix only io_ticks: use update_io_ticks() and the inflight count
    to refresh io_ticks when diskstats_show() or part_stat_show() is
    called.
    
    Fixes: 5b18b5a ("block: delete part_round_stats and switch to less precise counting")
    Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220217064247.4041435-1-zhangwensheng5@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Zhang Wensheng authored and axboe committed Feb 22, 2022
    Commit: 86d7331

Commits on Feb 18, 2022

  1. block, bfq: don't move oom_bfqq

    Our test reports a UAF:
    
    [ 2073.019181] ==================================================================
    [ 2073.019188] BUG: KASAN: use-after-free in __bfq_put_async_bfqq+0xa0/0x168
    [ 2073.019191] Write of size 8 at addr ffff8000ccf64128 by task rmmod/72584
    [ 2073.019192]
    [ 2073.019196] CPU: 0 PID: 72584 Comm: rmmod Kdump: loaded Not tainted 4.19.90-yk #5
    [ 2073.019198] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
    [ 2073.019200] Call trace:
    [ 2073.019203]  dump_backtrace+0x0/0x310
    [ 2073.019206]  show_stack+0x28/0x38
    [ 2073.019210]  dump_stack+0xec/0x15c
    [ 2073.019216]  print_address_description+0x68/0x2d0
    [ 2073.019220]  kasan_report+0x238/0x2f0
    [ 2073.019224]  __asan_store8+0x88/0xb0
    [ 2073.019229]  __bfq_put_async_bfqq+0xa0/0x168
    [ 2073.019233]  bfq_put_async_queues+0xbc/0x208
    [ 2073.019236]  bfq_pd_offline+0x178/0x238
    [ 2073.019240]  blkcg_deactivate_policy+0x1f0/0x420
    [ 2073.019244]  bfq_exit_queue+0x128/0x178
    [ 2073.019249]  blk_mq_exit_sched+0x12c/0x160
    [ 2073.019252]  elevator_exit+0xc8/0xd0
    [ 2073.019256]  blk_exit_queue+0x50/0x88
    [ 2073.019259]  blk_cleanup_queue+0x228/0x3d8
    [ 2073.019267]  null_del_dev+0xfc/0x1e0 [null_blk]
    [ 2073.019274]  null_exit+0x90/0x114 [null_blk]
    [ 2073.019278]  __arm64_sys_delete_module+0x358/0x5a0
    [ 2073.019282]  el0_svc_common+0xc8/0x320
    [ 2073.019287]  el0_svc_handler+0xf8/0x160
    [ 2073.019290]  el0_svc+0x10/0x218
    [ 2073.019291]
    [ 2073.019294] Allocated by task 14163:
    [ 2073.019301]  kasan_kmalloc+0xe0/0x190
    [ 2073.019305]  kmem_cache_alloc_node_trace+0x1cc/0x418
    [ 2073.019308]  bfq_pd_alloc+0x54/0x118
    [ 2073.019313]  blkcg_activate_policy+0x250/0x460
    [ 2073.019317]  bfq_create_group_hierarchy+0x38/0x110
    [ 2073.019321]  bfq_init_queue+0x6d0/0x948
    [ 2073.019325]  blk_mq_init_sched+0x1d8/0x390
    [ 2073.019330]  elevator_switch_mq+0x88/0x170
    [ 2073.019334]  elevator_switch+0x140/0x270
    [ 2073.019338]  elv_iosched_store+0x1a4/0x2a0
    [ 2073.019342]  queue_attr_store+0x90/0xe0
    [ 2073.019348]  sysfs_kf_write+0xa8/0xe8
    [ 2073.019351]  kernfs_fop_write+0x1f8/0x378
    [ 2073.019359]  __vfs_write+0xe0/0x360
    [ 2073.019363]  vfs_write+0xf0/0x270
    [ 2073.019367]  ksys_write+0xdc/0x1b8
    [ 2073.019371]  __arm64_sys_write+0x50/0x60
    [ 2073.019375]  el0_svc_common+0xc8/0x320
    [ 2073.019380]  el0_svc_handler+0xf8/0x160
    [ 2073.019383]  el0_svc+0x10/0x218
    [ 2073.019385]
    [ 2073.019387] Freed by task 72584:
    [ 2073.019391]  __kasan_slab_free+0x120/0x228
    [ 2073.019394]  kasan_slab_free+0x10/0x18
    [ 2073.019397]  kfree+0x94/0x368
    [ 2073.019400]  bfqg_put+0x64/0xb0
    [ 2073.019404]  bfqg_and_blkg_put+0x90/0xb0
    [ 2073.019408]  bfq_put_queue+0x220/0x228
    [ 2073.019413]  __bfq_put_async_bfqq+0x98/0x168
    [ 2073.019416]  bfq_put_async_queues+0xbc/0x208
    [ 2073.019420]  bfq_pd_offline+0x178/0x238
    [ 2073.019424]  blkcg_deactivate_policy+0x1f0/0x420
    [ 2073.019429]  bfq_exit_queue+0x128/0x178
    [ 2073.019433]  blk_mq_exit_sched+0x12c/0x160
    [ 2073.019437]  elevator_exit+0xc8/0xd0
    [ 2073.019440]  blk_exit_queue+0x50/0x88
    [ 2073.019443]  blk_cleanup_queue+0x228/0x3d8
    [ 2073.019451]  null_del_dev+0xfc/0x1e0 [null_blk]
    [ 2073.019459]  null_exit+0x90/0x114 [null_blk]
    [ 2073.019462]  __arm64_sys_delete_module+0x358/0x5a0
    [ 2073.019467]  el0_svc_common+0xc8/0x320
    [ 2073.019471]  el0_svc_handler+0xf8/0x160
    [ 2073.019474]  el0_svc+0x10/0x218
    [ 2073.019475]
    [ 2073.019479] The buggy address belongs to the object at ffff8000ccf63f00
     which belongs to the cache kmalloc-1024 of size 1024
    [ 2073.019484] The buggy address is located 552 bytes inside of
     1024-byte region [ffff8000ccf63f00, ffff8000ccf64300)
    [ 2073.019486] The buggy address belongs to the page:
    [ 2073.019492] page:ffff7e000333d800 count:1 mapcount:0 mapping:ffff8000c0003a00 index:0x0 compound_mapcount: 0
    [ 2073.020123] flags: 0x7ffff0000008100(slab|head)
    [ 2073.020403] raw: 07ffff0000008100 ffff7e0003334c08 ffff7e00001f5a08 ffff8000c0003a00
    [ 2073.020409] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
    [ 2073.020411] page dumped because: kasan: bad access detected
    [ 2073.020412]
    [ 2073.020414] Memory state around the buggy address:
    [ 2073.020420]  ffff8000ccf64000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 2073.020424]  ffff8000ccf64080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 2073.020428] >ffff8000ccf64100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 2073.020430]                                   ^
    [ 2073.020434]  ffff8000ccf64180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 2073.020438]  ffff8000ccf64200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 2073.020439] ==================================================================
    
    The same problem exists in mainline as well.
    
    This happens because oom_bfqq is moved to a non-root group, and thus
    root_group is freed earlier.
    
    Fix the problem by not moving oom_bfqq.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Link: https://lore.kernel.org/r/20220129015924.3958918-4-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Feb 18, 2022
    Commit: 8410f70
  2. block, bfq: avoid moving bfqq to it's parent bfqg

    Moving bfqq to its parent bfqg is pointless.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Link: https://lore.kernel.org/r/20220129015924.3958918-3-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Feb 18, 2022
    Commit: c5e4cb0
  3. block, bfq: cleanup bfq_bfqq_to_bfqg()

    Use bfq_group() instead, which does the same thing.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Link: https://lore.kernel.org/r/20220129015924.3958918-2-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Feb 18, 2022
    Commit: 43a4b1f

Commits on Feb 17, 2022

  1. block/bfq_wf2q: correct weight to ioprio

    The return value is ioprio * BFQ_WEIGHT_CONVERSION_COEFF or 0.
    What we want is ioprio or 0.
    Correct this by changing the calculation.
    
    Signed-off-by: Yahu Gao <gaoyahu19@gmail.com>
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Link: https://lore.kernel.org/r/20220107065859.25689-1-gaoyahu19@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    YahuGao authored and axboe committed Feb 17, 2022
    Commit: bcd2be7
  2. blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues
    
    When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can
    reset the delay length for an already pending delayed work run_work. This
    creates a scenario where multiple hctx may have their queues set to run,
    but if one runs first and finds nothing to do, it can reset the delay of
    another hctx and stall the other hctx's ability to run requests.
    
    To avoid this I/O stall, leave an hctx's run_work untouched when it is
    already pending, letting it run at its current designated time rather
    than extending its delay. The work still runs, which keeps closed the
    race that blk_mq_delay_run_hw_queues is needed for, while also
    avoiding the I/O stall.
    
    Signed-off-by: David Jeffery <djeffery@redhat.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20220131203337.GA17666@redhat
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    David Jeffery authored and axboe committed Feb 17, 2022
    Commit: 8f5fea6
  3. virtio_blk: simplify refcounting

    Implement the ->free_disk method to free the virtio_blk structure only
    once the last gendisk reference goes away instead of keeping a local
    refcount.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 24b45e6
  4. memstick/mspro_block: simplify refcounting

    Implement the ->free_disk method to free the msb_data structure only once
    the last gendisk reference goes away instead of keeping a local
    refcount.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220215094514.3828912-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 185ed42
  5. memstick/mspro_block: fix handling of read-only devices

    Use set_disk_ro to propagate the read-only state to the block layer
    instead of checking for it in ->open and leaking a reference in case
    of a read-only device.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220215094514.3828912-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 6dab421
  6. memstick/ms_block: simplify refcounting

    Implement the ->free_disk method to free the msb_data structure only once
    the last gendisk reference goes away instead of keeping a local refcount.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220215094514.3828912-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: e2efa07
  7. block: add a ->free_disk method

    Add a method to notify the driver that the gendisk is about to be freed.
    This allows drivers to tie the lifetime of their private data to that of
    the gendisk and thus deal with device removal races without expensive
    synchronization and boilerplate code.
    
    A new flag is added so that ->free_disk is only called after a successful
    call to add_disk, which significantly simplifies the error handling path
    during probing.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220215094514.3828912-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 7679205
  8. block: revert 4f1e963 ("blk-throtl: optimize IOPS throttle for large IO scenarios")
    
    Revert commit 4f1e963 ("blk-throtl: optimize IOPS throttle for large
    IO scenarios") since we have another easier way to address this issue and
    get better iops throttling result.
    
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-9-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: 34841e6
  9. block: don't try to throttle split bio if iops limit isn't set

    We need to throttle a split bio for the IOPS limit even though it is
    marked BIO_THROTTLED, because the block layer actually accounts each
    split bio as its own IO.
    
    If only a throughput limit is set up, there is no need to throttle
    again once BIO_THROTTLED is set, since the whole bio's bytes have
    already been accounted and considered.
    
    Add a THROTL_TG_HAS_IOPS_LIMIT flag to serve this purpose.
    
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-8-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: 5a93b60
  10. block: throttle split bio in case of iops limit

    Commit 111be88 ("block-throttle: avoid double charge") marks a bio as
    BIO_THROTTLED unconditionally once __blk_throtl_bio() has been called
    on it, so the bio is never passed into __blk_throtl_bio() again. This
    avoids double charging when a bio is split. That is reasonable for
    read/write throughput limits, but not for IOPS limits, because the
    block layer accounts IO per split bio.
    
    Chunguang Xu has already observed this issue and fixed it in commit
    4f1e963 ("blk-throtl: optimize IOPS throttle for large IO scenarios").
    However, that patch only covers bio splitting in __blk_queue_split();
    bios are also split in other ways, such as via bio_split() plus
    submit_bio_noacct().
    
    This patch fixes the issue in one generic way by always charging the
    bio against the iops limit in blk_throtl_bio(). This is reasonable: a
    re-submitted or fast-cloned bio is charged if it is submitted to the
    same disk/queue, and BIO_THROTTLED is cleared when bio->bi_bdev
    changes.
    
    This new approach yields a much smoother and more stable iops limit
    than commit 4f1e963 ("blk-throtl: optimize IOPS throttle for large IO
    scenarios"), since that commit cannot actually throttle the current
    split bios.
    
    Also, this approach does not introduce a new double iops charge in
    blk_throtl_dispatch_work_fn(), where blk_throtl_bio() is no longer
    called.
    
    Reported-by: Ning Li <lining2020x@163.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Chunguang Xu <brookxu@tencent.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-7-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: 9f5ede3
  11. block: merge submit_bio_checks() into submit_bio_noacct

    Now submit_bio_checks() is only called by submit_bio_noacct(), so merge
    it into submit_bio_noacct().
    
    Suggested-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-6-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: d24c670
  12. block: don't check bio in blk_throtl_dispatch_work_fn

    The bio has already been checked before throttling, so there is no
    need to check it again before dispatching it from the throttle queue.
    
    Add a helper of submit_bio_noacct_nocheck() for this purpose.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220216044514.2903784-5-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: 3f98c75
  13. block: don't declare submit_bio_checks in local header

    submit_bio_checks() won't be called outside of block/blk-core.c any
    more since commit 9d497e2 ("block: don't protect submit_bio_checks by
    q_usage_counter"), so make it a local helper.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-4-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: 29ff236
  14. block: move blk_crypto_bio_prep() out of blk-mq.c

    blk_crypto_bio_prep() is called for both bio-based and blk-mq drivers,
    so move it out of blk-mq.c so that this kind of handling can be
    unified.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-3-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: 7f36b7d
  15. block: move submit_bio_checks() into submit_bio_noacct

    It is cleaner and more readable to check a bio when starting to submit
    it, instead of just before calling ->submit_bio() or
    blk_mq_submit_bio().
    
    It also gives us a chance to optimize bio submission paths that do not
    need the checks.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220216044514.2903784-2-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Feb 17, 2022
    Commit: a650628
  16. dm: remove dm_dispatch_clone_request

    Fold dm_dispatch_clone_request into its only caller, and use a switch
    statement to dispatch on the different return values from
    blk_insert_cloned_request.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    Link: https://lore.kernel.org/r/20220215100540.3892965-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 9f9adea
  17. dm: remove useless code from dm_dispatch_clone_request

    Both ->start_time_ns and the RQF_IO_STAT are set when the request is
    allocated using blk_mq_alloc_request by dm-mpath in blk_mq_rq_ctx_init.
    The block layer also ensures ->start_time_ns is only set when actually
    needed.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    Link: https://lore.kernel.org/r/20220215100540.3892965-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 8803c89
  18. blk-mq: remove the request_queue argument to blk_insert_cloned_request

    The request must be submitted to the queue it was allocated for, so
    remove the extra request_queue argument.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    Link: https://lore.kernel.org/r/20220215100540.3892965-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 28db471
  19. blk-mq: fold blk_cloned_rq_check_limits into blk_insert_cloned_request

    Fold blk_cloned_rq_check_limits into its only caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    Link: https://lore.kernel.org/r/20220215100540.3892965-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: a5efda3
  20. blk-mq: make the blk-mq stacking code optional

    The code to stack blk-mq drivers is only used by dm-multipath, and
    will preferably stay that way.  Make it optional and only selected by
    device mapper, so that the buildbots more easily catch abuses like the
    one that slipped into the ufs driver in the last merge window.
    Another positive side effect is that kernel builds without device
    mapper shrink a little bit as well.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    Link: https://lore.kernel.org/r/20220215100540.3892965-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Feb 17, 2022
    Commit: 248c793
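Gating code behind a Kconfig symbol that another subsystem selects, as this commit describes, has a standard shape in Kconfig. A sketch of such a change (the symbol name and file placement are assumed from the commit description, not verified against the tree):

```
# block/Kconfig (sketch; symbol name assumed)
config BLK_MQ_STACKING
	bool

# drivers/md/Kconfig (sketch)
config DM_MULTIPATH
	tristate "Multipath target"
	depends on BLK_DEV_DM
	select BLK_MQ_STACKING
```

The stacking helpers in block/ would then be compiled only when the symbol is set, so a misuse elsewhere (like the ufs case mentioned above) fails to build unless that driver explicitly selects the symbol too.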