Commits

Gulam-Mohamed/…

Commits on Dec 21, 2022

  1. block: Change the granularity of io ticks from ms to ns

    Problem Desc
    ============
    The "iostat" user-space utility was showing %util as 100% for disks
    whose latencies are below one millisecond, i.e. in the microsecond
    range and below.
    
    Root Cause
    ==========
    IO accounting in the block layer currently updates io_ticks in
    jiffies, which have millisecond granularity. As a result, for devices
    with IO latencies below one millisecond, the latency is accounted as
    1 millisecond even though it is in the microsecond range. This caused
    the iostat command to show %util as 100%, which is incorrect.
    
    Reproducing the issue
    =====================
    Setup
    -----
    Devices: 24 NVMe devices
    Model number: 4610 (Intel)
    
    fio
    gulams authored and intel-lab-lkp committed Dec 21, 2022
    afff703
  2. block: Data type conversion for IO accounting

    Change the data type of the start and end time IO accounting variables
    in the block layer from "unsigned long" to "u64". This enables
    nanosecond granularity, in the next commit, for devices whose latency
    is below one millisecond.
    
    Changes from V2 to V3
    =====================
    1. Change the data type of all the required variables to u64 as part
       of this first patch
    2. Create a new setting '2' for iostats in sysfs in the next patch
    3. Change the code to get ktime values when iostats=2 in the next patch
    
    Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
    gulams authored and intel-lab-lkp committed Dec 21, 2022
    254be08
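The io_ticks granularity problem described in the first Dec 21 commit can be illustrated with a simplified model. This is a minimal sketch, not the kernel's update_io_ticks() logic: it assumes IOs are sorted by start time and do not overlap, charges every tick an IO touches as a whole tick (the coarse case), and compares that with summing exact nanosecond latencies (the fine case). All names here (struct io_sample, busy_ns_tick_granularity(), busy_ns_exact(), util_percent()) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* One completed IO (hypothetical type): start time and latency in ns. */
struct io_sample {
	uint64_t start_ns;
	uint64_t lat_ns;
};

/*
 * Coarse model: every tick that an IO overlaps is charged as a whole tick.
 * Assumes samples are sorted by start_ns and do not overlap.
 */
uint64_t busy_ns_tick_granularity(const struct io_sample *ios, size_t n,
				  uint64_t tick_ns)
{
	uint64_t busy = 0;
	uint64_t last_charged = UINT64_MAX; /* no tick charged yet */

	for (size_t i = 0; i < n; i++) {
		/* Last nanosecond the IO is in flight (inclusive). */
		uint64_t end = ios[i].start_ns +
			       (ios[i].lat_ns ? ios[i].lat_ns - 1 : 0);

		for (uint64_t t = ios[i].start_ns / tick_ns; t <= end / tick_ns; t++) {
			if (t != last_charged) {
				busy += tick_ns;
				last_charged = t;
			}
		}
	}
	return busy;
}

/* Fine model: charge exactly the measured latency of each IO. */
uint64_t busy_ns_exact(const struct io_sample *ios, size_t n)
{
	uint64_t busy = 0;

	for (size_t i = 0; i < n; i++)
		busy += ios[i].lat_ns;
	return busy;
}

/* Roughly how %util is derived: busy time over the sampling interval. */
double util_percent(uint64_t busy_ns, uint64_t interval_ns)
{
	return 100.0 * (double)busy_ns / (double)interval_ns;
}
```

With 100 IOs of 10 microseconds each, one per millisecond over a 100 ms window, the coarse model reports 100 ms of busy time (100% util), while the nanosecond model reports 1 ms (1% util), which is the kind of discrepancy the commit describes.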

Commits on Dec 19, 2022

  1. Merge branch 'io_uring-6.2' into for-next

    * io_uring-6.2:
      MAINTAINERS: io_uring: Add include/trace/events/io_uring.h
    axboe committed Dec 19, 2022
    0d6cb91
  2. MAINTAINERS: io_uring: Add include/trace/events/io_uring.h

    This header file was introduced in commit c826bd7 ("io_uring: add
    set of tracing events"). It didn't get added to the io_uring
    maintainers section. Add this header file to the io_uring maintainers
    section.
    
    Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Link: https://lore.kernel.org/r/20221219164521.2481728-1-ammar.faizi@intel.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    ammarfaizi2 authored and axboe committed Dec 19, 2022
    5ad70eb
  3. Merge branch 'io_uring-6.2' into for-next

    * io_uring-6.2:
      io_uring/net: fix cleanup after recycle
      io_uring/net: ensure compat import handlers clear free_iov
    axboe committed Dec 19, 2022
    f7e5d69
  4. io_uring/net: fix cleanup after recycle

    Don't access io_async_msghdr after io_netmsg_recycle(); it may be reallocated.
    
    Cc: stable@vger.kernel.org
    Fixes: 9bb6690 ("io_uring: support multishot in recvmsg")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/9e326f4ad4046ddadf15bf34bf3fa58c6372f6b5.1671461985.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Dec 19, 2022
    6c3e895
  5. io_uring/net: ensure compat import handlers clear free_iov

    If we're not allocating the vectors because the count is below
    UIO_FASTIOV, we still do need to properly clear ->free_iov to prevent
    an erroneous free of on-stack data.
    
    Reported-by: Jiri Slaby <jirislaby@gmail.com>
    Fixes: 4c17a49 ("io_uring/net: fix cleanup double free free_iov init")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Dec 19, 2022
    990a4de
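The free_iov rule from the compat import fix above can be sketched in userspace C. Assumptions: struct sketch_msghdr, sketch_import_iov() and sketch_cleanup() are hypothetical stand-ins, not io_uring's real io_async_msghdr helpers, and SKETCH_FASTIOV is set to 8 to mirror UIO_FASTIOV. The point is that the import step clears free_iov on every path, including the inline (non-allocating) one, so cleanup can free it unconditionally.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_FASTIOV 8 /* assumed to mirror UIO_FASTIOV */

struct sketch_vec {
	void *base;
	size_t len;
};

struct sketch_msghdr {
	struct sketch_vec fast_iov[SKETCH_FASTIOV]; /* inline storage */
	struct sketch_vec *free_iov;                /* heap vector to free, or NULL */
	struct sketch_vec *iov;                     /* vector actually in use */
};

/* Returns 0 on success, -1 if the heap allocation fails. */
int sketch_import_iov(struct sketch_msghdr *m, size_t count)
{
	/* Clear on every path so no stale pointer survives into cleanup. */
	m->free_iov = NULL;

	if (count <= SKETCH_FASTIOV) {
		m->iov = m->fast_iov; /* small count: no allocation needed */
		return 0;
	}

	struct sketch_vec *v = calloc(count, sizeof(*v));
	if (!v)
		return -1;
	m->iov = v;
	m->free_iov = v; /* only heap memory is ever recorded here */
	return 0;
}

void sketch_cleanup(struct sketch_msghdr *m)
{
	free(m->free_iov); /* free(NULL) is a no-op */
	m->free_iov = NULL;
}
```

If the inline path skipped the clear, a free_iov left pointing at fast_iov from an earlier use would be handed to free(), which is the kind of erroneous free of non-heap data the commit guards against.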

Commits on Dec 18, 2022

  1. Merge branch 'io_uring-6.2' into for-next

    * io_uring-6.2:
      io_uring: include task_work run after scheduling in wait for events
      io_uring: don't use TIF_NOTIFY_SIGNAL to test for availability of task_work
      io_uring: use call_rcu_hurry if signaling an eventfd
    axboe committed Dec 18, 2022
    e3a4f4f
  2. io_uring: include task_work run after scheduling in wait for events

    It's quite possible that we got woken up because task_work was queued,
    and we need to process this task_work to generate the events waited for.
    If we return to the wait loop without running task_work, we'll end up
    adding the task to the waitqueue again, only to call
    io_cqring_wait_schedule() again which will run the task_work. This is
    less efficient than it could be, as it requires adding to the cq_wait
    queue again. It also triggers the wakeup path for completions as
    cq_wait is now non-empty with the task itself, and it'll require another
    lock grab and deletion to remove ourselves from the waitqueue.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Dec 18, 2022
    35d90f9

Commits on Dec 17, 2022

  1. io_uring: don't use TIF_NOTIFY_SIGNAL to test for availability of task_work

    Use task_work_pending() as a better test for whether we have task_work
    or not; TIF_NOTIFY_SIGNAL is only valid if any of the task_work items
    were queued with TWA_SIGNAL as the notification mechanism. Hence
    task_work_pending() is a more reliable check.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Dec 17, 2022
    6434ec0
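The difference between testing a notification flag and testing the work list itself can be shown with a toy model. Assumptions: sketch_task, sketch_task_work_add() and the SKETCH_TWA_* modes are hypothetical; in this model only the SIGNAL mode sets the flag standing in for TIF_NOTIFY_SIGNAL, while the RESUME mode sets a different flag. sketch_task_work_pending() checks the list head, in the spirit of task_work_pending().

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_notify { SKETCH_TWA_RESUME, SKETCH_TWA_SIGNAL };

struct sketch_work {
	struct sketch_work *next;
};

struct sketch_task {
	struct sketch_work *task_works; /* singly linked list of pending work */
	bool notify_signal;             /* stands in for TIF_NOTIFY_SIGNAL */
	bool notify_resume;             /* stands in for TIF_NOTIFY_RESUME */
};

/* Queue work; which flag gets set depends on the notification mode. */
void sketch_task_work_add(struct sketch_task *t, struct sketch_work *w,
			  enum sketch_notify mode)
{
	w->next = t->task_works;
	t->task_works = w;
	if (mode == SKETCH_TWA_SIGNAL)
		t->notify_signal = true;
	else
		t->notify_resume = true;
}

/* Flag-based check: misses work that was queued with the RESUME mode. */
bool sketch_has_work_by_flag(const struct sketch_task *t)
{
	return t->notify_signal;
}

/* List-based check: sees all pending work, whatever the mode. */
bool sketch_task_work_pending(const struct sketch_task *t)
{
	return t->task_works != NULL;
}
```

Queuing a single item with SKETCH_TWA_RESUME leaves the flag-based check false while the list-based check reports pending work, which is why the list check is the more reliable test.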

Commits on Dec 16, 2022

  1. Merge branch 'block-6.2' into for-next

    * block-6.2:
      block: don't clear REQ_ALLOC_CACHE for non-polled requests
    axboe committed Dec 16, 2022
    e378351
  2. block: don't clear REQ_ALLOC_CACHE for non-polled requests

    Since commit:
    
    b99182c ("bio: add pcpu caching for non-polling bio_put")
    
    we support bio caching for IRQ based IO as well, hence there's no need
    to manually clear REQ_ALLOC_CACHE if we disable polling on a request.
    
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Dec 16, 2022
    53eab8e

Commits on Dec 15, 2022

  1. io_uring: use call_rcu_hurry if signaling an eventfd

    io_uring uses call_rcu when it needs to signal an eventfd as a result
    of an eventfd signal, since recursive eventfd signals are not allowed.
    It should call the new call_rcu_hurry API instead, so that the signal
    is not delayed.
    
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Link: https://lore.kernel.org/r/20221215184138.795576-1-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dylan Yudaken authored and axboe committed Dec 15, 2022
    44a84da
  2. Merge branch 'io_uring-6.2' into for-next

    * io_uring-6.2:
      io_uring: use call_rcu_hurry if signaling an eventfd
    axboe committed Dec 15, 2022
    0c1f073
  3. io_uring: use call_rcu_hurry if signaling an eventfd

    io_uring uses call_rcu when it needs to signal an eventfd as a result
    of an eventfd signal, since recursive eventfd signals are not allowed.
    It should call the new call_rcu_hurry API instead, so that the signal
    is not delayed.
    
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/r/20221215184138.795576-1-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dylan Yudaken authored and axboe committed Dec 15, 2022
    de8f020
  4. Merge branch 'block-6.2' into for-next

    * block-6.2:
      block: fix use-after-free of q->q_usage_counter
      block, bfq: only do counting of pending-request for BFQ_GROUP_IOSCHED
      blk-iolatency: Fix memory leak on add_disk() failures
      loop: Fix the max_loop commandline argument treatment when it is set to 0
    axboe committed Dec 15, 2022
    99c5251
  5. Merge branch 'io_uring-6.2' into for-next

    * io_uring-6.2:
      io_uring: fix overflow handling regression
      io_uring: ease timeout flush locking requirements
      io_uring: revise completion_lock locking
      io_uring: protect cq_timeouts with timeout_lock
    axboe committed Dec 15, 2022
    eb69e08
  6. io_uring: fix overflow handling regression

    Because the single task locking series got reordered ahead of the
    timeout and completion lock changes, two hunks inadvertently ended up
    using __io_fill_cqe_req() rather than io_fill_cqe_req(). This meant
    that we dropped overflow handling in those two spots. Reinstate the
    correct CQE filling helper.
    
    Fixes: f66f734 ("io_uring: skip spinlocking for ->task_complete")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Dec 15, 2022
    a8cf95f
  7. block: fix use-after-free of q->q_usage_counter

    For blk-mq, the queue release handler is usually called after
    blk_mq_freeze_queue_wait() returns. However, the
    q_usage_counter->release() handler may not have run yet at that time,
    which can cause a use-after-free.
    
    Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu().
    Since ->release() is called with the RCU read lock held, it was agreed,
    per the discussions in the two links below, that the race should be
    covered in the caller.
    
    Reported-by: Zhang Wensheng <zhangwensheng@huaweicloud.com>
    Reported-by: Zhong Jinghua <zhongjinghua@huawei.com>
    Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u
    Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Yu Kuai <yukuai3@huawei.com>
    Cc: Dennis Zhou <dennis@kernel.org>
    Fixes: 2b0d3d3 ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Dec 15, 2022
    d36a9ea
  8. block, bfq: only do counting of pending-request for BFQ_GROUP_IOSCHED

    'bfqd->num_groups_with_pending_reqs' is only used when
    CONFIG_BFQ_GROUP_IOSCHED is enabled, so only maintain the related
    variables and counting logic when CONFIG_BFQ_GROUP_IOSCHED is enabled.
    
    Cc: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Link: https://lore.kernel.org/r/20221110112622.389332-1-Yuwei.Guan@zeekrlife.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yuwei Guan authored and axboe committed Dec 15, 2022
    1eb2062
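The distinction behind the overflow-handling fix, a raw CQE fill that gives up when the ring is full versus a checked fill that falls back to an overflow list, can be sketched as a toy model. sketch_ring, sketch_fill_cqe_raw() and sketch_fill_cqe() are hypothetical and only loosely mirror the roles the commit describes for __io_fill_cqe_req() and io_fill_cqe_req(); the real io_uring CQ ring and overflow handling are considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CQ_ENTRIES 4
#define SKETCH_OVERFLOW_MAX 16

struct sketch_cqe {
	uint64_t user_data;
	int32_t res;
};

struct sketch_ring {
	struct sketch_cqe cqes[SKETCH_CQ_ENTRIES];
	unsigned int head, tail; /* consumer and producer indices */
	struct sketch_cqe overflow[SKETCH_OVERFLOW_MAX];
	size_t nr_overflow;
	size_t nr_dropped;
};

/* Raw fill: succeeds only if the ring has room; no overflow handling. */
bool sketch_fill_cqe_raw(struct sketch_ring *r, uint64_t user_data, int32_t res)
{
	if (r->tail - r->head >= SKETCH_CQ_ENTRIES)
		return false;
	r->cqes[r->tail % SKETCH_CQ_ENTRIES] = (struct sketch_cqe){ user_data, res };
	r->tail++;
	return true;
}

/* Checked fill: falls back to the overflow list when the ring is full. */
bool sketch_fill_cqe(struct sketch_ring *r, uint64_t user_data, int32_t res)
{
	if (sketch_fill_cqe_raw(r, user_data, res))
		return true;
	if (r->nr_overflow < SKETCH_OVERFLOW_MAX) {
		r->overflow[r->nr_overflow++] = (struct sketch_cqe){ user_data, res };
		return true;
	}
	r->nr_dropped++; /* both the ring and the overflow list are full */
	return false;
}
```

With a 4-entry ring, posting 6 completions through the checked helper keeps the extra 2 on the overflow list, while a caller that uses the raw helper and ignores its return value loses them, which is the kind of regression the fix reverses.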

Commits on Dec 14, 2022

  1. blk-iolatency: Fix memory leak on add_disk() failures

    When a gendisk is successfully initialized but add_disk() fails, such as
    when a loop device has an invalid number of minor device numbers
    specified, blkcg_init_disk() is called during init and then
    blkcg_exit_disk() during error handling. Unfortunately, iolatency gets
    initialized in the former but doesn't get cleaned up in the latter.
    
    This is because, in non-error cases, the cleanup is performed by
    del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos
    policies, iolatency being one of them, can only be activated once the disk
    is fully registered and visible. That assumption is true for wbt and iocost,
    but not so for iolatency as it gets initialized before add_disk() is called.
    
    It is desirable to lazy-init rq_qos policies because they are optional
    features and add to hot path overhead once initialized - each IO has to walk
    all the registered rq_qos policies. So, we want to switch iolatency to lazy
    init too. However, that's a bigger change. As a fix for the immediate
    problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk().
    This is safe because duplicate calls to rq_qos_exit() become no-ops.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: darklight2357@icloud.com
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Fixes: d706751 ("block: introduce blk-iolatency io controller")
    Cc: stable@vger.kernel.org # v4.19+
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/Y5TQ5gm3O4HXrXR3@slm.duckdns.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    htejun authored and axboe committed Dec 14, 2022
    813e693
  2. loop: Fix the max_loop commandline argument treatment when it is set to 0

    Currently, the max_loop commandline argument can be used to specify how
    many loop block devices are created at init time. If it is not
    specified on the commandline, CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block
    devices will be created.
    
    The max_loop commandline argument can be used to override the value of
    CONFIG_BLK_DEV_LOOP_MIN_COUNT. However, when max_loop is set to 0
    through the commandline, the current logic treats it as if it had not
    been set, and creates CONFIG_BLK_DEV_LOOP_MIN_COUNT devices anyway.
    
    Fix this by starting max_loop off as set to CONFIG_BLK_DEV_LOOP_MIN_COUNT.
    This preserves the intended behavior of creating
    CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block devices if the max_loop
    commandline parameter is not specified, and allowing max_loop to
    be respected for all values, including 0.
    
    This allows environments that can create all of their required loop
    block devices on demand to not have to unnecessarily preallocate loop
    block devices.
    
    Fixes: 7328508 ("remove artificial software max_loop limit")
    Cc: stable@vger.kernel.org
    Cc: Ken Chen <kenchen@google.com>
    Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
    Link: https://lore.kernel.org/r/20221208212902.765781-1-isaacmanjarres@google.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Isaac J. Manjarres authored and axboe committed Dec 14, 2022
    85c5019
  3. Merge branch 'block-6.2' into for-next

    * block-6.2:
      block/blk-iocost (gcc13): keep large values in a new enum
      block, bfq: replace 0/1 with false/true in bic apis
      block, bfq: don't return bfqg from __bfq_bic_change_cgroup()
      block, bfq: fix possible uaf for 'bfqq->bic'
    axboe committed Dec 14, 2022
    4c42273
  4. Merge branch 'for-6.2/writeback' into for-next

    * for-6.2/writeback:
      writeback: remove obsolete macro EXPIRE_DIRTY_ATIME
      writeback: Add asserts for adding freed inode to lists
    axboe committed Dec 14, 2022
    5f431dc
  5. block/blk-iocost (gcc13): keep large values in a new enum

    Since gcc 13, each member of an enum has the same type as the enum [1],
    and the enum's type is in turn determined by its members. Given:
      VTIME_PER_SEC_SHIFT     = 37,
      VTIME_PER_SEC           = 1LLU << VTIME_PER_SEC_SHIFT,
      ...
      AUTOP_CYCLE_NSEC        = 10LLU * NSEC_PER_SEC,
    the resulting type is unsigned long.
    
    This generates warnings with gcc-13:
      block/blk-iocost.c: In function 'ioc_weight_prfill':
      block/blk-iocost.c:3037:37: error: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int'
    
      block/blk-iocost.c: In function 'ioc_weight_show':
      block/blk-iocost.c:3047:34: error: format '%u' expects argument of type 'unsigned int', but argument 3 has type 'long unsigned int'
    
    So move the large values out of the anonymous enum into a separate
    enum, so that they don't affect the type of the other members.
    
    [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113
    
    Cc: Martin Liska <mliska@suse.cz>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: cgroups@vger.kernel.org
    Cc: linux-block@vger.kernel.org
    Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
    Link: https://lore.kernel.org/r/20221213120826.17446-1-jirislaby@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Jiri Slaby (SUSE) authored and axboe committed Dec 14, 2022
    ff1cc97
  6. block, bfq: replace 0/1 with false/true in bic apis

    Just to make the code a little cleaner; there are no functional changes.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20221214033155.3455754-3-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Dec 14, 2022
    337366e
  7. block, bfq: don't return bfqg from __bfq_bic_change_cgroup()

    The return value is not used, hence remove it.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20221214033155.3455754-2-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Dec 14, 2022
    452af7d
  8. block, bfq: fix possible uaf for 'bfqq->bic'

    Our tests reported a UAF for 'bfqq->bic' in 5.10:
    
    ==================================================================
    BUG: KASAN: use-after-free in bfq_select_queue+0x378/0xa30
    
    CPU: 6 PID: 2318352 Comm: fsstress Kdump: loaded Not tainted 5.10.0-60.18.0.50.h602.kasan.eulerosv2r11.x86_64 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220320_160524-szxrtosci10000 04/01/2014
    Call Trace:
     bfq_select_queue+0x378/0xa30
     bfq_dispatch_request+0xe8/0x130
     blk_mq_do_dispatch_sched+0x62/0xb0
     __blk_mq_sched_dispatch_requests+0x215/0x2a0
     blk_mq_sched_dispatch_requests+0x8f/0xd0
     __blk_mq_run_hw_queue+0x98/0x180
     __blk_mq_delay_run_hw_queue+0x22b/0x240
     blk_mq_run_hw_queue+0xe3/0x190
     blk_mq_sched_insert_requests+0x107/0x200
     blk_mq_flush_plug_list+0x26e/0x3c0
     blk_finish_plug+0x63/0x90
     __iomap_dio_rw+0x7b5/0x910
     iomap_dio_rw+0x36/0x80
     ext4_dio_read_iter+0x146/0x190 [ext4]
     ext4_file_read_iter+0x1e2/0x230 [ext4]
     new_sync_read+0x29f/0x400
     vfs_read+0x24e/0x2d0
     ksys_read+0xd5/0x1b0
     do_syscall_64+0x33/0x40
     entry_SYSCALL_64_after_hwframe+0x61/0xc6
    
    Commit 3bc5e68 ("bfq: Split shared queues on move between cgroups")
    changed things so that moving a process to a new cgroup allocates a new
    bfqq; however, the old bfqq and the new bfqq can point to the same bic:
    
    1) Initial state: two processes with IO in the same cgroup.
    
    Process 1       Process 2
     (BIC1)          (BIC2)
      |  Λ            |  Λ
      |  |            |  |
      V  |            V  |
      bfqq1           bfqq2
    
    2) bfqq1 is merged to bfqq2.
    
    Process 1       Process 2
     (BIC1)          (BIC2)
      |               |
       \-------------\|
                      V
      bfqq1           bfqq2(coop)
    
    3) Process 1 exits, then Process 2 issues new IO (denoted IOA).
    
     (BIC2)
      |  Λ
      |  |
      V  |
      bfqq2(coop)
    
    4) Before IOA completes, Process 2 is moved to another cgroup and issues IO.
    
    Process 2
     (BIC2)
       Λ
       |\--------------\
       |                V
      bfqq2           bfqq3
    
    Now BIC2 points to bfqq3, while bfqq2 and bfqq3 both point to BIC2.
    If all the requests complete and Process 2 exits, BIC2 will be freed,
    while there is no guarantee that bfqq2 will be freed before BIC2.
    
    Fix the problem by clearing bfqq->bic while bfqq is detached from bic.
    
    Fixes: 3bc5e68 ("bfq: Split shared queues on move between cgroups")
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20221214030430.3304151-1-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Dec 14, 2022
    64dc8c7
  9. io_uring: ease timeout flush locking requirements

    We don't need completion_lock for timeout flushing, so don't take it.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/1e3dc657975ac445b80e7bdc40050db783a5935a.1670002973.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Dec 14, 2022
    e5f30f6
  10. io_uring: revise completion_lock locking

    io_kill_timeouts() doesn't post any events but queues everything to
    task_work. Locking there is needed to protect traversal of linked
    requests, so we should grab completion_lock directly instead of using
    the io_cq_[un]lock helpers. The same goes for __io_req_find_next_prep().
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/88e75d481a65dc295cb59722bb1cf76402d1c06b.1670002973.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Dec 14, 2022
    6971253
  11. io_uring: protect cq_timeouts with timeout_lock

    Read cq_timeouts in io_flush_timeouts() only after taking the
    timeout_lock, as it's protected by it. There are many places where we
    also grab ->completion_lock, but for instance io_timeout_fn() doesn't
    and still modifies cq_timeouts.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/9c79544dd6cf5c4018cb1bab99cf481a93ea46ef.1670002973.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Dec 14, 2022
    ea011ee
  12. Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

    Pull MM updates from Andrew Morton:
    
     - More userfaultfd work from Peter Xu
    
     - Several convert-to-folios series from Sidhartha Kumar and Huang Ying
    
     - Some filemap cleanups from Vishal Moola
    
     - David Hildenbrand added the ability to selftest anon memory COW
       handling
    
     - Some cpuset simplifications from Liu Shixin
    
     - Addition of vmalloc tracing support by Uladzislau Rezki
    
     - Some pagecache folioifications and simplifications from Matthew
       Wilcox
    
     - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
       it
    
     - Miguel Ojeda contributed some cleanups for our use of the
       __no_sanitize_thread__ gcc keyword.
    
       This series should have been in the non-MM tree, my bad
    
     - Naoya Horiguchi improved the interaction between memory poisoning and
       memory section removal for huge pages
    
     - DAMON cleanups and tuneups from SeongJae Park
    
     - Tony Luck fixed the handling of COW faults against poisoned pages
    
     - Peter Xu utilized the PTE marker code for handling swapin errors
    
     - Hugh Dickins reworked compound page mapcount handling, simplifying it
       and making it more efficient
    
     - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
       David Hildenbrand
    
     - zram support for multiple compression streams from Sergey Senozhatsky
    
     - David Hildenbrand reworked the GUP code's R/O long-term pinning so
       that drivers no longer need to use the FOLL_FORCE workaround which
       didn't work very well anyway
    
     - Mel Gorman altered the page allocator so that local IRQs can remain
       enabled during per-cpu page allocations
    
     - Vishal Moola removed the try_to_release_page() wrapper
    
     - Stefan Roesch added some per-BDI sysfs tunables which are used to
       prevent network block devices from dirtying excessive amounts of
       pagecache
    
     - David Hildenbrand did some cleanup and repair work on KSM COW
       breaking
    
     - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
       zsmalloc backend
    
     - Brian Foster has fixed a longstanding corner-case oddity in
       file[map]_write_and_wait_range()
    
     - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
       Chen
    
     - Shiyang Ruan has done some work on fsdax, to make its reflink mode
       work better under xfstests. Better, but still not perfect
    
     - Christoph Hellwig has removed the .writepage() method from several
       filesystems. They only need .writepages()
    
     - Yosry Ahmed wrote a series which fixes the memcg reclaim target
       beancounting
    
     - David Hildenbrand has fixed some of our MM selftests for 32-bit
       machines
    
     - Many singleton patches, as usual
    
    * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
      mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
      mm: mmu_gather: allow more than one batch of delayed rmaps
      mm: fix typo in struct pglist_data code comment
      kmsan: fix memcpy tests
      mm: add cond_resched() in swapin_walk_pmd_entry()
      mm: do not show fs mm pc for VM_LOCKONFAULT pages
      selftests/vm: ksm_functional_tests: fixes for 32bit
      selftests/vm: cow: fix compile warning on 32bit
      selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
      mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
      mm,thp,rmap: fix races between updates of subpages_mapcount
      mm: memcg: fix swapcached stat accounting
      mm: add nodes= arg to memory.reclaim
      mm: disable top-tier fallback to reclaim on proactive reclaim
      selftests: cgroup: make sure reclaim target memcg is unprotected
      selftests: cgroup: refactor proactive reclaim code to reclaim_until()
      mm: memcg: fix stale protection of reclaim target memcg
      mm/mmap: properly unaccount memory on mas_preallocate() failure
      omfs: remove ->writepage
      jfs: remove ->writepage
      ...
    torvalds committed Dec 14, 2022
    e2ca6ba
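The enum split in the blk-iocost gcc 13 fix can be sketched as follows. The SKETCH_* names and sketch_show_weight() are hypothetical, and dividing an unsigned weight by a small enum constant is an assumed stand-in for the kind of expression the format warnings point at. Keeping the 64-bit-sized constants in their own enum leaves the small constant as a plain int, so an unsigned int divided by it is still unsigned int and matches %u. Enumerators whose values do not fit in int rely on a GCC/Clang extension before C23.

```c
#include <stddef.h>
#include <stdio.h>

/* Large values live in their own enum, so only these members get a wide type. */
enum {
	SKETCH_VTIME_PER_SEC_SHIFT = 37,
	SKETCH_VTIME_PER_SEC = 1LLU << SKETCH_VTIME_PER_SEC_SHIFT,
	SKETCH_AUTOP_CYCLE_NSEC = 10LLU * 1000000000LLU,
};

/* Small values stay in a separate enum whose members are plain int. */
enum {
	SKETCH_WEIGHT_ONE = 1 << 16,
};

/*
 * unsigned int / int yields unsigned int, which matches %u. If
 * SKETCH_WEIGHT_ONE shared an enum with the large values, gcc 13 would
 * give it the wider enum type and the quotient would no longer be
 * unsigned int, triggering a -Wformat warning.
 */
int sketch_show_weight(char *buf, size_t len, unsigned int cfg_weight)
{
	return snprintf(buf, len, "%u", cfg_weight / SKETCH_WEIGHT_ONE);
}
```

The design choice is simply to group enumerators by magnitude, so the type change in gcc 13 only affects the constants that actually need a 64-bit type.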

Commits on Dec 13, 2022

  1. Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

    Pull networking updates from Paolo Abeni:
     "Core:
    
       - Allow live renaming when an interface is up
    
       - Add retpoline wrappers for tc, considerably improving the
         performance of complex queue discipline configurations
    
       - Add inet drop monitor support
    
       - A few GRO performance improvements
    
       - Add infrastructure for atomic dev stats, addressing long standing
         data races
    
       - De-duplicate common code between OVS and conntrack offloading
         infrastructure
    
       - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
    
       - Netfilter: introduce packet parser for tunneled packets
    
       - Replace IPVS timer-based estimators with kthreads to scale up the
         workload with the number of available CPUs
    
       - Add the helper support for connection-tracking OVS offload
    
      BPF:
    
       - Support for user defined BPF objects: the use case is to allocate
         own objects, build own object hierarchies and use the building
         blocks to build own data structures flexibly, for example, linked
         lists in BPF
    
       - Make cgroup local storage available to non-cgroup attached BPF
         programs
    
       - Avoid unnecessary deadlock detection and failures wrt BPF task
         storage helpers
    
       - A large number of BPF verifier fixes and improvements
    
       - Veristat tool improvements to support custom filtering, sorting,
         and replay of results
    
       - Add LLVM disassembler as default library for dumping JITed code
    
       - Lots of new BPF documentation for various BPF maps
    
       - Add bpf_rcu_read_{,un}lock() support for sleepable programs
    
       - Add RCU grace period chaining to BPF to wait for the completion of
         access from both sleepable and non-sleepable BPF programs
    
       - Add support for storing struct task_struct objects as kptrs in maps
    
       - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
         values
    
       - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
    
      Protocols:
    
       - TCP: implement Protective Load Balancing across switch links
    
       - TCP: allow dynamically disabling TCP-MD5 static key, reverting back
         to fast[er]-path
    
       - UDP: Introduce optional per-netns hash lookup table
    
       - IPv6: simplify and clean up socket disposal
    
       - Netlink: support different type policies for each generic netlink
         operation
    
       - MPTCP: add MSG_FASTOPEN and FastOpen listener side support
    
       - MPTCP: add netlink notification support for listener sockets events
    
       - SCTP: add VRF support, allowing SCTP sockets to bind to VRF devices
    
       - Add bridging MAC Authentication Bypass (MAB) support
    
       - Extensions for Ethernet VPN bridging implementation to better
         support multicast scenarios
    
       - More work for Wi-Fi 7 support, comprising conversion of all the
         existing drivers to internal TX queue usage
    
       - IPSec: introduce a new offload type (packet offload) allowing
         complete header processing and crypto offloading
    
       - IPSec: extended ack support for more descriptive XFRM error
         reporting
    
       - RXRPC: increase SACK table size and move processing into a
         per-local endpoint kernel thread, reducing considerably the
         required locking
    
       - IEEE 802.15.4: synchronous frame sending and extended filtering
         support, plus initial support for scanning available 15.4 networks
    
       - Tun: bump the link speed from 10Mbps to 10Gbps
    
       - Tun/VirtioNet: implement UDP segmentation offload support
    
      Driver API:
    
       - PHY/SFP: improve power level switching between standard level 1 and
         the higher power levels
    
       - New API for netdev <-> devlink_port linkage
    
       - PTP: convert existing drivers to new frequency adjustment
         implementation
    
       - DSA: add support for rx offloading
    
       - Autoload DSA tagging driver when dynamically changing protocol
    
       - Add new PCP and APPTRUST attributes to Data Center Bridging
    
       - Add configuration support for 800Gbps link speed
    
       - Add devlink port function attributes to enable/disable the RoCE and
         migratable capabilities
    
       - Extend devlink-rate to support strict priority and weighted fair
         queuing
    
       - Add devlink support for reading directly from region memory
    
       - New device tree helper to fetch MAC address from nvmem
    
       - New big TCP helper to simplify temporary header stripping
    
      New hardware / drivers:
    
       - Ethernet:
          - Marvell Octeon CNF95N and CN10KB Ethernet Switches
          - Marvell Prestera AC5X Ethernet Switch
          - WangXun 10 Gigabit NIC
          - Motorcomm yt8521 Gigabit Ethernet
          - Microchip ksz9563 Gigabit Ethernet Switch
          - Microsoft Azure Network Adapter
          - Linux Automation 10Base-T1L adapter
    
       - PHY:
          - Aquantia AQR112 and AQR412
          - Motorcomm YT8531S
    
       - PTP:
          - Orolia ART-CARD
    
       - WiFi:
          - MediaTek Wi-Fi 7 (802.11be) devices
          - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
            devices
    
       - Bluetooth:
          - Broadcom BCM4377/4378/4387 Bluetooth chipsets
          - Realtek RTL8852BE and RTL8723DS
          - Cypress CYW4373A0 WiFi + Bluetooth combo device
    
      Drivers:
    
       - CAN:
          - gs_usb: bus error reporting support
          - kvaser_usb: listen only and bus error reporting support
    
       - Ethernet NICs:
          - Intel (100G):
             - extend action skbedit to RX queue mapping
             - implement devlink-rate support
             - support direct read from memory
          - nVidia/Mellanox (mlx5):
             - SW steering improvements, increasing rules update rate
             - Support for enhanced events compression
             - extend H/W offload packet manipulation capabilities
             - implement IPSec packet offload mode
          - nVidia/Mellanox (mlx4):
             - better big TCP support
          - Netronome Ethernet NICs (nfp):
             - IPsec offload support
             - add support for multicast filter
          - Broadcom:
             - RSS and PTP support improvements
          - AMD/SolarFlare:
             - netlink extended ack improvements
             - add basic flower matches to offload, and related stats
          - Virtual NICs:
             - ibmvnic: introduce affinity hint support
          - small / embedded:
             - FreeScale fec: add initial XDP support
             - Marvell mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
             - TI am65-cpsw: add suspend/resume support
             - Mediatek MT7986: add RX Wireless Ethernet Dispatch support
             - Realtek 8169: enable GRO software interrupt coalescing by
               default
    
       - Ethernet high-speed switches:
          - Microchip (sparx5):
             - add support for Sparx5 TC/flower H/W offload via VCAP
          - Mellanox mlxsw:
             - add 802.1X and MAC Authentication Bypass offload support
             - add ip6gre support
    
       - Embedded Ethernet switches:
          - Mediatek (mtk_eth_soc):
             - improve PCS implementation, add DSA untag support
             - enable flow offload support
          - Renesas:
             - add rswitch R-Car Gen4 gPTP support
          - Microchip (lan966x):
             - add full XDP support
             - add TC H/W offload via VCAP
             - enable PTP on bridge interfaces
          - Microchip (ksz8):
             - add MTU support for KSZ8 series
    
       - Qualcomm 802.11ax WiFi (ath11k):
          - support configuring channel dwell time during scan
    
       - MediaTek WiFi (mt76):
          - enable Wireless Ethernet Dispatch (WED) offload support
          - add ack signal support
          - enable coredump support
          - remain_on_channel support
    
       - Intel WiFi (iwlwifi):
          - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
          - 320 MHz channels support
    
       - RealTek WiFi (rtw89):
          - new dynamic header firmware format support
          - wake-over-WLAN support"
    
    * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
      ipvs: fix type warning in do_div() on 32 bit
      net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
      net: ipa: add IPA v4.7 support
      dt-bindings: net: qcom,ipa: Add SM6350 compatible
      bnxt: Use generic HBH removal helper in tx path
      IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
      selftests: forwarding: Add bridge MDB test
      selftests: forwarding: Rename bridge_mdb test
      bridge: mcast: Support replacement of MDB port group entries
      bridge: mcast: Allow user space to specify MDB entry routing protocol
      bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
      bridge: mcast: Add support for (*, G) with a source list and filter mode
      bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
      bridge: mcast: Add a flag for user installed source entries
      bridge: mcast: Expose __br_multicast_del_group_src()
      bridge: mcast: Expose br_multicast_new_group_src()
      bridge: mcast: Add a centralized error path
      bridge: mcast: Place netlink policy before validation functions
      bridge: mcast: Split (*, G) and (S, G) addition into different functions
      bridge: mcast: Do not derive entry type from its filter mode
      ...
    torvalds committed Dec 13, 2022
    7e68dd7
  2. Merge tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa

    Pull Xtensa updates from Max Filippov:
    
     - fix kernel build with gcc-13
    
     - various minor fixes
    
    * tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa:
      xtensa: add __umulsidi3 helper
      xtensa: update config files
      MAINTAINERS: update the 'T:' entry for xtensa
    torvalds committed Dec 13, 2022
    1ca06f1
  3. Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

    Pull ARM updates from Russell King:
    
     - update unwinder to cope with module PLTs
    
     - enable UBSAN on ARM
    
     - improve kernel fault message
    
     - update UEFI runtime page tables dump
    
     - avoid clang's __aeabi_uldivmod generated in NWFPE code
    
     - disable FIQs on CPU shutdown paths
    
     - update XOR register usage
    
     - a number of build updates (using .arch, thread pointer, removal of
       lazy evaluation in Makefile)
    
     - conversion of stacktrace code to stackwalk
    
     - findbit assembly updates
    
     - hwcap feature updates for ARMv8 CPUs
    
     - instruction dump updates for big-endian platforms
    
     - support for function error injection
    
    * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (31 commits)
      ARM: 9279/1: support function error injection
      ARM: 9277/1: Make the dumped instructions are consistent with the disassembled ones
      ARM: 9276/1: Refactor dump_instr()
      ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA
      ARM: 9274/1: Add hwcap for Speculative Store Bypassing Safe
      ARM: 9273/1: Add hwcap for Speculation Barrier(SB)
      ARM: 9272/1: vfp: Add hwcap for FEAT_AA32I8MM
      ARM: 9271/1: vfp: Add hwcap for FEAT_AA32BF16
      ARM: 9270/1: vfp: Add hwcap for FEAT_FHM
      ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
      ARM: 9268/1: vfp: Add hwcap FPHP and ASIMDHP for FEAT_FP16
      ARM: 9267/1: Define Armv8 registers in AArch32 state
      ARM: findbit: add unwinder information
      ARM: findbit: operate by words
      ARM: findbit: convert to macros
      ARM: findbit: provide more efficient ARMv7 implementation
      ARM: findbit: document ARMv5 bit offset calculation
      ARM: 9259/1: stacktrace: Convert stacktrace to generic ARCH_STACKWALK
      ARM: 9258/1: stacktrace: Make stack walk callback consistent with generic code
      ARM: 9265/1: pass -march= only to compiler
      ...
    torvalds committed Dec 13, 2022
    4cb1fc6