Branch: landlock-v14
Commits on Feb 24, 2020
  1. landlock: Add user and kernel documentation

    l0kod committed Feb 24, 2020
    This documentation can be built with the Sphinx framework.
    
    Another location might be more appropriate, though.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * Rewrote the documentation according to the major revamp.
    
    Previous version:
    https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
  2. samples/landlock: Add a sandbox manager example

    l0kod committed Feb 24, 2020
    Add a basic sandbox tool to launch a command which can only access a
    whitelist of file hierarchies in a read-only or read-write way.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v11:
    * Add back the filesystem sandbox manager and update it to work with the
      new Landlock syscall.
    
    Previous version:
    https://lore.kernel.org/lkml/20190721213116.23476-9-mic@digikod.net/
  3. selftests/landlock: Add initial tests

    l0kod committed Feb 24, 2020
    Test the landlock syscall, ptrace hook semantics and filesystem
    access-control.
    
    This is an initial batch, more tests will follow.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Cc: Shuah Khan <shuah@kernel.org>
    ---
    
    Changes since v13:
    * Add back the filesystem tests (from v10) and extend them.
    * Add tests for the new syscall.
    
    Previous version:
    https://lore.kernel.org/lkml/20191104172146.30797-7-mic@digikod.net/
  4. arch: Wire up landlock() syscall

    l0kod committed Feb 24, 2020
    Wire up the landlock() call for x86_64 (for now).
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * New implementation.
  5. landlock: Add syscall implementation

    l0kod committed Feb 24, 2020
    This syscall, inspired by seccomp(2) and bpf(2), is designed to be
    used by unprivileged processes to sandbox themselves.  It has the same
    usage restrictions as seccomp(2): no_new_privs check.
    
    There are currently four commands:
    * get_features: Gets the supported features (required for backward
      compatibility and best-effort security).
    * create_ruleset: Creates a ruleset and returns its file descriptor.
    * add_rule: Adds a rule (e.g. file hierarchy access) to a ruleset,
      identified by the dedicated file descriptor.
    * enforce_ruleset: Enforces a ruleset on the current thread (similar to
      seccomp).
    
    See the user and code documentation for more details.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * New implementation, replacing the dependency on seccomp(2) and bpf(2).
  6. fs,landlock: Support filesystem access-control

    l0kod committed Feb 24, 2020
    Thanks to the Landlock objects and ruleset, it is possible to identify
    inodes according to a process' domain.  To enable an unprivileged
    process to express a file hierarchy, it first needs to open a directory
    (or a file) and pass this file descriptor to the kernel through
    landlock(2).  When checking if a file access request is allowed, we walk
    from the requested dentry to the real root, following the different
    mount layers.  The access rights of each "tagged" inode are collected
    and ANDed to create the access to the requested file hierarchy.  This
    makes it possible to identify a lot of files without tagging every
    inode or modifying the filesystem, while still following the view and
    understanding the user has of the filesystem.
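    
    A minimal userspace sketch of that walk (not part of the patch).  The
    demo_node type, the DEMO_ACCESS_* bits and both functions are
    hypothetical stand-ins for dentries, inodes and Landlock access rights,
    and the choice to deny access when no node on the path is tagged is an
    assumption of this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits; the real Landlock access rights differ. */
#define DEMO_ACCESS_READ  (1u << 0)
#define DEMO_ACCESS_WRITE (1u << 1)

/* Toy stand-in for a dentry/inode pair in a single hierarchy. */
struct demo_node {
	const struct demo_node *parent; /* NULL at the real root */
	int tagged;                     /* non-zero if a rule is tied to this node */
	uint32_t allowed;               /* access granted by that rule */
};

/*
 * Walk from the requested node up to the root, ANDing the access of every
 * tagged node met on the way.  Assumption: a path without any tagged node
 * grants nothing.
 */
uint32_t demo_collect_access(const struct demo_node *node)
{
	uint32_t access = ~0u;
	int seen_tag = 0;

	for (; node; node = node->parent) {
		if (!node->tagged)
			continue;
		access &= node->allowed;
		seen_tag = 1;
	}
	return seen_tag ? access : 0;
}

/* Returns non-zero if every requested bit survives the walk. */
int demo_may_access(const struct demo_node *node, uint32_t request)
{
	assert(node != NULL);
	return (demo_collect_access(node) & request) == request;
}
```

    With a read-write rule on a directory and a read-only rule on one of
    its subdirectories, a file below the subdirectory ends up readable but
    not writable, because the two masks are ANDed.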
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v11:
    * Add back, revamp and make a fully working filesystem access-control
      based on paths and inodes.
    * Remove the eBPF dependency.
    
    Previous version:
    https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
  7. landlock: Add ptrace restrictions

    l0kod committed Feb 24, 2020
    Using ptrace(2) and related debug features on a target process can lead
    to a privilege escalation.  Indeed, ptrace(2) can be used by an attacker
    to impersonate another task and to remain undetected while performing
    malicious activities.  Thanks to ptrace_may_access(), various parts of
    the kernel can check if a tracer is more privileged than a tracee.
    
    A landlocked process has fewer privileges than a non-landlocked process
    and must then be subject to additional restrictions when manipulating
    processes. To be allowed to use ptrace(2) and related syscalls on a
    target process, a landlocked process must have a subset of the target
    process' rules (i.e. the tracee must be in a sub-domain of the tracer).
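    
    A minimal sketch of the sub-domain rule (not part of the patch).  It
    assumes a toy model where each domain points to the domain it was
    derived from, a NULL domain means "not landlocked", and a tracee in the
    same domain as the tracer is acceptable.  All names are hypothetical,
    and only the additional Landlock check is modeled, not the other ptrace
    permission checks:

```c
#include <assert.h>
#include <stddef.h>

/* Toy domain: each enforced ruleset creates a child of the previous domain. */
struct demo_domain {
	const struct demo_domain *parent; /* NULL for a first-level domain */
};

/* True if child is the same domain as ancestor or is nested below it. */
int demo_domain_is_nested(const struct demo_domain *ancestor,
			  const struct demo_domain *child)
{
	assert(ancestor != NULL);
	for (; child; child = child->parent)
		if (child == ancestor)
			return 1;
	return 0;
}

/* A NULL domain stands for a non-landlocked process. */
int demo_may_ptrace(const struct demo_domain *tracer,
		    const struct demo_domain *tracee)
{
	if (!tracer)
		return 1; /* no extra Landlock restriction for this tracer */
	return demo_domain_is_nested(tracer, tracee);
}
```

    In this model a landlocked tracer can never trace a non-landlocked
    process, since a NULL tracee domain is not nested in any domain.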
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * Make the ptrace restriction mandatory, like in the v10.
    * Remove the eBPF dependency.
    
    Previous version:
    https://lore.kernel.org/lkml/20191104172146.30797-5-mic@digikod.net/
  8. landlock: Set up the security framework and manage credentials

    l0kod committed Feb 24, 2020
    A process's credentials point to a Landlock domain, which is
    implemented underneath with a ruleset.  In the following commits, this
    domain is used to check and enforce the ptrace and filesystem security
    policies.  A domain is inherited from a parent to its child the same way
    a thread inherits a seccomp policy.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * Totally get rid of the seccomp dependency.
    * Only keep credential management and LSM setup.
    
    Previous version:
    https://lore.kernel.org/lkml/20191104172146.30797-4-mic@digikod.net/
  9. landlock: Add ruleset and domain management

    l0kod committed Feb 24, 2020
    A Landlock ruleset is mainly a red-black tree with Landlock rules as
    nodes.  This enables quick update and lookup to match a requested access
    e.g., to a file.  A ruleset is usable through a dedicated file
    descriptor (cf. following commit adding the syscall) which enables a
    process to build it by adding new rules.
    
    A domain is a ruleset tied to a set of processes.  This group of rules
    defines the security policy enforced on these processes and their future
    children.  A domain can transition to a new domain which is the merge of
    itself with a ruleset provided by the current process.  This merge is
    the intersection of all the constraints, which means that a process can
    only gain more constraints over time.
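    
    A minimal sketch of why merging can only add constraints (not part of
    the patch).  It swaps the red-black tree for small fixed arrays and
    models a domain as a stack of rulesets whose answers are ANDed.  The
    types, limits and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_RULES  4
#define DEMO_MAX_LAYERS 4

struct demo_rule {
	int object_id;    /* stand-in for a kernel object */
	uint32_t allowed; /* access rights allowed on that object */
};

struct demo_ruleset {
	struct demo_rule rules[DEMO_MAX_RULES];
	int nr_rules;
};

struct demo_domain {
	struct demo_ruleset layers[DEMO_MAX_LAYERS];
	int nr_layers; /* 0 means unrestricted */
};

/* Linear lookup; an object without a rule gets no access from this ruleset. */
uint32_t demo_ruleset_access(const struct demo_ruleset *rs, int object_id)
{
	for (int i = 0; i < rs->nr_rules; i++)
		if (rs->rules[i].object_id == object_id)
			return rs->rules[i].allowed;
	return 0;
}

/* The new domain is the old one plus one more ruleset to satisfy. */
struct demo_domain demo_domain_merge(const struct demo_domain *old,
				     const struct demo_ruleset *rs)
{
	struct demo_domain merged = *old;

	assert(merged.nr_layers < DEMO_MAX_LAYERS);
	merged.layers[merged.nr_layers++] = *rs;
	return merged;
}

/* Access is the intersection of what every layer allows. */
uint32_t demo_domain_access(const struct demo_domain *dom, int object_id)
{
	uint32_t access = ~0u;

	for (int i = 0; i < dom->nr_layers; i++)
		access &= demo_ruleset_access(&dom->layers[i], object_id);
	return access;
}
```

    Because every merge only adds another mask to the AND, the access
    computed for a merged domain is always a subset of the access computed
    for the domain it came from.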
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * New implementation, inspired by the previous inode eBPF map, but
      agnostic to the underlying kernel object.
    
    Previous version:
    https://lore.kernel.org/lkml/20190721213116.23476-7-mic@digikod.net/
  10. landlock: Add object and rule management

    l0kod committed Feb 24, 2020
    A Landlock object makes it possible to identify a kernel object (e.g.
    an inode).  A Landlock rule is a set of access rights allowed on an
    object.  Rules are grouped in rulesets that may be tied to a set of
    processes (i.e. subjects) to enforce a scoped access-control (i.e. a
    domain).
    
    Because Landlock's goal is to empower any process (especially
    unprivileged ones) to sandbox themselves, we can't rely on a system-wide
    object identification such as file extended attributes.  Indeed, we need
    innocuous, composable and modular access-controls.
    
    The main challenge with these constraints is to identify kernel objects
    only while this identification is useful (i.e. when a security policy
    makes use of this object).  This identification data should be freed
    once no policy is using it.  This ephemeral tagging should not and may
    not be written in the filesystem.  We then need to manage the lifetime
    of a rule according to the lifetime of its object.  To avoid a global
    lock, this implementation makes use of RCU and counters to safely
    reference objects.
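    
    A minimal userspace sketch of the counter half of that scheme (not part
    of the patch), using C11 atomics.  RCU itself is not modeled, and
    demo_object with its two helpers is hypothetical.  The point is the
    "take a reference only if the count is not already zero" pattern, which
    lets a lookup race safely with the last user dropping the object:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_object {
	atomic_uint usage; /* number of rules currently referencing the object */
};

/* Take a reference only if the object is still alive (usage > 0). */
bool demo_object_get(struct demo_object *obj)
{
	unsigned int old = atomic_load(&obj->usage);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->usage, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
bool demo_object_put(struct demo_object *obj)
{
	unsigned int old = atomic_fetch_sub(&obj->usage, 1);

	assert(old != 0);
	return old == 1;
}
```

    The caller that sees demo_object_put() return true is the one
    responsible for freeing the identification data; any later
    demo_object_get() fails instead of resurrecting the object.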
    
    A following commit uses this generic object management for inodes.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v13:
    * New dedicated implementation, removing the need for eBPF.
    
    Previous version:
    https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
  11. Linux 5.6-rc3

    torvalds committed Feb 24, 2020
Commits on Feb 23, 2020
  1. Merge tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/ker…

    torvalds committed Feb 23, 2020
    …nel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
     "These are fixes that were found during testing with help of error
      injection, plus some other stable material.
    
      There's a fixup to patch added to rc1 causing locking in wrong context
      warnings, tests found one more deadlock scenario. The patches are
      tagged for stable, two of them now in the queue but we'd like all
      three released at the same time.
    
      I'm not happy about fixes to fixes in such a fast succession during
      rcs, but I hope we found all the fallouts of commit 28553fa
      ('Btrfs: fix race between shrinking truncate and fiemap')"
    
    * tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof
      Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
      btrfs: fix bytes_may_use underflow in prealloc error condtition
      btrfs: handle logged extent failure properly
      btrfs: do not check delayed items are empty for single transaction cleanup
      btrfs: reset fs_root to NULL on error in open_ctree
      btrfs: destroy qgroup extent records on transaction abort
  2. Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/lin…

    torvalds committed Feb 23, 2020
    …ux/kernel/git/tytso/ext4
    
    Pull ext4 fixes from Ted Ts'o:
     "More miscellaneous ext4 bug fixes (all stable fodder)"
    
    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: fix mount failure with quota configured as module
      jbd2: fix ocfs2 corrupt when clearing block group bits
      ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
      ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
      ext4: fix potential race between s_flex_groups online resizing and access
      ext4: fix potential race between s_group_info online resizing and access
      ext4: fix potential race between online resizing and write operations
      ext4: add cond_resched() to __ext4_find_entry()
      ext4: fix a data race in EXT4_I(inode)->i_disksize
  3. Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux

    torvalds committed Feb 23, 2020
    Pull csky updates from Guo Ren:
     "Sorry, I missed the 5.6-rc1 merge window, but most of this pull
      request is fixes and the rest sits between fixes and features. The
      only outside modification is the MAINTAINERS file update with our
      mailing list.
    
       - cache flush implementation fixes
    
       - ftrace modify panic fix
    
       - CONFIG_SMP boot problem fix
    
       - fix pt_regs saving for atomic.S
    
       - fix fixaddr_init without highmem.
    
       - fix stack protector support
    
       - fix fake Tightly-Coupled Memory code compile and use
    
       - fix some typos and coding convention"
    
    * tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits)
      csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>
      csky: Implement copy_thread_tls
      csky: Add PCI support
      csky: Minimize defconfig to support buildroot config.fragment
      csky: Add setup_initrd check code
      csky: Cleanup old Kconfig options
      arch/csky: fix some Kconfig typos
      csky: Fixup compile warning for three unimplemented syscalls
      csky: Remove unused cache implementation
      csky: Fixup ftrace modify panic
      csky: Add flush_icache_mm to defer flush icache all
      csky: Optimize abiv2 copy_to_user_page with VM_EXEC
      csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860)
      csky: Remove unnecessary flush_icache_* implementation
      csky: Support icache flush without specific instructions
      csky/Kconfig: Add Kconfig.platforms to support some drivers
      csky/smp: Fixup boot failed when CONFIG_SMP
      csky: Set regs->usp to kernel sp, when the exception is from kernel
      csky/mm: Fixup export invalid_pte_table symbol
      csky: Separate fixaddr_init from highmem
      ...
  4. csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>

    Geert Uytterhoeven authored and Guo Ren committed Feb 12, 2020
    The C-Sky platform code is not a clock provider, and just needs to call
    of_clk_init().
    
    Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.
    
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
  5. Merge tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/lin…

    torvalds committed Feb 23, 2020
    …ux/kernel/git/tip/tip
    
    Pull RAS fixes from Thomas Gleixner:
     "Two fixes for the AMD MCE driver:
    
       - Populate the per CPU MCA bank descriptor pointer only after it has
         been completely set up to prevent a use-after-free in case that one
         of the subsequent initialization steps fails
    
       - Implement a proper release function for the sysfs entries of MCA
         threshold controls instead of freeing the memory right in the CPU
         teardown code, which leads to another use-after-free when the
         associated sysfs file is opened and accessed"
    
    * tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/mce/amd: Fix kobject lifetime
      x86/mce/amd: Publish the bank pointer only after setup has succeeded
  6. Merge tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/lin…

    torvalds committed Feb 23, 2020
    …ux/kernel/git/tip/tip
    
    Pull irq fixes from Thomas Gleixner:
     "Two fixes for the irq core code which are follow ups to the recent MSI
      fixes:
    
       - The WARN_ON which was put into the MSI setaffinity callback for
         paranoia reasons actually triggered via a callchain which escaped
         when all the possible ways to reach that code were analyzed.
    
         The proc/irq/$N/*affinity interfaces have a quirk which came in
         when ALPHA moved to the generic interface: In case that the written
         affinity mask does not contain any online CPU it calls into ALPHAs
         magic auto affinity setting code.
    
         A few years later this mechanism was also made available to x86 for
         no good reasons and in a way which circumvents all sanity checks
         for interrupts which cannot have their affinity set from process
         context on X86 due to the way the X86 interrupt delivery works.
    
         It would be possible to make this work properly, but there is no
         point in doing so. If the interrupt is not yet started then the
         affinity setting has no effect and if it is started already then it
         is already assigned to an online CPU so there is no point to
         randomly move it to some other CPU. Just return EINVAL as the code
         has done before that change forever.
    
       - The new MSI quirk bit in the irq domain flags turned out to be
         already occupied, which escaped the author and the reviewers
         because the already in use bits were 0,6,2,3,4,5 listed in that
         order.
    
         That bit 6 was simply overlooked because the ordering was straight
         forward linear otherwise. So the new bit ended up being a
         duplicate.
    
         Fix it up by switching the oddball 6 to the obvious 1"
    
    * tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      genirq/irqdomain: Make sure all irq domain flags are distinct
      genirq/proc: Reject invalid affinity masks (again)
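
    The flag clash described in the irq pull request above can be caught
    mechanically.  A minimal sketch (not part of the series), assuming the
    flags are plain single-bit values and that the new flag was given bit
    6, as the message implies; demo_flags_are_distinct() is a hypothetical
    helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every value is exactly one bit and no bit is used twice. */
int demo_flags_are_distinct(const uint32_t *flags, size_t n)
{
	uint32_t seen = 0;

	assert(flags != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint32_t f = flags[i];

		if (f == 0 || (f & (f - 1)) != 0)
			return 0; /* not exactly one bit */
		if (seen & f)
			return 0; /* bit already taken */
		seen |= f;
	}
	return 1;
}
```

    Fed the bit order 0,6,2,3,4,5 plus a new flag on bit 6, the check
    fails; after switching the old 6 to 1, it passes.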
  7. Merge tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/lin…

    torvalds committed Feb 23, 2020
    …ux/kernel/git/tip/tip
    
    Pull x86 fixes from Thomas Gleixner:
     "Two fixes for x86:
    
       - Remove the __force_order definition from the kaslr boot code as it is
         already defined in the page table code which makes GCC 10 builds
         fail because it changed the default to -fno-common.
    
       - Address the AMD erratum 1054 concerning the IRPERF capability and
         enable the Instructions Retired fixed counter on machines which are
         not affected by the erratum"
    
    * tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
      x86/boot/compressed: Don't declare __force_order in kaslr_64.c
Commits on Feb 22, 2020
  1. Merge tag 'zonefs-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kern…

    torvalds committed Feb 22, 2020
    …el/git/dlemoal/zonefs
    
    Pull zonefs fix from Damien Le Moal:
     "A single patch fixing typos in the documentation file"
    
    * tag 'zonefs-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
      zonefs: fix documentation typos etc.
  2. Merge tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-block

    torvalds committed Feb 22, 2020
    Pull io_uring fixes from Jens Axboe:
     "Here's a small collection of fixes that were queued up:
    
       - Remove unnecessary NULL check (Dan)
    
       - Missing io_req_cancelled() call in fallocate (Pavel)
    
       - Put the cleanup check for aux data in the right spot (Pavel)
    
       - Two fixes for SQPOLL (Stefano, Xiaoguang)"
    
    * tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-block:
      io_uring: fix __io_iopoll_check deadlock in io_sq_thread
      io_uring: prevent sq_thread from spinning when it should stop
      io_uring: fix use-after-free by io_cleanup_req()
      io_uring: remove unnecessary NULL checks
      io_uring: add missing io_req_cancelled()
  3. Merge tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-block

    torvalds committed Feb 22, 2020
    Pull block fixes from Jens Axboe:
     "Just a set of NVMe fixes via Keith"
    
    * tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-block:
      nvme-multipath: Fix memory leak with ana_log_buf
      nvme: Fix uninitialized-variable warning
      nvme-pci: Use single IRQ vector for old Apple models
      nvme/pci: Add sleep quirk for Samsung and Toshiba drives
  4. Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…

    torvalds committed Feb 22, 2020
    …it/jejb/scsi
    
    Pull SCSI fixes from James Bottomley:
     "Four non-core fixes.
    
      Two are reverts of target fixes which turned out to have unwanted side
      effects, one is a revert of an RDMA fix with the same problem and the
      final one fixes an incorrect warning about memory allocation failures
      in megaraid_sas (the driver actually reduces the allocation size until
      it succeeds)"
    
    Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
    
    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
      scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
      scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
      scsi: megaraid_sas: silence a warning
      scsi: Revert "target/core: Inline transport_lun_remove_cmd()"
  5. Merge tag 'hwmon-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/…

    torvalds committed Feb 22, 2020
    …kernel/git/groeck/linux-staging
    
    Pull hwmon fixes from Guenter Roeck:
    
     - Fix crash in w83627ehf driver seen with W83627DHG-P
    
     - Fix lockdep splat in acpi_power_meter driver
    
     - Fix xdpe12284 documentation Sphinx warnings
    
    * tag 'hwmon-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
      hwmon: (w83627ehf) Fix crash seen with W83627DHG-P
      hwmon: (acpi_power_meter) Fix lockdep splat
      Documentation/hwmon: fix xdpe12284 Sphinx warnings
  6. Merge tag 'devicetree-fixes-for-5.6-2' of git://git.kernel.org/pub/sc…

    torvalds committed Feb 22, 2020
    …m/linux/kernel/git/robh/linux
    
    Pull devicetree fixes from Rob Herring:
     "A handful of fixes in DT bindings for MDIO bus, Allwinner CSI, OMAP
      HSMMC, and Tegra124 EMC"
    
    * tag 'devicetree-fixes-for-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
      dt-bindings: media: csi: Fix clocks description
      dt-bindings: media: csi: Add interconnects properties
      dt-bindings: net: mdio: remove compatible string from example
      dt-bindings: memory-controller: Update example for Tegra124 EMC
      dt-bindings: mmc: omap-hsmmc: Fix SDIO interrupt
  7. Merge tag 's390-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/g…

    torvalds committed Feb 22, 2020
    …it/s390/linux
    
    Pull s390 fixes from Vasily Gorbik:
    
     - Remove ieee_emulation_warnings sysctl which is dead code.
    
     - Avoid triggering rebuild of the kernel during make install.
    
     - Enable protected virtualization guest support in default configs.
    
     - Fix cio_ignore seq_file .next function to increase position index.
       And use kobj_to_dev instead of container_of in cio code.
    
     - Fix storage block address lists to contain absolute addresses in qdio
       code.
    
     - A few clang warning and spelling fixes.
    
    * tag 's390-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
      s390/qdio: fill SBALEs with absolute addresses
      s390/qdio: fill SL with absolute addresses
      s390: remove obsolete ieee_emulation_warnings
      s390: make 'install' not depend on vmlinux
      s390/kaslr: Fix casts in get_random
      s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
      s390/pkey/zcrypt: spelling s/crytp/crypt/
      s390/cio: use kobj_to_dev() API
      s390/defconfig: enable CONFIG_PROTECTED_VIRTUALIZATION_GUEST
      s390/cio: cio_ignore_proc_seq_next should increase position index
  8. io_uring: fix __io_iopoll_check deadlock in io_sq_thread

    Xiaoguang Wang authored and axboe committed Feb 22, 2020
    Since commit a3a0e43 ("io_uring: don't enter poll loop if we have
    CQEs pending"), if we already have events pending, we won't enter the
    poll loop.  In case SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if
    the app has been terminated without reaping pending events which are
    already in the cq ring, and there are some reqs in poll_list,
    io_sq_thread will enter __io_iopoll_check(), find pending events, and
    then return, so this loop will never have a chance to exit.
    
    I have seen this issue in fio stress tests.  To fix it, let
    io_sq_thread call io_iopoll_getevents() with argument 'min' being zero,
    and remove __io_iopoll_check().
    
    Fixes: a3a0e43 ("io_uring: don't enter poll loop if we have CQEs pending")
    Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. ext4: fix mount failure with quota configured as module

    jankara authored and tytso committed Feb 21, 2020
    When CONFIG_QFMT_V2 is configured as a module, the test in
    ext4_feature_set_ok() fails and so mount of filesystems with quota or
    project features fails. Fix the test to use IS_ENABLED macro which
    works properly even for modules.
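    
    A minimal sketch of why a plain preprocessor test misses a modular
    option (not part of the patch).  The DEMO_ macros are a simplified
    re-creation of the idea behind the kernel's IS_ENABLED(), and
    CONFIG_DEMO_QFMT is a made-up option.  When an option is built as a
    module, only the symbol with the _MODULE suffix is defined:

```c
#include <assert.h>

/* Simplified re-creation of the IS_ENABLED() trick; DEMO_ names are made up. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option)  DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* Hypothetical option built as a module: only the _MODULE symbol is set. */
#define CONFIG_DEMO_QFMT_MODULE 1

/* The macro evaluates to a constant, so it can be checked at build time. */
static_assert(DEMO_IS_ENABLED(CONFIG_DEMO_QFMT),
	      "a modular option counts as enabled");

int demo_seen_by_ifdef(void)
{
#ifdef CONFIG_DEMO_QFMT
	return 1;
#else
	return 0; /* a plain #ifdef misses the modular build */
#endif
}

int demo_seen_by_is_enabled(void)
{
	return DEMO_IS_ENABLED(CONFIG_DEMO_QFMT);
}
```

    Here demo_seen_by_ifdef() reports the option as absent while
    demo_seen_by_is_enabled() reports it as present, which is the
    difference the fix relies on.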
    
    Link: https://lore.kernel.org/r/20200221100835.9332-1-jack@suse.cz
    Fixes: d65d87a ("ext4: improve explanation of a mount failure caused by a misconfigured kernel")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org
  10. jbd2: fix ocfs2 corrupt when clearing block group bits

    wangyan122 authored and tytso committed Feb 20, 2020
    I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
    The running environment:
    	kernel version: 4.19
    	A cluster with two nodes, 5 luns mounted on two nodes, and do some
    	file operations like dd/fallocate/truncate/rm on every lun with storage
    	network disconnection.
    
    The fallocate operation on dm-23-45 caused a NULL pointer dereference.
    
    The information about the NULL pointer dereference is as follows:
    	[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
    	[577992.878290] Aborting journal on device dm-23-45.
    	...
    	[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
    	[577992.890908] __journal_remove_journal_head: freeing b_committed_data
    	[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
    	[577992.890918] __journal_remove_journal_head: freeing b_committed_data
    	[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
    	[577992.890922] __journal_remove_journal_head: freeing b_committed_data
    	[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
    	[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
    	[577992.890928] __journal_remove_journal_head: freeing b_committed_data
    	[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
    	[577992.890933] __journal_remove_journal_head: freeing b_committed_data
    	[577992.890939] __journal_remove_journal_head: freeing b_committed_data
    	[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
    	[577992.890950] Mem abort info:
    	[577992.890951]   ESR = 0x96000004
    	[577992.890952]   Exception class = DABT (current EL), IL = 32 bits
    	[577992.890952]   SET = 0, FnV = 0
    	[577992.890953]   EA = 0, S1PTW = 0
    	[577992.890954] Data abort info:
    	[577992.890955]   ISV = 0, ISS = 0x00000004
    	[577992.890956]   CM = 0, WnR = 0
    	[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
    	[577992.890960] [0000000000000020] pgd=0000000000000000
    	[577992.890964] Internal error: Oops: 96000004 [#1] SMP
    	[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
    	[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G        W  OE     4.19.36 #1
    	[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    	[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
    	[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    	[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
    	[577992.891084] sp : ffff0000c8e2b810
    	[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
    	[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
    	[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
    	[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
    	[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
    	[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
    	[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
    	[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
    	[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
    	[577992.891103] x11: 0000000000000038 x10: 0101010101010101
    	[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
    	[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
    	[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
    	[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
    	[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
    	[577992.891116] Call trace:
    	[577992.891139]  _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    	[577992.891162]  _ocfs2_free_clusters+0x100/0x290 [ocfs2]
    	[577992.891185]  ocfs2_free_clusters+0x50/0x68 [ocfs2]
    	[577992.891206]  ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
    	[577992.891227]  ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
    	[577992.891248]  ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
    	[577992.891269]  ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
    	[577992.891290]  __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
    	[577992.891309]  ocfs2_fallocate+0xe4/0x128 [ocfs2]
    	[577992.891316]  vfs_fallocate+0x11c/0x250
    	[577992.891317]  ksys_fallocate+0x54/0x88
    	[577992.891319]  __arm64_sys_fallocate+0x28/0x38
    	[577992.891323]  el0_svc_common+0x78/0x130
    	[577992.891325]  el0_svc_handler+0x38/0x78
    	[577992.891327]  el0_svc+0x8/0xc
    
    My analysis is as follows:
    ocfs2_fallocate
      __ocfs2_change_file_space
        ocfs2_allocate_extents
          ocfs2_extend_allocation
            ocfs2_add_inode_data
              ocfs2_add_clusters_in_btree
                ocfs2_insert_extent
                  ocfs2_do_insert_extent
                    ocfs2_rotate_tree_right
                      ocfs2_extend_rotate_transaction
                        ocfs2_extend_trans
                          jbd2_journal_restart
                            jbd2__journal_restart
                              /* handle->h_transaction is NULL,
                               * is_handle_aborted(handle) is true
                               */
                              handle->h_transaction = NULL;
                              start_this_handle
                                return -EROFS;
                ocfs2_free_clusters
                  _ocfs2_free_clusters
                    _ocfs2_free_suballoc_bits
                      ocfs2_block_group_clear_bits
                        ocfs2_journal_access_gd
                          __ocfs2_journal_access
                            jbd2_journal_get_undo_access
                              /* jbd2_write_access_granted() must have
                               * returned true here; otherwise
                               * do_get_write_access() would have run
                               * and returned -EROFS.
                               */
                              if (jbd2_write_access_granted(...)) return 0;
                              do_get_write_access
                                /* handle->h_transaction is NULL, so this
                                 * check would have returned -EROFS; no
                                 * error was seen, so do_get_write_access()
                                 * was not called.
                                 */
                                if (is_handle_aborted(handle)) return -EROFS;
                        /* bh2jh(group_bh) is NULL, causing a NULL
                           pointer dereference */
                        undo_bg = (struct ocfs2_group_desc *)
                                    bh2jh(group_bh)->b_committed_data;
    
    If handle->h_transaction == NULL, then jbd2_write_access_granted()
    does not really guarantee that the journal_head will stay around,
    let alone its b_committed_data. The journal_head behind
    bh2jh(group_bh) can be removed after ocfs2_journal_access_gd()
    returns and before bh2jh(group_bh)->b_committed_data is
    dereferenced. So we should move the is_handle_aborted() check from
    do_get_write_access() into jbd2_journal_get_undo_access() and
    jbd2_journal_get_write_access(), before the call to
    jbd2_write_access_granted().
    
    Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
    Signed-off-by: Yan Wang <wangyan122@huawei.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Jun Piao <piaojun@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@kernel.org
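
The reordering described in the jbd2 fix above can be illustrated with a small userspace model. This is only a sketch: every `toy_` name is hypothetical, the structures are stand-ins rather than the kernel definitions, and the cached flag merely models the fast path that jbd2_write_access_granted() provides.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the jbd2 types; not the kernel definitions. */
struct toy_handle {
    void *h_transaction;        /* NULL once the handle has lost its transaction */
    bool  h_aborted;
};

struct toy_buffer {
    bool write_access_cached;   /* models the fast path reporting "already granted" */
};

static bool toy_handle_aborted(const struct toy_handle *h)
{
    return h->h_aborted || h->h_transaction == NULL;
}

/* Slow path: the only place the aborted check lived before the fix. */
static int toy_do_get_write_access(const struct toy_handle *h,
                                   struct toy_buffer *b)
{
    if (toy_handle_aborted(h))
        return -EROFS;
    b->write_access_cached = true;
    return 0;
}

/* Before: the fast path can report success for an aborted handle. */
int toy_get_undo_access_old(const struct toy_handle *h, struct toy_buffer *b)
{
    assert(h != NULL && b != NULL);
    if (b->write_access_cached)
        return 0;
    return toy_do_get_write_access(h, b);
}

/* After: the aborted check runs before the fast path is consulted. */
int toy_get_undo_access_new(const struct toy_handle *h, struct toy_buffer *b)
{
    assert(h != NULL && b != NULL);
    if (toy_handle_aborted(h))
        return -EROFS;
    if (b->write_access_cached)
        return 0;
    return toy_do_get_write_access(h, b);
}
```

With a handle whose h_transaction is NULL and a buffer whose access is already cached, the old ordering returns 0 while the new ordering returns -EROFS, so a caller never goes on to dereference journal state it was not actually granted.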
  11. ext4: fix race between writepages and enabling EXT4_EXTENTS_FL

    ebiggers authored and tytso committed Feb 19, 2020
    If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
    on it, the following warning in ext4_add_complete_io() can be hit:
    
    WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
    
    Here's a minimal reproducer (not 100% reliable; root isn't required):
    
            while true; do
                    sync
            done &
            while true; do
                    rm -f file
                    touch file
                    chattr -e file
                    echo X >> file
                    chattr +e file
            done
    
    The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
    (which only returns true on extent-based files) is checked once to set
    the number of reserved journal credits, and also again later to select
    the flags for ext4_map_blocks() and copy the reserved journal handle to
    ext4_io_end::handle.  But if EXT4_EXTENTS_FL is being concurrently set,
    the first check can see dioread_nolock disabled while the later one can
    see it enabled, causing the reserved handle to unexpectedly be NULL.
    
    Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
    related to doing so as well, fix this by synchronizing changing
    EXT4_EXTENTS_FL with ext4_writepages() via the existing
    s_writepages_rwsem (previously called s_journal_flag_rwsem).
    
    This was originally reported by syzbot without a reproducer at
    https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
    but now that dioread_nolock is the default I also started seeing this
    when running syzkaller locally.
    
    Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
    Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
    Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io")
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@kernel.org
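
The double-check race and its fix can be sketched as a single-threaded model. Everything here is an assumption made for illustration: the `toy_` names are hypothetical, the toy semaphore only counts holders instead of sleeping, and a callback stands in for the concurrent task that changes the flag between the two checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a reader/writer semaphore; it never sleeps. */
struct toy_rwsem {
    int  readers;
    bool writer;
};

static void toy_down_read(struct toy_rwsem *s)
{
    assert(!s->writer);
    s->readers++;
}

static void toy_up_read(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *s)
{
    if (s->readers > 0 || s->writer)
        return false;
    s->writer = true;
    return true;
}

static void toy_up_write(struct toy_rwsem *s)
{
    s->writer = false;
}

struct toy_inode {
    bool extents_flag;                  /* models EXT4_EXTENTS_FL */
    struct toy_rwsem writepages_rwsem;  /* models s_writepages_rwsem */
};

/* Fixed flag change: only proceeds while no writepages pass is running. */
bool toy_set_extents_flag(struct toy_inode *inode)
{
    if (!toy_down_write_trylock(&inode->writepages_rwsem))
        return false;   /* a real writer would wait here instead */
    inode->extents_flag = true;
    toy_up_write(&inode->writepages_rwsem);
    return true;
}

/* Callbacks standing in for a concurrent flag change. */
void toy_flip_racy(struct toy_inode *inode)   { inode->extents_flag = true; }
void toy_flip_locked(struct toy_inode *inode) { (void)toy_set_extents_flag(inode); }

/*
 * Models a writepages pass that samples the flag twice. The callback runs
 * between the two samples. Returns true if both samples agree.
 */
bool toy_writepages(struct toy_inode *inode, void (*concurrent)(struct toy_inode *))
{
    toy_down_read(&inode->writepages_rwsem);
    bool first = inode->extents_flag;   /* decides whether to reserve a handle */
    if (concurrent)
        concurrent(inode);
    bool second = inode->extents_flag;  /* decides whether to use that handle */
    toy_up_read(&inode->writepages_rwsem);
    return first == second;
}
```

In this model, toy_flip_racy makes the two samples disagree, which corresponds to the reserved handle being unexpectedly NULL, whereas toy_flip_locked is refused while the read side is held, so both samples see the same value.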
  12. ext4: rename s_journal_flag_rwsem to s_writepages_rwsem

    ebiggers authored and tytso committed Feb 19, 2020
    In preparation for making s_journal_flag_rwsem synchronize
    ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
    flags (rather than just JOURNAL_DATA as it does currently), rename it to
    s_writepages_rwsem.
    
    Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.org
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@kernel.org
  13. ext4: fix potential race between s_flex_groups online resizing and access

    Suraj Jitindar Singh authored and tytso committed Feb 19, 2020
    
    During an online resize, the array of s_flex_groups structures gets
    replaced so that it can be enlarged. If there is a concurrent access to
    the array and the old memory has been reused, this can lead to an invalid
    memory access.
    
    The s_flex_groups array has been converted into an array of pointers
    rather than an array of structures. This ensures that the information
    contained in the structures cannot get out of sync during a resize due to
    an accessor updating the value in the old structure after it has been
    copied but before the array pointer is updated. Since the structures
    themselves are no longer copied, only the pointers to them, this case is
    mitigated.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
    Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.edu
    Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org
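
The difference between growing an array of structures and growing an array of pointers can be shown with a small userspace sketch. The `toy_` names and the single counter field are hypothetical, and the model says nothing about when the old array may safely be freed; it only shows why a late update through the old array is lost in one layout and preserved in the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-flex-group record. */
struct toy_flex_group {
    int free_clusters;
};

/* Old layout: growing copies the structures themselves. */
struct toy_flex_group *toy_grow_structs(const struct toy_flex_group *old,
                                        size_t old_n, size_t new_n)
{
    assert(new_n >= old_n && new_n > 0);
    struct toy_flex_group *grown = calloc(new_n, sizeof(*grown));
    if (!grown)
        return NULL;
    if (old_n)
        memcpy(grown, old, old_n * sizeof(*grown));
    return grown;
}

/* New layout: growing copies only the pointers and allocates the new tail. */
struct toy_flex_group **toy_grow_pointers(struct toy_flex_group *const *old,
                                          size_t old_n, size_t new_n)
{
    assert(new_n >= old_n && new_n > 0);
    struct toy_flex_group **grown = calloc(new_n, sizeof(*grown));
    if (!grown)
        return NULL;
    for (size_t i = 0; i < old_n; i++)
        grown[i] = old[i];              /* share the existing structures */
    for (size_t i = old_n; i < new_n; i++) {
        grown[i] = calloc(1, sizeof(**grown));
        if (!grown[i]) {
            while (i-- > old_n)         /* undo the entries allocated so far */
                free(grown[i]);
            free(grown);
            return NULL;
        }
    }
    return grown;
}
```

An update made through the old array after the copy does not reach the grown array in the structure layout, but it does in the pointer layout, because both arrays refer to the same record.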
  14. Merge tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

    torvalds committed Feb 22, 2020
    
    Pull xen fixes from Juergen Gross:
     "Two small fixes for Xen:
    
       - a fix to avoid warnings with new gcc
    
       - a fix for incorrectly disabled interrupts when calling
         _cond_resched()"
    
    * tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
      xen: Enable interrupts when calling _cond_resched()
      x86/xen: Distribute switch variables for initialization
  15. Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

    torvalds committed Feb 22, 2020
    
    Pull arm64 fixes from Will Deacon:
     "It's all straightforward apart from the changes to mmap()/mremap() in
      relation to their handling of address arguments from userspace with
      non-zero tag bits in the upper byte.
    
      The change to brk() is necessary to fix a nasty user-visible
      regression in malloc(), but we tightened up mmap() and mremap() at the
      same time because they also allow the user to create virtual aliases
      by accident. It's much less likely than brk() to matter in practice,
      but enforcing the principle of "don't permit the creation of mappings
      using tagged addresses" leads to a straightforward ABI without having
      to worry about the "but what if a crazy program did foo?" aspect of
      things.
    
      Summary:
    
       - Fix regression in malloc() caused by ignored address tags in brk()
    
       - Add missing brackets around argument to untagged_addr() macro
    
       - Fix clang build when using binutils assembler
    
       - Fix silly typo in virtual memory map documentation"
    
    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
      mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()
      docs: arm64: fix trivial spelling enought to enough in memory.rst
      arm64: memory: Add missing brackets to untagged_addr() macro
      arm64: lse: Fix LSE atomics with LLVM
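
As a generic illustration of why missing brackets around a macro argument matter (the "untagged_addr() macro" item in the summary above), here is a minimal sketch. The `TOY_` macros are hypothetical and are not the arm64 definitions; they only show how a cast binds to the first operand of an unbracketed argument.

```c
#include <stdint.h>

/* Hypothetical macros for illustration; not the arm64 untagged_addr(). */
#define TOY_LOW_BYTE_UNBRACKETED(x) ((uint8_t)x)
#define TOY_LOW_BYTE(x)             ((uint8_t)(x))

/* Expands to ((uint8_t)a + b): the cast applies to a only. */
unsigned toy_unbracketed(unsigned a, unsigned b)
{
    return TOY_LOW_BYTE_UNBRACKETED(a + b);
}

/* Expands to ((uint8_t)(a + b)): the cast applies to the whole sum. */
unsigned toy_bracketed(unsigned a, unsigned b)
{
    return TOY_LOW_BYTE(a + b);
}
```

Calling both helpers with a = 0x1ff and b = 1 gives 256 for the unbracketed macro and 0 for the bracketed one, since only the latter truncates the full expression.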
Commits on Feb 21, 2020
  1. Merge tag 'powerpc-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

    torvalds committed Feb 21, 2020
    
    Pull powerpc fixes from Michael Ellerman:
     "Some more powerpc fixes for 5.6. This is two weeks worth as I was out
      sick last week:
    
       - Three fixes for the recently added VMAP_STACK on 32-bit.
    
       - Three fixes related to hugepages on 8xx (32-bit).
    
       - A fix for a bug in our transactional memory handling that could
         lead to a kernel crash if we saw a page fault during signal
         delivery.
    
       - A fix for a deadlock in our PCI EEH (Enhanced Error Handling) code.
    
       - A couple of other minor fixes.
    
      Thanks to: Christophe Leroy, Erhard F, Frederic Barrat, Gustavo Luiz
      Duarte, Larry Finger, Leonardo Bras, Oliver O'Halloran, Sam Bobroff"
    
    * tag 'powerpc-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S
      powerpc/xmon: Fix whitespace handling in getstring()
      powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK
      powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
      powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK
      powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
      powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss
      powerpc/hugetlb: Fix 8M hugepages on 8xx
      powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size
      powerpc/eeh: Fix deadlock handling dead PHB
  2. Merge tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog

    torvalds committed Feb 21, 2020
    
    Pull watchdog fixes from Wim Van Sebroeck:
    
     - mtk_wdt needs RESET_CONTROLLER to build
    
     - da9062 driver fixes:
         - fix power management ops
         - do not ping the hw during stop()
         - add dependency on I2C
    
    * tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog:
      watchdog: da9062: Add dependency on I2C
      watchdog: da9062: fix power management ops
      watchdog: da9062: do not ping the hw during stop()
      watchdog: fix mtk_wdt.c RESET_CONTROLLER build error