Commits on Feb 27, 2018
  1. landlock: Add user and kernel documentation for Landlock

    l0kod committed Feb 27, 2018
    This documentation can be built with the Sphinx framework.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v7:
    * update documentation according to the Landlock revamp
    
    Changes since v6:
    * add a check for ctx->event
    * rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
    * rename Landlock version to ABI to better reflect its purpose and add a
      dedicated changelog section
    * update tables
    * relax no_new_privs recommendations
    * remove ABILITY_WRITE related functions
    * reword rule "appending" to "prepending" and explain it
    * cosmetic fixes
    
    Changes since v5:
    * update the rule hierarchy inheritance explanation
    * briefly explain ctx->arg2
    * add ptrace restrictions
    * explain EPERM
    * update example (subtype)
    * use ":manpage:"
  2. bpf,landlock: Add tests for Landlock

    l0kod committed Feb 27, 2018
    Test basic context access, ptrace protection and filesystem hooks and
    Landlock program chaining with multiple cases.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Will Drewry <wad@chromium.org>
    ---
    
    Changes since v7:
    * update tests and add new ones for filesystem hierarchy and Landlock
      chains.
    
    Changes since v6:
    * use the new kselftest_harness.h
    * use const variables
    * replace ASSERT_STEP with ASSERT_*
    * rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
    * force sample library rebuild
    * fix install target
    
    Changes since v5:
    * add subtype test
    * add ptrace tests
    * split and rename files
    * cleanup and rebase
  3. bpf: Add a Landlock sandbox example

    l0kod committed Feb 27, 2018
    Add a basic sandbox tool to launch a command which is only allowed
    read-only or read-write access to a whitelist of file hierarchies.
    
    Add to the bpf_load library the ability to handle a BPF program subtype.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v7:
    * rewrite the example using an inode map
    * add to bpf_load the ability to handle subtypes per program type
    
    Changes since v6:
    * check return value of load_and_attach()
    * allow to write on pipes
    * rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
    * rename Landlock version to ABI to better reflect its purpose
    * use const variable (suggested by Kees Cook)
    * remove useless definitions (suggested by Kees Cook)
    * add detailed explanations (suggested by Kees Cook)
    
    Changes since v5:
    * cosmetic fixes
    * rebase
    
    Changes since v4:
    * write Landlock rule in C and compile it with LLVM
    * remove cgroup handling
    * remove path handling: only handle a read-only environment
    * remove errno return codes
    
    Changes since v3:
    * remove seccomp and origin field: completely free from seccomp programs
    * handle more FS-related hooks
    * handle inode hooks and directory traversal
    * add faked but consistent view thanks to ENOENT
    * add /lib64 in the example
    * fix spelling
    * rename some types and definitions (e.g. SECCOMP_ADD_LANDLOCK_RULE)
    
    Changes since v2:
    * use BPF_PROG_ATTACH for cgroup handling
  4. landlock: Add ptrace restrictions

    l0kod committed Feb 27, 2018
    A landlocked process has fewer privileges than a non-landlocked process
    and must then be subject to additional restrictions when manipulating
    processes. To be allowed to use ptrace(2) and related syscalls on a
    target process, a landlocked process must have a subset of the target
    process' rules.
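
The subset condition described above can be sketched with a toy model. This is only an illustration, not the kernel implementation: it assumes each rule can be identified by a plain integer ID, and the name rules_subset() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model: each task carries an array of rule IDs (hypothetical
 * representation). Return true when every rule enforced on the tracer is
 * also enforced on the target, i.e. the tracer's rules are a subset of the
 * target's rules.
 */
static bool rules_subset(const uint32_t *tracer, size_t ntracer,
			 const uint32_t *target, size_t ntarget)
{
	assert(tracer != NULL || ntracer == 0);
	assert(target != NULL || ntarget == 0);

	for (size_t i = 0; i < ntracer; i++) {
		bool found = false;

		for (size_t j = 0; j < ntarget; j++) {
			if (tracer[i] == target[j]) {
				found = true;
				break;
			}
		}
		if (!found)
			return false; /* tracer has a rule the target lacks */
	}
	return true;
}
```

In this model, a tracer with rules {1, 2} passes the check against a target with rules {1, 2, 3}, while the reverse direction fails.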
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v6:
    * factor out ptrace check
    * constify pointers
    * cleanup headers
    * use the new security_add_hooks()
  5. landlock: Handle filesystem access control

    l0kod committed Feb 27, 2018
    This adds three Landlock hooks: FS_WALK, FS_PICK and FS_GET.
    
    The FS_WALK hook is used to walk through a file path. A program tied to
    this hook will be evaluated for each directory traversal except the last
    one if it is the leaf of the path.
    
    The FS_PICK hook is used to validate a set of actions requested on a
    file. These actions are defined with triggers (e.g. read, write, open,
    append...).
    
    The FS_GET hook is used to tag open files, which is necessary to be able
    to evaluate relative paths.  A program tied to this hook can tag a file
    with an inode map.
    
    A Landlock program can be chained to another if it is permitted by the
    BPF verifier. A FS_WALK can be chained to a FS_PICK which can be chained
    to a FS_GET.
    
    The Landlock LSM hook registration is done after other LSM to only run
    actions from user-space, via eBPF programs, if the access was granted by
    major (privileged) LSMs.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v7:
    * major rewrite with clean Landlock hooks able to deal with file paths
    
    Changes since v6:
    * add 3 more sub-events: IOCTL, LOCK, FCNTL
      https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35571@digikod.net
    * use the new security_add_hooks()
    * explain the -Werror=unused-function
    * constify pointers
    * cleanup headers
    
    Changes since v5:
    * split hooks.[ch] into hooks.[ch] and hooks_fs.[ch]
    * add more documentation
    * cosmetic fixes
    * rebase (SCALAR_VALUE)
    
    Changes since v4:
    * add LSM hook abstraction called Landlock event
      * use the compiler type checking to verify hooks use by an event
      * handle all filesystem related LSM hooks (e.g. file_permission,
        mmap_file, sb_mount...)
    * register BPF programs for Landlock just after LSM hooks registration
    * move hooks registration after other LSMs
    * add failsafes to check if a hook is not used by the kernel
    * allow partial raw value access from the context (needed for programs
      generated by LLVM)
    
    Changes since v3:
    * split commit
    * add hooks dealing with struct inode and struct path pointers:
      inode_permission and inode_getattr
    * add abstraction over eBPF helper arguments thanks to wrapping structs
  6. bpf,landlock: Add a new map type: inode

    l0kod committed Feb 27, 2018
    This new map stores arbitrary 64-bit values referenced by inode keys.
    The map can be updated from user space with file descriptors pointing to
    inodes tied to a file system.  From an eBPF (Landlock) program point of
    view, such a map is read-only and can only be used to retrieve a
    64-bit value tied to a given inode.  This is useful to recognize an
    inode tagged by user space, without access right to this inode (i.e. no
    need to have a write access to this inode).
    
    This also adds new BPF map object types: landlock_tag_object and
    landlock_chain.  The landlock_chain pointer is needed to be able to
    handle multiple tags per inode.  The landlock_tag_object is needed to
    update a reference to a list of shared tags.  This is typically used by
    a struct file (reference) and a struct inode (shared list of tags).
    This way, we can account the process/user for the number of tagged
    files, while still being able to read the tags from the pointed inode.
    
    Add dedicated BPF functions to handle this type of map:
    * bpf_inode_map_update_elem()
    * bpf_inode_map_lookup_elem()
    * bpf_inode_map_delete_elem()
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Cc: Jann Horn <jann@thejh.net>
    ---
    
    Changes since v7:
    * new design with a dedicated map and a BPF function to tie a value to
      an inode
    * add the ability to set or get a tag on an inode from a Landlock
      program
    
    Changes since v6:
    * remove WARN_ON() for missing dentry->d_inode
    * refactor bpf_landlock_func_proto() (suggested by Kees Cook)
    
    Changes since v5:
    * cosmetic fixes and rebase
    
    Changes since v4:
    * use a file abstraction (handle) to wrap inode, dentry, path and file
      structs
    * remove bpf_landlock_cmp_fs_beneath()
    * rename the BPF helper and move it to kernel/bpf/
    * tighten helpers accessible by a Landlock rule
    
    Changes since v3:
    * remove bpf_landlock_cmp_fs_prop() (suggested by Alexei Starovoitov)
    * add hooks dealing with struct inode and struct path pointers:
      inode_permission and inode_getattr
    * add abstraction over eBPF helper arguments thanks to wrapping structs
    * add bpf_landlock_get_fs_mode() helper to check file type and mode
    * merge WARN_ON() (suggested by Kees Cook)
    * fix and update bpf_helpers.h
    * use BPF_CALL_* for eBPF helpers (suggested by Alexei Starovoitov)
    * make handle arraymap safe (RCU) and remove buggy synchronize_rcu()
    * factor out the arraymap walk
    * use size_t to index array (suggested by Jann Horn)
    
    Changes since v2:
    * add MNT_INTERNAL check to only add file handle from user-visible FS
      (e.g. no anonymous inode)
    * replace struct file* with struct path* in map_landlock_handle
    * add BPF protos
    * fix bpf_landlock_cmp_fs_prop_with_struct_file()
  7. seccomp,landlock: Enforce Landlock programs per process hierarchy

    l0kod committed Feb 27, 2018
    The seccomp(2) syscall can be used by a task to apply a Landlock program
    to itself. As a seccomp filter, a Landlock program is enforced for the
    current task and all its future children. A program is immutable and a
    task can only add new restricting programs to itself, forming a list of
    programs.
    
    A Landlock program is tied to a Landlock hook. If the action on a kernel
    object is allowed by the other Linux security mechanisms (e.g. DAC,
    capabilities, other LSM), then a Landlock hook related to this kind of
    object is triggered. The list of programs for this hook is then
    evaluated. Each program returns a 32-bit value which can deny the action
    on a kernel object with a non-zero value. If every program in the list
    returns zero, then the action on the object is allowed.
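
The evaluation rule above can be sketched as a loop over function pointers. This is a minimal illustration under stated assumptions: the type landlock_prog_fn and the function run_hook_programs() are hypothetical stand-ins, and stopping at the first non-zero value is a choice of this sketch that the text does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a loaded Landlock program. */
typedef uint32_t (*landlock_prog_fn)(const void *ctx);

/*
 * Evaluate every program in the list for one hook. The action is allowed
 * (return 0) only if all programs return zero; a non-zero value from any
 * program denies it. This sketch returns at the first denial.
 */
static uint32_t run_hook_programs(const landlock_prog_fn *progs, size_t count,
				  const void *ctx)
{
	assert(progs != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		uint32_t ret = progs[i](ctx);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Two trivial example programs used to exercise the loop. */
static uint32_t allow_all(const void *ctx) { (void)ctx; return 0; }
static uint32_t deny_all(const void *ctx) { (void)ctx; return 1; }
```

With a list of { allow_all, deny_all } the result is non-zero (denied); with only allow_all entries, or an empty list, the result is zero (allowed).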
    
    Multiple Landlock programs can be chained to share a 64-bit value for a
    call chain (e.g. evaluating multiple elements of a file path).  This
    chaining is restricted when a process constructs this chain by loading a
    program, but additional checks are performed when it requests to apply
    this chain of programs to itself.  The restrictions ensure that it is
    not possible to call multiple programs in a way that would imply to
    handle multiple shared values (i.e. cookies) for one chain.  For now,
    only a fs_pick program can be chained to the same type of program,
    because it may make sense if they have different triggers (cf. next
    commits).  These restrictions still allow reusing Landlock programs in
    a safe way (e.g. use the same loaded fs_walk program with multiple
    chains of fs_pick programs).
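
The chaining restrictions can be illustrated with a small transition check. This is a sketch, not the verifier logic: the enum and can_chain() are hypothetical names, and it accepts only the transitions the commit messages spell out (fs_walk to fs_pick, fs_pick to fs_pick, fs_pick to fs_get), rejecting everything else as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three hook types named in this series. */
enum hook_type { HOOK_FS_WALK, HOOK_FS_PICK, HOOK_FS_GET };

/*
 * Return true if a program of type next may be chained after a program of
 * type prev. Transitions not mentioned in the commit messages are rejected,
 * which is an assumption of this sketch.
 */
static bool can_chain(enum hook_type prev, enum hook_type next)
{
	switch (prev) {
	case HOOK_FS_WALK:
		return next == HOOK_FS_PICK;
	case HOOK_FS_PICK:
		/* fs_pick is the only type that may follow itself. */
		return next == HOOK_FS_PICK || next == HOOK_FS_GET;
	case HOOK_FS_GET:
		return false;
	}
	assert(!"unknown hook type");
	return false;
}
```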
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Cc: Will Drewry <wad@chromium.org>
    Link: https://lkml.kernel.org/r/c10a503d-5e35-7785-2f3d-25ed8dd63fab@digikod.net
    ---
    
    Changes since v7:
    * handle and verify program chains
    * split and rename providers.c to enforce.c and enforce_seccomp.c
    * rename LANDLOCK_SUBTYPE_* to LANDLOCK_*
    
    Changes since v6:
    * rename some functions with more accurate names to reflect that an eBPF
      program for Landlock could be used for something else than a rule
    * reword rule "appending" to "prepending" and explain it
    * remove the superfluous no_new_privs check, only check global
      CAP_SYS_ADMIN when prepending a Landlock rule (needed for containers)
    * create and use {get,put}_seccomp_landlock() (suggested by Kees Cook)
    * replace ifdef with static inlined function (suggested by Kees Cook)
    * use get_user() (suggested by Kees Cook)
    * replace atomic_t with refcount_t (requested by Kees Cook)
    * move struct landlock_{rule,events} from landlock.h to common.h
    * cleanup headers
    
    Changes since v5:
    * remove struct landlock_node and use a similar inheritance mechanism
      as seccomp-bpf (requested by Andy Lutomirski)
    * rename SECCOMP_ADD_LANDLOCK_RULE to SECCOMP_APPEND_LANDLOCK_RULE
    * rename file manager.c to providers.c
    * add comments
    * typo and cosmetic fixes
    
    Changes since v4:
    * merge manager and seccomp patches
    * return -EFAULT in seccomp(2) when user_bpf_fd is null to easily check
      if Landlock is supported
    * only allow a process with the global CAP_SYS_ADMIN to use Landlock
      (will be lifted in the future)
    * add an early check to exit as soon as possible if the current process
      does not have Landlock rules
    
    Changes since v3:
    * remove the hard link with seccomp (suggested by Andy Lutomirski and
      Kees Cook):
      * remove the cookie which could imply multiple evaluation of Landlock
        rules
      * remove the origin field in struct landlock_data
    * remove documentation fix (merged upstream)
    * rename the new seccomp command to SECCOMP_ADD_LANDLOCK_RULE
    * internal renaming
    * split commit
    * new design to be able to inherit on the fly the parent rules
    
    Changes since v2:
    * Landlock programs can now be run without seccomp filter but for any
      syscall (from the process) or interruption
    * move Landlock related functions and structs into security/landlock/*
      (to manage cgroups as well)
    * fix seccomp filter handling: run Landlock programs for each of their
      legitimate seccomp filter
    * properly clean up all seccomp results
    * cosmetic changes to ease the understanding
    * fix some ifdef
  8. bpf,landlock: Define an eBPF program type for Landlock hooks

    l0kod committed Feb 27, 2018
    Add a new type of eBPF program used by Landlock hooks. This type of
    program can be chained with the same eBPF program type (according to
    subtype rules). A state can be kept with a value available in the
    program's context (e.g. named "cookie" for Landlock programs).
    
    This new BPF program type will be registered with the Landlock LSM
    initialization.
    
    Add an initial Landlock Kconfig and update the MAINTAINERS file.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: James Morris <james.l.morris@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    ---
    
    Changes since v7:
    * cosmetic fixes
    * rename LANDLOCK_SUBTYPE_* to LANDLOCK_*
    * cleanup UAPI definitions and move them from bpf.h to landlock.h
      (suggested by Alexei Starovoitov)
    * disable Landlock by default (suggested by Alexei Starovoitov)
    * rename BPF_PROG_TYPE_LANDLOCK_{RULE,HOOK}
    * update the Kconfig
    * update the MAINTAINERS file
    * replace the IOCTL, LOCK and FCNTL events with FS_PICK, FS_WALK and
      FS_GET hook types
    * add the ability to chain programs with an eBPF program file descriptor
      (i.e. the "previous" field in a Landlock subtype) and keep a state
      with a "cookie" value available from the context
    * add a "triggers" subtype bitfield to match specific actions (e.g.
      append, chdir, read...)
    
    Changes since v6:
    * add 3 more sub-events: IOCTL, LOCK, FCNTL
      https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35571@digikod.net
    * rename LANDLOCK_VERSION to LANDLOCK_ABI to better reflect its purpose,
      and move it from landlock.h to common.h
    * rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE: an eBPF
      program could be used for something else than a rule
    * simplify struct landlock_context by removing the arch and syscall_nr fields
    * remove all eBPF map functions call, remove ABILITY_WRITE
    * refactor bpf_landlock_func_proto() (suggested by Kees Cook)
    * constify pointers
    * fix doc inclusion
    
    Changes since v5:
    * rename file hooks.c to init.c
    * fix spelling
    
    Changes since v4:
    * merge a minimal (not enabled) LSM code and Kconfig in this commit
    
    Changes since v3:
    * split commit
    * revamp the landlock_context:
      * add arch, syscall_nr and syscall_cmd (ioctl, fcntl…) to be able to
        cross-check action with the event type
      * replace args array with dedicated fields to ease the addition of new
        fields
  9. bpf: Add eBPF program subtype and is_valid_subtype() verifier

    l0kod committed Feb 27, 2018
    The goal of the program subtype is to be able to have different static
    fine-grained verifications for a unique program type.
    
    The struct bpf_verifier_ops gets a new optional function:
    is_valid_subtype(). This new verifier is called at the beginning of the
    eBPF program verification to check if the (optional) program subtype is
    valid.
    
    The struct bpf_prog_ops gets a new optional function: put_extra(). This
    may be used to put extra data.
    
    For now, only Landlock eBPF programs are using a program subtype (see
    next commits) but this could be used by other program types in the
    future.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David S. Miller <davem@davemloft.net>
    Link: https://lkml.kernel.org/r/20160827205559.GA43880@ast-mbp.thefacebook.com
    ---
    
    Changes since v7:
    * rename LANDLOCK_SUBTYPE_* to LANDLOCK_*
    * move subtype in bpf_prog_aux and use only one bit for has_subtype
      (suggested by Alexei Starovoitov)
    * wrap the prog_subtype with a prog_extra to be able to reference kernel
      pointers:
      * add an optional put_extra() function to struct bpf_prog_ops to be
        able to free the pointed data
      * replace all the prog_subtype with prog_extra in the struct
        bpf_verifier_ops functions
    * remove the ABI field (requested by Alexei Starovoitov)
    * rename subtype fields
    
    Changes since v6:
    * rename Landlock version to ABI to better reflect its purpose
    * fix unsigned integer checks
    * fix pointer cast
    * constify pointers
    * rebase
    
    Changes since v5:
    * use a prog_subtype pointer and make it future-proof
    * add subtype test
    * constify bpf_load_program()'s subtype argument
    * cleanup subtype initialization
    * rebase
    
    Changes since v4:
    * replace the "status" field with "version" (more generic)
    * replace the "access" field with "ability" (less confusing)
    
    Changes since v3:
    * remove the "origin" field
    * add an "option" field
    * cleanup comments
  10. fs,security: Add a new file access type: MAY_CHROOT

    l0kod committed Feb 27, 2018
    For compatibility reasons, MAY_CHROOT is always set with MAY_CHDIR.
    However, this new flag makes it possible to differentiate a chdir from a
    chroot.
    
    This is needed for the Landlock LSM to be able to evaluate a new root
    directory.
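
The flag relationship described above can be sketched with plain bit masks. The numeric values below are illustrative assumptions (in particular the value chosen for MAY_CHROOT), and the helper names are hypothetical; the real definitions live in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; MAY_CHROOT's value is assumed for this sketch. */
#define MAY_CHDIR  0x00000040
#define MAY_CHROOT 0x00000100

/* Mask a chroot-style caller would pass: MAY_CHROOT always comes with MAY_CHDIR. */
static int chroot_mask(void)
{
	return MAY_CHDIR | MAY_CHROOT;
}

/* Existing checks that only test MAY_CHDIR keep matching both operations. */
static bool is_dir_change(int mask)
{
	return (mask & MAY_CHDIR) != 0;
}

/* New checks can single out chroot. */
static bool is_chroot(int mask)
{
	/* Per the commit message, MAY_CHROOT is never set without MAY_CHDIR. */
	assert(!(mask & MAY_CHROOT) || (mask & MAY_CHDIR));
	return (mask & MAY_CHROOT) != 0;
}
```

Code that only knows about MAY_CHDIR sees no change, while a hook that needs to evaluate a new root directory can test the extra bit.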
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Casey Schaufler <casey@schaufler-ca.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: John Johansen <john.johansen@canonical.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Paul Moore <paul@paul-moore.com>
    Cc: "Serge E. Hallyn" <serge@hallyn.com>
    Cc: Stephen Smalley <sds@tycho.nsa.gov>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: linux-fsdevel@vger.kernel.org
  11. fs,security: Add a security blob to nameidata

    l0kod committed Feb 27, 2018
    The function current_nameidata_security(struct inode *) can be used to
    retrieve a blob's pointer address tied to the inode being walked through.
    This makes it possible to follow a path lookup and know where an inode
    access comes from. This is needed for the Landlock LSM to be able to
    restrict access to file paths.
    
    The LSM hook nameidata_free_security(struct inode *) is called before
    freeing the associated nameidata.
    
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Casey Schaufler <casey@schaufler-ca.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: John Johansen <john.johansen@canonical.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Paul Moore <paul@paul-moore.com>
    Cc: "Serge E. Hallyn" <serge@hallyn.com>
    Cc: Stephen Smalley <sds@tycho.nsa.gov>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: linux-fsdevel@vger.kernel.org
Commits on Feb 24, 2018
  1. Merge branch 'x86-jit'

    Alexei Starovoitov committed Feb 24, 2018
    Daniel Borkmann says:
    
    ====================
    Couple of minor improvements to the x64 JIT I had still around from
    pre merge window in order to shrink the image size further. Added
    test cases for kselftests too as well as running Cilium workloads on
    them w/o issues.
    ====================
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. bpf: add various jit test cases

    borkmann authored and Alexei Starovoitov committed Feb 24, 2018
    Add a few test cases that check the run-time results under JIT.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. bpf, x64: save 5 bytes in prologue when ebpf insns came from cbpf

    borkmann authored and Alexei Starovoitov committed Feb 24, 2018
    While it's rather cumbersome to reduce prologue for cBPF->eBPF
    migrations wrt spill/fill for r15 which is callee saved register
    due to bpf_error path in bpf_jit.S that is both used by migrations
    as well as native eBPF, we can still trivially save 5 bytes in
    prologue for the former since tail calls can never be used there.
    cBPF->eBPF migrations also have their own custom prologue in BPF
    asm that xors A and X reg anyway, so it's fine we skip this here.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. bpf, x64: save few bytes when mul is in alu32

    borkmann authored and Alexei Starovoitov committed Feb 24, 2018
    Add a generic emit_mov_reg() helper in order to reuse it in BPF
    multiplication to load the src into rax, we can save a few bytes
    in alu32 while doing so.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. bpf, x64: save several bytes when mul dest is r0/r3 anyway

    borkmann authored and Alexei Starovoitov committed Feb 24, 2018
    Instead of unconditionally performing push/pop on rax/rdx
    in case of multiplication, we can save a few bytes in case
    of dest register being either BPF r0 (rax) or r3 (rdx)
    since the result is written in there anyway.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  6. bpf, x64: save several bytes by using mov over movabsq when possible

    borkmann authored and Alexei Starovoitov committed Feb 24, 2018
    While analyzing some of the more complex BPF programs from Cilium,
    I found that LLVM generally prefers to emit LD_IMM64 instead of MOV32
    BPF instructions for loading unsigned 32-bit immediates into a
    register. Given we cannot change the current/stable LLVM versions
    that are already out there, lets optimize this case such that the
    JIT prefers to emit 'mov %eax, imm32' over 'movabsq %rax, imm64'
    whenever suitable in order to reduce the image size by 4-5 bytes per
    such load in the typical case, reducing image size on some of the
    bigger programs by up to 4%. emit_mov_imm32() and emit_mov_imm64()
    have been added as helpers.
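
The size difference between the two encodings can be sketched as follows. This is not the kernel's emit_mov_imm32()/emit_mov_imm64() code: emit_mov_imm() is a hypothetical helper that covers only the case discussed above (an unsigned 32-bit immediate versus a full 64-bit one) and uses x86-64 register numbers 0-15.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "load imm into x86-64 register reg" into buf and return the number
 * of bytes written. buf must have room for at least 10 bytes.
 */
static size_t emit_mov_imm(uint8_t *buf, unsigned int reg, uint64_t imm)
{
	size_t n = 0;
	int nbytes;

	assert(reg < 16);
	if (imm <= UINT32_MAX) {
		/* mov r32, imm32: the CPU zero-extends into the upper half. */
		if (reg >= 8)
			buf[n++] = 0x41; /* REX.B selects r8-r15 */
		nbytes = 4;
	} else {
		/* movabsq r64, imm64: REX.W, plus REX.B for r8-r15. */
		buf[n++] = (uint8_t)(0x48 | (reg >= 8 ? 1 : 0));
		nbytes = 8;
	}
	buf[n++] = (uint8_t)(0xB8 + (reg & 7)); /* opcode B8+r */
	for (int i = 0; i < nbytes; i++)
		buf[n++] = (uint8_t)(imm >> (8 * i)); /* little-endian immediate */
	return n;
}
```

The short form takes 5 bytes (6 for r8-r15) against 10 bytes for movabsq, which matches the 4-5 bytes saved per load mentioned above.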
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  7. bpf, x64: save one byte per shl/shr/sar when imm is 1

    borkmann authored and Alexei Starovoitov committed Feb 24, 2018
    When we shift by one, we can use a different encoding where imm
    is not explicitly needed, which saves 1 byte per such op.
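
The one-byte saving can be sketched with the two x86-64 shift encodings. This is an illustration rather than the JIT code: emit_shift_rax() is a hypothetical helper, and it only handles 64-bit shifts of rax to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ModRM /digit opcode extensions for the shift group. */
enum shift_op { SHIFT_SHL = 4, SHIFT_SHR = 5, SHIFT_SAR = 7 };

/*
 * Encode a 64-bit shift of rax by imm into buf and return the length.
 * buf must have room for at least 4 bytes.
 */
static size_t emit_shift_rax(uint8_t *buf, enum shift_op op, uint8_t imm)
{
	size_t n = 0;

	assert(imm >= 1 && imm <= 63);
	buf[n++] = 0x48;                         /* REX.W: 64-bit operand size */
	buf[n++] = (imm == 1) ? 0xD1 : 0xC1;     /* D1: shift by 1, C1: shift by imm8 */
	buf[n++] = (uint8_t)(0xC0 | (op << 3));  /* ModRM: mod=11, reg=op, rm=rax */
	if (imm != 1)
		buf[n++] = imm;                  /* explicit immediate byte */
	return n;
}
```

A shift by one encodes in 3 bytes, while any other shift count needs a fourth byte for the immediate.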
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commits on Feb 23, 2018
  1. bpf: NULL pointer check is not needed in BPF_CGROUP_RUN_PROG_INET_SOCK

    laoar authored and borkmann committed Feb 23, 2018
    sk is already allocated in inet_create/inet6_create, hence when
    BPF_CGROUP_RUN_PROG_INET_SOCK is executed sk will never be NULL.
    
    The logic is as below:
    	sk = sk_alloc();
    	if (!sk)
    		goto out;
    	BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commits on Feb 15, 2018
  1. Merge branch 'bpf-misc-selftest-improvements'

    borkmann committed Feb 15, 2018
    Joe Stringer says:
    
    ====================
    This series makes some minor changes primarily focused on making it easier
    to understand why test_verifier is failing a test. This includes printing the
    observed output when a test fails in a different way than expected, or when
    unprivileged tests fail due to sysctl kernel.unprivileged_bpf_disabled=1. The
    last patch removes some apparently dead code.
    ====================
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. bpf: Remove unused callee_saved array

    joestringer authored and borkmann committed Feb 14, 2018
    This array appears to be completely unused, remove it.
    
    Signed-off-by: Joe Stringer <joe@wand.net.nz>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  3. selftests/bpf: Only run tests if !bpf_disabled

    joestringer authored and borkmann committed Feb 14, 2018
    The "kernel.unprivileged_bpf_disabled" sysctl, if enabled, causes all
    unprivileged tests to fail because it permanently disables unprivileged
    BPF access for the currently running kernel. Skip the relevant tests if
    the user attempts to run the testsuite with this sysctl enabled.
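
One way to detect the sysctl from a test program is to read its procfs file. The following is a minimal sketch, not the selftest's actual code: sysctl_is_set() and the UNPRIV_SYSCTL macro are hypothetical names, and how the caller treats an unreadable file is left open.

```c
#include <assert.h>
#include <stdio.h>

#define UNPRIV_SYSCTL "/proc/sys/kernel/unprivileged_bpf_disabled"

/*
 * Return 1 if the sysctl file at path holds a non-zero value, 0 if it holds
 * zero, and -1 if the file cannot be opened or parsed.
 */
static int sysctl_is_set(const char *path)
{
	FILE *fp;
	int value = 0;
	int ok;

	assert(path != NULL);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	ok = fscanf(fp, "%d", &value) == 1;
	fclose(fp);
	if (!ok)
		return -1;
	return value != 0;
}
```

A test runner could skip its unprivileged cases when sysctl_is_set(UNPRIV_SYSCTL) returns 1.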
    
    Signed-off-by: Joe Stringer <joe@wand.net.nz>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. selftests/bpf: Count tests skipped by unpriv

    joestringer authored and borkmann committed Feb 14, 2018
    When privileged tests are skipped due to user rights, count the number of
    skipped tests so it's more obvious that the test did not check everything.
    
    Signed-off-by: Joe Stringer <joe@wand.net.nz>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  5. selftests/bpf: Print unexpected output on fail

    joestringer authored and borkmann committed Feb 14, 2018
    This makes it easier to debug off-hand when the error message isn't
    exactly as expected.
    
    Signed-off-by: Joe Stringer <joe@wand.net.nz>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commits on Feb 14, 2018
  1. tools/bpf: adjust rlimit RLIMIT_MEMLOCK for test_tcpbpf_user

    yonghong-song authored and Alexei Starovoitov committed Feb 13, 2018
    The default rlimit RLIMIT_MEMLOCK is 64KB. In certain cases,
    e.g. in a test machine mimicking our production system, this test may
    fail because the memory required for map creation cannot be charged:
       # ./test_tcpbpf_user
       libbpf: failed to create map (name: 'global_map'): Operation not permitted
       libbpf: failed to load object 'test_tcpbpf_kern.o'
       FAILED: load_bpf_file failed for: test_tcpbpf_kern.o
    
    Changing the default rlimit RLIMIT_MEMLOCK to unlimited makes
    the test always pass.
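
Raising the limit from a test program can be sketched with setrlimit(2). This is an illustration under assumptions: raise_memlock_limit() is a hypothetical helper name, and the call fails with EPERM when the process lacks the privilege to raise its hard limit.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Lift the locked-memory limit so that BPF map creation is not rejected for
 * exceeding RLIMIT_MEMLOCK. Returns 0 on success or -errno on failure.
 */
static int raise_memlock_limit(void)
{
	const struct rlimit unlimited = { RLIM_INFINITY, RLIM_INFINITY };
	struct rlimit cur;

	if (setrlimit(RLIMIT_MEMLOCK, &unlimited) != 0)
		return -errno;
	if (getrlimit(RLIMIT_MEMLOCK, &cur) != 0)
		return -errno;
	/* Sanity check: the new soft limit should now be in effect. */
	assert(cur.rlim_cur == RLIM_INFINITY);
	return 0;
}
```

Calling this once at startup, before any map is created, is enough for the rest of the process lifetime.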
    
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. selftests/bpf: fix Makefile for cgroup_helpers.c

    netoptimizer authored and Alexei Starovoitov committed Feb 13, 2018
    The current selftests Makefile construct results in cgroup_helpers.c
    getting compiled together with all the TEST_GEN_PROGS. It also results
    in invoking the libbpf Makefile two times (tools/lib/bpf).
    
    These issues were introduced in commit 9d1f159 ("bpf: move
    cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/").
    
    The only test program that requires the cgroup helpers is 'test_dev_cgroup'.
    
    Thus, create a make target $(OUTPUT)/test_dev_cgroup that extends[1]
    the 'prerequisite' for the 'stem' %-style pattern in ../lib.mk,
    for this particular test program.
    
    Reviewers, note that the make rules in tools/testing/selftests/lib.mk
    differ from the normal kernel kbuild rules; it is practical
    to use 'make -p' to follow how these 'Implicit/static pattern stem'
    rules get expanded.
    
    [1] https://www.gnu.org/software/make/manual/html_node/Static-Usage.html
    
    Fixes: 9d1f159 ("bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. net: avoid including xdp.h in filter.h

    netoptimizer authored and Alexei Starovoitov committed Feb 13, 2018
    It is sufficient to have a forward declaration of struct xdp_rxq_info in
    linux/filter.h, which avoids including net/xdp.h.  This was originally
    suggested by John Fastabend during the review phase, but wasn't
    included in the final patchset revision.  Thus, this followup.
    
    Suggested-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. bpf: samples/sockmap detach sock ops program

    pbhole authored and Alexei Starovoitov committed Feb 13, 2018
    The samples/sockops program keeps the sock_ops program attached to the
    cgroup. Fix this by detaching the program before exit.
    
    Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. bpf: samples/sockmap fix Makefile for build error

    pbhole authored and Alexei Starovoitov committed Feb 13, 2018
    While building samples/sockmap, an undefined reference error is thrown
    for `nla_dump_errormsg'.
    Link tools/lib/bpf/nlattr.o as a fix.
    
    Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  6. samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp_redirect

    tndave authored and Alexei Starovoitov committed Feb 9, 2018
    The default rlimit RLIMIT_MEMLOCK is 64KB, which causes bpf map creation to fail.
    e.g.
    [root@labbpf]# ./xdp_redirect $(</sys/class/net/eth2/ifindex) \
    > $(</sys/class/net/eth3/ifindex)
    failed to create a map: 1 Operation not permitted
    
    The failure is seen when executing xdp_redirect while xdp_monitor
    is already running.
    
    Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commits on Feb 13, 2018
  1. selftests: Add FIB onlink tests

    dsahern authored and davem330 committed Feb 13, 2018
    Add test cases verifying FIB onlink commands work as expected in
    various conditions - IPv4, IPv6, main table, and VRF.
    
    Signed-off-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Merge branch 'selftests-fib_tests-simplifications-verbosity-and-a-race'

    davem330 committed Feb 13, 2018
    David Ahern says:
    
    ====================
    selftests: fib_tests: simplifications, verbosity and a race
    
    Improve efficiency of fib_tests.sh and make the test results more verbose,
    going from this summary (fib_tests.sh is failing in a VM):
        $ fib_tests.sh
        Running netdev unregister tests
        PASS: unicast route test
        PASS: multipath route test
        Running netdev down tests
        PASS: unicast route test
        PASS: multipath route test
        Running netdev carrier change tests
        PASS: local route carrier test
        FAIL: unicast route carrier test
    
    where a single entry actually corresponds to many checks, to a much more
    verbose output that clarifies the test cases:
    $ fib_tests.sh
    Single path route carrier test
        ....
        Carrier down
            IPv4 fibmatch                                         [ OK ]
            IPv6 fibmatch                                         [ OK ]
            IPv4 linkdown flag set                                [FAIL]
            IPv6 linkdown flag set                                [FAIL]
        Second address added with carrier down
            IPv4 fibmatch                                         [ OK ]
            IPv6 fibmatch                                         [ OK ]
            IPv4 linkdown flag set                                [FAIL]
            IPv6 linkdown flag set                                [ OK ]
    
    And then fix the race between setting carrier down on the dummy device and
    checking the corresponding routes.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
  3. selftests: fib_tests: sleep after changing carrier

    dsahern authored and davem330 committed Feb 13, 2018
    Sleep for a second after setting carrier down to allow linkwatch
    to propagate the change to the routing stack via netdev_state_change.
    As it stands there is a race between setting carrier down on the dummy
    device and then checking the linkdown flag in the routes.
    
    Signed-off-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  4. selftests: fib_tests: Move admin of dummy0 to helpers

    dsahern authored and davem330 committed Feb 13, 2018
    Move setup and teardown of testns and dummy0 to helpers.
    
    Signed-off-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  5. selftests: fib_tests: Make test results more verbose

    dsahern authored and davem330 committed Feb 13, 2018
    fib_tests.sh is failing in a VM:
        $ fib_tests.sh
        Running netdev unregister tests
        PASS: unicast route test
        PASS: multipath route test
        Running netdev down tests
        PASS: unicast route test
        PASS: multipath route test
        Running netdev carrier change tests
        PASS: local route carrier test
        FAIL: unicast route carrier test
    
    The last test corresponds to fib_carrier_unicast_test, which has 12 places
    that could be failing. Be more verbose in the output so a failure is
    easier to track down, and separate test setup failures with set -e and
    set +e pairs.
    
    With the verbose logging it is easier to see which checks are failing:
        $ fib_tests.sh
        Single path route carrier test
            ....
            Carrier down
                IPv4 fibmatch                                         [ OK ]
                IPv6 fibmatch                                         [ OK ]
                IPv4 linkdown flag set                                [FAIL]
                IPv6 linkdown flag set                                [FAIL]
            Second address added with carrier down
                IPv4 fibmatch                                         [ OK ]
                IPv6 fibmatch                                         [ OK ]
                IPv4 linkdown flag set                                [FAIL]
                IPv6 linkdown flag set                                [ OK ]
    
    Signed-off-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>