Sandbox

matthiasclasen edited this page Jun 2, 2016 · 7 revisions

This page describes the Flatpak sandbox implementation as it currently exists, how we want it to work, and what we need to change in the Linux environment to get there.

Flatpak uses a helper called flatpak-bwrap (borrowed from Project Atomic's bubblewrap) to set up the sandbox. flatpak-bwrap needs the CAP_SYS_ADMIN and CAP_MKNOD capabilities. It can either be installed with these file capabilities, or setuid root, in which case it will acquire the capabilities it needs and then go back to the real user id.
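As a rough illustration of the "go back to the real user id" step only, here is a minimal Python sketch. It is not flatpak-bwrap's code: the function name drop_to_real_user is hypothetical, and the capability handling that the real helper performs before dropping root is left out entirely.

```python
import os

def drop_to_real_user():
    """Set the real, effective and saved IDs to the invoking user and group.

    Hypothetical helper for illustration; a setuid helper would do this
    after acquiring the capabilities it needs (not shown here).
    """
    ruid = os.getuid()
    rgid = os.getgid()
    # Drop the group first: once the uid is dropped, the process may no
    # longer be allowed to change its gid.
    os.setresgid(rgid, rgid, rgid)
    os.setresuid(ruid, ruid, ruid)
    return os.getresuid(), os.getresgid()
```

When the program is not running setuid, the call changes nothing, so it is safe to run unconditionally.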

The current Flatpak sandbox

The current default Flatpak sandbox setup is:

  • All processes run as the user with no capabilities
  • All processes run in a transient systemd user scope with the name flatpak-$appid-$pid
  • A filesystem namespace where:
    • / is a private tmpfs not visible anywhere else. This is pivot_root:ed into so it is the new / and all other mounts from the host are unmounted from the namespace.
    • /usr is a bind mount of the runtime
    • /app is a bind mount of the application
    • $HOME/.var/app/$APPID is a bind mount of the per-application, per-user writable data store
    • /proc shows only the processes in the app sandbox
    • /sys is a read-only bind mount of the host /sys
    • /dev contains
      • /dev/full, /dev/null, /dev/random, /dev/urandom, /dev/tty and /dev/zero
      • /dev/shm is a private tmpfs
    • /run/user/$uid is set up and $XDG_RUNTIME_DIR points to it
    • host fonts are bind mounted to /run/host/fonts
    • /etc has
      • passwd & group bind mounted from host
      • machine-id bind mounted from host
      • resolv.conf symlinked to /run/user/$uid/flatpak-monitor/resolv.conf
      • everything else in /usr/etc has a symlink from /etc
    • All mounts are nosuid
    • All mounts are nodev except:
      • /dev
      • /dev/pts
    • All mounts are read-only, except:
      • root tmpfs (e.g. for tmp subdir and transient home)
      • /var
      • /proc (except /proc/sys, /proc/sysrq-trigger, /proc/irq and /proc/bus, which are read-only)
      • /dev/pts
      • /dev/shm
  • Seccomp is used to disable unnecessary system calls
  • A private pid namespace with a minimal init process that reaps zombies
  • PR_SET_NO_NEW_PRIVS set so that execve() can never raise privileges for any child
  • A watcher monitor that exits when pid 2 (the initial process) of the app exits, returning its exit status
  • A private user namespace
  • A private ipc namespace
  • A private network namespace with only an IPv4 loopback device
  • A session dbus socket is available which goes through a filtering proxy. The app is allowed to own its own app id, and sub-names on the bus, and is only allowed to talk to org.freedesktop.DBus.
  • Environment variables set:
    • PATH=/app/bin:/usr/bin
    • LD_LIBRARY_PATH=/app/lib
    • XDG_CONFIG_DIRS=/app/etc/xdg:/etc/xdg
    • XDG_DATA_DIRS=/app/share:/usr/share
    • XDG_RUNTIME_DIR=/run/user/$uid
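The mount-flag rules and environment variables in the list above can be summarised as a small lookup. This is only a sketch of the rules as written on this page, not code from Flatpak: the function names mount_flags and sandbox_env are made up for illustration, mount points are matched by exact path, and XDG_RUNTIME_DIR is taken to be /run/user/$uid, matching the /run/user/$uid entry in the list.

```python
# Mount points exempt from each default flag, as listed on this page.
NODEV_EXEMPT = ("/dev", "/dev/pts")
WRITABLE = ("/", "/var", "/proc", "/dev/pts", "/dev/shm")
PROC_READ_ONLY = ("/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus")

def mount_flags(path):
    """Return the set of flags the page describes for a given mount point."""
    flags = {"nosuid"}  # all mounts are nosuid
    if path not in NODEV_EXEMPT:
        flags.add("nodev")
    if path in PROC_READ_ONLY or path not in WRITABLE:
        flags.add("ro")
    return flags

def sandbox_env(uid):
    """Return the default environment variables set inside the sandbox."""
    return {
        "PATH": "/app/bin:/usr/bin",
        "LD_LIBRARY_PATH": "/app/lib",
        "XDG_CONFIG_DIRS": "/app/etc/xdg:/etc/xdg",
        "XDG_DATA_DIRS": "/app/share:/usr/share",
        "XDG_RUNTIME_DIR": "/run/user/%d" % uid,
    }
```

For example, mount_flags("/usr") gives nosuid, nodev and ro, while mount_flags("/dev/pts") gives only nosuid.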

There is also a dbus-activated service (flatpak-session-helper) in the user session (outside the sandbox) which monitors /etc/resolv.conf and /etc/localtime and copies them to /run/user/$uid/flatpak-monitor/ every time they change. If this directory is mounted into the sandbox, system setting changes are propagated to the app.
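The copy-on-change behaviour of the helper can be sketched as follows. This is an assumption-laden illustration rather than flatpak-session-helper's implementation: it compares file timestamps and sizes on each call instead of using a file monitor, and the names sync_monitored, MONITORED and the seen dictionary are hypothetical.

```python
import os
import shutil

# File names the page says are monitored under /etc.
MONITORED = ("resolv.conf", "localtime")

def sync_monitored(src_dir, monitor_dir, seen):
    """Copy each monitored file that changed since the previous call.

    `seen` maps file name -> (mtime_ns, size) from the last copy and is
    updated in place. Returns the list of names copied this time.
    """
    os.makedirs(monitor_dir, exist_ok=True)
    copied = []
    for name in MONITORED:
        src = os.path.join(src_dir, name)
        try:
            st = os.stat(src)  # follows symlinks
        except FileNotFoundError:
            continue
        stamp = (st.st_mtime_ns, st.st_size)
        if seen.get(name) != stamp:
            shutil.copyfile(src, os.path.join(monitor_dir, name))
            seen[name] = stamp
            copied.append(name)
    return copied
```

A caller would invoke something like sync_monitored("/etc", "/run/user/1000/flatpak-monitor", seen) periodically, or whenever a change notification arrives; the uid 1000 here is only an example.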

Optionally there are flags to:

  • Add a read-write (nosuid, nodev) bind mount from system $HOME (replacing the /var/home symlink)
  • Add read-write bind mounts to non-system locations in the system / (such as /opt, /srv, /media, etc)
  • Allow additional sub-bind mounts under /app and /usr.
  • Take an advisory read-lock on the .ref file in each bind-mounted filesystem, which can be used to detect if any app is still using these files. The lock is owned by pid 1, so will be released when the last process of the app dies.
  • Use the host network namespace
  • Use the host ipc namespace (useful to allow XShm support)
  • Make /app writable (useful when building apps)
  • Make /usr writable (useful when building runtimes)
  • Make /run/user/$uid bind-mount the sockets for:
    • system dbus
    • session dbus
    • user pulseaudio daemon
    • wayland compositor
    • X11 unix domain socket
  • Bind mount /run/user/$uid/dconf directory if the home directory is visible to the user
  • Bind mount /run/user/$uid/flatpak-monitor to get resolv.conf and localtime updates. This also sets the TZ environment variable so that glibc picks up the new localtime.
  • Bind mount the host /dev/dri
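The advisory read-lock on the .ref file mentioned in the list above can be illustrated with a short sketch. It assumes flock()-style locks, which may differ from the locking call the helper actually uses, and the function names hold_ref_lock and is_in_use are hypothetical. The idea is that a running app holds a shared lock, and anyone wanting to know whether the files are still in use tries to take an exclusive lock without blocking.

```python
import fcntl
import os

def hold_ref_lock(ref_path):
    """Take a shared (read) advisory lock; it lasts until the fd is closed."""
    fd = os.open(ref_path, os.O_RDONLY)
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd

def is_in_use(ref_path):
    """Return True if some other open file description holds a lock."""
    fd = os.open(ref_path, os.O_RDONLY)
    try:
        # An exclusive lock conflicts with any shared lock held elsewhere.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Because the lock goes away when the holding process exits and its descriptor is closed, no explicit cleanup is needed when an app dies.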

More sandboxing we want

There are a few things we could do to further sandbox applications that we're not doing today.

kdbus

kdbus policies allow us to add per-name (or per-name-with-wildcard) policies of can-own, can-see, can-talk-to. If the kernel (and the app) support kdbus, then we can set up policies for the system and session bus, limiting what the app can do.

A good policy would be that apps can own their own id as a name, as well as see and talk to the bus and a few dbus apis that we deem "safe".

We are currently approximating the kdbus policy features using a filtering D-Bus proxy, but this adds extra overhead for every D-Bus message.
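To make the can-own, can-see, can-talk-to idea concrete, here is a small Python model of a per-name policy with wildcards. It is a sketch, not the proxy's or kdbus's code: the Access levels, their ordering (owning implies talking, talking implies seeing), and the names default_rules and access_for are assumptions made for illustration. The default rules mirror the current behaviour described earlier: the app may own its own id and sub-names, and may talk only to org.freedesktop.DBus.

```python
from enum import IntEnum

class Access(IntEnum):
    NONE = 0
    SEE = 1
    TALK = 2
    OWN = 3

def default_rules(app_id):
    """Rules for the default sandbox: own the app id and its sub-names."""
    return {
        "org.freedesktop.DBus": Access.TALK,
        app_id: Access.OWN,
        app_id + ".*": Access.OWN,
    }

def access_for(rules, name):
    """Return the highest access level any matching rule grants for `name`."""
    best = Access.NONE
    for pattern, level in rules.items():
        if pattern.endswith(".*"):
            # "org.example.App.*" matches strict sub-names only.
            matched = name.startswith(pattern[:-1])
        else:
            matched = name == pattern
        if matched and level > best:
            best = level
    return best
```

With rules = default_rules("org.example.App"), a name such as org.example.App.Helper is ownable, while org.example.Application is not, because the wildcard only matches names below the app id.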

SELinux

If we could run each application sandbox in its own SELinux context, that would significantly decrease the risk of an app breaking out of the sandbox. Hopefully this should not be too hard, at least for apps that don't have access to the home directory or other host files, as the number of allowed interactions between the sandbox and the host is very small.

Host changes needed

There are a bunch of changes needed to have a proper sandbox for desktop applications. Here is a list of some that I know of:

  • Need kdbus in the kernel (see above)
  • Need a wayland compositor in the session and no access to the X server from the sandbox. X is completely insecure and we can't let apps use it.
  • pulseaudio needs a concept of limited access for some clients. Right now any client can read any microphone, and can read and modify stream data and/or properties (like volume) of other clients' streams.
  • We need a service that supports arbitration of webcams so that an application can get access to a camera without having to use (unsafe) direct device access. Some initial work is here: https://github.com/wmanley/pulsevideo
  • We need to add joystick support to wayland. The current proposal here is apparently some custom wayland protocol to list joysticks, then you ask to open one and you get back a fd for the input device that the compositor can revoke when the app loses focus. This also needs code in SDL2 to use this.
  • We need to figure out how to best get smartcard support into the sandbox.
  • We need to figure out how to best get scanner support into the sandbox.
  • Printing mainly involves allowing the app access to a cups instance somehow. Need to figure out exactly how this would work.
  • Ryan has plans for a new gsettings backend for use in the sandbox, which only sees the configuration it is allowed to see, and where all modifications/reads go through a session service that applies e.g. lockdown and defaults.
  • The GNOME runtime ships libproxy, but it is not configured to do anything in general. We need to figure out how to best get it to access some session proxy daemon.

Portals

Once you have a perfectly sandboxed app you can run "trivial" applications like a game, but if you want to run normal desktop applications you need to start opening up "safe" ways that an application can interact with the system. We've previously used the name "Portals" for this. Here are some more or less random examples of things that we might need portals for:

  • Opening files or other forms of content from the user's home directory
  • Saving newly created or previously opened files
  • Sharing content from the app. For instance, select some text and "share with twitter".
  • Set an alarm
  • Access calendar or contacts
  • Grab a picture (probably from a webcam)
  • Compose an email
  • Get geo location
  • Open a URI

There are some initial designs for the Portal user experience for GNOME.

Existing portal APIs: