Oz Technical Details

David Mirza Ahmad edited this page Aug 22, 2015 · 7 revisions

Oz

Demo vid 1: https://support.subgraph.com/videos/oz_evince_01.webm

Demo vid 2: https://support.subgraph.com/videos/ozshell_evince_01.webm

Technical Details

Introduction

The Oz system launches Linux desktop applications inside of isolated security sandboxes to prevent further system compromise in the event that an attacker can successfully exploit an application security vulnerability.

As large complex desktop applications cannot be guaranteed to be free of security vulnerabilities, software which handles untrusted and potentially malicious data could be a target for exploitation by an attacker.

Applications which connect to network services are exposed to untrusted data while processing network protocols as well as through content data obtained from the network.

Examples of high-risk software include:

  • Web browser
  • Email client
  • Instant messaging client

Even if an application never directly interacts with the network, it can still be exploited if it handles data from an untrusted source:

  • Document viewer
  • Video player
  • File archive software

The compromise of a single application is serious, especially on a single-user system.

If an adversary wants to monitor a target and access their data, obtaining user-level access on the target’s endpoint is generally sufficient for accomplishing this objective. That user will have access to their own data, their credentials for other systems as well as local encryption keys, and, on a modern desktop operating system, hardware peripherals. The adversary would also generally be able to access the network to exfiltrate data at will.

An application compromise on an endpoint can mean the adversary is able to:

  • Access the user’s data without privilege escalation: documents, e-mail messages
  • Access saved credentials such as e-mail/IM/website passwords, encryption keys, system authentication credentials
  • Modify the user’s login properties so that they run modified clients without being aware of it
  • Access and manipulate hardware devices that the user would have access to, such as portable hard drives and the system's audio and video recording peripherals

Additionally, with local user access, an adversary possesses a number of avenues for escalating privileges:

  • Exploiting kernel vulnerabilities or vulnerabilities in setuid binaries
  • Backdooring user-level tools without needing privileges, such as the ssh client or sudo

Escalating to administrative privileges is useful for gaining system-wide access, maintaining persistent remote access, conducting long-term surveillance and avoiding detection by more technically sophisticated targets (though in practice, many targets are not technically sophisticated, and adversaries can often embed themselves for long periods of time without detection even if privileges are not escalated).

How Oz works

Oz wraps high-risk applications in a layer of isolation that systematically cuts off avenues for privilege escalation and compartmentalizes the application and its data from the rest of the system. It is designed to do so in a way that offers fine-grained control and flexibility.

Key features in Oz:

Limits access to user files   :   Yes
Limits access to devices      :   Yes
Limits X (desktop) access     :   Yes
Controls network access       :   Yes
Limits desktop exposure       :   Yes
Limits kernel attack surface  :   Yes
Limits process visibility     :   Yes
Limits filesystem visibility  :   Yes

Applications to be run within Oz include those most exposed to untrustworthy networks and data, such as instant messaging clients, web browsers, PDF viewers, and email clients.

The application access control parameters are specified in Oz policy documents. Oz includes a number of pre-made policies for specific applications.

When Oz is installed:

  1. Applications to be launched in Oz are renamed, and a link is created on the filesystem at the original location of the binary that points instead to the oz binary.

  2. Oz is started when the user tries to run the application at its original location. Oz examines the application name and then opens the associated policy document. The policy document contains parameters for how to set up the application’s runtime environment and how to launch the application.

  3. The Oz daemon then creates the restricted environment and then launches the real application within it with policy-defined security settings applied.

  4. While the application is running the user can use an Oz CLI tool to obtain a shell in its container.

[Figure: application launch without Oz]

[Figure: application launch flow with Oz]

1) User launches an application

The application binary (/usr/bin/evince in the diagram) is a symlink to the oz command line tool.

2) The oz program sends a message to oz-daemon telling it to launch the application

By reading argv[0], the oz utility learns that it was launched from the evince path and sends a message over an IPC socket to oz-daemon to launch the evince application inside a sandbox.

3) oz-daemon locates the policy profile for the application

The application profile describes how to set up the sandbox for an application.

4) Application sandbox is launched

Details about how the sandbox is set up are provided later in this document.

5) A new PID namespace is created with oz-init as pid 1

The oz-init process manages the child processes inside of the sandbox. As illustrated above, it launches the Xpra server and the application binary with a seccomp-bpf wrapper which restricts execution of system calls.

6) Outside of the sandbox, an Xpra client instance connects to the Xpra server inside the sandbox

Xpra isolates the application from the X server so that the sandbox cannot capture keystrokes or interfere with other applications running in the desktop session.

7) The Xpra client connects to the real X server

The Xpra client renders the application window output on the real X server.
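Step 2 of the flow above, where the oz utility reads argv[0] to learn which application was requested, can be sketched as follows. This is a minimal illustration only: the `launchRequest` type, its JSON encoding, and the function names are assumptions made for the example and are not taken from the Oz source, whose IPC message format may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"path/filepath"
)

// launchRequest is a hypothetical message shape for illustration;
// the real Oz IPC format may differ.
type launchRequest struct {
	App  string   `json:"app"`
	Args []string `json:"args"`
}

// appFromArgv0 returns the name the program was invoked as,
// for example "/usr/bin/evince" yields "evince".
func appFromArgv0(argv0 string) string {
	return filepath.Base(argv0)
}

// buildLaunchRequest encodes the request that would be sent to the daemon.
func buildLaunchRequest(argv []string) ([]byte, error) {
	if len(argv) == 0 {
		return nil, errors.New("empty argv")
	}
	req := launchRequest{App: appFromArgv0(argv[0]), Args: argv[1:]}
	return json.Marshal(req)
}

func main() {
	// Simulates the symlink /usr/bin/evince being executed with one argument.
	msg, err := buildLaunchRequest([]string{"/usr/bin/evince", "report.pdf"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```

Because the symlink keeps the original path, the user and desktop launchers do not need to change how the application is started.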

Security Objectives

Prevent access to user data beyond what is required by application

Oz launches applications in a new mount namespace with a root directory set to an isolated subset of the host filesystem. The Oz application policy files define whitelists and blacklists to control which files and directories the application will have access to when it runs, and which should not be accessible.

Examples of files an application in Oz will not have access to unless required:

  • User’s SSH keys
  • User’s email content
  • User’s encryption keys
  • User’s documents
  • User’s downloads
  • System setuid binaries

Static policy mandatory access control systems such as AppArmor or SELinux are not good tools to completely solve this problem since a single policy needs to be defined which is sufficiently permissive so that the application will function correctly in a variety of different situations.

An instructive example is a document viewer such as evince, where a general static policy must allow the application to read (and write) almost any arbitrary path on the filesystem. The user expects to be able to launch evince and then load any file they want from the Open menu item.

However, this is not an appropriate policy when using evince to view an untrusted document which may attempt to exploit the viewer itself. A compromise of the document viewer should not allow an attacker to read or erase all of the other files belonging to the user.

In Oz, when you launch evince to read a document, the sandbox filesystem does not contain any user data other than a copy of the document being viewed. The access policy (in this case, a single readable data file included in the sandbox) is determined dynamically from the context of the user action (i.e., opening a document from the nautilus file browser).
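One way to picture a dynamically determined policy is a function that turns the launch arguments into the list of files to expose. The sketch below is a simplified model under stated assumptions: `whitelistEntry`, `dynamicWhitelist`, and the injected `isFile` check are hypothetical names, and the real Oz logic (which places a copy of the document in the sandbox) is not reproduced here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// whitelistEntry is a hypothetical description of one file exposed in the sandbox.
type whitelistEntry struct {
	Path     string
	ReadOnly bool
}

// dynamicWhitelist derives the files to expose from the launch arguments.
// Options (arguments starting with "-") are skipped, relative paths are
// resolved against cwd, and only paths accepted by isFile are kept.
func dynamicWhitelist(cwd string, args []string, isFile func(string) bool) []whitelistEntry {
	var out []whitelistEntry
	for _, arg := range args {
		if strings.HasPrefix(arg, "-") {
			continue
		}
		p := arg
		if !filepath.IsAbs(p) {
			p = filepath.Join(cwd, p)
		}
		p = filepath.Clean(p)
		if isFile(p) {
			out = append(out, whitelistEntry{Path: p, ReadOnly: true})
		}
	}
	return out
}

func main() {
	// Stand-in for a real filesystem check; only one document "exists".
	isFile := func(p string) bool { return p == "/home/x/docs/a.pdf" }
	entries := dynamicWhitelist("/home/x", []string{"--fullscreen", "docs/a.pdf"}, isFile)
	for _, e := range entries {
		fmt.Printf("%s readonly=%v\n", e.Path, e.ReadOnly)
	}
}
```

Nothing else from the user's home directory appears in the result, which is the property a static policy cannot express.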

Deny attacker ability to execute arbitrary code

This can mean different things depending on the character of the initial vulnerability. If the vulnerability is a memory corruption vulnerability which allows an attacker to execute arbitrary machine code (either injected by the attacker, or scavenged ROP gadgets in executable address space), then the only restrictions possible are to deny access to kernel facilities (see next section).

Other types of attacker-controlled execution vulnerabilities allow an attacker to run executable files from the filesystem and then leverage this ability into arbitrary control of the system. To prevent the attacker from executing arbitrary code, the filesystem is set up so that no writable directory permits execution. Additionally, access to scripting language interpreters is restricted when possible.
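The rule that no writable location may also be executable can be expressed as a simple check over a planned mount table. The following is an illustrative sketch, not Oz code: the `mountEntry` type and the `violations` function are hypothetical, and the mount list in `main` is invented for the example.

```go
package main

import "fmt"

// mountEntry is a hypothetical summary of one mount inside the sandbox.
type mountEntry struct {
	Target   string
	ReadOnly bool // mounted with MS_RDONLY
	NoExec   bool // mounted with MS_NOEXEC
}

// violations returns the targets that are both writable and executable,
// which is the combination the sandbox layout is meant to avoid.
func violations(mounts []mountEntry) []string {
	var bad []string
	for _, m := range mounts {
		if !m.ReadOnly && !m.NoExec {
			bad = append(bad, m.Target)
		}
	}
	return bad
}

func main() {
	mounts := []mountEntry{
		{Target: "/usr", ReadOnly: true},     // executable but not writable
		{Target: "/tmp", NoExec: true},       // writable but not executable
		{Target: "/home/user/Downloads"},     // writable and executable
	}
	fmt.Println(violations(mounts))
}
```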

Reduce exposure of operating system kernel to privilege escalation attacks

An attacker may attempt to escape from the sandbox environment by exploiting the operating system kernel itself via system calls, devices, and kernel file systems such as /proc.

Applications are launched with seccomp-bpf filters loaded. These filters reduce the kernel attack surface by limiting the number of system calls that a process may invoke. Vulnerabilities in the Linux kernel that can lead to privilege escalation are very often exposed in system calls. Reducing the number of exposed system calls reduces the code paths that lead to as yet unknown but anticipated future kernel vulnerabilities.

Prevent escalation attacks on privileged system daemons

Namespace isolation used in Oz prevents various types of interaction with system daemons which could be used for attacks.

Namespace   :   Effect
PID         :   Prevent sending signals or accessing /proc entries of system daemons
Mount       :   Prevent access to filesystem socket IPC channels, including the D-Bus system bus
IPC         :   Deny access to System V IPC objects and POSIX message queues
Network     :   Sandbox cannot communicate with localhost sockets on the host
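On Linux, the four namespaces in the table correspond to clone flags that a launcher can request when starting a child process. The sketch below only assembles and inspects the flags using Go's standard `syscall` package (Linux only); it does not start a process, since creating these namespaces normally requires elevated privileges. The function names are hypothetical and the actual flag set used by Oz is not confirmed by this document.

```go
package main

import (
	"fmt"
	"syscall"
)

// sandboxCloneFlags returns the clone flags matching the namespaces
// listed in the table: PID, mount, IPC and network.
func sandboxCloneFlags() uintptr {
	return syscall.CLONE_NEWPID |
		syscall.CLONE_NEWNS |
		syscall.CLONE_NEWIPC |
		syscall.CLONE_NEWNET
}

// namespaceNames lists, in a fixed order, the namespaces selected by flags.
func namespaceNames(flags uintptr) []string {
	table := []struct {
		flag uintptr
		name string
	}{
		{syscall.CLONE_NEWPID, "pid"},
		{syscall.CLONE_NEWNS, "mount"},
		{syscall.CLONE_NEWIPC, "ipc"},
		{syscall.CLONE_NEWNET, "net"},
	}
	var names []string
	for _, entry := range table {
		if flags&entry.flag != 0 {
			names = append(names, entry.name)
		}
	}
	return names
}

func main() {
	// The attributes would be attached to an exec.Cmd before starting it.
	attr := &syscall.SysProcAttr{Cloneflags: sandboxCloneFlags()}
	fmt.Println(namespaceNames(attr.Cloneflags))
}
```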

Prevent attacks on setuid or capability enhancing binaries

Processes launched in the sandbox have the no_new_privs flag set (via prctl with PR_SET_NO_NEW_PRIVS), which prevents them from gaining additional privileges or capabilities by executing setuid or capability-enhanced binaries. Additionally, all filesystems in the sandbox are mounted with MS_NOSUID, and existing setuid binaries are individually blacklisted from the filesystem.

Restrict access to network to prevent exfiltration of stolen user data

Oz creates a new network namespace in each Oz container. This includes its own virtual interface. Oz can define by policy how applications are exposed to external networks. Applications can be entirely denied access to external networks if they do not need network connectivity to function.

Applications such as an image viewer, which do not need access to the network, are sandboxed in a network namespace with no network devices at all. Other applications, such as IM clients, may only need to access a single address and port and can be restricted to this minimal network access.

Isolate X Window System applications

The architecture of the X Window System creates some challenges for security sandboxing of Linux desktop applications. Any process which can connect to the X server can capture user keystrokes and take a screen capture of any window on the desktop. Oz isolates applications from the actual X display server by using the Xpra utility to give each sandbox a private X server to connect to. The Xpra system composites the application windows from the private X server for display on the real X server.

Minimize devices exposed in sandbox

Each application container environment will have a /dev assembled with the minimum devices necessary for application function.

Automounted devices such as portable disks will not be exposed in the container environment.

Access to audio and video recording hardware can also be controlled through the Oz policy.
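A minimal /dev can be described as a short table of character devices. The sketch below uses the major and minor numbers and permissions shown in the device listing in the Implementation section of this document, and computes the combined device number that mknod(2) expects using the standard Linux encoding. The `deviceNode` type, the `basicDevices` variable, and the `mkdev` helper are names chosen for this example, not identifiers from Oz.

```go
package main

import "fmt"

// deviceNode describes one character device to create in the sandbox /dev.
type deviceNode struct {
	Name  string
	Major uint32
	Minor uint32
	Mode  uint32 // permission bits
}

// basicDevices mirrors the character devices in this document's /dev listing.
var basicDevices = []deviceNode{
	{"console", 5, 1, 0600},
	{"full", 1, 7, 0666},
	{"null", 1, 3, 0666},
	{"random", 1, 8, 0666},
	{"tty", 5, 0, 0666},
	{"urandom", 1, 9, 0666},
	{"zero", 1, 5, 0666},
}

// mkdev combines major and minor numbers using the Linux dev_t encoding.
func mkdev(major, minor uint32) uint64 {
	dev := (uint64(major) & 0x00000fff) << 8
	dev |= (uint64(major) & 0xfffff000) << 32
	dev |= uint64(minor) & 0x000000ff
	dev |= (uint64(minor) & 0xffffff00) << 12
	return dev
}

func main() {
	for _, d := range basicDevices {
		fmt.Printf("%-8s c %d,%d dev=%d mode=%04o\n",
			d.Name, d.Major, d.Minor, mkdev(d.Major, d.Minor), d.Mode)
	}
}
```

Keeping the device set in one table makes it easy to audit: anything not listed, such as disks or capture hardware, simply does not exist inside the sandbox unless a policy adds it.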

Implementation

Filesystem

The following steps describe how the filesystem is set up for each sandbox. All of these actions happen in a newly created mount namespace so the changes are only visible to processes inside the sandbox.

  1. create rootfs directory if it doesn't already exist

    rootfs = "/srv/oz/rootfs"
    mkdir(rootfs)
    
  2. set MS_PRIVATE recursively on filesystem root

    mount("", "/", "", MS_PRIVATE | MS_REC, "")
    

    Since this is inside a new mount namespace it only affects the oz-daemon process

  3. mount a tmpfs instance on rootfs path

    mount("", rootfs, "tmpfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, "mode=755,gid=0")
    
  4. Bind mount /bin /lib /lib64 /usr /etc

    mount("/bin", "/srv/oz/rootfs/bin", "", MS_BIND, "")
    // MS_BIND is required so that only this bind mount becomes read-only
    mount("", "/srv/oz/rootfs/bin", "", MS_BIND | MS_REMOUNT | MS_RDONLY, "")
    // repeat for remaining directories

  5. Create empty directories

    var basicEmptyDirs = []string{
      "/sbin", "/var", "/var/lib",
      "/var/cache", "/home", "/boot",
      "/tmp", "/run", "/run/user",
      "/run/lock", "/root",
      "/opt", "/srv", "/dev", "/proc",
      "/sys", "/mnt", "/media",
    }

  6. Mount tmpfs on /srv/oz/rootfs/dev

    mount("", "/srv/oz/rootfs/dev", "tmpfs", MS_NOSUID | MS_NOEXEC, "mode=755")
    
  7. Create devices

    crw-------. 1 root root 5, 1 Jun 28 17:18 console
    crw-rw-rw-. 1 root root 1, 7 Jun 28 17:18 full
    crw-rw-rw-. 1 root root 1, 3 Jun 28 17:18 null
    crw-rw-rw-. 1 root root 1, 8 Jun 28 17:18 random
    crw-rw-rw-. 1 root root 5, 0 Jun 28 17:18 tty
    -rw-r-----. 1 root root    0 Jun 28 17:18 tty1
    -rw-r-----. 1 root root    0 Jun 28 17:18 tty2
    -rw-r-----. 1 root root    0 Jun 28 17:18 tty3
    -rw-r-----. 1 root root    0 Jun 28 17:18 tty4
    crw-rw-rw-. 1 root root 1, 9 Jun 28 17:18 urandom
    crw-rw-rw-. 1 root root 1, 5 Jun 28 17:18 zero
    
  8. Create symlinks

    var basicSymlinks = [][2]string{
      {"/run", "/var/run"},
      {"/tmp", "/var/tmp"},
      {"/run/lock", "/var/lock"},
      {"/dev/shm", "/run/shm"},
    }
    
    var deviceSymlinks = [][2]string{
      {"/proc/self/fd", "/dev/fd"},
      {"/proc/self/fd/2", "/dev/stderr"},
      {"/proc/self/fd/0", "/dev/stdin"},
      {"/proc/self/fd/1", "/dev/stdout"},
      {"/dev/pts/ptmx", "/dev/ptmx"},
    }
    
  9. Create blacklist file and directory

    dr-x------.   2 root root     40 Jun 28 17:18 oz.ro.dir
    -r--------.   1 root root      0 Jun 28 17:18 oz.ro.file
    
  10. Blacklist a built-in list of paths

    var basicBlacklist = []string{
      "${PATH}/sudo", "${PATH}/su",
      "${PATH}/xinput", "${PATH}/strace",
      "${PATH}/mount", "${PATH}/umount",
      "${PATH}/fusermount",
    }
    
  11. Bind whitelist from profile

    Profile whitelist items are additional bind mounts of files or directories into the sandbox

  12. Blacklist items from profile

    Paths are blacklisted by bind mounting the items created in step 9 to blacklist target paths

  13. If Xpra is used, create Xpra socket directory in user home directory and bind into sandbox

    [evince] $ ls -n $HOME/.Xoz/evince/evince-0
    srw-------. 1 1000 1000 0 Jun 28 17:18 /home/x/.Xoz/evince/evince-0

  14. Enter chroot

    chroot("/srv/oz/rootfs")
    chdir("/")
    
  15. Mount additional kernel filesystems inside sandbox

    mount("", "/dev/shm", "tmpfs", MS_NOSUID | MS_NOEXEC | MS_NODEV)
    mount("", "/tmp", "tmpfs", MS_NOSUID | MS_NOEXEC | MS_NODEV)
    mount("", "/dev/pts", "devpts", MS_NOSUID | MS_NOEXEC, "newinstance,ptmxmode=0666")
    mount("", "/proc", "proc", MS_NOSUID | MS_NOEXEC)
    mount("", "/sys", "sysfs", MS_NOSUID | MS_NOEXEC | MS_RDONLY)
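
The blacklisting steps in the list above can be modelled as a plan of bind mounts in which each existing blacklisted path is covered by the empty read-only file or directory. The sketch below builds such a plan without mounting anything. It rests on assumptions: the way `${PATH}` entries are expanded, the placeholder locations passed in `main`, and all type and function names are hypothetical and may not match how Oz does it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// bindMount describes one planned bind mount; nothing is mounted here.
type bindMount struct {
	Source string
	Target string
}

// expandPathEntry expands a "${PATH}/name" entry into one candidate per
// directory in pathEnv. Other entries are returned unchanged.
func expandPathEntry(entry, pathEnv string) []string {
	const prefix = "${PATH}/"
	if !strings.HasPrefix(entry, prefix) {
		return []string{entry}
	}
	name := strings.TrimPrefix(entry, prefix)
	var out []string
	for _, dir := range filepath.SplitList(pathEnv) {
		if dir == "" {
			continue
		}
		out = append(out, filepath.Join(dir, name))
	}
	return out
}

// blacklistPlan covers every existing blacklisted path with the empty
// read-only placeholder of the matching kind. kindOf returns "file",
// "dir", or "" when the path does not exist.
func blacklistPlan(entries []string, pathEnv, roFile, roDir string, kindOf func(string) string) []bindMount {
	var plan []bindMount
	for _, entry := range entries {
		for _, target := range expandPathEntry(entry, pathEnv) {
			switch kindOf(target) {
			case "file":
				plan = append(plan, bindMount{Source: roFile, Target: target})
			case "dir":
				plan = append(plan, bindMount{Source: roDir, Target: target})
			}
		}
	}
	return plan
}

func main() {
	// Fake filesystem for the example: only these two paths exist.
	kindOf := func(p string) string {
		switch p {
		case "/usr/bin/sudo":
			return "file"
		case "/home/user/.ssh":
			return "dir"
		}
		return ""
	}
	plan := blacklistPlan(
		[]string{"${PATH}/sudo", "/home/user/.ssh"},
		"/usr/bin:/bin",
		"/oz.ro.file", "/oz.ro.dir", // placeholder locations are assumptions
		kindOf,
	)
	for _, m := range plan {
		fmt.Printf("bind %s -> %s\n", m.Source, m.Target)
	}
}
```

Covering a path with an unreadable placeholder hides it without having to modify or delete the underlying file on the host.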
    

Networking

[Figure: networking]

Bridged Networking

A private network range is configured in the sandbox for applications which need to interact with the network and this network is bridged to the host network.

No network

If an application does not need access to the network, no network interface is created in the sandbox network namespace.

Proxy networking

If it is known at launch time that an application will only connect to a specific single service, a localhost listening proxy can be configured which forwards connections to the remote address and port. In this case, no network device needs to be configured in the sandbox other than the localhost device.
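The forwarding part of such a proxy is small. The sketch below shows a generic TCP relay in Go: it accepts connections on a local listener and copies bytes in both directions to a fixed target. It is an illustration of the technique only, with hypothetical function names. How Oz places the listener inside the sandbox network namespace is not covered, and `roundTrip` exists purely to demonstrate the relay against a stand-in echo service on the loopback interface.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// forward accepts connections on ln and relays each one to target.
// It returns when the listener is closed.
func forward(ln net.Listener, target string) {
	for {
		client, err := ln.Accept()
		if err != nil {
			return
		}
		go func() {
			defer client.Close()
			upstream, err := net.Dial("tcp", target)
			if err != nil {
				return
			}
			defer upstream.Close()
			go func() {
				io.Copy(upstream, client) // sandbox -> remote
				upstream.Close()          // unblock the copy below
			}()
			io.Copy(client, upstream) // remote -> sandbox
		}()
	}
}

// roundTrip sends msg through a proxy to a stand-in echo service and
// returns what comes back. Both listeners use ephemeral loopback ports.
func roundTrip(msg string) (string, error) {
	remote, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer remote.Close()
	go func() {
		for {
			c, err := remote.Accept()
			if err != nil {
				return
			}
			go func() {
				defer c.Close()
				io.Copy(c, c) // echo everything back
			}()
		}
	}()

	proxy, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer proxy.Close()
	go forward(proxy, remote.Addr().String())

	conn, err := net.Dial("tcp", proxy.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := roundTrip("ping")
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
```

Since the target address is fixed when the proxy is created, a compromised application can reach only that one service, which limits its options for exfiltration.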

seccomp

Kernel vulnerabilities are an avenue for privilege escalation when an adversary has local, unprivileged access. Many kernel vulnerabilities are exposed to user land processes through system calls and often the vulnerable system calls are obscure and not needed by general user applications.

Seccomp filter is a Linux kernel feature utilized by Oz to reduce the kernel attack surface by limiting the number of system calls a process can invoke, as well as by controlling how certain system calls are used. In Oz, each application can have seccomp rules defined in its profile. These rules describe whitelisted and blacklisted system calls. The rules are compiled and loaded immediately before an application starts. This is done to limit opportunities for privilege escalation via attacks on the Linux kernel if the application is compromised.
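
The difference between whitelist and blacklist rules comes down to the default action taken when no rule matches. The following userspace model illustrates that decision logic only. It does not build or load a BPF program, and the `seccompPolicy` type, the action names, and the helper functions are hypothetical rather than taken from the Oz profile format.

```go
package main

import "fmt"

// action is what happens when a system call is attempted.
type action int

const (
	actionAllow action = iota
	actionKill
)

// seccompPolicy is a simplified model of a filter: explicit per-syscall
// rules plus a default applied when no rule matches.
type seccompPolicy struct {
	Default action
	Rules   map[string]action
}

// decide returns the action for the named system call.
func (p seccompPolicy) decide(name string) action {
	if a, ok := p.Rules[name]; ok {
		return a
	}
	return p.Default
}

// whitelist allows only the named system calls and kills on anything else.
func whitelist(names ...string) seccompPolicy {
	p := seccompPolicy{Default: actionKill, Rules: map[string]action{}}
	for _, n := range names {
		p.Rules[n] = actionAllow
	}
	return p
}

// blacklist kills on the named system calls and allows everything else.
func blacklist(names ...string) seccompPolicy {
	p := seccompPolicy{Default: actionAllow, Rules: map[string]action{}}
	for _, n := range names {
		p.Rules[n] = actionKill
	}
	return p
}

func main() {
	wl := whitelist("read", "write", "exit_group")
	bl := blacklist("ptrace")
	fmt.Println(wl.decide("read") == actionAllow)   // listed in the whitelist
	fmt.Println(wl.decide("ptrace") == actionKill)  // not listed, so denied
	fmt.Println(bl.decide("ptrace") == actionKill)  // listed in the blacklist
	fmt.Println(bl.decide("openat") == actionAllow) // not listed, so allowed
}
```

A whitelist fails closed when an application uses an unexpected system call, while a blacklist fails open, which is why whitelists give the stronger reduction in kernel attack surface at the cost of more policy maintenance.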