Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seccomp: allow unshare by default #41244

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stefanha
Copy link

- What I did
This pull request adds the unshare syscall to the default seccomp.json policy.

The unshare(2) syscall is currently only available when CAP_SYS_ADMIN
has been given. This is overly restrictive because not all CLONE_* flags
require CAP_SYS_ADMIN.

The virtiofsd daemon needs unshare(CLONE_FS) so that each thread has its
own working directory. This allows the program to use chdir(2) in a
thread-safe way:
https://git.qemu.org/?p=qemu.git;a=blob;f=tools/virtiofsd/fuse_virtio.c;h=3b6d16a0417ae560ee46dfec9254039487251007;hb=HEAD#l450

- How I did it
The seccomp.json file is updated, moving unshare to the default syscalls.

Both containers-golang and systemd already enable unshare by default:
https://github.com/seccomp/containers-golang/blob/master/seccomp.json
https://github.com/systemd/systemd/blob/master/src/shared/seccomp-util.c

- How to verify it
Run a container image that executes a program that uses unshare(2) (make sure it does not have CAP_SYS_ADMIN). The syscall fails with EPERM due to the seccomp policy violation.

With this patch applied the unshare(2) syscall executes.

The following program can be used:

#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sched.h>

int main(int argc, char **argv)
{
	if (unshare(CLONE_FS) == 0) {
		return EXIT_SUCCESS;
	} else {
		return EXIT_FAILURE;
	}
}

- Description for the changelog
Add the unshare syscall to the default seccomp.json policy

The unshare(2) syscall is currently only available when CAP_SYS_ADMIN
has been given. This is overly restrictive because not all CLONE_* flags
require CAP_SYS_ADMIN.

The virtiofsd daemon needs unshare(CLONE_FS) so that each thread has its
own working directory. This allows the program to use chdir(2) in a
thread-safe way:
https://git.qemu.org/?p=qemu.git;a=blob;f=tools/virtiofsd/fuse_virtio.c;h=3b6d16a0417ae560ee46dfec9254039487251007;hb=HEAD#l450

Both containers-golang and systemd already enable unshare by default:
https://github.com/seccomp/containers-golang/blob/master/seccomp.json
https://github.com/systemd/systemd/blob/master/src/shared/seccomp-util.c

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
@justincormack
Copy link
Contributor

Hi, the aim of this is partly to block CLONE_NEWUSER as this opens up a large attack surface; am reviewing the seccomp policy but I think that one needs to stay. The json seccomp policy config that runc has makes writing policies unnecessarily difficult unfortunately, but it may still be possible to adjust the policy to block that and not the other unprivileged ones.

@thaJeztah thaJeztah added area/security/seccomp kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/0-triage labels Jul 23, 2020
@stefanha
Copy link
Author

@justincormack systemd has per-flag seccomp rules so it can exclude certain flags:
https://github.com/systemd/systemd/blob/master/src/shared/seccomp-util.c#L1192

I don't know if the same is possible in the json policy file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/security/seccomp kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/0-triage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants