Rootless Containers #774

cyphar · 2016-04-23T13:36:26Z

This enables support for "rootless container mode". There are
certain restrictions on what non-root users can do, resulting in several
runC features not being available. There were originally no checks in place
to make this clear to users; I've now implemented the config validation.

- All cgroup operations require having write access to your current
  cgroup directory. By default, the directories are owned by root and have
  the mode 0755. This means that we cannot set up any cgroups, or join
  cgroups. The new cgroup namespace doesn't fix this for us either, but
  hopefully we can get a patch upstream to fix this. We should still
  improve cgroup handling so that we apply any cgroups we can if we have
  write access to the directory.
- setgroups(2) cannot be used in a non-privileged user namespace setup.
  We also have to set /proc/self/setgroups to "deny".
- We cannot map any user other than ourselves in a rootless container,
  which means that any user-related directives won't work. You can only be
  "root".

If you want to use this, you have to make sure you remove the gid=5 entry from the /dev/pts mount, and only map your own user in the namespace.
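
As a rough illustration of those two config adjustments, here is a minimal Go sketch. The `IDMap` type and both helper functions are hypothetical names invented for this example (they are not runC APIs), and the host ID 1000 is only a sample value.

```go
package main

import (
	"fmt"
	"strings"
)

// IDMap is a hypothetical type mirroring the shape of the
// uidMappings/gidMappings entries in config.json.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// dropGidOption returns a copy of opts without any "gid=..." entry,
// as needed for the /dev/pts mount in a rootless config.
func dropGidOption(opts []string) []string {
	kept := make([]string, 0, len(opts))
	for _, o := range opts {
		if strings.HasPrefix(o, "gid=") {
			continue
		}
		kept = append(kept, o)
	}
	return kept
}

// selfMapping maps container ID 0 to exactly one host ID.
func selfMapping(hostID int) []IDMap {
	return []IDMap{{ContainerID: 0, HostID: hostID, Size: 1}}
}

func main() {
	opts := []string{"nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"}
	fmt.Println(dropGidOption(opts))
	fmt.Println(selfMapping(1000)) // 1000 is an example host UID/GID
}
```

The same single-entry mapping would be used for both uidMappings and gidMappings.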

Here's runc start working in both root and rootless setup:

And here's runc exec working in both root and rootless setup:

`TODO`

Open Questions

- Make sure that the cgroup namespace actually allows us to set cgroups. It doesn't. I've sent an email to upstream proposing a potential solution. You can also take a look at the state of my (probably broken) kernel patchset.
- Should we move runc back to /usr/bin, since it's no longer an admin piece of software? This would also mean moving the man pages to man1.

What works?

As unprivileged user:
- ❌ ~~runc checkpoint~~ (while potentially possible, not implemented)
- runc create (Console path resolution is done in host mount namespace #814)
- runc create --console (Console path resolution is done in host mount namespace #814)
- runc delete
- ❌ ~~runc events~~ (not really useful -- cgroups)
- runc exec
- runc exec --console (Console path resolution is done in host mount namespace #814)
- runc kill
- runc list
  - with containers not readable by us.
- ❌ ~~runc pause~~ (cgroups)
- ❌ ~~runc ps~~ (cgroups)
- ❌ ~~runc restore~~ (while potentially possible, not implemented)
- ❌ ~~runc resume~~ (cgroups)
- runc run
- runc run -d --console (Console path resolution is done in host mount namespace #814)
- runc spec
- runc start (create doesn't work -- Console path resolution is done in host mount namespace #814)
- runc state
- ❌ ~~runc update~~ (cgroups)
As root:
- ❌ ~~runc checkpoint~~ (while potentially possible, not implemented)
- runc delete
- ❌ ~~runc events~~ (not really useful -- cgroups)
- runc exec
- runc exec --console (Console path resolution is done in host mount namespace #814)
- runc kill
- runc list
- ❌ ~~runc pause~~ (cgroups)
- ❌ ~~runc ps~~ (cgroups)
- ❌ ~~runc restore~~ (while potentially possible, not implemented)
- ❌ ~~runc resume~~ (cgroups)
- runc run
- runc run -d --console (Console path resolution is done in host mount namespace #814)
- runc spec
- runc start (create doesn't work -- Console path resolution is done in host mount namespace #814)
- runc state
- ❌ ~~runc update~~ (cgroups)

Kernel Patches

CLONE_NEWCGROUP fix to allow unprivileged processes to create subtrees.
- v1: https://lkml.org/lkml/2016/5/1/77
- v2: https://lkml.org/lkml/2016/5/1/87
- v3: https://lkml.org/lkml/2016/5/2/280
- ❌ v4: https://lkml.org/lkml/2016/5/13/576

Implements #38.

Signed-off-by: Aleksa Sarai <asarai@suse.de>

mrunalp · 2016-04-23T16:32:57Z

For cgroups, we can skip doing any setup if cgroupsPath == "" and Resources == nil in the config.
We can either skip or introduce a NoOp cgroups manager.
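
A minimal Go sketch of the NoOp manager idea, under stated assumptions: `CgroupConfig`, `Resources`, `Manager`, `noopManager`, `fsManager` and `newManager` are all hypothetical names made up for this illustration and do not reflect libcontainer's actual types.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for cgroup resource settings.
type Resources struct {
	MemoryLimit int64
	CPUShares   uint64
}

// CgroupConfig is a hypothetical stand-in for the cgroup part of a config.
type CgroupConfig struct {
	Path      string
	Resources *Resources
}

// Manager is a hypothetical, heavily reduced cgroup manager interface.
type Manager interface {
	Apply(pid int) error
}

// noopManager does nothing: no cgroup is created or joined.
type noopManager struct{}

func (noopManager) Apply(pid int) error { return nil }

// fsManager stands for a real manager; it is deliberately left unimplemented.
type fsManager struct{ path string }

func (m fsManager) Apply(pid int) error {
	return fmt.Errorf("joining cgroup %q for pid %d is not implemented in this sketch", m.path, pid)
}

// newManager picks the no-op manager when neither a path nor resources are set.
func newManager(c CgroupConfig) Manager {
	if c.Path == "" && c.Resources == nil {
		return noopManager{}
	}
	return fsManager{path: c.Path}
}

func main() {
	m := newManager(CgroupConfig{})
	fmt.Println(m.Apply(1234)) // no error: nothing had to be set up
}
```

Selecting the manager in one place keeps the "skip" decision out of the callers, which is one way to leave room for a richer rootless manager later.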

cyphar · 2016-04-24T03:05:06Z

@mrunalp I'm going to go with a rootless cgroup manager so we might expand it later (there are some upcoming kernel features that might make cgroups in rootless containers usable).

mrunalp · 2016-04-24T03:10:59Z

Sounds good.

crosbymichael · 2016-04-25T19:19:36Z

libcontainer/configs/validate/rootless.go

+		return nil
+	}
+
+	// Used for comparison.


This is some pretty complex validation. So some cgroups are ok but others are not?

We could just fail the check if any values are set at all. No need to go check for defaults. User can create the config without setting resources or path which is simple to check.

The /actual/ check is that all cgroups have no non-default settings. Unfortunately, specconv adds device cgroup settings that get merged with the config the user specified -- so a simple "is this equal to the zero value" check doesn't cut it. I haven't figured out a nice way of dealing with that (specconv runs long before we get to this part, and we need it to run before we can do anything with the config (like figuring out if we're rootless)).

Maybe we can do our rootless check before specconv, then specconv doesn't modify the cgroup settings if we're rootless, and then we do the config checks for rootless (do we have mapping rights).

Ya, that sounds better. We should be looking for this in runc not in libcontainer.

I'm not really convinced that we shouldn't be doing any checking in libcontainer. The same question can be asked about libcontainer/configs/validate -- why do we do any config verification inside libcontainer? There's also a question of whether or not libcontainer should autodetect rootless mode or whether it should be passed as an option (you can't use rootless containers with root as far as I can tell -- and it's definitely a bad idea).

runc should populate the correct config that libcontainer gets and it should not be modified inside libcontainer. All the changes that need to be made should happen while we generate the config not after it is made.

The current state of this patchset doesn't modify the config inside libcontainer. The issue is that specconv adds device options to []Device even if the user doesn't specify anything. So we can either do the verification of the cgroups in runC (which means that if someone uses libcontainer directly they probably won't immediately realise they can't set cgroup settings) or we do the verification in libcontainer and make specconv not generate any cgroup options in rootless mode.

I'll also move the isRootless checks to RootlessValidator.

EDIT: I've fixed this.
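
To make the check discussed in this review thread concrete, here is a hedged Go sketch of a validation that ignores the device rules added by default and rejects everything else. The `Cgroup`, `Resources` and `DeviceRule` types and the `validateRootlessCgroup` function are hypothetical and much smaller than the real config; which fields count as "non-default" is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// DeviceRule, Resources and Cgroup are hypothetical, reduced config types.
type DeviceRule struct {
	Allow  bool
	Access string
}

type Resources struct {
	Devices     []DeviceRule // may be filled in even if the user set nothing
	MemoryLimit int64
	CPUShares   uint64
	PidsLimit   int64
}

type Cgroup struct {
	Path      string
	Resources *Resources
}

// validateRootlessCgroup rejects any cgroup path or resource limit,
// but tolerates device rules, since those may be added automatically.
func validateRootlessCgroup(c *Cgroup) error {
	if c == nil {
		return nil
	}
	if c.Path != "" {
		return errors.New("rootless containers cannot set a cgroup path")
	}
	if r := c.Resources; r != nil {
		if r.MemoryLimit != 0 || r.CPUShares != 0 || r.PidsLimit != 0 {
			return errors.New("rootless containers cannot set cgroup resource limits")
		}
	}
	return nil
}

func main() {
	// Only a default device rule, as in the generated config.json.
	defaults := &Cgroup{Resources: &Resources{Devices: []DeviceRule{{Allow: false, Access: "rwm"}}}}
	fmt.Println(validateRootlessCgroup(defaults))

	limited := &Cgroup{Resources: &Resources{MemoryLimit: 64 << 20}}
	fmt.Println(validateRootlessCgroup(limited))
}
```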

jessfraz · 2016-04-26T16:43:16Z

i think the rootless cgroups manager is good, then it can be expanded when the cgroups ns is added :) just my opinion but ianam, thanks for this

jessfraz · 2016-04-26T16:45:20Z

also wrt the features that might not be possible or are hard for the time being, they could be disabled, and then slowly turned back on as implementations evolve, kinda like how we did userns in docker, and then slowly added more features back in wrt sharing namespaces
it's easier to make a smaller change then iterate on it, than one huge one

cyphar · 2016-04-26T23:08:01Z

@jfrazelle AFAICS all of the core features work. But some of them either just require root (criu IIRC) or currently can't be done under user namespaces (cgroups -- which I'm working on a patch for). But all of the others should still work, and in principle I want to try to get all of the features working for root operating on a rootless container.

jessfraz · 2016-04-26T23:09:54Z

Nice :)

cyphar · 2016-04-26T23:13:55Z

Note: as far as I can see, the only thing left to do before we can clear most of the checkpoints is for me to fix up the rootless cgroup manager so that it stores the process's actual cgroup path. Then runc events should work, as well as all of the freezer code (for root). Checkpoint and restore might also work just by doing that.

cyphar · 2016-04-26T23:14:20Z

Whoops wrong button. ;)

cyphar · 2016-04-29T05:55:22Z

@avagin Do you know if it's possible to run criu as an unprivileged user (specifically from the kernel side when interacting with unprivileged user namespaces)? If not, is there any intention from upstream to get that to work? And if not, should we disable all checkpoint/restore functionality for rootless containers (even if the person doing the checkpoint/restore is root -- since the restore setup might break in weird ways)?

davidlt · 2016-05-02T11:54:50Z

Looks like this is moving forward, very interesting, so I started testing it. Built on Fedora 24 (updated on May 2nd). Looks like bind mounts work fine without root permissions. I wasn't lucky with internet connectivity without sudo. Later on I need to check if there is a way not to be the root user within the container (our software will not be using the root account). Nice progress!

[davidlt@pccms205 magic_dir]$ pwd
/home/davidlt/magic_dir
[davidlt@pccms205 magic_dir]$ ls
[davidlt@pccms205 magic_dir]$ touch HOST_FILE
[davidlt@pccms205 magic_dir]$ ls
HOST_FILE
[davidlt@pccms205 magic_dir]$ cat /etc/os-release | grep PRETTY_NAME >> HOST_FILE
[davidlt@pccms205 magic_dir]$ cat HOST_FILE
PRETTY_NAME="Fedora 24 (Workstation Edition)"
[davidlt@pccms205 test]$ runc --root $PWD start test_container
sh-4.2# cat /etc/os-release | grep PRETTY_NAME
PRETTY_NAME="CentOS Linux 7 (Core)"
sh-4.2# cd /some
sh-4.2# cat /etc/os-release | grep PRETTY_NAME >> HOST_FILE
sh-4.2# touch NEW_FILE
sh-4.2# exit
[davidlt@pccms205 test]$ cat ~/magic_dir/HOST_FILE 
PRETTY_NAME="Fedora 24 (Workstation Edition)"
PRETTY_NAME="CentOS Linux 7 (Core)"
[davidlt@pccms205 test]$ file ~/magic_dir/NEW_FILE 
/home/davidlt/magic_dir/NEW_FILE: empty

For anyone else who wants to test this I am also sharing diff between original and modified config.json:

--- config.json 2016-05-02 13:25:24.468181348 +0200
+++ config.json.correct 2016-05-02 13:25:14.661496040 +0200
@@ -6,7 +6,7 @@
    },
    "process": {
        "terminal": true,
-       "user": {},
+       "user": { "uid": 0, "gid": 0, "additionalGids": null },
        "args": [
            "sh"
        ],
@@ -35,6 +35,12 @@
    },
    "hostname": "runc",
    "mounts": [
+                {
+                        "destination": "/some",
+                        "type": "bind",
+                        "source": "/home/davidlt/magic_dir",
+                        "options": [ "rbind" ]
+                },
        {
            "destination": "/proc",
            "type": "proc",
@@ -60,8 +66,7 @@
                "noexec",
                "newinstance",
                "ptmxmode=0666",
-               "mode=0620",
-               "gid=5"
+               "mode=0620"
            ]
        },
        {
@@ -112,14 +117,20 @@
    ],
    "hooks": {},
    "linux": {
-       "resources": {
-           "devices": [
-               {
-                   "allow": false,
-                   "access": "rwm"
-               }
-           ]
-       },
+                "uidMappings": [
+                        {
+                                "hostID": 1000,
+                                "containerID": 0,
+                                "size": 1
+                        }
+                ],
+                "gidMappings": [
+                        {
+                                "hostID": 1000,
+                                "containerID": 0,
+                                "size": 1
+                        }
+                ],
        "namespaces": [
            {
                "type": "pid"
@@ -135,7 +146,10 @@
            },
            {
                "type": "mount"
-           }
+           },
+                        {
+                                "type": "user"
+                        }
        ],
        "maskedPaths": [
            "/proc/kcore",
@@ -152,4 +166,4 @@
            "/proc/sysrq-trigger"
        ]
    }
-}
\ No newline at end of file
+}

cyphar · 2016-05-02T12:11:02Z

@davidlt

> Later on I need to check if there is a way not to be the root user within the container (our software will not be using the root account). Nice progress!

Unfortunately this is not possible, due to restrictions within the kernel. Essentially this is a logical result of these two restrictions on user namespaces:

1. All user namespaces must provide a mapping for root inside a container.
2. Unprivileged user namespaces can only provide a mapping for one user, the user which created the namespace.

As a result, the only user that is mapped inside the container is your user (as root). You can discuss restriction number 1 with the kernel upstream, because it's the only restriction which it might be possible to fix. The second restriction is just a security issue. But at the moment, there isn't a way to do what you want.
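
The single-user restriction can be sketched in Go as a check on the configured mappings plus the line that would be written to `/proc/<pid>/uid_map` (format: ID inside the namespace, ID outside, length). `IDMap`, `validateUnprivilegedMapping` and `mapLine` are hypothetical names for this illustration, and 1000 is a sample host UID.

```go
package main

import "fmt"

// IDMap is a hypothetical type for one uidMappings/gidMappings entry.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// validateUnprivilegedMapping checks that maps contains exactly one entry
// of size 1 that maps the given host ID (the invoking user's own ID).
func validateUnprivilegedMapping(maps []IDMap, hostID int) error {
	if len(maps) != 1 {
		return fmt.Errorf("expected exactly one mapping, got %d", len(maps))
	}
	m := maps[0]
	if m.Size != 1 {
		return fmt.Errorf("mapping size must be 1, got %d", m.Size)
	}
	if m.HostID != hostID {
		return fmt.Errorf("can only map host ID %d, got %d", hostID, m.HostID)
	}
	return nil
}

// mapLine renders a mapping in the uid_map/gid_map file format.
// For gid_map, an unprivileged process must first write "deny"
// to /proc/<pid>/setgroups.
func mapLine(m IDMap) string {
	return fmt.Sprintf("%d %d %d\n", m.ContainerID, m.HostID, m.Size)
}

func main() {
	maps := []IDMap{{ContainerID: 0, HostID: 1000, Size: 1}}
	if err := validateUnprivilegedMapping(maps, 1000); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Print(mapLine(maps[0]))
}
```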

davidlt · 2016-05-02T12:13:49Z

Yeah, I tried a few things. I even added a user within a container using Docker, which I can see while running with runC, but cannot launch anything under that user.

What about internet connectivity?

cyphar · 2016-05-02T12:33:10Z

Unfortunately, creating bridges between a container's network namespace and the host's network namespace requires creating a virtual interface in the host's network namespace. AFAIK that requires root (but I may be wrong). One potential solution would be to not create a network namespace (this currently doesn't work due to some bindmount issues, but that can be fixed). Obviously, this means you don't get the benefits of network namespacing (such as iptables rules without root).

This could be something else we could push the kernel about. Unfortunately, I'm not familiar enough with networking to be able to help with writing a kernel patch. My only kernel experience thus far has been with cgroups.

davidlt · 2016-05-02T13:09:01Z

Okay. Looks like internet connectivity will arrive at some point. This is needed in my case because the majority of the data is not local (i.e. not available on some shared file system mounted via bind/slave mount to the container). In most cases it has to be streamed (network IO). I think, at this point, just having internet connectivity is a good step forward.

Do you know what was the reasoning for namespaces to provide root inside the container? Quick googling didn't reveal too much documentation around this.

cyphar · 2016-05-02T13:37:20Z

To be honest, I just quickly read through the kernel code and I'm not sure this is a restriction imposed by the kernel. It's possible it's just how we've set up user namespaced containers to work. Currently my runC build is failing with the error:

process_linux.go:247: getting pipe fds for pid 16189 caused "readlink /proc/16189/fd/0: permission denied"

Which tells me there's some permission issues with the /proc setup we have (where we read the pipe file descriptors over stdin -- but for some reason it looks like the process can't open its own stdin?).

mrunalp · 2016-05-02T22:12:49Z

@davidlt Best bet for networking would be to use the host network stack (i.e. don't add a network namespace to the config).
The way networking is typically set up requires moving network devices from the host network namespace to the container's network namespace. With privileged user namespaces, the runc hooks can do that work, but that isn't possible with unprivileged containers. It would need discussion with upstream as @cyphar suggested.
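
For illustration, leaving the network namespace out of a config could be sketched in Go like this. The `Namespace` type and `withoutNetworkNamespace` are hypothetical helpers for this example; only the `"network"` type string comes from the config.json format.

```go
package main

import "fmt"

// Namespace is a hypothetical type for one entry of linux.namespaces.
type Namespace struct {
	Type string
	Path string
}

// withoutNetworkNamespace returns a copy of ns with every "network"
// entry removed, so the container shares the host network stack.
func withoutNetworkNamespace(ns []Namespace) []Namespace {
	kept := make([]Namespace, 0, len(ns))
	for _, n := range ns {
		if n.Type == "network" {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	ns := []Namespace{{Type: "pid"}, {Type: "network"}, {Type: "mount"}, {Type: "user"}}
	fmt.Println(withoutNetworkNamespace(ns))
}
```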

avagin · 2016-05-03T21:29:28Z

> All cgroup operations require having CAP_SYS_ADMIN in the root user
> namespace. This means that we cannot set up any cgroups, or join
> cgroups. The new cgroup namespace doesn't fix this for us either, but
> hopefully we can get a patch upstream to fix this.

I don't understand this passage. cgroups work for unprivileged users in the same way as other file systems. I've read the kernel code and haven't found places in the cgroup code which are protected by CAP_SYS_ADMIN.

[avagin@laptop ~]$ whoami 
avagin
[avagin@laptop ~]$ sudo mkdir /sys/fs/cgroup/cpu/test
[avagin@laptop ~]$ sudo chown avagin:avagin /sys/fs/cgroup/cpu/test
[avagin@laptop ~]$ echo $$ > /sys/fs/cgroup/cpu/test/tasks 
bash: /sys/fs/cgroup/cpu/test/tasks: Permission denied
[avagin@laptop ~]$ mkdir /sys/fs/cgroup/cpu/test/sub
[avagin@laptop ~]$ echo $$ > /sys/fs/cgroup/cpu/test/sub/tasks 
[avagin@laptop ~]$ cat /sys/fs/cgroup/cpu/test/sub/cpu.shares 
1024
[avagin@laptop ~]$ echo 2014 > /sys/fs/cgroup/cpu/test/sub/cpu.shares
[avagin@laptop ~]$ echo 512 > /sys/fs/cgroup/cpu/test/sub/cpu.shares
[avagin@laptop ~]$ ls -l /sys/fs/cgroup/cpu/test/
total 0
-rw-r--r--. 1 root   root   0 May  3 14:26 cgroup.clone_children
-rw-r--r--. 1 root   root   0 May  3 14:26 cgroup.procs
-r--r--r--. 1 root   root   0 May  3 14:26 cpuacct.stat
-rw-r--r--. 1 root   root   0 May  3 14:26 cpuacct.usage
-r--r--r--. 1 root   root   0 May  3 14:26 cpuacct.usage_percpu
-rw-r--r--. 1 root   root   0 May  3 14:26 cpu.cfs_period_us
-rw-r--r--. 1 root   root   0 May  3 14:26 cpu.cfs_quota_us
-rw-r--r--. 1 root   root   0 May  3 14:26 cpu.shares
-r--r--r--. 1 root   root   0 May  3 14:26 cpu.stat
-rw-r--r--. 1 root   root   0 May  3 14:26 notify_on_release
drwxrwxr-x. 2 avagin avagin 0 May  3 14:25 sub
-rw-r--r--. 1 root   root   0 May  3 14:25 tasks
[avagin@laptop ~]$ ls -l /sys/fs/cgroup/cpu/test/sub/
total 0
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 cgroup.clone_children
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 cgroup.procs
-r--r--r--. 1 avagin avagin 0 May  3 14:25 cpuacct.stat
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 cpuacct.usage
-r--r--r--. 1 avagin avagin 0 May  3 14:25 cpuacct.usage_percpu
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 cpu.cfs_period_us
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 cpu.cfs_quota_us
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 cpu.shares
-r--r--r--. 1 avagin avagin 0 May  3 14:25 cpu.stat
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 notify_on_release
-rw-r--r--. 1 avagin avagin 0 May  3 14:25 tasks

jessfraz · 2016-05-03T21:31:24Z

Just device cgroups will fail

cyphar · 2016-05-03T21:43:54Z

@avagin Sorry, I need to update the first paragraph. But if you look at your session log:

$ sudo chown avagin:avagin /sys/fs/cgroup/cpu/test

If you have to use root to enable using cgroups, it's not useful for some usecases of rootless containers. I've been working upstream to allow an unprivileged cgroup namespace to create their own subtrees, which is what is necessary to make rootless containers mostly feature-complete.

But yeah, CAP_SYS_ADMIN is definitely the wrong thing and I'll fix up what I wrote.

Do you know anything about whether you can use criu as an unprivileged user? My guess is that you probably can't, since it messes around with saving and restoring kernel state.

avagin · 2016-05-03T23:43:26Z

@cyphar

> If you have to use root to enable using cgroups, it's not useful for some usecases of rootless containers.

Are you sure that this should be fixed in kernel space? Maybe we need to fix this in systemd? How does LXC handle this problem?

> Do you know anything about whether you can use criu as an unprivileged user? My guess is that you probably can't, since it messes around with saving and restoring kernel state.

We announced "Unprivileged dump" in CRIU 2.0 and now we are working on "Unprivileged restore". I don't know how well it will work for rootless containers, but I think it isn't an unsolvable task.

davidlt · 2016-05-04T04:25:42Z

I have been testing DMTCP (transparently checkpoints a single-host or distributed computation in user-space) and that worked in user-land for checkpointing and restoring complex applications. Thus CRIU in user-land should be technologically possible.

Previously Host{U,G}ID only gave you the root mapping, which isn't very useful if you are trying to do other things with the IDMaps.

Signed-off-by: Aleksa Sarai <asarai@suse.de>

If the stdio of the container is owned by a group which is not mapped in the user namespace, attempting to fchown the file descriptor will result in EINVAL. Counteract this by simply not doing an fchown if the group owner of the file descriptor has no host mapping according to the configured GIDMappings.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
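
The "has no host mapping" test described in this commit message can be sketched in Go as a range lookup over the configured mappings. `IDMap` and `hostIDMapped` are hypothetical names for this illustration and are not the functions added by the commit; the `main` function assumes a Unix-like system where file info exposes `*syscall.Stat_t`.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// IDMap is a hypothetical type for one GIDMappings entry.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// hostIDMapped reports whether hostID falls inside any mapped host range.
func hostIDMapped(hostID int, maps []IDMap) bool {
	for _, m := range maps {
		if hostID >= m.HostID && hostID < m.HostID+m.Size {
			return true
		}
	}
	return false
}

func main() {
	// A rootless-style mapping: only our own group is mapped.
	maps := []IDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}}

	fi, err := os.Stdin.Stat()
	if err != nil {
		fmt.Println("cannot stat stdin:", err)
		return
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		fmt.Println("no owner information available for stdin")
		return
	}
	if hostIDMapped(int(st.Gid), maps) {
		fmt.Println("group owner of stdin is mapped: fchown is possible")
	} else {
		fmt.Println("group owner of stdin is unmapped: skip fchown")
	}
}
```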

Since this is a runC-specific feature, this belongs here rather than in opencontainers/ocitools (which is for generic OCI runtimes). In addition, we don't create a new network namespace. This is because currently if you want to set up a veth bridge you need CAP_NET_ADMIN in both network namespaces' pinned user namespace to create the necessary interfaces in each network namespace.

Signed-off-by: Aleksa Sarai <asarai@suse.de>

This is in preparation of allowing us to run the integration test suite on rootless containers.

Signed-off-by: Aleksa Sarai <asarai@suse.de>

This adds targets for rootless integration tests, as well as all of the required setup in order to get the tests to run. This includes quite a few changes, because of a lot of assumptions about things running as root within the bats scripts (which is not true when setting up rootless containers).

Signed-off-by: Aleksa Sarai <asarai@suse.de>

cyphar · 2017-03-23T11:10:01Z

@hqhq Squashed and rebased.

hqhq · 2017-03-23T11:23:43Z

LGTM

mrunalp · 2017-03-23T16:59:32Z

Should we drop groups that are unmapped?

[mrunal@acme busybox]$ ./runc --root ~/runc/state run 1234
/ # id
uid=0(root) gid=0(root) groups=65534,0(root)

cyphar · 2017-03-24T07:20:01Z

@mrunalp We don't have privileges to do that. In fact, it's a security feature of the kernel to not allow unprivileged users to drop supplementary groups because of paths with modes such as 0707. Such ACLs make it easy to blacklist a group from accessing something.
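
To see why dropping a group would be a privilege gain with a mode like 0707, here is a small Go sketch of the classic owner/group/other permission selection. `permBits` is a hypothetical helper; it deliberately ignores root, capabilities and POSIX ACLs, and the IDs used are sample values.

```go
package main

import "fmt"

// permBits returns the rwx bits that apply to a process, using the
// traditional rule: the first matching class (owner, then group, then
// other) decides, and later classes are not consulted.
// groups must contain the process's primary and supplementary group IDs.
func permBits(mode uint32, fileUID, fileGID, uid int, groups []int) uint32 {
	if uid == fileUID {
		return (mode >> 6) & 7
	}
	for _, g := range groups {
		if g == fileGID {
			return (mode >> 3) & 7
		}
	}
	return mode & 7
}

func main() {
	// File owned by root:100 with mode 0707, accessed by uid 1000.
	fmt.Printf("member of group 100: %03o\n", permBits(0707, 0, 100, 1000, []int{1000, 100}))
	fmt.Printf("after dropping 100:  %03o\n", permBits(0707, 0, 100, 1000, []int{1000}))
}
```

With the group present the process gets the empty group bits; without it the process falls through to the permissive "other" bits, which is exactly what the restriction is meant to prevent.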

crosbymichael · 2017-03-27T17:58:52Z

ping @mrunalp

mrunalp · 2017-03-27T18:25:36Z

LGTM

cyphar · 2017-03-27T21:06:05Z

🎉

davidlt · 2017-03-27T21:10:22Z

Looks like it's party time! 11 months in development. Someone should post this on Hacker News.

marcosnils · 2017-03-27T21:12:44Z

:D

muayyad-alsadi · 2017-03-27T21:47:57Z

Any link to updated docs? Blog post?

cyphar · 2017-03-27T22:22:03Z

@muayyad-alsadi No doc updates, I'll follow up with those. Here's a blog post from last year and my talk at Linux.conf.au from earlier this year.

GordonTheTurtle added the status/0-triage label Apr 23, 2016

cyphar changed the title ~~[WIP] Rootless Containers~~ Rootless Containers Apr 24, 2016

crosbymichael reviewed Apr 25, 2016

cyphar closed this Apr 26, 2016

cyphar reopened this Apr 26, 2016

cyphar mentioned this pull request Apr 27, 2016

support cgroup v2 (unified hierarchy) #654

Closed

cyphar mentioned this pull request May 3, 2016

Cannot create user namespaced container without network namespaces #799

Open

davidlt mentioned this pull request May 3, 2016

runC depends on devices cgroup to find mountpoints #798

Open

cyphar added 5 commits March 23, 2017 20:46

libcontainer: configs: add proper HostUID and HostGID

f0876b0

integration: added root requires

2ce3357

mrunalp merged commit 653207b into opencontainers:master Mar 27, 2017

This was referenced Apr 7, 2017

update runc to version ac50e77bbb440dcab354a328c79754e2502b79ca linuxkit/linuxkit#1531

Merged

miragesdk: remove CAP_SYS_PTRACe linuxkit/linuxkit#1532

Merged

Cobi mentioned this pull request Apr 8, 2017

Docker-in-docker should work, even without privileged mode moby/moby#22139

Closed

This was referenced Apr 21, 2017

runc run fails with "mkdir /run/runc: permission denied" #1413

Closed

process_linux.go:252: getting pipe fds for pid 2130 caused "readlink /proc/2130/fd/0: permission denied" #1419

Closed

v217 mentioned this pull request Jul 3, 2017

Rootless container support sagemathinc/cocalc#2170

Closed

cyphar mentioned this pull request Jul 23, 2017

runc spec --rootless does not generate a config.json file #1531

Closed

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017

Merge pull request opencontainers#774 from q384566678/makefile-clean

4754b55

schema: add `clean` to Makefile

cyphar added the rootless-containers label Mar 17, 2018

cyphar deleted the rootless-containers branch August 22, 2018 06:57

cyphar mentioned this pull request Sep 11, 2018

Rootless containers without uid mapping to root #1800

Open

ashokponkumar mentioned this pull request Jul 22, 2020

Using runc inside a pod in kubernetes with least privileges #2526

Open

cyphar mentioned this pull request Oct 13, 2020

[RFC] runc cli: --rootless flag idiosyncrasies #2645

Open
