lxc.cap.drop - fundamentally wrong model #12

Closed
globalcitizen opened this Issue Feb 13, 2013 · 13 comments

3 participants

@globalcitizen

It is common wisdom within computer security that "deny all, permit explicitly" is the appropriate model for security related configuration.

lxc.cap.drop in lxc.conf represents the opposite.

Therefore, I suggest removing lxc.cap.drop (or including with a HEAVY deprecated warning for a few releases, then removing) in favor of:
lxc.cap.preserve = chown ipc_lock lease net_bind_service net_broadcast net_raw setgid setuid sys_chroot

The implementation could occur similarly to the following:

  1. Get mask of current capabilities
  2. Get mask of 'lxc.cap.preserve' capabilities
  3. Drop any capabilities that differ

Right now, lxc users upgrading their host kernel are placed in a position where any additional capabilities implemented by the kernel will be granted by default to all containers. This is obviously undesirable.

Thanks for your consideration of this problem.

(Copied from http://sourceforge.net/tracker/?func=detail&aid=3572025&group_id=163076&atid=826303 )

Owner
hallyn commented Feb 15, 2013

A few comments:

  1. Frequently when a new capability is added, it is split out of an existing capability. Blindly refusing permission could cause application misbehavior as bad as a security breach.
  2. deny all, permit explicitly would in most common cases result in illegibly long lists of capabilities so that presence or absence of an important capability would be hard to detect.
  3. with user namespaces, this should become nigh upon moot.

But certainly your point has merit.

I would be fine with an optional lxc.cap.preserve config item. It would be mutually exclusive with lxc.cap.drop, and, if not specified, behavior would default to the legacy behavior (i.e. all caps allowed, not dropped). Please feel free to send such a patch to the lxc-devel mailing list, where the devel community can discuss the merits.

  1. Frequently when a new capability is added, it is split out of an existing capability. Blindly refusing permission could cause application misbehavior as bad as a security breach.

The objective of the proposed change is to ensure that equal or less functionality is granted to a container when kernel upgrades occur, given a static/unchanged configuration. This is a separate issue to requisite testing under the new environment (which in any event is good practice and largely impossible to sidestep; LXC and related kernel functionality still do not seem like they are at the stage of maturity where expectations of long-term API stability are widespread; AFAIK Linus has IIRC in the past strongly supported kernel changes breaking userspace assumptions and does not value backward-compatibility within many kernel features above functionality and maintainability...).

  1. deny all, permit explicitly would in most common cases result in illegibly long lists of capabilities so that presence or absence of an important capability would be hard to detect.

An interesting point. However, given the main tradeoff appears to be between (A) being explicit; and (B) being concise; I would suggest that in a system security context the former is without a doubt the preferred behaviour to aim for. While concision can improve security, in reality the main consumer of this configuration is the system, not the user. Diffing two configurations with an explicit list will reveal true modifications regardless of host kernel version; not so if the current, inverse and assumptions-based approach is taken. In implicit configurations lie many security-related bugs. Therefore I do think that (A) is the correct strategy.

  1. with user namespaces, this should become nigh upon moot.

I must admit that I don't understand this point; my understanding was that user namespaces only affect one kernel subsystem (uid/gid handling within the container) and not other functionality that can be configured against a container (ie. capabilities, the functionality currently under discussion). Could you please clarify what you meant here?

I would be fine with an optional lxc.cap.preserve config item. It would be mutually exclusive with lxc.cap.drop, and, if not specified, behavior would default to the legacy behavior (i.e. all caps allowed, not dropped).

Sounds perfect.

Please feel free to send such a patch to the lxc-devel mailing list, where the devel community can discuss the merits.

I'm afraid I'm not a kernel-level C-wrangler. If I had more time on my hands I'd love to delve in and learn but unfortunately I am under significant time pressure at the moment, so that's not likely to happen any time soon.

Owner
hallyn commented Feb 19, 2013

Quoting Walter Stanish (notifications@github.com):

  1. with user namespaces, this should become nigh upon moot.

I must admit that I don't understand this point; my understanding was that
user namespaces only affect one kernel subsystem (uid/gid handling within the
container) and not other functionality that can be configured against a
container (ie. capabilities, the functionality currently under discussion).
Could you please clarify what you meant here?

Privilege in a container is limited to resources owned by the container. As
an example, if you clone only a new user namespace, and become root in there,
you will have CAP_NET_ADMIN, but will not be able to exert it over your network
interfaces. Once you clone a new network namespace, which is then owned by your
new user namespace, you can administer it. (This means that, in order to
actually talk to the outside world, a privileged task on the host will need
to somehow hook you into the host's networking (i.e. hooking a veth endpoint into
a bridge).

-serge

I was going to go through the various capabilities on a brainstorm but decided to keep my response targeted at the fundamentals. I think that, fundamentally, you are looking at the concern - that new capabilities will be granted by default to existing containers as kernels are upgraded under the current system - through the eyes of userland functionality limitation, as things stand now. This is fine and understandable.

However, from a security perspective, even if you assume that issuing such-and-such a capability can only ever modify the state of processes within the same container (and that is a reasonably large assumption to make, given that there are probably loads of complex timing, entropy, and other non-obvious relationships between containers running under the same kernel that could be useful to a skilled attacker), they still - by definition - increase the kernel attack surface that is available to an attacker.

This is measurable, real, and undesirable.

Here's a quick review of some existing capabilities anyway; the following I saw as existing examples of types of capabilities that might cause issues for the overall host if granted by default to a container due to the present configuration system and a kernel upgrade introducing them, if a container was compromised by a bad actor.

CAP_BLOCK_SUSPEND (eg. battery death), CAP_SYS_ADMIN (mostly bad news), CAP_SYS_BOOT (unsure if this is container-limited), CAP_SYS_RAWIO (kmem etc.), CAP_SYS_RESOURCE (override quotas, resource exhaustion), CAP_WAKE_ALARM (eg. battery death)

nudge

mumble

Owner
hallyn commented Jun 14, 2013

I've sent a patch to lxc-devel and cc:d your saffrondigital email address, but it bounced.

Cheers Serge. Haven't worked there since early 2011 (2+ years) I'm afraid!

Owner

Serge did you get any feedback on that patch? It feels like hardly anyone cares about this...

Owner
hallyn commented Aug 30, 2013

Quoting Stéphane Graber (notifications@github.com):

Serge did you get any feedback on that patch? It feels like hardly anyone cares about this...

No, not a bit. And the original submitter seems to drop off the grid
for months at a time.

I don't want to lose the patch, but I also don't want to apply it
if noone wants it.

@hallyn hallyn was assigned Sep 4, 2013

Just to confirm, I am dying for this :) I have indeed been on the road for a few months following OHM2013, but am back online now.

Owner
hallyn commented Sep 5, 2013

Applying patch.

@hallyn hallyn closed this Sep 5, 2013
@stgraber stgraber added a commit to stgraber/lxc that referenced this issue Dec 19, 2013
@caglar10ur @stgraber caglar10ur + stgraber remove static_lock()/static_unlock() and start to use thread local st…
…orage (v2)

While testing lxc#106, I found that concurrent starts
are hanging time to time. I then reproduced the same problem in master and got following;

 [caglar@oOo:~] sudo gdb -p 16221
 (gdb) bt
 #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007f495526515c in _L_lock_982 () from /lib/x86_64-linux-gnu/libpthread.so.0
 #2  0x00007f4955264fab in __GI___pthread_mutex_lock (mutex=0x7f49556d4600 <static_mutex>) at pthread_mutex_lock.c:64
 #3  0x00007f49554b27a6 in lock_mutex (l=l@entry=0x7f49556d4600 <static_mutex>) at lxclock.c:78
 #4  0x00007f49554b2dac in static_lock () at lxclock.c:330
 #5  0x00007f4955498f71 in lxc_global_config_value (option_name=option_name@entry=0x7f49554c02cf "cgroup.use") at utils.c:273
 #6  0x00007f495549926c in default_cgroup_use () at utils.c:366
 #7  0x00007f49554953bd in lxc_cgroup_load_meta () at cgroup.c:94
 #8  0x00007f495548debc in lxc_spawn (handler=handler@entry=0x7f49200af300) at start.c:783
 #9  0x00007f495548e7a7 in __lxc_start (name=name@entry=0x7f49200b48a0 "lxc-test-concurrent-4", conf=conf@entry=0x7f49200b2030, ops=ops@entry=0x7f49556d3900 <start_ops>, data=data@entry=0x7f495487db90,
    lxcpath=lxcpath@entry=0x7f49200b2010 "/var/lib/lxc") at start.c:951
 #10 0x00007f495548eb9c in lxc_start (name=0x7f49200b48a0 "lxc-test-concurrent-4", argv=argv@entry=0x7f495487dbe0, conf=conf@entry=0x7f49200b2030, lxcpath=0x7f49200b2010 "/var/lib/lxc") at start.c:1048
 #11 0x00007f49554b68f1 in lxcapi_start (c=0x7f49200b1dd0, useinit=<optimized out>, argv=0x7f495487dbe0) at lxccontainer.c:648
 #12 0x0000000000401317 in do_function (arguments=0x1aa80b0) at concurrent.c:94
 #13 0x0000000000401499 in concurrent (arguments=<optimized out>) at concurrent.c:130
 #14 0x00007f4955262f6e in start_thread (arg=0x7f495487e700) at pthread_create.c:311
 #15 0x00007f4954f8d9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

It looks like both parent and child end up with locked mutex thus deadlocks.

I ended up placing values in the thread local storage pool, instead of doing "unlock the lock in the child" dance

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
4d409ab
@stgraber stgraber added a commit that referenced this issue Dec 19, 2013
@caglar10ur @stgraber caglar10ur + stgraber remove static_lock()/static_unlock() and start to use thread local st…
…orage (v2)

While testing #106, I found that concurrent starts
are hanging time to time. I then reproduced the same problem in master and got following;

 [caglar@oOo:~] sudo gdb -p 16221
 (gdb) bt
 #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007f495526515c in _L_lock_982 () from /lib/x86_64-linux-gnu/libpthread.so.0
 #2  0x00007f4955264fab in __GI___pthread_mutex_lock (mutex=0x7f49556d4600 <static_mutex>) at pthread_mutex_lock.c:64
 #3  0x00007f49554b27a6 in lock_mutex (l=l@entry=0x7f49556d4600 <static_mutex>) at lxclock.c:78
 #4  0x00007f49554b2dac in static_lock () at lxclock.c:330
 #5  0x00007f4955498f71 in lxc_global_config_value (option_name=option_name@entry=0x7f49554c02cf "cgroup.use") at utils.c:273
 #6  0x00007f495549926c in default_cgroup_use () at utils.c:366
 #7  0x00007f49554953bd in lxc_cgroup_load_meta () at cgroup.c:94
 #8  0x00007f495548debc in lxc_spawn (handler=handler@entry=0x7f49200af300) at start.c:783
 #9  0x00007f495548e7a7 in __lxc_start (name=name@entry=0x7f49200b48a0 "lxc-test-concurrent-4", conf=conf@entry=0x7f49200b2030, ops=ops@entry=0x7f49556d3900 <start_ops>, data=data@entry=0x7f495487db90,
    lxcpath=lxcpath@entry=0x7f49200b2010 "/var/lib/lxc") at start.c:951
 #10 0x00007f495548eb9c in lxc_start (name=0x7f49200b48a0 "lxc-test-concurrent-4", argv=argv@entry=0x7f495487dbe0, conf=conf@entry=0x7f49200b2030, lxcpath=0x7f49200b2010 "/var/lib/lxc") at start.c:1048
 #11 0x00007f49554b68f1 in lxcapi_start (c=0x7f49200b1dd0, useinit=<optimized out>, argv=0x7f495487dbe0) at lxccontainer.c:648
 #12 0x0000000000401317 in do_function (arguments=0x1aa80b0) at concurrent.c:94
 #13 0x0000000000401499 in concurrent (arguments=<optimized out>) at concurrent.c:130
 #14 0x00007f4955262f6e in start_thread (arg=0x7f495487e700) at pthread_create.c:311
 #15 0x00007f4954f8d9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

It looks like both parent and child end up with locked mutex thus deadlocks.

I ended up placing values in the thread local storage pool, instead of doing "unlock the lock in the child" dance

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
95b422f
@globalcitizen globalcitizen referenced this issue in globalcitizen/lxc-gentoo Jan 27, 2015
Open

Migrate to lxc.cap.keep #59

@stgraber stgraber pushed a commit to stgraber/lxc that referenced this issue Feb 10, 2015
@caglar10ur caglar10ur Add missing Attach options including Uid, Gid, Namespaces and Persona…
…lity.

Introduce DefaultAttachOptions

fixes #12
b098cbc
@z-image z-image pushed a commit to z-image/lxc that referenced this issue Oct 16, 2016
@caglar10ur @stgraber caglar10ur + stgraber remove static_lock()/static_unlock() and start to use thread local st…
…orage (v2)

While testing lxc#106, I found that concurrent starts
are hanging time to time. I then reproduced the same problem in master and got following;

 [caglar@oOo:~] sudo gdb -p 16221
 (gdb) bt
 #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007f495526515c in _L_lock_982 () from /lib/x86_64-linux-gnu/libpthread.so.0
 #2  0x00007f4955264fab in __GI___pthread_mutex_lock (mutex=0x7f49556d4600 <static_mutex>) at pthread_mutex_lock.c:64
 #3  0x00007f49554b27a6 in lock_mutex (l=l@entry=0x7f49556d4600 <static_mutex>) at lxclock.c:78
 #4  0x00007f49554b2dac in static_lock () at lxclock.c:330
 #5  0x00007f4955498f71 in lxc_global_config_value (option_name=option_name@entry=0x7f49554c02cf "cgroup.use") at utils.c:273
 #6  0x00007f495549926c in default_cgroup_use () at utils.c:366
 #7  0x00007f49554953bd in lxc_cgroup_load_meta () at cgroup.c:94
 #8  0x00007f495548debc in lxc_spawn (handler=handler@entry=0x7f49200af300) at start.c:783
 #9  0x00007f495548e7a7 in __lxc_start (name=name@entry=0x7f49200b48a0 "lxc-test-concurrent-4", conf=conf@entry=0x7f49200b2030, ops=ops@entry=0x7f49556d3900 <start_ops>, data=data@entry=0x7f495487db90,
    lxcpath=lxcpath@entry=0x7f49200b2010 "/var/lib/lxc") at start.c:951
 #10 0x00007f495548eb9c in lxc_start (name=0x7f49200b48a0 "lxc-test-concurrent-4", argv=argv@entry=0x7f495487dbe0, conf=conf@entry=0x7f49200b2030, lxcpath=0x7f49200b2010 "/var/lib/lxc") at start.c:1048
 #11 0x00007f49554b68f1 in lxcapi_start (c=0x7f49200b1dd0, useinit=<optimized out>, argv=0x7f495487dbe0) at lxccontainer.c:648
 #12 0x0000000000401317 in do_function (arguments=0x1aa80b0) at concurrent.c:94
 #13 0x0000000000401499 in concurrent (arguments=<optimized out>) at concurrent.c:130
 #14 0x00007f4955262f6e in start_thread (arg=0x7f495487e700) at pthread_create.c:311
 #15 0x00007f4954f8d9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

It looks like both parent and child end up with locked mutex thus deadlocks.

I ended up placing values in the thread local storage pool, instead of doing "unlock the lock in the child" dance

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc4ca54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment