--cap-drop and --cap-add do not work as expected #453

Closed
mviereck opened this issue Jan 6, 2022 · 18 comments

@mviereck

mviereck commented Jan 6, 2022

Hi @ctalledo,

I found some confusing behaviour of Sysbox with the options --cap-add and --cap-drop.
In many cases --cap-drop ALL is disregarded entirely. I checked with capsh --print | grep Current inside the container.
Example:

$ docker run --rm --runtime=sysbox-runc --cap-drop ALL -- x11docker/check capsh --print | grep Current
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip

Compare with --runtime=runc:

$ docker run --rm --runtime=runc --cap-drop ALL -- x11docker/check capsh --print | grep Current
Current: =

I found only one setup that indeed drops all capabilities:

$ docker run --rm --runtime=sysbox-runc --cap-drop ALL --security-opt=no-new-privileges --user 1000:1000 -- x11docker/check capsh --print | grep Current
Current: =

However, adding a capability (e.g. --cap-add SYS_BOOT) has no effect; it does not appear:

$ docker run --rm --runtime=sysbox-runc --cap-drop ALL --cap-add SYS_BOOT --security-opt=no-new-privileges --user 1000:1000 -- x11docker/check capsh --print | grep Current
Current: =

Dropping capabilities also fails if I omit either --security-opt=no-new-privileges or --user 1000:1000.

Expected behaviour: capsh --print | grep Current inside the container should show exactly the capabilities defined on the command line.

@ctalledo
Member

ctalledo commented Jan 7, 2022

Hi @mviereck,

Sysbox is ~95% OCI spec compatible, and capability assignment is one of the areas where it deviates a bit from the OCI spec.

The intended behavior in Sysbox is described here: https://github.com/nestybox/sysbox/blob/master/docs/user-guide/security.md#process-capabilities

Basically, because Sysbox containers try to mimic a "real host" environment as much as possible, the root user in the container always starts with all capabilities enabled. Of course, Sysbox uses the Linux user-namespace on all containers to ensure that the root user inside the container has no privileges outside of it (i.e., the capabilities are confined).

On the other hand, non-root users in the container start without any capabilities, which is why docker run --runtime=sysbox-runc --user 1000:1000 ... starts with no caps by default.

However, you should be able to add capabilities to non-root users via docker run --runtime=sysbox-runc --cap-add <CAP> .... I'm not sure why this is not working as expected (I confirmed your findings); I need to investigate.

One thing to keep in mind is that --cap-drop makes a lot of sense when the root user in the container *is* the root user on the host (as with regular containers), as a way to increase security. But once you enable the Linux user-namespace on the container (as Sysbox always does), --cap-drop for the root user loses much of its purpose, because all it does is give you a less powerful root inside the container (which deviates from what root on a real host means).

Does this make sense?

@mviereck
Author

mviereck commented Jan 7, 2022

Hi @ctalledo,

Thank you for your explanation!
The links are quite helpful.

Given the intention of Sysbox, it makes sense to grant all capabilities to the root user by default (and it makes clear why the user-ns is a must-have).

However, if one specifies a different set of capabilities, it should still be honored.

> On the other hand, non-root users in the container start without any capabilities, which is why docker run --runtime=sysbox-runc --user 1000:1000 ... starts with no caps by default.

This is not the case, all capabilities are available:

$ docker run --rm --runtime=sysbox-runc --user 1000:1000 x11docker/check capsh --print | grep Current
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep

They disappear for no obvious reason if I add --security-opt no-new-privileges.

> One thing to keep in mind is that --cap-drop makes a lot of sense when the root user in the container *is* the root user on the host (as with regular containers), as a way to increase security. But once you enable the Linux user-namespace on the container (as Sysbox always does), --cap-drop for the root user loses much of its purpose, because all it does is give you a less powerful root inside the container (which deviates from what root on a real host means).
> Does this make sense?

I understand. But what if one wants a restricted root in the container nonetheless?
I'd still say that custom capability settings should be honored.

@ctalledo
Member

ctalledo commented Jan 7, 2022

Hi @mviereck,

Strange that when you start the container as user 1000:1000 you see all caps; look what I see:

$ docker run --rm --runtime=sysbox-runc --user 1000:1000 -it x11docker/check /bin/sh
$ cat /proc/self/status | grep -i cap
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000        <<< HERE: no caps enabled (as expected)
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
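The Cap* fields in /proc/self/status are hexadecimal bitmasks: bit N is set when capability number N (as numbered in linux/capability.h) is in that set, so 0000003fffffffff means bits 0 through 37, i.e. cap_chown through cap_audit_read. Because this file comes straight from the kernel, decoding it does not depend on which libcap version the image's capsh was built against. A minimal bash sketch of such a decoder follows; decode_caps is a hypothetical helper name, and the name table assumes the kernel numbering up to cap_checkpoint_restore (40). Recent capsh builds can do the same with capsh --decode=<mask>.

```bash
#!/usr/bin/env bash
# Sketch: decode a capability mask as printed in /proc/<pid>/status
# (CapInh, CapPrm, CapEff, CapBnd, CapAmb) into capability names.
# Array index == capability number == bit position (linux/capability.h),
# rows cover caps 0-7, 8-13, 14-20, 21-27, 28-34, 35-40.
CAP_NAMES=(
  chown dac_override dac_read_search fowner fsetid kill setgid setuid
  setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
  ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
  sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod
  lease audit_write audit_control setfcap mac_override mac_admin syslog
  wake_alarm block_suspend audit_read perfmon bpf checkpoint_restore
)

decode_caps() {
  # Parse the 16 hex digits as a 64-bit integer (leading zeros are fine).
  local mask=$(( 16#$1 )) i
  local -a names=()
  for i in "${!CAP_NAMES[@]}"; do
    # Keep the name if bit i is set in the mask.
    if (( (mask >> i) & 1 )); then
      names+=("cap_${CAP_NAMES[i]}")
    fi
  done
  # Join with commas, the way capsh prints its lists.
  local IFS=,
  echo "${names[*]}"
}
```

For example, decode_caps 0000000000400000 prints cap_sys_boot, and decode_caps 0000000000000000 prints an empty line, as expected for the non-root CapEff above.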

> I understand. But what if one wants a restricted root in the container nonetheless? I'd still say that custom capability settings should be honored.

Two points here:

  • Once the container is set up with the user-namespace, the practical use cases for a restricted root diminish significantly.

  • We would have preferred to honor --cap-drop for the root user too, but the difficulty is that Sysbox never sees the --cap-drop or --cap-add options (these are Docker options). Sysbox is at the lowest level of the container stack, and it's only told "start the container with this user-ID and these capabilities". Thus, we can't easily tell if the capabilities are the default ones from Docker or if they were configured by the user.

Both of these led us to the decision that the root user inside Sysbox containers would always start with all caps enabled.

Hope that makes sense ...

@ctalledo
Member

Hi @mviereck,

Any feedback on my last comment regarding --user 1000:1000 causing all caps to be disabled? It's strange that you saw something different in your setup.

Thanks!

@mviereck
Author

Sorry for my late response!

> Strange that when you start the container as user 1000:1000 you see all caps; look what I see:

That's odd.
I can reproduce your output:

$ docker run --rm --runtime=sysbox-runc --user 1000:1000 -it x11docker/check bash

I have no name!@97adc1ec2e8b:/$ cat /proc/self/status | grep -i cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000

However, capsh --print | grep -i current gives a different result:

I have no name!@a14d21421991:/$ capsh --print | grep -i current
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep

But if I add --security-opt=no-new-privileges, capsh shows the expected result:

$ docker run --rm --runtime=sysbox-runc --user 1000:1000 -it --security-opt=no-new-privileges x11docker/check bash

I have no name!@2d3f63967816:/$ capsh --print | grep -i current
Current: =

I have no name!@2d3f63967816:/$ cat /proc/self/status | grep -i cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000

I cannot explain that. A bug in capsh? But with other runtimes it always shows exactly the capabilities I've set.
Maybe there is another place in the system where capabilities can be looked up, and capsh uses that?

@ctalledo
Member

Thanks @mviereck for the response.

It must be something in the version of capsh in your image. Look what I see:

$ docker run --runtime=sysbox-runc -it --rm --user 1000:1000 capshtest /bin/bash

groups: cannot find name for group ID 1000
I have no name!@6f6f3e73d5b0:/$ capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =

That looks pretty good and consistent with the output of /proc/self/status inside the container:

$ cat /proc/self/status | grep -i cap
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

I am using an image called capshtest, created from this Dockerfile:

cesar@focal:~/tmp/capsh$ more Dockerfile 
FROM ubuntu:latest
RUN apt-get update && apt-get install -y libcap2-bin

@mviereck
Author

mviereck commented Jan 26, 2022

I can confirm that your test setup looks good here, too. I tested with the base images ubuntu:latest and debian:bullseye. The issue of capsh --print showing all caps occurs only with debian:buster, so it seems to be a capsh issue.

lauscher@debianlaptop:~/git2/test$ docker run --rm --user=1000:1000 --runtime=sysbox-runc test capsh --print
WARNING: libcap needs an update (cap=40 should have a name).
Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=1000(???) euid=1000(???)
gid=1000(???)
groups=
Guessed mode: UNCERTAIN (0)

lauscher@debianlaptop:~/git2/test$ docker run --rm --user=1000:1000 --runtime=sysbox-runc test cat /proc/self/status | grep -i cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000

Without --user=1000:1000, the Current line of capsh --print looks odd (Current: =eip 38,39,40-eip). However, it looks the same with --runtime=runc --cap-add=ALL, so this is a capsh issue rather than a Sysbox one.

lauscher@debianlaptop:~/git2/test$ docker run --rm --runtime=sysbox-runc test capsh --print
WARNING: libcap needs an update (cap=40 should have a name).
Current: =eip 38,39,40-eip
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=
Guessed mode: UNCERTAIN (0)

lauscher@debianlaptop:~/git2/test$ docker run --rm --runtime=sysbox-runc test cat /proc/self/status | grep -i cap
CapInh:	0000003fffffffff
CapPrm:	0000003fffffffff
CapEff:	0000003fffffffff
CapBnd:	0000003fffffffff
CapAmb:	0000003fffffffff

With --user=1000:1000, --cap-add=CHOWN is ignored; CHOWN is not available:

lauscher@debianlaptop:~/git2/test$ docker run --rm --runtime=sysbox-runc --user=1000:1000 --cap-add=CHOWN test cat /proc/self/status | grep -i cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000


For root, --cap-drop=ALL is ignored; all capabilities are available:

lauscher@debianlaptop:~/git2/test$ docker run --rm --runtime=sysbox-runc --cap-drop=ALL test cat /proc/self/status | grep -i cap
CapInh:	0000003fffffffff
CapPrm:	0000003fffffffff
CapEff:	0000003fffffffff
CapBnd:	0000003fffffffff
CapAmb:	0000003fffffffff

So far, sysbox-runc seems to behave as you intended.

@mviereck
Author

mviereck commented Jan 26, 2022

Although sysbox-runc behaves as you intended, I am still not happy about its departure from the OCI standard concerning capabilities.

x11docker relies heavily on capability settings because it follows the principle of least privilege.
By default, it drops all capabilities (--cap-drop=ALL --security-opt=no-new-privileges) and adds only those that are strictly needed for a given purpose. It always warns the user if any privilege is added to this default.
(Referring to #452.)


Just one example, the x11docker settings for systemd in a container:

  --cap-drop ALL \
  --cap-add AUDIT_WRITE \
  --cap-add CHOWN \
  --cap-add DAC_OVERRIDE \
  --cap-add FOWNER \
  --cap-add FSETID \
  --cap-add SETGID \
  --cap-add SETPCAP \
  --cap-add SETUID \
  --cap-add SYS_BOOT \
  --security-opt no-new-privileges \

After systemd starts as root, it switches to an unprivileged user to run the desired application. This user cannot even switch back to root. Such restrictions are not possible with sysbox-runc.
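To check whether a runtime really applies a least-privilege set like the one above, it helps to know which CapEff value the flags should produce: for a root process started with --cap-drop ALL plus a list of --cap-add flags, the effective set should contain exactly the added capabilities. A minimal bash (4+) sketch that turns Docker-style names into the 16-digit mask format of /proc/<pid>/status; caps_to_mask is a hypothetical helper, not part of x11docker or Sysbox, and the table assumes the capability numbering of linux/capability.h.

```bash
#!/usr/bin/env bash
# Sketch: turn capability names (CHOWN, SYS_BOOT, cap_chown, ...) into the
# 16-hex-digit mask format used by the Cap* lines of /proc/<pid>/status.
# Array index == capability number == bit position (linux/capability.h).
CAP_NAMES=(
  chown dac_override dac_read_search fowner fsetid kill setgid setuid
  setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
  ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
  sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod
  lease audit_write audit_control setfcap mac_override mac_admin syslog
  wake_alarm block_suspend audit_read perfmon bpf checkpoint_restore
)

caps_to_mask() {
  local mask=0 name n i found
  for name in "$@"; do
    n=${name,,}     # SYS_BOOT -> sys_boot
    n=${n#cap_}     # also accept the cap_sys_boot spelling
    found=0
    for i in "${!CAP_NAMES[@]}"; do
      if [[ ${CAP_NAMES[i]} == "$n" ]]; then
        mask=$(( mask | (1 << i) ))   # set bit i
        found=1
        break
      fi
    done
    if (( ! found )); then
      echo "caps_to_mask: unknown capability '$name'" >&2
      return 1
    fi
  done
  printf '%016x\n' "$mask"
}
```

Applied to the systemd example, caps_to_mask AUDIT_WRITE CHOWN DAC_OVERRIDE FOWNER FSETID SETGID SETPCAP SETUID SYS_BOOT prints 00000000204001db, which is the CapEff a runtime that honors these flags would be expected to report for the container's root process.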


Another setup starts as an unprivileged user, but allows switching to root with su or sudo:

  --user 1000:1000 \
  --cap-drop ALL \
  --cap-add AUDIT_WRITE \
  --cap-add CHOWN \
  --cap-add DAC_OVERRIDE \
  --cap-add FOWNER \
  --cap-add FSETID \
  --cap-add KILL \
  --cap-add SETGID \
  --cap-add SETPCAP \
  --cap-add SETUID \

This will fail with sysbox-runc because it does not allow adding capabilities with --user=1000:1000.


As a consequence, x11docker has to print warnings for sysbox-runc that capabilities cannot be restricted.
For some options (like the sudo example), x11docker has to use a different setup for sysbox-runc than for other runtimes to make it work.
For individual capability changes where x11docker normally prints a warning, an extra check for sysbox-runc has to be added.
OK, I can do all this; it is some coding for several cases in x11docker. But do you understand why I am not happy about this?
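One way such a check could be structured, sketched here under assumptions rather than taken from x11docker: instead of special-casing sysbox-runc by name, read the CapEff line of a /proc/<pid>/status dump taken inside the container and compare it with the mask the requested flags should yield. capeff_matches is a hypothetical helper; it defaults to the calling shell's own /proc/self/status when no file is given.

```bash
#!/usr/bin/env bash
# Sketch: does the CapEff line of a /proc/<pid>/status dump match the
# expected mask? Returns 0 on match, 1 on mismatch or missing CapEff line.
capeff_matches() {
  local expected=$1 status_file=${2:-/proc/self/status} actual
  # Fields are whitespace/tab separated: "CapEff:<TAB>0000000000400000".
  actual=$(awk '$1 == "CapEff:" { print $2 }' "$status_file")
  if [[ -z $actual ]]; then
    echo "capeff_matches: no CapEff line in $status_file" >&2
    return 1
  fi
  # Compare case-insensitively, since hex digits may be upper or lower case.
  [[ ${actual,,} == "${expected,,}" ]]
}
```

For instance, one could save the output of docker run --rm --runtime=sysbox-runc --cap-drop ALL --cap-add SYS_BOOT x11docker/check cat /proc/self/status to a file and call capeff_matches 0000000000400000 on it; with the behavior reported earlier in this thread, the root process's CapEff is 0000003fffffffff instead, so the check would fail and x11docker could print its warning.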


My proposal would be:

  • Follow the OCI standard, i.e. set the capabilities as they are given by docker or podman.
  • Users who want the full power of sysbox-runc with all privileges in the container just have to add --cap-add=ALL. After all, it is a great feature that this is possible with sysbox-runc.

@ctalledo
Member

ctalledo commented Jan 26, 2022

Hi @mviereck,

Thanks for the very good feedback.

I get why this is a problem for x11docker, and we want to design it so that you don't have to code-up any special changes in x11docker for Sysbox.

On the other hand, the reason we decided that Sysbox would give the root user all caps by default is because Sysbox is a specialized runtime to create "VM-like" environments in secure containers (via the Linux user-ns and other isolation features), and a root user in such environments has all caps enabled by default.

We felt that in the common case, most users (and most software inside the container) expect root to be all powerful within the container (as it's on a real host or VM), so we wanted to avoid the burden of users having to specify --cap-add=ALL on pretty much every Sysbox container. Reverting that decision would add burden and break most users of Sysbox at this time.

I am thinking we can thread the needle with an approach such as:

If user does not specify caps:
  if root user -> all caps
  else -> no caps
else 
  honor user-specified caps (for root or non-root)

The problem is that at Sysbox's level, we may not be able to discern if the user specified the caps. But I'll dig down further.

Would such an approach work for x11docker?

Thanks again!

@mviereck
Author

mviereck commented Jan 26, 2022

Thank you very much for considering this!
If already many users rely on the current behaviour of Sysbox, it is hard to change this design, of course.

It is impossible to tell from the given capabilities whether they were set manually or are a default of docker/podman/whatever.
Also, the current usual defaults might change at any time, and might have been set manually as well.

One idea:
What if x11docker sets an environment variable option like --env SYSBOX_RUNC=keepcapset or --env SYSBOX_RUNC_KEEPCAPS=1? Sysbox would then know from this variable that it should use the capabilities exactly as given by docker.
I assume Sysbox sees the environment variables?

Such an environment variable could even be used for further user configuration of Sysbox features that I am not aware of yet and that are not available via the docker CLI.

> via the Linux user-ns and other isolation features

May I ask what these other isolation features are? You can just point me to a documentation link.

@ctalledo
Member

ctalledo commented Jan 26, 2022

> It is impossible to tell from the given capabilities whether they were set manually or are a default of docker/podman/whatever.

It may be possible, let me investigate.

> What if x11docker sets an environment variable option like --env SYSBOX_RUNC=keepcapset or --env SYSBOX_RUNC_KEEPCAPS=1?

In general we want to avoid using environment vars to pass configs to Sysbox. It's a slippery slope that up to now we've been able to avoid (and want to continue to do so).

@mviereck
Copy link
Author

> In general we want to avoid using environment vars to pass configs to Sysbox. It's a slippery slope that up to now we've been able to avoid (and want to continue to do so).

Just to point out, I don't mean system-wide environment variables (those can be slippery indeed), but ones set on the docker CLI with the --env option, for the container only.

@ctalledo
Member

> Just to point out, I don't mean system-wide environment variables (those can be slippery indeed), but ones set on the docker CLI with the --env option, for the container only.

Yes, understood; it's just that those container env variables can also be a slippery slope and we want to avoid them if possible. Let me research whether we can avoid them; otherwise they are probably the best alternative.

@ctalledo
Member

Apologies for the silence on this, been busy with other Sysbox issues.

Another approach we are considering here is to add a command line option to the sysbox-mgr daemon that would put Sysbox into a stricter OCI compliance mode. In this case, Sysbox would not enable all caps for the root user inside the container's user-ns by default, but rather honor those selected by the higher level manager (Docker, K8s, etc.). This would appeal to users that want OCI compatibility, at the cost of a more complex "docker run" command to launch the system container.

Let me know if this sounds reasonable @mviereck.

@mviereck
Author

mviereck commented Feb 17, 2022

Thank you for your proposal!

This is a possible way to go, but I see some disadvantages:

  • From the command line alone, one would not know how a command will behave, i.e. whether the capabilities will be restricted or not. The same docker run --runtime=sysbox-runc ... command could result in two different setups. One would have to somehow check the configuration of sysbox-mgr to predict the result.
    In the case of x11docker, I would have to provide two different code paths for Sysbox, one for the restricted and one for the unrestricted setup.
  • The setting would be global, so all containers with Sysbox would be affected.
  • Likely only few users would change the default to restricted. But both setups would have to be maintained/supported by you.

It would be a reasonable solution if the intention is to make the restricted setup a default in future (where all users would have to set --cap-add=ALL), and this option serves to provide a soft transition.

A default restricted setup (in the future) would allow using --cap-add=ALL as the switch between restricted and unrestricted setups.

Please don't take my thoughts too seriously. After all, I can adjust x11docker for an unrestricted Sysbox setup.
On the other hand, the principle of least privilege has its advantages: x11docker with Sysbox and restricted capability settings would not have been affected by CVE-2022-0185. There was once another CVE in docker that also did not affect default x11docker setups.

@ctalledo
Member

ctalledo commented Mar 4, 2022

In the end, we decided to apply the solution as follows:

  • If the sysbox container is passed the SYSBOX_HONOR_CAPS=TRUE environment variable, Sysbox will honor the capabilities passed by the higher level container manager (e.g., Docker) when launching the container. For example:
$ docker run --runtime=sysbox-runc -e SYSBOX_HONOR_CAPS=TRUE --rm -it alpine 
/ # cat /proc/self/status | grep -i cap
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
  • Otherwise, Sysbox will assign default capabilities to the container to mimic those of a Linux host: if the container's process is a root process, it will assign full capabilities; otherwise, it will assign no capabilities.

For example, container process is root:

$ docker run --runtime=sysbox-runc --rm alpine sh -c "cat /proc/self/status | grep -i cap"
CapInh: 0000003fffffffff
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000003fffffffff

Container process is non-root:

$ docker run --runtime=sysbox-runc --rm -u 1000:1000 alpine sh -c "cat /proc/self/status | grep -i cap"
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
  • Note that since Sysbox uses the Linux user-namespace on all containers, the capabilities are restricted within the container (i.e., the container has no capabilities at host level).

  • In general, if a user wants fine-grained control of the capabilities (e.g., for extra security), they can use the SYSBOX_HONOR_CAPS=TRUE setting. The drawback is that the user must understand all the capabilities required by the processes inside the container.

  • Finally, SYSBOX_HONOR_CAPS=TRUE controls the per-container behavior. Users who want this behavior to apply to all containers can edit the sysbox-mgr systemd unit to add the --honor-caps flag to the sysbox-mgr command line. If the user does this, she need not pass SYSBOX_HONOR_CAPS=TRUE to the containers anymore. She can also exempt an individual container from the global config by passing the SYSBOX_HONOR_CAPS=FALSE env var to it (i.e., the env var always overrides the global config).
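The rules in this comment can be condensed into a small decision function. The following is a hedged bash sketch of the documented behavior only, not Sysbox's actual implementation; effective_caps and its arguments are hypothetical, it models only the effective set (CapEff), it assumes the env var takes exactly the values TRUE or FALSE shown above, and ALL_CAPS reflects the 38-capability kernels seen in this thread.

```bash
#!/usr/bin/env bash
# Sketch of the documented capability selection rules (not Sysbox code).
# Arguments (all hypothetical):
#   $1  uid of the container's initial process
#   $2  caps mask passed down by Docker/K8s (16 hex digits)
#   $3  "true" if sysbox-mgr runs with --honor-caps, else "false"
#   $4  value of SYSBOX_HONOR_CAPS in the container env ("" if unset)
ALL_CAPS=0000003fffffffff   # full set on the kernels shown in this thread
NO_CAPS=0000000000000000

effective_caps() {
  local uid=$1 requested=$2 global_honor=$3 env_honor=${4-}
  local honor=$global_honor
  # The per-container env var, when set, overrides the global flag.
  case $env_honor in
    TRUE)  honor=true ;;
    FALSE) honor=false ;;
  esac
  if [[ $honor == true ]]; then
    echo "$requested"   # honor what the container manager asked for
  elif (( uid == 0 )); then
    echo "$ALL_CAPS"    # root: mimic root on a real Linux host
  else
    echo "$NO_CAPS"     # non-root: no effective capabilities
  fi
}
```

The env var is evaluated before the final decision, which encodes the rule that the per-container setting always overrides the global --honor-caps flag.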

Special thanks to @mviereck for opening this issue and suggesting a good solution.

@ctalledo
Member

ctalledo commented Mar 4, 2022

Code has been merged to the master branch via these PRs:

sysbox-mgr: nestybox/sysbox-mgr#56
sysbox-runc: nestybox/sysbox-runc#74
sysbox-ipc: nestybox/sysbox-ipc#28
sysbox: #495

Closing now. Please re-open if any more issues are found.

@ctalledo ctalledo closed this as completed Mar 4, 2022
@mviereck
Author

mviereck commented Mar 6, 2022

Many thanks for your solution! It works very well here.

> The drawback is that the user must understand all the capabilities required by the processes inside the container.

It might help to give some examples in the docs.
There could be lists of needed capabilities for different tasks.
A few are already given above in #453 (comment). I could provide a few more.
