--cap-drop and --cap-add do not work as expected #453
Hi @mviereck,
Sysbox is ~95% OCI spec compatible, and capability assignment is one of the areas where it deviates a bit from the OCI spec. The intended behavior in Sysbox is described here: https://github.com/nestybox/sysbox/blob/master/docs/user-guide/security.md#process-capabilities
Basically, because Sysbox containers try to mimic a "real host" environment as much as possible, the root user in the container always starts with all capabilities enabled. Of course, Sysbox uses the Linux user-namespace on all containers to ensure that the root user inside the container has no privileges outside of it (i.e., the capabilities are confined).
On the other hand, non-root users in the container start without any capabilities, which is why
However, you should be able to add capabilities to non-root users via
One thing to keep in mind is that
Does this make sense?
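The user-namespace confinement mentioned in this comment can be observed from inside any container by reading `/proc/self/uid_map`, whose lines have the form "ID inside the namespace, ID in the parent namespace, range length". The following is a minimal sketch only; the helper names `parse_id_map` and `outside_id` are hypothetical and it does not query Sysbox itself.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(field) for field in fields))
    return entries


def outside_id(inside, entries):
    """Return the parent-namespace ID that `inside` maps to, or None if unmapped."""
    for inside_start, outside_start, length in entries:
        if inside_start <= inside < inside_start + length:
            return outside_start + (inside - inside_start)
    return None


if __name__ == "__main__":
    # Linux only: inspect the mapping of the current process.
    with open("/proc/self/uid_map") as f:
        entries = parse_id_map(f.read())
    print("uid 0 in this namespace maps to parent uid:", outside_id(0, entries))
```

If uid 0 maps to a non-zero ID, root in the container is an unprivileged user outside of it, which is the sense in which its capabilities are confined.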
Hi @ctalledo, thank you for your explanation! Given the intention of Sysbox, it makes sense to allow all capabilities by default for the root user (and it makes clear why user-ns is a must-have). However, if one specifies a different set of capabilities, it should be respected nonetheless.
This is not the case, all capabilities are available:
They disappear for no obvious reason if I add
I understand. But what if one wants a restricted root in the container nonetheless?
Hi @mviereck, Strange that when you start the container as user 1000:1000 you see all caps; look what I see:
Two points here:
Both of these led us to the decision that the root user inside Sysbox containers would always start with all caps enabled. Hope that makes sense ...
Hi @mviereck, Any feedback on the last comment I posted regarding the Thanks!
Sorry for my late response!
That's odd.
However,
But if I add
I cannot explain that. A bug in
Thanks @mviereck for the response. It must be something in the version of
That looks pretty good and consistent with the output of
I am using an image called
I can confirm that your test setup looks well here, too. Tested with base images
Without
So far,
Although
Just one example,
After startup of systemd as root it switches to an unprivileged user to run the desired application. This user isn't even able to switch back to root again. Those restrictions are not possible with
Another setup starts as an unprivileged user, but allows switching to root with
This will fail with
This in turn leads to warnings about
My proposal would be:
Hi @mviereck,
Thanks for the very good feedback. I get why this is a problem for x11docker, and we want to design it so that you don't have to code up any special changes in x11docker for Sysbox.
On the other hand, the reason we decided that Sysbox would give the root user all caps by default is that Sysbox is a specialized runtime to create "VM-like" environments in secure containers (via the Linux user-ns and other isolation features), and a root user in such environments has all caps enabled by default. We felt that in the common case, most users (and most software inside the container) expect root to be all-powerful within the container (as it is on a real host or VM), so we wanted to avoid the burden of users having to specify
I am thinking we can thread the needle with an approach such as:
The problem is that at Sysbox's level, we may not be able to discern if the user specified the caps. But I'll dig down further. Would such an approach work for x11docker? Thanks again!
Thank you very much for considering this! It is impossible to tell from the given capabilities whether they were set manually or are a default of docker/podman/whatever. One idea: Such an environment variable could even be used for further user configuration of Sysbox features that I am not aware of yet and that are not available via the docker CLI.
May I ask what these other isolation features are? You can just point me to a documentation link.
It may be possible, let me investigate.
In general we want to avoid using environment vars to pass configs to Sysbox. It's a slippery slope that up to now we've been able to avoid (and want to continue to do so).
Just to point out, I don't mean system-wide environment variables (that can be slippery indeed), but those set on the docker CLI with option
Yes understood; it's just that those container env variables can also be a slippery slope and we want to avoid them if possible. Let me research if we can avoid them, otherwise they are probably the best alternative.
Apologies for the silence on this; I've been busy with other Sysbox issues. Another approach we are considering here is to add a command line option to the sysbox-mgr daemon that would put Sysbox into a stricter OCI compliance mode. In this case, Sysbox would not enable all caps for the root user inside the container's user-ns by default, but rather honor those selected by the higher level manager (Docker, K8s, etc.). This would appeal to users that want OCI compatibility, at the cost of a more complex "docker run" command to launch the system container. Let me know if this sounds reasonable @mviereck.
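The difference between the default behavior and the proposed stricter mode can be written down as a toy model. This is a sketch of the policy as described in this thread, not Sysbox code; the function name `root_initial_caps`, the parameter `honor_requested`, and the shortened capability list are all hypothetical.

```python
# Illustrative subset only; the Linux kernel defines about 40 capabilities.
ALL_CAPS = frozenset({"CAP_CHOWN", "CAP_NET_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_BOOT"})


def root_initial_caps(requested, honor_requested=False):
    """Toy model of the capability set the container's root user starts with.

    requested:       capabilities selected by the higher-level manager
                     (for example via --cap-add / --cap-drop).
    honor_requested: hypothetical switch standing in for the stricter
                     OCI compliance mode discussed above.
    """
    if honor_requested:
        # Strict mode: root gets exactly what the manager asked for.
        return frozenset(requested) & ALL_CAPS
    # Default mode: root always starts with every capability.
    return ALL_CAPS


if __name__ == "__main__":
    print(sorted(root_initial_caps({"CAP_CHOWN"})))
    print(sorted(root_initial_caps({"CAP_CHOWN"}, honor_requested=True)))
```

In the default mode the requested set is ignored for root, which matches the behavior reported in this issue; in the strict mode an empty requested set leaves root with no capabilities at all.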
Thank you for your proposal! This is a possible way to go, but I see some disadvantages:
It would be a reasonable solution if the intention is to make the restricted setup a default in future (where all users would have to set
A default restricted setup (in future) would allow to use
Please don't take my thoughts too seriously. After all, I can adjust x11docker for an unrestricted Sysbox setup.
In the end, we decided to apply the solution as follows:
For example, container process is root:
Container process is non-root:
Special thanks to @mviereck for opening this issue and suggesting a good solution.
Code has been merged to the master branch via these PRs: sysbox-mgr: nestybox/sysbox-mgr#56 Closing now. Please re-open if any more issues are found.
Much thanks for your solution! It works very well here.
It might help to give some examples in the docs. |
Hi @ctalledo,
I found some confusing behaviour of Sysbox with the options `--cap-add` and `--cap-drop`. In many cases `--cap-drop ALL` is disregarded entirely. Checked with `capsh --print | grep Current` in the container.
Example:
Compare with `--runtime=runc`:
I found only one setup that indeed drops all capabilities:
However, adding e.g. `--cap-add SYS_BOOT` fails; the capability does not appear.
Dropping capabilities also fails if I don't use one of `--security-opt=no-new-privileges` or `--user 1000:1000`.
Expected behaviour: `capsh --print | grep Current` in the container should show exactly the capabilities that are defined on the CLI.
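Where `capsh` is not installed in an image, the same check can be approximated by decoding the `CapEff` bitmask from `/proc/self/status`. The following is a minimal sketch, assuming the standard Linux capability numbering; the helper names `decode_caps`, `read_effective_caps`, and `compare_caps` are hypothetical.

```python
# Capability names indexed by bit number, as defined in linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex):
    """Turn a hex bitmask such as '00000000a80425fb' into a set of names."""
    mask = int(mask_hex, 16)
    return {name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1}


def read_effective_caps(status_path="/proc/self/status"):
    """Read the effective capability set of the current process (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    raise RuntimeError("CapEff line not found in " + status_path)


def compare_caps(expected, actual):
    """Return (missing, unexpected) capability sets relative to `expected`."""
    expected, actual = set(expected), set(actual)
    return expected - actual, actual - expected


if __name__ == "__main__":
    # Example expectation for --cap-drop ALL --cap-add SYS_BOOT.
    missing, unexpected = compare_caps({"CAP_SYS_BOOT"}, read_effective_caps())
    print("missing:", sorted(missing))
    print("unexpected:", sorted(unexpected))
```

Both result sets are empty when the effective capabilities match the CLI selection exactly, which is the expected behaviour stated above.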