Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[23.0 backport] Do not drop effective&permitted set #46222

Merged
merged 5 commits into from Aug 29, 2023

Conversation

thaJeztah
Copy link
Member

- What I did

daemon: stop setting container resources to zero

Many of the fields in LinuxResources struct are pointers to scalars for some reason, presumably to differentiate between set-to-zero and unset when unmarshaling from JSON, despite zero being outside the acceptable range for the corresponding kernel tunables. When creating the OCI spec for a container, the daemon sets the container's OCI spec CPUShares and BlkioWeight parameters to zero when the corresponding Docker container configuration values are zero, signifying unset, despite the minimum acceptable value for CPUShares being two, and BlkioWeight ten. This has gone unnoticed as runC does not distingiush set-to-zero from unset as it also uses zero internally to represent unset for those fields. However, kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource parameters to the container, exactly as instructed, and fails loudly. The OCI runtime-spec is silent on how the runtime should handle the case when those parameters are explicitly set to out-of-range values and kata's behaviour is not unreasonable, so the daemon must therefore be in the wrong.

Translate unset values in the Docker container's resources HostConfig to omit the corresponding fields in the container's OCI spec when starting and updating a container in order to maximize compatibility with runtimes.

Do not drop effective&permitted set

Currently moby drops ep sets before the entrypoint is executed. This does mean that with combination of no-new-privileges the file capabilities stops working with non-root containers. This is undesired as the usability of such containers is harmed comparing to running root containers.

This commit therefore sets the effective/permitted set in order to allow use of file capabilities or libcap(3)/prctl(2) respectively with combination of no-new-privileges and without respectively.

For no-new-privileges the container will be able to obtain capabilities that are requested.

- How I did it

- How to verify it

Use the below as Dockerfile

FROM alpine
RUN apk add --update libcap

RUN ls -la /usr/sbin/capsh
RUN setcap 'cap_sys_admin=ep' /usr/sbin/capsh

docker run --security-opt=no-new-privileges --user=100 --cap-add sys_admin <tag of the build Dockerfile> capsh --print

You should see Current: = cap_sys_admin+ep rather than Current:

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

corhere and others added 5 commits August 13, 2023 22:35
Switch to using t.TempDir() instead of rolling our own.

Clean up mounts leaked by the tests as otherwise the tests fail due to
the leaked mounts because unlike the old cleanup code, t.TempDir()
cleanup does not ignore errors from os.RemoveAll.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 9ff169c)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit dea870f)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Audit the OCI spec options used for Linux containers to ensure they are
less order-dependent. Ensure they don't assume that any pointer fields
are non-nil and that they don't unintentionally clobber mutations to the
spec applied by other options.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 8a094fe)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Currently moby drops ep sets before the entrypoint is executed.
This does mean that with combination of no-new-privileges the
file capabilities stops working with non-root containers.
This is undesired as the usability of such containers is harmed
comparing to running root containers.

This commit therefore sets the effective/permitted set in order
to allow use of file capabilities or libcap(3)/prctl(2) respectively
with combination of no-new-privileges and without respectively.

For no-new-privileges the container will be able to obtain capabilities
that are requested.

Signed-off-by: Luboslav Pivarc <lpivarc@redhat.com>
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
(cherry picked from commit 3aef732)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Verify non-root containers are able to use file
capabilities.

Signed-off-by: Luboslav Pivarc <lpivarc@redhat.com>
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 42fa7a1)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

24.0 has been out for over three months now and nobody has complained about file capabilities starting to work when used with no-new-privileges, so we're likely in the clear to backport the fixes.

@thaJeztah
Copy link
Member Author

Alright; let's bring this one in 👍

@thaJeztah thaJeztah merged commit 436bcf7 into moby:23.0 Aug 29, 2023
87 checks passed
@thaJeztah thaJeztah deleted the 23.0_backport_capabilites branch August 29, 2023 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants