feat: mount /sys/kernel/security into kubelet #8641

Merged
1 commit merged into siderolabs:main on May 7, 2024

Conversation

@andrewrynhard
Member
This allows the kubelet to enforce AppArmor.

@@ -120,6 +120,7 @@ func (k *Kubelet) Runner(r runtime.Runtime) (runner.Runner, error) {
	mounts := []specs.Mount{
		{Type: "bind", Destination: "/dev", Source: "/dev", Options: []string{"rbind", "rshared", "rw"}},
		{Type: "sysfs", Destination: "/sys", Source: "/sys", Options: []string{"bind", "ro"}},
		{Type: "securityfs", Destination: "/sys/kernel/security", Source: "/sys/kernel/security", Options: []string{"bind", "ro"}},
Member
Should it be just "ro", so that the kubelet only looks into it but never configures anything?

Do we want this backported, or should it be part of some future work?

Member Author

bind should be needed because it propagates changes from the host down into the container (though I'm unsure whether anything in there changes over time).

Member Author

I don't think it needs to be backported, but if it wouldn't hurt, it could be useful in 1.7. I can check with the user.


Only ro is technically needed for AppArmor, since the kubelet checks whether AppArmor is enabled via the opencontainers apparmor lib: https://github.com/opencontainers/runc/blob/main/libcontainer/apparmor/apparmor_linux.go#L18

That check verifies that the directory /sys/kernel/security/apparmor exists and that a read of /sys/module/apparmor/parameters/enabled returns Y.
Both should have their final value set by the time the kubelet starts.
The part I'm not sure bind handles correctly is that /sys/kernel/security is one of those virtual filesystems: securityfs is a sort of interface to the LSM modules in the kernel.
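
For reference, here is a minimal sketch of that check (illustrative only, not the library's exact code): AppArmor counts as enabled when the securityfs directory exists and the kernel module parameter reads back Y.

package main

import (
	"bytes"
	"fmt"
	"os"
)

// apparmorEnabled mirrors the check described above: the securityfs directory
// must exist and the kernel module parameter must read back "Y".
func apparmorEnabled() bool {
	fi, err := os.Stat("/sys/kernel/security/apparmor")
	if err != nil || !fi.IsDir() {
		return false
	}

	data, err := os.ReadFile("/sys/module/apparmor/parameters/enabled")
	if err != nil {
		return false
	}

	return bytes.HasPrefix(bytes.TrimSpace(data), []byte("Y"))
}

func main() {
	fmt.Println("AppArmor enabled:", apparmorEnabled())
}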

Member

I think bind should be fine; it just exposes the filesystem at another mountpoint, whereas without bind we do a completely fresh mount.

Member Author

So do we want it?

Member

Yep, I think bind is better, but we also need to handle the case where the filesystem is not mounted on the host (when running in container mode). I believe in container mode we don't mount it explicitly (???) and rely on the container runtime to provide it or not... just to make sure we don't break Talos in containers if securityfs is not mounted on the container runtime host. I don't know if that's even possible, just as a precaution :)

Member

yep, we don't mount securityfs in container mode:

default:
	phases = phases.Append(
		"systemRequirements",
		EnforceKSPPRequirements,
		SetupSystemDirectory,
		MountBPFFS,
		MountCgroups,
		MountPseudoFilesystems,
		SetRLimit,
	).Append(

But as we do a bind mount, I believe our code will just mount an empty directory into the kubelet if the host doesn't provide securityfs, so bind >> re-mount :)
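
To make the bind-versus-fresh-mount distinction concrete, here is a minimal sketch using golang.org/x/sys/unix (the target path is hypothetical and this is not Talos' actual mount code):

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

// mountSecurityfs shows the two approaches discussed above.
func mountSecurityfs(target string, bind bool) error {
	if bind {
		// Bind re-exposes the host's /sys/kernel/security at target; if the
		// host never mounted securityfs, this is just an empty directory.
		if err := unix.Mount("/sys/kernel/security", target, "", unix.MS_BIND, ""); err != nil {
			return err
		}
		// Making a bind mount read-only requires a remount with MS_RDONLY.
		return unix.Mount("", target, "", unix.MS_REMOUNT|unix.MS_BIND|unix.MS_RDONLY, "")
	}
	// Without bind: a completely fresh securityfs instance at target,
	// independent of the host's mount table.
	return unix.Mount("securityfs", target, "securityfs", unix.MS_RDONLY, "")
}

func main() {
	// Hypothetical target; requires root (and Linux) to actually run.
	if err := mountSecurityfs("/mnt/securityfs", true); err != nil {
		log.Fatal(err)
	}
}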

@andrewrynhard
Member Author

Since AppArmor landed with the 1.7 kernel, it probably makes sense to backport.

@konrader

Since AppArmor landed with the 1.7 kernel, it probably makes sense to backport.

I think getting this into v1.7 fits nicely with the bump to Kubernetes 1.30, which includes the move of AppArmor config to Pod SecurityContext fields. kubernetes/kubernetes#123435

@konrader

I tried to validate whether this fixes the problem where a Pod with AppArmor configured gets stuck in Pending with the message Cannot enforce AppArmor: AppArmor is not enabled on the host.

I mounted the securityfs in kubelet using machine config:

machine:
    kubelet:
        extraMounts:
            - destination: /sys/kernel/security
              type: bind
              source: /sys/kernel/security
              options:
                - bind
                - ro

Starting a Pod with an AppArmor profile specified (which I loaded into the node kernel beforehand):

spec:
  containers:
  - name: test
  ...
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: myprofile

I no longer get the ...AppArmor is not enabled on the host error, but a new type of error: Error: failed to get container spec opts: failed to generate apparmor spec opts: apparmor is not supported

Feels like there is something else that also needs to be fixed around the kubelet.

@smira
Member

smira commented Apr 24, 2024

@konrader it might be easier if you could provide us with a full set of steps to launch a pod with an AppArmor profile, which we could include in Talos as an integration test. This way we can develop a fix and a test.

@konrader

I will write down the steps needed. It does include loading an AppArmor profile (with the apparmor_parser command-line tool) via a (somewhat) privileged container.

Where are the integration tests located today?

@smira
Member

smira commented Apr 24, 2024

I will write down the steps needed. It does include loading an AppArmor profile (with the apparmor_parser command-line tool) via a (somewhat) privileged container.

Where are the integration tests located today?

In internal/integration, but if it only needs some Kubernetes interactions, that is perfectly fine as well.

@konrader

Here is a complete script that tests both loading an AppArmor profile into the kernel and running a Pod with the profile enabled.

#!/bin/sh

NS=apparmor

kubectl create namespace $NS

kubectl label ns $NS pod-security.kubernetes.io/enforce=privileged

kubectl apply -n $NS -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: apparmor
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: apparmor
    image: ghcr.io/bifrostsec/apparmor-image/apparmor:22.04
    # Just spin & wait forever
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    securityContext:
      privileged: true
      capabilities:
        add:
          - MAC_ADMIN
    volumeMounts:
      - name: sysfs
        mountPath: /host/sys
        readOnly: false
      - name: securityfs
        mountPath: /sys/kernel/security
        readOnly: false
  volumes:
    - name: sysfs
      hostPath:
        path: /sys
    - name: securityfs
      hostPath:
        path: /sys/kernel/security
EOF
kubectl -n $NS wait --for=condition=Ready pod/apparmor

# Copy over simple AppArmor audit profile
kubectl -n $NS cp audit.profile apparmor:tmp/audit.profile
# Load it into host Kernel
kubectl -n $NS exec apparmor -it -- apparmor_parser /tmp/audit.profile

kubectl apply -n $NS -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: test
    image: alpine:3.19.1
    # Just spin & wait forever
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: audit-all
EOF

This depends on the AppArmor profile being in a separate file named audit.profile with this content:

abi <abi/3.0>,

profile audit-all flags=(attach_disconnected,mediate_deleted,complain,audit) {
}

@konrader

I've traced the error failed to generate apparmor spec opts: apparmor is not supported to containerd: https://github.com/containerd/containerd/blob/release/1.7/pkg/cri/server/container_create_linux.go#L505

which at its core relies on containerd's own check, apparmor.HostSupports(), to see if the host supports AppArmor.
https://github.com/containerd/containerd/blob/release/1.7/pkg/apparmor/apparmor_linux.go#L34

This check does more than the kubelet's; notably, it also checks that /sbin/apparmor_parser is present, which it is not on a Talos node.

On Talos (or other Linux platforms) where AppArmor profile loading is done either by another userland tool or from inside a container, this check is not necessary, since containerd requires the profile to be loaded into the kernel before you reference it for a container anyway.
There seems to be a bit of controversy around this, with the extra check being removed and then reverted: containerd/containerd#8087

This PR is still a step in the right direction, since it gets rid of the problem of the kubelet not detecting whether AppArmor is enabled on the node.
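
A simplified sketch of that stricter check as described above (not containerd's exact code; the parser check is the part that fails on a stock Talos node):

package main

import (
	"bytes"
	"fmt"
	"os"
)

// hostSupportsAppArmor approximates the containerd-side check: the same
// kernel-side conditions the kubelet verifies, plus a requirement that the
// userland parser binary is present on the host.
func hostSupportsAppArmor() bool {
	if _, err := os.Stat("/sys/kernel/security/apparmor"); err != nil {
		return false
	}

	data, err := os.ReadFile("/sys/module/apparmor/parameters/enabled")
	if err != nil || !bytes.HasPrefix(bytes.TrimSpace(data), []byte("Y")) {
		return false
	}

	// The extra condition: /sbin/apparmor_parser must exist.
	_, err = os.Stat("/sbin/apparmor_parser")
	return err == nil
}

func main() {
	fmt.Println("containerd-style AppArmor support:", hostSupportsAppArmor())
}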

@smira
Member

smira commented Apr 29, 2024

@konrader we can put an empty /sbin/apparmor_parser into the rootfs if that helps, but I don't fully understand how it broke Moby. Would it break some workloads on Talos (ones not previously using AppArmor)?

@konrader

I was actually thinking about adding an empty /sbin/apparmor_parser to the Talos rootfs to try, but didn't have my build instance up.
Not sure either what the connection is between containerd and Moby with regard to AppArmor. Moby says on their website that they use containerd as the default container runtime, but it's still weird that they won't accept referencing AppArmor profiles (that are already loaded into the kernel) for containers unless apparmor_parser is present (indicating that the userland AppArmor tools are installed). It's a bit like checking whether your car repair shop exists before you can start your car.

@konrader

I tried to add an empty /sbin/apparmor_parser via machine config, but that wasn't allowed.

machine:
    files:
        - content: ''
          permissions: 0o555
          path: /sbin/apparmor_parser
          op: create

Got this message during boot:
create operation not allowed outside of /var: "/sbin/apparmor_parser"

What is the easiest way to add the file to the rootfs to try out whether this would work?

@smira
Member

smira commented Apr 30, 2024

What is the easiest way to add the file to the rootfs to try out whether this would work?

The only way is to rebuild Talos with changes to the Dockerfile to touch that file in the rootfs.

I can get to it with the steps you provided to build a test for AppArmor.

@konrader

konrader commented Apr 30, 2024

@konrader we can put an empty /sbin/apparmor_parser into the rootfs if that helps, but I don't fully understand how it broke Moby. Would it break some workloads on Talos (ones not previously using AppArmor)?

I'm looking into why Moby/Docker wants containerd to have the extra check for /sbin/apparmor_parser.
It seems to be related to Moby applying a default AppArmor profile, docker-default, to started containers (unless overridden by the security-opt option): https://docs.docker.com/engine/security/apparmor/#understand-the-policies

Moby removed their own AppArmor-enabled check and started relying on containerd's check: moby/moby#42276
Moby uses the result of this check to decide whether they need to parse and load the docker-default profile, which requires apparmor_parser: https://github.com/moby/moby/blob/master/profiles/apparmor/apparmor.go#L59
So some of their users had problems starting containers when AppArmor was enabled in the kernel but the AppArmor userland tools weren't installed.

I do see that the default-profile loading is under /contrib in containerd, and the missing apparmor_parser could possibly be a problem if someone deploys a workload on Talos that asks for the runtime's default profile, like this:

securityContext:
  appArmorProfile:
    type: RuntimeDefault

Docs: https://kubernetes.io/docs/tutorials/security/apparmor/#securing-a-pod

I will try to verify this.

@konrader

I could reproduce the scenario with Kubernetes 1.30 running on Ubuntu 22.04 when I uninstalled the AppArmor userland tools (apt remove apparmor plus a reboot, since containerd caches the result of its AppArmor-enabled check).

With no /sbin/apparmor_parser I could not start a Pod with an AppArmor securityContext; I got the error failed to generate apparmor spec opts: apparmor is not supported (same as on Talos after mounting /sys/kernel/security into the kubelet).

I then created an empty /sbin/apparmor_parser, and after rebooting, various Kubernetes pods fail to start with this error:
kubelet[1706]: E0430 16:35:26.865258 1706 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CreateContainerError: \"failed to create containerd container: load apparmor profile /tmp/cri-containerd.apparmor.d3636700873: parser error(\\\"\\\"): exec: \\\"apparmor_parser\\\": executable file not found in $PATH\""

So it seems like the CRI implementation in containerd does try to apply a default AppArmor profile to all containers that don't have a specific AppArmor profile set in their security options.

If we do want AppArmor support in Talos, I see the following options:

  1. Include the real apparmor_parser in the rootfs
  2. Disable applying the default profile to containers in the containerd CRI

@konrader

konrader commented May 2, 2024

It does not seem possible to disable the automatic application of a default AppArmor profile (called cri-containerd.apparmor.d) in containerd for containers started via CRI (the interface Kubernetes uses) once it has deemed AppArmor enabled on the host.
I assume this is to make all container runtimes have the same default behaviour: https://github.com/containerd/containerd/blob/main/internal/cri/server/container_create_linux.go#L202

If we included apparmor_parser (a 1.5M binary) in the rootfs for everyone, the impact would be people suddenly having their workloads run with a default profile without knowing it.
The default profile is quite open, but it does deny access to certain things that people might not be prepared for; see the profile template here: https://github.com/containerd/containerd/blob/main/contrib/apparmor/template.go#L42

@smira
Member

smira commented May 2, 2024

@konrader thanks for your research here. I think having AppArmor (better security) is in the best spirit of Talos. We can think about whether we want apparmor_parser as an extension, for example, but I guess this becomes a bigger change, so it would likely land in the 1.8 timeframe.

This allows the kubelet to detect AppArmor.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
@smira
Member

smira commented May 7, 2024

I created issue #8715 to track the actual AppArmor work; this PR makes sense on its own, so it should be good to merge.

@smira
Member

smira commented May 7, 2024

/m

@talos-bot talos-bot merged commit 4aac5b4 into siderolabs:main May 7, 2024
17 checks passed