New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PodSecurityPolicy allowedHostPaths does not effectively restrict to a subpath #61043

Closed
liggitt opened this Issue Mar 12, 2018 · 6 comments

Comments

Projects
None yet
3 participants
@liggitt
Member

liggitt commented Mar 12, 2018

the allowedHostPaths feature limits what paths can be specified in a hostPath volume, but does not restrict symlink creation and traversal within that subpath

To prevent this, either of the following are required:

  • limit hostPath use to read only volumes (in progress in #58647)
  • limit hostPath use to exact path matches

Until those changes are made, PodSecurityPolicy objects designed to limit container permissions must completely disable hostPath volumes

@liggitt liggitt self-assigned this Mar 12, 2018

@liggitt liggitt changed the title from placeholder to PodSecurityPolicy allowedHostPaths does not effectively restrict to a subpath Mar 12, 2018

@k8s-ci-robot k8s-ci-robot removed the needs-sig label Mar 12, 2018

@liggitt

This comment has been minimized.

Member

liggitt commented Mar 12, 2018

@cyphar

This comment has been minimized.

cyphar commented Mar 20, 2018

tl;dr: I have an idea but it will require some layer violations between security policies and volumes.

I have a couple of ideas on how to solve this (I implemented quite a few of the symlink-related protections in Docker and runc), and I've started working on some preliminary patches. Unfortunately the more obvious ways of solving this problem aren't secure enough. One of the simpler ways of solving this problem would be:

  • Modify pkg/security/podsecuritypolicy/util/util.go to check path prefixes by expanding symlinks correctly (filepath.EvalSymlinks or https://github.com/cyphar/filepath-securejoin).

  • Modify pkg/volume/host_path/host_path.go to actually create a bind-mount of the path which is then used as the source of the bind-mount for the container. This allows the directory to be pinned after validation, so that once SetUp is called any symlink components changing in the original path given are irrelevant (which helps fix one TOCTTOU issue).

However, the main problem I see at the moment is that there is a trivial TOCTTOU with the above setup. If one of the components of the expanded path is changed between these two steps, then the path is no longer safe. This can be handled in a more complicated way by holding a reference to the directory and then applying the checks to that reference (using the reference in the SetUp code later), but now we run into many more problems -- this would probably be seen as a layer violation. Here is how it would have to look:

  • Ultimately the main protection here is to be able to have a dirfd which references the path that the user is referring to. We would have to open an O_PATH for each hostPath under consideration. We then recurse up .. until we hit / to construct the true path to consider for hostPath. We then apply the prefix checks on this constructed set of path components. None of the components will be actually evaluated, they are just used for policy checking (the path has already been evaluated). If preferred we can also do a "sanity check" to stop people from creating symlinks that point to a valid path to cause confusion. We then pass down the O_PATH descriptor to pkg/volume/host_path/host_path.go which then bind-mounts /proc/self/fd/$num to a temporary directory and then passes that path for container bind-mounting. If you feel bad about using /proc/self/fd this can also be done by spawning a dummy process and setting its /proc/self/cwd to the path in question.

Why is it so convoluted? Because we need a real reference to the path to avoid race conditions from mutating the a path that we operate on multiple times. Our recursion of .. will not contain symlink components (if you actually interface with the kernel -- your shell lies to you about your pwd). If a user manages to rename(2) the path to be outside of the allowedHostPaths after we've done that check, they still only get a bind-mount to the path they already had the ability to rename(2) inside the container (no other path is exposed).

Sorry if that's a bit of a brain-dump, but it's how I would suggest tackling this issue.

Also note that while https://github.com/cyphar/filepath-securejoin (the symlink escape handling code that Docker and runc use) is a fairly simple solution to this problem, it was explicitly not designed for cases where you may have adversaries that are modifying the filesystem in a way that makes TOCTTOU viable.
@liggitt

This comment has been minimized.

Member

liggitt commented Mar 21, 2018

PodSecurityPolicy enforcement is done at admission time, by the API server, not at container start time on the node. The only options I see available at admission time (without adding API fields to the pod spec and plumbing more information down to the kubelet) are letting PSP require the volume to be readonly or an exact path match.

@cyphar

This comment has been minimized.

cyphar commented Mar 21, 2018

Exact path matches won't help against symlink components changing. If you want to fix the issue in a complete way, you need to pass a file descriptor which can be used for mounting in some fashion (it's a shame that this will cause a layer violation, but the current design doesn't provide the security it claims to 😞). Read-only will only fix the problem if there isn't co-ordination from the host-side (if a user on the host can write to the allowedHostPaths then the issue is the same).

From my point of view, neither are really an adequate solution if you want to protect from races against the kubelet. If races against the kubelet aren't an issue, then you can just do the first proposal I listed. That proposal will still do enforcement at admission time, but the kubelet will take an additional precaution as soon as it can to avoid further symlink changes.

@liggitt

This comment has been minimized.

Member

liggitt commented Mar 22, 2018

Exact path matches won't help against symlink components changing

The presumption is that the restricted user does not have the ability to change parent components of the allowed path

Read-only will only fix the problem if there isn't co-ordination from the host-side (if a user on the host can write to the allowedHostPaths then the issue is the same).

This is not intended to protect against a user with host access.

@liggitt liggitt referenced this issue Apr 17, 2018

Closed

Issue with k8s.io/security/ #8065

1 of 2 tasks complete
@liggitt

This comment has been minimized.

Member

liggitt commented Jun 14, 2018

resolved by #58647

/close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment