Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: archive: pause containers before doing filesystem operations #39252

Closed
wants to merge 1 commit into from

Conversation

cyphar
Copy link
Contributor

@cyphar cyphar commented May 22, 2019

(Note: This PR was made public after discussions with the Docker security team, if you find a security vulnerability please report it directly to security@docker.com.)

There are certain classes of attacks (as evidenced in CVE-2018-15664)
which are caused by our allowing container processes to be executing
while we are doing filesystem operations on the container. In
particular, there are trivial TOCTOU races in symlink resolution and
scoping that can be exploited
.

The most complete solution to this problem would be to modify
chrootarchive so that all of the archive operations occur with the root
as the container rootfs (and not the parent directory, which is what
causes the vulnerability since the parent is attacker-controlled).
Unfortunately, changes to this core piece of Docker are almost
impossible (the TarUntar interface has many copies and reimplementations
that would all need to be modified to be able to handle a new "root"
argument).

So, we instead settle for the next-best option which is to pause the
container during our usage of the filesystem. This is far from an ideal
solution (you can image some attack scenarios such as shared volume
mounts) where this is ineffectual but it does block the most basic
attack.

I am currently working on some new kernel functionality that would allow
for much safer resolution of paths inside untrusted roots
, but as
above it would be difficult to modify Docker to use it. I am working on
adding support to filepath-securejoin though (however this will
require quite a few inteface changes).

Fixes: CVE-2018-15664
Signed-off-by: Aleksa Sarai asarai@suse.de

@cyphar
Copy link
Contributor Author

cyphar commented May 22, 2019

This is still a WIP since I still need to test that my reproducer no longer triggers the issue.

daemon/archive.go Outdated Show resolved Hide resolved
@cyphar cyphar force-pushed the archive-pause-containers branch 2 times, most recently from 8fbdb8c to 88799f6 Compare May 22, 2019 12:35
@thaJeztah
Copy link
Member

/cc @justincormack @cpuguy83

@cyphar cyphar force-pushed the archive-pause-containers branch 2 times, most recently from 78377ce to fede627 Compare May 23, 2019 13:17
@codecov
Copy link

codecov bot commented May 23, 2019

Codecov Report

Merging #39252 into master will increase coverage by 0.02%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master   #39252      +/-   ##
==========================================
+ Coverage   37.05%   37.08%   +0.02%     
==========================================
  Files         612      612              
  Lines       45478    45584     +106     
==========================================
+ Hits        16853    16904      +51     
- Misses      26341    26392      +51     
- Partials     2284     2288       +4

@thaJeztah
Copy link
Member

Wondering if we need a --pause option on docker cp, similar to what we have on docker commit / docker container commit;

 -p, --pause            Pause container during commit (default true)

(could be either --no-pause if the default should be true (which will be a breaking change likely), or --pause to make it optional)

@cyphar
Copy link
Contributor Author

cyphar commented May 23, 2019

Since this is a security fix I'm not sure how good of an idea it is to allow disabling it -- the attack scripts I have allow attackers to read and write to anywhere on the host filesystem as root. Adding footguns for their own sake feels like a bad idea.

docker commit -p is more of a functional feature to (in theory) allow for atomic snapshots on storage drivers which don't have them (like overlayfs).

@cyphar cyphar marked this pull request as ready for review May 23, 2019 13:43
@thaJeztah
Copy link
Member

Yes 😞 definitely understood; I can see this being a disruptive change though (as in; users that would currently would now seeing their container going offline). Surely, not really best practice if you need the actual container to get important data out 🤷‍♂, but still.

@cyphar
Copy link
Contributor Author

cyphar commented May 23, 2019

I can do some more testing to see how bad the impact is, but in theory we're talking about hundreds of milliseconds of downtime -- docker pause doesn't kill the container process after all it freezes the container's cgroup.

EDIT: Actually no, it could be pretty bad if you're copying large files...

@thaJeztah
Copy link
Member

I can do some more testing to see how bad the impact is, but in theory we're talking about hundreds of milliseconds of downtime -- docker pause doesn't kill the container process after all it freezes the container's cgroup.

Oh, I misunderstood, and thought the container would be paused during the whole copy operation 😊 😅. Should've taken the time to go through the whole flow. Sorry for that! Jumping between conversations 😊

@cyphar
Copy link
Contributor Author

cyphar commented May 23, 2019

No sorry, you are right -- it does apply to the entire copy operation. I forgot people copy pretty large files into and out of containers. Hmm, maybe we do need an off switch for this...

@AkihiroSuda
Copy link
Member

Pause doesnt work with rootless, any workaround?

@cyphar
Copy link
Contributor Author

cyphar commented May 23, 2019

For rootless we could just not do the pausing. As I mention in the commit message, this is a best-effort protection because (after spending several weeks working on it) I concluded it's effectively impossible to implement the actual protection necessary to defend against the attack in Docker. You would need to rewrite too many interfaces and too much glue code that it's not really doable anymore.

@olljanat
Copy link
Contributor

Derek add label: rebuild/janky

@cpuguy83
Copy link
Member

How come we aren't using chrootarchive here?

@cyphar
Copy link
Contributor Author

cyphar commented May 23, 2019

@cpuguy83 At the very end of the chain, chrootarchive is being used. The problem is that chrootarchive is being used to chroot into the parent directory of the directory where the extraction is occurring (which is attacker-controlled since it's inside the container). So, if you get hit by the race you end up chrooting into the host filesystem. As I said above, this could be fixed by adding a root parameter to all of the TarUntar interfaces but this turns out to be quite hard purely due to the number of changes needed all over the place.

There are certain classes of attacks (as evidenced in CVE-2018-15664)
which are caused by our allowing container processes to be executing
while we are doing filesystem operations on the container. In
particular, there are trivial TOCTOU races in symlink resolution and
scoping that can be exploited[1].

The most complete solution to this problem would be to modify
chrootarchive so that all of the archive operations occur with the root
as the container rootfs (and not the parent directory, which is what
causes the vulnerability since the parent is attacker-controlled).
Unfortunately, changes to this core piece of Docker are almost
impossible (the TarUntar interface has many copies and reimplementations
that would all need to be modified to be able to handle a new "root"
argument).

So, we instead settle for the next-best option which is to pause the
container during our usage of the filesystem. This is far from an ideal
solution (you can image some attack scenarios such as shared volume
mounts) where this is ineffectual but it does block the most basic
attack.

I am currently working on some new kernel functionality that would allow
for much safer resolution of paths inside untrusted roots[2], but as
above it would be difficult to modify Docker to use it. I am working on
adding support to filepath-securejoin[3] though (however this will
require quite a few inteface changes).

[1]: https://bugzilla.suse.com/show_bug.cgi?id=1096726
[2]: https://lwn.net/Articles/767547/ [outdated API]
[3]: https://github.com/cyphar/filepath-securejoin

Fixes: CVE-2018-15664
Signed-off-by: Aleksa Sarai <asarai@suse.de>
@cpuguy83
Copy link
Member

Thinking we could add a flag to set the daemon level default (--default-pause-on-cp)

Or alternatively we could add a flag which requires the user to set pause on cp (--require-puase-on-cp) and error out if pause is not specified.


I will be happy to look at updating interfaces to make this better.
I was already considering building an external tool to deal with container filesystem stuff for cases like docker cp, because it's pretty horrific today with manipulating mount namespaces in the root namespace and other things... which to provide any real benefit requires updating these interfaces anyway.

@cpuguy83
Copy link
Member

Just a thought, we could take a temporary snapshot on the fs if the container is running (the if part is just for optimization).
I'm not sure if we would even need to modify the layer store/graphdriver api's for this.

/cc @dmcgowan

@arno01
Copy link

arno01 commented May 30, 2019

Are there real use cases people really need docker cp and cannot live without it?

I am just saying if that is too complicated to fix and is too dangerous to leave, I would opt for having some flag like --enable-insecure-docker-cp for dockerd, but this is only because I am having in mind docker cp is a handy feature only for quick dev tests and that if someone needs to copy some data from the container, should rather use a shared docker volume or use other tools like nsenter.

@justincormack
Copy link
Contributor

@arno01 it is hard to know how people are using docker cp; I hope people are not using it in production, just for dev. If this is the case, adding an automatic pause seems an ok mitigation without any additional changes for now.

@arno01
Copy link

arno01 commented May 30, 2019

I am afraid that pausing (SIGSTOP) containers is also a breaking change as it will lead to the error prone situations such as when someone is docker cp'ing something big as usual (for him, in a prod, maybe even scripted or just one time ad-hoc copy) will unexpectedly start pausing the running process which in turn (if left paused for too long) may break all network connections (or socket's buffer can fill up) to the service running in that container.

WDYT?

@cpuguy83
Copy link
Member

I have implemented chrootarchive within a specified root (the container's root) here: #39292

The change itself is fairly isolated (2 new functions in chrootarchive).

@cyphar
Copy link
Contributor Author

cyphar commented May 30, 2019

I'm closing this PR in favour of that one.

@cpuguy83 I should've been more specific -- the difficulty I had was in trying to fix and restructure all the users of FollowSymlinkInScope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants