New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix overlay2 storage driver inside a user namespace #35794

Merged
merged 1 commit into from Dec 16, 2017

Conversation

Projects
None yet
7 participants
@estesp
Contributor

estesp commented Dec 14, 2017

The overlay2 driver was not setting up the archive.TarOptions field
for userns properly, like other storage backend routes to "applyTarLayer"
functionality are. The InUserNS field is now populated for overlay2 using
the same query function used by the other storage drivers.

Signed-off-by: Phil Estes estesp@linux.vnet.ibm.com

Fixes: #35245

Fix overlay2 storage driver inside a user namespace
The overlay2 driver was not setting up the archive.TarOptions field
properly like other storage backend routes to "applyTarLayer"
functionality. The InUserNS field is populated now for overlay2 using
the same query function used by the other storage drivers.

Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 14, 2017

Member

Would it be able to add a test to this?

Member

thaJeztah commented Dec 14, 2017

Would it be able to add a test to this?

@thaJeztah thaJeztah removed this from backlog in maintainers-session Dec 14, 2017

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 14, 2017

Contributor

@estesp can you please add a test case?

Also, speaking of other storage drivers, I see that

  1. overlay isusing func (g *NaiveDiffDriver) ApplyDiff() which passes archive.TarOptions already prepopulated to chrootarchive.ApplyUncompressedLayer() which calls chrootarchive.applyLayerHandler() which has the following code:
        if options == nil {
                options = &archive.TarOptions{}
                if rsystem.RunningInUserNS() {
                        options.InUserNS = true
                }
        }

So, it only sets options.InUserNS if options == nil which is not the case when it is used from NaiveDiffDriver's ApplyDiff.

  1. devmapper, btrfs, zfs and vfs are all using NaiveDiffDriver's implementation too, so they all suffer from the same issue.

  2. aufs is not supported inside userns.

So, based on this quick glance, it needs to be fixed as well (and a generic graphdriver test case would help to reveal the bug). The fix would be something like this:

diff --git a/daemon/graphdriver/fsdiff.go b/daemon/graphdriver/fsdiff.go
index 533c5a769..5e98d980a 100644
--- a/daemon/graphdriver/fsdiff.go
+++ b/daemon/graphdriver/fsdiff.go
@@ -8,6 +8,7 @@ import (
        "github.com/docker/docker/pkg/chrootarchive"
        "github.com/docker/docker/pkg/idtools"
        "github.com/docker/docker/pkg/ioutils"
+       rsystem "github.com/opencontainers/runc/libcontainer/system"
        "github.com/sirupsen/logrus"
 )
 
@@ -142,8 +143,11 @@ func (gdw *NaiveDiffDriver) ApplyDiff(id, parent string, diff io.Reader) (size i
        defer driver.Put(id)
 
        layerFs := layerRootFs.Path()
-       options := &archive.TarOptions{UIDMaps: gdw.uidMaps,
-               GIDMaps: gdw.gidMaps}
+       options := &archive.TarOptions{
+               UIDMaps:  gdw.uidMaps,
+               GIDMaps:  gdw.gidMaps,
+               InUserNS: rsystem.RunningInUserNS(),
+       }
        start := time.Now().UTC()
        logrus.Debug("Start untar layer")
        if size, err = ApplyUncompressedLayer(layerFs, diff, options); err != nil {
Contributor

kolyshkin commented Dec 14, 2017

@estesp can you please add a test case?

Also, speaking of other storage drivers, I see that

  1. overlay isusing func (g *NaiveDiffDriver) ApplyDiff() which passes archive.TarOptions already prepopulated to chrootarchive.ApplyUncompressedLayer() which calls chrootarchive.applyLayerHandler() which has the following code:
        if options == nil {
                options = &archive.TarOptions{}
                if rsystem.RunningInUserNS() {
                        options.InUserNS = true
                }
        }

So, it only sets options.InUserNS if options == nil which is not the case when it is used from NaiveDiffDriver's ApplyDiff.

  1. devmapper, btrfs, zfs and vfs are all using NaiveDiffDriver's implementation too, so they all suffer from the same issue.

  2. aufs is not supported inside userns.

So, based on this quick glance, it needs to be fixed as well (and a generic graphdriver test case would help to reveal the bug). The fix would be something like this:

diff --git a/daemon/graphdriver/fsdiff.go b/daemon/graphdriver/fsdiff.go
index 533c5a769..5e98d980a 100644
--- a/daemon/graphdriver/fsdiff.go
+++ b/daemon/graphdriver/fsdiff.go
@@ -8,6 +8,7 @@ import (
        "github.com/docker/docker/pkg/chrootarchive"
        "github.com/docker/docker/pkg/idtools"
        "github.com/docker/docker/pkg/ioutils"
+       rsystem "github.com/opencontainers/runc/libcontainer/system"
        "github.com/sirupsen/logrus"
 )
 
@@ -142,8 +143,11 @@ func (gdw *NaiveDiffDriver) ApplyDiff(id, parent string, diff io.Reader) (size i
        defer driver.Put(id)
 
        layerFs := layerRootFs.Path()
-       options := &archive.TarOptions{UIDMaps: gdw.uidMaps,
-               GIDMaps: gdw.gidMaps}
+       options := &archive.TarOptions{
+               UIDMaps:  gdw.uidMaps,
+               GIDMaps:  gdw.gidMaps,
+               InUserNS: rsystem.RunningInUserNS(),
+       }
        start := time.Now().UTC()
        logrus.Debug("Start untar layer")
        if size, err = ApplyUncompressedLayer(layerFs, diff, options); err != nil {
@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Dec 14, 2017

Contributor

A testcase that tests the RunningInUserNS code path is difficult as it requires a Docker engine running "containerized" inside a user namespace enabled container. I do that locally via LXC, and I assume it should be possible with some form of DinD setup, but I gave up last time I tried to get the inner Docker engine to start properly. The LXC team does do nightly validation that moby/moby hasn't broken running in a user namespace, but apparently aren't using the overlay2 driver or they probably would have caught this.

@brauner is there a public place to see the daily CI for LXC+Docker/moby daemon details?

If @kolyshkin or @thaJeztah have ideas on how to test this with our CI setup let me know. I definitely tried when this support initially came in via @hallyn's PR but found the dependencies (e.g. run via LXC) made it not so simple to add to our current framework.

Contributor

estesp commented Dec 14, 2017

A testcase that tests the RunningInUserNS code path is difficult as it requires a Docker engine running "containerized" inside a user namespace enabled container. I do that locally via LXC, and I assume it should be possible with some form of DinD setup, but I gave up last time I tried to get the inner Docker engine to start properly. The LXC team does do nightly validation that moby/moby hasn't broken running in a user namespace, but apparently aren't using the overlay2 driver or they probably would have caught this.

@brauner is there a public place to see the daily CI for LXC+Docker/moby daemon details?

If @kolyshkin or @thaJeztah have ideas on how to test this with our CI setup let me know. I definitely tried when this support initially came in via @hallyn's PR but found the dependencies (e.g. run via LXC) made it not so simple to add to our current framework.

@brauner

This comment has been minimized.

Show comment
Hide comment
@brauner

brauner Dec 14, 2017

Contributor

@brauner is there a public place to see the daily CI for LXC+Docker/moby daemon details?

@estesp, yeah, sure it's at https://jenkins.linuxcontainers.org/view/LXD/view/LXD%20tests/job/lxd-test-docker/
you can then click on the build number and select privileged or unprivileged, e.g.:

https://jenkins.linuxcontainers.org/view/LXD/view/LXD%20tests/job/lxd-test-docker/280/arch=amd64,mode=unprivileged,restrict=lxd-priv/console

Contributor

brauner commented Dec 14, 2017

@brauner is there a public place to see the daily CI for LXC+Docker/moby daemon details?

@estesp, yeah, sure it's at https://jenkins.linuxcontainers.org/view/LXD/view/LXD%20tests/job/lxd-test-docker/
you can then click on the build number and select privileged or unprivileged, e.g.:

https://jenkins.linuxcontainers.org/view/LXD/view/LXD%20tests/job/lxd-test-docker/280/arch=amd64,mode=unprivileged,restrict=lxd-priv/console

@stgraber

This comment has been minimized.

Show comment
Hide comment
@stgraber

stgraber Dec 14, 2017

And the actual test script is https://github.com/lxc/lxc-ci/blob/master/bin/test-lxd-docker it's pretty crude but it does the job :)

stgraber commented Dec 14, 2017

And the actual test script is https://github.com/lxc/lxc-ci/blob/master/bin/test-lxd-docker it's pretty crude but it does the job :)

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 14, 2017

Contributor

I still don't understand how the other graphdrivers work inside userns (if my analysis are correct and they do not set options.InUserNS=true).

Contributor

kolyshkin commented Dec 14, 2017

I still don't understand how the other graphdrivers work inside userns (if my analysis are correct and they do not set options.InUserNS=true).

@brauner

This comment has been minimized.

Show comment
Hide comment
@brauner

brauner Dec 14, 2017

Contributor

I still don't understand how the other graphdrivers work inside userns (if my analysis are correct and they do not set options.InUserNS=true).

Should always fallback to the vfs driver if other drivers don't work, no?

Contributor

brauner commented Dec 14, 2017

I still don't understand how the other graphdrivers work inside userns (if my analysis are correct and they do not set options.InUserNS=true).

Should always fallback to the vfs driver if other drivers don't work, no?

@brauner

This comment has been minimized.

Show comment
Hide comment
@brauner

brauner Dec 14, 2017

Contributor

That's at least how I would've implemented it.

Contributor

brauner commented Dec 14, 2017

That's at least how I would've implemented it.

@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Dec 14, 2017

Contributor

@kolyshkin I think you are correct that a more airtight fix across drivers needs more investigation. To your question "how does it work anywhere?" I think there is some complexity to the layers at which inUserNs checks are being applied, including at the tarfile apply "write a single tar entry out to file" step (where yet another check exists), so that may be part of the answer. I think one of us needs to more deeply review this whole flow of options object, inuserns checks, and that interaction with the various graphdrivers.

This PR as it stands may only be a stopgap to fix one glaring issue, but maybe we want to clean up this overall impl across graphdrivers?

Contributor

estesp commented Dec 14, 2017

@kolyshkin I think you are correct that a more airtight fix across drivers needs more investigation. To your question "how does it work anywhere?" I think there is some complexity to the layers at which inUserNs checks are being applied, including at the tarfile apply "write a single tar entry out to file" step (where yet another check exists), so that may be part of the answer. I think one of us needs to more deeply review this whole flow of options object, inuserns checks, and that interaction with the various graphdrivers.

This PR as it stands may only be a stopgap to fix one glaring issue, but maybe we want to clean up this overall impl across graphdrivers?

@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Dec 15, 2017

Contributor

@kolyshkin I don't have the exact reasoning down to a codepath, but just FYI, vfs and overlay work fine with today's code; only overlay2 fails and is fixed by this PR.

I can't test devicemapper and aufs as both are not supported inside a user namespace. Will try btrfs and zfs if possible.

Contributor

estesp commented Dec 15, 2017

@kolyshkin I don't have the exact reasoning down to a codepath, but just FYI, vfs and overlay work fine with today's code; only overlay2 fails and is fixed by this PR.

I can't test devicemapper and aufs as both are not supported inside a user namespace. Will try btrfs and zfs if possible.

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 15, 2017

Contributor

And the actual test script is https://github.com/lxc/lxc-ci/blob/master/bin/test-lxd-docker it's pretty crude but it does the job :)

Thank you @stgraber, I sent a trivial PR to improve it a bit: lxc/lxc-ci#18

In particular, I am interested in what graph driver is used (docker info shows it).

Contributor

kolyshkin commented Dec 15, 2017

And the actual test script is https://github.com/lxc/lxc-ci/blob/master/bin/test-lxd-docker it's pretty crude but it does the job :)

Thank you @stgraber, I sent a trivial PR to improve it a bit: lxc/lxc-ci#18

In particular, I am interested in what graph driver is used (docker info shows it).

@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Dec 15, 2017

Contributor

Just FYI: on my lxc ubuntu:16.04 instance, using the "standard" Docker CE install (apt-add repo path), I end up with overlay2 as the default today.

Contributor

estesp commented Dec 15, 2017

Just FYI: on my lxc ubuntu:16.04 instance, using the "standard" Docker CE install (apt-add repo path), I end up with overlay2 as the default today.

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 15, 2017

Contributor

Should always fallback to the vfs driver if other drivers don't work, no?

True, but the amount of checks a graph driver can do during its initialization to see whether it can work correctly is limited (in other words, we can not afford trying to create, mount and start a multi-layer container during every graph driver init to see if it really works). So, we do some basic checks like if the fs is supported, if needed binaries are there, if the kernel is sufficient etc, and the check if gd is working inside user ns is not performed.

Contributor

kolyshkin commented Dec 15, 2017

Should always fallback to the vfs driver if other drivers don't work, no?

True, but the amount of checks a graph driver can do during its initialization to see whether it can work correctly is limited (in other words, we can not afford trying to create, mount and start a multi-layer container during every graph driver init to see if it really works). So, we do some basic checks like if the fs is supported, if needed binaries are there, if the kernel is sufficient etc, and the check if gd is working inside user ns is not performed.

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 15, 2017

Contributor

Hmm, as per #35245, overlay is working inside userns while overlay2 does not. While the fix looks correct to me, I still fail to understand how overlay is working, as it looks like it needs a very similar fix (see #35794 (comment)).

I will do some tests.

Contributor

kolyshkin commented Dec 15, 2017

Hmm, as per #35245, overlay is working inside userns while overlay2 does not. While the fix looks correct to me, I still fail to understand how overlay is working, as it looks like it needs a very similar fix (see #35794 (comment)).

I will do some tests.

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 15, 2017

Contributor

OK I found why overlay works: applyLayer sets the option by itself:

inUserns := rsystem.RunningInUserNS()

...
if inUserns {
options.InUserNS = true
}

Now it's good to figure out why overlay2 needs this option explicitly set.

Contributor

kolyshkin commented Dec 15, 2017

OK I found why overlay works: applyLayer sets the option by itself:

inUserns := rsystem.RunningInUserNS()

...
if inUserns {
options.InUserNS = true
}

Now it's good to figure out why overlay2 needs this option explicitly set.

@estesp

This comment has been minimized.

Show comment
Hide comment
@estesp

estesp Dec 15, 2017

Contributor

@kolyshkin

Now it's good to figure out why overlay2 needs this option explicitly set.

Because for whatever reason overlay2 doesn't go through applyLayer at all. It links up it's archive untar to chrootarchive.UntarUncompressed which ends up at the docker-untar reexec, not the docker-applyLayer reexec as overlay does.

Contributor

estesp commented Dec 15, 2017

@kolyshkin

Now it's good to figure out why overlay2 needs this option explicitly set.

Because for whatever reason overlay2 doesn't go through applyLayer at all. It links up it's archive untar to chrootarchive.UntarUncompressed which ends up at the docker-untar reexec, not the docker-applyLayer reexec as overlay does.

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 15, 2017

Contributor

OK, it's all clear to me now.

Overlay call stack is:

  • overlay.ApplyDiff() calls
  • chrootarchive.ApplyUncompressedLayer() calls
  • chrootarchive.applyLayerHandler() (which sets options.InUserNS in case options == nil) calls
  • chrootarchive.applyLayer() (which sets options.InUserNS even if it was set before) calls
  • archive.UnpackLayer() calls
  • archive.createTarFile() for every file found in a tar stream

Overlay2 call stack is:

  • overlay2.ApplyDiff() calls
  • chrootarchive.UntarUncompressed() calls
  • chrootarchive.untarHandler() calls
  • chrootarchive.invokeUnpack() calls
  • chrootarchive.untar() calls
  • archive.Unpack() calls
  • archive.createTarFile() for every file found in a tar stream

So, overlay works because its ApplyDiff() call stack involves chrootarchive.applyLayer() which sets InUserNS, and overlay2 didn't work because its ApplyDiff() doesn't have that.

So, this patch LGTM.

As a side note, it's interesting why code paths differ so much, and what is the difference between these.

Contributor

kolyshkin commented Dec 15, 2017

OK, it's all clear to me now.

Overlay call stack is:

  • overlay.ApplyDiff() calls
  • chrootarchive.ApplyUncompressedLayer() calls
  • chrootarchive.applyLayerHandler() (which sets options.InUserNS in case options == nil) calls
  • chrootarchive.applyLayer() (which sets options.InUserNS even if it was set before) calls
  • archive.UnpackLayer() calls
  • archive.createTarFile() for every file found in a tar stream

Overlay2 call stack is:

  • overlay2.ApplyDiff() calls
  • chrootarchive.UntarUncompressed() calls
  • chrootarchive.untarHandler() calls
  • chrootarchive.invokeUnpack() calls
  • chrootarchive.untar() calls
  • archive.Unpack() calls
  • archive.createTarFile() for every file found in a tar stream

So, overlay works because its ApplyDiff() call stack involves chrootarchive.applyLayer() which sets InUserNS, and overlay2 didn't work because its ApplyDiff() doesn't have that.

So, this patch LGTM.

As a side note, it's interesting why code paths differ so much, and what is the difference between these.

@dmcgowan

This comment has been minimized.

Show comment
Hide comment
@dmcgowan

dmcgowan Dec 15, 2017

Member

@kolyshkin they differ because ovleray2 does not use the naive driver. The naive driver mounts, then applies the tar including doing removes for whiteouts. Overlay2 skips the mounts and just directly unpacks, transforming the aufs style whiteouts from the tar to overlayfs whiteouts. The main reason for using different codes paths was both performance, and to resolve a bug in naive driver that would show up with some of the faster graph drivers. IMO performance should be secondary and we should just have overlay2 use naive diff driver in all cases.

Member

dmcgowan commented Dec 15, 2017

@kolyshkin they differ because ovleray2 does not use the naive driver. The naive driver mounts, then applies the tar including doing removes for whiteouts. Overlay2 skips the mounts and just directly unpacks, transforming the aufs style whiteouts from the tar to overlayfs whiteouts. The main reason for using different codes paths was both performance, and to resolve a bug in naive driver that would show up with some of the faster graph drivers. IMO performance should be secondary and we should just have overlay2 use naive diff driver in all cases.

@dmcgowan

LGTM

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 16, 2017

Member

IMO performance should be secondary and we should just have overlay2 use naive diff driver in all cases.

@kolyshkin would that be something you could work on?

Member

thaJeztah commented Dec 16, 2017

IMO performance should be secondary and we should just have overlay2 use naive diff driver in all cases.

@kolyshkin would that be something you could work on?

@thaJeztah

LGTM

@thaJeztah thaJeztah merged commit 0862014 into moby:master Dec 16, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 38376 has succeeded
Details
janky Jenkins build Docker-PRs 47113 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 7496 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 18681 has succeeded
Details
z Jenkins build Docker-PRs-s390x 7319 has succeeded
Details
@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Dec 18, 2017

Contributor

and to resolve a bug in naive driver that would show up with some of the faster graph drivers

@dmcgowan was/is it related to lack of sub-second time-stamps and inability to distinguish between changed files because of that?

Contributor

kolyshkin commented Dec 18, 2017

and to resolve a bug in naive driver that would show up with some of the faster graph drivers

@dmcgowan was/is it related to lack of sub-second time-stamps and inability to distinguish between changed files because of that?

@dmcgowan

This comment has been minimized.

Show comment
Hide comment
@dmcgowan

dmcgowan Dec 19, 2017

Member

@kolyshkin yes, that one. There is a work around for it now that fixes the builder issue.

Member

dmcgowan commented Dec 19, 2017

@kolyshkin yes, that one. There is a work around for it now that fixes the builder issue.

@estesp estesp deleted the estesp:fix-overlay2-untarinuserns branch Dec 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment