Support running the Engine daemon inside a user namespace #20902

Closed
wants to merge 1 commit into
base: master
from

@hallyn
Contributor

hallyn commented Mar 3, 2016

Two commits to accommodate things we cannot do when running docker from a container which is in a user namespace. Combined with one more patch to libcontainer itself, this suffices to run docker inside a user namespaced container.

Member

vdemeester commented Mar 3, 2016

Hi @hallyn, thanks for the contribution. Could you do the following:

  • Use a better title for the PR: "2016 03 02/nest.partial" doesn't really help us know what it relates to ("Run docker daemon inside a docker namespace enabled engine" could be better).
  • We do not change anything in the vendor/ directory; changes to it must be made upstream (in opencontainers/runc in your case).
  • Please run gofmt -s -w on the changed file:
09:05:06 ---> Making bundle: validate-gofmt (in bundles/1.11.0-dev/validate-gofmt)
09:05:07 These files are not properly gofmt'd:
09:05:07  - pkg/archive/archive.go

Contributor

hallyn commented Mar 3, 2016

D'oh! the third patch (which edits vendor/) was supposed to be removed and is in fact already a PR against opencontainers/runc. Removing it for real now.

@hallyn hallyn changed the title from 2016 03 02/nest.partial to Support running inside a user namespace Mar 3, 2016

Member

thaJeztah commented Mar 3, 2016

ping @estesp for user namespaces

Member

thaJeztah commented Mar 3, 2016

Got panics on CI

09:14:51 Error response from daemon: Untar re-exec error: exit status 2: output: panic: runtime error: invalid memory address or nil pointer dereference
09:14:51 [signal 0xb code=0x1 addr=0x88 pc=0x90bcf6]

Contributor

hallyn commented Mar 3, 2016

Hm, I don't know what happened there, that doesn't look like my original patch... checking.

Contributor

estesp commented Mar 3, 2016

The last step here is to handle the cross-platform issue with RunningInUserNS:

19:09:09 pkg/archive/archive_unix.go:84: undefined: "github.com/opencontainers/runc/libcontainer/system".RunningInUserNS

Running make cross will reveal the issue (the libcontainer/system/linux.go functions are only compiled on Linux, but pkg/archive/archive_unix.go is built for all Unix-like systems).

I think the cleanest way is a runC PR to have a stub for non-Linux that returns false automatically.

Contributor

estesp commented Mar 3, 2016

@hallyn I created opencontainers/runc#620 to solve this; I'm hoping to vendor in opencontainers/runc very soon for the shared namespace work that is now in libcontainer/nsenter; so hopefully we can get this solved ASAP.

Contributor

hallyn commented Mar 4, 2016

Thanks @estesp ! Do I understand right that I should just wait for #620 to clear?

Member

thaJeztah commented Mar 4, 2016

@hallyn you can include the change from opencontainers/runc#620 in your PR by adding the file here https://github.com/docker/docker/tree/master/vendor/src/github.com/opencontainers/runc/libcontainer/system. This will make the "vendor" CI check fail, but the other tests should then complete successfully. Once the RunC PR is merged, and vendored, you can rebase your PR to make the vendor check pass

Contributor

hallyn commented Mar 5, 2016

Thanks, done, will watch for the runc pr merge...

@@ -49,6 +51,10 @@ func applyLayer() {
fatal(err)
}
if inUserns {
options.InUserNS = true

@tianon

tianon Mar 8, 2016

Member

options is nil until the line further down, so this fails with panic: runtime error: invalid memory address or nil pointer dereference 😄

@tianon

tianon Mar 8, 2016

Member

actually, even worse -- options doesn't appear to ever be set to non-nil within this function

@hallyn

hallyn Mar 8, 2016

Contributor

? the diff I have here has options.InUserNS = true happening after the unmarshalling

@tianon

tianon Mar 8, 2016

Member

Hmm, @mwhudson managed to trigger a panic here on s390x, so I wonder if there's some edge case where json.Unmarshal doesn't create the struct?
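
On the json.Unmarshal question, one edge case does exist in Go: unmarshalling the JSON literal null into a pointer leaves the pointer nil without returning an error. The sketch below illustrates this with a hypothetical tarOptions struct standing in for the options type in the diff; it is not the code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tarOptions is a hypothetical stand-in for the options struct in the diff.
type tarOptions struct {
	InUserNS bool
}

// decodeOptions unmarshals into a nil pointer and returns whatever
// json.Unmarshal leaves behind. For an object the struct is allocated;
// for the literal null the pointer stays nil and no error is returned.
func decodeOptions(data []byte) (*tarOptions, error) {
	var options *tarOptions
	if err := json.Unmarshal(data, &options); err != nil {
		return nil, err
	}
	return options, nil
}

func main() {
	for _, in := range []string{`{}`, `{"InUserNS":true}`, `null`} {
		opts, err := decodeOptions([]byte(in))
		fmt.Printf("%-20s nil=%v err=%v\n", in, opts == nil, err)
	}
}
```

A nil check after unmarshalling guards against a dereference panic if the payload could ever be null.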

@tianon

tianon Mar 8, 2016

Member

Ah nevermind, that's a different set of changes from v1.10.2...hallyn:v1.10.0.serge.2diff-4166b9ad558bc9c8d0ff7b01b69c128aR34 causing that one -- carry on! 👍

@hallyn

hallyn Mar 8, 2016

Contributor

Quoting Tianon Gravi (notifications@github.com):

@@ -49,6 +51,10 @@ func applyLayer() {
fatal(err)
}

+ if inUserns {
+   options.InUserNS = true

(even still, couldn't this just be options.InUserNS = inUserns? or does setting it regardless like that have other consequences?)

No, because we must detect it before the chroot.

@hallyn

hallyn Mar 8, 2016

Contributor

Quoting Tianon Gravi (notifications@github.com):

@@ -49,6 +51,10 @@ func applyLayer() {
fatal(err)
}

+ if inUserns {
+   options.InUserNS = true

Ah nevermind, that's a different set of changes from v1.10.2...hallyn:v1.10.0.serge.2diff-4166b9ad558bc9c8d0ff7b01b69c128aR34 causing that one -- carry on! 👍

yeah i had it wrong originally

@mwhudson

mwhudson Mar 9, 2016

Contributor

fwiw, I triggered this by running "sudo adt-run --unbuilt-tree . --- null". and I just tried the same on arm64 with the same results, so it would probably happen on amd64 too...

@hallyn

hallyn Mar 9, 2016

Contributor

Exactly which commit id were you using?

@tianon

tianon Mar 9, 2016

Member

That was my fault -- he was testing on essentially ab8e54b, which is missing 45afdc9 (which fixes the panic).

@calavera calavera changed the title from Support running inside a user namespace to Support running the Engine daemon inside a user namespace Mar 14, 2016

Contributor

estesp commented Mar 17, 2016

@hallyn we need to wait until the containerd integration PR merges (#20662) to get rid of your vendor commit--that PR will update the opencontainers/runc vendor to include the needed changes.

In the meantime, what do you think about logging the "skips" for unsupported file types when tarring/untarring in the archive packages? That way there will be some record if bugs come up in the future about "missing" content?

Member

thaJeztah commented Mar 21, 2016

ping @hallyn #20662 was merged, can you rebase and address the comments (if any needs addressing?)

Contributor

hallyn commented Mar 21, 2016

Hi,

rebased and added the log msg @estesp mentioned.

Contributor

estesp commented Apr 1, 2016

code LGTM.

Per comment in the more recent PR re: userns + AUFS, @thaJeztah do you know which doc would be the right place to explain this behavior/availability of running the engine in a userns?

Contributor

hallyn commented Apr 1, 2016

On Fri, Apr 01, 2016 at 09:53:17AM -0700, Phil Estes wrote:

code LGTM.

Per comment in the more recent PR re: userns + AUFS, @thaJeztah do you know which doc would be the right place to explain this behavior/availability of running the engine in a userns?

Note, stgraber pointed out yesterday there is an allow_userns module load
option in aufs, so technically docker should probably look for that and
proceed to load aufs if /sys/module/aufs/parameters/allow_userns is 1.
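
A rough sketch of such a check, assuming the parameter file exists at the path given above. The function names are hypothetical, and accepting "Y" in addition to "1" is an assumption about how the kernel renders boolean module parameters, not something stated in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Path taken from the comment above.
const aufsAllowUsernsPath = "/sys/module/aufs/parameters/allow_userns"

// paramEnabled is a hypothetical helper. It treats "1" (as described in the
// comment) and "Y" (assumed rendering of a boolean module parameter) as on.
func paramEnabled(raw string) bool {
	switch strings.TrimSpace(raw) {
	case "1", "Y", "y":
		return true
	}
	return false
}

// aufsAllowedInUserNS reports whether the module parameter is readable and
// enabled. A missing file (module not loaded, option absent) counts as off.
func aufsAllowedInUserNS() bool {
	data, err := os.ReadFile(aufsAllowUsernsPath)
	if err != nil {
		return false
	}
	return paramEnabled(string(data))
}

func main() {
	fmt.Println("aufs usable in user namespace:", aufsAllowedInUserNS())
}
```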

Contributor

cpuguy83 commented Apr 14, 2016

This needs a rebase. There's a conflict that's not actually causing a merge conflict.

Contributor

cpuguy83 commented Apr 14, 2016

Can you also flatten your commits into 1

Contributor

cpuguy83 commented Apr 14, 2016

Interestingly, docker seems unable to resolve hostnames when in the namespace, so I can't pull images.
Edit, of course it can't, I had to put it in a new network namespace and didn't add a network interface.

Contributor

hallyn commented Apr 27, 2016

Hi @cpuguy83,

what exactly are you running lxd in? Recently for all but simply proxied ipv6 traffic you need to set up lxdbr0 yourself for your containers. If you have something like a br0 set up you can use that in your containers. For instance, I created a profile virbr to reuse my libvirt virbr0 bridge

lxc profile create virbr
lxc profile edit virbr

Placing the following in the profile:

name: virbr
config: {}
description: profile for using the libvirt bridge
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: virbr0
    type: nic

then created a container with that profile as:

lxc launch ubuntu:xenial dd -p virbr

Please let me know if I can help further.

Contributor

hallyn commented Apr 27, 2016

(Of course, if you're on Ubuntu you can set up lxdbr0 by doing

dpkg-reconfigure -p medium lxd
)

Contributor

cpuguy83 commented Apr 28, 2016

@hallyn Thanks, I'm not very familiar with lxd; that got networking set up.
Now, when I try to start a container (with the overlay or vfs drivers), I get an error saying it can't stat the command I tried to execute (e.g. docker run -it --rm busybox sh).

Note that I get this same error with this PR and with docker 1.11.

Contributor

hallyn commented Apr 28, 2016

On Thu, Apr 28, 2016 at 07:33:18AM -0700, Brian Goff wrote:

@hallyn Thanks, not very familiar with lxd, that got networking setup.
Now I'm getting errors where when trying to start a container (in overlay or vfs drivers) that it can't stat the command I tried to execute (like docker run -it --rm busybox sh).

Actually I get that misleading error message when there are problems
with the cgroups. I believe you're running on a kernel supporting
cgroup namespaces but without the patch in

opencontainers/runc#617

( I closed that request while I wait to see what happens with
http://lkml.org/lkml/2016/4/18/31 )

Contributor

cpuguy83 commented Apr 28, 2016

@hallyn Interesting, TIL! (also, congrats!).
Is this merged mainline?
I'm using the stock xenial kernel.

Contributor

hallyn commented Apr 28, 2016

Contributor

cpuguy83 commented May 11, 2016

So I still can't get this to work.
Using unshare I get the lchown error. If I disable lchown then I get this overlay error: error creating overlay mount to /var/lib/docker/overlay/0ce08510fd8ca9013970f06c67003d6a0c474b4ac2d5f959c440067430ae665d-init/merged: operation not permitted

Contributor

hallyn commented May 12, 2016

@cpuguy83 - can you show the output of 'lxc config show containername --expanded' ?
Can you create a new lxd container with

lxc launch ubuntu:xenial x1 -p default -p docker

and use the docker.io package in the archive, and see whether that also fails for you?

Contributor

hallyn commented May 26, 2016

ping?

Member

thaJeztah commented May 30, 2016

Hi @hallyn, we brought this PR up in our review session last Thursday. Unfortunately, @cpuguy83 didn't find the time to look into this again (and so far didn't manage to get it running properly 😊).

ping @estesp perhaps you have some time to try this?

Contributor

jessfraz commented Jun 23, 2016

There should be a test added for this, I bet you can use unshare, and add it to the daemon suite

Contributor

jessfraz commented Jun 23, 2016

otherwise we will never know if it breaks :)

Contributor

hallyn commented Jun 24, 2016

A test would definitely be good. IMO this was a bugfix, not a feature, so a testcase could be a follow-on, but especially now that it's been months since the patch was written, it would already be nice to have one.

I'm all for someone resubmitting my patch with an added-on test :)

Contributor

icecrime commented Jul 28, 2016

Ping @estesp: halp!

Contributor

estesp commented Jul 31, 2016

I may also try with lxc, but I thought using runc might be a good option given it will be installed in the CI system as one of the build outputs along with the daemon.

Ran into several issues with an ubuntu image that I could mostly get around with the following daemon startup:

PATH=/usr/bin dockerd --oom-score-adjust=0 --iptables=false --bridge=none

However, on docker pull alpine after starting the daemon (with vfs being the only driver that seems happy):

Using default tag: latest
latest: Pulling from library/alpine
e110a4a17941: Extracting [=>                                                 ] 65.54 kB/2.31 MB
ERRO[0013] Error trying v2 registry: failed to register layer: ApplyLayer exit status 1 stdout:  stderr: Error creating mount namespace before pivot: operation not permitted
ERRO[0013] Attempting next endpoint for pull after error: failed to register layer: ApplyLayer exit status 1 stdout:  stderr: Error creatinge110a4a17941: Extracting [==================================================>]  2.31 MB/2.31 MB
failed to register layer: ApplyLayer exit status 1 stdout:  stderr: Error creating mount namespace before pivot: operation not permitted
root@ubuntu:/#

Maybe LXC is set up better for this nesting, but I would like to figure out how to get runc as a valid "candidate" for the outer layer of the nesting if at all possible. Any thoughts @hallyn ?

Contributor

estesp commented Jul 31, 2016

Ah, this is caused by #22506; seems we will need to extend this patch to rely on/fallback to chroot if we are nested in a userns.

Contributor

estesp commented Jul 31, 2016

Getting further after falling back to "real chroot" when nested in a userns, but the shm tmpfs mount seems to be a problem; or the real problem is not logged and the failed cleanup of mounts is just a side effect:

root@ubuntu # docker run -ti --rm alpine sh
WARN[0067] failed to cleanup ipc mounts:
failed to umount /var/lib/docker/containers/3cab03c60610a93231bd95a63ba2ad4f914feec3b1655b082170c59c3318fb6b/shm: operation not permitted
ERRO[0067] Handler for POST /v1.25/containers/3cab03c60610a93231bd95a63ba2ad4f914feec3b1655b082170c59c3318fb6b/start returned error: mounting shm tmpfs: operation not permitted
docker: Error response from daemon: mounting shm tmpfs: operation not permitted.

At this point I have to stop for now, but suggestions welcome. Looks like we are close to a testable scenario with runc, maybe?

Member

thaJeztah commented Aug 9, 2016

ping @mlaventure perhaps you could have a look?

Contributor

estesp commented Aug 10, 2016

Here's the real issue; at least running under runc with a fairly vanilla config.json, the /sys mount is not owned by my remapped root, so directory creation for cgroups fails miserably:

root@ubuntu:~# docker run --rm alpine date
ERRO[0005] containerd: start container                   error=oci runtime error: process_linux.go:258: applying cgroup configuration for process caused "mkdir /sys/fs/cgroup/cpuset/docker: permission denied" id=3fa1606bd39a2f2ac276e47830865e58b6d7f1426012843d91cb08042fd89805
ERRO[0005] Create container failed with error: oci runtime error: process_linux.go:258: applying cgroup configuration for process caused "mkdir /sys/fs/cgroup/cpuset/docker: permission denied"

kimh commented Aug 11, 2016

I'm trying to catch up on this PR because I also want to run docker inside a user namespace (unprivileged LXC container) and want to reproduce the errors locally first.

@estesp Mind sharing what environment you are using? Specifically, I'd like to know:

  • are you using docker built from master with the PR change?
  • in what environment did you start the docker command? (LXC container?)

Also, I'm curious whether running the latest docker inside a userns is possible without changing runc/containerd. Or is that something we'll find out eventually?

Contributor

estesp commented Aug 11, 2016

@kimh it is possible with LXC and this PR plus an additional change that is required since this PR was created (I mentioned it above related to chroot/untar fallback in a comment 12 days ago).

I will be carrying this PR with the additional change. I had been stuck trying to get a working test using runc as the outer user namespaced container, and I'm running into trouble there, as you can see.

I did try via LXC today and with this PR + my added change I can run the daemon with "dockerd -D -s vfs --oom-score-adjust=0" and pull images and run them successfully in an LXC ubuntu:xenial container.

kimh commented Aug 11, 2016

@estesp So if my understanding is correct, "my added change" refers to the fallback to chroot under a nested userns? Any chance you can share the change with us? I looked for the branch in your fork but I'm not sure which one it is. I'm thrilled about testing this 💣

Contributor

estesp commented Aug 12, 2016

@kimh please see my carry PR #25672 that has both patches. I'm going to close this PR and hopefully we can get the carried PR merged

@@ -105,6 +106,10 @@ func copyDir(srcDir, dstDir string, flags copyFlags) error {
case os.ModeNamedPipe:
fallthrough
case os.ModeSocket:
if rsystem.RunningInUserNS() {

@redbaron

redbaron Apr 1, 2018

Contributor

This seems to be a typo. According to the commit message, it should skip in the os.ModeDevice case, not in os.ModeSocket.
