Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rslave propagation for mounts from daemon root #36055

Merged
merged 1 commit into from
Feb 15, 2018

Conversation

cpuguy83
Copy link
Member

@cpuguy83 cpuguy83 commented Jan 18, 2018

By default, if a user requests a bind mount it uses private propagation.
When the source path is a path within the daemon root this, along with
some other propagation values that the user can use, causes issues when
the daemon tries to remove a mountpoint because a container will then
have a private reference to that mount which prevents removal.

Unmouting with MNT_DETATCH can help this scenario on newer kernels, but
ultimately this is just covering up the problem and doesn't actually
free up the underlying resources until all references are destroyed.

This change does essentially 2 things:

  1. Change the default propagation when unspecified to rslave when the
    source path is within the daemon root path
  2. Creates a validation error on create when the user tries to specify
    an unacceptable propagation mode for a path within the daemon root...
    basically the only two acceptable modes are rslave and rshared.

In cases where we have used the new default propagation but the
underlying filesystem is not setup to handle it (fs must hvae at least
rshared propagation) instead of erroring out like we normally would,
this falls back to the old default mode of private, which preserves
backwards compatibility.

@cpuguy83
Copy link
Member Author

ping @cyphar @euank @kolyshkin

@kolyshkin
Copy link
Contributor

LGTM

@kolyshkin
Copy link
Contributor

@cpuguy83 looks like this needs to be rebased

@cpuguy83
Copy link
Member Author

Rebased.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐼

@cpuguy83 cpuguy83 force-pushed the slave_mounts_for_root branch 2 times, most recently from ebc0864 to 6add9a1 Compare January 23, 2018 22:35
@trapier
Copy link
Contributor

trapier commented Jan 25, 2018

  1. Change the default propagation when unspecified to rslave when the source path is within the daemon root path

Why only if the source is within the daemon root? This saves us from -v /var/lib/docker:..., but not -v /var/lib:.. or -v /var/:... or -v /:.... Since it's an rbind these have the same effect as -v /var/lib/docker:... with respect to passing container mounts around. Shouldn't rslave also apply if the source is a parent directory of the daemon root?

@cpuguy83 cpuguy83 force-pushed the slave_mounts_for_root branch 2 times, most recently from 9c76bad to ac1c50b Compare January 25, 2018 22:33
@cpuguy83
Copy link
Member Author

Fixed this to also do the same for parent directories, updated commit message to match. Also some unit tests for this.

@cpuguy83
Copy link
Member Author

Thinking about it, we may need to ignore validation errors for parent directories just so we don't break people... not sure.

@cpuguy83
Copy link
Member Author

And tests confirm that maybe we shouldn't error out on the validation when mounting a parent dir... this could break people (of course they're already broken due to having private references to docker mounts in a container).

@trapier
Copy link
Contributor

trapier commented Jan 26, 2018

Trying to think who would have specified something other than rshared or rslave on any bind mount.

As an anecdote rprivate is not specified in any public github gists.

Since rprivate is the default one possibility would be those feeding container inspect into container create. This is unlikely though because they would most likely derive from .HostConfig.Binds, where the implicit rprivate is not reported. Anecdotal example of this is rekcod. Same would apply for service inspect -> service create.

So, the most likely case is someone who already had the right intent but specifed shared or slave instead of their recursive variants. It would suck to break them, and without a web search engine that supports regex, they are hard to find.

@trapier
Copy link
Contributor

trapier commented Jan 29, 2018

For the case where the user has bind mounted a parent of the daemon root another option would be to implicitly give it an rslave sub mount of the daemon root with its master at dockerd's /. This would need its own cleanup routine.

@thaJeztah
Copy link
Member

CI failure:

16:30:04 ----------------------------------------------------------------------
16:30:04 FAIL: docker_cli_run_test.go:1230: DockerSuite.TestRunAllowBindMountingRoot
16:30:04 
16:30:04 docker_cli_run_test.go:1235:
16:30:04     dockerCmd(c, "run", "-v", "/:/host", "busybox", "ls", "/host")
16:30:04 /go/src/github.com/docker/docker/vendor/github.com/gotestyourself/gotestyourself/icmd/command.go:65:
16:30:04     t.Fatalf(err.Error() + "\n")
16:30:04 ... Error: 
16:30:04 Command:  /usr/local/bin/docker run -v /:/host busybox ls /host
16:30:04 ExitCode: 125
16:30:04 Error:    exit status 125
16:30:04 Stdout:   
16:30:04 Stderr:   /usr/local/bin/docker: Error response from daemon: linux mounts: path / is mounted on / but it is not a shared or slave mount.
16:30:04 time="2018-01-29T16:30:04Z" level=error msg="error waiting for container: context canceled" 
16:30:04 
16:30:04 Failures:
16:30:04 ExitCode was 125 expected 0
16:30:04 Expected no error

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

Whoops, looks like this broke the build;

14:15:31 # github.com/docker/docker/integration/container
14:15:31 integration/container/mounts_linux_test.go:16:2: cannot find package "github.com/docker/docker/integration/util/request" in any of:
14:15:31 	/go/src/github.com/docker/docker/vendor/github.com/docker/docker/integration/util/request (vendor tree)
14:15:31 	/usr/local/go/src/github.com/docker/docker/integration/util/request (from $GOROOT)
14:15:31 	/go/src/github.com/docker/docker/integration/util/request (from $GOPATH)
14:15:31 FAIL	github.com/docker/docker/integration/container [setup failed]

@cpuguy83 cpuguy83 deleted the slave_mounts_for_root branch February 15, 2018 14:23
@thaJeztah
Copy link
Member

Caused by #36265

@free2k
Copy link

free2k commented Oct 12, 2018

Error response from daemon: Driver devicemapper failed to remove root filesystem 03339fa46e2df571873c5efda70266363bb6213b73d3a50c615c827466c7a865: remove /var/lib/docker/devicemapper/mnt/6d29157152dfda03902aae457549b9822f2fac558d4b239273ba5b7eb6fbaea0: device or resource busy
Is there a condition for this? ?
My environment has this problem, but I don't know how it happened.

@cpuguy83
Copy link
Member Author

@free2k would need more details as to how you got to that point... also what version are you on?

@free2k
Copy link

free2k commented Oct 15, 2018

@free2k would need more details as to how you got to that point... also what version are you on?

`Client:
Version: 1.13.1
API version: 1.26
Package version: docker-1.13.1-63.mt20181011.git1e620d4.el7.centos.x86_64
Go version: go1.7.5
Git commit: 1e620d4/1.13.1
Built: Thu Oct 11 08:10:01 2018
OS/Arch: linux/amd64

Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: docker-1.13.1-63.mt20181011.git1e620d4.el7.centos.x86_64
Go version: go1.7.5
Git commit: 1e620d4/1.13.1
Built: Thu Oct 11 08:10:01 2018
OS/Arch: linux/amd64
Experimental: false`

@cpuguy83
Copy link
Member Author

That version of Docker does not have this fix (though I think this isn't the fix you are looking for, the others aren't in that release either).
One thing you could do is make "/var/lib/docker/devicemapper" use shared propagation after Docker has started... so mount --make-shared /var/lib/docker/devicemapper as a post-start operation. It is not a perfect solution, but should work well enough... or upgrade to a recent version of Docker.

@free2k
Copy link

free2k commented Oct 17, 2018

I found a law,All kernels that fail to delete the root file system are 3.10.0-229.el7.x86_64 ,But 3.10.0-693.47.el7.x86_64 this kernel did not find the problem。I don't know what the kernel has changed。Do you have any better suggestions? @cpuguy83

@cpuguy83
Copy link
Member Author

@free2k Probably upgraded device mapper libraries that support deferred device deletion. The problem with this method is that the reason you are getting "device or resource busy" is because something else is holding the mount point (more specifically, most likely it is another service on the system as many services get their own mount namespace, and as such a copy of the mount).

With the newer kernel you aren't seeing the issue, but the underlying problem still exists and data is not actually freed. I would still recommend either upgrading Docker to a more recent version or adding it that mount command I mentioned above.

@free2k
Copy link

free2k commented Oct 17, 2018

Docker log:

Old kernel devicemapper version
devicemapper: driver version is 4.29.0 devmapper: Deferred removal support enabled. devmapper: Deferred deletion support enabled.

new kernel devicemapper version
devicemapper: driver version is 4.35.0 devmapper: Deferred removal support enabled. devmapper: Deferred deletion support enabled.

Old kernel version supports deferred device deletion。I don't understand Why is there only a problem with the old kernel?

@AkihiroSuda
Copy link
Member

We should have an option to disable this.

PTAL:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants