daemon mount namespaces #10225
This systemd.exec setting will construct a new mount namespace for the docker daemon, and use slave shared-subtree mounts so that volume mounts propagate correctly into containers. Giving the daemon an unshared mount namespace ensures that mount references are not held by other pids outside of the docker daemon. Such held references frequently show up as EBUSY or "device or resource busy" errors. Signed-off-by: Vincent Batts <vbatts@redhat.com>
unshare the mount namespace of the docker daemon to avoid other pids outside the daemon holding mount references of docker containers. Signed-off-by: Vincent Batts <vbatts@redhat.com>
b2f2aea to
6bb6586
Compare
|
LGTM, although I think we should wait until after 1.5 (and then include the |
|
Unless you can make a compelling argument for this not breaking other stuff so close to our freeze window. 😉 |
|
and, this is ./contrib/ ;-) |
|
^^ Add a line to those files: "In case of emergency, call Vincent +1 555 ...", then it should be good to merge 😸 |
|
@thaJeztah woah there, cowboy ;-) |
|
I'm not very familiar with this stuff, but would this also have a positive effect on AUFS not freeing inodes I saw reported? |
|
@thaJeztah do you have a link to that conversation? |
|
Yup, read it earlier today, that's why I thought of it; #9755 |
|
@vbatts Do you think this will fix aufs |
|
@vbatts For the redhat sysvinit change, I agree 100% that the only repercussions would be to the redhat packagers who are quite familiar with creating out-of-band patches. For the systemd change, this is used in the packages we release, so I'm much more leery (since a problem there would mean we have to re-release depending on the severity and the number of times it gets reported 😞). Regardless, we'll need our systemd maintainers to weigh in: @lsm5 @philips @jfrazelle |
|
Didn't @crosbymichael already try doing this, although within Docker itself? #4599 Also, what happens in situations where someone has mounted an fs on the host and wants to then bind-mount that in a container? |
|
@cpuguy83 No, that wasn't merged because there were other problems to take into account. |
|
@tianon well, fedora and centos7 had been using |
|
@vbatts OK, that's fair; so, if the I think it's probably prudent that we at least ping @maxamillion for the sysvinit change. 😄 Also, for future reference, the argument that "it's just |
|
@tianon fair. I like to keep contrib fresh and current too. ❤️ |
|
LGTM |
@@ -6,6 +6,7 @@ Requires=docker.socket
 [Service]
 ExecStart=/usr/bin/docker -d -H fd://
+MountFlags=slave
LGTM
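For readers less familiar with shared subtrees: `MountFlags=slave` in the diff above asks systemd to start the service in a new mount namespace whose mounts are slaves of the host's, so mount events flow from the host into the daemon's namespace but not back. A rough way to see which propagation type a mount ended up with is to read the optional fields of a `/proc/<pid>/mountinfo` line. The sketch below is illustrative only; the function name `propagation` and the sample lines are made up for this example and are not part of the PR.

```python
def propagation(mountinfo_line):
    """Classify the propagation type of one /proc/<pid>/mountinfo line."""
    fields = mountinfo_line.split()
    # Six fixed fields come first; optional fields follow, up to a lone "-".
    sep = fields.index("-", 6)
    tags = {field.split(":")[0] for field in fields[6:sep]}
    if "unbindable" in tags:
        return "unbindable"
    if "shared" in tags and "master" in tags:
        return "shared+slave"
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    # No propagation tag at all means the mount is private.
    return "private"


# Hypothetical sample lines, written by hand for illustration.
host_root = "22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw"
daemon_view = "36 35 8:1 / / rw,relatime master:1 - ext4 /dev/sda1 rw"
print(propagation(host_root), propagation(daemon_view))
```

Running this against the daemon's own `mountinfo` should mostly report slave mounts when the unit setting is in effect, assuming the host mounts were shared to begin with.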
|
Y'all know that you can no longer go into |
|
I am kinda in favor of possibly reverting this now because there are instances when people need to backup a stopped container etc. |
|
Being able to access current active mounts in |
|
works fine on overlayfs even. I don't know what to say for aufs. Really a shame that you'd improve many folks experience, reduce errors, but determine a need to revert because of mystical behavior on aufs. |
|
Is there another solution maybe :(
|
|
@jfrazelle i even tried https://gist.github.com/vbatts/7257da4823c8bc8c61c1 and the directories are still hidden on aufs. |
|
shakes fist in air ok @crosbymichael you think we should revert?
|
|
We could also add something to docs about how this recommended for
|
|
I think the PR is safe as-is, right? It only modifies the systemd and sysvinit-redhat init scripts, and aufs is not generally used on those systems, while devicemapper and btrfs are. It's probably better to keep this and add some docs on how this doesn't work with aufs. |
|
I have debian aufs machines, actually the core machine is debian aufs.
|
|
@jfrazelle debian should use the sysvinit-debian script, which is not affected by this change, right? |
|
Not debian jessie ;)
|
|
As much as I would love to be the only debian jessie user running aufs I do
|
|
Also what about arch linux users with aufs
|
|
oh noes 🔥 |
|
@vbatts weird, https://gist.github.com/vbatts/7257da4823c8bc8c61c1 works on my aufs system. |
|
@dqminh oh good. is that on a running or stopped container? Perhaps this is the start of a contrib tool for debugging containers in their namespace. |
|
@vbatts it's on a running container. If the container is stopped, then |
|
@dqminh Now I am confused. Does your version of aufs differ from @jfrazelle 's? |
|
Running containers, we don't leave them mounted if they are stopped |
|
#10281 is up for debate |
http://blog.hashbangbash.com/2014/11/docker-devicemapper-fix-for-device-or-resource-busy-ebusy/ moby/moby#10225 Also, we'll need an extra option in the future - commented out now, uncomment when it's introduced (probably Docker 1.7.0).
|
@vbatts heh, looks like we forgot to push this over to sysvinit, upstart, and openrc too |
|
Yeah, I remember that we discussed it quite a bit and thought we'd found a solution to the |
|
"I want to die" that was the solution there
|
This change covers two of the contrib init scripts. The daemon ought to be invoked in its own mount namespace, unshared from the root mount namespace.
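One way to check whether a running daemon really is in its own mount namespace is to compare its `/proc/<pid>/ns/mnt` link with that of PID 1. This is a minimal sketch, not something from the PR; the function names are invented for illustration, and the `readlink` parameter exists only so the logic can be exercised without root (reading `/proc/1/ns/mnt` normally requires it).

```python
import os


def mount_ns(pid, readlink=os.readlink):
    """Return the mount namespace identifier of a pid, e.g. 'mnt:[4026531840]'."""
    return readlink(f"/proc/{pid}/ns/mnt")


def has_own_mount_ns(pid, readlink=os.readlink):
    """True if the pid's mount namespace differs from that of PID 1."""
    return mount_ns(pid, readlink) != mount_ns(1, readlink)


# Stand-in for /proc so the example runs anywhere; the pids and ids are made up.
fake_proc = {
    "/proc/1/ns/mnt": "mnt:[4026531840]",
    "/proc/1234/ns/mnt": "mnt:[4026532201]",
}
print(has_own_mount_ns(1234, fake_proc.__getitem__))
```

On a real host you would pass the daemon's pid and leave `readlink` at its default.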
The errors most often seen affect the devicemapper graph driver, and show up as EBUSY or "device or resource busy".
Others are likely to run into this too, as folks will have containers with mounts or even devices being mounted.
If any other pid outside of the docker daemon unshares its mount namespace, and that pid's mountinfo includes a mount reference of a container, then even when the container exits and docker attempts to remove it, the kernel will report that the path is busy, due to the mount reference held by the outside pid.
I've written about this here: http://blog.hashbangbash.com/2014/11/docker-devicemapper-fix-for-device-or-resource-busy-ebusy/
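To find which outside pids are holding such references, one can scan every process's `mountinfo` for mount points under the docker root. The sketch below is only an illustration under stated assumptions: the helper names `mount_points` and `holders` are hypothetical, `/var/lib/docker` is assumed as the root, and matching on the mount-point path is a heuristic, since a process in another namespace may see the same mount at a different path.

```python
import glob
import os


def mount_points(mountinfo_text):
    """Yield the mount point (fifth field) of every line of mountinfo text."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            yield fields[4]


def holders(path_prefix, proc="/proc"):
    """Map pid -> mount points under path_prefix found in that pid's mountinfo."""
    found = {}
    for info in glob.glob(os.path.join(proc, "[0-9]*", "mountinfo")):
        try:
            with open(info) as fh:
                hits = [m for m in mount_points(fh.read()) if m.startswith(path_prefix)]
        except OSError:
            continue  # the process exited, or we may not read its mountinfo
        if hits:
            found[int(info.split(os.sep)[-2])] = hits
    return found


if __name__ == "__main__":
    # Assumed docker root; adjust to the host being inspected.
    for pid, mounts in sorted(holders("/var/lib/docker").items()):
        print(pid, mounts)
```

Any pid listed other than the daemon and its containers is a candidate for the stale reference behind an EBUSY on removal.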
I did not include a commit for the init scripts with start-stop-daemon, as I would like them to get more testing first. See: (and for completeness, here is the change for ubuntu's upstart http://pastebin.com/D5MsVCUK)