New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix EBUSY errors under overlayfs and v4.13+ kernels #34948

Merged
merged 2 commits into from Nov 29, 2017

Conversation

@euank
Contributor

euank commented Sep 22, 2017

This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.

It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).

As of recently, overlayfs reacts to these mounts being leaked (see

One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.

- Description for the changelog

Fix upperdir in use warnings under overlayfs and v4.13+ kernels

- A picture of a cute animal (not mandatory but encouraged)

red panda

@euank euank requested a review from dmcgowan as a code owner Sep 22, 2017

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Sep 22, 2017

Contributor

cc @rhvgoyal as the author of #22069, which this reverts, and @tonistiigi as the author of #27609, which I think makes the change in #22069 unnecessary.

Contributor

euank commented Sep 22, 2017

cc @rhvgoyal as the author of #22069, which this reverts, and @tonistiigi as the author of #27609, which I think makes the change in #22069 unnecessary.

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Sep 25, 2017

Contributor

The powerpc failure is a flake. I don't know how to kick the jenkins to re-run

Contributor

euank commented Sep 25, 2017

The powerpc failure is a flake. I don't know how to kick the jenkins to re-run

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Sep 25, 2017

Member

@euank Looks like you forgot to sign-off the second commit 😅

also ping @dmcgowan @rhvgoyal @tonistiigi PTAL

Member

thaJeztah commented Sep 25, 2017

@euank Looks like you forgot to sign-off the second commit 😅

also ping @dmcgowan @rhvgoyal @tonistiigi PTAL

@GordonTheTurtle GordonTheTurtle removed the dco/no label Sep 25, 2017

@rhvgoyal

This comment has been minimized.

Show comment
Hide comment
@rhvgoyal

rhvgoyal Sep 26, 2017

Contributor

cc @amir73il

Ok, so kernel commit 2cac0c00a6cdcc9 started giving errors in case of mount point leaks. Trying to avoid mount point leaks completely will make sense only if mount propagation was synchronous. That is by the time unmount finished, all the propagations and unmounting must have finished before it. If not, then we still have races where a leaked mount has not been unmounted yet (because propagation has not happened yet) and we are trying to create a mount point using same upper directory. I don't know if unmount is synchronous or not.

Secondly, even if you make slave, we will still have the race because runc makes all mount points rprivate by default and leaves a small race window where your unmount will not propagate.

Making it slave/shared in a way makes it more likely that we will leak mount points. Think of an application just doing unshare -m and now all future mounts under /var/lib/docker/overlay2/ will propagate to this application.

I feel we can try to minimize mount point leaks but being able to take care of all corner cases will be very hard. So I am wondering if a solution should come from kernel side instead.

I am wondering will it make sense to convert kernel error message to a warning message? Or is there a way where kernel can figure out that we are essentially using same set of lower/upper/work directories and instead of instantiating a new super block, it re-uses existing super block (which is around due to busy mount). Amir will know much more, I am ccing him.

Contributor

rhvgoyal commented Sep 26, 2017

cc @amir73il

Ok, so kernel commit 2cac0c00a6cdcc9 started giving errors in case of mount point leaks. Trying to avoid mount point leaks completely will make sense only if mount propagation was synchronous. That is by the time unmount finished, all the propagations and unmounting must have finished before it. If not, then we still have races where a leaked mount has not been unmounted yet (because propagation has not happened yet) and we are trying to create a mount point using same upper directory. I don't know if unmount is synchronous or not.

Secondly, even if you make slave, we will still have the race because runc makes all mount points rprivate by default and leaves a small race window where your unmount will not propagate.

Making it slave/shared in a way makes it more likely that we will leak mount points. Think of an application just doing unshare -m and now all future mounts under /var/lib/docker/overlay2/ will propagate to this application.

I feel we can try to minimize mount point leaks but being able to take care of all corner cases will be very hard. So I am wondering if a solution should come from kernel side instead.

I am wondering will it make sense to convert kernel error message to a warning message? Or is there a way where kernel can figure out that we are essentially using same set of lower/upper/work directories and instead of instantiating a new super block, it re-uses existing super block (which is around due to busy mount). Amir will know much more, I am ccing him.

@amir73il

This comment has been minimized.

Show comment
Hide comment
@amir73il

amir73il Sep 26, 2017

Contributor

@rhvgoyal overlayfa is right to complain, so kernel warning is not good enough.
re-using the same super block sounds interesting, but I have no idea how hard it would be
or if that idea will be acceptable.

@tonistiigi The root cause of the problem IMO is the chroot() does not Unmount
all the tree under pivotDir.
You see making all mounts private in the new ns is just the first step.
The cloned overlay mounts, even though private are still in use by the new ns,
so they must be unmounted after pivot_root.
The Mount syscall below is not needed because all mounts in new ns are already private.
The Umount syscall below is wrong because it uses MNT_DETACH instead of
unmounting all tree below pivotDir synchronically.

// Make the pivotDir (where the old root lives) private so it can be unmounted without propagating to the host
if err := syscall.Mount("", pivotDir, "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
	return fmt.Errorf("Error making old root private after pivot: %v", err)
}

// Now unmount the old root so it's no longer visible from the new root
if err := syscall.Unmount(pivotDir, syscall.MNT_DETACH); err != nil {
	return fmt.Errorf("Error while unmounting old root after pivot: %v", err)
}
Contributor

amir73il commented Sep 26, 2017

@rhvgoyal overlayfa is right to complain, so kernel warning is not good enough.
re-using the same super block sounds interesting, but I have no idea how hard it would be
or if that idea will be acceptable.

@tonistiigi The root cause of the problem IMO is the chroot() does not Unmount
all the tree under pivotDir.
You see making all mounts private in the new ns is just the first step.
The cloned overlay mounts, even though private are still in use by the new ns,
so they must be unmounted after pivot_root.
The Mount syscall below is not needed because all mounts in new ns are already private.
The Umount syscall below is wrong because it uses MNT_DETACH instead of
unmounting all tree below pivotDir synchronically.

// Make the pivotDir (where the old root lives) private so it can be unmounted without propagating to the host
if err := syscall.Mount("", pivotDir, "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
	return fmt.Errorf("Error making old root private after pivot: %v", err)
}

// Now unmount the old root so it's no longer visible from the new root
if err := syscall.Unmount(pivotDir, syscall.MNT_DETACH); err != nil {
	return fmt.Errorf("Error while unmounting old root after pivot: %v", err)
}

@moby moby deleted a comment from GordonTheTurtle Sep 26, 2017

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Sep 26, 2017

Contributor

@rhvgoyal

Trying to avoid mount point leaks completely will make sense only if mount propagation was synchronous.

That's not true. I can reliably reproduce a mount point leak to runc (leaked for 100ms to 1s) without this patch. With this patch I can't. It therefore still makes sense to try and avoid it.
Just because we can't be perfect isn't a reason to do nothing.

I don't know if unmount is synchronous or not.

Because the code here uses MNT_DETACH as of #33638 we're opting in to async behavior either way. I think that PR is technically wrong, but that's something to address separate from this.
I do agree this code is still wrong in theory. I'm simply fixing the instance that is hit reliably in practice.

Secondly, even if you make slave, we will still have the race because runc makes all mount points rprivate by default.

False. runc makes the mountpoints rslave. See either my original issue I say this fixes and my discussion there or my pr against runc and the referenced defaulting code.

Making it slave/shared in a way makes it more likely that we will leak mount points. Think of an application just doing unshare -m and now all future mounts under /var/lib/docker/overlay2/ will propagate to this application.

I disagree. unshare -m by default also sets private on /, so what you say will happen won't. An unshare -m someone runs won't get future mounts either before or after this change. It will break things either way, but I'm trying to fix runc stomping on docker, not arbitrary external users stomping on docker. Before or after this change, external users that don't use rslave will break docker roughly equally badly I think.

If they do unshare -m --propagation slave then they will get new mounts, but the umounts from the host will also propagate, so it's not a big deal. That mountpoint isn't leaked unless the program inside of unshare -m then remounts it private.

I feel we can try to minimize mount point leaks but being able to take care of all corner cases will be very hard. So I am wondering if a solution should come from kernel side instead.

A better solution would be something like @cyphar suggests imo.
I think this current change is an incremental improvement that is necessary to fix real issues though.

Or is there a way where kernel can figure out that we are essentially using same set of lower/upper/work directories and instead of instantiating a new super block, it re-uses existing super block (which is around due to busy mount).

A good first step there would be to not do "mount umount mount umount mount" to start a container, but instead just mount it once and leave it mounted.
That's also a significantly more painful refactor to do when there's already bleeding.


@amir73il

The root cause of the problem IMO is the chroot() does not Unmount
all the tree under pivotDir.

Read my original issue (#34672) where I talk about the actual issue I'm trying to solve. There may be more issues, but the issue I'm trying to fix does not actually use any of that code so it's not closely related to what I see as the root problem.

I agree the things you pointed out are problems (though I think the solution is making it rslave not rprivate), but I don't think that's related to getting this merged or the specific issue I'm trying to fix.

Contributor

euank commented Sep 26, 2017

@rhvgoyal

Trying to avoid mount point leaks completely will make sense only if mount propagation was synchronous.

That's not true. I can reliably reproduce a mount point leak to runc (leaked for 100ms to 1s) without this patch. With this patch I can't. It therefore still makes sense to try and avoid it.
Just because we can't be perfect isn't a reason to do nothing.

I don't know if unmount is synchronous or not.

Because the code here uses MNT_DETACH as of #33638 we're opting in to async behavior either way. I think that PR is technically wrong, but that's something to address separate from this.
I do agree this code is still wrong in theory. I'm simply fixing the instance that is hit reliably in practice.

Secondly, even if you make slave, we will still have the race because runc makes all mount points rprivate by default.

False. runc makes the mountpoints rslave. See either my original issue I say this fixes and my discussion there or my pr against runc and the referenced defaulting code.

Making it slave/shared in a way makes it more likely that we will leak mount points. Think of an application just doing unshare -m and now all future mounts under /var/lib/docker/overlay2/ will propagate to this application.

I disagree. unshare -m by default also sets private on /, so what you say will happen won't. An unshare -m someone runs won't get future mounts either before or after this change. It will break things either way, but I'm trying to fix runc stomping on docker, not arbitrary external users stomping on docker. Before or after this change, external users that don't use rslave will break docker roughly equally badly I think.

If they do unshare -m --propagation slave then they will get new mounts, but the umounts from the host will also propagate, so it's not a big deal. That mountpoint isn't leaked unless the program inside of unshare -m then remounts it private.

I feel we can try to minimize mount point leaks but being able to take care of all corner cases will be very hard. So I am wondering if a solution should come from kernel side instead.

A better solution would be something like @cyphar suggests imo.
I think this current change is an incremental improvement that is necessary to fix real issues though.

Or is there a way where kernel can figure out that we are essentially using same set of lower/upper/work directories and instead of instantiating a new super block, it re-uses existing super block (which is around due to busy mount).

A good first step there would be to not do "mount umount mount umount mount" to start a container, but instead just mount it once and leave it mounted.
That's also a significantly more painful refactor to do when there's already bleeding.


@amir73il

The root cause of the problem IMO is the chroot() does not Unmount
all the tree under pivotDir.

Read my original issue (#34672) where I talk about the actual issue I'm trying to solve. There may be more issues, but the issue I'm trying to fix does not actually use any of that code so it's not closely related to what I see as the root problem.

I agree the things you pointed out are problems (though I think the solution is making it rslave not rprivate), but I don't think that's related to getting this merged or the specific issue I'm trying to fix.

@amir73il

This comment has been minimized.

Show comment
Hide comment
@amir73il

amir73il Sep 26, 2017

Contributor

Yeh I am confused of all the different issues and fixes. Anyway, the solution of rslave and lazy umount in cloned ns looks good.

Contributor

amir73il commented Sep 26, 2017

Yeh I am confused of all the different issues and fixes. Anyway, the solution of rslave and lazy umount in cloned ns looks good.

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Sep 26, 2017

Contributor

@amir73il

Yeh I am confused of all the different issues and fixes. Anyway, the solution of rslave and lazy umount in cloned ns looks good.

This PR is for fixing something else though. This is to fix the case where mounting /var/lib/overlay2 as private explicitly means that an unshare -m --propagation rslave, such as the one done by runc, will have a private reference to /var/lib/overlay2/$id/merged, meaning that the overlay2 driver leaks the mount until the program with a cloned mountns exits.

Are you willing to review this change, or do you have specific questions about this issue that isn't answered already by my comments in #34672?

Contributor

euank commented Sep 26, 2017

@amir73il

Yeh I am confused of all the different issues and fixes. Anyway, the solution of rslave and lazy umount in cloned ns looks good.

This PR is for fixing something else though. This is to fix the case where mounting /var/lib/overlay2 as private explicitly means that an unshare -m --propagation rslave, such as the one done by runc, will have a private reference to /var/lib/overlay2/$id/merged, meaning that the overlay2 driver leaks the mount until the program with a cloned mountns exits.

Are you willing to review this change, or do you have specific questions about this issue that isn't answered already by my comments in #34672?

@amir73il

This comment has been minimized.

Show comment
Hide comment
@amir73il

amir73il Sep 27, 2017

Contributor

I can only ACK the change that doesn't make home dir private. I don't know enough on the bug picture to say if var/lib/docker/overlay is always shared to begin with.

I suppose if makes little sense to make home dir private for other graph drivers? And the rslave change should be made to chroot() as well. Otherwise this fix is correct but partial. You shouldn't fix just the problem you see in front of you when you know there is a bigger issue to solve. But fine with me of this is the first step.

Contributor

amir73il commented Sep 27, 2017

I can only ACK the change that doesn't make home dir private. I don't know enough on the bug picture to say if var/lib/docker/overlay is always shared to begin with.

I suppose if makes little sense to make home dir private for other graph drivers? And the rslave change should be made to chroot() as well. Otherwise this fix is correct but partial. You shouldn't fix just the problem you see in front of you when you know there is a bigger issue to solve. But fine with me of this is the first step.

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Sep 27, 2017

Contributor

@amir73il I apologize if I came off a little roughly. The internet does make it hard to convey tone properly.

I can only ACK the change that doesn't make home dir private

Thanks! What I'm looking for here is making sure this change is in the right direction

... var/lib/docker/overlay is always shared to begin with.

I think it generally will be, especially in a systemd world. It's possible this should be swapping MakePrivate over to MakeShared, though I'm a little more wary about explicit shared than defaulting shared

I suppose if makes little sense to make home dir private for other graph drivers?

It might, but I'm not an expert. I do know there are some related issues floating around for devicemapper at the least.

And the rslave change should be made to chroot() as well

Yup, happy to look at that and changing the MNT_DETACH bit later. One step at a time.

Contributor

euank commented Sep 27, 2017

@amir73il I apologize if I came off a little roughly. The internet does make it hard to convey tone properly.

I can only ACK the change that doesn't make home dir private

Thanks! What I'm looking for here is making sure this change is in the right direction

... var/lib/docker/overlay is always shared to begin with.

I think it generally will be, especially in a systemd world. It's possible this should be swapping MakePrivate over to MakeShared, though I'm a little more wary about explicit shared than defaulting shared

I suppose if makes little sense to make home dir private for other graph drivers?

It might, but I'm not an expert. I do know there are some related issues floating around for devicemapper at the least.

And the rslave change should be made to chroot() as well

Yup, happy to look at that and changing the MNT_DETACH bit later. One step at a time.

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Sep 28, 2017

Contributor

Review bump for this change

Contributor

euank commented Sep 28, 2017

Review bump for this change

@rhvgoyal

This comment has been minimized.

Show comment
Hide comment
@rhvgoyal

rhvgoyal Sep 28, 2017

Contributor

@amir73il Thinking more about it, it feels like regression to me (from userspace point of view). Sure, we can try modifying user space to not leak mount points, but kernel upgrade will still break existing container runtime.

And making sure setup is completely right and none of the mounts point are leaking is hard.

So if I take a device /dev/foo and mount it on /mnt/test, leak the mount point, unmount it and mount again, it still works. It most likely gets same superblock.

IMHO, either we need to implement same behavior for overlay or reduce the level from error to warning.

And this is irrespective of docker changes. Sure try to reduce the amount of leaked mount points, that can only help. But it still feels like a regression.

Contributor

rhvgoyal commented Sep 28, 2017

@amir73il Thinking more about it, it feels like regression to me (from userspace point of view). Sure, we can try modifying user space to not leak mount points, but kernel upgrade will still break existing container runtime.

And making sure setup is completely right and none of the mounts point are leaking is hard.

So if I take a device /dev/foo and mount it on /mnt/test, leak the mount point, unmount it and mount again, it still works. It most likely gets same superblock.

IMHO, either we need to implement same behavior for overlay or reduce the level from error to warning.

And this is irrespective of docker changes. Sure try to reduce the amount of leaked mount points, that can only help. But it still feels like a regression.

@rhvgoyal

This comment has been minimized.

Show comment
Hide comment
@rhvgoyal

rhvgoyal Sep 28, 2017

Contributor

@amir73il In some ways it feels like DOS to me. If an overlay mount point has leaked somewhere, then root can not create a new overlay mount point and get to its data.

Terminal 1

mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/

Terminal 2

unshare -m

Terminal 1

umount merged
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/
mount: /root/overlay-testing/merged: none already mounted or mount point busy

Now root can't access its own data due to leaked mount point in some process mount namespace. And that sounds not desirable to me (given how easy it is to leak mount points).

Contributor

rhvgoyal commented Sep 28, 2017

@amir73il In some ways it feels like DOS to me. If an overlay mount point has leaked somewhere, then root can not create a new overlay mount point and get to its data.

Terminal 1

mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/

Terminal 2

unshare -m

Terminal 1

umount merged
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/
mount: /root/overlay-testing/merged: none already mounted or mount point busy

Now root can't access its own data due to leaked mount point in some process mount namespace. And that sounds not desirable to me (given how easy it is to leak mount points).

@amir73il

This comment has been minimized.

Show comment
Hide comment
@amir73il

amir73il Sep 29, 2017

Contributor

@rhvgoyal wrt regression you are definitely right. This should be fixed in stable kernels.
I will post a patch to make this new enforcement depend on new config option INDEX=y.

Contributor

amir73il commented Sep 29, 2017

@rhvgoyal wrt regression you are definitely right. This should be fixed in stable kernels.
I will post a patch to make this new enforcement depend on new config option INDEX=y.

@tonistiigi

This comment has been minimized.

Show comment
Hide comment
@tonistiigi
Member

tonistiigi commented Oct 2, 2017

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Oct 6, 2017

Contributor

In my limited testing, this patch indeed helps with EBUSY on /merged on removal, which I can reproduce easily on a RHEL 7.4 system with sysfs fs.may_detach_mounts = 0 and 100-200 short-lived containers starting/ending in parallel (docker run --rm busybox true).

This is not ultimate fix though as I still see occasional EBUSY on /shm removal (which is somewhat
harder to reproduce).

Contributor

kolyshkin commented Oct 6, 2017

In my limited testing, this patch indeed helps with EBUSY on /merged on removal, which I can reproduce easily on a RHEL 7.4 system with sysfs fs.may_detach_mounts = 0 and 100-200 short-lived containers starting/ending in parallel (docker run --rm busybox true).

This is not ultimate fix though as I still see occasional EBUSY on /shm removal (which is somewhat
harder to reproduce).

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Oct 12, 2017

Contributor

@kolyshkin For the reasons described in #34672, I also had to apply this runc patch to fix my issue: opencontainers/runc#1598

Is there a separate issue or bugzilla filed for the /shm ebusy you get with further details? It's possible that runc patch could help further, or that other people could help investigate.

Also, review ping! It seems like this PR has stalled out, but I don't think I've gotten any actionable requests / reasons why it's stalled.

@dmcgowan and @rhvgoyal, want to review this change?

Even if overlayfs lets us get away with leaking mounts, I don't think we should be and so IMO this should still be merged.

Contributor

euank commented Oct 12, 2017

@kolyshkin For the reasons described in #34672, I also had to apply this runc patch to fix my issue: opencontainers/runc#1598

Is there a separate issue or bugzilla filed for the /shm ebusy you get with further details? It's possible that runc patch could help further, or that other people could help investigate.

Also, review ping! It seems like this PR has stalled out, but I don't think I've gotten any actionable requests / reasons why it's stalled.

@dmcgowan and @rhvgoyal, want to review this change?

Even if overlayfs lets us get away with leaking mounts, I don't think we should be and so IMO this should still be merged.

@doits

This comment has been minimized.

Show comment
Hide comment
@doits

doits Oct 20, 2017

I can confirm that this patch (applied to docker 17.10) fixes issue #34672. +1 for merging it.

doits commented Oct 20, 2017

I can confirm that this patch (applied to docker 17.10) fixes issue #34672. +1 for merging it.

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Oct 20, 2017

Contributor

Why wouldn't we use rslave here?
Also wondering if #35217 is relevant (or correct) as well.

Contributor

cpuguy83 commented Oct 20, 2017

Why wouldn't we use rslave here?
Also wondering if #35217 is relevant (or correct) as well.

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Oct 22, 2017

Contributor

@cpuguy83 rslave doesn't fix the problem I'm trying to fix.
I've created a simplified model of the problem here.

In that model, I've currently got it using --make-rslave for the simulated /var/lib/docker/overlay2 directory.

On a 4.13 kernel, that will either fail, or on newer 4.13 it'll give a scary warning in dmesg and succeed.

However, on any kernel you can see that the mounts shown after "c2: still in runc init, current view of mounts" include the same c1 mount with the same mount ID, not a new mount with a different mount ID.

That's the leak I'm trying to squash.

(note: the below explanation might be wrong in some way; experimentally the above is true, but I wouldn't be surprised if I'm misunderstanding some intricacy of mount subtrees here)

Using rslave doesn't fix that leak because the leak is really two layers deep (runc has its own rslave subtree), and thus the umount sitll doesn't forward since slave mounts are only defined to receive from their master, not propagate as shared do.

If we switch it to explicitly be MakeShared vs Private or Rslave, that will work just fine. However, that's really what just skipping the --make-private does in practice. If you think it's cleaner to explicitly use MakeShared here, I could switch it to that rather than reverting the MakePrivate commit.

Contributor

euank commented Oct 22, 2017

@cpuguy83 rslave doesn't fix the problem I'm trying to fix.
I've created a simplified model of the problem here.

In that model, I've currently got it using --make-rslave for the simulated /var/lib/docker/overlay2 directory.

On a 4.13 kernel, that will either fail, or on newer 4.13 it'll give a scary warning in dmesg and succeed.

However, on any kernel you can see that the mounts shown after "c2: still in runc init, current view of mounts" include the same c1 mount with the same mount ID, not a new mount with a different mount ID.

That's the leak I'm trying to squash.

(note: the below explanation might be wrong in some way; experimentally the above is true, but I wouldn't be surprised if I'm misunderstanding some intricacy of mount subtrees here)

Using rslave doesn't fix that leak because the leak is really two layers deep (runc has its own rslave subtree), and thus the umount sitll doesn't forward since slave mounts are only defined to receive from their master, not propagate as shared do.

If we switch it to explicitly be MakeShared vs Private or Rslave, that will work just fine. However, that's really what just skipping the --make-private does in practice. If you think it's cleaner to explicitly use MakeShared here, I could switch it to that rather than reverting the MakePrivate commit.

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Nov 21, 2017

Contributor

Intentionally leaving the fixup! unsigned-off to be sure I go through and autosquash it before this merges.

I think I've addressed or replied to the review comments.

Contributor

euank commented Nov 21, 2017

Intentionally leaving the fixup! unsigned-off to be sure I go through and autosquash it before this merges.

I think I've addressed or replied to the review comments.

@kolyshkin

LGTM except for a tiny nitpick

Show outdated Hide outdated daemon/graphdriver/overlay/overlay.go Outdated
Show outdated Hide outdated daemon/graphdriver/overlay2/overlay.go Outdated
@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Nov 22, 2017

Contributor

Can you please rebase it?

Contributor

kolyshkin commented Nov 22, 2017

Can you please rebase it?

euank added some commits Sep 25, 2017

graphdriver/overlay: minor doc comment cleanup
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
graphdriver/overlay{,2}: remove 'merged' on umount
This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.

It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).

As of recently, overlayfs reacts to these mounts being leaked (see

One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.

Signed-off-by: Euan Kemp <euan.kemp@coreos.com>

@GordonTheTurtle GordonTheTurtle removed the dco/no label Nov 22, 2017

@euank

This comment has been minimized.

Show comment
Hide comment
@euank

euank Nov 22, 2017

Contributor

rebased, addressed nits, autosquashed

Contributor

euank commented Nov 22, 2017

rebased, addressed nits, autosquashed

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Nov 23, 2017

Contributor

From the Z run:

22:52:17 error loading frozen images: ApplyLayer exit status 1 stdout: stderr: unexpected EOF

Rebuilding.

Contributor

cpuguy83 commented Nov 23, 2017

From the Z run:

22:52:17 error loading frozen images: ApplyLayer exit status 1 stdout: stderr: unexpected EOF

Rebuilding.

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Nov 24, 2017

Contributor

LGTM.

Contributor

cyphar commented Nov 24, 2017

LGTM.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Nov 24, 2017

Member

ping @kolyshkin PTAL - nits were addressed

Member

thaJeztah commented Nov 24, 2017

ping @kolyshkin PTAL - nits were addressed

@cpuguy83

LGTM

@kolyshkin

This comment has been minimized.

Show comment
Hide comment
@kolyshkin

kolyshkin Nov 27, 2017

Contributor

LGTM (disclaimer: I'm not a maintainer)

Contributor

kolyshkin commented Nov 27, 2017

LGTM (disclaimer: I'm not a maintainer)

@thaJeztah

LGTM, thanks all for reviewing!

@AkihiroSuda AkihiroSuda merged commit 09eb7bc into moby:master Nov 29, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 37977 has succeeded
Details
janky Jenkins build Docker-PRs 46688 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 7101 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 18303 has succeeded
Details
z Jenkins build Docker-PRs-s390x 6928 has succeeded
Details
@doits

This comment has been minimized.

Show comment
Hide comment
@doits

doits Dec 10, 2017

Looks like this did not land in 17.11? I had to apply the patch manually, here's a slightly modified version which applies correctly on 17.11 in case anyone else needs this.

doits commented Dec 10, 2017

Looks like this did not land in 17.11? I had to apply the patch manually, here's a slightly modified version which applies correctly on 17.11 in case anyone else needs this.

doits added a commit to doits/docker-ce that referenced this pull request Dec 10, 2017

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 11, 2017

Member

No it's not part of 17.11, will be in 17.12

Member

thaJeztah commented Dec 11, 2017

No it's not part of 17.11, will be in 17.12

@euank euank deleted the euank:public-mounts branch Jan 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment