
devicemapper: Wait for device removal if deferredRemoval=true and deferredDeletion=false #33877

Merged

merged 1 commit into moby:master on Jul 13, 2017

Conversation

7 participants
@rhvgoyal
Contributor

rhvgoyal commented Jun 29, 2017


There have been some cases where, after umount, a device can remain busy for a very
short duration. Maybe it's udev rules, maybe it is runc-related races,
or perhaps it is something else. We don't know yet.

If deferred removal is enabled but deferred deletion is not, then in the
case of "docker run -ti --rm fedora bash" the container will exit, the device
will be scheduled for deferred removal, and immediately afterwards a call will
come in to delete the device. That deletion can fail if the device is still busy
at that time.

A device can't be deleted unless it can be removed/deactivated first. The
only exception is when deferred deletion is on; in that case the graph driver
keeps track of the deleted device, returns success to the caller, and tries
the deletion again later.

Always make sure that device deactivation is synchronous when the device is
being deleted (except when deferred deletion is enabled).

This should also take care of small races where the device is busy for a short
duration while it is being deleted.

Signed-off-by: Vivek Goyal vgoyal@redhat.com
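
To make the intended behavior concrete, here is a minimal, self-contained Go sketch (hypothetical type and function names, not the actual moby code) of the rule the commit message describes: on the delete path, honor deferred removal only when deferred deletion is also enabled; otherwise deactivate synchronously before deleting.

```go
package main

import "fmt"

// deviceSet is a stand-in for the devicemapper graph driver's DeviceSet;
// only the two flags relevant to this change are modeled here.
type deviceSet struct {
	deferredRemove bool
	deferredDelete bool
}

// deleteDevice mirrors the decision described in the commit message:
// when the device is being deleted and deferred deletion is disabled,
// force a synchronous (non-deferred) removal so the delete that follows
// does not race with a pending deferred remove.
func (devices *deviceSet) deleteDevice(hash string) error {
	deferredRemove := devices.deferredRemove
	if !devices.deferredDelete {
		// Deletion has to be synchronous, so removal must be too;
		// the real code busy-waits until the device is free.
		deferredRemove = false
	}
	if err := devices.deactivateDevice(hash, deferredRemove); err != nil {
		return err
	}
	fmt.Println("DeleteDevice", hash)
	return nil
}

// deactivateDevice is a placeholder for the driver's deactivation step.
func (devices *deviceSet) deactivateDevice(hash string, deferred bool) error {
	if deferred {
		fmt.Println("RemoveDeviceDeferred", hash)
		return nil
	}
	fmt.Println("RemoveDevice (synchronous, waiting while busy)", hash)
	return nil
}

func main() {
	d := &deviceSet{deferredRemove: true, deferredDelete: false}
	_ = d.deleteDevice("0d510510...")
}
```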


@cyphar

Contributor

cyphar commented Jun 29, 2017

👍 This is what @vrothberg also suggested in #33846.

@cyphar

Contributor

cyphar commented Jun 29, 2017

LGTM (IANAM).

@rhvgoyal

Contributor

rhvgoyal commented Jun 29, 2017

@slp please give this a try and let me know if this fixes the issue you are facing or not.

@thaJeztah

Member

thaJeztah commented Jun 29, 2017

SGTM, but I'll wait for @cpuguy83 to have a look as well

Thanks for the well-written commit message ❤️

@slp


slp commented Jun 30, 2017

@rhvgoyal Just tried your patch. Same environment as before. I'm getting frequent errors like this:

Error response from daemon: driver "devicemapper" failed to remove root filesystem for 32857d293a7af65cbe69ccbbfc869d2cc94592178bf7d04aa1514d2960537b76: failed to remove device 0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4: devicemapper: Error running RemoveDevice dm_task_run failed

Accompanied by these others in dmesg:

[260515.276364] device-mapper: ioctl: unable to remove open device docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4
[260515.344597] XFS (dm-5): Unmounting Filesystem
[260515.376775] device-mapper: ioctl: unable to remove open device docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4
[260515.477380] device-mapper: ioctl: device doesn't appear to be in the dev hash table.

The problem is that, on container exit, a RemoveDeviceDeferred request is issued. Then, while processing the container delete request (remember, the container was started with "--rm"), a bunch of RemoveDevice calls are made by the devices.removeDevice retry loop. If the deferred removal succeeds first, the synchronous removal will fail, returning ENXIO.

These are the debug logs from dockerd, showing the two deactivateDevice calls for the same device:

DEBU[0173] devmapper: deactivateDevice START(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devicemapper: RemoveDeviceDeferred START(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devicemapper: RemoveDeviceDeferred END(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devmapper: deactivateDevice END(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devmapper: UnmountDevice END(hash=0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] Calling GET /v1.24/containers/32857d293a7af65cbe69ccbbfc869d2cc94592178bf7d04aa1514d2960537b76/json 
DEBU[0173] Calling DELETE /v1.24/containers/32857d293a7af65cbe69ccbbfc869d2cc94592178bf7d04aa1514d2960537b76?force=1&v=1 
DEBU[0173] devmapper: DeleteDevice START(hash=0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4 syncDelete=false) 
DEBU[0173] devmapper: deactivateDevice START(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devmapper: removeDevice START(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] sandbox set key processing took 79.304296ms for container bd8784ba446aa43a0bd35adf62ee412c34f473766de9250fe9be0f15669ed049 
DEBU[0174] devmapper: removeDevice END(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0174] devmapper: deactivateDevice END(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0174] devmapper: Error deactivating device: devicemapper: Error running RemoveDevice dm_task_run failed 
DEBU[0174] devmapper: DeleteDevice END(hash=0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4 syncDelete=false) 

I must mention that retrying the devicemapper.DeleteDevice call in devices.deleteTransaction doesn't suffer from this issue.
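
For reference, the alternative mentioned in the last sentence (retrying the delete itself) would look roughly like the sketch below. All names here (deleteDevice, errBusy) are local stand-ins, not the real docker/pkg/devicemapper symbols.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBusy stands in for the "device is busy" error the devicemapper
// wrapper surfaces while the thin device still has users.
var errBusy = errors.New("device is busy")

var attemptsSoFar int

// deleteDevice is a local stand-in for the thin-pool delete that
// devices.deleteTransaction would issue; here it is busy for two attempts.
func deleteDevice(deviceID int) error {
	attemptsSoFar++
	if attemptsSoFar < 3 {
		return errBusy
	}
	return nil
}

// deleteWithRetry sketches the workaround mentioned above: retry the delete
// a few times, pausing briefly whenever the device is transiently busy.
func deleteWithRetry(deviceID int, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = deleteDevice(deviceID); !errors.Is(err, errBusy) {
			return err // nil on success, or a non-busy error
		}
		time.Sleep(100 * time.Millisecond)
	}
	return fmt.Errorf("device %d still busy after %d attempts: %w", deviceID, maxAttempts, err)
}

func main() {
	fmt.Println(deleteWithRetry(42, 5)) // prints <nil> once the device frees up
}
```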

@cyphar

Contributor

cyphar commented Jun 30, 2017

Aha! @slp thanks for confirming that the issues we've been seeing on production boxes are the same as the ones you've had, except that ours are caused by not having deferred removal or deletion enabled. It is a shame this patch doesn't fix the problem, but luckily we've been debugging this class of problem for a few days now.

If you can apply #33845 and provide the debug logs (--storage-opt dm.libdm_log_level=7 --debug), that would be incredibly helpful.

I must mention that retrying the devicemapper.DeleteDevice call in devices.deleteTransaction doesn't suffer from this issue.

I'm not a maintainer, so I can't speak to what maintainers would prefer. But just because it works doesn't mean it's a proper fix to the problem, and I'm worried that patching the symptoms like this will make debugging future issues harder.

@slp


slp commented Jun 30, 2017

@cyphar

I'm not a maintainer, so I can't speak to what maintainers would prefer. But just because it works doesn't mean it's a proper fix to the problem, and I'm worried that patching the symptoms like this will make debugging future issues harder.

Both changes are just patching the symptoms. The actual problem, as we discussed in the other PR, is that while container A is trying to deactivate and remove a device, container B is (accidentally) keeping a temporary reference (one being dereferenced by Unmount) to a mount point from A's namespace, keeping the device busy.

The only significant difference between the two approaches is retrying RemoveDevice vs. retrying DeleteDevice. Why does the latter worry you but not the former?

@cyphar

Contributor

cyphar commented Jun 30, 2017

@slp

Actually, you're right; however, I would argue this patch fixes a semantic issue in deferred removal and deletion. If you have deferred removal but not deferred deletion, then when you want to delete something it doesn't make sense to defer the removal and then do a blocking delete operation. I'm under the impression that this is one of the things libdm doesn't like.

The only signifcant difference between both approaches is retrying RemoveDevice vs. retrying DeleteDevice. Why the latter worries you but the first one doesn't?

I wish we didn't do any retrying, but as I said above it's possible that the retrying changes are orthogonal to this semantic issue. I have a feeling that the final "solution" is going to be retrying though, despite its issues, simply due to the complexity of solving this properly.

@slp


slp commented Jun 30, 2017

@cyphar

Actually you're right, however I would argue this patch actually fixes a semantic issue in deferred removal and deletion. If you have deferred removal but not deferred deletion, then when you want to delete something it doesn't make sense to defer the removal and then do a blocking delete operation.

Well, I guess that depends on each user's point of view. If I'm running dockerd with "dm.use_deferred_removal=true" and "dm.use_deferred_deletion=false", I'd definitely expect it to always defer removals while performing deletions synchronously.

I'm under the impression that this is one of the things that libdm doesn't like.

What do you mean by that?

@cyphar

Contributor

cyphar commented Jun 30, 2017

I'd definitely expect it to always defer removals, while making deletions synchronously.

This patch doesn't change the fact that removals happen asynchronously, but if you have a delete operation that implies a removal, it doesn't make sense to defer the removal, because the delete has to wait for the removal to finish before it can complete synchronously. I'm not sure how a user could tell the difference between the two, because both will finish synchronously.

What do you mean by that?

From my understanding, if you have a deferred removal pending and then attempt to do a delete, the delete will fail because libdm sees that the device has not been removed. It's a stretch to say that "it doesn't like it", but I meant that it doesn't know internally that it should wait for the removal to complete and then synchronously carry out the deletion. But I could definitely be wrong in my understanding.

Wait for device removal if deferredRemoval=true and deferredDeletion=false

@rhvgoyal

Contributor

rhvgoyal commented Jun 30, 2017

@slp, OK, so it makes sense that if a device is already scheduled for deferred removal and we then try to remove it synchronously later, the removal will fail saying the device does not exist (as long as the deferred removal succeeds).

So I have updated the patch to not error out if the device is already gone, by checking for devicemapper.ErrEnxio. I have also updated the RemoveDevice and RemoveDeviceDeferred code to return devicemapper.ErrEnxio if the device is not present.

Please give it a try now.
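
A rough sketch of the adjusted behavior described above, again with local stand-in names (removeDevice, errEnxio) rather than the real package symbols: a synchronous removal that treats "device no longer exists" as success, since a pending deferred removal may already have taken the device away.

```go
package main

import (
	"errors"
	"fmt"
)

// errEnxio stands in for devicemapper.ErrEnxio: the device no longer
// exists, typically because a pending deferred removal already removed it.
var errEnxio = errors.New("no such device or address")

// removeDevice is a local stand-in for the synchronous libdm remove call.
func removeDevice(name string) error {
	// Simulate the case hit above: the deferred removal won the race,
	// so the synchronous removal finds nothing left to remove.
	return errEnxio
}

// deactivateDevice sketches the updated logic: a missing device is not an
// error on this path, because "already gone" is exactly the desired state.
func deactivateDevice(name string) error {
	if err := removeDevice(name); err != nil {
		if errors.Is(err, errEnxio) {
			return nil // already removed; nothing left to deactivate
		}
		return err
	}
	return nil
}

func main() {
	fmt.Println(deactivateDevice("docker-253:0-17074721-0d510510..."))
}
```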

@slp


slp commented Jul 1, 2017

@rhvgoyal Just tested in the same environment. Container deletion no longer fails, but messages like these are still being logged by the kernel:

[366626.745332] device-mapper: ioctl: unable to remove open device docker-253:0-17074721-6db24cc6a1617f7f934dd7f85109ce1403115f8a8f367723bdfde3fcf15cf9d0
[366626.845679] device-mapper: ioctl: device doesn't appear to be in the dev hash table.
[366626.870889] device-mapper: ioctl: remove_all left 6 open device(s)

In any case, this patch works around the main symptom, and can easily be backported to stable versions, so I'm happy with it.

Thanks Vivek!

@rhvgoyal

Contributor

rhvgoyal commented Jul 3, 2017

@slp, kernel messages about failing to remove the device will keep appearing, since we try to remove the device while it is busy, and also try to remove it again after it has already been removed by the deferred removal.

Glad that this patch finally fixes the issue you are seeing.

Not sure why CI is not making any progress on this.

@thaJeztah, will you be able to restart CI for this job?

@thaJeztah

Member

thaJeztah commented Jul 4, 2017

I restarted CI, thanks for the ping

@vrothberg


vrothberg commented Jul 5, 2017

03:26:46 re-exec error: exit status 1: output: BackupWrite [...] There is not enough space on the disk.

I am pretty sure that's not caused by this commit. @thaJeztah, @rhvgoyal, another CI run?

@rhvgoyal

Contributor

rhvgoyal commented Jul 5, 2017

Right, this does not look related to my patch. @thaJeztah another restart?

@rhvgoyal

Contributor

rhvgoyal commented Jul 7, 2017

@cpuguy83 PTAL

@cpuguy83

Contributor

cpuguy83 commented Jul 7, 2017

So this patch is basically saying that either both deferred deletion and deferred removal are enabled, or neither is.
There is already a check on startup that if the user requested deferred deletion, then deferred removal must be enabled.

The patch enforces the flip side of that without erroring out, is that right?

@rhvgoyal

Contributor

rhvgoyal commented Jul 7, 2017

@cpuguy83 No, this patch makes sure that a user can run with deferred removal enabled but deferred deletion disabled. So the graph driver will support the following three combinations:

  • both disabled
  • deferred removal enabled, deferred deletion disabled
  • both enabled

We don't support the fourth possibility, deferred removal disabled and deferred deletion enabled, and we already have a check for that.
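
The startup check referred to above might be sketched as follows (hypothetical field and function names, not the driver's actual initialization code): accept the three supported combinations and reject the fourth.

```go
package main

import (
	"errors"
	"fmt"
)

// config models just the two storage options relevant here
// (dm.use_deferred_removal and dm.use_deferred_deletion).
type config struct {
	deferredRemove bool
	deferredDelete bool
}

// validate allows the three supported combinations and rejects the fourth:
// deferred deletion enabled while deferred removal is disabled.
func (c config) validate() error {
	if c.deferredDelete && !c.deferredRemove {
		return errors.New("deferred deletion requires deferred removal to be enabled")
	}
	return nil
}

func main() {
	for _, c := range []config{
		{false, false}, // both disabled: supported
		{true, false},  // deferred removal only: supported (this PR's case)
		{true, true},   // both enabled: supported
		{false, true},  // deferred deletion only: rejected
	} {
		fmt.Printf("%+v -> %v\n", c, c.validate())
	}
}
```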

@rhvgoyal

Contributor

rhvgoyal commented Jul 7, 2017

@cpuguy83 BTW, I have a question about EnsureRemoveAll. Why do we try to do recursive unmounting? At least for the graph driver case it is not needed, as we are trying to remove a directory which is itself a mount point. So a umount(dir, MNT_DETACH) should be enough?

Is it for the case where the top-level directory is not a mount point? If so, we could probably check whether the directory being removed is a mount point and act accordingly.
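
The simplification suggested here could be sketched as follows (an illustration only, not EnsureRemoveAll itself; the mounted helper is a stand-in for a real mount-point check): if the directory itself is a mount point, a single lazy unmount with MNT_DETACH is enough before removing it.

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// mounted is a stand-in for a mount-point check (for example, parsing
// /proc/self/mountinfo); always returning true keeps the sketch short.
func mounted(dir string) (bool, error) {
	return true, nil
}

// removeMountedDir sketches the idea from the comment above: if the
// directory itself is a mount point, a single lazy (detached) unmount is
// enough; no recursive walk of submounts is attempted here.
func removeMountedDir(dir string) error {
	isMnt, err := mounted(dir)
	if err != nil {
		return err
	}
	if isMnt {
		// MNT_DETACH detaches the mount immediately and lets the kernel
		// clean it up once it is no longer busy.
		if err := unix.Unmount(dir, unix.MNT_DETACH); err != nil && err != unix.EINVAL {
			return fmt.Errorf("detaching %s: %w", dir, err)
		}
	}
	return os.Remove(dir)
}

func main() {
	fmt.Println(removeMountedDir("/var/lib/docker/devicemapper/mnt/example"))
}
```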

@cpuguy83

Contributor

cpuguy83 commented Jul 7, 2017

@rhvgoyal It indeed should be enough. Originally it wasn't a detached unmount. It may be something we can look at changing.

// rather busy wait for device removal to take care of these cases.
deferredRemove := devices.deferredRemove
if !devices.deferredDelete {
	deferredRemove = false
}

@cpuguy83

cpuguy83 Jul 7, 2017

Contributor

So here is what I was looking at.
We always disable deferred removal if deferred delete is disabled (this is where Remove() winds up).

@rhvgoyal

rhvgoyal Jul 7, 2017

Contributor

OK, yes. But we do this only if the container is being deleted. If the container is just exiting/stopping but not going away, then deferred removal will still work.

@cpuguy83

LGTM

@rhvgoyal

Contributor

rhvgoyal commented Jul 10, 2017

@thaJeztah Can we restart the CI? I don't think this failure is due to my patch. I am hoping this PR can be merged soon.

@rhvgoyal

Contributor

rhvgoyal commented Jul 12, 2017

Can we merge this PR now? It has acks and the build has passed.

@thaJeztah

LGTM

thaJeztah merged commit e04dbe5 into moby:master on Jul 13, 2017

6 checks passed

  • dco-signed: All commits are signed
  • experimental: Jenkins build Docker-PRs-experimental 35450 has succeeded
  • janky: Jenkins build Docker-PRs 44062 has succeeded
  • powerpc: Jenkins build Docker-PRs-powerpc 4433 has succeeded
  • windowsRS1: Jenkins build Docker-PRs-WoW-RS1 15567 has succeeded
  • z: Jenkins build Docker-PRs-s390x 4137 has succeeded
@thaJeztah

Member

thaJeztah commented Jul 13, 2017

Thanks!
