New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UPSTREAM: 19707: Fix race condition in cinder attach/detach #7108

Merged
merged 1 commit into from Feb 8, 2016

Conversation

Projects
None yet
5 participants
@jsafrane
Contributor

jsafrane commented Feb 8, 2016

Fixes BZ#1298507: Cinder volume is detached from running pod

Upstream commit 220163f67de05890dd57b2ed6f2b02362f633a78

Add a mutex to guard SetUpAt() and TearDownAt() calls - they should not
run in parallel. There is a race in these calls when there are two pods
using the same volume, one of them is dying and the other one starting.

TearDownAt() checks that a volume is not needed by any pods and detaches the
volume. It does so by counting how many times is the volume mounted
(GetMountRefs() call below).

When SetUpAt() of the starting pod already attached the volume and did not mount
it yet, TearDownAt() of the dying pod will detach it - GetMountRefs() does not
count with this volume.

These two threads run in parallel:

 dying pod.TearDownAt("myVolume")          starting pod.SetUpAt("myVolume")
   |                                       |
   |                                       AttachDisk("myVolume")
   refs, err := mount.GetMountRefs()       |
   Unmount("myDir")                        |
   if refs == 1 {                          |
   |  |                                    Mount("myVolume", "myDir")
   |  |                                    |
   |  DetachDisk("myVolume")               |
   |                                       start containers - OOPS! The volume is detached!
   |
   finish the pod cleanup

Also, add some logs to cinder plugin for easier debugging in the future, add
a test and update the fake mounter to know about bind mounts.

UPSTREAM: 19707: Fix race condition in cinder attach/detach
Fixes BZ#1298507: Cinder volume is detached from running pod

Upstream commit 220163f67de05890dd57b2ed6f2b02362f633a78

Add a mutex to guard SetUpAt() and TearDownAt() calls - they should not
run in parallel.  There is a race in these calls when there are two pods
using the same volume, one of them is dying and the other one starting.

TearDownAt() checks that a volume is not needed by any pods and detaches the
volume. It does so by counting how many times is the volume mounted
(GetMountRefs() call below).

When SetUpAt() of the starting pod already attached the volume and did not mount
it yet, TearDownAt() of the dying pod will detach it - GetMountRefs() does not
count with this volume.

These two threads run in parallel:

 dying pod.TearDownAt("myVolume")          starting pod.SetUpAt("myVolume")
   |                                       |
   |                                       AttachDisk("myVolume")
   refs, err := mount.GetMountRefs()       |
   Unmount("myDir")                        |
   if refs == 1 {                          |
   |  |                                    Mount("myVolume", "myDir")
   |  |                                    |
   |  DetachDisk("myVolume")               |
   |                                       start containers - OOPS! The volume is detached!
   |
   finish the pod cleanup


Also, add some logs to cinder plugin for easier debugging in the future, add
a test and update the fake mounter to know about bind mounts.

@jsafrane jsafrane changed the title from UPSTREAM: PR 19707: Fix race condition in cinder attach/detach to UPSTREAM: 19707: Fix race condition in cinder attach/detach Feb 8, 2016

@jsafrane

This comment has been minimized.

Show comment
Hide comment
@jsafrane
Contributor

jsafrane commented Feb 8, 2016

@deads2k

This comment has been minimized.

Show comment
Hide comment
@deads2k

deads2k Feb 8, 2016

Contributor

@markturansky please confirm that we need no changes in origin to enable or make use of this.

Contributor

deads2k commented Feb 8, 2016

@markturansky please confirm that we need no changes in origin to enable or make use of this.

@liggitt

This comment has been minimized.

Show comment
Hide comment
@liggitt

liggitt Feb 8, 2016

Contributor

commit title should just be UPSTREAM: 19707: ...

nm, already fixed.

Contributor

liggitt commented Feb 8, 2016

commit title should just be UPSTREAM: 19707: ...

nm, already fixed.

@markturansky

This comment has been minimized.

Show comment
Hide comment
@markturansky

markturansky Feb 8, 2016

Member

@deads2k confirmed, no Origin changes required.

Member

markturansky commented Feb 8, 2016

@deads2k confirmed, no Origin changes required.

@deads2k

This comment has been minimized.

Show comment
Hide comment
@deads2k

deads2k Feb 8, 2016

Contributor

[merge]

Contributor

deads2k commented Feb 8, 2016

[merge]

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
@openshift-bot

openshift-bot Feb 8, 2016

Member

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4887/) (Image: devenv-rhel7_3360)

Member

openshift-bot commented Feb 8, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4887/) (Image: devenv-rhel7_3360)

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
@openshift-bot

openshift-bot Feb 8, 2016

Member

Evaluated for origin merge up to fc8e89a

Member

openshift-bot commented Feb 8, 2016

Evaluated for origin merge up to fc8e89a

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
@openshift-bot

openshift-bot Feb 8, 2016

Member

[Test]ing while waiting on the merge queue

Member

openshift-bot commented Feb 8, 2016

[Test]ing while waiting on the merge queue

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
@openshift-bot

openshift-bot Feb 8, 2016

Member

Evaluated for origin test up to fc8e89a

Member

openshift-bot commented Feb 8, 2016

Evaluated for origin test up to fc8e89a

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
@openshift-bot

openshift-bot Feb 8, 2016

Member

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/911/)

Member

openshift-bot commented Feb 8, 2016

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/911/)

openshift-bot added a commit that referenced this pull request Feb 8, 2016

@openshift-bot openshift-bot merged commit 2d161b7 into openshift:master Feb 8, 2016

1 of 3 checks passed

continuous-integration/openshift-jenkins/test Failed
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/openshift-jenkins/merge Passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment