Fix a race condition in add SCSI workflow #1483

Merged: 1 commit, Aug 16, 2022

@ambarve ambarve commented Aug 16, 2022

addSCSI currently holds a mutex only while checking whether a disk is already attached to the UVM. No mutex
is held while the disk is actually being attached. As a result, if two goroutines try to add the same SCSI
disk to a UVM at the same time, one of them sees that the disk is not yet attached, adds an entry to the
controller/LUN map, and continues with the attach process. The other goroutine sees the entry in the map and
returns, assuming that the SCSI disk is already attached to the UVM. At this point the attach operation from
the first goroutine is still in progress, so if the second goroutine tries to use that disk inside the UVM,
it fails with cryptic errors from overlayfs (or whichever other guest component tries to use the disk).

To get around this problem, each SCSIMount struct now includes a channel. Every goroutine other than the
very first one that adds the disk must wait on this channel until the mounting of that SCSI disk is
complete. Only the very first goroutine, the one that adds the disk, should close the channel.

Signed-off-by: Amit Barve <ambarve@microsoft.com>

@ambarve ambarve requested a review from a team as a code owner August 16, 2022 04:42
@dcantah dcantah self-assigned this Aug 16, 2022
@helsaawy helsaawy self-assigned this Aug 16, 2022
@ambarve ambarve merged commit 09cb211 into microsoft:main Aug 16, 2022
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
4 participants