Refine use of zv_state_lock. #6226

Conversation
@bprotopopov, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @tuxoko and @ryao to be potential reviewers.

@behlendorf

@bprotopopov the best way to test suspend/resume will be doing a

OK, thanks, @behlendorf, here is the output of the relevant tests from zfs-tests: is this adequate, or is something special desirable?

Ran flips from hidden to visible concurrent with renames: no issues were found.

Hi, @behlendorf, I believe it is caused by instead of I can add this tweak to this pull request.

Ran a few full/incremental send/recv for zvols, worked fine.

@bprotopopov thanks, this cleanup is looking good. I'd like to merge #6213 after the buildbot finishes with it in a few hours. Then if you could rebase this change on the updated master and resolve the conflicts, we'll get your additional cleanup in. Please go ahead and include the
force-pushed from 70be7ad to 2ae41fc
force-push to crank the builds/tests

Use queue_flag_set_unlocked() in zvol_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Issue #6226
@bprotopopov I've merged #6228 to master and cherry-picked your tiny 2ae41fc fix. I know you just refreshed this, but please go ahead and rebase this additional cleanup on master and resolve the conflict so we can get this reviewed too.

np @behlendorf
force-pushed from 2ae41fc to 12ab968
Hi, @tuxoko
Please let me know if you have any suggestions.
My only concern is with holding zv_state_lock over calls to zvol_suspend()/zvol_resume(). That will effectively block zvol_find_by_* lookups while any zv is suspended doing dmu_recv_end. We should continue to drop the lock in zvol_suspend() and reacquire it in zvol_resume() to avoid this.
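To make the concern concrete, here is a minimal sketch of the suggested drop-and-reacquire pattern (hypothetical code, not the PR's actual diff; the helper name is made up and the zvol_suspend()/zvol_resume() calls are reduced to a comment):

/*
 * Hypothetical sketch: drop zv_state_lock around the suspend window so
 * that concurrent zvol_find_by_*() lookups are not blocked while this
 * zvol is suspended in dmu_recv_end().
 */
static void
zvol_suspend_window_sketch(zvol_state_t *zv)
{
	ASSERT(MUTEX_HELD(&zv->zv_state_lock));

	mutex_exit(&zv->zv_state_lock);		/* drop before suspending */

	/* zvol_suspend(), dmu_recv_end(), and zvol_resume() run here */

	mutex_enter(&zv->zv_state_lock);	/* reacquire for the caller */
}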
module/zfs/zvol.c (outdated)

	    zv = list_next(&zvol_state_list, zv)) {
		mutex_enter(&zv->zv_state_lock);
		if (zv->zv_dev == dev)
			/* return with zv_state_lock taken */
nit: I'd suggest we move this comment in to the block comment above the function and clearly describe the expected behavior of returning with the lock held when a match is found. The same for zvol_find_by_name_hash and zvol_find_by_name.
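Roughly what that would look like, as a hypothetical sketch of zvol_find_by_dev() with the suggested block comment (based on the outdated diff context above; surrounding details such as the global zvol_state_lock assertion are assumptions, not the actual patch):

/*
 * Find a zvol_state_t by device number.
 *
 * On success the matching zv is returned with zv->zv_state_lock held;
 * the caller is responsible for dropping it.  Returns NULL, with no
 * per-zvol lock held, if no match is found.
 */
static zvol_state_t *
zvol_find_by_dev(dev_t dev)
{
	zvol_state_t *zv;

	ASSERT(MUTEX_HELD(&zvol_state_lock));
	for (zv = list_head(&zvol_state_list); zv != NULL;
	    zv = list_next(&zvol_state_list, zv)) {
		mutex_enter(&zv->zv_state_lock);
		if (zv->zv_dev == dev)
			return (zv);	/* return with zv_state_lock taken */
		mutex_exit(&zv->zv_state_lock);
	}

	return (NULL);
}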
Sure, makes sense.
Addressed.
Hi, @behlendorf, we can't re-acquire the lock because this will amount to acquiring

@behlendorf
force-pushed from e96f3eb to ebfe7e9
rebased against the master, tweaked
force-pushed from ebfe7e9 to 492aeff
tweaked the comments

addressed review comments; checked the test failures, which do not seem to be related, and some appear to be caused by I/O errors on the pool devices
force-pushed from 492aeff to b7fcade
module/zfs/bqueue.c

 	q->bq_size -= item_size;
-	mutex_exit(&q->bq_lock);
 	cv_signal(&q->bq_add_cv);
+	mutex_exit(&q->bq_lock);
Why did you need to move the cv_signal() under the lock?
I did this in response to an intermittent hang in the receive code that was blocked in bqueue_enqueue(). After reviewing the code, I came to the conclusion that the bqueue_dequeue() function might not be delivering the signal properly.
I am pretty sure this is the standard idiom for using condition variables, designed to deliver the signal exactly once. Also, the bqueue_enqueue() function does it this way.
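For illustration, a simplified sketch of that idiom (not the actual bqueue.c implementation: only bq_lock, bq_size, and bq_add_cv appear in the diff above; the other names and the elided node bookkeeping are assumptions). The consumer signals the producer's condition variable before releasing bq_lock, so the size update and the wakeup are atomic with respect to the waiter:

/*
 * Simplified sketch of a dequeue path: the signal is issued while
 * bq_lock is still held, matching the standard condition-variable idiom
 * and what bqueue_enqueue() already does.
 */
void *
bqueue_dequeue_sketch(bqueue_t *q, uint64_t item_size)
{
	void *ret;

	mutex_enter(&q->bq_lock);
	while (q->bq_size == 0)
		cv_wait(&q->bq_pop_cv, &q->bq_lock);	/* wait for data */

	ret = list_remove_head(&q->bq_list);
	q->bq_size -= item_size;
	cv_signal(&q->bq_add_cv);	/* wake a blocked producer... */
	mutex_exit(&q->bq_lock);	/* ...then drop the lock */

	return (ret);
}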
We've seen the same symptom in #5887 and #4486 but haven't had a chance to investigate it. It occurred frequently enough that we disabled the
The proposed locking changes here look reasonable, but it would be great if you could squash them to make one last review round easier. We should also get @tuxoko's feedback on this.
@behlendorf sure, sounds good
force-pushed from b7fcade to 710365d
Seems like I/O errors on VM storage are interfering with test runs?
Use zv_state_lock to protect all members of zvol_state structure, add relevant ASSERT()s. Take zv_suspend_lock before zv_state_lock, do not hold zv_state_lock across suspend/resume.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
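A small sketch of the ordering that commit message describes (illustrative only; it assumes zv_suspend_lock is the rwlock used elsewhere in zvol.c and shows no real call site):

/*
 * Illustrative lock ordering: zv_suspend_lock is taken before
 * zv_state_lock, and zv_state_lock is not held across the actual
 * suspend/resume window.
 */
rw_enter(&zv->zv_suspend_lock, RW_READER);	/* 1st: suspend lock */
mutex_enter(&zv->zv_state_lock);		/* 2nd: per-zvol state lock */

/* ... inspect or modify zvol_state members ... */

mutex_exit(&zv->zv_state_lock);			/* release in reverse order */
rw_exit(&zv->zv_suspend_lock);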
force-pushed from 710365d to a782f1c
kicked off another run of tests
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Description
I wanted to be more deliberate in using zv_state_lock to protect access to zvol_state structure members. The lock is now taken whenever members of the zvol_state structure are accessed or modified, and relevant ASSERT()s have been added to verify that the lock is held.
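As a hypothetical illustration of that pattern (these helpers are not from the patch; zv_volsize and zv_open_count merely stand in for arbitrary zvol_state members):

/* Writers take zv_state_lock around any update of zvol_state members. */
static void
zvol_example_set_volsize(zvol_state_t *zv, uint64_t volsize)
{
	mutex_enter(&zv->zv_state_lock);
	zv->zv_volsize = volsize;
	mutex_exit(&zv->zv_state_lock);
}

/* Readers assert that the caller already holds zv_state_lock. */
static uint64_t
zvol_example_open_count(zvol_state_t *zv)
{
	ASSERT(MUTEX_HELD(&zv->zv_state_lock));
	return (zv->zv_open_count);
}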
Motivation and Context
This change clarifies the usage and intent of zv_state_lock and makes it easier to avoid potentially difficult-to-reproduce race conditions.
How Has This Been Tested?
Ran zfs-tests, did not see failures (some tests were skipped).
Ran the following to stress device-minor-related zvol code paths:

# create 10 zvols with snapshots
for i in $(seq 1 10); do zfs create -V1G -s -b 4k zvol_pool/zvol$i; done
for i in $(seq 1 10); do zfs snapshot zvol_pool/zvol${i}@s0; done

# flip snapdev for all zvols from visible to hidden and back 100 times in quick succession
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done

# add zvol renames to the mix
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done

# run same concurrently
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done & for k in $(seq 1 10); do for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done &
Types of changes
Checklist:
Signed-off-by.