
Conversation

@bprotopopov
Contributor

@bprotopopov commented Jun 13, 2017

Signed-off-by: Boris Protopopov boris.protopopov@actifio.com

Description

I wanted to be more deliberate about using zv_state_lock to protect access to zvol_state structure members. The lock is now taken whenever members of the zvol_state structure are accessed or modified, and the relevant ASSERT()s are added to verify that the lock is held.
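As a rough illustration of the intended pattern (a sketch only; the helper below is hypothetical and not part of the diff):

/*
 * Hypothetical helper, for illustration: code that reads or modifies
 * zvol_state members asserts that zv_state_lock is held.
 */
static void
zvol_example_update_volsize(zvol_state_t *zv, uint64_t volsize)
{
        ASSERT(MUTEX_HELD(&zv->zv_state_lock));
        zv->zv_volsize = volsize;
}

/* ... and callers take the lock around the access: */
        mutex_enter(&zv->zv_state_lock);
        zvol_example_update_volsize(zv, volsize);
        mutex_exit(&zv->zv_state_lock);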

Motivation and Context

This code change clarifies the usage and intent of zv_state_lock and makes it easier to avoid potentially difficult-to-reproduce race conditions.

How Has This Been Tested?

Ran zfs-tests, did not see failures (some tests were skipped).

Ran the following to stress device-minor-related zvol code paths:
# create 10 zvols with snapshots
for i in $(seq 1 10); do zfs create -V1G -s -b 4k zvol_pool/zvol$i; done
for i in $(seq 1 10); do zfs snapshot zvol_pool/zvol${i}@s0; done
# flip snapdev for all zvols from visible to hidden and back 100 times in quick succession
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done
# add zvol renames to the mix
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done
# run the snapdev flips and the renames concurrently
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done & for k in $(seq 1 10); do for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done &

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@mention-bot

@bprotopopov, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @tuxoko and @ryao to be potential reviewers.

@bprotopopov
Contributor Author

@behlendorf
I was not quite sure how to test the suspend/resume portion of things; in this version, I am holding zv_state_lock across suspend/resume, so I would like to make sure that's OK. I will investigate more.

@behlendorf
Contributor

@bprotopopov the best way to test suspend/resume will be doing a zfs receive or zfs rollback of a zvol. See commit 040dab9 for additional details.

@bprotopopov
Contributor Author

OK, thanks, @behlendorf, here is the output of the relevant tests from zfs-tests:

--- Configuration ---
Runfile:         /usr/share/zfs/runfiles/rerun.run
STF_TOOLS:       /usr/share/zfs/test-runner
STF_SUITE:       /usr/share/zfs/zfs-tests
STF_PATH:        /var/tmp/constrained_path.G9OE
FILEDIR:         /var/tmp
FILES:           /var/tmp/file-vdev0 /var/tmp/file-vdev1 /var/tmp/file-vdev2
LOOPBACKS:       /dev/loop0 /dev/loop1 /dev/loop2 
DISKS:           loop0 loop1 loop2 
NUM_DISKS:       3
FILESIZE:        4G
Keep pool(s):    rpool
Missing util(s): umask wait 

/usr/share/zfs/test-runner/bin/test-runner.py  -c /usr/share/zfs/runfiles/rerun.run -i /usr/share/zfs/zfs-tests
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/setup (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_001_pos (run as root) [00:01] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_002_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_003_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_005_neg (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_006_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_007_neg (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_008_pos (run as root) [00:03] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_009_neg (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_010_pos (run as root) [00:02] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_011_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_012_pos (run as root) [00:09] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_013_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_014_pos (run as root) [00:02] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_015_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/receive-o-x_props_override (run as root) [00:07] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/cleanup (run as root) [00:00] [PASS]

Results Summary
PASS	  17

Running Time:	00:00:32
Percent passed:	100.0%

Is this adequate, or is something more specific desirable?

@bprotopopov
Contributor Author

Ran snapdev flips from hidden to visible concurrently with renames:

[root@centos-6 build]# for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done & for k in $(seq 1 10); do for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done &

No issues were found.

@bprotopopov
Contributor Author

bprotopopov commented Jun 14, 2017

Hi, @behlendorf
while testing these changes, I found the following warning shown at pool import time:

Jun 14 09:41:25 centos-6 kernel: ------------[ cut here ]------------
Jun 14 09:41:25 centos-6 kernel: WARNING: at include/linux/blkdev.h:509 zvol_create_minor_impl+0x898/0x8b0 [zfs]() (Tainted: P        W  -- ------------   )
Jun 14 09:41:25 centos-6 kernel: Hardware name: VirtualBox
Jun 14 09:41:25 centos-6 kernel: Modules linked in: zfs(P)(U) zcommon(P)(U) znvpair(P)(U) icp(P)(U) zavl(P)(U) zunicode(P)(U) splat(U) spl(U) zlib_deflate ext2 joydev ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc 8021q garp scsi_tgt stp llc vboxsf(U) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun uinput microcode sg i2c_piix4 i2c_core vboxguest(U) e1000 snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zunicode]
Jun 14 09:41:25 centos-6 kernel: Pid: 4431, comm: z_zvol Tainted: P        W  -- ------------    2.6.32-642.13.1.el6.x86_64 #1
Jun 14 09:41:25 centos-6 kernel: Call Trace:
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8107c6f1>] ? warn_slowpath_common+0x91/0xe0
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8107c75a>] ? warn_slowpath_null+0x1a/0x20
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa0b5bd68>] ? zvol_create_minor_impl+0x898/0x8b0 [zfs]
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa0b5eb09>] ? zvol_task_cb+0x199/0x5d0 [zfs]
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa0979854>] ? taskq_thread+0x294/0x590 [spl]
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa09795c0>] ? taskq_thread+0x0/0x590 [spl]
Jun 14 09:41:25 centos-6 kernel: [<ffffffff810a640e>] ? kthread+0x9e/0xc0
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Jun 14 09:41:25 centos-6 kernel: [<ffffffff810a6370>] ? kthread+0x0/0xc0
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Jun 14 09:41:25 centos-6 kernel: ---[ end trace 62059cb9cd373384 ]---

I believe it is caused by

static zvol_state_t *
zvol_alloc(dev_t dev, const char *name)
{
        zvol_state_t *zv;
...
        /* Limit read-ahead to a single page to prevent over-prefetching. */
        blk_queue_set_read_ahead(zv->zv_queue, 1);

        /* Disable write merging in favor of the ZIO pipeline. */
        queue_flag_set(QUEUE_FLAG_NOMERGES, zv->zv_queue);
...

instead of

        /* Disable write merging in favor of the ZIO pipeline. */
        queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, zv->zv_queue);

I can add this tweak to this pull request.
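For context, the warning fires because queue_flag_set() in that kernel checks that the queue lock is held; approximately (paraphrased from the RHEL 6 / 2.6.32 include/linux/blkdev.h, so treat the exact wording as an assumption):

static inline void queue_flag_set(unsigned int flag, struct request_queue *q)
{
        WARN_ON_ONCE(!queue_is_locked(q));      /* the blkdev.h:509 warning */
        __set_bit(flag, &q->queue_flags);
}

static inline void queue_flag_set_unlocked(unsigned int flag,
                                           struct request_queue *q)
{
        __set_bit(flag, &q->queue_flags);
}

Since zvol_alloc() configures the queue before the gendisk is added and visible to other threads, setting the flag without holding q->queue_lock should be safe, hence the _unlocked variant.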

@bprotopopov
Contributor Author

Ran a few full/incremental send/recv for zvols, worked fine.

@behlendorf
Contributor

@bprotopopov thanks, this cleanup is looking good. I'd like to merge #6213 after the buildbot finishes with it in a few hours. Then if you could rebase this change on the updated master and resolve the conflicts we'll get your additional cleanup in. Please go ahead and include the queue_flag_set_unlocked() change in your updated PR, nice find on that.

@bprotopopov
Contributor Author

force-push to crank the builds/tests

behlendorf pushed a commit that referenced this pull request Jun 15, 2017
Use queue_flag_set_unlocked() in zvol_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Issue #6226
@behlendorf
Contributor

@bprotopopov I've merged #6228 to master and cherry-picked your tiny 2ae41fc fix. I know you just refreshed this but please go ahead and rebase this additional cleanup on master and resolve the conflict so we can get this reviewed too.

@bprotopopov
Contributor Author

np @behlendorf

@bprotopopov
Contributor Author

Hi, @tuxoko
I wonder if you could weigh in on the cleanup I did, specifically, on the following aspects:

  • I am holding zv_state_lock mutex across suspend/resume; this did not result in any issues in my testing, but I am not sure if I have covered all the angles
  • I have replaced the use of zv_suspend_lock to protect zv_name with zv_state_lock, since the latter protects all members of zvol_state; also, in a few places where zv_suspend_lock is taken, zv_state_lock needs to be taken within the same (or an enclosing) scope

Please let me know if you have any suggestions.

Contributor

@behlendorf left a comment

My only concern is with holding zv_state_lock over calls to zvol_suspend()/zvol_resume(). That will effectively block zvol_find_by_* lookups while any zv is suspended doing dmu_recv_end. We should continue to drop the lock in zvol_suspend() and reacquire it in zvol_resume() to avoid this.
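Roughly the contention in question (illustrative sketch only, not code from this change):

/* Thread A: zfs receive/rollback into a zvol */
mutex_enter(&zv->zv_state_lock);
/* ... zv suspended; dmu_recv_end() may run for a long time ... */
mutex_exit(&zv->zv_state_lock);

/* Thread B: any zvol_find_by_dev()/zvol_find_by_name() caller */
mutex_enter(&zv->zv_state_lock);        /* blocked until the receive completes */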

zv = list_next(&zvol_state_list, zv)) {
mutex_enter(&zv->zv_state_lock);
if (zv->zv_dev == dev)
/* return with zv_state_lock taken */
Contributor

nit: I'd suggest we move this comment in to the block comment above the function and clearly describe the expected behavior of returning with the lock held when a match is found. The same for zvol_find_by_name_hash and zvol_find_by_name.
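For instance, something along these lines (wording illustrative):

/*
 * Find a zvol_state_t by dev_t.  On success the matching zvol_state_t is
 * returned with its zv_state_lock held; the caller is responsible for
 * dropping the lock.  NULL is returned (with no locks held) if no match
 * is found.
 */
static zvol_state_t *
zvol_find_by_dev(dev_t dev)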

Contributor Author

Sure, makes sense.

Contributor Author

Addressed.

@bprotopopov
Contributor Author

Hi, @behlendorf

We can't re-acquire the lock because that would amount to acquiring zv_state_lock and zv_suspend_lock out of order, causing deadlocks that I have observed earlier when testing my original commit. At that time, I chose not to acquire the lock in zvol_resume(). I'll think of other options.

@bprotopopov
Contributor Author

@behlendorf
I think I will try to change the order of acquisition of zv_suspend_lock and zv_state_lock to take zv_suspend_lock first, and then zv_state_lock could be dropped and re-acquired across suspend/resume.
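A sketch of the intended ordering (illustrative only; it assumes zv_suspend_lock remains the existing rwlock):

rw_enter(&zv->zv_suspend_lock, RW_READER);      /* taken first */
mutex_enter(&zv->zv_state_lock);
/* ... inspect/update zvol_state members ... */
mutex_exit(&zv->zv_state_lock);

/* long-running suspended section holds only zv_suspend_lock */

mutex_enter(&zv->zv_state_lock);                /* safe to re-take: no inversion */
/* ... */
mutex_exit(&zv->zv_state_lock);
rw_exit(&zv->zv_suspend_lock);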

@bprotopopov force-pushed the zv-state-lock branch 4 times, most recently from e96f3eb to ebfe7e9 on June 20, 2017 at 13:49
@bprotopopov
Contributor Author

Rebased against master and tweaked bqueue_dequeue().

@bprotopopov
Contributor Author

tweaked the comments

@bprotopopov
Contributor Author

Addressed review comments and checked the test failures; they do not seem to be related to this change, and some appear to be caused by I/O errors on the pool devices.

 	q->bq_size -= item_size;
-	mutex_exit(&q->bq_lock);
 	cv_signal(&q->bq_add_cv);
+	mutex_exit(&q->bq_lock);
Contributor

Why did you need to move the cv_signal() under the lock?

@bprotopopov
Contributor Author

bprotopopov commented Jun 21, 2017 via email

@behlendorf
Contributor

behlendorf commented Jun 22, 2017

I came to the conclusion that the bqueue_dequeue() function might not be delivering the signal properly.

We've seen the same symptom in #5887 and #4486 but haven't had a chance to investigate it. It occurred frequently enough that we disabled the rsend_009_pos test case until it was resolved. It sounds like you were able to see this fairly frequently and moving the cv_signal() under the lock addressed it. If that's the case, could you move this unrelated fix into its own PR and re-enable the rsend_009_pos test case? That will make it easier to merge and verify.
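For reference, the tail of bqueue_dequeue() after the tweak looks roughly like this (sketch only); the intent of the change is to issue the wakeup while bq_lock is still held:

        q->bq_size -= item_size;
        cv_signal(&q->bq_add_cv);       /* signal the enqueuer before dropping bq_lock */
        mutex_exit(&q->bq_lock);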

The proposed locking changes here look reasonable but it would be great if you could squash them to make one last review round easier. We should also get @tuxoko's feedback on this.

@bprotopopov
Contributor Author

@behlendorf sure sounds good

@bprotopopov
Contributor Author

Seems like I/O errors on the VM storage are interfering with the test runs?

Use zv_state_lock to protect all members of zvol_state structure, add
relevant ASSERT()s. Take zv_suspend_lock before zv_state_lock, do not
hold zv_state_lock across suspend/resume.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
@bprotopopov
Contributor Author

kicked off another run of tests
