
Conversation

@bprotopopov
Contributor

@bprotopopov commented Jun 13, 2017

Signed-off-by: Boris Protopopov boris.protopopov@actifio.com

Description

I wanted to be more deliberate about using zv_state_lock to protect access to zvol_state structure members. The lock is now taken whenever members of the zvol_state structure are accessed or modified, and the relevant ASSERT()s are added to verify that the lock is held.
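As a rough illustration of the intended pattern (a sketch only; the helper below is hypothetical and not part of the diff):

/*
 * Hypothetical helper, for illustration: code that reads or modifies
 * zvol_state members asserts that zv_state_lock is held.
 */
static void
zvol_example_update_volsize(zvol_state_t *zv, uint64_t volsize)
{
        ASSERT(MUTEX_HELD(&zv->zv_state_lock));
        zv->zv_volsize = volsize;
}

/* ... and callers take the lock around the access: */
        mutex_enter(&zv->zv_state_lock);
        zvol_example_update_volsize(zv, volsize);
        mutex_exit(&zv->zv_state_lock);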

Motivation and Context

This code change clarifies the usage and intent of zv_state_lock and makes it easier to avoid potentially difficult-to-reproduce race conditions.

How Has This Been Tested?

Ran zfs-tests, did not see failures (some tests were skipped).

Ran the following to stress device-minor-related zvol code paths:
# create 10 zvols with snapshots
for i in $(seq 1 10); do zfs create -V1G -s -b 4k zvol_pool/zvol$i; done
for i in $(seq 1 10); do zfs snapshot zvol_pool/zvol${i}@s0; done
# flip snapdev for all zvols from visible to hidden and back 100 times in quick succession
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done
# add zvol renames to the mix
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done
# run the snapdev flips and the renames concurrently
for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done & for k in $(seq 1 10); do for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done &

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@mention-bot

@bprotopopov, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @tuxoko and @ryao to be potential reviewers.

@bprotopopov
Contributor Author

@behlendorf
I was not quite sure how to test the suspend/resume portion of things; in this version, I am holding zv_state_lock across suspend/resume, so I would like to make sure that's OK. I will investigate more.

@behlendorf
Contributor

@bprotopopov the best way to test suspend/resume will be doing a zfs receive or zfs rollback of a zvol. See commit 040dab9 for additional details.

@bprotopopov
Contributor Author

OK, thanks, @behlendorf, here is the output of the relevant tests from zfs-tests:

--- Configuration ---
Runfile:         /usr/share/zfs/runfiles/rerun.run
STF_TOOLS:       /usr/share/zfs/test-runner
STF_SUITE:       /usr/share/zfs/zfs-tests
STF_PATH:        /var/tmp/constrained_path.G9OE
FILEDIR:         /var/tmp
FILES:           /var/tmp/file-vdev0 /var/tmp/file-vdev1 /var/tmp/file-vdev2
LOOPBACKS:       /dev/loop0 /dev/loop1 /dev/loop2 
DISKS:           loop0 loop1 loop2 
NUM_DISKS:       3
FILESIZE:        4G
Keep pool(s):    rpool
Missing util(s): umask wait 

/usr/share/zfs/test-runner/bin/test-runner.py  -c /usr/share/zfs/runfiles/rerun.run -i /usr/share/zfs/zfs-tests
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/setup (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_001_pos (run as root) [00:01] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_002_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_003_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_005_neg (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_006_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_007_neg (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_008_pos (run as root) [00:03] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_009_neg (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_010_pos (run as root) [00:02] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_011_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_012_pos (run as root) [00:09] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_013_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_014_pos (run as root) [00:02] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_015_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/receive-o-x_props_override (run as root) [00:07] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/cleanup (run as root) [00:00] [PASS]

Results Summary
PASS	  17

Running Time:	00:00:32
Percent passed:	100.0%

Is this adequate, or is something more specific desirable?

@bprotopopov
Contributor Author

Ran snapdev flips from hidden to visible concurrently with renames:

[root@centos-6 build]# for i in $(seq 1 100); do echo Test $i; zfs set snapdev=hidden zvol_pool; zfs set snapdev=visible zvol_pool; done & for k in $(seq 1 10); do for j in $(seq 1 10); do zfs rename zvol_pool/zvol$j zvol_pool/zvol-1-$j; done; for j in $(seq 1 10); do zfs rename zvol_pool/zvol-1-$j zvol_pool/zvol$j; done; done &

No issues were found.

@bprotopopov
Contributor Author

bprotopopov commented Jun 14, 2017

Hi, @behlendorf
while testing these changes, I found the following warning shown at pool import time:

Jun 14 09:41:25 centos-6 kernel: ------------[ cut here ]------------
Jun 14 09:41:25 centos-6 kernel: WARNING: at include/linux/blkdev.h:509 zvol_create_minor_impl+0x898/0x8b0 [zfs]() (Tainted: P        W  -- ------------   )
Jun 14 09:41:25 centos-6 kernel: Hardware name: VirtualBox
Jun 14 09:41:25 centos-6 kernel: Modules linked in: zfs(P)(U) zcommon(P)(U) znvpair(P)(U) icp(P)(U) zavl(P)(U) zunicode(P)(U) splat(U) spl(U) zlib_deflate ext2 joydev ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc 8021q garp scsi_tgt stp llc vboxsf(U) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun uinput microcode sg i2c_piix4 i2c_core vboxguest(U) e1000 snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zunicode]
Jun 14 09:41:25 centos-6 kernel: Pid: 4431, comm: z_zvol Tainted: P        W  -- ------------    2.6.32-642.13.1.el6.x86_64 #1
Jun 14 09:41:25 centos-6 kernel: Call Trace:
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8107c6f1>] ? warn_slowpath_common+0x91/0xe0
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8107c75a>] ? warn_slowpath_null+0x1a/0x20
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa0b5bd68>] ? zvol_create_minor_impl+0x898/0x8b0 [zfs]
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa0b5eb09>] ? zvol_task_cb+0x199/0x5d0 [zfs]
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa0979854>] ? taskq_thread+0x294/0x590 [spl]
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
Jun 14 09:41:25 centos-6 kernel: [<ffffffffa09795c0>] ? taskq_thread+0x0/0x590 [spl]
Jun 14 09:41:25 centos-6 kernel: [<ffffffff810a640e>] ? kthread+0x9e/0xc0
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Jun 14 09:41:25 centos-6 kernel: [<ffffffff810a6370>] ? kthread+0x0/0xc0
Jun 14 09:41:25 centos-6 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Jun 14 09:41:25 centos-6 kernel: ---[ end trace 62059cb9cd373384 ]---

I believe it is caused by

static zvol_state_t *
zvol_alloc(dev_t dev, const char *name)
{
        zvol_state_t *zv;
...
        /* Limit read-ahead to a single page to prevent over-prefetching. */
        blk_queue_set_read_ahead(zv->zv_queue, 1);

        /* Disable write merging in favor of the ZIO pipeline. */
        queue_flag_set(QUEUE_FLAG_NOMERGES, zv->zv_queue);
...

instead of

        /* Disable write merging in favor of the ZIO pipeline. */
        queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, zv->zv_queue);

I can add this tweak to this pull request.
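For context, the warning fires because queue_flag_set() in that kernel checks that the queue lock is held; approximately (paraphrased from the RHEL 6 / 2.6.32 include/linux/blkdev.h, so treat the exact wording as an assumption):

static inline void queue_flag_set(unsigned int flag, struct request_queue *q)
{
        WARN_ON_ONCE(!queue_is_locked(q));      /* the blkdev.h:509 warning */
        __set_bit(flag, &q->queue_flags);
}

static inline void queue_flag_set_unlocked(unsigned int flag,
                                           struct request_queue *q)
{
        __set_bit(flag, &q->queue_flags);
}

Since zvol_alloc() configures the queue before the gendisk is added and visible to other threads, setting the flag without holding q->queue_lock should be safe, hence the _unlocked variant.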

@bprotopopov
Contributor Author

Ran a few full/incremental send/recv for zvols, worked fine.

@behlendorf
Contributor

@bprotopopov thanks, this cleanup is looking good. I'd like to merge #6213 after the buildbot finishes with it in a few hours. Then if you could rebase this change on the updated master and resolve the conflicts we'll get your additional cleanup in. Please go ahead and include the queue_flag_set_unlocked() change in your updated PR, nice find on that.

@bprotopopov
Contributor Author

force-push to crank the builds/tests

behlendorf pushed a commit that referenced this pull request Jun 15, 2017
Use queue_flag_set_unlocked() in zvol_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Issue #6226
@behlendorf
Contributor

@bprotopopov I've merged #6228 to master and cherry-picked your tiny 2ae41fc fix. I know you just refreshed this but please go ahead and rebase this additional cleanup on master and resolve the conflict so we can get this reviewed too.

@bprotopopov
Contributor Author

np @behlendorf

@bprotopopov
Contributor Author

Hi, @tuxoko
I wonder if you could weigh in on the cleanup I did, specifically, on the following aspects:

  • I am holding zv_state_lock mutex across suspend/resume; this did not result in any issues in my testing, but I am not sure if I have covered all the angles
  • I have replaced the use of zv_suspend_lock to protect zv_name with zv_state_lock, since the latter protects all members of zvol_state; also, in a few places where zv_suspend_lock is taken, zv_state_lock needs to be taken within the same (or an enclosing) scope

Please let me know if you have any suggestions.

Contributor

@behlendorf left a comment

My only concern is with holding zv_state_lock over calls to zvol_suspend()/zvol_resume(). That will effectively block zvol_find_by_* lookups while any zv is suspended doing dmu_recv_end. We should continue to drop the lock in zvol_suspend() and reacquire it in zvol_resume() to avoid this.
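Roughly the contention in question (illustrative sketch only, not code from this change):

/* Thread A: zfs receive/rollback into a zvol */
mutex_enter(&zv->zv_state_lock);
/* ... zv suspended; dmu_recv_end() may run for a long time ... */
mutex_exit(&zv->zv_state_lock);

/* Thread B: any zvol_find_by_dev()/zvol_find_by_name() caller */
mutex_enter(&zv->zv_state_lock);        /* blocked until the receive completes */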

zv = list_next(&zvol_state_list, zv)) {
mutex_enter(&zv->zv_state_lock);
if (zv->zv_dev == dev)
/* return with zv_state_lock taken */
Contributor

nit: I'd suggest we move this comment in to the block comment above the function and clearly describe the expected behavior of returning with the lock held when a match is found. The same for zvol_find_by_name_hash and zvol_find_by_name.
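For instance, something along these lines (wording illustrative):

/*
 * Find a zvol_state_t by dev_t.  On success the matching zvol_state_t is
 * returned with its zv_state_lock held; the caller is responsible for
 * dropping the lock.  NULL is returned (with no locks held) if no match
 * is found.
 */
static zvol_state_t *
zvol_find_by_dev(dev_t dev)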

Contributor Author

Sure, makes sense.

Contributor Author

Addressed.

@bprotopopov
Contributor Author

Hi, @behlendorf

We can't re-acquire the lock because that would amount to acquiring zv_state_lock and zv_suspend_lock out of order, causing deadlocks that I have observed earlier when testing my original commit. At that time, I chose not to acquire the lock in zvol_resume(). I'll think of other options.

@bprotopopov
Contributor Author

@behlendorf
I think I will try to change the order of acquisition of zv_suspend_lock and zv_state_lock to take zv_suspend_lock first, and then zv_state_lock could be dropped and re-acquired across suspend/resume.
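A sketch of the intended ordering (illustrative only; it assumes zv_suspend_lock remains the existing rwlock):

rw_enter(&zv->zv_suspend_lock, RW_READER);      /* taken first */
mutex_enter(&zv->zv_state_lock);
/* ... inspect/update zvol_state members ... */
mutex_exit(&zv->zv_state_lock);

/* long-running suspended section holds only zv_suspend_lock */

mutex_enter(&zv->zv_state_lock);                /* safe to re-take: no inversion */
/* ... */
mutex_exit(&zv->zv_state_lock);
rw_exit(&zv->zv_suspend_lock);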

@bprotopopov force-pushed the zv-state-lock branch 4 times, most recently from e96f3eb to ebfe7e9 on June 20, 2017 at 13:49
@bprotopopov
Contributor Author

Rebased against master and tweaked bqueue_dequeue().

@bprotopopov
Contributor Author

tweaked the comments

@bprotopopov
Contributor Author

Addressed review comments and checked the test failures; they do not seem to be related to this change, and some appear to be caused by I/O errors on the pool devices.

 	q->bq_size -= item_size;
-	mutex_exit(&q->bq_lock);
 	cv_signal(&q->bq_add_cv);
+	mutex_exit(&q->bq_lock);
Contributor

Why did you need to move the cv_signal() under the lock?

@bprotopopov
Contributor Author

bprotopopov commented Jun 21, 2017 via email

@behlendorf
Contributor

behlendorf commented Jun 22, 2017

I came to the conclusion that the bqueue_dequeue() function might not be delivering the signal properly.

We've seen the same symptom in #5887 and #4486 but haven't had a chance to investigate it. It occurred frequently enough that we disabled the rsend_009_pos test case until it was resolved. It sounds like you were able to see this fairly frequently and moving the cv_signal() under the lock addressed it. If that's the case, could you move this unrelated fix into its own PR and re-enable the rsend_009_pos test case? That will make it easier to merge and verify.
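For reference, the tail of bqueue_dequeue() after the tweak looks roughly like this (sketch only); the intent of the change is to issue the wakeup while bq_lock is still held:

        q->bq_size -= item_size;
        cv_signal(&q->bq_add_cv);       /* signal the enqueuer before dropping bq_lock */
        mutex_exit(&q->bq_lock);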

The proposed locking changes here look reasonable but it would be great if you could squash them to make one last review round easier. We should also get @tuxoko's feedback on this.

@bprotopopov
Contributor Author

@behlendorf sure sounds good

@bprotopopov
Contributor Author

Seems like I/O errors on the VM storage are interfering with the test runs?

Use zv_state_lock to protect all members of zvol_state structure, add
relevant ASSERT()s. Take zv_suspend_lock before zv_state_lock, do not
hold zv_state_lock across suspend/resume.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
@bprotopopov
Contributor Author

kicked off another run of tests
