zpool import hangs after spa_freeze() and zpool export #2088

Closed
utopiabound opened this issue Jan 30, 2014 · 6 comments

Comments

@utopiabound
Contributor

I've run into a problem trying to use Lustre with the tip of ZFS/SPL master.

Reproduction:
Run the Lustre test replay-single.sh, tests 14 and 15 only:

/usr/lib64/lustre/tests/auster -r -v -c /path/to/lustre-zfs-config.sh replay-single.sh --only 14,15

The mount of the MDS in test 15 hangs forever, and the following is eventually printed to /var/log/messages:

INFO: task spl_system_task:715 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
spl_system_ta D 0000000000000000     0   715      2 0x00000000
ffff88007bf6daa0 0000000000000046 0000000000000200 ffff88006a4e6000
ffff88007bf6da40 ffffffffa0283e99 ffff88007bf6da80 ffff88007d5c8900
ffff880037a65058 ffff88007bf6dfd8 000000000000fb88 ffff880037a65058
Call Trace:
[<ffffffffa0283e99>] ? zio_vdev_io_start+0x1c9/0x2c0 [zfs]
[<ffffffff81096fae>] ? prepare_to_wait_exclusive+0x4e/0x80
[<ffffffffa013e29d>] cv_wait_common+0xed/0x100 [spl]
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa01e68c7>] ? buf_hash_find+0x87/0x110 [zfs]
[<ffffffffa013e305>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa01ed20b>] arc_read+0xab/0x8e0 [zfs]
[<ffffffffa01eac10>] ? arc_getbuf_func+0x0/0x60 [zfs]
[<ffffffffa02047af>] traverse_visitbp+0x31f/0x6e0 [zfs]
[<ffffffffa0204be4>] traverse_dnode+0x74/0x100 [zfs]
[<ffffffffa0204a90>] traverse_visitbp+0x600/0x6e0 [zfs]
[<ffffffffa0204d07>] traverse_prefetch_thread+0x97/0xd0 [zfs]
[<ffffffffa02042a0>] ? traverse_prefetcher+0x0/0x150 [zfs]
[<ffffffffa013a227>] taskq_thread+0x1e7/0x3f0 [spl]
[<ffffffff81063990>] ? default_wake_function+0x0/0x20
[<ffffffffa013a040>] ? taskq_thread+0x0/0x3f0 [spl]
[<ffffffff81096a36>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810969a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
INFO: task zpool:15589 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool         D 0000000000000000     0 15589  15587 0x00000080
ffff880068d8b6f8 0000000000000082 0000000000000200 ffff88006a4e6000
ffff880068d8b698 ffffffffa0283e99 ffff880068d8b6d8 ffff88007d5cf660
ffff88007b611ab8 ffff880068d8bfd8 000000000000fb88 ffff88007b611ab8
Call Trace:
[<ffffffffa0283e99>] ? zio_vdev_io_start+0x1c9/0x2c0 [zfs]
[<ffffffff81096fae>] ? prepare_to_wait_exclusive+0x4e/0x80
[<ffffffffa013e29d>] cv_wait_common+0xed/0x100 [spl]
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa01e68c7>] ? buf_hash_find+0x87/0x110 [zfs]
[<ffffffffa013e305>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa01ed20b>] arc_read+0xab/0x8e0 [zfs]
[<ffffffffa0284823>] ? zio_nowait+0xb3/0x170 [zfs]
[<ffffffffa01eac10>] ? arc_getbuf_func+0x0/0x60 [zfs]
[<ffffffffa02047af>] traverse_visitbp+0x31f/0x6e0 [zfs]
[<ffffffffa0204be4>] traverse_dnode+0x74/0x100 [zfs]
[<ffffffffa0204a90>] traverse_visitbp+0x600/0x6e0 [zfs]
[<ffffffffa0204c70>] ? traverse_prefetch_thread+0x0/0xd0 [zfs]
[<ffffffffa0204ea7>] traverse_impl+0x167/0x340 [zfs]
[<ffffffffa0205105>] traverse_dataset+0x45/0x50 [zfs]
[<ffffffffa022cb10>] ? spa_load_verify_cb+0x0/0xb0 [zfs]
[<ffffffffa0218c85>] ? dsl_pool_config_exit+0x15/0x20 [zfs]
[<ffffffffa02052b6>] traverse_pool+0x1a6/0x1d0 [zfs]
[<ffffffffa022cb10>] ? spa_load_verify_cb+0x0/0xb0 [zfs]
[<ffffffffa022cb10>] ? spa_load_verify_cb+0x0/0xb0 [zfs]
[<ffffffffa0000001>] ? dm_uevent_exit+0x1/0x20 [dm_mod]
[<ffffffffa023375b>] spa_load+0x139b/0x1800 [zfs]
[<ffffffffa0233c0e>] spa_load_best+0x4e/0x260 [zfs]
[<ffffffffa0234563>] spa_import+0x1a3/0x5e0 [zfs]
[<ffffffffa01b4484>] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
[<ffffffffa0266e44>] zfs_ioc_pool_import+0xe4/0x120 [zfs]
[<ffffffffa02677dd>] zfsdev_ioctl+0x4fd/0x540 [zfs]
[<ffffffff81195382>] vfs_ioctl+0x22/0xa0
[<ffffffff811494bf>] ? unmap_region+0xff/0x130
[<ffffffff81195524>] do_vfs_ioctl+0x84/0x580
[<ffffffff8119f9b2>] ? alloc_fd+0x92/0x160
[<ffffffff81195aa1>] sys_ioctl+0x81/0xa0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
@behlendorf
Contributor

This is probably related to spa_freeze() somehow, but the stacks show that arc_read() is blocked waiting for a disk I/O to complete.

(gdb) list *(arc_read+0xab)
0x4c3b is in arc_read (/home/behlendo/src/git/zfs/module/zfs/../../module/zfs/arc.c:3129).
3124            *arc_flags |= ARC_CACHED;
3125    
3126            if (HDR_IO_IN_PROGRESS(hdr)) {
3127    
3128                if (*arc_flags & ARC_WAIT) {
3129                    cv_wait(&hdr->b_cv, hash_lock);
3130                    mutex_exit(hash_lock);
3131                    goto top;
3132                }
3133                ASSERT(*arc_flags & ARC_NOWAIT);
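
For context, spa_freeze() itself is tiny. Here's a paraphrased sketch (based on module/zfs/spa_misc.c from around this era; illustrative, not a verbatim copy):

/*
 * Paraphrased sketch of spa_freeze() -- illustrative only.
 * Freezing pins the pool at a final transaction group: once
 * spa_freeze_txg is set, no txg beyond it is ever synced to
 * disk, so later changes live only in memory and the ZIL.
 */
void
spa_freeze(spa_t *spa)
{
	uint64_t freeze_txg = 0;

	spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);
	if (spa->spa_freeze_txg == UINT64_MAX) {
		/* Freeze a few txgs ahead of the last synced one. */
		freeze_txg = spa_last_synced_txg(spa) + TXG_SIZE;
		spa->spa_freeze_txg = freeze_txg;
	}
	spa_config_exit(spa, SCL_ALL, FTAG);

	/* Make sure everything up to the freeze point is on disk. */
	if (freeze_txg != 0)
		txg_wait_synced(spa_get_dsl(spa), freeze_txg);
}

If the on-disk state left behind by a freeze/export differs from what spa_load()'s traversal expects, a read issued during import could plausibly never complete, which would be consistent with both cv_wait() stacks above.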

@dweeezil
Contributor

Another data point, since I see mention of spa_freeze(): I was porting ziltest some time ago (I forget why) and could never get it working right because something seemed to be wrong with freeze. This may have nothing to do with the problem reported here, but it might also represent an avenue for further testing in a non-Lustre environment.
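
For anyone chasing this without Lustre: ziltest reaches spa_freeze() through the undocumented "zpool freeze <pool>" subcommand, whose ioctl handler is roughly the following (paraphrased from zfs_ioctl.c; a sketch, not a verbatim copy):

/*
 * Paraphrased sketch of the ZFS_IOC_POOL_FREEZE handler --
 * illustrative only.  It just opens the named pool and calls
 * spa_freeze() on it.
 */
static int
zfs_ioc_pool_freeze(zfs_cmd_t *zc)
{
	spa_t *spa;
	int error;

	error = spa_open(zc->zc_name, &spa, FTAG);
	if (error == 0) {
		spa_freeze(spa);
		spa_close(spa, FTAG);
	}
	return (error);
}

So a freeze/export/import cycle can be driven with plain zpool commands, no Lustre required.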

@behlendorf
Contributor

@dweeezil I ran into similar issues when porting ziltest, but I never ran them down either.

@behlendorf removed this from the 0.6.4 milestone on Oct 30, 2014
@behlendorf
Contributor

@utopiabound Is this still an issue?

@utopiabound
Contributor Author

I cannot reproduce this with zfs/spl 0.6.3.

@behlendorf
Contributor

OK, then I'm going to close this out. Thanks.
