Unable to import zpool with corrupted SPA history #3889

Closed
jfilizetti opened this issue Oct 5, 2015 · 2 comments

I've had several pools get a corrupted SPA history which then prevents them from being imported. Is there a workaround to "rebuild" the history so they can be imported again?

When attempting to import the pool, it just hangs and dumps a warning to dmesg along with several hung-task traces. I've also tried the import in a FreeBSD VM and hit the same hang.

[root@oss-01 ~]# dmesg
WARNING: Pool 'ost0' has encountered an uncorrectable I/O failure and has been suspended.

INFO: task zpool:7252 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool         D 0000000000000001     0  7252  11178 0x00000000
 ffff880f1f07fbd8 0000000000000086 ffff880f1f07fb88 ffffffff810649fe
 ffff881015b58a00 ffff880f0000000d 0000000d1f07fba8 0000000000000001
 ffff880f1f07fbb8 0000000000000082 ffff881051e4bab8 ffff880f1f07ffd8
Call Trace:
 [] ? try_to_wake_up+0x24e/0x3e0
 [] ? prepare_to_wait_exclusive+0x4e/0x80
 [] cv_wait_common+0x11d/0x130 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __cv_wait+0x15/0x20 [spl]
 [] txg_wait_synced+0xef/0x140 [zfs]
 [] spa_config_update+0xcc/0x120 [zfs]
 [] spa_import+0x56a/0x730 [zfs]
 [] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
 [] zfs_ioc_pool_import+0xe4/0x120 [zfs]
 [] zfsdev_ioctl+0x495/0x4d0 [zfs]
 [] vfs_ioctl+0x22/0xa0
 [] do_vfs_ioctl+0x84/0x580
 [] sys_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x1b
INFO: task txg_sync:7530 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync      D 000000000000000d     0  7530      2 0x00000000
 ffff880f1d991880 0000000000000046 0000000000000000 ffff8810334f3a38
 0000000000000000 0000000000000000 0000ebd08ba2a003 ffffffff81064ba2
 ffff880f1d991840 000000010f73796c ffff880f1d981058 ffff880f1d991fd8
Call Trace:
 [] ? default_wake_function+0x12/0x20
 [] io_schedule+0x73/0xc0
 [] cv_wait_common+0xaf/0x130 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __cv_wait_io+0x18/0x20 [spl]
 [] zio_wait+0x10b/0x1e0 [zfs]
 [] dbuf_read+0x439/0x850 [zfs]
 [] ? dnode_rele_and_unlock+0x64/0xb0 [zfs]
 [] dmu_buf_will_dirty+0x58/0xc0 [zfs]
 [] dmu_write+0xa0/0x1a0 [zfs]
 [] spa_history_write+0x186/0x1d0 [zfs]
 [] spa_history_log_sync+0x1b9/0x4e0 [zfs]
 [] dsl_sync_task_sync+0x10a/0x110 [zfs]
 [] dsl_pool_sync+0x2fb/0x440 [zfs]
 [] ? spa_sync_nvlist+0x12d/0x1d0 [zfs]
 [] spa_sync+0x35e/0xb10 [zfs]
 [] ? __wake_up_common+0x59/0x90
 [] ? read_tsc+0x9/0x20
 [] txg_sync_thread+0x3d8/0x670 [zfs]
 [] ? txg_sync_thread+0x0/0x670 [zfs]
 [] ? txg_sync_thread+0x0/0x670 [zfs]
 [] thread_generic_wrapper+0x68/0x80 [spl]
 [] ? thread_generic_wrapper+0x0/0x80 [spl]
 [] kthread+0x9e/0xc0
 [] child_rip+0xa/0x20
 [] ? kthread+0x0/0xc0
 [] ? child_rip+0x0/0x20
INFO: task zpool:7252 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool         D 0000000000000001     0  7252  11178 0x00000000
 ffff880f1f07fbd8 0000000000000086 ffff880f1f07fb88 ffffffff810649fe
 ffff881015b58a00 ffff880f0000000d 0000000d1f07fba8 0000000000000001
 ffff880f1f07fbb8 0000000000000082 ffff881051e4bab8 ffff880f1f07ffd8
Call Trace:
 [] ? try_to_wake_up+0x24e/0x3e0
 [] ? prepare_to_wait_exclusive+0x4e/0x80
 [] cv_wait_common+0x11d/0x130 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __cv_wait+0x15/0x20 [spl]
 [] txg_wait_synced+0xef/0x140 [zfs]
 [] spa_config_update+0xcc/0x120 [zfs]
 [] spa_import+0x56a/0x730 [zfs]
 [] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
 [] zfs_ioc_pool_import+0xe4/0x120 [zfs]
 [] zfsdev_ioctl+0x495/0x4d0 [zfs]
 [] vfs_ioctl+0x22/0xa0
 [] do_vfs_ioctl+0x84/0x580
 [] sys_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x1b

[root@oss-01 ~]# zdb -e ost0 -h
Unable to read history: error 52
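
(For reference, error 52 is EBADE, which the ZFS-on-Linux code uses for ECKSUM, so zdb appears to be failing with a checksum error while reading the pool's history object.)

A minimal sketch of generic options sometimes worth trying in this situation, assuming the damage really is limited to the history object; neither command rebuilds the history, and the read-only import only tries to sidestep the history write visible in the txg_sync trace above:

zpool import -o readonly=on -N ost0    # read-only import, no mounts; nothing is written to the pool
zpool import -F -n ost0                # dry run of the rewind/recovery import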

ryao commented Oct 8, 2015

Please get in touch with us in #zfsonlinux on freenode. It will be very difficult to assist you from the issue tracker.


ryao commented Oct 8, 2015

After talking with @jfilizetti on IRC, it appears that two nodes imported the same pools on a shared SAS backplane simultaneously, which corrupted them. He was working on an HA implementation and had not fully handled split brain: when a node went down and another node took over, the failed node's access to the disks on the backplane was never cut off. See the illustrative fencing sketch below.
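
For illustration only, one common way to "cut off access" in a shared-backplane setup like this is SCSI-3 persistent-reservation fencing with sg_persist from sg3_utils; the device path and reservation keys below are made up, and the exact invocation depends on the sg3_utils version and on the disks actually supporting persistent reservations:

sg_persist --out --register --param-sark=0x1 /dev/disk/by-id/wwn-0xEXAMPLE                               # node A registers its key
sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/disk/by-id/wwn-0xEXAMPLE                   # node A takes a write-exclusive, registrants-only reservation
sg_persist --out --preempt-abort --param-rk=0x2 --param-sark=0x1 --prout-type=5 /dev/disk/by-id/wwn-0xEXAMPLE   # on failover, node B (key 0x2) evicts node A's key before importing the pool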
