Unable to import zpool with corrupted SPA history #3889

Closed
jfilizetti opened this issue Oct 5, 2015 · 2 comments

I've had several pools get a corrupted SPA history which then prevents them from being imported. Is there a workaround to "rebuild" the history so they can be imported again?

When attempting to import the pool, it just hangs and dumps a warning to dmesg along with several hung-task traces. I've also tried the import in a FreeBSD VM and hit the same hang.

[root@oss-01 ~]# dmesg
WARNING: Pool 'ost0' has encountered an uncorrectable I/O failure and has been suspended.

INFO: task zpool:7252 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool         D 0000000000000001     0  7252  11178 0x00000000
 ffff880f1f07fbd8 0000000000000086 ffff880f1f07fb88 ffffffff810649fe
 ffff881015b58a00 ffff880f0000000d 0000000d1f07fba8 0000000000000001
 ffff880f1f07fbb8 0000000000000082 ffff881051e4bab8 ffff880f1f07ffd8
Call Trace:
 [] ? try_to_wake_up+0x24e/0x3e0
 [] ? prepare_to_wait_exclusive+0x4e/0x80
 [] cv_wait_common+0x11d/0x130 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __cv_wait+0x15/0x20 [spl]
 [] txg_wait_synced+0xef/0x140 [zfs]
 [] spa_config_update+0xcc/0x120 [zfs]
 [] spa_import+0x56a/0x730 [zfs]
 [] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
 [] zfs_ioc_pool_import+0xe4/0x120 [zfs]
 [] zfsdev_ioctl+0x495/0x4d0 [zfs]
 [] vfs_ioctl+0x22/0xa0
 [] do_vfs_ioctl+0x84/0x580
 [] sys_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x1b
INFO: task txg_sync:7530 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync      D 000000000000000d     0  7530      2 0x00000000
 ffff880f1d991880 0000000000000046 0000000000000000 ffff8810334f3a38
 0000000000000000 0000000000000000 0000ebd08ba2a003 ffffffff81064ba2
 ffff880f1d991840 000000010f73796c ffff880f1d981058 ffff880f1d991fd8
Call Trace:
 [] ? default_wake_function+0x12/0x20
 [] io_schedule+0x73/0xc0
 [] cv_wait_common+0xaf/0x130 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __cv_wait_io+0x18/0x20 [spl]
 [] zio_wait+0x10b/0x1e0 [zfs]
 [] dbuf_read+0x439/0x850 [zfs]
 [] ? dnode_rele_and_unlock+0x64/0xb0 [zfs]
 [] dmu_buf_will_dirty+0x58/0xc0 [zfs]
 [] dmu_write+0xa0/0x1a0 [zfs]
 [] spa_history_write+0x186/0x1d0 [zfs]
 [] spa_history_log_sync+0x1b9/0x4e0 [zfs]
 [] dsl_sync_task_sync+0x10a/0x110 [zfs]
 [] dsl_pool_sync+0x2fb/0x440 [zfs]
 [] ? spa_sync_nvlist+0x12d/0x1d0 [zfs]
 [] spa_sync+0x35e/0xb10 [zfs]
 [] ? __wake_up_common+0x59/0x90
 [] ? read_tsc+0x9/0x20
 [] txg_sync_thread+0x3d8/0x670 [zfs]
 [] ? txg_sync_thread+0x0/0x670 [zfs]
 [] ? txg_sync_thread+0x0/0x670 [zfs]
 [] thread_generic_wrapper+0x68/0x80 [spl]
 [] ? thread_generic_wrapper+0x0/0x80 [spl]
 [] kthread+0x9e/0xc0
 [] child_rip+0xa/0x20
 [] ? kthread+0x0/0xc0
 [] ? child_rip+0x0/0x20
INFO: task zpool:7252 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool         D 0000000000000001     0  7252  11178 0x00000000
 ffff880f1f07fbd8 0000000000000086 ffff880f1f07fb88 ffffffff810649fe
 ffff881015b58a00 ffff880f0000000d 0000000d1f07fba8 0000000000000001
 ffff880f1f07fbb8 0000000000000082 ffff881051e4bab8 ffff880f1f07ffd8
Call Trace:
 [] ? try_to_wake_up+0x24e/0x3e0
 [] ? prepare_to_wait_exclusive+0x4e/0x80
 [] cv_wait_common+0x11d/0x130 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __cv_wait+0x15/0x20 [spl]
 [] txg_wait_synced+0xef/0x140 [zfs]
 [] spa_config_update+0xcc/0x120 [zfs]
 [] spa_import+0x56a/0x730 [zfs]
 [] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
 [] zfs_ioc_pool_import+0xe4/0x120 [zfs]
 [] zfsdev_ioctl+0x495/0x4d0 [zfs]
 [] vfs_ioctl+0x22/0xa0
 [] do_vfs_ioctl+0x84/0x580
 [] sys_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x1b

[root@oss-01 ~]# zdb -e ost0 -h
Unable to read history: error 52
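
(For reference, error 52 is EBADE, which the ZFS-on-Linux code uses for ECKSUM, so zdb appears to be failing with a checksum error while reading the pool's history object.)

A minimal sketch of generic options sometimes worth trying in this situation, assuming the damage really is limited to the history object; neither command rebuilds the history, and the read-only import only tries to sidestep the history write visible in the txg_sync trace above:

zpool import -o readonly=on -N ost0    # read-only import, no mounts; nothing is written to the pool
zpool import -F -n ost0                # dry run of the rewind/recovery import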

ryao commented Oct 8, 2015

Please get in touch with us in #zfsonlinux on freenode. It will be very difficult to assist you from the issue tracker.


ryao commented Oct 8, 2015

After talking with @jfilizetti on IRC, it appears that two nodes imported the same pools on a shared SAS backplane simultaneously, which corrupted them. He was working on an HA implementation and had not fully handled split brain: when a node went down and another node took over, the failed node's access to the disks on the backplane was never cut off. See the illustrative fencing sketch below.
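
For illustration only, one common way to "cut off access" in a shared-backplane setup like this is SCSI-3 persistent-reservation fencing with sg_persist from sg3_utils; the device path and reservation keys below are made up, and the exact invocation depends on the sg3_utils version and on the disks actually supporting persistent reservations:

sg_persist --out --register --param-sark=0x1 /dev/disk/by-id/wwn-0xEXAMPLE                               # node A registers its key
sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/disk/by-id/wwn-0xEXAMPLE                   # node A takes a write-exclusive, registrants-only reservation
sg_persist --out --preempt-abort --param-rk=0x2 --param-sark=0x1 --prout-type=5 /dev/disk/by-id/wwn-0xEXAMPLE   # on failover, node B (key 0x2) evicts node A's key before importing the pool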
