
permanent errors after upgrading ZFS #13763

Open
clhedrick opened this issue Aug 10, 2022 · 9 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@clhedrick

clhedrick commented Aug 10, 2022

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 22.04
Kernel Version 5.15.0-43-generic
Architecture x86_64
OpenZFS Version zfs-2.1.4-0ubuntu0.1

Describe the problem you're observing

After upgrading from Ubuntu 20 to 22, zpool status shows 143 permanent errors. I've never had an issue with the devices: no errors were shown before the upgrade or after a scrub.

This is a backup system. I back up to it with send | receive. Originally one of the systems backed up was encrypted. After a crash I reconstructed it unencrypted, but I didn't reconstruct the backup system, as it had no errors. I did create unencrypted versions of the file systems on the backup system, but kept some of the encrypted ones around. They caused no problems under Ubuntu 20. But under 22, I got failures to mount and 143 permanent errors. zpool status -v showed file names that were all in encrypted file systems.

I destroyed the encrypted file systems and then ran a scrub. Now I have 2 permanent errors:
<0x1c336>:<0x0>
<0x2b49c>:<0x0>

Based on other reports I'll do a second scrub this weekend.

Note that the root file system is encrypted. It has no data, not even mount points. It's not mounted, although it will mount.

It would be useful to be able to clear the errors. We have monitoring scripts that check for problems with our ZFS file systems. This shows as a problem. We can ignore it, but that would hide any new errors that might occur.
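A check like the one described could baseline the known stale entries instead of ignoring errors entirely. A minimal sketch of such a helper (the function name is hypothetical; in production you would pipe `zpool status -v <pool>` into it and compare the count against a stored baseline):

```shell
# Count the entries listed under the "Permanent errors" header of
# `zpool status -v` output, so a monitoring script can alert only when
# the count rises above a known baseline.
count_permanent_errors() {
    awk '/Permanent errors have been detected/ { found = 1; next }
         found && NF { n++ }
         END { print n + 0 }'
}

# Usage (assumed pool name):
#   zpool status -v tank | count_permanent_errors
```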

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

@clhedrick clhedrick added the Type: Defect Incorrect behavior (e.g. crash, hang) label Aug 10, 2022
@ofthesun9
Contributor

I have similar issues...
I could clear the errors by running a scrub twice. It wasn't necessary for the scrubs to complete at 100%: canceling them shortly after starting them did the trick.

I created a new encrypted dataset under zfs 2.1.5 (syncoid) and destroyed the former encrypted dataset (initially created under zfs 0.8.3 with syncoid)

So far so good :-)... no permanent errors anymore
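The double-scrub workaround above can be scripted roughly like this (a sketch, not a definitive procedure: the pool name and the 60-second wait are assumptions, and `DRYRUN=1` prints the commands instead of running them):

```shell
#!/bin/sh
# Sketch of the "scrub twice, canceling early" workaround described above.
# POOL and the 60-second wait are assumptions; set DRYRUN=1 to preview.
POOL="${POOL:-tank}"

run() {
    if [ "${DRYRUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

double_scrub() {
    for i in 1 2; do
        run zpool scrub "$POOL"      # start a scrub
        run sleep 60                 # let it run briefly
        run zpool scrub -s "$POOL"   # -s stops the in-progress scrub
    done
    run zpool status -v "$POOL"     # check whether the errors cleared
}
```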

@versus167

Same problems here after upgrade from ubuntu 20.04 to 22.04

@versus167

A new flavour of this problem: on another machine, after the update, I got hangs and these log entries:

Aug 21 20:44:08 backup kernel: [ 4841.628971] VERIFY3(0 == zap_add(mos, dsl_dir_phys(pds)->dd_child_dir_zapobj, name, sizeof (uint64_t), 1, &ddobj, tx)) failed (0 == 17)
Aug 21 20:44:08 backup kernel: [ 4841.629271] PANIC at dsl_dir.c:951:dsl_dir_create_sync()
Aug 21 20:44:08 backup kernel: [ 4841.629338] Showing stack for process 675
Aug 21 20:44:08 backup kernel: [ 4841.629340] CPU: 0 PID: 675 Comm: txg_sync Tainted: P O 5.15.0-46-generic #49-Ubuntu
Aug 21 20:44:08 backup kernel: [ 4841.629344] Hardware name: Gigabyte Technology Co., Ltd. GA-A55M-S2V/GA-A55M-S2V, BIOS F6 11/18/2011
Aug 21 20:44:08 backup kernel: [ 4841.629346] Call Trace:
Aug 21 20:44:08 backup kernel: [ 4841.629349]
Aug 21 20:44:08 backup kernel: [ 4841.629352] show_stack+0x52/0x5c
Aug 21 20:44:08 backup kernel: [ 4841.629357] dump_stack_lvl+0x4a/0x63
Aug 21 20:44:08 backup kernel: [ 4841.629363] dump_stack+0x10/0x16
Aug 21 20:44:08 backup kernel: [ 4841.629366] spl_dumpstack+0x29/0x2f [spl]
Aug 21 20:44:08 backup kernel: [ 4841.629382] spl_panic+0xd1/0xe9 [spl]
Aug 21 20:44:08 backup kernel: [ 4841.629394] ? dmu_buf_rele+0xe/0x20 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.629598] ? zap_unlockdir+0x46/0x60 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.629777] ? zap_add_impl+0x96/0x160 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.629957] ? zap_add+0x7b/0xb0 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.630138] dsl_dir_create_sync+0x1ff/0x280 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.630306] ? spl_kmem_free_impl+0x29/0x40 [spl]
Aug 21 20:44:08 backup kernel: [ 4841.630319] dsl_dataset_create_sync+0x52/0x380 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.630498] dmu_recv_begin_sync+0x374/0xa00 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.630659] ? spa_get_slop_space+0x6e/0xc0 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.630833] ? __cond_resched+0x1a/0x50
Aug 21 20:44:08 backup kernel: [ 4841.630838] dsl_sync_task_sync+0xb9/0x110 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.631010] dsl_pool_sync+0x369/0x400 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.631177] spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.631353] spa_sync+0x2dc/0x5b0 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.631526] txg_sync_thread+0x266/0x2f0 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.631703] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Aug 21 20:44:08 backup kernel: [ 4841.631883] thread_generic_wrapper+0x64/0x80 [spl]
Aug 21 20:44:08 backup kernel: [ 4841.631896] ? __thread_exit+0x20/0x20 [spl]
Aug 21 20:44:08 backup kernel: [ 4841.631907] kthread+0x12a/0x150
Aug 21 20:44:08 backup kernel: [ 4841.631912] ? set_kthread_struct+0x50/0x50
Aug 21 20:44:08 backup kernel: [ 4841.631914] ret_from_fork+0x22/0x30
Aug 21 20:44:08 backup kernel: [ 4841.631919]
Aug 21 20:48:03 backup kernel: [ 5076.258637] INFO: task txg_sync:675 blocked for more than 120 seconds.
Aug 21 20:48:03 backup kernel: [ 5076.258829] Tainted: P O 5.15.0-46-generic #49-Ubuntu
Aug 21 20:48:03 backup kernel: [ 5076.259007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 21 20:48:03 backup kernel: [ 5076.259149] task:txg_sync state:D stack: 0 pid: 675 ppid: 2 flags:0x00004000
Aug 21 20:48:03 backup kernel: [ 5076.259163] Call Trace:
Aug 21 20:48:03 backup kernel: [ 5076.259169]
Aug 21 20:48:03 backup kernel: [ 5076.259176] __schedule+0x23d/0x590
Aug 21 20:48:03 backup kernel: [ 5076.259197] schedule+0x4e/0xc0
Aug 21 20:48:03 backup kernel: [ 5076.259206] spl_panic+0xe7/0xe9 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.259254] ? dmu_buf_rele+0xe/0x20 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.259710] ? zap_unlockdir+0x46/0x60 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.260216] ? zap_add_impl+0x96/0x160 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.260722] ? zap_add+0x7b/0xb0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.261229] dsl_dir_create_sync+0x1ff/0x280 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.261690] ? spl_kmem_free_impl+0x29/0x40 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.261728] dsl_dataset_create_sync+0x52/0x380 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.262192] dmu_recv_begin_sync+0x374/0xa00 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.262696] ? spa_get_slop_space+0x6e/0xc0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.263289] ? __cond_resched+0x1a/0x50
Aug 21 20:48:03 backup kernel: [ 5076.263303] dsl_sync_task_sync+0xb9/0x110 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.263773] dsl_pool_sync+0x369/0x400 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.264239] spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.264726] spa_sync+0x2dc/0x5b0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.265213] txg_sync_thread+0x266/0x2f0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.265712] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.266207] thread_generic_wrapper+0x64/0x80 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.266246] ? __thread_exit+0x20/0x20 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.266284] kthread+0x12a/0x150
Aug 21 20:48:03 backup kernel: [ 5076.266295] ? set_kthread_struct+0x50/0x50
Aug 21 20:48:03 backup kernel: [ 5076.266305] ret_from_fork+0x22/0x30
Aug 21 20:48:03 backup kernel: [ 5076.266318]
Aug 21 20:48:03 backup kernel: [ 5076.266351] INFO: task zfs:1782 blocked for more than 120 seconds.
Aug 21 20:48:03 backup kernel: [ 5076.266561] Tainted: P O 5.15.0-46-generic #49-Ubuntu
Aug 21 20:48:03 backup kernel: [ 5076.266714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 21 20:48:03 backup kernel: [ 5076.266857] task:zfs state:D stack: 0 pid: 1782 ppid: 1781 flags:0x00004002
Aug 21 20:48:03 backup kernel: [ 5076.266870] Call Trace:
Aug 21 20:48:03 backup kernel: [ 5076.266874]
Aug 21 20:48:03 backup kernel: [ 5076.266878] __schedule+0x23d/0x590
Aug 21 20:48:03 backup kernel: [ 5076.266887] ? autoremove_wake_function+0x12/0x40
Aug 21 20:48:03 backup kernel: [ 5076.266897] schedule+0x4e/0xc0
Aug 21 20:48:03 backup kernel: [ 5076.266905] io_schedule+0x46/0x80
Aug 21 20:48:03 backup kernel: [ 5076.266913] cv_wait_common+0xab/0x130 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.266953] ? wait_woken+0x70/0x70
Aug 21 20:48:03 backup kernel: [ 5076.266962] __cv_wait_io+0x18/0x20 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.267002] txg_wait_synced_impl+0x9b/0x120 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.267520] txg_wait_synced+0x10/0x50 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.268016] dsl_sync_task_common+0x1c6/0x2a0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.268486] ? recv_begin_check_existing_impl+0x590/0x590 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.268924] ? recv_check_large_blocks+0x60/0x60 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.269365] ? recv_begin_check_existing_impl+0x590/0x590 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.269804] ? recv_check_large_blocks+0x60/0x60 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.270242] dsl_sync_task+0x1a/0x20 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.270754] dmu_recv_begin+0x1e2/0x390 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.271292] zfs_ioc_recv_impl.constprop.0+0x106/0xb20 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.271898] zfs_ioc_recv_new+0x310/0x3b0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.272498] ? spl_kmem_alloc_impl+0xbe/0xd0 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.272542] ? spl_vmem_alloc+0x19/0x20 [spl]
Aug 21 20:48:03 backup kernel: [ 5076.272586] ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
Aug 21 20:48:03 backup kernel: [ 5076.272629] ? nv_mem_zalloc+0x33/0x50 [znvpair]
Aug 21 20:48:03 backup kernel: [ 5076.272668] ? nvlist_xalloc+0x51/0xa0 [znvpair]
Aug 21 20:48:03 backup kernel: [ 5076.272707] ? nvlist_alloc+0x28/0x40 [znvpair]
Aug 21 20:48:03 backup kernel: [ 5076.272747] zfsdev_ioctl_common+0x285/0x740 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.273270] ? _copy_from_user+0x2e/0x70
Aug 21 20:48:03 backup kernel: [ 5076.273281] zfsdev_ioctl+0x57/0xf0 [zfs]
Aug 21 20:48:03 backup kernel: [ 5076.273790] __x64_sys_ioctl+0x95/0xd0
Aug 21 20:48:03 backup kernel: [ 5076.273803] do_syscall_64+0x5c/0xc0
Aug 21 20:48:03 backup kernel: [ 5076.273812] ? do_user_addr_fault+0x1e7/0x670
Aug 21 20:48:03 backup kernel: [ 5076.273821] ? do_syscall_64+0x69/0xc0
Aug 21 20:48:03 backup kernel: [ 5076.273828] ? exit_to_user_mode_prepare+0x37/0xb0
Aug 21 20:48:03 backup kernel: [ 5076.273838] ? irqentry_exit_to_user_mode+0x9/0x20
Aug 21 20:48:03 backup kernel: [ 5076.273847] ? irqentry_exit+0x1d/0x30
Aug 21 20:48:03 backup kernel: [ 5076.273856] ? exc_page_fault+0x89/0x170
Aug 21 20:48:03 backup kernel: [ 5076.273865] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Aug 21 20:48:03 backup kernel: [ 5076.273876] RIP: 0033:0x7faa82a99aff
Aug 21 20:48:03 backup kernel: [ 5076.273884] RSP: 002b:00007ffcd73c4bb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 21 20:48:03 backup kernel: [ 5076.273893] RAX: ffffffffffffffda RBX: 00007ffcd73c8280 RCX: 00007faa82a99aff
Aug 21 20:48:03 backup kernel: [ 5076.273899] RDX: 00007ffcd73c4c30 RSI: 0000000000005a46 RDI: 0000000000000005
Aug 21 20:48:03 backup kernel: [ 5076.273904] RBP: 00007ffcd73c8220 R08: 0000000000000000 R09: 0000555b46c32d70
Aug 21 20:48:03 backup kernel: [ 5076.273909] R10: 00007faa82b98da0 R11: 0000000000000246 R12: 0000000000005a46
Aug 21 20:48:03 backup kernel: [ 5076.273914] R13: 00007ffcd73c4c30 R14: 0000000000005a46 R15: 0000555b46c0f7a0
Aug 21 20:48:03 backup kernel: [ 5076.273923]

@clhedrick
Author

clhedrick commented Aug 22, 2022 via email

@versus167

In the meantime, I no longer believe that. The only connection is that the upgrade from Ubuntu 20 to 22 took place on Saturday and there were problems with zfs send/receive. But no more errors are reported, so it is probably not the same problem.

@clhedrick
Author

If you're talking about send / receive of encrypted data, those could be known issues that were also present in 20.04. It's unsafe to send from or receive into an encrypted file system. It's unclear whether it is safe to use encryption without using send / receive. I'm currently skeptical.

@versus167

I'm talking about sending from an unencrypted to an encrypted dataset. I've used this configuration for about a year now without any problems, and now, after the upgrade, I've run into this problem...

@jonryk

jonryk commented Aug 30, 2022

I just had the same experience as @clhedrick - I upgraded to Ubuntu 22, and lost a few ZFS datasets in my ZFS pool!
Towards the end of the "zpool status -v" output I get:

errors: Permanent errors have been detected in the following files:

        IWPro/home:<0x0>

(and a couple of other datasets)...
My pool contains both encrypted and unencrypted datasets. Only encrypted datasets are affected, but not all of them.
The encrypted datasets affected ARE datasets I also "zfs send" to an offsite ZFS pool (datasets are encrypted at both ends, so perhaps somewhat similar to @versus167), but I don't really see how a simple "zfs send" is able to corrupt the datasets...

In my case, the root (/) filesystem is OK, whereas e.g. the /home/ filesystem is not, although both reside in the same pool. - I suspect this could be the same issue as #13709 .

@mat128

mat128 commented Oct 27, 2022

+1 @jonryk , this looks like #13709
I posted a script to recover the data:

zfs snapshot dataset_name/documents@recover1
zfs snapshot dataset_name/documents@recover2
# --raw sends the encrypted blocks as-is, without decrypting them
zfs send --raw -i dataset_name/documents@recover1 dataset_name/documents@recover2 > /tmp/recovery.bin
zfs rollback -r dataset_name/documents@recover1
zfs receive -F -v dataset_name/documents < /tmp/recovery.bin
zfs mount dataset_name/documents
