Another encryption bug: "unencrypted block in encrypted object set" #14330

Open
lheckemann opened this issue Dec 26, 2022 · 1 comment
Labels
Component: Encryption ("native encryption" feature)
Component: Send/Recv ("zfs send/recv" feature)
Type: Defect (Incorrect behavior, e.g. crash, hang)

lheckemann commented Dec 26, 2022

System information

Type                  Version/Name
Distribution Name     NixOS
Distribution Version  22.11
Kernel Version        5.15.81
Architecture          amd64
OpenZFS Version       zfs-2.1.7-1/zfs-kmod-2.1.7-1

Describe the problem you're observing

Attempting to send a snapshot of an encrypted dataset results in a panic.

Describe how to reproduce the problem

I don't know how to reach the on-disk state that leads to this panic. On the affected machine, however, zfs send -i madeye/e/backup/linus/borg@2022-05-30-000000 madeye/e/backup/linus/borg@2022-05-31-121212 | pv >/dev/null reliably triggers it.

Interestingly, the snapshot's date matches the point at which I converted the source dataset (the one this dataset was being synced from) from encrypted to unencrypted.
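For context, the encryption state of the affected dataset and its snapshots can be checked with standard zfs properties; for example:

# Is the destination dataset (still) encrypted, and under which encryption root?
zfs get -r encryption,encryptionroot,keystatus madeye/e/backup/linus/borg
# Snapshots around the conversion date:
zfs list -t snapshot -o name,creation madeye/e/backup/linus/borg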

Include any warning/errors/backtraces from the system logs

[210065.805807] PANIC: unencrypted block in encrypted object set 244177
[210065.805813] Showing stack for process 93744
[210065.805815] CPU: 0 PID: 93744 Comm: send_traverse Tainted: P          IO      5.15.81 #1-NixOS
[210065.805817] Hardware name: FUJITSU D3402-B1/D3402-B1, BIOS V5.0.0.11 R1.13.0 for D3402-B1x                    02/19/2016
[210065.805818] Call Trace:
[210065.805820]  <TASK>
[210065.805823]  dump_stack_lvl+0x46/0x5e
[210065.805830]  vcmn_err.cold+0x50/0x68 [spl]
[210065.805839]  ? __cond_resched+0x16/0x50
[210065.805842]  ? __kmalloc_node+0x14c/0x490
[210065.805844]  ? mutex_lock+0xe/0x30
[210065.805846]  ? spl_kmem_zalloc+0xa7/0xf0 [spl]
[210065.805852]  ? spa_log_error+0xcb/0x150 [zfs]
[210065.805938]  zfs_panic_recover+0x6d/0x90 [zfs]
[210065.806013]  send_cb+0x4ec/0x5e0 [zfs]
[210065.806068]  traverse_visitbp+0x371/0xb00 [zfs]
[210065.806123]  ? __cond_resched+0x16/0x50
[210065.806126]  ? mutex_lock+0xe/0x30
[210065.806128]  ? bqueue_enqueue_impl+0xf9/0x140 [zfs]
[210065.806176]  traverse_dnode+0xb9/0x1f0 [zfs]
[210065.806230]  traverse_visitbp+0x78e/0xb00 [zfs]
[210065.806282]  traverse_visitbp+0x4ca/0xb00 [zfs]
[210065.806333]  traverse_visitbp+0x4ca/0xb00 [zfs]
[210065.806385]  traverse_visitbp+0x4ca/0xb00 [zfs]
[210065.806437]  traverse_visitbp+0x4ca/0xb00 [zfs]
[210065.806489]  traverse_visitbp+0x4ca/0xb00 [zfs]
[210065.806542]  traverse_dnode+0xb9/0x1f0 [zfs]
[210065.806594]  traverse_visitbp+0x9a1/0xb00 [zfs]
[210065.806644]  ? zap_lookup+0xd6/0x100 [zfs]
[210065.806725]  traverse_impl+0x1eb/0x4c0 [zfs]
[210065.806778]  ? send_merge_thread+0x370/0x370 [zfs]
[210065.806830]  ? set_next_task_fair+0x2d/0xd0
[210065.806834]  traverse_dataset_resume+0x48/0x60 [zfs]
[210065.806885]  ? send_merge_thread+0x370/0x370 [zfs]
[210065.806935]  send_traverse_thread+0x50/0xa0 [zfs]
[210065.806986]  ? enqueue_range+0x170/0x170 [zfs]
[210065.807036]  thread_generic_wrapper+0x59/0x70 [spl]
[210065.807044]  ? __thread_exit+0x20/0x20 [spl]
[210065.807050]  kthread+0x127/0x150
[210065.807054]  ? set_kthread_struct+0x50/0x50
[210065.807057]  ret_from_fork+0x22/0x30
[210065.807061]  </TASK>

rincebrain commented Dec 29, 2022

Some fun digging

After patching dmu_send to print the object it was on, and inspecting it...

# zdb -e -U /dev/null -dbdbdbdbdbdb madeye-old/e/backup/linus/borg@2022-05-31-121212 655
Dataset madeye-old/e/backup/linus/borg@2022-05-31-121212 [ZPL], ID 244177, cr_txg 6882075, 59.4G, 258 objects, rootbp DVA[0]=<0:a6b2f08f000:1000> DVA[1]=<0:a722c934000:1000> [L0 DMU objset] fletcher4 uncompressed authenticated LE contiguous unique double size=1000L/1000P birth=6882071L/6882071P fill=258 cksum=82e693bfa:1d97a8fd3a8c:35e3eb1b38d9e2:41dc6f2bd4993b4f

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       655    1   128K    512      0     512    512  100.00  ZFS plain file (K=inherit) (Z=inherit=inherit)
                                               176   bonus  System attributes
	dnode flags: USED_BYTES
	dnode maxblkid: 0
		(bonus encrypted)
		(object encrypted)
Indirect blocks:
               0 L0 EMBEDDED [L0 ZFS plain file] et=0 lz4 size=200L/22P birth=6863859L

		segment [0000000000000000, 0000000000000200) size   512

Now that's not supposed to happen, now is it?

So it seems like somehow we generated an embedded write record on an encrypted dataset. Whoopsie.
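Embedded block pointers carry their (lz4-compressed, in this case) payload in plaintext inside the block pointer itself, which is why they should never show up in an encrypted object set. Since the backtrace goes through spa_log_error(), the affected object may also be recorded in the pool's persistent error log; a couple of illustrative checks against the pool from the report:

# spa_log_error() in the backtrace means the block may appear in the pool's
# persistent error log:
zpool status -v madeye
# Embedded block pointers can only be written when this pool feature is active:
zpool get feature@embedded_data madeye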

At the time, the source and receiver would have been running 2.1.4.

I haven't been able to fiddle anything into reproducing this, so far.

edit: Well, I should say, I haven't reproduced it with stock code. I pretty readily fiddled the code into allowing me to generate records like this, though sending them raw panics, so it seems unlikely that's directly how this happened...
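"Sending them raw" here means zfs send -w, which transmits blocks exactly as stored on disk (still encrypted) instead of decrypting them first; an illustrative raw variant of the incremental from the report:

# Raw incremental send: block contents are not decrypted before being put into
# the stream, which is the mode that chokes on the hand-crafted embedded records.
zfs send -w -i madeye/e/backup/linus/borg@2022-05-30-000000 \
    madeye/e/backup/linus/borg@2022-05-31-121212 | pv >/dev/null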

rincebrain added the Component: Encryption and Component: Send/Recv labels on Jan 3, 2023