Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UBSAN: shift-out-of-bounds spew #14777

Open
adamdmoss opened this issue Apr 20, 2023 · 4 comments
Open

UBSAN: shift-out-of-bounds spew #14777

adamdmoss opened this issue Apr 20, 2023 · 4 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@adamdmoss
Copy link
Contributor

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 22.04.2
Kernel Version Linux version 5.19.0-1022-lowlatency (buildd@lcy02-amd64-044) (x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu122.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #2322.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 29 15:33:15 UTC 2
Architecture x64
OpenZFS Version 135d9a9 / master

Describe the problem you're observing

kernel spew when mounting(?) zfs filesystems:

UBSAN: shift-out-of-bounds in /var/lib/dkms/zfs/2.1.99/build/module/zfs/zio.c:5008:28
[  390.361064] shift exponent -5 is negative

Full spew:

[  390.361053] ================================================================================
[  390.361060] UBSAN: shift-out-of-bounds in /var/lib/dkms/zfs/2.1.99/build/module/zfs/zio.c:5008:28
[  390.361064] shift exponent -5 is negative
[  390.361066] CPU: 5 PID: 10457 Comm: z_rd_int_0 Tainted: P           OE     5.19.0-1022-lowlatency #23~22.04.1-Ubuntu
[  390.361069] Hardware name: Gigabyte Technology Co., Ltd. Z68MA-D2H-B3/Z68MA-D2H-B3, BIOS F10 02/23/2012
[  390.361071] Call Trace:
[  390.361073]  <TASK>
[  390.361076]  show_stack+0x52/0x69
[  390.361082]  dump_stack_lvl+0x49/0x6d
[  390.361087]  dump_stack+0x10/0x18
[  390.361089]  ubsan_epilogue+0x9/0x43
[  390.361093]  __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef
[  390.361098]  ? __sbitmap_get_word+0x36/0x90
[  390.361104]  zbookmark_compare.cold+0x20/0x66 [zfs]
[  390.361278]  zbookmark_subtree_completed+0x60/0x90 [zfs]
[  390.361423]  dsl_scan_check_prefetch_resume+0x6b/0xa0 [zfs]
[  390.361565]  ? abd_fletcher_4_fini+0x58/0x70 [zfs]
[  390.361680]  ? abd_fletcher_4_native+0x92/0xd0 [zfs]
[  390.361824]  ? dma_map_page_attrs+0x35/0x90
[  390.361829]  ? ktime_get+0x43/0xc0
[  390.361832]  ? __rq_qos_issue+0x26/0x50
[  390.361837]  ? blk_mq_start_request+0x3d/0x150
[  390.361842]  ? nvme_prep_rq.part.0+0xac/0x120 [nvme]
[  390.361847]  ? nvme_queue_rqs+0x1e0/0x290 [nvme]
[  390.361852]  ? aggsum_add+0x1af/0x1d0 [zfs]
[  390.361976]  ? kmem_cache_alloc+0x1b3/0x340
[  390.361979]  ? spl_kmem_cache_alloc+0x121/0x790 [spl]
[  390.361991]  ? buf_cons+0x65/0x80 [zfs]
[  390.362115]  ? arc_buf_fill+0x983/0xd90 [zfs]
[  390.362240]  dsl_scan_prefetch+0x8e/0x280 [zfs]
[  390.362382]  dsl_scan_prefetch_cb+0x158/0x310 [zfs]
[  390.362525]  arc_read_done+0x2d9/0x590 [zfs]
[  390.362650]  l2arc_read_done+0x7cf/0xae0 [zfs]
[  390.362768]  ? zio_wait_for_children+0xb2/0x140 [zfs]
[  390.362905]  zio_done+0x412/0x12a0 [zfs]
[  390.363038]  zio_execute+0x94/0x170 [zfs]
[  390.363178]  taskq_thread+0x27a/0x490 [spl]
[  390.363190]  ? wake_up_q+0xa0/0xa0
[  390.363197]  ? zio_gang_tree_free+0x70/0x70 [zfs]
[  390.363339]  ? taskq_thread_spawn+0x60/0x60 [spl]
[  390.363349]  kthread+0xeb/0x120
[  390.363353]  ? kthread_complete_and_exit+0x20/0x20
[  390.363357]  ret_from_fork+0x1f/0x30
[  390.363362]  </TASK>
[  390.363408] ================================================================================

The spew points to this code in zio.c:

int
zbookmark_compare(uint16_t dbss1, uint8_t ibs1, uint16_t dbss2, uint8_t ibs2,
    const zbookmark_phys_t *zb1, const zbookmark_phys_t *zb2)
{
	/*
	 * These variables represent the "equivalent" values for the zbookmark,
	 * after converting zbookmarks inside the meta dnode to their
	 * normal-object equivalents.
	 */
	uint64_t zb1obj, zb2obj;
	uint64_t zb1L0, zb2L0;
	uint64_t zb1level, zb2level;

	if (zb1->zb_object == zb2->zb_object &&
	    zb1->zb_level == zb2->zb_level &&
	    zb1->zb_blkid == zb2->zb_blkid)
		return (0);

	IMPLY(zb1->zb_level > 0, ibs1 >= SPA_MINBLOCKSHIFT);
	IMPLY(zb2->zb_level > 0, ibs2 >= SPA_MINBLOCKSHIFT);

	/*
	 * BP_SPANB calculates the span in blocks.
	 */
	zb1L0 = (zb1->zb_blkid) * BP_SPANB(ibs1, zb1->zb_level);
	zb2L0 = (zb2->zb_blkid) * BP_SPANB(ibs2, zb2->zb_level);

... and that final line is line 5009:
zb2L0 = (zb2->zb_blkid) * BP_SPANB(ibs2, zb2->zb_level);

BP_SPANB is

#define	BP_SPANB(indblkshift, level) \
	(((uint64_t)1) << ((level) * ((indblkshift) - SPA_BLKPTRSHIFT)))

spa.h says:
#define SPA_BLKPTRSHIFT 7

so level * (indblkshift - 7) == -5

I think those IMPLYs are trying to catch such a situation but I'm not on a debug kernel... :)

I don't see anything clearly related in recent git history so I guess my pool has some interesting corruption, but I'm reporting it just in case.

Describe how to reproduce the problem

Unsure.

@adamdmoss adamdmoss added the Type: Defect Incorrect behavior (e.g. crash, hang) label Apr 20, 2023
@ryao
Copy link
Contributor

ryao commented Apr 24, 2023

zb1->zb_level and zb2->zb_level are both 0. This is either an empty file or an embedded data, since blocks in regular files will always have a level greater than 0.

There is likely enough information here to fix this after studying the code in more detail.

@adamdmoss
Copy link
Contributor Author

FWIW I don't think I can repro this again unless my pool corrupts(?) in the same way - I rolled back one dataset (that a scrub was complaining about) to an earlier snap and everything is smooth again! 😬
Still, I guess no sort of on-disk corruption should end up with UB inside the kernel. Perhaps very wishful thinking. :)

@devZer0
Copy link

devZer0 commented Jan 16, 2024

i got this one with some other kernel trace

[220073.761054] ================================================================================
[20073.761433] UBSAN: shift-out-of-bounds in /home/tom/sources/pve/pve-kernel/proxmox-kernel-6.5.11/modules/pkg-zfs/module/zfs/zio.c:5050:28
[20073.761832] shift exponent -10 is negative
[20073.762141] CPU: 3 PID: 665554 Comm: z_rd_int_1 Tainted: P           O       6.5.11-7-pve #1
[20073.762148] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[20073.762151] Call Trace:
[20073.762165]  <TASK>
[20073.762190]  dump_stack_lvl+0x48/0x70
[20073.762237]  dump_stack+0x10/0x20
[20073.762244]  __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
[20073.762255]  zbookmark_compare.cold+0x20/0x66 [zfs]
[20073.763336]  zbookmark_subtree_completed+0x60/0x90 [zfs]
[20073.763593]  dsl_scan_check_prefetch_resume+0x82/0xc0 [zfs]
[20073.763919]  dsl_scan_prefetch+0x96/0x290 [zfs]
[20073.764298]  dsl_scan_prefetch_cb+0x15f/0x350 [zfs]
[20073.764610]  arc_read_done+0x2ad/0x4b0 [zfs]
[20073.764900]  l2arc_read_done+0x94a/0xaa0 [zfs]
[20073.765180]  ? vdev_queue_io_to_issue+0x4a4/0xce0 [zfs]
[20073.765515]  zio_done+0x28c/0x10b0 [zfs]
[20073.765826]  ? _raw_spin_unlock+0xe/0x40
[20073.765835]  ? zio_wait_for_children+0x91/0xd0 [zfs]
[20073.766138]  zio_execute+0x8b/0x130 [zfs]
[20073.766433]  ? _raw_spin_unlock_irqrestore+0x11/0x60
[20073.766440]  taskq_thread+0x282/0x490 [spl]
[20073.766494]  ? __pfx_default_wake_function+0x10/0x10
[20073.766505]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[20073.766800]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[20073.766823]  kthread+0xf2/0x120
[20073.766841]  ? __pfx_kthread+0x10/0x10
[20073.766847]  ret_from_fork+0x47/0x70
[20073.766874]  ? __pfx_kthread+0x10/0x10
[20073.766880]  ret_from_fork_asm+0x1b/0x30
[20073.766894]  </TASK>
[20073.766923] ================================================================================
[641249.027057] INFO: task tokio-runtime-w:857942 blocked for more than 120 seconds.
[641249.027551]       Tainted: P           O       6.5.11-7-pve #1
[641249.027898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[641249.028232] task:tokio-runtime-w state:D stack:0     pid:857942 ppid:1      flags:0x00004002
[641249.028241] Call Trace:
[641249.028261]  <TASK>
[641249.028294]  __schedule+0x3fd/0x1450
[641249.028304]  ? __schedule+0x405/0x1450
[641249.028310]  ? _raw_spin_unlock_irq+0xe/0x50
[641249.028318]  schedule+0x63/0x110
[641249.028323]  wb_wait_for_completion+0x89/0xc0
[641249.028342]  ? __pfx_autoremove_wake_function+0x10/0x10
[641249.028357]  __writeback_inodes_sb_nr+0x9d/0xd0
[641249.028383]  writeback_inodes_sb+0x3c/0x60
[641249.028404]  sync_filesystem+0x3d/0xb0
[641249.028415]  __x64_sys_syncfs+0x49/0xb0
[641249.028421]  do_syscall_64+0x5b/0x90
[641249.028428]  ? exit_to_user_mode_prepare+0xa5/0x190
[641249.028443]  ? syscall_exit_to_user_mode+0x37/0x60
[641249.028450]  ? do_syscall_64+0x67/0x90
[641249.028455]  ? exc_page_fault+0x94/0x1b0
[641249.028461]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[641249.028484] RIP: 0033:0x7fe8d5d1db57
[641249.028545] RSP: 002b:00007fe7b548d2a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000132
[641249.028551] RAX: ffffffffffffffda RBX: 00007fe7b548d2f8 RCX: 00007fe8d5d1db57
[641249.028554] RDX: 00007fef568dce4c RSI: 0000000000000007 RDI: 000000000000001c
[641249.028558] RBP: 000000000000001c R08: 0000000000000007 R09: 00007fe8a800cae0
[641249.028560] R10: 6362d3f42356e84f R11: 0000000000000202 R12: 0000000000000001
[641249.028563] R13: 00007fe85c03af90 R14: 000000000000001c R15: 00007fe8a803c120
[641249.028570]  </TASK>
[641249.028584] INFO: task tokio-runtime-w:858018 blocked for more than 120 seconds.
[641249.028927]       Tainted: P           O       6.5.11-7-pve #1
[641249.029260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[641249.029585] task:tokio-runtime-w state:D stack:0     pid:858018 ppid:1      flags:0x00004002
[641249.029592] Call Trace:
[641249.029594]  <TASK>
[641249.029598]  __schedule+0x3fd/0x1450
[641249.029604]  ? __queue_delayed_work+0x83/0xf0
[641249.029622]  ? _raw_spin_unlock_irq+0xe/0x50
[641249.029629]  schedule+0x63/0x110
[641249.029634]  wb_wait_for_completion+0x89/0xc0
[641249.029639]  ? __pfx_autoremove_wake_function+0x10/0x10
[641249.029645]  sync_inodes_sb+0xd6/0x2c0
[641249.029652]  sync_filesystem+0x70/0xb0
[641249.029662]  __x64_sys_syncfs+0x49/0xb0
[641249.029696]  do_syscall_64+0x5b/0x90
[641249.029701]  ? do_syscall_64+0x67/0x90
[641249.029706]  ? syscall_exit_to_user_mode+0x37/0x60
[641249.029712]  ? do_syscall_64+0x67/0x90
[641249.029717]  ? syscall_exit_to_user_mode+0x37/0x60
[641249.029722]  ? do_syscall_64+0x67/0x90
[641249.029727]  ? exc_page_fault+0x94/0x1b0
[641249.029733]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[641249.029739] RIP: 0033:0x7fe8d5d1db57
[641249.029749] RSP: 002b:00007fe7972df2a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000132
[641249.029754] RAX: ffffffffffffffda RBX: 00007fe7972df2f8 RCX: 00007fe8d5d1db57
[641249.029757] RDX: 00007fef867f3d82 RSI: 0000000000000007 RDI: 000000000000003d
[641249.029759] RBP: 000000000000003d R08: 0000000000000007 R09: 00007fe7d00326f0
[641249.029762] R10: 6362d3f42356e84f R11: 0000000000000202 R12: 0000000000000001
[641249.029782] R13: 00007fe8c806c910 R14: 000000000000001c R15: 00007fe7d00fd500
[641249.029789]  </TASK>

@devZer0
Copy link

devZer0 commented Jan 16, 2024

related ? #13146

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants