Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BRT + sync = page fault #15768

Closed
allanjude opened this issue Jan 12, 2024 · 8 comments
Closed

BRT + sync = page fault #15768

allanjude opened this issue Jan 12, 2024 · 8 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@allanjude
Copy link
Contributor

System information

Type Version/Name
Distribution Name FreeBSD
Distribution Version 14.0-RELEASE
Kernel Version 14.0-RELEASE
Architecture AMD64
OpenZFS Version zfs-2.2.0-FreeBSD_g95785196f

Describe the problem you're observing

When using BRT to clone a file, and then calling sync, the system will panic.

Describe how to reproduce the problem

FreeBSD:

sysctl vfs.zfs.bclone_enabled = 1
for n in 1 2 3 4 5 6 7 8 9; do cp file file.$n; sync;done

Ubuntu:

echo 1 | sudo tee /sys/module/zfs/parameters/zfs_bclone_enabled
for n in 1 2 3 4 5 6 7 8 9; do cp --reflink=always file file.$n; sync;done

Analysis

It seems the issue is in the memcpy() in zil_lwb_commit(), the

Include any warning/errors/backtraces from the system logs

FreeBSD:

panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe008e87a000
cpuid = 0
time = 1704410329
KDB: stack backtrace:
#0 0xffffffff80b9002d at kdb_backtrace+0x5d
#1 0xffffffff80b43132 at vpanic+0x132
#2 0xffffffff80b42ff3 at panic+0x43
#3 0xffffffff80eb28b5 at vm_fault+0x15c5
#4 0xffffffff80eb1220 at vm_fault_trap+0xb0
#5 0xffffffff8100ca39 at trap_pfault+0x1d9
#6 0xffffffff80fe3828 at calltrap+0x8
#7 0xffffffff82128d57 at zil_lwb_write_issue+0xe7
#8 0xffffffff821260db at zil_commit_impl+0x1db
#9 0xffffffff81f889d1 at zfs_sync+0x71
#10 0xffffffff80c30e08 at kern_sync+0xc8
#11 0xffffffff80c30eb9 at sys_sync+0x9
#12 0xffffffff8100d119 at amd64_syscall+0x109
#13 0xffffffff80fe413b at fast_syscall_common+0xf8

kgdb of vmcore:

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b42cc7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b4319f in vpanic (fmt=0xffffffff811fdb6d "%s: fault on nofault entry, addr: %#lx", ap=ap@entry=0xfffffe01133268b0) at /usr/src/sys/kern/kern_shutdown.c:970
#4  0xffffffff80b42ff3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:894
#5  0xffffffff80eb28b5 in vm_fault_lookup (fs=0xfffffe0113326910) at /usr/src/sys/vm/vm_fault.c:912
#6  vm_fault (map=<optimized out>, vaddr=vaddr@entry=18446741879877189632, fault_type=2 '\002', fault_flags=fault_flags@entry=0, m_hold=m_hold@entry=0x0) at /usr/src/sys/vm/vm_fault.c:1569
#7  0xffffffff80eb1220 in vm_fault_trap (map=<optimized out>, vaddr=vaddr@entry=18446741879877189632, fault_type=<optimized out>, fault_flags=fault_flags@entry=0, signo=0x0, ucode=0x0) at /usr/src/sys/vm/vm_fault.c:712
#8  0xffffffff8100ca39 in trap_pfault (frame=0xfffffe0113326aa0, usermode=false, signo=<unavailable>, ucode=<unavailable>) at /usr/src/sys/amd64/amd64/trap.c:845
#9  <signal handler called>
#10 memcpy_std () at /usr/src/sys/amd64/amd64/support.S:547
#11 0xffffffff82128ef7 in zil_lwb_commit (zilog=0xfffff800b1446c00, lwb=0xfffff803747df188, itx=0xfffffe015bca0000) at /usr/src/sys/contrib/openzfs/module/zfs/zil.c:2135
#12 zil_lwb_write_issue (zilog=zilog@entry=0xfffff800b1446c00, lwb=0xfffff803747df188) at /usr/src/sys/contrib/openzfs/module/zfs/zil.c:1795
#13 0xffffffff8212627b in zil_commit_writer (zilog=0xfffff800b1446c00, zcw=0xfffff80012464b90) at /usr/src/sys/contrib/openzfs/module/zfs/zil.c:2991
#14 zil_commit_impl (zilog=0xfffff800b1446c00, foid=<optimized out>) at /usr/src/sys/contrib/openzfs/module/zfs/zil.c:3495
#15 0xffffffff821260a0 in zil_commit (zilog=0xfffffe013566c000, foid=18446741880521318288, foid@entry=0) at /usr/src/sys/contrib/openzfs/module/zfs/zil.c:3458
#16 0xffffffff81f889d1 in zfs_sync (vfsp=<optimized out>, waitfor=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:465
#17 0xffffffff80c30e08 in kern_sync (td=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:151
#18 0xffffffff80c30eb9 in sys_sync (td=0xfffffe013566c000, uap=0xfffffe015bcb5f90) at /usr/src/sys/kern/vfs_syscalls.c:176
#19 0xffffffff8100d119 in syscallenter (td=0xfffffe0110e191e0) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
#20 amd64_syscall (td=0xfffffe0110e191e0, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1197
#21 <signal handler called>
#22 0x00000fc6d3f7dc8a in ?? ()

Ubuntu:

Jan 12 20:55:06 lxd kernel: BUG: unable to handle page fault for address: ffffb5c682aa2000
Jan 12 20:55:06 lxd kernel: #PF: supervisor write access in kernel mode
Jan 12 20:55:06 lxd kernel: #PF: error_code(0x0002) - not-present page
Jan 12 20:55:06 lxd kernel: PGD 11b159067 P4D 11b159067 PUD 11b15a067 PMD 10e6e6067 PTE 0
Jan 12 20:55:06 lxd kernel: Oops: 0002 [#1] SMP NOPTI
Jan 12 20:55:06 lxd kernel: CPU: 1 PID: 26414 Comm: kworker/u16:3 Tainted: P           OE     5.4.0-136-generic #153-Ubuntu
Jan 12 20:55:06 lxd kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jan 12 20:55:06 lxd kernel: Workqueue: writeback wb_workfn (flush-zfs-1)
Jan 12 20:55:06 lxd kernel: RIP: 0010:__memcpy+0x12/0x20
Jan 12 20:55:06 lxd kernel: Code: 84 a7 67 ff 0f 31 48 c1 e2 20 48 09 c2 48 31 d3 e9 70 ff ff ff 90 90 90 0f 1f 44 00 00 48 >
Jan 12 20:55:06 lxd kernel: RSP: 0018:ffffb5c684727960 EFLAGS: 00010246
Jan 12 20:55:06 lxd kernel: RAX: ffffb5c682a8d0b8 RBX: 000000000001ff48 RCX: 0000000000001600
Jan 12 20:55:06 lxd kernel: RDX: 0000000000000000 RSI: ffffb5c687c73f90 RDI: ffffb5c682aa2000
Jan 12 20:55:06 lxd kernel: RBP: ffffb5c684727a20 R08: 0000000000000003 R09: ffff9eb42e2ae400
Jan 12 20:55:06 lxd kernel: R10: ffff9eb43cd75e98 R11: ffffb5c687c5f000 R12: ffffb5c682a8d0b8
Jan 12 20:55:06 lxd kernel: R13: ffffb5c687c5f000 R14: ffff9eb4c9e82ff0 R15: ffff9eb45fd0b800
Jan 12 20:55:06 lxd kernel: FS:  0000000000000000(0000) GS:ffff9eb4dba40000(0000) knlGS:0000000000000000
Jan 12 20:55:06 lxd kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 20:55:06 lxd kernel: CR2: ffffb5c682aa2000 CR3: 000000010c6e8000 CR4: 00000000000406e0
Jan 12 20:55:06 lxd kernel: Call Trace:
Jan 12 20:55:06 lxd kernel:  ? zil_lwb_write_issue+0xb1/0xd40 [zfs]
Jan 12 20:55:06 lxd kernel:  zil_commit_impl+0x224/0x1130 [zfs]
Jan 12 20:55:06 lxd kernel:  zil_commit+0x40/0x60 [zfs]
Jan 12 20:55:06 lxd kernel:  zpl_writepages+0x11e/0x210 [zfs]
Jan 12 20:55:06 lxd kernel:  do_writepages+0x43/0xd0
Jan 12 20:55:06 lxd kernel:  ? fprop_fraction_percpu+0x2f/0x80
Jan 12 20:55:06 lxd kernel:  __writeback_single_inode+0x40/0x300
Jan 12 20:55:06 lxd kernel:  writeback_sb_inodes+0x233/0x4d0
Jan 12 20:55:06 lxd kernel:  wb_writeback+0xd1/0x2f0
Jan 12 20:55:06 lxd kernel:  wb_workfn+0xd3/0x4f0
Jan 12 20:55:06 lxd kernel:  ? put_prev_entity+0x23/0x100
Jan 12 20:55:06 lxd kernel:  ? __switch_to+0x157/0x470
Jan 12 20:55:06 lxd kernel:  ? __switch_to_asm+0x42/0x80
Jan 12 20:55:06 lxd kernel:  ? __switch_to_asm+0x5a/0x80
Jan 12 20:55:06 lxd kernel:  ? __schedule+0x2eb/0x740
Jan 12 20:55:06 lxd kernel:  process_one_work+0x1eb/0x3b0
Jan 12 20:55:06 lxd kernel:  worker_thread+0x4d/0x400
Jan 12 20:55:06 lxd kernel:  kthread+0x104/0x140
Jan 12 20:55:06 lxd kernel:  ? process_one_work+0x3b0/0x3b0
Jan 12 20:55:06 lxd kernel:  ? kthread_park+0x90/0x90
Jan 12 20:55:06 lxd kernel:  ret_from_fork+0x35/0x40
Jan 12 20:55:06 lxd kernel: Modules linked in: zfs(POE) spl(OE) dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua binfmt_mi>
Jan 12 20:55:06 lxd kernel: CR2: ffffb5c682aa2000
@allanjude allanjude added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 12, 2024
@amotin
Copy link
Member

amotin commented Jan 12, 2024

Is it reproducible on the latest stable? There was #15634 after 2.2.0 and merged for upcoming 2.2.3, that looks potentially related. #15635 should fix it better in master.

@robn
Copy link
Member

robn commented Jan 13, 2024

I'm not sure what version this is targeting actually. zfs-2.2.0-FreeBSD_g95785196f is zfs-2.2.0 release, which on Linux didn't have zfs_bclone_enabled.

In any case, I couldn't reproduce on 2.2.0, 2.2.2 or 2.2.3-staging @ ac59231 against Linux 5.10.170:

echo 1 > /sys/module/zfs/parameters/zfs_bclone_enabled || true
zpool create tank loop0 && dd if=/dev/random of=/tank/file bs=1M count=1
zpool sync
for n in 12 3 4 5 6 7 8 9; do cp --reflink=always /tank/file /tank/file.$n; sync;done

Any more info? Source file size, recordsize=, slog device? ... not sure what else?

@robn
Copy link
Member

robn commented Jan 13, 2024

1G file does better. 2.2.0 (debug):

+ dd if=/dev/random of=/tank/file bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.97848 s, 120 MB/s
+ zpool sync
+ cp --reflink=always /tank/file /tank/file.1
+ sync
+ cp --reflink=always /tank/file /tank/file.2
+ sync
[   13.976995] VERIFY3(reclen + MIN(dlen, sizeof (uint64_t)) <= lwb_sp) failed (130888 <= 69448)
[   13.977083] PANIC at zil.c:2061:zil_lwb_assign()
[   13.977129] Showing stack for process 617
[   13.977169] CPU: 0 PID: 617 Comm: sync Tainted: P           O      5.10.170 #3
[   13.977216] Call Trace:
[   13.977242]  dump_stack+0x57/0x6e
[   13.977286]  spl_panic+0xd3/0xfb [spl]
[   13.977317]  ? prepare_to_wait_event+0x76/0x160
[   13.977353]  ? spl_cache_refill+0xba/0x350 [spl]
[   13.977391]  ? _cond_resched+0x1a/0x50
[   13.977685]  ? queued_spin_unlock+0x5/0x10 [zfs]
[   13.977801]  ? __raw_spin_unlock+0x5/0x10 [zfs]
[   13.977919]  ? zil_alloc_lwb+0x21c/0x350 [zfs]
[   13.978066]  zil_lwb_assign+0x235/0x420 [zfs]
[   13.978252]  zil_process_commit_list+0x176/0x450 [zfs]
[   13.978379]  zil_commit_writer+0x184/0x230 [zfs]
[   13.978515]  zil_commit_impl+0x5f/0xd0 [zfs]
[   13.978755]  zfs_sync+0x51/0x80 [zfs]
[   13.978917]  zpl_sync_fs+0x49/0xb0 [zfs]
[   13.978954]  ? vfs_fsync_range+0x90/0x90
[   13.978991]  iterate_supers+0x8c/0x100
[   13.979023]  ksys_sync+0x60/0xb0
[   13.979047]  __do_sys_sync+0xa/0x20
[   13.979071]  do_syscall_64+0x30/0x40
[   13.979106]  entry_SYSCALL_64_after_hwframe+0x61/0xc6

2.2.2 (debug):

+ dd if=/dev/random of=/tank/file bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.78453 s, 138 MB/s
+ zpool sync
+ cp --reflink=always /tank/file /tank/file.1
+ sync
+ cp --reflink=always /tank/file /tank/file.2
+ sync
[   17.498077] VERIFY3(reclen + MIN(dlen, sizeof (uint64_t)) <= lwb_sp) failed (130888 <= 69448)
[   17.498158] PANIC at zil.c:2061:zil_lwb_assign()
[   17.498198] Showing stack for process 617
[   17.498235] CPU: 0 PID: 617 Comm: sync Tainted: P           O      5.10.170 #3
[   17.498276] Call Trace:
[   17.498298]  dump_stack+0x57/0x6e
[   17.498338]  spl_panic+0xd3/0xfb [spl]
[   17.498366]  ? prepare_to_wait_event+0x76/0x160
[   17.498400]  ? spl_cache_refill+0xba/0x350 [spl]
[   17.498430]  ? _cond_resched+0x1a/0x50
[   17.498552]  ? queued_spin_unlock+0x5/0x10 [zfs]
[   17.498665]  ? __raw_spin_unlock+0x5/0x10 [zfs]
[   17.498780]  ? zil_alloc_lwb+0x21c/0x350 [zfs]
[   17.498902]  zil_lwb_assign+0x235/0x420 [zfs]
[   17.499031]  zil_process_commit_list+0x176/0x450 [zfs]
[   17.499182]  zil_commit_writer+0x184/0x230 [zfs]
[   17.499337]  zil_commit_impl+0x5f/0xd0 [zfs]
[   17.499486]  zfs_sync+0x51/0x80 [zfs]
[   17.499633]  zpl_sync_fs+0x49/0xb0 [zfs]
[   17.499706]  ? vfs_fsync_range+0x90/0x90
[   17.499732]  iterate_supers+0x8c/0x100
[   17.499756]  ksys_sync+0x60/0xb0
[   17.499779]  __do_sys_sync+0xa/0x20
[   17.499804]  do_syscall_64+0x30/0x40
[   17.499829]  entry_SYSCALL_64_after_hwframe+0x61/0xc6

2.2.3-staging @ ac59231 (debug):

+ dd if=/dev/random of=/tank/file bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.10867 s, 132 MB/s
+ zpool sync
+ cp --reflink=always /tank/file /tank/file.1
+ sync
+ cp --reflink=always /tank/file /tank/file.2
+ sync
+ cp --reflink=always /tank/file /tank/file.3
+ sync
+ cp --reflink=always /tank/file /tank/file.4
+ sync
+ cp --reflink=always /tank/file /tank/file.5
+ sync
+ cp --reflink=always /tank/file /tank/file.6
+ sync
+ cp --reflink=always /tank/file /tank/file.7
+ sync
+ cp --reflink=always /tank/file /tank/file.8
+ sync
+ cp --reflink=always /tank/file /tank/file.9
+ sync

So probably fixed already?

@allanjude
Copy link
Contributor Author

I'm not sure what version this is targeting actually. zfs-2.2.0-FreeBSD_g95785196f is zfs-2.2.0 release, which on Linux didn't have zfs_bclone_enabled.

On Ubuntu I was using 2.2.2

@allanjude
Copy link
Contributor Author

Is it reproducible on the latest stable? There was #15634 after 2.2.0 and merged for upcoming 2.2.3, that looks potentially related. #15635 should fix it better in master.

Confirmed that 2aa3a48 from #15634 fixes it on 2.2.2

@allanjude
Copy link
Contributor Author

@mmatuska We should get this into 14.0 as an Errata Notice. Do you want to handle that, or would you like me to do it?

@mmatuska
Copy link
Contributor

mmatuska commented Jan 13, 2024

@allanjude Please prepare the Errata, but you fix here only one of many problems.
Sadly, there is a lot more and 2.2.3 will be in a way better shape for BRT than 2.2.0.

@behlendorf
Copy link
Contributor

Closing, this was fixed in 2.2.2 by 2aa3a48 as mentioned above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

5 participants