Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: kernel NULL pointer dereference, address: 0000000000000008 #10642

Open
aaronjwood opened this issue Jul 29, 2020 · 9 comments
Open

BUG: kernel NULL pointer dereference, address: 0000000000000008 #10642

aaronjwood opened this issue Jul 29, 2020 · 9 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@aaronjwood
Copy link

aaronjwood commented Jul 29, 2020

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 20.04.1 LTS
Linux Kernel Linux vault 5.4.0-42-generic #46-Ubuntu SMP
Architecture x86_64
ZFS Version 0.8.3-1ubuntu12.1
SPL Version 0.8.3-1ubuntu12.1

Describe the problem you're observing

After a few days (sometimes longer) there is strange behavior that is noticeable in the apps that are served up to the network which are backed by ZFS datasets. Looking at dmesg shows the crash. This never causes the OS to completely crash and after this happens ZFS seems to work, but certain workloads against my datasets seem to hang.

Describe how to reproduce the problem

Not quite sure how the scenario is triggered. I see that after running my server for a few days to maybe 1.5 weeks this can happen.

Include any warning/errors/backtraces from the system logs

[205215.211305] BUG: kernel NULL pointer dereference, address: 0000000000000008
[205215.212024] #PF: supervisor write access in kernel mode
[205215.212024] #PF: error_code(0x0002) - not-present page
[205215.212024] PGD 860342067 P4D 860342067 PUD 860344067 PMD 0 
[205215.212024] Oops: 0002 [#1] SMP NOPTI
[205215.212024] CPU: 4 PID: 4051658 Comm: Plex Media Serv Tainted: P           O      5.4.0-42-generic #46-Ubuntu
[205215.212024] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[205215.212024] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[205215.212024] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
[205215.212024] RSP: 0018:ffffb04e458f3b70 EFLAGS: 00010202
[205215.212024] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[205215.212024] RDX: 0000000000000001 RSI: ffff96b43d9e45a0 RDI: 0000000000000000
[205215.212024] RBP: ffffb04e458f3ba8 R08: 0000000000000000 R09: 0000000000000008
[205215.212024] R10: ffff96b4e658a008 R11: 0000000000000000 R12: ffff96b4e658a010
[205215.212024] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff96b43d9e45a0
[205215.212024] FS:  00007f03ad7fa700(0000) GS:ffff96bbe0100000(0000) knlGS:0000000000000000
[205215.212024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[205215.212024] CR2: 0000000000000008 CR3: 000000086081e000 CR4: 00000000003406e0
[205215.212024] Call Trace:
[205215.212024]  avl_insert+0xa9/0xb0 [zavl]
[205215.212024]  zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
[205215.212024]  zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
[205215.212024]  zfs_rangelock_enter+0xed/0xf0 [zfs]
[205215.212024]  zfs_get_data+0x158/0x350 [zfs]
[205215.212024]  zil_lwb_commit+0x1c6/0x360 [zfs]
[205215.212024]  zil_process_commit_list+0xf1/0x210 [zfs]
[205215.212024]  zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
[205215.212024]  zil_commit_impl+0x59/0xa0 [zfs]
[205215.212024]  zil_commit+0x40/0x60 [zfs]
[205215.212024]  zfs_fsync+0x7a/0xe0 [zfs]
[205215.212024]  zpl_fsync+0x5c/0x90 [zfs]
[205215.212024]  vfs_fsync_range+0x49/0x80
[205215.212024]  ? __fget_light+0x57/0x70
[205215.212024]  do_fsync+0x3d/0x70
[205215.212024]  __x64_sys_fsync+0x14/0x20
[205215.212024]  do_syscall_64+0x57/0x190
[205215.212024]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[205215.212024] RIP: 0033:0x7f03cdd7fb2d
[205215.212024] Code: b8 20 00 00 75 10 b8 4a 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 3e f6 ff ff 48 89 04 24 b8 4a 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 87 f6 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[205215.212024] RSP: 002b:00007f03ad7f8bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[205215.212024] RAX: ffffffffffffffda RBX: 00000000023a18a8 RCX: 00007f03cdd7fb2d
[205215.212024] RDX: 00000000023a1a58 RSI: 0000000000000002 RDI: 000000000000000c
[205215.212024] RBP: 00000000023b43b8 R08: 0000000000000000 R09: 0000000000000001
[205215.212024] R10: 00007f03404613b0 R11: 0000000000000293 R12: 0000000000000000
[205215.304031] R13: 00000000023a1818 R14: 0000000000000002 R15: 0000000000000000
[205215.332036] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod aufs target_core_mod overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlua(PO) ses enclosure mpt3sas raid_class scsi_transport_sas pci_hyperv serio_raw joydev hv_balloon hyperv_fb pci_hyperv_intf mac_hid sch_fq_codel drm ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul hid_generic ghash_clmulni_intel hv_storvsc hid_hyperv hv_netvsc hid scsi_transport_fc hv_utils hyperv_keyboard aesni_intel crypto_simd
[205215.332036]  cryptd glue_helper hv_vmbus
[205215.654749] CR2: 0000000000000008
[205215.689602] ---[ end trace ad3e9559572add2c ]---
[205215.728063] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[205215.764027] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
[205215.839313] RSP: 0018:ffffb04e458f3b70 EFLAGS: 00010202
[205215.873658] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[205215.912036] RDX: 0000000000000001 RSI: ffff96b43d9e45a0 RDI: 0000000000000000
[205215.948030] RBP: ffffb04e458f3ba8 R08: 0000000000000000 R09: 0000000000000008
[205215.981319] R10: ffff96b4e658a008 R11: 0000000000000000 R12: ffff96b4e658a010
[205216.018130] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff96b43d9e45a0
[205216.052029] FS:  00007f03ad7fa700(0000) GS:ffff96bbe0100000(0000) knlGS:0000000000000000
[205216.089008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[205216.126095] CR2: 0000000000000008 CR3: 000000086081e000 CR4: 00000000003406e0

As you can see I'm inside a VM here. Some details about my setup:

  • I have a SAS 3008 card passed through to this guest using DDA. This is how I'm using ZFS safely from inside the VM
    • 25f9:00:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
  • My host is using Windows Server 2019
    • Using a Zen 2 CPU with the x570 chipset

Guest CPU details:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   44 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           113
Model name:                      AMD Ryzen 9 3950X 16-Core Processor
Stepping:                        0
CPU MHz:                         3499.943
BogoMIPS:                        6999.88
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        3 MiB
L3 cache:                        16 MiB
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_go
                                 od nopl cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm
                                  sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsave
                                 c xgetbv1 xsaves xsaveerptr virt_ssbd arat rdpid

Guest memory ranges:

RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000000f7ffffff  3.9G online       yes   0-30
0x0000000100000000-0x0000000907ffffff 32.1G online       yes 32-288

Memory block size:       128M
Total online memory:      36G
Total offline memory:      0B
@aaronjwood
Copy link
Author

aaronjwood commented Jul 29, 2020

FWIW when this happens it is very hard to shutdown/reboot the system cleanly. It takes an extremely long time and hangs at various spots here:
Capture

EDIT: full log trace of when reboot was triggered:

Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 Sense Key : Not Ready [current]
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 Add. Sense: Medium not present - tray closed
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Jul 29 16:13:18 vault kernel: blk_update_request: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#747 unaligned transfer
Jul 29 16:13:18 vault kernel: blk_update_request: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:18 vault kernel: Buffer I/O error on dev sr0, logical block 0, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#744 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 1, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#745 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 2 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 2, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#385 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 3 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 3, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#279 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 4 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 4, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#266 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 5 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 5, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#267 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 6 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 6, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#268 unaligned transfer
Jul 29 16:13:20 vault kernel: blk_update_request: I/O error, dev sr0, sector 7 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:20 vault kernel: Buffer I/O error on dev sr0, logical block 7, async page read
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#866 unaligned transfer
Jul 29 16:13:20 vault kernel: blk_update_request: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:20 vault kernel: Buffer I/O error on dev sr0, logical block 0, async page read
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#10 unaligned transfer
Jul 29 16:13:20 vault kernel: Buffer I/O error on dev sr0, logical block 1, async page read
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#11 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#12 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#13 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#14 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#15 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#16 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#17 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#18 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#19 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#20 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#21 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#22 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#23 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#24 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#25 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#26 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#27 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#28 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#29 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#30 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#31 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#32 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#33 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#34 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#35 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#36 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#37 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#38 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#39 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#40 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#41 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#42 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#43 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#44 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#45 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#46 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#47 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#48 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#49 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#50 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#51 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#52 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#53 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#54 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#55 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#56 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#57 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#58 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#59 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#60 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#61 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#62 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#63 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#0 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#1 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#2 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#3 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#4 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#5 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#6 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#7 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#8 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#9 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#10 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#11 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#12 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#13 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#14 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#15 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#16 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#17 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#18 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#19 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#20 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#21 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#22 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#23 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#24 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#25 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#26 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#27 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#28 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#29 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#30 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#31 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#32 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#33 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#34 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#35 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#36 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#37 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#38 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#39 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#40 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#41 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#42 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#43 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#44 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#45 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#46 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#47 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#48 unaligned transfer
Jul 29 16:20:54 vault kernel: docker0: port 2(veth37ec875) entered disabled state
Jul 29 16:20:54 vault kernel: veth53d3c1a: renamed from eth0
Jul 29 16:20:54 vault kernel: docker0: port 2(veth37ec875) entered disabled state
Jul 29 16:20:54 vault kernel: device veth37ec875 left promiscuous mode
Jul 29 16:20:54 vault kernel: docker0: port 2(veth37ec875) entered disabled state
Jul 29 16:20:54 vault kernel: vethe05f9b2: renamed from eth0
Jul 29 16:20:54 vault kernel: docker0: port 3(veth95bb255) entered disabled state
Jul 29 16:20:54 vault kernel: Unable to locate Target Portal Group on iqn.2020-04.com.aaronjwood.lan.vault
Jul 29 16:20:54 vault kernel: iSCSI Login negotiation failed.
Jul 29 16:20:54 vault kernel: vetha214a1d: renamed from eth0
Jul 29 16:20:55 vault kernel: br-641b512ac453: port 1(veth21b5a89) entered disabled state
Jul 29 16:20:55 vault kernel: docker0: port 3(veth95bb255) entered disabled state
Jul 29 16:20:55 vault kernel: device veth95bb255 left promiscuous mode
Jul 29 16:20:55 vault kernel: docker0: port 3(veth95bb255) entered disabled state
Jul 29 16:20:55 vault kernel: br-641b512ac453: port 1(veth21b5a89) entered disabled state
Jul 29 16:20:55 vault kernel: device veth21b5a89 left promiscuous mode
Jul 29 16:20:55 vault kernel: br-641b512ac453: port 1(veth21b5a89) entered disabled state
Jul 29 16:20:55 vault kernel: docker0: port 1(vethfa797b5) entered disabled state
Jul 29 16:20:55 vault kernel: veth6aaf00e: renamed from eth0
Jul 29 16:20:55 vault kernel: docker0: port 1(vethfa797b5) entered disabled state
Jul 29 16:20:55 vault kernel: device vethfa797b5 left promiscuous mode
Jul 29 16:20:55 vault kernel: docker0: port 1(vethfa797b5) entered disabled state
Jul 29 16:20:57 vault kernel: docker0: port 4(veth0911cc9) entered disabled state
Jul 29 16:20:57 vault kernel: vethdff1a34: renamed from eth0
Jul 29 16:20:57 vault kernel: docker0: port 4(veth0911cc9) entered disabled state
Jul 29 16:20:57 vault kernel: device veth0911cc9 left promiscuous mode
Jul 29 16:20:57 vault kernel: docker0: port 4(veth0911cc9) entered disabled state
Jul 29 16:20:58 vault kernel: vethc523ecd: renamed from eth0
Jul 29 16:20:58 vault kernel: docker0: port 5(veth258ecf7) entered disabled state
Jul 29 16:20:58 vault kernel: docker0: port 5(veth258ecf7) entered disabled state
Jul 29 16:20:58 vault kernel: device veth258ecf7 left promiscuous mode
Jul 29 16:20:58 vault kernel: docker0: port 5(veth258ecf7) entered disabled state
Jul 29 16:20:58 vault kernel: docker0: port 6(vethd8c1225) entered disabled state
Jul 29 16:20:58 vault kernel: veth44f9777: renamed from eth0
Jul 29 16:20:58 vault kernel: docker0: port 6(vethd8c1225) entered disabled state
Jul 29 16:20:58 vault kernel: device vethd8c1225 left promiscuous mode
Jul 29 16:20:58 vault kernel: docker0: port 6(vethd8c1225) entered disabled state
Jul 29 16:20:58 vault kernel: docker0: port 8(veth4e16f4d) entered disabled state
Jul 29 16:20:58 vault kernel: veth2cd99ec: renamed from eth0
Jul 29 16:20:58 vault kernel: docker0: port 8(veth4e16f4d) entered disabled state
Jul 29 16:20:58 vault kernel: device veth4e16f4d left promiscuous mode
Jul 29 16:20:58 vault kernel: docker0: port 8(veth4e16f4d) entered disabled state
Jul 29 16:21:01 vault kernel: docker0: port 7(veth328cb31) entered disabled state
Jul 29 16:21:01 vault kernel: vethb0388d6: renamed from eth0
Jul 29 16:21:02 vault kernel: docker0: port 7(veth328cb31) entered disabled state
Jul 29 16:21:02 vault kernel: device veth328cb31 left promiscuous mode
Jul 29 16:21:02 vault kernel: docker0: port 7(veth328cb31) entered disabled state
Jul 29 16:21:05 vault systemd-shutdown[1]: Syncing filesystems and block devices.
Jul 29 16:21:45 vault systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID 1070391.
Jul 29 16:21:45 vault systemd-shutdown[1]: Sending SIGTERM to remaining processes...

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Aug 23, 2020
@mas90
Copy link

mas90 commented Nov 7, 2020

I believe I have the same bug -- also triggered by Plex Media Server (consistently, every few weeks; the server runs a lot more than just Plex but the crash is always triggered by Plex).

Nov  3 15:45:14 callisto kernel: [4229866.738020] BUG: kernel NULL pointer dereference, address: 0000000000000008
Nov  3 15:45:14 callisto kernel: [4229866.746633] #PF: supervisor write access in kernel mode
Nov  3 15:45:14 callisto kernel: [4229866.753246] #PF: error_code(0x0002) - not-present page
Nov  3 15:45:14 callisto kernel: [4229866.759760] PGD 8000000894e8c067 P4D 8000000894e8c067 PUD 894e8d067 PMD 0
Nov  3 15:45:14 callisto kernel: [4229866.768221] Oops: 0002 [#1] SMP PTI
Nov  3 15:45:14 callisto kernel: [4229866.772960] CPU: 2 PID: 1835283 Comm: Plex Media Serv Tainted: P           O      5.4.0-47-generic #51-Ubuntu
Nov  3 15:45:14 callisto kernel: [4229866.784867] Hardware name: HP ProLiant DL380e Gen8, BIOS P73 05/24/2019
Nov  3 15:45:14 callisto kernel: [4229866.793026] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
Nov  3 15:45:14 callisto kernel: [4229866.800136] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
Nov  3 15:45:14 callisto kernel: [4229866.823054] RSP: 0018:ffff992388e7fb70 EFLAGS: 00010202
Nov  3 15:45:14 callisto kernel: [4229866.829798] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
Nov  3 15:45:14 callisto kernel: [4229866.838681] RDX: 0000000000000001 RSI: ffff8c9aa23c1a80 RDI: 0000000000000000
Nov  3 15:45:14 callisto kernel: [4229866.847553] RBP: ffff992388e7fba8 R08: 0000000000000000 R09: 0000000000000008
Nov  3 15:45:14 callisto kernel: [4229866.856431] R10: ffff8ca42694e808 R11: 0000000000000000 R12: ffff8ca42694e810
Nov  3 15:45:14 callisto kernel: [4229866.865317] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8c9aa23c1a80
Nov  3 15:45:14 callisto kernel: [4229866.874194] FS:  00007f6f6ffff700(0000) GS:ffff8c9f67480000(0000) knlGS:0000000000000000
Nov  3 15:45:14 callisto kernel: [4229866.884146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  3 15:45:14 callisto kernel: [4229866.891551] CR2: 0000000000000008 CR3: 0000000a4c554002 CR4: 00000000000606e0
Nov  3 15:45:14 callisto kernel: [4229866.900507] Call Trace:
Nov  3 15:45:14 callisto kernel: [4229866.904301]  avl_insert+0xa9/0xb0 [zavl]
Nov  3 15:45:14 callisto kernel: [4229866.909794]  zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.916696]  zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.923705]  zfs_rangelock_enter+0xed/0xf0 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.929945]  zfs_get_data+0x158/0x350 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.935699]  zil_lwb_commit+0x1c6/0x360 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.941639]  zil_process_commit_list+0xf1/0x210 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.948667]  zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.955352]  zil_commit_impl+0x59/0xa0 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.961147]  zil_commit+0x40/0x60 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.966436]  zfs_fsync+0x7a/0xe0 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.971591]  zpl_fsync+0x5c/0x90 [zfs]
Nov  3 15:45:14 callisto kernel: [4229866.976677]  ? do_fcntl+0x323/0x560
Nov  3 15:45:14 callisto kernel: [4229866.981469]  vfs_fsync_range+0x49/0x80
Nov  3 15:45:14 callisto kernel: [4229866.986565]  ? __fget_light+0x57/0x70
Nov  3 15:45:14 callisto kernel: [4229866.991554]  do_fsync+0x3d/0x70
Nov  3 15:45:14 callisto kernel: [4229866.995974]  __x64_sys_fsync+0x14/0x20
Nov  3 15:45:14 callisto kernel: [4229867.001061]  do_syscall_64+0x57/0x190
Nov  3 15:45:14 callisto kernel: [4229867.006052]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov  3 15:45:14 callisto kernel: [4229867.012582] RIP: 0033:0x7f6fca57a8db
Nov  3 15:45:14 callisto kernel: [4229867.017481] Code: 4a 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 13 f7 ff ff 8b 7c 24 0c 41 89 c0 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 0c e8 51 f7 ff ff 8b 44
Nov  3 15:45:14 callisto kernel: [4229867.039992] RSP: 002b:00007f6f6fffd960 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Nov  3 15:45:14 callisto kernel: [4229867.049362] RAX: ffffffffffffffda RBX: 0000000002f0bdc8 RCX: 00007f6fca57a8db
Nov  3 15:45:14 callisto kernel: [4229867.058269] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000000c
Nov  3 15:45:14 callisto kernel: [4229867.067186] RBP: 0000000002efb328 R08: 0000000000000000 R09: 00007f6ed806f760
Nov  3 15:45:14 callisto kernel: [4229867.076244] R10: 00007f6ed80008d0 R11: 0000000000000293 R12: 0000000000000000
Nov  3 15:45:14 callisto kernel: [4229867.085155] R13: 0000000002f0bd38 R14: 0000000000000002 R15: 0000000000000000
Nov  3 15:45:14 callisto kernel: [4229867.094062] Modules linked in: cpuid sctp tcp_diag inet_diag ipt_REJECT nf_reject_ipv4 xt_multiport rpcsec_gss_krb5 nfsv4 nfs fscache aufs overlay ip6table_filter ip6table_nat ip6_tables xt_tcpudp xt_state xt_conntrack iptable_filter xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter bonding intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate serio_raw cdc_acm hpilo ioatdma sch_fq_codel ipmi_ssif mac_hid ipmi_si ipmi_devintf ipmi_msghandler coretemp parport_pc ppdev lp parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlua(PO) uas usb_storage mgag200 drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crc32_pclmul mpt3sas fb_sys_fops igb psmouse ahci nvme raid_class libahci dca drm lpc_ich
Nov  3 15:45:14 callisto kernel: [4229867.094110]  scsi_transport_sas i2c_algo_bit nvme_core netxen_nic
Nov  3 15:45:14 callisto kernel: [4229867.205779] CR2: 0000000000000008
Nov  3 15:45:14 callisto kernel: [4229867.210729] ---[ end trace db0707df8cd3a864 ]---
Nov  3 15:45:14 callisto kernel: [4229867.224840] ERST: [Firmware Warn]: Firmware does not respond in time.
Nov  3 15:45:14 callisto kernel: [4229867.242613] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
Nov  3 15:45:14 callisto kernel: [4229867.250019] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
Nov  3 15:45:14 callisto kernel: [4229867.272919] RSP: 0018:ffff992388e7fb70 EFLAGS: 00010202
Nov  3 15:45:14 callisto kernel: [4229867.279884] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
Nov  3 15:45:14 callisto kernel: [4229867.288954] RDX: 0000000000000001 RSI: ffff8c9aa23c1a80 RDI: 0000000000000000
Nov  3 15:45:14 callisto kernel: [4229867.298021] RBP: ffff992388e7fba8 R08: 0000000000000000 R09: 0000000000000008
Nov  3 15:45:14 callisto kernel: [4229867.307086] R10: ffff8ca42694e808 R11: 0000000000000000 R12: ffff8ca42694e810
Nov  3 15:45:14 callisto kernel: [4229867.316156] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8c9aa23c1a80
Nov  3 15:45:14 callisto kernel: [4229867.325353] FS:  00007f6f6ffff700(0000) GS:ffff8c9f67480000(0000) knlGS:0000000000000000
Nov  3 15:45:14 callisto kernel: [4229867.335547] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  3 15:45:14 callisto kernel: [4229867.343103] CR2: 0000000000000008 CR3: 0000000a4c554002 CR4: 00000000000606e0

Apart from running the same Ubuntu version and kernel (I've experienced the crash on several different 5.4.0-generic Ubuntu builds), my environment is not particularly similar to the original reporter's: not a VM, just Ubuntu 20.04 on bare metal; Xeon E5-2407 CPU; LSI SAS2308 HBA.

@adamdmoss
Copy link
Contributor

@mas90 is that also zfs 0.8.3 ?

@mas90
Copy link

mas90 commented Nov 8, 2020

@mas90 is that also zfs 0.8.3 ?

Yes - 0.8.3-1ubuntu12.4.

@jade-42
Copy link

jade-42 commented Jan 5, 2021

I have the same crash running Linux 5.4.0-58-generic #64-Ubuntu SMP, zfs 0.8.3-1ubuntu12.5 and with a Plex workload. Although for me it causes the server to stop responding to network traffic.

@youzhongyang
Copy link
Contributor

We had the same issue:

[155116.495329] BUG: kernel NULL pointer dereference, address: 0000000000000008
[155116.495375] #PF: supervisor write access in kernel mode
[155116.495399] #PF: error_code(0x0002) - not-present page
[155116.495423] PGD 0 P4D 0
[155116.495440] Oops: 0002 [#1] PREEMPT SMP NOPTI
[155116.495464] CPU: 18 PID: 83956 Comm: nfsd Kdump: loaded Tainted: P        W  OE     5.4.0-26-lowlatency #30mw
[155116.495508] Hardware name: Supermicro SYS-1028U-E1CRTP+/X10DRU-i+, BIOS 2.0b 08/09/2016
[155116.495548] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[155116.495580] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63
 ff b8 01 00 00 00 4c 89 1c fb 48
[155116.495660] RSP: 0018:ffffa98d851bf9c0 EFLAGS: 00010202
[155116.495689] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[155116.495722] RDX: 0000000000000001 RSI: ffff9339c9443d00 RDI: 0000000000000000
[155116.495755] RBP: ffffa98d851bf9f8 R08: 0000000000000000 R09: 0000000000000008
[155116.495791] R10: ffff9348a778bd08 R11: 0000000000000000 R12: ffff9348a778bd10
[155116.495823] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff9339c9443d00
[155116.495857] FS:  0000000000000000(0000) GS:ffff934cffa80000(0000) knlGS:0000000000000000
[155116.495899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[155116.495926] CR2: 0000000000000008 CR3: 0000002c40694004 CR4: 00000000001606e0
[155116.495959] Call Trace:
[155116.495980]  avl_insert+0xa9/0xb0 [zavl]
[155116.496104]  zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
[155116.496189]  zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
[155116.496272]  zfs_rangelock_enter+0xed/0xf0 [zfs]
[155116.496352]  zfs_get_data+0x158/0x350 [zfs]
[155116.496429]  zil_lwb_commit+0x1c6/0x360 [zfs]
[155116.496504]  zil_process_commit_list+0x107/0x230 [zfs]
[155116.496580]  zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
[155116.496658]  zil_commit_impl+0x59/0xa0 [zfs]
[155116.496730]  zil_commit+0x40/0x60 [zfs]
[155116.496803]  zfs_fsync+0x7a/0xe0 [zfs]
[155116.496873]  zpl_commit_metadata+0x4c/0x70 [zfs]
[155116.496905]  commit_inode_metadata+0x1f/0x30 [nfsd]
[155116.496935]  nfsd_create_setattr+0x5c/0x80 [nfsd]
[155116.496963]  nfsd_create_locked+0x10c/0x2c0 [nfsd]
[155116.496992]  nfsd_create+0x10b/0x170 [nfsd]
[155116.497019]  nfsd3_proc_mkdir+0xda/0x170 [nfsd]
[155116.497047]  nfsd_dispatch+0xd6/0x220 [nfsd]
[155116.497083]  svc_process_common+0x3af/0x710 [sunrpc]
[155116.497118]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
[155116.497147]  ? nfsd_svc+0x2f0/0x2f0 [nfsd]
[155116.497177]  svc_process+0xd9/0x110 [sunrpc]
[155116.497203]  nfsd+0xf2/0x150 [nfsd]
[155116.498056]  kthread+0x104/0x140
[155116.498951]  ? nfsd_destroy+0x70/0x70 [nfsd]
[155116.499790]  ? kthread_park+0x90/0x90
[155116.500597]  ret_from_fork+0x1f/0x40

@Whytey
Copy link

Whytey commented Aug 27, 2021

I am seeing the same thing here. Sporadic misbehaviour on one of my LXD containers that are backed by a very trivial ZFS pool that runs Plex. Every couple of weeks, load increases though no specific process can be blamed for the load. Unable to manipulate the container in anyway. Reboots take forever due to issues shutting down the ZFS pool I have.

From dmesg:

[Thu Aug 26 04:21:50 2021] BUG: kernel NULL pointer dereference, address: 0000000000000008
[Thu Aug 26 04:21:50 2021] #PF: supervisor write access in kernel mode
[Thu Aug 26 04:21:50 2021] #PF: error_code(0x0002) - not-present page
[Thu Aug 26 04:21:50 2021] PGD 0 P4D 0 
[Thu Aug 26 04:21:50 2021] Oops: 0002 [#1] SMP PTI
[Thu Aug 26 04:21:50 2021] CPU: 7 PID: 3369243 Comm: Plex Media Serv Tainted: P           O      5.4.0-81-generic #91-Ubuntu
[Thu Aug 26 04:21:50 2021] Hardware name: Dell Inc. PowerEdge T320/0FDT3J, BIOS 2.4.2 01/29/2015
[Thu Aug 26 04:21:50 2021] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[Thu Aug 26 04:21:50 2021] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
[Thu Aug 26 04:21:50 2021] RSP: 0018:ffff96f845bbbb70 EFLAGS: 00010202
[Thu Aug 26 04:21:50 2021] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[Thu Aug 26 04:21:50 2021] RDX: 0000000000000001 RSI: ffff89f7d753b460 RDI: 0000000000000000
[Thu Aug 26 04:21:50 2021] RBP: ffff96f845bbbba8 R08: 0000000000000000 R09: 0000000000000008
[Thu Aug 26 04:21:50 2021] R10: ffff89f432dfad08 R11: 0000000000000000 R12: ffff89f432dfad10
[Thu Aug 26 04:21:50 2021] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff89f7d753b460
[Thu Aug 26 04:21:50 2021] FS:  00007f80f47d7b38(0000) GS:ffff89fb0f1c0000(0000) knlGS:0000000000000000
[Thu Aug 26 04:21:50 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Aug 26 04:21:50 2021] CR2: 0000000000000008 CR3: 000000066fd64004 CR4: 00000000001606e0
[Thu Aug 26 04:21:50 2021] Call Trace:
[Thu Aug 26 04:21:50 2021]  avl_insert+0xa9/0xb0 [zavl]
[Thu Aug 26 04:21:50 2021]  zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
[Thu Aug 26 04:21:50 2021]  zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
[Thu Aug 26 04:21:50 2021]  zfs_rangelock_enter+0xed/0xf0 [zfs]
[Thu Aug 26 04:21:50 2021]  zfs_get_data+0x158/0x350 [zfs]
[Thu Aug 26 04:21:50 2021]  zil_lwb_commit+0x1c6/0x360 [zfs]
[Thu Aug 26 04:21:50 2021]  zil_process_commit_list+0xf1/0x210 [zfs]
[Thu Aug 26 04:21:50 2021]  zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
[Thu Aug 26 04:21:50 2021]  zil_commit_impl+0x59/0xa0 [zfs]
[Thu Aug 26 04:21:50 2021]  zil_commit+0x40/0x60 [zfs]
[Thu Aug 26 04:21:50 2021]  zfs_fsync+0x7a/0xe0 [zfs]
[Thu Aug 26 04:21:50 2021]  zpl_fsync+0x5c/0x90 [zfs]
[Thu Aug 26 04:21:50 2021]  vfs_fsync_range+0x49/0x80
[Thu Aug 26 04:21:50 2021]  ? __fget_light+0x57/0x70
[Thu Aug 26 04:21:50 2021]  do_fsync+0x3d/0x70
[Thu Aug 26 04:21:50 2021]  __x64_sys_fsync+0x14/0x20
[Thu Aug 26 04:21:50 2021]  do_syscall_64+0x57/0x190
[Thu Aug 26 04:21:50 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Thu Aug 26 04:21:50 2021] RIP: 0033:0x7f810138b739
[Thu Aug 26 04:21:50 2021] Code: c0 0f 85 24 00 00 00 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 d5 cb ff ff 41 57 41 56 53 48 81 ec 90 00 00 00 49 89 f6 b8
[Thu Aug 26 04:21:50 2021] RSP: 002b:00007f80f47d4508 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[Thu Aug 26 04:21:50 2021] RAX: ffffffffffffffda RBX: 000000000000004a RCX: 00007f810138b739
[Thu Aug 26 04:21:50 2021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[Thu Aug 26 04:21:50 2021] RBP: 00007f80f47d4580 R08: 0000000000000000 R09: 0000000000000000
[Thu Aug 26 04:21:50 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f80f47d7b38
[Thu Aug 26 04:21:50 2021] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f80f47d7b74
[Thu Aug 26 04:21:50 2021] Modules linked in: unix_diag hid_plantronics uas usb_storage ufs qnx4 hfsplus hfs minix ntfs msdos jfs nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype overlay nfs lockd grace fscache sunrpc binfmt_misc veth ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nf_tables nfnetlink vhost_vsock vmw_vsock_virtio_transport_common vhost vsock bridge stp llc xfs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cp210x usbserial mei_me rapl dcdbas ipmi_si intel_cstate joydev input_leds ipmi_devintf mei ipmi_msghandler mac_hid acpi_power_meter sch_fq_codel ip_tables x_tables autofs4 btrfs
[Thu Aug 26 04:21:50 2021]  zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 drm_vram_helper i2c_algo_bit ttm crct10dif_pclmul crc32_pclmul drm_kms_helper syscopyarea ghash_clmulni_intel sysfillrect aesni_intel sysimgblt fb_sys_fops crypto_simd cryptd ahci glue_helper drm lpc_ich libahci tg3 megaraid_sas wmi
[Thu Aug 26 04:21:50 2021] CR2: 0000000000000008
[Thu Aug 26 04:21:50 2021] ---[ end trace d0c848bce097565a ]---
[Thu Aug 26 04:21:50 2021] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[Thu Aug 26 04:21:50 2021] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
[Thu Aug 26 04:21:50 2021] RSP: 0018:ffff96f845bbbb70 EFLAGS: 00010202
[Thu Aug 26 04:21:50 2021] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[Thu Aug 26 04:21:50 2021] RDX: 0000000000000001 RSI: ffff89f7d753b460 RDI: 0000000000000000
[Thu Aug 26 04:21:50 2021] RBP: ffff96f845bbbba8 R08: 0000000000000000 R09: 0000000000000008
[Thu Aug 26 04:21:50 2021] R10: ffff89f432dfad08 R11: 0000000000000000 R12: ffff89f432dfad10
[Thu Aug 26 04:21:50 2021] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff89f7d753b460
[Thu Aug 26 04:21:50 2021] FS:  00007f80f47d7b38(0000) GS:ffff89fb0f1c0000(0000) knlGS:0000000000000000
[Thu Aug 26 04:21:50 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Aug 26 04:21:50 2021] CR2: 0000000000000008 CR3: 000000066fd64004 CR4: 00000000001606e0

ZFS pool info:

djwhyte@server1:~$ zpool status
  pool: default
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:16:44 with 0 errors on Sun Aug  8 00:40:46 2021
config:

	NAME        STATE     READ WRITE CKSUM
	default     ONLINE       0     0     0
	  sdc       ONLINE       0     0     0

errors: No known data errors
djwhyte@server1:~$ 

Version info:

djwhyte@server1:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.2 LTS
Release:	20.04
Codename:	focal
djwhyte@server1:~$ uname -a
Linux server1 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
djwhyte@server1:~$ apt list zfsutils-linux
Listing... Done
zfsutils-linux/focal-updates 0.8.3-1ubuntu12.12 amd64 [upgradable from: 0.8.3-1ubuntu12.11]
N: There are 3 additional versions. Please use the '-a' switch to see them.

Related thread in the LXD forums - https://discuss.linuxcontainers.org/t/single-lxd-container-gets-in-a-bad-state-need-a-host-restart-to-resolve/11751

@stale
Copy link

stale bot commented Sep 17, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Sep 17, 2023
@faustinoaq
Copy link

I also use ZFS and a Plex server, I was looking at my email alerts from a few days ago:

image

It automatically restarted, I didn't notice the downtime but I got the alert

# journalctl --since "2024-07-03 02:20:24"
jul 03 02:20:24 bserver kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
jul 03 02:20:24 bserver kernel: #PF: supervisor write access in kernel mode
jul 03 02:20:24 bserver kernel: #PF: error_code(0x0002) - not-present page
jul 03 02:20:24 bserver kernel: PGD 0 P4D 0 
jul 03 02:20:24 bserver kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
jul 03 02:20:24 bserver kernel: CPU: 0 PID: 2559 Comm: CPU 0/KVM Tainted: P           O       6.5.0-41-generic #41~22.04.2-Ubuntu
jul 03 02:20:24 bserver kernel: Hardware name: BIOSTAR Group B450MH/B450MH, BIOS 5.17 03/04/2024
jul 03 02:20:24 bserver kernel: RIP: 0010:_raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: Code: 31 db e9 00 18 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 8c 12 ef 51 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31>
jul 03 02:20:24 bserver kernel: RSP: 0018:ffffadf986603c90 EFLAGS: 00010046
jul 03 02:20:24 bserver kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
jul 03 02:20:24 bserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000008
jul 03 02:20:24 bserver kernel: RBP: ffffadf986603cc8 R08: 0000000000000000 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000092 R14: ffff9730f101fb50 R15: 0226800000000000
jul 03 02:20:24 bserver kernel: FS:  00007e293e071640(0000) GS:ffff9730f1000000(0000) knlGS:0000000000000000
jul 03 02:20:24 bserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008 CR3: 000000013e782000 CR4: 00000000003506f0
jul 03 02:20:24 bserver kernel: Call Trace:
jul 03 02:20:24 bserver kernel:  <TASK>
jul 03 02:20:24 bserver kernel:  ? show_regs+0x6d/0x80
jul 03 02:20:24 bserver kernel:  ? __die+0x24/0x80
jul 03 02:20:24 bserver kernel:  ? page_fault_oops+0x99/0x1b0
jul 03 02:20:24 bserver kernel:  ? do_user_addr_fault+0x31d/0x6b0
jul 03 02:20:24 bserver kernel:  ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel:  ? kvm_set_lapic_tscdeadline_msr+0x50/0xa0 [kvm]
jul 03 02:20:24 bserver kernel:  ? exc_page_fault+0x83/0x1b0
jul 03 02:20:24 bserver kernel:  ? asm_exc_page_fault+0x27/0x30
jul 03 02:20:24 bserver kernel:  ? _raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel:  ? speculation_ctrl_update+0xda/0x1e0
jul 03 02:20:24 bserver kernel:  x86_virt_spec_ctrl+0x61/0x70
jul 03 02:20:24 bserver kernel:  svm_vcpu_run+0x59a/0x860 [kvm_amd]
jul 03 02:20:24 bserver kernel:  vcpu_enter_guest+0x456/0xf00 [kvm]
jul 03 02:20:24 bserver kernel:  ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel:  ? kvm_apic_local_deliver+0xa1/0xd0 [kvm]
jul 03 02:20:24 bserver kernel:  vcpu_run+0x46/0x290 [kvm]
jul 03 02:20:24 bserver kernel:  kvm_arch_vcpu_ioctl_run+0x1d4/0x590 [kvm]
jul 03 02:20:24 bserver kernel:  ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel:  ? rseq_ip_fixup+0x90/0x1f0
jul 03 02:20:24 bserver kernel:  kvm_vcpu_ioctl+0x297/0x800 [kvm]
jul 03 02:20:24 bserver kernel:  ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel:  ? __seccomp_filter+0x37b/0x560
jul 03 02:20:24 bserver kernel:  ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel:  ? __fget_light+0xa5/0x120
jul 03 02:20:24 bserver kernel:  __x64_sys_ioctl+0xa3/0xf0
jul 03 02:20:24 bserver kernel:  x64_sys_call+0x1198/0x20b0
jul 03 02:20:24 bserver kernel:  do_syscall_64+0x55/0x90
jul 03 02:20:24 bserver kernel:  ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel:  ? do_syscall_64+0x61/0x90
jul 03 02:20:24 bserver kernel:  ? common_interrupt+0x54/0xb0
jul 03 02:20:24 bserver kernel:  entry_SYSCALL_64_after_hwframe+0x73/0xdd
jul 03 02:20:24 bserver kernel: RIP: 0033:0x7e2945af194f
jul 03 02:20:24 bserver kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44>
jul 03 02:20:24 bserver kernel: RSP: 002b:00007e293e070460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
jul 03 02:20:24 bserver kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007e2945af194f
jul 03 02:20:24 bserver kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
jul 03 02:20:24 bserver kernel: RBP: 00005dd2075f2b30 R08: 00005dd204e43f10 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000001 R14: 000000000000c040 R15: 0000000000000000
jul 03 02:20:24 bserver kernel:  </TASK>
jul 03 02:20:24 bserver kernel: Modules linked in: macvtap macvlan vhost_net vhost vhost_iotlb tap rpcsec_gss_krb5 nfsv4 nfs fscache netfs xt_CHECKSUM xt_conntrack ipt_REJECT nf_reject_ipv4 bridge stp llc>
jul 03 02:20:24 bserver kernel:  libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic usbhid ghash_clmulni_intel sha256_ssse3 hid sha1_ssse3 aes>
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008
jul 03 02:20:24 bserver kernel: ---[ end trace 0000000000000000 ]---
jul 03 02:20:24 bserver kernel: pstore: backend (efi_pstore) writing error (-5)
jul 03 02:20:24 bserver kernel: RIP: 0010:_raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: Code: 31 db e9 00 18 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 8c 12 ef 51 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31>
jul 03 02:20:24 bserver kernel: RSP: 0018:ffffadf986603c90 EFLAGS: 00010046
jul 03 02:20:24 bserver kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
jul 03 02:20:24 bserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000008
jul 03 02:20:24 bserver kernel: RBP: ffffadf986603cc8 R08: 0000000000000000 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000092 R14: ffff9730f101fb50 R15: 0226800000000000
jul 03 02:20:24 bserver kernel: FS:  00007e293e071640(0000) GS:ffff9730f1000000(0000) knlGS:0000000000000000
jul 03 02:20:24 bserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008 CR3: 000000013e782000 CR4: 00000000003506f0
jul 03 02:20:24 bserver kernel: note: CPU 0/KVM[2559] exited with irqs disabled
jul 03 02:20:24 bserver kernel: note: CPU 0/KVM[2559] exited with preempt_count 2

Pool is working as expected:

# zpool status apool
  pool: apool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jul  6 23:36:38 2024
config:

        NAME        STATE     READ WRITE CKSUM
        apool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

Version info:

~ ❯ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
~ ❯ uname -a
Linux bserver 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun  3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
~ ❯ apt list zfsutils-linux
Listing... Done
zfsutils-linux/jammy-updates,jammy-security,now 2.1.5-1ubuntu6~22.04.4 amd64 [installed]

@stale stale bot removed the Status: Stale No recent activity for issue label Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

8 participants