ZFS Kernel Panic #16187

Open
chriexpe opened this issue May 10, 2024 · 0 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

System information

Type                  Version/Name
Distribution Name     Unraid
Distribution Version  6.12.8
Kernel Version        6.1.74-Unraid
Architecture          x86_64
OpenZFS Version       2.1.14-1

For the last few weeks I've been getting this kernel panic almost daily from a ZFS pool that I created a year ago. Honestly, I don't know what to do aside from destroying it and starting fresh (and losing a few TB of data).

The pool in question is the main one on my server: 3x 8TB Seagate Exos 7E8 HDDs connected to a RAID card in passthrough mode. An NVR writes to this pool constantly. The crashes appear to be random (or, coincidentally, it crashes after I write/read a file once it has been running for quite a while). This is the error:

May 10 17:26:43 Tower kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
May 10 17:26:43 Tower kernel: BUG: unable to handle page fault for address: ffff88941de53c80
May 10 17:26:43 Tower kernel: #PF: supervisor instruction fetch in kernel mode
May 10 17:26:43 Tower kernel: #PF: error_code(0x0011) - permissions violation
May 10 17:26:43 Tower kernel: PGD 4c01067 P4D 4c01067 PUD 7b2111063 PMD 800000141de001e3 
May 10 17:26:43 Tower kernel: Oops: 0011 [#1] PREEMPT SMP NOPTI
May 10 17:26:43 Tower kernel: CPU: 5 PID: 9080 Comm: dp_sync_taskq Tainted: P     U     O       6.1.74-Unraid #1
May 10 17:26:43 Tower kernel: Hardware name: ASRock Z690 Phantom Gaming 4/D5/Z690 Phantom Gaming 4/D5, BIOS 15.01 01/04/2024
May 10 17:26:43 Tower kernel: RIP: 0010:0xffff88941de53c80
May 10 17:26:43 Tower kernel: Code: ff ff 98 3d e5 1d 94 88 ff ff 01 00 00 f0 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 37 81 3e 66 00 00 00 00 <c0> 3c e5 1d 94 88 ff ff 60 9a ee a0 ff ff ff ff d1 4b bc 0f 3e 99
May 10 17:26:43 Tower kernel: RSP: 0018:ffffc900221afd80 EFLAGS: 00010282
May 10 17:26:43 Tower kernel: RAX: ffff88941de53c80 RBX: ffff8891050d2000 RCX: 0000000000000003
May 10 17:26:43 Tower kernel: RDX: 0000000000000001 RSI: ffffffff8214ded8 RDI: ffff8891050d2000
May 10 17:26:43 Tower kernel: RBP: ffffffffa0987a45 R08: ffff8885614352c0 R09: 0000000080190018
May 10 17:26:43 Tower kernel: R10: ffff8885614352c0 R11: 0000000000000010 R12: ffff88814f502000
May 10 17:26:43 Tower kernel: R13: ffff88810662eb90 R14: ffff88815d74f000 R15: ffff889593ce7c00
May 10 17:26:43 Tower kernel: FS:  0000000000000000(0000) GS:ffff88a00f540000(0000) knlGS:0000000000000000
May 10 17:26:43 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 17:26:43 Tower kernel: CR2: ffff88941de53c80 CR3: 000000086122e000 CR4: 0000000000752ee0
May 10 17:26:43 Tower kernel: PKRU: 55555554
May 10 17:26:43 Tower kernel: Call Trace:
May 10 17:26:43 Tower kernel: <TASK>
May 10 17:26:43 Tower kernel: ? __die_body+0x1a/0x5c
May 10 17:26:43 Tower kernel: ? page_fault_oops+0x329/0x376
May 10 17:26:43 Tower kernel: ? exc_page_fault+0xf4/0x11d
May 10 17:26:43 Tower kernel: ? asm_exc_page_fault+0x22/0x30
May 10 17:26:43 Tower kernel: ? dnode_destroy+0x1e6/0x1e6 [zfs]
May 10 17:26:43 Tower kernel: ? dbuf_evict_user+0x34/0x60 [zfs]
May 10 17:26:43 Tower kernel: ? dbuf_clear_data+0xf/0x3e [zfs]
May 10 17:26:43 Tower kernel: ? dbuf_destroy+0x9b/0x3b8 [zfs]
May 10 17:26:43 Tower kernel: ? dnode_rele_task+0x4c/0x69 [zfs]
May 10 17:26:43 Tower kernel: ? taskq_thread+0x266/0x38a [spl]
May 10 17:26:43 Tower kernel: ? wake_up_q+0x44/0x44
May 10 17:26:43 Tower kernel: ? taskq_dispatch_delay+0x106/0x106 [spl]
May 10 17:26:43 Tower kernel: ? kthread+0xe4/0xef
May 10 17:26:43 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
May 10 17:26:43 Tower kernel: ? ret_from_fork+0x1f/0x30
May 10 17:26:43 Tower kernel: </TASK>
May 10 17:26:43 Tower kernel: Modules linked in: dm_mod ipvlan xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod i915(O) drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls zfs(PO) intel_rapl_msr zunicode(PO) intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp zzstd(O) coretemp kvm_intel zlua(O) kvm zavl(PO) icp(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd znvpair(PO) cryptd rapl intel_cstate
May 10 17:26:43 Tower kernel: spl(O) mei_hdcp mei_pxp wmi_bmof intel_uncore tpm_crb mpt3sas i2c_i801 mei_me nvme tpm_tis cp210x i2c_smbus video ahci raid_class sr_mod tpm_tis_core e1000e mei i2c_core nvme_core scsi_transport_sas libahci input_leds cdrom joydev usbserial led_class wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix
May 10 17:26:43 Tower kernel: CR2: ffff88941de53c80
May 10 17:26:43 Tower kernel: ---[ end trace 0000000000000000 ]---
May 10 17:26:43 Tower kernel: RIP: 0010:0xffff88941de53c80
May 10 17:26:43 Tower kernel: Code: ff ff 98 3d e5 1d 94 88 ff ff 01 00 00 f0 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 37 81 3e 66 00 00 00 00 <c0> 3c e5 1d 94 88 ff ff 60 9a ee a0 ff ff ff ff d1 4b bc 0f 3e 99
May 10 17:26:43 Tower kernel: RSP: 0018:ffffc900221afd80 EFLAGS: 00010282
May 10 17:26:43 Tower kernel: RAX: ffff88941de53c80 RBX: ffff8891050d2000 RCX: 0000000000000003
May 10 17:26:43 Tower kernel: RDX: 0000000000000001 RSI: ffffffff8214ded8 RDI: ffff8891050d2000
May 10 17:26:43 Tower kernel: RBP: ffffffffa0987a45 R08: ffff8885614352c0 R09: 0000000080190018
May 10 17:26:43 Tower kernel: R10: ffff8885614352c0 R11: 0000000000000010 R12: ffff88814f502000
May 10 17:26:43 Tower kernel: R13: ffff88810662eb90 R14: ffff88815d74f000 R15: ffff889593ce7c00
May 10 17:26:43 Tower kernel: FS:  0000000000000000(0000) GS:ffff88a00f540000(0000) knlGS:0000000000000000
May 10 17:26:43 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 17:26:43 Tower kernel: CR2: ffff88941de53c80 CR3: 000000086122e000 CR4: 0000000000752ee0
May 10 17:26:43 Tower kernel: PKRU: 55555554
May 10 17:26:43 Tower kernel: note: dp_sync_taskq[9080] exited with irqs disabled
May 10 17:27:59 Tower kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
May 10 17:27:59 Tower kernel: BUG: unable to handle page fault for address: ffff8898d597c2c0
May 10 17:27:59 Tower kernel: #PF: supervisor instruction fetch in kernel mode
May 10 17:27:59 Tower kernel: #PF: error_code(0x0011) - permissions violation
May 10 17:27:59 Tower kernel: PGD 4c01067 P4D 4c01067 PUD 80000018c00001e3 
May 10 17:27:59 Tower kernel: Oops: 0011 [#2] PREEMPT SMP NOPTI

And it keeps repeating this same error.
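
For anyone reading the trace, here is a minimal sketch of how the `error_code(0x0011)` value in the `#PF` lines breaks down. It assumes the standard x86 page-fault error-code bit layout (bit 0 = protection, bit 1 = write, bit 2 = user, bit 3 = reserved bit, bit 4 = instruction fetch, bit 5 = protection key). The function name `decode_pf_error_code` is made up for illustration and is not part of ZFS or the kernel.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode an x86 page-fault error code into mode, access type and reason.

    Assumption: standard x86 bit layout, as listed in the text above.
    """
    # Bit 2: fault was raised by a user-mode or supervisor-mode access.
    mode = "user" if code & 0x04 else "supervisor"

    # Bit 4 (instruction fetch) takes priority over bit 1 (write).
    if code & 0x10:
        access = "instruction fetch"
    elif code & 0x02:
        access = "write access"
    else:
        access = "read access"

    # Bit 0 clear means the page was not present at all.
    if not code & 0x01:
        reason = "not-present page"
    elif code & 0x08:
        reason = "reserved bit violation"
    elif code & 0x20:
        reason = "protection keys violation"
    else:
        reason = "permissions violation"

    return {"mode": mode, "access": access, "reason": reason}


if __name__ == "__main__":
    # Value taken from the log above.
    print(decode_pf_error_code(0x0011))
```

For 0x0011 this gives supervisor / instruction fetch / permissions violation, which matches the two `#PF` lines in the log: the CPU tried to execute code from a page that is present but marked non-executable, consistent with the "NX-protected page" message.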

If I check it with zpool status (after kernel panic) everything looks normal:

root@Tower:~# zpool status
  pool: disk1
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        disk1       ONLINE       0     0     0
          md1p1     ONLINE       0     0     0

errors: No known data errors

  pool: nasa
 state: ONLINE
  scan: scrub repaired 0B in 11:05:53 with 0 errors on Tue Apr 30 11:33:04 2024
config:

        NAME        STATE     READ WRITE CKSUM
        nasa        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde1    ONLINE       0     0     0
            sdd1    ONLINE       0     0     0
            sdc1    ONLINE       0     0     0

errors: No known data errors

Note that I scrubbed it and there were no errors.

I don't remember exactly when it started, but it was probably after upgrading to 6.12.9. I rolled back to 6.12.8, though, and it kept crashing.

I ran memtest too, but it didn't report any errors.
[Screenshots: memtest, unraid server]

@chriexpe chriexpe added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 10, 2024