Also seen in January, 2023 in FreeBSD 13.1-CURRENT.
Server has crashed several times during a zpool scrub due to a corrupt in-memory AVL tree.
See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268909
Run zpool scrub on my system. Not very useful to ZFS developers, I understand.
The symptom is a GPF due to a bad pointer, in one case null and in the other case invalid.
In the most recent crash, avl_walk was called with a node that looks like:

    {avl_child = {0x0, 0xfffff80200004d20}, avl_pcb = 0xfffff801f1c461fa}

The [1] child points to:

    {avl_child = {0x395753c375b177a6, 0xfa91e69b009252c}, avl_pcb = 0xfffff801476764a6}
The parent link avl_pcb is correct but the children are invalid pointers. The loop crashed trying to examine the [0] child.
In a previous crash, also during a scrub, avl_rotation crashed because gchild was null in this block of code:

    gchild = child->avl_child[right];
    gleft = gchild->avl_child[left];
    gright = gchild->avl_child[right];
This is the bad pool:
    NAME          SIZE  ALLOC   FREE  FRAG    CAP  DEDUP  HEALTH  ALTROOT
    data         36.4T  20.4T  15.9T   34%    56%  1.17x  ONLINE  -
      raidz2-0   36.4T  20.4T  15.9T   34%  56.2%      -  ONLINE
        ada0     9.10T      -      -     -      -      -  ONLINE
        ada1     9.10T      -      -     -      -      -  ONLINE
        ada2     9.10T      -      -     -      -      -  ONLINE
        ada3     9.10T      -      -     -      -      -  ONLINE
    cache            -      -      -     -      -      -       -
      ada4p5      150G   143G  6.69G    0%  95.5%      -  ONLINE
The cache partition is on an SSD. The other disks are spinning hard drives.
The largest filesystem has encryption=aes-256-gcm and dedup=on. The other filesystems have dedup=verify and no encryption.
The CPU is an AMD Opteron X3421 ("Excavator") and the system is compiled with -march=bdver4.
The interesting part of the stack trace is
    #7  avl_walk (tree=tree@entry=0xfffff80009178260, oldnode=oldnode@entry=0xfffff80147676440, left=left@entry=1) at /usr/src/sys/contrib/openzfs/module/avl/avl.c:147
    #8  0xffffffff81c1bea5 in scan_io_queue_gather (queue=0xfffff80009178200, list=0xfffffe010f60eda8, rs=<optimized out>) at /usr/src/sys/contrib/openzfs/module/zfs/dsl_scan.c:2942
    #9  scan_io_queues_run_one (arg=0xfffff80009178200) at /usr/src/sys/contrib/openzfs/module/zfs/dsl_scan.c:3093
    #10 0xffffffff81b41bbf in taskq_run (arg=0xfffff80041735d80, pending=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c:315