panic: corrupted memory in l2arc #15506
Comments
One more panic. Different trace, but again smashed entry in the hdr hash.
I got a third panic. Again a different trace, but a zvol is involved. Again the buf is trashed, and again the entire beginning of the slab containing the bufs is trashed.
Similar to the other issues that have been popping up on here over the past few days. ZFS 2.2.x is kinda broken atm. I am also waiting for a fix and had to shut down almost all of the services on my server to prevent possible pool corruption.
Please share outputs from:
Here, superior ZFS:
Here:
I'm currently trying to bisect while staying on the same FreeBSD main revision, bisecting only the openzfs subtree merge. That requires some git gymnastics but is possible. So far revision 05a7348 seems to be good, but I need to wait at least 24 hours to be sure. Your revision is definitely ahead of mine, so maybe the bug is gone ;|
The comment about this being bisected to the zvol_threading commit seems to have been deleted. Did you conclude that was actually a dead end / coincidence?
The previous bisect point (assumed to be good) panicked after ~24 hours, proving that assumption wrong. It is really hard to bisect this, since sometimes it takes minutes to panic and sometimes more than 24 hours.
Ahh, well, that at least makes sense; it didn't seem like zvol_threading would have anything to do with where your panics were. Still panic free with
No, running with vfs.zfs.bclone_enabled=1. Bisecting. Now running at 799e09f
I also have a second machine with a very similar configuration and work profile. I'm now updating it to bleeding edge FreeBSD/OpenZFS to check whether any of the later commits have fixed the problem.
This time my bisecting brought me to dbe839a. It panicked within an hour, and the previous revision was stable for over 24 hours.
I'm not sure how helpful the bisect is when BRT is enabled. Though the stack smashing being hit at the DMU might be unique here (other users likely don't have debug assertions on in their ZFS modules).
After more than 24 hours, a revision marked as good panicked. So pointing to dbe839a was again incorrect. @amotin gave me a patch that loops over the entire hash and verifies it again and again in the l2arc_feed_thread. Now I'm restarting the bisect from scratch with this patch added to speed up crash discovery.
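For readers unfamiliar with the idea, a repeated hash verification pass of this kind can be sketched in user-space C as follows. This is only an illustration, not the actual patch and not OpenZFS code: the `toy_hdr` structure, the `TOY_MAGIC` tag, the bucket count, and all function names are hypothetical. The sketch walks every bucket and flags an entry if its sanity tag is wrong or if its key no longer hashes to the bucket it sits in, which is roughly what a smashed entry in a chain would look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and constants below are hypothetical, for illustration only. */
#define TOY_MAGIC     0x48445221u /* sanity tag stored in every entry */
#define TOY_BUCKETS   8u          /* power of two, so masking works as a hash */
#define TOY_MAX_CHAIN 1024u       /* guard against cycles in a damaged chain */

struct toy_hdr {
    uint32_t magic;
    uint64_t key;
    struct toy_hdr *next;
};

/* Toy hash: low bits of the key select the bucket. */
static size_t toy_bucket(uint64_t key)
{
    return (size_t)(key & (TOY_BUCKETS - 1));
}

/* Push an entry onto the head of its bucket chain. */
static void toy_insert(struct toy_hdr **table, struct toy_hdr *h, uint64_t key)
{
    assert(table != NULL && h != NULL);
    h->magic = TOY_MAGIC;
    h->key = key;
    size_t b = toy_bucket(key);
    h->next = table[b];
    table[b] = h;
}

/*
 * Walk the whole table once. Returns the number of chains in which an
 * inconsistent entry was found; 0 means the table looks sane.
 */
static unsigned toy_verify(struct toy_hdr *const *table)
{
    unsigned bad = 0;
    for (size_t b = 0; b < TOY_BUCKETS; b++) {
        unsigned depth = 0;
        for (const struct toy_hdr *h = table[b]; h != NULL; h = h->next) {
            if (h->magic != TOY_MAGIC || toy_bucket(h->key) != b) {
                bad++;
                break; /* the next pointer of a damaged entry cannot be trusted */
            }
            if (++depth > TOY_MAX_CHAIN) {
                bad++;
                break; /* chain is implausibly long, likely a cycle */
            }
        }
    }
    return bad;
}
```

Calling a check like `toy_verify` in a loop from a background thread narrows the window between the moment memory is corrupted and the moment it is noticed, which is the point of running it continuously rather than waiting for an unrelated code path to trip over the damage.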
I found out that I had been stable for 3 days on a revision that was previously considered bad. So it looks like some boots are lucky and some are unlucky, which makes bisection almost useless. I gave up on bisection and updated to recent FreeBSD/main, which I really needed for my normal work. The panic no longer seems to reproduce. I kept running the endless l2arc checking cycle by @amotin, btw. I'm closing this issue and will open a new one if I can reproduce the panic again on up-to-date FreeBSD and ZFS.
Close in a correct state.
Yes, it looks like all the bisecting was wrong, since it turned out that there are "lucky boots" on which a bad revision is actually stable for a prolonged period of time.
System information
FreeBSD 15-CURRENT @ d6e457328d0e
OpenZFS @ 41e55b476bcf
zfs-2.2.99-184-FreeBSD_g41e55b476
zfs-kmod-2.2.99-184-FreeBSD_g41e55b476
Describe the problem you're observing
Got a kernel panic running a kernel with INVARIANTS enabled. The panic happened during the nightly job run, which induces `find /`. The filesystem has zvols on it and the pool configuration has L2ARC. The second hdr in the chain is corrupted:
The memory points at a valid location in the hdr_l2only_cache zone, which is marked as allocated in the UMA slab metadata. It is the 3rd item in the slab. It appears that the first five entries in the slab are all trashed with a similar pattern. Starting with the sixth entry, the slab items aren't corrupted.
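As a rough illustration of how an address maps to an item position within a slab, the arithmetic can be sketched as below. This assumes items are packed back to back from the slab base, which is a simplification of the real UMA layout; the function name and the sizes used in the usage note are hypothetical and are not taken from the core file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Zero-based index of the item that contains addr, assuming fixed-size
 * items laid out contiguously starting at slab_base (an assumption made
 * for this sketch; real slabs may carry headers or alignment padding).
 */
static size_t slab_item_index(uintptr_t slab_base, size_t item_size,
    uintptr_t addr)
{
    assert(item_size > 0 && addr >= slab_base);
    return (size_t)((addr - slab_base) / item_size);
}
```

With a made-up base of 0x1000 and a made-up item size of 96 bytes, an address of 0x1000 + 2 * 96 yields index 2, i.e. the 3rd item counting from one.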
I have the core file saved and can provide more data.