Kernel crash when ZFS 2.2 features are enabled, but works for ZFS 2.0 #15984
Comments
Go yell at Ubuntu until they include #15634. |
Well, considering that their bundled kernel modules don't even include the dirty-dnode fix, you're going to need to yell very loudly. |
Thanks for the feedback! So I guess we can close this here and I start yelling at Ubuntu? 😅 |
Last I knew, didn't they add that cherry-pick but not cut a new kernel package just for it?
I could be wrong, but this really looks like that bug, so yeah. You could try building 2.2.3 locally, running it, and making sure your problem goes away. |
It's pretty easy to update the kernel modules. Check out the source, then build. (The first line is installing the tools needed to build.)
nproc is however many processors you want to give the build. When it's finished, copy the modules
to /lib/modules/6.5.0-25-generic/kernel/zfs/ and reboot. This assumes you're updating from 2.2.x to 2.2.3. I haven't tried going from 2.1.x; I think the .ko files are different. |
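The actual commands were not captured in this excerpt. A rough sketch of the build steps described above, for Ubuntu; the package list, branch name, and paths are assumptions, not taken from the original comment:

```shell
# Install build prerequisites (assumed Ubuntu package names).
sudo apt install build-essential autoconf automake libtool gawk \
    uuid-dev libblkid-dev libudev-dev libssl-dev zlib1g-dev libaio-dev \
    libattr1-dev libelf-dev linux-headers-"$(uname -r)"

# Check out the 2.2.3 release and build it.
git clone -b zfs-2.2.3 https://github.com/openzfs/zfs.git
cd zfs
./autogen.sh
./configure
make -j"$(nproc)"

# Copy the built kernel modules over the distro-shipped ones for the
# running kernel, then regenerate module dependencies and reboot.
sudo cp module/*.ko /lib/modules/"$(uname -r)"/kernel/zfs/
sudo depmod -a
```

Alternatively, `make native-deb-kmod` (or DKMS packages) avoids copying .ko files by hand, at the cost of a longer build.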
System information
Describe the problem you're observing
We are running multiple machines with LXD (5.20) as ephemeral GitHub Actions runners, which results in a high number of container creations and deletions. The containers run on a ZFS filesystem that was created by LXD. After setting up another machine, we noticed that it crashed after about 16 h of use.
After comparing the machines (all are, or should be, set up identically), we noticed that on the working machines the ZFS dataset had been created before ZFS 2.2 (I guess with 2.0 or 2.1), while on the latest machine it had been created with ZFS 2.2.
After this discovery we destroyed the ZFS pool and recreated it like this:
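The exact command is missing from this excerpt. One way to recreate a pool restricted to the ZFS 2.0 feature set is the pool `compatibility` property (available since OpenZFS 2.1); the pool name and device path below are hypothetical:

```shell
# Destroy the old pool (this deletes all data on it!) and recreate it
# with only the features that existed in OpenZFS 2.0 enabled.
sudo zpool destroy lxd
sudo zpool create -o compatibility=openzfs-2.0 lxd /dev/nvme0n1
```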
After that change, the server now runs without an issue so far.
The feature difference between the non-working and working pool were these:
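The feature list itself is missing from this excerpt, but the comparison can be reproduced by dumping the `feature@*` pool properties on a working and a non-working host and diffing them (pool name is hypothetical):

```shell
# On the non-working machine:
zpool get all lxd | grep 'feature@' > features-new.txt
# On a working machine:
zpool get all lxd | grep 'feature@' > features-old.txt
# Then compare:
diff features-old.txt features-new.txt
```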
In the syslog I found these page faults and a null-pointer dereference:
The only thing I was able to retrieve from the kernel side was this (I was unable to scroll or capture it any other way, sorry):
Not sure if this is helpful, since this is far from my area of expertise, but maybe it makes sense to someone here.