PANIC at zfs_znode.c zfs_znode_sa_init() #10971
Comments
This issue is also happening to me. Any process accessing the "trigger file" hangs there forever.
Additionally, here is my system spec:
|
I just started hitting this on Ubuntu Hirsute (development release) in the last couple of days, for no clear reason. The stacks all show code related to SA, and for whatever reason it was happening with multiple Chrome/Electron apps trying to access the "Cache" dir specifically, but different instances of the cache dir in different paths (e.g. ~/.cache/google-chrome/Default/Cache and ~/.config/Mattermost/Cache). Those processes stay hung forever, and I can't strace/gdb them or even ls that same directory while the task is stuck, presumably due to a lock or similar. I had zfs-dkms installed; I removed that and went back to the version built with the kernel in Ubuntu, and it's working OK. That version is 0.8.4-1ubuntu11, whereas zfs-dkms was 0.8.4-1ubuntu16. They added quite a lot of patches in "ubuntu13" for Linux 5.9 compatibility as part of https://bugs.launchpad.net/bugs/1899826. However, given that the other reporters were on stable versions, it seems more likely that this may be the same effect but a different cause. Just reverting to the 0.8.4-1ubuntu11 code resolved it for me. I will try installing zfs-dkms of the same version to see if it happens there, in case it's some quirk of the DKMS build versus the build that happens in the Ubuntu kernel packages. Happy to try to debug if anyone has suggestions on what to look at. I'm a reasonably competent programmer and debugger, and very familiar with ZFS from an admin perspective and with various internals, but not super familiar with the code base as a whole. I can also try the native version and see if it hits, or whether it's specific to the Ubuntu patches. Also opened here: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476 Note: I'm not really expecting support for the Ubuntu-patched version here. This is the only Google hit for that error, so I wanted to contribute information in case it helps others, and I'm happy to try to debug if that also helps.
|
I have found the same problem. Going back to 0.8.4-1ubuntu11 fixes it for new files. If I move the old Chrome cache to a different name, the problem disappears, but if I try to remove it, list it, etc., there is some persistent corruption in the filesystem that triggers the panic. |
I hit this problem again today, but now without zfs-dkms. After upgrading my kernel from 5.8.0-29-generic to 5.8.0-36-generic, my Google Chrome Cache directory is broken again. I had to rename it and then reboot to get out of the problem. Curiously, I found a 2016(??) report of something similar here: https://bbs.archlinux.org/viewtopic.php?id=217204 The renamed directories still exist, if any developers have an idea about anything I can do to try to debug or understand the issue. |
Having a similar problem. Same traceback, different files. Just started with the Ubuntu 5.8.0-36 kernel. Unfortunately, booting the old kernel doesn't seem to make the existing files accessible, either. I'm a bit worried and would love to help find the root cause and make sure I don't lose more data here. |
@migrax when you say that rolling back "fixes it for new files", do you have a reliable way to reproduce this? I only found that this problem occurred with some files, but could not figure out which ones or why. |
I had the same thing: basically, at a certain package version the problem started happening. If you roll back to a kernel/package without the issue, existing files are still broken, but it stops creating new broken files. That's my experience too. From my naive attempt to read through the code, I think something is getting corrupted on disk that then causes the PANIC() when trying to read a file. Once that panic happens, a lock is left held that stops other access to that file and, I suspect, maybe some other unrelated files; maybe they share some resource. If you reboot, sometimes some files that seemed broken are accessible again, but the main problem file is still broken, and once you try to access it, it seems to get stuck on a lock that then blocks access to other things. But I might be wrong about the blocking access to other things. In the kernel trace you first see this PANIC(): And then some hung task reports later. |
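The pattern described above (a PANIC line first, hung task reports later) can be checked for in saved kernel log text. The following is a minimal sketch, not part of the original thread: the function name `summarize_kernel_log` is hypothetical, and both regular expressions are assumptions about the message wording (the PANIC pattern is based on this issue's title, the hung-task pattern on the kernel's usual "blocked for more than N seconds" report), since the actual trace is not reproduced here.

```python
import re

# Assumed message shapes; the real trace from this thread is elided,
# so both patterns are deliberately loose.
PANIC_RE = re.compile(r"PANIC at zfs_znode\.c.*zfs_znode_sa_init")
HUNG_RE = re.compile(r"task (.+):(\d+) blocked for more than \d+ seconds")


def summarize_kernel_log(text):
    """Return (panic_line_numbers, hung_tasks) found in kernel log text.

    hung_tasks is a list of (line_number, task_name, pid) tuples.
    Line numbers are 1-based positions within the given text.
    """
    panics = []
    hung = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PANIC_RE.search(line):
            panics.append(lineno)
        match = HUNG_RE.search(line)
        if match:
            hung.append((lineno, match.group(1), int(match.group(2))))
    return panics, hung
```

It only reads text you pass in (for example, the saved output of dmesg), so it does not touch the affected files itself.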
linux-image-5.8.0-29-generic: working
When the issue first hit, I had zfs-dkms installed; I removed that and went back to the version built with the kernel in Ubuntu, and it was working OK. That version was 0.8.4-1ubuntu11, whereas zfs-dkms was 0.8.4-1ubuntu16. The problem has now repeated, as the 5.8.0-36-generic kernel has picked up 0.8.4-1ubuntu16.
lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo /lib/modules/5.8.0-36-generic/kernel/zfs/zfs.ko|grep version
I don't have a good quick/easy reproducer, but just using my desktop for a day or two, I seem likely to hit the issue after a while. I tried to install the upstream zfs-dkms package for 2.0 to see if I can bisect the issue on upstream versions, but it breaks my boot for some reason I cannot quite figure out. I will continue to experiment and see if I can bisect which version broke it. Looking at the Ubuntu changelog, I'd say the fix for https://bugs.launchpad.net/bugs/1899826, which backported the 5.9 and 5.10 compatibility patches, is a prime suspect. I'll copy this info to the Ubuntu Launchpad bug and see if I can chase someone internally at Canonical to pick it up if I don't have enough time to continue the debugging.
Side note: I sort of know what I'm doing, in that I'm a Linux software engineer, dabble in kernel stuff, and am a very long-time, deeply knowledgeable ZFS user at a user-space level, but my code-level knowledge of ZFS is very basic, so don't mistake any confidence for actually having real knowledge :) |
I ran into this on the 0.8.4-1ubuntu16 packaged with the 5.8.0-36 kernel. I was able to use my zsys snapshots to get back to a good state from before I upgraded.
Not too different here :). The significant changes came in 0.8.4-1ubuntu13. |
zfs-2.0.1 is in hirsute-proposed, so I am going to try that. There is a reasonable chance it will have fixed it, since those patches are probably dropped. |
Yeah, all those patches were dropped. Which means the issue is either fixed or upstream.
|
I have not run into this issue since 2.0.2. |
Still running smoothly. I think this can be closed. |
The issue appeared :(
|
Is there anything I can do to provide more debug info needed for the fix? |
me too on 2.0.2 on ubuntu kernel 5.13.0-12-generic |
Same issue here. Skypeforlinux, MS Teams, VS Code, IntelliJ Idea, Firefox hangs are spotted.
Sample stack:
|
Happens here as well. I fear that this renders my computer unusable beyond a point. |
I believe I have tracked down the cause of this issue to an Ubuntu-specific ZFS patch, and I have a reliable reproducer. Full details in https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476 I am not aware at this time of any good way to "fix" the issue on an existing dataset. For now I've just been moving the files into a "broken" directory and trying not to access them. |
Is there an actual fix for this yet that I can apply? I'm concerned about data corruption, and of course my system is pretty much unusable due to this. I read through the launchpad chatter but it's not clear exactly what I should do to fix this today :\ Should I be downgrading to zfsutils-linux=2.0.2-1ubuntu5.2 ? Kernel: 5.13.0-7614-generic |
If you're using the zfs-dkms package it's fixed in:
The kernel builds ZFS into a module at the time of the kernel release. New kernels are released on a regular 3 week cadence but one hasn't yet been released to incorporate this fix. So for now you can install zfs-dkms to build your own module from the updated source (assuming your zfs-dkms package is one of the above two versions). Within 3 weeks or so there should be an updated kernel incorporating the fix in the pre-built zfs module. As best I can tell, it will only affect you if you have an encrypted dataset. |
Hirsute's current kernel is 5.11.0-37, which does not have the fix. Hopefully -38 will. You can verify the zfs version included in your currently running kernel with "modinfo zfs". |
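For anyone who wants to script the "modinfo zfs" check mentioned above, here is a small sketch that extracts the `version:` field from modinfo's output. The helper names are made up for illustration, and which version strings actually contain the fix is not something this snippet knows; it only reports what the module says.

```python
import subprocess


def parse_modinfo_version(modinfo_output):
    """Return the value of the 'version:' field from modinfo output, or None.

    The key is compared exactly, so 'srcversion:' is not mistaken for it.
    """
    for line in modinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "version":
            return value.strip()
    return None


def running_zfs_module_version():
    """Run `modinfo zfs` and return the module's version field.

    Raises subprocess.CalledProcessError if modinfo cannot find the module.
    """
    result = subprocess.run(
        ["modinfo", "zfs"], capture_output=True, text=True, check=True
    )
    return parse_modinfo_version(result.stdout)
```

Calling `running_zfs_module_version()` on an affected machine should return a string such as the 0.8.4-1ubuntu16 mentioned earlier in the thread, which can then be compared by eye against the versions named in the Launchpad bug.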
@lathiat Thank you for this info. I couldn't use 2.0.2-1ubuntu5.2 since it seems to only support kernels up to 5.10. I went ahead and installed zfs 2.0.6 from source using DKMS, and it looks like I still can't remove files that were previously affected by this.. at least not without causing the same Does this indeed mean that there is permanent corruption of affected files? I saw different opinions on this in the launchpad discussion. Current zfs versions: |
Yeah, for me there is permanent corruption that I can't fix and that scrub doesn't find. I had to move all those files to an unused directory. Others are having the issue only on boot. I think what basically happens is that the data is corrupted when loaded into the ARC, and then that data may or may not get flushed back to disk. For some people it happens on boot and I think it never gets flushed to disk, because their whole / is encrypted. For me only /home is encrypted, so the rest of the system keeps working, and maybe that gives it an opportunity to end up back on disk. I don't currently have a solution for getting rid of the broken files, other than just moving them out of the way into /home/broken. |
looks like the fix has been uploaded to the proposed channel for Ubuntu 21.10 https://launchpad.net/ubuntu/+source/linux/5.13.0-20.20
|
After upgrading my system from Ubuntu 21.04 (openzfs 2.0.2, Linux 5.11) to 21.10 (openzfs 2.0.6, Linux 5.13.0-19) my system is also affected by this issue. |
The fixed kernel is now released. Please upgrade your kernel to 5.13.0-20 and reboot. And try not to use ZFS with the Kernel at all. If you still get the errors after the new kernel, it means the corruption got written to the FS, and there is no known way to fix that currently. You have to figure out which files are broken and move them somewhere they won’t be accessed. Scrub does not identify it. |
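The "move them somewhere they won't be accessed" workaround can be sketched as below. This is only an illustration under assumptions: the function name `quarantine` and the idea of a dedicated directory are taken loosely from the /home/broken approach described earlier, and nothing in this thread guarantees that renaming a given affected file is itself safe. The sketch uses `os.rename` rather than `shutil.move` on purpose, because `shutil.move` falls back to copying when the destination is on another filesystem (for ZFS, another dataset), and copying would read the affected files.

```python
import os


def quarantine(path, broken_dir):
    """Rename `path` into `broken_dir` without reading file contents.

    os.rename raises OSError if `broken_dir` is on a different filesystem,
    instead of silently copying the data as shutil.move would.
    Returns the new location.
    """
    os.makedirs(broken_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(path))
    target = os.path.join(broken_dir, name)
    if os.path.lexists(target):
        # Refuse to overwrite something quarantined earlier.
        raise FileExistsError(target)
    os.rename(path, target)
    return target
```

For example, `quarantine("/home/user/.cache/google-chrome/Default/Cache", "/home/broken")` would move the whole cache directory aside, provided both paths live in the same dataset; the user path here is hypothetical.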
@lathiat when you say "try not to use ZFS with the Kernel at all", are you implying that it will always be safer to install and use the zfs-dkms package instead? |
No, I meant just don’t use the broken kernel release, as corrupt data can get committed to disk. With the latest kernel on Impish it’s all good; there's no need for the DKMS package now. |
Thanks for the response. I understand that |
This problem is caused by a patch that we don't have. Ubuntu has released a fix for this; see https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476 If you hit this, upgrade to kernel 5.13.0-20 or later. |
Seems like there is a regression in kernel 5.17.5. I got this bug after upgrading to pop OS 22.04 and it wasn't a problem before on 20.04. |
I don't know what this means @ineo00048 |
System information
Describe the problem you're observing
A PANIC event is logged in dmesg
Describe how to reproduce the problem
Unsure
Include any warning/errors/backtraces from the system logs