panic after setting dnodesize=auto #8705
Comments
Thanks for letting us know so we can look into it.

We hit a similar problem with Lustre and ZFS 0.7.13 during a benchmark run:

We are also seeing this with Lustre 2.12.3/ZFS 0.7.13 on one of our HPC clusters. We've seen 2 instances in 2 days; stack traces follow. Is there any additional information I can provide that would be useful if this happens again?

Following the PANIC, I/O to the affected zpool appears to hang and the zpool cannot be exported; however, zpool status shows the zpool as online. Is this expected behaviour? Following a manual failover, the zpool was able to be imported and mounted without any problems.

February 6th 2020: (stack trace elided)

Feb 8th 2020:

Feb 8 12:47:37 amds02a kernel: VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed

The server has 2 Lustre MDT zpools during normal operation, both with the same setup:

[root@amds02a ~]# zpool get all alicemdt00
errors: No known data errors

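For anyone triaging a similar report, the property in question can be read per dataset, and the pool state cross-checked, with standard zfs/zpool commands; a minimal sketch, reusing the pool name alicemdt00 from the report above:

```sh
# List the dnodesize setting for the pool and every dataset in it
zfs get -r dnodesize alicemdt00

# Cross-check the pool state described above (reported ONLINE despite hung I/O)
zpool status alicemdt00
```
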
We also see it on Lustre 2.12.2 servers (CentOS 7.6, kernel 3.10.0-957.10.1.el7_lustre.x86_64, ZFS 0.7.13) and on 2.12.3 servers.

We are still seeing this problem 1-2 times a week. The dnodesize on the ZFS pool that panics is "auto"; I'll try setting it to legacy.

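For reference, that change is a single property set; a minimal sketch, with pool/mdt0 standing in as a hypothetical dataset name. Note that dnodesize only governs newly allocated dnodes, so existing files keep the size they were created with:

```sh
# Pin new dnode allocations back to the fixed 512-byte legacy size
zfs set dnodesize=legacy pool/mdt0

# Confirm the property change
zfs get dnodesize pool/mdt0
```
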
For those of you hitting this problem regularly, does setting dnodesize=legacy avoid the issue?

We also ran into this bug on an MDS running kernel 4.19.150, ZFS 0.7.13, and Lustre 2.12.5.

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

We just hit this yesterday, with a stack dump like cjm14's February 6th one, going through __mutex_unlock_slowpath.

We've just seen this on Lustre 2.12.6, CentOS 7.9, ZFS 0.7.13. Just in time for the stale bot!

After moving our Lustre 2.12 servers from zfs-0.7.13 to zfs-2.1.x, we're no longer able to trigger this issue. Unfortunately, it's not entirely clear from skimming the commit logs exactly which commit resolved this bug. If you're able to update your ZFS version to the latest 2.1.6 release, I'd recommend it.
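
If you do move to a newer release, the running module version can be confirmed once the pool is back online; a minimal sketch (the zfs version subcommand was added in ZFS 0.8, so on 0.7.x read the module version from sysfs instead):

```sh
# ZFS 0.8 and later: print the userland and kernel module versions
zfs version

# ZFS 0.7.x: read the loaded module version directly
cat /sys/module/zfs/version
```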
System information
Describe the problem you're observing
I use BeeGFS, and its metadata is stored in extended attributes, so I have also had xattr=sa set since the beginning. After setting dnodesize=auto today, I received a panic during a benchmark.
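
For readers reproducing this configuration, the setup described above amounts to two property changes; a minimal sketch, assuming a hypothetical dataset tank/beegfs-meta:

```sh
# Store extended attributes as system attributes inside the dnode
zfs set xattr=sa tank/beegfs-meta

# Size new dnodes automatically so the SA xattrs fit inline
zfs set dnodesize=auto tank/beegfs-meta
```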