livelock in dnode_hold_impl() #8426
@bunder2015 Unfortunately I don't have a reproducer on my side (the issue was hit a few times at a different site), so I can't verify this with 0.8*, but looking at the patches I don't think they are relevant.
@bunder2015 The relevant code on master is the same as in 0.7.9, which is what we're running: No thread can progress past the It's alright with only a couple of threads here, since eventually they'll be in the right place to allow progress. But as you add more threads, the chance of this gets much smaller; in the crash dump I looked at, there were ~20 threads spinning in this loop until it triggered a soft lockup and panic.
Soft lockups could happen when multiple threads are trying to get the zrl on the same dnode handle in order to allocate and initialize a dnode marked as DN_SLOT_ALLOCATED. Don't loop from the beginning when we can't get the zrl; otherwise we would increase the zrl refcount and nobody could actually lock it.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Closes openzfs#8426
I'm running into this issue on some of my systems. I see the fix has been merged, but it is not yet included in any tagged release (as far as I can tell). Is it only going to be in the next 0.8 tagged release candidate (or release, if that is what is coming next), or will it be in the next 0.7 release as well? If it's only going to be included in 0.8, is it safe, until 0.8 is officially released, to cherry-pick the patch and apply it on top of 0.7.13? It seems to apply cleanly, just with offsets.
@cperl82 Thanks for the reminder. This change is safe to apply to 0.7.13, and we will be applying it to the next point release. I've added it to the 0.7.14 project to track it, and it'll be in 0.8.0-rc4 when that is tagged.
This issue has not yet been added to the 0.7.14 project tracking. |
System information
Describe the problem you're observing
dnode_hold_impl() seems to spin forever while many threads are trying to initialize the same dnode, with dnode_hold_alloc_lock_retry = 124742875.
Is at least one thread between dnode_slots_hold() and dnode_slots_rele() enough to block dnode_slots_tryenter()?
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs