This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

"rwsem_is_locked() acquires sem->wait_lock" autoconf test fails when generic rwsem code is not used #447

Closed
dweeezil opened this issue Apr 6, 2015 · 4 comments

Comments

@dweeezil
Contributor

dweeezil commented Apr 6, 2015

The autoconf test for rwsem_is_locked() in 46aa7b3 appears to be testing for the presence of torvalds/linux@29671f2 which converted it from an inline function in include/linux/rwsem-spinlock.h to a real function in lib/rwsem-spinlock.c. Unfortunately the test only works when CONFIG_RWSEM_GENERIC_SPINLOCK is defined. It's not defined in most common cases nowadays because most architectures have their own versions of the functions.
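For context, the upstream change the test probes for looks roughly like the following (a paraphrased sketch of the kernel code before and after torvalds/linux@29671f2, not verbatim source):

```c
/* Before (sketch): an inline in include/linux/rwsem-spinlock.h that
 * read sem->activity without any locking. */
static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
	return (sem->activity != 0);
}

/* After (sketch): a real function in lib/rwsem-spinlock.c that tries to
 * take sem->wait_lock before reading sem->activity, closing the race. */
int rwsem_is_locked(struct rw_semaphore *sem)
{
	int ret = 1;
	unsigned long flags;

	if (spin_trylock_irqsave(&sem->wait_lock, flags)) {
		ret = (sem->activity != 0);
		spin_unlock_irqrestore(&sem->wait_lock, flags);
	}
	return ret;
}
```

The key point for the autoconf test is that this out-of-line version only exists when CONFIG_RWSEM_GENERIC_SPINLOCK is set; architectures with their own rwsem implementations never build lib/rwsem-spinlock.c at all.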

This causes the SPL wrapper to be enabled when it wouldn't otherwise be necessary, increasing contention on the wait_lock, which is already hotly contended within ZFS (dnode_hold_impl() and friends).

The autoconf test will likely have to incorporate some sort of test involving CONFIG_RWSEM_GENERIC_SPINLOCK in order to make a correct decision.
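One possible shape for such a check (a hypothetical sketch in the SPL autoconf style; the macro name, define name, and test body are illustrative assumptions, not the actual config/spl-build.m4 contents):

```m4
dnl # Hypothetical sketch: only enable the SPL wrapper when the generic
dnl # spinlock-based rwsem implementation is in use. On architectures
dnl # with their own rwsem code, rwsem_is_locked() never touches
dnl # sem->wait_lock and the wrapper would only add contention.
AC_DEFUN([SPL_AC_RWSEM_IS_LOCKED_WAIT_LOCK], [
	AC_MSG_CHECKING([whether rwsem_is_locked() acquires sem->wait_lock])
	SPL_LINUX_TRY_COMPILE([
		#include <linux/rwsem.h>
	],[
		#ifndef CONFIG_RWSEM_GENERIC_SPINLOCK
		#error "arch-specific rwsems; wrapper not needed"
		#endif
	],[
		AC_MSG_RESULT(yes)
		AC_DEFINE(RWSEM_IS_LOCKED_TAKES_WAIT_LOCK, 1,
		          [rwsem_is_locked() acquires sem->wait_lock])
	],[
		AC_MSG_RESULT(no)
	])
])
```

The idea is that the compile test fails (and the wrapper stays disabled) whenever CONFIG_RWSEM_GENERIC_SPINLOCK is not defined, rather than keying only on the presence of the rwsem_is_locked() symbol.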

@dweeezil
Contributor Author

dweeezil commented Apr 6, 2015

I originally came across this when investigating openzfs/zfs#3091 but also noticed the effects during recent zfs stress testing.

@behlendorf
Contributor

This may be something we can now safely drop entirely. The only reason it was added was a flaw in the kernel's rwsem_is_locked() implementation. That was fixed as of 2.6.33, and I know the fix was backported to RHEL6. That would leave SLES as the only supported platform running a kernel that old, and I'd hope they have the fix as well. Newer kernels don't suffer from this flaw. If this patch is leading to excessive lock contention, we should seriously consider dropping it.

@dweeezil
Contributor Author

dweeezil commented Apr 7, 2015

I was wondering about that and was unclear as to how old a kernel we needed to support. I'll try to come up with some numbers on how much it contributes to contention. IIRC, wait_lock was still generally the first-place contender even when manually working around the problem, but it wasn't as bad as otherwise. Only certain workloads make wait_lock a major pinch point, but when it becomes one, it's a pretty bad one.

@behlendorf
Contributor

Closing. This code was retired in commit 86c16c5.
