-
-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new internal hashtable implementation #23671
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
Should there be a multithreaded test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget to submit my comments, yesterday ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more comments
8d92fb5
to
3554c03
Compare
@t8m in your question regarding a multithreaded test, There probably should be, yes, but my thought was that would be covered by the performance test suite, since it is inherently multithreaded (when we get the performance monitoring CI work completed). I can add one here if you prefer though. |
3554c03
to
92ccffb
Compare
I do not think performance tests replace the need for a regular multithreaded testcase although they would be multithreaded. They are potentially going to be run at different times, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
This pull request is ready to merge |
Merged to master. |
@paulidale it does not look like this was merged. |
@t8m, looks like its merged to me:
|
github has fallen behind GHE - so it has been merged but something has broken with the mirroring to github. Anyway - closing this since the merge was done. |
This is unfortunate, but seems necessecary tsan in gcc/clang tracks data races by recording memory references made while various locks are held. If it finds that a given address is read/written while under lock (or under no locks without the use of atomics), it issues a warning this creates a specific problem for rcu, because on the write side of a critical section, we write data under the protection of a lock, but by definition the read side has no lock, and so rcu warns us about it, which is really a false positive, because we know that, even if a pointer changes its value, the data it points to will be valid. The best way to fix it, short of implementing tsan hooks for rcu locks in any thread sanitizer in the field, is to 'fake it'. If thread sanitization is activated, then in ossl_rcu_write_[lock|unlock] we add annotations to make the sanitizer think that, after the write lock is taken, that we immediately unlock it, and lock it right before we unlock it again. In this way tsan thinks there are no locks held while referencing protected data on the read or write side. we still need to use atomics to ensure that tsan recognizes that we are doing atomic accesses safely, but thats ok, and we still get warnings if we don't do that properly Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
The ossl_rcu_call function for windows creates a linked list loop. fix it to work like the pthread version properly Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
Generally we can get away with just using CRYPTO_atomic_load to do stores by reversing the source and target variables, but doing so creates a problem for the thread sanitizer as CRYPTO_atomic_load hard codes an __ATOMIC_ACQUIRE constraint, which confuses tsan into thinking that loads and stores aren't properly ordered, leading to RAW/WAR hazzards getting reported. Instead create a CRYPTO_atomic_store api that is identical to the load variant, save for the fact that the value is a unit64_t rather than a pointer that gets stored using an __ATOMIC_RELEASE constraint, satisfying tsan. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
Create a new hashtable that is more efficient than the existing LHASH_OF implementation. the new ossl_ht api offers several new features that improve performance opportunistically * A more generalized hash function. Currently using fnv1a, provides a more general hash function, but can still be overridden where needed * Improved locking and reference counting. This hash table is internally locked with an RCU lock, and optionally reference counts elements, allowing for users to not have to create and manage their own read/write locks * Lockless operation. The hash table can be configured to operate locklessly on the read side, improving performance, at the sacrifice of the ability to grow the hash table or delete elements from it * A filter function allowing for the retrieval of several elements at a time matching a given criteria without having to hold a lock permanently * a doall_until iterator variant, that allows callers which need to iterate over the entire hash table until a given condition is met (as defined by the return value of the iterator callback). This allows for callers attempting to do expensive cache searches for a small number of elements to terminate the iteration early, saving cpu cycles * Dynamic type safety. The hash table provides operations to set and get data of a specific type without having to define a type at the instatiation point * Multiple data type storage. The hash table can store multiple data types allowing for more flexible usage * Ubsan safety. Because the API deals with concrete single types (HT_KEY and HT_VALUE), leaving specific type casting to the call recipient with dynamic type validation, this implementation is safe from the ubsan undefined behavior warnings that require additional thunking on callbacks. Testing of this new hashtable with an equivalent hash function, I can observe approximately a 6% performance improvement in the lhash_test Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #23671)
This is unfortunate, but seems necessecary tsan in gcc/clang tracks data races by recording memory references made while various locks are held. If it finds that a given address is read/written while under lock (or under no locks without the use of atomics), it issues a warning this creates a specific problem for rcu, because on the write side of a critical section, we write data under the protection of a lock, but by definition the read side has no lock, and so rcu warns us about it, which is really a false positive, because we know that, even if a pointer changes its value, the data it points to will be valid. The best way to fix it, short of implementing tsan hooks for rcu locks in any thread sanitizer in the field, is to 'fake it'. If thread sanitization is activated, then in ossl_rcu_write_[lock|unlock] we add annotations to make the sanitizer think that, after the write lock is taken, that we immediately unlock it, and lock it right before we unlock it again. In this way tsan thinks there are no locks held while referencing protected data on the read or write side. we still need to use atomics to ensure that tsan recognizes that we are doing atomic accesses safely, but thats ok, and we still get warnings if we don't do that properly Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
The ossl_rcu_call function for windows creates a linked list loop. fix it to work like the pthread version properly Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
Generally we can get away with just using CRYPTO_atomic_load to do stores by reversing the source and target variables, but doing so creates a problem for the thread sanitizer as CRYPTO_atomic_load hard codes an __ATOMIC_ACQUIRE constraint, which confuses tsan into thinking that loads and stores aren't properly ordered, leading to RAW/WAR hazzards getting reported. Instead create a CRYPTO_atomic_store api that is identical to the load variant, save for the fact that the value is a unit64_t rather than a pointer that gets stored using an __ATOMIC_RELEASE constraint, satisfying tsan. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
Create a new hashtable that is more efficient than the existing LHASH_OF implementation. the new ossl_ht api offers several new features that improve performance opportunistically * A more generalized hash function. Currently using fnv1a, provides a more general hash function, but can still be overridden where needed * Improved locking and reference counting. This hash table is internally locked with an RCU lock, and optionally reference counts elements, allowing for users to not have to create and manage their own read/write locks * Lockless operation. The hash table can be configured to operate locklessly on the read side, improving performance, at the sacrifice of the ability to grow the hash table or delete elements from it * A filter function allowing for the retrieval of several elements at a time matching a given criteria without having to hold a lock permanently * a doall_until iterator variant, that allows callers which need to iterate over the entire hash table until a given condition is met (as defined by the return value of the iterator callback). This allows for callers attempting to do expensive cache searches for a small number of elements to terminate the iteration early, saving cpu cycles * Dynamic type safety. The hash table provides operations to set and get data of a specific type without having to define a type at the instatiation point * Multiple data type storage. The hash table can store multiple data types allowing for more flexible usage * Ubsan safety. Because the API deals with concrete single types (HT_KEY and HT_VALUE), leaving specific type casting to the call recipient with dynamic type validation, this implementation is safe from the ubsan undefined behavior warnings that require additional thunking on callbacks. Testing of this new hashtable with an equivalent hash function, I can observe approximately a 6% performance improvement in the lhash_test Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl#23671)
Create a new hashtable that is more efficient than the existing LHASH_OF implementation. the new ossl_ht api offers several new features that improve performance opportunistically
A more generalized hash function. Currently using fnv1a, provides a more general hash function, but can still be overridden where needed
Improved locking and reference counting. This hash table is internally locked with an RCU lock, and optionally reference counts elements, allowing for users to not have to create and manage their own read/write locks
Lockless operation. The hash table can be configured to operate locklessly on the read side, improving performance, at the sacrifice of the ability to grow the hash table or delete elements from it
A filter function allowing for the retrieval of several elements at a time matching a given criteria without having to hold a lock permanently
a doall_until iterator variant, that allows callers which need to iterate over the entire hash table until a given condition is met (as defined by the return value of the iterator callback). This allows for callers attempting to do expensive cache searches for a small number of elements to terminate the iteration early, saving cpu cycles
Dynamic type safety. The hash table provides operations to set and get data of a specific type without having to define a type at the instatiation point
Multiple data type storage. The hash table can store multiple data types allowing for more flexible usage
Ubsan safety. Because the API deals with concrete single types (HT_KEY and HT_VALUE), leaving specific type casting to the call recipient with dynamic type validation, this implementation is safe from the ubsan undefined behavior warnings that require additional thunking on callbacks.
Testing of this new hashtable with an equivalent hash function, I can observe the following run time improvement in the hashtable stress test vs the legacy hashtable stress test, when inserting 2.5 million entries and deleting them in a different order:
Which equates to approximately a %33 improvement
Checklist
Note: This PR is dependent on the inclusion of #24162