Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new internal hashtable implementation #23671

Closed
wants to merge 8 commits into from

Conversation

nhorman
Copy link
Contributor

@nhorman nhorman commented Feb 23, 2024

Create a new hashtable that is more efficient than the existing LHASH_OF implementation. the new ossl_ht api offers several new features that improve performance opportunistically

A more generalized hash function. Currently using fnv1a, provides a more general hash function, but can still be overridden where needed

Improved locking and reference counting. This hash table is internally locked with an RCU lock, and optionally reference counts elements, allowing for users to not have to create and manage their own read/write locks

Lockless operation. The hash table can be configured to operate locklessly on the read side, improving performance, at the sacrifice of the ability to grow the hash table or delete elements from it

A filter function allowing for the retrieval of several elements at a time matching a given criteria without having to hold a lock permanently

a doall_until iterator variant, that allows callers which need to iterate over the entire hash table until a given condition is met (as defined by the return value of the iterator callback). This allows for callers attempting to do expensive cache searches for a small number of elements to terminate the iteration early, saving cpu cycles

Dynamic type safety. The hash table provides operations to set and get data of a specific type without having to define a type at the instatiation point

Multiple data type storage. The hash table can store multiple data types allowing for more flexible usage

Ubsan safety. Because the API deals with concrete single types (HT_KEY and HT_VALUE), leaving specific type casting to the call recipient with dynamic type validation, this implementation is safe from the ubsan undefined behavior warnings that require additional thunking on callbacks.

Testing of this new hashtable with an equivalent hash function, I can observe the following run time improvement in the hashtable stress test vs the legacy hashtable stress test, when inserting 2.5 million entries and deleting them in a different order:

legacy hash table new hashtable
0.686934 sec 0.455566 sec

Which equates to approximately a %33 improvement

Checklist
  • documentation is added or updated
  • tests are added or updated

Note: This PR is dependent on the inclusion of #24162

@nhorman nhorman self-assigned this Feb 23, 2024
@github-actions github-actions bot added severity: fips change The pull request changes FIPS provider sources severity: ABI change This pull request contains ABI changes labels Feb 23, 2024
@t8m t8m added branch: master Merge to master branch approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member tests: present The PR has suitable tests present triaged: performance The issue/pr reports/fixes a performance concern labels Feb 26, 2024
Copy link
Member

@t8m t8m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Should there be a multithreaded test?

crypto/hashtable/hashtable.c Show resolved Hide resolved
crypto/hashtable/hashtable.c Outdated Show resolved Hide resolved
crypto/hashtable/hashtable.c Show resolved Hide resolved
crypto/hashtable/hashtable.c Outdated Show resolved Hide resolved
crypto/hashtable/hashtable.c Show resolved Hide resolved
doc/internal/man3/ossl_ht_new.pod Show resolved Hide resolved
doc/internal/man3/ossl_ht_new.pod Outdated Show resolved Hide resolved
doc/man3/OPENSSL_malloc.pod Outdated Show resolved Hide resolved
doc/man3/OPENSSL_malloc.pod Outdated Show resolved Hide resolved
doc/man3/OPENSSL_malloc.pod Outdated Show resolved Hide resolved
Copy link
Contributor

@FdaSilvaYY FdaSilvaYY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget to submit my comments, yesterday ;)

doc/man3/OPENSSL_malloc.pod Outdated Show resolved Hide resolved
fuzz/hashtable.c Outdated Show resolved Hide resolved
fuzz/hashtable.c Outdated Show resolved Hide resolved
include/internal/hashtable.h Outdated Show resolved Hide resolved
include/internal/hashtable.h Outdated Show resolved Hide resolved
include/internal/hashtable.h Outdated Show resolved Hide resolved
include/internal/hashtable.h Outdated Show resolved Hide resolved
Copy link
Contributor

@FdaSilvaYY FdaSilvaYY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more comments

crypto/hashtable/hashtable.c Show resolved Hide resolved
crypto/hashtable/hashtable.c Show resolved Hide resolved
crypto/hashtable/hashtable.c Outdated Show resolved Hide resolved
crypto/hashtable/hashtable.c Outdated Show resolved Hide resolved
crypto/hashtable/hashtable.c Show resolved Hide resolved
crypto/hashtable/hashtable.c Outdated Show resolved Hide resolved
crypto/hashtable/hashtable.c Outdated Show resolved Hide resolved
crypto/mem.c Outdated Show resolved Hide resolved
doc/internal/man3/ossl_ht_new.pod Outdated Show resolved Hide resolved
doc/internal/man3/ossl_ht_new.pod Outdated Show resolved Hide resolved
@nhorman nhorman force-pushed the new-hashtable branch 5 times, most recently from 8d92fb5 to 3554c03 Compare February 29, 2024 16:24
@nhorman
Copy link
Contributor Author

nhorman commented Feb 29, 2024

@t8m in your question regarding a multithreaded test, There probably should be, yes, but my thought was that would be covered by the performance test suite, since it is inherently multithreaded (when we get the performance monitoring CI work completed). I can add one here if you prefer though.

@t8m
Copy link
Member

t8m commented Mar 1, 2024

I do not think performance tests replace the need for a regular multithreaded testcase although they would be multithreaded. They are potentially going to be run at different times, etc.

Copy link
Contributor

@paulidale paulidale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@paulidale paulidale added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Apr 23, 2024
@openssl-machine openssl-machine removed the approval: done This pull request has the required number of approvals label Apr 24, 2024
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@openssl-machine openssl-machine added the approval: ready to merge The 24 hour grace period has passed, ready to merge label Apr 24, 2024
@paulidale
Copy link
Contributor

Merged to master.

@paulidale paulidale closed this Apr 24, 2024
@t8m
Copy link
Member

t8m commented Apr 24, 2024

@paulidale it does not look like this was merged.

@t8m t8m reopened this Apr 24, 2024
@nhorman
Copy link
Contributor Author

nhorman commented Apr 24, 2024

@t8m, looks like its merged to me:

commit 0339382abad578ccb3989799ea2fb99dfb2d099b (HEAD -> master, origin/master, origin/HEAD)
Author: Randall S. Becker <randall.becker@nexbridge.ca>
Date:   Fri Apr 19 22:15:10 2024 +0000

    Remove all references to FLOSS for NonStop Builds.
    
    FLOSS is no longer a dependency for NonStop as of the deprecation of the SPT
    thread model builds.
    
    Fixes: #24214
    
    Signed-off-by: Randall S. Becker <randall.becker@nexbridge.ca>
    
    Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
    Reviewed-by: Neil Horman <nhorman@openssl.org>
    Reviewed-by: Tomas Mraz <tomas@openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/24217)

commit ca43171b3c38cd8bcd6de8ec11a3b34751cd5a8b
Author: Neil Horman <nhorman@openssl.org>
Date:   Mon Mar 18 14:32:33 2024 -0400

    updating fuzz-corpora submodule
    
    Reviewed-by: Tomas Mraz <tomas@openssl.org>
    Reviewed-by: Paul Dale <pauli@openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/23671)

@mattcaswell
Copy link
Member

@t8m, looks like its merged to me:

github has fallen behind GHE - so it has been merged but something has broken with the mirroring to github.

Anyway - closing this since the merge was done.

openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
This is unfortunate, but seems necessecary

tsan in gcc/clang tracks data races by recording memory references made
while various locks are held.  If it finds that a given address is
read/written while under lock (or under no locks without the use of
atomics), it issues a warning

this creates a specific problem for rcu, because on the write side of a
critical section, we write data under the protection of a lock, but by
definition the read side has no lock, and so rcu warns us about it,
which is really a false positive, because we know that, even if a
pointer changes its value, the data it points to will be valid.

The best way to fix it, short of implementing tsan hooks for rcu locks
in any thread sanitizer in the field, is to 'fake it'.  If thread
sanitization is activated, then in ossl_rcu_write_[lock|unlock] we add
annotations to make the sanitizer think that, after the write lock is
taken, that we immediately unlock it, and lock it right before we unlock
it again.  In this way tsan thinks there are no locks held while
referencing protected data on the read or write side.

we still need to use atomics to ensure that tsan recognizes that we are
doing atomic accesses safely, but thats ok, and we still get warnings if
we don't do that properly

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
The ossl_rcu_call function for windows creates a linked list loop.  fix
it to work like the pthread version properly

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
Generally we can get away with just using CRYPTO_atomic_load to do
stores by reversing the source and target variables, but doing so
creates a problem for the thread sanitizer as CRYPTO_atomic_load hard
codes an __ATOMIC_ACQUIRE constraint, which confuses tsan into thinking
that loads and stores aren't properly ordered, leading to RAW/WAR
hazzards getting reported.  Instead create a CRYPTO_atomic_store api
that is identical to the load variant, save for the fact that the value
is a unit64_t rather than a pointer that gets stored using an
__ATOMIC_RELEASE constraint, satisfying tsan.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
Create a new hashtable that is more efficient than the existing LHASH_OF
implementation.  the new ossl_ht api offers several new features that
improve performance opportunistically

* A more generalized hash function.  Currently using fnv1a, provides a
  more general hash function, but can still be overridden where needed

* Improved locking and reference counting.  This hash table is
  internally locked with an RCU lock, and optionally reference counts
  elements, allowing for users to not have to create and manage their
  own read/write locks

* Lockless operation.  The hash table can be configured to operate
  locklessly on the read side, improving performance, at the sacrifice
  of the ability to grow the hash table or delete elements from it

* A filter function allowing for the retrieval of several elements at a
  time matching a given criteria without having to hold a lock
  permanently

* a doall_until iterator variant, that allows callers which need to
  iterate over the entire hash table until a given condition is met (as
  defined by the return value of the iterator callback).  This allows
  for callers attempting to do expensive cache searches for a small
  number of elements to terminate the iteration early, saving cpu cycles

* Dynamic type safety.  The hash table provides operations to set and
  get data of a specific type without having to define a type at the
  instatiation point

* Multiple data type storage.  The hash table can store multiple data
  types allowing for more flexible usage

* Ubsan safety.  Because the API deals with concrete single types
  (HT_KEY and HT_VALUE), leaving specific type casting to the call
  recipient with dynamic type validation, this implementation is safe
  from the ubsan undefined behavior warnings that require additional
  thunking on callbacks.

Testing of this new hashtable with an equivalent hash function, I can
observe approximately a 6% performance improvement in the lhash_test

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
openssl-machine pushed a commit that referenced this pull request Apr 24, 2024
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
This is unfortunate, but seems necessecary

tsan in gcc/clang tracks data races by recording memory references made
while various locks are held.  If it finds that a given address is
read/written while under lock (or under no locks without the use of
atomics), it issues a warning

this creates a specific problem for rcu, because on the write side of a
critical section, we write data under the protection of a lock, but by
definition the read side has no lock, and so rcu warns us about it,
which is really a false positive, because we know that, even if a
pointer changes its value, the data it points to will be valid.

The best way to fix it, short of implementing tsan hooks for rcu locks
in any thread sanitizer in the field, is to 'fake it'.  If thread
sanitization is activated, then in ossl_rcu_write_[lock|unlock] we add
annotations to make the sanitizer think that, after the write lock is
taken, that we immediately unlock it, and lock it right before we unlock
it again.  In this way tsan thinks there are no locks held while
referencing protected data on the read or write side.

we still need to use atomics to ensure that tsan recognizes that we are
doing atomic accesses safely, but thats ok, and we still get warnings if
we don't do that properly

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
The ossl_rcu_call function for windows creates a linked list loop.  fix
it to work like the pthread version properly

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
Generally we can get away with just using CRYPTO_atomic_load to do
stores by reversing the source and target variables, but doing so
creates a problem for the thread sanitizer as CRYPTO_atomic_load hard
codes an __ATOMIC_ACQUIRE constraint, which confuses tsan into thinking
that loads and stores aren't properly ordered, leading to RAW/WAR
hazzards getting reported.  Instead create a CRYPTO_atomic_store api
that is identical to the load variant, save for the fact that the value
is a unit64_t rather than a pointer that gets stored using an
__ATOMIC_RELEASE constraint, satisfying tsan.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
Create a new hashtable that is more efficient than the existing LHASH_OF
implementation.  the new ossl_ht api offers several new features that
improve performance opportunistically

* A more generalized hash function.  Currently using fnv1a, provides a
  more general hash function, but can still be overridden where needed

* Improved locking and reference counting.  This hash table is
  internally locked with an RCU lock, and optionally reference counts
  elements, allowing for users to not have to create and manage their
  own read/write locks

* Lockless operation.  The hash table can be configured to operate
  locklessly on the read side, improving performance, at the sacrifice
  of the ability to grow the hash table or delete elements from it

* A filter function allowing for the retrieval of several elements at a
  time matching a given criteria without having to hold a lock
  permanently

* a doall_until iterator variant, that allows callers which need to
  iterate over the entire hash table until a given condition is met (as
  defined by the return value of the iterator callback).  This allows
  for callers attempting to do expensive cache searches for a small
  number of elements to terminate the iteration early, saving cpu cycles

* Dynamic type safety.  The hash table provides operations to set and
  get data of a specific type without having to define a type at the
  instatiation point

* Multiple data type storage.  The hash table can store multiple data
  types allowing for more flexible usage

* Ubsan safety.  Because the API deals with concrete single types
  (HT_KEY and HT_VALUE), leaving specific type casting to the call
  recipient with dynamic type validation, this implementation is safe
  from the ubsan undefined behavior warnings that require additional
  thunking on callbacks.

Testing of this new hashtable with an equivalent hash function, I can
observe approximately a 6% performance improvement in the lhash_test

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
jvdsn pushed a commit to jvdsn/openssl that referenced this pull request Jun 3, 2024
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl#23671)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.4.0 Planned eng view approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch severity: ABI change This pull request contains ABI changes severity: fips change The pull request changes FIPS provider sources tests: present The PR has suitable tests present triaged: performance The issue/pr reports/fixes a performance concern
Projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.

Investigate high number of calls to ossl_namemap_name2num
9 participants