Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High hash collision rate for thread id hashing with default callback #4363

Open
toddlipcon opened this issue Sep 13, 2017 · 3 comments
Open

High hash collision rate for thread id hashing with default callback #4363

toddlipcon opened this issue Sep 13, 2017 · 3 comments
Milestone

Comments

@toddlipcon
Copy link

@toddlipcon toddlipcon commented Sep 13, 2017

The default implementation for threadid uses &errno to generate an ID. The implementation of thread-local variables causes &errno to end up always at the same offset in a 4kb page. In other words, the last 12 bits of all thread IDs are identical. Additionally the &errnos from different threads all end up mapped in the same general region of memory, so they all share the same first ~32 bits as well, the first 16 of which are always 0 (because x86_64 only uses 48-bit addresses).

The implementation of CRYPTO_THREADID_hash just passes through the ID value directly with no hashing. The hash function for the ERR_STATE lhash just multiplies the value by 13 before modulo by the number of buckets. This hash function is very poor given that the low-order bits of all the thread IDs are very similar.

For example, on my system, errno addresses are all of the form 0x7fdc35ce3768 + 0x1000*T (for thread index T). Assuming we have B buckets, the bucket assignment is:

bucket = ((0x7fdc35ce3768 + 0x1000*T)*13) % B
= (0x7fdc35ce3768*13 + 0x13000*T) % B
The constant on the left doesn't matter in terms of uniqueness (it just rotates the bucket assignment) so this is equivalent to:
bucket = (0x13000*T) % B
modular multiplication is associative, so:
bucket = ((0x13000%B) * (T%B))%B

If B is a factor of 0x13000, then this obviously simplifies to 0, meaning that all threads use the same bucket. This is very common since B is assigned to be powers of two, and every power of two less than 2^13 is a factor of 0x13000. So, as long as we have less than 4096 threads, all of the thread infos will fall into the same bucket. When we have more than 4096 threads we start to fall into just a small handful of buckets.

In practice, this means that the various thread-local lookups end up being O(n) in the number of threads rather than O(1). I verified this in a production use case by dumping int_thread_hash on a server with lots of threads:

{
  b = 0x2b520e000, 
  comp = 0x7f8f9c38bf50 <err_state_LHASH_COMP>, 
  hash = 0x7f8f9c38bf60 <err_state_LHASH_HASH>, 
  num_nodes = 902,  <-- 902 buckets in the hashtable
  num_alloc_nodes = 1024, 
  p = 390, 
  pmax = 512, 
  up_load = 512, 
  down_load = 256, 
  num_items = 1803, <-- 1803 elements (threads) in the table
  num_expands = 894, 
  num_expand_reallocs = 6, 
  num_contracts = 0, 
  num_contract_reallocs = 0, 
  num_hash_calls = 4951064256, 
  num_comp_calls = 4950645393, 
  num_insert = 209432, 
  num_replace = 0, 
  num_delete = 207629, 
  num_no_delete = 0, 
  num_retrieve = 4950437764, 
  num_retrieve_miss = 209431, 
  num_hash_comps = 4449801693807, 
  error = 0
}

Note that num_hash_comps is ~898x compared to num_retrieve. 898 is about 50% of 1803, meaning that on average each retrieve is looking at half of the entries in the table before finding a result (as expected for a linear search).

Given this, I think it would be very beneficial to change the hashing for this LHASH to more effectively push randomness to low bits. Even something as simple as multiplying by a large 64-bit prime would be a big benefit, but probably better to use a mix function of some sort (eg see https://gist.github.com/badboy/6267743)

@toddlipcon
Copy link
Author

@toddlipcon toddlipcon commented Sep 13, 2017

This only affects OpenSSL 1.0.2 and earlier, since the threadlocal stuff is reimplemented in 1.1 afaik.

@richsalz
Copy link
Contributor

@richsalz richsalz commented Sep 14, 2017

Yes in 1.1.x we use true thread-local storage.
You can work around this by defining your own id callback for now.

@paulidale
Copy link
Contributor

@paulidale paulidale commented Sep 14, 2017

The lhash structure is showing its age -- it was a neat piece of research when it was created but the state of the art has progressed.

On my list of things to get around to looking at eventually.

@mattcaswell mattcaswell added this to the 1.0.2 milestone Jan 22, 2018
@richsalz richsalz modified the milestones: 1.0.2, Other May 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants