High hash collision rate for thread id hashing with default callback #4363
The default implementation for threadid uses &errno to generate an ID. The implementation of thread-local variables causes &errno to end up always at the same offset in a 4kb page. In other words, the last 12 bits of all thread IDs are identical. Additionally the &errnos from different threads all end up mapped in the same general region of memory, so they all share the same first ~32 bits as well, the first 16 of which are always 0 (because x86_64 only uses 48-bit addresses).
The implementation of CRYPTO_THREADID_hash just passes through the ID value directly with no hashing. The hash function for the ERR_STATE lhash just multiplies the value by 13 before modulo by the number of buckets. This hash function is very poor given that the low-order bits of all the thread IDs are very similar.
For example, on my system, errno addresses are all of the form 0x7fdc35ce3768 + 0x1000*T (for thread index T). Assuming we have B buckets, the bucket assignment is:
If B is a factor of 0x13000, then this obviously simplifies to 0, meaning that all threads use the same bucket. This is very common since B is assigned to be powers of two, and every power of two less than 2^13 is a factor of 0x13000. So, as long as we have less than 4096 threads, all of the thread infos will fall into the same bucket. When we have more than 4096 threads we start to fall into just a small handful of buckets.
In practice, this means that the various thread-local lookups end up being O(n) in the number of threads rather than O(1). I verified this in a production use case by dumping int_thread_hash on a server with lots of threads:
Note that num_hash_comps is ~898x compared to num_retrieve. 898 is about 50% of 1803, meaning that on average each retrieve is looking at half of the entries in the table before finding a result (as expected for a linear search).
Given this, I think it would be very beneficial to change the hashing for this LHASH to more effectively push randomness to low bits. Even something as simple as multiplying by a large 64-bit prime would be a big benefit, but probably better to use a mix function of some sort (eg see https://gist.github.com/badboy/6267743)
The text was updated successfully, but these errors were encountered: