Description
Short description: Copying a mutable.HashSet (adding all elements of a mutable.HashSet to an empty mutable.HashSet) is very slow.
Cause: Excessive collisions result from the behavior of the FlatHashTable#index method and the lack of a sizeHint method.
Possible fix: Modify the FlatHashTable#index method and/or implement the sizeHint method.
Detailed Description:
In the attached benchmark script, copying a mutable.HashSet is 100x slower than copying a java.util.HashSet.
This only occurs when adding all elements of a mutable.HashSet to an empty HashSet.
If we sort the values before adding them to a HashSet, it runs fast.
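The two copy patterns described above can be sketched as follows. This is not the attached benchmark script, only a minimal reconstruction of the pattern; the object name, the timed helper, and the element count are assumptions made for illustration. Timings depend on the Scala version and the machine, so no expected output is shown.

```scala
import scala.collection.mutable

object CopyRepro {
  // Hypothetical helper: runs a block and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Pattern reported as slow: elements arrive in the source set's iteration order.
  def copyInIterationOrder(src: mutable.HashSet[Int]): mutable.HashSet[Int] = {
    val dst = mutable.HashSet.empty[Int]
    dst ++= src
    dst
  }

  // Pattern reported as fast: the same elements, sorted before insertion.
  def copySorted(src: mutable.HashSet[Int]): mutable.HashSet[Int] = {
    val dst = mutable.HashSet.empty[Int]
    dst ++= src.toSeq.sorted
    dst
  }

  def main(args: Array[String]): Unit = {
    val src = mutable.HashSet.empty[Int]
    (0 until 1000000).foreach(src += _) // element count is an arbitrary choice
    val (a, t1) = timed(copyInIterationOrder(src))
    val (b, t2) = timed(copySorted(src))
    println(s"iteration order: $t1 ms, sorted: $t2 ms, same contents: ${a == b}")
  }
}
```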
After careful inspection, I found that:
- HashSet is implemented using FlatHashTable, an open addressing hash table with linear probing.
- Neither HashSet nor FlatHashTable implements sizeHint, so the hash table starts small and grows as elements are added.
- FlatHashTable#index, the method computing the slot index, uses the upper bits of the improved hash code.
- The smaller the hash table, the fewer bits are used.
- The foreach method enumerates the elements in FlatHashTable#index order.
- Consequently, the upper bits of the improved hash codes of successive elements are almost the same.
- This results in a higher collision rate until the table has grown large enough.
The FlatHashTable#index method was modified in 2.9.x.
In 2.8.x, it used the lower bits of the improved hash code, so the problem did not occur.
A possible fix is to use the lower bits of the improved hash code, but I am not sure what other effects this would have on the characteristics of the method.
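The effect can be illustrated with a toy model. The sketch below is not the library code: ToyTable, the improve stand-in (the MurmurHash3 32-bit finalizer), the initial capacity of 32, and the 450/1000 load factor are all assumptions chosen for illustration. It fills a table that indexes by upper bits, then copies its keys in slot order into an empty table, counting how many occupied slots the copy has to step over with each indexing scheme.

```scala
object ProbeDemo {
  // Stand-in for the library's hash improvement step (assumption: MurmurHash3
  // 32-bit finalizer, used here only because it mixes all bits well).
  def improve(h0: Int): Int = {
    var h = h0
    h ^= h >>> 16; h *= 0x85ebca6b
    h ^= h >>> 13; h *= 0xc2b2ae35
    h ^= h >>> 16
    h
  }

  // Toy open-addressing set of non-negative Ints with linear probing.
  final class ToyTable(useUpperBits: Boolean) {
    private var bits = 5                          // capacity is 2^bits
    private var slots = Array.fill(1 << bits)(-1) // -1 marks an empty slot
    private var count = 0
    var probes = 0L                               // occupied slots stepped over, rehashing included

    private def index(key: Int): Int =
      if (useUpperBits) improve(key) >>> (32 - bits)
      else improve(key) & (slots.length - 1)

    private def insert(key: Int): Boolean = {
      val mask = slots.length - 1
      var i = index(key)
      while (slots(i) != -1 && slots(i) != key) {
        probes += 1
        i = (i + 1) & mask
      }
      if (slots(i) == key) false
      else { slots(i) = key; true }
    }

    def add(key: Int): Unit =
      if (insert(key)) {
        count += 1
        // Assumed load factor of 450/1000: double the table once it is exceeded.
        if (count * 1000L > slots.length * 450L) grow()
      }

    private def grow(): Unit = {
      val old = slots
      bits += 1
      slots = Array.fill(1 << bits)(-1)
      old.foreach(k => if (k != -1) insert(k))
    }

    // Mirrors a foreach that walks the underlying array from slot 0 upward.
    def keysInSlotOrder: Seq[Int] = slots.filter(_ != -1).toSeq
  }

  // Copies n keys out of an upper-bits table, in slot order, into an empty table
  // and returns the number of probe steps the copy needed.
  def copyProbes(n: Int, destUsesUpperBits: Boolean): Long = {
    val src = new ToyTable(useUpperBits = true)
    (0 until n).foreach(src.add)
    val dst = new ToyTable(destUsesUpperBits)
    src.keysInSlotOrder.foreach(dst.add)
    dst.probes
  }

  def main(args: Array[String]): Unit = {
    val n = 20000
    println(s"upper bits: ${copyProbes(n, destUsesUpperBits = true)} probe steps")
    println(s"lower bits: ${copyProbes(n, destUsesUpperBits = false)} probe steps")
  }
}
```

In this model, the upper-bits destination sends nearly every early key to the first few slots while it is still small, so each insertion walks an ever longer run of occupied slots; the lower-bits destination sees the same keys spread across the table.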
Another possible fix is to implement the sizeHint method and call it from ++= and other bulk-add methods, so that the table is allocated at its final size before the elements are inserted.
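As a rough sketch of what a sizeHint implementation would have to compute, the function below picks the smallest power-of-two capacity that holds the expected number of elements without triggering growth. The names, the minimum capacity of 32, and the 450/1000 load factor are assumptions for illustration, not values confirmed from the library source.

```scala
object SizeHintSketch {
  // Assumed load factor, expressed per mille (hypothetical value).
  val LoadFactorPerMille = 450
  // Assumed minimum table capacity (hypothetical value).
  val MinCapacity = 32

  // Smallest power-of-two capacity such that `expected` elements stay
  // at or below the load factor, so no resize happens during a bulk add.
  def capacityFor(expected: Int): Int = {
    require(expected >= 0, "expected size must be non-negative")
    // Ceiling of expected * 1000 / LoadFactorPerMille, computed in Long to avoid overflow.
    val needed = (expected.toLong * 1000 + LoadFactorPerMille - 1) / LoadFactorPerMille
    require(needed <= (1L << 30), "expected size too large for an Int-indexed table")
    var cap = MinCapacity.toLong
    while (cap < needed) cap <<= 1
    cap.toInt
  }
}
```

A bulk-add method would call something like this once with the size of the incoming collection, allocate the table, and only then insert the elements.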