Performance: Copying mutable.HashSet (FlatHashTable) 100x slower than java.util.HashSet. #5293

Closed
@scabug

Description

Short description: Copying a mutable.HashSet (adding all elements of one mutable.HashSet to an empty mutable.HashSet) is very slow.
Cause: Excessive collisions, resulting from how the FlatHashTable#index method chooses slots combined with the lack of a sizeHint method.
Possible fix: Modify the FlatHashTable#index method and/or implement a sizeHint method.

Detailed Description:

Running the attached benchmark script, copying a mutable.HashSet is 100x slower than doing the same with java.util.HashSet.
The slowdown only occurs when adding all elements of a mutable.HashSet to an empty HashSet.
If the values are sorted before being added to a HashSet, the copy runs fast.
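
For readers without the attachment, the three cases can be timed with a sketch like the one below. It is not the attached script: the object name `CopyBenchSketch`, the helper names, and the element count are assumptions made for illustration, and the single-shot timing has no JIT warm-up, so the numbers are only indicative. The slowdown is expected only on Scala versions whose FlatHashTable#index behaves as described in this report.

```scala
import scala.collection.mutable

object CopyBenchSketch {
  // Runs the block once and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Adds all elements to an empty mutable.HashSet, as in the report.
  def copyToScala(elems: Iterable[Int]): mutable.HashSet[Int] = {
    val dst = mutable.HashSet.empty[Int]
    dst ++= elems
    dst
  }

  // Same copy, but into an empty java.util.HashSet.
  def copyToJava(elems: Iterable[Int]): java.util.HashSet[Integer] = {
    val dst = new java.util.HashSet[Integer]()
    elems.foreach(x => dst.add(Integer.valueOf(x)))
    dst
  }

  def main(args: Array[String]): Unit = {
    val n = 200000 // arbitrary size chosen for this sketch
    val src = copyToScala(0 until n)
    val sorted = src.toVector.sorted // sort outside the timed region

    val (_, hashOrderMs) = timed(copyToScala(src))
    val (_, sortedMs) = timed(copyToScala(sorted))
    val (_, javaMs) = timed(copyToJava(src))

    println("mutable.HashSet, source iteration order: " + hashOrderMs + " ms")
    println("mutable.HashSet, sorted values:          " + sortedMs + " ms")
    println("java.util.HashSet:                       " + javaMs + " ms")
  }
}
```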

After careful inspection, I found that:

  • HashSet is implemented using FlatHashTable, an open addressing hash table with linear probing.
  • Neither HashSet nor FlatHashTable implements sizeHint, so the hash table grows as elements are added.
  • FlatHashTable#index, the method computing the slot index, uses the upper bits of the improved hash code.
  • The smaller the hash table, the fewer bits are used.
  • The foreach method enumerates the elements in FlatHashTable#index order.
  • Consequently, the upper bits of the improved hash codes of successive elements are almost the same.
  • This results in a high collision rate until the destination table has grown large enough.
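
The chain of observations above can be reproduced with a toy model. The sketch below is not the real FlatHashTable: `ToyTable`, its 50% load factor, its initial capacity of 32, and the `improve` mixer are all assumptions chosen to keep the example small. It builds a table, enumerates it in slot order, copies it into a fresh table that uses the same index strategy, and counts how many occupied slots linear probing steps over during the copy.

```scala
object IndexOrderSketch {
  // Stand-in bit mixer (an assumption; not the library's hash improvement).
  def improve(h: Int): Int = {
    val x = h * 0x9E3779B9
    x ^ (x >>> 16)
  }

  // Toy open-addressing set of distinct Ints with linear probing.
  final class ToyTable(useUpperBits: Boolean) {
    private var slots = new Array[Int](32)
    private var used = new Array[Boolean](32)
    private var count = 0
    var probes = 0L // occupied slots stepped over, including during rehashing

    // Capacity is always a power of two, so `bits` is log2(capacity).
    private def index(h: Int): Int = {
      val bits = Integer.numberOfTrailingZeros(slots.length)
      if (useUpperBits) improve(h) >>> (32 - bits)
      else improve(h) & (slots.length - 1)
    }

    private def place(x: Int): Unit = {
      var i = index(x)
      while (used(i)) { probes += 1; i = (i + 1) % slots.length }
      used(i) = true
      slots(i) = x
    }

    // Assumes x is not already present; doubles the table at 50% load.
    def add(x: Int): Unit = {
      if ((count + 1) * 2 > slots.length) {
        val old = inSlotOrder
        slots = new Array[Int](slots.length * 2)
        used = new Array[Boolean](slots.length)
        old.foreach(place)
      }
      place(x)
      count += 1
    }

    // Elements in slot order, which is how a foreach over the table visits them.
    def inSlotOrder: Seq[Int] =
      slots.indices.filter(i => used(i)).map(i => slots(i))
  }

  // Builds a table from 0 until n, then copies it in slot order into a fresh
  // table with the same index strategy; returns the probes spent on the copy.
  def copyProbes(useUpperBits: Boolean, n: Int): Long = {
    val src = new ToyTable(useUpperBits)
    (0 until n).foreach(src.add)
    val dst = new ToyTable(useUpperBits)
    src.inSlotOrder.foreach(dst.add)
    dst.probes
  }

  def main(args: Array[String]): Unit = {
    println("upper bits: " + copyProbes(useUpperBits = true, n = 20000))
    println("lower bits: " + copyProbes(useUpperBits = false, n = 20000))
  }
}
```

In this model the upper-bits copy feeds the small destination table a run of elements whose leading bits are nearly identical, so they pile into the same few slots; the lower-bits copy spreads the same kind of run across the table.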

The FlatHashTable#index method was modified in 2.9.x.
In 2.8.x, it used the lower bits of the improved hash code, so the problem did not occur.

A possible fix is to use the lower bits of the improved hash code, but I am not sure what other impact this would have on the characteristics of the method.
Another possible fix is to implement a sizeHint method and call it from ++= and other methods.
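
To illustrate why a size hint would help even with the upper-bits index, here is a minimal toy model, again not the real FlatHashTable. The names `SizeHintSketch`, `probes`, and `capacityFor`, the 50% load factor, and the `improve` mixer are assumptions for illustration. Elements arrive sorted by improved hash, which approximates the slot order of a source table, and the sketch counts the occupied slots stepped over when the destination starts small versus when it is presized.

```scala
object SizeHintSketch {
  // Stand-in mixer (an assumption; not the library's hash improvement).
  def improve(h: Int): Int = h * 0x9E3779B9

  // Slot index taken from the upper bits, for a power-of-two capacity >= 2.
  def index(h: Int, capacity: Int): Int =
    improve(h) >>> (32 - Integer.numberOfTrailingZeros(capacity))

  // Inserts distinct elems in order into a linear-probing table that starts at
  // initialCapacity and doubles at 50% load. Returns the number of occupied
  // slots stepped over, including those stepped over while rehashing.
  def probes(elems: Seq[Int], initialCapacity: Int): Long = {
    var table = Array.fill[Option[Int]](initialCapacity)(None)
    var count = 0
    var steps = 0L
    def place(t: Array[Option[Int]], x: Int): Unit = {
      var i = index(x, t.length)
      while (t(i).isDefined) { steps += 1; i = (i + 1) % t.length }
      t(i) = Some(x)
    }
    for (x <- elems) {
      if ((count + 1) * 2 > table.length) {
        val bigger = Array.fill[Option[Int]](table.length * 2)(None)
        for (slot <- table; y <- slot) place(bigger, y)
        table = bigger
      }
      place(table, x)
      count += 1
    }
    steps
  }

  // Hypothetical helper: smallest power-of-two capacity that keeps n elements
  // at or below 50% load, standing in for what a sizeHint could request.
  def capacityFor(n: Int): Int = {
    var c = 2
    while (c < 2 * n) c *= 2
    c
  }

  // Values 0 until n ordered by improved hash read as an unsigned number,
  // approximating the order in which an upper-bits table enumerates them.
  def sourceOrder(n: Int): Seq[Int] =
    (0 until n).sortBy(x => improve(x) & 0xFFFFFFFFL)

  def main(args: Array[String]): Unit = {
    val n = 10000
    val elems = sourceOrder(n)
    println("grown from 32 slots: " + probes(elems, 32))
    println("presized:            " + probes(elems, capacityFor(n)))
  }
}
```

With the table presized, the index uses enough upper bits from the first insert onward, so neighbouring elements no longer land in the same slot.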
