Performance: Copying mutable.HashSet (FlatHashTable) 100x slower than java.util.HashSet. #5293

Closed
scabug opened this Issue Dec 8, 2011 · 8 comments

scabug commented Dec 8, 2011

Short description: Copying a mutable.HashSet (adding all elements of a mutable.HashSet to an empty mutable.HashSet) is very slow.
Cause: Excessive collisions result from the way the FlatHashTable#index method computes slot indices, combined with the lack of a sizeHint method.
Possible fix: Modify the FlatHashTable#index method and/or implement a sizeHint method.

Detailed Description:

Running the attached benchmark script shows that copying a mutable.HashSet is 100x slower than copying a java.util.HashSet.
This only occurs when adding all elements of a mutable.HashSet to an empty HashSet.
If the values are sorted before being added to a HashSet, it runs fast.
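
The attached bench.scala is not reproduced here, but the measurement it describes can be sketched roughly as follows (my own reconstruction, not the original attachment; `CopyBench` and `timeIt` are hypothetical names):

```scala
// Rough sketch of the benchmark described above: populate a source set, then
// time copying it into an empty set of the same kind. Not the attached script.
import scala.collection.mutable

object CopyBench {
  def timeIt(label: String)(body: => Unit): Unit = {
    val start = System.nanoTime()
    body
    println(s"$label: ${(System.nanoTime() - start) / 1000000} ms")
  }

  def main(args: Array[String]): Unit = {
    val n = 500000
    val src = mutable.HashSet((1 to n): _*)
    val jsrc = new java.util.HashSet[Int]
    (1 to n).foreach(jsrc.add)

    // slow path: element-by-element growth of an initially empty table
    timeIt("mutable.HashSet copy") { mutable.HashSet.empty[Int] ++= src }
    timeIt("java.util.HashSet copy") {
      val dst = new java.util.HashSet[Int]
      dst.addAll(jsrc)
    }
  }
}
```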

After careful inspection, I found that:

  • HashSet is implemented using FlatHashTable, an open-addressing hash table with linear probing.
  • Neither HashSet nor FlatHashTable implements sizeHint, so the hash table grows as elements are added.
  • FlatHashTable#index, the method computing the slot index, uses the upper bits of the improved hash code.
  • The smaller the hash table, the fewer bits are used.
  • The foreach method enumerates the elements in FlatHashTable#index order.
  • Consequently, the upper bits of the improved hash codes of successive elements are almost the same.
  • This results in a high collision rate until the table grows large enough.
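
The interaction of these points can be sketched as follows (a hypothetical simplification, not the actual FlatHashTable code): when the index is taken from the most significant bits, keys whose improved hash codes share an upper-bit prefix — as successive elements in iteration order do — all land in the same few slots of a small table.

```scala
// Hypothetical sketch of taking the slot index from the upper bits of the
// (already improved) hash code; not the actual library implementation.
object UpperBitsIndex {
  // For a table of 2^k slots, keep only the k most significant bits.
  def index(improvedHash: Int, tableLength: Int): Int =
    improvedHash >>> (32 - Integer.numberOfTrailingZeros(tableLength))

  def main(args: Array[String]): Unit = {
    // Successive elements in iteration order of a large source table share
    // their upper hash bits; only the lower bits differ.
    val hashes = Seq(0x40000000, 0x40000100, 0x40000200, 0x40000300)
    val slots = hashes.map(index(_, 8)) // small 8-slot destination table
    println(slots) // all four map to slot 2, causing long probe chains
  }
}
```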

The FlatHashTable#index method was modified in 2.9.x.
In 2.8.x, it used the lower bits of the improved hash code, so the problem did not occur.

A possible fix is to use the lower bits of the improved hash code, but I am not sure how that would affect the method's other characteristics.
Another possible fix is to implement a sizeHint method and call it from ++= and other methods.
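
The sizeHint idea can be sketched with a toy open-addressing set (all names here are hypothetical; this is not the library code): ++= pre-sizes the table once, so no element is ever inserted into an intermediate, too-small table.

```scala
// Toy open-addressing set with linear probing, sketching the proposed fix:
// ++= calls sizeHint so the table is at its final size before any insertion.
final class TinyFlatSet {
  private var table = new Array[Integer](8)
  private var used = 0

  def size: Int = used

  // Grow once, up front, keeping the load factor below 0.5.
  def sizeHint(expected: Int): Unit = {
    var cap = table.length
    while (cap < expected * 2) cap *= 2
    if (cap > table.length) resize(cap)
  }

  def +=(elem: Int): this.type = {
    if ((used + 1) * 2 > table.length) resize(table.length * 2)
    if (place(table, elem)) used += 1
    this
  }

  def ++=(elems: Iterable[Int]): this.type = {
    sizeHint(used + elems.size) // pre-size once: no intermediate growth
    elems.foreach(this += _)
    this
  }

  def contains(elem: Int): Boolean = {
    var i = index(elem, table.length)
    while (table(i) != null) {
      if (table(i) == elem) return true
      i = (i + 1) % table.length
    }
    false
  }

  private def index(elem: Int, len: Int) = (elem.hashCode & 0x7fffffff) % len

  // Linear probing; returns true if the element was newly inserted.
  private def place(t: Array[Integer], elem: Int): Boolean = {
    var i = index(elem, t.length)
    while (t(i) != null) {
      if (t(i) == elem) return false
      i = (i + 1) % t.length
    }
    t(i) = elem
    true
  }

  private def resize(newLen: Int): Unit = {
    val nt = new Array[Integer](newLen)
    table.foreach(e => if (e != null) place(nt, e))
    table = nt
  }
}
```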


scabug commented Dec 8, 2011

Imported From: https://issues.scala-lang.org/browse/SI-5293?orig=1
Reporter: takuo
Affected Versions: 2.9.0, 2.9.0-1, 2.9.1
Attachments:

  • bench.scala (created on Dec 8, 2011 12:41:26 PM UTC, 1091 bytes)

scabug commented Dec 8, 2011

@dcsobral said:
Contrary to what the subject implies, HashMap uses HashTable, not FlatHashTable.


scabug commented Dec 8, 2011

@dcsobral said:
The decision to use the upper bits on HashTable came from this commit:

commit 2f7197c50b31386118a660833828ea720bf9d239
Author: Aleksandar Pokopec <aleksandar.prokopec@epfl.ch>
Date:   Wed Oct 20 20:19:48 2010 +0000

    Changed hash code strategy in hash table - now ...
    
    Changed hash code strategy in hash table - now taking most significant bits when transforming the hash code into an index.

And HashTable also has an explanation of why:

  // Note:
  // we take the most significant bits of the hashcode, not the lower ones
  // this is of crucial importance when populating the table in parallel

I'm surprised mutable.HashTable is used at all on the parallel collections.


scabug commented Dec 8, 2011

takuo said:
bq. Contrary to what the subject implies, HashMap uses HashTable, not FlatHashTable.

I had misunderstood that, so I have corrected the summary.


scabug commented Dec 14, 2011

@axel22 said:
You're right. Good point, thanks.

I would be in favour of implementing sizeHint, but that is only a partial solution, since the hash table iteration order would still expose the vulnerability. For instance, if someone created an empty hash table and inserted the elements into it one by one manually (with transparent growing behind the scenes), the same effect would occur again.
But it does solve the problem with ++=.

The parallel flat hash table, on the other hand, requires that the elements be divided into groups according to certain bits of the hash code (and it's unclear which bits these should be until the size of the hash table is known, so the choice has to be agnostic of the size of the resulting hash table). Second, elements in the same group have to end up in the same contiguous block of the resulting hash table, so that these blocks can be populated in parallel by different processors without synchronization.

The problem with that is that traversing the elements in a cache-friendly way seems to always result in the elements with the same hashcode prefix being traversed together.

The solution is not yet obvious to me, but I think something can be done by segmenting the hashcode into 2 parts - taking the first 5 upper bits and the rest of the necessary bits from the bottom. I have to think about this a little before I proceed.
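
The segmenting idea might look roughly like this (a sketch under my own assumptions, not an actual implementation; `SegmentedIndex` is a hypothetical name). The top 5 bits pick one of 32 contiguous blocks, so parallel workers can each own a block, while the remaining index bits come from the bottom of the hash code, breaking the correlation caused by iteration order:

```scala
// Hypothetical sketch of a two-part index: the upper 5 bits choose one of 32
// contiguous blocks (enabling parallel population); the remaining bits come
// from the bottom of the hash code. Assumes the table has at least 2^5 slots.
object SegmentedIndex {
  val GroupBits = 5 // 32 groups, fixed regardless of table size

  def index(improvedHash: Int, tableLength: Int): Int = {
    val bits = Integer.numberOfTrailingZeros(tableLength) // table is 2^bits slots
    val group = improvedHash >>> (32 - GroupBits)               // top 5 bits
    val within = improvedHash & ((1 << (bits - GroupBits)) - 1) // low bits
    (group << (bits - GroupBits)) | within
  }
}
```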


scabug commented Dec 16, 2011

@axel22 said:
Perhaps a more sensible solution would be to add a hash-table instance specific value, and always XOR the hashcode with that value internally before computing the index.

Since this value would be hash-table instance specific (could also be recomputed upon table growth), that would solve the above problem.

The only issue is producing the XOR value - having a global random generator might kill scalability. A thread local random generator might help here.
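
A minimal sketch of this idea (my own illustration, not library code; `ScrambledTable` is a hypothetical name) would XOR a per-instance seed, drawn from a thread-local generator, into the hash before indexing:

```scala
// Hypothetical sketch: a per-instance scrambling value XORed into the hash
// code before computing the index. The seed comes from ThreadLocalRandom to
// avoid contention on a shared global Random. Not the actual library code.
import java.util.concurrent.ThreadLocalRandom

class ScrambledTable(tableLength: Int) {
  private val seed: Int = ThreadLocalRandom.current().nextInt()

  def index(hcode: Int): Int =
    (hcode ^ seed) & (tableLength - 1) // tableLength assumed a power of two
}
```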


scabug commented Dec 16, 2011

takuo said:
Thank you for responding to my sudden request.

XORing with random values seems good.
Instead of using a random value, we could choose the value that makes the improved hash code of the first element added equal to 0xc0000000 (or some higher value):

protected var mask: Int = _

def addEntry(elem: A): Boolean = {
  if (tableSize == 0) {
    mask = ...
    // improve(mask ^ elem.hashCode) == 0xc0000000
  }

  ...
}

This guarantees that the element is placed in the third quarter of the table, while ensuring that the value changes every time.


scabug commented Dec 16, 2011

@axel22 said:
You're welcome.
There are a couple of ways to go at this. I've tried XORing, but simple XORing with some table-specific value will not work - the same effect will happen again because the correlated bits are all mapped together. Instead, one must ensure that information from nonused bits are propagated into bits which are correlated (same) due to traversal order.
What I'm saying is that, rather than XORing, one should rotate.

The random value based rotating seems to solve the problem, and it doesn't seem to affect performance.

To avoid random values, one could compute this constant from the size of the table.
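
The rotation idea can be sketched like so (my own illustration under assumed names, not the actual fix that landed): rotating the improved hash code by a per-instance random amount moves previously unused low bits into the positions from which the index is taken.

```scala
// Hypothetical sketch: rotate the improved hash code by a random per-instance
// amount before taking the index bits, so that bits left correlated by the
// traversal order no longer dominate the index. Not the actual library code.
import java.util.concurrent.ThreadLocalRandom

class RotatedTable(tableLength: Int) {
  // Per-instance random rotation amount; a constant derived from the table
  // size could be used instead, as suggested above.
  private val rotation: Int = ThreadLocalRandom.current().nextInt(32)

  def index(improvedHash: Int): Int = {
    val bits = Integer.numberOfTrailingZeros(tableLength) // table is 2^bits slots
    Integer.rotateRight(improvedHash, rotation) >>> (32 - bits)
  }
}
```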
