Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Use a sane multiplier in @EqualsAndHashCode #660
For whatever crazy reason, Java uses in many places the "proven multiplier" 31 in the hashCode computation. This leads to more collisions than necessary, partly because of the unfortunate number, but mainly because of re-using this number in different places.
My attached example shows collisions due to using 31 in String, List, and lombok-generated hashCode. For the 256 generated objects, there are only 64 unique hashes. If lombok used a better multiplier, this number would increase to 144.
Here, nearly any odd number would do. Using 31 is one of the few bad choices.
Such a shame that THIS issue isn't itself a prime number, or I would have used that. Should have waited until 631 ;P
There are some caveats to just changing the implementation on a dime, but, you made your case quite well and we gotta go through this pain at some point, right?
Crossing my fingers that this won't break stuff, I've committed to a randomly chosen shortish prime number (277 is the lucky winner). While I was there, I also updated the primes used to stuff booleans into hashCode (they, too, were directly copied from Effective Java, which is where the '31' comes from and probably explains why its use is so widespread). Instead of 1231 and 1237, now we use 2591 and 2609.
I don't think there's a need to spin out an edge release for this update, but, we'll release soon, so, coming to a maven repo near you soon :P
Surprisingly, there's a tiny speed penalty for using such a "huge" multiplier as 227. This penalty is 1-2% in the case of integer pair (which is I guess the maximum as it minimizes the other computations). I'd bet that in any realistic scenario, this is more than offset by the reduction of hash collisions, but I'd suggest to change the multiplier again to avoid this penalty.
An amd86 CPU can encode an immediate operand in a single byte if it fits, otherwise it uses as many bytes as the instruction works with. Since the byte is signed, 227 doesn't fit in, so four bytes instead of one get used, which may make the instruction fetch to a bottleneck. I was curious if this can be actually measured in Java, and it looks like it can and the results are fairly consistent:
I'd suggest to use the biggest prime fitting in a signed byte, i.e., 113.