Migrated from Google Code (issue 625)
👤 Maaartinus 🕗 Jan 09, 2014 at 16:09 UTC
For whatever crazy reason, Java uses in many places the "proven multiplier" 31 in the hashCode computation. This leads to more collisions than necessary, partly because of the unfortunate number, but mainly because of re-using this number in different places.
My attached example shows collisions due to using 31 in String, List, and lombok-generated hashCode. For the 256 generated objects, there are only 64 unique hashes. If lombok used a better multiplier, this number would increase to 144.
Here, nearly any odd number would do. Using 31 is one of the few bad choices.
👤 reinierz 🕗 Jan 09, 2014 at 20:27 UTC
Such a shame that THIS issue isn't itself a prime number, or I would have used that. Should have waited until 631 ;P
There are some caveats to just changing the implementation on a dime, but, you made your case quite well and we gotta go through this pain at some point, right?
Crossing my fingers that this won't break stuff, I've committed to a randomly chosen shortish prime number (277 is the lucky winner). While I was there, I also updated the primes used to stuff booleans into hashCode (they, too, were directly copied from Effective Java, which is where the '31' comes from and probably explains why its use is so widespread). Instead of 1231 and 1237, now we use 2591 and 2609.
I don't think there's a need to spin out an edge release for this update, but, we'll release soon, so, coming to a maven repo near you soon :P
👤 reinierz 🕗 Jan 15, 2014 at 21:26 UTC
This fix is now part of the latest official release. 1.12.4 was just pushed for download and to maven central.
👤 Maaartinus 🕗 Jan 15, 2014 at 23:23 UTC
Surprisingly, there's a tiny speed penalty for using such a "huge" multiplier as 227. This penalty is 1-2% in the case of integer pair (which is I guess the maximum as it minimizes the other computations). I'd bet that in any realistic scenario, this is more than offset by the reduction of hash collisions, but I'd suggest to change the multiplier again to avoid this penalty.
An amd86 CPU can encode an immediate operand in a single byte if it fits, otherwise it uses as many bytes as the instruction works with. Since the byte is signed, 227 doesn't fit in, so four bytes instead of one get used, which may make the instruction fetch to a bottleneck. I was curious if this can be actually measured in Java, and it looks like it can and the results are fairly consistent:
I'd suggest to use the biggest prime fitting in a signed byte, i.e., 113.
👤 r.spilker 🕗 Jan 19, 2014 at 20:53 UTC
We now chose, using pseudo random selection, 59 for the PRIME and 79 and 97 for boolean true and false. It is available in the edge release: https://projectlombok.org/download-edge.html
End of migration