ORecordId - hashCode function collisions #3903
Comments
We already have the hash function com.orientechnologies.orient.core.index.hashindex.local.OMurmurHash3HashFunction. Would you mind using it as the source of the hash code and testing how performance changes? |
@laa do you mean implementing ORecordId.hashCode() using the Murmur algorithm? If I'm right, +1 |
Yes, but I would really appreciate it if @swimmesberger tried it himself, if he is interested of course. |
final OMurmurHash3HashFunction<OIdentifiable> hasher = new OMurmurHash3HashFunction<OIdentifiable>();
hasher.setValueSerializer(new OStreamSerializerRID());
ORecordId rec = new ORecordId(12, 31);
long hash = hasher.hashCode(rec);
ORecordId rec1 = new ORecordId(13, 0);
long hash1 = hasher.hashCode(rec1);
Seems to work better (no collisions so far), but it is not really a drop-in replacement for hashCode, because hashCode returns an int and "OHashFunction hashCode(V value)" returns a long. The OMurmurHash3HashFunction is indeed "much" slower than the normal hashCode function, but for my use case the performance drop is justifiable. In my quick, unoptimized test, 10,000 calls to the OMurmurHash3HashFunction hashCode function take ~52 ms, while 10,000 calls to the current hashCode function take ~8 ms. |
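On the int vs. long mismatch: one common way to bridge it is to fold the 64-bit value into 32 bits by XOR-ing its two halves, which is what java.lang.Long does for its own hashCode. A minimal sketch, assuming a 64-bit hash is already available; the class name, the fold helper and the sample value are made up for illustration and are not part of OrientDB:

```java
final class FoldLongHash {
    // Fold a 64-bit hash into 32 bits by XOR-ing the high and low halves.
    static int fold(long hash64) {
        return (int) (hash64 ^ (hash64 >>> 32));
    }

    public static void main(String[] args) {
        // Arbitrary stand-in for a value returned by a 64-bit hash function.
        long sample = 0x123456789ABCDEF0L;
        // Long.hashCode(long) (Java 8+) uses the same folding.
        System.out.println(fold(sample) == Long.hashCode(sample));
    }
}
```

Folding does not make the Murmur computation any cheaper, so the timing difference reported above would remain.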
Any idea to improve current |
No idea, but I just wanted to say that we can't really rely on ORecordId.hashCode() for the reason stated above, so using ORecordId objects in HashMaps, for example, could be pretty dangerous. |
Hi, |
I don't think 37 would bring any advantage. Do you have any data on how this number reduced your collision rate? |
I made a small test:
and the results are impressive: |
Impressive! Why 103? |
I did some testing with bigger prime numbers and 103 worked well :) |
Sounds very good! :) Thanks for the test! I didn't really have time to test different hashCode functions. |
You're welcome |
I also tested with random RIDs: the lower the RIDs, the better the result. In 99% of cases this is actually what happens, since RIDs are incremental. So definitely +1. @doppelrittberger It seems you found the magic number (103). Thanks ;-) |
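The test code and its results were not preserved in this thread, so here is only a rough sketch of how such a comparison could be run. It assumes the hash has the shape multiplier * clusterId + clusterPosition (the formula quoted elsewhere in this thread for the current version) and that the test covered cluster ids and positions below 100; the class and method names are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

final class RidHashCollisions {
    // Assumed hash shape: multiplier * clusterId + clusterPosition.
    static int hash(int multiplier, int clusterId, long clusterPosition) {
        return multiplier * clusterId + (int) clusterPosition;
    }

    // Count the RIDs in [0, limit) x [0, limit) whose hash was already produced by an earlier RID.
    static int countCollisions(int multiplier, int limit) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int clusterId = 0; clusterId < limit; clusterId++) {
            for (long position = 0; position < limit; position++) {
                if (!seen.add(hash(multiplier, clusterId, position))) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println("multiplier 31:  " + countCollisions(31, 100));
        System.out.println("multiplier 103: " + countCollisions(103, 100));
    }
}
```

With a limit of 100, every position is smaller than 103, so the second count is zero by construction. Raising the limit above the multiplier makes collisions reappear.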
Guys, the new hash will fail if you test with more than just 100 :) Simple example:
I'm not an expert in ciphers, but you will definitely need XOR: affine transformations will definitely not work. |
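The example from this comment was not preserved. As a sketch of the kind of collision being described, assuming the proposed hash is multiplier * clusterId + clusterPosition with multiplier 103: any pair (c, p + 103) and (c + 1, p) lands on the same value, whichever prime is chosen. The class and method names are hypothetical:

```java
final class AffineHashCounterexample {
    // Assumed affine hash shape: multiplier * clusterId + clusterPosition.
    static int affineHash(int multiplier, int clusterId, long clusterPosition) {
        return multiplier * clusterId + (int) clusterPosition;
    }

    public static void main(String[] args) {
        int m = 103;
        // Moving one step up in clusterId is cancelled by moving m steps down in clusterPosition.
        System.out.println(affineHash(m, 0, 103) == affineHash(m, 1, 0));
        System.out.println(affineHash(m, 7, 500) == affineHash(m, 8, 397));
    }
}
```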
And, btw, there is no requirement in Java for different objects to have different hash codes.
The following function (autogenerated in Eclipse) is much better than the proposed one:
|
Nothing like what you came up with, Ilia, but we were looking at using this very small library for hash IDs. We are thinking more along the lines of URL IDs too, which I know isn't really part of this discussion or issue. The cool thing about this hashing algorithm is that you can pass in a RID and get a nice URL-friendly hash ID. And, when using a salt (the third input), the RID could be tokenized to an extent. Pretty sure this isn't helpful, and sorry if it isn't. But still, I thought I'd throw it into the conversation. Scott |
@PhantomYdn: you're right, but since in a typical use case the clusterId is much smaller than the clusterPosition, this simple "trick" can prevent hash collisions. And even though you're also right that hash collisions are no real issue in Java, they might impact the performance of hash-based sets and maps (as analyzed here: http://www.nurkiewicz.com/2014/04/hashmap-performance-improvements-in.html). So my suggestion was just to multiply the clusterId by a bigger prime to avoid collisions. The current version simply does 31*clusterId+clusterPosition, and this generates many more collisions. In addition, taking the first 32 bits of a long value just cuts off the sign and causes additional collisions (since every new record gets a negative clusterPosition) -> 1:1 collides with 1:-1 and so on...
This code prints: |
The hashCode function of the ORecordId class does not seem to work properly:
Both hash and hash1 return "403" as the integer value.
Version: 2.0.5
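The snippet that originally accompanied this report was not preserved. The collision can be reproduced on paper with the RIDs used elsewhere in this thread (12:31 and 13:0), assuming the 2.0.5 hashCode has the shape 31*clusterId+clusterPosition as stated in the comments. A self-contained sketch that does not depend on OrientDB; legacyHash is a hypothetical stand-in, not the library's actual method:

```java
final class RecordIdHashDemo {
    // Assumed shape of the hashCode under discussion: 31 * clusterId + clusterPosition.
    static int legacyHash(int clusterId, long clusterPosition) {
        return 31 * clusterId + (int) clusterPosition;
    }

    public static void main(String[] args) {
        int hash = legacyHash(12, 31);  // 31 * 12 + 31
        int hash1 = legacyHash(13, 0);  // 31 * 13 + 0
        System.out.println(hash + " " + hash1);
    }
}
```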