Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In place comparison of serializable values. #1826

Closed
andrii0lomakin opened this issue Nov 15, 2013 · 20 comments
Closed

In place comparison of serializable values. #1826

andrii0lomakin opened this issue Nov 15, 2013 · 20 comments
Assignees

Comments

@andrii0lomakin
Copy link
Member

Now when we compare keys in indexes we do following:

  1. deserialize key.
  2. compare it with other keys in bucket.

String keys can be relatively big but to compare them we often need very small piece of them. So following is proposed:

  1. Serialize key passed in to compare.
  2. Provide compare method for serializers so we will compare serialized binary key and one stored in direct memory it will save as sensible amount of operations.
@lvca
Copy link
Member

lvca commented Nov 15, 2013

This idea is VERY VERY cool!

@ghost ghost assigned lvca Nov 15, 2013
@andrii0lomakin
Copy link
Member Author

Forgot to mention if we read bytes one by one to compare from direct memory it will not be fast but we can use memcmp which is standard C library which exist on all platforms like tens years already, this approach will provide good speed. The main bottleneck now for indexes is serialization speed. It should be as fast as possible.

@lvca
Copy link
Member

lvca commented Nov 16, 2013

Is memcmp available as JNA API?

@andrii0lomakin
Copy link
Member Author

It is one line of code.

On Sat, Nov 16, 2013 at 9:49 AM, Luca Garulli notifications@github.comwrote:

Is memcmp available as JNA API?


Reply to this email directly or view it on GitHubhttps://github.com//issues/1826#issuecomment-28621758
.

Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB

@andrii0lomakin
Copy link
Member Author

Actually 4 lines.

public static native int memcmp(long str1, long str2, long len);

static {
Native.register(Platform.C_LIBRARY_NAME);
}

@lvca
Copy link
Member

lvca commented Nov 16, 2013

Cool

@ghost ghost assigned logart and andrii0lomakin Nov 19, 2013
@andrii0lomakin
Copy link
Member Author

Sorry guys no luck here call overhead eats everything. Actually I have doubts when created this issue too (spent several days thinking whether to create this issue or not ))) ) May be will come up with more nice idea about in place comparison later. Good that I did benchmark on prototype.

@lvca
Copy link
Member

lvca commented Nov 20, 2013

Si it's cheaper to call the Java comparison than C ? What's the cost? However you could avoid serialization and do the same using Java, couldn't it?.

@andrii0lomakin
Copy link
Member Author

It is very slow if you compare data byte by byte it is 1/4 slower, if you do in place comparison we need to copy array do native call and as result we have the same performance.

@andrii0lomakin
Copy link
Member Author

Actually you are right, binary comparison in Java may be faster. I will reopen it and try after #856 . It may be faster. Need to try it.

@andrii0lomakin
Copy link
Member Author

It matters of 3 hours to test it. not a lot of time.

@lvca
Copy link
Member

lvca commented Nov 20, 2013

Good!

@lvca
Copy link
Member

lvca commented Aug 28, 2014

We supported this in new binary serialization. Where was you referring in the code for this issue?

@andrii0lomakin
Copy link
Member Author

do not think that we refer to the same approach, do you mean that comparison string lets say "adf" in binary form with "sdf" will have return the same comparison result , as if they are compared in lexicographical order, do not you ?

@lvca
Copy link
Member

lvca commented Aug 29, 2014

When you need to unmarshall a field, it's unmarshalled the field name and passed to the compare, but it does byte-per-byte comparison avoiding to unmarshall the entire key before the comparison.

@andrii0lomakin
Copy link
Member Author

It is related to indexes, but any way, it can be reused.
Does your answer mean that any serialized primitive value can be compared
byte by byte and we can use this comparison result to sort serialized
values in lexicographical order ?

On Fri, Aug 29, 2014 at 3:15 PM, Luca Garulli notifications@github.com
wrote:

When you need to unmarshall a field, it's unmarshalled the field name and
passed to the compare, but it does byte-per-byte comparison avoiding to
unmarshall the entire key before the comparison.


Reply to this email directly or view it on GitHub
#1826 (comment)
.

Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB

@lvca
Copy link
Member

lvca commented Aug 29, 2014

@tglman started this, he's better to answer.

@tglman
Copy link
Member

tglman commented Aug 30, 2014

All dipends on the way data is serialized, so depends also on the types!
in the binary serialization for example Strings are serialized as UTF-8, and as far as i know the UTF-8 encoding support the byte comparison, i think also the Numeric value written as VarInt can be compared in a byte to byte way, we should check that for all the types, for implement this!

If you know already some issue for this let's take note :)

@lvca lvca modified the milestones: 2.1, 2.0rc1 Sep 2, 2014
@lvca lvca modified the milestones: 2.2, 2.1 Jan 31, 2015
@andrii0lomakin andrii0lomakin modified the milestones: 3.0, 2.2 Nov 18, 2015
@andrii0lomakin
Copy link
Member Author

Optional for 3.0

@andrii0lomakin andrii0lomakin removed this from the 3.0 milestone Apr 6, 2016
@lvca lvca added this to the 3.1 milestone Aug 3, 2017
@andrii0lomakin andrii0lomakin modified the milestone: 3.1 Aug 26, 2017
@andrii0lomakin
Copy link
Member Author

Not actual any more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants