In place comparison of serializable values. #1826

Open
laa opened this Issue Nov 15, 2013 · 19 comments

Comments

Projects
None yet
4 participants
Member

laa commented Nov 15, 2013

Now when we compare keys in indexes we do following:

  1. deserialize key.
  2. compare it with other keys in bucket.

String keys can be relatively big but to compare them we often need very small piece of them. So following is proposed:

  1. Serialize key passed in to compare.
  2. Provide compare method for serializers so we will compare serialized binary key and one stored in direct memory it will save as sensible amount of operations.
Owner

lvca commented Nov 15, 2013

This idea is VERY VERY cool!

lvca was assigned Nov 15, 2013

Member

laa commented Nov 16, 2013

Forgot to mention if we read bytes one by one to compare from direct memory it will not be fast but we can use memcmp which is standard C library which exist on all platforms like tens years already, this approach will provide good speed. The main bottleneck now for indexes is serialization speed. It should be as fast as possible.

Owner

lvca commented Nov 16, 2013

Is memcmp available as JNA API?

Member

laa commented Nov 16, 2013

It is one line of code.

On Sat, Nov 16, 2013 at 9:49 AM, Luca Garulli notifications@github.comwrote:

Is memcmp available as JNA API?


Reply to this email directly or view it on GitHubhttps://github.com/orientechnologies/orientdb/issues/1826#issuecomment-28621758
.

Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB

Member

laa commented Nov 16, 2013

Actually 4 lines.

public static native int memcmp(long str1, long str2, long len);

static {
Native.register(Platform.C_LIBRARY_NAME);
}

Owner

lvca commented Nov 16, 2013

Cool

@ghost ghost assigned logart and laa Nov 19, 2013

Member

laa commented Nov 20, 2013

Sorry guys no luck here call overhead eats everything. Actually I have doubts when created this issue too (spent several days thinking whether to create this issue or not ))) ) May be will come up with more nice idea about in place comparison later. Good that I did benchmark on prototype.

laa closed this Nov 20, 2013

Owner

lvca commented Nov 20, 2013

Si it's cheaper to call the Java comparison than C ? What's the cost? However you could avoid serialization and do the same using Java, couldn't it?.

Member

laa commented Nov 20, 2013

It is very slow if you compare data byte by byte it is 1/4 slower, if you do in place comparison we need to copy array do native call and as result we have the same performance.

Member

laa commented Nov 20, 2013

Actually you are right, binary comparison in Java may be faster. I will reopen it and try after #856 . It may be faster. Need to try it.

laa reopened this Nov 20, 2013

Member

laa commented Nov 20, 2013

It matters of 3 hours to test it. not a lot of time.

Owner

lvca commented Nov 20, 2013

Good!

Owner

lvca commented Aug 28, 2014

We supported this in new binary serialization. Where was you referring in the code for this issue?

lvca added the enhancement label Aug 28, 2014

Member

laa commented Aug 29, 2014

do not think that we refer to the same approach, do you mean that comparison string lets say "adf" in binary form with "sdf" will have return the same comparison result , as if they are compared in lexicographical order, do not you ?

Owner

lvca commented Aug 29, 2014

When you need to unmarshall a field, it's unmarshalled the field name and passed to the compare, but it does byte-per-byte comparison avoiding to unmarshall the entire key before the comparison.

Member

laa commented Aug 29, 2014

It is related to indexes, but any way, it can be reused.
Does your answer mean that any serialized primitive value can be compared
byte by byte and we can use this comparison result to sort serialized
values in lexicographical order ?

On Fri, Aug 29, 2014 at 3:15 PM, Luca Garulli notifications@github.com
wrote:

When you need to unmarshall a field, it's unmarshalled the field name and
passed to the compare, but it does byte-per-byte comparison avoiding to
unmarshall the entire key before the comparison.


Reply to this email directly or view it on GitHub
#1826 (comment)
.

Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB

Owner

lvca commented Aug 29, 2014

@tglman started this, he's better to answer.

Member

tglman commented Aug 30, 2014

All dipends on the way data is serialized, so depends also on the types!
in the binary serialization for example Strings are serialized as UTF-8, and as far as i know the UTF-8 encoding support the byte comparison, i think also the Numeric value written as VarInt can be compared in a byte to byte way, we should check that for all the types, for implement this!

If you know already some issue for this let's take note :)

@lvca lvca modified the milestone: 2.1, 2.0rc1 Sep 2, 2014

@lvca lvca modified the milestone: 2.2, 2.1 Jan 31, 2015

@laa laa modified the milestone: 3.0, 2.2 Nov 18, 2015

Member

laa commented Apr 6, 2016

Optional for 3.0

laa removed this from the 3.0 milestone Apr 6, 2016

@laa laa added storage team and removed storage team labels Apr 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment