In place comparison of serializable values. #1826

andrii0lomakin · 2013-11-15T21:10:14Z

Currently, when we compare keys in indexes, we do the following:

Deserialize the key.
Compare it with the other keys in the bucket.

String keys can be relatively big, but to compare them we often need only a very small part of them. So the following is proposed:

Serialize the key passed in for comparison.
Provide a compare method for serializers, so that we compare the serialized binary key with the one stored in direct memory. This will save us a significant number of operations.
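
To make the proposal concrete, here is a minimal sketch of such a compare method in plain Java. It is an illustration only: the class name `InPlaceKeyCompare`, the method signature, and the convention that the stored key's offset and length are passed in explicitly are all assumptions, not OrientDB's actual serializer API. It compares the serialized probe key with bytes read in place from a direct `ByteBuffer` and stops at the first differing byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OrientDB.
final class InPlaceKeyCompare {
  /**
   * Compares a serialized key with the `length` bytes stored at `offset` in `page`.
   * Bytes are treated as unsigned and the loop stops at the first difference,
   * so the stored key is never deserialized or copied.
   */
  static int compare(byte[] key, ByteBuffer page, int offset, int length) {
    int common = Math.min(key.length, length);
    for (int i = 0; i < common; i++) {
      int a = key[i] & 0xFF;
      int b = page.get(offset + i) & 0xFF; // absolute get: buffer position is untouched
      if (a != b) {
        return a < b ? -1 : 1;
      }
    }
    // All shared bytes are equal: the shorter key sorts first.
    return Integer.compare(key.length, length);
  }

  public static void main(String[] args) {
    ByteBuffer page = ByteBuffer.allocateDirect(64);
    byte[] stored = "sdf".getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i < stored.length; i++) {
      page.put(8 + i, stored[i]); // pretend the key lives at offset 8 of the bucket
    }
    byte[] probe = "adf".getBytes(StandardCharsets.UTF_8);
    System.out.println(compare(probe, page, 8, stored.length)); // prints -1
  }
}
```

Whether the result matches the ordering of the deserialized keys depends on the serialization format of each key type.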

lvca · 2013-11-15T22:59:38Z

This idea is VERY VERY cool!

andrii0lomakin · 2013-11-16T07:32:18Z

I forgot to mention: if we read bytes one by one from direct memory to compare them, it will not be fast. But we can use memcmp, a standard C library function that has existed on all platforms for tens of years already; this approach will provide good speed. The main bottleneck for indexes now is serialization speed. It should be as fast as possible.

lvca · 2013-11-16T07:49:10Z

Is memcmp available as JNA API?

andrii0lomakin · 2013-11-16T07:52:30Z

It is one line of code.

andrii0lomakin · 2013-11-16T07:54:33Z

Actually 4 lines.

public static native int memcmp(long str1, long str2, long len);

static {
    Native.register(Platform.C_LIBRARY_NAME);
}

lvca · 2013-11-16T08:21:11Z

Cool

andrii0lomakin · 2013-11-20T07:57:21Z

Sorry guys, no luck here: the call overhead eats everything. Actually, I had doubts when I created this issue too (I spent several days thinking about whether to create it or not). Maybe I will come up with a nicer idea for in-place comparison later. Good that I benchmarked a prototype.

lvca · 2013-11-20T09:26:51Z

So it's cheaper to call the Java comparison than the C one? What's the cost? However, you could avoid serialization and do the same using Java, couldn't you?

andrii0lomakin · 2013-11-20T09:29:22Z

It is very slow: if you compare data byte by byte, it is 1/4 slower; if you do in-place comparison, we need to copy the array and do a native call, and as a result we get the same performance.

andrii0lomakin · 2013-11-20T09:42:07Z

Actually, you are right: binary comparison in Java may be faster. I will reopen this and try it after #856. Need to try it.

andrii0lomakin · 2013-11-20T09:43:25Z

It is a matter of 3 hours to test it, not a lot of time.

lvca · 2013-11-20T09:44:17Z

Good!

lvca · 2014-08-28T09:47:28Z

We support this in the new binary serialization. Which part of the code were you referring to for this issue?

andrii0lomakin · 2014-08-29T11:59:42Z

I do not think we are referring to the same approach. Do you mean that comparing the string, let's say, "adf" in binary form with "sdf" will return the same result as if they were compared in lexicographical order?

lvca · 2014-08-29T12:15:05Z

When you need to unmarshall a field, the field name is unmarshalled and passed to the compare, but it does a byte-by-byte comparison, which avoids unmarshalling the entire key before the comparison.

andrii0lomakin · 2014-08-29T12:18:50Z

It is related to indexes, but anyway, it can be reused.
Does your answer mean that any serialized primitive value can be compared byte by byte, and that we can use this comparison result to sort serialized values in lexicographical order?

lvca · 2014-08-29T12:22:14Z

@tglman started this, so he is the best person to answer.

tglman · 2014-08-30T12:12:57Z

It all depends on how the data is serialized, so it also depends on the types!
In the binary serialization, for example, Strings are serialized as UTF-8, and as far as I know the UTF-8 encoding supports byte comparison. I think numeric values written as VarInt can also be compared byte by byte. We should check that for all the types in order to implement this!

If you already know of any issues with this, let's take note :)
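
One way to run the per-type check suggested above is a small harness that encodes pairs of values and compares the resulting bytes. The sketch below rests on assumptions: the class `OrderPreservationCheck` is hypothetical, and the `varInt` helper implements a generic unsigned base-128 encoding with the least significant group first, which may not match the VarInt layout used by the binary serialization. With that layout, byte order and numeric order can disagree, so each real encoding needs its own check.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical check harness, not part of OrientDB.
final class OrderPreservationCheck {
  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return x < y ? -1 : 1;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Assumed layout: unsigned base-128 varint, least significant 7-bit group first. */
  static byte[] varInt(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80)); // continuation bit set
      value >>>= 7;
    }
    out.write((int) value);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] adf = "adf".getBytes(StandardCharsets.UTF_8);
    byte[] sdf = "sdf".getBytes(StandardCharsets.UTF_8);
    // UTF-8 bytes of "adf" sort before those of "sdf", like the strings themselves.
    System.out.println(compareBytes(adf, sdf) < 0); // prints true

    // 255 encodes as FF 01 and 256 as 80 02, so the bytes sort the other way round.
    System.out.println(compareBytes(varInt(255), varInt(256)) < 0); // prints false
  }
}
```

Note that UTF-8 byte order follows Unicode code point order, while Java's `String.compareTo` compares UTF-16 code units, so the two can differ for strings containing supplementary characters.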

andrii0lomakin · 2016-04-06T06:14:05Z

Optional for 3.0

andrii0lomakin · 2019-10-17T07:03:47Z

No longer relevant.

ghost assigned lvca Nov 15, 2013

ghost assigned logart and andrii0lomakin Nov 19, 2013

andrii0lomakin closed this as completed Nov 20, 2013

andrii0lomakin reopened this Nov 20, 2013

lvca added the enhancement label Aug 28, 2014

lvca modified the milestones: 2.1, 2.0rc1 Sep 2, 2014

lvca modified the milestones: 2.2, 2.1 Jan 31, 2015

andrii0lomakin modified the milestones: 3.0, 2.2 Nov 18, 2015

andrii0lomakin removed this from the 3.0 milestone Apr 6, 2016

andrii0lomakin added storage team and removed storage team labels Apr 12, 2016

lvca added this to the 3.1 milestone Aug 3, 2017

andrii0lomakin modified the milestone: 3.1 Aug 26, 2017

andrii0lomakin removed the storage team label Sep 30, 2019

andrii0lomakin closed this as completed Oct 17, 2019
