Improve Java API get() performance by reducing copies #10970

alanpaxton · 2022-11-21T10:40:00Z

Performance improvements for get() paths in the RocksJava API (JNI).
Document describing the performance results.

Replace uses of the legacy DB::Get() method wrapper returning data in a std::string with direct calls to DB::Get() passing a pinnable slice to receive this data. Copying from a pinned slice direct to the destination java byte array, without going via an intervening std::string, is a major performance gain for this code path.

Note that this gain only comes where DB::Get() is able to return a pinned buffer; where it has to copy into the buffer owned by the slice, there is still the intervening copy and no performance gain. It may be possible to address this case too, but it is not trivial.

adamretter

LGTM

adamretter · 2022-12-05T16:19:21Z

Hi @ajkr @siying, could you take a look at importing/merging this one please?

Benchmarks are a subset of facebook#7597 We just want to benchmark existing JNI methods with some potential performance improvements.

This should reduce by the 1 the number of value copies when the result is in the buffer cache, and can be returned by a PinnableSlice - in which case we can directly copy from the slice into the result Java byte[]

Add baseline benchmark results Fix instructions. Gather next set of results. Discussion

facebook-github-bot · 2022-12-17T21:43:38Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr · 2022-12-20T23:50:30Z

java/GetBenchmarks.md

+GetBenchmarks.preallocatedByteBufferGet        no_column_family        1000        128             N/A        32768  thrpt   25      70765.578 ±    697.548  ops/s
+GetBenchmarks.preallocatedGet                  no_column_family        1000        128             N/A        32768  thrpt   25      69883.554 ±    944.184  ops/s
+
+### After fixing byte[] (.get and .preallocatedGet)


The .get and .preallocatedGet are benchmark functions. I'd find it easier to conclude there's a user-facing improvement if the same (optimized) version of java/jmh were used for both sides of the comparison.

In any case, thanks for the detailed instructions - I was able to try that and indeed found an improvement, especially with large / numerous values. Unlike the results here, mine benefited more from numerous values because PinnableSlice helps with reads from SST files more than it helps with reads from memtable. And the improvements were similar across the different benchmarks.

For the record I saw ~15% faster point lookup with 100K keys, 64KB values, 12-byte keys.

Command (not trying to use much long running features for stability): java -jar java/jmh/target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -wi 1 -f 1 -p keyCount=100000 -p keySize=12 -p valueSize=65536 -p columnFamilyTestType="no_column_family" org.rocksdb.jmh.GetBenchmarks

Control (java/jmh built from PR10970; rocksdbjni.jar built from main) result:

Benchmark (columnFamilyTestType) (keyCount) (keySize) (valueSize) Mode Cnt Score Error Units GetBenchmarks.get no_column_family 100000 12 65536 thrpt 5 29633.553 ± 438.256 ops/s GetBenchmarks.preallocatedByteBufferGet no_column_family 100000 12 65536 thrpt 5 50115.108 ± 2197.008 ops/s GetBenchmarks.preallocatedGet no_column_family 100000 12 65536 thrpt 5 44642.995 ± 2717.312 ops/s

Experimental (java/jmh and rocksdbjni.jar built from PR10970) result:

Benchmark (columnFamilyTestType) (keyCount) (keySize) (valueSize) Mode Cnt Score Error Units GetBenchmarks.get no_column_family 100000 12 65536 thrpt 5 34735.884 ± 1569.739 ops/s GetBenchmarks.preallocatedByteBufferGet no_column_family 100000 12 65536 thrpt 5 58898.514 ± 5975.174 ops/s GetBenchmarks.preallocatedGet no_column_family 100000 12 65536 thrpt 5 53611.361 ± 1280.867 ops/s

Thanks Andrew. I retried and failed to reproduce those spectacular Mac results, and I'm not sure exactly what happened as I was building both this branch and the "baseline" branch I used in tests in consistent fashion.
Anyway, as your work and the Ubuntu tests confirm we have a small but useful improvement.

And thanks for the explanation re pinnable-slices and memtable vs SSTs. That's helpful for my understanding.

facebook-github-bot · 2022-12-21T19:56:42Z

@ajkr merged this pull request in f8969ad.

facebook-github-bot added the CLA Signed label Nov 21, 2022

alanpaxton marked this pull request as draft November 21, 2022 10:40

alanpaxton force-pushed the eb-1631-get-performance branch 3 times, most recently from e20f0b0 to 9b758fa Compare November 24, 2022 09:12

alanpaxton marked this pull request as ready for review November 24, 2022 09:27

alanpaxton force-pushed the eb-1631-get-performance branch from 5763550 to 8896a47 Compare December 5, 2022 08:33

adamretter self-requested a review December 5, 2022 16:18

adamretter approved these changes Dec 5, 2022

View reviewed changes

alanpaxton added 14 commits December 16, 2022 10:06

Improved get() benchmarks from @alucarded

5f17bfb

Benchmarks are a subset of facebook#7597 We just want to benchmark existing JNI methods with some potential performance improvements.

Report benchmarks before perf fixes

7139c34

Update rocksdb version for jmh

a0ace0d

Potential Java API get() performance improvement

f1428e4

This should reduce by the 1 the number of value copies when the result is in the buffer cache, and can be returned by a PinnableSlice - in which case we can directly copy from the slice into the result Java byte[]

Add instructions, fix build

ccb9bf9

Restrict instructions to GetBenchmarks

fc8be51

Extend performance fix to ByteBuffer (direct)

91c3649

Update benchmark results

a99e503

Add longer running benchmark results

92b3657

java source should be java 8

0d79eb8

More words in the benchmark markdown

a6c1781

format fix

eddfc34

Polish report/document of JMH tests for get()

a7fbb16

Add baseline benchmark results Fix instructions. Gather next set of results. Discussion

One more piece of explicative text

edf18f5

alanpaxton force-pushed the eb-1631-get-performance branch from 8896a47 to edf18f5 Compare December 16, 2022 10:06

ajkr approved these changes Dec 21, 2022

View reviewed changes

facebook-github-bot closed this in f8969ad Dec 21, 2022

facebook-github-bot added the Merged label Dec 21, 2022

igorcanadi mentioned this pull request Jan 17, 2024

[SYS-6913] Upgrade RocksDB-Cloud to 8.9.1 rockset/rocksdb-cloud#315

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Java API get() performance by reducing copies #10970

Improve Java API get() performance by reducing copies #10970

alanpaxton commented Nov 21, 2022 •

edited

adamretter left a comment

adamretter commented Dec 5, 2022

facebook-github-bot commented Dec 17, 2022

ajkr Dec 20, 2022

ajkr Dec 21, 2022

alanpaxton Dec 21, 2022

facebook-github-bot commented Dec 21, 2022

Improve Java API get() performance by reducing copies #10970

Improve Java API get() performance by reducing copies #10970

Conversation

alanpaxton commented Nov 21, 2022 • edited

adamretter left a comment

Choose a reason for hiding this comment

adamretter commented Dec 5, 2022

facebook-github-bot commented Dec 17, 2022

ajkr Dec 20, 2022

Choose a reason for hiding this comment

ajkr Dec 21, 2022

Choose a reason for hiding this comment

alanpaxton Dec 21, 2022

Choose a reason for hiding this comment

facebook-github-bot commented Dec 21, 2022

alanpaxton commented Nov 21, 2022 •

edited