New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Java API get() performance by reducing copies #10970
Conversation
e20f0b0
to
9b758fa
Compare
5763550
to
8896a47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Benchmarks are a subset of facebook#7597 We just want to benchmark existing JNI methods with some potential performance improvements.
This should reduce by the 1 the number of value copies when the result is in the buffer cache, and can be returned by a PinnableSlice - in which case we can directly copy from the slice into the result Java byte[]
Add baseline benchmark results Fix instructions. Gather next set of results. Discussion
8896a47
to
edf18f5
Compare
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
GetBenchmarks.preallocatedByteBufferGet no_column_family 1000 128 N/A 32768 thrpt 25 70765.578 ± 697.548 ops/s | ||
GetBenchmarks.preallocatedGet no_column_family 1000 128 N/A 32768 thrpt 25 69883.554 ± 944.184 ops/s | ||
|
||
### After fixing byte[] (.get and .preallocatedGet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .get and .preallocatedGet are benchmark functions. I'd find it easier to conclude there's a user-facing improvement if the same (optimized) version of java/jmh were used for both sides of the comparison.
In any case, thanks for the detailed instructions - I was able to try that and indeed found an improvement, especially with large / numerous values. Unlike the results here, mine benefited more from numerous values because PinnableSlice
helps with reads from SST files more than it helps with reads from memtable. And the improvements were similar across the different benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record I saw ~15% faster point lookup with 100K keys, 64KB values, 12-byte keys.
Command (not trying to use much long running features for stability): java -jar java/jmh/target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -wi 1 -f 1 -p keyCount=100000 -p keySize=12 -p valueSize=65536 -p columnFamilyTestType="no_column_family" org.rocksdb.jmh.GetBenchmarks
Control (java/jmh built from PR10970; rocksdbjni.jar built from main) result:
Benchmark (columnFamilyTestType) (keyCount) (keySize) (valueSize) Mode Cnt Score Error Units
GetBenchmarks.get no_column_family 100000 12 65536 thrpt 5 29633.553 ± 438.256 ops/s
GetBenchmarks.preallocatedByteBufferGet no_column_family 100000 12 65536 thrpt 5 50115.108 ± 2197.008 ops/s
GetBenchmarks.preallocatedGet no_column_family 100000 12 65536 thrpt 5 44642.995 ± 2717.312 ops/s
Experimental (java/jmh and rocksdbjni.jar built from PR10970) result:
Benchmark (columnFamilyTestType) (keyCount) (keySize) (valueSize) Mode Cnt Score Error Units
GetBenchmarks.get no_column_family 100000 12 65536 thrpt 5 34735.884 ± 1569.739 ops/s
GetBenchmarks.preallocatedByteBufferGet no_column_family 100000 12 65536 thrpt 5 58898.514 ± 5975.174 ops/s
GetBenchmarks.preallocatedGet no_column_family 100000 12 65536 thrpt 5 53611.361 ± 1280.867 ops/s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Andrew. I retried and failed to reproduce those spectacular Mac results, and I'm not sure exactly what happened as I was building both this branch and the "baseline" branch I used in tests in consistent fashion.
Anyway, as your work and the Ubuntu tests confirm we have a small but useful improvement.
And thanks for the explanation re pinnable-slices and memtable vs SSTs. That's helpful for my understanding.
Performance improvements for
get()
paths in the RocksJava API (JNI).Document describing the performance results.
Replace uses of the legacy
DB::Get()
method wrapper returning data in astd::string
with direct calls toDB::Get()
passing a pinnable slice to receive this data. Copying from a pinned slice direct to the destination java byte array, without going via an intervening std::string, is a major performance gain for this code path.Note that this gain only comes where
DB::Get()
is able to return a pinned buffer; where it has to copy into the buffer owned by the slice, there is still the intervening copy and no performance gain. It may be possible to address this case too, but it is not trivial.