Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Java API get() performance by reducing copies #10970

Closed
wants to merge 14 commits into from

Conversation

alanpaxton
Copy link
Contributor

@alanpaxton alanpaxton commented Nov 21, 2022

Performance improvements for get() paths in the RocksJava API (JNI).
Document describing the performance results.

Replace uses of the legacy DB::Get() method wrapper returning data in a std::string with direct calls to DB::Get() passing a pinnable slice to receive this data. Copying from a pinned slice direct to the destination java byte array, without going via an intervening std::string, is a major performance gain for this code path.

Note that this gain only comes where DB::Get() is able to return a pinned buffer; where it has to copy into the buffer owned by the slice, there is still the intervening copy and no performance gain. It may be possible to address this case too, but it is not trivial.

@alanpaxton alanpaxton marked this pull request as draft November 21, 2022 10:40
@alanpaxton alanpaxton force-pushed the eb-1631-get-performance branch 3 times, most recently from e20f0b0 to 9b758fa Compare November 24, 2022 09:12
@alanpaxton alanpaxton marked this pull request as ready for review November 24, 2022 09:27
@adamretter adamretter self-requested a review December 5, 2022 16:18
Copy link
Collaborator

@adamretter adamretter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adamretter
Copy link
Collaborator

Hi @ajkr @siying, could you take a look at importing/merging this one please?

Benchmarks are a subset of facebook#7597

We just want to benchmark existing JNI methods with some potential performance improvements.
This should reduce by the 1 the number of value copies when the result is in the buffer cache, and can be returned by a PinnableSlice - in which case we can directly copy from the slice into the result Java byte[]
Add baseline benchmark results
Fix instructions. Gather next set of results.
Discussion
@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

GetBenchmarks.preallocatedByteBufferGet no_column_family 1000 128 N/A 32768 thrpt 25 70765.578 ± 697.548 ops/s
GetBenchmarks.preallocatedGet no_column_family 1000 128 N/A 32768 thrpt 25 69883.554 ± 944.184 ops/s

### After fixing byte[] (.get and .preallocatedGet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .get and .preallocatedGet are benchmark functions. I'd find it easier to conclude there's a user-facing improvement if the same (optimized) version of java/jmh were used for both sides of the comparison.

In any case, thanks for the detailed instructions - I was able to try that and indeed found an improvement, especially with large / numerous values. Unlike the results here, mine benefited more from numerous values because PinnableSlice helps with reads from SST files more than it helps with reads from memtable. And the improvements were similar across the different benchmarks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record I saw ~15% faster point lookup with 100K keys, 64KB values, 12-byte keys.

Command (not trying to use much long running features for stability): java -jar java/jmh/target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -wi 1 -f 1 -p keyCount=100000 -p keySize=12 -p valueSize=65536 -p columnFamilyTestType="no_column_family" org.rocksdb.jmh.GetBenchmarks

Control (java/jmh built from PR10970; rocksdbjni.jar built from main) result:

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error  Units
GetBenchmarks.get                              no_column_family      100000         12        65536  thrpt    5  29633.553 ±  438.256  ops/s
GetBenchmarks.preallocatedByteBufferGet        no_column_family      100000         12        65536  thrpt    5  50115.108 ± 2197.008  ops/s
GetBenchmarks.preallocatedGet                  no_column_family      100000         12        65536  thrpt    5  44642.995 ± 2717.312  ops/s

Experimental (java/jmh and rocksdbjni.jar built from PR10970) result:

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error  Units
GetBenchmarks.get                              no_column_family      100000         12        65536  thrpt    5  34735.884 ± 1569.739  ops/s
GetBenchmarks.preallocatedByteBufferGet        no_column_family      100000         12        65536  thrpt    5  58898.514 ± 5975.174  ops/s
GetBenchmarks.preallocatedGet                  no_column_family      100000         12        65536  thrpt    5  53611.361 ± 1280.867  ops/s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew. I retried and failed to reproduce those spectacular Mac results, and I'm not sure exactly what happened as I was building both this branch and the "baseline" branch I used in tests in consistent fashion.
Anyway, as your work and the Ubuntu tests confirm we have a small but useful improvement.

And thanks for the explanation re pinnable-slices and memtable vs SSTs. That's helpful for my understanding.

@facebook-github-bot
Copy link
Contributor

@ajkr merged this pull request in f8969ad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants