Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JMH microbenchmarks for RocksJava #6241

Closed
wants to merge 8 commits into from

Conversation

adamretter
Copy link
Collaborator

@adamretter adamretter commented Dec 23, 2019

This is the start of some JMH microbenchmarks for RocksJava.

Such benchmarks can help us decide on performance improvements of the Java API.

At the moment, I have only added benchmarks for various Comparator options, as that is one of the first areas where I want to improve performance. I plan to expand this to many more tests.

Details of how to compile and run the benchmarks are in the README.md.

A run of these on a XEON 3.5 GHz 4vCPU (QEMU Virtual CPU version 2.5+) / 8GB RAM KVM with Ubuntu 18.04, OpenJDK 1.8.0_232, and gcc 8.3.0 produced the following:

# Run complete. Total time: 01:43:17

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                         (comparatorName)   Mode  Cnt       Score       Error  Units
ComparatorBenchmarks.put                           native_bytewise thrpt   25   122373.920 ±  2200.538  ops/s
ComparatorBenchmarks.put              java_bytewise_adaptive_mutex thrpt   25    17388.201 ±  1444.006  ops/s
ComparatorBenchmarks.put          java_bytewise_non-adaptive_mutex thrpt   25    16887.150 ±  1632.204  ops/s
ComparatorBenchmarks.put       java_direct_bytewise_adaptive_mutex thrpt   25    15644.572 ±  1791.189  ops/s
ComparatorBenchmarks.put   java_direct_bytewise_non-adaptive_mutex thrpt   25    14869.601 ±  2252.135  ops/s
ComparatorBenchmarks.put                   native_reverse_bytewise thrpt   25   116528.735 ±  4168.797  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_adaptive_mutex thrpt   25    10651.975 ±   545.998  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-adaptive_mutex thrpt   25    10514.224 ±   930.069  ops/s

Indicating a ~7x difference between comparators implemented natively (C++) and those implemented in Java. Let's see if we can't improve on that in the near future...

@adamretter
Copy link
Collaborator Author

@koldat these may be of interest to you too ;-)

@adamretter adamretter force-pushed the feature/rocksdbjni-jmh branch 2 times, most recently from 8f5fcf8 to 0e55e04 Compare January 2, 2020 17:48
Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just clarify "JMH", such as how I suggest.

@@ -0,0 +1,18 @@
# JMH Benchmarks for RocksJava

These are JMH micro-benchmarks for RocksJava functionality.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (and probably others) am not familiar with JMH. Suggest:

These are micro-benchmarks for RocksJava functionality, using JMH (Java Microbenchmark Harness - https://openjdk.java.net/projects/code-tools/jmh/).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, all done :-)

@@ -0,0 +1,18 @@
# JMH Benchmarks for RocksJava

These are micro-benchmarks for RocksJava functionality, using [JMH](Java Microbenchmark Harness - https://openjdk.java.net/projects/code-tools/jmh/).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: malformed markdown link

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have imported it to Phabricator. Should I still push an update to fix the markdown?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm sending mixed signals. Sorry. Yes, please update.

Phabricator import is required for our internal validation, and that's more useful to run early enough to share results with the author, saving round trips. We (or at least I) treat the PR as source of truth until closed, but it probably makes sense to assume we might be trying to land the PR if it's accepted and imported. So I'll try to be decisive about accepted vs. request changes.

For this PR, internal checks saw that LICENSE-HEADER.txt is missing a trailing newline. Please fix.

If you generally don't mind me pushing cosmetic things to your PR branches, I can probably fix them that way faster than round-trip. Let me know.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pdillinger. I just fixed those two small things and pushed.
I don't mind you pushing cosmetic commits at all, after all open-source is a team sport ;-)

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@adamretter has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 6477075.

facebook-github-bot pushed a commit that referenced this pull request Feb 3, 2020
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.

**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.

Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator` which would use direct and non-direct byte-buffers resepectively (via `DirectSlice` and `Slice`).

In this redesign there we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.

In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.

 ---
[JMH benchmarks previously indicated](#6241 (comment)) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.

With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse mutextes, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces slowdown to ~5.2x in Java (closes #4425).

These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.

 ---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.

```
ComparatorBenchmarks.put                                                native_bytewise  thrpt   25  124483.795 ± 2032.443  ops/s
ComparatorBenchmarks.put                                        native_reverse_bytewise  thrpt   25  114414.536 ± 3486.156  ops/s
ComparatorBenchmarks.put              java_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   17228.250 ± 1288.546  ops/s
ComparatorBenchmarks.put          java_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   16035.865 ± 1248.099  ops/s
ComparatorBenchmarks.put                java_bytewise_non-direct_reused-64_thread-local  thrpt   25   21571.500 ±  871.521  ops/s
ComparatorBenchmarks.put                  java_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   23613.773 ± 8465.660  ops/s
ComparatorBenchmarks.put              java_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   16768.172 ± 5618.489  ops/s
ComparatorBenchmarks.put                    java_bytewise_direct_reused-64_thread-local  thrpt   25   23921.164 ± 8734.742  ops/s
ComparatorBenchmarks.put                              java_bytewise_non-direct_no-reuse  thrpt   25   17899.684 ±  839.679  ops/s
ComparatorBenchmarks.put                                  java_bytewise_direct_no-reuse  thrpt   25   22148.316 ± 1215.527  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   11311.126 ±  820.602  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   11421.311 ±  807.210  ops/s
ComparatorBenchmarks.put        java_reverse_bytewise_non-direct_reused-64_thread-local  thrpt   25   11554.005 ±  960.556  ops/s
ComparatorBenchmarks.put          java_reverse_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   22960.523 ± 1673.421  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   18293.317 ± 1434.601  ops/s
ComparatorBenchmarks.put            java_reverse_bytewise_direct_reused-64_thread-local  thrpt   25   24479.361 ± 2157.306  ops/s
ComparatorBenchmarks.put                      java_reverse_bytewise_non-direct_no-reuse  thrpt   25    7942.286 ±  626.170  ops/s
ComparatorBenchmarks.put                          java_reverse_bytewise_direct_no-reuse  thrpt   25   11781.955 ± 1019.843  ops/s
```
Pull Request resolved: #6252

Differential Revision: D19331064

Pulled By: pdillinger

fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
@adamretter adamretter deleted the feature/rocksdbjni-jmh branch April 29, 2020 18:41
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants