Replace ConcurrentHashMap with ConcurrentLongHashMap: Massive Win #1703
@matthiasblaesing Ok, this change turns out to be a massive win from my perspective. While the new code is marginally faster, the actual goal is memory performance: specifically, a reduction of garbage and therefore of garbage collection (GC) work.
Background
Essentially, this is a change from `ConcurrentHashMap` to a replacement class, `ConcurrentLongHashMap`. Besides the implementation of `ConcurrentLongHashMap`, the only code change is to the `Memory` class, as seen here:
So, what is going on? In essence, the `Long` key in the hash map is the `peer` address of a block of allocated memory. This must be tracked because every allocated block of memory must be freed eventually. The tracking happens like this:
Important
The `peer` address is actually a native `long` value, so what happens when you need to use that as a key in a Java hash map? The answer is autoboxing, which surfaces an often overlooked issue: The Dark Side of Java Autoboxing: Hidden Performance Pitfalls
Every time a block of memory is allocated, basically the lifeblood of JNA, the `peer` must be autoboxed to a `Long` object when it is stored, and the `peer` value must be autoboxed again for the `allocatedMemory.remove(peer)` operation.
This results in a large number of `Long` "garbage" objects that must ultimately be garbage collected.
Note
Again, given that native memory allocation and deallocation are the lifeblood of JNA, this can have a substantial effect on the performance of the system as a whole by overworking the GC.
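To make the boxing visible, here is a minimal sketch of the tracking pattern described above. It is not the actual JNA `Memory` source: the class name `BoxedPeerTracking`, the `track`/`untrack` helpers, and the sample address are made up for illustration. Only the `allocatedMemory` map name and the `remove(peer)` call come from the description.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the tracking pattern; NOT the real JNA Memory class.
final class BoxedPeerTracking {
    // The key type is Long, so a primitive long key must be boxed on every call.
    static final Map<Long, Reference<Object>> allocatedMemory = new ConcurrentHashMap<>();

    static void track(long peer, Object owner) {
        // The compiler rewrites this to put(Long.valueOf(peer), ...).
        allocatedMemory.put(peer, new WeakReference<>(owner));
    }

    static void untrack(long peer) {
        // remove(Object) takes an Object, so peer is boxed a second time here.
        allocatedMemory.remove(peer);
    }

    public static void main(String[] args) {
        long peer = 0x7f00_1234_5678L; // made-up native address
        Object owner = new Object();
        track(peer, owner);
        System.out.println("tracked: " + allocatedMemory.containsKey(peer));
        untrack(peer);
        System.out.println("empty after untrack: " + allocatedMemory.isEmpty());
    }
}
```

`Long.valueOf` is only required to cache values from -128 to 127, so for realistic addresses each of these calls will typically allocate a fresh `Long`, unless the JIT manages to eliminate the allocation.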
ConcurrentLongHashMap
The `ConcurrentLongHashMap` implementation comes from the Apache Artemis project, under the Apache License v2, located here: ConcurrentLongHashMap.java
The implementation provides a concurrent hash map where the key is pinned as a primitive `long`. This means that when adding or removing objects keyed by the `peer` address, no autoboxing occurs.
Note
The Apache Artemis project replaced `ConcurrentHashMap<Long, V>` with this class for the same reason that I am recommending it here: the Java implementation by its nature generates a lot of garbage as a side effect.
Benchmarking
The open question that I had was: if Apache Artemis saw the need to replace `ConcurrentHashMap`, what is the overall benefit?
I created a JMH benchmark to measure the GC impact using the JMH `GCProfiler` functionality. The JMH benchmark Maven project is attached as a .zip at the bottom of this write-up. It includes a shell script named `runbench.sh` and can be run like so:
Not that it particularly matters, but `runbench.sh` takes an optional parameter to test different garbage collection algorithms:
    ./runbench.sh G1          # the default
    ./runbench.sh Shenandoah
    ./runbench.sh ZGC
Because the amount of garbage generated is the same regardless of GC algorithm, the relative impact remains proportional no matter which is used. It was just a side exercise for curiosity.
I will show the "raw" results first, but will break them down more clearly in a pivot table afterwards. In the output, "CHM" as the mapType is the Java `ConcurrentHashMap` and "CUSTOM" is `ConcurrentLongHashMap`.
Here are the same results in a more readable pivot table. Values are rounded to the nearest whole number. 👉 Note, except for Throughput (ops/s), lower values are better. CHM is the Java `ConcurrentHashMap` and CLHM is the `ConcurrentLongHashMap`.
As you can see, CLHM is slightly faster, but it just destroys CHM in the size of garbage, the number of GCs required during each iteration, and the amount of time consumed by GC.
Note
Certainly, the majority of the garbage generated by CHM is due to autoboxing `long` values, but there are also likely differences between the two implementations that are responsible for more/less garbage in the internal data structures. Internally they work in very different ways.
Again, I think this is a huge win in terms of system performance as a whole, in which JNA is playing a part. Garbage Collection is the hidden thief of Java, stealing performance in a way mostly invisible from the developer unless they are specifically looking for it.
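As a rough illustration of how a primitive-keyed table can avoid per-operation garbage, here is a deliberately simplified sketch. It is not the Artemis `ConcurrentLongHashMap`: the class name `LongKeyTable` is hypothetical, and coarse `synchronized` locking stands in for real concurrency control. It stores keys in a `long[]` with linear probing, so `put`, `get` and `remove` allocate neither a `Long` nor a per-entry node object. `java.util.concurrent.ConcurrentHashMap`, by contrast, creates a node object for each new mapping.

```java
// Hypothetical, simplified illustration only; NOT the Artemis ConcurrentLongHashMap.
@SuppressWarnings("unchecked")
final class LongKeyTable<V> {
    private long[] keys = new long[16];        // primitive keys, no Long objects
    private Object[] values = new Object[16];  // null marks an empty slot
    private int size;

    synchronized int size() { return size; }

    synchronized V get(long key) {
        return (V) values[slotFor(key)];
    }

    synchronized V put(long key, V value) {
        if (value == null) throw new NullPointerException("null values are not supported");
        if ((size + 1) * 3 > keys.length * 2) grow(); // keep load factor at or below 2/3
        int i = slotFor(key);
        V old = (V) values[i];
        keys[i] = key;
        values[i] = value;
        if (old == null) size++;
        return old;
    }

    synchronized V remove(long key) {
        int mask = keys.length - 1;
        int hole = slotFor(key);
        V old = (V) values[hole];
        if (old == null) return null;
        size--;
        // Backward-shift deletion: close the gap so later probes still find their keys.
        int j = hole;
        while (true) {
            j = (j + 1) & mask;
            if (values[j] == null) break;
            int home = hash(keys[j]) & mask;
            // Move entry j into the hole unless its home slot lies cyclically in (hole, j].
            if (((j - home) & mask) >= ((j - hole) & mask)) {
                keys[hole] = keys[j];
                values[hole] = values[j];
                hole = j;
            }
        }
        values[hole] = null;
        return old;
    }

    // Returns the slot holding key, or the first empty slot on its probe path.
    private int slotFor(long key) {
        int mask = keys.length - 1;
        int i = hash(key) & mask;
        while (values[i] != null && keys[i] != key) i = (i + 1) & mask;
        return i;
    }

    private void grow() {
        long[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new Object[oldValues.length * 2];
        for (int k = 0; k < oldKeys.length; k++) {
            if (oldValues[k] != null) {
                int i = slotFor(oldKeys[k]);
                keys[i] = oldKeys[k];
                values[i] = oldValues[k];
            }
        }
    }

    private static int hash(long key) {
        // Multiplicative mixing so that aligned addresses spread across slots.
        return (int) ((key * 0x9E3779B97F4A7C15L) >>> 32);
    }

    public static void main(String[] args) {
        LongKeyTable<String> table = new LongKeyTable<>();
        long peer = 0x7f00_1234_5678L; // made-up native address
        table.put(peer, "block");      // put(long, V): no boxing of the key
        System.out.println(table.get(peer));
        table.remove(peer);
        System.out.println(table.size());
    }
}
```

The only allocations left in this sketch are the backing arrays when the table grows, which happens rarely compared with the number of `put` and `remove` calls.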
Here is the benchmark. Use `runbench.sh` to build and execute in a single step.
chm-bench.zip