@brettwooldridge
Contributor

@brettwooldridge brettwooldridge commented Nov 18, 2025

@matthiasblaesing Ok, this change turns out to be a massive win from my perspective. While the new code is marginally faster, the actual goal is memory performance: specifically, reducing garbage and therefore the work done by Garbage Collection (GC).

Background

Essentially, this is a change from ConcurrentHashMap to a replacement class ConcurrentLongHashMap. Besides the implementation of ConcurrentLongHashMap, the only code change is to the Memory class, as seen here:

-    private static final Map<Long, Reference<Memory>> allocatedMemory =
-            new ConcurrentHashMap<>();
+    private static final ConcurrentLongHashMap<Reference<Memory>> allocatedMemory =
+            new ConcurrentLongHashMap<>();

So, what is going on? In essence, the Long key in the hash map is the peer address of a block of allocated memory. This must be tracked because every allocated block of memory must eventually be freed. The tracking happens like this:

allocatedMemory.put(peer, new WeakReference<>(this));

Important

The peer address is actually a primitive long value, so what happens when you need to use it as a key in a Java hash map? The answer is autoboxing, which surfaces an often overlooked issue:

The Dark Side of Java Autoboxing: Hidden Performance Pitfalls

Allocating memory is basically the lifeblood of JNA. Every time a block is allocated, the peer must be autoboxed to a Long object when it is stored, and the peer value must be autoboxed again for the allocatedMemory.remove(peer) operation.

This results in a large number of Long "garbage" objects that must ultimately be garbage collected.
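To make the boxing cost concrete, here is a minimal sketch (not code from JNA). The class name `BoxingDemo` and the sample address are hypothetical. It relies on the documented behavior that `Long.valueOf` always caches values in the range -128 to 127, while values outside that range, such as real native addresses, are typically boxed into a fresh object on every call.

```java
public class BoxingDemo {

    // Boxes the same primitive twice and reports whether two distinct
    // Long objects were created. The identity comparison (!=) is deliberate.
    static boolean boxesAreDistinct(long peer) {
        Long first = peer;   // javac emits Long.valueOf(peer)
        Long second = peer;  // a second Long.valueOf(peer) call
        return first != second;
    }

    public static void main(String[] args) {
        long peer = 0x7f0012345000L; // hypothetical native address

        // Outside the small-value cache: typically a new Long per boxing.
        System.out.println("address boxed twice, distinct objects: "
                + boxesAreDistinct(peer));

        // Inside the guaranteed cache range (-128..127): the same object.
        System.out.println("42 boxed twice, distinct objects: "
                + boxesAreDistinct(42L));
    }
}
```

Since peer addresses are essentially never in the cached range, each `put(peer, ...)` and `remove(peer)` on a `Map<Long, V>` can be expected to allocate a short-lived `Long`.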

Note

Again, given that native memory allocation and deallocation are the lifeblood of JNA this can have a substantial effect on the overall performance of the system as a whole due to overworking GC.

ConcurrentLongHashMap

The ConcurrentLongHashMap implementation comes from the Apache Artemis project, under the Apache License v2, located here:

ConcurrentLongHashMap.java

The implementation provides a concurrent hash map where the key is pinned as a primitive long. This means that no autoboxing occurs when adding or removing objects keyed by the peer address.
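For readers unfamiliar with primitive-keyed maps, the following is a minimal sketch of the general idea. It is not the Artemis implementation: the class name `LongKeyMap` is hypothetical, and it assumes open addressing with linear probing plus coarse `synchronized` methods for thread safety, which is far simpler (and less scalable) than a real concurrent design. The point is only that keys live in a `long[]`, so `put` and `remove` never box.

```java
@SuppressWarnings("unchecked")
public class LongKeyMap<V> {
    private long[] keys;
    private Object[] values; // a null value marks an empty slot
    private int size;

    public LongKeyMap(int expectedSize) {
        int cap = 16;
        while (cap < expectedSize * 2) cap <<= 1; // power of two, load <= 0.5
        keys = new long[cap];
        values = new Object[cap];
    }

    // Mixes the key bits and maps them onto a table index.
    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }

    public synchronized int size() { return size; }

    public synchronized V put(long key, V value) {
        if (value == null) throw new NullPointerException("null values unsupported");
        if ((size + 1) * 2 > keys.length) resize();
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != null) {
            if (keys[i] == key) {           // existing key: replace value
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
            i = (i + 1) & mask;             // linear probe
        }
        keys[i] = key;
        values[i] = value;
        size++;
        return null;
    }

    public synchronized V get(long key) {
        int mask = keys.length - 1;
        for (int i = slot(key, mask); values[i] != null; i = (i + 1) & mask) {
            if (keys[i] == key) return (V) values[i];
        }
        return null;
    }

    public synchronized V remove(long key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != null && keys[i] != key) i = (i + 1) & mask;
        if (values[i] == null) return null;
        V old = (V) values[i];
        // Backward-shift deletion: pull later entries of the probe chain
        // into the hole so lookups never stop early at a gap.
        int j = i;
        while (true) {
            j = (j + 1) & mask;
            if (values[j] == null) break;
            int home = slot(keys[j], mask);
            if (((j - home) & mask) >= ((j - i) & mask)) {
                keys[i] = keys[j];
                values[i] = values[j];
                i = j;
            }
        }
        values[i] = null;
        size--;
        return old;
    }

    private void resize() {
        long[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new Object[oldValues.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldValues[i] != null) put(oldKeys[i], (V) oldValues[i]);
        }
    }
}
```

With a map like this, a call shaped like `allocatedMemory.put(peer, ref)` passes the `long` straight through, and the only steady-state allocations are the occasional table resizes.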

Note

The Apache Artemis project replaced ConcurrentHashMap<Long, V> with this class for the same reason that I am recommending it here -- the Java implementation by its nature generates a lot of garbage as a side-effect.

Benchmarking

The open question that I had was: if Apache Artemis saw the need to replace ConcurrentHashMap, what is the overall benefit?

I created a JMH benchmark to measure the GC impact using the JMH GCProfiler functionality. The JMH benchmark Maven project is attached as a .zip at the bottom of this write-up. It includes a shell script named runbench.sh and can be run like so:

./runbench.sh

Not that it particularly matters, but runbench.sh takes an optional parameter to test different garbage collection algorithms:

./runbench.sh G1 # the default
./runbench.sh Shenandoah
./runbench.sh ZGC

Because the amount of garbage generated is the same regardless of GC algorithm, the relative impact remains proportional no matter which is used. It was just a side exercise for curiosity.

I will show the "raw" results first, but will break them down more clearly in a pivot table afterwards. In the output, "CHM" as the mapType is the Java ConcurrentHashMap and "CUSTOM" is ConcurrentLongHashMap.

Benchmark                                              (mapType)   Mode  Cnt      Score      Error   Units
ConcurrentMapGcBenchmark.addRemove                           CHM  thrpt   12  24453.196 ±  991.067  ops/ms
ConcurrentMapGcBenchmark.addRemove:gc.alloc.rate             CHM  thrpt   12   2440.423 ±   98.577  MB/sec
ConcurrentMapGcBenchmark.addRemove:gc.alloc.rate.norm        CHM  thrpt   12    104.728 ±    0.042    B/op
ConcurrentMapGcBenchmark.addRemove:gc.count                  CHM  thrpt   12    235.000             counts
ConcurrentMapGcBenchmark.addRemove:gc.time                   CHM  thrpt   12    333.000                 ms

ConcurrentMapGcBenchmark.addRemove                        CUSTOM  thrpt   12  27789.730 ± 1042.749  ops/ms
ConcurrentMapGcBenchmark.addRemove:gc.alloc.rate          CUSTOM  thrpt   12    672.000 ±   24.577  MB/sec
ConcurrentMapGcBenchmark.addRemove:gc.alloc.rate.norm     CUSTOM  thrpt   12     25.364 ±    0.036    B/op
ConcurrentMapGcBenchmark.addRemove:gc.count               CUSTOM  thrpt   12     60.000             counts
ConcurrentMapGcBenchmark.addRemove:gc.time                CUSTOM  thrpt   12     39.000                 ms

Here are the same results in a more readable pivot table. Values are rounded to the nearest whole number. 👉 Note: except for throughput (ops/ms), lower values are better. CHM is the Java ConcurrentHashMap and CLHM is the ConcurrentLongHashMap.

| Benchmark | CHM | CLHM |
|---|---|---|
| Throughput (ops/ms) | 24453 | 27790 |
| gc.alloc.rate (MB/s) | 2440 | 672 |
| gc.alloc.rate.norm (Bytes/op) | 105 | 25 |
| gc.count | 235 | 60 |
| gc.time (ms) | 333 | 39 |

As you can see, CLHM is slightly faster (about 14% higher throughput), but it just destroys CHM on garbage: about 76% fewer bytes allocated per operation, about 74% fewer collections, and about 88% less time consumed by GC.

Note

Certainly, the majority of the garbage generated by CHM is due to autoboxing long values, but there are also likely differences between the two implementations that account for more or less garbage in their internal data structures. Internally they work in very different ways.
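As a rough cross-check that does not require JMH, per-thread allocation can be sampled directly. The sketch below is an assumption-laden probe, not part of the attached benchmark: the class name `BoxingAllocationProbe` and the synthetic addresses are hypothetical, and it depends on the HotSpot-specific `com.sun.management.ThreadMXBean`, which may be unavailable or disabled on other JVMs. It measures bytes allocated per put/remove pair on a `ConcurrentHashMap<Long, Object>`; the figure includes both the boxed keys and the map's internal nodes, so it will not match the JMH numbers exactly.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ConcurrentHashMap;

public class BoxingAllocationProbe {

    // Returns the approximate number of bytes the current thread allocates
    // per put/remove pair on a ConcurrentHashMap keyed by boxed Long.
    static long bytesPerOp(int ops) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        ConcurrentHashMap<Long, Object> map = new ConcurrentHashMap<>();
        Object value = new Object();
        long tid = Thread.currentThread().getId();
        long base = 1L << 40; // hypothetical addresses, far outside the Long cache

        long before = bean.getThreadAllocatedBytes(tid);
        for (int i = 0; i < ops; i++) {
            long peer = base + i * 64L;
            map.put(peer, value);   // boxes peer, allocates a map node
            map.remove(peer);       // boxes peer again
        }
        long after = bean.getThreadAllocatedBytes(tid);
        return (after - before) / ops;
    }

    public static void main(String[] args) {
        bytesPerOp(200_000); // warm-up pass
        System.out.println("approx. bytes allocated per put/remove: "
                + bytesPerOp(1_000_000));
    }
}
```

Running the same loop against a primitive-keyed map would show how much of the per-operation allocation comes from boxing versus the map's own bookkeeping.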

Again, I think this is a huge win for the performance of the system as a whole, in which JNA plays a part. Garbage Collection is the hidden thief of Java, stealing performance in a way that is mostly invisible to developers unless they are specifically looking for it.

Here is the benchmark. Use the runbench.sh to build and execute in a single step.
chm-bench.zip

@brettwooldridge
Contributor Author

brettwooldridge commented Nov 18, 2025

The macOS build failures are unrelated to this change.

@matthiasblaesing
Member

The ConcurrentLongHashMap implementation comes from the Apache Artemis project, under the Apache License v2, located here

At this point the discussion halts. JNA is dual licensed under ALv2 and LGPL 2.1. So unless you are the sole author of said class, it cannot be included.

@brettwooldridge
Contributor Author

@matthiasblaesing Interesting. I was wondering about that. So, the only way to share code from another project is if it too is similarly dual licensed.

Well, all hope is not lost. This ConcurrentLongHashMap is actually overkill, because we only use two methods: put() and remove(). I believe it is based on a simple sparse array design, so without referring to the code I may take a stab at a clean-room, sparse-array-backed replacement for CHM.
