
Add limit on ClassLoader WeakReference cache #7698

Closed

Conversation

tylerbenson (Member)

Problem

Having many WeakReferences can cause problems in an application. To reduce their number, we keep a cache that lets us reuse WeakReference instances. This cache is currently unbounded; it only shrinks when references to GC'd ClassLoaders are cleaned up. Most applications have a limited number of persistent ClassLoaders, but some generate them dynamically, and that is where the cache can grow excessively.

Solution

Add a size limit to the cache so that less commonly used loaders are evicted.

Note: This could evict a commonly used entry, making subsequent lookups slower (identity comparison won't match subsequently generated WeakReference instances). This is considered an acceptable tradeoff to avoid excessive memory use.

Related: #7678
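The idea can be sketched as follows. This is a hypothetical illustration (class and method names are invented, not the agent's actual `Cache.weakBounded` API); note that a real version, like the agent's, also needs weak keys so the cache itself does not pin class loaders in memory.

```java
import java.lang.ref.WeakReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-bounded cache that reuses one WeakReference
// per ClassLoader, evicting the least recently used entry past the limit.
// Caveat: LinkedHashMap holds strong keys, so this sketch would pin
// loaders; the real implementation needs weak keys as well.
final class BoundedWeakRefCache {
  private final int maxSize;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;

  BoundedWeakRefCache(int maxSize) {
    this.maxSize = maxSize;
    // accessOrder=true turns LinkedHashMap into an LRU structure
    this.cache =
        new LinkedHashMap<ClassLoader, WeakReference<ClassLoader>>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(
              Map.Entry<ClassLoader, WeakReference<ClassLoader>> eldest) {
            return size() > BoundedWeakRefCache.this.maxSize;
          }
        };
  }

  // Returns the same WeakReference instance for repeated lookups of the
  // same loader, so identity comparisons on the reference keep working.
  synchronized WeakReference<ClassLoader> weakRefFor(ClassLoader loader) {
    return cache.computeIfAbsent(loader, WeakReference::new);
  }
}
```

Once an entry is evicted, a later lookup for the same loader creates a fresh WeakReference, which is exactly the "less performant lookup" tradeoff described above.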

@tylerbenson tylerbenson requested a review from a team as a code owner January 31, 2023 20:02
@tylerbenson tylerbenson force-pushed the tyler/weak-bounded branch 2 times, most recently from cd13582 to 726fe30 Compare January 31, 2023 21:38
In these spots, caching is purely an optimization and the cached values can be recalculated after invalidation. I didn't change the cache in `HelperInjector`, as that one seems to have more functionality built around it.
laurit (Contributor) commented Feb 21, 2023

@opentelemetrybot update

laurit (Contributor) commented Feb 21, 2023

I measured heap usage of WeakLockFreeCache when starting Liferay (an OSGi application with many class loaders). I took a heap dump after Liferay had started, looked up all instances of WeakLockFreeCache in Eclipse Memory Analyzer, and summed the retained heap column:

- baseline: 1,012 objects consuming 13,117,752 bytes
- this PR: 1,012 objects consuming 18,115,592 bytes
- #7866: 806 objects consuming 1,017,816 bytes

Surprisingly, the bounded cache used in this PR does not result in smaller memory usage. As far as I can tell, this is because ConcurrentLinkedHashMap is a fairly large data structure. On my MacBook, its backing structure is a 16x128 array which, filled with atomic references, already consumes 41,296 bytes.
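As a back-of-envelope check (my assumption: a 64-bit HotSpot JVM with compressed oops, i.e. 4-byte references, 16-byte `AtomicReference` instances, and 16-byte array headers), the 41,296-byte figure is consistent with a 16x128 structure of atomic references:

```java
// Rough retained-size estimate for a 16x128 array of AtomicReference,
// assuming 64-bit HotSpot with compressed oops.
public class AtomicArrayEstimate {
  public static void main(String[] args) {
    int rows = 16, cols = 128;
    long atomicRefs = (long) rows * cols * 16; // 2048 AtomicReference objects, 16 bytes each
    long innerArrays = rows * (16L + cols * 4); // 16 arrays: header + 128 4-byte slots
    long outerArray = 16L + rows * 4;           // one array of 16 references
    System.out.println(atomicRefs + innerArrays + outerArray); // prints 41296
  }
}
```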

On a run without the javaagent, or with my cache limit fixes, the assertion passes. When run on main, the test fails because the heap size difference exceeds 200 MB.
tylerbenson (Member, Author)

@trask per your request, I created a test that runs with the full javaagent. When run on main the test fails; when run with this branch or without the javaagent, the test passes. (Note: I `@Ignore`d it to avoid running it in CI.)

@laurit I'm not familiar with Liferay. How many classloaders would you say it generates/uses? If the classloader usage is spread out more it probably isn't as problematic. This PR is explicitly trying to handle dynamically generated classloaders.

laurit (Contributor) commented Mar 17, 2023

@tylerbenson JUnit 5 has a different annotation for disabling tests; see https://junit.org/junit5/docs/current/user-guide/#writing-tests-disabling

I used Liferay for testing just because it is a large, easily available app, so you could verify whether I messed up something in the measurements. It is OSGi based, so it has a lot of class loaders, a bit over 1,000 if I remember correctly. You could use some other large app that you have at hand.

I had to strip one zero off the iteration count to make it pass on my laptop. I ran this test with your PR, main, and my PR.

First, I don't think `Runtime.getRuntime().totalMemory()` is the metric you want. It shows the size of the heap, not how much of it is in use. You should use `Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()` instead to get how much of the heap is currently in use.

I took heap dumps at the end of the run and compared them. Memory Analyzer reported the following heap sizes:
- your PR: 33 MB
- main: 26.2 MB
- my PR: 16.6 MB
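The measurement suggested above (used heap rather than reserved heap) can be sketched as follows; calling `System.gc()` first is only a hint, but it reduces noise from garbage that hasn't been reclaimed yet:

```java
// Sketch: sample how much of the heap is actually in use,
// as opposed to how much heap the JVM has reserved.
public class HeapUsage {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    System.gc(); // hint only; the JVM may ignore it
    long allocated = rt.totalMemory();       // heap capacity the JVM has reserved
    long used = allocated - rt.freeMemory(); // portion of that heap in use
    System.out.println("allocated=" + allocated + " used=" + used);
  }
}
```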
I'll merge #7866; you can rebase this PR if you wish. That PR would be useful even if we decide to adopt your approach for limiting cache sizes. Instead of having a separate ClassLoader -> Boolean map in each matcher, it uses one ClassLoader -> BitSet map to track the matching status. The heap usage reduction comes from having only one map instead of the 200+ maps. To limit the cache size you'll have to modify this line
My understanding is that with one shared map, what grows as class loaders are added is the array backing that map. Creating class loaders will eventually exhaust the heap, and they will get GC'd, which prevents the map from growing further. Since class loaders are much larger than their entries in that map, I doubt you could observe the map growing too large even without an explicit size limit.
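A minimal sketch of the shared-map idea described above. All names here are hypothetical: the real change lives in `ClassLoaderHasClassesNamedMatcher`, uses the agent's weak cache rather than `WeakHashMap`, and tracks richer state than a single matched bit.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// One shared ClassLoader -> BitSet map for all matchers, instead of a
// separate Map<ClassLoader, Boolean> per matcher. Each matcher instance
// claims one bit position in the shared BitSet.
final class SharedMatcherCache {
  // WeakHashMap stands in for the agent's weak cache: entries disappear
  // when their ClassLoader key is GC'd.
  private static final Map<ClassLoader, BitSet> CACHE =
      Collections.synchronizedMap(new WeakHashMap<>());
  private static final AtomicInteger NEXT_BIT = new AtomicInteger();

  private final int bit = NEXT_BIT.getAndIncrement();

  void recordMatch(ClassLoader loader) {
    // Note: the map access is synchronized, but BitSet itself is not
    // thread-safe; a production version needs more care here.
    CACHE.computeIfAbsent(loader, cl -> new BitSet()).set(bit);
  }

  boolean hasMatched(ClassLoader loader) {
    BitSet bits = CACHE.get(loader);
    return bits != null && bits.get(bit);
  }
}
```

With 200+ matchers this replaces 200+ maps (each with its own backing array) with one map whose values are small BitSets, which is where the heap saving comes from.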

laurit added a commit that referenced this pull request Mar 17, 2023
See
#7698
This is an attempt to reduce memory usage for
`ClassLoaderHasClassesNamedMatcher`. Instead of having each matcher keep
a `Map<ClassLoader, Boolean>`, we can have one `Map<ClassLoader, BitSet>`
where each matcher uses one bit in the `BitSet`. Alternatively, a
`Map<ClassLoader, Set<ClassLoaderHasClassesNamedMatcher>>` where the set
contains the matchers that match for a given class loader would also work
well, because these matchers usually don't match, so we can expect only
a few elements in the set.
# Conflicts:
#	javaagent-extension-api/src/main/java/io/opentelemetry/javaagent/extension/matcher/ClassLoaderHasClassesNamedMatcher.java
tylerbenson (Member, Author)

@laurit Thanks for the pointer about @Disabled.

The reason it's important to track and constrain `Runtime.getRuntime().totalMemory()` is systems like Docker that monitor process memory usage and kill the process if it exceeds a configured threshold. Minimizing large swings in allocated memory is valuable, and often preferred even at the cost of higher average memory (or even CPU) usage. Controlling those large swings is ultimately the goal of configuring these cache limits.

tylerbenson (Member, Author)

@laurit I ran the included test on main with your change included and it does indeed solve the problem. I will go ahead and close this PR. Thanks for the fix.

@tylerbenson tylerbenson deleted the tyler/weak-bounded branch March 20, 2023 16:45
```diff
@@ -39,7 +39,7 @@ class MuzzleMatcher implements AgentBuilder.RawMatcher {
   private final InstrumentationModule instrumentationModule;
   private final Level muzzleLogLevel;
   private final AtomicBoolean initialized = new AtomicBoolean(false);
-  private final Cache<ClassLoader, Boolean> matchCache = Cache.weak();
+  private final Cache<ClassLoader, Boolean> matchCache = Cache.weakBounded(25);
```
tylerbenson (Member, Author)
@laurit do you think your change should be applied to this matcher too?
