Enhance real memory circuit breaker with G1 GC #58674

Merged

Conversation

@henningandersen (Contributor) commented Jun 29, 2020

With G1 GC, Elasticsearch can in rare cases push heap usage above the real
memory circuit breaker limit and keep it there for an extended period. This
situation persists until the next young GC. The circuit breaking itself
hinders that from happening in a timely manner, since it breaks all requests
before real work is done.

This commit gently nudges G1 to do a young GC and then double-checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.

Related to #57202

Reviewers: please also consider whether this should go to 7.8.1.

The overhead of triggering the GC is typically 1 ms when no concurrent cycle is running, or around 10-20 ms on an 8GB heap and 20-40 ms on a 16GB heap (on my laptop). In addition to this comes a single young GC of 10-30 ms. On a bigger box, I get around 20-70 ms total overhead (time to trigger the GC plus GC time) on a 30GB heap.

@henningandersen added the >enhancement, :Core/Infra/Circuit Breakers (Track estimates of memory consumption to prevent overload), v8.0.0, v7.8.1 and v7.9.0 labels Jun 29, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra (:Core/Infra/Circuit Breakers)

@elasticmachine added the Team:Core/Infra (Meta label for core/infra team) label Jun 29, 2020
@jaymode self-requested a review June 30, 2020 18:59
@jaymode (Member) left a comment

I'm on the fence here about whether this is the right approach. I wonder if it might be "safer" to attempt a System.gc() call instead: since we're already going to break the request, we could pay for a stop-the-world pause to allow more requests. There are other issues with that though, such as what we should do when that GC call is set to run concurrently or is disabled completely via JVM options. I'd hate to trigger an OOM from trying to get the GC to run using the approach in the PR.

}
}

interface DoubleCheckStrategy {
Member

Maybe name it OverLimitStrategy?

Contributor Author

++, see 1d007ca

}

static long fallbackRegionSize(JvmInfo jvmInfo) {
// mimic JDK calculation
Member

can you add a link or reference to this calculation?

Contributor Author

Added in c8ac1af
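For reference, a rough sketch of the JDK heuristic that comment mimics, as I read HotSpot's defaults (the constants, rounding and method name here are my own illustration, not the PR's fallbackRegionSize):

static long g1RegionSizeHeuristic(long initialHeapBytes, long maxHeapBytes) {
    final long targetRegionCount = 2048;    // G1 aims for roughly 2048 regions
    final long minRegionSize = 1L << 20;    // 1 MB
    final long maxRegionSize = 32L << 20;   // 32 MB
    long averageHeap = (initialHeapBytes + maxHeapBytes) / 2;
    long regionSize = Math.max(averageHeap / targetRegionCount, minRegionSize);
    regionSize = Long.highestOneBit(regionSize);                        // round down to a power of two
    return Math.min(Math.max(regionSize, minRegionSize), maxRegionSize);
}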

blackHole += localBlackHole;
logger.trace("black hole [{}]", blackHole);
long now = timeSupplier.getAsLong();
assert now > this.lastCheckTime;
Member

Unfortunately, neither System.currentTimeMillis() nor System.nanoTime() is always monotonic, so now could be less than the last checked time; I do not believe this assert should be here.

Contributor Author

Thanks, removed in b6b565a

// we observed a memory drop, so some GC must have occurred
break;
}
localBlackHole += new byte[allocationSize].hashCode();
Member

is it possible for this to trigger an OOM?

Contributor Author

Yes and no.

In theory yes, if there is really no collectible heap left.

But if that were the case, just creating the CircuitBreakingException poses the same risk. And if we are that close, we are doomed anyway, I think. The chance of having a workload at exactly 99.95 percent heap (corresponding to approximately 2000 regions) and surviving is so small that even if it happened, the next time we hit the same workload it would fall over.

Notice that we only need one region of free or collectible space for this to succeed.
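For readers following along, a minimal self-contained sketch of the nudge loop being discussed; the method name, parameters and iteration cap are illustrative, not the PR's exact code:

static long nudgeYoungGc(java.util.function.LongSupplier currentMemoryUsed, int allocationSize, int maxIterations) {
    final long baseline = currentMemoryUsed.getAsLong();
    long blackHole = 0; // accumulate hash codes so the allocations cannot be optimized away
    for (int i = 0; i < maxIterations; i++) {
        if (currentMemoryUsed.getAsLong() < baseline) {
            break; // we observed a memory drop, so some GC must have occurred
        }
        // allocate roughly one G1 region worth of garbage to provoke a young collection
        blackHole += new byte[allocationSize].hashCode();
    }
    return blackHole;
}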

public MemoryUsage doubleCheck(MemoryUsage memoryUsed) {
long maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
boolean leader;
synchronized (lock) {
Member

Any reason to use lock over this?

Contributor Author

Given the locality of the usage, not really; it is just a personal preference, since it avoids thinking about external synchronization on this. I am OK with turning it into this if you prefer.

Member

I'm fine with it; just curious.

@henningandersen (Contributor Author)

I wonder if it might be "safer" to attempt a System.gc() call

In particular if this does full GC, it could be bad for system stability due to the time this could take.

We could utilize the -XX:+ExplicitGCInvokesConcurrent option and add bootstrap checks to ensure it is on for G1. I think it triggers a concurrent GC, which is additional unnecessary overhead. I would also need to dig a bit into the JDK to investigate what the option really does (it must do a young GC).

@henningandersen (Contributor Author)

I followed up on using ExplicitGCInvokesConcurrent. The System.gc() call does do a young collection followed by a concurrent cycle. But unfortunately, the System.gc() call does not return until the concurrent cycle ends:

[2020-07-01T20:54:09.411+0000][4477][gc     ] GC(12) Pause Young (Concurrent Start) (System.gc()) 7791M->2989M(8192M) 18.242ms
[2020-07-01T20:54:09.412+0000][4477][gc     ] GC(13) Concurrent Cycle
[2020-07-01T20:54:10.467+0000][4477][gc     ] GC(13) Pause Remark 2989M->2989M(8192M) 0.535ms
[2020-07-01T20:54:10.835+0000][4477][gc     ] GC(13) Pause Cleanup 2989M->2989M(8192M) 0.349ms
[2020-07-01T20:54:10.839+0000][4477][gc     ] GC(13) Concurrent Cycle 1427.709ms
GC duration: 1446

(last line output by my application invoking System.gc(), in milliseconds)

I suppose we could do the System.gc() in a separate thread and then poll whether the amount of memory used dropped every 10 ms or so.
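A rough sketch of that alternative (not what the PR ended up doing): trigger System.gc() from a separate thread so the caller is not blocked by the concurrent cycle, then poll heap usage for a bounded time. The poll interval, timeout and method name are assumptions.

static boolean gcAndWaitForDrop(long baselineUsedBytes, long timeoutMillis) throws InterruptedException {
    Thread gcThread = new Thread(System::gc, "explicit-gc-trigger");
    gcThread.setDaemon(true);
    gcThread.start();
    java.lang.management.MemoryMXBean memory = java.lang.management.ManagementFactory.getMemoryMXBean();
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000;
    while (System.nanoTime() < deadline) {
        if (memory.getHeapMemoryUsage().getUsed() < baselineUsedBytes) {
            return true;   // heap usage dropped, so a collection has happened
        }
        Thread.sleep(10);  // poll every ~10 ms, as suggested above
    }
    return false;
}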

@jaymode (Member) left a comment

In particular if this does full GC, it could be bad for system stability due to the time this could take.

The only way this option should be on the table is with a concurrent cycle that is triggered by a dedicated thread. The issue there is how often we should attempt the call, and how long we wait for the GC to finish or poll for memory usage to decrease.

My concern with the approach taken is that we allocate and hope to catch the memory drop, but we could miss a GC cycle occurring: this is a concurrent system and other allocations could be happening elsewhere, so if we're still above the base memory usage we keep allocating. There are facilities for monitoring the number of collections using the JvmStats class, so maybe that could be an option.
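For illustration, the standard GarbageCollectorMXBean exposes the collection counts that an Elasticsearch-level stats class would wrap; comparing the summed count before and after the allocations is one way to detect that a GC happened even when no memory drop is observed (this sketch uses the JMX API directly, not JvmStats):

static long totalCollectionCount() {
    long total = 0;
    for (java.lang.management.GarbageCollectorMXBean gc : java.lang.management.ManagementFactory.getGarbageCollectorMXBeans()) {
        long count = gc.getCollectionCount();
        if (count >= 0) {  // -1 means this collector does not report a count
            total += count;
        }
    }
    return total;
}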


@Override
public MemoryUsage overLimit(MemoryUsage memoryUsed) {
long maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
Member

Since this appears to be a consistent value, maybe we just keep it as a final long that is a class member.

Contributor Author

Fixed in 3eacf32


@@ -290,7 +308,7 @@ public long getParentLimit() {
public void checkParentLimit(long newBytesReserved, String label) throws CircuitBreakingException {
final MemoryUsage memoryUsed = memoryUsed(newBytesReserved);
long parentLimit = this.parentSettings.getLimit();
if (memoryUsed.totalUsage > parentLimit) {
if (memoryUsed.totalUsage > parentLimit && doubleCheckMemoryUsed(memoryUsed).totalUsage > parentLimit) {
Member

Maybe the method call can be replaced with overLimitStrategy.apply(memoryUsed) and the construction of the OverLimitStrategy will be responsible for handling the behavior of what to do?

Contributor Author

Good idea, done in 3f61f93
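A self-contained sketch of the suggested delegation; apart from OverLimitStrategy and overLimit, which are visible in the diff, the names and exception type here are illustrative:

final class CheckSketch {

    static final class MemoryUsage {
        final long totalUsage;
        MemoryUsage(long totalUsage) { this.totalUsage = totalUsage; }
    }

    interface OverLimitStrategy {
        MemoryUsage overLimit(MemoryUsage memoryUsed);
    }

    // pass-through used when real-memory tracking is off or the collector is not G1
    static final OverLimitStrategy NOOP = memoryUsed -> memoryUsed;

    static void checkParentLimit(MemoryUsage memoryUsed, long parentLimit, OverLimitStrategy strategy) {
        if (memoryUsed.totalUsage > parentLimit) {
            // the strategy may nudge a young GC and re-measure before we decide to break
            MemoryUsage rechecked = strategy.overLimit(memoryUsed);
            if (rechecked.totalUsage > parentLimit) {
                throw new IllegalStateException("parent breaker over limit: " + rechecked.totalUsage + " > " + parentLimit);
            }
        }
    }
}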


private static OverLimitStrategy createDoubleCheckStrategy(JvmInfo jvmInfo, LongSupplier currentMemoryUsageSupplier,
LongSupplier timeSupplier, long minimumInterval) {
if (jvmInfo.useG1GC().equals("true")
Member

In line with an earlier comment, we could pass in trackRealMemoryUsage to this method and add it to this check.

Contributor Author

Also part of 3f61f93

LongSupplier timeSupplier, long minimumInterval) {
if (jvmInfo.useG1GC().equals("true")
// messing with GC is "dangerous" so we apply an escape hatch. Not intended to be used.
&& Boolean.parseBoolean(System.getProperty("es.real_memory_circuit_breaker.g1.double_check.enabled", "true"))) {
Member

Do you mind using Booleans.parseBoolean(System.getProperty("es.real_memory_circuit_breaker.g1.double_check.enabled"), true)? The Java Boolean parsing is pretty lenient, and I thought it was a forbidden API at some point :|

Contributor Author

Thanks, fixed in 845dc10
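For context on the leniency point, a tiny illustration with plain java.lang.Boolean; the stricter Elasticsearch Booleans helper referenced above is not reproduced here:

public class LenientBooleanDemo {
    public static void main(String[] args) {
        // java.lang.Boolean treats anything that is not "true" (ignoring case) as false
        System.out.println(Boolean.parseBoolean("true"));  // true
        System.out.println(Boolean.parseBoolean("yes"));   // false -- a typo or "yes" silently disables the flag
        System.out.println(Boolean.parseBoolean(null));    // false -- "unset" is indistinguishable from "false"
        // a strict parser with an explicit default makes such mistakes visible instead
    }
}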

@henningandersen (Contributor Author)

if we're still above the base memory usage we keep allocating

Notice that we limit the number of iterations to the number of regions necessary, which will be in the range [100, 200) (unless the region size has been tweaked in the GC settings or min heap size != max heap size).

There are facilities for monitoring the number of collections using the JvmStats class, so maybe that could be an option.

I added this as an extra check, so we now exit the loop on either a GC count change or a memory usage drop. I opted to keep both to play it safe (I'm not sure of the visibility guarantees of GC count changes). See 110925d.

@jaymode (Member) left a comment

LGTM. Thanks for iterating and entertaining my thoughts. IMO this should not go to 7.8.1. I don’t think it meets the criteria and 7.9 isn’t far off either.

@henningandersen (Contributor Author)

@elasticmachine update branch

@henningandersen merged commit c831f37 into elastic:master Jul 13, 2020
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jul 13, 2020
@henningandersen (Contributor Author)

Thanks for reviewing @jaymode.

henningandersen added a commit that referenced this pull request Jul 13, 2020