Enhance real memory circuit breaker with G1 GC #58674
Conversation
Using G1 GC, Elasticsearch can in rare cases see heap usage go above the real memory circuit breaker limit and stay there for an extended period. This situation will persist until the next young GC. The circuit breaking itself hinders that from occurring in a timely manner, since it breaks all requests before real work is done. This commit gently nudges G1 to do a young GC and then double checks that heap usage is still above the real memory circuit breaker limit before throwing the circuit breaker exception. Related to elastic#57202
Pinging @elastic/es-core-infra (:Core/Infra/Circuit Breakers)
I'm on the fence about whether this is the right approach. I wonder if it might be "safer" to attempt a `System.gc()` call instead, since we're already going to break the request anyway; we could pay for a stop-the-world pause to allow more requests. There are other issues with that, though, such as what we should do when that GC call is set to run concurrently or is disabled completely via JVM options. I'd hate to trigger an OOM by trying to get the GC to run using the approach in the PR.
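Part of the reason `System.gc()` is hard to rely on is that its behavior depends entirely on JVM flags: with `-XX:+DisableExplicitGC` it is a no-op, and with `-XX:+ExplicitGCInvokesConcurrent` it only starts a concurrent cycle. A small probe (not part of the PR, using only the standard MX beans) can show whether an explicit call visibly bumped any collector's count:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcProbe {
    // Returns true if an explicit System.gc() call visibly bumped any
    // collector's collection count. With -XX:+DisableExplicitGC the call
    // is a no-op, and with -XX:+ExplicitGCInvokesConcurrent it starts a
    // concurrent cycle, so callers cannot rely on a synchronous full GC.
    static boolean explicitGcObserved() {
        long before = totalCollections();
        System.gc();
        return totalCollections() > before;
    }

    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }
}
```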
    }
}

interface DoubleCheckStrategy {
Maybe name it `OverLimitStrategy`?
++, see 1d007ca
}

static long fallbackRegionSize(JvmInfo jvmInfo) {
    // mimic JDK calculation
can you add a link or reference to this calculation?
Added in c8ac1af
blackHole += localBlackHole;
logger.trace("black hole [{}]", blackHole);
long now = timeSupplier.getAsLong();
assert now > this.lastCheckTime;
Unfortunately, neither `System.currentTimeMillis()` nor `System.nanoTime()` is always monotonic, so `now` could be less than the last checked time. I do not believe this assert should be here.
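A defensive alternative to asserting forward clock movement, sketched with illustrative names (this is not the PR's code): clamp the delta so a backwards clock step degrades gracefully instead of tripping an assertion.

```java
import java.util.function.LongSupplier;

public class IntervalGuard {
    private final LongSupplier timeSupplier;
    private long lastCheckTime;

    IntervalGuard(LongSupplier timeSupplier) {
        this.timeSupplier = timeSupplier;
        this.lastCheckTime = timeSupplier.getAsLong();
    }

    // Returns a non-negative interval even if the supplier steps backwards
    // (as has been observed on some platforms and virtualized clocks).
    long elapsedSinceLastCheck() {
        long now = timeSupplier.getAsLong();
        long elapsed = Math.max(0, now - lastCheckTime);
        lastCheckTime = Math.max(lastCheckTime, now);
        return elapsed;
    }
}
```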
Thanks, removed in b6b565a
    // we observed a memory drop, so some GC must have occurred
    break;
}
localBlackHole += new byte[allocationSize].hashCode();
is it possible for this to trigger an OOM?
Yes and no.
In theory yes, if there is really no collectible heap left.
But if that was the case, just creating the `CircuitBreakingException` poses the same risk. And if we are that close, we are doomed anyway, I think. The chances of us having a workload at exactly 99.95 percent heap (corresponding to approximately 2000 regions) and surviving are so small that even if it were the case, the next time we entered the same workload it would fall over.
Notice that we only need 1 region of space free or collectible space for this to succeed.
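The allocation loop being discussed can be sketched roughly like this (assumptions: `regionSize` and `maxRegions` stand in for the PR's G1-derived values, and the real code also bounds the loop by time and, later in the review, by GC count):

```java
import java.util.function.LongSupplier;

public class GcNudge {
    static volatile long blackHole; // keeps allocations observable so they are not optimized away

    // Allocates region-sized byte arrays to pressure G1 into a young
    // collection, exiting as soon as observed heap usage drops below the
    // starting point. Returns how many allocations were made.
    static int nudge(LongSupplier currentMemoryUsage, int regionSize, int maxRegions) {
        long baseline = currentMemoryUsage.getAsLong();
        int allocations = 0;
        for (int i = 0; i < maxRegions; i++) {
            if (currentMemoryUsage.getAsLong() < baseline) {
                break; // memory dropped, so some GC must have occurred
            }
            blackHole += new byte[regionSize].hashCode();
            allocations++;
        }
        return allocations;
    }
}
```

With a fake memory supplier whose readings drop on the fourth sample, the loop performs two allocations and then exits on the observed drop.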
public MemoryUsage doubleCheck(MemoryUsage memoryUsed) {
    long maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
    boolean leader;
    synchronized (lock) {
Any reason to use `lock` over `this`?
Given the locality of the usage, not really; it is just a personal preference, since it avoids thinking about external synchronization on `this`. I am OK turning it into `this` if you prefer?
I'm fine with it; just curious.
In particular, if this does a full GC, it could be bad for system stability due to the time this could take. We could utilize
I followed up on using

(last line output by my application invoking

I suppose we could do the
In particular if this does full GC, it could be bad for system stability due to the time this could take.
The only way this option should be on the table is with a concurrent cycle that is triggered by a dedicated thread. The issue there is how often should we attempt the call and how long do we wait for the GC to finish or poll memory used to decrease?
My concern with the approach taken is that we allocate and hope to catch the memory drop, but since this is a concurrent system we could miss a GC cycle: other allocations could be happening elsewhere, leaving us still above the base memory usage, so we keep allocating. There are facilities for monitoring the number of collections using the JvmStats class, so maybe that could be an option.
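The suggestion to watch collection counts in addition to a memory drop could be sketched with the standard `GarbageCollectorMXBean` (the comment mentions Elasticsearch's JvmStats class; the MX bean is a stand-in here, not the PR's actual code):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCountWatcher {
    // Sums collection counts across all registered collectors.
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count >= 0) { // -1 means "undefined" for this collector
                total += count;
            }
        }
        return total;
    }

    // Combined exit condition: either the collection count moved or the
    // observed usage dropped, so we avoid missing a concurrent GC cycle.
    static boolean gcLikelyHappened(long countBefore, long usageBefore, long usageNow) {
        return totalCollectionCount() > countBefore || usageNow < usageBefore;
    }
}
```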
@Override
public MemoryUsage overLimit(MemoryUsage memoryUsed) {
    long maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
Since this appears to be a consistent value, maybe we just keep it as a final long that is a class member.
Fixed in 3eacf32
@@ -290,7 +308,7 @@ public long getParentLimit() {
     public void checkParentLimit(long newBytesReserved, String label) throws CircuitBreakingException {
         final MemoryUsage memoryUsed = memoryUsed(newBytesReserved);
         long parentLimit = this.parentSettings.getLimit();
-        if (memoryUsed.totalUsage > parentLimit) {
+        if (memoryUsed.totalUsage > parentLimit && doubleCheckMemoryUsed(memoryUsed).totalUsage > parentLimit) {
Maybe the method call can be replaced with `overLimitStrategy.apply(memoryUsed)`, and the construction of the OverLimitStrategy will be responsible for handling the behavior of what to do?
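The shape of that refactor might look something like this (a minimal sketch; `MemoryUsage` is a simplified stand-in for the PR's class, and the method bodies are illustrative, not the actual Elasticsearch code):

```java
public class OverLimitDemo {
    // Simplified stand-in for the PR's MemoryUsage class.
    static class MemoryUsage {
        final long totalUsage;
        MemoryUsage(long totalUsage) { this.totalUsage = totalUsage; }
    }

    interface OverLimitStrategy {
        // May return an adjusted (lower) usage after nudging the GC.
        MemoryUsage overLimit(MemoryUsage memoryUsed);
    }

    // Trivial strategy for when real-memory tracking is off or G1 is not in use.
    static final OverLimitStrategy PASS_THROUGH = memoryUsed -> memoryUsed;

    // The parent check delegates to whichever strategy was chosen at
    // construction time, keeping the G1-specific double check behind the interface.
    static void checkParentLimit(MemoryUsage memoryUsed, long parentLimit, OverLimitStrategy strategy) {
        if (memoryUsed.totalUsage > parentLimit
                && strategy.overLimit(memoryUsed).totalUsage > parentLimit) {
            throw new IllegalStateException("circuit breaking: over parent limit");
        }
    }
}
```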
Good idea, done in 3f61f93
server/src/main/java/org/elasticsearch/indices/breaker/HierarchyCircuitBreakerService.java
private static OverLimitStrategy createDoubleCheckStrategy(JvmInfo jvmInfo, LongSupplier currentMemoryUsageSupplier,
                                                           LongSupplier timeSupplier, long minimumInterval) {
    if (jvmInfo.useG1GC().equals("true")
In line with an earlier comment, we could pass in `trackRealMemoryUsage` to this method and add it to this check.
Also part of 3f61f93
                                                           LongSupplier timeSupplier, long minimumInterval) {
    if (jvmInfo.useG1GC().equals("true")
        // messing with GC is "dangerous" so we apply an escape hatch. Not intended to be used.
        && Boolean.parseBoolean(System.getProperty("es.real_memory_circuit_breaker.g1.double_check.enabled", "true"))) {
Do you mind using `Booleans.parseBoolean(System.getProperty("es.real_memory_circuit_breaker.g1.double_check.enabled"), true)` instead? The java.lang.Boolean parsing is pretty lenient, and I thought it was a forbidden API at some point :|
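The leniency being objected to: `java.lang.Boolean.parseBoolean` treats anything that is not (case-insensitively) "true" as false, silently swallowing typos. A strict variant in the spirit of Elasticsearch's `Booleans.parseBoolean` (sketched here, not the actual implementation) rejects unexpected values instead:

```java
public class StrictBoolean {
    // Returns the default for a missing property, but throws on values
    // that are neither "true" nor "false", instead of silently mapping
    // typos like "flase" to false the way Boolean.parseBoolean does.
    static boolean parse(String value, boolean defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        if (value.equals("true")) {
            return true;
        }
        if (value.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException("expected [true] or [false] but got [" + value + "]");
    }
}
```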
Thanks, fixed in 845dc10
Now determine strategy based on whether real memory usage is tracked.
Test would not always trigger the over limit check twice, fixed.
Notice that we limit the number of iterations to the number of regions necessary, which will be in the range
I added this as an extra check such that we now exit the loop both on a GC count change and a memory usage drop. I opted to keep both to play it safe (not sure of the visibility guarantees of GC count changes). See 110925d.
LGTM. Thanks for iterating and entertaining my thoughts. IMO this should not go to 7.8.1; I don't think it meets the criteria, and 7.9 isn't far off either.
@elasticmachine update branch
Thanks for reviewing @jaymode.
Reviewers: please also consider whether this should go to 7.8.1.
The overhead of triggering the GC is typically 1 ms when no concurrent cycle is running, or around 10-20 ms on an 8GB heap and 20-40 ms on a 16GB heap (on my laptop). In addition to this comes a single young GC of 10-30 ms. On a bigger box, I get around 20-70 ms total overhead (time to trigger GC plus GC time) on a 30GB heap.