Enhance real memory circuit breaker with G1 GC #58674

Merged

Conversation

@henningandersen (Contributor) commented Jun 29, 2020

With G1 GC, Elasticsearch can in rare cases push heap usage above the real
memory circuit breaker limit and keep it there for an extended period. This
situation persists until the next young GC. The circuit breaking itself
hinders that from happening in a timely manner, since it breaks all requests
before real work is done.

This commit gently nudges G1 to do a young GC and then double-checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.

Related to #57202

Reviewers: please also consider whether this should go to 7.8.1.

The overhead of triggering the GC is typically 1 ms when no concurrent cycle is running, or around 10-20 ms on an 8GB heap and 20-40 ms on a 16GB heap (on my laptop). In addition to this comes a single young GC of 10-30 ms. On a bigger box, I get around 20-70 ms total overhead (time to trigger the GC plus GC time) on a 30GB heap.

@henningandersen added the >enhancement, :Core/Infra/Circuit Breakers (Track estimates of memory consumption to prevent overload), v8.0.0, v7.8.1 and v7.9.0 labels Jun 29, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra (:Core/Infra/Circuit Breakers)

@elasticmachine added the Team:Core/Infra (Meta label for core/infra team) label Jun 29, 2020
@jaymode self-requested a review June 30, 2020 18:59
@jaymode (Member) left a comment

I'm on the fence here about whether this is the right approach. I wonder if it might be "safer" to attempt a System.gc() call instead: since we're already going to break the request, we could pay for a stop-the-world pause to allow more requests. There are other issues with that though, such as what we should do when that GC call is set to run concurrently or is disabled completely via JVM options. I'd hate to trigger an OOM from trying to get the GC to run using the approach in the PR.

}
}

interface DoubleCheckStrategy {
Member

Maybe name it OverLimitStrategy?

Contributor Author

++, see 1d007ca

}

static long fallbackRegionSize(JvmInfo jvmInfo) {
// mimic JDK calculation
Member

can you add a link or reference to this calculation?

Contributor Author

Added in c8ac1af
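For reference, a rough sketch of the JDK heuristic that comment mimics, as I read HotSpot's defaults (the constants, rounding and method name here are my own illustration, not the PR's fallbackRegionSize):

static long g1RegionSizeHeuristic(long initialHeapBytes, long maxHeapBytes) {
    final long targetRegionCount = 2048;    // G1 aims for roughly 2048 regions
    final long minRegionSize = 1L << 20;    // 1 MB
    final long maxRegionSize = 32L << 20;   // 32 MB
    long averageHeap = (initialHeapBytes + maxHeapBytes) / 2;
    long regionSize = Math.max(averageHeap / targetRegionCount, minRegionSize);
    regionSize = Long.highestOneBit(regionSize);                        // round down to a power of two
    return Math.min(Math.max(regionSize, minRegionSize), maxRegionSize);
}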

blackHole += localBlackHole;
logger.trace("black hole [{}]", blackHole);
long now = timeSupplier.getAsLong();
assert now > this.lastCheckTime;
Member

Unfortunately, neither System.currentTimeMillis() nor System.nanoTime() is always monotonic, so now could be less than the last checked time; I do not believe this assert should be here.

Contributor Author

Thanks, removed in b6b565a

// we observed a memory drop, so some GC must have occurred
break;
}
localBlackHole += new byte[allocationSize].hashCode();
Member

is it possible for this to trigger an OOM?

Contributor Author

Yes and no.

In theory yes, if there is really no collectible heap left.

But if that were the case, just creating the CircuitBreakingException poses the same risk. And if we are that close, we are doomed anyway, I think. The chance of having a workload at exactly 99.95 percent heap (corresponding to approximately 2000 regions) and surviving is so small that even if it happened, the next time we hit the same workload it would fall over.

Notice that we only need one region of free or collectible space for this to succeed.
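For readers following along, a minimal self-contained sketch of the nudge loop being discussed; the method name, parameters and iteration cap are illustrative, not the PR's exact code:

static long nudgeYoungGc(java.util.function.LongSupplier currentMemoryUsed, int allocationSize, int maxIterations) {
    final long baseline = currentMemoryUsed.getAsLong();
    long blackHole = 0; // accumulate hash codes so the allocations cannot be optimized away
    for (int i = 0; i < maxIterations; i++) {
        if (currentMemoryUsed.getAsLong() < baseline) {
            break; // we observed a memory drop, so some GC must have occurred
        }
        // allocate roughly one G1 region worth of garbage to provoke a young collection
        blackHole += new byte[allocationSize].hashCode();
    }
    return blackHole;
}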

public MemoryUsage doubleCheck(MemoryUsage memoryUsed) {
long maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
boolean leader;
synchronized (lock) {
Member

Any reason to use lock over this?

Contributor Author

Given the locality of the usage, not really; it is just a personal preference, since it avoids thinking about external synchronization on this. I am OK with turning it into this if you prefer.

Member

I'm fine with it; just curious.

@henningandersen (Contributor Author)

I wonder if it might be "safer" to attempt a System.gc() call

In particular if this does full GC, it could be bad for system stability due to the time this could take.

We could utilize the -XX:+ExplicitGCInvokesConcurrent option and add bootstrap checks to ensure it is on for G1. I think it triggers a concurrent GC, which is additional unnecessary overhead. I would also need to dig a bit into the JDK to investigate what the option really does (it must do a young GC).

@henningandersen (Contributor Author)

I followed up on using ExplicitGCInvokesConcurrent. The System.gc() call does do a young collection followed by a concurrent cycle. But unfortunately, the System.gc() call does not return until the concurrent cycle ends:

[2020-07-01T20:54:09.411+0000][4477][gc     ] GC(12) Pause Young (Concurrent Start) (System.gc()) 7791M->2989M(8192M) 18.242ms
[2020-07-01T20:54:09.412+0000][4477][gc     ] GC(13) Concurrent Cycle
[2020-07-01T20:54:10.467+0000][4477][gc     ] GC(13) Pause Remark 2989M->2989M(8192M) 0.535ms
[2020-07-01T20:54:10.835+0000][4477][gc     ] GC(13) Pause Cleanup 2989M->2989M(8192M) 0.349ms
[2020-07-01T20:54:10.839+0000][4477][gc     ] GC(13) Concurrent Cycle 1427.709ms
GC duration: 1446

(last line output by my application invoking System.gc(), in milliseconds)

I suppose we could do the System.gc() in a separate thread and then poll whether the amount of memory used dropped every 10 ms or so.
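A rough sketch of that alternative (not what the PR ended up doing): trigger System.gc() from a separate thread so the caller is not blocked by the concurrent cycle, then poll heap usage for a bounded time. The poll interval, timeout and method name are assumptions.

static boolean gcAndWaitForDrop(long baselineUsedBytes, long timeoutMillis) throws InterruptedException {
    Thread gcThread = new Thread(System::gc, "explicit-gc-trigger");
    gcThread.setDaemon(true);
    gcThread.start();
    java.lang.management.MemoryMXBean memory = java.lang.management.ManagementFactory.getMemoryMXBean();
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000;
    while (System.nanoTime() < deadline) {
        if (memory.getHeapMemoryUsage().getUsed() < baselineUsedBytes) {
            return true;   // heap usage dropped, so a collection has happened
        }
        Thread.sleep(10);  // poll every ~10 ms, as suggested above
    }
    return false;
}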

@jaymode (Member) left a comment

In particular if this does full GC, it could be bad for system stability due to the time this could take.

The only way this option should be on the table is with a concurrent cycle that is triggered by a dedicated thread. The issue there is how often we should attempt the call, and how long we wait for the GC to finish or poll for memory usage to decrease.

My concern with the approach taken is that we allocate and hope to catch the memory drop, but we could miss a GC cycle occurring: this is a concurrent system and other allocations could be happening elsewhere, so if we're still above the base memory usage we keep allocating. There are facilities for monitoring the number of collections using the JvmStats class, so maybe that could be an option.
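For illustration, the standard GarbageCollectorMXBean exposes the collection counts that an Elasticsearch-level stats class would wrap; comparing the summed count before and after the allocations is one way to detect that a GC happened even when no memory drop is observed (this sketch uses the JMX API directly, not JvmStats):

static long totalCollectionCount() {
    long total = 0;
    for (java.lang.management.GarbageCollectorMXBean gc : java.lang.management.ManagementFactory.getGarbageCollectorMXBeans()) {
        long count = gc.getCollectionCount();
        if (count >= 0) {  // -1 means this collector does not report a count
            total += count;
        }
    }
    return total;
}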


@Override
public MemoryUsage overLimit(MemoryUsage memoryUsed) {
long maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
Member

Since this appears to be a consistent value, maybe we just keep it as a final long that is a class member.

Contributor Author

Fixed in 3eacf32


@@ -290,7 +308,7 @@ public long getParentLimit() {
public void checkParentLimit(long newBytesReserved, String label) throws CircuitBreakingException {
final MemoryUsage memoryUsed = memoryUsed(newBytesReserved);
long parentLimit = this.parentSettings.getLimit();
if (memoryUsed.totalUsage > parentLimit) {
if (memoryUsed.totalUsage > parentLimit && doubleCheckMemoryUsed(memoryUsed).totalUsage > parentLimit) {
Member

Maybe the method call can be replaced with overLimitStrategy.apply(memoryUsed) and the construction of the OverLimitStrategy will be responsible for handling the behavior of what to do?

Contributor Author

Good idea, done in 3f61f93
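A self-contained sketch of the suggested delegation; apart from OverLimitStrategy and overLimit, which are visible in the diff, the names and exception type here are illustrative:

final class CheckSketch {

    static final class MemoryUsage {
        final long totalUsage;
        MemoryUsage(long totalUsage) { this.totalUsage = totalUsage; }
    }

    interface OverLimitStrategy {
        MemoryUsage overLimit(MemoryUsage memoryUsed);
    }

    // pass-through used when real-memory tracking is off or the collector is not G1
    static final OverLimitStrategy NOOP = memoryUsed -> memoryUsed;

    static void checkParentLimit(MemoryUsage memoryUsed, long parentLimit, OverLimitStrategy strategy) {
        if (memoryUsed.totalUsage > parentLimit) {
            // the strategy may nudge a young GC and re-measure before we decide to break
            MemoryUsage rechecked = strategy.overLimit(memoryUsed);
            if (rechecked.totalUsage > parentLimit) {
                throw new IllegalStateException("parent breaker over limit: " + rechecked.totalUsage + " > " + parentLimit);
            }
        }
    }
}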


private static OverLimitStrategy createDoubleCheckStrategy(JvmInfo jvmInfo, LongSupplier currentMemoryUsageSupplier,
LongSupplier timeSupplier, long minimumInterval) {
if (jvmInfo.useG1GC().equals("true")
Member

In line with an earlier comment, we could pass in trackRealMemoryUsage to this method and add it to this check.

Contributor Author

Also part of 3f61f93

LongSupplier timeSupplier, long minimumInterval) {
if (jvmInfo.useG1GC().equals("true")
// messing with GC is "dangerous" so we apply an escape hatch. Not intended to be used.
&& Boolean.parseBoolean(System.getProperty("es.real_memory_circuit_breaker.g1.double_check.enabled", "true"))) {
Member

Do you mind using Booleans.parseBoolean(System.getProperty("es.real_memory_circuit_breaker.g1.double_check.enabled"), true)? The Java Boolean parsing is pretty lenient, and I thought it was a forbidden API at some point :|

Contributor Author

Thanks, fixed in 845dc10
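For context on the leniency point, a tiny illustration with plain java.lang.Boolean; the stricter Elasticsearch Booleans helper referenced above is not reproduced here:

public class LenientBooleanDemo {
    public static void main(String[] args) {
        // java.lang.Boolean treats anything that is not "true" (ignoring case) as false
        System.out.println(Boolean.parseBoolean("true"));  // true
        System.out.println(Boolean.parseBoolean("yes"));   // false -- a typo or "yes" silently disables the flag
        System.out.println(Boolean.parseBoolean(null));    // false -- "unset" is indistinguishable from "false"
        // a strict parser with an explicit default makes such mistakes visible instead
    }
}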

@henningandersen (Contributor Author)

if we're still above the base memory usage we keep allocating

Notice that we limit the number of iterations to the number of regions necessary, which will be in the range [100, 200) (unless the region size has been tweaked in the GC settings or min heap size != max heap size).

There are facilities for monitoring the number of collections using the JvmStats class, so maybe that could be an option.

I added this as an extra check, so we now exit the loop on either a GC count change or a memory usage drop. I opted to keep both to play it safe (I'm not sure of the visibility guarantees of GC count changes). See 110925d.

@jaymode (Member) left a comment

LGTM. Thanks for iterating and entertaining my thoughts. IMO this should not go to 7.8.1. I don’t think it meets the criteria and 7.9 isn’t far off either.

@henningandersen (Contributor Author)

@elasticmachine update branch

@henningandersen merged commit c831f37 into elastic:master Jul 13, 2020
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jul 13, 2020
@henningandersen (Contributor Author)

Thanks for reviewing @jaymode.

henningandersen added a commit that referenced this pull request Jul 13, 2020