java.lang.StackOverflowError due to infinite loop in CircuitBreakerStateMachine$OpenState.tryAcquirePermission #2038
Comments
Hi,
@polkosity I think the problem is here: Line 812 in 12f66a3
where @RobWin what do you think?
One more thing: if we decide to remove
Hi
Therefore, thread with future interrupted after
Also being affected by this in a couple of systems. Not sure how we can help in supplying data.
I created a pull request with a fix. I also removed unnecessary
I found that we cannot remove
@fatso83 do you really need to enable automaticTransitionFromOpenToHalfOpen? I assume we didn't find this issue earlier because most people don't use it. You could disable it temporarily until it is fixed.
@Hartigan Thanks for your awesome analysis.
@RobWin Thanks for the hint! This worked for us. We are quite a big org where we had a wrapper lib around resilience4j where this was enabled by default. Disabling it worked.
Should not this be closed by #2072, now that it is merged?
@RobWin We were discussing this at work and we don't really understand how/why you believe the library is being used with the automatic state handling disabled. I am guessing this is down to some misunderstanding (on our part, most probably 😄). I would think automatic handling is something most people would want? Otherwise one has to manually implement some kind of logic to close the circuit. Something seems off. Right now it seems like we need to listen for state transitions and create a Timer that asks to close the circuit x seconds into the future. This seems a bit simplistic. What are we missing here? This is just us trying to learn more/understand more, since we think we are missing out on some obvious fact.
@fatso83 The CircuitBreaker always automatically switches from open to half-open when a new request is processed. But it could mean that the state is shown as OPEN for a longer period of time until a new request arrives. That confused some people who were looking at the CircuitBreaker state in systems which didn't have any traffic. But from the functional point of view there is no difference, except that it introduces problems right now :(
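To make the distinction above concrete, here is a minimal sketch of a breaker that only leaves OPEN when a request arrives. It is a toy model, not Resilience4j code: the class `LazyBreakerSketch`, its methods, and the explicit `nowMillis` parameter are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy breaker for illustration only; not the Resilience4j implementation.
class LazyBreakerSketch {
    enum State { OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.OPEN);
    private final long retryAfterMillis;

    LazyBreakerSketch(long openedAtMillis, long waitMillis) {
        this.retryAfterMillis = openedAtMillis + waitMillis;
    }

    // What a dashboard or alert would see. Nothing here looks at the clock.
    State reportedState() {
        return state.get();
    }

    // The transition happens only as a side effect of a request arriving.
    boolean tryAcquirePermission(long nowMillis) {
        if (state.get() == State.OPEN && nowMillis >= retryAfterMillis) {
            state.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        return state.get() == State.HALF_OPEN;
    }

    public static void main(String[] args) {
        LazyBreakerSketch breaker = new LazyBreakerSketch(0L, 1_000L);
        // Long after the wait duration, but no request has arrived yet.
        System.out.println(breaker.reportedState());
        // The first request after the wait duration triggers the transition.
        System.out.println(breaker.tryAcquirePermission(5_000L));
        System.out.println(breaker.reportedState());
    }
}
```

In this model the reported state stays OPEN for as long as no request comes in, which is the situation that would keep an "open circuit" alert firing on an idle system.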
Aha, thank you. That also explains why it was an issue for us, as we trigger alerts when the circuit breakers are open. So to have the circuit breaker alerts not be warning us all night we would need this. Makes sense, thanks!
Well, if you have a production system which has no traffic at night that would change the state automatically from open to half-open, why have warnings at night at all ;P
Seems the issue is not yet fixed.
Resilience4j version: 2.1.0
Java version: any
I've encountered a java.lang.StackOverflowError due to an infinite loop in CircuitBreakerStateMachine$OpenState.tryAcquirePermission on a production server, which was under strain.
After reviewing the code, I suspect it is caused by a failure in the AtomicReference getAndUpdate call (CircuitBreakerStateMachine.java:333), as such failures can occur due to contention among threads, according to the Javadoc. As an aside, the Javadoc also states that the updateFunction should be side-effect free, which is not the case here.
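As a small illustration of the Javadoc remark, the sketch below shows an update function being applied twice within a single getAndUpdate call. The contention is simulated from the same thread so the result is repeatable; the class name, the enum, and the counter are hypothetical and have nothing to do with the Resilience4j source.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; none of these names come from Resilience4j.
class GetAndUpdateRetryDemo {
    enum State { OPEN, FORCED_OPEN, HALF_OPEN }

    // Returns how many times the update function ran during ONE getAndUpdate call.
    static int countUpdateFunctionCalls() {
        AtomicReference<State> ref = new AtomicReference<>(State.OPEN);
        AtomicInteger calls = new AtomicInteger();

        ref.getAndUpdate(current -> {
            // Side effect: the counter is bumped on every attempt, not once per call.
            if (calls.incrementAndGet() == 1) {
                // Simulate a competing writer changing the reference between the read
                // and the compare-and-set, so the first attempt is discarded.
                ref.set(State.FORCED_OPEN);
            }
            return State.HALF_OPEN;
        });
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("update function calls: " + countUpdateFunctionCalls());
    }
}
```

Any side effect placed inside the update function would run once per attempt, which is why the Javadoc asks for a side-effect-free function.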
Once the updateFunction fails during an attempted transition from the open to the half-open state, the AtomicReference CircuitBreakerStateMachine.stateReference still refers to an OpenState whose AtomicBoolean OpenState.isOpen has already been set to false (CircuitBreakerStateMachine.java:812), so the transitionToHalfOpenState() call will not be attempted again. Additionally, retryAfterWaitDuration is already in the past for this OpenState object. The effect is that OpenState.tryAcquirePermission calls itself recursively (CircuitBreakerStateMachine.java:737) until the StackOverflowError occurs.
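The failure mode described in this paragraph can be reproduced in a stripped-down model. This is only a sketch under stated assumptions, not the library code: the class `StuckOpenStateModel` is hypothetical, the wait duration is assumed to have elapsed, and the lost transition is modelled as a method that does nothing.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Simplified, hypothetical model of the failure mode; NOT the Resilience4j source.
class StuckOpenStateModel {
    interface State { boolean tryAcquirePermission(); }

    final AtomicReference<State> stateReference = new AtomicReference<>();

    class OpenState implements State {
        final AtomicBoolean isOpen = new AtomicBoolean(true);
        // Assumption: the retry-after instant is already in the past.
        final boolean waitDurationElapsed = true;

        @Override
        public boolean tryAcquirePermission() {
            if (waitDurationElapsed) {
                // Only the first caller flips the flag and attempts the transition.
                if (isOpen.compareAndSet(true, false)) {
                    transitionToHalfOpen();
                }
                // Delegate to the current state. If that is still this object,
                // the call recurses with nothing left to break the cycle.
                return stateReference.get().tryAcquirePermission();
            }
            return false;
        }
    }

    // Assumption for the sketch: the transition is lost, so stateReference is unchanged.
    void transitionToHalfOpen() { }

    // Returns true if acquiring a permission ends in a StackOverflowError.
    static boolean overflows() {
        StuckOpenStateModel model = new StuckOpenStateModel();
        model.stateReference.set(model.new OpenState());
        try {
            model.stateReference.get().tryAcquirePermission();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("overflows: " + overflows());
    }
}
```

In this model, once isOpen is false and the reference still points at the same OpenState, every call takes the delegation path again, so the only way out is the stack running out.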
Please see the attached screenshot displaying a partial stack trace of the StackOverflowError, which was generated under resilience4j-circuitbreaker v1.7.1, but the relevant code sections have not changed since then.