Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve circuit breaker to reject excess attempts in half-open #323

Conversation

Ladicek
Copy link
Contributor

@Ladicek Ladicek commented Nov 26, 2020

Previously, circuit breakers in the half-open state would let
all invocations go through. In case of any failure, it would
move back to open, and in case of a configured number of consecutive
successes, it would move to closed.

This is fine and passes the TCK, but under high load, this can
cause a thundering herd problem on the guarded service.

A better solution, implemented by this commit, is: in half-open,
only allow the first successThreshold invocations to go through
(also known as "probe invocations"), and reject the rest. If any
of those probe invocations fail, move back to open; if all of them
succeed, move to closed.

Fixes #319

@Ladicek Ladicek linked an issue Nov 26, 2020 that may be closed by this pull request
@Ladicek
Copy link
Contributor Author

Ladicek commented Nov 26, 2020

Weird, that test passes for me just fine locally. Need to investigate.

@Ladicek
Copy link
Contributor Author

Ladicek commented Nov 26, 2020

OK, it does fail fairly often when I run the test repeatedly.

@Ladicek
Copy link
Contributor Author

Ladicek commented Nov 26, 2020

Got it, it's a concurrency issue that causes an assertion to fail, which causes executor shutdown, which causes thread interruptions, which manifests as a weird test failure :-)

@Ladicek
Copy link
Contributor Author

Ladicek commented Nov 26, 2020

Actually we've had the concurrency issue several times before, always solved it with using a pair of CountDownLatches instead of just one. I got tired of that, so I built a simple abstraction over that called Party (yay! 🎈 🎉 🍾).

Previously, circuit breakers in the half-open state would let
all invocations go through. In case of any failure, it would
move back to open, and in case of a configured number of consecutive
successes, it would move to closed.

This is fine and passes the TCK, but under high load, this can
cause a thundering herd problem on the guarded service.

A better solution, implemented by this commit, is: in half-open,
only allow the first `successThreshold` invocations to go through
(also known as "probe invocations"), and reject the rest. If any
of those probe invocations fail, move back to open; if all of them
succeed, move to closed.
@Ladicek Ladicek force-pushed the half-open-circuit-breaker-reject-excess-attempts branch from 30650c1 to ac5419f Compare November 26, 2020 13:41
@Ladicek Ladicek merged commit 7695b54 into smallrye:next Nov 26, 2020
@Ladicek Ladicek deleted the half-open-circuit-breaker-reject-excess-attempts branch November 26, 2020 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

investigate handling of half-open circuit breakers
1 participant