-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a way to detect unhealthy connections #5763
Conversation
Motivation: Currently, there is no extension point to detect errors for specific connections and terminate connections that are unhealthy. Related: line#5751 Modifications: - TBU Result:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in terms of correctness. Left only nit comments 👍 👍 👍
core/src/main/java/com/linecorp/armeria/client/HttpSessionHandler.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/common/outlier/OutlierDetectionBuilder.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/common/outlier/OutlierDetectionBuilder.java
Show resolved
Hide resolved
* {@link OutlierDetectingRule} and {@link OutlierDetector}. | ||
*/ | ||
@UnstableApi | ||
public interface OutlierDetection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional) would OutlierDetectorFactory
be a clearer name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought about the name, OutlierConfig
and OutlierContext
.
OutlierDetectorFactory
wasn't chosen since it had OutlierDetectingRule
.
I'm not strong on the name. Let's listen to other folks' opinions and choose a better one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutlierDetection
sounds OK to me, but OutlierDetectingRule
could be renamed to OutlierRule
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutlierDetectorFactory wasn't chosen since it had OutlierDetectingRule.
It seems like OutlierDetectingRule
is used alongside OutlierDetector
, and both are retrieved and created from this OutlierDetection
. What do you think of adding OutlierDetectingRule
to OutlierDetector
instead?
outlierDetector.rule().decide(...)
core/src/main/java/com/linecorp/armeria/common/outlier/OutlierDetectingRuleBuilder.java
Outdated
Show resolved
Hide resolved
outlierDetector.onSuccess(); | ||
// The connection was marked as an outlier, but it's back to normal. | ||
if (outlierDetector.isOutlier()) { | ||
markUnacquirable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I would still recommend users (especially those using HTTP1) to use a RetryingClient
as I think it's possible that requests that unluckily acquired this connection could be failed due to this call depending on the timing. Although I know the code path to markUnacquirable
is already exposed to the user, I feel like the timing of this call is more unpredictable with this change.
e.g. If canSendRequest = false
:
armeria/core/src/main/java/com/linecorp/armeria/client/AbstractHttpRequestHandler.java
Line 160 in 2219de1
id = session.incrementAndGetNumRequestsSent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. What do you think of explicitly executing this callback in the EventLoop to guarantee the execution order?
core/src/main/java/com/linecorp/armeria/common/outlier/OutlierDetectingRuleBuilder.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/client/HttpSessionHandler.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/common/outlier/OutlierDetectionBuilder.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/client/ClientFactoryOptions.java
Show resolved
Hide resolved
* A builder for creating an {@link OutlierDetectingRule}. | ||
*/ | ||
@UnstableApi | ||
public final class OutlierDetectingRuleBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extend AbstractRuleBuilder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are similar but AbstractRuleBuilder
is in the client
package and ClientRequestContext
is used while OutlierDetectingRuleBuilder
uses RequestContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Left a few suggestions. 👍
core/src/main/java/com/linecorp/armeria/client/HttpSessionHandler.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/client/HttpSessionHandler.java
Outdated
Show resolved
Hide resolved
* {@link OutlierDetectingRule} and {@link OutlierDetector}. | ||
*/ | ||
@UnstableApi | ||
public interface OutlierDetection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutlierDetectorFactory wasn't chosen since it had OutlierDetectingRule.
It seems like OutlierDetectingRule
is used alongside OutlierDetector
, and both are retrieved and created from this OutlierDetection
. What do you think of adding OutlierDetectingRule
to OutlierDetector
instead?
outlierDetector.rule().decide(...)
They are used together but the role of |
Haven't thought about this case. Then, I'm fine with the current design. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that you want to use this for xDS but I found there's a mismatch with the Envoy implementation so I've left a question. PTAL. 😉
core/src/main/java/com/linecorp/armeria/client/HttpSessionHandler.java
Outdated
Show resolved
Hide resolved
private final EventCounter counter; | ||
private final double failureRateThreshold; | ||
private final long minimumRequestThreshold; | ||
private volatile boolean isOutlier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if it once becomes an outlier, the client will never use it unlike Envoy bring it back after rejection time. Is that correct?
https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/outlier#ejection-algorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outlier algorithm of Envoy is designed at the host (endpoint) level. Hosts do not tend to change for a long time. So, after ejection_time, sending requests again like a circuit breaker is necessary.
To provide the same functionality as Envoy, we may create an OutlierDetectingEndpointGroup
using OutlierDetector
.
- If a host is determined as an outlier, the host is excluded from healthy endpoints for
ejection_time
. - After
ejection_time
, if the host is still in theendpoints
, a newOutlierDetector
is created for it. ejection_time
option will be provided byOutlierDetectingEndpointGroup
In fact, I implemented a similar EndpointGroup
internally for the LINE push server. It would be better to modify the implementation and add it to Armeria.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! 👍 👍 👍
Motivation:
Currently, there is no extension point to detect errors for specific connections and terminate connections that are unhealthy.
Related: #5717 #5751
API design:
OutlierDetector
is similar toCircuitBreaker
, but the state is simpler, only one direction, and optimized for ephemeral resources such as connections and streams.OutlierDetector
determines whether the target is an outlier based ononSuccess()
andonFailure()
events.OutlierDetectingRule
is used to decide whether a request fails.OutlierDetectionDecision
is the result ofOutlierDetectingRule
.OutlierDetectionDecision.FATAL
is a special result that can immediately mark the target as an outlier.Example:
Modifications:
OutlierDetector
,OutlierDetectingRule
andOutlierDetectionDecision
and their implementations tocommon.outlier
package.EventCounter
,EventCount
andSlidingWindowCounter
tocommon.util
package and exposeEventCounter
andEventCount
as public APIs to minimize duplication.SlidingWindowCounter
can be created withEventCounter.ofSlidingWindow(...)
.CircuitBreakerRule
,OutlierDetectingRule
returns a decision synchronously and is simplified to look at headers and causes. Because:OutlierDetector
inHttpSessionHandler
OutlierDetector
is added right afterHttpSessionHandler.invoke()
is called.CircuitBreakerListener.onEventCountUpdated(String,.circuitbreaker.EventCount)
has been deprecated in favor ofCircuitBreakerListener.onEventCountUpdated(String,.util.EventCount)
.ClientFactoryBuilder.connectionOutlierDetection(OutlierDetection)
to detect unhealthy connection.Result:
OutlierDetection
to detect unhealthy connections and close them gracefully.