New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow configuring a timeout for Endpoint
selection in EndpointGroup
#4246
Conversation
Motivation: A `DynamicEndpointGroup` lazily resolves `Endpoint`s even after a request is started. Armeria clients use a connection timeout to wait for the late `Endpoint`s. https://github.com/line/armeria/blob/cc1656e6c9f7ba6b04d115e0298e90b8cf7c0971/core/src/main/java/com/linecorp/armeria/client/DefaultClientRequestContext.java#L320 The default connection timeout is 3.2 seconds that might be enough for a single `EndpointGroup`. However, `HealthCheckedEndpointGroup` wraps an `EndpointGroup` that could be a `DynamicEndpointGroup` as well. Let's say the delegate `EndpointGroup` is a DNS `EndpointGroup`. If a request is started, a DNS resolution and HTTP health check should be preceded. If a server starts, it could take more resources and 3.2 seconds that can not make sense to a complex `EndpointGroup` structure. Modifications: - Add `selectionTimeoutMillis()` to `EndpointGroup` so that a `EndpointSelector` uses the value to schedule a timeout. - Add `selectionTimeout{Millis}(long)` to `AbstractDynamicEndpointGroupBuilder` in order to override the timeout. - Set 0 to the default selection timeout for static endpoints. - Set the default connection timeout to the default selection timeout for `DynamicEndpointGrouop`. - Set the default response timeout to the default selection timeout for `HealthCheckedEndpointGroup`, `ZooKeeperEndpointGroup`, `EurekaEndpointGroup` and `ConsulDnsEndpointGroup`. - `HealthCheckedEndpointGroup` adds its `selectionTimeout` and delegate's `selectionTimeout`. - Raise `EndpointSelectionTimeoutException` if `EndpointSelctor` fails to select an `Endpoint` within a timeout. - Make `ZooKeeperEndpointGroupBuilder` and `EurekaEndpointGroupBuilder` implment `DynamicEndpointGroupSetters`. - Deprecate) `AbstractEndpointSelector.select(ctx, executor, timeout)` in favor of `AbstractEndpointSelector.select(ctx, executor)` Result: - You can configure a timeout for an `Endpoint` selection of an `DynamicEndpointGroup`. ``` DnsAddressEndpointGroup delegate = DnsAddressEndpointGroup.of(...); HealthCheckedEndpointGroup healthGroup = HealthCheckedEndpointGroup .builder(delegate, "/health") .selectionTimeout(Duration.ofSeconds(10)) // 👈👈👈 .build(); ```
Codecov Report
@@ Coverage Diff @@
## master #4246 +/- ##
============================================
+ Coverage 73.27% 73.46% +0.18%
- Complexity 17253 17340 +87
============================================
Files 1473 1475 +2
Lines 64966 65093 +127
Branches 8158 8169 +11
============================================
+ Hits 47606 47821 +215
+ Misses 13218 13117 -101
- Partials 4142 4155 +13
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good! Left some minor questions
|
||
/** | ||
* Creates a new instance. | ||
*/ | ||
protected AbstractDynamicEndpointGroupBuilder() {} | ||
protected AbstractDynamicEndpointGroupBuilder() { | ||
this(Flags.defaultConnectTimeoutMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default for DnsEndpointGroupBuilder
will be connectTimeout
instead of responseTimeout
. I wanted to check if this is your intention.
If the intention is to set responseTimeout
as the default for pre-defined EndpointGroup
s in armeria
, what do you think of just removing the empty constructor?
We can catch these type of mistakes at compile-time this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response.
I think the default for DnsEndpointGroupBuilder will be connectTimeout instead of responseTimeout. I wanted to check if this is your intention.
Currently, the default select timeout for DnsEndpointGroup
is the default connection timeout.
armeria/core/src/test/java/com/linecorp/armeria/client/endpoint/SelectionTimeoutTest.java
Line 118 in af45d0b
assertSelectionTimeout(endpointGroup).isEqualTo(Flags.defaultConnectTimeoutMillis()); |
DNS servers usually return a response quickly because they cache DNS records. Therefore, 3.2 seconds may be reasonable.
By the way, I checked the default timeout for a DNS query. It is 5 seconds that can be a more sensible default timeout for `DnsEndpointGroup/
what do you think of just removing the empty constructor?
As this class is a public API, that could cause a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this class is a public API, that could cause a breaking change.
AbstractDynamicEndpointGroupBuilder is part of unstable APIs. Will remove the default constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds reasonable to use the same timeout for DnsEndpointGroup
and our DNS resolver.
* ... | ||
* .build(); | ||
* assert endpointGroup.selectionTimeoutMillis() == 13000; | ||
* }</pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; what do you think of adding an explanation that setting this will set both selectionTimeoutMillis
and initialSelectionTimeoutMillis
|
||
/** | ||
* Creates a new instance. | ||
* Creates a new instance with the default {@code selectionTimeoutMillis}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Creates a new instance with the default {@code selectionTimeoutMillis}. | |
* Creates a new instance with the specified default {@code selectionTimeoutMillis}. |
.../src/main/java/com/linecorp/armeria/client/endpoint/AbstractDynamicEndpointGroupBuilder.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/client/endpoint/AbstractEndpointSelector.java
Outdated
Show resolved
Hide resolved
@@ -143,6 +144,13 @@ static EndpointGroup of(EndpointSelectionStrategy selectionStrategy, | |||
@Override | |||
Endpoint selectNow(ClientRequestContext ctx); | |||
|
|||
/** | |||
* Returns the timeout to wait until a successful {@link Endpoint} selection. | |||
* {@code 0} means {@link #selectNow(ClientRequestContext)} should always return an {@link Endpoint}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also document what happens on the initial selection attempt? Will it wait indefinitely or just return null
?
core/src/main/java/com/linecorp/armeria/client/endpoint/dns/DnsEndpointGroupBuilder.java
Outdated
Show resolved
Hide resolved
.../linecorp/armeria/client/endpoint/healthcheck/AbstractHealthCheckedEndpointGroupBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/internal/client/dns/DnsUtil.java
Outdated
Show resolved
Hide resolved
.../com/linecorp/armeria/client/endpoint/healthcheck/HealthCheckedEndpointGroupBuilderTest.java
Show resolved
Hide resolved
...corp/armeria/client/endpoint/healthcheck/HealthCheckedEndpointGroupSelectionTimeoutTest.java
Outdated
Show resolved
Hide resolved
.../linecorp/armeria/client/endpoint/healthcheck/AbstractHealthCheckedEndpointGroupBuilder.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
LGTM once @trustin's comments are addressed. 😉
@@ -73,25 +82,41 @@ public final CompletableFuture<Endpoint> select(ClientRequestContext ctx, | |||
return UnmodifiableFuture.completedFuture(endpoint); | |||
} | |||
|
|||
final long selectionTimeoutMillis = endpointGroup.selectionTimeoutMillis(); | |||
if (selectionTimeoutMillis == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would it be better to use -1
to indicate StaticEndpointGroup
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to differentiate the timeout of StaticEndpointGroup
from others.
A DynamincEndpointGroup
may have 0 selection timeout if it is made from a static resource such as a file or an environment variable.
core/src/main/java/com/linecorp/armeria/client/endpoint/EndpointGroup.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment. Looks great otherwise!
* If unspecified, {@link Flags#defaultConnectTimeoutMillis()} is used by default. | ||
*/ | ||
@UnstableApi | ||
default DynamicEndpointGroupSetters selectionTimeout(Duration selectionTimeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selectionTimeout
or selectTimeout
? I feel like selectionTimeout
is correct, but we have connectTimeout
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably selectionTimeout
is better here because we have EndpointSelectionTimeoutException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, @ikhoon 🙇
Motivation:
A
DynamicEndpointGroup
lazily resolvesEndpoint
s even after arequest is started. Armeria clients use a connection timeout to wait for
the late
Endpoint
s.armeria/core/src/main/java/com/linecorp/armeria/client/DefaultClientRequestContext.java
Line 320 in cc1656e
The default connection timeout is 3.2 seconds which might be enough for a
single
EndpointGroup
.However,
HealthCheckedEndpointGroup
wraps anEndpointGroup
thatcould be a
DynamicEndpointGroup
as well. Let's say the delegateEndpointGroup
is a DNSEndpointGroup
. If a request is started, a DNSresolution and HTTP health check should be preceded. If a server
starts, it could take more resources and 3.2 seconds that can not make
sense to a complex
EndpointGroup
structure.Modifications:
selectionTimeoutMillis()
toEndpointGroup
so that aEndpointSelector
uses the value to schedule a timeout.selectionTimeout{Millis}(long)
toAbstractDynamicEndpointGroupBuilder
in order to override thetimeout.
for
DynamicEndpointGrouop
.for
HealthCheckedEndpointGroup
,ZooKeeperEndpointGroup
,EurekaEndpointGroup
andConsulDnsEndpointGroup
.HealthCheckedEndpointGroup
adds itsselectionTimeout
anddelegate's
selectionTimeout
.EndpointSelectionTimeoutException
ifEndpointSelctor
failsto select an
Endpoint
within a timeout.ZooKeeperEndpointGroupBuilder
andEurekaEndpointGroupBuilder
implment
DynamicEndpointGroupSetters
.AbstractEndpointSelector.select(ctx, executor, timeout)
in favor of
AbstractEndpointSelector.select(ctx, executor)
Result:
Endpoint
selection of anDynamicEndpointGroup
.