-
Notifications
You must be signed in to change notification settings - Fork 275
Transport level retry #954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments on implementation independence and async tasks.
Adding this to TransportOptions
is an interesting choice, as it allows using different retry policies with a single transport.
import java.util.Iterator; | ||
import java.util.concurrent.CompletableFuture; | ||
|
||
public class RetryRestClientHttpClient implements TransportHttpClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be co.elastic.transport.http.RetryingHttpClient
and be independent from org.elasticsearch.client
so that it can be independent of the http client implementation (see also below on exception handling)
try { | ||
return delegate.performRequest(endpointId, node, request, options); | ||
} catch (ResponseException e) { | ||
if (e.getResponse().getStatusLine().getStatusCode() == 503) { // TODO list of statuses, configurable or hardcoded? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be independent of the http client here, and the fact that we have an exception with an actual response is specific to RestClient
, because it has its own internal multi-node retry logic and because 503
is not part of the "ignore"
setting.
So the logic could be:
- if we have a successful response, check its status code and retry on
5xx
(will not happen with RestClient, but will happen with less smart implementations) - if we have an exception, retry.
return performRequestRetry(endpointId, node, request, options, backoffPolicy.iterator()); | ||
} | ||
|
||
public Response performRequestRetry(String endpointId, @Nullable Node node, Request request, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate public method?
try { | ||
Thread.sleep(backoffIter.next()); | ||
} catch (InterruptedException ie) { | ||
throw e; // TODO okay with masking IE and just returning original exception? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, InterruptedException
indicates that someone called Thread.interrupt()
which isn't really useful.
if (((ResponseException) e).getResponse().getStatusLine().getStatusCode() == 503) { // TODO list of statuses, configurable or hardcoded? | ||
if (backoffIter.hasNext()) { | ||
try { | ||
Thread.sleep(backoffIter.next()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid calling Thread.sleep
in asynchronous code. It may cause all threads of the calling thread pool to be blocked and prevent other tasks to make progress.
A caveat of Timer
is that it uses a single thread to run its tasks, but this should be good enough here, as the code to be run when the timer triggers is small (restarting the async request or propagating an exception).
Adds new retry functionality to the client, configurable in the Transport options like so:
Some doubts remaining in the form of TODOs in the code. (they break checkstyle ignore it for now)
Unit tests available in TransportTest.class