
l-trotta (Contributor) commented on Mar 3, 2025:

Adds new retry functionality to the client, configurable in the Transport options like so:

        RestClient restClient = RestClient
            .builder(new HttpHost(address.getHostString(), address.getPort(), "http"))
            .build();

        // set the retry policy through the transport options
        RestClientOptions options = new RestClientOptions(RequestOptions.DEFAULT, false,
            BackoffPolicy.constantBackoff(50L, 8));

        ElasticsearchTransport transport = new RestClientTransport(
            restClient, new JacksonJsonpMapper(), options);

        ElasticsearchAsyncClient esClient = new ElasticsearchAsyncClient(transport);
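
To make the backoff argument concrete: assuming BackoffPolicy.constantBackoff(50L, 8) means a fixed 50 ms delay between attempts and at most 8 retries (an assumption; the signature is not shown in this PR description), such a policy reduces to an iterator of delays that the retrying client consumes one entry per retry. A minimal sketch with a hypothetical ConstantBackoff class, not the client's actual BackoffPolicy:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a constant backoff policy; NOT the client's BackoffPolicy class.
final class ConstantBackoff implements Iterable<Long> {
    private final long delayMillis;
    private final int maxRetries;

    ConstantBackoff(long delayMillis, int maxRetries) {
        this.delayMillis = delayMillis;
        this.maxRetries = maxRetries;
    }

    // Each call returns a fresh iterator, so every request gets its own retry budget.
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int used = 0;

            @Override
            public boolean hasNext() {
                return used < maxRetries;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException("retry budget exhausted");
                }
                used++;
                return delayMillis;
            }
        };
    }

    public static void main(String[] args) {
        long total = 0;
        int retries = 0;
        for (long delay : new ConstantBackoff(50L, 8)) {
            total += delay;
            retries++;
        }
        System.out.println(retries + " retries, " + total + " ms of waiting in total"); // 8 retries, 400 ms of waiting in total
    }
}
```

Handing out a new iterator per request is what lets one shared policy object serve many concurrent requests without them consuming each other's retries.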

Some open questions remain as TODOs in the code (they break checkstyle; ignore that for now).

Unit tests are available in the TransportTest class.

l-trotta requested a review from swallez on Mar 3, 2025, 17:00.

swallez (Member) left a comment:

Left some comments on implementation independence and async tasks.

Adding this to TransportOptions is an interesting choice, as it allows using different retry policies with a single transport.

        import java.util.Iterator;
        import java.util.concurrent.CompletableFuture;

        public class RetryRestClientHttpClient implements TransportHttpClient {

swallez (Member):

This class should be co.elastic.transport.http.RetryingHttpClient and should not depend on org.elasticsearch.client, so that it stays independent of the HTTP client implementation (see also the comment below on exception handling).

        try {
            return delegate.performRequest(endpointId, node, request, options);
        } catch (ResponseException e) {
            if (e.getResponse().getStatusLine().getStatusCode() == 503) { // TODO list of statuses, configurable or hardcoded?

swallez (Member):

We should be independent of the HTTP client here. Getting an exception that carries an actual response is specific to RestClient, because RestClient has its own internal multi-node retry logic and because 503 is not part of its "ignore" setting.

So the logic could be:

  • if we have a successful response, check its status code and retry on 5xx (will not happen with RestClient, but will happen with less smart implementations)
  • if we have an exception, retry.
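
The two rules above can be written without referring to any RestClient type, which is also what the independence comment on RetryRestClientHttpClient asks for. A minimal sketch, assuming each attempt can be reduced to either a status code or an exception; RetryRules and Outcome are hypothetical names, not existing client classes:

```java
import java.io.IOException;

final class RetryRules {
    // Rule 1: a response returned normally is retried only on a 5xx status
    // (RestClient reports these as exceptions itself; simpler clients may not).
    // Rule 2: any exception thrown by the delegate is retried.
    static boolean shouldRetry(Outcome outcome) {
        if (outcome.failure != null) {
            return true;
        }
        int status = outcome.statusCode;
        return status >= 500 && status <= 599;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(Outcome.response(200)));  // false
        System.out.println(shouldRetry(Outcome.response(503)));  // true
        System.out.println(shouldRetry(Outcome.failure(new IOException("connection reset")))); // true
    }
}

// Hypothetical, client-agnostic view of one attempt: exactly one field is non-null.
final class Outcome {
    final Integer statusCode;  // set when the delegate returned a response
    final Exception failure;   // set when the delegate threw

    private Outcome(Integer statusCode, Exception failure) {
        this.statusCode = statusCode;
        this.failure = failure;
    }

    static Outcome response(int statusCode) {
        return new Outcome(statusCode, null);
    }

    static Outcome failure(Exception e) {
        return new Outcome(null, e);
    }
}
```

Keeping the decision in one small function also gives an obvious place to answer the "list of statuses, configurable or hardcoded?" TODO later.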

            return performRequestRetry(endpointId, node, request, options, backoffPolicy.iterator());
        }

        public Response performRequestRetry(String endpointId, @Nullable Node node, Request request,

swallez (Member):

Why do we need a separate public method?

        try {
            Thread.sleep(backoffIter.next());
        } catch (InterruptedException ie) {
            throw e; // TODO okay with masking IE and just returning original exception?

swallez (Member):

Yep. InterruptedException indicates that someone called Thread.interrupt(), which isn't really useful here.
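
A minimal sketch of the synchronous retry loop under discussion, assuming the request can be wrapped in a java.util.concurrent.Callable and that the backoff iterator yields delays in milliseconds; SyncRetry.call is a hypothetical helper. On interruption it rethrows the original failure, as agreed above, and additionally restores the thread's interrupt flag, a common Java convention so that callers can still observe the interruption:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;

final class SyncRetry {
    // Runs the call, sleeping for each backoff delay between failed attempts.
    // Simplification: every exception is treated as retryable here.
    static <T> T call(Callable<T> attempt, Iterator<Long> backoff) throws Exception {
        while (true) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (!backoff.hasNext()) {
                    throw e; // retry budget exhausted: surface the last failure
                }
                try {
                    Thread.sleep(backoff.next());
                } catch (InterruptedException ie) {
                    // Mask the interruption behind the original failure,
                    // but keep the interrupt status visible to callers.
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = call(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated 503");
            }
            return "ok";
        }, List.of(10L, 10L, 10L).iterator());
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

This blocking form only suits synchronous callers; the async path should not sleep at all.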

        if (((ResponseException) e).getResponse().getStatusLine().getStatusCode() == 503) { // TODO list of statuses, configurable or hardcoded?
            if (backoffIter.hasNext()) {
                try {
                    Thread.sleep(backoffIter.next());

swallez (Member):

Avoid calling Thread.sleep in asynchronous code. It may block all threads of the calling thread pool and prevent other tasks from making progress.

A caveat of Timer is that it uses a single thread to run its tasks, but this should be good enough here, as the code to be run when the timer triggers is small (restarting the async request or propagating an exception).
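
A minimal sketch of the non-blocking alternative using java.util.Timer, as the comment suggests: instead of sleeping, each retry is scheduled on the timer and the outcome is propagated through a CompletableFuture. AsyncRetry and the Supplier-based request are hypothetical; a real implementation would plug in the client's own async call and the retry rules discussed above:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class AsyncRetry {
    // A single daemon thread runs all scheduled retries; its tasks only restart
    // a request or complete a future, so one thread should be enough.
    private static final Timer TIMER = new Timer("retry-timer", true);

    static <T> CompletableFuture<T> call(Supplier<CompletableFuture<T>> attempt, Iterator<Long> backoff) {
        CompletableFuture<T> result = new CompletableFuture<>();
        runAttempt(attempt, backoff, result);
        return result;
    }

    private static <T> void runAttempt(Supplier<CompletableFuture<T>> attempt, Iterator<Long> backoff,
                                       CompletableFuture<T> result) {
        attempt.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (!backoff.hasNext()) {
                result.completeExceptionally(error); // budget exhausted: propagate the failure
            } else {
                // Schedule the next attempt instead of blocking the current thread.
                TIMER.schedule(new TimerTask() {
                    @Override
                    public void run() {
                        runAttempt(attempt, backoff, result);
                    }
                }, backoff.next());
            }
        });
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String value = call(() -> {
            calls[0]++;
            return calls[0] < 3
                ? CompletableFuture.<String>failedFuture(new IllegalStateException("simulated 503"))
                : CompletableFuture.completedFuture("ok");
        }, List.of(20L, 20L, 20L).iterator()).join();
        System.out.println(value + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

The timer thread is created as a daemon so that pending retries do not keep the JVM alive on shutdown.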
