Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teach reindex to retry on search failures #18331

Merged
merged 1 commit into from May 17, 2016

Conversation

Projects
None yet
5 participants
@nik9000
Copy link
Contributor

commented May 13, 2016

This uses the same backoff policy we use for bulk and just retries until the request isn't rejected.

Closes #18059

* Initial delay after a rejection before retrying a bulk request. With the default maxRetries the total backoff for retrying rejections
* is about one minute per bulk request. Once the entire bulk request is successful the retry counter resets.
*/
public Self setRetryBackoffInitialTime(TimeValue retryBackoffInitialTime) {

This comment has been minimized.

Copy link
@nik9000

nik9000 May 13, 2016

Author Contributor

These weren't properly exposed by the builder.

@@ -26,12 +26,11 @@

public abstract class AbstractBulkIndexByScrollRequestBuilder<
Request extends AbstractBulkIndexByScrollRequest<Request>,
Response extends BulkIndexByScrollResponse,

This comment has been minimized.

Copy link
@nik9000

nik9000 May 13, 2016

Author Contributor

This Response isn't required any more!

@nik9000 nik9000 added WIP review and removed review WIP labels May 13, 2016

@nik9000

This comment has been minimized.

Copy link
Contributor Author

commented May 13, 2016

@tlrx, want to have a look?

@nik9000

This comment has been minimized.

Copy link
Contributor Author

commented May 13, 2016

Or @dakrone !

@dakrone

View changes

...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java Outdated

@Override
public void onFailure(Throwable e) {
logger.warn("Retrying {}", e, action);

This comment has been minimized.

Copy link
@dakrone

dakrone May 13, 2016

Member

In the case of a large number of rejections, is this going to make the logs too verbose?

This comment has been minimized.

Copy link
@dakrone

dakrone May 13, 2016

Member

Also I think this should be moved into the below if statement so we don't log if we're not retrying?

@dakrone

View changes

...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java Outdated

private boolean causedByRejection(Throwable e) {
while (e != null) {
if (e instanceof EsRejectedExecutionException) {

This comment has been minimized.

Copy link
@dakrone

dakrone May 13, 2016

Member

We have code in ExceptionsHelper for this (unwrapCause I believe)

This comment has been minimized.

Copy link
@nik9000

nik9000 May 16, 2016

Author Contributor

I tried that one and it didn't unwrap through SearchPhaseExecutionException or ReduceSearchPhaseException. But ExceptionsHelper.unwrap seems to do it!

@dakrone

View changes

...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java Outdated
if (causedByRejection(e) && retries.hasNext()) {
threadPool.schedule(retries.next(), ThreadPool.Names.SAME, this);
task.countSearchRetry();
return;

This comment has been minimized.

Copy link
@dakrone

dakrone May 13, 2016

Member

I think it'd be cleaner to put the finishHim(e); into an else statement than to return early in this logic

@dakrone

This comment has been minimized.

Copy link
Member

commented May 13, 2016

Left a couple of comments but otherwise looks good!

@tlrx

View changes

modules/reindex/src/main/java/org/elasticsearch/index/reindex/BulkByScrollTask.java Outdated
@@ -208,7 +215,8 @@ public XContentBuilder innerXContent(XContentBuilder builder, Params params)
builder.field("batches", batches);
builder.field("version_conflicts", versionConflicts);
builder.field("noops", noops);
builder.field("retries", retries);
builder.field("bulk_retries", bulkRetries);
builder.field("search_retries", searchRetries);

This comment has been minimized.

Copy link
@tlrx

tlrx May 16, 2016

Member

What about :

"retries": {
  "bulk": 0,
  "search": 0,
}

Note: I tend to like JSON inner objects since clients and parsers can skip whole objects while parsing...

@nik9000

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2016

@dakrone and @tlrx I pushed a few more commits to address some comments and try and tighten up RetryTests.

@dakrone

View changes

...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java Outdated
threadPool.schedule(retries.next(), ThreadPool.Names.SAME, this);
task.countSearchRetry();
} else {
logger.warn("Giving up on search because we retries {} time without success.", e, retries);

This comment has been minimized.

Copy link
@dakrone

dakrone May 16, 2016

Member

It's super minor, but our log standardization is usually all lowercase

This comment has been minimized.

Copy link
@dakrone

dakrone May 16, 2016

Member

Also, s/retries/retried/

This comment has been minimized.

Copy link
@dakrone

dakrone May 16, 2016

Member

and I think we should leave periods out of log messages too

@dakrone

View changes

...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java Outdated
public void onFailure(Throwable e) {
if (ExceptionsHelper.unwrap(e, EsRejectedExecutionException.class) != null) {
if (retries.hasNext()) {
logger.trace("Retrying rejected search.", e);

This comment has been minimized.

Copy link
@dakrone

dakrone May 16, 2016

Member

Same here with period and lowercase

@dakrone

View changes

...s/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java Outdated
finishHim(e);
}
} else {
logger.warn("Giving up on search because it failed with a non-retryable exception.", e);

This comment has been minimized.

Copy link
@dakrone

dakrone May 16, 2016

Member

Same here

@@ -208,7 +215,11 @@ public XContentBuilder innerXContent(XContentBuilder builder, Params params)
builder.field("batches", batches);
builder.field("version_conflicts", versionConflicts);
builder.field("noops", noops);
builder.field("retries", retries);
builder.startObject("retries"); {

This comment has been minimized.

Copy link
@dakrone

dakrone May 16, 2016

Member

This is a breaking change now so this PR needs to be marked as "breaking" and a note added to the migration guide

@dakrone

This comment has been minimized.

Copy link
Member

commented May 16, 2016

Left a few more comments, code looks good though

@tlrx

This comment has been minimized.

Copy link
Member

commented May 17, 2016

Besides @dakrone comments, LGTM

@nik9000

This comment has been minimized.

Copy link
Contributor Author

commented May 17, 2016

@dakrone and @tlrx: I pushed the logging fixes and breaking changes docs.

@dakrone

This comment has been minimized.

Copy link
Member

commented May 17, 2016

still LGTM

@nik9000

This comment has been minimized.

Copy link
Contributor Author

commented May 17, 2016

Cool. I'll squash a merge then!

@nik9000 nik9000 force-pushed the nik9000:reindex_retry_search_failures branch 2 times, most recently May 17, 2016

Reindex should retry on search failures
This uses the same backoff policy we use for bulk and just retries until
the request isn't rejected.

Instead of `{"retries": 12}` in the response to count retries this now
looks like `{"retries": {"bulk": 12", "search": 1}`.

Closes #18059

@nik9000 nik9000 force-pushed the nik9000:reindex_retry_search_failures branch to fe4823e May 17, 2016

@nik9000 nik9000 merged commit fe4823e into elastic:master May 17, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.