-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Transform] Retry Destination IndexNotFoundException #108394
Conversation
An Index can be removed from its previous shard in the middle of a Transform run. Ideally, this happens as part of the Delete API, and the Transform has already been stopped, but in the case that it isn't, we want to retry the checkpoint. If the Transform had been stopped, the retry will move the Indexer into a graceful shutdown. If the Transform had not been stopped, the retry will check if the Index exists or recreate the Index if it does not exist. This is currently how unattended Transforms work, and this change will make it so regular Transforms can also auto-recover from this error. Fix elastic#107263
Hi @prwhelan, I've created a changelog YAML for you. |
Pinging @elastic/ml-core (Team:ML) |
...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java
Outdated
Show resolved
Hide resolved
@@ -63,7 +64,7 @@ public static Throwable getFirstIrrecoverableExceptionFromBulkResponses(Collecti | |||
} | |||
|
|||
if (unwrappedThrowable instanceof ElasticsearchException elasticsearchException) { | |||
if (isExceptionIrrecoverable(elasticsearchException)) { | |||
if (isExceptionIrrecoverable(elasticsearchException) && isNotIndexNotFoundException(elasticsearchException)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we no longer consider IndexNotFoundException
irrecoverable, could we modify the isExceptionIrrecoverable
method to reflect that instead of adding the isNotIndexNotFoundException
method?
Or the distinction is still needed for cases other than bulk responses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question - at least for now I was thinking of just handling it for the destination index? The source index can still fail with IndexNotFoundException (if PIT == null): https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java#L602
I haven't seen this same type of intermittent failure scenario though, where the source index gets relocated to a new shard and the request fails on the old shard, so I'm not sure if we just haven't seen it yet or if search behaves differently where the request will automatically reroute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we just haven't seen it yet or if search behaves differently where the request will automatically reroute.
I don't know either unfortunately :(
But since we are going in a direction to make more errors recoverable (and more transforms unattended) it could make sense to make IndexNotFoundException
recoverable.
But maybe let's not do it in this PR if you don't feel comfortable.
We can leave this PR for dealing with destination index only.
Please make it clear in the PR title/description.
...orm/src/test/java/org/elasticsearch/xpack/transform/utils/ExceptionRootCauseFinderTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -63,7 +64,7 @@ public static Throwable getFirstIrrecoverableExceptionFromBulkResponses(Collecti | |||
} | |||
|
|||
if (unwrappedThrowable instanceof ElasticsearchException elasticsearchException) { | |||
if (isExceptionIrrecoverable(elasticsearchException)) { | |||
if (isExceptionIrrecoverable(elasticsearchException) && isNotIndexNotFoundException(elasticsearchException)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we just haven't seen it yet or if search behaves differently where the request will automatically reroute.
I don't know either unfortunately :(
But since we are going in a direction to make more errors recoverable (and more transforms unattended) it could make sense to make IndexNotFoundException
recoverable.
But maybe let's not do it in this PR if you don't feel comfortable.
We can leave this PR for dealing with destination index only.
Please make it clear in the PR title/description.
An Index can be removed from its previous shard in the middle of a Transform run. Ideally, this happens as part of the Delete API, and the Transform has already been stopped, but in the case that it isn't, we want to retry the checkpoint.
If the Transform had been stopped, the retry will move the Indexer into a graceful shutdown.
If the Transform had not been stopped, the retry will check if the Index exists or recreate the Index if it does not exist.
This is currently how unattended Transforms work, and this change will make it so regular Transforms can also auto-recover from this error.
Fix #107263