Fix sporadic failures in AsyncSearchAsyncTests #53375
Shard group failure callbacks should be executed before incrementing the total operations. This is required to ensure that we don't notify a shard group failure **after** the completion callback.
This change ensures that we set the isRunning flag to `false` when storing the initial response of an async search request.
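The ordering requirement for shard group failures can be illustrated with a minimal, single-threaded sketch. Everything here is hypothetical (`ShardGroupOrderingSketch`, `Tracker`, `Listener`, and the method names are invented for illustration and are not the Elasticsearch classes); it only assumes that completion fires when an operation counter reaches the expected total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the Elasticsearch implementation.
public class ShardGroupOrderingSketch {

    // Assumed listener shape: one failure callback per shard group, one completion callback.
    interface Listener {
        void onShardGroupFailure(int shardIndex, Exception cause);
        void onCompletion();
    }

    static final class Tracker {
        private final int expectedOps;
        private final AtomicInteger totalOps = new AtomicInteger();
        private final Listener listener;

        Tracker(int expectedOps, Listener listener) {
            this.expectedOps = expectedOps;
            this.listener = listener;
        }

        void onShardSuccess() {
            finishOperation();
        }

        void onShardFailure(int shardIndex, Exception cause) {
            // Notify the failure first. Incrementing the counter may reach the
            // expected total and fire the completion callback, so the failure
            // notification has to happen before the increment.
            listener.onShardGroupFailure(shardIndex, cause);
            finishOperation();
        }

        private void finishOperation() {
            if (totalOps.incrementAndGet() == expectedOps) {
                listener.onCompletion();
            }
        }
    }

    // Records the callback order for two shards, where the last one fails.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Tracker tracker = new Tracker(2, new Listener() {
            @Override
            public void onShardGroupFailure(int shardIndex, Exception cause) {
                events.add("failure:" + shardIndex);
            }

            @Override
            public void onCompletion() {
                events.add("completion");
            }
        });
        tracker.onShardSuccess();
        tracker.onShardFailure(1, new RuntimeException("shard failed"));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [failure:1, completion]
    }
}
```

If `onShardFailure` in this sketch incremented the counter before notifying, the last failing shard would report its failure after `onCompletion` had already run, which is the ordering the description rules out.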
Pinging @elastic/es-search (:Search/Search)
I left two questions
...lugin/core/src/main/java/org/elasticsearch/xpack/core/search/action/AsyncSearchResponse.java
@@ -83,7 +83,10 @@ public void onResponse(AsyncSearchResponse searchResponse) {
onFatalFailure(searchTask, cause, false, submitListener);
} else {
final String docId = searchTask.getSearchId().getDocId();
store.storeInitialResponse(docId, searchTask.getOriginHeaders(), searchResponse,
// creates the fallback response if the node crashes/restarts in the middle of the request
// TODO: store intermediate results ?
Can you elaborate on this TODO? Does it revolve around resiliency?
It does, yes. That's one of the follow-up questions we have in the meta issue.
I thought so. I wonder if we need the TODO in the code then, since we are tracking this elsewhere anyway.
This change fixes a race condition in shard group failure callbacks and ensures that we set the correct flag on initial stored responses.
Relates #49931
Closes #53360