Fix sporadic failures in AsyncSearchActionTests #53375

Merged: 9 commits into elastic:master from async_search_action_tests on Mar 11, 2020

Conversation

jimczi
Contributor

@jimczi jimczi commented Mar 10, 2020

This change fixes a race condition in shard group failure callbacks and ensures that we set the correct flag on initial stored responses.

Relates #49931
Closes #53360

Shard group failure callbacks should be executed before incrementing
the total operations. This is required to ensure that we don't notify
a shard group failure **after** the completion callback.
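
A minimal sketch of the ordering described above, not the actual Elasticsearch code: the class `ShardOpsTracker`, its methods, and the string events are hypothetical names chosen for illustration. It assumes completion fires when an atomic counter reaches the expected number of operations, and shows why reporting the shard group failure before incrementing that counter keeps the failure from being recorded after the completion callback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker: records callbacks as strings so their order can be inspected.
class ShardOpsTracker {
    private final int expectedOps;
    private final AtomicInteger totalOps = new AtomicInteger();
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    ShardOpsTracker(int expectedOps) {
        this.expectedOps = expectedOps;
    }

    // A successful shard operation only counts towards completion.
    void onShardSuccess(int shard) {
        countOperation();
    }

    // A failed shard group is reported FIRST and counted SECOND.
    // If the increment came first, another thread could observe the final
    // count and record "completed" before this failure was reported.
    void onShardGroupFailure(int shard) {
        events.add("failure:" + shard);
        countOperation();
    }

    // The thread that performs the last increment triggers completion.
    private void countOperation() {
        if (totalOps.incrementAndGet() == expectedOps) {
            events.add("completed");
        }
    }

    // Snapshot of the recorded callbacks, in the order they were recorded.
    List<String> events() {
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }

    public static void main(String[] args) {
        ShardOpsTracker tracker = new ShardOpsTracker(2);
        tracker.onShardSuccess(0);
        tracker.onShardGroupFailure(1);
        System.out.println(tracker.events());
    }
}
```

With this ordering, every failure event is recorded before its own increment, so the thread that performs the final increment always records `completed` after all failures, regardless of how the calls interleave.
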
This change ensures that we set the isRunning flag to `false`
when storing the initial response of an async search request.
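
A minimal sketch of the flag handling described above, under stated assumptions rather than the code from this PR: `StoredSearchResponse` and `FallbackResponseStore` are hypothetical stand-ins, and an in-memory map plays the role of the backing index. The idea illustrated is that the initially stored document serves as a fallback if the node crashes or restarts mid-request, so it is persisted with the running flag set to `false` even though the in-memory response is still running.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified response: only the fields needed for the illustration.
final class StoredSearchResponse {
    final String docId;
    final boolean running;

    StoredSearchResponse(String docId, boolean running) {
        this.docId = docId;
        this.running = running;
    }

    // Copy of this response as it should be persisted: never marked as running.
    StoredSearchResponse asStoredFallback() {
        return new StoredSearchResponse(docId, false);
    }
}

// Hypothetical store: a map stands in for the index that holds async search results.
final class FallbackResponseStore {
    private final Map<String, StoredSearchResponse> docs = new ConcurrentHashMap<>();

    // Persist the fallback copy, leaving the caller's in-memory response untouched.
    void storeInitialResponse(StoredSearchResponse inMemory) {
        docs.put(inMemory.docId, inMemory.asStoredFallback());
    }

    StoredSearchResponse get(String docId) {
        return docs.get(docId);
    }
}
```

If the node that owns the search goes away, a reader of the stored document then sees a response that is not running instead of one that claims to be in progress indefinitely.
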
@jimczi added the labels >non-issue, :Search/Search (Search-related issues that do not fall into other categories), and >test-failure (Triaged test failures from CI) on Mar 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

Member

@javanna javanna left a comment


I left two questions

@@ -83,7 +83,10 @@ public void onResponse(AsyncSearchResponse searchResponse) {
onFatalFailure(searchTask, cause, false, submitListener);
} else {
final String docId = searchTask.getSearchId().getDocId();
store.storeInitialResponse(docId, searchTask.getOriginHeaders(), searchResponse,
// creates the fallback response if the node crashes/restarts in the middle of the request
// TODO: store intermediate results ?
Member


Can you elaborate on this TODO? Does it revolve around resiliency?

Contributor Author


It does, yes. That's one of the follow-up questions we have in the meta issue.

Member


I thought so. I wonder if we need the TODO in the code then, because we are tracking this elsewhere anyway.

@jimczi jimczi merged commit ab66529 into elastic:master Mar 11, 2020
@jimczi jimczi deleted the async_search_action_tests branch March 11, 2020 16:14
Development

Successfully merging this pull request may close these issues.

[CI] AsyncSearchActionTests fails unpredictably
3 participants