Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not scroll if max docs is less than scroll size (update/delete by query) #81654

Merged
merged 24 commits into from
Dec 21, 2021

Conversation

idegtiarenko
Copy link
Contributor

This pr prevents opening scroll when update/delete by query is configured with max_docs that is less then a scroll size.

Closes: #54270

@idegtiarenko idegtiarenko added >enhancement v8.0.0 :Distributed/Reindex Issues relating to reindex that are not caused by issues further down Team:Distributed Meta label for distributed team v8.1.0 v7.16.2 labels Dec 13, 2021
@idegtiarenko
Copy link
Contributor Author

I am looking into updating docs for this change

@idegtiarenko idegtiarenko marked this pull request as ready for review December 13, 2021 15:12
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can make this change in the transport layer instead, see comment.

Comment on lines 72 to 76
// Do not open scroll if limit <= scroll size
var docsPerScroll = internal.getSearchRequest().source().size();
if (internal.getMaxDocs() != -1 && internal.getMaxDocs() <= docsPerScroll && internal.isAbortOnVersionConflict()) {
internal.getSearchRequest().scroll((Scroll) null);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to do this in the transport layer instead? That way it would apply also for internal reindex usages. We try to keep the REST layer free of logic.

If that is much more difficult, I do not mind it living here though.

@@ -256,6 +256,8 @@ protected BulkRequest buildBulk(Iterable<? extends ScrollableHitSource.Hit> docs
}

protected ScrollableHitSource buildScrollableResultSource(BackoffPolicy backoffPolicy) {
// Do not open scroll if maxDocs <= scroll size
mainRequest.disableScrollIfUnnecessary();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be this could actually be inlined here to avoid introducing additional methods into request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized this is not working for org.elasticsearch.reindex.Reindexer.AsyncIndexBySearchAction#buildScrollableResultSource as this constructs RemoteScrollableHitSource with mainRequest.getSearchRequest() without calling super. I will probably move this logic to constructor before buildScrollableResultSource is called

@idegtiarenko
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/rest-compatibility

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ievgen, this is close but I would like to copy the search request before modifying it.

docs/reference/rest-api/common-parms.asciidoc Outdated Show resolved Hide resolved
if (mainRequest.getMaxDocs() != -1
&& mainRequest.getMaxDocs() <= mainRequest.getSearchRequest().source().size()
&& mainRequest.isAbortOnVersionConflict()) {
mainRequest.getSearchRequest().scroll((Scroll) null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this main request is the original request created by the client. For the REST client, this is likely fine. But for internal clients, we could risk someone getting a funny effect out of reusing the search request for multiple operations. I am not sure it is a case, but it always annoyed me with the other things we set above.

Could we perhaps instead do this in buildScrollableResultSource instead and then create a new SearchRequest as a copy of the original when we need this to protect against this unexpected behavior?

@@ -365,7 +365,8 @@ public Self setScroll(TimeValue keepAlive) {
* Get scroll timeout
*/
public TimeValue getScrollTime() {
return searchRequest.scroll().keepAlive();
Scroll scroll = searchRequest.scroll();
return scroll != null ? scroll.keepAlive() : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above would also remove the need for this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to keep it as scroll is nullable and there could be something calling this from the executor as well

@@ -506,9 +511,6 @@ public boolean isCancelled() {
};
action.setScroll(scrollId());

// Set the base for the scroll to wait - this is added to the figure we calculate below
firstSearchRequest.scroll(timeValueSeconds(10));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up to apply the change before search request is cloned inside of DummyAsyncBulkByScrollAction constructor

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment on where/how we should/can do this without affecting clients.

preparedSearchRequest.scroll((Scroll) null);
}

return mainRequest.setSearchRequest(preparedSearchRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is equally surprising. The client still sees the ReindexRequest changing, in case they reuse it for multiple request, the scroll will be gone. I would prefer to do this such that we keep a local new search request instead, for instance by passing such a modified request in to buildScrollableResultSource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do this, however I am concerned that this class still holds a reference to request and original search request as a result. This could be surprising in certain situations, for example I am checking if there is no scroll to refreshAndFinish early (this could be done other way).

@idegtiarenko idegtiarenko changed the title 54270 - do not scroll if max docs is less than scroll size (update/delete by query) do not scroll if max docs is less than scroll size (update/delete by query) Dec 17, 2021
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This direction looks good, left a few smaller comments

/*
* Do not open scroll if max docs <= scroll size and not resuming on version conflicts
*/
if (mainRequest.getMaxDocs() != -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the constant MAX_DOCS_ALL_MATCHES?

@@ -478,6 +510,12 @@ void onBulkResponse(BulkResponse response, Runnable onSuccess) {
return;
}

if (scrollSource.hasScroll() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should turn this into an assert instead, asserting that we never get here if there is no scroll source? Perhaps I missed a case where we would get here?

Copy link
Contributor Author

@idegtiarenko idegtiarenko Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be reached when max_docs is set, but the source index has less docs then that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to occasional failures of org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT › test {yaml=reference/docs/reindex/line_876} in this branch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. It is a slightly convoluted way to find out that we exhausted the search. I do see that it requires a bit more change (or at least examination) to find out that the search is exhausted. We can let this be like it is for now, but perhaps add a comment about this case being about the search being exhausted?

@@ -335,6 +342,48 @@ public void testBulkResponseSetsLotsOfStatus() throws Exception {
}
}

public void testHandlesBulkWithNoScroll() {
// given a request that should not open scroll
testRequest.setMaxDocs(1);

This comment was marked as resolved.

testRequest.getSearchRequest().source().size(100);

// when receiving bulk response
var responses = randomArray(0, 1, BulkItemResponse[]::new, AsyncBulkByScrollActionTests::createBulkResponse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why we test with array size 0 here?

Copy link
Contributor Author

@idegtiarenko idegtiarenko Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This verifies behavior if source index has no documents.
This is relevant for a new condition added (exiting if source index has less documents then a max_docs).

May be this is worth dedicated test case. Will update.

private static BulkItemResponse createBulkResponse() {
return BulkItemResponse.success(
0,
DocWriteRequest.OpType.CREATE,

This comment was marked as resolved.

}

public void testDisableScrollWhenMaxDocsIsLessThenScrollSize() {
testRequest.setMaxDocs(1);

This comment was marked as resolved.

var preparedSearchRequest = AbstractAsyncBulkByScrollAction.prepareSearchRequest(testRequest, false, false);

assertThat(preparedSearchRequest.scroll(), nullValue());
}

This comment was marked as resolved.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a few comments but no need for an additional round.

/*
* Default to sorting by doc. We can't do this in the request itself because it is normal to *add* to the sorts rather than replace
* them and if we add _doc as the first sort by default then sorts will never work.... So we add it here, only if there isn't
* another sort.
*
* This modifies the original request!
*/
final SearchSourceBuilder sourceBuilder = mainRequest.getSearchRequest().source();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it more intuitive to use preparedSearchRequest here. You have the comment to make it clear that we are modifying the original request. But it requires some thought here to be sure that it actually modifies the preparedSearchRequest's source. So I would suggest:

Suggested change
final SearchSourceBuilder sourceBuilder = mainRequest.getSearchRequest().source();
final SearchSourceBuilder sourceBuilder = preparedSearchRequest.source();

@@ -498,6 +551,9 @@ public boolean isCancelled() {
}
});

// Set the base for the scroll to wait - this is added to the figure we calculate below
firstSearchRequest.scroll(timeValueSeconds(10));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perhaps remove the field and just use testRequest.getSearchRequest() instead? Seems unnecessary to have that field and using it is now slightly trickier due to the copy done.

assertThat(status.getCreated() + status.getUpdated() + status.getDeleted(), equalTo((long) responses.length));
}

public void testHandlesBulkWhenMaxDocsIsReached() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two tests here are nearly identical. Can we either make it such that we have one test with 50% probability of hitting the max_docs boundary or have the two tests share the code?

}

public void testEnableScrollWhenMaxDocsIsGreaterThenScrollSize() {
testRequest.setMaxDocs(between(100, 1000));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this not fail when max_docs=100?

Suggested change
testRequest.setMaxDocs(between(100, 1000));
testRequest.setMaxDocs(between(101, 1000));

}

public void testEnableScrollWhenProceedOnVersionConflict() {
testRequest.setMaxDocs(between(1, 100));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should go above 100 here - we enable it regardless?

Suggested change
testRequest.setMaxDocs(between(1, 100));
testRequest.setMaxDocs(between(1, 110));

@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.0
7.17 Commit could not be cherrypicked due to conflicts
7.16 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 81654

idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Dec 21, 2021
…query) (elastic#81654)

This change allows to not open scroll while reindex/delete_by_query/update_by_query
if configured max_docs if less then or equal to the number of documents returned by the scroll batch.
elasticsearchmachine pushed a commit that referenced this pull request Dec 21, 2021
…query) (#81654) (#81994)

This change allows to not open scroll while reindex/delete_by_query/update_by_query
if configured max_docs if less then or equal to the number of documents returned by the scroll batch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Reindex Issues relating to reindex that are not caused by issues further down >enhancement Team:Distributed Meta label for distributed team v8.0.0 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reindex and friends: don't scroll if limit is smaller than scroll
6 participants