-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
do not scroll if max docs is less than scroll size (update/delete by query) #81654
Conversation
I am looking into updating docs for this change |
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can make this change in the transport layer instead, see comment.
// Do not open scroll if limit <= scroll size | ||
var docsPerScroll = internal.getSearchRequest().source().size(); | ||
if (internal.getMaxDocs() != -1 && internal.getMaxDocs() <= docsPerScroll && internal.isAbortOnVersionConflict()) { | ||
internal.getSearchRequest().scroll((Scroll) null); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to do this in the transport layer instead? That way it would apply also for internal reindex usages. We try to keep the REST layer free of logic.
If that is much more difficult, I do not mind it living here though.
@@ -256,6 +256,8 @@ protected BulkRequest buildBulk(Iterable<? extends ScrollableHitSource.Hit> docs | |||
} | |||
|
|||
protected ScrollableHitSource buildScrollableResultSource(BackoffPolicy backoffPolicy) { | |||
// Do not open scroll if maxDocs <= scroll size | |||
mainRequest.disableScrollIfUnnecessary(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be this could actually be inlined here to avoid introducing additional methods into request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized this is not working for org.elasticsearch.reindex.Reindexer.AsyncIndexBySearchAction#buildScrollableResultSource
as this constructs RemoteScrollableHitSource
with mainRequest.getSearchRequest()
without calling super. I will probably move this logic to constructor before buildScrollableResultSource
is called
@elasticmachine run elasticsearch-ci/rest-compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ievgen, this is close but I would like to copy the search request before modifying it.
if (mainRequest.getMaxDocs() != -1 | ||
&& mainRequest.getMaxDocs() <= mainRequest.getSearchRequest().source().size() | ||
&& mainRequest.isAbortOnVersionConflict()) { | ||
mainRequest.getSearchRequest().scroll((Scroll) null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this main request is the original request created by the client. For the REST client, this is likely fine. But for internal clients, we could risk someone getting a funny effect out of reusing the search request for multiple operations. I am not sure it is a case, but it always annoyed me with the other things we set above.
Could we perhaps instead do this in buildScrollableResultSource
instead and then create a new SearchRequest
as a copy of the original when we need this to protect against this unexpected behavior?
@@ -365,7 +365,8 @@ public Self setScroll(TimeValue keepAlive) { | |||
* Get scroll timeout | |||
*/ | |||
public TimeValue getScrollTime() { | |||
return searchRequest.scroll().keepAlive(); | |||
Scroll scroll = searchRequest.scroll(); | |||
return scroll != null ? scroll.keepAlive() : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above would also remove the need for this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to keep it as scroll is nullable and there could be something calling this from the executor as well
@@ -506,9 +511,6 @@ public boolean isCancelled() { | |||
}; | |||
action.setScroll(scrollId()); | |||
|
|||
// Set the base for the scroll to wait - this is added to the figure we calculate below | |||
firstSearchRequest.scroll(timeValueSeconds(10)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved up to apply the change before search request is cloned inside of DummyAsyncBulkByScrollAction constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more comment on where/how we should/can do this without affecting clients.
preparedSearchRequest.scroll((Scroll) null); | ||
} | ||
|
||
return mainRequest.setSearchRequest(preparedSearchRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is equally surprising. The client still sees the ReindexRequest
changing, in case they reuse it for multiple request, the scroll will be gone. I would prefer to do this such that we keep a local new search request instead, for instance by passing such a modified request in to buildScrollableResultSource
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do this, however I am concerned that this class still holds a reference to request and original search request as a result. This could be surprising in certain situations, for example I am checking if there is no scroll to refreshAndFinish early (this could be done other way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This direction looks good, left a few smaller comments
/* | ||
* Do not open scroll if max docs <= scroll size and not resuming on version conflicts | ||
*/ | ||
if (mainRequest.getMaxDocs() != -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the constant MAX_DOCS_ALL_MATCHES
?
@@ -478,6 +510,12 @@ void onBulkResponse(BulkResponse response, Runnable onSuccess) { | |||
return; | |||
} | |||
|
|||
if (scrollSource.hasScroll() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should turn this into an assert instead, asserting that we never get here if there is no scroll source? Perhaps I missed a case where we would get here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be reached when max_docs
is set, but the source index has less docs then that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to occasional failures of org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT › test {yaml=reference/docs/reindex/line_876}
in this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. It is a slightly convoluted way to find out that we exhausted the search. I do see that it requires a bit more change (or at least examination) to find out that the search is exhausted. We can let this be like it is for now, but perhaps add a comment about this case being about the search being exhausted?
@@ -335,6 +342,48 @@ public void testBulkResponseSetsLotsOfStatus() throws Exception { | |||
} | |||
} | |||
|
|||
public void testHandlesBulkWithNoScroll() { | |||
// given a request that should not open scroll | |||
testRequest.setMaxDocs(1); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
testRequest.getSearchRequest().source().size(100); | ||
|
||
// when receiving bulk response | ||
var responses = randomArray(0, 1, BulkItemResponse[]::new, AsyncBulkByScrollActionTests::createBulkResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand why we test with array size 0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This verifies behavior if source index has no documents.
This is relevant for a new condition added (exiting if source index has less documents then a max_docs).
May be this is worth dedicated test case. Will update.
private static BulkItemResponse createBulkResponse() { | ||
return BulkItemResponse.success( | ||
0, | ||
DocWriteRequest.OpType.CREATE, |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
} | ||
|
||
public void testDisableScrollWhenMaxDocsIsLessThenScrollSize() { | ||
testRequest.setMaxDocs(1); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
var preparedSearchRequest = AbstractAsyncBulkByScrollAction.prepareSearchRequest(testRequest, false, false); | ||
|
||
assertThat(preparedSearchRequest.scroll(), nullValue()); | ||
} |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left a few comments but no need for an additional round.
/* | ||
* Default to sorting by doc. We can't do this in the request itself because it is normal to *add* to the sorts rather than replace | ||
* them and if we add _doc as the first sort by default then sorts will never work.... So we add it here, only if there isn't | ||
* another sort. | ||
* | ||
* This modifies the original request! | ||
*/ | ||
final SearchSourceBuilder sourceBuilder = mainRequest.getSearchRequest().source(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find it more intuitive to use preparedSearchRequest
here. You have the comment to make it clear that we are modifying the original request. But it requires some thought here to be sure that it actually modifies the preparedSearchRequest
's source. So I would suggest:
final SearchSourceBuilder sourceBuilder = mainRequest.getSearchRequest().source(); | |
final SearchSourceBuilder sourceBuilder = preparedSearchRequest.source(); |
@@ -498,6 +551,9 @@ public boolean isCancelled() { | |||
} | |||
}); | |||
|
|||
// Set the base for the scroll to wait - this is added to the figure we calculate below | |||
firstSearchRequest.scroll(timeValueSeconds(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we perhaps remove the field and just use testRequest.getSearchRequest()
instead? Seems unnecessary to have that field and using it is now slightly trickier due to the copy done.
assertThat(status.getCreated() + status.getUpdated() + status.getDeleted(), equalTo((long) responses.length)); | ||
} | ||
|
||
public void testHandlesBulkWhenMaxDocsIsReached() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two tests here are nearly identical. Can we either make it such that we have one test with 50% probability of hitting the max_docs
boundary or have the two tests share the code?
} | ||
|
||
public void testEnableScrollWhenMaxDocsIsGreaterThenScrollSize() { | ||
testRequest.setMaxDocs(between(100, 1000)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this not fail when max_docs=100
?
testRequest.setMaxDocs(between(100, 1000)); | |
testRequest.setMaxDocs(between(101, 1000)); |
} | ||
|
||
public void testEnableScrollWhenProceedOnVersionConflict() { | ||
testRequest.setMaxDocs(between(1, 100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should go above 100 here - we enable it regardless?
testRequest.setMaxDocs(between(1, 100)); | |
testRequest.setMaxDocs(between(1, 110)); |
💔 Backport failed
You can use sqren/backport to manually backport by running |
…query) (elastic#81654) This change allows to not open scroll while reindex/delete_by_query/update_by_query if configured max_docs if less then or equal to the number of documents returned by the scroll batch.
This pr prevents opening scroll when update/delete by query is configured with max_docs that is less then a scroll size.
Closes: #54270