do not scroll if max docs is less than scroll size (update/delete by query) #81654

idegtiarenko · 2021-12-13T14:16:29Z

This PR avoids opening a scroll when update/delete by query is configured with a max_docs that is less than or equal to the scroll size.
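The rule described above can be sketched as a single predicate. This is a minimal, self-contained illustration and not the actual Elasticsearch code: `ScrollDecision` and `shouldOpenScroll` are hypothetical names, and `MAX_DOCS_ALL_MATCHES = -1` mirrors the `-1` "no limit" sentinel used in the diffs below. The scroll is kept when conflicts=proceed, presumably because documents that hit a version conflict do not count towards max_docs, so more hits than max_docs may be needed.

```java
public class ScrollDecision {
    // Hypothetical sentinel mirroring the "-1 means no max_docs limit" convention in the diffs.
    static final int MAX_DOCS_ALL_MATCHES = -1;

    /**
     * Returns true when a scroll context is still needed.
     * If max_docs fits into one search batch and version conflicts abort the request,
     * the first batch already contains every document that will be processed.
     */
    static boolean shouldOpenScroll(int maxDocs, int batchSize, boolean abortOnVersionConflict) {
        boolean fitsInOneBatch = maxDocs != MAX_DOCS_ALL_MATCHES && maxDocs <= batchSize;
        return (fitsInOneBatch && abortOnVersionConflict) == false;
    }

    public static void main(String[] args) {
        System.out.println(shouldOpenScroll(10, 1000, true));   // false: max_docs fits in one batch
        System.out.println(shouldOpenScroll(5000, 1000, true)); // true: needs more than one batch
        System.out.println(shouldOpenScroll(10, 1000, false));  // true: conflicts=proceed keeps the scroll
        System.out.println(shouldOpenScroll(-1, 1000, true));   // true: no max_docs limit
    }
}
```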

Closes: #54270

idegtiarenko · 2021-12-13T15:12:07Z

I am looking into updating the docs for this change.

elasticmachine · 2021-12-13T15:12:13Z

Pinging @elastic/es-distributed (Team:Distributed)

henningandersen

I wonder if we can make this change in the transport layer instead, see comment.

henningandersen · 2021-12-14T08:33:07Z

modules/reindex/src/main/java/org/elasticsearch/reindex/RestReindexAction.java

+        // Do not open scroll if limit <= scroll size
+        var docsPerScroll = internal.getSearchRequest().source().size();
+        if (internal.getMaxDocs() != -1 && internal.getMaxDocs() <= docsPerScroll && internal.isAbortOnVersionConflict()) {
+            internal.getSearchRequest().scroll((Scroll) null);
+        }


Would it be possible to do this in the transport layer instead? That way it would also apply to internal reindex usages. We try to keep the REST layer free of logic.

If that is much more difficult, I do not mind it living here though.

idegtiarenko · 2021-12-14T10:45:09Z

modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java

@@ -256,6 +256,8 @@ protected BulkRequest buildBulk(Iterable<? extends ScrollableHitSource.Hit> docs
    }

    protected ScrollableHitSource buildScrollableResultSource(BackoffPolicy backoffPolicy) {
+        // Do not open scroll if maxDocs <= scroll size
+        mainRequest.disableScrollIfUnnecessary();


Maybe this could be inlined here to avoid introducing an additional method on the request.

Just realized this does not work for org.elasticsearch.reindex.Reindexer.AsyncIndexBySearchAction#buildScrollableResultSource, because it constructs a RemoteScrollableHitSource with mainRequest.getSearchRequest() without calling super. I will probably move this logic to the constructor so that it runs before buildScrollableResultSource is called.

idegtiarenko · 2021-12-14T12:21:05Z

@elasticmachine run elasticsearch-ci/rest-compatibility

henningandersen

Thanks Ievgen, this is close but I would like to copy the search request before modifying it.

docs/reference/rest-api/common-parms.asciidoc

henningandersen · 2021-12-16T12:45:33Z

modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java

+        if (mainRequest.getMaxDocs() != -1
+            && mainRequest.getMaxDocs() <= mainRequest.getSearchRequest().source().size()
+            && mainRequest.isAbortOnVersionConflict()) {
+            mainRequest.getSearchRequest().scroll((Scroll) null);


I believe this main request is the original request created by the client. For the REST client this is likely fine, but internal clients could see surprising behavior if they reuse the search request for multiple operations. I am not sure this actually happens, but the other things we set above have always bothered me for the same reason.

Could we perhaps do this in buildScrollableResultSource instead, and create a new SearchRequest as a copy of the original when we need it, to protect against this unexpected behavior?
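The concern is ordinary aliasing: if the action nulls out the scroll on the caller's request object, a caller that reuses that object later finds it silently changed. The sketch below shows the copy-before-modify approach with hypothetical stand-in classes (`CopyBeforeModify`, `Request`, `prepare`), not the real `SearchRequest`/`ReindexRequest` API; the conflicts check is omitted for brevity.

```java
import java.time.Duration;

public class CopyBeforeModify {
    // Hypothetical stand-in for a search request with a nullable scroll keep-alive.
    static final class Request {
        Duration scroll;
        final int size;

        Request(Duration scroll, int size) {
            this.scroll = scroll;
            this.size = size;
        }

        // Copy constructor: the prepared request gets its own scroll field.
        Request(Request other) {
            this(other.scroll, other.size);
        }
    }

    // Prepares a request for execution without touching the caller's object.
    // (The abortOnVersionConflict condition is left out to keep the sketch short.)
    static Request prepare(Request original, int maxDocs) {
        Request prepared = new Request(original);
        if (maxDocs != -1 && maxDocs <= prepared.size) {
            prepared.scroll = null; // disable scrolling on the copy only
        }
        return prepared;
    }

    public static void main(String[] args) {
        Request original = new Request(Duration.ofMinutes(5), 1000);
        Request prepared = prepare(original, 10);
        System.out.println(prepared.scroll); // null
        System.out.println(original.scroll); // PT5M: the caller's request is unchanged
    }
}
```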

henningandersen · 2021-12-16T12:50:25Z

server/src/main/java/org/elasticsearch/index/reindex/AbstractBulkByScrollRequest.java

@@ -365,7 +365,8 @@ public Self setScroll(TimeValue keepAlive) {
     * Get scroll timeout
     */
    public TimeValue getScrollTime() {
-        return searchRequest.scroll().keepAlive();
+        Scroll scroll = searchRequest.scroll();
+        return scroll != null ? scroll.keepAlive() : null;


The above would also remove the need for this change.

I would suggest keeping it, since scroll is nullable and something could also call this from the executor.

idegtiarenko · 2021-12-16T13:41:34Z

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

@@ -506,9 +511,6 @@ public boolean isCancelled() {
        };
        action.setScroll(scrollId());

-        // Set the base for the scroll to wait - this is added to the figure we calculate below
-        firstSearchRequest.scroll(timeValueSeconds(10));
-


Moved up so that the change is applied before the search request is cloned inside the DummyAsyncBulkByScrollAction constructor.

henningandersen

One more comment on where/how we should/can do this without affecting clients.

henningandersen · 2021-12-17T08:53:22Z

modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java

+            preparedSearchRequest.scroll((Scroll) null);
+        }
+
+        return mainRequest.setSearchRequest(preparedSearchRequest);


I think this is equally surprising. The client still sees the ReindexRequest change: if they reuse it for multiple requests, the scroll will be gone. I would prefer to keep a new local search request instead, for instance by passing such a modified request into buildScrollableResultSource.

I could do this; however, I am concerned that as a result this class would still hold references to both the request and the original search request. That could be surprising in certain situations; for example, I check whether there is no scroll in order to refreshAndFinish early (although this could be done another way).

henningandersen

This direction looks good, left a few smaller comments

henningandersen · 2021-12-20T09:05:21Z

modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java

+        /*
+         * Do not open scroll if max docs <= scroll size and not resuming on version conflicts
+         */
+        if (mainRequest.getMaxDocs() != -1


Can we use the constant MAX_DOCS_ALL_MATCHES?

henningandersen · 2021-12-20T09:06:21Z

modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java

@@ -478,6 +510,12 @@ void onBulkResponse(BulkResponse response, Runnable onSuccess) {
                return;
            }

+            if (scrollSource.hasScroll() == false) {


I think we should turn this into an assert instead, asserting that we never get here if there is no scroll source? Perhaps I missed a case where we would get here?

This can be reached when max_docs is set but the source index has fewer docs than that.

Related to occasional failures of org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT › test {yaml=reference/docs/reindex/line_876} in this branch

I see, thanks. It is a slightly convoluted way to find out that we exhausted the search. I do see that it requires a bit more change (or at least examination) to find out that the search is exhausted. We can let this be like it is for now, but perhaps add a comment about this case being about the search being exhausted?
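The case discussed above can be modeled as the decision made after each bulk response. This is a hypothetical sketch (`BulkResponseFlow`, `afterBulk` and the `Next` outcomes are illustrative names, not the actual `onBulkResponse` code): when no scroll was opened and max_docs has still not been reached, the single batch must have held every matching document, i.e. the search is exhausted.

```java
public class BulkResponseFlow {
    enum Next { FINISH_MAX_DOCS_REACHED, FINISH_SEARCH_EXHAUSTED, FETCH_NEXT_BATCH }

    /**
     * Hypothetical model of the step after a bulk response.
     * processed: docs handled so far; maxDocs: -1 means unlimited.
     */
    static Next afterBulk(long processed, int maxDocs, boolean hasScroll) {
        if (maxDocs != -1 && processed >= maxDocs) {
            return Next.FINISH_MAX_DOCS_REACHED;
        }
        if (hasScroll == false) {
            // No scroll was opened because max_docs <= batch size, yet max_docs was not
            // reached: the only batch contained every matching doc, so the search is exhausted.
            return Next.FINISH_SEARCH_EXHAUSTED;
        }
        return Next.FETCH_NEXT_BATCH;
    }

    public static void main(String[] args) {
        System.out.println(afterBulk(10, 10, false));    // FINISH_MAX_DOCS_REACHED
        System.out.println(afterBulk(3, 10, false));     // FINISH_SEARCH_EXHAUSTED (source had only 3 docs)
        System.out.println(afterBulk(1000, 5000, true)); // FETCH_NEXT_BATCH
    }
}
```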

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

@@ -335,6 +342,48 @@ public void testBulkResponseSetsLotsOfStatus() throws Exception {
        }
    }

+    public void testHandlesBulkWithNoScroll() {
+        // given a request that should not open scroll
+        testRequest.setMaxDocs(1);


henningandersen · 2021-12-20T09:09:32Z

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+        testRequest.getSearchRequest().source().size(100);
+
+        // when receiving bulk response
+        var responses = randomArray(0, 1, BulkItemResponse[]::new, AsyncBulkByScrollActionTests::createBulkResponse);


I am not sure I understand why we test with array size 0 here?

This verifies the behavior when the source index has no documents.
This is relevant for a newly added condition (exiting if the source index has fewer documents than max_docs).

Maybe this is worth a dedicated test case. Will update.

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+    private static BulkItemResponse createBulkResponse() {
+        return BulkItemResponse.success(
+            0,
+            DocWriteRequest.OpType.CREATE,


modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+    }
+
+    public void testDisableScrollWhenMaxDocsIsLessThenScrollSize() {
+        testRequest.setMaxDocs(1);


modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+        var preparedSearchRequest = AbstractAsyncBulkByScrollAction.prepareSearchRequest(testRequest, false, false);
+
+        assertThat(preparedSearchRequest.scroll(), nullValue());
+    }


henningandersen

LGTM. Left a few comments but no need for an additional round.

henningandersen · 2021-12-21T12:25:59Z

modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java

        /*
         * Default to sorting by doc. We can't do this in the request itself because it is normal to *add* to the sorts rather than replace
         * them and if we add _doc as the first sort by default then sorts will never work.... So we add it here, only if there isn't
         * another sort.
+         *
+         * This modifies the original request!
         */
        final SearchSourceBuilder sourceBuilder = mainRequest.getSearchRequest().source();


I would find it more intuitive to use preparedSearchRequest here. You have the comment to make it clear that we are modifying the original request. But it requires some thought here to be sure that it actually modifies the preparedSearchRequest's source. So I would suggest:

Suggested change

- final SearchSourceBuilder sourceBuilder = mainRequest.getSearchRequest().source();
+ final SearchSourceBuilder sourceBuilder = preparedSearchRequest.source();
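The code comment warns that the original is modified because copying the outer request does not necessarily copy the nested source builder. Assuming the copy is shallow (as a copy constructor that only copies field references would be), `preparedSearchRequest.source()` and `mainRequest.getSearchRequest().source()` are the same object, which is why reading it through `preparedSearchRequest` is more direct. A sketch with hypothetical stand-in classes (`ShallowCopyDemo`, `Source`, `Req`), not the real Elasticsearch types:

```java
import java.util.ArrayList;
import java.util.List;

public class ShallowCopyDemo {
    // Hypothetical stand-in for a mutable source builder holding sort fields.
    static final class Source {
        final List<String> sorts = new ArrayList<>();
    }

    // Hypothetical stand-in for a request whose copy constructor is shallow.
    static final class Req {
        final Source source;

        Req(Source source) {
            this.source = source;
        }

        Req(Req other) {
            this(other.source); // shares the same Source instance
        }
    }

    public static void main(String[] args) {
        Req original = new Req(new Source());
        Req prepared = new Req(original);
        if (prepared.source.sorts.isEmpty()) {
            prepared.source.sorts.add("_doc"); // default sort added through the copy
        }
        System.out.println(original.source.sorts);              // [_doc]: the original sees it too
        System.out.println(prepared.source == original.source); // true
    }
}
```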

henningandersen · 2021-12-21T12:33:58Z

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

@@ -498,6 +551,9 @@ public boolean isCancelled() {
            }
        });

+        // Set the base for the scroll to wait - this is added to the figure we calculate below
+        firstSearchRequest.scroll(timeValueSeconds(10));


Could we perhaps remove the field and just use testRequest.getSearchRequest() instead? Seems unnecessary to have that field and using it is now slightly trickier due to the copy done.

henningandersen · 2021-12-21T12:35:22Z

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+        assertThat(status.getCreated() + status.getUpdated() + status.getDeleted(), equalTo((long) responses.length));
+    }
+
+    public void testHandlesBulkWhenMaxDocsIsReached() {


The two tests here are nearly identical. Can we either make it such that we have one test with 50% probability of hitting the max_docs boundary or have the two tests share the code?

henningandersen · 2021-12-21T12:36:26Z

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+    }
+
+    public void testEnableScrollWhenMaxDocsIsGreaterThenScrollSize() {
+        testRequest.setMaxDocs(between(100, 1000));


Will this not fail when max_docs=100?

Suggested change

- testRequest.setMaxDocs(between(100, 1000));
+ testRequest.setMaxDocs(between(101, 1000));

henningandersen · 2021-12-21T12:37:36Z

modules/reindex/src/test/java/org/elasticsearch/reindex/AsyncBulkByScrollActionTests.java

+    }
+
+    public void testEnableScrollWhenProceedOnVersionConflict() {
+        testRequest.setMaxDocs(between(1, 100));


Perhaps we should go above 100 here - we enable it regardless?

Suggested change

- testRequest.setMaxDocs(between(1, 100));
+ testRequest.setMaxDocs(between(1, 110));
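Both boundary comments can be checked against the rule directly. The sketch assumes a batch size of 100 (as `testHandlesBulkWithNoScroll` sets) and that the test framework's `between(min, max)` is inclusive of both bounds, so `max_docs = 100` satisfies `max_docs <= size` and disables the scroll. `BoundaryCheck.shouldOpenScroll` is a hypothetical restatement of the rule, not the production method.

```java
public class BoundaryCheck {
    // Hypothetical restatement of the rule under test: skip the scroll only when
    // max_docs fits in one batch and version conflicts abort the request.
    static boolean shouldOpenScroll(int maxDocs, int batchSize, boolean abortOnVersionConflict) {
        boolean fitsInOneBatch = maxDocs != -1 && maxDocs <= batchSize;
        return (fitsInOneBatch && abortOnVersionConflict) == false;
    }

    public static void main(String[] args) {
        int size = 100; // batch size assumed to be used by the tests
        // Lower bound of between(100, 1000): the scroll is disabled, so an "enable" test would fail.
        System.out.println(shouldOpenScroll(100, size, true)); // false
        // Lower bound of between(101, 1000): the scroll is enabled.
        System.out.println(shouldOpenScroll(101, size, true)); // true
        // With conflicts=proceed the scroll stays enabled for any max_docs, including values above 100.
        for (int maxDocs = 1; maxDocs <= 110; maxDocs++) {
            if (shouldOpenScroll(maxDocs, size, false) == false) {
                throw new AssertionError("unexpected for max_docs=" + maxDocs);
            }
        }
        System.out.println("proceed-on-conflict keeps the scroll for 1..110");
    }
}
```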

elasticsearchmachine · 2021-12-21T14:28:05Z

💔 Backport failed

Status	Branch	Result
✅	8.0
❌	7.17	Commit could not be cherrypicked due to conflicts
❌	7.16	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 81654

…query) (elastic#81654) This change avoids opening a scroll for reindex/delete_by_query/update_by_query when the configured max_docs is less than or equal to the number of documents returned by a scroll batch.

…query) (#81654) (#81994) This change avoids opening a scroll for reindex/delete_by_query/update_by_query when the configured max_docs is less than or equal to the number of documents returned by a scroll batch.

54270 - do not scroll if max docs is less than scroll size

1af9f08

idegtiarenko added >enhancement v8.0.0 :Distributed/Reindex Team:Distributed v8.1.0 v7.16.2 labels Dec 13, 2021

idegtiarenko added 2 commits December 13, 2021 15:29

54270 - fix formatting

26e9003

54270 - do not open scroll for reindex

5951d58

idegtiarenko requested a review from henningandersen December 13, 2021 15:11

idegtiarenko marked this pull request as ready for review December 13, 2021 15:12

idegtiarenko added 4 commits December 13, 2021 16:15

54270 - fix formatting

38b56d6

54270 - add docs

a7b3430

54270 - update docs

8fc98d2

Merge branch 'master' into 54270_do_not_scroll

50e3be1

henningandersen reviewed Dec 14, 2021

54270 - move logic into transport layer

0e9a277

idegtiarenko commented Dec 14, 2021

idegtiarenko requested a review from henningandersen December 14, 2021 10:45

54270 - format

51f9c0d

idegtiarenko added 4 commits December 14, 2021 15:17

Merge branch 'master' into 54270_do_not_scroll

0ec904b

81289 - move request modification to constructor

8398237

Merge branch 'master' into 54270_do_not_scroll

1d13cd1

54270 - fmt

1ac4479

henningandersen reviewed Dec 16, 2021

54270 - clone search request

a93787b

idegtiarenko commented Dec 16, 2021

54270 - handle bulk response with no scroll

891793a

henningandersen reviewed Dec 17, 2021

54270 - keep original request, use copy for scroll hit source

55923e2

idegtiarenko changed the title ~~54270 - do not scroll if max docs is less than scroll size (update/delete by query)~~ do not scroll if max docs is less than scroll size (update/delete by query) Dec 17, 2021

idegtiarenko added 2 commits December 17, 2021 12:22

54270 - revert unneeded change

c4bafac

54270 - improve doc

5c97be0

droberts195 added the v7.17.0 label Dec 17, 2021

mark-vieira added v7.16.3 and removed v7.16.2 labels Dec 18, 2021

idegtiarenko requested a review from henningandersen December 20, 2021 08:03

henningandersen reviewed Dec 20, 2021

idegtiarenko added 4 commits December 20, 2021 10:59

54270 - fix review comments

8b308e6

add a comment

74f48b0

Merge branch 'master' into 54270_do_not_scroll

b03f9d8

fix typo

3d93c8f

henningandersen self-requested a review December 21, 2021 12:22

henningandersen approved these changes Dec 21, 2021

fix comments

a1b0e38

idegtiarenko added the auto-backport-and-merge label Dec 21, 2021

idegtiarenko merged commit 11b5261 into master Dec 21, 2021

idegtiarenko deleted the 54270_do_not_scroll branch December 21, 2021 14:26

idegtiarenko mentioned this pull request Dec 21, 2021

[8.0] do not scroll if max docs is less than scroll size (update/delete by query) (#81654) #81994

Merged

idegtiarenko removed v7.17.0 v7.16.3 labels Jan 4, 2022

luisfavila mentioned this pull request May 16, 2024

[Feature Request] Don't scroll if max docs < scroll size (update by query/delete by query) opensearch-project/OpenSearch#13704

Open
