
SQL: Replace scroll cursors with point-in-time and search_after #83381

Merged (28 commits) on Feb 15, 2022

Conversation

@Luegg (Contributor) commented Feb 1, 2022

Resolves #61873

The goal of this PR is to remove the use of the deprecated scroll cursors in SQL. Functionality and APIs should remain the same, with one notable difference: the last page of a search hit query used to always include a scroll cursor if it was non-empty. This is no longer the case: once a result set is exhausted, the PIT is closed and the last page does not include a cursor.

Note that PIT can also be used for aggregation and PIVOT queries, but that is out of scope for this PR and will be implemented in a follow-up.

Additionally, this PR resolves #80523 because the total doc count is no longer required.
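
For context, the PIT + search_after paging flow that replaces scrolling looks roughly like this at the transport client level (a sketch only, not this PR's code; index name, fetch size and keep-alive are illustrative, imports omitted):

// given an org.elasticsearch.client.Client `client`
int fetchSize = 1000; // illustrative page size
String pitId = client.execute(
    OpenPointInTimeAction.INSTANCE,
    new OpenPointInTimeRequest("my-index").keepAlive(TimeValue.timeValueSeconds(45))
).actionGet().getPointInTimeId();

SearchSourceBuilder source = new SearchSourceBuilder().size(fetchSize)
    .sort("_doc")                                       // a stable sort is required for search_after
    .pointInTimeBuilder(new PointInTimeBuilder(pitId)); // the PIT carries the index, so none is set on the request

SearchHit[] hits;
do {
    SearchResponse response = client.search(new SearchRequest().source(source)).actionGet();
    hits = response.getHits().getHits();
    // ... consume the page ...
    if (hits.length > 0) {
        source.searchAfter(hits[hits.length - 1].getSortValues()); // continue after the last hit
    }
} while (hits.length == fetchSize);

// once the result set is exhausted the PIT is closed and no cursor is returned
client.execute(ClosePointInTimeAction.INSTANCE, new ClosePointInTimeRequest(pitId)).actionGet();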

@elasticsearchmachine (Collaborator):

Hi @Luegg, I've created a changelog YAML for you.

@mark-vieira added the v8.2.0 label and removed the v8.1.0 label on Feb 2, 2022
* It should exercise the same code as the other APIs but if we were truly
* paranoid we'd hack together something to test the others as well.
*/
public void testHijackScrollFails() throws Exception {
createUser("full_access", "rest_minimal");
public void testHijackCursorFails() throws Exception {
@Luegg (Contributor Author):

Previously, this test asserted that a user with sufficient privilege to perform an equivalent request cannot hijack someone else's cursors. Since scroll cursors are "owned" by users, this was something SQL guaranteed for search hit queries. Now, a PIT is shared between users and this guarantee no longer holds.

The test now asserts that users with fewer privileges cannot hijack cursors for all sorts of queries.

Contributor:

Ah, right, now I know why I removed it ;-)
I'm for keeping the test here, but I'm just curious whether you looked into whether it's the PIT part that gets tested (downstream in ES), or simply that a user without proper rights can issue a search request (with or without a PIT ID) -- also considering the query randomization.

@Luegg (Contributor Author):

It doesn't test anything specific to PIT; it just ensures that SQL cursors do not magically bypass ES security (which is very unlikely, I hope).

@astefan (Contributor) left a comment:

Looks good in general.
I've left some comments and questions and, also, upon testing this I've noticed there is a slight inconsistency in the use of the cursor: with fetch_size: 1, the last page still has a cursor element even if the next page is empty, whereas if the last page has a size smaller than the fetch_size, there is no cursor.

This test fails if bwc test spans ES versions that introduce breaking changes. In this case, requests to new nodes will be
redirected to the old nodes which will generate the cursor. Subsequent scroll requests to the new node with this cursor will
fail with a version conflict.
""", bwcVersion.after(VersionCompatibilityChecks.INTRODUCING_UNSIGNED_LONG));
Contributor:

I don't understand why this test is relevant. It looks like an overcomplicated scenario to test that cursors (either scroll or PIT) are not supported in a mixed version cluster. They are not supported in mixed versions either way, so why is it relevant in this test to have the bwc version be the one after unsigned_long support was introduced?

@Luegg (Contributor Author):

The test itself encodes the level of compatibility that we currently provide during a rolling upgrade (you can scroll through a dataset as long as you're hitting nodes on the same version as the one that produced the cursor). So I think it has its justification. Unfortunately, that doesn't always work as expected, but it's weird to have it disabled for specific versions. I guess it's better to just @AwaitsFix it. I've created an issue that explains the problem: #83726

runSql(new StringEntity(cursor(cursor).mode(mode).toString(), ContentType.APPLICATION_JSON), StringUtils.EMPTY, mode)
);

assertNull(cursor);
Contributor:

I don't think this test is 100% valid. You are expecting a cursor element to not exist for the last page, whereas response.remove("cursor") can also return null if the cursor key actually exists in the map but its value is null. It would be better to look at the last page and check that the response doesn't actually contain the cursor key.
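
For illustration, a possible shape for that assertion, assuming the parsed last page is available as a Map<String, Object> (variable name illustrative):

// instead of relying on remove("cursor") returning null, assert the key is absent entirely
assertFalse("last page must not contain a cursor", lastPage.containsKey("cursor"));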

@@ -187,7 +184,7 @@ static void handle(
byte[] queryAsBytes = null;
if (afterKey != null) {
updateSourceAfterKey(afterKey, source);
queryAsBytes = serializeQuery(source);
queryAsBytes = Querier.serializeQuery(source);
Contributor:

Please, use static imports for serializeQuery and deserializeQuery.

@Luegg (Contributor Author):

Do we have some guidelines on static imports? I see both ways of using static members throughout the codebase.

Contributor:

Not sure, to be honest, but in the QL code base such a method call would be done with a static import. We tend to use static imports in general, unless the code is clearer with the class prefix. In this particular case I don't see the utility of having Querier in there, and without it the code is less bloated.

Member:

Nothing official, however I tend to use them:

  • when a method is used multiple times
  • when there's no similar method present in the current class (which typically occurs in testing)
  • when it reduces the method length and prevents wrapping
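
For illustration, the two forms under discussion (the Querier package path is assumed here):

import static org.elasticsearch.xpack.sql.execution.search.Querier.serializeQuery; // package path assumed

// queryAsBytes = Querier.serializeQuery(source); // class-prefixed form from the diff
queryAsBytes = serializeQuery(source);             // same call with the static import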

@@ -91,18 +90,13 @@ public void testCancellation() throws Exception {

disableBlocks(plugins);
Exception exception = expectThrows(Exception.class, future::get);
Throwable inner = ExceptionsHelper.unwrap(exception, SearchPhaseExecutionException.class);
Contributor:

Why is this line not needed anymore?

@Luegg (Contributor Author):

I'm not sure, to be honest. Before, the TaskCancelledException used to be wrapped in a SearchPhaseExecutionException, but it isn't anymore. Looking at the stack trace, it still happens when trying to make the search request:
[stack trace screenshot omitted]

final OpenPointInTimeRequest openPitRequest = new OpenPointInTimeRequest(search.indices()).indicesOptions(search.indicesOptions())
.keepAlive(cfg.pageTimeout());

client.execute(OpenPointInTimeAction.INSTANCE, openPitRequest, wrap((openPointInTimeResponse) -> {
Contributor:

Nit, and not a big deal: wrapping a single-argument lambda's parameter in parentheses is unnecessary and the code doesn't look as clean.

@Luegg (Contributor Author):

Hm, I actually prefer it the other way around but it seems to be pretty consistent across the QL code base. I'll change it here but it will be hard for me to always adhere to it if checkstyle does not yell at me about it ;)

Contributor:

I agree that the style is not enforced in a formal way in the QL code, other than by PR reviewers who have looked at the code for some time already. By the time checkstyle was eventually enforced throughout the ES code base, the code style in QL had already established its own way.
Regarding the use of parentheses for single-argument lambdas, yes, it's pretty consistent (only 22 exceptions out of almost 2000 uses). With time you'll get used to it, and we are here to help you :-) with reviews like this one.


SearchSourceBuilder query = q;
if (log.isTraceEnabled()) {
log.trace("About to execute composite query {}", StringUtils.toString(query));
Contributor:

No composite query here ;-).


byte[] nextQuery;
try {
nextQuery = Querier.serializeQuery(source);
Contributor:

Static import please

return false;
}
SearchHitCursor other = (SearchHitCursor) obj;
return Arrays.equals(nextQuery, other.nextQuery)
Contributor:

I would do the "costly" comparisons last and compare includeFrozen and limit first.
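
For illustration, the suggested ordering would look something like this (only the fields visible in this hunk plus the two mentioned above; the actual field list may be longer):

// cheap primitive comparisons first, the potentially large array comparison last
return includeFrozen == other.includeFrozen
    && limit == other.limit
    && Arrays.equals(nextQuery, other.nextQuery);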

@bpintea (Contributor) left a comment:

Looks good, only have one outstanding question.


public static void closePointInTime(Client client, String pointInTimeId, ActionListener<Boolean> listener) {
// request should not be made with the parent task assigned because the parent task might already be canceled
client = client instanceof ParentTaskAssigningClient wrapperClient ? wrapperClient.unwrap() : client;
Contributor:

Nit: this could be moved inside the branch.
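
That is, something along these lines (a sketch of the suggested shape):

// unwrap only inside the instanceof branch
if (client instanceof ParentTaskAssigningClient wrapperClient) {
    client = wrapperClient.unwrap();
}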

} catch (Exception ex) {
cleanup(response, ex);
}
handleResponse(response, delegate);
Contributor:

we might have to revisit support for partial search results

Shouldn't the shard failures still be checked before handling the response? Not sure if we even have a test for this.
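
For illustration, a check along those lines could sit in front of the response handling (a sketch, not this PR's code; the failure target `listener` is an assumption):

// surface shard failures instead of silently handling a partial response
if (response.getFailedShards() > 0) {
    listener.onFailure(new ElasticsearchException("search returned " + response.getFailedShards() + " shard failures"));
    return;
}
handleResponse(response, delegate);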

);
}

protected Supplier<SearchHitRowSet> makeRowSet(int sizeRequested, SearchResponse response) {
Contributor:

Unless you're preparing the ground for the other cursors, this can be private.

}

private static void updateSearchAfter(SearchHit[] hits, SearchSourceBuilder source) {
assert hits.length > 0;
Contributor:

Could we get a successful response with no hits? Maybe an AIOBE would be preferred here?

@Luegg (Contributor Author):

Good point. An AIOBE is actually preferable here because it does not tear down the whole world with it. Since this method only gets called if hasRemaining is true (and that's never the case with empty hits), this cannot happen anyway.
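
For reference, the method under discussion presumably boils down to something like this (a sketch; the body isn't shown in the hunk, and indexing the last hit is what would throw the AIOBE on an empty array):

private static void updateSearchAfter(SearchHit[] hits, SearchSourceBuilder source) {
    // the sort values of the last hit become the search_after marker for the next page;
    // hits[hits.length - 1] throws ArrayIndexOutOfBoundsException when hits is empty
    SearchHit lastHit = hits[hits.length - 1];
    source.searchAfter(lastHit.getSortValues());
}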

@Luegg (Contributor Author) commented Feb 10, 2022:

I've left some comments and questions and, also, upon testing this I've noticed there is a slight inconsistency in the use of the cursor: with fetch_size: 1, the last page still has a cursor element even if the next page is empty, whereas if the last page has a size smaller than the fetch_size, there is no cursor.

I've had another look at getting rid of the empty last page in every case, but it's probably not worth it. I think there are two ways to achieve this and both come at a cost:

  • Set track_total_hits to <currentOffset> + <pageSize> and use it to determine whether there will be another page. This requires enabling track_total_hits, which is not recommended with search_after (according to this doc page) and probably increases search costs by O(<currentOffset>). Also, it would require carrying <currentOffset> along in the cursors.
  • Request one more hit than <pageSize> and only return a cursor if <pageSize> + 1 hits have been fetched. This adds a tiny cost to each query (let's say 1/1000 for a page size of 1000) in order to avoid returning a superfluous cursor for one query in 1000 (very handwavy calculation...), so the gain would probably cancel out with the cost. A rough sketch of this option follows at the end of this comment.

Given this, I would go forward with the empty last page for now since it avoids some potential pitfalls, e.g. what to do if the <pageSize> + 1th record causes the query to fail?
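
For the record, a rough sketch of the second option (illustrative only, not what this PR does):

// the search is issued with size(pageSize + 1); only the first pageSize hits are
// returned to the client, and a cursor is emitted only when an extra hit came back
static boolean hasNextPage(SearchHit[] hits, int pageSize) {
    return hits.length > pageSize;
}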

@Luegg (Contributor Author) commented Feb 10, 2022:

It's worth checking whether the old scroll cursor and behavior can be kept around as a pluggable option for performance testing moving forward.

I still think it's better to get rid of the scroll-based implementation for good. Keeping it around would require putting in some feature toggle to activate it, and it would make it harder to abstract away some of the cursor logic when going forward with adding PIT to the other queries (we would always have to ensure compatibility with scroll-based queries, which have a slightly different life cycle and error modes).

@astefan (Contributor) commented Feb 10, 2022:

Regarding

Given this, I would go forward with the empty last page for now since it avoids some potential pitfalls, e.g. what to do if the <pageSize> + 1th record causes the query to fail?

Agree, but please create an issue to have this behavior recorded somewhere (the last page having a cursor element even if the next page is empty, whereas if the last page has a size smaller than the fetch_size, there is no cursor element).

@astefan (Contributor) left a comment:

LGTM

@bpintea (Contributor) left a comment:

LGTM

@costin (Member) left a comment:

LGTM. Left one clarification comment on setting the timeout and another regarding the check on total hits.

// compute remaining limit (only if the limit is specified - that is, positive).
int remaining = limit < 0 ? limit : limit - size;
// either the search returned fewer records than requested or the limit is exhausted
if (size < sizeRequested || remaining == 0
// or exactly `totalHits` records have been fetched
|| totalHits != null && totalHits.value == hits.length) {
Member:

It is possible for queries to ask for the total hits, in which case this check is useful; in the vast majority of cases it will be ignored due to the null check.

@Luegg (Contributor Author):

Good point, though the only case I know about is COUNT(*) on implicit aggregations (and in that case size is always 1). Are there others? Since it's not tested, I'd personally prefer to keep it out for now. I think it's also not quite correct as is, because it does not consider the relation (eq vs gte).
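
For reference, accounting for the relation would look roughly like this (a sketch of the point being made, not code from this PR):

// the total-hits shortcut is only safe when the reported total is exact (EQUAL_TO),
// not when it is a lower bound (GREATER_THAN_OR_EQUAL_TO)
boolean exhausted = totalHits != null
    && totalHits.relation == TotalHits.Relation.EQUAL_TO
    && totalHits.value == hits.length;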

@Luegg added the auto-merge label (automatically merge pull request when CI checks pass; NB doesn't wait for reviews) on Feb 15, 2022
Labels: :Analytics/SQL, auto-merge, >enhancement, Team:QL (Deprecated), v8.2.0