SQL: clear the cursor if nested inner hits are enough to fulfill the query required limits #35398

astefan · 2018-11-09T00:13:45Z

This is a fix for #35176. The bug was reported against 6.5, but it is reproducible on master as well.

The bug only shows up in a very specific scenario, when all of the following hold:

- nested documents are used and returned in the query results
- at least one matching document has more than one nested inner hit matching the query
- the fetch size of the request (JDBC, REST) is smaller than the number of results the query requires
- fetch_size + calculated inner hits of the last cursor request = query's limit size

Example:

- SELECT dep.dep_id FROM test_emp LIMIT 5
- dep is a nested field
- fetch_size is set to 4
- we assume that 5 documents match the query; one of them has 2 inner hits and the other four have exactly one inner hit each

In this case, SQL will create a query with "size": 5 and 5 documents are returned, but SQL counts the total number of inner hits from all matching documents towards the LIMIT 5, and here that total is 6. SQL will create the first scroll query with "size": 5, look at the matching inner hits, see that there are 6 of them (so the needed 5 are already in the results) and conclude that a second query to ES is no longer necessary. However, it leaves behind an open search context, which increases memory usage unnecessarily.
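The counting described above can be illustrated with a small, self-contained sketch. It is a simplified model, not the actual Querier or SearchHitRowSet code: the class and method names are hypothetical, each page is reduced to a list of per-document inner-hit counts, and a limit of -1 is assumed to mean "no limit" (as the `query.limit() > -1` check in this PR suggests). The 4-document page is an illustrative assumption chosen to match a fetch_size of 4.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of the LIMIT check; hypothetical names, not the Elasticsearch SQL classes. */
final class InnerHitLimitSketch {

    /** Check based on the number of documents in the page only. */
    static boolean limitReachedByDocuments(int documentsInPage, int limit) {
        return limit > -1 && documentsInPage >= limit;
    }

    /** Check based on rows: every nested inner hit of every document becomes one row. */
    static boolean limitReachedByInnerHits(List<Integer> innerHitsPerDocument, int limit) {
        if (limit < 0) {
            return false; // assumption: a negative limit means the query has no LIMIT
        }
        int rows = 0;
        for (int innerHits : innerHitsPerDocument) {
            rows += innerHits;
        }
        return rows >= limit;
    }

    public static void main(String[] args) {
        int limit = 5;

        // Illustrative page of 4 documents, one of which has 2 inner hits: 5 rows in total.
        List<Integer> pageOfFour = Arrays.asList(2, 1, 1, 1);
        System.out.println(limitReachedByDocuments(pageOfFour.size(), limit)); // prints "false"
        System.out.println(limitReachedByInnerHits(pageOfFour, limit));        // prints "true"

        // The page from the example above: 5 documents, 6 inner hits.
        List<Integer> pageOfFive = Arrays.asList(2, 1, 1, 1, 1);
        System.out.println(limitReachedByInnerHits(pageOfFive, limit));        // prints "true"
    }
}
```

When the row-based check is true, no further scroll request is needed, so the search context can be cleared right away instead of being left open.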

elasticmachine · 2018-11-09T00:13:47Z

Pinging @elastic/es-search-aggs

astefan · 2018-11-09T00:17:54Z

Still have to add tests to cover this scenario.

astefan · 2018-11-09T14:31:28Z

@costin @matriv added tests for this scenario, as well. Ready for review. Thanks.

astefan · 2018-11-09T14:43:12Z

Retest this please

matriv

LGTM. Really nice catch!!

costin

Good catch and good tests. However, instead of adding a test class, the behavior should be incorporated in the existing CSV test suite.
Furthermore, the FetchSizeTestCase needs to be extended to include a test for nested documents, since that's the suite that handles related matters.

costin · 2018-11-11T15:50:13Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/execution/search/Querier.java

+                        (Boolean.TRUE.equals(response.isTerminatedEarly()) 
+                                || response.getHits().getTotalHits() == hits.length
+                                // or maybe the limit has been reached
+                                || (hits.length >= query.limit() && query.limit() > -1)


The hits are computed inside hitRowSet so this condition (and its comment) are not needed anymore (they are confusing).

costin · 2018-11-11T15:54:53Z

...ingle-node/src/test/java/org/elasticsearch/xpack/sql/qa/single_node/JdbcCsvNestedDocsIT.java

+/**
+ * Test class to check https://github.com/elastic/elasticsearch/issues/35176 fix
+ */
+public class JdbcCsvNestedDocsIT extends JdbcCsvSpecIT {


No need to create a separate class - the logic here can be folded into JdbcCsvSpecIT which would override executeJdbcQuery and if the test is nested, use a smaller max random. To make the change smaller, add a maximumFetchSize() method in SpecBaseIntegrationTest which then gets used inside executeJdbcQuery.

costin · 2018-11-11T15:56:54Z

Since this is a (serious) bug, it needs backporting to 6.5.x I think.

astefan · 2018-11-12T23:35:12Z

@costin I've addressed review comments.

costin

Left some comments - once addressed it's good for merging.

costin · 2018-11-13T20:42:54Z

...l/qa/single-node/src/test/java/org/elasticsearch/xpack/sql/qa/single_node/JdbcCsvSpecIT.java

+    protected ResultSet executeJdbcQuery(Connection con, String query) throws SQLException {
+        // using a smaller fetchSize for nested documents' tests to uncover bugs
+        // similar with https://github.com/elastic/elasticsearch/issues/35176 quicker
+        if (fileName.startsWith("nested")) {


Instead of overriding executeJdbcQuery, override fetchSize():

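    @Override
    protected int fetchSize() {
        // for nested tests, randomly pick either the default or a small fetch size
        if (fileName.startsWith("nested")) {
            return randomBoolean() ? super.fetchSize() : randomIntBetween(1, 5);
        }
        return super.fetchSize();
    }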

costin · 2018-11-13T20:43:43Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/execution/search/Querier.java

-                        || (hits.length >= query.limit() && query.limit() > -1))) {
+                        (Boolean.TRUE.equals(response.isTerminatedEarly()) 
+                                || response.getHits().getTotalHits() == hits.length
+                                || hitRowSet.isLimitReached())) {
                    // if so, clear the scroll
                    clear(response.getScrollId(), ActionListener.wrap(
                            succeeded -> listener.onResponse(new SchemaSearchHitRowSet(schema, exts, hits, query.limit(), null)),


Reuse the hitRowSet above instead of creating a new one.
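Put together, the decision of whether the scroll can be cleared after a page might look like the following sketch. It only models the boolean condition from the diff with plain values: `shouldClearScroll` and its parameters are hypothetical stand-ins for `response.isTerminatedEarly()`, `response.getHits().getTotalHits()`, `hits.length` and `hitRowSet.isLimitReached()`, and no Elasticsearch classes are involved.

```java
/** Hypothetical, dependency-free model of the "clear the scroll?" condition from the diff. */
final class ClearScrollSketch {

    /**
     * @param terminatedEarly stand-in for response.isTerminatedEarly(), which may be null
     * @param totalHits       stand-in for response.getHits().getTotalHits()
     * @param returnedHits    stand-in for hits.length
     * @param limitReached    stand-in for hitRowSet.isLimitReached()
     */
    static boolean shouldClearScroll(Boolean terminatedEarly, long totalHits, int returnedHits, boolean limitReached) {
        return Boolean.TRUE.equals(terminatedEarly) // null-safe: a null Boolean counts as "not terminated early"
                || totalHits == returnedHits        // everything that matched is already in this page
                || limitReached;                    // the row set already holds enough rows for the LIMIT
    }

    public static void main(String[] args) {
        // 10 matching documents, 4 returned, but the inner hits already satisfy the LIMIT.
        System.out.println(shouldClearScroll(null, 10, 4, true));  // prints "true"
        // Same page, limit not yet reached: the scroll has to stay open.
        System.out.println(shouldClearScroll(null, 10, 4, false)); // prints "false"
    }
}
```

Because the limit check lives in the row set, the same row set instance that answered `isLimitReached()` can be handed to the listener, which is what the comment above asks for.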

costin · 2018-11-13T20:44:18Z

...k/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/execution/search/SearchHitRowSet.java

@@ -91,6 +89,10 @@
            }
        }
    }
+


Method can have default visibility as it's not used anywhere else.

costin · 2018-11-14T19:08:17Z

...l/qa/single-node/src/test/java/org/elasticsearch/xpack/sql/qa/single_node/JdbcCsvSpecIT.java

 public class JdbcCsvSpecIT extends CsvSpecTestCase {
    public JdbcCsvSpecIT(String fileName, String groupName, String testName, Integer lineNumber, CsvTestCase testCase) {
        super(fileName, groupName, testName, lineNumber, testCase);
    }

    @Override
-    protected ResultSet executeJdbcQuery(Connection con, String query) throws SQLException {
+    protected int fetchSize() {


Minor nitpick - to remove the super.fetchSize redundancy, you can make the method a one liner using the ternary conditional:

return fileName.startsWith("nested") && randomBoolean() ? randomIntBetween(1,5) : super.fetchSize();
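The one-liner relies on `&&` binding more tightly than the conditional operator in Java, so it parses as `(fileName.startsWith("nested") && randomBoolean()) ? ... : ...`. The sketch below compares it with the if-based version suggested earlier in this review. It is a stand-alone model: `DEFAULT_FETCH_SIZE`, the explicit `coin` parameter (replacing `randomBoolean()`) and the `small` parameter (replacing `randomIntBetween(1,5)`) are assumptions made so the behavior is deterministic.

```java
/** Stand-alone comparison of the two fetchSize() shapes; constants and parameters are hypothetical. */
final class FetchSizeSketch {

    static final int DEFAULT_FETCH_SIZE = 1000; // assumed stand-in for super.fetchSize()

    /** If-based version: for nested files, the coin picks between the default and a small size. */
    static int ifBased(String fileName, boolean coin, int small) {
        if (fileName.startsWith("nested")) {
            return coin ? DEFAULT_FETCH_SIZE : small;
        }
        return DEFAULT_FETCH_SIZE;
    }

    /** One-liner: parsed as (startsWith && coin) ? small : default. */
    static int oneLiner(String fileName, boolean coin, int small) {
        return fileName.startsWith("nested") && coin ? small : DEFAULT_FETCH_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(oneLiner("nested.csv-spec", true, 3));  // prints "3"
        System.out.println(oneLiner("nested.csv-spec", false, 3)); // prints "1000"
        System.out.println(oneLiner("agg.csv-spec", true, 3));     // prints "1000"
    }
}
```

Note that the two versions map the coin flip to opposite branches (`oneLiner(f, coin, s)` equals `ifBased(f, !coin, s)`), which makes no difference for a fair random boolean: in both, nested test files get a small fetch size about half of the time.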

astefan added 2 commits November 9, 2018 01:52

Clear the cursor if the inner hits size is enough to fulfill the required query limits

274ca56

Merge branch 'master' of https://github.com/elastic/elasticsearch into 35176_fix

be11569

astefan added >bug WIP v7.0.0 :Analytics/SQL SQL querying v6.6.0 labels Nov 9, 2018

astefan requested review from costin and matriv November 9, 2018 00:13

astefan added 2 commits November 9, 2018 16:28

Tests

741fa59

Merge branch 'master' of https://github.com/elastic/elasticsearch into 35176_fix

8865f11

astefan removed the WIP label Nov 9, 2018

Unnecessary import removed

d216bf2

matriv approved these changes Nov 9, 2018

View reviewed changes

costin requested changes Nov 11, 2018

View reviewed changes

costin added the v6.5.1 label Nov 11, 2018

astefan added 4 commits November 12, 2018 23:37

Addressed comments

2da8b1d

Merge branch 'master' of https://github.com/elastic/elasticsearch into 35176_fix

934383f

Merge branch 'master' of https://github.com/elastic/elasticsearch into 35176_fix

2a0851c

Removed unnecessary line

a87fc1a

costin approved these changes Nov 13, 2018

View reviewed changes

astefan added 2 commits November 14, 2018 17:01

Incorporated more feedback

f067c56

Merge branch 'master' of https://github.com/elastic/elasticsearch into 35176_fix

0aae5af

costin reviewed Nov 14, 2018

View reviewed changes

astefan added 2 commits November 15, 2018 07:36

Minor nitpick

9f7588a

Merge branch 'master' of https://github.com/elastic/elasticsearch into 35176_fix

3b120e9

astefan merged commit eaf010c into elastic:master Nov 15, 2018

astefan added a commit that referenced this pull request Nov 15, 2018

SQL: clear the cursor if nested inner hits are enough to fulfill the query required limits (#35398)

f270e27

astefan added a commit that referenced this pull request Nov 15, 2018

SQL: clear the cursor if nested inner hits are enough to fulfill the query required limits (#35398)

03872f4

astefan deleted the 35176_fix branch November 15, 2018 13:26

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
