Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: clear the cursor if nested inner hits are enough to fulfill the query required limits #35398

Merged
merged 13 commits into from
Nov 15, 2018

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Nov 9, 2018

This is a fix for #35176. The bug was reported against 6.5, but this is reproduceable on master as well.

The very specific scenario where this happens assumes:

  • nested documents are used and returned in the query results
  • at least one matching document from results has more than one nested inner hit matching the query
  • the fetch size of the request (jdbc, rest) needs to be smaller than the required number of results of the query
  • fetch_size + calculated inner hits of the last cursor request = query's limit size

Example:

  • SELECT dep.dep_id FROM test_emp LIMIT 5
  • dep is a nested field
  • fetch_size set to 4
  • we assume that there are 5 documents matching the query and from these 5, one of them has 2 inner hits, the other four have one inner hit only

In this case, SQL will create a query with "size": 5 and 5 documents are returned, but SQL will count the total number of inner hits from all matching documents towards the LIMIT 5. In this case, the real limit is 6. SQL will create first scroll query with "size": 5, will look at the matching inner hits, will see that the matching ones are 6 (so the needed 5 are already there in the results) and that a second query to ES is not necessary anymore. But it will leave behind an open search context, which increases the memory usage unnecessarily.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@astefan
Copy link
Contributor Author

astefan commented Nov 9, 2018

Still, have to add tests to cover this scenario.

@astefan astefan removed the WIP label Nov 9, 2018
@astefan
Copy link
Contributor Author

astefan commented Nov 9, 2018

@costin @matriv added tests for this scenario, as well. Ready for review. Thanks.

@astefan
Copy link
Contributor Author

astefan commented Nov 9, 2018

Retest this please

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Really nice catch!!

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and good tests. However instead of adding a test class, the behavior should be incorporated in the existing CSV test suite.
Further more the FetchSizeTestCase needs to be extended to include a test for nested since that's the suite that handles related matters.

(Boolean.TRUE.equals(response.isTerminatedEarly())
|| response.getHits().getTotalHits() == hits.length
// or maybe the limit has been reached
|| (hits.length >= query.limit() && query.limit() > -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hits are computed inside hitRowSet so this condition (and its comment) are not needed anymore (they are confusing).

/**
* Test class to check https://github.com/elastic/elasticsearch/issues/35176 fix
*/
public class JdbcCsvNestedDocsIT extends JdbcCsvSpecIT {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to create a separate class - the logic here can be folded into JdbcCsvSpecIT which would override executeJdbcQuery and if the test is nested, use a smaller max random. To make the change smaller, add a maximumFetchSize() method in SpecBaseIntegrationTest which then gets used inside executeJdbcQuery.

@costin costin added the v6.5.1 label Nov 11, 2018
@costin
Copy link
Member

costin commented Nov 11, 2018

Since this is a (serious) bug, it needs backporting to 6.5.x I think.

@astefan
Copy link
Contributor Author

astefan commented Nov 12, 2018

@costin I've addressed review comments.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments - once addressed it's good for merging.

protected ResultSet executeJdbcQuery(Connection con, String query) throws SQLException {
// using a smaller fetchSize for nested documents' tests to uncover bugs
// similar with https://github.com/elastic/elasticsearch/issues/35176 quicker
if (fileName.startsWith("nested")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overriding executeJdbcQuery, override fetchSize():

protected int fetchSize() {
        if (fileName.startsWith("nested")) {
           return randomBoolean() ? super.fetchSize() : randomIntBetween(1,5));
        }
  }

|| (hits.length >= query.limit() && query.limit() > -1))) {
(Boolean.TRUE.equals(response.isTerminatedEarly())
|| response.getHits().getTotalHits() == hits.length
|| hitRowSet.isLimitReached())) {
// if so, clear the scroll
clear(response.getScrollId(), ActionListener.wrap(
succeeded -> listener.onResponse(new SchemaSearchHitRowSet(schema, exts, hits, query.limit(), null)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse the hitRowSet above instead of creating a new one.

@@ -91,6 +89,10 @@
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method can have default visibility as it's not used anywhere else.

public class JdbcCsvSpecIT extends CsvSpecTestCase {
public JdbcCsvSpecIT(String fileName, String groupName, String testName, Integer lineNumber, CsvTestCase testCase) {
super(fileName, groupName, testName, lineNumber, testCase);
}

@Override
protected ResultSet executeJdbcQuery(Connection con, String query) throws SQLException {
protected int fetchSize() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick - to remove the super.fetchSize redundancy, you can make the method a one liner using the ternary conditional:

return fileName.startsWith("nested") && randomBoolean() ? randomIntBetween(1,5) : super.fetchSize();

@astefan astefan merged commit eaf010c into elastic:master Nov 15, 2018
astefan added a commit that referenced this pull request Nov 15, 2018
astefan added a commit that referenced this pull request Nov 15, 2018
@astefan astefan deleted the 35176_fix branch November 15, 2018 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants