-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: clear the cursor if nested inner hits are enough to fulfill the query required limits #35398
Conversation
Pinging @elastic/es-search-aggs |
Still, have to add tests to cover this scenario. |
Retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Really nice catch!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch and good tests. However instead of adding a test class, the behavior should be incorporated in the existing CSV test suite.
Further more the FetchSizeTestCase
needs to be extended to include a test for nested since that's the suite that handles related matters.
(Boolean.TRUE.equals(response.isTerminatedEarly()) | ||
|| response.getHits().getTotalHits() == hits.length | ||
// or maybe the limit has been reached | ||
|| (hits.length >= query.limit() && query.limit() > -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hits are computed inside hitRowSet so this condition (and its comment) are not needed anymore (they are confusing).
/** | ||
* Test class to check https://github.com/elastic/elasticsearch/issues/35176 fix | ||
*/ | ||
public class JdbcCsvNestedDocsIT extends JdbcCsvSpecIT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to create a separate class - the logic here can be folded into JdbcCsvSpecIT
which would override executeJdbcQuery
and if the test is nested
, use a smaller max random. To make the change smaller, add a maximumFetchSize() method in SpecBaseIntegrationTest
which then gets used inside executeJdbcQuery
.
Since this is a (serious) bug, it needs backporting to 6.5.x I think. |
@costin I've addressed review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments - once addressed it's good for merging.
protected ResultSet executeJdbcQuery(Connection con, String query) throws SQLException { | ||
// using a smaller fetchSize for nested documents' tests to uncover bugs | ||
// similar with https://github.com/elastic/elasticsearch/issues/35176 quicker | ||
if (fileName.startsWith("nested")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of overriding executeJdbcQuery
, override fetchSize()
:
protected int fetchSize() {
if (fileName.startsWith("nested")) {
return randomBoolean() ? super.fetchSize() : randomIntBetween(1,5));
}
}
|| (hits.length >= query.limit() && query.limit() > -1))) { | ||
(Boolean.TRUE.equals(response.isTerminatedEarly()) | ||
|| response.getHits().getTotalHits() == hits.length | ||
|| hitRowSet.isLimitReached())) { | ||
// if so, clear the scroll | ||
clear(response.getScrollId(), ActionListener.wrap( | ||
succeeded -> listener.onResponse(new SchemaSearchHitRowSet(schema, exts, hits, query.limit(), null)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reuse the hitRowSet
above instead of creating a new one.
@@ -91,6 +89,10 @@ | |||
} | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method can have default visibility as it's not used anywhere else.
public class JdbcCsvSpecIT extends CsvSpecTestCase { | ||
public JdbcCsvSpecIT(String fileName, String groupName, String testName, Integer lineNumber, CsvTestCase testCase) { | ||
super(fileName, groupName, testName, lineNumber, testCase); | ||
} | ||
|
||
@Override | ||
protected ResultSet executeJdbcQuery(Connection con, String query) throws SQLException { | ||
protected int fetchSize() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick - to remove the super.fetchSize
redundancy, you can make the method a one liner using the ternary conditional:
return fileName.startsWith("nested") && randomBoolean() ? randomIntBetween(1,5) : super.fetchSize();
…query required limits (#35398)
…query required limits (#35398)
This is a fix for #35176. The bug was reported against 6.5, but this is reproduceable on master as well.
The very specific scenario where this happens assumes:
Example:
SELECT dep.dep_id FROM test_emp LIMIT 5
dep
is anested
fieldfetch_size
set to 4In this case, SQL will create a query with
"size": 5
and 5 documents are returned, but SQL will count the total number of inner hits from all matching documents towards theLIMIT 5
. In this case, the real limit is 6. SQL will create first scroll query with"size": 5
, will look at the matching inner hits, will see that the matching ones are 6 (so the needed 5 are already there in the results) and that a second query to ES is not necessary anymore. But it will leave behind an open search context, which increases the memory usage unnecessarily.