-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Replace scroll cursors with point-in-time and search_after #83381
SQL: Replace scroll cursors with point-in-time and search_after #83381
Conversation
Hi @Luegg, I've created a changelog YAML for you. |
429fbb3
to
c413dd3
Compare
Hi @Luegg, I've created a changelog YAML for you. |
3809144
to
4bcfa9b
Compare
d21c6f4
to
352647f
Compare
* It should exercise the same code as the other APIs but if we were truly | ||
* paranoid we'd hack together something to test the others as well. | ||
*/ | ||
public void testHijackScrollFails() throws Exception { | ||
createUser("full_access", "rest_minimal"); | ||
public void testHijackCursorFails() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this test asserted that a user with sufficient privilege to perform an equivalent request cannot hijack someone else's cursors. Since scroll cursors are "owned" by users, this was something SQL guaranteed for search hit queries. Now, a PIT is shared between users and this guarantee no longer holds.
The test now asserts that users with less privileges cannot hijack cursors for all sorts of queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, now I know why I removed it ;-)
I'm for keep the test here, but just curious if you looked into if it's the PIT part that gets tested (downstream in ES), or simply that a user without proper rights can issue a search request (with or w/o a PIT ID) -- also considering the query randomization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't test anything specific to PIT it just ensures that SQL cursors do not magically bypass ES security (which is very unlikely I hope).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general.
I've left some comments and questions and, also, upon testing this I've noticed there is a slight inconsistency in the use of the cursor: with fetch_size: 1
, the last page still has a cursor element even if the next page is empty, whereas if the last page has a size smaller than the fetch_size
, there is no cursor
.
This test fails if bwc test spans ES versions that introduce breaking changes. In this case, requests to new nodes will be | ||
redirected to the old nodes which will generate the cursor. Subsequent scroll requests to the new node with this cursor will | ||
fail with a version conflict. | ||
""", bwcVersion.after(VersionCompatibilityChecks.INTRODUCING_UNSIGNED_LONG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this test is relevant. It looks like an overcomplicated scenario to test that cursors (either scroll of PIT) are not supported in a mixed version cluster. They are not supported in mixed versions either way, so why is relevant in this test to have the bwc version to be the one after unsigned_long support has been introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test itself encodes the level of compatibility that we currently provide during a rolling upgrade (you can scroll through a dataset as long as you're hitting nodes on the same version as the one that produced the cursor). So I think it has it's justification to test it. Unfortunately, that's not always working as expected. But it's weird to have it disabled for specific versions. I guess it's better to just @AwaitFix
it. I've created an issue that explains the problem: #83726
runSql(new StringEntity(cursor(cursor).mode(mode).toString(), ContentType.APPLICATION_JSON), StringUtils.EMPTY, mode) | ||
); | ||
|
||
assertNull(cursor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test is 100% valid. You are expecting a cursor
element to not exist for the last page in the request, whereas response.remove("cursor")
can also return null
if the cursor
key actually exists in the map but it's null
. It would be better if you'd look at the last page and check that response
doesn't actually contain the cursor
key.
@@ -187,7 +184,7 @@ static void handle( | |||
byte[] queryAsBytes = null; | |||
if (afterKey != null) { | |||
updateSourceAfterKey(afterKey, source); | |||
queryAsBytes = serializeQuery(source); | |||
queryAsBytes = Querier.serializeQuery(source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, use static imports for serializeQuery
and deserializeQuery
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have some guidelines on static imports? I see both ways of using static members throughout the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure tbh, but in QL code base such a method call would be done with a static import. We tend to use static imports in general, unless the code itself is more clear if the class is prefixed as well. In this particular case I don't see the utility of having Querier
in there as well and without it the code is less bloated as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing official however I tend to use them:
- when a method is used multiple times
- when there's no method similar present in the current class (which typically occurs in testing)
- when it reduces the method length and prevents wrapping
@@ -91,18 +90,13 @@ public void testCancellation() throws Exception { | |||
|
|||
disableBlocks(plugins); | |||
Exception exception = expectThrows(Exception.class, future::get); | |||
Throwable inner = ExceptionsHelper.unwrap(exception, SearchPhaseExecutionException.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this line not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final OpenPointInTimeRequest openPitRequest = new OpenPointInTimeRequest(search.indices()).indicesOptions(search.indicesOptions()) | ||
.keepAlive(cfg.pageTimeout()); | ||
|
||
client.execute(OpenPointInTimeAction.INSTANCE, openPitRequest, wrap((openPointInTimeResponse) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit and not a big deal, wrapping with brackets a single argument lambda is unnecessary and the code doesn't look so clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I actually prefer it the other way around but it seems to be pretty consistent across the QL code base. I'll change it here but it will be hard for me to always adhere to it if checkstyle does not yell at me about it ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the style is not enforced in a formal way in the QL code, other than PR reviewers that have looked at the code for some time already. When checkstyle has been eventually enforced throughout the ES code base, the code style in QL has already established its own way.
Regarding the use of brackets for single argument lambdas, yes, it's pretty consistent (only 22 exceptions out of almost 2000 uses). With time, you'll get used to it and we are here to help you :-) with reviews like mine here.
|
||
SearchSourceBuilder query = q; | ||
if (log.isTraceEnabled()) { | ||
log.trace("About to execute composite query {}", StringUtils.toString(query)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No composite query here ;-).
|
||
byte[] nextQuery; | ||
try { | ||
nextQuery = Querier.serializeQuery(source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static import please
return false; | ||
} | ||
SearchHitCursor other = (SearchHitCursor) obj; | ||
return Arrays.equals(nextQuery, other.nextQuery) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do the "costly" comparisons last and have includeFrozen
and limit
as first ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, only have one outstanding question.
|
||
public static void closePointInTime(Client client, String pointInTimeId, ActionListener<Boolean> listener) { | ||
// request should not be made with the parent task assigned because the parent task might already be canceled | ||
client = client instanceof ParentTaskAssigningClient wrapperClient ? wrapperClient.unwrap() : client; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this could be moved inside the branch.
} catch (Exception ex) { | ||
cleanup(response, ex); | ||
} | ||
handleResponse(response, delegate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might have to revisit support for partial search results
Shouldn't the shard failures still be checked before handling the response? Not sure if we even have a test for this.
); | ||
} | ||
|
||
protected Supplier<SearchHitRowSet> makeRowSet(int sizeRequested, SearchResponse response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you're preparing the ground for the the other cursors, this can be private.
} | ||
|
||
private static void updateSearchAfter(SearchHit[] hits, SearchSourceBuilder source) { | ||
assert hits.length > 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a successful response with no hits? Maybe an AIOBE would be preferred hier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, AIOBE is actually preferable here because it does not tear down the whole world with it. Since this method only gets called if hasRemaining
is true
(and that's never the case with empty hits
) this cannot happen anyway.
* It should exercise the same code as the other APIs but if we were truly | ||
* paranoid we'd hack together something to test the others as well. | ||
*/ | ||
public void testHijackScrollFails() throws Exception { | ||
createUser("full_access", "rest_minimal"); | ||
public void testHijackCursorFails() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, now I know why I removed it ;-)
I'm for keep the test here, but just curious if you looked into if it's the PIT part that gets tested (downstream in ES), or simply that a user without proper rights can issue a search request (with or w/o a PIT ID) -- also considering the query randomization.
I've had another look to get rid of the empty last page in every case but it's probably not worth it. I think there are two ways to achieve this and both come at a cost:
Given this, I would go forward with the empty last page for now since it avoid some potential pitfalls. e.g. what to do if the |
I still think it's better to get rid of the scroll based implementation for good. Keeping it around would require to put in some feature toggle to somehow activate it and it will make it harder to abstract away some of the cursor logic when going forward with adding PIT to the other queries (we would always have to ensure compatibility with scroll based queries which has a slightly different life cycle and error modes). |
Regarding
Agree, but please, create an issue to have this behavior recorded somewhere (last page having a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left one clarification comment on setting the timeout and another regarding the check on total hits.
...lugin/sql/src/main/java/org/elasticsearch/xpack/sql/execution/search/CompositeAggCursor.java
Outdated
Show resolved
Hide resolved
// compute remaining limit (only if the limit is specified - that is, positive). | ||
int remaining = limit < 0 ? limit : limit - size; | ||
// either the search returned fewer records than requested or the limit is exhausted | ||
if (size < sizeRequested || remaining == 0 | ||
// or exactly `totalHits` records have been fetched | ||
|| totalHits != null && totalHits.value == hits.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible for queries to ask the total hits in which case this check is useful; in the vast majority of cases it will be ignored due to the null check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, the only case I know about is COUNT(*)
on implicit aggregations though (and in this case size is always 1). Are there others? Since it's not tested I'd personally prefer to keep it out for now. I think it's also not quite correct as is because it does not consider the relation (eq vs gte).
Resolves #61873
The goal of this PR is to remove the use of the deprecated scroll cursors in SQL. Functionality and APIs should remain the same with one notable difference: The last page of a search hit query used to always include a scroll cursor if it is non-empty. This is no longer the case, if a result set is exhausted, the PIT will be closed and the last page does not include a cursor.
Note, PIT can also be used for aggregation and PIVOT queries but this is not in the scope of this PR and will be implemented in a follow up.
Additionally, this PR resolves #80523 because the total doc count is no longer required.