SQL: Avoid empty last pages for GROUP BY queries when possible #84356
Hi @Luegg, I've created a changelog YAML for you.
Force-pushed from 6a585a1 to 7818f65.
Force-pushed from 7818f65 to a5854ad.
```java
) {
    if (log.isTraceEnabled()) {
        logSearchResponse(response, log);
    }
    // there are some results
    if (response.getAggregations().asList().isEmpty() == false) {
```
This was always true...
```java
try {
```
The `try`/`catch` seems to be a leftover that could have been removed in #83833.
```java
CompositeAggRowSet rowSet = makeRowSet.get();
Map<String, Object> afterKey = rowSet.afterKey();
// retry
if (mightProducePartialPages && shouldRetryDueToEmptyPage(response)) {
```
Adding `mightProducePartialPages` here ensures that we do not retry for queries without bucket selectors.
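The guard in the hunk above can be illustrated in isolation. The sketch below is hypothetical: `RetryGuardSketch`, `shouldRetry`, `bucketCount` and `hasAfterKey` are illustrative names, and the assumption that "empty page plus `after_key`" is what triggers a retry is ours, not taken from the actual `shouldRetryDueToEmptyPage` implementation. It only shows why the `mightProducePartialPages` check short-circuits the retry for queries without a bucket selector.

```java
/**
 * Hypothetical sketch of the retry guard; not the actual CompositeAggCursor code.
 */
final class RetryGuardSketch {

    private RetryGuardSketch() {}

    /**
     * @param mightProducePartialPages true if the query uses a bucket selector
     * @param bucketCount              buckets left on the page after filtering
     * @param hasAfterKey              whether the response carries an after_key
     * @return true if the query should be re-issued instead of returning the page
     */
    static boolean shouldRetry(boolean mightProducePartialPages, int bucketCount, boolean hasAfterKey) {
        // Only a bucket selector can filter a page down to zero buckets while
        // more keys remain, so queries without one never retry.
        if (mightProducePartialPages == false) {
            return false;
        }
        // Assumption: an empty page that still has an after_key means "keep fetching".
        return bucketCount == 0 && hasAfterKey;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(true, 0, true));  // true: empty page, more keys may follow
        System.out.println(shouldRetry(false, 0, true)); // false: no bucket selector
        System.out.println(shouldRetry(true, 3, true));  // false: page has results
    }
}
```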
Pinging @elastic/es-ql (Team:QL)
LGTM
LGTM.
```diff
@@ -162,71 +163,64 @@ static void handle(
BiFunction<SearchSourceBuilder, CompositeAggRowSet, CompositeAggCursor> makeCursor,
Runnable retry,
ActionListener<Page> listener,
Schema schema
boolean mightProducePartialPages
```
Nit: `couldProducePartialPage` would be more suggestive to me (for both the argument and the method).
LGTM
LGTM. A small nit: the issue description could be tweaked a bit to make it clear when the empty page is consumed (so that it is not passed to the client) as opposed to when it is NOT, e.g. when the result size is less than the fetch size AND there are no bucket selector/pipeline aggregations.
Personally, I had to read the description several times to understand whether the listed conditions were for the NOT case or not (and thus whether there was an AND or an OR between them).
```java
private static void updateCompositeAfterKey(SearchResponse r, SearchSourceBuilder search) {
    CompositeAggregation composite = getComposite(r);
static boolean mightProducePartialPages(CompositeAggregationBuilder aggregation) {
    return aggregation.getPipelineAggregations().stream().anyMatch(a -> a instanceof BucketSelectorPipelineAggregationBuilder);
```
I don't think we're using any other type of pipeline aggregation, so simply returning whether the size is > 0 should be enough.
Also not a fan of using the Streams API due to its cost; the above can become:

```java
// Return true as soon as a bucket selector is found; false if none is present.
for (var a : aggregation.getPipelineAggregations()) {
    if (a instanceof BucketSelectorPipelineAggregationBuilder) {
        return true;
    }
}
return false;
```
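As a self-contained check that a loop-based version behaves like `anyMatch`, the sketch below compares both variants. `StubPipelineAgg`, `StubBucketSelector` and `StubOtherPipeline` are hypothetical stand-ins, not the Elasticsearch builder classes; they exist only so the sketch compiles on its own. The key detail is that the loop returns only on a match and returns `false` after the loop; returning unconditionally inside the loop would inspect only the first element.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins for the pipeline aggregation builder hierarchy.
interface StubPipelineAgg {}

final class StubBucketSelector implements StubPipelineAgg {}

final class StubOtherPipeline implements StubPipelineAgg {}

final class PartialPageCheckSketch {

    private PartialPageCheckSketch() {}

    // Stream-based variant, mirroring the shape of the diff.
    static boolean withStream(Collection<StubPipelineAgg> pipelines) {
        return pipelines.stream().anyMatch(a -> a instanceof StubBucketSelector);
    }

    // Loop-based variant: short-circuits on the first bucket selector,
    // otherwise falls through to false, exactly like anyMatch.
    static boolean withLoop(Collection<StubPipelineAgg> pipelines) {
        for (StubPipelineAgg a : pipelines) {
            if (a instanceof StubBucketSelector) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<StubPipelineAgg> pipelines = List.of(new StubOtherPipeline(), new StubBucketSelector());
        System.out.println(withStream(pipelines) + " " + withLoop(pipelines)); // true true
    }
}
```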
```java
            listener::onFailure
        )
    );
}

private Supplier<SearchHitRowSet> makeRowSet(int sizeRequested, SearchResponse response) {
    return () -> new SearchHitRowSet(extractors, mask, sizeRequested, limit, response);
private Supplier<SearchHitRowSet> makeRowSet(SearchResponse response) {
```
👍
Force-pushed from 2f11cc3 to c587609.
💔 Backport failed
You can use sqren/backport to manually backport by running
Not backporting to 8.1.1 because of changeset overlapping with #83381.
…ic#84356) Resolves elastic#75528.

Instead of always returning an empty last page for `GROUP BY` queries, `CompositeAggCursor` will now only do so if it is not possible to tell whether there are more pages based on the composite aggregation response. This is the case in two situations:

* The last page contains exactly `fetch_size` results. In this case, the composite aggregation returns an `after_key` even if there are no more keys remaining (see also elastic#75573).
* The query uses a bucket selector. In this case, the composite aggregation might return partial pages with fewer than `size` buckets, and the `buckets.size() < sizeRequested` heuristic for detecting last pages no longer works.

Hence, if either (or both) of the two conditions above applies, SQL will still return an empty last page. If neither condition applies, the last page will always be non-empty.

This PR is also a weak prerequisite for addressing elastic#84349 because it allows PITs to be closed immediately for aggregation queries that return only one page. As a result, the performance impact of using PIT for aggregations should be minimized.
Resolves elastic#84349. This PR has a small overlap with elastic#84356 but can be merged independently.
Resolves elastic#85520. The failure was caused by the assumption that `GROUP BY` queries always return a scroll cursor, which no longer holds since elastic#84356. Because we will run into the same problem again with 8.2.1, this fix also needs to be backported to 8.2.
Resolves #75528.
Instead of always returning an empty last page for `GROUP BY` queries, `CompositeAggCursor` will now only do so if it is not possible to tell whether there are more pages based on the composite aggregation response. This is the case in two situations:

* The last page contains exactly `fetch_size` results. In this case, the composite aggregation returns an `after_key` even if there are no more keys remaining (see also #75573, "A way to tell whether more buckets can be fetched with after_key in composite aggregation").
* The query uses a bucket selector. In this case, the composite aggregation might return partial pages with fewer than `size` buckets, and the `buckets.size() < sizeRequested` heuristic for detecting last pages no longer works.

Hence, if either (or both) of the two conditions above applies, SQL will still return an empty last page. If neither condition applies, the last page will always be non-empty.
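The rule described above can be condensed into a small decision function. This is a hypothetical sketch: `LastPageSketch`, `needsCursor` and its parameters are illustrative names, not the actual `CompositeAggCursor` API, and treating a missing `after_key` as "nothing left to fetch" is an assumption of the sketch. A page is known to be the last one only when it is shorter than the requested size AND the query has no bucket selector; in every other case a cursor is returned, which may later yield an empty last page.

```java
/**
 * Hypothetical sketch of the last-page rule; names are illustrative only.
 */
final class LastPageSketch {

    private LastPageSketch() {}

    /**
     * @param bucketCount              buckets returned on this page
     * @param sizeRequested            requested page size (fetch_size)
     * @param hasAfterKey              whether the response carries an after_key
     * @param mightProducePartialPages whether the query uses a bucket selector
     * @return true if a cursor must be handed to the client
     */
    static boolean needsCursor(int bucketCount, int sizeRequested, boolean hasAfterKey,
                               boolean mightProducePartialPages) {
        // Assumption: without an after_key there is nothing left to fetch.
        if (hasAfterKey == false) {
            return false;
        }
        // Without a bucket selector, a short page proves the composite
        // aggregation ran out of keys, so this page is the last one.
        boolean knownLastPage = mightProducePartialPages == false && bucketCount < sizeRequested;
        return knownLastPage == false;
    }

    public static void main(String[] args) {
        System.out.println(needsCursor(7, 10, true, false));  // false: short page, no selector
        System.out.println(needsCursor(10, 10, true, false)); // true: exactly fetch_size results
        System.out.println(needsCursor(7, 10, true, true));   // true: selector may produce partial pages
    }
}
```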
This PR is also a weak prerequisite for addressing #84349 because it allows PITs to be closed immediately for aggregation queries that return only one page. As a result, the performance impact of using PIT for aggregations should be minimized.