Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Avoid empty last pages for GROUP BY queries when possible #84356

Merged
merged 4 commits into from Mar 14, 2022

Conversation

Luegg
Copy link
Contributor

@Luegg Luegg commented Feb 24, 2022

Resolves #75528.

Instead of always returning an empty last page for GROUP BY queries, CompositeAggCursor will now only do so if it is not possible to tell wether there are more pages based on the composite aggregation response. This is the case in two situations:

  • The last page contains exactly fetch_size results. In this case, the composite aggregation return an after_key even if there are no more keys remaining (see also A way to tell whether more buckets can be fetched with after_key in composite aggregation #75573)
  • The query uses a bucket selector. In this case, the composite aggregation might return partial pages with less than size buckets and the buckets.size() < sizeRequested heuristic for detecting last pages does no longer work.

Hence, if any (or both) of the two conditions above applies, SQL will still return an empty last page. If neither of the conditions apply, the last page will always be non-empty.

This PR is also a weak prerequisite for addressing #84349 because it allows to immediately close PITs for aggregation queries returning only one page. As a result the performance impact of using PIT for aggregations should be minimized.

@elasticsearchmachine
Copy link
Collaborator

Hi @Luegg, I've created a changelog YAML for you.

@Luegg Luegg force-pushed the fix/reduceEmptyLastPages branch 2 times, most recently from 6a585a1 to 7818f65 Compare February 28, 2022 13:11
) {

if (log.isTraceEnabled()) {
logSearchResponse(response, log);
}
// there are some results
if (response.getAggregations().asList().isEmpty() == false) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was always true...


try {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/catch seems to be a leftover that could have been removed in #83833

CompositeAggRowSet rowSet = makeRowSet.get();
Map<String, Object> afterKey = rowSet.afterKey();
// retry
if (mightProducePartialPages && shouldRetryDueToEmptyPage(response)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding mightProducePartialPages here ensures that we do not retry for queries without bucket selectors.

@Luegg Luegg marked this pull request as ready for review March 1, 2022 12:08
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Mar 1, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -162,71 +163,64 @@ static void handle(
BiFunction<SearchSourceBuilder, CompositeAggRowSet, CompositeAggCursor> makeCursor,
Runnable retry,
ActionListener<Page> listener,
Schema schema
boolean mightProducePartialPages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: couldProducePartialPage would be more suggestive to me (for both arg and method).

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A small nit, the issue description could be tweaked a bit to make it clear when the empty page is being consumed (to not pass it to the client) as oppose to when it is NOT, e.g.: when the result size is less than the fetch size AND there are no bucket selector/pipeline aggregations.

Personally I had to read the description several times to understand whether the conditions listed where for the NOT case or not (and thus if there was an AND or OR between them).

private static void updateCompositeAfterKey(SearchResponse r, SearchSourceBuilder search) {
CompositeAggregation composite = getComposite(r);
static boolean mightProducePartialPages(CompositeAggregationBuilder aggregation) {
return aggregation.getPipelineAggregations().stream().anyMatch(a -> a instanceof BucketSelectorPipelineAggregationBuilder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we're using any other type of pipeline aggregation so simply returning if the size > 0 should be enough.
Also not a fun of using the streams api due to their cost - the above can become:

for (var a : aggregation.getPipelineAggregrations()) {
    return a instanceof BucketSelectorPipelineAggregationBuilder
}

listener::onFailure
)
);
}

private Supplier<SearchHitRowSet> makeRowSet(int sizeRequested, SearchResponse response) {
return () -> new SearchHitRowSet(extractors, mask, sizeRequested, limit, response);
private Supplier<SearchHitRowSet> makeRowSet(SearchResponse response) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Luegg Luegg added v8.1.1 and removed v8.1.1 labels Mar 14, 2022
@Luegg Luegg added auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge Automatically create backport pull requests and merge when ready labels Mar 14, 2022
@elasticsearchmachine elasticsearchmachine merged commit 906dda0 into elastic:master Mar 14, 2022
@Luegg Luegg deleted the fix/reduceEmptyLastPages branch March 14, 2022 11:30
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 84356

@Luegg Luegg removed the v8.1.1 label Mar 14, 2022
@Luegg
Copy link
Contributor Author

Luegg commented Mar 14, 2022

Not backporting to 8.1.1 because of changeset overlapping with #83381

elasticsearchmachine pushed a commit that referenced this pull request Mar 14, 2022
Resolves #84349

This PR has a small overlap with #84356 but can be merged independently.
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Mar 16, 2022
…ic#84356)

Resolves elastic#75528.

Instead of always returning an empty last page for `GROUP BY` queries,
`CompositeAggCursor` will now only do so if it is not possible to tell
wether there are more pages based on the composite aggregation response.
This is the case in two situations: * The last page contains exactly
`fetch_size` results. In this case, the composite aggregation return an
`after_key` even if there are no more keys remaining (see also
elastic#75573) * The query uses
a bucket selector. In this case, the composite aggregation might return
partial pages with less than `size` buckets and the `buckets.size() <
sizeRequested` heuristic for detecting last pages does no longer work.

Hence, if any (or both) of the two conditions above applies, SQL will
still return an empty last page. If neither of the conditions apply, the
last page will always be non-empty.

This PR is also a weak prerequisite for addressing
elastic#84349 because it allows
to immediately close PITs for aggregation queries returning only one
page. As a result the performance impact of using PIT for aggregations
should be minimized.
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Mar 16, 2022
Resolves elastic#84349

This PR has a small overlap with elastic#84356 but can be merged independently.
CohenIdo pushed a commit to CohenIdo/elasticsearch that referenced this pull request Mar 24, 2022
…ic#84356)

Resolves elastic#75528.

Instead of always returning an empty last page for `GROUP BY` queries,
`CompositeAggCursor` will now only do so if it is not possible to tell
wether there are more pages based on the composite aggregation response.
This is the case in two situations: * The last page contains exactly
`fetch_size` results. In this case, the composite aggregation return an
`after_key` even if there are no more keys remaining (see also
elastic#75573) * The query uses
a bucket selector. In this case, the composite aggregation might return
partial pages with less than `size` buckets and the `buckets.size() <
sizeRequested` heuristic for detecting last pages does no longer work.

Hence, if any (or both) of the two conditions above applies, SQL will
still return an empty last page. If neither of the conditions apply, the
last page will always be non-empty.

This PR is also a weak prerequisite for addressing
elastic#84349 because it allows
to immediately close PITs for aggregation queries returning only one
page. As a result the performance impact of using PIT for aggregations
should be minimized.
CohenIdo pushed a commit to CohenIdo/elasticsearch that referenced this pull request Mar 24, 2022
Resolves elastic#84349

This PR has a small overlap with elastic#84356 but can be merged independently.
elasticsearchmachine pushed a commit that referenced this pull request Mar 31, 2022
resolves #85520

The failure was caused by the assumption that groupBy queries always
return a scroll cursor that has been fixed in
#84356.

Because we will run into the same problem again with 8.2.1 this fix also
needs to be backported to 8.2.
Luegg pushed a commit to Luegg/elasticsearch that referenced this pull request Mar 31, 2022
resolves elastic#85520

The failure was caused by the assumption that groupBy queries always
return a scroll cursor that has been fixed in
elastic#84356.

Because we will run into the same problem again with 8.2.1 this fix also
needs to be backported to 8.2.
elasticsearchmachine pushed a commit that referenced this pull request Mar 31, 2022
resolves #85520

The failure was caused by the assumption that groupBy queries always
return a scroll cursor that has been fixed in
#84356.

Because we will run into the same problem again with 8.2.1 this fix also
needs to be backported to 8.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying auto-backport-and-merge Automatically create backport pull requests and merge when ready auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug Team:QL (Deprecated) Meta label for query languages team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: queries with GROUP BY always return a scroll cursor in REST API
7 participants