
@GalLalouche
Contributor

@GalLalouche GalLalouche commented Nov 13, 2025

Pushes COUNT(*) grouped by ROUND_TO down to Lucene.
Example query:

FROM employees
| STATS COUNT(*) BY DATE_TRUNC(1 YEAR, hire_date)

This is the culmination of several rules:

  1. ReplaceDateTruncBucketWithRoundTo: replaces the DATE_TRUNC with a ROUND_TO.
  2. ReplaceRoundToWithQueryAndTags: replaces the ROUND_TO with query and tags.
  3. PushCountQueryAndTagsToSource (this PR): pushes the aggregation down to Lucene.
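
To make the "query and tags" step more concrete, here is a minimal, self-contained Java sketch of the idea. It is not the actual rule implementation: each rounding point becomes one half-open range query tagged with that point, and COUNT(*) becomes the number of documents matching each tagged query. QueryAndTagsSketch, TaggedRange, toTaggedRanges and countPerTag are hypothetical names, plain long values stand in for indexed hire_date values, and a linear scan stands in for a per-query Lucene count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "query and tags" plus count pushdown; not the actual rule code.
public class QueryAndTagsSketch {

    /** A half-open range [from, to) tagged with the rounding point it stands for. */
    public record TaggedRange(long from, long to, long tag) {
        boolean matches(long value) {
            return value >= from && value < to;
        }
    }

    /** One tagged range per sorted rounding point; the last range is open-ended. */
    public static List<TaggedRange> toTaggedRanges(long[] sortedPoints) {
        List<TaggedRange> ranges = new ArrayList<>();
        for (int i = 0; i < sortedPoints.length; i++) {
            long to = i + 1 < sortedPoints.length ? sortedPoints[i + 1] : Long.MAX_VALUE;
            ranges.add(new TaggedRange(sortedPoints[i], to, sortedPoints[i]));
        }
        return ranges;
    }

    /**
     * Counts matching values per tag. A linear scan stands in for a per-query count;
     * values below the first rounding point are not counted in this simplification.
     */
    public static Map<Long, Long> countPerTag(List<TaggedRange> ranges, long[] values) {
        Map<Long, Long> counts = new LinkedHashMap<>();
        for (TaggedRange range : ranges) {
            long count = 0;
            for (long value : values) {
                if (range.matches(value)) {
                    count++;
                }
            }
            counts.put(range.tag(), count);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] points = {0, 100, 200};          // stand-ins for yearly DATE_TRUNC boundaries
        long[] values = {5, 50, 150, 250, 260}; // stand-ins for hire_date values
        System.out.println(countPerTag(toTaggedRanges(points), values)); // prints {0=2, 100=1, 200=2}
    }
}
```

The point of the shape is that once every bucket is expressed as its own tagged query, the grouped count reduces to one count per query.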

Note that a query with a filter is not yet supported, but it will be in a follow-up PR. For example:

FROM employees
| STATS COUNT(*) WHERE hire_date > "1985-01-01" BY d=DATE_TRUNC(1 YEAR, hire_date) 

private final Expression limit;
private final List<Attribute> attrs;
private final List<Stat> stats;
private final Stat stat;
Contributor Author


I've refactored this since we don't support multiple aggregates right now anyway. When we do, we can turn this back into a list.

public class ReplaceRoundToWithQueryAndTagsTests extends AbstractLocalPhysicalPlanOptimizerTests {

public ReplaceRoundToWithQueryAndTagsTests(String name, Configuration config) {
public class SubtituteRoundToTests extends AbstractLocalPhysicalPlanOptimizerTests {
Contributor Author


  • Renamed since it now tests for both rewrites in the same rule batch.
  • I've also refactored this to reduce some of the duplication.
  • This could have really benefited from the planned golden test feature!

Member


Can we add some tests for INLINE STATS with count + date_histogram as well, if there aren't any yet? Having some CsvTests for them would be great, just to make sure the newly added filter(>0) doesn't cause trouble for the inline join after the aggregation.

FORK and subqueries may also have aggregations inside their branches; some additional tests for them would give us extra confidence.
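
For context on the count > 0 filter mentioned above, here is a hedged Java sketch of what such a filter does to per-bucket counts. It assumes (this is not confirmed in the thread) that the filter exists to drop buckets whose tagged query matched no documents, since a regular STATS ... BY never emits a group for which it saw no rows. DropEmptyBucketsSketch and dropEmptyBuckets are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: what a "count > 0" filter does to per-bucket counts.
public class DropEmptyBucketsSketch {

    /** Keeps only buckets whose count is positive, preserving bucket order. */
    public static Map<Long, Long> dropEmptyBuckets(Map<Long, Long> countsPerBucket) {
        Map<Long, Long> nonEmpty = new LinkedHashMap<>();
        countsPerBucket.forEach((bucket, count) -> {
            if (count > 0) {
                nonEmpty.put(bucket, count);
            }
        });
        return nonEmpty;
    }

    public static void main(String[] args) {
        Map<Long, Long> counts = new LinkedHashMap<>();
        counts.put(0L, 2L);
        counts.put(100L, 0L); // the tagged query for this bucket matched nothing
        counts.put(200L, 2L);
        System.out.println(dropEmptyBuckets(counts)); // prints {0=2, 200=2}
    }
}
```

Under that assumption, anything consuming the aggregation output, such as the inline join after INLINE STATS, only ever sees non-empty buckets, which is the kind of behavior the requested CsvTests would pin down.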

@GalLalouche GalLalouche marked this pull request as ready for review November 14, 2025 13:13
@GalLalouche GalLalouche requested a review from nik9000 November 14, 2025 13:13
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Nov 14, 2025
@GalLalouche GalLalouche added >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Nov 14, 2025
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Nov 14, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @GalLalouche, I've created a changelog YAML for you.

Member

@fang-xing-esql fang-xing-esql left a comment


Thank you @GalLalouche, the newly added rule makes sense to me. I added comments about additional tests; they will give us extra confidence in this change.

I'm curious whether there are any early performance results for this change yet. It would be really exciting to see the improvements.

I'll leave the review of the changes in operators to Nik.

);
}
assertMap("circuit breakers not reset to 0", stats, matchesMap().extraOk().entry("nodes", nodesMatcher));
// assertMap("circuit breakers not reset to 0", stats, matchesMap().extraOk().entry("nodes", nodesMatcher));
Member


If this is commented out, is it ok to remove it?

Contributor Author


Lords no! 😅 I commented it out while developing, but it should definitely be commented back in!

LocalPhysicalOptimizerContext> {

@Override
protected PhysicalPlan rule(AggregateExec aggregateExec, LocalPhysicalOptimizerContext ctx) {
Member


It seems like ctx is neither used nor needed by this rule.

Contributor Author


Indeed. Changed the parent to the non-parameterized version.
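
A sketch of the design choice above, using hypothetical base classes rather than the actual optimizer rule hierarchy: a rule that needs an optimizer context takes it as a second parameter, while a rule that doesn't extends a simpler parent whose rule method only receives the plan, so the signature no longer advertises an unused ctx. RuleShapesSketch, ContextRule, SimpleRule, TagWithNode and UppercasePlan are illustrative names only, and strings stand in for physical plans.

```java
// Hypothetical sketch of two rule shapes; not the real optimizer rule hierarchy.
public class RuleShapesSketch {

    /** A rule that needs some optimizer context C to rewrite a plan of type T. */
    public abstract static class ContextRule<T, C> {
        protected abstract T rule(T plan, C ctx);

        public final T apply(T plan, C ctx) {
            return rule(plan, ctx);
        }
    }

    /** A context-free rule: its signature no longer carries an unused ctx. */
    public abstract static class SimpleRule<T> {
        protected abstract T rule(T plan);

        public final T apply(T plan) {
            return rule(plan);
        }
    }

    /** Needs the context: annotates the plan with a node name taken from it. */
    public static final class TagWithNode extends ContextRule<String, String> {
        @Override
        protected String rule(String plan, String nodeName) {
            return plan + "@" + nodeName;
        }
    }

    /** Only looks at the plan itself, so it extends the simpler parent. */
    public static final class UppercasePlan extends SimpleRule<String> {
        @Override
        protected String rule(String plan) {
            return plan.toUpperCase();
        }
    }

    public static void main(String[] args) {
        System.out.println(new TagWithNode().apply("aggregate", "node-1")); // prints aggregate@node-1
        System.out.println(new UppercasePlan().apply("aggregate"));         // prints AGGREGATE
    }
}
```

Keeping the two shapes separate means a reader can tell from the parent class alone whether a rule depends on local optimizer state.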

public List<ElementType> tagTypes() {
return List.of(switch (queryBuilderAndTags.getFirst().tags().getFirst()) {
case Integer i -> ElementType.INT;
case Long l -> ElementType.LONG;
Member


Is there a particular reason that double is not supported?

Contributor Author


It's because we only support COUNT pushdown at the moment. I can simplify this by removing StatsType altogether to make that clearer, but I wanted to avoid overly complicating this PR.
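
The tag-type dispatch in the tagTypes() hunk can be sketched in isolation as follows. This is a hypothetical, simplified version: ElementType here is a local two-value enum rather than the real one, the checks use instanceof instead of the pattern-matching switch from the hunk (so the sketch also compiles on JDKs older than 21), and throwing for other types such as Double is this sketch's own choice, since the hunk does not show how the real code handles them.

```java
import java.util.List;

// Hypothetical, simplified version of the tag-type dispatch; not the real code.
public class TagTypesSketch {

    /** Local stand-in for the block element types; only the two from the hunk. */
    public enum ElementType { INT, LONG }

    /** Picks the element type from the runtime class of the first tag. */
    public static ElementType tagType(List<?> tags) {
        Object first = tags.get(0);
        if (first instanceof Integer) {
            return ElementType.INT;
        }
        if (first instanceof Long) {
            return ElementType.LONG;
        }
        // This sketch rejects everything else (e.g. Double); the real handling isn't shown.
        throw new IllegalArgumentException("unsupported tag type: " + first.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(tagType(List.of(1, 2)));   // prints INT
        System.out.println(tagType(List.of(1L, 2L))); // prints LONG
        try {
            tagType(List.of(1.5));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints unsupported tag type: Double
        }
    }
}
```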


Labels

:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0


3 participants