[Transform] Optimize composite agg execution using ordered groupings #75424
…togram group by comes first. The order is only changed for execution, the provided config remains unchanged
Pinging @elastic/ml-core (Team:ML)
This is fantastic.
Also, looking to the future, it might be good to bubble up group_by's on fields the index is sorted by.
Heartily approve.
}

// Arrays.sort provides stable sort (to respect the input order if priorities match), Collections.sort not
UnmodifiableEntryWithPriority[] prioritizedGroups = new UnmodifiableEntryWithPriority[groups.size()];
UnmodifiableEntryWithPriority could be replaced with Tuple<Entry<String, SingleGroupSource>, Integer>. Then you could do Comparator.comparing(Tuple::v2). The static class seems like a ton of unnecessary code. But this is a minor quibble.
I had something like that in one of the first versions (before realizing the problem with sorting); however, generic types and generic arrays don't work together (I could suppress the warning, but I don't like to).
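A minimal sketch of this trade-off, using java.util.Map.Entry as a stand-in for the Elasticsearch Tuple type (an assumption for illustration; the group names and priorities are hypothetical). Java rejects array creation for a parameterized type, so here the entries live in a List, and Comparator.comparing with reversed() orders them by descending priority:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class GenericArraySketch {

    // The line below would not compile ("generic array creation"),
    // because Java cannot create arrays of a parameterized type:
    // Map.Entry<String, Integer>[] prioritized = new Map.Entry<String, Integer>[3];

    /** Returns group names by descending priority; equal priorities keep their input order. */
    static List<String> orderByPriority(List<Map.Entry<String, Integer>> groups) {
        List<Map.Entry<String, Integer>> prioritized = new ArrayList<>(groups);
        // List.sort is documented as stable, so ties are not reordered
        prioritized.sort(Comparator.comparing((Map.Entry<String, Integer> e) -> e.getValue()).reversed());
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : prioritized) {
            names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> groups = List.of(
            Map.entry("terms_a", 10),
            Map.entry("date_histogram", 100),
            Map.entry("terms_b", 10));
        System.out.println(orderByPriority(groups)); // [date_histogram, terms_a, terms_b]
    }
}
```

Keeping a List avoids the unchecked cast a generic array would need; the price is one copy of the list, which hardly matters for a short list of group_by entries.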
"however generic types and generic arrays don't work together"
Very true.
LGTM
Good job! Just 2 minor remarks.
...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/pivot/GroupByOptimizer.java
prioritizedGroups[index++] = new UnmodifiableEntryWithPriority(groupBy, priority);
}

Arrays.sort(prioritizedGroups, (a, b) -> b.compareTo(a));
Could Collections.reverseOrder() be used as a comparator here?
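For reference: Collections.reverseOrder() returns a comparator that imposes the reverse of the natural ordering, i.e. the same ordering as (a, b) -> b.compareTo(a); it reverses the comparison, not the sequence of the input. Arrays.sort on an object array and Collections.sort are both documented as stable, so equal priorities keep their input order with either. The sketch below uses a hypothetical Prioritized class in place of UnmodifiableEntryWithPriority, comparing by priority only:

```java
import java.util.Arrays;
import java.util.Collections;

class ReverseOrderSketch {

    // Hypothetical stand-in for UnmodifiableEntryWithPriority: natural order is by priority only
    static final class Prioritized implements Comparable<Prioritized> {
        final String name;
        final int priority;

        Prioritized(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Prioritized other) {
            return Integer.compare(priority, other.priority);
        }
    }

    static String[] names(Prioritized[] groups) {
        return Arrays.stream(groups).map(g -> g.name).toArray(String[]::new);
    }

    static String[] sortWithLambda(Prioritized[] input) {
        Prioritized[] copy = input.clone();
        Arrays.sort(copy, (a, b) -> b.compareTo(a)); // descending; stable for equal priorities
        return names(copy);
    }

    static String[] sortWithReverseOrder(Prioritized[] input) {
        Prioritized[] copy = input.clone();
        Arrays.sort(copy, Collections.reverseOrder()); // same ordering as the lambda above
        return names(copy);
    }

    public static void main(String[] args) {
        Prioritized[] groups = {
            new Prioritized("terms_a", 10),
            new Prioritized("date_histogram", 100),
            new Prioritized("terms_b", 10) };
        System.out.println(Arrays.toString(sortWithLambda(groups)));       // [date_histogram, terms_a, terms_b]
        System.out.println(Arrays.toString(sortWithReverseOrder(groups))); // [date_histogram, terms_a, terms_b]
    }
}
```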
I am not reversing the order; I want to reorder according to the priority I calculated by inspecting the group_by definition. If priorities are equal, the order of the config should remain, which is why I need a stable sort: it respects the insertion order. Collections.sort does not provide a stable sort. But rethinking it, I could solve that by boosting the first group with groups.size(), the second with groups.size() - 1, and so on.
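The tie-breaking idea at the end of this reply can be sketched by folding the configuration position into the sort key, so the order no longer depends on the sort being stable. One assumption is added: priorities are scaled by groups.size() before the boost is added, so the boost (groups.size() for the first group down to 1 for the last) can only separate equal priorities. Names and priorities are hypothetical, and this is not necessarily what the PR ended up doing:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionBoostSketch {

    /**
     * Orders group names by descending priority. The configuration position is folded
     * into the sort key, so ties are resolved without relying on a stable sort.
     */
    static List<String> reorder(List<String> names, List<Integer> priorities) {
        int n = names.size();
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        // Key = priority * n + boost, where the boost is n for the 1st group, n - 1 for the 2nd, ...
        // The boost stays within 1..n, so it never outweighs a priority difference of 1.
        indices.sort(Comparator.comparingLong((Integer i) -> priorities.get(i).longValue() * n + (n - i)).reversed());
        List<String> ordered = new ArrayList<>();
        for (int i : indices) {
            ordered.add(names.get(i));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> names = List.of("terms_a", "date_histogram", "terms_b");
        List<Integer> priorities = List.of(10, 100, 10);
        System.out.println(reorder(names, priorities)); // [date_histogram, terms_a, terms_b]
    }
}
```

Because every key is unique, any correct sort algorithm yields the same result; the trade-off is that priorities must be integers small enough that priority * groups.size() does not overflow a long.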
…transform/transforms/pivot/GroupByOptimizer.java Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
…hs/elasticsearch into transform-optimizeGroupByOrder
I simplified
…lastic#75424) Automatically reorder group_by for composite aggs, ensuring date histogram group by comes first. The order is only changed for execution, the provided config remains unchanged. In case of 2 group_by's of the same order type, the configuration order is respected. Script and runtime field based group_by's are penalized.
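A minimal sketch of the reordering described in this commit message, assuming a hypothetical GroupBy type that records the group source kind and whether it is backed by a script or a runtime field. The priority values are invented for illustration and are not taken from GroupByOptimizer; the sketch only reproduces the stated behavior: date histograms first, script and runtime field sources penalized, configuration order kept on ties, and the configured list left unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class GroupByReorderSketch {

    // Hypothetical subset of group source types
    enum Kind { TERMS, HISTOGRAM, DATE_HISTOGRAM }

    // Hypothetical, simplified view of one group_by entry
    static final class GroupBy {
        final String name;
        final Kind kind;
        final boolean scriptOrRuntimeField;

        GroupBy(String name, Kind kind, boolean scriptOrRuntimeField) {
            this.name = name;
            this.kind = kind;
            this.scriptOrRuntimeField = scriptOrRuntimeField;
        }
    }

    // Illustrative priorities only: date histograms first, script/runtime field sources penalized
    static int priority(GroupBy groupBy) {
        int priority = groupBy.kind == Kind.DATE_HISTOGRAM ? 100 : 10;
        if (groupBy.scriptOrRuntimeField) {
            priority -= 50;
        }
        return priority;
    }

    /** Returns a reordered copy for execution; the configured list itself is left untouched. */
    static List<GroupBy> orderForExecution(List<GroupBy> configured) {
        List<GroupBy> ordered = new ArrayList<>(configured);
        // List.sort is stable: group_by's with equal priority keep their configured order
        ordered.sort(Comparator.comparingInt((GroupBy g) -> priority(g)).reversed());
        return Collections.unmodifiableList(ordered);
    }

    public static void main(String[] args) {
        List<GroupBy> configured = List.of(
            new GroupBy("user", Kind.TERMS, false),
            new GroupBy("user_script", Kind.TERMS, true),
            new GroupBy("timestamp", Kind.DATE_HISTOGRAM, false),
            new GroupBy("country", Kind.TERMS, false));
        // Prints timestamp, user, country, user_script (one per line)
        for (GroupBy groupBy : orderForExecution(configured)) {
            System.out.println(groupBy.name);
        }
    }
}
```

Returning a new list rather than sorting in place mirrors the point that only the execution order changes while the provided config stays as written.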