New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save memory when string terms are not on top #57758
Conversation
This reworks string flavored implementations of the `terms` aggregation to save memory when it is under another bucket by dropping the usage of `asMultiBucketAggregator`.
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
In my tests this makes the |
I'm putting together a lower cardinality test case to see what that does. |
When the terms aggregation is inside another aggregation and cardinality is high (~5 million buckets collected), this is flat for performance for global ords and marginally better for the "map" strategy (before after). When the cardinality is lower (~30 thousand buckets collected) this is about 15% faster for global ords (before after). I expect this is because It uses much less memory for the "map" strategy when the terms agg is a child of a high cardinality agg because the bucket collection strategy can share the map of terms. @polyfractal had asked me about the performance difference for At some point we could indeed investigate using those strategies with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
return collectsFromSingleBucket ? new FromSingle(bigArrays) : new FromMany(bigArrays); | ||
} | ||
|
||
private BytesKeyedBucketOrds() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that reminds me, I should put a private default constructor on ValuesSourceConfig
, thanks.
this.collectionStrategy = new RemapGlobalOrds(collectsFromSingleBucket); | ||
} else { | ||
// Dense ords don't know how to collect from many buckets | ||
assert collectsFromSingleBucket; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How confidant are we that this assert will never trip? I'm wondering if it makes sense to change the condition to read if (remapGlobalOrds || collectFromSingleBucket==false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty confident. But I'll switch it to a hard check just for extra assurance.
run elasticsearch-ci/default-distro |
This reworks string flavored implementations of the `terms` aggregation to save memory when it is under another bucket by dropping the usage of `asMultiBucketAggregator`.
This reworks string flavored implementations of the
terms
aggregationto save memory when it is under another bucket by dropping the usage of
asMultiBucketAggregator
.