Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge remaining sig_terms into terms #57397

Merged
merged 7 commits into from
Jun 4, 2020
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 31, 2020

Merges the remaining implementation of significant_terms into terms
so that we can more easilly make them work properly without
asMultiBucketAggregator which should save memory and speed them up.

Relates #56487

Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.

Relates elastic#56487
@nik9000
Copy link
Member Author

nik9000 commented May 31, 2020

@elasticsearchmachine run elasticsearch-ci/default-distro

@nik9000
Copy link
Member Author

nik9000 commented May 31, 2020

run elasticsearch-ci/1

1 similar comment
@nik9000
Copy link
Member Author

nik9000 commented Jun 1, 2020

run elasticsearch-ci/1

@nik9000 nik9000 marked this pull request as ready for review June 1, 2020 12:40
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@nik9000 nik9000 requested a review from not-napoleon June 1, 2020 12:41
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 1, 2020
@@ -648,7 +647,6 @@ void updateBucket(OrdBucket spare, long globalOrd, long bucketOrd, long docCount
spare.globalOrd = globalOrd;
spare.bucketOrd = bucketOrd;
spare.docCount = docCount;
otherDocCount += docCount;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing inaccurate doc counts when shard_min_doc_count was more than 1.

@@ -164,17 +164,18 @@ public void collectDebugInfo(BiConsumer<String, Object> add) {
BucketOrdsEnum ordsEnum = bucketOrds.ordsEnum(owningBucketOrds[ordIdx]);
Supplier<B> emptyBucketBuilder = emptyBucketBuilder(owningBucketOrds[ordIdx]);
while (ordsEnum.next()) {
long docCount = bucketDocCount(ordsEnum.ord());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I flip the if statement here and moved the spare initialization down to right before I need it to line up with the way the map version works.

/**
* An aggregator of string values.
*/
public class StringTermsAggregator extends AbstractStringTermsAggregator {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just renamed this class to MapStringTermsAggregator but a lot changed.

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Much easier to see the relationship between these two aggs now.

@nik9000 nik9000 merged commit 69cd443 into elastic:master Jun 4, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 4, 2020
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.

Relates elastic#56487
nik9000 added a commit that referenced this pull request Jun 4, 2020
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.

Relates #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 4, 2020
nik9000 added a commit that referenced this pull request Jun 4, 2020
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants