Freq Terms Enum #5489
A frequency-caching terms enum that can also be configured with an optional filter. It is meant to be used by both the significant terms aggregation and the phrase suggester. This change extracts the frequency caching into shared code, and makes it possible in the future to add a filter that controls/customizes the background frequencies.
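A minimal sketch of the idea, assuming a toy in-memory index (one token list per document, doc id = list index) instead of a Lucene TermsEnum. FreqCachingStats is a hypothetical name, not the class in this PR: it computes a term's document frequency (DF) and total term frequency (TTF) at most once, caches the pair, and, when a filter is supplied, counts only the documents the filter accepts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical sketch: caches (docFreq, totalTermFreq) per term over a tiny
// in-memory "index". The optional filter restricts which doc ids count
// toward the background frequencies; null means "all documents".
final class FreqCachingStats {
    private final List<List<String>> docs;       // doc id == list index
    private final IntPredicate filter;            // null = no filtering
    private final Map<String, long[]> cache = new HashMap<>();
    int lookups = 0;                              // number of uncached computations

    FreqCachingStats(List<List<String>> docs, IntPredicate filter) {
        this.docs = docs;
        this.filter = filter;
    }

    long docFreq(String term) { return stats(term)[0]; }

    long totalTermFreq(String term) { return stats(term)[1]; }

    // Computes both statistics once per term, then serves them from the cache.
    private long[] stats(String term) {
        return cache.computeIfAbsent(term, t -> {
            lookups++;
            long df = 0, ttf = 0;
            for (int id = 0; id < docs.size(); id++) {
                if (filter != null && !filter.test(id)) continue; // skip filtered-out docs
                long inDoc = docs.get(id).stream().filter(t::equals).count();
                if (inDoc > 0) { df++; ttf += inDoc; }
            }
            return new long[] {df, ttf};
        });
    }
}
```

Caching DF and TTF together means a caller that asks for both statistics of the same term pays for a single pass.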
Aye, it is still missing the ability to parse and provide the filter from the sig terms agg; that should be added in a separate change.
@@ -79,6 +79,7 @@ public SignificantStringTerms buildAggregation(long owningBucketOrdinal) {
for (int i = 0; i < bucketOrds.size(); i++) {
    if (spare == null) {
        spare = new SignificantStringTerms.Bucket(new BytesRef(), 0, 0, 0, 0, null);
        termsAggFactory.buildTermsEnum(context);
Should it be called before the for-loop instead?
I wanted to call it only when we create the spare; otherwise we don't need it. The same logic applies to creating the spare lazily, no?
Maybe the code is a bit confusing, but because insertion into the priority queue is done with insertWithOverflow, spare == null is going to be true on the first ordered.size() iterations of the loop and false afterwards. So termsAggFactory.buildTermsEnum(context) would be called here not once, but ordered.size() times.
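A small self-contained sketch of that behaviour. BoundedQueue is a hypothetical stand-in built on java.util.PriorityQueue (not Lucene) that mimics the contract of Lucene's PriorityQueue.insertWithOverflow: while the queue is not full it adds the element and returns null; once full it returns either the evicted smallest element or the rejected candidate. SpareLoopDemo (also hypothetical) counts how often the spare == null branch, where the buildTermsEnum(context) call sits, actually runs.

```java
import java.util.PriorityQueue;

// Hypothetical stand-in for Lucene's insertWithOverflow contract.
final class BoundedQueue {
    private final PriorityQueue<long[]> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0])); // min-heap on score
    private final int maxSize;

    BoundedQueue(int maxSize) { this.maxSize = maxSize; }

    long[] insertWithOverflow(long[] element) {
        if (heap.size() < maxSize) {         // not full yet: keep it, nothing to hand back
            heap.add(element);
            return null;
        }
        if (maxSize > 0 && element[0] > heap.peek()[0]) {
            long[] evicted = heap.poll();    // full: evict the smallest, return it for reuse
            heap.add(element);
            return evicted;
        }
        return element;                      // rejected candidate is returned for reuse
    }
}

final class SpareLoopDemo {
    // Counts executions of the `spare == null` branch for a loop over
    // bucketCount buckets feeding a queue of capacity queueSize.
    static int countSpareAllocations(int bucketCount, int queueSize) {
        BoundedQueue ordered = new BoundedQueue(queueSize);
        long[] spare = null;
        int allocations = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (spare == null) {
                spare = new long[1];
                allocations++;               // a call placed here runs once per allocation
            }
            spare[0] = i;                    // pretend score for this bucket
            spare = ordered.insertWithOverflow(spare);
        }
        return allocations;
    }
}
```

With 10 buckets and a queue capacity of 3 the branch runs 4 times in this sketch: once per queue slot, plus once more because the insertion that fills the queue still returns null. Either way it runs more than once, so hoisting the call in front of the loop, or guarding it with its own flag, avoids the repeated work.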
I wonder if FreqTermsEnum should be split into two classes: one that handles filtering of the TTF and DF stats, and a separate one responsible for caching the TTF and DF stats?
++ @markharwood
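One way that split could look, sketched under the same toy assumptions (an in-memory list of token lists instead of a Lucene index; TermStats, FilteredTermStats and CachingTermStats are all hypothetical names): a filtering source of DF/TTF statistics, and a separate caching decorator that can wrap any source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical interface: the two statistics background frequencies need.
interface TermStats {
    long docFreq(String term);
    long totalTermFreq(String term);
}

// Responsibility 1: compute DF/TTF, optionally only over docs the filter accepts.
final class FilteredTermStats implements TermStats {
    private final List<List<String>> docs;   // doc id == list index
    private final IntPredicate filter;        // null = all documents
    int computations = 0;                     // how often stats were actually computed

    FilteredTermStats(List<List<String>> docs, IntPredicate filter) {
        this.docs = docs;
        this.filter = filter;
    }

    private long[] compute(String term) {
        computations++;
        long df = 0, ttf = 0;
        for (int id = 0; id < docs.size(); id++) {
            if (filter != null && !filter.test(id)) continue;
            long inDoc = docs.get(id).stream().filter(term::equals).count();
            if (inDoc > 0) { df++; ttf += inDoc; }
        }
        return new long[] {df, ttf};
    }

    public long docFreq(String term) { return compute(term)[0]; }
    public long totalTermFreq(String term) { return compute(term)[1]; }
}

// Responsibility 2: memoize any TermStats, regardless of how it filters.
final class CachingTermStats implements TermStats {
    private final TermStats delegate;
    private final Map<String, Long> df = new HashMap<>();
    private final Map<String, Long> ttf = new HashMap<>();

    CachingTermStats(TermStats delegate) { this.delegate = delegate; }

    public long docFreq(String term) { return df.computeIfAbsent(term, delegate::docFreq); }
    public long totalTermFreq(String term) { return ttf.computeIfAbsent(term, delegate::totalTermFreq); }
}
```

The design choice is plain composition: the cache no longer needs to know whether or how documents are filtered. One trade-off of this particular sketch is that DF and TTF are cached independently, so asking for both for the same term computes the underlying statistics twice; a combined cache entry would avoid that.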