Add parsing to Significant Terms aggregations #24682

Merged
tlrx merged 3 commits into elastic:feature/client_aggs_parsing from tlrx:add-parsing-to-sig-terms-aggs on May 16, 2017

3 participants
@tlrx
Member

tlrx commented May 15, 2017

This pull request adds parsing methods for the SignificantStringTerms and SignificantLongTerms aggregations.

@javanna

left a few minors, LGTM otherwise

if (key != null) {
return Long.toString(key);
}
return null;

@javanna

javanna May 16, 2017

Member

can this happen in practice?

@tlrx

tlrx May 16, 2017

Member

No, I should have changed that.

if (key != null) {
return key.utf8ToString();
}
return null;

@javanna

javanna May 16, 2017

Member

same as above, just wondering if this can really happen, when can a key be null?

@tlrx

tlrx May 16, 2017

Member

No, we can return key.utf8ToString() directly.

@javanna
}
@Override
public int compareTerm(SignificantTerms.Bucket other) {

@javanna

javanna May 16, 2017

Member

is this method needed? I think it isn't used anywhere?

@tlrx

tlrx May 16, 2017

Member

It seems it's not used, but I prefer to leave it in core for now: I saw something suggesting it might be needed in scripted aggs. I'll try to figure this out, and if it's unused I'll create a separate PR in core.

@javanna

javanna May 16, 2017

Member

sounds good.

@tlrx

tlrx May 16, 2017

Member

Created #24714 to remove the method.

protected void assertBucket(MultiBucketsAggregation.Bucket expected, MultiBucketsAggregation.Bucket actual, boolean checkOrder) {
super.assertBucket(expected, actual, checkOrder);

@javanna

javanna May 16, 2017

Member

remove one of these empty lines? ;)

@Before
public void setUpSignificanceHeuristic() {
significanceHeuristic = randomSignificanceHeuristic();
format = randomNumericDocValueFormat();

@javanna

javanna May 16, 2017

Member

I think that to make this work as part of AggregationsTests you need to rename this method to setUp and remove the @Before annotation?

@tlrx

tlrx May 16, 2017

Member

The AggregationsTests passed so I didn't catch it. I changed that, thanks.

@javanna

javanna May 16, 2017

Member

ok weird that it worked. does the format have a default value in case this method is not called?

@tlrx

tlrx May 16, 2017

Member

No... but the output does not depend on the format (or heuristic) so a null format produces a valid XContent output that is parseable.

keyConsumer.accept(parser, bucket);
} else if (CommonFields.DOC_COUNT.getPreferredName().equals(currentFieldName)) {
long value = parser.longValue();
bucket.subsetDf = value;

@javanna

javanna May 16, 2017

Member

this one we set although it is never printed out as part of toXContent?

@tlrx

tlrx May 16, 2017

Member

We render the doc count which is in fact the subsetDf.
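To illustrate the exchange above, here is a minimal sketch of how a rendered bucket could be read back so that `doc_count` ends up in `subsetDf`. It is not the code from this pull request: the class `ParsedBucketSketch` and its `fromMap` method are hypothetical, a plain `Map` stands in for the XContent parser, and the assumptions that the key is rendered as `key` and that `bg_count` carries the superset document frequency are not confirmed by this thread.

```java
import java.util.Map;

// Hypothetical stand-in for a parsed significant terms bucket (not the Elasticsearch class).
class ParsedBucketSketch {
    String key;
    long subsetDf;   // rendered in the response as "doc_count"
    long supersetDf; // assumed to be rendered as "bg_count"
    double score;

    // Reads one bucket from an already-decoded JSON object; unknown fields are ignored.
    static ParsedBucketSketch fromMap(Map<String, Object> fields) {
        ParsedBucketSketch bucket = new ParsedBucketSketch();
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            String name = entry.getKey();
            Object value = entry.getValue();
            if ("key".equals(name)) {
                bucket.key = String.valueOf(value);
            } else if ("doc_count".equals(name)) {
                // The rendered doc count is the subset document frequency.
                bucket.subsetDf = ((Number) value).longValue();
            } else if ("bg_count".equals(name)) {
                bucket.supersetDf = ((Number) value).longValue();
            } else if ("score".equals(name)) {
                bucket.score = ((Number) value).doubleValue();
            }
        }
        return bucket;
    }
}
```

With this shape, a getter for the doc count can simply return `subsetDf`, which is why the field is set during parsing even though no field named `subsetDf` appears in the response.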

@cbuescher

LGTM, I left a few minor comments that might either be addressed, ignored or handled as follow ups, as you prefer.

implements SignificantTerms {
private static final String SCORE = "score";
private static final String BG_COUNT = "bg_count";

@cbuescher

cbuescher May 16, 2017

Member

Could we reuse the constants used in InternalSignificantTerms by making them public? I think we did that elsewhere and also do this with the CommonFields.

@tlrx

tlrx May 16, 2017

Member

Sure, I made them public in core specifically for this and then forgot to use them :/ Thanks

@Override
public SignificantTerms.Bucket getBucketByKey(String term) {
for (SignificantTerms.Bucket bucket : getBuckets()) {

@cbuescher

cbuescher May 16, 2017

Member

Should we lazily compute a bucketMap like in InternalMappedSignificantTerms and then be able to reuse it when this method gets called many times? I have no strong opinions on this though.

@tlrx

tlrx May 16, 2017

Member

I don't have a strong opinion either... I tend to think it can be done on the caller's side if a lot of buckets are going to be retrieved by their keys. I'll add one for the sake of consistency with the internal implementations and other parsed aggregations.

@tlrx

tlrx May 16, 2017

Member

I also added a test about this method.
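For readers unfamiliar with the pattern discussed in this thread, the following is a rough sketch of a lazily computed bucket map. It is not the implementation that was merged: `LazyBucketLookupSketch` and its nested `Bucket` class are hypothetical names, and the sketch assumes single-threaded access and unique bucket keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container illustrating a lazily built key-to-bucket index.
class LazyBucketLookupSketch {

    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    private final List<Bucket> buckets;
    private Map<String, Bucket> bucketMap; // stays null until the first lookup

    LazyBucketLookupSketch(List<Bucket> buckets) {
        this.buckets = new ArrayList<>(buckets);
    }

    // The first call pays one pass over the buckets; later calls are a hash lookup.
    // Not thread-safe: concurrent callers could build the map more than once.
    Bucket getBucketByKey(String term) {
        if (bucketMap == null) {
            Map<String, Bucket> map = new HashMap<>();
            for (Bucket bucket : buckets) {
                map.put(bucket.key, bucket);
            }
            bucketMap = map;
        }
        return bucketMap.get(term);
    }
}
```

Compared with the loop over `getBuckets()` shown in the diff, this trades a little memory for constant-time lookups when the method is called repeatedly; for a single lookup the plain loop is just as good.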

@Override
public long getSubsetSize() {
throw new UnsupportedOperationException();

@cbuescher

cbuescher May 16, 2017

Member

We render a subsetSize as the DOC_COUNT field in the surrounding aggregation. I'm not sure if this is equivalent to the bucket subset size, but maybe it could be used here? It would probably need some checking with somebody who knows the aggregation better though.

@tlrx

tlrx May 16, 2017

Member

I'm not sure how they are related to be honest. Since we don't render the superset size and subset size I think it's ok to throw an unsupported operation exception here.

@tlrx

tlrx May 23, 2017

Member

Discussion that gives more background around where these fields come from: #5146 (comment)

@Override
public long getSupersetSize() {
throw new UnsupportedOperationException();

@cbuescher

cbuescher May 16, 2017

Member

I was wondering: if getSuperset/SubsetSize is part of the Bucket interface but not rendered via the REST response, should we either add rendering of these values to the bucket response or remove them from the interface, so that the transport client and the high level REST client behave equivalently here? I think this can be done in a separate issue though, maybe it's not needed at all.

@tlrx

tlrx May 16, 2017

Member

Maybe @markharwood has an opinion on this?

@cbuescher

cbuescher May 16, 2017

Member

I'm okay with the UnsupportedOperationException for now if we can track this question (whether we can reach consistency between the functionality the transport client provides via the SignificantTerms.Bucket interface and the REST response) in a separate issue.

@tlrx

tlrx May 24, 2017

Member

It took me some time to wrap my head around this, but I finally created #24865 to address it.
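The interim position reached in the two threads above can be sketched roughly as follows: values that are rendered in the response are returned, while values that are never rendered throw. This is only an illustration under assumptions; `ParsedSignificantBucketSketch` is a hypothetical class, not the one added by this pull request, and the exception messages are invented for the example.

```java
// Hypothetical client-side bucket: only values present in the response are available.
class ParsedSignificantBucketSketch {
    private final long subsetDf;   // parsed from the rendered doc count
    private final long supersetDf; // assumed to be parsed from "bg_count"

    ParsedSignificantBucketSketch(long subsetDf, long supersetDf) {
        this.subsetDf = subsetDf;
        this.supersetDf = supersetDf;
    }

    long getSubsetDf() {
        return subsetDf;
    }

    long getSupersetDf() {
        return supersetDf;
    }

    // The subset size is not rendered per bucket, so it cannot be reconstructed here.
    long getSubsetSize() {
        throw new UnsupportedOperationException("subset size is not part of the rendered bucket");
    }

    // Same reasoning as getSubsetSize(): nothing in the rendered bucket carries this value.
    long getSupersetSize() {
        throw new UnsupportedOperationException("superset size is not part of the rendered bucket");
    }
}
```

Throwing keeps the gap visible to callers instead of returning a placeholder such as 0, which could be mistaken for a real size.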

@tlrx

Member

tlrx commented May 16, 2017

Thanks @javanna @cbuescher ! I updated a bit, would you like to have another look please?

@cbuescher

Member

cbuescher commented May 16, 2017

Update looks good to me.

tlrx added some commits May 15, 2017

@tlrx tlrx merged commit d5fc520 into elastic:feature/client_aggs_parsing May 16, 2017

1 check passed

CLA Commit author is a member of Elasticsearch
Details
@tlrx

Member

tlrx commented May 16, 2017

Thanks @javanna @cbuescher !

@tlrx tlrx deleted the tlrx:add-parsing-to-sig-terms-aggs branch May 16, 2017

javanna added a commit to javanna/elasticsearch that referenced this pull request May 23, 2017
