Added syntactic sugar over agg definition #5721

uboness · 2014-04-07T23:53:49Z

some aggregations typically tend to be requested together. But using the normal (formal) dsl can make the request too verbose. The agg factory/parsing infrastructure was changed to enable factories to automatically add more factories, and by that enables "syntatctic sugar" to save verbosity
one common use case for this is the missing aggregation. Often, when running values source aggs (e.g. terms, avg, etc...) one also wants to get the count of those document that are not associated with values (therefore were not aggregated). Now, using the new infrastructure, one can define a track_missing setting on any values source based aggregation, and a missing aggregation will automatically be injected into the request - based on the same value source.

- some aggregations typically tend to be requested together. But using the normal (formal) dsl can make the request too verbose. The agg factory/parsing infrastructure was changed to enable factories to automatically add more factories, and by that enables "syntatctic sugar" to save verbosity - one common use case for this is the `missing` aggregation. Often, when running values source aggs (e.g. terms, avg, etc...) one also wants to get the count of those document that are not associated with values (therefore were not aggregated). Now, using the new infrastructure, one can define a `track_missing` setting on any values source based aggregation, and a `missing` aggregation will automatically be injected into the request - based on the same value source. Closes #5324

roytmana · 2014-04-08T01:27:35Z

@uboness I could not play with it and docs do not make iy clear if it replicate sub aggs defined within an aggs that are marked to track missing. In other words if I agg by state and then city and mark state agg to track missing there will be city buckets within missing bucket along with any accumulator aggs such as sum avg etc defined within city?

Also I feel it would be much more convenient to have missing as part of normal bucket array with key specified when enabling missing tracking for the agg rather than as sibling node. After all missing is just another value (granted somewhat special) of that field. It is just like the databases let you handle null values almost like the normal ones and with great ease and convenience

uboness · 2014-04-08T01:37:55Z

@roytmana adding track_missing in the states terms agg will add a sibling states_missing agg next to it. it will be an empty missing agg, with only the doc_count that will indicate the number of docs missing the field. you'll need to specify track_missing on the cities aggs to have cities_missing agg injected next to that one

Also I feel it would be much more convenient to have missing as part of normal bucket array with key specified when enabling missing tracking for the agg rather than as sibling node. After all missing is just another value (granted somewhat special) of that field. It is just like the databases let you handle null values almost like the normal ones and with great ease and convenience

so, for the missing, we decided this is the best we'll go with that regard, as adding the missing count to the aggs themselves also adds complexity we'd like to avoid. That said, we do plan very soon (this week perhaps) to let you define default values for missing docs, which will enable you to treat those like you'd treat null values in the db.

roytmana · 2014-04-08T01:47:15Z

Thanks for the info @uboness. Then as you said it is truly is a syntactic sugar adding little value if it does not replicate sub aggs defined within the agg with tracked null values. I would like to re-open my request since it solves none of the problems I outlined in it if you do not mind?

uboness · 2014-04-08T01:58:46Z

@roytmana well, it's there to solve the verbosity of the request, which seemed to be your main concern outlined above. As for the response, the difference in how you'd fetch the data is negligible I'd say:

aggs.states._missing

vs

aggs.states_missing.value

In any case, you can also wait for the default values feature I mentioned above and use that if it better fits your needs

roytmana · 2014-04-08T02:07:01Z

Thank you @uboness I will wait for default values.
The major complexity that is not solved is in having to replicate sub aggs within each missing definition. In my example I still want my city breakdown within missing for states and all the sub aggs I defined within the city.

It would have been very nice to have an optional "other" bucket in aggs as well :-)

jpountz · 2014-04-08T07:30:00Z

src/main/java/org/elasticsearch/search/aggregations/AggregationBuilders.java

+    public static ValueCountBuilder count(String name) {
+        return valueCount(name);
+    }
+


Can you do it in a separate change?

(I'm all for it, but would just like to make sure we have a changes entry for it)

jpountz · 2014-04-08T08:39:07Z

It doesn't feel right to me to make process return sibling aggregators: it makes this method perform two very different things. Maybe we could instead make either the parser return several factories or the factories return several aggregators?

Although I think it is nice to users to have documentation of track_missing for every aggregation, I think the duplication will make it a pain to update. Maybe it should be written only once and then included or referenced in every aggregation that exposes it (so that there is a single place to update)?

Regarding the syntax, maybe we should make it future-proof in case we would also like to be able to replicate the sub-tree of aggregations under the missing agg? Something like:

"track_missing": {
    "term_missing": "counts"
}

where term_missing is the key under which the missing aggregation should be and counts tells that it should just track missing counts.

uboness · 2014-04-08T10:12:20Z

It doesn't feel right to me to make process return sibling aggregators: it makes this method perform two very different things. Maybe we could instead make either the parser return several factories or the factories return several aggregators?

I'm not sure I agree with that, the whole idea of syntactic sugar is that it's just applied on the syntax, which means this syntax represents a more elaborated syntax. So this behaviour should not (almost by definition) propagate to the underlying implementation of the factories or the aggregators... it should be done on the highest level possible.

Although I think it is nice to users to have documentation of track_missing for every aggregation, I think the duplication will make it a pain to update. Maybe it should be written only once and then included or referenced in every aggregation that exposes it (so that there is a single place to update)?

I tend to agree, there's just no place for common settings in the docs now. I did try to put it on the "main" aggs and share it between related ones (e.g. range, date_range, ip_range). Maybe we should consider adding a section for common configurations (also put the values source configuration there?), but I'd do that on a different PR

Regarding the syntax, maybe we should make it future-proof in case we would also like to be able to replicate the sub-tree of aggregations under the missing agg?

I don't think it's not future proof right now... if needed, in the future we could always introduce support for objects as well (nothing prevents us from doing that)

jpountz · 2014-04-09T12:41:21Z

it should be done on the highest level possible.

I'm good with that, but can we keep concerns separated and not do both validation and return the siblings in the same method?

Maybe we should consider adding a section for common configurations (also put the values source configuration there?), but I'd do that on a different PR

OK, can you open an issue already and mark it as a blocker for 1.2 so that we don't forget?

I don't think it's not future proof right now... if needed, in the future we could always introduce support for objects as well (nothing prevents us from doing that)

OK.

s1monw · 2015-03-20T21:45:56Z

this is stale for alomst 1 year. I am closing it for now - we can still revisit

jpountz reviewed Apr 8, 2014
View reviewed changes

clintongormley added the discuss label Aug 7, 2014

clintongormley assigned costin and clintongormley and unassigned costin Aug 22, 2014

clintongormley removed their assignment Sep 7, 2014

clintongormley added the :Analytics/Aggregations Aggregations label Nov 11, 2014

drewr force-pushed the master branch from dcc3da0 to 7c20a8a Compare February 20, 2015 16:48

s1monw closed this Mar 20, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added syntactic sugar over agg definition #5721

Added syntactic sugar over agg definition #5721

uboness commented Apr 7, 2014

roytmana commented Apr 8, 2014

uboness commented Apr 8, 2014

roytmana commented Apr 8, 2014

uboness commented Apr 8, 2014

roytmana commented Apr 8, 2014

jpountz Apr 8, 2014

jpountz Apr 8, 2014

uboness Apr 8, 2014

jpountz commented Apr 8, 2014

uboness commented Apr 8, 2014

jpountz commented Apr 9, 2014

s1monw commented Mar 20, 2015

Added syntactic sugar over agg definition #5721

Added syntactic sugar over agg definition #5721

Conversation

uboness commented Apr 7, 2014

roytmana commented Apr 8, 2014

uboness commented Apr 8, 2014

roytmana commented Apr 8, 2014

uboness commented Apr 8, 2014

roytmana commented Apr 8, 2014

jpountz Apr 8, 2014

Choose a reason for hiding this comment

jpountz Apr 8, 2014

Choose a reason for hiding this comment

uboness Apr 8, 2014

Choose a reason for hiding this comment

jpountz commented Apr 8, 2014

uboness commented Apr 8, 2014

jpountz commented Apr 9, 2014

s1monw commented Mar 20, 2015