Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Filters aggregation #6974

Merged
merged 1 commit into from Aug 1, 2014

Conversation

Projects
None yet
7 participants
@colings86
Copy link
Member

colings86 commented Jul 23, 2014

A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter.

This aggregation can be very useful when one wants to compare analytics between different criterias. It can also be accomplished using multiple definitions of the single filter aggregation, but here, the user will only need to define the sub-aggregations only once.

(this is a continuation of Pull Request #6119)

Closes #6118,#6119

@colings86 colings86 added the review label Jul 23, 2014

@kimchy

This comment has been minimized.

Copy link
Member

kimchy commented Jul 23, 2014

LGTM, @rashidkpc mind looking at it for the proposed API format?

@rashidkpc

This comment has been minimized.

Copy link
Member

rashidkpc commented Jul 23, 2014

API looks good so far. I'm going to build this, try to integrate it into Kibana 4 and see how it goes.

@rashidkpc rashidkpc self-assigned this Jul 23, 2014

@rashidkpc

This comment has been minimized.

Copy link
Member

rashidkpc commented Jul 23, 2014

Ok, I prototyped this out and got it working in Kibana 4. The only input I have into the API format is that it is, slightly, inconsistent with how we do other aggregations that require the buckets to be explicitly defined. The currently proposed filters agg format looks like this:

{
    "aggs" : {
        "messages" : {
            "filters" : {
                "errors" : { "query" : { "match" : { "body" : "error" } } },
                "warnings" : { "query" : { "match" : { "body" : "warning" } } }
            },
            "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }
        }
    }
}

Its a bit awkward, since the keys of the of filters agg are not configuration parameters, but rather names. We don't do that anywhere else. The only other place were we have explicit definitions of buckets like this in in the range agg, where we use an array to define the ranges:

{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 50 },
                    { "from" : 50, "to" : 100 },
                    { "from" : 100 }
                ]
            }
        }
    }
}

I wonder if we should do the same thing, and return the buckets as an array, like the range agg does. Granted if we did this with the filters agg we'd end up with filters.filters, which is also weird:

{
    "aggs" : {
        "messages" : {
            "filters" : {
                "filters" : [
                    { "query" : { "match" : { "body" : "error" } } },
                    { "query" : { "match" : { "body" : "warning" } } }
                ]
            }
            "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }
        }
    }
}

Of course, you bring up a good point, people might want the response to be keyed. In which case, we could again use the convention defined by the range agg, using the keyed property (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html#_keyed_response)

If we wanted to do the same thing in the filters agg we could use the _name convention from named filters/queries (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html). Eg:

{
    "aggs" : {
        "messages" : {
            "filters" : {
                "filters" : [
                    { "query" : { "match" : { "body" : "error", "_name": "errors" } } },
                    { "query" : { "match" : { "body" : "warning", "_name": "warnings" } } }
                ]
            }
            "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }
        }
    }
}

I don't love the filters.filters thing, but it is consistent.

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 24, 2014

@rashidkpc I'm not sure we need the filters.filters. filters can either have an array as value, in which case it is ordered, or an object, in which case it is keyed. See #6119 (comment)

This assumes that Jackson and client languages can differentiate. @polyfractal - I know that PHP does some funky things with converting hashes to arrays, but I think that's only for empty hashes?

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 25, 2014

Hmm, I may have to eat my words about not requiring the filters.filters setting. See #7020

@clintongormley clintongormley referenced this pull request Jul 25, 2014

Closed

exclude aggregator #7020

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 28, 2014

Updated so that you request is filters.filters again and can support providing an array of anonymous filters in the request which are returned in the same order in the response. Not sure if I like it however, since unlike the histogram aggregator the 'keyed: false' option does not have a key field which links back to the request so the only link between the bucket in the response and the filter in the request is its position in the array

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 28, 2014

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 28, 2014

@colings86 (Ignore my comments about #7020 )

Not sure we need the double level filters.filters? Surely just filters: {....} or filters: [...] would be sufficient?

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 28, 2014

@rashidkpc can you confirm that you'd be happy with support for these two syntax options?

Results returned with keys:

"aggs" : {
    "messages" : {
        "filters" : {
            "errors" : { "query" : { "match" : { "body" : "error" } } },
            "warnings" : { "query" : { "match" : { "body" : "warning" } } }
        }
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }

}

Results returned as an array:

"aggs" : {
    "messages" : {
        "filters" : [
            { "query" : { "match" : { "body" : "error" } }, "name": "errors"},
            { "query" : { "match" : { "body" : "warning" } }, "name" : "warnings" }
        ]
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }        
}
@rashidkpc

This comment has been minimized.

Copy link
Member

rashidkpc commented Jul 28, 2014

After reviewing the range aggregation, I understand why the syntax is how it is. I think the syntax as proposed by @clintongormley will be fine for the filters aggregation. filters.filters is probably not required here unless we plan to add any other parameters to the aggregation.

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 28, 2014

w00t

@colings86 colings86 removed the review label Jul 30, 2014

@colings86 colings86 assigned colings86 and unassigned rashidkpc Jul 30, 2014

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 30, 2014

Tried to implement this but the aggregation framework throws an error if the aggregation is not an object so does not work for the Array case. At least for the array case we may need to go back to having filters.filters as it doesn't seem correct to change the agg framework for what seems like a special case here.

thoughts?

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 30, 2014

because of the above issue the syntax is going to stay as the following:

Results returned with keys:

"aggs" : {
    "messages" : {
        "filters" : {
            "filters" : {
                "errors" : { "query" : { "match" : { "body" : "error" } } },
                "warnings" : { "query" : { "match" : { "body" : "warning" } } }
            }
        }
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }

}

Results returned as an array:

"aggs" : {
    "messages" : {
        "filters" : {
            "filters" : [
                { "query" : { "match" : { "body" : "error" } }, "name": "errors"},
                { "query" : { "match" : { "body" : "warning" } }, "name" : "warnings" }
            ]
        }
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }        
}

@colings86 colings86 added the review label Jul 30, 2014

@jpountz

View changes

...ain/java/org/elasticsearch/search/aggregations/bucket/filters/FiltersAggregationBuilder.java Outdated
*/
public class FiltersAggregationBuilder extends AggregationBuilder<FiltersAggregationBuilder> {

private Map<String, FilterBuilder> filters = new HashMap<>();

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 30, 2014

Contributor

Should it be a linked hash map so that the order would be the same in the response?

}
reduced.aggregations = InternalAggregations.reduce(aggregationsList, bigArrays);
return reduced;
}

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 30, 2014

Contributor

Could you make the reduction create a new aggregation instead of filling the first one? This proved to be error-prone in the past.

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/bucket/filters/InternalFilters.java Outdated
}
}

InternalFilters reduced = (InternalFilters) aggregations.get(0);

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 30, 2014

Contributor

similar concern about in-place reduction

@jpountz jpountz removed the review label Jul 30, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 30, 2014

@colings86 Left some comments.

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 31, 2014

@jpountz I implemented thhe changes you suggested and also I added tests and documentation for array type request as well as updating the Builder to be able to construct array requests since I realised these were missing

@colings86 colings86 added the review label Jul 31, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 1, 2014

LGTM

@jpountz jpountz removed the review label Aug 1, 2014

Aggregations Added Filters aggregation
A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter.

This aggregation can be very useful when one wants to compare analytics between different criterias. It can also be accomplished using multiple definitions of the single filter aggregation, but here, the user will only need to define the sub-aggregations only once.

Closes #6118

@colings86 colings86 merged commit 3c9c9f3 into elastic:master Aug 1, 2014

@JasonPunyon

This comment has been minimized.

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Aug 7, 2014

@JasonPunyon fixed, thanks

@colings86 colings86 deleted the colings86:pr/6119 branch Aug 13, 2014

@colings86 colings86 assigned colings86 and unassigned colings86 Aug 21, 2014

@clintongormley clintongormley changed the title Added Filters aggregation Aggregations: Added Filters aggregation Sep 10, 2014

@clintongormley clintongormley changed the title Aggregations: Added Filters aggregation Added Filters aggregation Jun 6, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.