Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scriptable Metrics Aggregation #7075

Merged

Conversation

Projects
None yet
7 participants
@colings86
Copy link
Member

colings86 commented Jul 29, 2014

A metrics aggregation which runs specified scripts at the init, collect, combine, and reduce phases

Closes #5923

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 29, 2014

Looking forward to reading the docs :)

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 29, 2014

@clintongormley yep. Long way from finished yet as need to write tests and read the docs. Put the PR up as @brwe wanted to have a look at it and so others could have a play around with it but its not ready for review yet

@mattweber

This comment has been minimized.

Copy link
Contributor

mattweber commented Jul 29, 2014

+1! So cool, can't wait to give this a try. Thanks!

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 30, 2014

Following gist can be followed as an example of using the scriptable metrics aggregation.

https://gist.github.com/colings86/0994f1b188b41be6052a

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/AggregationBuilders.java Outdated
@@ -156,4 +157,8 @@ public static TopHitsBuilder topHits(String name) {
public static GeoBoundsBuilder geoBounds(String name) {
return new GeoBoundsBuilder(name);
}

public static ScriptBuilder script(String name) {

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 30, 2014

Contributor

Maybe the name should reflect the fact that this is a metric aggregation to leave some namespace if we have a scripted bucket agg some day?

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/InternalAggregations.java Outdated
@@ -112,7 +114,7 @@ public InternalAggregations(List<InternalAggregation> aggregations) {
* @param aggregationsList A list of aggregation to reduce
* @return The reduced addAggregation
*/
public static InternalAggregations reduce(List<InternalAggregations> aggregationsList, BigArrays bigArrays) {
public static InternalAggregations reduce(List<InternalAggregations> aggregationsList, BigArrays bigArrays, ScriptService scriptService, Client client) {

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 30, 2014

Contributor

Would be nice to reuse the ReduceContext now that there are several variables

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 30, 2014

I left some comments but my major concern is about the client: do we need to expose it to the scripts? I'm worried that it could allow to do crazy things?

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Jul 30, 2014

@jpountz I agree that exposing the client is dangerous and it allows for some crazy things. I spoke to @imotov about this and he said that it was something explicitly requested for the scripted facets. I would be happy to not provide it to the scripts, it would also make the code a bit cleaner

@uboness

This comment has been minimized.

Copy link
Contributor

uboness commented Jul 30, 2014

+1 on not exposing it... we shouldn't encourage doing crazy things with scripts

@rjernst

This comment has been minimized.

Copy link
Member

rjernst commented Aug 5, 2014

One note: expression scripts can only be used for "search", they do not allow "executable". It looks like only the map phase uses search, and the rest executable?

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Aug 6, 2014

Yeah you are right. Actually since the expression language doesn't support assignment (i think, correct me if I am wrong) I don't think you would be able to use it even in the map script since the map script has to update some state in the params. Also, currently it is assumed that the same language is used for all the scripts in a single aggregations. I think the use cases for this aggregation are almost always going to need more complex scripts than the Lucene expression language can provide, so I see them mostly using groovy and maybe other languages through plugins

@colings86 colings86 added the review label Aug 6, 2014

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/AggregationModule.java Outdated
@@ -18,6 +18,8 @@
*/
package org.elasticsearch.search.aggregations;

import org.elasticsearch.search.aggregations.metrics.scripted.ScriptedMetricParser;

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 6, 2014

Contributor

Can you put this import with the other ones?

@jpountz

View changes

...main/java/org/elasticsearch/search/aggregations/metrics/scripted/InternalScriptedMetric.java Outdated

public Map<String, Object> reduceParams() {
return reduceParams;
}

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 6, 2014

Contributor

Do we need to have public accessors for these properties?

@jpountz

View changes

docs/reference/search/aggregations/metrics/scripted-metric-aggregation.asciidoc Outdated
}
}
}
}

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 6, 2014

Contributor

I think it would be nice to detail a bit more this example to explain what the data looks like at each stage, with example documents and eg. 2 shards.

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 6, 2014

Contributor

Maybe we could also give examples of simple aggregations (eg. sum or terms) reimplemented with scripted_metric to make it clear how it works?

@jpountz

View changes

docs/reference/search/aggregations/metrics/scripted-metric-aggregation.asciidoc Outdated
--------------------------------------------------
"params" : {
"aggregation" : {}
}

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 6, 2014

Contributor

I'm wondering if we should add it to the context all the time instead of adding it just when parameters are not set? For example, I just tried the following aggregation to reimplement sum and it didn't work (always returns 0). I assume it it because I cannot modify the context? Or am I doing something wrong? If I change sum to be a property of an aggregation object instead of being its own variable then it works.

GET test/_search
{
  "aggs": {
    "s": {
      "scripted_metric": {
        "params": { "sum": 0 },
        "map_script": "for (n in doc['f'].values) {sum += n};",
        "combine_script": "return sum",
        "reduce_script": "sum = 0; for (a in aggregations) {sum += a}; return sum;"
      }
    }
  }
}

This comment has been minimized.

Copy link
@colings86

colings86 Aug 6, 2014

Author Member

From looking at the ScriptService it seems that only objects and arrays provided in the params will be updated as the params map is copied when the script is compiled. so your example won't work but the following would:

GET test/_search
{
  "aggs": {
    "s": {
      "scripted_metric": {
        "params": { "result" : { "sum": 0 } },
        "map_script": "for (n in doc['f'].values) {result.sum += n};",
        "combine_script": "return result.sum",
        "reduce_script": "sum = 0; for (a in aggregations) {sum += a}; return sum;"
      }
    }
  }
}

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 20, 2014

Contributor

OK, thanks for the details.

@jpountz

View changes

...in/java/org/elasticsearch/search/aggregations/metrics/scripted/ScriptedMetricAggregator.java Outdated
} else {
this.reduceParams = reduceParams;
}
this.params.put("_ctx", context);

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 6, 2014

Contributor

Why would scripts use the aggregation context?

This comment has been minimized.

Copy link
@colings86

colings86 Aug 6, 2014

Author Member

@jpountz not sure I have a use case. It's another thing that was in @imotov 's scripted facet. Looking at what you can do with the context it looks about as dangerous as providing the client so maybe it too should be removed?

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 20, 2014

Contributor

+1 on removing it

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 6, 2014

Left some comments but it looks great in general!

@jpountz jpountz removed the review label Aug 6, 2014

@clintongormley

View changes

docs/reference/search/aggregations/metrics/scripted-metric-aggregation.asciidoc Outdated
"aggs": {
"profit": {
"scripted_metric": {
"init_script" : "aggregation['transactions'] = []",

This comment has been minimized.

Copy link
@clintongormley

clintongormley Aug 7, 2014

Member

I think we should use a shorter variable than aggregation in the script. How about _agg?

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 20, 2014

Contributor

+1

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Aug 19, 2014

@jpountz I made the code changes you suggested. I also updated the docs with a worked example and left some replies to your comments. Let me know what you think.

@jpountz jpountz added the review label Aug 20, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 20, 2014

It looks good in general, docs are great. In my opinion, the only things that need to be done before merging it would be:

  • removing the aggregation context from the script context
  • maybe rename aggregation to _agg in the script context as @clintongormley suggested

@jpountz jpountz removed the review label Aug 20, 2014

@colings86 colings86 added the review label Aug 20, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 20, 2014

LGTM

@jpountz jpountz removed the review label Aug 20, 2014

@colings86 colings86 force-pushed the colings86:feature/scriptableMetricsAggregation branch Aug 20, 2014

Aggregations: Scriptable Metrics Aggregation
A metrics aggregation which runs specified scripts at the init, collect, combine, and reduce phases

Closes #5923

@colings86 colings86 force-pushed the colings86:feature/scriptableMetricsAggregation branch to 7f943f0 Aug 20, 2014

@colings86 colings86 merged commit 7f943f0 into elastic:master Aug 20, 2014

@colings86 colings86 assigned colings86 and unassigned colings86 Aug 21, 2014

@colings86 colings86 deleted the colings86:feature/scriptableMetricsAggregation branch Aug 21, 2014

@diegoasth

This comment has been minimized.

Copy link

diegoasth commented Oct 6, 2014

Great News! I am trying to use a scripted metric aggregation inside a date_histogram aggregation but unfortunately it seems to take docs outside the date-bucket into consideration, but this is not what I expected..

@diegoasth

This comment has been minimized.

Copy link

diegoasth commented Oct 6, 2014

{
"size": 0,
"aggs": {
"histo": {
"date_histogram": {
"field": "timestamp",
"interval": "1h",
"min_doc_count": 0,
"format": "yyyy-MM-dd HH:mm"
},
"aggs": {
"profit": {
"scripted_metric": {
"init_script": " _agg['kpiA'] =0;",
"map_script": "_agg.kpiA += doc['kpiA'].value; ",
"combine_script": "teste =0; teste += _agg.kpiA; return teste;",
"reduce_script": "profit = 0; for (a in _aggs) { profit += a;}; return profit;"
}
}
}
}
}
}

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Oct 7, 2014

@diegoasth I have tried to reproduce the issue your are seeing in the gist below [1], in both 1.4 and in the master branch, without success. Could you take a look at the gist and try it on your system to see if it reproduces your issue. If not, could you create a similar gist to illustrate the steps to reproduce the issue you are seeing?

[1] https://gist.github.com/colings86/d1a9c76898b4b2783b89

@diegoasth

This comment has been minimized.

Copy link

diegoasth commented Oct 8, 2014

Hi @colings86 , thanks or your attention.

in deed, our gist does not reproduce the issue, so I created another one that shows it. My goal is to use a scripted_metric inside a date_histogram (let me know if it is not possible). I expect that the scripts (init_script, map_script, combine_script and reduce_script) consider only the documents in that specific date-bucket. For the sake of comparison, I used the native SUM aggregation inside the date_histogram and it worked fine, returning the expected results per bucket.

https://gist.github.com/diegoasth/406148525616f9f527ab

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Oct 9, 2014

@diegoasth this is indeed a bug. Thanks for raising it and for providing the steps to reproduce it. I'll look into getting it fixed

@colings86

This comment has been minimized.

Copy link
Member Author

colings86 commented Oct 9, 2014

@diegoasth I have raised the following issue for this bug:

#8036

@diegoasth

This comment has been minimized.

Copy link

diegoasth commented Oct 9, 2014

thank you very much. The ability to wite my custom aggregations (inside ES) is a very awaited feature.

@clintongormley clintongormley changed the title Aggregations: Scriptable Metrics Aggregation Scriptable Metrics Aggregation Jun 6, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.