Allow Bucket Script aggregation to reference on string results #36642

Open
larrycinnabar opened this issue Dec 14, 2018 · 9 comments · May be fixed by #44579
Labels
:Analytics/Aggregations Aggregations >enhancement help wanted adoptme Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@larrycinnabar

larrycinnabar commented Dec 14, 2018

Currently, the bucket script aggregation allows only numeric values to be referenced:

    "some-stringified-metric": {
      "bucket_script": {
        "buckets_path": {
          "a": "path>of>nested>metrics>agg_that_returns_a_string.value"
        },
        "script": {
          "source": "params.a"
        }
      }
    },

You will get:
"buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.String"

The result of the agg is just a string, not a list of objects or anything complex. Why are plain single strings treated so harshly that they cannot be returned as a result?
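
The restriction described above can be modeled with a small Python sketch. This is not the Elasticsearch implementation; the function name `resolve_bucket_value` is a hypothetical stand-in used only to illustrate a numeric-only check on a value referenced through `buckets_path`.

```python
def resolve_bucket_value(value):
    """Toy model of a numeric-only check on a buckets_path value.

    Hypothetical helper for illustration; not Elasticsearch code.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    raise ValueError(
        "buckets_path must reference either a number value or a single value "
        f"numeric metric aggregation, got: {type(value).__name__}"
    )


if __name__ == "__main__":
    print(resolve_bucket_value(3))  # a numeric metric passes through
    try:
        resolve_bucket_value("most_used_term")  # a string metric is rejected
    except ValueError as err:
        print(err)
```

In this model, any string result is rejected before the script ever sees it, which mirrors the behavior reported in this issue.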

@spinscale spinscale added the :Analytics/Aggregations Aggregations label Dec 14, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Contributor

Right now, most of the pipeline aggs are geared towards doing numeric transformations (derivatives, etc), which is where that limitation is coming from. The BucketScript agg is definitely a little different, since you have access to a script.

What sort of operation are you wanting to do with the strings?

@larrycinnabar
Author

larrycinnabar commented Dec 18, 2018

My case is a little specific:

I have a query with dozens of aggregations, some of them deeply nested. At the application level we then need to read the raw Elasticsearch response (JSON) and unmarshal it into a struct. To simplify this, we use a trick: for every metric we need, we have:

  1. the real aggregation (which may be deeply nested)
  2. a "public" aggregation that is top-level and just uses a bucket script to point to the real aggregation's value
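
A minimal sketch of why this trick helps on the application side, written in Python for illustration. The response shape, the aggregation names (`by_region`, `filtered`, `revenue`, `revenue_public`), and the `Metrics` class are all assumptions made up for this example, not taken from the original query.

```python
from dataclasses import dataclass

# Hypothetical, trimmed response for one bucket; all names are illustrative.
bucket = {
    "key": "2018-12",
    # The "real" aggregation, buried under several levels.
    "by_region": {"filtered": {"revenue": {"value": 1250.0}}},
    # The "public" alias produced by a bucket_script pointing at the real one.
    "revenue_public": {"value": 1250.0},
}


@dataclass
class Metrics:
    revenue: float


def read_nested(node: dict, path: list) -> float:
    """Walk the full nested path; the caller must know the query structure."""
    for part in path:
        node = node[part]
    return node["value"]


def read_public(node: dict) -> Metrics:
    """Read only flat, top-level aliases; no knowledge of nesting needed."""
    return Metrics(revenue=node["revenue_public"]["value"])


if __name__ == "__main__":
    print(read_nested(bucket, ["by_region", "filtered", "revenue"]))
    print(read_public(bucket))
```

With the alias in place, the unmarshalling code depends only on a flat list of names, so the nesting of the real aggregations can change without touching it.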

Why are strings needed? Because some metrics return a non-numeric value rather than a number.
Here are two examples:

  1. Simple example: the most used term
  2. Complex example: some of our metrics are histograms: [{key:a,count:100},{key:b,count:200}...]. This could be the result of a terms aggregation, but we use ScriptedMetric because (a) the logic sometimes goes beyond a plain TermsAggregation, and (b) we can serialize the result to JSON as a simple string.
    So, yes, I want to perform a kind of terms aggregation whose list of buckets is serialized to JSON in a string, and I want it to be available as a top-level aggregation (via bucket script).
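
The "histogram as a JSON string" idea from the second example can be sketched as a round trip in Python. The helper names `encode_histogram` and `decode_histogram` are hypothetical; in the real setup the encoding would happen inside the scripted metric and only the decoding in the application.

```python
import json


def encode_histogram(counts: dict) -> str:
    """Serialize key/count pairs into a single JSON string (hypothetical helper)."""
    buckets = [{"key": key, "count": count} for key, count in sorted(counts.items())]
    return json.dumps(buckets)


def decode_histogram(value: str) -> dict:
    """Parse the JSON string back into a key -> count mapping."""
    return {bucket["key"]: bucket["count"] for bucket in json.loads(value)}


if __name__ == "__main__":
    encoded = encode_histogram({"a": 100, "b": 200})
    print(encoded)                    # one plain string value
    print(decode_histogram(encoded))  # structure recovered by the application
```

Because the whole histogram travels as one string, it would fit a single-value slot, provided the bucket script accepted string inputs.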

@polyfractal
Contributor

I see, thanks for the detailed description! I think it makes sense to open up BucketScript for any kind of return value, not just numerics, given the open-ended nature of scripts. It will still need to support gap_policy and related numeric-based features (which may not make sense for strings), but I think that's probably acceptable.

I think it would be a pretty straightforward change: modify the BucketScript painless context to return Object instead of Double, adjust the aggregator to work with objects, update the docs.
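
To illustrate the shape of that change, here is a toy Python model, not Painless and not the actual Elasticsearch Java code. The function `run_bucket_script` and its `allow_objects` flag are hypothetical names: with the flag off, the script result must be numeric (standing in for the current Double return type); with it on, any value is passed through (standing in for an Object return type).

```python
def run_bucket_script(script, params, allow_objects=False):
    """Toy model of a bucket script's return-type contract.

    `script` is any callable taking the params dict. Hypothetical sketch only.
    """
    result = script(params)
    if allow_objects:
        # Relaxed contract: strings and other values are returned as-is.
        return result
    # Strict contract: only numbers are accepted and coerced to float.
    if isinstance(result, (int, float)) and not isinstance(result, bool):
        return float(result)
    raise TypeError(f"script must return a number, got: {type(result).__name__}")


if __name__ == "__main__":
    print(run_bucket_script(lambda p: p["a"] * 2, {"a": 2}))
    print(run_bucket_script(lambda p: p["a"], {"a": "term"}, allow_objects=True))
```

Numeric features such as gap_policy are deliberately left out of this sketch; how they should behave for non-numeric values is an open question in the discussion above.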

I'm going to label this team-discuss to see what the rest of the team thinks.

@polyfractal
Contributor

Discussed this in the team meeting, and there was no objection to bucket_script being able to return Object instead of doubles. Unfortunately, I think implementing this will probably be tied up with larger refactoring done to the pipeline framework, so it may be some time before something like this can be extended/fixed.

@Hohol
Contributor

Hohol commented Jul 17, 2019

I'd like to work on this.

@polyfractal
Contributor

👍 Note that #44179 will change BucketScript a little, and I'm working on a PR right now to add a GapPolicy.NONE, which will also affect things.

I'm not sure how easy this enhancement will be. The pipeline framework is pretty much hardcoded to expect doubles everywhere right now. We may need to chip away at refactoring first before this ticket can be addressed.

@linuradu

@polyfractal I think I'm stuck because of this feature request; I'm trying to count all specific fields based on an aggregation result. I posted my issue here.

Can you please take a look and let me know if there is any chance of doing what I need?

@linuradu

I resolved this issue and here is the answer: https://stackoverflow.com/questions/60662222

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020

7 participants