Ability to retrieve in an aggregations request only pipeline aggs results #15823

Closed
acarstoiu opened this issue Jan 7, 2016 · 9 comments

Comments

@acarstoiu

I have explained this matter in several other issues, but it's time to make it a ticket on its own.
Very often programmers need to post-process the aggregation results computed by Elasticsearch. Since October 2015 the pipeline aggregations are officially available to everyone, so a bunch of use cases can now be handled by just crafting a more elaborate search query.

That's very good, but not enough yet, because clients also need to be off-loaded in terms of network traffic and memory. For now, they receive, and are forced to load from the network reply, the results of completely uninteresting intermediate aggregations.

Ideally, we should be able to control, at the level of each aggregation, whether its results are returned or not (a prune property accepted by all aggregations would be just fine). Alternatively, one could prune from the results the aggregations used in pipeline aggregations via a search-wide flag called (say) prunePipelinedAggs, with three possible values:

  • false: default, for backwards compatibility (but I would vote for basic as default value)
  • basic: suppresses only the results of basic aggregations that serve as source data for pipeline aggs; the results of "unrefined" aggregations remain untouched in the reply
  • all: suppresses the results of all aggregations (both basic and pipeline) that serve as source data for pipeline aggs

Wording may vary, but you get the idea. In particular, "basic" is not an established term.
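To make the three values concrete, here is a rough client-side sketch of the pruning rule. Nothing in it is an Elasticsearch feature: prunePipelinedAggs does not exist, and the helper names (`referenced_sources`, `is_pipeline`, `prune_level`), the field name and the sample values are all made up for illustration. It works on one level of sibling aggregations and treats any aggregation with a `buckets_path` as a pipeline aggregation.

```python
import re

def referenced_sources(sibling_defs):
    """Names of sibling aggregations that some buckets_path points at."""
    sources = set()
    for definition in sibling_defs.values():
        for params in definition.values():
            if not isinstance(params, dict) or "buckets_path" not in params:
                continue
            paths = params["buckets_path"]
            if isinstance(paths, str):
                paths = [paths]
            elif isinstance(paths, dict):
                paths = list(paths.values())
            for path in paths:
                # Keep only the first segment, e.g. "a>b.c" -> "a".
                sources.add(re.split(r"[>.\[]", path)[0])
    return sources

def is_pipeline(definition):
    """Assumption: an aggregation is a pipeline agg iff it has a buckets_path."""
    return any(isinstance(p, dict) and "buckets_path" in p
               for p in definition.values())

def prune_level(sibling_defs, result, mode="false"):
    """Apply the hypothetical prunePipelinedAggs rule to one result level."""
    if mode not in ("false", "basic", "all"):
        raise ValueError("mode must be 'false', 'basic' or 'all'")
    if mode == "false":
        return dict(result)
    drop = referenced_sources(sibling_defs)
    if mode == "basic":
        # Keep pipeline aggs even when another pipeline agg consumes them.
        drop = {name for name in drop
                if name in sibling_defs and not is_pipeline(sibling_defs[name])}
    return {key: value for key, value in result.items() if key not in drop}

# Hypothetical sub-aggregations of one histogram bucket.
defs = {
    "sales": {"sum": {"field": "price"}},
    "sales_deriv": {"derivative": {"buckets_path": "sales"}},
    "sales_2nd_deriv": {"derivative": {"buckets_path": "sales_deriv"}},
}
bucket = {
    "key": "2015-03",
    "doc_count": 2,
    "sales": {"value": 375.0},
    "sales_deriv": {"value": 315.0},
    "sales_2nd_deriv": {"value": 805.0},
}

print(sorted(prune_level(defs, bucket, "basic")))
print(sorted(prune_level(defs, bucket, "all")))
```

With `basic` only `sales` is dropped from the bucket; with `all` both `sales` and `sales_deriv` go, since each one feeds another pipeline aggregation.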

@clintongormley

@acarstoiu Use response filtering to return just the aggs that you want. See https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#_response_filtering
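For readers landing here, a minimal sketch of what that looks like in practice. The `filter_path` query parameter is the response filtering option from the linked page; the host, the index name (`sales`), the field names and the aggregation names are placeholders, and `build_filtered_search` is just a hypothetical helper that assembles the request without sending it.

```python
import json
from urllib.parse import urlencode

def build_filtered_search(base_url, index, pipeline_agg_name, body):
    """Build the URL and JSON payload for a search whose response is
    filtered down to a single top-level (pipeline) aggregation."""
    query = urlencode({"filter_path": f"aggregations.{pipeline_agg_name}"})
    url = f"{base_url}/{index}/_search?{query}"
    return url, json.dumps(body)

# Placeholder request: a terms aggregation with a sum per bucket, plus a
# sibling max_bucket pipeline aggregation that reads those sums.
body = {
    "size": 0,  # no hits, only aggregations
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"total": {"sum": {"field": "price"}}},
        },
        "best_category": {
            "max_bucket": {"buckets_path": "by_category>total"}
        },
    },
}

url, payload = build_filtered_search(
    "http://localhost:9200", "sales", "best_category", body
)
print(url)
# http://localhost:9200/sales/_search?filter_path=aggregations.best_category
```

The reply to such a request should then contain only `aggregations.best_category`, without the `by_category` buckets. As far as I understand, the filter applies to what is returned, so the intermediate aggregation is still computed on the server.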

@acarstoiu
Author

Yes, I found this on my own in June last year, see this comment. But that's just a workaround, isn't it?!

@shuangshui

@acarstoiu I need this feature too.
In my case the bucket list is 50 MB big, but it's of no use to me; I just want the number of buckets (returned by the reducer).

@acarstoiu
Author

Use a pipeline aggregation and follow @clintongormley's link.

@shuangshui

@acarstoiu Thanks, this does solve my problem.
But the server still needs to sort the buckets, load all of them into memory, and then return only part of the result. Isn't that a rather inefficient solution? I don't know if I'm right.

@acarstoiu
Author

Off topic: please stop using a mechanical translator and learn the language. English has by far the simplest grammar among the European languages, it's the best you can get in terms of simplicity (and I know the Chinese grammar is way simpler).

And now to the matter: yes, it is less than optimal, that's why this issue exists (albeit closed - @clintongormley has yet to explain why).

@clintongormley

@acarstoiu kindly refrain from castigating other users about their level of English.

@acarstoiu
Author

Well, I did ponder whether to write that or not, but I honestly believe it helps the guy a lot more than being "politically correct", a term invented in the West. Try using a mechanical translator for un șut în fund înseamnă un pas înainte [Romanian: "a kick in the backside is a step forward"] ✌️

@shuangshui

@acarstoiu @clintongormley Thanks a lot for your kindness. I'll improve my English, although I was not using Google Translate. So embarrassing.
On the topic: when doing a terms aggregation in ES, it sorts the buckets by doc_count by default.
In my case the query usually involves 1 million distinct values to group by, so it's very slow (about 20 seconds). I'm wondering if there is a way to turn off the ordering/sorting in the aggregation, which I think would speed up my query a lot!
