Field Collapsing/Combining #256

Closed
ppearcy opened this Issue Jul 13, 2010 · 244 comments

@ppearcy
ppearcy commented Jul 13, 2010

Ability to collapse on a field. For example, I want the most relevant result from all different report types. Or similarly, the most recent result of each report type. Or maybe, I want to de-dup on headline.

So, the sort order would dictate which one from the group is returned. Similar to what is discussed here:
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

From my understanding, it seems that in order for field collapsing to be efficient, the result set must be relatively small.

This is also referred to as "Combine" on some other search products.

@Omega359

Count this comment as a vote to have this feature added.

@kwloafman

I could make good use of this feature. Go for it!

@Fiedzia
Fiedzia commented Sep 30, 2010

+1 vote for that

@ekalyoncu

Yes, it's a really cool feature.

@ekalyoncu

In SOLR, grouping is not supported for distributed search. If it's implemented, it could be a big plus for ElasticSearch.

@giorgiovinci

The only workaround is to "group" the results on the client side, is that correct?
+1 For this. To have the logic on the server is what we need!

@jeroenr
jeroenr commented Nov 2, 2010

+1 This sounds really useful

@apatrida
apatrida commented Nov 9, 2010

This is probably a broader topic of collapsing (dropping dupes based on sort order although many times one field isn't enough to decide a good dedupe), or full rollups where you retain the individual documents within an aggregate replacement document ("5 books by this author").

There are fun issues with each, such as do you try to satisfy the requested window results? How does paging work when things are missing? Does the total document count get adjusted (but is still wrong as you don't know what other pages hold)? ...

@Fiedzia
Fiedzia commented Nov 9, 2010

For me this should work like "select distinct" in SQL, so I expect duplicates to be removed everywhere, including the total document count, pagination and window results.

@apatrida
apatrida commented Nov 9, 2010

At that point, it's a full group-by, and in SQL you are getting aggregate values back from functions, and sometimes undefined values if you ask for non-aggregate fields ... In the search engine, how are the other fields besides the rollup key being treated? Is it a grouping into a master aggregate document listing all the children, or at least the fact that there are children, such as what Endeca does? Or is it a deduping where the first one at highest relevancy wins even if many of the other fields differ outside of the key (you need compound keys then, as deduping on a single field isn't enough to make that desirable)?

@ppearcy
ppearcy commented Dec 14, 2010

Hey,
Just wanted to say that we are using our own poor man's version of this to satisfy some requirements by just requesting 10x the amount requested and collapsing down client side. Complete hack, but works 99% of the time.

We're now applying this and adding facets to it with a two-phased approach. We first get the list of doc ids, then we pass them in as a term list and facet on that query.

Was curious if there was any more efficient method of doing this?

Thanks,
Paul
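
The over-fetch-and-collapse workaround described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code ppearcy's team ran: `search` is a hypothetical callable that returns hits as dicts already in the requested sort order, and the default `overfetch` factor of 10 simply mirrors the "10x" mentioned in the comment.

```python
from typing import Callable, Dict, List


def collapse_hits(hits: List[Dict], field: str, size: int) -> List[Dict]:
    """Keep only the first hit seen for each distinct value of `field`."""
    seen = set()
    collapsed = []
    for hit in hits:  # hits are assumed to arrive in the requested sort order
        key = hit.get(field)
        if key in seen:
            continue  # a better-ranked hit with this key was already kept
        seen.add(key)
        collapsed.append(hit)
        if len(collapsed) == size:
            break
    return collapsed


def collapsed_search(search: Callable[[int], List[Dict]], field: str,
                     size: int, overfetch: int = 10) -> List[Dict]:
    """Request `size * overfetch` hits, then collapse them client side.

    `search` is a stand-in (an assumption) for whatever issues the real query.
    """
    hits = search(size * overfetch)
    return collapse_hits(hits, field, size)
```

The result can contain fewer than `size` hits when the over-fetched window holds too few distinct values, which is presumably the case where the hack falls short.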

@dmartinpro

+1 vote for this issue too.
This is a really useful feature. Think about an e-commerce shop that indexes every SKU. When searching for a product, a customer should see products in the results list, not SKUs.

@till
till commented May 10, 2011

subscribe

@tfreitas

+1

@vincenttheeten

plz don't make us switch to SOLR just for this feature
+1

@kimchy
Member
kimchy commented May 13, 2011

Note that Solr does not implement it for distributed search (as far as I know), and the implementation is problematic (my view).

@till
till commented May 13, 2011

Are you referring to the "field collapse patch" floating around in their Jira? I haven't checked whether it made it into a recent release, so I don't know how up to date my info is. I just noticed that queries using the "field collapse patch" are an order of magnitude slower than queries without it.

@mikemccand
Collaborator

Note that there is now (finally!) a new grouping module in Lucene -- see https://issues.apache.org/jira/browse/LUCENE-1421

It's been back-ported to 3.x, under lucene/contrib/grouping.

So in theory exposing this in ElasticSearch should be straightforward? (And, if it's not, I'd really like to know about that so we can fix it!).

There is some performance hit but not as bad as I had expected. See the 3 TermGroupXXX charts here: http://people.apache.org/~mikemccand/lucenebench -- it's ~ 2.3x-2.5X slower than the straight TermQuery, when grouping by a field with 100, 10K, 1M unique values (though, the sort and groupSort are relevance; maybe when sorting by other fields this is slower). This should also be the worst-case slowdown since TermQuery is such an "easy" query; queries which are "hard" and don't produce many results should see less net impact from the grouping overhead, I expect.

@kimchy
Member
kimchy commented May 19, 2011

Cool!, saw that a few days ago, will definitely have a look.

@tfreitas
tfreitas commented Jun 3, 2011

Hi, with the release of Lucene 3.2, one of its features is:
"A new grouping module, under lucene/contrib/grouping, enables search results to be grouped by a single-valued indexed field"
http://wiki.apache.org/lucene-java/ReleaseNote32

@darxriggs

+1

@aaronbinns

+1

@0xPIT
0xPIT commented Jun 13, 2011

++1

@mkreidenweis

+1

@bbock
bbock commented Jun 14, 2011

+1

@selaux
selaux commented Jun 14, 2011

+1

@jmayr
jmayr commented Jun 14, 2011

+1

@mikemccand
Collaborator

I'm also working on making it easy(ier) to distribute grouping, by adding static merge methods to TopDocs/TopGroups. Ie, each shard can run the 1st pass collector, send top groups back to front end, front end merges the top groups (SearchGroup.merge) and issues request to all shards to run 2nd pass collector, gets results back, merges with TopGroups.merge. This is all under https://issues.apache.org/jira/browse/LUCENE-3191
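
A rough sketch of that two-pass flow, written in plain Python rather than against Lucene's Java API. Everything here is an illustrative assumption: shards are modelled as in-memory lists of `(group, score)` pairs, groups are ranked by their best score, and the function names are made up for this example (they only loosely correspond to the roles of `SearchGroup.merge` and `TopGroups.merge`).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Doc = Tuple[str, float]  # (group value, score)


def first_pass(shard: List[Doc], top_n: int) -> Dict[str, float]:
    """Per shard: best score of each group, cut down to the top_n groups."""
    best: Dict[str, float] = {}
    for group, score in shard:
        best[group] = max(score, best.get(group, float("-inf")))
    top = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return dict(top)


def merge_groups(per_shard: List[Dict[str, float]], top_n: int) -> List[str]:
    """Front end: merge each shard's top groups into the global top_n groups."""
    best: Dict[str, float] = {}
    for groups in per_shard:
        for group, score in groups.items():
            best[group] = max(score, best.get(group, float("-inf")))
    return sorted(best, key=lambda g: best[g], reverse=True)[:top_n]


def second_pass(shard: List[Doc], groups: List[str],
                docs_per_group: int) -> Dict[str, List[float]]:
    """Per shard: best docs (scores only, for brevity) inside the chosen groups."""
    wanted = set(groups)
    found = defaultdict(list)
    for group, score in shard:
        if group in wanted:
            found[group].append(score)
    return {g: sorted(s, reverse=True)[:docs_per_group] for g, s in found.items()}


def merge_top_docs(per_shard: List[Dict[str, List[float]]], groups: List[str],
                   docs_per_group: int) -> Dict[str, List[float]]:
    """Front end: merge the per-shard docs of each group, keeping group order."""
    merged: Dict[str, List[float]] = {g: [] for g in groups}
    for result in per_shard:
        for group, scores in result.items():
            merged[group].extend(scores)
    return {g: sorted(merged[g], reverse=True)[:docs_per_group] for g in groups}


def distributed_group_search(shards: List[List[Doc]], top_n: int,
                             docs_per_group: int) -> Dict[str, List[float]]:
    top_groups = merge_groups([first_pass(s, top_n) for s in shards], top_n)
    partial = [second_pass(s, top_groups, docs_per_group) for s in shards]
    return merge_top_docs(partial, top_groups, docs_per_group)
```

The point of the second round trip is that a shard cannot know which groups win globally until the front end has merged the first-pass results.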

@spinscale
Member

+1

@stevencasey

+1

any news on whether https://issues.apache.org/jira/browse/LUCENE-1421 as mentioned by mikemccand will work in elasticsearch?

@marvinthepa

+1

@shtejv
shtejv commented Jun 17, 2011

+1

@theone1984

+1

@liebharc

+1

@letier
letier commented Jun 17, 2011

+1

@aparo
aparo commented Jun 17, 2011

+1

@Karthago

+1

@wolfs
wolfs commented Jun 17, 2011

+1

@wuan
wuan commented Jun 17, 2011

+1

@dachev
dachev commented Jun 20, 2011

+1

@mbj
mbj commented Jun 24, 2011

+1

@ncb000gt

+1

@ofavre
ofavre commented Jun 28, 2011

+∞

@Hadoukanen

Is this being worked on?
This is the only thing that keeps the company I am working for from using it at the moment.
We need it to get "unique" headers from news articles.
We could make our own frontend that does this, but we would rather have all search, sort and folding in the same software.
I can understand that this can be a problematic thing in a cluster when all results are not known.

How about this for a solution:
"Field Collapsing" the results in the nodes using Lucene functionality, to reduce the amount of data to be transported.
Then on the node that received the request from the client you do your own "Field Collapsing" when combining the results.

Hope it helps.

@ofavre
ofavre commented Jul 4, 2011

Lucene 3.3 has improved its grouping (more abstract, and multiple responses per group, mainly).
A few commits ago, ES has switched to Lucene 3.3 for upcoming version 0.17.
This is good news!

Any idea how long this might take to implement? / Any update status of what still needs to be solved?
Thanks

@kimchy
Member
kimchy commented Jul 4, 2011

Heya, an update on this: I plan to try and tackle this in the next version, see how it goes. The new Lucene version does come with grouping support (though it's not going to be tremendously fast, and it requires more memory). The change requires some internal changes in elasticsearch to represent the fact that grouping is being performed, how to represent it, and get it hooked into the internal single shard search, and distributed search.

@aaschmid

+1

@richardsyeo

+1. Our use case is property search results, which might contain properties for a new Development (a large piece of land being built on by a Developer), which in turn might have properties (Plots) of more than one Style. Properties with the same Style might have different prices because they might have a slightly bigger garden, etc. We would want to offer the user the ability to collapse results on Development and Style. So if a Development had 100 properties across 5 styles, each style with 20 properties, we would expect to see 5 items in the results, which we would render differently to indicate the number of properties and the price range.

@nahap
nahap commented Jul 25, 2011

+1

@oleander

+1

@mattweber

+1

@kcheang
kcheang commented Sep 18, 2011

+1

@medcl
Member
medcl commented Sep 19, 2011

+1

@cazacugmihai

+1

@jprante
jprante commented Nov 4, 2011

+1

@Shay do you have any updates on this?

I noticed https://issues.apache.org/jira/browse/SOLR-2066

@karussell

hey all, also have a look at child/parent feature ... http://www.elasticsearch.org/guide/reference/query-dsl/top-children-query.html

@electic
electic commented Nov 20, 2011

+1 de-duping would be nice.

@mindflayer

Hi there, any update on this topic? Thanks in advance.

@gpstathis

+1

@naamakat

+1

@scriby
scriby commented Dec 29, 2011

+1

@bryangreen

+1
Seems like a great feature. Glad to see that there is some level of support from Lucene as well.

@nippo
nippo commented Jan 4, 2012

+1

@scharrier

Hi! Is there any active work on, or information about, this ticket?

It's the only feature missing for us to fully use ES as our query engine.
It would be really, really useful :)

@nurikabe

+1

@chrisixion

+1

@trivoallan

+1 !

@tomiford

+1

@martijnvg
Member

People who might be interested in using grouping with Elasticsearch can check out the following link:
https://github.com/martijnvg/elasticsearch-with-local-grouping

The readme describes how grouping can be used.

@jprante
jprante commented Mar 2, 2012

Thank you Martijn. This is a nice effort. Sadly, the fork does not support sharding because it is a port of the Lucene grouping contrib. I'm interested in grouping over more than one shard.

@martijnvg
Member

Hi jprante, the Lucene grouping contrib itself doesn't support distributed grouping. However, the grouping contrib has methods that help a Lucene developer implement distributed grouping, for example by merging search responses from different shards. This is also what Solr uses in its distributed grouping implementation. The code in the fork needs some rewriting in order to support distributed grouping.

@alexbrasetvik
Member

Shay posted an update regarding implementing this here: http://elasticsearch-users.115913.n3.nabble.com/0-19-0-Released-tp3792971p3796850.html

The major changes for the next few weeks on 0.20 that I was planning to try and tackle (hopefully successfully :) ) are:

  • Better shard allocation, initially trying to even out shard allocation also based on which index they belong to, to get even distribution also within an index across a cluster.
  • Refactor the field data support, allowing for different pluggable implementations, which can allow for ones that are more optimized for memory usage for example. It will tie in with the mapping allowing to decide which one to use on which field, though the aim is to have sensible auto detected defaults.
  • Refactor the search execution code, to allow for more interesting search executions, like one that does grouping. Once it's done, try and see if we can tackle grouping.

@guanyum
guanyum commented Apr 19, 2012

+1. We have recently been building one of our search products on ES; it will index 0.2b docs, and one of the requirements really needs this feature.

@tehmaze
tehmaze commented Apr 20, 2012

+1

@mbj
mbj commented Apr 26, 2012

@guanyum to do facet search on arbitrary key value properties :D ?

@d1rtym0nk3y

+1

@tugaanaa

+1 it would be super useful to have this feature for distributed search.

@wormling

+1

@Jagdeep1

+1

@mhcdev
mhcdev commented Jun 15, 2012

+1

@ghost
ghost commented Jun 26, 2012

+1

@schovi
schovi commented Jun 27, 2012

+1

@meher
meher commented Jun 28, 2012

+1

@vkmita
vkmita commented Jul 11, 2012

+1

@ahfeel
ahfeel commented Jul 23, 2012

+1

@lazerscience

+1

@originell

+1

@mahemoff
mahemoff commented Aug 7, 2012

+1

@romanb
romanb commented Aug 9, 2012

+1

@sinfomicien

+1

@markcode

+1

@qhoxie
qhoxie commented Aug 28, 2012

+1

@willtrking

+1

@andy318
andy318 commented Sep 11, 2012

+1

@rashidkpc rashidkpc referenced this issue in elastic/kibana May 14, 2013
Closed

Creating own query #80

@Arnok13
Arnok13 commented May 24, 2013

Any news on a milestone for this feature ?

@lusid
lusid commented May 29, 2013

In my opinion, this is the only thing Elastic Search is missing.

@phoenixbai

+1!

@loachli
loachli commented Jun 5, 2013

+1

@oucil
oucil commented Jun 18, 2013

+1

@skytteren

+1

@davidonlaptop

+1

@cutalion

+1

@gsmith85
gsmith85 commented Aug 4, 2013

+1

@a-c-m
a-c-m commented Aug 9, 2013

It won't be part of the 0.90 final.

0.9 came and went. Do we have any idea where on the road map this might be now? Seems to have a lot of support / interest.

@shadow000fire

I've been hearing about a new Aggregates framework. Will that cover this functionality?

@mattweber

@shadow000fire no, aggregations are similar to facets.

@shadow000fire

Is Aggregates already implemented in the trunk? I was poking around but couldn't find it.

Thanks,
Jay

@btiernay

@shadow000fire Just curious about this feature. Do you have a link to information on the aggregates framework? I'm wondering if it might be able to solve my issue described in #3456. Thanks

@brusic
brusic commented Aug 13, 2013

The aggregate framework is described here: #3300

@pmanvi
pmanvi commented Aug 16, 2013

+1

@natefriedman

+1

@saxxi
saxxi commented Aug 27, 2013

+1

Hello ES Community,
I'm also trying to figure out whether there is a rough equivalent of grouping, or whether there is currently a pattern that resolves this situation.

My problem:

  • I have tons of documents to search
  • each article has many (1 to 10) variants, of which I'd like to present the best-matching version
  • we can forget about the other versions of a doc with a lower score
  • (we list 20 results per page)

How to accomplish this? Here are some thoughts:
a) get 200 results at a time and filter out the ones I don't need
b) use two steps
   1. some facet query (see below)
   2. a query something like
      SELECT [elasticsearch query] WHERE [...] AND version_ids IN ('x', 'y', 'z')

It seems Clinton Gormley-2, in a post* on nabble.com, also tried to explain a workaround, but I really did not understand how it works. Can somebody help me?

Thanks, regards, Adit

*reference: http://elasticsearch-users.115913.n3.nabble.com/Getting-Distinct-Values-td3830953.html

@phungleson

I think what Clinton Gormley-2 explained is probably only about step 1 (the facet query, be it terms/stats...) in your option b). You still need to do step 2 with the latest ES, if I am not wrong.

Option b) makes more sense than option a), btw. At least it can make use of some indexing facilities via facet queries.

@saxxi
saxxi commented Aug 28, 2013

Hi! Sorry for my late reply.. I'm trying my best to study every possible alternative (aka RTFM) before asking what's already answered!

I'm still figuring out how facets & terms/stats can resolve "step 1"; here are my attempts:

@shyos on my question comments "Term facet does this very well"

I think I can manage Step 2, would you or the community suggest how to rewrite the last query on my pastebin (the facet / term / stats query)?

Thank you for your help :)

PS: the answer/direction could also be some pseudo-code I can run on my client-side (I use ruby but anything would be good)

@wojons
wojons commented Aug 30, 2013

@kimchy after 3 years and 181 comments, this is the 3rd oldest open issue on GitHub for elasticsearch. I am not sure whether this is going to be supported, but it looks like there is huge support within the community. I am sure most of the people in here can tell you that you have one of the greatest search indexes/databases out there; features like this just help solidify that.

@saxxi
saxxi commented Sep 2, 2013

any ideas on how to get around this problem?

@phungleson you did suggest "Having to write hundreds of lines of code to mimic field-collapsing functionality is not so hard but not quite fulfilling."

Even if it's not fulfilling, I've been assigned to get this working. Could you suggest a direction on where to start, or some basic ideas/principles to follow to accomplish pagination with grouping?

@brusic
brusic commented Sep 2, 2013

You can always look at Solr's implementation of field collapsing and port
the logic to elasticsearch. I would assume that at its heart is a custom
Lucene collector that aggregates results:
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/Collector.html

The trick here is how to do so in an efficient way in a distributed
environment. I was going to attempt a simple, naive solution, but I was
waiting for the facet refactor to be released first. It appears that the
facet refactor has now become the new aggregation module. Either way, I
would hold off, or risk a potentially total refactor once the aggregation
module is released.

@phungleson

@saxxi the simple idea is the same as the one posted here: we need to do 2 requests.
The first request does faceting to get some grouped category_ids.
The second request gets the full doc information based on those category_ids.

Sorting of categories is done in the first request. Pagination of category_ids is done manually in code, after the first request and before the second. Secondary sorting of docs can be done in the second request or manually.

One annoying issue we have is that we write a lot. By the time we do the second request, there might be some new docs, and the first faceted result is already outdated.

@brusic Not sure if the issue of distributed environment can be solved if we route based on category_id?
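
The client-side glue between the two requests described above might look roughly like this. It is only a sketch under assumptions: the facet response is taken to be a plain `key -> count` mapping, groups are ordered by count (then by key, to keep paging stable), and `category_id` is the field name used in the comment. The actual requests to the search engine are left out.

```python
from typing import Dict, List


def page_of_keys(facet_counts: Dict[str, int], page: int,
                 per_page: int) -> List[str]:
    """After request 1: order the faceted keys and slice out one page of them."""
    ordered = sorted(facet_counts, key=lambda k: (-facet_counts[k], k))
    start = (page - 1) * per_page  # pages are 1-based
    return ordered[start:start + per_page]


def group_docs(docs: List[dict], keys: List[str],
               field: str = "category_id") -> Dict[str, List[dict]]:
    """After request 2: bucket the returned docs under the keys of this page."""
    grouped: Dict[str, List[dict]] = {key: [] for key in keys}
    for doc in docs:
        if doc.get(field) in grouped:
            grouped[doc[field]].append(doc)
    return grouped
```

A key that ends up with an empty list is one way the staleness problem mentioned above would show up: the facet counted documents that the second request no longer returned.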

@fulmicoton

What about implementing grouping/collapsing as a plugin?

With 328608f I feel like there is not much of an obstacle to writing a search plugin. Has there been any effort in this direction?

@brusic
brusic commented Sep 8, 2013

Creating a plugin is not the hard part; the hard part is creating an efficient and correct implementation in a sharded environment. I just glanced at the field collapsing feature in Solr and, from what I read, it does not work correctly with shards unless all the documents that are grouped belong to the same shard.

With custom routing, it would be possible to do in elasticsearch, but that would only apply to specific use cases. Elasticsearch already has issues with not having correct counts with facets. I am eagerly awaiting for the new aggregation framework before re-investigating. I doubt there is anything in the aggregation framework itself, but perhaps a refactoring to the underlying classes will help other features.

@fulmicoton

Only ngroups (returning the number of groups) does not work correctly if documents belonging to a group are not in the same shard. Grouping, on the other hand, works perfectly fine in a sharded environment.

@saxxi
saxxi commented Sep 9, 2013

Hi @brusic & @phungleson, thanks for your inputs. I've tried hard to follow your directions and I've understood all but the faceted query part (sorry!):

I've only managed to find the results with these facets:

eg. results = {
  hits: [
    { id: "the_game", _score: 3, author_id: 'john'},
    { id: "the_play", _score: 2, author_id: 'john'},
    { id: "the_good", _score: 1, author_id: 'mark'}
    ...
  ],
  facets: {
     author: {
         "john" : 30, // total docs found, this would be better: "john": ["the_game", "the_play", ...], "mark": ["the_good", ...]
         "mark" : 10
     }
  }
}

Nevertheless, here's what I've accomplished.


Assumptions.

  1. I'm looking for relevant content
  2. I've assumed that the first 300 docs are relevant, so I restrict my search to this selection, regardless of whether many or some of these are from the same few authors.
  3. For my needs I didn't "really" need full pagination; a "show more" button updated through ajax was enough.

Here's my actual ruby pseudo-code: https://gist.github.com/saxxi/6495116

I still need to test this in production, but for now it looks promising. Thanks to all!

@phungleson

@saxxi it would be better to elaborate on your question in group chat or on Stack Overflow rather than in GitHub issues, particularly this issue, I guess.

@wojons
wojons commented Sep 11, 2013

@brusic I was thinking the best way to run this in a distributed cluster may be to do the grouping at the merge sort. Assuming the values are sorted on the collapse fields, it would be much easier to remove dups then, rather than before a sort or once they have been merged onto a single machine.

@shadow000fire

@wojons If you're talking about doing this as a plugin, I've done it a few different ways. Note that they're all workarounds to a native implementation and may or may not perform very well depending on your data.

1- Do the query and sort by the collapse field. Iterate through the hits and remove anything after the first N hits per "group". Then re-order by whatever field you want. The final sort will not be accurate across the global result set because the shards will have sorted by your collapse field and returned those top G hits. You may have to make multiple queries to get all the groups, which can either be done with scroll or from/size.

2- Similar to #1, but sort results by score and iterate through the hits to create the groups yourself. This works OK if you don't get too many more hits per group than you want.

(Note that the paging required by both of these can be improved by re-doing the query with an additional filter after you've identified one or more new groups.)
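
The "keep the first N hits per group, then re-order" step that both workarounds share could be sketched like this. It is a minimal illustration, not the plugin code: hits are assumed to be plain dicts already in the order the query returned them, and `sort_key` is whatever final ordering the caller wants.

```python
from collections import Counter
from typing import Callable, Dict, List


def limit_per_group(hits: List[Dict], field: str, per_group: int) -> List[Dict]:
    """Walk the hits in order and drop everything past N hits per group."""
    counts: Counter = Counter()
    kept = []
    for hit in hits:
        key = hit[field]
        if counts[key] < per_group:
            counts[key] += 1
            kept.append(hit)
    return kept


def collapse_then_sort(hits: List[Dict], field: str, per_group: int,
                       sort_key: Callable[[Dict], object],
                       reverse: bool = False) -> List[Dict]:
    """Collapse first, then re-order the survivors by the field you care about."""
    kept = limit_per_group(hits, field, per_group)
    return sorted(kept, key=sort_key, reverse=reverse)
```

As the comment warns, the final order is only as good as the window of hits that was fetched; groups that never made it into that window are invisible to this step.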

@stevegraham

hate to be that guy, but +1

@saxxi
saxxi commented Sep 11, 2013

@phungleson - I've updated my stackoverflow answer and gist. I'm thinking on joining soon the IRC community as well, I hope to see you there too.

@fulmicoton

@wojons Collapsing is implemented that way in several enterprise search engines. One of the main problems is that no matter how many documents are returned by the shards, it might not be sufficient to fill your page of hits after postprocessing. When collapsing for diversity (like a search engine such as Google does), it is not such a big problem to collapse the first results and then fall back to uncollapsed search. If your use case is, for instance, to group results per category in an e-commerce search engine, this may not be an option for you.

@fulmicoton

If some of you are interested in the way it is done in Solr, I described it in a blog post : http://fulmicoton.com/posts/grouping-in-solr/

@mishu-
mishu- commented Sep 17, 2013

+1 :)

@andyxchang

+1

@obfischer

+1 This just what I need +1000

@edigu
edigu commented Oct 16, 2013

+1

@brusic
brusic commented Oct 16, 2013

The issue of field collapsing was addressed briefly in this blog post: http://www.elasticsearch.com/blog/from-amsterdam-with-love-elasticsearchs-second-company-all-hands/

"We again fleshed out what is needed in order to properly support field collapsing in a distributed environment execution, as well as the ability to get inner hits (for nested / parent child cases). We have a good idea on the type of refactoring we need in our search execution infrastructure, and hope to tackle it post 1.0."

The elasticsearch team is working on it and the timetable is somewhere post 1.0. You can now stop with the +1s. :)

@scharf
scharf commented Oct 17, 2013

+1
to make "somewhere post 1.0" not too long after 1.0 ;-)

@teuneboon

+1

@mlpinit
mlpinit commented Oct 18, 2013

+1 :)

@vyasarajgk

+1:)

@ymost
ymost commented Nov 21, 2013

+1

@devour25

I've been reading http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations.html,
it says
"Bucketing
A family of aggregations that build buckets, where each bucket is associated with a key and a document criteria. When the aggregations is executed, the buckets criterias are evaluated on every document in the context and when matches, the document is considered to "fall in" the relevant bucket. By the end of the aggreagation process, we’ll end up with a list of buckets - each one with a set of documents that "belong" to it."

Each document returned would have a score. Is it possible to select only the top 5 results from each bucket and then order all of them together by the scores of these documents?
(The real-world scenario is a search engine that wants to select only the 5 highest-scoring documents belonging to each owner, but all the selected documents need to be put together to determine which document is displayed first.)

thanks

@dmr
dmr commented Jan 15, 2014

+1

@jmizgajski

+1

@s1monw
Collaborator
s1monw commented Jan 29, 2014

+1

@brusic
brusic commented Jan 29, 2014

Wait, did Simon just +1 this issue? :)

@fulmicoton

@s1monw Could you communicate on whether there is an ongoing project at elasticsearch on that? I was just thinking about resuming a grouping plugin project next weekend. I'd better drop it if the functionality will be shipped in next release.

@s1monw
Collaborator
s1monw commented Jan 31, 2014

@poulejapon obviously this feature is of great demand so I am pretty sure that this is high prio as it always was. I can't make any promises when this will be implemented but I can promise it's not shipping in the next release and very unlikely in 1.1. This feature to be done right needs a reasonable refactoring on the search execution layer that is why we didn't crank it out already. The demand is I think obvious and there is no need for further +1 on it unless you really need to express yourself as I did since I think it's important. Stay tuned there is hope! :)

@maddin4code

+1

@SaSa1983
SaSa1983 commented Mar 4, 2014

+1

@Kumen
Kumen commented Mar 4, 2014

+1

@adri
adri commented Mar 10, 2014

+1

@recurrence

+1

@daniilyar

+1

@davengeo

+1

@bompi88
bompi88 commented Mar 26, 2014

+1

@g00fy-
g00fy- commented Mar 27, 2014

+1

@brupm
brupm commented Apr 1, 2014

👍

@Firfi
Firfi commented Apr 3, 2014

+1

@imarsman
imarsman commented Apr 4, 2014

This would be incredibly useful for the application I am writing for my company. I am, however, amazed at how capable Elasticsearch is already that I feel it would be rude not to say thank-you before adding my YES to this request for this feature to be added.

@petard
petard commented Apr 4, 2014

+1

@grishick

+1 this is a tie breaker for us right now when evaluating ES vs Solr

@Limfocit

+1

@zeelax
zeelax commented May 12, 2014

+1

@clintongormley
Member

See #6124, which looks like it will handle all field-collapsing requirements, in a distributed manner.

@thejohnfreeman

While neat, is it possible to perform aggregations against all collapsed documents? For example, collapse a set of books on the author field, then aggregate terms in the publisher field, to find the most common publishers by number of distinct authors?

@mattweber

@thejohnfreeman I imagine #6124 is just the first step, but considering this is a bucket aggregator, what you describe should be possible. Keep an eye on the PR.

@martijnvg
Member

Let me +1 this issue for the last time :)

The top_hits aggregation will handle the field collapse requirements and #6124 is the first step.

@thejohnfreeman Right now the top_hits can only be used as a leaf aggregation. Can your example also be implemented via two nested terms aggregations (first on the author field and then on publisher) and a top_hits aggregation as leaf?
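
For readers landing here, a request body that uses a terms aggregation with top_hits as its leaf might be assembled along these lines. This is a hedged sketch: the aggregation names `groups` and `top` are arbitrary labels chosen for this example, the helper function is hypothetical, and the exact options available depend on the Elasticsearch version.

```python
import json
from typing import Dict


def collapse_request(query: Dict, group_field: str, groups: int = 10,
                     hits_per_group: int = 1) -> Dict:
    """Build a search body: one bucket per field value, top hits in each bucket."""
    return {
        "query": query,
        "size": 0,  # the regular hit list is not needed, only the buckets
        "aggs": {
            "groups": {  # arbitrary name for the bucket aggregation
                "terms": {"field": group_field, "size": groups},
                "aggs": {
                    "top": {  # arbitrary name for the leaf aggregation
                        "top_hits": {"size": hits_per_group}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    body = collapse_request({"match_all": {}}, "author")
    print(json.dumps(body, indent=2))
```

Note that terms buckets are ordered by document count by default, so this does not by itself return groups in relevance order.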

@martijnvg martijnvg closed this May 23, 2014
@artemredkin

What about paging? As far as I can tell, there is no way to page agg results.

@martijnvg
Member

@artemredkin Pagination isn't supported yet, but it shouldn't be too difficult to add that.

@brusic
brusic commented May 23, 2014

+1

:)

@artemredkin

Cool!
You are awesome :)

@artemredkin

should I add an issue for pagination?

@javanna
Member
javanna commented May 26, 2014

Hi @artemredkin we already have issue #6299 for it ;)

@artemredkin

Got it, thanks!

@vvaradhan

Is there a master-snapshot version available through maven? I can start on my development till 1.3.0 gets officially released.

Also, what would be a likely release date of 1.3.0?

@SaSa1983

You can build the 1.3.0 branch

It contains the aggregations feature

@mikemccabe

Released in http://www.elasticsearch.org/downloads/1-3-0/ - #6124 is referenced in release notes.

@JnBrymn-EB

No traffic on this in almost a year. Should it be presumed that this issue is closed by #6124 ?

@brusic
brusic commented Jun 14, 2015

Correct.
