New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filtering after a top_hits have been applied or bucket selector on buckets, which can select based on buckets which has top_hits applied #94967
Comments
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Hi @mat013. I tried your last query and get the same error as you because the bucket_selector is not at the same level as the top hits aggregation. However, even if I move it to the correct level, it does not work and I get the error:
Looking at the documentation it appears that the bucket selector only works for metric buckets with numerical values, while your buckets contain a hits array with complete documents. So it appears what you are trying is not currently supported. I think you already suspected this, by adding the In the meantime, I assume you can handle this by filtering in the client code? |
I can get by for the simple cases... not for the more advance... so I will look forward if there will come a solution. I'll see if I can find some alternative solutions in the meanwhile. Have a nice day :) |
I have found that while this is not possible for
And the result had buckets that looked like this:
|
Thanks a lot. Obviously the reality is more detailed, so the toy example I provided will not scale really well for a document (in our case has more than 100 attributes), but it is definitely interesting as I can then do it in two stages first finding the one and take the objid and then search them out afterwards... So I would appreciate if this enhancement suggestion is still under evaluation whether it can be implemented in a more simple manners as originally suggested In my case field1 will not be a number but a string or keyword, and version and chdate is timestamps Have a nice day |
Fixed in #95828. |
Description
I am unsure whether it is a bug (because I could not find anything in the documentation) or it new feature, and I have tried to ask various forums and googled to see if I could come up with an answer to my problem.
I have a use case (which I imagine is not that unique), which is as follows:
I am trying to create a query in elasticsearch, which is able to retrieve the latest documents for each group and then filter on some criteria on those documents which has been found.
As an example:
Say the following documents are indexed in myindex in elasticsearch:
I would like to find all documents, which has field1 set to x for the document with the highest chdate, for each ident and version, where the selected document has field1 set to x.
In a case where x is 0 then the documents, which has objid 4 and 5 should be returned In a case where x is 1 then the documents, which has objid 1 should be returned
Initially I tried to do following query
{ "size": 0, "aggs": { "by_ident": { "terms": { "field": "ident.keyword", "size": 10 }, "aggs": { "by_version": { "terms": { "field": "version", "size": 10000 }, "aggs": { "by_latest": { "top_hits": { "sort": [{ "chdate": { "order": "desc" } }], "size": 1 } } } } } } } }
However it is not possible to apply a filter afterwards to the top hits
ChatGPT suggested to use a bucket selector
{ "size": 0, "aggs": { "ident": { "terms": { "field": "ident" }, "aggs": { "version": { "terms": { "field": "version" }, "aggs": { "top_hits_agg": { "top_hits": { "size": 1, "sort": [ { "chdate": { "order": "desc" } } ] } }, "field1_filter": { "bucket_selector": { "buckets_path": { "hits": "top_hits_agg.hits.hits", "field1": "top_hits_agg.hits.hits._source.field1" }, "script": { "source": "params.field1 == 0" } } } } } } } } }
However it fails with: Validation Failed: 1: No aggregation found for path [top_hits_agg.hits.hits._source.field1]
So what I would like to suggest is one of the following:
Personally I think option 2 will be more applicable, however option 1 would maybe more easier to comprehend when somebody has to maintain the query later on.
The text was updated successfully, but these errors were encountered: