Support collapsing top_hits aggregation results #89846

Open
nemphys opened this issue Sep 7, 2022 · 2 comments
Labels
:Analytics/Aggregations, >enhancement, Team:Analytics

Comments

@nemphys

nemphys commented Sep 7, 2022

Description

Since collapsing results is a very handy feature, and the top_hits aggregation already exposes most of the regular per-hit features, it would make sense to support collapsing there too (preferably with the same syntax as the top-level collapse parameter).

I am currently at a dead end trying to achieve this behaviour: every other approach I have tried fails to meet some of the requirements and performs poorly.

My goal is simple: use a low-cardinality keyword field (just 4 distinct values) for a terms aggregation, then fetch the top N hits for each bucket, collapsed on a parent identifier field. Since I can't just add { collapse: { field: "parent_id" } } to the top_hits aggregation, I have tried wrapping it in a terms aggregation on the parent_id field, with a top_hits sub-aggregation of size 1. This seems to work, but 1) it is slow and 2) I am having a hard time getting the result order right. I normally sort by _score first and then by another field that acts as a tie-breaker, so that the ordering is consistent. The first criterion can be reproduced with a max sub-aggregation inside the terms aggregation and ordering the buckets by it, but the second seems to be impossible.
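The workaround described above can be written out roughly as follows. This is a sketch that builds the request body as a Python dict, modelled on the field-collapse pattern (terms bucket, top_hits of size 1, max of _score used for ordering) shown in the Elasticsearch top_hits documentation. The aggregation names, the example field names, and the optional tie-breaker parameter are hypothetical. It also makes the ordering problem visible: parent buckets can be ordered by their best score, but the only secondary key used here is the parent id (_key), not the tie-breaker value of each bucket's best hit.

```python
import json
from typing import Any, Dict, List, Optional


def build_workaround_query(
    group_field: str,
    parent_field: str,
    top_n: int,
    query: Dict[str, Any],
    tie_breaker_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the terms -> terms(parent) -> top_hits(size 1) workaround body.

    All aggregation names ("field_values", "by_parent", "best_hit",
    "max_score") are hypothetical labels chosen for this sketch.
    """
    # Within one parent bucket, choose the representative hit by score,
    # then by the optional (hypothetical) tie-breaker field.
    hit_sort: List[Dict[str, Any]] = [{"_score": {"order": "desc"}}]
    if tie_breaker_field is not None:
        hit_sort.append({tie_breaker_field: {"order": "asc"}})

    return {
        "size": 0,
        "query": query,
        "aggs": {
            "field_values": {
                "terms": {"field": group_field, "min_doc_count": 1},
                "aggs": {
                    "by_parent": {
                        "terms": {
                            "field": parent_field,
                            "size": top_n,
                            # Parent buckets are ordered by their best score;
                            # "_key" only breaks ties on the parent id itself,
                            # not on the tie-breaker field of the best hit.
                            "order": [{"max_score": "desc"}, {"_key": "asc"}],
                        },
                        "aggs": {
                            "best_hit": {"top_hits": {"size": 1, "sort": hit_sort}},
                            "max_score": {"max": {"script": {"source": "_score"}}},
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_workaround_query(
        "category", "parent_id", 5, {"match_all": {}}, tie_breaker_field="created_at"
    )
    print(json.dumps(body, indent=2))
```

Each parent id becomes its own bucket here, which is one plausible reason this approach is slow when there are many parents per group.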

Ideal query:

{
    "size": 0,
    "query": { ... a script_score query },
    "aggs": {
        "field_values": {
            "terms": {
                "field": "<some_field>",
                "min_doc_count": 1
            },
            "aggs": {
                "collapsed_count": {
                    "cardinality": {
                        "field": "<parent_id_field>"
                    }
                },
                "top_5": {
                    "top_hits": {
                        "size": 5,
                        "collapse": {
                            "field": "<parent_id_field>"
                        }
                    }
                }
            }
        }
    }
}
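Until collapse is supported inside top_hits, one possible stopgap (not something proposed in this issue) is to request a top_hits size larger than N, with the desired sort, and collapse each bucket on the client. The Python sketch below also spells out the semantics the ideal query asks for: within each bucket, keep the best-ranked hit per parent id, then take the first N. It assumes the parent id is returned in _source and that hits arrive already sorted by _score and then the tie-breaker; the function names are hypothetical. Because it only sees the hits that were fetched, a bucket can end up with fewer than N collapsed hits if too many of the top hits share a parent.

```python
from typing import Any, Dict, List


def collapse_hits(
    hits: List[Dict[str, Any]], parent_field: str, size: int
) -> List[Dict[str, Any]]:
    """Keep only the first hit per parent id, preserving the incoming order.

    Assumes `hits` is already sorted the way top_hits sorted them
    (e.g. _score desc, then a tie-breaker), so the first hit seen for a
    parent is its best one. Also assumes the parent id is in `_source`.
    """
    if size <= 0:
        return []
    seen = set()
    collapsed: List[Dict[str, Any]] = []
    for hit in hits:
        parent = hit["_source"][parent_field]
        if parent in seen:
            continue  # a better-ranked hit for this parent was already kept
        seen.add(parent)
        collapsed.append(hit)
        if len(collapsed) == size:
            break
    return collapsed


def collapse_buckets(
    response: Dict[str, Any],
    terms_agg: str,
    top_hits_agg: str,
    parent_field: str,
    size: int,
) -> Dict[Any, List[Dict[str, Any]]]:
    """Apply collapse_hits to the top_hits result of every terms bucket."""
    result: Dict[Any, List[Dict[str, Any]]] = {}
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[top_hits_agg]["hits"]["hits"]
        result[bucket["key"]] = collapse_hits(hits, parent_field, size)
    return result
```

With the ideal query above, this would mean dropping the unsupported collapse block, giving top_5 a larger size plus a sort such as [{"_score": "desc"}, {"<tie_breaker_field>": "asc"}], and then calling collapse_buckets(response, "field_values", "top_5", "<parent_id_field>", 5).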

Any ideas would be appreciated.

nemphys added the >enhancement and needs:triage labels on Sep 7, 2022
fcofdez added the :Analytics/Aggregations and Team:Analytics labels and removed the needs:triage label on Sep 7, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@LucasDove

I want this feature too.

Projects
None yet
Development

No branches or pull requests

4 participants