Nested documents _score is overwritten by the root document _score #23329

Closed
volodymyrpavlenko opened this issue Feb 23, 2017 · 6 comments
Labels
>feature feedback_needed :Search/Search

volodymyrpavlenko commented Feb 23, 2017

I'm currently trying to implement a query that runs a scored search on nested documents and then aggregates the hits. I also need to sort the aggregation buckets by the score of the nested document.

A much simplified version of the mapping:

{
  "mappings": {
    "parent": {
      "properties": {
        "children": {
          "dynamic": false,
          "type": "nested",
          "properties": {
            "name": {
              "type": "string",
              "index": "analyzed",
              "analyzer": "standard"
            },
            "id": {
              "index": "not_analyzed",
              "omit_norms": "true",
              "type": "string",
              "index_options": "docs",
              "doc_values": "true"
            }
          }
        }
      }
    }
  }
}

To meet this requirement, I used a technique called "field collapse". The query looks like this:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "query": {
            "match": {
              "children.name": {
                "query": "<free-text-search>",
                "type": "boolean",
                "operator": "AND",
                "analyzer": "standard",
                "boost": 1.0
              }
            }
          },
          "path": "children"
        }
      }
    }
  },
  "aggregations": {
    "iEzOD7MlzNsjvHxsDcAkVAVtDicmIkg9": {
      "nested": {
        "path": "children"
      },
      "aggregations": {
        "childId": {
          "terms": {
            "field": "children.id",
            "size": 10,
            "order": [{
              "top_score": "desc"
            }, {
              "_count": "desc"
            }, {
              "_term": "asc"
            }]
          },
          "aggregations": {
            "top_score": {
              "max": {
                "script": "_score"
              }
            },
            "top_hit": {
              "top_hits": {
                "size": 1,
                "_source": {
                  "includes": ["id", "name"],
                  "excludes": []
                }
              }
            }
          }
        }
      }
    }
  }
}
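
For reference, a minimal Python sketch of how the response to this request could be unpacked into one row per child id. The helper name `collapse_buckets` and the sample response are made up for illustration, and the response layout is assumed to follow the usual shape of nested, terms, max and top_hits aggregation results.

```python
from typing import Any, Dict, List

# Name of the outer nested aggregation used in the request above.
NESTED_AGG = "iEzOD7MlzNsjvHxsDcAkVAVtDicmIkg9"


def collapse_buckets(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one row per children.id bucket, in the order the server returned them."""
    buckets = response["aggregations"][NESTED_AGG]["childId"]["buckets"]
    rows = []
    for bucket in buckets:
        hits = bucket["top_hit"]["hits"]["hits"]
        rows.append({
            "id": bucket["key"],
            "count": bucket["doc_count"],
            "top_score": bucket["top_score"]["value"],
            # top_hits was requested with size 1, so at most one hit is present.
            "source": hits[0]["_source"] if hits else None,
        })
    return rows


# Illustrative response only: the ids, names and scores are invented.
sample_response = {
    "aggregations": {
        NESTED_AGG: {
            "doc_count": 3,
            "childId": {
                "buckets": [
                    {
                        "key": "c1",
                        "doc_count": 2,
                        "top_score": {"value": 1.5},
                        "top_hit": {"hits": {"total": 2, "hits": [
                            {"_score": 1.5, "_source": {"id": "c1", "name": "red shoe"}}
                        ]}},
                    },
                    {
                        "key": "c2",
                        "doc_count": 1,
                        "top_score": {"value": 0.7},
                        "top_hit": {"hits": {"total": 1, "hits": [
                            {"_score": 0.7, "_source": {"id": "c2", "name": "red hat"}}
                        ]}},
                    },
                ]
            },
        }
    }
}

rows = collapse_buckets(sample_response)
for row in rows:
    print(row["id"], row["top_score"], row["source"])
```

The rows keep the bucket order chosen by the `order` clause, so the sorting itself stays on the server side.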

The query works fine as long as each parent has exactly one child, but it stops working once a parent has multiple children.

Investigating the cause, I found that the nested document's _score is actually taken from the root document, even though it is displayed correctly in inner_hits when I output them.

I tested this on 1.7.1 and 2.4.3, and the problem occurs on both.
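
The effect can be illustrated with a toy model in plain Python (no Elasticsearch involved). It assumes, as described above, that every nested child seen by the aggregation carries its parent's score instead of its own. The data and function names are invented for the example.

```python
from collections import defaultdict

# One parent with two children. "own_score" stands for the score a child
# would get from the nested match query; the values are invented.
parent = {
    "score": 2.0,  # root document score
    "children": [
        {"id": "a", "own_score": 2.0},
        {"id": "b", "own_score": 0.5},
    ],
}


def top_score_per_child(parent, use_root_score):
    """Emulate a terms bucket per child id with a max(_score) sub-aggregation."""
    best = defaultdict(float)
    for child in parent["children"]:
        # Toy assumption: the child either inherits the root score or keeps its own.
        score = parent["score"] if use_root_score else child["own_score"]
        best[child["id"]] = max(best[child["id"]], score)
    return dict(best)


observed = top_score_per_child(parent, use_root_score=True)
expected = top_score_per_child(parent, use_root_score=False)
print(observed)  # every child reports the root score
print(expected)  # each child reports its own score
```

In this model a parent with a single child gives the same result either way, which is consistent with the query working in that case. With several children, all buckets tie on `top_score` and the ordering falls through to `_count` and `_term`.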

@clintongormley clintongormley added :Search/Search and removed :Nested Docs labels Feb 14, 2018
@talevy talevy added help wanted adoptme :Search/Search and removed :Search/Search labels Mar 26, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@colings86 colings86 removed the discuss label May 21, 2018
@mayya-sharipova
Contributor

We discussed it and decided to add another optional parameter, query, to the top_hits aggregation. The top hits returned by the aggregation will be those matching this query. The parameter will be compulsory in a nested context and optional in a non-nested context.
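
As a rough sketch of what such a request might look like: the `query` key inside `top_hits` below is hypothetical, taken only from the description above, and is not an existing option of the aggregation. The Python helper name is likewise made up.

```python
import json


def top_hits_with_query(nested_query, size=1):
    """Build a top_hits body carrying the proposed (hypothetical) `query` key."""
    return {
        "top_hits": {
            "size": size,
            # Hypothetical parameter from the proposal; not an existing option.
            "query": nested_query,
        }
    }


match_children = {
    "match": {"children.name": {"query": "<free-text-search>", "operator": "AND"}}
}
proposed = top_hits_with_query(match_children)
print(json.dumps(proposed, indent=2))
```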

@volodymyrpavlenko
Author

Sounds really great! Could you give a guess how this will affect performance? Will the same query run multiple times, once during the query phase and once during the aggregation phase?

@mayya-sharipova
Contributor

@volodymyrpavlenko I think your problem could be solved without any modifications to the code, just by using rescore. Can you add rescore to your query without any parameter and see if it addresses your problem (it does for me)? Something like this:

"query": {
  "nested": {
    "query": {
      "match": {
        "children.name": {
          "query": "<free-text-search>",
          "type": "boolean",
          "operator": "AND",
          "analyzer": "standard",
          "boost": 1.0
        }
      }
    },
    "path": "children"
  }
},
"rescore": {
  "query": {
    "rescore_query": {
      "match": {
        "children.name": {
          "query": "<free-text-search>",
          "type": "boolean",
          "operator": "AND",
          "analyzer": "standard",
          "boost": 1.0
        }
      }
    },
    "query_weight": 0,
    "rescore_query_weight": 1
  }
}

Then the nested docs in the top_hits agg should be rescored with only the nested query.
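
Put together with the aggregation from the original request, the suggestion could be assembled roughly as below. This is a sketch in Python that only builds and prints the request body. The function name and the `by_child` aggregation name are made up, the `filtered` wrapper is omitted on the assumption that it is not needed without a filter, and the result has not been verified against a cluster (in particular, how rescoring interacts with `"size": 0` is not checked here).

```python
import json


def build_request(text):
    """Build a search body: nested query, rescore as suggested, collapse aggregation."""
    match = {
        "match": {
            "children.name": {
                "query": text,
                "type": "boolean",
                "operator": "AND",
                "analyzer": "standard",
                "boost": 1.0,
            }
        }
    }
    return {
        "size": 0,  # as in the original request
        "query": {"nested": {"path": "children", "query": match}},
        "rescore": {
            "query": {
                "rescore_query": match,
                "query_weight": 0,
                "rescore_query_weight": 1,
            }
        },
        "aggregations": {
            "by_child": {  # hypothetical aggregation name
                "nested": {"path": "children"},
                "aggregations": {
                    "childId": {
                        "terms": {
                            "field": "children.id",
                            "size": 10,
                            "order": [
                                {"top_score": "desc"},
                                {"_count": "desc"},
                                {"_term": "asc"},
                            ],
                        },
                        "aggregations": {
                            "top_score": {"max": {"script": "_score"}},
                            "top_hit": {"top_hits": {"size": 1}},
                        },
                    }
                },
            }
        },
    }


body = build_request("<free-text-search>")
print(json.dumps(body, indent=2))
```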

@alpar-t
Contributor

alpar-t commented Jul 24, 2018

Hi @volodymyrpavlenko, did you have a chance to check this out? We would like to know whether the proposed solution works for you.

@alpar-t
Contributor

alpar-t commented Aug 20, 2018

No further feedback received. @volodymyrpavlenko let us know if you get to check this out and we can look at re-opening this issue.

@alpar-t alpar-t closed this as completed Aug 20, 2018