Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

has_child and inner_hits for grandchild hit doesn't work #11118

Closed
lukens opened this issue May 12, 2015 · 21 comments
Closed

has_child and inner_hits for grandchild hit doesn't work #11118

lukens opened this issue May 12, 2015 · 21 comments
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@lukens
Copy link

lukens commented May 12, 2015

I'm trying to get inner_hits to work on an a two level deep has_child query, but the grandchild hits just seem to return an empty array. If I specify the inner hits at the top level I get the results I want, but its rather too slow. Using inner hits on the has child query seems much faster, but doesn't return the grandchild hits.

Below is an example query:

{
  "from" : 0,
  "size" : 25,
  "query" : {
    "has_child" : {
      "query" : {
        "has_child" : {
          "query" : {
            "filtered" : {
              "query" : {
                "multi_match" : {
                  "query" : "asia",
                  "fields" : [ "_search" ],
                  "operator" : "AND",
                  "analyzer" : "library_synonyms",
                  "fuzziness" : "1"
                }
              },
              "filter" : {
                "and" : {
                  "filters" : [ {
                    "terms" : {
                      "range" : [ "Global" ]
                    }
                  } ]
                }
              }
            }
          },
          "child_type" : "document-ref",
          "inner_hits" : {
            "name" : "document-ref"
          }
        }
      },
      "child_type" : "class",
      "inner_hits" : {
        "size" : 1000,
        "_source" : false,
        "fielddata_fields" : [ "class" ],
        "name" : "class"
      }
    }
  },
  "fielddata_fields" : [ "name" ]
}

The document-ref inner hits just always returns an empty array. Should this work (and, if so, any ideas why it isn't?), or is it beyond the means of what inner hits can currently do?

@lukens
Copy link
Author

lukens commented May 12, 2015

I've created a simpler test case for this, and it seems a little clearer that this doesn't currently work.

Add mappings:

curl -XPOST 'http://localhost:9200/grandchildren' -d '{
  "mappings" : {
    "parent" : {
      "properties" : {
        "parent-name" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    },
    "child" : {
      "_parent" : {
        "type" : "parent"
      },
      "_routing" : {
        "required" : true
      },
      "properties" : {
        "parent-name" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    },
    "grandchild" : {
      "_parent" : {
        "type" : "child"
      },
      "_routing" : {
        "required" : true
      },
      "properties" : {
        "grandchild-name" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}'

Populate:

curl -XPOST 'http://localhost:9200/grandchildren/parent/parent' -d '{ "parent-name" : "Parent" }'
curl -XPOST 'http://localhost:9200/grandchildren/child/child?parent=parent&routing=parent' -d '{ "child-name" : "Child" }'
curl -XPOST 'http://localhost:9200/grandchildren/grandchild/grandchild?parent=child&routing=parent' -d '{ "grandchild-name" : "Grandchild" }'

Query:

curl -XGET 'http://localhost:9200/grandchildren/_search?pretty' -d '{
  "query" : {
    "has_child" : {
      "query" : {
        "has_child" : {
          "query" : {
            "match_all" : {}
          },
          "child_type" : "grandchild",
          "inner_hits" : {
            "name" : "grandchild"
          }
        }
      },
      "child_type" : "child",
      "inner_hits" : {
        "name" : "child"
      }
    }
  }
}'

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grandchildren",
      "_type" : "parent",
      "_id" : "parent",
      "_score" : 1.0,
      "_source":{ "parent-name" : "Parent" },
      "inner_hits" : {
        "grandchild" : {
          "hits" : {
            "total" : 0,
            "max_score" : null,
            "hits" : [ ]
          }
        },
        "child" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.0,
            "hits" : [ {
              "_index" : "grandchildren",
              "_type" : "child",
              "_id" : "child",
              "_score" : 1.0,
              "_source":{ "child-name" : "Child" }
            } ]
          }
        }
      }
    } ]
  }
}

Not only is the grandchild hits empty, but they're also not nested within the child hits, so aren't going to give me what I want anyway. I'm not sure what the intended/expected behaviour would be here, but guess I need to try something else for now.

@lukens
Copy link
Author

lukens commented May 12, 2015

Above with inner hits at top-level:

Query:

curl -XGET 'http://localhost:9200/grandchildren/_search?pretty' -d '{
  "query" : {
    "has_child" : {
      "query" : {
        "has_child" : {
          "query" : {
            "match_all" : {}
          },
          "child_type" : "grandchild"
        }
      },
      "child_type" : "child"
    }
  },
  "inner_hits" : {
    "child" : {
      "type" : {
        "child" : {
          "inner_hits" : {
            "child" : {
              "type" : {
                "grandchild" : {}
              }
            }
          }
        }
      }
    }
  }
}'

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grandchildren",
      "_type" : "parent",
      "_id" : "parent",
      "_score" : 1.0,
      "_source":{ "parent-name" : "Parent" },
      "inner_hits" : {
        "child" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.0,
            "hits" : [ {
              "_index" : "grandchildren",
              "_type" : "child",
              "_id" : "child",
              "_score" : 1.0,
              "_source":{ "child-name" : "Child" },
              "inner_hits" : {
                "child" : {
                  "hits" : {
                    "total" : 1,
                    "max_score" : 1.0,
                    "hits" : [ {
                      "_index" : "grandchildren",
                      "_type" : "grandchild",
                      "_id" : "grandchild",
                      "_score" : 1.0,
                      "_source":{ "grandchild-name" : "Grandchild" }
                    } ]
                  }
                }
              }
            } ]
          }
        }
      }
    } ]
  }
}

Here the grandchild inner hit is correctly found, and nested

@lukens
Copy link
Author

lukens commented May 12, 2015

[I'm using 1.5.2]

@lukens
Copy link
Author

lukens commented May 13, 2015

I managed to track down why this wasn't working. When parsing 'has child' queries, inner hits were always being added to the parseContext, and never as child inner hits to the parent inner hits.

I've put together a very quick and dirty fix for this (https://github.com/lukens/elasticsearch/commit/fb22d622e7b24074b5f4fe3e26cffd4c38cff75b) which has got it working for my case, but I don't think is suitable for inclusion in a release.

Issues I see with my with my fix:

  1. It's just messy, it doesn't really fix the issue, but just cleans up after it. Children still add their inner hits to the parseContext, the parent just removes them again afterwards, and adds them as children to its own inner hits.
  2. The parent removes them again afterwards by mutating a Map it gets from the current SearchContext's InnerHitsContext. This is obviously bad and messy, and would be broken if InnerHitsContext was changed to return a copy of the map, rather than the map itself. Nasty dependencies between classes.
  3. I think a child can specify inner hits even if a parent doesn't. These would currently get lost. I'm not sure what the behaviour should be here, it should probably be considered an invalid query.
  4. If this was done properly, descendants at different levels could add inner hits with the same name, whereas currently this could cause issues. Maybe you shouldn't be able to have the same name at different levels, but if implemented correctly, there should be no need to enforce this.

I considered submitting as a pull request, but felt it was far too rough and ready.

@lukens lukens changed the title has_child and inner_hits for grandchild hit has_child and inner_hits for grandchild hit doesn't work May 13, 2015
@martijnvg
Copy link
Member

@lukens If you want nested inner hits then you need to use the top level inner hits, the inner_hits on a query doesn't support nesting. The fact that your grandparents inner hits is empty is clearly a bug, thanks for bringing this up!

@martijnvg martijnvg self-assigned this May 13, 2015
@martijnvg martijnvg added the >bug label May 13, 2015
@lukens
Copy link
Author

lukens commented May 14, 2015

Hi, it's grandchild, rather than grandparent, that isn't working. Though you also seem to be suggesting it shouldn't be. Either way, the change I committed shows that it can work, my change just isn't a very nice way to make it work.

@lukens
Copy link
Author

lukens commented May 14, 2015

Ah, or are you saying it should work, but just shouldn't be nested? I'm not really sure what the point of it would be if it wasn't nested, though that may just be because it doesn't fit my use case, and I can't think of a use case where it would be useful.

I think it would be good if nesting did work, as that would still allow either use case, really.

The problem with top level inner_hits is that I have to apply the query once in the has_child query, and then again in the inner_hits query, which makes everything slower than it would otherwise need to be.

@martijnvg
Copy link
Member

@lukens yes, I meant grandchild. The reason it is a bug is, because the inner_hits in your response shouldn't be empty.

The top level inner hits and inner hits defined on a query internally to ES is the same thing and either way of defining inner hits will yield the same performance in terms of query time. The nested inner hits support in the query dsl was left out to reduce complexity and most of the times there is just a single level relationship. Obviously that means for your use case that you need to use top level inner hits.

Maybe the inner hits support in the query dsl should support multi level relationships too, but I think the parsing logic shouldn't be get super complex because this. I need to think more about this. Like you said if it the grandchild isn't nested its hits in the response, then it isn't very helpful.

The only overhead of top level inner hits is that queries are defined twice, so the request body gets larger. If you're concerned with that, you can consider using search templates, so that you don't you reduce the amount of data send to your cluster.

@lukens
Copy link
Author

lukens commented May 14, 2015

Isn't there also an overhead in running the query twice with the top level inner_hits, or does Elasticsearch do something clever so that it only gets run once? (and, if so, does it need to be specified again in the inner hits?)

I've not yet come across search templates, are these compatible with highly dynamic searches?

I'd like try and spend a bit more time on getting nesting working in a less hacky manner, but am under enormous pressure just to get a project completed at the moment. For the case when grandchildren aren't nested in the query, I guess the simple solution is to also not nest them in the response. I don't think this would overcomplicate the parsing too much.

I have a fairly nice way to handle all this in my mind, just not the time to implement it at the moment.

@lukens
Copy link
Author

lukens commented May 14, 2015

OK, switching to top level hits doesn't seem to have affected performance, so I can work with that for now. It had seemed much slower before, but once I'd actually got inner_hits on the query working, that ended up just as slow, until I tweaked some other things.

The "fix" I've committed above is probably only really of any use as a reference if you want to go with nesting, as it doesn't solve the problem when not nested. I expect there's something in the inner workings of inner_hits that currently relies on them being nested.

@martijnvg
Copy link
Member

Isn't there also an overhead in running the query twice with the top level inner_hits, or does Elasticsearch do something clever so that it only gets run once? (and, if so, does it need to be specified again in the inner hits?)

inner_hits runs as part of the fetch phase and always executes an additional search to fetch the inner hits. The search being executed is cheap. It only runs an a single shard and just runs a query that fetches top child docs that matches with _parent:[parent_id] (all docs associated with parent parent_id) and the inner query defined in the has_child query. This is a query that ES (actually Lucene) can execute relatively quickly. This mini search is executed for each hit being returned.

I've not yet come across search templates, are these compatible with highly dynamic searches?

Yes, the dynamic part of the search request can be templated.

The "fix" I've committed above is probably only really of any use as a reference if you want to go with nesting, as it doesn't solve the problem when not nested. I expect there's something in the inner workings of inner_hits that currently relies on them being nested.

Yes, the inner hits features relies on the fact that grandchild and child are nested. When using the top-level inner hits notation this works out, but when using inner_hits as part of the query dsl this doesn't work out, because grandchild inner hit definition isn't nested under the child inner hit definition.

In order to fix this properly the query dsl parsing logic should just support nested inner hits. I think the format doesn't need to change in order to support this. Just because the fact the two has_child queries are nested should be enough for automatically nest the two inner hit definitions.

@alvinchow86
Copy link

Allso seeing this issue, in my case multi-level nested documents (as in #13064). Would be great to get a solution to this.

@lipixun
Copy link

lipixun commented Nov 2, 2015

Will this issue be fixed at 2.x? Really looking forward to see the nested inner hits in query dsl.

@sumitjainn
Copy link

When will this be fixed, its a blocker for us !!

@yehosef
Copy link

yehosef commented Dec 20, 2015

+1

1 similar comment
@jCalamari
Copy link

+1

@clintongormley
Copy link

@martijnvg just pinging you about this as a reminder

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 21, 2016
…g nested `has_child`, `has_parent` or `nested` queries in the query dsl.

Closes elastic#11118
@abulhol
Copy link

abulhol commented Feb 1, 2016

+1

abulhol added a commit to abulhol/elasticsearch that referenced this issue Feb 10, 2016
I added information on how to define _source fields for inner_hits. In addition, the current information on the retrieval of inner_hits for multiple nested documents is incorrect (see also elastic#11118).
@JanSchroeder
Copy link

+1

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Apr 29, 2016
…y dsl.

Fix a limitation that prevent from hierarchical inner hits be defined in query dsl.

Removed the nested_path, parent_child_type and query options from inner hits dsl. These options are only set by ES
upon parsing the has_child, has_parent and nested queries are using their respective query builders.

These options are still used internally, when these options are set a new private copy is created based on the
provided InnerHitBuilder and configuring either nested_path or parent_child_type and the inner query of the query builder
being used.

Closes elastic#11118
@lwiskowski lwiskowski mentioned this issue May 5, 2016
@abhishek5678
Copy link

Hi Everyone,
where is document data

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Inner Hits labels Feb 14, 2018
@waynehamadi
Copy link

It looks like this issue is still unsolved, at least in elasticsearch 7.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

No branches or pull requests