Return matching nested inner objects per hit #3022

Closed
martijnvg opened this Issue May 10, 2013 · 80 comments

@martijnvg
elastic member

Add support for including the matching nested inner objects per hit element.

@eranid

+1

@roeena

+1

@btiernay

I'm curious about the intended behaviour of this feature:

  • Will it be possible to do a global sort, offset, limit based on properties of the child?
  • Will it be possible to return the matching child AND parent?

The answers to these questions will have implications in how we proceed in implementing our current application.
Thanks!

@brusic

Sorting on nested documents has been supported since the 0.90 release: #2662

Nested queries always return the parent, so I am assuming the behavior will remain the same. Hopefully this feature will have many settings, similar to most other elasticsearch features.

And I hate sounding like a broken record, but can we please stop with the +1s? The elasticsearch team is not influenced by them and they only create noise.

@btiernay

Sorting on nested documents has been supported since the 0.90 release: #2662

By "global sort", I mean without regard to the parent-nested relationship. That is, it would be possible to return sorted children which may not be contiguous with respect to their parent. For example:

Hit 1. nested1,1 -> parent1
Hit 2. nested2,1 -> parent2
Hit 3. nested1,2 -> parent1

Notice how different parents are interleaved.
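To make the "global sort" idea concrete, here is a minimal sketch that flattens each parent's nested objects into independent rows and sorts them without regard to their parent, which produces the interleaving shown above. The `NestedRow` and `GlobalNestedSort` types, the `sortKey` field and the sample values are hypothetical illustrations, not Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical flattened view of one nested object: the parent it belongs to,
// its position in that parent's nested array, and the value used for sorting.
final class NestedRow {
    final int parent;
    final int offset;
    final int sortKey;

    NestedRow(int parent, int offset, int sortKey) {
        this.parent = parent;
        this.offset = offset;
        this.sortKey = sortKey;
    }

    // Renders the row like the example above, using 1-based positions.
    String label() {
        return "nested" + parent + "," + (offset + 1) + " -> parent" + parent;
    }
}

final class GlobalNestedSort {
    // Sorts nested rows by sortKey alone, ignoring which parent each row came from,
    // so rows belonging to different parents can end up interleaved.
    static List<String> sortGlobally(List<NestedRow> rows) {
        List<NestedRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt(r -> r.sortKey));
        List<String> labels = new ArrayList<>();
        for (NestedRow row : sorted) {
            labels.add(row.label());
        }
        return labels;
    }
}
```

With sort keys 10 and 30 for parent1's two nested objects and 20 for parent2's single one, `sortGlobally` yields exactly the interleaved order listed above.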

Nested queries always returns the parent so I am assuming the behavior will remain the same. Hopefully this feature will have many settings, similar to most other elasticsearch features.

It would be nice to have flexibility here as you describe.

And I hate sounding like a broken record, but can we please stop with the +1s? The elasticsearch team is not influenced by them and they only create noise.

Message received, sorry about that.

@brusic

IMHO, your use case is better suited for parent/child documents and not nested documents. The way I see things is that inner/nested documents always form a single document with the outer/parent document. The inner/nested documents never appear separately. This feature breaks that model slightly by not returning certain nested documents, but the parent is always the same. Of course, I do not work for elasticsearch so my views and thoughts have no bearing on the issue. :)

BTW, there was nothing wrong with your comment. Adding discussion to an issue via a concrete use case provides value and is the type of comment we should be seeing. A comment with nothing but +1 does not provide value. Perhaps I should just create an email filter that ignores github messages with only +1.

@eranid

Parent-Child has the problem of using A LOT of memory for the joins.
I was using it at first, but as the index grew to hundreds of GB, it became a memory and CPU monster.
Since most of my queries are "get me the photos that were tagged with certain tags with some value in a range of dates" (the nested document is the tag),
I have to use either parent-child or nested.

Since there might be lots of tags per photo, I want to get just the relevant tags (don't care about getting the parent really, though I'd rather not).

Parent-Child just can't handle this. With 7GB of memory, the machine takes forever to do the joins, and sometimes crashes.

Also, I did not know the +1 was a bother. I thought it helped you guys prioritize features.
My apologies. Will spread the word.

@brusic

I never said parent-child was efficient, just that its functionality is better suited to your use case. :) Even if nested documents eventually supported your use case, the overhead of sorting would also make it grossly inefficient. Each parent document would need to be scored several times.

As far as +1 goes, there has been some discussion about them. There are a few issues that are 2-3 years old that have hundreds of +1s. You can make the judgement if they are effective or not. I am not on the elasticsearch team so everyone should follow their advice on proper github etiquette and not mine. :)

@btiernay

Even if nested documents eventually supported your use case, the overhead of sorting will also be it grossly inefficient. Each parent document would need to be scored several times.

This may be true given what lucene currently supports for BlockJoinQuery and BlockJoinCollector. This is a good article describing the basics: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html?m=1

The join can currently only go in one direction (mapping child docIDs to parent docIDs), but in some cases you need to map parent docIDs to child docIDs. For example, when searching songs, perhaps you want all matching songs sorted by their title. You can't easily do this today because the only way to get song hits is to group by album or band/artist.
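The one-directional join described in that article follows from how Lucene lays out a document block: the children are indexed immediately before their parent, and a bitset marks which doc IDs are parents. The minimal sketch below uses `java.util.BitSet` as a stand-in for Lucene's parent filter; the `BlockJoinLayout` class and the doc IDs are illustrative only, and the sketch models just the doc ID arithmetic, not scoring or collection.

```java
import java.util.BitSet;

// Illustrative model of a block-join segment: children are indexed directly
// before their parent, and `parents` has a bit set for every parent doc ID.
final class BlockJoinLayout {
    private final BitSet parents;

    BlockJoinLayout(BitSet parents) {
        this.parents = parents;
    }

    // Child -> parent: the parent is the next parent bit at or after the child.
    int parentOf(int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    // Parent -> children: every doc ID after the previous parent and before this one.
    int[] childrenOf(int parentDoc) {
        int previousParent = parents.previousSetBit(parentDoc - 1); // -1 if none
        int first = previousParent + 1;
        int[] children = new int[parentDoc - first];
        for (int i = 0; i < children.length; i++) {
            children[i] = first + i;
        }
        return children;
    }
}
```

For a segment where docs 0 and 1 are children of parent 2, and docs 3 to 5 are children of parent 6, `parentOf(4)` is 6 and `childrenOf(6)` is `{3, 4, 5}`.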

@martijnvg
elastic member

@btiernay @brusic The idea is that the nested inner objects hits are included in the root doc hit. Something like this:

"hits" : [ {
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_score" : 1.584377,
  "_source" : ...,
  "nested_hits" : {
    "total" : 2,
    "max_score" : 1.6391755,
    "hits" : [ {
      "_offset" : 1,
      "_score" : 1.6391755,
      "_source" : ...
    }, {
      "_offset" : 0,
      "_score" : 1.5295786,
      "_source" : ...
    } ]
  }
} ]

In the above case, _offset is the position of the nested inner object in the nested field's array in the _source.
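Because `_offset` is just the array position inside the root document's `_source`, a client could recover each nested hit's object from the root source. A minimal sketch, assuming the source has already been parsed into plain `Map`/`List` values (as a JSON library would produce) and that the nested field sits at the top level of the source; `NestedOffsets.resolveNested` is a hypothetical helper, not part of any client API.

```java
import java.util.List;
import java.util.Map;

final class NestedOffsets {
    // Returns the nested object at `offset` in the array stored under `nestedField`
    // in a parsed root `_source`. Assumes a top-level (non-dotted) nested field.
    @SuppressWarnings("unchecked")
    static Map<String, Object> resolveNested(Map<String, Object> rootSource,
                                             String nestedField, int offset) {
        Object value = rootSource.get(nestedField);
        if (!(value instanceof List)) {
            throw new IllegalArgumentException("no nested array under " + nestedField);
        }
        List<Object> array = (List<Object>) value;
        return (Map<String, Object>) array.get(offset);
    }
}
```

In the proposed response, the best nested hit has `_offset` 1, so it would resolve to the second element of the root's nested array.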

It should be possible to specify a global sort and a sort inside the root document and what to show per nested hit (the complete inner object based on the source or just some fields). In addition supporting highlighting and other per hit features makes a lot of sense as well.

@eranid The memory usage of parent/child has been reduced in the new 0.90.1 version. Hopefully parent/child queries now work better in your environment.

@brusic

@martijnvg, so the full source will still be returned? The nested hits is a great idea in terms of flexibility and makes more sense than editing the source (which I referred to above in "breaking the model"), I just hope that it is efficient. I have some convoluted logic to deal with filtering nested documents on the client side, and the serialization/deserialization using Jackson is a bit of a performance hit.

Can scoring be avoided on the nested hits results? My use case calls for scoring using the fields in the parent document, but only filtering the nested documents. Not sure if you thought of this scenario, but a flexible scoring model would be a great feature.

@martijnvg
elastic member

@brusic The full source can optionally be returned if requested, but it isn't necessary. The source of the nested inner object will be returned separately, but is based on the source in the root document. The source can also be disabled, and individual fields can be set to stored in the mapping; these individual fields can then be requested instead of the source.

The overhead of fetching inner nested objects should be small. This should be done in the fetch phase (so only for the competitive root docs) by re-executing the inner query of the nested query only on the nested docs of the root docs to be retrieved (a big filter).
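The fetch-phase approach just described can be sketched as: take only the root docs that survived the query phase, run the inner query again over just their nested docs, and keep each root's top matches. In this sketch the inner query is modelled as a `Predicate` plus a scoring function, and a linear scan stands in for the "big filter"; `NestedDoc`, `FetchPhaseNestedHits` and the `size` cutoff are hypothetical simplifications, not how Elasticsearch executes Lucene queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Hypothetical nested doc: its root, its array offset, and a text value to match on.
final class NestedDoc {
    final String rootId;
    final int offset;
    final String value;

    NestedDoc(String rootId, int offset, String value) {
        this.rootId = rootId;
        this.offset = offset;
        this.value = value;
    }
}

final class FetchPhaseNestedHits {
    // For each root doc returned by the query phase, re-applies the inner query to
    // that root's nested docs only, and keeps the best `size` matches by score.
    static Map<String, List<NestedDoc>> fetch(List<String> topRootIds,
                                              List<NestedDoc> allNested,
                                              Predicate<NestedDoc> innerQuery,
                                              ToDoubleFunction<NestedDoc> score,
                                              int size) {
        Map<String, List<NestedDoc>> result = new LinkedHashMap<>();
        for (String rootId : topRootIds) {
            List<NestedDoc> matches = new ArrayList<>();
            for (NestedDoc doc : allNested) {
                if (doc.rootId.equals(rootId) && innerQuery.test(doc)) {
                    matches.add(doc);
                }
            }
            matches.sort(Comparator.comparingDouble(score).reversed());
            result.put(rootId, matches.subList(0, Math.min(size, matches.size())));
        }
        return result;
    }
}
```

The point of the design is that nested docs of roots that did not make it into the page are never looked at, which is why the overhead should stay small.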

Not sure what you mean by avoiding scoring on nested hits. Just use a field from the parent for scoring via sorting by script?

@btiernay

@martijnvg: Very nice proposal. A couple of clarifications:

It should be possible to specify a global sort and a sort inside the root document

When you say "global sort" do you mean global with respect to the root document, or with respect to nested documents? I could see how you might be implying the ability to do either.

...based on the sort or just some fields

I assume you mean "source" not "sort"?

@btiernay

@brusic: With respect to:

The nested hits is a great idea in terms of flexibility and makes more sense than editing the source (which I referred to above in "breaking the model"), I just hope that it is efficient.

I think this really depends on the size and structure of your documents. We have some very large documents (deep and wide) for which the ability to return the nested documents without "editing" the source would be much more efficient.

@clintongormley
elastic member

@eranid to add to what @martijnvg said: up until 0.90.1, parent-child relationships required the parent IDs and child IDs to be held in memory. From 0.90.1 onwards, only the parent IDs need to be held in memory. This is a massive saving and should make parent-child much lighter.

@martijnvg
elastic member

@btiernay The global sort is with respect to the root document. You could use nested sorting as global sorting which will base the ordering of root docs based on aggregate sort values from the nested inner objects.
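Using nested sorting as the global sort can be pictured like this: each root doc gets a single sort value aggregated from its nested objects (here the minimum), and only the root docs are ordered. A minimal sketch with hypothetical data; `NestedSortSketch` is an illustration, not Elasticsearch's implementation, and other aggregate modes such as `max` would follow the same shape.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class NestedSortSketch {
    // Orders root doc IDs ascending by the minimum of their nested values.
    // Roots without any nested values sort last.
    static List<String> sortRootsByMinNested(Map<String, List<Integer>> nestedValuesByRoot) {
        List<String> roots = new ArrayList<>(nestedValuesByRoot.keySet());
        roots.sort(Comparator.comparingInt(root -> minOrLast(nestedValuesByRoot.get(root))));
        return roots;
    }

    private static int minOrLast(List<Integer> values) {
        return values.isEmpty() ? Integer.MAX_VALUE : Collections.min(values);
    }
}
```

Note that the nested objects themselves are never reordered here; they only contribute one value per root, which is why this alone cannot produce the interleaved, per-nested-object ordering discussed earlier.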

I assume you mean "source" not "sort"?

Yes, I meant source.

@martijnvg
elastic member

We definitely want to get this feature in, but in order to get it in right, a refactoring of the fetch phase is needed.
The fetch phase needs to have "a hit in a hit" concept (inner hits), that should cover both nested hits and getting child hits as part of the parent hit. All features that currently work on normal hits like for example explain, highlighting, fields and partial fields should also work for inner hits (if applicable).
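One way to picture the "hit in a hit" concept is a hit type that carries named groups of inner hits, each of which is itself a hit, so per-hit features could in principle apply at every level. The `SearchHitSketch` class, its fields and the `comments` group name used below are hypothetical and are not the structure Elasticsearch ended up with.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical search hit that can contain named groups of inner hits
// (nested objects or children), each of which is again a SearchHitSketch.
final class SearchHitSketch {
    final String id;
    final float score;
    final Map<String, List<SearchHitSketch>> innerHits = new LinkedHashMap<>();

    SearchHitSketch(String id, float score) {
        this.id = id;
        this.score = score;
    }

    // Adds an inner hit under the given group name; returns this for chaining.
    SearchHitSketch addInnerHit(String name, SearchHitSketch hit) {
        innerHits.computeIfAbsent(name, k -> new ArrayList<>()).add(hit);
        return this;
    }

    // Counts this hit plus all inner hits at any depth.
    int totalHits() {
        int total = 1;
        for (List<SearchHitSketch> group : innerHits.values()) {
            for (SearchHitSketch inner : group) {
                total += inner.totalHits();
            }
        }
        return total;
    }
}
```

Because inner hits are the same type as top-level hits, anything written against a hit (highlighting, explain, field loading) would not need a separate code path for them, which is the motivation for the refactoring.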

@btiernay

@martijnvg To be clear, I suppose there would be no way of inverting the relationship to sort globally on nested docs (effectively ignoring the root-nested grouping)? If so, is this due to a limitation imposed by Lucene?

@martijnvg
elastic member

@btiernay You can sort globally based on the nested docs with the current nested sorting support. Global nested sorting won't change when inner hits are added; inner hits only add the ability to sort the nested hits within each root / main document hit. Makes sense?

@btiernay

@martijnvg: Sorry for being so dense here, but it is still unclear if I can return nested docs as the root document using this approach. Then, I would be able to sort by the nested doc, without regard to parents, very similar to how parent-child relationships work.

@martijnvg
elastic member

@btiernay No, with this approach the nested inner objects can't be root documents on their own. Nested inner objects are always part of the root document.

@btiernay

@martijnvg: Thanks again for the clarification. Much appreciated. I realize your answer / solution is consistent with the other aspects of nested docs (e.g. whole-part relationships). However, I'm very curious if my proposal is technically feasible, since I think it could be very powerful and more performant than the alternative parent-child approach.

@martijnvg
elastic member

@btiernay I think your idea is technically possible. Right now inner nested objects don't have a unique identifier like regular root documents do. In theory we could combine the path and the offset in the nested array with the root document's unique identifier to form a unique key for the inner nested object.
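The identifier scheme suggested here could look like the following sketch, which derives a key for a nested object from the root document's `_id`, the nested path and the array offset. The `#` and `[...]` separators and the `NestedObjectKey` name are arbitrary choices made for illustration, and the sketch assumes the root id and path never contain those characters.

```java
import java.util.Objects;

// Hypothetical key for a nested inner object: root _id + nested path + array offset.
final class NestedObjectKey {
    final String rootId;
    final String path;
    final int offset;

    NestedObjectKey(String rootId, String path, int offset) {
        this.rootId = Objects.requireNonNull(rootId);
        this.path = Objects.requireNonNull(path);
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        this.offset = offset;
    }

    // e.g. root "1", path "nested", offset 2 -> "1#nested[2]"
    String asString() {
        return rootId + "#" + path + "[" + offset + "]";
    }

    static NestedObjectKey parse(String key) {
        int hash = key.indexOf('#');
        int open = key.lastIndexOf('[');
        int close = key.lastIndexOf(']');
        if (hash < 0 || open < hash || close != key.length() - 1) {
            throw new IllegalArgumentException("not a nested key: " + key);
        }
        return new NestedObjectKey(key.substring(0, hash), key.substring(hash + 1, open),
                Integer.parseInt(key.substring(open + 1, close)));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NestedObjectKey)) return false;
        NestedObjectKey other = (NestedObjectKey) o;
        return offset == other.offset && rootId.equals(other.rootId) && path.equals(other.path);
    }

    @Override
    public int hashCode() {
        return Objects.hash(rootId, path, offset);
    }
}
```

Such a key is only stable as long as the root document is not reindexed with a reordered or shortened nested array, which is the lifecycle coupling described in the next paragraph.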

Also inner objects are tightly coupled to the lifecycle of the root document. If a root document is removed all the nested inner objects (which are stored as separate Lucene documents) are removed as well. Updating or adding individual nested inner objects isn't possible without reindexing the root document and all other nested inner objects (Lucene document block). If nested inner objects were exposed as independent hits in the search result, I guess the fact that these hits have limitations would be confusing.

@btiernay

That gives me hope then :)

we could use the path + the offset in the nested array as additional data to the root documents's unique identifier for the inner nested object's unique key

That's an interesting idea. I hadn't thought about the id field. I like it :)

If nested inner objects were exposed as independent hits in the search result, I guess the fact that these hits have limitations would be confusing.

Perhaps, but consider "write once" applications in which documents rarely (if ever) change. Given the potential speedup / memory improvements that can be achieved using block documents (especially for deeply nested or wide documents), it would be a shame not to expose this functionality.

@martijnvg martijnvg was assigned Aug 27, 2013
@julianhille

Any progress on this one? I'd love to see this.
Otherwise, is there an estimated time, or any way to help out?

@GabrielKast

I would also like to know if there is any progress on that feature. Any way we could help out?
I have more or less the same use case as described. I would like to select some children in a tree of data where the children only make sense when they are included in their parent. (The use case: I have a company with many establishments linked to it, and I would like to query/retrieve the establishments based on their geographical position. The position belongs to the children, but all the "good data" is linked to the parent document, i.e. the company.)
I can manage to do something with parent/child, but I need to duplicate some data from the parent to the children and vice versa.
Another way around the issue would be to embed the parent in the results of a "has_parent" query, or to embed the child in the results of a "has_child" query. I know it's not in the scope of this issue, but maybe it's a simpler idea?
I have the intuition that nested_hits would be a faster solution.
Something might also be difficult (I am not familiar with ES internals): how do you compute the "nested_count", i.e. the number of nested hits? Maybe that's more of a parent/child feature.
(Please be kind if I'm a little clumsy; I don't usually post comments on GitHub :) )
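For the company/establishment case, a "nested count" is conceptually just the number of nested objects per root that match the inner query (the proposal earlier in the thread exposes it as `total`). A minimal client-side sketch, assuming establishments are already flattened with their company id and that a simple lat/lon bounding box (ignoring the date line) stands in for the geo query; `Establishment` and `NestedCountSketch` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical nested object: an establishment belonging to a company.
final class Establishment {
    final String companyId;
    final double lat;
    final double lon;

    Establishment(String companyId, double lat, double lon) {
        this.companyId = companyId;
        this.lat = lat;
        this.lon = lon;
    }
}

final class NestedCountSketch {
    // Counts, per company, the establishments inside a lat/lon bounding box.
    // Companies with no matching establishment are omitted, mirroring a nested
    // query that drops roots whose nested objects do not match.
    static Map<String, Integer> countInBox(List<Establishment> establishments,
                                           double top, double left, double bottom, double right) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Establishment e : establishments) {
            boolean inside = e.lat <= top && e.lat >= bottom && e.lon >= left && e.lon <= right;
            if (inside) {
                counts.merge(e.companyId, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```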

@gpstathis

+1, will be great to have this, right now we are using nested queries and have to filter the sub-docs at the app layer.

@martijnvg
elastic member

@julianhille @GabrielKast @gpstathis I'm working on this feature. I have an implementation that works for nested inner objects: https://github.com/martijnvg/elasticsearch/commits/nested_fields

The goal is to put this into a more generic framework (in FetchPhase), so that the notion of the inner hits also works with parent/child and that an inner hit has the same set of fetch features as a normal hit (for example highlighting and matched queries, explain etc).

If you take a look at the NestedHitsTests test in the mentioned branch, you can see how it can be used.

There is no ETA for this feature yet.

@gpstathis

@martijnvg this goes above and beyond what I was hoping to see. Since the nested hits are sorted, having their original offset in the source array makes total sense and is a nice touch.

The only thing I'm having a hard time following (and it's probably due to my lack of understanding of the internals) are the key values for the Map returned by nestedHits(). E.g.

[...]
assertThat(response.getHits().getHits()[1].nestedHits().get("1").getHits().length, equalTo(1));
[...]

Is the key string value "1" an internal field name? Apologies in advance if there is an obvious answer to this that I'm missing.

@brusic

"martijnvg authored 10 months ago"

You are breaking my heart! :)

How does the new aggregation framework tie into this feature? Not an issue or are you designing around it?

@martijnvg
elastic member

@gpstathis That is just a label that can be given to a nested field (like a facet name). In this example, label 1 just points to the nested query and reuses its inner query and path:

SearchResponse response = client().prepareSearch("test")
      .setQuery(QueryBuilders.nestedQuery("nested", QueryBuilders.termQuery("field2", "value3")))
      .addNestedHit("1", "nested") // 1 is just a label to identify the nested inner objects later on in the response.
      .execute().actionGet();

This will most likely look different once it is in; I think initially without the reusing mechanism (so the path and inner query would need to be specified twice).

@brusic The nested fields feature doesn't have a direct link to aggregations; it just returns the top matching nested inner objects for each hit. Aggregations have a nested aggregator that allows one to aggregate on nested objects.

@gpstathis

Thanks @martijnvg. Clear now.

@brusic

Thanks for the updates. I know there is no ETA for this feature, but is there a possibility for it to be included in 0.90? Not sure when the feature freeze will occur for the 0.90 branch.

@martijnvg
elastic member

Nowadays the transport layer supports versioning, so that shouldn't keep us from adding it to 0.90. If this gets in while 0.90 is still the branch we actively release from, and there is no break in backwards compatibility in the REST layer, then I think it can also be added to the 0.90 branch.

@kimchy
elastic member

Yeah, it's possible, though the way forward for this feature, to me, is the concept of generic inner hits (to any depth), which will apply to nested or child documents, in which case any "fetch" phase logic (like highlighting, ...) can be done on those. For that, we need to restructure how we implement the fetch phase, which is quite a bit of work. With the breadth of work left for 1.0, I think it's safe to say, at least on our end, that we will get to this feature post 1.0.

What @martijnvg did is good, it helped us expose what we need, and realize the need for inner hits concept and the work around fetch phase refactoring.

@mallyadeepak

Is there an ETA for this feature yet ? We are exploring ES to implement custom scoring functions that will require access to nested hits. This doesn't seem to be possible with nested type or the parent/child type. Would appreciate any suggestions on alternative ways to solve this too.

@martijnvg
elastic member

@mallyadeepak There is no ETA for this feature. However, there is a workaround for the parent/child type that allows you to access the child hits for the matched parents: https://speakerdeck.com/mvgroningen/document-relations-at-ossc?slide=41

@GabrielKast

Hi Martijn,
What is the "multi search api workaround" that you talk about in your slides ?
Thx

@martijnvg
elastic member

@GabrielKast It allows you to retrieve the top matching child hits that contributed to the score of each of the returned parent hits via an extra subsequent request.
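The slides are not reproduced in this thread, so the following only sketches the client-side half of such a two-request approach, under the assumption that the follow-up request returns scored child hits that carry their parent's id. It groups those child hits under the parents from the first response, in parent order, keeping the top few per parent. `ChildHit` and `TopChildrenPerParent` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical child hit from the follow-up request.
final class ChildHit {
    final String id;
    final String parentId;
    final double score;

    ChildHit(String id, String parentId, double score) {
        this.id = id;
        this.parentId = parentId;
        this.score = score;
    }
}

final class TopChildrenPerParent {
    // Groups child hits under the parent hits of the first request (in parent order),
    // keeping at most `perParent` children per parent by descending score.
    static Map<String, List<ChildHit>> group(List<String> parentIds,
                                             List<ChildHit> childHits, int perParent) {
        Map<String, List<ChildHit>> grouped = new LinkedHashMap<>();
        for (String parentId : parentIds) {
            grouped.put(parentId, new ArrayList<>());
        }
        for (ChildHit child : childHits) {
            List<ChildHit> bucket = grouped.get(child.parentId);
            if (bucket != null) { // ignore children of parents not on this page
                bucket.add(child);
            }
        }
        for (List<ChildHit> bucket : grouped.values()) {
            bucket.sort(Comparator.comparingDouble((ChildHit c) -> c.score).reversed());
            if (bucket.size() > perParent) {
                bucket.subList(perParent, bucket.size()).clear();
            }
        }
        return grouped;
    }
}
```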

@Yasaswani

When including the nested hits patch, the nested hits are returned, but other Elasticsearch test cases fail (test cases run from Maven when packaging the jar). The exceptions look like the one below. Is there any fix for this issue?

at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IndexOutOfBoundsException: Invalid combined index of 361, maximum is 358
at org.jboss.netty.buffer.SlicedChannelBuffer.(SlicedChannelBuffer.java:46)
at org.jboss.netty.buffer.HeapChannelBuffer.slice(HeapChannelBuffer.java:201)
at org.elasticsearch.transport.netty.ChannelBufferStreamInput.readBytesReference(ChannelBufferStreamInput.java:56)
at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:88)
at org.elasticsearch.common.io.stream.AdapterStreamInput.readBytesReference(AdapterStreamInput.java:64)
at org.elasticsearch.search.fetch.nested.NestedSearchHit.readFrom(NestedSearchHit.java:80)
at org.elasticsearch.search.fetch.nested.NestedSearchHit.read(NestedSearchHit.java:72)
at org.elasticsearch.search.fetch.nested.NestedSearchHits.readFrom(NestedSearchHits.java:70)
at org.elasticsearch.search.fetch.nested.NestedSearchHits.read(NestedSearchHits.java:59)
at org.elasticsearch.search.internal.InternalSearchHit.readFrom(InternalSearchHit.java:652)
at org.elasticsearch.search.internal.InternalSearchHit.readSearchHit(InternalSearchHit.java:520)
at org.elasticsearch.search.internal.InternalSearchHits.readFrom(InternalSearchHits.java:219)
at org.elasticsearch.search.internal.InternalSearchHits.readFrom(InternalSearchHits.java:199)
at org.elasticsearch.search.internal.InternalSearchHits.readSearchHits(InternalSearchHits.java:193)
at org.elasticsearch.search.internal.InternalSearchResponse.readFrom(InternalSearchResponse.java:109)
at org.elasticsearch.search.internal.InternalSearchResponse.readInternalSearchResponse(InternalSearchResponse.java:103)
at org.elasticsearch.action.search.SearchResponse.readFrom(SearchResponse.java:230)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:146)
... 23 more
Throwable #2: java.lang.RuntimeException: Unclosed Searchers instance for shards: [[test][1],[test][0],[test][2],]
at org.elasticsearch.test.ElasticsearchTestCase.ensureAllSearchersClosed(ElasticsearchTestCase.java:165)
at org.elasticsearch.test.ElasticsearchIntegrationTest.after(ElasticsearchIntegrationTest.java:245)
at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1558)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:794)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:745)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:647)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:681)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:692)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:722)

@martijnvg
elastic member

@Yasaswani Yes, that is a bug in the serialisation layer. I've fixed it and rebased the branch, so it should be fixed now.

@Yasaswani

Do we have a patch for the fix?

@Yasaswani

We also noticed that for the method addNestedHit(String name, String path, QueryBuilder childQuery, int offset, int size, SortBuilder sort, String... fields), when we pass a list of fields to be returned, it always returns only the last field. We fixed this in NestedhitsParseElement:
if (fields == null) {
    fields = Lists.newArrayList(parser.text());
} else {
    fields.add(parser.text());
}
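The accumulation logic in the snippet above can be exercised in isolation. This sketch replaces the parser with a plain list of field-name tokens and contrasts it with a presumed overwriting variant (a guess at the original behaviour, not the actual branch code) that reproduces the "only the last field" symptom. `FieldsParsingSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldsParsingSketch {
    // Presumed buggy variant: each field name replaces the previous list.
    static List<String> overwrite(List<String> tokens) {
        List<String> fields = null;
        for (String token : tokens) {
            fields = new ArrayList<>();
            fields.add(token);
        }
        return fields;
    }

    // Fixed variant, mirroring the patch: create the list once, then append.
    static List<String> accumulate(List<String> tokens) {
        List<String> fields = null;
        for (String token : tokens) {
            if (fields == null) {
                fields = new ArrayList<>();
            }
            fields.add(token);
        }
        return fields;
    }
}
```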

Is there a fix available for this issue ?

@Yasaswani

The below test suite fails even after the serialization fix
Suite: org.elasticsearch.cluster.SpecificMasterNodesTests
ERROR 17.9s | SpecificMasterNodesTests.testAliasFilterValidation <<<

Throwable #1: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to parse filter for [a_test]
at __randomizedtesting.SeedInfo.seed([7AA6B3D372CC8C9D:F7FC15B48589BAD0]:0)
at org.elasticsearch.cluster.metadata.MetaDataIndexAliasesService$1.execute(MetaDataIndexAliasesService.java:168)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
at org.elasticsearch.index.query.NestedFilterParser.parse(NestedFilterParser.java:156)
at org.elasticsearch.index.query.QueryParseContext.executeFilterParser(QueryParseContext.java:279)
at org.elasticsearch.index.query.QueryParseContext.parseInnerFilter(QueryParseContext.java:260)
at org.elasticsearch.index.query.IndexQueryParserService.parseInnerFilter(IndexQueryParserService.java:273)
at org.elasticsearch.cluster.metadata.MetaDataIndexAliasesService$1.execute(MetaDataIndexAliasesService.java:163)
... 5 more

Why is the SearchContext null in NestedFilterParser?

@martijnvg
elastic member

@Yasaswani When adding an index alias with filter there is just no search context available, this is an issue that should be fixed. This error only happens when you add an index alias with a nested filter in this branch, during a normal search this error shouldn't happen.

The branch I mentioned is experimental, just to prove that returning nested hits is possible. It hasn't been thoroughly tested yet, so there may be other bugs hidden in this branch as well.

@koombal

@gpstathis Can I ask how you accomplish further filtering in the app layer? We are getting the parent docs, and we need both the parent docs and their matching nested docs... How would you suggest we get the right nested docs in the app layer?

@btiernay

"the app layer"++

@gpstathis

@koombal: we query child doc ids and for each matching parent doc, we discard any child docs that don't match the child ids we used in the query. Works for us because we use ids for search terms but your mileage may vary depending on how complex your own queries are.
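The id-based filtering described above amounts to a set-membership check over each parent's nested array. A minimal sketch, assuming each nested object in the parsed `_source` carries its own id under a field whose name is passed in; `AppLayerNestedFilter.keepQueried` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AppLayerNestedFilter {
    // Keeps only the nested objects whose id was one of the ids used in the query,
    // discarding the non-matching siblings returned with the parent document.
    static List<Map<String, Object>> keepQueried(List<Map<String, Object>> nested,
                                                 String idField, Set<?> queriedIds) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> child : nested) {
            if (queriedIds.contains(child.get(idField))) {
                kept.add(child);
            }
        }
        return kept;
    }
}
```

As noted, this only works when the query terms are ids; for richer inner queries the client would have to re-implement the query logic, which is what the requested feature avoids.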

@koombal

@martijnvg is there any news regarding the ETA of this feature? We already have a complex document structure in a NoSQL DB that syncs with ES through a river. To support the parent/child model we would have to make a lot of modifications to our app. The only solution we came up with was to use the search results as input for a new search using an in-memory Lucene index (RAMDirectory), where we would still have to keep references to the holding objects...
Any advice will be appreciated

@addedsparkle

Also extremely interested in when this feature would be available.

@voleg

+1

@brusic

Since #7164 has been merged, where does that leave this issue?

@martijnvg
elastic member

@brusic It is getting close. Work is being done on a PR that adds inner_hits for including nested inner objects / children hits in regular search hits.

@pspanja

+1

@cphoover

@martijnvg this feature would be super useful for my use case. We have products that contain an array of material subdocuments with attributes attached to those materials (price, title, color... etc). We need the ability to be able to see results on both the product and material level.

Any word on a timeline for this "inner_hits" feature?

For now I am contemplating having two types: a rolled_up product type and a material type. Search would then entail two queries: one for the matching style, then one for the materials whose style code matches the first query.

@andrerom

In our case we have a CMS with (like most other such systems) a model of Content -< Location, and we would like to be able to search on content as well as locations without having to index twice.

A potentially tricky thing is how this feature would work when searching for the nested documents (Locations) and getting hits for several of them. Ideally, in our case, we would prefer several search hits (Content) with corresponding inner object hits (Location), so that sorting is correct on the Elasticsearch side.

@brusic

@martijnvg Thanks for all the hard work. What is the current PR that is being worked on? I would like to try out some development branches. I'm hoping to see a 1.5 tag someday. :)

@martijnvg
elastic member

@cphoover @andrerom @brusic I opened a PR for this feature: #8153

I think the PR is in a good state and it is currently in the review state.

@cphoover

Thank you @martijnvg would love to see this PR land, as it would be perfect for our use case, and I'm sure, many others' as well.

@s1monw

+1

@brusic

Well, if @s1monw +1ed the issue, then it must be important. Never mind my constant pestering. :)

@onuralp

+1

@martijnvg martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Dec 2, 2014
@martijnvg martijnvg Added `inner_hits` feature that allows to include nested hits.
Inner hits allows to embed nested inner objects, children documents or the parent document that contributed to the matching of the returned search hit as inner hits, which would otherwise be hidden.

Closes #8153
Closes #3022
Closes #3152
d7e224d
@martijnvg martijnvg closed this in #8153 Dec 2, 2014
@martijnvg martijnvg added a commit that referenced this issue Dec 2, 2014
@martijnvg martijnvg Added `inner_hits` feature that allows to include nested hits.
Inner hits allows to embed nested inner objects, children documents or the parent document that contributed to the matching of the returned search hit as inner hits, which would otherwise be hidden.

Closes #8153
Closes #3022
Closes #3152
025c82c
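For readers arriving here: with the inner_hits feature, a nested query can ask for its matching nested objects by adding an `inner_hits` object next to `path` and `query`. The sketch below only assembles such a search body as a string; the path `comments`, the field `comments.author` and the value are hypothetical, values are not JSON-escaped, and it assumes a release that contains this commit (to my understanding inner_hits first shipped in 1.5.0, so a 1.4.x cluster will reject it).

```java
final class InnerHitsRequestSketch {
    // Builds a search body with a nested query that also requests inner hits.
    // No JSON escaping is done: path, field and value must be plain strings.
    static String nestedQueryWithInnerHits(String path, String field, String value) {
        return "{\"query\":{\"nested\":{"
                + "\"path\":\"" + path + "\","
                + "\"query\":{\"match\":{\"" + field + "\":\"" + value + "\"}},"
                + "\"inner_hits\":{}"
                + "}}}";
    }
}
```

An empty `inner_hits` object relies on defaults; each returned hit should then carry the matching nested objects alongside the root document.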
@brusic
@ricardo-silveira

Sorry, the topic was huge and I couldn't read it all. Do you mean that I can make a query and return the nested documents instead of the main doc? So far I have a workaround: I use _source to help myself and I plug some Python into the mix...

@ricardo-silveira

Are you sure?

I get the following error:

nested: QueryParsingException[[crawler_2015-04-14] [nested] filter does not support [inner_hits]]; }]",
"status": 400

@brusic
@ricardo-silveira

In my case I am using the version 1.4.4...

We were using 0.9, we have just migrated to 1.4, and now you tell me that this feature is available in a newer release? :(
