
Nested Object/Docs Mapping and Searching #1095

Closed
kimchy opened this Issue July 06, 2011 · 8 comments

5 participants

Shay Banon Clinton Gormley Paul Pearcy Jörg Prante Philippe Green
Shay Banon
Owner
kimchy commented July 06, 2011

Nested objects/documents allow mapping certain sections of an indexed document as nested, so that they can be queried as if they were separate docs and then joined with the parent doc that owns them.

Note: this feature is experimental, and using it might require reindexing the data.

One of the problems when indexing inner objects that occur several times in a doc is that "cross object" search matches will occur. For example:

{
    "obj1" : [
        {
            "name" : "blue",
            "count" : 4
        },
        {
            "name" : "green",
            "count" : 6
        }
    ]
}

Searching for name set to blue and count higher than 5 will match the doc, because in the first element the name matches blue, and in the second element, count matches "higher than 5".
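
To make the cross-object match concrete, here is a minimal Python sketch. It is not how the index actually evaluates queries; the helper names (`flatten`, `matches_flattened`, `matches_nested`) are made up for illustration, and it assumes plain inner objects behave roughly like per-field value lists.

```python
# The document from the example above.
doc = {
    "obj1": [
        {"name": "blue", "count": 4},
        {"name": "green", "count": 6},
    ]
}


def flatten(doc, field):
    """Collapse an array of inner objects into per-field value lists.

    Assumption: this roughly models how plain (non-nested) inner objects
    end up in the index, with the object boundaries lost.
    """
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(doc):
    """name == blue AND count > 5, checked against the flattened fields."""
    flat = flatten(doc, "obj1")
    return "blue" in flat["obj1.name"] and any(c > 5 for c in flat["obj1.count"])


def matches_nested(doc):
    """Same conditions, but both must hold within a single inner object."""
    return any(o["name"] == "blue" and o["count"] > 5 for o in doc["obj1"])


print(matches_flattened(doc))  # True: the cross-object match
print(matches_nested(doc))     # False: no single object satisfies both
```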

Nested Mapping

Nested mapping allows mapping certain inner objects (usually multi-instance ones) as nested. For example:

{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}

The above will cause all obj1 objects to be indexed as nested docs. The mapping is similar in nature to setting type to object, except that it's nested.

The nested object fields can also be automatically added to the immediate parent by setting include_in_parent to true, and also included in the root object by setting include_in_root to true.
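
As a rough illustration of where those two flags would sit, the sketch below builds the mapping body in Python and prints it as JSON. The `nested_mapping` helper is hypothetical, and placing the flags next to `type` is an assumption based on the suggestion made later in this thread.

```python
import json


def nested_mapping(type_name, field, include_in_parent=False, include_in_root=False):
    """Build a mapping body that marks `field` as nested.

    Hypothetical helper: the flag placement (alongside "type") is assumed.
    """
    return {
        type_name: {
            "properties": {
                field: {
                    "type": "nested",
                    "include_in_parent": include_in_parent,
                    "include_in_root": include_in_root,
                }
            }
        }
    }


# Also copy the obj1 fields into the immediate parent object.
mapping = nested_mapping("type1", "obj1", include_in_parent=True)
print(json.dumps(mapping, indent=4))
```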

Nested docs will also automatically use the root doc _all field.

Nested Queries

Nested queries allow searching within nested docs, returning the root parent doc (a join). For example:

{
    "nested" : {
        "path" : "obj1",
        "score_mode" : "avg",
        "query" : {
            "bool" : {
                "must" : [
                    {"text" : {"obj1.name" : "blue"}},
                    {"range" : {"obj1.count" : {"gt" : 5}}}
                ]
            }
        }
    }
}

The path points to the nested object path. The query (or filter) is run against the nested docs that match that path directly, and the matches are then joined with their root parent docs.

The score_mode sets how the scores of matching inner children affect the score of the parent. It defaults to avg, but can also be total, max, or none.

Multi-level nesting is supported and detected automatically: an inner nested query that sits within another nested query automatically matches the relevant nesting level (and not the root).
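
A small Python sketch of how the modes could combine child scores into one parent score. The `parent_score` function is hypothetical, and the value returned for none is an assumption (the text above only says the mode exists, not what score the parent gets).

```python
def parent_score(child_scores, score_mode="avg"):
    """Combine the scores of matching nested docs into one parent score.

    Illustrative only; not taken from the actual implementation.
    """
    if not child_scores:
        raise ValueError("a parent only matches if at least one child matched")
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    if score_mode == "total":
        return sum(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "none":
        # Assumption: child scores are ignored; 0.0 is a placeholder constant.
        return 0.0
    raise ValueError(f"unknown score_mode: {score_mode!r}")


scores = [1.0, 3.0]
for mode in ("avg", "total", "max", "none"):
    print(mode, parent_score(scores, mode))
```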

Internal Implementation

Internally, nested objects are indexed as additional documents. Since they are guaranteed to be indexed within the same "block", joining them with parent docs is extremely fast.

Those internal nested documents are automatically masked away when doing operations against the index (like searching with a match_all query), and they bubble out when using the nested query.
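
The block idea can be sketched in Python as a single pass over one contiguous list of docs. This is a toy model, not the real index layout: it assumes the nested docs of a root doc sit directly before it in the block, and the names `segment`, `visible_docs` and `join_to_roots` are invented for the example.

```python
# Toy "segment": nested docs are assumed to be stored right before their root doc.
segment = [
    ("nested", {"name": "blue", "count": 4}),
    ("nested", {"name": "green", "count": 6}),
    ("root", {"_id": "1"}),
    ("nested", {"name": "blue", "count": 9}),
    ("root", {"_id": "2"}),
]


def visible_docs(segment):
    """Regular operations (e.g. match_all) only see root docs; nested docs are masked."""
    return [payload for kind, payload in segment if kind == "root"]


def join_to_roots(segment, predicate):
    """Return the _id of every root doc with at least one matching nested doc.

    One forward pass is enough because each block is contiguous.
    """
    hits = []
    block_matched = False
    for kind, payload in segment:
        if kind == "nested":
            block_matched = block_matched or predicate(payload)
        else:
            if block_matched:
                hits.append(payload["_id"])
            block_matched = False  # the next block starts fresh
    return hits


hits = join_to_roots(segment, lambda o: o["name"] == "blue" and o["count"] > 5)
print(len(visible_docs(segment)))  # 2
print(hits)                        # ['2']
```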

Left Overs

There are still many things to do before this is complete. The most important one is to have a good story around faceting and nested docs (it will really enhance it as well, especially for key and value based facets).

Shay Banon kimchy closed this in 3a7f766 July 06, 2011
Clinton Gormley
Owner

This is awesome!

Perhaps clarify that root_and_nested only differs from object_and_nested when there are multiple levels to the hierarchy.

Maybe parent_and_nested is a better name than object_and_nested? Or even just have nested and have a flag inside the mapping like

 {foo: { 
    type: "nested", 
    include_in_parent: true/false, 
    include_in_root: true/false
}}
Shay Banon
Owner
kimchy commented July 06, 2011

I like the include thingy, updated the issue and pushing support for that.

Paul Pearcy

Very cool!!!

Curious, will this be able to support the popularity use case? This would be where each document has nested popularity values that can be updated independently of the parent doc. Or is it required to update the parent and nested docs at the same time?

At first glance, I don't think so, since I don't see how you can refer to the _id of the nested doc.

Thanks for all the awesome work!

Shay Banon
Owner
kimchy commented July 06, 2011

No, it won't support that, since it only works from a single doc, and you need to reindex the document. That's another problem that parent child mapping tries to solve a bit, though not perfectly.

Jörg Prante

This is a good thing. I know the separation of regions within a document as "scope search", seen in a product of a former Norwegian enterprise search vendor. But the solution provided here seems more elegant and has even fewer limitations.

Will nested queries with "bool should" instead of "bool must" work as expected?

Shay Banon
Owner
kimchy commented July 10, 2011

@jprante: Yes, any type of query can work "inside" a nested query.

Jörg Prante

A nested query use case might be author name search in a library catalog. Imagine a book title written by two authors, "Smith, Joe" and "Jones, Peter". Searching for "joe jones" or "peter smith" will no longer return false hits. Instead, "joe smith" or "peter jones" will deliver the correct results when using a nested query on the parent of the forename/surname pair.
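
A sketch of what such a query might look like, built in Python and printed as JSON in the same shape as the nested query from the issue description. The field names (`authors`, `authors.forename`, `authors.surname`) and the `author_query` helper are hypothetical.

```python
import json


def author_query(forename, surname, path="authors"):
    """Build a nested query requiring both name parts on the same author.

    Hypothetical helper and field names, for illustration only.
    """
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"text": {f"{path}.forename": forename}},
                        {"text": {f"{path}.surname": surname}},
                    ]
                }
            },
        }
    }


query = author_query("joe", "smith")
print(json.dumps(query, indent=4))
```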

Philippe Green

Was curious if this feature is still experimental? In what scenarios might one need to reindex data when using this feature?

Thanks!
