Typeless parent/child #20257
Comments
I'm not sure we need to retain the notion of types, maybe we could go with something like below? You say that
Something else that made me wonder when reading your proposal is that users generally do not like modifying the structure of their documents for specifying metadata (about the join in that case), so maybe it should remain a meta field, something like below?
|
When thinking about this refactoring I thought we needed to distinguish between the different documents (in case of multiple join fields), but when answering this question I realize we don't :) ,
The other reason I moved away from metadata is that besides the If we can make metadata completely pluggable (from rest layer to field mapper layer) then we specify the required parameter in the url and having the ability to isolate the p/c code. But I don't feel that this should be a requirement for this refactoring. Although if we really want this we do have the time to develop this. |
This is a very good point. |
I think we should do this. It is already a deficiency in the pluggability of Metadata fields. And I like @jpountz suggested api. |
Hi All, Interesting discussion, I'm jumping in with a couple of questions/comments. Right now, as a user, I can independently define the properties of my parent and child documents. With this proposal, when inserting a document, I would set either properties allowed by the parent OR properties allowed by the child, and ES won't have any way to tell whether the properties are 'parent properties' or 'child properties'. Am I correct? Another question: would it be possible to support cross-index parent-child relationships? All the best |
Your assumptions are correct. It is indeed hard to keep feature parity when moving forward, however we think that removing types remains a good trade-off by making Elasticsearch easier to use, easier to understand and possibly faster. Cross index parent-child relationships would not be possible as-is since Elasticsearch relies on the fact that a parent and all its children are on the same shard. |
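To illustrate the same-shard constraint mentioned above, here is a minimal sketch. It assumes a simplified routing function: Elasticsearch hashes the routing value with Murmur3, while `zlib.crc32` below is only a stand-in, and `shard_for` is a hypothetical helper, not an Elasticsearch API. The point is that a child indexed with its parent's id as the routing value is placed on the parent's shard, and that guarantee only holds within a single index.

```python
import zlib


def shard_for(routing: str, num_shards: int) -> int:
    """Pick a shard from a routing value (stand-in hash, not Elasticsearch's Murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


NUM_SHARDS = 5
parent_id = "question-1"

# The parent is routed by its own id (the default routing value).
parent_shard = shard_for(parent_id, NUM_SHARDS)

# The child is routed by the parent's id, not by its own id,
# so both documents are assigned to the same shard of the same index.
child_shard = shard_for(parent_id, NUM_SHARDS)

print(parent_shard == child_shard)  # True
```

Two different indices have independent shard layouts, so the same routing value gives no co-location guarantee across them.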
I would like to add to this that today we use the type as marker of what is a child and what is a parent. Today you could use the same field in both parent and child, so properties alone isn't enough to identify what is what. In this case both types would have the same field defined in their mappings. With the proposed change, the So I think there is no step backwards in terms of functionality, but the way how parent child relationships are defined is just different with types no longer being there. |
I totally understand moving mappings up to the index level (basically removal of types), however we use strict mapping and there will be no way with this proposal that we'll be able to enforce which fields are allowed in a child vs a parent. If my understanding is correct, I'll be able to enforce mapping on the union of my logical children fields with logical parent fields, but if a field is logically "allowed" in a child there is nothing preventing it from also being in a parent document. I can certainly live with these constraints, I just want to be sure I understand and plan for the future. Thanks - C |
After some internal discussions with @jpountz and @martijnvg I'd like to update this issue with the latest status.
So the proposal is as follows: Defining a single parent-child relation:
Defining a parent-child-grand-child relation:
Defining multiple parent-child relations:
With this format the relation between each entity is explicit and we can use the hierarchy to validate join values inside documents. Adding a child to an existing parent:
Indexing a parent question:
Indexing a child answer:
Indexing a grandchild comment:
So the plan here is to have a metadata ParentJoinFieldMapper separated in a module. Then we would migrate @clintongormley WDYT ? |
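The request bodies from this comment were not preserved in the thread. As a rough sketch only, the following builds equivalent bodies using the `join` field syntax from the released Elasticsearch documentation, which may differ from the format proposed at this point in the discussion. The field name `my_join_field`, the helper functions, and the document ids are illustrative assumptions.

```python
import json

JOIN_FIELD = "my_join_field"  # illustrative field name, not mandated by Elasticsearch


def join_mapping(relations: dict) -> dict:
    """Build the `properties` part of a mapping with a single join field."""
    return {"properties": {JOIN_FIELD: {"type": "join", "relations": relations}}}


def parent_doc(name: str, body: dict) -> dict:
    """A root document only states which relation name it belongs to."""
    return {**body, JOIN_FIELD: {"name": name}}


def child_doc(name: str, parent_id: str, body: dict) -> dict:
    """A child (or grandchild) also carries the id of its direct parent."""
    return {**body, JOIN_FIELD: {"name": name, "parent": parent_id}}


# question -> answer -> comment (parent, child, grandchild)
mapping = join_mapping({"question": "answer", "answer": "comment"})
question = parent_doc("question", {"text": "This is a question"})
answer = child_doc("answer", "1", {"text": "This is an answer"})
comment = child_doc("comment", "2", {"text": "This is a comment"})

print(json.dumps(mapping, indent=2))
```

When these bodies are sent to the index API, child and grandchild documents need a routing value equal to the id of the root parent (here `1`) so that the whole family lands on one shard.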
How about replacing
with
It feels more natural to me, but maybe I'm missing something. |
IMO the first option is more visual, you see more clearly the relation tree and you don't have to repeat the join name but I am fine either way. |
The first option is more visual but is somewhat confusing, eg why does
but that encourages users to use several layers of inheritance, which is unwise. I think I prefer the simple version that @jpountz suggested. In the case that a parent has multiple child types, it could be:
Question: Does the parent document need to know about which join field to use? If a document is neither a parent nor a child, does the join field still contain a value, or is it null? |
Right and since most of the use cases are for a single parent-child relation the syntax would be the same anyway:
Yes because we use a different docvalue field for each "parent=>child" relation.
It is not required to add a join field in this case so the field can be missing in the document. |
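The rules discussed here (a parent may have several child names, children must point at a parent, and a document that is neither may omit the join field) can be sketched as a small validator. This is an illustration under assumptions, not the ParentJoinFieldMapper logic: `validate_join` and `child_names` are hypothetical helpers, and the `relations` shape follows the released `join` field syntax.

```python
from typing import Dict, List, Optional, Set, Union

Relations = Dict[str, Union[str, List[str]]]


def child_names(relations: Relations) -> Set[str]:
    """Collect every name that appears as a child of some parent."""
    names: Set[str] = set()
    for children in relations.values():
        names.update([children] if isinstance(children, str) else children)
    return names


def validate_join(relations: Relations, join_value: Optional[dict]) -> None:
    """Raise ValueError if a document's join value does not fit the relations."""
    if join_value is None:
        return  # neither parent nor child: the field may simply be missing

    name = join_value.get("name")
    children = child_names(relations)
    if name not in relations and name not in children:
        raise ValueError(f"unknown join name: {name!r}")

    has_parent = "parent" in join_value
    if name in children and not has_parent:
        raise ValueError(f"{name!r} is a child and needs a parent id")
    if name not in children and has_parent:
        raise ValueError(f"{name!r} is a root parent and cannot have a parent id")


# One parent with two child names.
relations = {"question": ["answer", "comment"]}
validate_join(relations, None)
validate_join(relations, {"name": "question"})
validate_join(relations, {"name": "comment", "parent": "1"})
```

A name that is both a child and a parent (for multi-level hierarchies) is treated as a child here, so it must carry a parent id.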
* Add parent-join module This change adds a new module named `parent-join`. The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module. These queries and aggregations are no longer in core but they are deployed by default as a module. Relates #20257
This change removes the field data specialization needed for the parent field and replaces it with a simple DocValuesIndexFieldData. The underlying global ordinals are retrieved via a new function called IndexOrdinalsFieldData#getOrdinalMap. The children aggregation is also modified to use a simple WithOrdinals value source rather than the deleted WithOrdinals.Parent. Relates elastic#20257
@jimczi will this new join field type rely on Global Ordinals? Or, more to the point, what will be the practical considerations relevant to using Thanks for any guidance on this -- or anticipated guidance :-) Very excited to see this new feature coming together. |
… work with the new join field type and at the same time maintaining support for the `_parent` meta field type. Relates to elastic#20257
This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on). If single type is off it uses the legacy parent join field mapper and switches to the new one otherwise (default in 6). Relates elastic#20257
at the same time maintaining support for the `_parent` meta field type. Relates to elastic#20257
…hild relation within documents of the same index (elastic#24978) * Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping it uses an internal field to materialize parent/child relation within a single index. This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields. The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up. Relates elastic#20257
This is a full backport of the typeless parent child feature (parent-join) introduced in master. It includes: * Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index (#24978) * Disallow multiple parent-join fields per mapping (#25002) * Change `has_child`, `has_parent` queries and `children` aggregation to work with the new join field type and at the same time maintaining support for the `_parent` meta field type. * Move parent_id query to the parent-join module (#25072) * Changed inner_hits to work with the new join field type and at the same time maintaining support for the `_parent` meta field type Relates #20257
This commit adds the docs for the new parent-join field. It explains how to define, index and query this new field. Relates elastic#20257
Will this change also remove Nested objects? Or are there plans to improve the performance gap between a nested object query and a parent-child query?
|
No, nested objects are not impacted by this change.
This issue is just about typeless parent join that will replace the parent/child with types. We did not change the internals and how the query is executed so it should be similar in terms of performance.
The same as before ;) |
* Add documentation for the new parent-join field This commit adds the docs for the new parent-join field. It explains how to define, index and query this new field. Relates #20257
The typeless parent-join has landed in master and 5.x. The first release for this will be 5.6 where users can start exploring this typeless parent/child by setting New issues can be opened for enhancements or bugs but this long standing issue can be closed ! |
Related to #15613
Types are going to be removed. Parent/child is tightly coupled to types, so that needs to change. Parent/child still needs to distinguish between parent and child documents, so the idea is that a new meta field type named `_join` will allow that. This new field would replace the `_parent` meta field. Each `_join` meta field type would maintain an indexed field to distinguish between parent and child documents and a doc values field needed for the join operation.
Example of how to use typeless parent/child:
Indexing question document (parent):
Indexing answer document (child):
Besides the `parent` parameter, typeless parent/child will also need a `join` url parameter. In order to prevent adding more feature-specific options to the transport and rest layers, meta fields should be completely isolated, from the rest layer to the mapping layer. This will result in cleaner code and allows parent/child to be moved to a module.
Adding `answer-to-comments` join field:
Indexing question document (parent):
Indexing answer document (child):
Indexing comment document (grand child):