Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index #24978

Merged
merged 3 commits into from
May 31, 2017

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented May 31, 2017

This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping
it uses an internal field to materialize parent/child relation within a single index.
This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields.
The compatibility with has_parent, has_child queries and children agg will be added in a follow up.

Relates #20257

…hild relation within documents of the same index

This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping
it uses an internal field to materialize parent/child relation within a single index.
This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields.
The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up.

Relates elastic#20257
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🎉! I left some comments.

* A field mapper used internally by the {@link ParentJoinFieldMapper} to index
* the value that link documents in the index (parent _id or _id if the document is a parent).
*/
public final class ParentIDFieldMapper extends FieldMapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename to ParentIdFieldMapper? That is more consistent with IdFieldMapper name.

FIELD_TYPE.setTokenized(false);
FIELD_TYPE.setOmitNorms(true);
FIELD_TYPE.setHasDocValues(true);
FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParentFieldMapper sets this to IndexOptions.NONE. I wonder if we should that too here?

Upside of adding an indexed field is that somone doesn't need to use the parent_id query, but on the other hand it does increase the index size and I'm not sure how often one would search by this field. With _parent field the field was made a non indexed field with the idea in mind that only a few would ever use _parent field for plain querying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is inner_hits where we use a DocValueQuery to find the children of a parent. I agree that a plain parent_id query is an edge case but for inner_hits it is quite a common case so could be worth to index by default ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
}

final ArrayList<ParentIDFieldMapper> newParentIDFields = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/final ArrayList/final List

builder.addParent(parent, children);
iterator.remove();
}
return builder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think additionally we should exposen eager_global_ordinals setting here to like in ParentFieldMapper.java?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I can do that in a follow up. This PR is big enough ;)

Copy link
Member

@martijnvg martijnvg May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 then we shouldn't forget about it :)

* A field mapper used internally by the {@link ParentJoinFieldMapper} to index
* the value that link documents in the index (parent _id or _id if the document is a parent).
*/
public final class ParentIDFieldMapper extends FieldMapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this class can not extend from FieldMapper and be a regular class? Initially this confused me a bit.
This class just needs to have parent and children fields. Also for ParentJoinFieldMapper#iterator() it can have a field mapper field (I think KeywordFieldMapper will suffice as type). Then the ParentJoinFieldMapper#parse(...) can just iterate over parentIDFields and add the required join doc values fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with this approach but this field is special and we'll need to add some helpers to handle has_child and has_parent rewriting so I opted for a full field mapper impl.
Also I think it's good to have a dedicated field type. For instance the keyword field uses sorted_set for the doc_values but for this field we want to enforce a single value for each document.

@Override
public Mapper parse(ParseContext context) throws IOException {
// Only one join field per document
checkDuplicateJoinFields(context.doc());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should only allow one join field definition per index? (not sure what the best way to validate this)
Because all parent/child relations can be expressed in a single join field.

Also as is now one is free to add a join field mapper to an index, which I think is problematic when documents have already been indexed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to restrict to a single join field per index but there is no clean way to do it yet.
We'll need to add some logic in the mapper service to allow this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, lets do this in a follow up change.

@jimczi
Copy link
Contributor Author

jimczi commented May 31, 2017

retest this please

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jimczi jimczi merged commit b5d62ae into elastic:master May 31, 2017
@jimczi jimczi deleted the parent_join_field_mapper branch May 31, 2017 16:07
@jimczi
Copy link
Contributor Author

jimczi commented May 31, 2017

Thanks @martijnvg
We'll backport in 5.x (hopefully 5.5) when the has_child and has_parent query support this new field mapper.

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jun 7, 2017
…hild relation within documents of the same index (elastic#24978)

* Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index

This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping
it uses an internal field to materialize parent/child relation within a single index.
This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields.
The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up.

Relates elastic#20257
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jun 7, 2017
jimczi added a commit that referenced this pull request Jun 7, 2017
This is a full backport of the typeless parent child feature (parent-join) introduced in master.
It includes:
* Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index (#24978)
* Disallow multiple parent-join fields per mapping (#25002)
* Change `has_child`, `has_parent` queries and `childen` aggregation to work with the new join field type and at the same time maintaining support for the `_parent` meta field type.
* Move parent_id query to the parent-join module (#25072)
* Changed inner_hits to work with the new join field type and
at the same time maintaining support for the `_parent` meta field type

Relates #20257
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Parent/Child labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories v5.6.0 v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants