-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index #24978
Conversation
…hild relation within documents of the same index This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping it uses an internal field to materialize parent/child relation within a single index. This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields. The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up. Relates elastic#20257
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 🎉! I left some comments.
* A field mapper used internally by the {@link ParentJoinFieldMapper} to index | ||
* the value that link documents in the index (parent _id or _id if the document is a parent). | ||
*/ | ||
public final class ParentIDFieldMapper extends FieldMapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to ParentIdFieldMapper
? That is more consistent with IdFieldMapper
name.
FIELD_TYPE.setTokenized(false); | ||
FIELD_TYPE.setOmitNorms(true); | ||
FIELD_TYPE.setHasDocValues(true); | ||
FIELD_TYPE.setIndexOptions(IndexOptions.DOCS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParentFieldMapper
sets this to IndexOptions.NONE
. I wonder if we should that too here?
Upside of adding an indexed field is that somone doesn't need to use the parent_id
query, but on the other hand it does increase the index size and I'm not sure how often one would search by this field. With _parent
field the field was made a non indexed field with the idea in mind that only a few would ever use _parent field for plain querying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is inner_hits
where we use a DocValueQuery
to find the children of a parent. I agree that a plain parent_id
query is an edge case but for inner_hits
it is quite a common case so could be worth to index by default ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
} | ||
|
||
final ArrayList<ParentIDFieldMapper> newParentIDFields = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/final ArrayList/final List
builder.addParent(parent, children); | ||
iterator.remove(); | ||
} | ||
return builder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think additionally we should exposen eager_global_ordinals
setting here to like in ParentFieldMapper.java
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I can do that in a follow up. This PR is big enough ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 then we shouldn't forget about it :)
* A field mapper used internally by the {@link ParentJoinFieldMapper} to index | ||
* the value that link documents in the index (parent _id or _id if the document is a parent). | ||
*/ | ||
public final class ParentIDFieldMapper extends FieldMapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this class can not extend from FieldMapper
and be a regular class? Initially this confused me a bit.
This class just needs to have parent
and children
fields. Also for ParentJoinFieldMapper#iterator()
it can have a field mapper field (I think KeywordFieldMapper
will suffice as type). Then the ParentJoinFieldMapper#parse(...)
can just iterate over parentIDFields
and add the required join doc values fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started with this approach but this field is special and we'll need to add some helpers to handle has_child
and has_parent
rewriting so I opted for a full field mapper impl.
Also I think it's good to have a dedicated field type. For instance the keyword field uses sorted_set
for the doc_values but for this field we want to enforce a single value for each document.
@Override | ||
public Mapper parse(ParseContext context) throws IOException { | ||
// Only one join field per document | ||
checkDuplicateJoinFields(context.doc()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should only allow one join field definition per index? (not sure what the best way to validate this)
Because all parent/child relations can be expressed in a single join field.
Also as is now one is free to add a join field mapper to an index, which I think is problematic when documents have already been indexed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to restrict to a single join field per index but there is no clean way to do it yet.
We'll need to add some logic in the mapper service to allow this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, lets do this in a follow up change.
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @martijnvg |
…hild relation within documents of the same index (elastic#24978) * Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping it uses an internal field to materialize parent/child relation within a single index. This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields. The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up. Relates elastic#20257
This is a full backport of the typeless parent child feature (parent-join) introduced in master. It includes: * Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index (#24978) * Disallow multiple parent-join fields per mapping (#25002) * Change `has_child`, `has_parent` queries and `childen` aggregation to work with the new join field type and at the same time maintaining support for the `_parent` meta field type. * Move parent_id query to the parent-join module (#25072) * Changed inner_hits to work with the new join field type and at the same time maintaining support for the `_parent` meta field type Relates #20257
This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping
it uses an internal field to materialize parent/child relation within a single index.
This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields.
The compatibility with
has_parent
,has_child
queries andchildren
agg will be added in a follow up.Relates #20257