Correctly identify parent of copy_to destination field for synthetic source purposes #113153
I am not sure what 9.0.0 bwc snapshots are?
Pinging @elastic/es-storage-engine (Team:StorageEngine)
if (indexSettings.getSkipIgnoredSourceWrite() == false) {
    /*
    Mark this field as containing copied data meaning it should not be present
    in synthetic _source (to be consistent with stored _source).
Nit: update text, no given field here..
Mark this field as containing copied data meaning it should not be present
in synthetic _source (to be consistent with stored _source).
Ignored source values take precedence over standard synthetic source implementation
so by adding this nothing entry we "disable" field in synthetic source.
Nit: s/nothing/void/
if (parent == null) {
    // There are scenarios when this can happen:
    // 1. all values of the field that is the source of copy_to are null
    // 2. copy_to points at a field inside disabled object
Nit: s/inside disabled/inside a disabled/
}
int offset = parent.isRoot() ? 0 : parent.fullPath().length() + 1;
ignoredFieldValues.add(
    new IgnoredSourceFieldMapper.NameValue(copyToField, offset, XContentDataHelper.nothing(), context.doc())
Rename nothing() to voidValue()?
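For readers following the offset computation in the snippet above, here is a minimal sketch. It assumes the offset marks where the leaf name starts within the full dotted path of the destination field, so the parent path and leaf name can be recovered from a single string. `ParentOffsetSketch` and its methods are hypothetical and not part of Elasticsearch.

```java
final class ParentOffsetSketch {
    // Mirrors `parent.isRoot() ? 0 : parent.fullPath().length() + 1`,
    // with the root parent modelled as an empty path (an assumption of this sketch).
    static int offset(String parentFullPath) {
        return parentFullPath.isEmpty() ? 0 : parentFullPath.length() + 1;
    }

    // The leaf name is whatever follows the parent path and its trailing dot.
    static String leafName(String fullPath, int offset) {
        return fullPath.substring(offset);
    }

    // The parent path is everything before the dot that precedes the leaf name.
    static String parentPath(String fullPath, int offset) {
        return offset == 0 ? "" : fullPath.substring(0, offset - 1);
    }
}
```

With a parent at `a.b` and a destination `a.b.c`, the offset is 4, which splits the path into `a.b` and `c`. A root-level destination gets offset 0 and an empty parent path.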
}

// Go one level down if possible
var pathComponents = pathInCurrent.split("\\.");
Nit: do this once before the loop, increment the start element on each iteration?
private List<IgnoredSourceFieldMapper.NameValue> ignoredValues;
// If this loader has anything to write.
// In special cases this can be false even if doc values loaders or stored field loaders
// have values.
Nit: reference the copy_to case as an example?
}

if (ignoredValues != null && ignoredValues.isEmpty() == false) {
    // Use an ordered map between field names and writer functions, to order writing by field name.
Copy this comment.
}

@Override
public void prepare() {
I kinda like this version, compared to running everything in write().
DocValuesLoader docValuesLoader(LeafReader leafReader, int[] docIdsInLeaf) throws IOException;

/**
Perform any preprocessing needed before producing synthetic source.
Maybe add: " to deduce whether this mapper (and its children, if any) have values to write"
The expectation is for this method to be called before {@link SyntheticFieldLoader#hasValue()}
and {@link SyntheticFieldLoader#write(XContentBuilder)} are used.
*/
default void prepare() {
prepareHasValue? to make it more concrete.
I like this more because it's a verb and it does not have has in there which has boolean associations.
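The contract discussed in this thread (call prepare() first, then rely on hasValue() and write()) can be illustrated with a simplified model. This is a sketch under stated assumptions, not the Elasticsearch implementation: `PrepareSketch`, `Loader`, `LeafLoader`, `ObjectLoader` and the `disabled` flag are all hypothetical, and the flag stands in for a field that is suppressed because it holds copied data.

```java
import java.util.List;

final class PrepareSketch {
    /** Hypothetical loader contract: prepare() runs before hasValue() and write(). */
    interface Loader {
        default void prepare() {}
        boolean hasValue();
        void write(StringBuilder out);
    }

    static final class LeafLoader implements Loader {
        private final String name;
        private final String value;
        private final boolean disabled; // models a field suppressed because it holds copied data

        LeafLoader(String name, String value, boolean disabled) {
            this.name = name;
            this.value = value;
            this.disabled = disabled;
        }

        @Override
        public boolean hasValue() {
            return value != null && disabled == false;
        }

        @Override
        public void write(StringBuilder out) {
            out.append(name).append('=').append(value).append(';');
        }
    }

    static final class ObjectLoader implements Loader {
        private final String name;
        private final List<Loader> children;
        private boolean hasValue;

        ObjectLoader(String name, List<Loader> children) {
            this.name = name;
            this.children = children;
        }

        @Override
        public void prepare() {
            // Decide up front whether this object or any of its children has something to write.
            hasValue = false;
            for (Loader child : children) {
                child.prepare();
                hasValue |= child.hasValue();
            }
        }

        @Override
        public boolean hasValue() {
            return hasValue;
        }

        @Override
        public void write(StringBuilder out) {
            if (hasValue == false) {
                return;
            }
            out.append(name).append('{');
            for (Loader child : children) {
                if (child.hasValue()) {
                    child.write(out);
                }
            }
            out.append('}');
        }
    }

    static String render(Loader loader) {
        loader.prepare();
        StringBuilder out = new StringBuilder();
        if (loader.hasValue()) {
            loader.write(out);
        }
        return out.toString();
    }
}
```

In this model an object whose only child is disabled renders as nothing at all, rather than as an empty object, because the decision is made in prepare() before any output is produced.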
k: "hey"
- match:
    hits.hits.0.fields:
      a.b.c: [ "hey" ]
Why do we decide to match a.b.c versus a.b\.c? I was wondering about that in findParentObject.
I think I got it, we first check for a leaf field match before checking for object matches.
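The lookup order described here (leaf field match first, then descend into objects) can be sketched on a simplified model. Everything below is hypothetical: `ParentLookupSketch` and `ObjectNode` are illustration-only stand-ins, and a null map value is used to mark a leaf field, which is an assumption of the sketch rather than how the real mappers are stored.

```java
import java.util.HashMap;
import java.util.Map;

final class ParentLookupSketch {
    /** Hypothetical stand-in for an object mapper: children keyed by name relative to this object. */
    static final class ObjectNode {
        // A null value marks a leaf field; a non-null value is a nested object.
        final Map<String, ObjectNode> mappers = new HashMap<>();

        ObjectNode leaf(String key) {
            mappers.put(key, null);
            return this;
        }

        ObjectNode object(String key, ObjectNode child) {
            mappers.put(key, child);
            return this;
        }
    }

    /** Returns the object that directly contains leafFieldPath, or null if none is found. */
    static ObjectNode findParent(ObjectNode root, String leafFieldPath) {
        ObjectNode current = root;
        String pathInCurrent = leafFieldPath;
        while (current != null) {
            // A direct match in the current object wins, even if the key itself contains dots.
            if (current.mappers.containsKey(pathInCurrent)) {
                return current;
            }
            // Otherwise go one level down: find the shortest dotted prefix that names a child object.
            String[] parts = pathInCurrent.split("\\.");
            ObjectNode next = null;
            StringBuilder prefix = new StringBuilder();
            for (int i = 0; i < parts.length - 1; i++) {
                if (prefix.length() > 0) {
                    prefix.append('.');
                }
                prefix.append(parts[i]);
                ObjectNode child = current.mappers.get(prefix.toString());
                if (child != null) {
                    next = child;
                    pathInCurrent = pathInCurrent.substring(prefix.length() + 1);
                    break;
                }
            }
            current = next;
        }
        return null;
    }
}
```

In this model a root-level field literally named `a.b.c` resolves to the root, while the same path under nested objects `a` and `b` resolves to `b`.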
This leads to a memory bump as we first create the map containing all values to write per doc and then start writing. I'd think this is not that huge per doc (a few MB, worst case), so it should be acceptable. The logic is fairly clean, well done.
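The map being discussed can be pictured with a small sketch: collect one writer function per field name into an ordered map, let ignored source entries override regular ones, then write in name order. `OrderedWriteSketch` is hypothetical, values are plain strings for simplicity, and a null ignored value is used here to model the "nothing" entry that disables a field.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

final class OrderedWriteSketch {
    static String writeAll(Map<String, String> regularFields, Map<String, String> ignoredValues) {
        // Ordered map between field names and writer functions, so output is sorted by field name.
        TreeMap<String, Consumer<StringBuilder>> writers = new TreeMap<>();
        regularFields.forEach((name, value) ->
            writers.put(name, out -> out.append(name).append('=').append(value).append(';')));
        // Ignored values are added second so that they replace the regular writer for the same name.
        ignoredValues.forEach((name, value) -> {
            if (value == null) {
                writers.put(name, out -> {}); // a "nothing" entry: the field is written as no output
            } else {
                writers.put(name, out -> out.append(name).append('=').append(value).append(';'));
            }
        });
        // Only now does writing start; the whole map is held in memory for the document.
        StringBuilder out = new StringBuilder();
        for (Consumer<StringBuilder> writer : writers.values()) {
            writer.accept(out);
        }
        return out.toString();
    }
}
```

The per-document memory cost mentioned above corresponds to the `writers` map in this sketch, which has to be fully built before the first field is written.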
    
return;
}

List<NameValue> ignoredFieldValues = new ArrayList<>(context.getIgnoredFieldValues());
Nit: wrap in ArrayList within the branch below, otherwise we don't mutate the returned list.
But we use this list outside of the branch?
Ah, I see what you mean.
// Go one level down if possible
var pathComponents = pathInCurrent.split("\\.");
var childMapperName = new StringBuilder();
Create the StringBuilder outside the loop and then call setLength(0) in the loop to avoid creating it multiple times?
I looked at setLength and it replaces internal byte array with a new one so I think it's equivalent?
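For reference, the reuse pattern suggested in this thread could look like the sketch below. `BuilderReuseSketch` is hypothetical and only mimics the prefix-building loop. In OpenJDK, setLength(0) appears to reset the length while keeping the existing buffer; `capacitySurvivesReset()` checks that on whatever JDK runs it. Treat this as an observation about one implementation rather than a guarantee of the specification.

```java
import java.util.ArrayList;
import java.util.List;

final class BuilderReuseSketch {
    /** Returns every proper dotted prefix of each path, reusing a single StringBuilder. */
    static List<String> prefixes(List<String> paths) {
        List<String> result = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String path : paths) {
            prefix.setLength(0); // reset instead of allocating a new builder per path
            String[] parts = path.split("\\.");
            for (int i = 0; i < parts.length - 1; i++) {
                if (prefix.length() > 0) {
                    prefix.append('.');
                }
                prefix.append(parts[i]);
                result.add(prefix.toString());
            }
        }
        return result;
    }

    /** True if capacity() is unchanged by setLength(0) on the running JDK. */
    static boolean capacitySurvivesReset() {
        StringBuilder sb = new StringBuilder();
        sb.append("a.rather.long.dotted.path.that.forces.growth");
        int before = sb.capacity();
        sb.setLength(0);
        return sb.length() == 0 && sb.capacity() == before;
    }
}
```

Whether the saving matters for a loop over a handful of path components is a judgment call; the sketch only shows that the two variants produce the same prefixes.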
String pathInCurrent = leafFieldPath;

while (current != null) {
    if (current.mappers.containsKey(pathInCurrent)) {
I am wondering if we have to check for pathInCurrent to be null since we pass it to containsKey. leafFieldPath is copyToFields, right?
I can't think of a scenario when we would add null as a copy to field in document parser context. I am not sure what this code should do in that case too.
@elasticmachine update branch
@elasticmachine update branch
Test failures look like preexisting issues; there are open issues for them, like #113301. I guess they are muted on main but not in the 8.16 branch?
I'll go ahead and merge this so that we have data generation tests running.
💔 Backport failed
You can use sqren/backport to manually backport by running
…source purposes (elastic#113153) (cherry picked from commit b9855b8) # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation
This PR is a continuation of #112294 and handles the case of copy_to destination fields inside an object, which was not properly addressed there.