Skip to content

Conversation

@lkts
Copy link
Contributor

@lkts lkts commented Sep 18, 2024

This PR is a continuation of #112294 and handles a case of copy_to destination fields being inside object that was not properly addressed there.

@lkts lkts force-pushed the fix_synthetic_source_copy_to branch from fc3da97 to 9ad3fba Compare September 19, 2024 16:14
@lkts
Copy link
Contributor Author

lkts commented Sep 19, 2024

I am not sure what 9.0.0 bwc snapshots are?

@lkts lkts marked this pull request as ready for review September 19, 2024 22:16
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

if (indexSettings.getSkipIgnoredSourceWrite() == false) {
/*
Mark this field as containing copied data meaning it should not be present
in synthetic _source (to be consistent with stored _source).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: update text, no given field here..

Mark this field as containing copied data meaning it should not be present
in synthetic _source (to be consistent with stored _source).
Ignored source values take precedence over standard synthetic source implementation
so by adding this nothing entry we "disable" field in synthetic source.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/nothing/void/

if (parent == null) {
// There are scenarios when this can happen:
// 1. all values of the field that is the source of copy_to are null
// 2. copy_to points at a field inside disabled object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/inside disabled/inside a disabled/

}
int offset = parent.isRoot() ? 0 : parent.fullPath().length() + 1;
ignoredFieldValues.add(
new IgnoredSourceFieldMapper.NameValue(copyToField, offset, XContentDataHelper.nothing(), context.doc())
Copy link
Contributor

@kkrik-es kkrik-es Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename nothing() to voidValue()?

}

// Go one level down if possible
var pathComponents = pathInCurrent.split("\\.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do this once before the loop, increment the start element on each iteration?

private List<IgnoredSourceFieldMapper.NameValue> ignoredValues;
// If this loader has anything to write.
// In special cases this can be false even if doc values loaders or stored field loaders
// have values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: reference the copy_to case as an example?

}

if (ignoredValues != null && ignoredValues.isEmpty() == false) {
// Use an ordered map between field names and writer functions, to order writing by field name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy this comment.

}

@Override
public void prepare() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like this version, compared to running everything in write().

DocValuesLoader docValuesLoader(LeafReader leafReader, int[] docIdsInLeaf) throws IOException;

/**
Perform any preprocessing needed before producing synthetic source.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add: " to deduce whether this mapper (and its children, if any) have values to write"

The expectation is for this method to be called before {@link SyntheticFieldLoader#hasValue()}
and {@link SyntheticFieldLoader#write(XContentBuilder)} are used.
*/
default void prepare() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prepareHasValue? to make it more concrete.

Copy link
Contributor Author

@lkts lkts Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this more because it's a verb and it does not have has in there which has boolean associations.

k: "hey"
- match:
hits.hits.0.fields:
a.b.c: [ "hey" ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we decide to match a.b.c versus a.b\.c'? I was wondering about that in findParentObject`..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it, we first check for a leaf field match before checking for object matches.

@kkrik-es
Copy link
Contributor

This leads to a memory bump as we first create the map containing all values to write per doc and then start writing. I'd think this is not that huge per doc (a few MB, worst case), so it should be acceptable.

The logic is fairly clean, well done.

return;
}

List<NameValue> ignoredFieldValues = new ArrayList<>(context.getIgnoredFieldValues());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: wrap in ArrayList within the branch below, otherwise we don't mutate the returned list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we use this list outside of the branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah i see what you mean.


// Go one level down if possible
var pathComponents = pathInCurrent.split("\\.");
var childMapperName = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create the StringBuilder outside the loop and then call setLength(0) in the loop to avoid creating it multiple times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at setLength and it replaces internal byte array with a new one so i think it's equivalent?

String pathInCurrent = leafFieldPath;

while (current != null) {
if (current.mappers.containsKey(pathInCurrent)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we have to check for pathInCurrent to be null since we pass it to containsKey. leafFieldPath is copyToFields right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a scenario when we would add null as a copy to field in document parser context. I am not sure what this code should do in that case too.

@lkts
Copy link
Contributor Author

lkts commented Sep 20, 2024

@elasticmachine update branch

@lkts
Copy link
Contributor Author

lkts commented Sep 20, 2024

@elasticmachine update branch

@lkts
Copy link
Contributor Author

lkts commented Sep 20, 2024

Test failures look like preexisting issues, there are open issues for them like #113301. I guess they are muted on main but not in 8.16 branch?

@lkts
Copy link
Contributor Author

lkts commented Sep 20, 2024

I'll go ahead and merge this so that we have data generation tests running.

@lkts lkts merged commit b9855b8 into elastic:main Sep 20, 2024
12 of 17 checks passed
@lkts lkts deleted the fix_synthetic_source_copy_to branch September 20, 2024 19:08
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 113153

lkts added a commit to lkts/elasticsearch that referenced this pull request Sep 20, 2024
…source purposes (elastic#113153)

(cherry picked from commit b9855b8)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
@lkts
Copy link
Contributor Author

lkts commented Sep 20, 2024

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

lkts added a commit that referenced this pull request Sep 23, 2024
…hetic source purposes (#113153) (#113307)

* Correctly identify parent of copy_to destination field for synthetic source purposes (#113153)

(cherry picked from commit b9855b8)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java

* Comma
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants