Implement synthetic source support for annotated text field #107735
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Related #107734.
Hi @lkts, I've created a changelog YAML for you.
...t/src/yamlRestTest/resources/rest-api-spec/test/mapper_annotatedtext/20_synthetic_source.yml
buildkite test this please
@elasticmachine update branch
...ext/src/main/java/org/elasticsearch/index/mapper/annotatedtext/AnnotatedTextFieldMapper.java
@@ -595,7 +600,23 @@ protected boolean supportsIgnoreMalformed() {

     @Override
     protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
-        throw new AssumptionViolatedException("not supported");
+        assumeFalse("ignore_malformed not supported", ignoreMalformed);
+1 for the reuse here
@@ -0,0 +1,164 @@
---
Can we also test a few failure scenarios? Especially because using synthetic source depends on multiple settings combined together (stored, multi-field).
Also, I am wondering what happens if we have a field that is both stored and has the keyword multi-field. I guess in that case we would prefer relying on the stored field, because it would allow us to return the contents exactly as they are, including duplicates.
Failure scenarios are tested in TextFieldFamilySyntheticSourceSupport#invalidExample.
If the field is stored, we always prefer that over a multi-field. I'll add a test.
        return null;
    public static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) {
        for (Mapper sub : multiFields) {
            if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
nit: flipping the equals call reduces the chance of an NPE.
This is existing code that I moved, so I'll leave it as is if you don't mind.
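For readers unfamiliar with the nit above, here is a minimal standalone sketch of the idiom. `FlippedEqualsSketch`, its `CONTENT_TYPE` constant, and both helper methods are hypothetical stand-ins rather than the Elasticsearch classes; the sketch only assumes that the value being compared could be null.

```java
class FlippedEqualsSketch {
    // Hypothetical stand-in for KeywordFieldMapper.CONTENT_TYPE.
    static final String CONTENT_TYPE = "keyword";

    // Order used in the reviewed line: throws NullPointerException if typeName is null.
    static boolean isKeywordUnsafe(String typeName) {
        return typeName.equals(CONTENT_TYPE);
    }

    // Flipped order: the constant is never null, so a null typeName simply yields false.
    static boolean isKeywordSafe(String typeName) {
        return CONTENT_TYPE.equals(typeName);
    }

    public static void main(String[] args) {
        System.out.println(isKeywordSafe("keyword")); // true
        System.out.println(isKeywordSafe(null));      // false
        try {
            isKeywordUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("unsafe variant threw NullPointerException");
        }
    }
}
```

Whether `typeName()` can actually return null in the mapper code is not established in this thread, so the flip is a defensive habit rather than a confirmed bug fix.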
            outPrimary.add(v);
        }
    });
    List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
Hash set -> sort -> can we just use a sorted set?
This is existing code that I moved, so I'll leave it as is if you don't mind.
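To make the sorted-set suggestion concrete, the sketch below compares the two approaches on a plain list of strings. `SortedSetSketch` and its method names are hypothetical, and the sketch assumes the values are non-null strings compared by natural ordering; it is not the test code from this PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortedSetSketch {
    // Shape of the expression under review: deduplicate with a HashSet, then sort the stream.
    static List<String> viaHashSetThenSort(List<String> values) {
        return new HashSet<>(values).stream().sorted().collect(Collectors.toList());
    }

    // Suggested alternative: a TreeSet deduplicates and keeps natural order in one step.
    static List<String> viaTreeSet(List<String> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    public static void main(String[] args) {
        List<String> values = List.of("b", "a", "b", "c", "a");
        System.out.println(viaHashSetThenSort(values)); // [a, b, c]
        System.out.println(viaTreeSet(values));         // [a, b, c]
    }
}
```

Both variants produce the same deduplicated, sorted list for non-null input, which mirrors the non-stored branch in the reviewed line; the stored branch keeps `outPrimary` untouched, duplicates and original order included.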
    List<String> loadBlock;
    if (loadBlockFromSource) {
        // The block loader infrastructure will never return nulls. Just zap them all.
        loadBlock = in.stream().filter(m -> m != null).toList();
Can't we just filter out nulls above, when creating `in`?
`in` can contain nulls; null is a valid value for a field.
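The distinction in this exchange can be illustrated with a small standalone sketch: the input list keeps its nulls, and only the derived expectation drops them. `NullFilterSketch` and `withoutNulls` are hypothetical names, and the sketch assumes plain string values rather than the real test fixtures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class NullFilterSketch {
    // Returns a null-free copy; the input list itself is left unchanged.
    static List<String> withoutNulls(List<String> in) {
        return in.stream().filter(Objects::nonNull).toList();
    }

    public static void main(String[] args) {
        // Arrays.asList permits null elements, unlike List.of.
        List<String> in = Arrays.asList("a", null, "b");
        List<String> loadBlock = withoutNulls(in);
        System.out.println(in);        // [a, null, b]
        System.out.println(loadBlock); // [a, b]
    }
}
```

Filtering at the point where `in` is built would lose the null inputs for every other expectation derived from it, which appears to be why the filter is applied only to the `loadBlock` list.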
lgtm
This PR adds synthetic source support for annotated_text fields. The existing implementation for text is reused, including the test infrastructure, so the majority of the change is moving code and making things accessible.
Contributes to #106460, #78744.