Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement synthetic source support for annotated text field #107735

Merged
merged 10 commits into from Apr 25, 2024

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Apr 22, 2024

This PR adds synthetic source support for annotated_text fields. Existing implementation for text is reused including test infrastructure so the majority of the change is moving and making things accessible.

Contributes to #106460, #78744.

Copy link

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@lkts
Copy link
Contributor Author

lkts commented Apr 22, 2024

Related #107734.

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

@lkts
Copy link
Contributor Author

lkts commented Apr 23, 2024

buildkite test this please

@lkts
Copy link
Contributor Author

lkts commented Apr 24, 2024

@elasticmachine update branch

docs/plugins/mapper-annotated-text.asciidoc Show resolved Hide resolved
@@ -595,7 +600,23 @@ protected boolean supportsIgnoreMalformed() {

@Override
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
throw new AssumptionViolatedException("not supported");
assumeFalse("ignore_malformed not supported", ignoreMalformed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the reuse here

docs/plugins/mapper-annotated-text.asciidoc Show resolved Hide resolved
@@ -0,0 +1,164 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test a few failure scenarios? Especially because using synthetic source depends on multiple settings combined together (stored, multi-field).

Also I am wondering what happens if we have a field that is both stored and has the keyword multi-filed. I guess in that case we would prefer relying on the stored field because it would allow us to return the contents exactly as they are...including duplicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure scenarios are tested in TextFieldFamilySyntheticSourceSupport#invalidExample.

If field is stored, we always prefer that over a multi field. I'll add a test.

return null;
public static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) {
for (Mapper sub : multiFields) {
if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: flipping equals reduces the chance for a NPE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is existing code that i moved so i'll leave it as is if you don't mind.

outPrimary.add(v);
}
});
List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash set -> sort -> can we just use a sorted set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is existing code that i moved so i'll leave it as is if you don't mind.

List<String> loadBlock;
if (loadBlockFromSource) {
// The block loader infrastructure will never return nulls. Just zap them all.
loadBlock = in.stream().filter(m -> m != null).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just filter for null above when creating in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in can have nulls, it is a valid value for a field

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lkts lkts merged commit e1d902d into elastic:main Apr 25, 2024
14 checks passed
@lkts lkts deleted the feature/annotated_text_synthetic_source branch April 25, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants