semantic_text field mapping #107519

carlosdelest
Member

This PR contains changes related to just the semantic_text field mapping, including some specific YAML mapping tests.

Automatic Inference on ingestion will be included as a separate PR. This change deals with just the mapping side:

  • Provides the mapping definition for semantic_text
  • On document ingestion, checks whether the model settings are present in the field's mapping. If they are not, updates the mapping so that future documents adhere to it.
  • Creates the appropriate mappings for dense_vector or sparse_vector fields internally
  • Validates and indexes the inference results as part of the document ingestion using these internal mappings

carlosdelest and others added 9 commits April 16, 2024 10:06
Fix the merging of the object field within the semantic_text mapper: the merge context should be set at the parent level (it was previously set at the object/child level before merging).
…-text-field-mapping-specifics

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/SemanticTextFeature.java
@carlosdelest carlosdelest added >non-issue :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team labels Apr 16, 2024
@carlosdelest carlosdelest marked this pull request as ready for review April 16, 2024 13:09
@carlosdelest carlosdelest requested a review from a team as a code owner April 16, 2024 13:09
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

Member

@benwtrent benwtrent left a comment

First pass complete. Need to go through the tests now :)

@@ -22,6 +22,7 @@
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
Member

Could you add a test here that ensures the InferenceFieldMapper logic is correct? Generally this is done by writing a small test-only plugin that provides a mapper implementing the InferenceFieldMapper interface.

Member Author

Makes sense. I've added them in a new test class, MappingLookupInferenceFieldMapperTests. Let me know what you think!

@@ -17,6 +17,7 @@
requires org.apache.httpcomponents.httpasyncclient;
requires org.apache.httpcomponents.httpcore.nio;
requires org.apache.lucene.core;
requires org.elasticsearch.logging;
Member

I don't understand this inclusion.

Member Author

Good catch - it seems we're not using logging. I'm removing it 👍

args -> new InferenceResult((String) args[0], (ModelSettings) args[1], (List<Chunk>) args[2])
);

@SuppressWarnings("unchecked")
Member

Suggested change
@SuppressWarnings("unchecked")

This doesn't need the suppression

Member Author

Thanks!
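
For readers following the thread, here is a minimal, general sketch of when `@SuppressWarnings("unchecked")` is and is not needed. It is not the code from this PR: `UncheckedCastSketch`, `chunksFrom` and `fieldNameFrom` are hypothetical names, and `List<String>` stands in for the PR's `List<Chunk>`. The method the annotation was removed from is not shown in the excerpt above, so this only illustrates the language rule.

```java
import java.util.List;

// Hypothetical illustration; none of these names exist in the PR.
final class UncheckedCastSketch {

    // A cast to a parameterized type such as List<String> cannot be verified at
    // runtime because of type erasure, so javac reports an "unchecked" warning.
    // This is the situation the annotation is meant for.
    @SuppressWarnings("unchecked")
    static List<String> chunksFrom(Object[] args) {
        return (List<String>) args[0];
    }

    // A cast to a non-generic type is fully checked at runtime and raises no
    // "unchecked" warning, so no annotation is needed here.
    static String fieldNameFrom(Object[] args) {
        return (String) args[0];
    }
}
```

Keeping the annotation on the narrowest scope that actually contains an unchecked cast avoids hiding unrelated warnings elsewhere in the class.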

public class SemanticTextFieldMapper extends FieldMapper implements InferenceFieldMapper {
public static final String CONTENT_TYPE = "semantic_text";

private static final Logger logger = LogManager.getLogger(SemanticTextFieldMapper.class);
Member

This logger is unused?

Comment on lines 389 to 394
if (current == null) {
conflicts.addConflict("model_settings", "");
return false;
}
conflicts.addConflict("model_settings", "");
return false;
Member

These two branches are exactly the same, is this on purpose?

Member Author

Makes no sense, I'm simplifying it
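
A minimal sketch of what such a simplification could look like, assuming the surrounding method has already returned for the mergeable cases. The code before line 389 is not shown in this thread, so the "unset or equal" rule below is an assumption, and `ModelSettingsMergeSketch`, `Conflicts` and `canMerge` are hypothetical stand-ins rather than the Elasticsearch classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins; these are NOT the Elasticsearch classes.
final class ModelSettingsMergeSketch {

    /** Minimal conflict collector, loosely modelled on the `conflicts` object in the excerpt. */
    static final class Conflicts {
        private final List<String> entries = new ArrayList<>();

        void addConflict(String parameter, String message) {
            entries.add(parameter + ": " + message);
        }

        List<String> entries() {
            return entries;
        }
    }

    /**
     * Assumed rule: settings that were previously unset may be filled in,
     * and identical settings merge trivially. Anything else is a conflict.
     */
    static boolean canMerge(String previous, String current, Conflicts conflicts) {
        if (previous == null || Objects.equals(previous, current)) {
            return true;
        }
        // One exit path covers both "current is null" and "current differs",
        // replacing the two identical branches from the excerpt.
        conflicts.addConflict("model_settings", "");
        return false;
    }
}
```

Since both original branches recorded the same conflict and returned `false`, collapsing them does not change behavior; it only removes the redundant null check.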

@carlosdelest
Member Author

@elasticsearchmachine update branch

Member

@original-brownbear original-brownbear left a comment

Didn't get through all of this yet but left a couple of things that should get fixed IMO. I'll continue later

Member

@benwtrent benwtrent left a comment

Some comments on the tests. But I think this is almost ready to go :)

@@ -0,0 +1,105 @@
setup:
Member

Could you add some tests for the following cases? YAML tests give us a lot more coverage.

  • Multi-field (itself as a multi-field and multi-fields UNDER it)
  • nested fields
  • copy to
  • Missing inference chunk text, missing inference results (you already have good tests around bad params or mixed types, huzzah!)

Member Author

Thanks Ben, those were definitely good tests to have:

Multi-field (itself as a multi-field and multi-fields UNDER it)

  • It can't be used as a multi-field (added test).
  • For now, multi-fields under semantic_text are disabled; support will be provided later (added check and test). Supporting them would require inspecting the field content to tell whether it comes from a reindex or an index operation, in order to access the appropriate content.

nested fields
Added a test to check that it can be used as a nested field.

copy to

  • For the same reasons as multi-fields, using it as the source of copy_to is disabled.
  • It can be used as a copy_to target, but that support will be added in the inference PR that comes next.

Missing inference chunk text, missing inference results
Added them 👍
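
The restrictions described in this reply can be pictured roughly as follows. This is only a sketch under stated assumptions: `SemanticTextRestrictionsSketch`, `FieldDef` and `validate` are hypothetical names invented for illustration, the error messages are made up, and the actual checks in the PR live in the Elasticsearch mapper code and may look quite different.

```java
import java.util.List;

// Hypothetical illustration only; not the Elasticsearch mapping API.
final class SemanticTextRestrictionsSketch {

    /** Simplified description of a mapped field (assumed shape). */
    static final class FieldDef {
        final String type;
        final boolean usedAsMultiField;   // true if this field is declared under another field's "fields"
        final List<String> multiFields;   // sub-fields declared under this field
        final List<String> copyTo;        // copy_to targets declared on this field

        FieldDef(String type, boolean usedAsMultiField, List<String> multiFields, List<String> copyTo) {
            this.type = type;
            this.usedAsMultiField = usedAsMultiField;
            this.multiFields = multiFields;
            this.copyTo = copyTo;
        }
    }

    /** Rejects the three configurations this reply says are disabled for semantic_text. */
    static void validate(FieldDef field) {
        if (!"semantic_text".equals(field.type)) {
            return; // other field types are not restricted by this sketch
        }
        if (field.usedAsMultiField) {
            throw new IllegalArgumentException("semantic_text cannot be used as a multi-field");
        }
        if (!field.multiFields.isEmpty()) {
            throw new IllegalArgumentException("semantic_text does not support multi-fields yet");
        }
        if (!field.copyTo.isEmpty()) {
            throw new IllegalArgumentException("semantic_text cannot be the source of copy_to");
        }
    }
}
```

Being the target of another field's copy_to is deliberately not rejected here, matching the note that target support is planned for the follow-up inference PR.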

Member

The copy_to check will be interesting; we need to be sure to handle it correctly :)

@carlosdelest
Member Author

@elasticmachine update branch

@carlosdelest
Member Author

@elasticmachine update branch

@carlosdelest carlosdelest merged commit aad04b1 into elastic:main Apr 30, 2024
15 checks passed
Labels
>non-issue :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.15.0

8 participants