Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

semantic_text field mapper and inference #107262

Conversation

carlosdelest
Copy link
Member

This PR adds the semantic_text field mapping and inference actions:

  • ShardBulkInferenceActionFilter: An ActionFilter that intercepts BulkShardRequests and perform inference on the ones that have inference field metadata associated. Inference for the appropriate fields is generated and model settings included as part of the inference results.
  • SemanticTextFieldMapper:
    • Provides the mapping definition for semantic_text
    • On document ingestion, checks for the model settings in its mapping. If it's not present, updates it to ensure documents adhere to the mapping in the future.
    • Creates the appropriate mappings for dense_vector or sparse_vector fields internally
    • Validates and indexes the inference results as part of the document ingestion using these internal mappings

Mikep86 and others added 30 commits January 12, 2024 09:29
Store semantic_text model info in IndexMetadata:

On document ingestion, we need to perform inference only once, in the coordinating node. Otherwise, we would be doing inference for each of the shards the document is stored in.

The problem with the coordinating node is that it doesn't necessarily hold mapping information if it is not used for storing an index. A pure coordinating node doesn't have any mapping information at all.

We need to understand when we need to generate text embeddings on the coordinating node. This means that the model information associated with index fields needs to be efficiently accessed from there.

This information needs to be kept up to date with mapping changes, and not be recomputed otherwise.

The model / fields information is going to be included as part of the IndexMetadata, to ensure it is communicated to all nodes in the cluster.
Adds SemanticTextInferenceResultFieldMapper, which indexes inference results for semantic_text fields.
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	server/src/test/java/org/elasticsearch/snapshots/SnapshotResiliencyTests.java
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
…elastic#106357)

---------

Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
…ing (elastic#106560)

This PR refactors the semantic text field mapper to register its sub fields in the mapping instead of re-creating them each time when parsing documents.
It also fixes the generation of these fields in case the semantic text field is defined in an object field.
Lastly this change adds a new section called model_settings in the field parameter that is updated by the field mapper when inference results are received from a bulk action. The model settings are available in the fields as soon as the first document with the inference field is ingested and they are used to validate that updates. They are used to ensure consistency between what's used in the bulk action and what's defined in the field mapping.
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
…ce metadata in `IndexMetadata` (elastic#106743)

This change refactors the integration of the field inference metadata in IndexMetadata. Instead of partial diffs, the new class simply sends the entire object as diff if it has changed.
This PR also rename the fields and methods related to the inference fields consistently.
The inference phase (in the transport shard bulk action) is also changed so that inference is not called if:

The document contains a value for the inference input.
The document also contains a value for the inference results of that field (in the _inference map).
If the document contains no value for the inference input but an inference result for that field, it is marked as failed.
---------

Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
try {
// make sure that we don't expand dots in field names while parsing
context.path().setWithinLeafObject(true);
if (context.path().isWithinLeafObject() == false) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isWithinLeafObject() was not restored previously to its original value. Now we need to have that for parsing inference results.

@@ -106,6 +107,12 @@ public void testCopyToFieldsParsing() throws Exception {

fieldMapper = mapperService.documentMapper().mappers().getMapper("new_field");
assertThat(fieldMapper.typeName(), equalTo("long"));

MappingLookup mappingLookup = mapperService.mappingLookup();
assertThat(mappingLookup.sourcePaths("another_field"), equalTo(Set.of("copy_test", "int_to_str_test", "another_field")));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that copy_to information is included in source paths for inference fields

@@ -118,23 +118,25 @@ private SparseEmbeddingResults makeResults(List<String> input) {
for (int i = 0; i < input.size(); i++) {
var tokens = new ArrayList<SparseEmbeddingResults.WeightedToken>();
for (int j = 0; j < 5; j++) {
tokens.add(new SparseEmbeddingResults.WeightedToken(Integer.toString(j), (float) j));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes were needed in the test inference service to properly test sparse_vector with actual indexing

}

@Override
public Collection<ActionFilter> getActionFilters() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating an ActionFilter makes the inference action actually pluggable from the inference plugin - thanks Jim!

…into carlosdelest/semantic-text-field-mapping

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
#	server/src/main/java/org/elasticsearch/index/mapper/InferenceFieldMapper.java
#	x-pack/plugin/inference/build.gradle
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/20_semantic_text_field_mapper.yml
@carlosdelest carlosdelest changed the title semantic_text field mapper semantic_text field mapper and inference Apr 9, 2024
@carlosdelest carlosdelest changed the base branch from main to carlosdelest/semantic-text-inference-add-ons April 9, 2024 14:40
@carlosdelest carlosdelest changed the base branch from carlosdelest/semantic-text-inference-add-ons to main April 9, 2024 14:41
…anges' into carlosdelest/semantic-text-field-mapping
@carlosdelest carlosdelest force-pushed the carlosdelest/semantic-text-field-mapping branch from 4bf1ba3 to cab2534 Compare April 9, 2024 15:27
@carlosdelest carlosdelest force-pushed the carlosdelest/semantic-text-field-mapping branch from cab2534 to 0f57a5b Compare April 9, 2024 15:32
@carlosdelest carlosdelest added >feature :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team >new-field-mapper Added when a new field type / mapper is being introduced labels Apr 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

carlosdelest and others added 4 commits April 10, 2024 17:01
…mapping

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	server/src/main/java/org/elasticsearch/cluster/metadata/InferenceFieldMetadata.java
#	server/src/test/java/org/elasticsearch/cluster/metadata/IndexMetadataTests.java
…-field-mapping' into carlosdelest/semantic-text-field-mapping
@carlosdelest carlosdelest marked this pull request as ready for review April 10, 2024 15:05
@carlosdelest carlosdelest requested a review from a team as a code owner April 10, 2024 15:05
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@carlosdelest
Copy link
Member Author

Closing this PR - opening a separate PR for the field mapping, will open a new one with the inference changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature >new-field-mapper Added when a new field type / mapper is being introduced :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants