[ML] Replace "text embedding" with "dense embedding" #136321
The name "text embedding" is used in many places where dense vector embeddings are handled, despite the type of the embedding vector not being exclusive to text embeddings. For example, image or multimodal embeddings may also produce a dense vector. To allow future reuse of classes related to dense vectors with multimodal embeddings, the naming is being changed to the more general "dense embedding". Classes which explicitly relate to text embeddings are not being renamed.
This rename is internal to the code only and does not change the name of any JSON objects which currently use "text_embedding", as doing so would be a breaking change.
- For everything not exclusively related to text embedding, rename classes, methods and variables to use "dense embedding" instead of "text embedding"
- Use correct class name in ElasticTextEmbeddingPayload.TextEmbeddingFloat.PARSER
- Correct the javadoc in DenseEmbeddingBitResults
Pinging @elastic/ml-core (Team:ML)
| private static final ConstructingObjectParser<TextEmbeddingFloatResults, Void> PARSER = new ConstructingObjectParser<>( | ||
| TextEmbeddingByteResults.class.getSimpleName(), | ||
| private static final ConstructingObjectParser<DenseEmbeddingFloatResults, Void> PARSER = new ConstructingObjectParser<>( | ||
| DenseEmbeddingFloatResults.class.getSimpleName(), |
This parser was previously using the incorrect class name, so any error encountered while parsing would have reported the wrong class.
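To illustrate why the name passed to a parser matters, here is a minimal sketch. `NamedParser`, `FloatResults` and `ByteResults` are hypothetical stand-ins, not the Elasticsearch `ConstructingObjectParser` API; the sketch only assumes that a parser includes its configured name in the errors it raises.

```java
import java.util.function.Function;

public class ParserNameDemo {
    // Hypothetical stand-ins for the two results classes.
    static class FloatResults {}
    static class ByteResults {}

    // Hypothetical parser that prefixes every failure with its configured name.
    static final class NamedParser<T> {
        private final String name;
        private final Function<String, T> parseFn;

        NamedParser(String name, Function<String, T> parseFn) {
            this.name = name;
            this.parseFn = parseFn;
        }

        T parse(String input) {
            try {
                return parseFn.apply(input);
            } catch (RuntimeException e) {
                // The name is the only hint about which parser failed.
                throw new IllegalStateException("[" + name + "] failed to parse [" + input + "]", e);
            }
        }
    }

    // Returns the error message produced when the parser is given bad input.
    static String failureMessage(NamedParser<Float> parser) {
        try {
            parser.parse("not-a-number");
            return "no failure";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Mislabelled: a float parser registered under the byte class name.
        NamedParser<Float> mislabelled =
            new NamedParser<Float>(ByteResults.class.getSimpleName(), s -> Float.valueOf(s));
        // Corrected: the name matches the class being parsed.
        NamedParser<Float> corrected =
            new NamedParser<Float>(FloatResults.class.getSimpleName(), s -> Float.valueOf(s));

        System.out.println(failureMessage(mislabelled));
        System.out.println(failureMessage(corrected));
    }
}
```

With the mislabelled parser, the failure message points at `ByteResults` even though float parsing failed, which is the kind of misleading report the fix avoids.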
davidkyle left a comment
LGTM
| namedWriteables.add(new NamedWriteableRegistry.Entry(InferenceResults.class, TextExpansionResults.NAME, TextExpansionResults::new)); | ||
| namedWriteables.add( | ||
| new NamedWriteableRegistry.Entry(InferenceResults.class, MlTextEmbeddingResults.NAME, MlTextEmbeddingResults::new) | ||
| new NamedWriteableRegistry.Entry(InferenceResults.class, MlDenseEmbeddingResults.NAME, MlDenseEmbeddingResults::new) |
++ this is why NAME can't change
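A rough sketch of why the registered name has to stay stable even though the class is renamed. The registry below is a plain map, not the real `NamedWriteableRegistry`, and the `NAME` value is a hypothetical placeholder rather than the constant used in Elasticsearch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class WireNameDemo {
    interface Results {
        String writeableName();
    }

    // The class has been renamed, but the name written to the wire is unchanged.
    static class DenseEmbeddingResults implements Results {
        static final String NAME = "text_embedding_result"; // hypothetical value

        @Override
        public String writeableName() {
            return NAME;
        }
    }

    // Hypothetical registry: readers are looked up by the name found in the stream.
    static final Map<String, Supplier<Results>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put(DenseEmbeddingResults.NAME, DenseEmbeddingResults::new);
    }

    static Results read(String nameFromWire) {
        Supplier<Results> reader = REGISTRY.get(nameFromWire);
        if (reader == null) {
            throw new IllegalArgumentException("Unknown named writeable [" + nameFromWire + "]");
        }
        return reader.get();
    }

    public static void main(String[] args) {
        // A peer that still sends the old name is read by the renamed class.
        System.out.println(read("text_embedding_result").getClass().getSimpleName());
    }
}
```

Had `NAME` been changed along with the class, a sender still using the old name would hit the unknown-name branch, so only the Java identifier is renamed.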
| + embedding.embeddings().get(0).getClass().getName() | ||
| + ". Expected TextEmbeddingFloatResults.Embedding or TextEmbeddingByteResults.Embedding." | ||
| + ". Expected " | ||
| + DenseEmbeddingFloatResults.Embedding.class.getName() |
getSimpleName() is the class name without the package prefix and is a bit more readable
| + DenseEmbeddingFloatResults.Embedding.class.getName() | |
| + DenseEmbeddingFloatResults.Embedding.class.getSimpleName() |
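For reference, a small sketch of how the two methods differ for a nested class. `FloatResults` and its nested `Embedding` are hypothetical stand-ins for the classes in the diff.

```java
public class ClassNameDemo {
    // Hypothetical stand-in for a results class with a nested Embedding type.
    static class FloatResults {
        static class Embedding {}
    }

    // Builds an error message fragment using the short class name.
    static String expectedMessage(Class<?> expected) {
        return "Expected " + expected.getSimpleName() + ".";
    }

    public static void main(String[] args) {
        Class<?> embedding = FloatResults.Embedding.class;
        // Binary name: package (if any) plus enclosing classes joined with '$'.
        System.out.println(embedding.getName());
        // Simple name: only the innermost class name.
        System.out.println(embedding.getSimpleName());
        System.out.println(expectedMessage(embedding));
    }
}
```

For a nested class, `getSimpleName()` drops the enclosing class as well as the package, so the message here reads just `Embedding`; whether that is specific enough depends on the surrounding text of the error.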
| "Validation call did not return expected results type." | ||
| + "Expected a result of type [" | ||
| + TextEmbeddingFloatResults.NAME | ||
| + DenseEmbeddingFloatResults.NAME |
| + DenseEmbeddingFloatResults.NAME | |
| + DenseEmbeddingResults.NAME |
The name without the specific element type
DenseEmbeddingResults is an interface and doesn't have a NAME constant, so I'll use the simple class name instead.
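A sketch of that approach, assuming an interface with no `NAME` constant and element-specific implementations that each define their own. All types, `NAME` values and the "got" suffix of the message are hypothetical.

```java
public class InterfaceNameDemo {
    // Hypothetical interface shared by all dense embedding result types; no NAME here.
    interface DenseEmbeddingResults {}

    static class FloatResults implements DenseEmbeddingResults {
        static final String NAME = "float_results"; // hypothetical value
    }

    static class ByteResults implements DenseEmbeddingResults {
        static final String NAME = "byte_results"; // hypothetical value
    }

    // Refers to the general type via the interface's simple class name.
    static String unexpectedTypeMessage(Object actual) {
        return "Validation call did not return expected results type. Expected a result of type ["
            + DenseEmbeddingResults.class.getSimpleName()
            + "] got ["
            + actual.getClass().getSimpleName()
            + "]";
    }

    public static void main(String[] args) {
        System.out.println(unexpectedTypeMessage("not a result"));
    }
}
```

This keeps the message independent of the element type, at the cost of printing a Java class name rather than a registered name.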