Ability to add metadata to an existing embedding #4715
Replies: 5 comments 2 replies
-
No, since the text in the

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.filter.Filter;

import java.util.List;
import java.util.Map;

public class AddMetadataExample {

    /**
     * Demonstrates how to add new metadata to existing embeddings in the store
     * without recalculating the embeddings themselves.
     *
     * Since EmbeddingStore does not have a dedicated update/patch method,
     * we use a remove-and-re-add approach: find matching entries, update their
     * TextSegment metadata in memory, remove the old entries, and re-add them
     * with the same IDs (preserving the original embeddings).
     */
    public static void addMetadataToMatchingEntries(
            EmbeddingStore<TextSegment> store,
            Embedding queryEmbedding,
            Filter filter,
            Map<String, Object> newMetadata) {
        // Step 1: Search for entries that match the filter
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .filter(filter)
                .maxResults(1000)
                .minScore(0.0)
                .build();
        EmbeddingSearchResult<TextSegment> result = store.search(request);
        List<EmbeddingMatch<TextSegment>> matches = result.matches();
        if (matches.isEmpty()) {
            return;
        }
        // Step 2: For each match, remove the old entry and re-add with updated metadata
        for (EmbeddingMatch<TextSegment> match : matches) {
            String id = match.embeddingId();
            Embedding embedding = match.embedding();
            TextSegment originalSegment = match.embedded();
            // Skip matches returned without a vector or segment; removing them would lose data
            if (embedding == null || originalSegment == null) {
                continue;
            }
            // Add the new metadata to the existing segment's metadata
            originalSegment.metadata().putAll(newMetadata);
            // Remove the old entry
            store.remove(id);
            // Re-add with the same ID; the embedding is NOT recalculated.
            // The EmbeddingStore interface has no add(id, embedding, segment) overload,
            // so the ID-preserving addAll is used here (the store must implement it).
            store.addAll(List.of(id), List.of(embedding), List.of(originalSegment));
        }
    }

    /**
     * Usage example
     */
    public static void main(String[] args) {
        // Assume you have an embedding store and model already set up
        // EmbeddingStore<TextSegment> store = ...;
        // EmbeddingModel embeddingModel = ...;
        // Original ingestion: documents are embedded and stored
        // with metadata like {"source": "wiki", "topic": "java"}
        // Later, you want to add a new metadata field "reviewed" = "true"
        // to all entries where topic = "java", WITHOUT re-embedding.
        // Use a dummy query embedding (we rely on the filter, not similarity)
        // Embedding queryEmbedding = embeddingModel.embed("dummy").content();
        // Filter filter = metadataKey("topic").isEqualTo("java");
        // Map<String, Object> newMetadata = Map.of("reviewed", "true");
        // addMetadataToMatchingEntries(store, queryEmbedding, filter, newMetadata);
    }
}
Key points:
- The embedding vector is preserved from the original EmbeddingMatch, so no re-embedding is needed.
- store.addAll(ids, embeddings, segments) re-adds the entry under the same ID, so the entry is effectively replaced. Stores that do not implement remove(id) or the ID-preserving addAll will throw, so check your store first.
- The search() call requires a queryEmbedding even though we only care about the filter. You can pass any embedding and set minScore(0.0) so that no filtered result is dropped because of its score. This is a limitation of the current API.
- maxResults should be set high enough to cover all matching entries. The search request has no offset, so with many entries you may need to process them in several passes.
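The last point can be handled without offsets if every pass only selects entries that still lack the new value: updated entries then drop out of the next search, and the loop ends when nothing is left. The following is a minimal sketch of that idea under stated assumptions. `Entry`, `TinyStore`, and `addMetadataInBatches` are hypothetical stand-ins written for this example, not langchain4j types, and a real store would need a filter that can express "key missing or different".

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class BatchedMetadataUpdate {

    /** Hypothetical stored entry: ID, vector, and metadata. */
    static final class Entry {
        final String id;
        final float[] vector;
        final Map<String, Object> metadata;

        Entry(String id, float[] vector, Map<String, Object> metadata) {
            this.id = id;
            this.vector = vector;
            this.metadata = metadata;
        }
    }

    /** Hypothetical store whose search returns at most maxResults entries and has no offset. */
    static final class TinyStore {
        private final Map<String, Entry> entries = new LinkedHashMap<>();

        void add(Entry entry) { entries.put(entry.id, entry); }
        void remove(String id) { entries.remove(id); }
        Entry get(String id) { return entries.get(id); }

        List<Entry> search(Predicate<Map<String, Object>> filter, int maxResults) {
            return entries.values().stream()
                    .filter(entry -> filter.test(entry.metadata))
                    .limit(maxResults)
                    .collect(Collectors.toList());
        }
    }

    /** Adds key=value to every entry matching the filter, batchSize entries per pass. */
    static int addMetadataInBatches(TinyStore store,
                                    Predicate<Map<String, Object>> filter,
                                    String key,
                                    Object value,
                                    int batchSize) {
        // Select only entries that do not carry the new value yet,
        // so each pass shrinks the set of remaining matches.
        Predicate<Map<String, Object>> pending =
                filter.and(metadata -> !value.equals(metadata.get(key)));
        int updated = 0;
        while (true) {
            List<Entry> batch = store.search(pending, batchSize);
            if (batch.isEmpty()) {
                return updated;
            }
            for (Entry old : batch) {
                Map<String, Object> merged = new HashMap<>(old.metadata);
                merged.put(key, value);
                store.remove(old.id);
                // Same ID and same vector: nothing is re-embedded.
                store.add(new Entry(old.id, old.vector, merged));
                updated++;
            }
        }
    }
}
```

Each pass costs one search plus one remove and one add per entry, so the total work is the same as a single large pass. Only the size of each result set is bounded.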
-
Perhaps some vector stores support this operation natively (feel free to analyze which ones). I can also imagine this being implemented as a default method in the
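To make the default-method idea concrete, here is a rough sketch under stated assumptions. `MetadataAwareStore`, `addMetadata`, `InMemoryMetadataStore`, and `NativeUpdateStore` are hypothetical names invented for this example and are not part of langchain4j. The interface supplies a generic remove-and-re-add fallback, and a store with native support overrides it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store interface; NOT the langchain4j EmbeddingStore API. */
interface MetadataAwareStore {

    /** Hypothetical stored entry: vector plus metadata. */
    final class Entry {
        final float[] vector;
        final Map<String, Object> metadata;

        Entry(float[] vector, Map<String, Object> metadata) {
            this.vector = vector;
            this.metadata = metadata;
        }
    }

    Entry get(String id);

    void remove(String id);

    void add(String id, float[] vector, Map<String, Object> metadata);

    /**
     * Generic fallback: read the entry, merge the metadata, remove it,
     * and re-add it under the same ID with the same vector.
     */
    default void addMetadata(String id, Map<String, Object> newMetadata) {
        Entry old = get(id);
        if (old == null) {
            throw new IllegalArgumentException("No entry with id " + id);
        }
        Map<String, Object> merged = new HashMap<>(old.metadata);
        merged.putAll(newMetadata);
        remove(id);
        add(id, old.vector, merged);
    }
}

/** In-memory store that relies on the default fallback. */
class InMemoryMetadataStore implements MetadataAwareStore {
    private final Map<String, Entry> entries = new HashMap<>();

    @Override
    public Entry get(String id) { return entries.get(id); }

    @Override
    public void remove(String id) { entries.remove(id); }

    @Override
    public void add(String id, float[] vector, Map<String, Object> metadata) {
        entries.put(id, new Entry(vector, new HashMap<>(metadata)));
    }
}

/** Store that can patch metadata in place and therefore overrides the default. */
class NativeUpdateStore extends InMemoryMetadataStore {
    int nativeUpdates = 0;

    @Override
    public void addMetadata(String id, Map<String, Object> newMetadata) {
        Entry entry = get(id);
        if (entry == null) {
            throw new IllegalArgumentException("No entry with id " + id);
        }
        entry.metadata.putAll(newMetadata); // no remove, no re-add
        nativeUpdates++;
    }
}
```

With this shape, callers use `addMetadata` on any store, and only stores with a cheaper native path need extra code.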
-
Before implementing this, it seems to me that the first step towards it should be a proper way to update entries in the EmbeddingStore.
-
Doesn't your workaround lack paging? Is it possible that the full result set wouldn't be returned, since you have to create an embedding of the query string? Your workaround could also be expensive, considering you're removing each entry and then re-adding it to the store. I'm wondering if a new SPI method for this should just throw NotImplementedException or something. Thanks for responding.
-
Of course, but that's just an example.
What do you mean?
Yes, it can be expensive. But which is better: a slow feature or no feature at all?
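The two positions in this exchange can be combined: an opt-in SPI method whose default throws, plus a caller-side helper that falls back to the slow path when a store does not support it. The sketch below assumes hypothetical names (`PatchableStore`, `patchMetadata`, `MetadataUpdater`, `BasicStore`) that do not exist in langchain4j, and it uses Java's standard `UnsupportedOperationException` in place of the `NotImplementedException` mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical SPI; the names are illustrative and not part of langchain4j. */
interface PatchableStore {
    float[] vectorOf(String id);

    Map<String, Object> metadataOf(String id);

    void remove(String id);

    void add(String id, float[] vector, Map<String, Object> metadata);

    /** Opt-in native update: stores that cannot patch in place keep this default. */
    default void patchMetadata(String id, Map<String, Object> newMetadata) {
        throw new UnsupportedOperationException(
                "patchMetadata is not supported by " + getClass().getName());
    }
}

final class MetadataUpdater {

    /** Returns true if the native path was used, false if the slow fallback ran. */
    static boolean update(PatchableStore store, String id, Map<String, Object> newMetadata) {
        try {
            store.patchMetadata(id, newMetadata);
            return true;
        } catch (UnsupportedOperationException e) {
            // Slow path: keep the vector, merge the metadata, remove, and re-add.
            float[] vector = store.vectorOf(id);
            Map<String, Object> merged = new HashMap<>(store.metadataOf(id));
            merged.putAll(newMetadata);
            store.remove(id);
            store.add(id, vector, merged);
            return false;
        }
    }
}

/** Minimal in-memory store that does not override patchMetadata. */
class BasicStore implements PatchableStore {
    private final Map<String, float[]> vectors = new HashMap<>();
    private final Map<String, Map<String, Object>> metadata = new HashMap<>();

    @Override
    public float[] vectorOf(String id) { return vectors.get(id); }

    @Override
    public Map<String, Object> metadataOf(String id) { return metadata.get(id); }

    @Override
    public void remove(String id) {
        vectors.remove(id);
        metadata.remove(id);
    }

    @Override
    public void add(String id, float[] vector, Map<String, Object> meta) {
        vectors.put(id, vector);
        metadata.put(id, new HashMap<>(meta));
    }
}
```

The boolean return value lets a caller log or measure how often the expensive path is taken before deciding whether a native implementation is worth writing for a given store.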
-
I don't see a way to add metadata to existing TextSegment entries in an embedding store. Creating embedding entries is expensive. As you try to improve the results of your RAG queries, you often want to calculate and add new metadata to filter on. Right now, with the langchain4j APIs, adding new metadata to an entry requires recalculating the entire embedding. (Am I correct on this?)
It would be cool to add this method to EmbeddingStore and add support for it in those vector DBs that support it:
Volunteering to do this because I need it. If there is an existing way to do it, or anybody can think of a better way of defining this API, let me know. Thanks.