How to clean the embeddings table data? #23351
Replies: 2 comments 2 replies
There's no direct foreign key or automatic cleanup between document_segments and embeddings in Dify. The link is indirect, based on a hash of the segment content, the model name, and the provider name. The embeddings table acts as a persistent cache and will accumulate orphaned rows over time as segments are deleted or changed.

To clean up embeddings, you need to identify which rows are no longer referenced by any current segment. Since document_segments doesn't store the hash, you'll need to recompute the hash for each segment's content (using the same logic as Dify's embedding cache), then compare those hashes to what's in the embeddings table. Any embedding whose hash/model/provider combination isn't referenced by any current segment can be safely deleted.

There's no built-in script for this, so you'll need to write a cleanup script or SQL query. Always back up your database before running deletes. For large tables, delete in batches and consider locking or running during low-traffic periods to avoid performance issues.

Best practice: schedule regular maintenance to remove unreferenced embeddings, and monitor the size of your embeddings table to prevent future bloat.
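The comparison described above can be sketched in Python roughly as follows. This is a minimal illustration, not Dify's own code. The function names, the row shapes, and the plain SHA-256 in `text_hash` are all assumptions, so replace `text_hash` with the exact hash function your Dify version uses for its embedding cache. Loading the rows from the database, and working out the model and provider for each segment, is left to the caller.

```python
from hashlib import sha256
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical row shapes, chosen for illustration only:
#   segment row:   (content, model_name, provider_name)
#   embedding row: (id, hash, model_name, provider_name)
SegmentRow = Tuple[str, str, str]
EmbeddingRow = Tuple[int, str, str, str]


def text_hash(text: str) -> str:
    # ASSUMPTION: placeholder hash. It must be swapped for the same logic
    # Dify's embedding cache uses, otherwise every row will look orphaned.
    return sha256(text.encode("utf-8")).hexdigest()


def find_orphaned_embedding_ids(
    segments: Iterable[SegmentRow],
    embeddings: Iterable[EmbeddingRow],
    hash_fn: Callable[[str], str] = text_hash,
) -> List[int]:
    """Return ids of embeddings whose (hash, model, provider) key is not
    produced by any current segment."""
    live_keys = {
        (hash_fn(content), model, provider)
        for content, model, provider in segments
    }
    return [
        emb_id
        for emb_id, emb_hash, model, provider in embeddings
        if (emb_hash, model, provider) not in live_keys
    ]


def batched(ids: List[int], size: int) -> Iterator[List[int]]:
    """Split ids into chunks so deletes can be issued in small batches."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    segments = [("hello", "model-a", "provider-x")]
    embeddings = [
        (1, text_hash("hello"), "model-a", "provider-x"),  # still referenced
        (2, text_hash("gone"), "model-a", "provider-x"),   # segment was deleted
        (3, text_hash("hello"), "model-b", "provider-x"),  # different model
    ]
    orphans = find_orphaned_embedding_ids(segments, embeddings)
    for chunk in batched(orphans, 1000):
        # In a real script, each chunk would feed one DELETE ... WHERE id IN (...)
        print("would delete embedding ids:", chunk)
```

With a 60 GB table, holding every key in memory may not be practical. Writing the recomputed segment keys to a temporary table and comparing inside the database is one possible alternative.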
What would the impact be if the data in the embeddings table were deleted directly?
Content
What is the correspondence between the document_segments table and the embeddings table? I need to clean up some data in the document_segments table. How can I also clean up the corresponding data in the embeddings table? The embeddings table is already 60 GB and needs to be cleaned up.