Strategy for Inserting 100 Million Documents into Vector Database #32754
Unanswered
MartinMashalov asked this question in Q&A
Replies: 2 comments · 1 reply
-
We recommend batch insert for Milvus.
-
Thank you for your response! How can I do this batch insert (from a JSON file, or through some Python integration)? Any reference to documentation would be super helpful.
Thank you so much!
Martin
…
On May 2, 2024, at 04:42, groot ***@***.***> wrote:
We recommend batch insert for Milvus.
The RPC transfer size limit is 64 MB for each insert call, so it is better to insert data batch by batch, with each batch between 20 and 40 MB.
Each dimension is a float32 value (4 bytes), so a 1536-dim embedding takes about 6 KB and you can insert roughly 3,000 to 7,000 rows per batch. If other metadata accompanies the embeddings, especially long strings, you might reduce the count to 1,000 to 2,000 rows per batch.
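The sizing advice above can be sketched in Python with pymilvus. This is a minimal sketch, not a definitive implementation: the server URI, the collection name `docs`, and the row layout are placeholders not taken from the thread, and the collection is assumed to already exist with a matching schema.

```python
# Minimal sketch of batched insertion into Milvus with pymilvus.
# Assumptions (not from the thread): a Milvus server at localhost:19530,
# an existing collection named "docs", and rows loaded as dicts from JSON.

BATCH_SIZE = 5000  # 1536 dims x 4 bytes ~ 6 KB/vector, so ~30 MB per batch

def batches(rows, size=BATCH_SIZE):
    """Yield successive fixed-size slices of a list of row dicts."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def insert_all(rows, uri="http://localhost:19530", collection="docs"):
    """Insert all rows batch by batch, keeping each RPC under the 64 MB limit."""
    from pymilvus import MilvusClient  # pip install pymilvus
    client = MilvusClient(uri=uri)
    for batch in batches(rows):
        # Each row is a dict matching the collection schema, e.g.
        # {"id": 0, "vector": [...1536 floats...], "text": "..."}
        client.insert(collection_name=collection, data=batch)
```

For 100 million rows at 5,000 rows per call this is about 20,000 insert calls, which can be parallelized across workers if a single writer is too slow.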
-
Hi there!
I am trying to insert 100 million documents into the Milvus vector database, using OpenAI embeddings for the documents. Since I am new to Milvus, I am wondering what the best strategy would be to insert all of these documents efficiently and as quickly as possible. Would batch inserting be a good idea? Has anyone else encountered a similar problem and figured out how to overcome it? I would appreciate any guidance on this issue.
Thank you.