Question about Vector Insertion and Index Building #5023
-
Hi there. First, I have a few questions about my experiment with Milvus. I also have some questions about vector insertion and when the index gets built.
--- my questions here ---
ii. I got extremely poor test results from 1. h.: the search time is extremely high, around 700 seconds per query. It looks like Milvus did a brute-force search during the test, even though the result of the get_collection_stats function shows that all segments already have a PQ index. So I'm wondering whether Milvus used the index for searching or just did a brute-force search. Also, does Milvus really build the index during insertion?
iii. I notice that there are two background operations in Milvus: merging segments and building the index. If both operations are needed, which one is done first? Also, can I trigger the merge operation manually?
iv. Suppose I keep inserting data continuously into a partition of a collection with a small auto_flush_interval [1 second]. Would the small auto_flush_interval increase the number of segments in this partition?
v. The "Performance Tuning" section says: "In scenario with continuous data insertion, because Milvus does not index segments with a size less than index_file_size, it uses brute-force search as the query method." Does this mean that once a segment grows beyond index_file_size, its index is built immediately?
Replies: 2 comments 10 replies
-
No, it doesn't.
Please refer to create_index.
Please refer to merge-data.
Yes, it will, within a short time.
It needs to wait for the segments to be merged.
-
The merge operation only happens on segments whose size is less than index_file_size. There are two ways to create an index:
In way_1, Milvus automatically builds the index for segments whose size is larger than index_file_size. For your question ii, I guess you are using way_1. Since the index type has already been specified, all the segments are marked as "PQ", but their index-building tasks may still be running, so the index is not finished yet. That would explain why the search behaved like a brute-force search.
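The size rule described above can be sketched in a few lines of plain Python. This is only a rough model, not Milvus code: it assumes float32 vectors and counts only the raw vector bytes (IDs and metadata are ignored), and the names `Segment`, `raw_size_mb`, `plan_segments`, and `rows_to_reach` are hypothetical helpers made up for illustration. Treating a segment exactly at the threshold as index-eligible is also an assumption, since the thread only says "less than" and "larger than".

```python
from dataclasses import dataclass

BYTES_PER_FLOAT32 = 4
MB = 1024 * 1024


@dataclass
class Segment:
    """Hypothetical stand-in for a segment; not a Milvus type."""
    name: str
    row_count: int


def raw_size_mb(row_count: int, dimension: int) -> float:
    """Approximate raw size of float32 vectors, ignoring IDs and metadata."""
    return row_count * dimension * BYTES_PER_FLOAT32 / MB


def plan_segments(segments, dimension, index_file_size_mb=1024):
    """Split segments into merge candidates and index-eligible ones.

    Segments below the threshold are merge candidates and would be
    searched by brute force; the rest are eligible for index building.
    Equality is treated as eligible (an assumption).
    """
    merge_candidates, index_eligible = [], []
    for seg in segments:
        if raw_size_mb(seg.row_count, dimension) < index_file_size_mb:
            merge_candidates.append(seg.name)
        else:
            index_eligible.append(seg.name)
    return merge_candidates, index_eligible


def rows_to_reach(index_file_size_mb: int, dimension: int) -> int:
    """Smallest row count whose raw float32 data reaches the threshold."""
    bytes_needed = index_file_size_mb * MB
    bytes_per_row = dimension * BYTES_PER_FLOAT32
    return -(-bytes_needed // bytes_per_row)  # ceiling division


if __name__ == "__main__":
    # The dimension of 128 is an assumed example value.
    segs = [Segment("seg_small", 100_000), Segment("seg_big", 2_097_152)]
    print(plan_segments(segs, dimension=128))
    print(rows_to_reach(1024, dimension=128))
```

Under these assumptions, a collection whose segments all stay below the threshold would have nothing eligible for index building, which is consistent with the slow, brute-force-like searches reported in question ii.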
No, it doesn't.
Disk IO is the main factor that affects the insertion performance. Multi-threading doesn't help it.
Please refer to create_index.
According to your description, you first specified the index and then inserted the data. In that case, the inserted data must reach index_file_size (1024 MB) before the index is created. What is the row_count of your collection 1?
But if your data is static, you can insert the data before specif…