AISAQ 1.2B vectors: 600s+ query latency despite vectorIndex: sync warmup #49397

Se7enquick · 2026-04-27T15:28:15Z

Setup:

Milvus 2.6.15, 5 QueryNodes (i4i.4xlarge: 12 CPU, 96GB RAM, 3TB NVMe each)
3 collections: titles (322M), abstracts (260M), papers (1.2B with dense+sparse)
1024-dim float32 vectors, IP metric, AISAQ index
RAM usage: ~30GB/96GB per node (plenty of headroom)
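For scale, here is a rough sizing sketch of the raw dense vectors alone. It uses the row counts and the 1024-dim float32 format from the setup above. It assumes an even spread over the 5 QueryNodes and ignores the graph, PQ codes, the sparse field, scalar fields, and replicas, so real on-disk index sizes will be larger.

```python
# Back-of-envelope sizing: raw float32 dense vectors only (no graph, PQ codes,
# sparse field, scalar fields, or replicas).
BYTES_PER_FLOAT32 = 4
DIM = 1024
QUERY_NODES = 5

COLLECTIONS = {
    "titles": 322_000_000,
    "abstracts": 260_000_000,
    "papers": 1_200_000_000,
}


def raw_vector_bytes(rows: int, dim: int = DIM) -> int:
    """Size of rows x dim float32 values, in bytes."""
    return rows * dim * BYTES_PER_FLOAT32


if __name__ == "__main__":
    total = 0
    for name, rows in COLLECTIONS.items():
        size = raw_vector_bytes(rows)
        total += size
        print(f"{name:>9}: {size / 1e9:8.1f} GB")
    print(f"{'total':>9}: {total / 1e9:8.1f} GB")
    # Even spread is an assumption; real segment placement varies.
    print(f"per node over {QUERY_NODES} nodes: {total / QUERY_NODES / 1e9:.1f} GB")
```

With these inputs, the papers collection alone is about 4.9 TB of raw float32 data. That is why a disk-resident index is needed on 96 GB nodes.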

AISAQ build params:
inline_pq: -1, rearrange: true, max_degree: 56, search_list_size: 100,
num_entry_points: 1000, pq_cache_size: 16777216 (16MB),
search_cache_budget_gb_ratio: 0, pq_code_budget_gb_ratio: 0.125

Tiered storage config:
warmup:
  scalarField: sync
  scalarIndex: sync
  vectorField: disable
  vectorIndex: sync

Problem:
After all segments are Loaded (100%) with vectorIndex: sync, the first search query on papers_embeddings takes 600-3900 seconds. Subsequent queries take 20-70s, then gradually improve to <5s. Titles and abstracts behave similarly, but with shorter warmup times (10-200s for the first query). After some time, however, queries become slow again.

QueryNode logs show "AiSAQ Other Exception: Future was cancelled" on many segments during the first search, even though sync warmup completed successfully (each segment loads in ~20ms). The sync warmup loads PQ pivots, entry points, and rearrange metadata, but the actual _disk.index graph files on NVMe are only read during the first search, causing massive I/O contention across ~584 segments per node.

For papers specifically, the first query often hits proxy timeout (~80 min) and returns UNAVAILABLE, making the collection effectively unusable after any pod restart.

What we tried (no improvement for cold start):

gracefulStopTimeout: 30, lazyload.requestResourceTimeout: 15000, maxRetryTimes: 5
pq_read_page_cache_size: 31457280 (30MB per thread)
beamwidth: 8, vectors_beamwidth: 2
memoryLowWatermarkRatio: 0.75 / 0.85
Resource groups (separating collections onto different nodes): made papers worse
Collection-level warmup override (warmup.vectorIndex: sync)

What we can't do:

alter_index_properties for search_cache_budget_gb_ratio — returns "not a configable index property"
mmap.vectorIndex: true — AISAQ doesn't support mmap
enableDisk: false — AISAQ requires disk-based search

Questions:

Is this expected behavior for AISAQ with search_cache_budget_gb_ratio=0? Does sync warmup only load metadata and NOT pre-read _disk.index graph files into OS page cache?
Would rebuilding with search_cache_budget_gb_ratio=0.03 actually eliminate the cold start, or would it only help with repeated queries?
Is there any way to pre-warm the NVMe page cache for AISAQ disk.index files after segment load, before the segment becomes queryable?
The Ask AI bot keeps recommending enableDisk: false and mmap.vectorIndex: false for AISAQ — this contradicts AISAQ being a disk-based index. Can someone clarify?

yhmo · 2026-04-28T09:40:40Z

Collaborator

Let's wait for @foxspy to answer the questions.

foxspy · 2026-04-28T12:54:42Z

Collaborator

Based on the behavior described above, my assessment is that this does not fully match expectations. AISAQ uses O_DIRECT I/O, so we should not expect OS page-cache-based warmup behavior.

Could you share the exact query pattern or test procedure you used?

One important detail is that inline_pq=-1 means no PQ vectors are inlined into the graph nodes, so random I/O during graph search will hit the disk directly, which can make queries very slow. Please also verify whether the results returned by the 5-second and 600-second queries are consistent.

Could you share the query script, QueryNode WARN/ERROR logs around load and first search, perf flame graphs, and CPU/NVMe metrics such as iostat -x 1? Please also run the same query in both the 5s and 600s cases and compare the returned topK IDs/distances.
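Here is a minimal sketch of the comparison requested above. Given the topK hits from a slow (cold) run and a fast (warm) run of the same query, it reports the ID overlap, whether the ranking is identical, and which shared IDs have differing scores. The `(id, distance)` tuple shape and the `compare_topk` helper are hypothetical. Build the lists from whatever your client returns, for example the `id` and `distance` fields of each hit.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical hit shape: (primary key, IP distance) for one query's topK.
Hit = Tuple[int, float]


def compare_topk(cold: Sequence[Hit], warm: Sequence[Hit], tol: float = 1e-4) -> Dict[str, object]:
    """Compare two topK result lists produced for the same query vector."""
    cold_ids = [pk for pk, _ in cold]
    warm_ids = [pk for pk, _ in warm]
    shared = set(cold_ids) & set(warm_ids)
    k = max(len(cold_ids), len(warm_ids)) or 1  # avoid division by zero on empty results

    cold_scores = dict(cold)
    warm_scores = dict(warm)
    # Shared IDs whose scores differ by more than tol (empty for identical results).
    score_mismatches: List[int] = sorted(
        pk for pk in shared if abs(cold_scores[pk] - warm_scores[pk]) > tol
    )
    return {
        "overlap_ratio": len(shared) / k,
        "same_ranking": cold_ids == warm_ids,
        "score_mismatches": score_mismatches,
    }


if __name__ == "__main__":
    cold = [(101, 0.91), (102, 0.88), (103, 0.85)]
    warm = [(101, 0.91), (103, 0.85), (104, 0.80)]
    print(compare_topk(cold, warm))
```

An overlap ratio of 1.0 with no score mismatches would indicate that the slow and fast runs differ only in latency, not in what the index returns.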

Se7enquick Apr 29, 2026
Author

Thank you very much
I am rebuilding the index now with these parameters:

inline_pq=0 - scale mode,
pq_cache_size='16777216' - 16MB per segment in RAM (I hope I got it right and this is a per-segment parameter, not per index)
search_cache_budget_gb_ratio=0.05 - hot cache
rearrange=true - disk locality
max_degree=48 - slightly lower

We have default timeout settings.
Brute-force search is definitely not the cause. This collection was working with default settings on 5 QueryNodes, and after a cold start with warmup: sync it performed very well (~5 seconds per query with topK 16-32). But after we added the other 2 collections (each ~300 million embeddings), they started competing for resources and every query to a different collection triggered a cold start, so the cache was constantly being rebuilt. We solved this by creating resource groups: we now have 3 resource groups, with 2 nodes for the smaller collections and 3 for the bigger one. I'll test it after the index is rebuilt.

We are specifically using AISAQ to lower our costs: our resources are limited to 5 QueryNodes with 96 GB RAM each, and speed is not the top priority. Why do you think DiskANN would be faster and better for us? We hadn't considered it, since AISAQ is an evolution of DiskANN.

Thank you very much for the clarification, I really appreciate it!

xiaofan-luan Apr 29, 2026
Maintainer

If you are working on a discovery search use case, do take a look at our latest Zilliz Cloud on-demand search feature.
The whole index is stored on S3 and search takes 10-30 s, but it is fully pay-as-you-go, so we only charge storage costs when no searches are active. The product will be released on May 7th!

Se7enquick Apr 29, 2026
Author

We used Zilliz before, but it became too expensive for our company's infrastructure budget; that's why we're moving to self-hosted.

xiaofan-luan Apr 30, 2026
Maintainer

We are rolling out a new product named lake search, where you can put all your data and indexes on your own object storage. This can cut your costs by up to ten times. If you are interested in being one of our pilot users, please reach out to me at james.luan@zilliz.com

Se7enquick Apr 30, 2026
Author

Sounds good, we will definitely think about it!

For now we still have problems: after rebuilding, the index looks like it no longer fits in memory...

{"level":"ERROR","time":"2026/04/30 01:51:42.393 +00:00","caller":"funcutil/parallel.go:89","message":"loadSegmentFunc","error":"At LoadSegment: At Load: ChunkedSegmentSealedImpl::LoadFieldData() cancelled for segment 465896860654863196 field 0","errorVerbose":"At LoadSegment: At Load: ChunkedSegmentSealedImpl::LoadFieldData() cancelled for segment 465896860654863196 field 0\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentLoader).Load.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment_loader.go:386\n | [...repeated from below...]\nWraps: (2) At LoadSegment\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentLoader).loadSealedSegment.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment_loader.go:939\n | github.com/milvus-io/milvus/pkg/v2/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:82\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.11.3/worker.go:73\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1693\nWraps: (4) At Load\nWraps: (5) ChunkedSegmentSealedImpl::LoadFieldData() cancelled for segment 465896860654863196 field 0\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) merr.milvusError","idx":0,"stack":"github.com/milvus-io/milvus/pkg/v2/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/pkg/util/funcutil/parallel.go:89"}

xiaofan-luan · 2026-04-30T02:56:41Z

Maintainer

The load was cancelled, possibly because loading is too slow.
You will need to increase the timeout if possible.

Se7enquick Apr 30, 2026
Author

The thing is that after ~10 minutes it shows 20/96 GB RAM usage and 0% loaded, so it is probably an OOM risk again, because search_cache_budget_gb_ratio=0.05 is too high...
The problem is that queryCoord.segmentTaskTimeout (default 120s) cancels segment loading before it completes. Each segment takes ~10s to download from S3 and deserialize, but with 1458 segments loading in parallel batches, the total exceeds the timeout. We set queryCoord.segmentTaskTimeout: 600000 in the config, but it doesn't seem to take effect. With search_cache_budget_gb_ratio=0.05, the _cached_nodes.bin files add ~5s per segment. We're now rebuilding with search_cache_budget_gb_ratio=0 to eliminate this. What is the correct way to increase the segment load timeout?
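Here is a rough way to see why 0.05 might be too high. It assumes search_cache_budget_gb_ratio is a fraction of the raw vector data size, as Milvus documents it for DiskANN, and that AISAQ applies the same definition. That equivalence, the even spread across nodes, and the node counts tried below are all assumptions. Using `Fraction` keeps the ratio arithmetic exact.

```python
from fractions import Fraction

BYTES_PER_FLOAT32 = 4


def search_cache_bytes(rows: int, dim: int, ratio: float) -> int:
    """Cache budget = ratio x raw float32 vector size (assumed definition)."""
    raw = rows * dim * BYTES_PER_FLOAT32
    # str() first so 0.05 becomes exactly 1/20 instead of a binary float approximation.
    return int(raw * Fraction(str(ratio)))


def per_node_bytes(total: int, nodes: int) -> int:
    """Even split across nodes; real segment placement is rarely perfectly even."""
    return total // nodes


if __name__ == "__main__":
    total = search_cache_bytes(1_200_000_000, 1024, 0.05)
    for nodes in (3, 5, 6):
        print(f"{nodes} nodes: {per_node_bytes(total, nodes) / 1e9:.1f} GB cache per node")
```

Under these assumptions, the papers collection alone would need about 246 GB of search cache in total. That is roughly 82 GB per node in a 3-node resource group, which leaves little room on a 96 GB node.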

Could you advise an optimized config for our case? We can add 1 more node (6 × 96 GB).

xiaofan-luan Apr 30, 2026
Maintainer

The default load timeout is only 10 minutes.
If loading lasts more than 10 minutes, the whole load operation will fail and roll back.

You will need to modify queryCoord.loadTimeoutSeconds, which defaults to 600s.
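To judge how much headroom queryCoord.loadTimeoutSeconds needs, here is a simple wave model. It assumes segments are spread evenly over nodes, each node loads a fixed number of segments at once, and each segment takes a fixed time (~10s per the measurement above). The per-node concurrency is a hypothetical parameter, not a specific Milvus setting, and real loads overlap less neatly.

```python
import math
from typing import Optional


def estimate_load_seconds(segments: int, nodes: int, seconds_per_segment: float,
                          concurrency_per_node: int) -> float:
    """Wall time if each node loads its share in back-to-back waves."""
    per_node = math.ceil(segments / nodes)
    waves = math.ceil(per_node / concurrency_per_node)
    return waves * seconds_per_segment


def min_concurrency(segments: int, nodes: int, seconds_per_segment: float,
                    timeout_s: float) -> Optional[int]:
    """Smallest per-node concurrency that finishes within timeout_s, or None."""
    per_node = math.ceil(segments / nodes)
    for c in range(1, per_node + 1):
        if estimate_load_seconds(segments, nodes, seconds_per_segment, c) <= timeout_s:
            return c
    return None  # even loading everything at once exceeds the timeout


if __name__ == "__main__":
    print(min_concurrency(1458, 3, 10, 600))
```

For example, 1458 segments over 3 nodes at ~10s each would need at least 9 concurrent loads per node to finish within the default 600s. With 8 concurrent loads, it would take about 610s.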

foxspy Apr 30, 2026
Collaborator

Tuning pq_cache_size can significantly improve AISAQ performance. Note that this setting is applied at the segment level, so we need to be careful to avoid OOM errors.
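Because pq_cache_size applies per segment, its node-level cost scales with the number of loaded segments. This minimal sketch assumes each loaded segment reserves the full pq_cache_size. It uses the ~584 segments per node mentioned earlier and ignores every other consumer of memory.

```python
GIB = 1024 ** 3


def total_pq_cache_bytes(pq_cache_size: int, segments_per_node: int) -> int:
    """Worst case: every loaded segment holds a full PQ cache."""
    return pq_cache_size * segments_per_node


if __name__ == "__main__":
    for cache in (16 * 1024 ** 2, 32 * 1024 ** 2, 64 * 1024 ** 2):
        total = total_pq_cache_bytes(cache, 584)
        print(f"pq_cache_size={cache}: {total / GIB:.3f} GiB per node")
```

At the current 16 MiB, this comes to about 9.1 GiB per node, and doubling pq_cache_size doubles it.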

What are your accuracy and perf requirements? If pq_code_budget_gb_ratio=1/16 works for you, DiskANN might be worth a try. It should be a lot faster for 1024-dim vectors, though accuracy may be a bit worse.
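This sketch checks whether DiskANN with pq_code_budget_gb_ratio=1/16 would fit in RAM. It assumes the PQ codes are held in memory, as with DiskANN in general, and that the ratio is relative to raw vector size. It counts only the PQ codes for one replica, spread evenly across nodes; graph cache, scalar data, and other overhead come on top.

```python
from fractions import Fraction

BYTES_PER_FLOAT32 = 4
DIM = 1024
ROWS = {"titles": 322_000_000, "abstracts": 260_000_000, "papers": 1_200_000_000}


def pq_code_bytes(rows: int, dim: int, ratio: Fraction) -> int:
    """In-memory PQ codes = ratio x raw float32 size (assumed definition)."""
    return int(rows * dim * BYTES_PER_FLOAT32 * ratio)


if __name__ == "__main__":
    ratio = Fraction(1, 16)
    print(f"bytes per vector: {DIM * BYTES_PER_FLOAT32 * ratio}")
    papers = pq_code_bytes(ROWS["papers"], DIM, ratio)
    everything = sum(pq_code_bytes(r, DIM, ratio) for r in ROWS.values())
    for nodes in (5, 6):
        print(f"{nodes} nodes: papers {papers / nodes / 1e9:.1f} GB/node, "
              f"all collections {everything / nodes / 1e9:.1f} GB/node")
```

Under these assumptions, papers alone needs about 61 GB per node on 5 nodes, or about 51 GB per node on 6. Moving all three collections to DiskANN at 1/16 would approach 91 GB per node on 5 nodes.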

The warmup behavior in the queries above is still unclear. Once the data is loaded, we can first check whether the slow and fast queries return correct results.

Se7enquick May 2, 2026
Author

@foxspy yes, they return correct results. The problem persists even after reloading: cold queries take more than 1 hour to complete. We've rebuilt the index of the biggest collection once again with the suggested params, spent 2 days waiting, and we still cannot run a query on it. Something in the index params just seems to be off.
Papers dense index (dense_index):

index_type: AISAQ
metric_type: IP
max_degree: 48
search_list_size: 100
inline_pq: 0
rearrange: True
num_entry_points: 1000
pq_code_budget_gb_ratio: 0.125
disk_pq_code_budget_gb_ratio: 0.25
pq_cache_size: 16777216 (16 MiB)
search_cache_budget_gb_ratio: 0

Field: dense_embedding, dim=1024, 1.2B rows.
