AISAQ 1.2B vectors: 600s+ query latency despite vectorIndex: sync warmup #49397
Replies: 3 comments 12 replies
Waiting for @foxspy to answer the questions.
Based on the behavior described above, my assessment is that this does not fully match expectations. AISAQ uses O_DIRECT I/O, so we should not expect OS page-cache-based warmup behavior. Could you share the exact query pattern or test procedure you used? One important detail is that inline_pq=-1 means no PQ vectors are inlined into the graph nodes, so random I/O during graph search will hit the disk directly, which can make the query very slow. Could you also share the query script, QueryNode WARN/ERROR logs around load and first search, perf flame graphs, and CPU/NVMe metrics such as iostat -x 1? Please also run the same query in both the 5s and 600s cases and compare the returned topK IDs/distances.
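To make the topK comparison concrete, here is a minimal sketch in plain Python. It assumes the hits from each run have already been extracted as (primary key, distance) pairs; the function name `compare_topk` and the tolerance `tol` are illustrative and not part of any Milvus API.

```python
from typing import Dict, List, Tuple

# Assumed shape of one hit: (primary key, distance).
Hit = Tuple[int, float]


def compare_topk(fast: List[Hit], slow: List[Hit], tol: float = 1e-4) -> Dict[str, object]:
    """Compare the topK hits of the fast (5s) run against the slow (600s) run."""
    fast_ids = [pk for pk, _ in fast]
    slow_ids = [pk for pk, _ in slow]
    shared = set(fast_ids) & set(slow_ids)

    fast_dist = dict(fast)
    slow_dist = dict(slow)
    # Largest distance disagreement among IDs returned by both runs.
    max_diff = max((abs(fast_dist[pk] - slow_dist[pk]) for pk in shared), default=0.0)

    return {
        "same_order": fast_ids == slow_ids,
        "overlap": len(shared) / max(len(fast_ids), len(slow_ids), 1),
        "only_fast": sorted(set(fast_ids) - shared),
        "only_slow": sorted(set(slow_ids) - shared),
        "max_distance_diff": max_diff,
        "distances_match": max_diff <= tol,
    }


if __name__ == "__main__":
    # Made-up hits, only to show the call shape.
    fast_hits = [(1, 0.10), (2, 0.20), (3, 0.30)]
    slow_hits = [(1, 0.10), (3, 0.30), (4, 0.40)]
    print(compare_topk(fast_hits, slow_hits))
```

A low overlap or differing distances between the two runs would suggest the slow query is returning partial results (for example from cancelled segments) rather than just taking longer.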
The load was canceled, possibly because loading is too slow.
Setup:
AISAQ build params:
inline_pq: -1, rearrange: true, max_degree: 56, search_list_size: 100,
num_entry_points: 1000, pq_cache_size: 16777216 (16MB),
search_cache_budget_gb_ratio: 0, pq_code_budget_gb_ratio: 0.125
Tiered storage config:
  warmup:
    scalarField: sync
    scalarIndex: sync
    vectorField: disable
    vectorIndex: sync
Problem:
After all segments are Loaded (100%) with vectorIndex: sync, the first search query on papers_embeddings takes 600-3900 seconds. Subsequent queries take 20-70s, then gradually improve to <5s. Titles and abstracts behave similarly but with shorter warmup times (10-200s for the first query). After some time, however, queries become slow again.
QueryNode logs show AiSAQ Other Exception: Future was cancelled on many segments during the first search, even though sync warmup completed successfully (each segment loads in ~20ms). The sync warmup loads PQ pivots, entry points, and rearrange metadata, but the actual _disk.index graph files on NVMe are only read during the first search, causing massive I/O contention across ~584 segments per node.
For papers specifically, the first query often hits proxy timeout (~80 min) and returns UNAVAILABLE, making the collection effectively unusable after any pod restart.
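For anyone trying to reproduce the latency progression described above, a minimal timing sketch follows. It is not the script used for the numbers in this post. `run_query` is a hypothetical zero-argument callable that the reader would wire up to their own search call, and `time_queries` is an illustrative name.

```python
import time
from typing import Callable, List


def time_queries(
    run_query: Callable[[], object],
    rounds: int,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run the same query `rounds` times and record wall-clock latency per call."""
    latencies: List[float] = []
    for i in range(rounds):
        start = clock()
        run_query()  # hypothetical: wraps the actual search request
        elapsed = clock() - start
        latencies.append(elapsed)
        print(f"query {i + 1}: {elapsed:.3f}s")
    return latencies


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster.
    time_queries(lambda: time.sleep(0.01), rounds=3)
```

Recording every call separately, rather than an average, keeps the cold first query visible next to the later warm ones.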
What we tried (no improvement for cold start):
What we can't do:
Questions: