
IndexSVSIVF: enable IDFilter and runtime intra_query_threads #2

Draft
ibhati wants to merge 1 commit into ib/svs_ivf from ib/svs_ivf_filter_intraquery

Conversation


@ibhati ibhati commented Apr 29, 2026

Follow-up to facebookresearch#4801. Wires up two IndexSVSIVF features that became available after the SVS runtime was extended.

Changes

  • IDFilter support in search(): SearchParameters::sel is now plumbed into the SVS runtime via make_faiss_id_filter, mirroring IndexSVSVamana::search(). Previously the selector was silently ignored.
  • Runtime intra_query_threads: search() now syncs intra_query_threads to the backend (alongside num_threads), so it can be tuned after train() / deserialize_impl(). The doc comment in IndexSVSIVF.h is updated; the prior "must be set before train()" limitation no longer applies.

Tests

  • IVFSearchWithIDSelector — rewritten to assert every non-sentinel label falls in the selector range.
  • IVFSearchWithRestrictiveFilter (new) — narrow ID window (~10% of dataset) with n_probes = num_centroids to exercise the adaptive filter loop end-to-end.
  • IVFSetIntraQueryThreadsAfterTrain / DynamicIVFSetIntraQueryThreadsAfterTrain (new) — bump intra_query_threads after add(), then confirm the next search() rebuilds the backend pool and get_intra_query_threads() reports the new value.

- search() now plumbs SearchParameters::sel into the SVS runtime via
  make_faiss_id_filter, mirroring IndexSVSVamana::search.
- search() also syncs intra_query_threads to the backend (in addition
  to num_threads), so the field can be tuned post-train()/load().
- Tests:
  * Rewrite IVFSearchWithIDSelector to actually verify returned labels
    fall inside the selector range (drops the 'IVF does not support
    IDSelector via SVS runtime' workaround comment).
  * IVFSearchWithRestrictiveFilter: narrow filter (~10% of dataset),
    asserts every non-sentinel label is in range.
  * IVF/DynamicIVFSetIntraQueryThreadsAfterTrain: change
    intra_query_threads after add(), confirm next search() rebuilds
    the backend pool and get_intra_query_threads() reflects the new
    value.
ibhati pushed a commit that referenced this pull request May 7, 2026
…ult handlers (facebookresearch#5185)

Summary:
Pull Request resolved: facebookresearch#5185

Three sequential post-BLAS / end_multiple loops in faiss were leaving OMP threads idle while a single thread did all the work. Each is parallelized with `#pragma omp parallel for schedule(static)`, gated by an `if (...)` clause to avoid spawn-cost regressions on small workloads.

**Changes**

1. `exhaustive_L2sqr_blas_cmax` (AVX2 + ARM SVE): after `sgemm_` completes, the per-query result accumulation loop ran single-threaded while all OMP threads were idle. Each query `i` reads a distinct row of `ip_block` and writes to `dis_tab[i]/ids_tab[i]` — no cross-query dependencies. Added `#pragma omp parallel for schedule(static) if ((i1 - i0) >= 16)` to both ISA specializations.

2. `HeapBlockResultHandler::end_multiple`: `heap_reorder` is O(k log k) per query and was sequential. The original author left a `// maybe parallel for` comment. `add_results` in the same class already has `#pragma omp parallel for`; `end_multiple` was the only remaining sequential step. Gate: `if ((i1 - i0) * k >= 1024)`.

3. `ReservoirBlockResultHandler::end_multiple`: same pattern — reservoir `to_result` (partial sort, O(capacity)) was sequential despite `add_results` being parallelized. `// maybe parallel for` comment removed and replaced with the actual pragma. Gate: `if ((i1 - i0) * this->k >= 1024)`.

The `if (...)` thresholds were chosen from microbenchmark data: below the threshold, OMP fanout cost exceeds the work, producing 3-6× regressions on small batches. Above the threshold, parallelization yields 9-14× speedups at 16 threads. Data independence verified for all three: each loop iteration operates on a disjoint slice of `dis_tab`/`ids_tab` indexed by query `i`.

**Benchmark results**

A local microbench (not landed) was used for A/B measurement. Host: Intel Sapphire Rapids, 28 physical cores, AVX-512. Pinned with `taskset -c 0-15` (OMP=16) and `taskset -c 0` (OMP=1). Median of 5 reps. Synthetic uniform-random distance distributions.

`HeapBlockResultHandler::end_multiple` (us, lower better):

| nq    | k    | parent t=1 | this t=1 | parent t=16 | this t=16 | speedup t=16  |
|------:|-----:|-----------:|---------:|------------:|----------:|--------------:|
| 64    | 10   | 9.2        | 7.2      | 8.1         | 8.3       | 0.98× (gated) |
| 64    | 100  | 340        | 345      | 318         | 67        | 4.79×         |
| 64    | 1000 | 5,796      | 5,700    | 5,886       | 501       | 11.76×        |
| 512   | 100  | 2,811      | 2,769    | 2,677       | 312       | 8.59×         |
| 512   | 1000 | 46,109     | 46,070   | 45,758      | 3,778     | 12.11×        |
| 4096  | 100  | 22,041     | 21,588   | 21,672      | 1,869     | 11.60×        |
| 4096  | 1000 | 369,069    | 376,541  | 372,481     | 25,442    | 14.64×        |

`ReservoirBlockResultHandler::end_multiple` (us):

| nq    | k    | parent t=16 | this t=16 | speedup       |
|------:|-----:|------------:|----------:|--------------:|
| 64    | 10   | 18.0        | 18.1      | 0.99× (gated) |
| 64    | 100  | 659         | 96        | 6.86×         |
| 64    | 1000 | 7,592       | 553       | 13.73×        |
| 512   | 100  | 5,498       | 490       | 11.21×        |
| 512   | 1000 | 59,548      | 4,677     | 12.73×        |
| 4096  | 100  | 44,064      | 3,230     | 13.64×        |
| 4096  | 1000 | 476,388     | 32,237    | 14.78×        |

`IndexFlatL2::search` end-to-end — drives `exhaustive_L2sqr_blas_cmax` (ms):

| nb    | nq    | k   | parent t=16 | this t=16 | speedup |
|------:|------:|----:|------------:|----------:|--------:|
| 1024  | 1024  | 10  | 1.71        | 1.45      | 1.18×   |
| 1024  | 4096  | 100 | 58.5        | 15.5      | 3.78×   |
| 4096  | 4096  | 100 | 76.9        | 39.4      | 1.95×   |

Single-threaded paths (OMP=1) are within ±5% of parent across all configurations — the `if (...)` clause makes the pragma a no-op below the threshold, eliminating overhead for serial callers.

Caveats: the `IndexFlatL2::search` numbers measure the full search path, so the speedup attributed to change #1 also includes contributions from change #2 (heap handler, also called by this path). The `end_multiple` numbers isolate the changed function via `PauseTiming`/`ResumeTiming` around setup. ARM SVE not measured directly (no Graviton host); the AVX2 numbers are the strongest available proxy.

Reviewed By: mnorris11

Differential Revision: D103830810

fbshipit-source-id: 8434fa6f16b78c5ff7b2244ac5d5fe9cc8c012a5