0.1.7 — Optimized CUDA MaxSim + RROQ158 kernel + rename to colsearch + bugfixes

@ddickmann ddickmann released this 21 Apr 16:10

Highlights

This release ships three things at once:

1. Optimized CUDA kernels for MaxSim and RROQ158 on H100

  • Fused single-pass mma.sync.b1.b1.s32.and.popc kernel for RROQ158
  • Multi-tier (32/64/128/256/512) padded MaxSim dispatcher
  • Autotune-aware warmup at production batch sizes
  • int64-pointer fix in the Triton MaxSim kernel
  • GPU full-corpus fast-path that bypasses LEMUR routing when the preloaded corpus already pays the VRAM cost
  • Persistent query/corpus scratch buffers
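
As a rough illustration of the padded, multi-tier idea, here is a minimal NumPy sketch. It is not the CUDA or Triton kernel shipped in this release. Only the tier sizes (32/64/128/256/512) come from these notes; `pick_tier` and `maxsim_padded` are hypothetical names, and the masking strategy is an assumption about how padding can be kept from affecting the score.

```python
import numpy as np

# Tier sizes as listed in the release notes; the rest is illustrative.
TIERS = (32, 64, 128, 256, 512)

def pick_tier(n_tokens, tiers=TIERS):
    """Return the smallest tier that can hold n_tokens (hypothetical helper)."""
    for tier in tiers:
        if n_tokens <= tier:
            return tier
    raise ValueError(f"{n_tokens} tokens exceeds the largest tier {tiers[-1]}")

def maxsim_padded(query, doc, tier):
    """MaxSim of one query against one document padded to `tier` rows.

    MaxSim sums, over query tokens, the best dot product against any
    document token. Padding rows are masked to -inf so they never win.
    """
    n_doc, dim = doc.shape
    padded = np.zeros((tier, dim), dtype=doc.dtype)
    padded[:n_doc] = doc
    mask = np.arange(tier) < n_doc           # True for real tokens
    sims = query @ padded.T                  # shape: (n_query, tier)
    sims = np.where(mask, sims, -np.inf)     # ignore padding rows
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16)).astype(np.float32)
d = rng.standard_normal((45, 16)).astype(np.float32)

tier = pick_tier(len(d))                     # 45 tokens fit in the 64 tier
score = maxsim_padded(q, d, tier)
reference = float((q @ d.T).max(axis=1).sum())  # unpadded MaxSim
```

Padding to a small set of fixed shapes lets a kernel be tuned once per tier instead of once per document length, at the cost of some wasted compute on the padding rows.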

Result on BEIR-8 (H100): 3.12× geomean QPS over FastPlaid at fp16 (446.7 vs 143.2 QPS), 2.06× at rroq158_gs128 (294.4 QPS). See benchmarks/competitive_benchmark.md.
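
The quoted speedups can be checked with a few lines of arithmetic. The sketch below assumes the three QPS figures are per-lane geometric means over the eight datasets and that both speedups are measured against the same 143.2 QPS FastPlaid baseline. The per-dataset numbers at the end are hypothetical and only demonstrate that a ratio of geometric means equals the geometric mean of per-dataset ratios.

```python
import math

def geomean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Aggregate QPS figures quoted in the release notes.
fastplaid_qps = 143.2
colsearch_fp16_qps = 446.7
colsearch_rroq158_qps = 294.4

fp16_speedup = colsearch_fp16_qps / fastplaid_qps     # about 3.12
rroq_speedup = colsearch_rroq158_qps / fastplaid_qps  # about 2.06

# Hypothetical per-dataset QPS values, for illustration only.
ours = [400.0, 500.0, 450.0]
theirs = [100.0, 200.0, 150.0]
ratio_of_geomeans = geomean(ours) / geomean(theirs)
geomean_of_ratios = geomean([a / b for a, b in zip(ours, theirs)])
```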

2. Rename voyager-index → colsearch

  • PyPI: pip install colsearch
  • GitHub: ddickmann/colsearch (the old ddickmann/voyager-index URL auto-redirects)
  • Python package: colsearch (with a voyager_index compat shim)
  • Console script: colsearch-server (voyager-index-server retained as an alias)

3. Bugfixes

  • Bench disk-tightening: os.rename / os.link instead of shutil.copytree (~100 GB peak disk saved on BEIR-8 sweep).
  • CPU whole-corpus fast-path that bypasses the per-query 522k-row numpy fancy-index gather. The gather previously made the quora rroq158/cpu run allocate ~5 GB per query and hang for 90+ minutes.
  • RROQ158 hot-path allocation fix: removes 1.2 GB of allocation churn per call, which had forced a CUDA allocator GC every ~200 queries.
  • Reference-api Dockerfile installs Rust via rustup (pinned 1.94.1) instead of Debian's apt rustc 1.85. The apt version predates the avx512_target_feature stabilization (1.89+) that the AVX-512 VPOPCNTDQ tier relies on. This unblocks docker-smoke.
  • Release pipeline now uses skip-existing: true on PyPI publish so colsearch can ship without forcing a no-op version bump on the independently-versioned native crates.
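
The disk saving in the first bugfix comes from replacing a deep copy with hard links and renames, which share or move the existing data blocks instead of duplicating them. The sketch below illustrates that general technique only. It is not the benchmark code: `link_tree` is a hypothetical helper, and the fallback to `shutil.copy2` when linking fails (for example across filesystems) is an assumption.

```python
import os
import shutil
import tempfile

def link_tree(src, dst):
    """Mirror src into dst with hard links instead of copying file data.

    Falls back to a real copy for any file that cannot be linked.
    """
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            try:
                os.link(s, d)          # new directory entry, same data blocks
            except OSError:
                shutil.copy2(s, d)     # e.g. cross-device or unsupported FS

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "index")
    os.makedirs(os.path.join(src, "shards"))
    with open(os.path.join(src, "shards", "0.bin"), "wb") as f:
        f.write(b"\x00" * 1024)

    view = os.path.join(tmp, "index_view")
    link_tree(src, view)
    shares_data = os.path.samefile(
        os.path.join(src, "shards", "0.bin"),
        os.path.join(view, "shards", "0.bin"),
    )

    # os.rename moves the directory entry without touching file contents.
    final = os.path.join(tmp, "index_final")
    os.rename(view, final)
    moved_ok = os.path.exists(os.path.join(final, "shards", "0.bin"))
```

Hard links only work within a single filesystem, which is why a copy fallback is worth keeping in a helper like this.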

Compatibility (rename)

  • pip install voyager-index users: install colsearch instead. Legacy distribution stops at 0.1.6.
  • import voyager_index keeps working in 0.1.7 via a thin shim. The shim emits a single DeprecationWarning and eagerly aliases every voyager_index.X.Y submodule to the canonical colsearch.X.Y module in sys.modules, so isinstance checks and enum identity continue to hold. The shim will be removed in 0.2.0.
  • voyager-index-server console script aliased to colsearch-server for the 0.1.x cycle, removed in 0.2.0.
  • VOYAGER_INDEX_PATH env var still honoured but emits a deprecation warning. Migrate to COLSEARCH_INDEX_PATH.
  • VOYAGER_BENCH_CPU_TIME_BUDGET_S is still honoured; COLSEARCH_BENCH_CPU_TIME_BUDGET_S is the new canonical name.
  • Other VOYAGER_* env vars (VOYAGER_RROQ158_N_THREADS, VOYAGER_RROQ158_USE_B1_FUSED) are unchanged in 0.1.7 to avoid breaking ops scripts.
  • Docker image: latence/colsearch (was latence/voyager-zero); reference Dockerfile builds colsearch:latest.
  • Kubernetes manifests under deploy/k8s/ ship the new colsearch namespace and labels — adjust your overlays.
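
The environment-variable fallback described above can be sketched as follows. This shows the general pattern and is not colsearch's actual lookup code: `resolve_index_path` is a hypothetical function, and the rule that COLSEARCH_INDEX_PATH takes precedence when both variables are set is an assumption.

```python
import os
import warnings

def resolve_index_path(environ=None):
    """Return the index path, preferring the new variable name.

    Hypothetical helper. Precedence of the new name over the legacy one
    is assumed, not confirmed by the release notes.
    """
    env = os.environ if environ is None else environ
    new_value = env.get("COLSEARCH_INDEX_PATH")
    if new_value:
        return new_value
    old_value = env.get("VOYAGER_INDEX_PATH")
    if old_value:
        warnings.warn(
            "VOYAGER_INDEX_PATH is deprecated; set COLSEARCH_INDEX_PATH instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return old_value
    return None

# Usage with explicit dictionaries so the example does not depend on the shell.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_index_path({"VOYAGER_INDEX_PATH": "/data/old"})
    legacy_warnings = len(caught)

current = resolve_index_path(
    {"COLSEARCH_INDEX_PATH": "/data/new", "VOYAGER_INDEX_PATH": "/data/old"}
)
```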

Native packages

The native crates (latence-shard-engine, latence-solver) are versioned independently of the root colsearch package and only need a republish when their Rust source changes. 0.1.7 is a metadata-only change for them; the published wheels stay at 0.1.6.

Provenance

Historical artefacts under reports/, validation-reports/, research/, notebooks/, and benchmarks/_smoke_*.py / benchmarks/_diag_*.py retain their original voyager-index 0.1.x labels and SHAs. Run identifiers in reports/fast_plaid_head_to_head/results_v7.jsonl (e.g. voyager_fp16, voyager_rroq158_gs128) and the corresponding benchmarks/fast_plaid_head_to_head.py --libraries flags are kept stable so historical JSONL stays readable; they map 1:1 to the colsearch fp16 / rroq158_gs128 lanes.

Install

pip install colsearch

Migrating from voyager-index:

pip uninstall voyager-index
pip install colsearch
# `import voyager_index` keeps working until 0.2.0; update to `import colsearch` at your leisure.
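
To see why aliasing modules in sys.modules preserves isinstance and class identity during the migration window, here is a minimal, self-contained sketch of the technique. It is not the shim's source. The `demo_colsearch` and `demo_voyager_index` package names, the `Index` class, and `install_alias` are all hypothetical stand-ins.

```python
import importlib
import sys
import types
import warnings

# Build a stand-in "canonical" package with one submodule (hypothetical layout).
canonical = types.ModuleType("demo_colsearch")
canonical.__path__ = []                      # mark it as a package
canonical_index = types.ModuleType("demo_colsearch.index")

class Index:
    """Placeholder class living in the canonical submodule."""

canonical_index.Index = Index
canonical.index = canonical_index
sys.modules["demo_colsearch"] = canonical
sys.modules["demo_colsearch.index"] = canonical_index

def install_alias(old_root, new_root):
    """Register every loaded new_root module under the old_root name too."""
    warnings.warn(
        f"{old_root} is deprecated; import {new_root} instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    for name, module in list(sys.modules.items()):
        if name == new_root or name.startswith(new_root + "."):
            sys.modules[old_root + name[len(new_root):]] = module

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    install_alias("demo_voyager_index", "demo_colsearch")

# Both names now resolve to the very same module object.
legacy_index = importlib.import_module("demo_voyager_index.index")
new_index = importlib.import_module("demo_colsearch.index")
obj = legacy_index.Index()
```

Because both names point at one module object, there is only one `Index` class, so objects created through the legacy import pass isinstance checks written against the new import.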

See the full CHANGELOG.md entry for 0.1.7.