Releases: latenceainew/colsearch
0.1.7 — Optimized CUDA MaxSim + RROQ158 kernel + rename to colsearch + bugfixes
Highlights
This release ships three things at once:
1. Optimized CUDA kernels for MaxSim and RROQ158 on H100
- Fused single-pass `mma.sync.b1.b1.s32.and.popc` kernel for RROQ158
- Multi-tier (32/64/128/256/512) padded MaxSim dispatcher
- Autotune-aware warmup at production batch sizes
- int64-pointer fix in the Triton MaxSim kernel
- GPU full-corpus fast-path that bypasses LEMUR routing when the preloaded corpus already pays the VRAM cost
- Persistent query/corpus scratch buffers
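A minimal NumPy sketch of the idea behind a multi-tier padded dispatcher. It assumes that each batch of documents is padded up to the smallest tier that fits its longest document, and that padded slots are masked so they can never win the per-token max. The tier list comes from the note above. `pick_tier`, `pad_batch`, `padded_maxsim`, and the masking scheme are hypothetical illustrations, not the shipped kernel's API.

```python
import numpy as np

# Tier lengths from the 0.1.7 notes; the functions below are a hypothetical sketch.
TIERS = (32, 64, 128, 256, 512)


def pick_tier(max_doc_tokens: int) -> int:
    """Return the smallest tier that can hold `max_doc_tokens` tokens."""
    for tier in TIERS:
        if max_doc_tokens <= tier:
            return tier
    raise ValueError(f"{max_doc_tokens} tokens exceeds the largest tier ({TIERS[-1]})")


def pad_batch(docs: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Pad (n_tokens, dim) arrays to a (batch, tier, dim) block plus a validity mask."""
    tier = pick_tier(max(d.shape[0] for d in docs))
    dim = docs[0].shape[1]
    padded = np.zeros((len(docs), tier, dim), dtype=docs[0].dtype)
    mask = np.zeros((len(docs), tier), dtype=bool)
    for i, d in enumerate(docs):
        padded[i, : d.shape[0]] = d
        mask[i, : d.shape[0]] = True
    return padded, mask


def padded_maxsim(query: np.ndarray, padded: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """MaxSim over a padded batch: sum over query tokens of the best *valid* doc token."""
    # sims[b, q, t] = <query[q], padded[b, t]>
    sims = np.einsum("qd,btd->bqt", query, padded)
    # Padded slots get -inf so they can never be selected by the max.
    sims = np.where(mask[:, None, :], sims, -np.inf)
    return sims.max(axis=2).sum(axis=1)
```

Bucketing into a few fixed lengths keeps the number of distinct kernel shapes small, which is what makes autotuning and warmup at fixed batch sizes practical. The cost is wasted work on padding, at most about 2× with power-of-two tiers.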
Result on BEIR-8 (H100): 3.12× geomean QPS over FastPlaid at fp16 (446.7 vs 143.2 QPS), 2.06× at rroq158_gs128 (294.4 QPS). See benchmarks/competitive_benchmark.md.
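The `.and.popc` form of `mma.sync` on 1-bit (`.b1`) operands accumulates popcount(A AND B). For 1.58-bit (ternary) codes, one common way to use that primitive is to split each vector into a "+1" bitplane and a "−1" bitplane. A ternary dot product then reduces to four AND-popcounts. The plain-Python sketch below shows only that arithmetic. The bitplane layout is an assumption for illustration and does not describe how the RROQ158 kernel actually packs its codes.

```python
def pack_ternary(values: list[int]) -> tuple[int, int]:
    """Pack a {-1, 0, +1} vector into (plus_bits, minus_bits) integer bitplanes."""
    plus = minus = 0
    for i, v in enumerate(values):
        if v == 1:
            plus |= 1 << i
        elif v == -1:
            minus |= 1 << i
        elif v != 0:
            raise ValueError(f"not ternary: {v}")
    return plus, minus


def popcount(x: int) -> int:
    return bin(x).count("1")


def ternary_dot(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Dot product of two packed ternary vectors using only AND + popcount.

    Matching signs (+1*+1, -1*-1) contribute +1, mixed signs contribute -1,
    and zeros appear in neither bitplane, so they drop out automatically.
    """
    ap, an = a
    bp, bn = b
    agree = popcount(ap & bp) + popcount(an & bn)
    disagree = popcount(ap & bn) + popcount(an & bp)
    return agree - disagree
```

For example, `[1, 0, -1, 1]` · `[1, -1, -1, 0]` gives one `+·+` match and one `−·−` match, for a dot product of 2.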
2. Rename voyager-index → colsearch
- PyPI: `pip install colsearch`
- GitHub: `ddickmann/colsearch` (the old `ddickmann/voyager-index` URL auto-redirects)
- Python package: `colsearch` (with a `voyager_index` compat shim)
- Console script: `colsearch-server` (`voyager-index-server` retained as an alias)
3. Bugfixes
- Bench disk-tightening: `os.rename`/`os.link` instead of `shutil.copytree` (~100 GB peak disk saved on the BEIR-8 sweep).
- CPU whole-corpus fast-path that bypasses the per-query 522k-row NumPy fancy-index gather (it was making `quora rroq158/cpu` allocate ~5 GB/query and hang for 90+ minutes).
- RROQ158 hot-path alloc-churn fix that eliminated a 1.2 GB/call churn that forced CUDA allocator GC every ~200 queries.
- Reference-api Dockerfile installs Rust via rustup (pinned 1.94.1) instead of Debian's apt rustc 1.85, which predated the `avx512_target_feature` stabilization (1.89+) used by the AVX-512 VPOPCNTDQ tier. This unblocks `docker-smoke`.
- Release pipeline now uses `skip-existing: true` on PyPI publish, so colsearch can ship without forcing a no-op version bump on the independently-versioned native crates.
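A minimal sketch of the disk-saving pattern, assuming the bench only needs a second view of an index directory that nobody mutates in place. Each file is hard-linked with `os.link`, which creates a new directory entry for the same inode and allocates no extra data blocks. When linking fails, for example across filesystems, the sketch falls back to `shutil.copy2`. `link_tree` is a hypothetical helper, not the bench harness's actual code.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def link_tree(src: str | os.PathLike, dst: str | os.PathLike) -> None:
    """Mirror the `src` tree into `dst` with hard links instead of copying contents."""
    src, dst = Path(src), Path(dst)
    for root, _dirs, files in os.walk(src):
        target_dir = dst / Path(root).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            source_file = Path(root) / name
            target_file = target_dir / name
            try:
                os.link(source_file, target_file)  # same inode, no data copied
            except OSError:
                shutil.copy2(source_file, target_file)  # e.g. cross-device link
```

Because both paths share one inode, an in-place write through either path is visible through the other. When the source is no longer needed at all, `os.rename` (a move within one filesystem) is cheaper still.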
Compatibility (rename)
- `pip install voyager-index` users: install `colsearch` instead. The legacy distribution stops at `0.1.6`.
- `import voyager_index` keeps working in `0.1.7` via a thin shim that emits a single `DeprecationWarning` and eagerly aliases every `voyager_index.X.Y` submodule to the canonical `colsearch.X.Y` module in `sys.modules`, so `isinstance` checks and enum identity continue to hold. Removed in `0.2.0`.
- The `voyager-index-server` console script is aliased to `colsearch-server` for the 0.1.x cycle and removed in 0.2.0.
- The `VOYAGER_INDEX_PATH` env var is still honoured but emits a deprecation warning. Migrate to `COLSEARCH_INDEX_PATH`.
- `VOYAGER_BENCH_CPU_TIME_BUDGET_S` is honoured; `COLSEARCH_BENCH_CPU_TIME_BUDGET_S` is the new canonical name.
- Other `VOYAGER_*` env vars (`VOYAGER_RROQ158_N_THREADS`, `VOYAGER_RROQ158_USE_B1_FUSED`) are unchanged in 0.1.7 to avoid breaking ops scripts.
- Docker image: `latence/colsearch` (was `latence/voyager-zero`); the reference Dockerfile builds `colsearch:latest`.
- Kubernetes manifests under `deploy/k8s/` ship the new `colsearch` namespace and labels, so adjust your overlays.
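A minimal sketch of the aliasing technique such a shim relies on. It assumes the canonical package and its submodules are already imported; an eager shim would import them first. Each `colsearch.*` module object is registered under the matching legacy name in `sys.modules`, so both names resolve to the *same* module and class objects. `install_compat_shim` is a hypothetical name, and the real `voyager_index` shim may differ in detail.

```python
import sys
import types
import warnings


def install_compat_shim(canonical: str, legacy: str) -> types.ModuleType:
    """Expose `canonical` and its already-imported submodules under `legacy`."""
    warnings.warn(
        f"'{legacy}' is deprecated; import '{canonical}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    root = sys.modules[canonical]
    sys.modules[legacy] = root
    prefix = canonical + "."
    # Snapshot the items first because the loop adds entries to sys.modules.
    for name, module in list(sys.modules.items()):
        if name.startswith(prefix):
            # e.g. "colsearch.a.b" -> "voyager_index.a.b", pointing at the same object
            sys.modules[legacy + name[len(canonical):]] = module
    return root
```

Since each legacy name maps to the same module object rather than to a copy, classes, enums, and module-level state are shared, not duplicated. That shared identity is what keeps `isinstance` checks valid across old and new import paths.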
Native packages
The native crates (latence-shard-engine, latence-solver) are versioned independently of the root colsearch package and only need a republish when their Rust source changes. 0.1.7 is a metadata-only change for them; the published wheels stay at 0.1.6.
Provenance
Historical artefacts under reports/, validation-reports/, research/, notebooks/, and benchmarks/_smoke_*.py / benchmarks/_diag_*.py retain their original voyager-index 0.1.x labels and SHAs. Run identifiers in reports/fast_plaid_head_to_head/results_v7.jsonl (e.g. voyager_fp16, voyager_rroq158_gs128) and the corresponding benchmarks/fast_plaid_head_to_head.py --libraries flags are kept stable so historical JSONL stays readable; they map 1:1 to the colsearch fp16 / rroq158_gs128 lanes.
Install
```bash
pip install colsearch
```
Migrating from voyager-index:
```bash
pip uninstall voyager-index
pip install colsearch
# `import voyager_index` keeps working until 0.2.0; update to `import colsearch` at your leisure.
```
See the full CHANGELOG.md entry for 0.1.7.
0.1.6 — RROQ158 SOTA Default at group_size=128
This release promotes the dim-aware Rroq158Config(group_size=128) lane to
the build-time default for newly created RROQ158 indexes — closing the
production-validation arc started in 0.1.5 (Phase 7 / Phase 8).
Highlights
- ~13% smaller per-token storage (~40 vs ~46 bytes/token at dim=128; ~6.4× smaller than fp16, up from ~5.5×).
- CPU p95 ~10–30% faster on the BEIR-6 mean (one fewer scale load per group in the popcount kernel; nfcorpus −22%, scifact −15%, scidocs −10%, fiqa within +2% noise).
- NDCG@10 within ±0.005 of the previous gs=32 baseline; per-dataset mean Δ vs gs=32 across BEIR-6 = +0.0006. The new default is therefore Pareto-equal in quality while delivering smaller storage AND lower-or-equal CPU p95 on every dataset measured.
- Dim-aware fallback: `_resolve_group_size(requested, dim)` transparently steps down to `gs=64`/`gs=32` (with a log warning) on production corpora whose token dim is not divisible by 128, so the new default works on dim=64 / 96 / 128 / 160 alike without caller changes.
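A minimal sketch of the step-down behaviour described above, assuming the rule is "use the largest of the requested size, 64, or 32 that divides the token dim, and warn when falling back". `resolve_group_size` is an illustrative stand-in; the real `_resolve_group_size` may differ in detail.

```python
import logging

logger = logging.getLogger(__name__)

# Step-down order from the 0.1.6 notes: 128 -> 64 -> 32 (sketch, not the real table).
_STEP_DOWN = (128, 64, 32)


def resolve_group_size(requested: int, dim: int) -> int:
    """Return the largest usable group size <= `requested` that divides `dim`."""
    candidates = [requested] + [gs for gs in _STEP_DOWN if gs < requested]
    for gs in candidates:
        if dim % gs == 0:
            if gs != requested:
                logger.warning(
                    "group_size=%d does not divide dim=%d; falling back to group_size=%d",
                    requested, dim, gs,
                )
            return gs
    raise ValueError(f"no supported group size divides dim={dim}")
```

Under this rule, dim=64 resolves to 64, and both dim=96 and dim=160 resolve to 32, because neither is divisible by 64.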
Headline BEIR-6 averages (rroq158 gs=128 default)
| Codec | NDCG@10 | R@100 | GPU p95 (ms) | CPU p95 (ms) |
|---|---|---|---|---|
| fp16 (baseline) | 0.5206 | 0.7360 | 4.0 (1.00×) | 103 (1.00×) |
| rroq158 (gs=128, default) | 0.5069 | 0.7298 | 4.8 (1.20×) | 310 (3.00×) |
| rroq158 (gs=32) | 0.5063 | 0.7282 | 4.8 (1.20×) | 325 (3.15×) |
| rroq4_riem | 0.5158 | 0.7345 | 8.5 (2.13×) | 580 (5.63×) |
Full per-dataset / per-codec sweep:
reports/beir_2026q2_gs128/.
Migration
Existing on-disk indexes are unaffected — the manifest carries the
build-time group_size and only newly built indexes pick up the new
default.
- Pin `Rroq158Config(group_size=32)` to restore the previous default exactly.
- Pin `Rroq158Config(group_size=64)` for the safest cross-dataset choice (covers high-intra-token-variance corpora like arguana).
See docs/guides/quantization-tuning.md
for the full decision matrix and per-dim recipe table.
Install
```bash
pip install --upgrade voyager-index==0.1.6
# native (optional, for the Rust SIMD CPU kernel + tabu solver):
pip install --upgrade "voyager-index[native]==0.1.6"
# full (server + multimodal + native + GPU triton):
pip install --upgrade "voyager-index[full,gpu]==0.1.6"
```
Full changelog and merged PRs in CHANGELOG.md.
0.1.5
Changelog
This changelog tracks the official shipped OSS release line. Older draft notes
that did not correspond to a published release were removed so version history
reads in release order again.
Unreleased
0.1.5 — Release Gate Hotfix
This release republishes the shard-engine decomposition work on a clean CI line
after fixing the small lint regressions that slipped through the initial 0.1.4
cut.
Release integrity
- fixed the shard refactor parity script bootstrap so the release lint lane accepts the repo-local import setup
- normalized import ordering and explicit public exports in the refactor-touching files that failed the hosted Ruff gate
- bumped the root package and supported native packages onto the `0.1.5` line so the hotfix release cleanly supersedes the drafted `0.1.4` cut
0.1.4 — Shard Engine Decomposition And Release Evidence
This release keeps the shard product surface stable while decomposing the large
shard-engine modules behind compatibility facades and hardening the parity
evidence required to ship that refactor safely.
Shard engine maintainability
- split the shard manager, store, fetch pipeline, LEMUR router, builder, WAL, and ColBANDIT reranker into focused internal modules while preserving public import paths
- reduced config coupling by separating serving configuration from sweep-only configuration behind compatibility exports
- introduced internal protocols for router, store, fetch, reranker, and native exact backends to narrow cross-module ownership
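To illustrate what an internal protocol buys, here is a sketch using `typing.Protocol`. It assumes a router only needs to map a query to candidate shard ids. `ShardRouter`, `route`, `FullScanRouter`, and `candidate_shards` are hypothetical names, not the engine's actual interfaces.

```python
from collections.abc import Sequence
from typing import Protocol, runtime_checkable


@runtime_checkable
class ShardRouter(Protocol):
    """Any object with a matching `route` method satisfies this protocol structurally."""

    def route(self, query: Sequence[Sequence[float]], n_shards: int) -> list[int]: ...


class FullScanRouter:
    """Trivial router that sends every query to every shard; no base class needed."""

    def route(self, query: Sequence[Sequence[float]], n_shards: int) -> list[int]:
        return list(range(n_shards))


def candidate_shards(router: ShardRouter, query: Sequence[Sequence[float]], n_shards: int) -> list[int]:
    # The caller depends only on the protocol, not on a concrete router module.
    return router.route(query, n_shards)
```

With structural typing, a learned router and a fallback like `FullScanRouter` can live in separate modules without sharing a base class. The caller only imports the protocol.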
Runtime capability visibility
- surfaced fallback and capability state for LEMUR routing, pinned staging, and native exact execution through shard statistics and reference API metadata
- added startup logging for shard capability selection so development and production runs expose fallback decisions explicitly
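A sketch of the startup capability-selection pattern, assuming an optional native backend is detected by whether its top-level module is importable. `select_backend` and the module names passed to it are hypothetical. The shard engine's real capability checks and log format are not described in these notes.

```python
import importlib.util
import logging

logger = logging.getLogger("capabilities")


def select_backend(native_module: str, fallback: str = "python") -> str:
    """Pick the native backend if its top-level module is importable, else log and fall back."""
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(native_module) is not None:
        logger.info("capability: using native backend %r", native_module)
        return native_module
    logger.warning("capability: %r not importable; falling back to %r", native_module, fallback)
    return fallback
```

Logging the decision once at startup is what makes a silent fallback, such as a missing native wheel, visible in production logs.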
Validation and release confidence
- added shard refactor contract coverage for import compatibility, artifact parity, query trace stability, and runtime capability reporting
- added a machine-readable shard refactor parity report and wired it into CI so release evidence is reproducible instead of ad hoc
- bumped the root package and supported native packages onto the `0.1.4` line
- refreshed release hygiene checks to validate the aligned package versions
0.1.3 — Production Release Hardening
This release closes the gap between the public product story and the shipped
package, native-wheel, and release pipeline surfaces.
Packaging and install surface
- added a canonical `voyager-index[full]` install profile for the full public CPU-safe surface
- added `shard-native` and broadened `native` so the public native story now covers both `latence-shard-engine` and `latence-solver`
- bumped the root package and supported native packages onto the `0.1.3` line
- tightened package data so the shipped sdist includes the graph quality fixture required by release validation
Graph-aware production path
- kept `latence-graph` as a public optional extra and pinned it to the verified public `latence>=0.1.1` line
- clarified throughout the docs that the graph lane can consume compatible prebuilt graph data directly and remains additive to the shard-first hot path
- preserved the graph route-conformance, provenance, and retrieval-uplift evidence as a distinct proof layer from shard performance benchmarks
CI, release, and OSS hygiene
- expanded the native release bundle to include the shard-engine wheel alongside the solver wheel
- tightened release documentation and automation around clean-install rehearsal, native-wheel validation, and publish gating
- refreshed the README, install docs, issue templates, and contributor guidance around the supported production lane
- added repo-governance files for dependency updates, code ownership, and contributor conduct
0.1.2 — Shard Production Surface
This release makes the shard engine the clear public product surface.
Retrieval and serving
- production-wired shard search with LEMUR routing, ColBANDIT, and Triton MaxSim
- shard scoring controls exposed for `int8`, `fp8`, and `roq4`
- durable CRUD, WAL, checkpoint, recovery, and shard admin endpoints
- multi-worker single-host reference server posture
API and SDK
- base64 vector transport helpers exposed from `voyager_index.transport`
- public HTTP API accepts base64 payloads for dense and multivector requests
- shard configuration knobs surfaced on collection create, search, and info APIs
- dense hybrid mode selection documented and shipped as `rrf` or `tabu`
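Reciprocal rank fusion (RRF), the technique the `rrf` mode name refers to, scores each document by summing `1 / (k + rank)` over every ranked list it appears in, for example a dense list and a BM25 list. A minimal sketch, assuming 1-based ranks and the commonly used `k = 60`. The shipped mode's constant and tie-breaking rules are not specified in these notes, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of doc ids with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

RRF uses only ranks, so it needs no score calibration between the dense and lexical retrievers. A document ranked moderately high in both lists can outrank one that tops only a single list.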
Docs and DX
- README, quickstart, API docs, and top-level guides rewritten around the shard-first story
- benchmark methodology documented with a 100k comparison placeholder table
- reference API examples now lead with base64 and shard-friendly install profiles
Release and packaging
- release notes and changelog chronology cleaned up
- CI trimmed to shard-only production lanes plus solver validation
- supported native add-on story reduced to `latence_solver`
0.1.0 — Initial OSS Foundation Release
Initial public package release for voyager-index.
Foundation
- installable `voyager_index` package and published OSS packaging surface
- durable reference FastAPI service
- dense, late-interaction, and multimodal collection kinds
- CRUD, restart-safe persistence, and public examples
Retrieval
- exact MaxSim exports through the public package
- CPU-safe MaxSim fallback when Triton is unavailable
- hybrid dense + BM25 retrieval
- optional solver-backed refinement via `latence_solver`
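For reference, exact MaxSim (the ColBERT-style late-interaction score) sums, over query tokens, each token's best dot product against the document's tokens. Below is a NumPy sketch of the kind of CPU-only exhaustive scorer a Triton-free fallback needs, assuming float32 `(n_tokens, dim)` matrices. `maxsim` and `top_k` are illustrative names, not the package's exports.

```python
import numpy as np


def maxsim(query: np.ndarray, doc: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of the max dot product with any doc token."""
    # (n_query_tokens, dim) @ (dim, n_doc_tokens) -> (n_query_tokens, n_doc_tokens)
    return float((query @ doc.T).max(axis=1).sum())


def top_k(query: np.ndarray, docs: list[np.ndarray], k: int) -> list[tuple[int, float]]:
    """Exhaustively score every document and return the k best (index, score) pairs."""
    scores = [(i, maxsim(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Because documents have different token counts, this scorer handles one document per matrix product. GPU kernels typically pad documents into fixed-length batches instead.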
Multimodal
- preprocessing helpers for renderable source documents
- multimodal model registry and provider seams
- ColPali-oriented multimodal retrieval surface
v0.1.3
Production release hardening for the full public voyager-index surface.
- adds the canonical `voyager-index[full]` install profile for the full CPU-safe production lane
- publishes both supported native packages on the `0.1.3` line: `latence-shard-engine` and `latence-solver`
- keeps the Latence graph lane optional, additive, and policy-driven on top of the shard-first hot path
- gates publishing behind the full release validation stack, including clean-install, graph-route, solver, shard, and Docker checks
v0.1.2
Prebuilt Rust wheels for Linux and macOS.
Install
```bash
pip install voyager-index                   # pure Python
pip install "voyager-index[native]"         # + prebuilt Rust kernels (Linux x86_64, macOS x86_64/arm64)
pip install "voyager-index[native,server]"  # + FastAPI reference server
```
Platforms
- Linux x86_64 (manylinux 2.28)
- macOS Intel (x86_64)
- macOS Apple Silicon (arm64)
- Python 3.10, 3.11, 3.12