
Releases: latenceainew/colsearch

0.1.7 — Optimized CUDA MaxSim + RROQ158 kernel + rename to colsearch + bugfixes

21 Apr 16:10


Highlights

This release ships three things at once:

1. Optimized CUDA kernels for MaxSim and RROQ158 on H100

  • Fused single-pass mma.sync.b1.b1.s32.and.popc kernel for RROQ158
  • Multi-tier (32/64/128/256/512) padded MaxSim dispatcher
  • Autotune-aware warmup at production batch sizes
  • int64-pointer fix in the Triton MaxSim kernel
  • GPU full-corpus fast-path that bypasses LEMUR routing when the preloaded corpus already pays the VRAM cost
  • Persistent query/corpus scratch buffers
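MaxSim scores a query against a document as the sum, over query tokens, of the best dot product with any document token. The plain-Python sketch below illustrates the padded multi-tier idea from the list above. Each document's token count is rounded up to the smallest tier in (32, 64, 128, 256, 512), so documents of similar length can share one fixed-shape kernel launch, and padded rows are masked out so they never win the max. The helper names `pick_tier`, `pad_doc` and `padded_maxsim` are hypothetical. This is not colsearch's dispatcher, and the real kernels run on the GPU.

```python
from typing import List, Sequence, Tuple

TIERS = (32, 64, 128, 256, 512)  # tier sizes named in the release notes

def pick_tier(n_tokens: int, tiers: Sequence[int] = TIERS) -> int:
    """Smallest tier that can hold n_tokens document tokens."""
    for tier in tiers:
        if n_tokens <= tier:
            return tier
    raise ValueError(f"{n_tokens} tokens exceed the largest tier ({tiers[-1]})")

def pad_doc(doc: List[List[float]], tier: int) -> Tuple[List[List[float]], List[bool]]:
    """Pad doc with zero rows up to `tier`; the mask marks real tokens."""
    dim = len(doc[0])
    pad = tier - len(doc)
    return doc + [[0.0] * dim for _ in range(pad)], [True] * len(doc) + [False] * pad

def padded_maxsim(query: List[List[float]], padded: List[List[float]], mask: List[bool]) -> float:
    """Sum over query tokens of the best dot product with any *real* doc token."""
    total = 0.0
    for q in query:
        best = float("-inf")
        for d, is_real in zip(padded, mask):
            if is_real:  # skip padding: a zero row scores 0 and could beat real negatives
                best = max(best, sum(a * b for a, b in zip(q, d)))
        total += best
    return total
```

The mask matters. Without it, a zero padding row scores 0 and can beat a query token's true best score whenever all of that token's real dot products are negative.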

Result on BEIR-8 (H100): 3.12× geomean QPS over FastPlaid at fp16 (446.7 vs 143.2 QPS), 2.06× at rroq158_gs128 (294.4 QPS). See benchmarks/competitive_benchmark.md.
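The fused RROQ158 kernel listed first above is built on the PTX `mma.sync` `.b1.b1.s32.and.popc` form, which multiplies 1-bit matrices by ANDing bit-vectors and counting the set bits. The Python sketch below assumes that a ternary {-1, 0, +1} vector is stored as two bitplanes, a positive mask and a negative mask. That encoding is our assumption, not taken from the colsearch source. Under it, a ternary dot product reduces to four AND+popcount terms. The real RROQ158 bit layout, per-group scales and query encoding may differ, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

def to_bitplanes(v: Sequence[int]) -> Tuple[int, int]:
    """Encode a ternary vector as (positive_mask, negative_mask) integers."""
    pos = neg = 0
    for i, x in enumerate(v):
        if x == 1:
            pos |= 1 << i
        elif x == -1:
            neg |= 1 << i
        elif x != 0:
            raise ValueError(f"non-ternary value {x!r} at index {i}")
    return pos, neg

def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def ternary_dot(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Dot product of two ternary vectors using only AND and popcount.

    Same-sign positions contribute +1, opposite-sign positions contribute -1,
    and positions where either side is 0 contribute nothing.
    """
    a_pos, a_neg = a
    b_pos, b_neg = b
    return (popcount(a_pos & b_pos) + popcount(a_neg & b_neg)
            - popcount(a_pos & b_neg) - popcount(a_neg & b_pos))
```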

2. Rename voyager-index → colsearch

  • PyPI: pip install colsearch
  • GitHub: ddickmann/colsearch (the old ddickmann/voyager-index URL auto-redirects)
  • Python package: colsearch (with a voyager_index compat shim)
  • Console script: colsearch-server (voyager-index-server retained as an alias)

3. Bugfixes

  • Bench disk-tightening: os.rename / os.link instead of shutil.copytree (~100 GB peak disk saved on BEIR-8 sweep).
  • CPU whole-corpus fast-path that bypasses the per-query 522k-row numpy fancy-index gather, which was making the quora rroq158/cpu run allocate ~5 GB per query and hang for 90+ minutes.
  • RROQ158 hot-path allocation fix: eliminates ~1.2 GB of per-call allocation churn that was forcing a CUDA allocator GC every ~200 queries.
  • The reference API Dockerfile now installs Rust via rustup (pinned to 1.94.1) instead of Debian's apt rustc 1.85, which predates the avx512_target_feature stabilization (Rust 1.89+) used by the AVX-512 VPOPCNTDQ tier. This unblocks docker-smoke.
  • Release pipeline now uses skip-existing: true on PyPI publish so colsearch can ship without forcing a no-op version bump on the independently-versioned native crates.
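The disk-tightening fix above replaces a full directory copy with renames and hard links. A hard link adds a second directory entry for the same inode, so "copying" a multi-GB index this way costs almost no extra disk. The sketch below uses a hypothetical helper, `link_tree`. It mirrors a directory tree with `os.link` and falls back to `shutil.copy2` when linking raises `OSError`, for example across filesystems. It illustrates the technique and is not the benchmark script's code.

```python
import os
import shutil

def link_tree(src: str, dst: str) -> int:
    """Mirror src into dst using hard links; return how many files were linked.

    Directories are recreated. Files are hard-linked where possible and copied
    with metadata when os.link raises OSError (e.g. across devices).
    """
    linked = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            try:
                os.link(s, d)  # same inode, no extra data blocks
                linked += 1
            except OSError:
                shutil.copy2(s, d)  # fallback: a real copy
    return linked
```

One caveat of this design is that linked files share their content. An in-place write through either path is visible through both, so the approach is only safe when the artefacts are treated as read-only or are replaced atomically, for example with `os.rename`.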

Compatibility (rename)

  • pip install voyager-index users: install colsearch instead. Legacy distribution stops at 0.1.6.
  • import voyager_index keeps working in 0.1.7 via a thin shim that emits a single DeprecationWarning and eagerly aliases every voyager_index.X.Y submodule to the canonical colsearch.X.Y module in sys.modules, so isinstance checks and enum identity continue to hold. The shim is removed in 0.2.0.
  • voyager-index-server console script aliased to colsearch-server for the 0.1.x cycle, removed in 0.2.0.
  • VOYAGER_INDEX_PATH env var still honoured but emits a deprecation warning. Migrate to COLSEARCH_INDEX_PATH.
  • VOYAGER_BENCH_CPU_TIME_BUDGET_S is honoured; COLSEARCH_BENCH_CPU_TIME_BUDGET_S is the new canonical name.
  • Other VOYAGER_* env vars (VOYAGER_RROQ158_N_THREADS, VOYAGER_RROQ158_USE_B1_FUSED) are unchanged in 0.1.7 to avoid breaking ops scripts.
  • Docker image: latence/colsearch (was latence/voyager-zero); reference Dockerfile builds colsearch:latest.
  • Kubernetes manifests under deploy/k8s/ ship the new colsearch namespace and labels — adjust your overlays.
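The compatibility layer described above can be approximated in a few lines. The idea is to import every submodule of the new package and register each one under the legacy name in `sys.modules`, so both names resolve to the same module object. That shared object is what keeps `isinstance` checks and enum identity intact. A second helper shows the env-var fallback pattern, e.g. `env_with_fallback("COLSEARCH_INDEX_PATH", "VOYAGER_INDEX_PATH")`. Both `alias_package` and `env_with_fallback` are hypothetical names, and the real voyager_index shim may be structured differently. To stay runnable on its own, the test aliases the standard-library `json` package under a made-up legacy name.

```python
import importlib
import os
import pkgutil
import sys
import warnings
from typing import Optional

def alias_package(legacy: str, canonical: str) -> None:
    """Register canonical.* modules under legacy.* in sys.modules."""
    warnings.warn(f"'{legacy}' is deprecated; import '{canonical}' instead",
                  DeprecationWarning, stacklevel=2)
    pkg = importlib.import_module(canonical)
    sys.modules[legacy] = pkg
    # Eagerly import every submodule so identity holds for all of them.
    for info in pkgutil.walk_packages(pkg.__path__, prefix=canonical + "."):
        if info.name.rsplit(".", 1)[-1] == "__main__":
            continue  # never execute a package's CLI entry point
        module = importlib.import_module(info.name)
        sys.modules[legacy + info.name[len(canonical):]] = module

def env_with_fallback(new: str, old: str, default: Optional[str] = None) -> Optional[str]:
    """Prefer the new env var; honour the old one with a DeprecationWarning."""
    if new in os.environ:
        return os.environ[new]
    if old in os.environ:
        warnings.warn(f"{old} is deprecated; use {new}", DeprecationWarning, stacklevel=2)
        return os.environ[old]
    return default
```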

Native packages

The native crates (latence-shard-engine, latence-solver) are versioned independently of the root colsearch package and only need a republish when their Rust source changes. 0.1.7 is a metadata-only change for them; the published wheels stay at 0.1.6.

Provenance

Historical artefacts under reports/, validation-reports/, research/, notebooks/, and benchmarks/_smoke_*.py / benchmarks/_diag_*.py retain their original voyager-index 0.1.x labels and SHAs. Run identifiers in reports/fast_plaid_head_to_head/results_v7.jsonl (e.g. voyager_fp16, voyager_rroq158_gs128) and the corresponding benchmarks/fast_plaid_head_to_head.py --libraries flags are kept stable so historical JSONL stays readable; they map 1:1 to the colsearch fp16 / rroq158_gs128 lanes.

Install

pip install colsearch

Migrating from voyager-index:

pip uninstall voyager-index
pip install colsearch
# `import voyager_index` keeps working until 0.2.0; update to `import colsearch` at your leisure.

See the full CHANGELOG.md entry for 0.1.7.

0.1.6 — RROQ158 SOTA Default at group_size=128

20 Apr 14:33
b3de46f


This release promotes the dim-aware Rroq158Config(group_size=128) lane to
the build-time default for newly created RROQ158 indexes — closing the
production-validation arc started in 0.1.5 (Phase 7 / Phase 8).

Highlights

  • ~13% smaller per-token storage (~40 vs ~46 bytes/token at dim=128;
    ~6.4× smaller than fp16, up from ~5.5×).
  • CPU p95 ~10–30% faster on the BEIR-6 mean (one fewer scale load per
    group in the popcount kernel; nfcorpus −22%, scifact −15%, scidocs −10%,
    fiqa within +2% noise).
  • NDCG@10 within ±0.005 of the previous gs=32 baseline; per-dataset
    mean Δ vs gs=32 across BEIR-6 = +0.0006 — Pareto-equal in quality
    while delivering smaller storage AND lower-or-equal CPU p95 on every
    dataset measured.
  • Dim-aware fallback _resolve_group_size(requested, dim)
    transparently steps down to gs=64 / gs=32 (with a log warning) on
    production corpora whose token dim is not divisible by 128, so the new
    default works on dim=64 / 96 / 128 / 160 alike without caller changes.
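The dim-aware fallback can be sketched as follows. This is a hypothetical re-implementation, not the actual `_resolve_group_size` from colsearch. It assumes the candidate sizes are the 128 / 64 / 32 ladder named above and tries the requested size first. It then steps down until a size divides the token dim, logging a warning when it has to step down. If no size fits, it raises; the real function's behaviour in that case is not documented in these notes.

```python
import logging

logger = logging.getLogger("rroq158_sketch")

_CANDIDATES = (128, 64, 32)  # assumed step-down ladder

def resolve_group_size(requested: int, dim: int) -> int:
    """Return the largest candidate group size <= requested that divides dim."""
    for gs in _CANDIDATES:
        if gs > requested:
            continue
        if dim % gs == 0:
            if gs != requested:
                logger.warning("group_size=%d does not divide dim=%d; using %d",
                               requested, dim, gs)
            return gs
    raise ValueError(f"no supported group size divides dim={dim}")
```

With the 0.1.6 default of 128, this gives 128 for dim=128, 64 for dim=64, and 32 for both dim=96 and dim=160.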

Headline BEIR-6 averages (rroq158 gs=128 default)

| Codec | NDCG@10 | R@100 | GPU p95 (ms) | CPU p95 (ms) |
|---|---|---|---|---|
| fp16 (baseline) | 0.5206 | 0.7360 | 4.0 (1.00×) | 103 (1.00×) |
| rroq158 (gs=128, default) | 0.5069 | 0.7298 | 4.8 (1.20×) | 310 (3.00×) |
| rroq158 (gs=32) | 0.5063 | 0.7282 | 4.8 (1.20×) | 325 (3.15×) |
| rroq4_riem | 0.5158 | 0.7345 | 8.5 (2.13×) | 580 (5.63×) |
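The storage and quality deltas quoted in the highlights can be checked with a few lines of arithmetic. The script below takes fp16 storage as 2 bytes × 128 dims = 256 bytes/token at dim=128. It uses the approximate ~40 and ~46 bytes/token figures from the highlights and the BEIR-6 NDCG@10 means from the table above. The gs=32 ratio comes out at ~5.57×, which the notes round to ~5.5×.

```python
DIM = 128
FP16_BYTES_PER_TOKEN = 2 * DIM  # fp16 stores 2 bytes per dimension -> 256

GS128_BYTES, GS32_BYTES = 40, 46  # approximate bytes/token quoted above

saving = (GS32_BYTES - GS128_BYTES) / GS32_BYTES  # relative storage reduction
ratio_gs128 = FP16_BYTES_PER_TOKEN / GS128_BYTES  # compression vs fp16
ratio_gs32 = FP16_BYTES_PER_TOKEN / GS32_BYTES

# BEIR-6 mean NDCG@10: new gs=128 default vs the previous gs=32 default
ndcg_delta = 0.5069 - 0.5063

print(f"storage saving: {saving:.1%}")        # 13.0%
print(f"gs=128 vs fp16: {ratio_gs128:.1f}x")  # 6.4x
print(f"gs=32  vs fp16: {ratio_gs32:.2f}x")   # 5.57x
print(f"NDCG@10 delta:  {ndcg_delta:+.4f}")   # +0.0006
```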

Full per-dataset / per-codec sweep:
reports/beir_2026q2_gs128/.

Migration

Existing on-disk indexes are unaffected — the manifest carries the
build-time group_size and only newly built indexes pick up the new
default.

  • Pin Rroq158Config(group_size=32) to restore the previous default
    exactly.
  • Pin Rroq158Config(group_size=64) for the safest cross-dataset choice
    (covers high-intra-token-variance corpora like arguana).
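The migration rule above can be written as a tiny resolution function. For existing indexes the manifest wins, and the new default applies only to fresh builds that do not pin a value. The manifest key `group_size`, the helper name `effective_group_size` and the dict-shaped manifest are hypothetical placeholders for illustration. In colsearch itself the value is pinned through `Rroq158Config(group_size=...)`, and the sketch assumes a pin is ignored when an existing manifest is loaded.

```python
from typing import Mapping, Optional

NEW_DEFAULT_GROUP_SIZE = 128  # 0.1.6 build-time default

def effective_group_size(manifest: Optional[Mapping[str, int]],
                         pinned: Optional[int] = None) -> int:
    """Existing index: use the manifest. New build: use the pin, else the default."""
    if manifest is not None and "group_size" in manifest:
        return manifest["group_size"]  # on-disk indexes keep their build-time value
    if pinned is not None:
        return pinned                  # e.g. Rroq158Config(group_size=32)
    return NEW_DEFAULT_GROUP_SIZE
```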

See docs/guides/quantization-tuning.md
for the full decision matrix and per-dim recipe table.

Install

pip install --upgrade voyager-index==0.1.6
# native (optional, for the Rust SIMD CPU kernel + tabu solver):
pip install --upgrade voyager-index[native]==0.1.6
# full (server + multimodal + native + GPU triton):
pip install --upgrade voyager-index[full,gpu]==0.1.6

Full changelog and merged PRs in
CHANGELOG.md.

0.1.5

15 Apr 20:13


Changelog

This changelog tracks the official shipped OSS release line. Older draft notes
that did not correspond to a published release were removed so version history
reads in release order again.

Unreleased

0.1.5 — Release Gate Hotfix

This release republishes the shard-engine decomposition work on a clean CI line
after fixing the small lint regressions that slipped through the initial 0.1.4
cut.

Release integrity

  • fixed the shard refactor parity script bootstrap so the release lint lane
    accepts the repo-local import setup
  • normalized import ordering and explicit public exports in the refactor-touching
    files that failed the hosted Ruff gate
  • bumped the root package and supported native packages onto the 0.1.5 line
    so the hotfix release cleanly supersedes the drafted 0.1.4 cut

0.1.4 — Shard Engine Decomposition And Release Evidence

This release keeps the shard product surface stable while decomposing the large
shard-engine modules behind compatibility facades and hardening the parity
evidence required to ship that refactor safely.

Shard engine maintainability

  • split the shard manager, store, fetch pipeline, LEMUR router, builder, WAL,
    and ColBANDIT reranker into focused internal modules while preserving public
    import paths
  • reduced config coupling by separating serving configuration from sweep-only
    configuration behind compatibility exports
  • introduced internal protocols for router, store, fetch, reranker, and native
    exact backends to narrow cross-module ownership
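A structural `typing.Protocol` is one common way in Python to express internal protocols like the ones described above. Callers depend only on the methods they use, and any class with matching methods satisfies the protocol without inheriting from it. The `Router` protocol, its `route` method, `FixedRouter` and `search_shards` below are hypothetical names. colsearch's actual internal interfaces are not shown in these notes.

```python
from typing import List, Protocol, Sequence, runtime_checkable

@runtime_checkable
class Router(Protocol):
    """Anything that maps a query to candidate shard ids."""
    def route(self, query: Sequence[float], top_shards: int) -> List[int]: ...

class FixedRouter:
    """Trivial router that returns the first shards; note: no inheritance."""
    def __init__(self, n_shards: int) -> None:
        self.n_shards = n_shards

    def route(self, query: Sequence[float], top_shards: int) -> List[int]:
        return list(range(min(top_shards, self.n_shards)))

def search_shards(router: Router, query: Sequence[float]) -> List[int]:
    # Depends only on the Router protocol, not on any concrete class.
    return router.route(query, top_shards=4)
```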

Runtime capability visibility

  • surfaced fallback and capability state for LEMUR routing, pinned staging, and
    native exact execution through shard statistics and reference API metadata
  • added startup logging for shard capability selection so development and
    production runs expose fallback decisions explicitly

Validation and release confidence

  • added shard refactor contract coverage for import compatibility, artifact
    parity, query trace stability, and runtime capability reporting
  • added a machine-readable shard refactor parity report and wired it into CI so
    release evidence is reproducible instead of ad hoc
  • bumped the root package and supported native packages onto the 0.1.4 line
  • refreshed release hygiene checks to validate the aligned package versions

0.1.3 — Production Release Hardening

This release closes the gap between the public product story and the shipped
package, native-wheel, and release pipeline surfaces.

Packaging and install surface

  • added a canonical voyager-index[full] install profile for the full public CPU-safe surface
  • added a shard-native extra and broadened the native extra so the public native story now covers both latence-shard-engine and latence-solver
  • bumped the root package and supported native packages onto the 0.1.3 line
  • tightened package data so the shipped sdist includes the graph quality fixture required by release validation

Graph-aware production path

  • kept latence-graph as a public optional extra and pinned it to the verified public latence>=0.1.1 line
  • clarified throughout the docs that the graph lane can consume compatible prebuilt graph data directly and remains additive to the shard-first hot path
  • preserved the graph route-conformance, provenance, and retrieval-uplift evidence as a distinct proof layer from shard performance benchmarks

CI, release, and OSS hygiene

  • expanded the native release bundle to include the shard-engine wheel alongside the solver wheel
  • tightened release documentation and automation around clean-install rehearsal, native-wheel validation, and publish gating
  • refreshed the README, install docs, issue templates, and contributor guidance around the supported production lane
  • added repo-governance files for dependency updates, code ownership, and contributor conduct

0.1.2 — Shard Production Surface

This release makes the shard engine the clear public product surface.

Retrieval and serving

  • production-wired shard search with LEMUR routing, ColBANDIT, and Triton MaxSim
  • shard scoring controls exposed for int8, fp8, and roq4
  • durable CRUD, WAL, checkpoint, recovery, and shard admin endpoints
  • multi-worker single-host reference server posture

API and SDK

  • base64 vector transport helpers exposed from voyager_index.transport
  • public HTTP API accepts base64 payloads for dense and multivector requests
  • shard configuration knobs surfaced on collection create, search, and info APIs
  • dense hybrid mode selection documented and shipped as rrf or tabu

Docs and DX

  • README, quickstart, API docs, and top-level guides rewritten around the shard-first story
  • benchmark methodology documented with a 100k comparison placeholder table
  • reference API examples now lead with base64 and shard-friendly install profiles

Release and packaging

  • release notes and changelog chronology cleaned up
  • CI trimmed to shard-only production lanes plus solver validation
  • supported native add-on story reduced to latence_solver

0.1.0 — Initial OSS Foundation Release

Initial public package release for voyager-index.

Foundation

  • installable voyager_index package and published OSS packaging surface
  • durable reference FastAPI service
  • dense, late-interaction, and multimodal collection kinds
  • CRUD, restart-safe persistence, and public examples

Retrieval

  • exact MaxSim exports through the public package
  • CPU-safe MaxSim fallback when Triton is unavailable
  • hybrid dense + BM25 retrieval
  • optional solver-backed refinement via latence_solver
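Reciprocal rank fusion (RRF), the `rrf` hybrid mode named under 0.1.2, is a standard way to combine dense and BM25 result lists. Each document's fused score is the sum of 1 / (k + rank) over every list it appears in. A minimal sketch follows. The value k = 60 is the commonly used default and is an assumption here, since these notes do not state what voyager-index uses. The solver-based tabu mode is not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

def rrf(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked lists with reciprocal rank fusion (ranks start at 1)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))
```

For example, fusing a dense list ["a", "b", "c"] with a BM25 list ["b", "c", "d"] ranks "b" first, because it places well in both lists.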

Multimodal

  • preprocessing helpers for renderable source documents
  • multimodal model registry and provider seams
  • ColPali-oriented multimodal retrieval surface

v0.1.3

15 Apr 15:19


Production release hardening for the full public voyager-index surface.

  • adds the canonical voyager-index[full] install profile for the full CPU-safe production lane
  • publishes both supported native packages on the 0.1.3 line: latence-shard-engine and latence-solver
  • keeps the Latence graph lane optional, additive, and policy-driven on top of the shard-first hot path
  • gates publishing behind the full release validation stack, including clean-install, graph-route, solver, shard, and Docker checks

v0.1.2

31 Mar 19:10


Prebuilt Rust wheels for Linux and macOS.

Install

pip install voyager-index                # pure Python
pip install voyager-index[native]        # + prebuilt Rust kernels (Linux x86_64, macOS x86_64/arm64)
pip install voyager-index[native,server] # + FastAPI reference server

Platforms

  • Linux x86_64 (manylinux 2.28)
  • macOS Intel (x86_64)
  • macOS Apple Silicon (arm64)
  • Python 3.10, 3.11, 3.12