21 Apr 16:10

ddickmann

ddcadc5

0.1.7 — Optimized CUDA MaxSim + RROQ158 kernel + rename to colsearch + bugfixes

Highlights

This release ships three things at once:

1. Optimized CUDA kernels for MaxSim and RROQ158 on H100

Fused single-pass mma.sync.b1.b1.s32.and.popc kernel for RROQ158
Multi-tier (32/64/128/256/512) padded MaxSim dispatcher
Autotune-aware warmup at production batch sizes
int64-pointer fix in the Triton MaxSim kernel
GPU full-corpus fast-path that bypasses LEMUR routing when the preloaded corpus already pays the VRAM cost
Persistent query/corpus scratch buffers
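
One way to picture the multi-tier padded dispatcher: each input is rounded up to the smallest tier that fits, so only a handful of kernel shapes need to be compiled and autotuned. The sketch below is illustrative only. It assumes the tiers are padded token counts and that inputs larger than the top tier take some other path; `pick_tier` and `pad_to_tier` are hypothetical names, not part of the colsearch API.

```python
from bisect import bisect_left
from typing import List, Optional

# Tier sizes come from the release notes. Treating them as padded token
# counts is an assumption made for this sketch.
TIERS = (32, 64, 128, 256, 512)


def pick_tier(n_tokens: int) -> Optional[int]:
    """Return the smallest tier that fits n_tokens, or None if none does."""
    if n_tokens < 1:
        raise ValueError("n_tokens must be positive")
    i = bisect_left(TIERS, n_tokens)
    return TIERS[i] if i < len(TIERS) else None


def pad_to_tier(tokens: List, tier: int, pad_value=None) -> List:
    """Right-pad a token list to the tier length (hypothetical helper)."""
    if len(tokens) > tier:
        raise ValueError("token list is longer than the tier")
    return tokens + [pad_value] * (tier - len(tokens))
```

A real kernel would also need to mask the padded positions so they cannot win the max; that detail is omitted here.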

Result on BEIR-8 (H100): 3.12× geomean QPS over FastPlaid at fp16 (446.7 vs 143.2 QPS), 2.06× at rroq158_gs128 (294.4 QPS). See benchmarks/competitive_benchmark.md.

2. Rename `voyager-index` → `colsearch`

PyPI: pip install colsearch
GitHub: ddickmann/colsearch (the old ddickmann/voyager-index URL auto-redirects)
Python package: colsearch (with a voyager_index compat shim)
Console script: colsearch-server (voyager-index-server retained as an alias)

3. Bugfixes

Bench disk-tightening: os.rename / os.link instead of shutil.copytree (~100 GB peak disk saved on BEIR-8 sweep).
CPU whole-corpus fast-path that bypasses the per-query 522k-row numpy fancy-index gather, which was making quora rroq158/cpu allocate ~5 GB/query and hang for 90+ minutes.
RROQ158 hot-path alloc-churn fix that eliminated a 1.2 GB/call churn forcing CUDA allocator GC every ~200 queries.
Reference-api Dockerfile installs Rust via rustup (pinned 1.94.1) instead of Debian's apt rustc 1.85, which predated the avx512_target_feature stabilization (1.89+) used by the AVX-512 VPOPCNTDQ tier — unblocks docker-smoke.
Release pipeline now uses skip-existing: true on PyPI publish so colsearch can ship without forcing a no-op version bump on the independently-versioned native crates.
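
The disk-tightening fix rests on a simple idea: a hard link adds a second name for the same bytes, so staging a tree this way costs almost no extra disk, whereas `shutil.copytree` duplicates every file. The following is a minimal sketch of that idea, not the benchmark's actual code; `link_tree` is a hypothetical helper, and the copy fallback is an assumption for filesystems where linking fails.

```python
import os
import shutil


def link_tree(src: str, dst: str) -> None:
    """Mirror src into dst using hard links, copying only when linking fails."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            try:
                os.link(s, d)       # no data copied; both names share one inode
            except OSError:
                shutil.copy2(s, d)  # e.g. cross-device or unsupported filesystem
```

Because linked files share storage, writing through one name changes the other, so this only suits artefacts that are treated as read-only after they are built.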

Compatibility (rename)

pip install voyager-index users: install colsearch instead. Legacy distribution stops at 0.1.6.
import voyager_index keeps working in 0.1.7 via a thin shim that emits a single DeprecationWarning and eagerly aliases every voyager_index.X.Y submodule to the canonical colsearch.X.Y module in sys.modules, so isinstance and enum identity continue to hold. The shim will be removed in 0.2.0.
voyager-index-server console script aliased to colsearch-server for the 0.1.x cycle, removed in 0.2.0.
VOYAGER_INDEX_PATH env var still honoured but emits a deprecation warning. Migrate to COLSEARCH_INDEX_PATH.
VOYAGER_BENCH_CPU_TIME_BUDGET_S is honoured; COLSEARCH_BENCH_CPU_TIME_BUDGET_S is the new canonical name.
Other VOYAGER_* env vars (VOYAGER_RROQ158_N_THREADS, VOYAGER_RROQ158_USE_B1_FUSED) are unchanged in 0.1.7 to avoid breaking ops scripts.
Docker image: latence/colsearch (was latence/voyager-zero); reference Dockerfile builds colsearch:latest.
Kubernetes manifests under deploy/k8s/ ship the new colsearch namespace and labels — adjust your overlays.
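
The reason the import shim aliases modules in `sys.modules`, rather than re-exporting names, is that both import paths then resolve to the same module object, so a class reached through the old name is the very same class as the one reached through the new name. The sketch below demonstrates that mechanism with stand-in modules. `alias_package`, `demo_old`, and `demo_new` are hypothetical names, and unlike the shim described above it only aliases modules that are already loaded; an eager shim would import the submodules first.

```python
import importlib
import sys
import types
import warnings


def alias_package(old: str, new: str) -> None:
    """Register every loaded `new` / `new.*` module under the matching `old` name."""
    warnings.warn(
        f"{old} is deprecated; import {new} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    for name, module in list(sys.modules.items()):
        if name == new or name.startswith(new + "."):
            sys.modules[old + name[len(new):]] = module


# Stand-in for the canonical package and one submodule.
demo_new = types.ModuleType("demo_new")
demo_new_codec = types.ModuleType("demo_new.codec")


class Codec:
    pass


demo_new_codec.Codec = Codec
demo_new.codec = demo_new_codec
sys.modules["demo_new"] = demo_new
sys.modules["demo_new.codec"] = demo_new_codec

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    alias_package("demo_old", "demo_new")

# Both names now resolve to one module object, so class identity holds.
legacy = importlib.import_module("demo_old.codec")
print(legacy.Codec is Codec)
```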

Native packages

The native crates (latence-shard-engine, latence-solver) are versioned independently of the root colsearch package and only need a republish when their Rust source changes. 0.1.7 is a metadata-only change for them; the published wheels stay at 0.1.6.

Provenance

Historical artefacts under reports/, validation-reports/, research/, notebooks/, and benchmarks/_smoke_*.py / benchmarks/_diag_*.py retain their original voyager-index 0.1.x labels and SHAs. Run identifiers in reports/fast_plaid_head_to_head/results_v7.jsonl (e.g. voyager_fp16, voyager_rroq158_gs128) and the corresponding benchmarks/fast_plaid_head_to_head.py --libraries flags are kept stable so historical JSONL stays readable; they map 1:1 to the colsearch fp16 / rroq158_gs128 lanes.

Install

pip install colsearch

Migrating from voyager-index:

pip uninstall voyager-index
pip install colsearch
# `import voyager_index` keeps working until 0.2.0; update to `import colsearch` at your leisure.

See the full CHANGELOG.md entry for 0.1.7.

20 Apr 14:33

ddickmann

v0.1.6

b3de46f

0.1.6 — RROQ158 SOTA Default at group_size=128

This release promotes the dim-aware Rroq158Config(group_size=128) lane to
the build-time default for newly created RROQ158 indexes — closing the
production-validation arc started in 0.1.5 (Phase 7 / Phase 8).

Highlights

~13% smaller per-token storage (~40 vs ~46 bytes/token at dim=128; ~6.4× smaller than fp16, up from ~5.5×).
CPU p95 ~10–30% faster on the BEIR-6 mean (one fewer scale load per group in the popcount kernel; nfcorpus −22%, scifact −15%, scidocs −10%, fiqa within +2% noise).
NDCG@10 within ±0.005 of the previous gs=32 baseline; per-dataset mean Δ vs gs=32 across BEIR-6 = +0.0006. Pareto-equal in quality while delivering smaller storage AND lower-or-equal CPU p95 on every dataset measured.
Dim-aware fallback _resolve_group_size(requested, dim) transparently steps down to gs=64 / gs=32 (with a log warning) on production corpora whose token dim is not divisible by 128, so the new default works on dim=64 / 96 / 128 / 160 alike without caller changes.
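
The dim-aware fallback can be read as "take the largest supported group size that does not exceed the request and divides the token dim". The sketch below is an interpretation of that description, not the library's `_resolve_group_size`. The function name `resolve_group_size_sketch`, the candidate tuple, and the choice to raise when nothing divides are all assumptions.

```python
import logging

log = logging.getLogger("group_size_sketch")

# Group sizes named in the release notes, largest first.
CANDIDATES = (128, 64, 32)


def resolve_group_size_sketch(requested: int, dim: int) -> int:
    """Pick the largest candidate <= requested that divides dim evenly."""
    for gs in CANDIDATES:
        if gs <= requested and dim % gs == 0:
            if gs != requested:
                log.warning(
                    "group_size=%d does not fit dim=%d; using %d",
                    requested, dim, gs,
                )
            return gs
    # Assumption: the real code may handle this case differently.
    raise ValueError(f"no supported group size divides dim={dim}")
```

Under these assumptions a request for 128 resolves to 64 at dim=64, to 32 at dim=96 and dim=160, and stays at 128 at dim=128, which matches the dims listed above.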

Headline BEIR-6 averages (rroq158 gs=128 default)

Codec	NDCG@10	R@100	GPU p95 (ms)	CPU p95 (ms)
fp16 (baseline)	0.5206	0.7360	4.0 (1.00×)	103 (1.00×)
rroq158 (gs=128, default)	0.5069	0.7298	4.8 (1.20×)	310 (3.00×)
rroq158 (gs=32)	0.5063	0.7282	4.8 (1.20×)	325 (3.15×)
rroq4_riem	0.5158	0.7345	8.5 (2.13×)	580 (5.63×)

Full per-dataset / per-codec sweep:
reports/beir_2026q2_gs128/.

Migration

Existing on-disk indexes are unaffected — the manifest carries the
build-time group_size and only newly built indexes pick up the new
default.

Pin Rroq158Config(group_size=32) to restore the previous default exactly.
Pin Rroq158Config(group_size=64) for the safest cross-dataset choice (covers high-intra-token-variance corpora like arguana).

See docs/guides/quantization-tuning.md
for the full decision matrix and per-dim recipe table.

Install

pip install --upgrade voyager-index==0.1.6
# native (optional, for the Rust SIMD CPU kernel + tabu solver):
pip install --upgrade "voyager-index[native]==0.1.6"
# full (server + multimodal + native + GPU triton):
pip install --upgrade "voyager-index[full,gpu]==0.1.6"

Full changelog and merged PRs in
CHANGELOG.md.

15 Apr 20:13

ddickmann

v0.1.5

becb7a6

0.1.5

Changelog

This changelog tracks the official shipped OSS release line. Older draft notes
that did not correspond to a published release were removed so version history
reads in release order again.

Unreleased

0.1.5 — Release Gate Hotfix

This release republishes the shard-engine decomposition work on a clean CI line
after fixing the small lint regressions that slipped through the initial 0.1.4
cut.

Release integrity

fixed the shard refactor parity script bootstrap so the release lint lane accepts the repo-local import setup
normalized import ordering and explicit public exports in the refactor-touching files that failed the hosted Ruff gate
bumped the root package and supported native packages onto the 0.1.5 line so the hotfix release cleanly supersedes the drafted 0.1.4 cut

0.1.4 — Shard Engine Decomposition And Release Evidence

This release keeps the shard product surface stable while decomposing the large
shard-engine modules behind compatibility facades and hardening the parity
evidence required to ship that refactor safely.

Shard engine maintainability

split the shard manager, store, fetch pipeline, LEMUR router, builder, WAL, and ColBANDIT reranker into focused internal modules while preserving public import paths
reduced config coupling by separating serving configuration from sweep-only configuration behind compatibility exports
introduced internal protocols for router, store, fetch, reranker, and native exact backends to narrow cross-module ownership

Runtime capability visibility

surfaced fallback and capability state for LEMUR routing, pinned staging, and native exact execution through shard statistics and reference API metadata
added startup logging for shard capability selection so development and production runs expose fallback decisions explicitly

Validation and release confidence

added shard refactor contract coverage for import compatibility, artifact parity, query trace stability, and runtime capability reporting
added a machine-readable shard refactor parity report and wired it into CI so release evidence is reproducible instead of ad hoc
bumped the root package and supported native packages onto the 0.1.4 line
refreshed release hygiene checks to validate the aligned package versions

0.1.3 — Production Release Hardening

This release closes the gap between the public product story and the shipped
package, native-wheel, and release pipeline surfaces.

Packaging and install surface

added a canonical voyager-index[full] install profile for the full public CPU-safe surface
added shard-native and broadened native so the public native story now covers both latence-shard-engine and latence-solver
bumped the root package and supported native packages onto the 0.1.3 line
tightened package data so the shipped sdist includes the graph quality fixture required by release validation

Graph-aware production path

kept latence-graph as a public optional extra and pinned it to the verified public latence>=0.1.1 line
clarified throughout the docs that the graph lane can consume compatible prebuilt graph data directly and remains additive to the shard-first hot path
preserved the graph route-conformance, provenance, and retrieval-uplift evidence as a distinct proof layer from shard performance benchmarks

CI, release, and OSS hygiene

expanded the native release bundle to include the shard-engine wheel alongside the solver wheel
tightened release documentation and automation around clean-install rehearsal, native-wheel validation, and publish gating
refreshed the README, install docs, issue templates, and contributor guidance around the supported production lane
added repo-governance files for dependency updates, code ownership, and contributor conduct

0.1.2 — Shard Production Surface

This release makes the shard engine the clear public product surface.

Retrieval and serving

production-wired shard search with LEMUR routing, ColBANDIT, and Triton MaxSim
shard scoring controls exposed for int8, fp8, and roq4
durable CRUD, WAL, checkpoint, recovery, and shard admin endpoints
multi-worker single-host reference server posture

API and SDK

base64 vector transport helpers exposed from voyager_index.transport
public HTTP API accepts base64 payloads for dense and multivector requests
shard configuration knobs surfaced on collection create, search, and info APIs
dense hybrid mode selection documented and shipped as rrf or tabu
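
Assuming `rrf` here stands for reciprocal rank fusion, the hybrid mode merges the dense and BM25 rankings by summing `1 / (k + rank)` for each document across the lists. The sketch below shows the textbook form with the commonly used `k = 60`; it is not taken from the voyager-index source, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked lists of document ids by reciprocal rank fusion."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=lambda doc_id: -scores[doc_id])


dense_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "a"]
print(rrf_fuse([dense_hits, bm25_hits]))
```

Only ranks enter the formula, so the two retrievers do not need comparable score scales.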

Docs and DX

README, quickstart, API docs, and top-level guides rewritten around the shard-first story
benchmark methodology documented with a 100k comparison placeholder table
reference API examples now lead with base64 and shard-friendly install profiles

Release and packaging

release notes and changelog chronology cleaned up
CI trimmed to shard-only production lanes plus solver validation
supported native add-on story reduced to latence_solver

0.1.0 — Initial OSS Foundation Release

Initial public package release for voyager-index.

Foundation

installable voyager_index package and published OSS packaging surface
durable reference FastAPI service
dense, late-interaction, and multimodal collection kinds
CRUD, restart-safe persistence, and public examples

Retrieval

exact MaxSim exports through the public package
CPU-safe MaxSim fallback when Triton is unavailable
hybrid dense + BM25 retrieval
optional solver-backed refinement via latence_solver
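
For readers new to late interaction, MaxSim scores a document by taking, for each query token, its best dot product against any document token, and then summing those maxima. The pure-Python reference below is a sketch of that standard definition for checking small cases by hand. It is not the package's exported implementation, and the names `dot` and `maxsim` are illustrative.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum over query tokens of the best-matching document token score."""
    if not doc_tokens:
        raise ValueError("doc_tokens must not be empty")
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim(query, doc))  # 1.0 from the first query token + 0.5 from the second
```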

Multimodal

preprocessing helpers for renderable source documents
multimodal model registry and provider seams
ColPali-oriented multimodal retrieval surface

15 Apr 15:19

ddickmann

v0.1.3

b6de36a

v0.1.3

Production release hardening for the full public voyager-index surface.

adds the canonical voyager-index[full] install profile for the full CPU-safe production lane
publishes both supported native packages on the 0.1.3 line: latence-shard-engine and latence-solver
keeps the Latence graph lane optional, additive, and policy-driven on top of the shard-first hot path
gates publishing behind the full release validation stack, including clean-install, graph-route, solver, shard, and Docker checks

31 Mar 19:10

ddickmann

v0.1.2

5ccb9e1

v0.1.2

Prebuilt Rust wheels for Linux and macOS.

Install

pip install voyager-index                  # pure Python
pip install "voyager-index[native]"        # + prebuilt Rust kernels (Linux x86_64, macOS x86_64/arm64)
pip install "voyager-index[native,server]" # + FastAPI reference server

Platforms

Linux x86_64 (manylinux 2.28)
macOS Intel (x86_64)
macOS Apple Silicon (arm64)
Python 3.10, 3.11, 3.12

Releases: latenceainew/colsearch
