# rotorvec


A vector index built on block-diagonal rotor quantization — a family of data-oblivious quantizers that swap TurboQuant's dense d×d random orthogonal matrix for small per-block rotations from Clifford algebra. Inspired by turbovec (TurboQuant) and the RotorQuant paper.

> "Replace the d×d random orthogonal matrix with Clifford rotors... exploiting algebraic sparsity" — RotorQuant paper

## Four rotation variants

| Variant | Block | Algebra | Cost / vector | When to pick it |
|---|---|---|---|---|
| Planar2 | 2 | 2D Givens (SO(2)) | O(d) | Cheapest. Per the RotorQuant paper, best PPL for KV-cache decorrelation. |
| Rotor3 | 3 | Cl(3,0) sandwich R v R̃ | O(d) | Default. Richer algebraic structure than Givens, modest cost. |
| Iso4 | 4 | Quaternion left-iso (SO(4)) | O(d) | Most decorrelation per group; slightly more compute than Planar2. |
| WalshRotor3 | 3 + global | Signed FWHT, then Cl(3,0) sandwich | O(d log d) | v0.2.3 default for power-of-two dims. Closes the recall gap on data with strong cross-coordinate correlations (image features, SIFT). Requires dim ∈ {128, 256, 512, 1024, 2048, 4096}. |

For comparison, TurboQuant uses 16,384 parameters (a 128×128 matrix) and O(d²) FMAs per vector. The first three rotorvec variants are O(d) per vector; WalshRotor3 is O(d log d) — sitting between block-only and dense-GEMM in both cost and decorrelation power.
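As a back-of-envelope check on those counts, the parameter arithmetic can be sketched as follows (`rotation_params` is illustrative, not a library function, and assumes dim divides evenly into blocks):

```python
def rotation_params(dim, variant):
    """Approximate rotation-parameter counts per the comparison above."""
    if variant == "turboquant":
        return dim * dim          # dense d×d orthogonal matrix
    if variant == "planar2":
        return dim // 2           # one Givens angle per 2-block
    if variant == "rotor3":
        return 4 * (dim // 3)     # (s, b12, b13, b23) per 3-block
    if variant == "iso4":
        return 4 * (dim // 4)     # unit quaternion per 4-block
    raise ValueError(variant)

rotation_params(128, "turboquant")   # 16_384, matching the 128×128 matrix
```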

## Picking the right variant

| Your data has... | Pick |
|---|---|
| dim a power of 2 (SIFT-128, CLIP-512, BGE-1024) | WalshRotor3 — strongest recall |
| dim not a power of 2 (GloVe-100, MiniLM-384, CLIP-768, OpenAI-1536) | Rotor3 (default), Planar2, or Iso4 |
| the cheapest per-vector cost as the priority | Planar2 |
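The table reduces to a one-line rule; a hypothetical helper (not part of the rotorvec API) that encodes it:

```python
def default_rotation(dim: int) -> str:
    """Pick walshrotor3 for power-of-two dims, rotor3 otherwise."""
    if dim % 8 != 0:
        raise ValueError("dim must be a multiple of 8")
    is_pow2 = dim & (dim - 1) == 0
    return "walshrotor3" if is_pow2 else "rotor3"

default_rotation(512)   # "walshrotor3"
default_rotation(384)   # "rotor3"
```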

## Quick start

### Python

Build the extension locally with uv for development work (published wheels are also available; see Install below). uv handles the venv, pulls in maturin as the PEP 517 build backend automatically, and produces an editable install:

```sh
cd rotorvec-python
uv venv --python 3.12
uv pip install -e .
```

Then either activate the venv (source .venv/bin/activate) or prefix your commands with uv run (e.g. uv run pytest tests/).

Once installed:

```python
import numpy as np
from rotorvec import RotorQuantIndex, IdMapIndex

# Default: Cl(3,0) rotors, 4-bit codes
index = RotorQuantIndex(dim=1536, bits=4)

# Or pick a variant explicitly
planar = RotorQuantIndex(dim=1536, bits=4, rotation="planar2")
iso    = RotorQuantIndex(dim=1536, bits=4, rotation="iso4")

# Power-of-two dim → use WalshRotor3 for best recall on hard data
walsh  = RotorQuantIndex(dim=1024, bits=4, rotation="walshrotor3")

vectors = np.random.randn(10_000, 1536).astype(np.float32)
queries = np.random.randn(8, 1536).astype(np.float32)

index.add(vectors)
scores, indices = index.search(queries, k=10)   # (8, 10) float32 / int64

index.write("index.rv")
loaded = RotorQuantIndex.load("index.rv")
```

Stable external ids:

```python
ids = np.array([1001, 1002, 1003], dtype=np.uint64)
m = IdMapIndex(dim=1536, bits=4, rotation="planar2")
m.add_with_ids(vectors[:3], ids)
scores, returned_ids = m.search(queries, k=3)   # ids are uint64
m.remove(1002)
```

Accepted rotation values: "planar2", "rotor3" (default), "iso4", "walshrotor3" (aliases: "walsh", "walsh-rotor3").

### Rust

```rust
use rotorvec::{RotorQuantIndex, Rotation};

let mut index = RotorQuantIndex::new(1536, 4);                       // Rotor3 default
let mut planar = RotorQuantIndex::with_rotation(1536, 4, Rotation::Planar2);
let mut iso    = RotorQuantIndex::with_rotation(1536, 4, Rotation::Iso4);
let mut walsh  = RotorQuantIndex::with_rotation(1024, 4, Rotation::WalshRotor3); // dim must be pow2

index.add(&vectors);
let results = index.search(&queries, 10);
index.write("index.rv").unwrap();
let loaded = RotorQuantIndex::load("index.rv").unwrap();
```

For stable external ids:

```rust
use rotorvec::{IdMapIndex, Rotation};

let mut index = IdMapIndex::with_rotation(1536, 4, Rotation::Planar2);
index.add_with_ids(&vectors, &[1001, 1002, 1003]);
let (scores, ids) = index.search(&queries, 10);
index.remove(1002);
```

## Python usage guide

A walkthrough covering the common patterns: building an index from embeddings, picking the right variant for your data, searching with stable IDs, persisting to disk, and the gotchas worth knowing.

### Install

```sh
pip install rotorvec
```

Wheels are abi3-py310 — one binary per platform covers Python 3.10 through 3.13. Linux x86_64/aarch64, macOS aarch64, Windows x64 wheels are on PyPI; other platforms build from sdist.

### Choosing dim, bits, and rotation

Three knobs determine compression rate and recall:

| Knob | What it controls | Constraint | Practical advice |
|---|---|---|---|
| dim | embedding dimension | dim % 8 == 0 (always); power of 2 if you want walshrotor3 | match your embedding model — 384 / 768 / 1024 / 1536 are typical |
| bits | bits per coordinate | 2, 3, or 4 | start at 4 (best recall); drop to 2-3 only if memory is tight |
| rotation | decorrelation scheme | see the variant table | start at "rotor3" unless your dim is a power of 2 — then use "walshrotor3" |

A rough sizing formula: each vector takes dim * bits / 8 bytes of packed codes plus 4 bytes for the norm. At dim=1536, bits=4 that is 772 bytes/vector, so 100K vectors fit in ~77 MB and 1M vectors in ~770 MB.
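The formula is easy to apply directly (a sketch; `index_size_bytes` is illustrative, not part of the library):

```python
def index_size_bytes(n_vectors: int, dim: int, bits: int) -> int:
    """dim * bits / 8 bytes of packed codes plus a 4-byte float32 norm per vector."""
    per_vector = dim * bits // 8 + 4
    return n_vectors * per_vector

index_size_bytes(100_000, 1536, 4)   # 77_200_000 bytes, roughly 74 MiB
```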

### Build an index from sentence-transformer embeddings

End-to-end with sentence-transformers (the most common RAG / semantic search starting point):

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from rotorvec import RotorQuantIndex

model = SentenceTransformer("all-MiniLM-L6-v2")   # dim=384
corpus = ["The capital of France is Paris.",
          "Mount Everest is the tallest peak.",
          "rotorvec compresses vectors via Clifford rotors.",
          # ... your documents
         ]

embeddings = model.encode(corpus, convert_to_numpy=True).astype(np.float32)
# Shape: (n_docs, 384). dtype must be float32.

# dim=384 is not a power of 2, so use rotor3 (or planar2 / iso4).
index = RotorQuantIndex(dim=384, bits=4, rotation="rotor3")
index.add(embeddings)

# Search
query_text = "Where is the Eiffel Tower?"
q = model.encode([query_text], convert_to_numpy=True).astype(np.float32)
scores, indices = index.search(q, k=5)

for rank, (i, s) in enumerate(zip(indices[0], scores[0])):
    print(f"{rank+1}. score={s:.3f}  {corpus[i]}")
```

For OpenAI embeddings (text-embedding-3-small, dim=1536) the only change is RotorQuantIndex(dim=1536, ...).

For CLIP image embeddings (ViT-B/32, dim=512) you can use the stronger walshrotor3 because 512 is a power of 2:

```python
index = RotorQuantIndex(dim=512, bits=4, rotation="walshrotor3")
```

### Stable external IDs (RAG, deletes, multi-tenant)

The default RotorQuantIndex returns positional indices (0..n-1). If your documents have stable IDs (database PKs, hashes, etc.), use IdMapIndex — it survives deletes:

```python
import numpy as np
from rotorvec import IdMapIndex

doc_ids = np.array([1001, 1002, 1003, 1004], dtype=np.uint64)
embeddings = np.random.randn(4, 384).astype(np.float32)
query = np.random.randn(1, 384).astype(np.float32)

idx = IdMapIndex(dim=384, bits=4, rotation="rotor3")
idx.add_with_ids(embeddings, doc_ids)

# Search returns your ids, not positional indices
scores, ids = idx.search(query, k=3)
# ids dtype is uint64 — your original document IDs

# Deletes are O(1)
idx.remove(1002)
assert not idx.contains(1002)
assert len(idx) == 3
```

Once the corpus grows past a million documents, this is the layer you almost certainly want — the positional index forces you to track which row maps to which document externally, and that bookkeeping gets painful fast.
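A typical pattern keeps a side table from external id to payload, so one row of search output resolves straight to documents (hypothetical glue code, not part of rotorvec):

```python
def resolve(ids_row, scores_row, docs):
    """Map one row of (uint64 ids, float32 scores) back to stored documents."""
    return [(docs[int(i)], float(s)) for i, s in zip(ids_row, scores_row)]

docs = {1001: "doc A", 1003: "doc C", 1004: "doc D"}
resolve([1003, 1001], [0.91, 0.72], docs)   # [("doc C", 0.91), ("doc A", 0.72)]
```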

### Save and load

Use .rv for the positional index and .rvim for the ID-mapped index. The format includes the rotation-variant byte, so loading picks the right block size automatically:

# Save
index.write("corpus_v1.rv")

# Load (works across machines, same dim/bits/rotation comes back)
from rotorvec import RotorQuantIndex
loaded = RotorQuantIndex.load("corpus_v1.rv")
assert loaded.dim == 384
assert loaded.bits == 4
assert loaded.rotation == "rotor3"

# IdMapIndex round-trips the id table too
idx.write("corpus_v1.rvim")
loaded_id = IdMapIndex.load("corpus_v1.rvim")

The on-disk format is stable across patch releases. Major-version bumps (e.g. v0.2.x → v0.3.0) may break compat — release notes call that out when it happens.

### Incremental adds

add() is incremental — you can call it repeatedly without rebuilding:

```python
index = RotorQuantIndex(dim=1536, bits=4)

for batch in stream_embeddings(batch_size=10_000):  # any iterable
    index.add(batch)
    print(f"index size: {len(index)}")
```

Internally, the first search() after any add() rebuilds the SIMD block cache lazily. If you have an interactive workload mixing add/search, you can warm the cache deterministically:

```python
index.prepare()   # pay the rebuild cost now, not on the next search
```

### Searching multiple queries at once

search() is batch-shaped — pass a 2D array of queries, get back 2D scores + indices. This is meaningfully faster per query than looping because the rotation matrix and centroid LUTs build once:

```python
# 1 query at a time — repeated rebuild overhead
for q in queries:
    scores, indices = index.search(q.reshape(1, -1), k=10)

# All queries at once — single rebuild, parallelized across queries
scores, indices = index.search(queries, k=10)
```
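If the full query set is too large for one call, chunking it still amortizes most of the setup. A sketch, assuming only the `search` signature shown above (`search_in_batches` is an illustrative helper, not a library function):

```python
import numpy as np

def search_in_batches(index, queries, k, batch_size=4096):
    """Chunk queries so each index.search call amortizes its per-call setup."""
    all_scores, all_ids = [], []
    for start in range(0, len(queries), batch_size):
        s, i = index.search(queries[start:start + batch_size], k=k)
        all_scores.append(s)
        all_ids.append(i)
    return np.concatenate(all_scores), np.concatenate(all_ids)
```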

### Common errors and fixes

| Error | Cause | Fix |
|---|---|---|
| dim must be a multiple of 8 | dim not divisible by 8 (e.g. 100, 300) | zero-pad your vectors to the next multiple of 8 (100 → 104), or use an already-supported dim (384, 768, 1536) |
| Rotation::WalshRotor3 requires dim to be a power of 2 | tried walshrotor3 with dim=384 | use rotation="rotor3" instead |
| vectors length not a multiple of dim | numpy shape mismatch | check vectors.shape == (n, dim); reshape if needed |
| vectors arg is rejected | wrong dtype | ensure vectors.astype(np.float32); float64 is not accepted |

### Performance tips

  • Pin bits=4 unless RAM is critical. The recall drop from 4-bit to 3-bit is meaningful (5–10 pp on real data); from 4-bit to 2-bit it is severe (15–25 pp).
  • Use walshrotor3 when you can. If your dim is a power of 2, there's no reason to use the block-only variants — walshrotor3 is same-or-better recall with negligible search overhead.
  • Batch queries. The per-query LUT build is amortized across the batch; 100 queries at once is ~5–10× faster than 100 separate search calls.
  • Call prepare() before timing. First search() after add() builds the SIMD cache lazily — that cost shouldn't show up in your latency measurements.

### Integration with existing tooling

  • LangChain / LlamaIndex / Haystack — wrappers are on the v0.2.4+ roadmap. For now, use rotorvec directly via the API above; the integrations are thin shims.
  • FAISS users: the API mirrors FAISS conventions where possible (add / search, k-as-positional, (scores, indices) return). The main difference: rotorvec is data-oblivious (no train() step), so there's no equivalent to FAISS's IndexPQ.train(xb). Just add() and you're done.

## How it works

The pipeline per vector:

  1. Normalize to unit length, store the norm separately.
  2. (WalshRotor3 only) Sign-flip by a deterministic ±1 vector, then apply the fast Walsh-Hadamard transform, then rescale by 1/√d. This spreads each coordinate's energy across all d outputs in O(d log d) work — the cross-block mixing the block-only variants lack.
  3. Generate per-block rotation matrices from a deterministic ChaCha8 seed:
    • Planar2: one angle θ → 2×2 Givens matrix per pair.
    • Rotor3 / WalshRotor3: a unit Cl(3,0) rotor R = (s, b₁₂, b₁₃, b₂₃) → 3×3 SO(3) matrix (the sandwich R v R̃ reduces algebraically to a 3×3 rotation for grade-1 input).
    • Iso4: a unit quaternion q = (w, x, y, z) → 4×4 SO(4) matrix (left-isoclinic action M v ↔ q · v).
  4. Block-rotate — block-diagonal matrix multiply, no inter-block dependencies.
  5. Quantize each rotated coordinate to a Lloyd-Max centroid (2/3/4 bit).
  6. Bit-pack into bit-plane format.
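The block-rotate step (4) can be sketched in NumPy for the Planar2 case, to make "block-diagonal, no inter-block dependencies" concrete (an illustrative model of the math, not the library's kernel):

```python
import numpy as np

def planar2_rotate(v, angles):
    """One 2×2 Givens rotation per coordinate pair: block-diagonal, O(d)."""
    pairs = v.reshape(-1, 2)              # d/2 independent blocks
    c, s = np.cos(angles), np.sin(angles)
    out = np.empty_like(pairs)
    out[:, 0] = c * pairs[:, 0] - s * pairs[:, 1]
    out[:, 1] = s * pairs[:, 0] + c * pairs[:, 1]
    return out.ravel()                    # orthogonal, so norms are preserved
```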

Search reverses the process: apply the same WHT pre-pass (if any) and block rotation to the query, accumulate per-coordinate inner products against each stored vector's quantized codes, multiply by stored norms, return top-k.

The trade-off across block-only variants (Planar2, Rotor3, Iso4): they only decorrelate within blocks, not across them. That's fine for data with weak cross-coordinate correlations (typical text embeddings) but loses recall on data with strong cross-coord structure (image features like SIFT). WalshRotor3 exists specifically to close that gap on power-of-2 dimensions.
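The WalshRotor3 pre-pass (step 2) can also be written out in NumPy to make the O(d log d) structure concrete (an illustrative sketch, not the library's implementation):

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = v.astype(np.float64)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a, b = v[i:i + h].copy(), v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

# Sign-flip, transform, rescale by 1/sqrt(d): the composition is orthogonal,
# so it spreads every coordinate across all d outputs while preserving norms.
rng = np.random.default_rng(0)
d = 16
signs = rng.choice([-1.0, 1.0], size=d)
x = rng.standard_normal(d)
y = fwht(signs * x) / np.sqrt(d)
```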

## What's different from turbovec

| | turbovec (TurboQuant) | rotorvec (block-only) | rotorvec (WalshRotor3) |
|---|---|---|---|
| Decorrelation | Dense d×d random orthogonal matrix | Block-diagonal small rotations | Signed FWHT + block rotation |
| Rotation cost | O(d²) per vector (BLAS GEMM) | O(d) per vector | O(d log d) per vector |
| Parameters | d² floats | ~d floats | ~d floats + d sign bits |
| BLAS dependency | Yes (Accelerate / OpenBLAS) | None | None |
| Decorrelation scope | All d coordinates | Within 2/3/4-element blocks only | All d coordinates |
| Dim constraint | dim % 8 == 0 | dim % 8 == 0 | dim power of 2 |

## Repo layout

This is a Cargo workspace with two members: the rotorvec core crate and the rotorvec-python bindings crate.

Run the Rust test suite with cargo test -p rotorvec --release. Run the Python suite with cd rotorvec-python && uv run pytest tests/ (after uv pip install -e .).

## CI / Release

Three GitHub Actions workflows live in .github/workflows/:

| Workflow | Trigger | What it does |
|---|---|---|
| ci.yml | every push to main and every PR | cargo fmt --check, cargo clippy -D warnings, cargo test, pytest across Python 3.10 / 3.12 / 3.13 on Linux + macOS |
| release-pypi.yml | tag py-vX.Y.Z | builds wheels for Linux x86_64/aarch64, macOS x86_64/aarch64, Windows x64; builds sdist; publishes to PyPI via OIDC trusted publishing |
| release-crates.yml | tag vX.Y.Z | cargo publish -p rotorvec to crates.io via OIDC trusted publishing |

One-time PyPI / crates.io setup (see each registry's docs for the exact UI):

  1. PyPI: create the project on pypi.org, add a "pending publisher" pointing at smarthi/rotorvec, workflow release-pypi.yml, environment pypi. Then create a GitHub environment named pypi on the repo (Settings → Environments).
  2. crates.io: configure OIDC trusted publishing in your crates.io account settings, scoped to the same repo + workflow + environment (crates-io). Create the matching GitHub environment.

Once configured, neither workflow needs API tokens stored as secrets.

## Status

v0.2.4 — current release. Published on crates.io and PyPI. 26 tests passing (16 unit + 9 integration + 1 doctest). CI green on Linux + macOS, Python 3.10/3.12/3.13.

## Release history

| Version | Headline |
|---|---|
| v0.2.4 | WalshRotor3 variant — beats turbovec on SIFT-1M recall (0.496 vs 0.487) at 1.8× faster build |
| v0.2.2 | Real-dataset benchmarks (GloVe-100, SIFT-1M) + FAISS baseline + apples-to-apples audit |
| v0.2.1 | NEON 4-bit search kernel — 130× faster search on aarch64, 96% of turbovec QPS at d=128 |
| v0.2.0 | Benchmark harness + Rotor3 trailing-padding fix (recall@10 at d=128: 0.78 → 0.83) |
| v0.1.x | Initial release — three rotation variants, scalar search, file format, CI/CD |

## Roadmap (v0.2.4+)

  • v0.2.5 — Generalized cross-block mixing for non-power-of-2 dim (covers 384, 768, 1536)
  • v0.2.6 — x86 AVX2 / AVX-512 search kernels (cloud x86 coverage)
  • v0.2.7 — NEON for 2-bit / 3-bit code widths
  • v0.3.0 — HNSW wrapper for production-scale corpora (>10M vectors)
  • LangChain / LlamaIndex / Haystack integrations once HNSW lands

## File format

The on-disk format encodes which rotation variant the index uses, so loading a .rv or .rvim file picks the right block size automatically. See the header doc-comment in src/io.rs for the byte layout.

## Attribution

## License

MIT
