Releases · sjmoran/bitbudget

07 Jun 06:25

sjmoran

v0.3.1

ab1a960

v0.3.1 — binary-mean (proper threshold placement)

Latest

Adds binary-mean: a 1-bit sign code thresholded at the per-coordinate mean rather than at zero. A zero threshold assumes mean-centred data (LSH assumption A1). For off-centre embeddings such as e5 it wastes most of the bits, and re-centring recovers them: e5's 1-bit retention rises from 53% to 86% of float, while embedders that are already centred barely move. The board now shows that where the threshold is placed can matter as much as how many bits are spent. The original binary method (zero threshold) stays on the board for reproducibility.
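The effect is easy to see on synthetic data. The following is a minimal NumPy sketch, not bitbudget's implementation. It assumes the per-coordinate mean is estimated on the document vectors, and `sign_codes` and `bit_balance` are hypothetical helpers written only for this illustration.

```python
import numpy as np


def sign_codes(x: np.ndarray, threshold) -> np.ndarray:
    """Pack 1-bit codes: bit j is set where x[:, j] > threshold (scalar or per-coordinate)."""
    return np.packbits(x > threshold, axis=1)  # 8 dimensions per byte


def bit_balance(codes: np.ndarray, dim: int) -> np.ndarray:
    """Per-bit fraction of vectors with that bit set; 0.5 splits the data evenly."""
    return np.unpackbits(codes, axis=1)[:, :dim].mean(axis=0)


rng = np.random.default_rng(0)
dim = 64
# Hypothetical off-centre embeddings: every coordinate is shifted by +3.
docs = rng.normal(size=(1000, dim)) + 3.0

zero_codes = sign_codes(docs, 0.0)                # like "binary": zero threshold
mean_codes = sign_codes(docs, docs.mean(axis=0))  # like "binary-mean": per-coordinate mean

print(bit_balance(zero_codes, dim).mean())  # close to 1.0: nearly every bit is set
print(bit_balance(mean_codes, dim).mean())  # close to 0.5: each bit splits the data
```

A bit that is set for almost every vector says almost nothing about which documents are close to a query. This is the sense in which a zero threshold wastes bits on off-centre data.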

Leaderboard: https://sjmoran.github.io/bitbudget/


06 Jun 22:10

sjmoran

v0.3.0

3133bb4

v0.3.0 — popular embedders + methods on the board

What's new

Embedders (7 tabs): OpenAI text-embedding-3 small & large, BGE (bge-base-en-v1.5), E5 (e5-base-v2), GTE (gte-base), alongside mxbai and MiniLM.
Methods: adds float16 (half precision) and int4 (4-bit scalar quantisation).

Every row is a measured bitbudget run, reproducible from a pip install. The headline holds across embedders: a 1-bit code with re-ranking is lossless at 32× on OpenAI text-embedding-3-small (0.509 nDCG@10 at 192 bytes per vector, versus 6,144 bytes for its 1,536-dimensional float32 vectors).
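The 32× figure follows from storage arithmetic alone. This short sketch assumes the model's default 1,536 output dimensions and ignores any calibration parameters (for example int8/int4 scales) and per-vector overhead.

```python
# Storage bits per dimension for each scalar format (no per-vector overhead).
BITS_PER_DIM = {"float32": 32, "float16": 16, "int8": 8, "int4": 4, "binary": 1}


def bytes_per_vector(dim: int, bits: int) -> float:
    return dim * bits / 8


dim = 1536  # default output size of text-embedding-3-small
baseline = bytes_per_vector(dim, BITS_PER_DIM["float32"])
for name, bits in BITS_PER_DIM.items():
    size = bytes_per_vector(dim, bits)
    print(f"{name:8} {size:6.0f} B  {baseline / size:3.0f}x")
```

For binary this gives 192 bytes against 6,144 for float32, a 32× reduction. The same arithmetic puts float16, int8 and int4 at 2×, 4× and 8×.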

Protocol: three BEIR corpora (scifact, nfcorpus, arguana), so the full board reproduces in minutes on a laptop.

Engine fixes: embedding-cache reuse (plus a --force flag), ~14× faster PQ (vectorised k-means plus subsampling), model caching, a urllib-based OpenAI embedder (no hard dependency), and a doc_prompt for e5-style document prefixes.
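The two PQ changes can be illustrated generically. The sketch below computes all point-to-centroid distances in one matrix product instead of a Python loop, and trains each codebook on a random subsample rather than the full corpus. It rests on several assumptions: plain Lloyd's k-means with random initialisation, a default subsample of 10,000 vectors, and k ≤ 256 so each code fits in one byte. None of these settings are taken from bitbudget's code, and the sketch makes no claim to reproduce the 14× figure.

```python
import numpy as np


def sq_dists(x, c):
    """All pairwise squared distances at once: |x|^2 - 2 x.c + |c|^2."""
    return (x * x).sum(1, keepdims=True) - 2.0 * (x @ c.T) + (c * c).sum(1)


def kmeans(x, k, iters=20, sample=10_000, seed=0):
    """Lloyd's k-means with vectorised assignment and update, trained on a subsample."""
    rng = np.random.default_rng(seed)
    if len(x) > sample:
        x = x[rng.choice(len(x), sample, replace=False)]
    centroids = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        assign = sq_dists(x, centroids).argmin(1)
        counts = np.bincount(assign, minlength=k)
        sums = np.zeros_like(centroids)
        np.add.at(sums, assign, x)  # scatter-add each point into its cluster's sum
        filled = counts > 0  # empty clusters keep their previous centroid
        centroids[filled] = sums[filled] / counts[filled][:, None]
    return centroids


def train_pq(x, m=8, k=256, **kw):
    """Product quantisation: one k-means codebook per block of dim/m coordinates."""
    return [kmeans(block, k, **kw) for block in np.split(x, m, axis=1)]


def pq_encode(x, codebooks):
    """Encode each sub-vector as the index of its nearest centroid (1 byte when k <= 256)."""
    blocks = np.split(x, len(codebooks), axis=1)
    return np.stack(
        [sq_dists(b, c).argmin(1).astype(np.uint8) for b, c in zip(blocks, codebooks)],
        axis=1,
    )


rng = np.random.default_rng(1)
vecs = rng.normal(size=(5000, 64)).astype(np.float32)
books = train_pq(vecs, m=8, k=16, iters=5, sample=2000)
codes = pq_encode(vecs, books)
print(codes.shape, codes.dtype)  # (5000, 8) uint8
```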

Leaderboard: https://sjmoran.github.io/bitbudget/


06 Jun 15:24

sjmoran

v0.2.0

8287ec8

BitBudget v0.2.0

BitBudget 0.2.0 adds the indexing axis to the benchmark.

Install

pip install bitbudget          # core (numpy only)
pip install "bitbudget[faiss]"   # + the faiss-backed indexes

What's new

Indexing leaderboard (bitbudget bench-index), the organisation axis: it builds an index over the document vectors and reports recall@10, throughput (QPS) and bytes per vector, so graph and compact-code indexes can be compared on one frontier. It runs on synthetic data, a cached embedding, or your own vectors (--npz).
Indexes: flat, hnsw, ivfpq (faiss), and bittrie, a van Emde Boas / PATRICIA radix-trie over compact codes with a multithreaded C kernel compiled on demand (the wheel stays pure-Python, with a numpy fallback when no compiler is available). On synthetic data (100k vectors × 128 dimensions) the bit-trie reaches ~16k QPS at 8 bytes per vector.
bench-index automatically runs the faiss indexes and the bit-trie in separate processes to avoid the OpenMP runtime clash on macOS, so a single command works everywhere.
New plugin registry: register your own index with @index(...), alongside @method and @embedder.
Live leaderboard now carries both the compression and indexing boards: https://sjmoran.github.io/bitbudget/
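As a rough illustration of the three numbers bench-index reports, here is a minimal harness. It measures recall@10 against exhaustive search, queries per second, and bytes per vector for any index exposed as a batch search function. The harness and the toy sign-bit index are hypothetical and are not bench-index's internals. Using inner product as the ground-truth similarity is also an assumption.

```python
import time

import numpy as np

# Number of set bits in every possible byte value, for fast Hamming distance.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def exact_topk(docs, queries, k=10):
    """Ground truth from brute-force inner-product search (what a flat index returns)."""
    return np.argsort(-(queries @ docs.T), axis=1)[:, :k]


def recall_at_k(found, truth, k=10):
    """Mean fraction of each query's true top-k that the index also returned."""
    hits = sum(len(set(f[:k]) & set(t[:k])) for f, t in zip(found, truth))
    return hits / (k * len(truth))


def timed(search, queries, k=10):
    """Run a batch search callable; return its results and queries per second."""
    start = time.perf_counter()
    found = search(queries, k)
    return found, len(queries) / (time.perf_counter() - start)


rng = np.random.default_rng(0)
docs = rng.normal(size=(5000, 128)).astype(np.float32)
queries = rng.normal(size=(50, 128)).astype(np.float32)
truth = exact_topk(docs, queries)

# Toy compact-code index: sign bits + Hamming distance, 16 bytes per vector.
doc_bits = np.packbits(docs > 0, axis=1)


def hamming_search(q, k):
    q_bits = np.packbits(q > 0, axis=1)
    dist = POPCOUNT[q_bits[:, None, :] ^ doc_bits[None, :, :]].sum(axis=2, dtype=np.int64)
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


found, qps = timed(hamming_search, queries)
print(f"recall@10={recall_at_k(found, truth):.2f}  QPS={qps:.0f}  bytes/vec={doc_bits.shape[1]}")
```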

Published to PyPI via Trusted Publishing. CI green on Python 3.9 / 3.11 / 3.12.


06 Jun 13:00

sjmoran

v0.1.0

b957153

BitBudget v0.1.0

BitBudget is a reproducible benchmark for embedding compression and indexing, organised by the projection–quantisation–organisation lens. The headline finding: bits beat dimensions.
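The storage budget shows how bits and dimensions trade off. Assume a vector keeps d dimensions at b bits each, with no per-vector overhead. Its size then depends only on the product d·b, so at a fixed budget, halving the precision buys twice the dimensions. At an illustrative 192 bytes per vector, the options range from a full-length 1,536-dimensional 1-bit code to a float32 vector truncated to 48 dimensions:

```latex
\[
  \text{bytes/vector} = \frac{d \, b}{8},
  \qquad
  \frac{d \, b}{8} = 192 \;\Longrightarrow\; d \, b = 1536
\]
\[
  (d, b) \in \{(1536, 1),\; (384, 4),\; (192, 8),\; (48, 32)\}
\]
```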

Live leaderboard

https://sjmoran.github.io/bitbudget/

Install

pip install bitbudget          # core (numpy only)
pip install "bitbudget[faiss]"   # + the organisation-axis indexes

Two leaderboards

Compression: quality per byte. Eight methods across the lens axes (float32, int8, binary, binary+rerank, pq, rabitq, matryoshka, pca), scored by nDCG@10 retained per byte on BEIR with per-corpus error bars. Register your own with @method(...) and evaluate it with bitbudget run. Headline: a one-bit code with re-ranking is lossless at 32× compression, reproduced at RAG scale on the 8.8M-passage MS MARCO corpus.
Indexing: recall against query throughput. bitbudget bench-index builds an index over the vectors and reports recall@10, QPS and bytes/vec for flat / hnsw / ivfpq (faiss) and bittrie, a van Emde Boas / PATRICIA radix-trie over compact codes with a multithreaded C kernel (compiled on demand; the wheel stays pure-Python). Register your own with @index(...).
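The binary+rerank idea works in two stages. First, shortlist candidates by Hamming distance over the cheap 1-bit codes. Then re-score only that shortlist with full-precision vectors. The sketch below is a minimal NumPy version of this idea. The shortlist size of 100, the zero sign threshold and the inner-product score are assumptions, not bitbudget's settings, and `binary_rerank_search` is a hypothetical name.

```python
import numpy as np

# Number of set bits in every possible byte value, for fast Hamming distance.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def binary_rerank_search(query, doc_codes, doc_floats, k=10, candidates=100):
    """Shortlist by Hamming distance on 1-bit codes, then re-score with float vectors."""
    q_code = np.packbits(query > 0)
    hamming = POPCOUNT[doc_codes ^ q_code].sum(axis=1, dtype=np.int64)
    shortlist = np.argsort(hamming, kind="stable")[:candidates]
    scores = doc_floats[shortlist] @ query  # full-precision inner product
    return shortlist[np.argsort(-scores)[:k]]


rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 256)).astype(np.float32)
codes = np.packbits(docs > 0, axis=1)  # 32 bytes per vector instead of 1,024
query = docs[42] + 0.1 * rng.normal(size=256).astype(np.float32)
top = binary_rerank_search(query, codes, docs)
print(top[0])  # 42: the perturbed document is recovered
```

In this sketch the full scan only touches the packed codes. Float vectors are read only for the `candidates` shortlisted documents, but they still have to be stored somewhere.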

Reproducibility

Plugin registries for methods, embedders and indexes. CI green on Python 3.9 / 3.11 / 3.12. MIT licensed. Published to PyPI via Trusted Publishing (no stored tokens).
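The decorator signatures of @method, @embedder and @index are not spelled out here. The following is therefore only a generic sketch of the decorator-registry pattern those names suggest. `METHODS`, `method` and `float16_cast` are hypothetical and may differ from bitbudget's real API.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical registry; bitbudget's real @method decorator may differ.
METHODS: Dict[str, Callable] = {}


def method(name: str):
    """Return a decorator that registers a function under `name`."""
    def register(fn: Callable) -> Callable:
        if name in METHODS:
            raise ValueError(f"method {name!r} is already registered")
        METHODS[name] = fn
        return fn  # the function itself is returned unchanged
    return register


@method("float16-cast")
def float16_cast(x: np.ndarray) -> np.ndarray:
    """Toy method: store vectors at half precision (2 bytes per dimension)."""
    return x.astype(np.float16)


# A runner can then look methods up by name instead of importing them directly.
encoded = METHODS["float16-cast"](np.ones((2, 4), dtype=np.float32))
print(encoded.dtype, encoded.nbytes)  # float16 16
```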
