
add vector similarity benchmark suite#114

Merged
kacy merged 7 commits into main from feat/vector-benchmark
Feb 14, 2026

Conversation


@kacy kacy commented Feb 14, 2026

summary

3-way benchmark comparing ember's vector similarity search (VADD/VSIM) against chromadb and pgvector. uses identical HNSW parameters across all systems for a fair comparison.

what's measured:

  • insert throughput (vectors/sec)
  • query throughput and latency (p50/p95/p99)
  • memory usage (RSS after indexing)
  • recall@10 accuracy (SIFT1M mode)

fairness: all systems use M=16, ef_construction=64, cosine similarity. same test vectors generated once and fed to each system sequentially (no CPU contention).
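
for reference, the latency percentiles and recall@10 above can be computed from raw per-query data roughly like this (a minimal sketch; the function names and the nearest-rank percentile convention are assumptions, not the harness's actual code):

```python
# sketch of the reported metrics; names and the nearest-rank percentile
# convention are illustrative, not lifted from bench-vector.py

def percentile(latencies_ms, p):
    """Nearest-rank percentile over per-query latencies (for p50/p95/p99)."""
    ranked = sorted(latencies_ms)
    idx = max(0, round(p / 100 * len(ranked)) - 1)
    return ranked[idx]

def recall_at_k(returned_ids, true_ids, k=10):
    """Fraction of the true k nearest neighbors found in the returned top-k."""
    return len(set(returned_ids[:k]) & set(true_ids[:k])) / k
```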

what was tested

  • bench/bench-vector.sh --quick --ember-only runs end-to-end on macOS (no docker needed)
  • verified CSV output and JSON intermediate files are well-formed
  • python venv auto-creation handles macOS externally-managed environment (PEP 668)
  • all 350 existing tests pass with --features vector

sample output from the quick ember-only run:

=== vector benchmark configuration ===
mode:       random
vectors:    1000 base, 100 queries
dimensions: 128
k:          10
hnsw:       M=16, ef_construction=64
metric:     cosine

metric                            ember
------                            -----
insert (vectors/sec)             4204.9
query (queries/sec)              3562.2
query p50 (ms)                    0.275
query p95 (ms)                    0.318
query p99 (ms)                    0.325
memory (MB)                          12

design considerations

  • venv for python deps: the script auto-creates .bench-venv/ when pip install would fail (PEP 668 on modern macOS/linux). avoids requiring the user to manually manage environments.
  • sequential server execution: each system starts, gets benchmarked, and stops before the next begins — prevents CPU contention that would skew results.
  • pgvector index timing: HNSW index is built after all inserts (faster than incremental), and index build time is reported separately from insert throughput.
  • SIFT1M as optional: the --sift flag downloads the 160MB dataset on first use. random vector mode is the default since it needs no external data.
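
the PEP 668 check behind the venv fallback can be done by probing for the marker file pip itself consults (a sketch of the idea; the actual bench-vector.sh logic may differ):

```python
import os
import sysconfig

def externally_managed() -> bool:
    """True when PEP 668 marks this interpreter as externally managed,
    meaning a bare `pip install` would refuse and a venv is needed."""
    marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
    return os.path.exists(marker)
```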

kacy added 7 commits February 13, 2026 23:20
add vector similarity benchmark suite

3-way comparison of ember vs chromadb vs pgvector for vector
similarity search. benchmarks insert throughput, query latency
(p50/p95/p99), query throughput, and memory usage with identical
HNSW parameters (M=16, ef_construction=64, cosine similarity).

includes:
- bench-vector.sh: orchestrator with --ember-only, --quick, --sift flags
- bench-vector.py: python harness with client wrappers for all 3 systems
- SIFT1M dataset loader for recall accuracy testing
- setup-vm-vector.sh: additional VM deps (docker, python, images)
- README updates with vector benchmark section

fix chromadb health check to use v2 API

chromadb deprecated the /api/v1/ endpoints. the heartbeat
health check now hits /api/v2/heartbeat instead.
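
a minimal readiness probe against the v2 endpoint might look like this (stdlib-only sketch; the port and timeout are assumptions, not the harness's values):

```python
import urllib.request

def chroma_ready(url="http://localhost:8000/api/v2/heartbeat", timeout=1.0):
    """Return True once chromadb answers its v2 heartbeat (v1 is deprecated)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```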

fix pgvector type casting for vector columns

psycopg2 sends python lists as numeric[], which pgvector's <=>
operator doesn't accept. format vectors as pgvector string
literals ("[0.1,0.2,0.3]") with explicit ::vector cast.
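
the fix amounts to serializing the list yourself and casting in SQL; a sketch of the idea (table and column names hypothetical):

```python
def to_pgvector(vec):
    """Format a python list as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
    psycopg2 would otherwise adapt the list to numeric[], which <=> rejects."""
    return "[" + ",".join(str(x) for x in vec) + "]"

# used as a bound parameter with an explicit cast:
query = "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 10"
params = (to_pgvector([0.1, 0.2, 0.3]),)
```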

fix pgvector docker container shared memory limit

docker's default shm size is 64MB, but postgres shared_buffers
is set to 256MB. add --shm-size=512m to avoid "No space left
on device" when building HNSW indexes on larger datasets.

update vector benchmark results from gcp c2-standard-8

100k random vectors, 128-dim, cosine, k=10 kNN search.
ember wins on query throughput (3.2x chromadb, 1.4x pgvector)
and memory usage (4-6x less). insert throughput is lower due
to per-vector RESP protocol overhead.

run cargo fmt and fix clippy warning

formatting changes across concurrent.rs, concurrent_handler.rs,
and connection.rs. suppress dead_code warning on cmd_raw test
helper (intended for future protobuf binary tests).

note sift1m requires larger VM for benchmarking

the 1M-vector HNSW index exceeds 16GB RAM during construction
on c2-standard-8. needs c2-standard-16 or higher.
@kacy kacy merged commit 4e4bea3 into main Feb 14, 2026
7 checks passed
@kacy kacy deleted the feat/vector-benchmark branch February 14, 2026 13:46
kacy added a commit that referenced this pull request Feb 19, 2026
add vector similarity benchmark suite