
add vector similarity benchmark suite#114

Merged
kacy merged 7 commits into main from feat/vector-benchmark
Feb 14, 2026

Conversation


@kacy kacy commented Feb 14, 2026

summary

3-way benchmark comparing ember's vector similarity search (VADD/VSIM) against chromadb and pgvector. uses identical HNSW parameters across all systems for a fair comparison.

what's measured:

  • insert throughput (vectors/sec)
  • query throughput and latency (p50/p95/p99)
  • memory usage (RSS after indexing)
  • recall@10 accuracy (SIFT1M mode)

fairness: all systems use M=16, ef_construction=64, cosine similarity. same test vectors generated once and fed to each system sequentially (no CPU contention).
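
for reference, the latency percentiles and recall@10 above can be computed from raw per-query data roughly like this (a minimal sketch; the function names and the nearest-rank percentile convention are assumptions, not the harness's actual code):

```python
# sketch of the reported metrics; names and the nearest-rank percentile
# convention are illustrative, not lifted from bench-vector.py

def percentile(latencies_ms, p):
    """Nearest-rank percentile over per-query latencies (for p50/p95/p99)."""
    ranked = sorted(latencies_ms)
    idx = max(0, round(p / 100 * len(ranked)) - 1)
    return ranked[idx]

def recall_at_k(returned_ids, true_ids, k=10):
    """Fraction of the true k nearest neighbors found in the returned top-k."""
    return len(set(returned_ids[:k]) & set(true_ids[:k])) / k
```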

what was tested

  • bench/bench-vector.sh --quick --ember-only runs end-to-end on macOS (no docker needed)
  • verified CSV output and JSON intermediate files are well-formed
  • python venv auto-creation handles macOS externally-managed environment (PEP 668)
  • all 350 existing tests pass with --features vector

sample output from the quick ember-only run:

=== vector benchmark configuration ===
mode:       random
vectors:    1000 base, 100 queries
dimensions: 128
k:          10
hnsw:       M=16, ef_construction=64
metric:     cosine

metric                            ember
------                            -----
insert (vectors/sec)             4204.9
query (queries/sec)              3562.2
query p50 (ms)                    0.275
query p95 (ms)                    0.318
query p99 (ms)                    0.325
memory (MB)                          12

design considerations

  • venv for python deps: the script auto-creates .bench-venv/ when pip install would fail (PEP 668 on modern macOS/linux). avoids requiring the user to manually manage environments.
  • sequential server execution: each system starts, gets benchmarked, and stops before the next begins — prevents CPU contention that would skew results.
  • pgvector index timing: HNSW index is built after all inserts (faster than incremental), and index build time is reported separately from insert throughput.
  • SIFT1M as optional: the --sift flag downloads the 160MB dataset on first use. random vector mode is the default since it needs no external data.
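
the PEP 668 check behind the venv fallback can be done by probing for the marker file pip itself consults (a sketch of the idea; the actual bench-vector.sh logic may differ):

```python
import os
import sysconfig

def externally_managed() -> bool:
    """True when PEP 668 marks this interpreter as externally managed,
    meaning a bare `pip install` would refuse and a venv is needed."""
    marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
    return os.path.exists(marker)
```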

kacy added 7 commits February 13, 2026 23:20
add vector similarity benchmark suite

3-way comparison of ember vs chromadb vs pgvector for vector
similarity search. benchmarks insert throughput, query latency
(p50/p95/p99), query throughput, and memory usage with identical
HNSW parameters (M=16, ef_construction=64, cosine similarity).

includes:
- bench-vector.sh: orchestrator with --ember-only, --quick, --sift flags
- bench-vector.py: python harness with client wrappers for all 3 systems
- SIFT1M dataset loader for recall accuracy testing
- setup-vm-vector.sh: additional VM deps (docker, python, images)
- README updates with vector benchmark section

fix chromadb health check to use v2 API

chromadb deprecated the /api/v1/ endpoints. the heartbeat
health check now hits /api/v2/heartbeat instead.
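
a minimal readiness probe against the v2 endpoint might look like this (stdlib-only sketch; the port and timeout are assumptions, not the harness's values):

```python
import urllib.request

def chroma_ready(url="http://localhost:8000/api/v2/heartbeat", timeout=1.0):
    """Return True once chromadb answers its v2 heartbeat (v1 is deprecated)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```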

fix pgvector type casting for vector columns

psycopg2 sends python lists as numeric[], which pgvector's <=>
operator doesn't accept. format vectors as pgvector string
literals ("[0.1,0.2,0.3]") with explicit ::vector cast.
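
the fix amounts to serializing the list yourself and casting in SQL; a sketch of the idea (table and column names hypothetical):

```python
def to_pgvector(vec):
    """Format a python list as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
    psycopg2 would otherwise adapt the list to numeric[], which <=> rejects."""
    return "[" + ",".join(str(x) for x in vec) + "]"

# used as a bound parameter with an explicit cast:
query = "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 10"
params = (to_pgvector([0.1, 0.2, 0.3]),)
```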

fix pgvector docker container shared memory limit

docker's default shm size is 64MB, but postgres shared_buffers
is set to 256MB. add --shm-size=512m to avoid "No space left
on device" when building HNSW indexes on larger datasets.

update vector benchmark results from gcp c2-standard-8

100k random vectors, 128-dim, cosine, k=10 kNN search.
ember wins on query throughput (3.2x chromadb, 1.4x pgvector)
and memory usage (4-6x less). insert throughput is lower due
to per-vector RESP protocol overhead.

run cargo fmt and fix clippy warning

formatting changes across concurrent.rs, concurrent_handler.rs,
and connection.rs. suppress dead_code warning on cmd_raw test
helper (intended for future protobuf binary tests).

note sift1m requires larger VM for benchmarking

the 1M-vector HNSW index exceeds 16GB RAM during construction
on c2-standard-8. needs c2-standard-16 or higher.
@kacy kacy merged commit 4e4bea3 into main Feb 14, 2026
7 checks passed
@kacy kacy deleted the feat/vector-benchmark branch February 14, 2026 13:46
kacy added a commit that referenced this pull request Feb 19, 2026
add vector similarity benchmark suite