
feat: add VADD_BATCH command for bulk vector inserts #161

Merged
kacy merged 8 commits into main from feat/vadd-batch on Feb 16, 2026

Conversation

Owner

@kacy kacy commented Feb 16, 2026

summary

  • adds VADD_BATCH command that sends multiple vectors in a single round trip, eliminating per-vector protocol overhead for bulk inserts
  • RESP3 syntax: VADD_BATCH key DIM n elem1 f32... elem2 f32... [METRIC|QUANT|M|EF]
  • gRPC: VAddBatch(VAddBatchRequest) returns (IntResponse) with packed float entries
  • max batch size: 10,000 vectors. all vectors validated upfront (NaN/inf rejects entire batch)
  • AOF persistence reuses existing VAdd records (one per applied vector) — no new format needed
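the argument layout above can be sketched as a small client-side builder. this is a hypothetical helper for illustration, not the actual ember client API; the `MAX_BATCH` constant mirrors the 10,000-vector limit stated above:

```python
# Hypothetical sketch of the VADD_BATCH argument layout (illustrative only,
# not the real client API). Floats are sent as RESP bulk strings.

MAX_BATCH = 10_000  # per the PR: larger batches are rejected

def build_vadd_batch_args(key, dim, entries, options=()):
    """Flatten (name, vector) pairs into the flat argument list:
    VADD_BATCH key DIM n elem1 f32... elem2 f32... [options]"""
    if not entries:
        raise ValueError("empty batch")
    if len(entries) > MAX_BATCH:
        raise ValueError(f"batch exceeds max size {MAX_BATCH}")
    args = ["VADD_BATCH", key, "DIM", str(dim)]
    for name, vec in entries:
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim} floats, got {len(vec)}")
        args.append(name)
        args.extend(str(f) for f in vec)
    args.extend(options)
    return args

args = build_vadd_batch_args("idx", 3, [("a", [0.1, 0.2, 0.3]),
                                        ("b", [1.0, 2.0, 3.0])])
```

because the batch is one command, the whole argument list travels in a single round trip regardless of how many vectors it carries.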

what was tested

  • 10 new parser tests covering: basic 2-vector batch, options, single entry, empty entries, wrong arity, missing DIM keyword, DIM 0, DIM > max, insufficient floats, M/EF exceeds max
  • 1 new shard test: to_aof_records_for_vadd_batch — 3 applied vectors produce 3 AofRecord::VAdd records
  • cargo test -p ember-protocol — 330 tests pass
  • cargo test -p emberkv-core --features vector — 351 tests pass (1 pre-existing failure in memory::tests::entry_overhead_not_too_small, unrelated)
  • cargo build -p ember-server --features jemalloc,vector,grpc — clean build

design considerations

  • DIM keyword in RESP3: required so the parser knows where each vector ends and the next element name begins. without it, an element name that happens to parse as a float would be ambiguous
  • to_aof_record → to_aof_records refactor: changed return type from Option<AofRecord> to Vec<AofRecord> so VADD_BATCH can expand each applied vector into an individual VAdd record. all existing arms are mechanical Some(x) → vec![x] / None → vec![] changes
  • no new AOF format: batch inserts replay as individual VAdd records on recovery. simpler than adding a batch record type and avoids version compatibility concerns
  • upfront validation: all vectors checked for NaN/inf before any are inserted. memory estimated for the entire batch with one enforce_memory_limit call. partial insert on usearch error mid-batch is tracked for correct AOF recording
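the DIM-delimited parse can be sketched as follows. this is a simplified illustration, not the ember-protocol parser: with a fixed dim, each entry is exactly one name followed by dim floats, so an element name that happens to look like a float is never ambiguous. (the option-keyword check here is a simplification; it would misfire on an element literally named "M", which a real parser would handle by counting remaining tokens.)

```python
# Hypothetical sketch of DIM-delimited parsing (illustrative, not the
# actual ember-protocol implementation).

OPTION_KEYWORDS = {"METRIC", "QUANT", "M", "EF"}

def parse_vadd_batch(args):
    """args: tokens after 'VADD_BATCH key', i.e. DIM n name f32... [opts].
    Returns (dim, entries, remaining_option_tokens)."""
    if len(args) < 2 or args[0].upper() != "DIM":
        raise ValueError("expected DIM keyword")
    dim = int(args[1])
    if dim <= 0:
        raise ValueError("DIM must be positive")
    entries, i = [], 2
    while i < len(args) and args[i].upper() not in OPTION_KEYWORDS:
        name = args[i]
        floats = args[i + 1 : i + 1 + dim]
        if len(floats) < dim:
            raise ValueError(f"{name}: expected {dim} floats")
        # consume exactly dim floats; the next token is the next name
        entries.append((name, [float(f) for f in floats]))
        i += 1 + dim
    return dim, entries, args[i:]
```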
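the validation order and per-vector AOF expansion described above can be sketched together. names here are hypothetical (the real implementation lives in the Rust keyspace and to_aof_records); the point is the sequencing: validate everything, check memory once, then insert and emit one plain VAdd record per applied vector:

```python
import math

def vadd_batch(index, key, entries, enforce_memory_limit, bytes_per_vector):
    """Sketch of the batch-insert sequencing (hypothetical names).
    Returns (newly_added_count, aof_records): each applied vector
    expands into its own ("VAdd", ...) record, so recovery replays
    plain VAdd records with no batch record type."""
    # 1. reject the entire batch if any vector contains NaN/inf,
    #    before anything touches the index
    for name, vec in entries:
        if any(not math.isfinite(f) for f in vec):
            raise ValueError(f"non-finite value in vector {name!r}")
    # 2. one memory estimate for the whole batch, not one per vector
    enforce_memory_limit(len(entries) * bytes_per_vector)
    # 3. insert; records track exactly the applied vectors, so a
    #    failure mid-batch still yields correct AOF recording
    added, records = 0, []
    for name, vec in entries:
        if name not in index:
            added += 1
        index[name] = vec
        records.append(("VAdd", key, name, vec))
    return added, records
```

the return value mirrors the command's integer reply (count of newly added elements) plus the per-vector records the shard appends to the AOF.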

kacy added 8 commits February 16, 2026 07:24

* feat: add VADD_BATCH command parsing to protocol layer

add VADD_BATCH command that accepts multiple vectors in a single
command to reduce per-vector round-trip overhead for bulk inserts.

RESP3 syntax: VADD_BATCH key DIM n elem1 f32... elem2 f32... [opts]

the DIM keyword is required so the parser knows where each vector
ends and the next element name begins. max batch size is 10,000.

* feat: add VADD_BATCH to core engine and refactor AOF recording

- add VAddBatchResult struct and vadd_batch() method to keyspace
  with upfront NaN/inf validation and single memory check for the
  entire batch
- add ShardRequest::VAddBatch and ShardResponse::VAddBatchResult
- refactor to_aof_record → to_aof_records (returns Vec<AofRecord>)
  so VADD_BATCH can expand each applied vector into its own
  AofRecord::VAdd — no new AOF format needed

* feat: add VADD_BATCH dispatch to connection handler

wire up VADD_BATCH through both sharded and concurrent mode code
paths in connection.rs. response returns integer count of newly
added elements, matching VADD's pattern.

* feat: add VADD_BATCH gRPC RPC and regenerate client stubs

- add VAddBatchEntry and VAddBatchRequest proto messages
- add VAddBatch RPC returning IntResponse (count of added elements)
- add to PipelineRequest oneof (field 72)
- implement v_add_batch handler in grpc.rs with validation
- regenerate go and python proto stubs

* feat: update python client and benchmarks to use VADD_BATCH

- add vadd_batch() method to python gRPC client
- update bench-vector.py to send batches via VADD_BATCH (RESP) and
  vadd_batch (gRPC) instead of individual VADD calls
- update bench-memory.sh vector helper to use VADD_BATCH
- bump command count to 107 in README and bench README

* fix: install base deps into venv when grpc benchmarks are requested

when system python has the base deps but grpc mode forces a venv,
the venv was missing numpy/redis. now installs all required deps
alongside ember-py.

* fix: correct import path in generated python grpc stubs

the protoc codegen produces `from ember.v1 import` but the package
layout needs `from ember.proto.ember.v1 import`. the Makefile has
a sed fixup for this but manual regen skipped it.

* docs: update vector benchmark results with VADD_BATCH numbers

benchmarked on GCP c2-standard-8. VADD_BATCH improves insert
throughput: RESP 963 → 1,483 vec/s (+54%), gRPC 1,009 → 2,374
vec/s (+135%). query throughput unchanged as expected.
@kacy kacy merged commit f449325 into main Feb 16, 2026
6 of 7 checks passed
@kacy kacy deleted the feat/vadd-batch branch February 16, 2026 13:25
kacy added a commit that referenced this pull request Feb 19, 2026