
feat: add VADD_BATCH command for bulk vector inserts #161

Merged
kacy merged 8 commits into main from feat/vadd-batch on Feb 16, 2026

Conversation

Owner

@kacy kacy commented Feb 16, 2026

summary

  • adds VADD_BATCH command that sends multiple vectors in a single round trip, eliminating per-vector protocol overhead for bulk inserts
  • RESP3 syntax: VADD_BATCH key DIM n elem1 f32... elem2 f32... [METRIC|QUANT|M|EF]
  • gRPC: VAddBatch(VAddBatchRequest) returns (IntResponse) with packed float entries
  • max batch size: 10,000 vectors. all vectors validated upfront (NaN/inf rejects entire batch)
  • AOF persistence reuses existing VAdd records (one per applied vector) — no new format needed
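the argument layout above can be sketched as a small client-side builder. this is a hypothetical helper for illustration, not the actual ember client API; the `MAX_BATCH` constant mirrors the 10,000-vector limit stated above:

```python
# Hypothetical sketch of the VADD_BATCH argument layout (illustrative only,
# not the real client API). Floats are sent as RESP bulk strings.

MAX_BATCH = 10_000  # per the PR: larger batches are rejected

def build_vadd_batch_args(key, dim, entries, options=()):
    """Flatten (name, vector) pairs into the flat argument list:
    VADD_BATCH key DIM n elem1 f32... elem2 f32... [options]"""
    if not entries:
        raise ValueError("empty batch")
    if len(entries) > MAX_BATCH:
        raise ValueError(f"batch exceeds max size {MAX_BATCH}")
    args = ["VADD_BATCH", key, "DIM", str(dim)]
    for name, vec in entries:
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim} floats, got {len(vec)}")
        args.append(name)
        args.extend(str(f) for f in vec)
    args.extend(options)
    return args

args = build_vadd_batch_args("idx", 3, [("a", [0.1, 0.2, 0.3]),
                                        ("b", [1.0, 2.0, 3.0])])
```

because the batch is one command, the whole argument list travels in a single round trip regardless of how many vectors it carries.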

what was tested

  • 10 new parser tests covering: basic 2-vector batch, options, single entry, empty entries, wrong arity, missing DIM keyword, DIM 0, DIM > max, insufficient floats, M/EF exceeds max
  • 1 new shard test: to_aof_records_for_vadd_batch — 3 applied vectors produce 3 AofRecord::VAdd records
  • cargo test -p ember-protocol — 330 tests pass
  • cargo test -p emberkv-core --features vector — 351 tests pass (1 pre-existing failure in memory::tests::entry_overhead_not_too_small, unrelated)
  • cargo build -p ember-server --features jemalloc,vector,grpc — clean build

design considerations

  • DIM keyword in RESP3: required so the parser knows where each vector ends and the next element name begins. without it, an element name that happens to parse as a float would be ambiguous
  • to_aof_record → to_aof_records refactor: changed return type from Option<AofRecord> to Vec<AofRecord> so VADD_BATCH can expand each applied vector into an individual VAdd record. all existing arms are mechanical Some(x) → vec![x] / None → vec![] changes
  • no new AOF format: batch inserts replay as individual VAdd records on recovery. simpler than adding a batch record type and avoids version compatibility concerns
  • upfront validation: all vectors checked for NaN/inf before any are inserted. memory estimated for the entire batch with one enforce_memory_limit call. partial insert on usearch error mid-batch is tracked for correct AOF recording
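the DIM-delimited parse can be sketched as follows. this is a simplified illustration, not the ember-protocol parser: with a fixed dim, each entry is exactly one name followed by dim floats, so an element name that happens to look like a float is never ambiguous. (the option-keyword check here is a simplification; it would misfire on an element literally named "M", which a real parser would handle by counting remaining tokens.)

```python
# Hypothetical sketch of DIM-delimited parsing (illustrative, not the
# actual ember-protocol implementation).

OPTION_KEYWORDS = {"METRIC", "QUANT", "M", "EF"}

def parse_vadd_batch(args):
    """args: tokens after 'VADD_BATCH key', i.e. DIM n name f32... [opts].
    Returns (dim, entries, remaining_option_tokens)."""
    if len(args) < 2 or args[0].upper() != "DIM":
        raise ValueError("expected DIM keyword")
    dim = int(args[1])
    if dim <= 0:
        raise ValueError("DIM must be positive")
    entries, i = [], 2
    while i < len(args) and args[i].upper() not in OPTION_KEYWORDS:
        name = args[i]
        floats = args[i + 1 : i + 1 + dim]
        if len(floats) < dim:
            raise ValueError(f"{name}: expected {dim} floats")
        # consume exactly dim floats; the next token is the next name
        entries.append((name, [float(f) for f in floats]))
        i += 1 + dim
    return dim, entries, args[i:]
```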
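the validation order and per-vector AOF expansion described above can be sketched together. names here are hypothetical (the real implementation lives in the Rust keyspace and to_aof_records); the point is the sequencing: validate everything, check memory once, then insert and emit one plain VAdd record per applied vector:

```python
import math

def vadd_batch(index, key, entries, enforce_memory_limit, bytes_per_vector):
    """Sketch of the batch-insert sequencing (hypothetical names).
    Returns (newly_added_count, aof_records): each applied vector
    expands into its own ("VAdd", ...) record, so recovery replays
    plain VAdd records with no batch record type."""
    # 1. reject the entire batch if any vector contains NaN/inf,
    #    before anything touches the index
    for name, vec in entries:
        if any(not math.isfinite(f) for f in vec):
            raise ValueError(f"non-finite value in vector {name!r}")
    # 2. one memory estimate for the whole batch, not one per vector
    enforce_memory_limit(len(entries) * bytes_per_vector)
    # 3. insert; records track exactly the applied vectors, so a
    #    failure mid-batch still yields correct AOF recording
    added, records = 0, []
    for name, vec in entries:
        if name not in index:
            added += 1
        index[name] = vec
        records.append(("VAdd", key, name, vec))
    return added, records
```

the return value mirrors the command's integer reply (count of newly added elements) plus the per-vector records the shard appends to the AOF.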

kacy added 8 commits February 16, 2026 07:24

* feat: add VADD_BATCH command parsing to protocol layer

add VADD_BATCH command that accepts multiple vectors in a single
command to reduce per-vector round-trip overhead for bulk inserts.

RESP3 syntax: VADD_BATCH key DIM n elem1 f32... elem2 f32... [opts]

the DIM keyword is required so the parser knows where each vector
ends and the next element name begins. max batch size is 10,000.

* feat: add VADD_BATCH to core engine and refactor AOF recording

- add VAddBatchResult struct and vadd_batch() method to keyspace
  with upfront NaN/inf validation and single memory check for the
  entire batch
- add ShardRequest::VAddBatch and ShardResponse::VAddBatchResult
- refactor to_aof_record → to_aof_records (returns Vec<AofRecord>)
  so VADD_BATCH can expand each applied vector into its own
  AofRecord::VAdd — no new AOF format needed

* feat: add VADD_BATCH dispatch to connection handler

wire up VADD_BATCH through both sharded and concurrent mode code
paths in connection.rs. response returns integer count of newly
added elements, matching VADD's pattern.

* feat: add VADD_BATCH gRPC RPC and regenerate client stubs

- add VAddBatchEntry and VAddBatchRequest proto messages
- add VAddBatch RPC returning IntResponse (count of added elements)
- add to PipelineRequest oneof (field 72)
- implement v_add_batch handler in grpc.rs with validation
- regenerate go and python proto stubs

* feat: update python client and benchmarks to use VADD_BATCH

- add vadd_batch() method to python gRPC client
- update bench-vector.py to send batches via VADD_BATCH (RESP) and
  vadd_batch (gRPC) instead of individual VADD calls
- update bench-memory.sh vector helper to use VADD_BATCH
- bump command count to 107 in README and bench README

* fix: install base deps into venv when grpc benchmarks are requested

when system python has the base deps but grpc mode forces a venv,
the venv was missing numpy/redis. now installs all required deps
alongside ember-py.

* fix: correct import path in generated python grpc stubs

the protoc codegen produces `from ember.v1 import` but the package
layout needs `from ember.proto.ember.v1 import`. the Makefile has
a sed fixup for this but manual regen skipped it.

* docs: update vector benchmark results with VADD_BATCH numbers

benchmarked on GCP c2-standard-8. VADD_BATCH improves insert
throughput: RESP 963 → 1,483 vec/s (+54%), gRPC 1,009 → 2,374
vec/s (+135%). query throughput unchanged as expected.
@kacy kacy merged commit f449325 into main Feb 16, 2026
6 of 7 checks passed
@kacy kacy deleted the feat/vadd-batch branch February 16, 2026 13:25
kacy added a commit that referenced this pull request Feb 19, 2026