fix: comprehensive security audit hardening #162
Merged
Conversation
add VADD_BATCH command that accepts multiple vectors in a single command to reduce per-vector round-trip overhead for bulk inserts. RESP3 syntax: VADD_BATCH key DIM n elem1 f32... elem2 f32... [opts] the DIM keyword is required so the parser knows where each vector ends and the next element name begins. max batch size is 10,000.
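Because the dimension is fixed for the whole batch, each entry occupies exactly 1 + DIM arguments, which is what lets the parser find entry boundaries without per-entry delimiters. A minimal sketch of that splitting logic (illustrative names, trailing [opts] handling omitted; not the actual parser):

```rust
/// Split the argument tail of VADD_BATCH into (element, vector) entries.
/// With a fixed dimension `dim`, every entry is exactly 1 + dim args:
/// the element name followed by `dim` float components.
fn split_batch_entries(args: &[&str], dim: usize) -> Result<Vec<(String, Vec<f32>)>, String> {
    const MAX_BATCH: usize = 10_000;
    let stride = 1 + dim; // element name + dim components
    if dim == 0 || args.len() % stride != 0 {
        return Err("argument count does not match DIM".into());
    }
    let n = args.len() / stride;
    if n > MAX_BATCH {
        return Err("batch exceeds maximum size".into());
    }
    // cap pre-allocation so a huge declared batch can't amplify memory
    let mut entries = Vec::with_capacity(n.min(1024));
    for chunk in args.chunks(stride) {
        let name = chunk[0].to_string();
        let vec = chunk[1..]
            .iter()
            .map(|s| s.parse::<f32>().map_err(|e| e.to_string()))
            .collect::<Result<Vec<f32>, _>>()?;
        entries.push((name, vec));
    }
    Ok(entries)
}
```

With DIM 2, the tail `a 1.0 2.0 b 3.0 4.0` splits cleanly into two entries; any argument count that is not a multiple of 1 + DIM is rejected up front.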
- add VAddBatchResult struct and vadd_batch() method to keyspace with upfront NaN/inf validation and single memory check for the entire batch
- add ShardRequest::VAddBatch and ShardResponse::VAddBatchResult
- refactor to_aof_record → to_aof_records (returns Vec<AofRecord>) so VADD_BATCH can expand each applied vector into its own AofRecord::VAdd — no new AOF format needed
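The to_aof_records refactor can be pictured as a batch expanding into one existing-format record per applied vector, which is why no new AOF format is needed. A sketch with stand-in types (not the real AofRecord definition):

```rust
/// Illustrative stand-in for the AOF record type; the real enum has
/// many more variants and a binary encoding.
enum AofRecord {
    VAdd { element: String, vector: Vec<f32> },
}

/// Expand the applied portion of a batch into one VAdd record each,
/// so the existing single-vector AOF format is reused unchanged.
fn to_aof_records(applied: Vec<(String, Vec<f32>)>) -> Vec<AofRecord> {
    applied
        .into_iter()
        .map(|(element, vector)| AofRecord::VAdd { element, vector })
        .collect()
}
```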
wire up VADD_BATCH through both sharded and concurrent mode code paths in connection.rs. response returns integer count of newly added elements, matching VADD's pattern.
- add VAddBatchEntry and VAddBatchRequest proto messages
- add VAddBatch RPC returning IntResponse (count of added elements)
- add to PipelineRequest oneof (field 72)
- implement v_add_batch handler in grpc.rs with validation
- regenerate go and python proto stubs
- add vadd_batch() method to python gRPC client
- update bench-vector.py to send batches via VADD_BATCH (RESP) and vadd_batch (gRPC) instead of individual VADD calls
- update bench-memory.sh vector helper to use VADD_BATCH
- bump command count to 107 in README and bench README
when system python has the base deps but grpc mode forces a venv, the venv was missing numpy/redis. now installs all required deps alongside ember-py.
the protoc codegen produces `from ember.v1 import` but the package layout needs `from ember.proto.ember.v1 import`. the Makefile has a sed fixup for this but manual regen skipped it.
benchmarked on GCP c2-standard-8. VADD_BATCH improves insert throughput: RESP 963 → 1,483 vec/s (+54%), gRPC 1,009 → 2,374 vec/s (+135%). query throughput unchanged as expected.
audit across all 6 crates: ember-protocol, emberkv-core, ember-server, ember-persistence, ember-cluster, emberkv-cli. 80 findings total (6 critical, 16 high, 30 medium, 28 low), categorized by impact × effort with estimated LOC for each fix.
- reject NaN/infinity in VADD, VADD_BATCH, and VSIM vector components
- fix VADD_BATCH off-by-one: >= instead of > for batch size check
- fix VADD_BATCH element name collision with flags by using dim-based entry detection instead of flag-name matching
- cap SCAN COUNT at 10M to prevent unbounded allocation hints
- add depth limit to Frame::serialize() to prevent stack overflow on deeply nested response frames
- cap Vec pre-allocation in parser to 1024 entries to limit memory amplification from large array/map headers
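The NaN/infinity rejection amounts to a finiteness check on every component before a vector is accepted; `f32::is_finite` returns false for NaN and both infinities. A hedged sketch (the signature is illustrative, not the crate's actual API):

```rust
/// Reject vectors containing NaN or ±infinity before they can reach
/// the index or the AOF. is_finite() is false for NaN, f32::INFINITY,
/// and f32::NEG_INFINITY.
fn validate_components(components: &[f32]) -> Result<(), String> {
    for (i, c) in components.iter().enumerate() {
        if !c.is_finite() {
            return Err(format!("component {} is not a finite number", i));
        }
    }
    Ok(())
}
```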
- replace expect() in track_size with Option return to prevent shard panics on invariant violations
- fix effective_limit overflow: use u128 intermediate to handle large max_bytes without precision loss
- clamp TTL u64→i64 cast to i64::MAX in iter_entries to prevent snapshot corruption on extreme TTLs
- assert shard_count <= u16::MAX in engine to prevent shard_id truncation
- add PartialBatch error variant to VectorWriteError so partially applied VADD_BATCH vectors are persisted to AOF instead of lost
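The effective_limit and TTL fixes are both narrow integer-width issues. A sketch of the shape of each fix, with illustrative names rather than the actual keyspace API:

```rust
/// Percentage-of-limit arithmetic: max_bytes * pct can overflow u64 for
/// large limits, so widen the intermediate product to u128 and narrow
/// back after dividing. No precision is lost for any u64 input.
fn effective_limit(max_bytes: u64, pct: u64) -> u64 {
    ((max_bytes as u128 * pct as u128) / 100) as u64
}

/// TTL cast: a u64 TTL above i64::MAX would wrap negative under
/// `as i64`, so clamp before casting instead of truncating.
fn clamp_ttl(ttl: u64) -> i64 {
    ttl.min(i64::MAX as u64) as i64
}
```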
fix: harden ember-server against auth bypass, pipeline abuse, and concurrency
- add gRPC authentication interceptor using constant-time comparison against requirepass. clients must send an `authorization` metadata header.
- add MAX_PIPELINE_DEPTH (10,000) to cap frames parsed per read in both sharded and concurrent connection handlers, preventing unbounded memory growth from huge pipelines.
- add concurrency_limit_per_connection(256) to tonic server builder in both run() and run_concurrent() to bound per-connection request load.
- subscription and pattern length limits added to gRPC subscribe handler (from earlier work, included in this commit).
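The PR uses subtle::ConstantTimeEq for the password check; the idea can be illustrated with a dependency-free equivalent that always scans the full length and accumulates differences, so timing does not leak the position of the first mismatch:

```rust
/// Illustrative constant-time comparison (the actual code uses the
/// subtle crate). All byte pairs are visited and XOR-folded into an
/// accumulator; there is no early exit on the first differing byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // length is not treated as secret here
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```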
fix: harden ember-persistence against crash-unsafe truncation and validation gaps
- make AOF truncation crash-safe by writing the fresh header to a temp file then atomically renaming over the original, preventing data loss if the process crashes mid-truncation.
- validate snapshot shard_id during recovery to detect misplaced or swapped snapshot files between shards.
- add VADD dimension validation in read_payload_for_tag (encrypted AOF path) to match the existing check in from_bytes, preventing a crafted AOF from triggering a 16GB read loop.
- zero encryption key bytes on drop using volatile writes to prevent key material from lingering in freed memory.
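The crash-safe truncation follows the standard write-temp-then-rename pattern: a crash at any point leaves either the intact old file or the complete new one, never a half-written header. A sketch using only std (helper name and header bytes are illustrative):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Replace an AOF with a fresh header atomically: write the new
/// contents to a sibling temp file, fsync it, then rename over the
/// original. rename() within one filesystem is atomic on POSIX.
fn truncate_atomically(path: &Path, fresh_header: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(fresh_header)?;
        f.sync_all()?; // data must be durable before the rename commits it
    }
    fs::rename(&tmp, path)
}
```

Syncing the containing directory after the rename would further guarantee the rename itself survives a power loss; that detail is omitted here for brevity.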
- filter AUTH commands from REPL history to prevent plaintext password storage in ~/.emberkv_history.
- set history file permissions to 0600 (owner read/write only) after saving, regardless of the process umask.
- sanitize server-supplied strings by stripping ANSI escape sequences and control characters before terminal output, preventing malicious servers from manipulating the user's terminal display.
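The terminal sanitization can be sketched as a filter that consumes ESC-initiated CSI sequences and drops other control characters while keeping newline and tab. This is a simplified illustration, not the CLI's exact filter; real terminals also accept OSC, DCS, and other escape forms:

```rust
/// Strip CSI escape sequences (ESC [ ... final byte) and drop control
/// characters other than '\n' and '\t' from server-supplied text.
fn sanitize_for_terminal(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' {
            if chars.peek() == Some(&'[') {
                chars.next(); // consume '['
                // consume parameter bytes up to and including the
                // final byte, which lies in 0x40..=0x7e
                while let Some(&n) = chars.peek() {
                    chars.next();
                    if ('\x40'..='\x7e').contains(&n) {
                        break;
                    }
                }
            }
            continue; // the ESC itself is always dropped
        }
        if c.is_control() && c != '\n' && c != '\t' {
            continue; // drop bells, carriage returns, etc.
        }
        out.push(c);
    }
    out
}
```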
fix: harden ember-cluster against state machine abuse and gossip poisoning
- change SlotRange::new from debug_assert to assert (enforced in release)
- validate slot ranges in AssignSlots raft command (start <= end, < 16384)
- validate slot in BeginMigration (must be < 16384)
- require active migration before CompleteMigration succeeds
- cap encoded collection lengths to prevent silent u16 truncation
- reject gossip updates with incarnation > u64::MAX/2 to prevent refutation poisoning
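The slot checks reduce to two small predicates over the 16384-slot keyspace. A sketch with illustrative names (the real code asserts inside SlotRange::new and the raft command handlers):

```rust
/// Redis-cluster-style keyspaces use 16384 hash slots, 0..=16383.
const SLOT_COUNT: u16 = 16384;

/// A slot range is valid only if it is non-inverted and in bounds.
fn validate_slot_range(start: u16, end: u16) -> bool {
    start <= end && end < SLOT_COUNT
}

/// A single migration slot just needs to be in bounds.
fn validate_slot(slot: u16) -> bool {
    slot < SLOT_COUNT
}
```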
resolve conflicts in command.rs (keep improved VADD_BATCH parser with NaN/inf rejection and entry-length detection) and keyspace.rs (keep PartialBatch error variant for AOF persistence of partial inserts). remove duplicate VAddBatch dispatch arm in shard.rs from auto-merge.
kacy added a commit that referenced this pull request on Feb 19, 2026
* feat: add VADD_BATCH command parsing to protocol layer
* feat: add VADD_BATCH to core engine and refactor AOF recording
* feat: add VADD_BATCH dispatch to connection handler
* feat: add VADD_BATCH gRPC RPC and regenerate client stubs
* feat: update python client and benchmarks to use VADD_BATCH
* fix: install base deps into venv when grpc benchmarks are requested
* fix: correct import path in generated python grpc stubs
* docs: update vector benchmark results with VADD_BATCH numbers
* style: fix formatting in VADD_BATCH protocol tests
* docs: comprehensive security audit report
* fix: harden ember-protocol against input validation and resource issues
* fix: harden emberkv-core against panics, overflows, and data loss
* fix: harden ember-server against auth bypass, pipeline abuse, and concurrency
* fix: harden ember-persistence against crash-unsafe truncation and validation gaps
* fix: harden emberkv-cli against credential leaks and terminal injection
* fix: harden ember-cluster against state machine abuse and gossip poisoning
* remove security audit report from branch
summary
comprehensive security audit across all 6 crates, addressing 45+ findings categorized by impact and effort. each crate has an atomic commit with focused fixes. depends on #161 (VADD_BATCH).
ember-protocol
emberkv-core
- `[]` indexing replaced with `.get()` in keyspace helpers
- `checked_mul`/`saturating_mul` for memory estimates

ember-server
- `authorization` metadata required when `--requirepass` is set

ember-persistence
- encryption key bytes zeroed on drop via `write_volatile`

emberkv-cli
ember-cluster
- `SlotRange::new` changed from `debug_assert` to `assert` (enforced in release)

what was tested
- `cargo test --workspace --features vector,encryption` — 351 tests pass, 1 pre-existing failure in `entry_overhead_not_too_small`
- `cargo clippy --workspace -- -D warnings` — clean
- `cargo fmt --all -- --check` — clean

design considerations
- incarnation values capped at `u64::MAX / 2` to leave ample headroom for legitimate use while preventing a single malicious gossip message from permanently disabling suspicion refutation
- `subtle::ConstantTimeEq` used to prevent timing side-channels on password comparison
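The incarnation cap can be expressed as a single guard: any gossip update claiming an incarnation above `u64::MAX / 2` is rejected, so a forged update cannot push a node's incarnation so high that it can never refute future suspicion. A sketch (illustrative name, not the gossip module's actual API):

```rust
/// Reject implausibly large incarnation numbers. Legitimate
/// incarnations grow by small increments on each refutation, so
/// anything above u64::MAX / 2 indicates a poisoned update.
fn accept_incarnation(incarnation: u64) -> bool {
    incarnation <= u64::MAX / 2
}
```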