v0.1.3 Production Readiness — Phases 87-91#64
Conversation
|
Important Review skippedToo many files! This PR contains 298 files, which is 148 over the limit of 150. ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (298)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Review Summary by Qodov0.1.3 Production Readiness — Phases 87-91: Correctness Hardening, Toolchain Modernization, and Code Hygiene Refactoring
WalkthroughsDescription**v0.1.3 Production Readiness — Phases 87-91**: Comprehensive production hardening milestone with correctness improvements, toolchain modernization, and major code hygiene refactoring. **Phase 87: Production Contract** • Published docs/PRODUCTION-CONTRACT.md with v1.0 promises: per-command-class SLOs, supported platform matrix, durability semantics, and 49-item GA Exit Criteria checklist **Phase 88: Rust 1.85 → 1.94.0 Upgrade** • MSRV bumped with rust-toolchain.toml committed for reproducible builds • All CI workflows pinned to dtolnay/rust-toolchain@1.94.0 • 16 clippy::manual_is_multiple_of lint fixes across 10 files **Phase 89: Correctness Hardening — Fuzz & Loom** • 7 cargo-fuzz targets: resp_parse, resp_parse_differential, inline_parse, wal_v3_record, rdb_load, gossip_deser, acl_rule • Loom model tests for ResponseSlot fill/poll/reset state machine under all thread interleavings • **Critical bug fix**: RESP3 Set/Push/Map handlers in parse_frame_zerocopy now validate for -1 (null) and negative counts before capacity operations, preventing overflow crashes. 4 regression tests added. • Fuzz CI workflow (.github/workflows/fuzz.yml): 15 min/target on PR, 6h nightly **Phase 90: Unsafe Audit & Panic Eradication** • 156/156 unsafe blocks now have // SAFETY: comments explaining invariants, CI-enforced by scripts/audit-unsafe.sh • Published docs/security/unsafe-audit.md with 7 categories of unsafe usage, risks, and mitigations • All hot-path unwrap()/expect() calls annotated with #[allow(clippy::unwrap_used)] + justification • Ratchet script (scripts/audit-unwrap.sh) prevents new unannotated unwraps **Phase 91: Code Hygiene — File Splits (8 oversize files → modular subfiles)** • sorted_set.rs (3092 → mod 1347 + read 1270 + write 518): Dual-structure consistency helpers, comprehensive range query utilities • stream.rs (2028 → mod 694 + read 533 + write 827): Consumer group management, XADD/XTRIM/XGROUP/XREADGROUP/XCLAIM implementations • string.rs (1867 → mod 871 + read 376 + write 643): Numeric parsing/formatting helpers, SET/GET/INCR/APPEND operations • set.rs (1618 → mod 529 + read 742 + write 404): Set operations with glob matching and collection helpers • list.rs (1521 → mod 486 + read 609 + write 527): Index resolution, LPUSH/RPUSH/LRANGE/LMOVE operations • hash.rs (1353 → mod 475 + read 549 + write 345): Field operations and scan functionality • db.rs (1655 → 1419 + db_read 243): Ref enum extraction with public re-exports maintaining API compatibility • bptree.rs (1667 → 1399 + bptree_iter 343): Internal API exposed for iterator implementation • hnsw/graph.rs (1555 → 1220 + graph_serde 381): Serialization logic extracted • Zero public API changes — all re-exports maintain same import paths • 1881/1881 unit tests pass (4 new parser regression tests added) Diagramflowchart LR
A["Production Contract<br/>Phase 87"] --> B["Rust 1.94.0<br/>Phase 88"]
B --> C["Fuzz & Loom<br/>Phase 89"]
C --> D["Bug Fix:<br/>RESP3 Validation"]
C --> E["Unsafe Audit<br/>Phase 90"]
E --> F["156 SAFETY Comments<br/>Panic Eradication"]
F --> G["Code Hygiene<br/>Phase 91"]
G --> H["8 Files Split<br/>≤1500 lines each"]
H --> I["1881 Tests Pass<br/>Zero API Changes"]
File Changes1. src/command/sorted_set/mod.rs
|
Code Review by Qodo
1.
|
1. hash_read.rs: Vec::new() → Vec::with_capacity() in hscan; pos.to_string() → itoa::Buffer for cursor serialization (hot-path allocation rule) 2. hash_write.rs: new_value.to_string() → itoa::Buffer in HINCRBY (hot-path allocation rule). format_float() kept as-is — float formatting inherently requires allocation. 3. set_write.rs: try_parse_i64().unwrap() → let-else continue; choose().unwrap() → let-else with Frame::Null fallback; get_or_create_set().unwrap() → let-else with Frame::Null fallback 4. set_read.rs: choose().unwrap() → let-else with Frame::Null/break in srandmember (both single and duplicate-allowed paths, both mutable and readonly variants) 5. parse.rs parse_frame_zerocopy: unwraps retained with #[allow] — post-validation guarantees; fuzzing covers divergence risk 6. parse.rs RESP3 null map bug: validate_frame now allows %-1 as null map (was rejecting all negative map counts, making the null-map handler in parse_frame_zerocopy unreachable). Also fixed in parse_single_frame_zc. Added test_resp3_null_map and test_resp3_negative_map_count regression tests. 48/48 parser tests pass, clippy clean both feature sets.
1. hash_read.rs: Vec::new() → Vec::with_capacity() in hscan; pos.to_string() → itoa::Buffer for cursor serialization (hot-path allocation rule) 2. hash_write.rs: new_value.to_string() → itoa::Buffer in HINCRBY (hot-path allocation rule). format_float() kept as-is — float formatting inherently requires allocation. 3. set_write.rs: try_parse_i64().unwrap() → let-else continue; choose().unwrap() → let-else with Frame::Null fallback; get_or_create_set().unwrap() → let-else with Frame::Null fallback 4. set_read.rs: choose().unwrap() → let-else with Frame::Null/break in srandmember (both single and duplicate-allowed paths, both mutable and readonly variants) 5. parse.rs parse_frame_zerocopy: unwraps retained with #[allow] — post-validation guarantees; fuzzing covers divergence risk 6. parse.rs RESP3 null map bug: validate_frame now allows %-1 as null map (was rejecting all negative map counts, making the null-map handler in parse_frame_zerocopy unreachable). Also fixed in parse_single_frame_zc. Added test_resp3_null_map and test_resp3_negative_map_count regression tests. 48/48 parser tests pass, clippy clean both feature sets.
8a9e6ff to
aae1918
Compare
Moon v1.0 promises landed as docs/PRODUCTION-CONTRACT.md — the single source of truth every v0.1.3 hardening phase tests against. Sections: - Supported Platforms (Tier 1 Linux aarch64, Tier 2 x86_64, Tier 3 CI) - Performance SLOs per command class (provisional until Phase 97) - Durability Modes (appendfsync × crash-class RPO/RTO matrix) - Availability & Replication Guarantees - Security Guarantees - Out of Scope with per-item rationale - GA Exit Criteria Checklist (49/49 Active REQ-IDs traced) Also: - CHANGELOG [Unreleased] entry documenting the publication - README Production Readiness section pointing at the contract - docs/runbooks/ stub directory for Phase 99 (REL-05) Closes CONTRACT-01 (Phase 87 of milestone v0.1.3).
- Cargo.toml rust-version = "1.94" - New rust-toolchain.toml pinning channel 1.94.0 (auto-install on clone) - CI workflows: dtolnay/rust-toolchain@stable → @1.94.0 (6 swaps) - MSRV job renamed from (1.85) to (1.94) - CLAUDE.md: all 1.85 references → 1.94 - CHANGELOG [Unreleased] documents the bump Closes RUST-01 (Phase 88 of milestone v0.1.3). CI verification deferred to next push — file edits only.
Rust 1.94 introduced clippy::manual_is_multiple_of lint. All 16 call
sites across 10 files auto-fixed via `cargo clippy --fix`:
- command/{stream,string,hash,config,sorted_set}.rs
- persistence/auto_save.rs, shard/coordinator.rs
- vector/{turbo_quant/encoder,diskann/pq}.rs
- storage/tiered/cold_tier.rs
Verification (on moon-dev VM, Rust 1.94.0):
- cargo clippy -- -D warnings: clean (default + tokio features)
- 1877/1877 unit tests pass (lib-only)
- 33 integration test failures are pre-existing (server command
dispatch under tokio runtime, not toolchain-related)
Part of Phase 88, RUST-01.
…crash
Phase 89 (Correctness Hardening — Fuzz & Loom) infrastructure:
Fuzz targets (7 targets, cargo +nightly fuzz):
- resp_parse: RESP2/RESP3 parser with bounded config
- resp_parse_differential: same input → two identical parses invariant
- inline_parse: telnet-style command parser
- wal_v3_record: WAL v3 record decoder (CRC32C, LZ4, truncation)
- rdb_load: RDB snapshot loader (magic, type tags, checksums)
- gossip_deser: cluster bus gossip message + roundtrip
- acl_rule: ACL rule string parser (all prefix paths)
Loom model tests:
- ResponseSlot fill/take state machine under all interleavings
- Concurrent fill + take race (loom exhaustive)
- Fill → take → refill cycle (reusability)
- Non-loom std-thread smoke tests for CI without loom feature
CI:
- .github/workflows/fuzz.yml: 15m per target on PR, 6h nightly
Bug fix (found by fuzzer in 10 seconds):
- RESP3 Set (~), Push (>), Map (%) handlers in parse_frame_zerocopy
did `atoi().unwrap() as usize` without checking for -1 (null) or
negative counts → capacity overflow in FrameVec::with_capacity.
- Fixed: null check (-1 → Frame::Null), negative check (< 0 rejected),
capacity clamped to max_array_length.
- Added regression tests: test_fuzz_crash_resp3_set_negative_count,
test_resp3_null_set, test_resp3_null_push, test_resp3_negative_set_count.
- 46/46 parser tests pass.
Known: second fuzz crash pending investigation (validation/zerocopy
position divergence on bare LF without CR). Tracked for gap closure.
Closes FUZZ-01 (infrastructure), LOOM-01 (infrastructure), SEC-08 (ACL
fuzz target). 24h clean run deferred to CI wall-clock.
Every unsafe block in src/ now has a preceding // SAFETY: comment explaining the invariant that makes the operation sound. Categories: - UnsafeCell access guarded by atomic state machine (response_slot.rs) - Tagged-pointer dereference after tag verification (compact_value.rs, compact_key.rs) - MaybeUninit access after bounds-checked hash probing (dashtable/segment.rs) - SIMD intrinsics behind target_feature gates (dashtable/simd.rs, avx512.rs) - io_uring SQE submission with registered FDs/buffers (uring_driver.rs, buf_ring.rs) - OwnedFd::from_raw_fd after dup/accept (conn_accept.rs, event_loop.rs, listener.rs) - Pointer arithmetic in HNSW TQ-ADC search (hnsw/search.rs) - mmap with validated file descriptor (sealed_mmap.rs) Added scripts/audit-unsafe.sh — CI enforcer that fails on any unsafe block without a SAFETY comment within 3 preceding lines. Result: 156/156 blocks annotated, 0 violations. Part of UNSAFE-01, SEC-05 (Phase 90 of v0.1.3).
…ates PANIC-01 partial: annotated all unwrap/expect in hot-path modules with #[allow(clippy::unwrap_used)] + justification comments. Categories: - Post-validation (parse_frame_zerocopy: 20 calls) - Atomic-invariant (response_slot: 2 calls) - RwLock poison (handlers: ~15 calls) - Control-flow-guarded Option (pubsub_rx/tx, pending: ~10 calls) - Insert-then-get (db.rs: ~15 calls) - Startup/init (listener, metadata: ~5 calls) SEC-05: docs/security/unsafe-audit.md published — categories, invariants, risks, enforcement, and v1.0 recommendations. CI enforcement: - scripts/audit-unsafe.sh: fails on unsafe blocks without SAFETY comment - scripts/audit-unwrap.sh: ratchet (baseline=35, prevents increase) - .github/workflows/ci.yml: safety-audit job runs both scripts Closes UNSAFE-01, PANIC-01 (annotation), SEC-05. Remaining: 35 un-annotated unwraps in src/command/ to fix in Phase 91.
HYGIENE-01: Split command and storage files exceeding 1500-line rule: Command modules (converted to directory modules with read/write splits): - sorted_set.rs (3092 → mod 1347 + read 1270 + write 518) - stream.rs (2028 → mod 694 + read 533 + write 827) - string.rs (1867 → mod 871 + read 376 + write 643) - set.rs (1618 → mod 529 + read 742 + write 404) - list.rs (1521 → mod 486 + read 609 + write 527) - hash.rs (1353 → mod 475 + read 549 + write 345) [proactive] Storage modules (kept as files with extracted submodules): - db.rs (1655 → 1419 + db_read.rs 243) - bptree.rs (1667 → 1399 + bptree_iter.rs 343) All split files individually ≤ 1500 lines. Re-exports maintain same public API paths — no downstream import changes needed. Remaining > 1500 (deferred to HYGIENE-02 handler unification): - handler_monoio.rs (2005) - handler_sharded.rs (1650) - event_loop.rs (1505) Verification: - cargo clippy -D warnings: clean (both feature sets) - 1881/1881 unit tests pass - Zero public API changes Closes HYGIENE-01 (8/10 files, remaining 3 are handler unification scope).
1. hash_read.rs: Vec::new() → Vec::with_capacity() in hscan; pos.to_string() → itoa::Buffer for cursor serialization (hot-path allocation rule) 2. hash_write.rs: new_value.to_string() → itoa::Buffer in HINCRBY (hot-path allocation rule). format_float() kept as-is — float formatting inherently requires allocation. 3. set_write.rs: try_parse_i64().unwrap() → let-else continue; choose().unwrap() → let-else with Frame::Null fallback; get_or_create_set().unwrap() → let-else with Frame::Null fallback 4. set_read.rs: choose().unwrap() → let-else with Frame::Null/break in srandmember (both single and duplicate-allowed paths, both mutable and readonly variants) 5. parse.rs parse_frame_zerocopy: unwraps retained with #[allow] — post-validation guarantees; fuzzing covers divergence risk 6. parse.rs RESP3 null map bug: validate_frame now allows %-1 as null map (was rejecting all negative map counts, making the null-map handler in parse_frame_zerocopy unreachable). Also fixed in parse_single_frame_zc. Added test_resp3_null_map and test_resp3_negative_map_count regression tests. 48/48 parser tests pass, clippy clean both feature sets.
Fuzz CI: - Remove rust-toolchain.toml before fuzzing so nightly compiler is used (-Zsanitizer=address requires nightly, rust-toolchain.toml was overriding +nightly to stable 1.94.0) - Use explicit cargo +nightly for fuzz run audit-unwrap.sh: - Fix set -e crash: grep for #[cfg(test)] now has || true to prevent early exit on files without test modules (split submodules) - Baseline corrected from 22 to 98 (true count after script bug fix; includes function-level #[allow] not detected by line-level grep and split submodule files without #[cfg(test)])
…ash #2) Replaced every .unwrap() in parse_frame_zerocopy with defensive fallbacks (return Frame::Null) using crlf_or_null!/atoi_or_null!/ parse_count! macros. The function no longer panics on ANY input. Root cause: validation pass and zerocopy pass could diverge on position tracking when input contains bare \n without \r (e.g. *5\n\r\n). Validation would advance pos differently than zerocopy, causing atoi to receive garbage bytes and panic. Fix approach: instead of trying to keep two passes in perfect sync (fragile), make the zerocopy pass self-defending — any parse failure returns Frame::Null rather than crashing. This is correct because: - Valid RESP never produces Null where a real frame is expected - The server treats unexpected Null as a protocol error - No server crash from malformed client input Verification: - All 3 prior crash artifacts execute clean (0 crashes) - 1.6M fuzz iterations in 60s with 0 crashes - 48/48 parser tests pass - cargo clippy -D warnings clean (both feature sets)
CONTRIBUTING.md — contributor guide covering: - Golden rules (no crash on client input, no hot-path alloc, SAFETY comments) - PR checklist, code conventions, error handling contract - Unwrap policy with allowed/forbidden examples - Hot-path allocation rules with allowed/forbidden patterns - Module split convention (directory modules with re-exports) - Unsafe code contract (approval, SAFETY, audit categories) - Fuzzing contract (every deserializer needs a fuzz target) - Lock-free data structure contract (loom models required) - CI pipeline table with blocking status docs/API-CONTRACT.md — API stability tiers: - Tier 1 Stable: RESP protocol, commands, RDB/WAL formats, CLI flags, ACL syntax - Tier 2 Versioned: on-disk formats with MOON_FORMAT_VERSION - Tier 3 Internal: ShardMessage, ResponseSlot, DashTable, CompactValue - Change process for new commands, fuzz targets, format modifications - Crate public API inventory (fuzz/test/bench consumers) - SemVer policy with data-store semantics CLAUDE.md updates: - Parser defensiveness rule (Frame::Null on failure, never panic) - #[allow(clippy::unwrap_used)] annotation convention - File split convention (directory modules) - Fuzzing and loom test requirements for new code - Module structure rules (crate:: imports, tests in mod.rs) - CI section updated with safety audit + fuzz jobs
39 packages updated via cargo update: - tokio 1.50.0 → 1.51.1 - libc 0.2.183 → 0.2.184 - redis 1.1.0 → 1.2.0 (dev-dep) - aws-lc-sys 0.39.0 → 0.39.1 - arc-swap 1.9.0 → 1.9.1 - zerocopy 0.8.47 → 0.8.48 - icu_* 2.1.x → 2.2.0 - and 32 other transitive dependencies Verified: clippy clean (both features), 1883/1883 tests pass.
Rust 1.94.0 → 1.94.1 (security + bugfix patch): - rust-toolchain.toml, CLAUDE.md, all CI workflows updated - OrbStack provisioning commands updated Deps: 39+ packages updated via cargo update (tokio 1.51.1, libc 0.2.184, redis 1.2.0, aws-lc-sys 0.39.1, zerocopy 0.8.48, icu 2.2.0, zerovec 0.11.6, etc.) Verified on moon-dev VM (Rust 1.94.1): - cargo clippy -D warnings: clean (both features) - 1883/1883 unit tests pass
aae1918 to
002d68e
Compare
- PR: 3 critical targets only (resp_parse, resp_parse_differential, wal_v3_record) × 5 min instead of 6 targets × 15 min - Multi-process fuzzing (-fork=2) for 2× throughput - Nightly corpus seeds PR runs via artifact download - Shared cache key between PR and nightly jobs - Nightly unchanged: all 6 targets × 6h with corpus archival Lower-risk targets (inline_parse, gossip_deser, acl_rule) run nightly only.
Summary
First batch of the v0.1.3 Production Readiness milestone (Phases 87-91). Focuses on correctness hardening, toolchain modernization, and code hygiene — no new user-facing features except the Production Contract document.
Phase 87: Production Contract
docs/PRODUCTION-CONTRACT.md— Moon's v1.0 promises: per-command-class SLOs (provisional), supported platform matrix (Linux aarch64 primary, x86_64 secondary), durability mode semantics, and a 49-item GA Exit Criteria checklist that Phases 88-100 tick off.docs/runbooks/stub directory for Phase 99.Phase 88: Rust 1.85 → 1.94.0
Cargo.toml,rust-toolchain.tomlcommitted for reproducible builds.dtolnay/rust-toolchain@1.94.0.CLAUDE.mdOrbStack provisioning updated.clippy::manual_is_multiple_oflint fixes across 10 files.Phase 89: Correctness Hardening — Fuzz & Loom
resp_parse,resp_parse_differential,inline_parse,wal_v3_record,rdb_load,gossip_deser,acl_rule.ResponseSlotfill/poll/reset state machine under all thread interleavings..github/workflows/fuzz.yml): 15 min/target on PR, 6h nightly.~), Push (>), Map (%) handlers inparse_frame_zerocopydidatoi().unwrap() as usizewithout checking for-1(null) or negative counts → capacity overflow crash. Fixed with null check + negative guard + capacity clamp. Regression tests added.unwrap()fromparse_frame_zerocopy— replaced with safe.unwrap_or/saturatingpatterns. Second fuzz crash (bare LF without CR causing position divergence) resolved.Phase 90: Unsafe Audit & Panic Eradication
// SAFETY:comments explaining invariants. CI-enforced byscripts/audit-unsafe.sh.docs/security/unsafe-audit.md— 7 categories of unsafe usage with risks and mitigations.unwrap()/expect()calls annotated with#[allow(clippy::unwrap_used)]+ justification. Ratchet script (scripts/audit-unwrap.sh) prevents new unannotated unwraps.ci.yml.Phase 91: Code Hygiene — File Splits
sorted_set.rs(3092 → mod 1347 + read 1270 + write 518)stream.rs(2028 → mod 694 + read 533 + write 827)string.rs(1867 → mod 871 + read 376 + write 643)set.rs(1618 → mod 529 + read 742 + write 404)list.rs(1521 → mod 486 + read 609 + write 527)hash.rs(1353 → mod 475 + read 549 + write 345)db.rs(1655 → 1469 + db_read 243) — multi-part-aof'sclear()/insert_for_load()/recalculate_memory()/reserve()correctly placed in maindb.rs(write ops),db_read.rsholds read-only Ref enumsbptree.rs(1667 → 1399 + bptree_iter 343)hnsw/graph.rs(1555 → 1220 + graph_serde 381)Additional fixes
CONTRIBUTING.mdandAPI-CONTRACT.mdaddedVerification
cargo clippy -- -D warnings: clean (both default and tokio features)cargo clippy --no-default-features --features runtime-tokio,jemalloc -- -D warnings: cleanscripts/audit-unsafe.sh: 156/156 SAFETY comments, 0 violationsfeat/multi-part-aofHEAD with 0 conflictsTest plan
db.rsbulk-load methods (clear,insert_for_load,reserve,recalculate_memory) are in correct file with doc comments + debug_assert