
feat(rvm): security audit + TEE crypto + performance hardening #329

Merged
ruvnet merged 10 commits into main from feat/ruvm-hypervisor-research
Apr 4, 2026

@ruvnet ruvnet commented Apr 4, 2026

Summary

Complete security audit remediation, TEE cryptographic verification pipeline, and performance hardening across all 14 RVM hypervisor crates.

  • 87 security findings fixed (11 critical, 23 high, 30 medium, 23 low)
  • 26 performance optimizations (O(1) lookups, binary heap scheduler, cache alignment)
  • ADR-142 TEE implementation (all 4 phases: SHA-256, WitnessSigner trait, TEE pipeline, SecurityGate integration)
  • Ed25519 + HMAC-SHA256 + DualHmac signers with verify_strict and subtle::ConstantTimeEq
  • KeyBundle derivation for per-partition TEE key isolation
  • 8 ADR-142 acceptance criteria integration tests

Key security fixes

  • SPSR_EL2 sanitization before ERET (hypervisor escape prevention)
  • Per-partition VMID with TLB flush on context switch
  • Real P3 proof verification replacing stubs
  • IPC sender enforcement, PartitionId validation
  • SHA-256 replaces FNV-1a in all security paths
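
As an illustration of the SPSR_EL2 item above, here is a minimal sketch of a check that could run before ERET. The function name `sanitize_spsr_el2`, the error type, and the policy of rejecting AArch32 returns are assumptions made for this example, not the code in rvm-hal. The bit positions follow the Arm architecture: `M[4]` selects AArch32 and `M[3:2]` encodes the target exception level.

```rust
/// Bit 4 of SPSR_EL2.M: set when the saved state is AArch32.
const SPSR_M_AARCH32: u64 = 1 << 4;
/// M[3:2] holds the exception level that ERET will return to.
const SPSR_EL_SHIFT: u32 = 2;
const SPSR_EL_MASK: u64 = 0b11;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SpsrError {
    /// The saved state would resume execution at EL2 or EL3.
    TargetsHypervisor,
    /// The saved state selects AArch32 (rejected here as a policy assumption).
    Aarch32NotAllowed,
}

/// Returns the SPSR value unchanged if it can only return to EL0 or EL1
/// in AArch64 state; otherwise reports why it was rejected.
pub fn sanitize_spsr_el2(spsr: u64) -> Result<u64, SpsrError> {
    if (spsr & SPSR_M_AARCH32) != 0 {
        return Err(SpsrError::Aarch32NotAllowed);
    }
    let target_el = (spsr >> SPSR_EL_SHIFT) & SPSR_EL_MASK;
    if target_el >= 2 {
        return Err(SpsrError::TargetsHypervisor);
    }
    Ok(spsr)
}
```

A guest-controlled value such as `0b1001` (EL2h) is rejected, while `0b0101` (EL1h) passes through.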

Stats

  • 64 files changed, 6,637 insertions, 447 deletions
  • 752 tests, 0 failures
  • 7 new files (TEE signer pipeline + constant-time + ADR)

Test plan

  • All 697 library crate unit tests pass
  • All 55 integration tests pass (including 8 ADR-142 acceptance criteria)
  • 31/31 security fix verifications confirmed by an independent reviewer agent
  • cargo test clean on all 11 library crates
  • cargo clippy clean (pre-existing warnings only)

🤖 Generated with claude-flow

Reuven and others added 10 commits April 4, 2026 12:10
Complete implementation of the RVM microhypervisor:

13 Rust crates (all #![no_std], #![forbid(unsafe_code)]):
- rvm-types: Foundation types (64-byte WitnessRecord, ~40 ActionKind variants)
- rvm-hal: AArch64 EL2 HAL (stage-2 page tables, PL011 UART, GICv2, timer)
- rvm-cap: Capability system (P1/P2 proof verification, derivation trees)
- rvm-witness: Witness logging (FNV-1a hash chain, ring buffer, replay)
- rvm-proof: Proof engine (3-tier, constant-time P2 evaluation)
- rvm-partition: Partition model (lifecycle, split/merge, IPC, device leases)
- rvm-sched: Scheduler (2-signal priority, SMP coordinator, switch hot path)
- rvm-memory: Memory tiers (buddy allocator, 4-tier, RLE compression)
- rvm-coherence: Coherence engine (Stoer-Wagner mincut, adaptive frequency)
- rvm-boot: Bare-metal boot (7-phase measured, EL2 entry, linker script)
- rvm-wasm: Agent runtime (7-state lifecycle, migration, quotas)
- rvm-security: Security gate (validation, attestation, DMA budget)
- rvm-kernel: Integration kernel (boot/tick/create/destroy)
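
To make the rvm-witness "FNV-1a hash chain" concrete, the following is a minimal sketch of chaining records with 64-bit FNV-1a. The `Record` layout, `append`, and `verify_chain` are hypothetical names for this example and do not reflect the real 64-byte WitnessRecord. A later commit in this PR replaces FNV-1a with SHA-256 in security paths, so treat this only as an illustration of the chaining idea.

```rust
/// Standard 64-bit FNV-1a parameters.
pub const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over `data`, starting from `state` (use FNV_OFFSET for a fresh hash).
pub fn fnv1a(state: u64, data: &[u8]) -> u64 {
    let mut h = state;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical, simplified witness record.
pub struct Record {
    pub payload: [u8; 8],
    pub prev_hash: u64,
    pub hash: u64,
}

/// Hash of a record: the previous hash followed by the payload.
fn link_hash(prev_hash: u64, payload: &[u8; 8]) -> u64 {
    fnv1a(fnv1a(FNV_OFFSET, &prev_hash.to_le_bytes()), payload)
}

/// Build the next record in the chain.
pub fn append(prev_hash: u64, payload: [u8; 8]) -> Record {
    Record { payload, prev_hash, hash: link_hash(prev_hash, &payload) }
}

/// Recompute every link; any edited payload or reordered record breaks it.
pub fn verify_chain(genesis: u64, records: &[Record]) -> bool {
    let mut prev = genesis;
    for r in records {
        if r.prev_hash != prev || r.hash != link_hash(prev, &r.payload) {
            return false;
        }
        prev = r.hash;
    }
    true
}
```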

602 tests, 0 failures, 0 clippy warnings.
21 criterion benchmarks (all ADR targets exceeded).
9 ADRs (132-140), 15 design constraints (DC-1 through DC-15).
11 security findings addressed.

Co-Authored-By: claude-flow <ruv@ruv.net>
Wire the unified CoherenceEngine into the kernel with full lifecycle:

- CoherenceEngine: graph-driven scoring, adaptive recomputation, pluggable
  MinCut/Coherence backends (builtin Stoer-Wagner + ruvector stubs)
- Kernel integration: create/destroy auto-register in coherence graph,
  tick() returns EpochResult (scheduler + coherence decision),
  record_communication() feeds the graph
- Scheduler integration: enqueue_partition() injects CutPressure into
  priority (deadline_urgency + cut_pressure_boost per ADR-132 DC-4)
- Split/merge execution: execute_split(), execute_merge() with
  StructuralSplit/StructuralMerge witnesses and precondition checks
- apply_decision() dispatcher: tick → decision → action in one call
- AArch64 bare-metal main.rs: _start → BSS clear → stack → rvm_main
- 614 tests pass across the full RVM workspace (43 in rvm-kernel)

Co-Authored-By: claude-flow <ruv@ruv.net>
Connect the three remaining subsystems through the kernel:

IPC integration:
- create_channel() registers CommEdge + emits witness
- ipc_send() auto-increments coherence graph edge weight (1 per msg)
- ipc_receive() / destroy_channel() with witness records
- IPC traffic directly drives mincut/split/merge decisions

Memory tier integration:
- TierManager integrated into kernel tick (epoch advance + recency decay)
- register_region() / promote_region() / demote_region() with witnesses
- update_region_cut_value() bridges coherence scores → tier placement
- Residency rule: cut_value + recency_score drives Hot/Warm/Dormant/Cold

End-to-end pipeline verified:
  IPC messages → coherence graph weight → tick → split decision →
  apply_decision → new partition → register memory → feed cut_value

625 tests pass across the full RVM workspace (54 in rvm-kernel).

Co-Authored-By: claude-flow <ruv@ruv.net>
Three capability/performance improvements across rvm-cap, rvm-wasm,
and rvm-sched:

P3 Deep Proof Verification (rvm-cap):
- verify_p3() now walks the derivation tree from leaf to root
- Validates: ancestor validity, monotonic depth, epoch ordering
- Bounded by max_depth to prevent DoS (O(depth), typically 8)
- Added find_parent() to DerivationTree for chain traversal
- New DerivationChainBroken error variant
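
The leaf-to-root walk described above can be sketched as follows. The `Node` layout, the index-based parent links, and the `DepthExceeded` variant are assumptions for illustration; only `verify_p3`, `max_depth`, and `DerivationChainBroken` are names taken from the commit message, and the real signatures in rvm-cap may differ.

```rust
/// Hypothetical derivation-tree node, stored in a flat slice.
#[derive(Clone, Copy)]
pub struct Node {
    pub parent: Option<usize>, // None marks the root
    pub depth: u8,
    pub epoch: u64,
    pub valid: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum P3Error {
    DerivationChainBroken,
    /// Assumed variant: the root was not reached within `max_depth` hops.
    DepthExceeded,
}

/// Walk from `leaf` towards the root, following at most `max_depth` parent links.
/// Each hop checks ancestor validity, strictly increasing depth towards the
/// leaf, and non-decreasing epochs from parent to child.
pub fn verify_p3(nodes: &[Node], leaf: usize, max_depth: usize) -> Result<(), P3Error> {
    let mut cur = nodes.get(leaf).copied().ok_or(P3Error::DerivationChainBroken)?;
    if !cur.valid {
        return Err(P3Error::DerivationChainBroken);
    }
    for _ in 0..=max_depth {
        let parent_idx = match cur.parent {
            None => return Ok(()), // reached the root
            Some(i) => i,
        };
        let parent = nodes
            .get(parent_idx)
            .copied()
            .ok_or(P3Error::DerivationChainBroken)?;
        if !parent.valid || parent.depth >= cur.depth || parent.epoch > cur.epoch {
            return Err(P3Error::DerivationChainBroken);
        }
        cur = parent;
    }
    Err(P3Error::DepthExceeded)
}
```

Because the loop is bounded by `max_depth`, a cyclic or overly deep chain costs O(depth) work at most and then fails.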

Wasm Host Context Trait (rvm-wasm):
- HostContext trait decouples dispatch from kernel subsystems
- Default implementations provide stub behaviour for testing
- StubHostContext for backward compatibility
- dispatch_host_call() now generic over H: HostContext
- Custom contexts can intercept Send/Receive/Alloc/Free/Spawn

Switch Context Init (rvm-sched):
- SwitchContext::init() sets entry point, SP, VMID, S2 table base
- vmid() / s2_table_base() extract fields from VTTBR_EL2
- save_from() copies full context for simulation
- is_valid_entry() validates non-zero ELR + VTTBR
- SwitchResult captures from/to VMIDs + elapsed_ns
- partition_switch() returns SwitchResult instead of bare u64
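
For readers unfamiliar with the register layout, this sketch shows how `vmid()` and `s2_table_base()` style accessors could extract fields from a VTTBR_EL2 value. The free-function form and `make_vttbr` are assumptions for this example. The layout assumes 16-bit VMIDs in bits [63:48] (which on hardware requires VTCR_EL2.VS to be set) and the table base address in bits [47:1].

```rust
const VMID_SHIFT: u32 = 48;
/// BADDR occupies bits [47:1]; bit 0 is CnP.
const BADDR_MASK: u64 = 0x0000_FFFF_FFFF_FFFE;

/// Hypothetical helper: pack a VMID and a stage-2 table base into one value.
pub fn make_vttbr(vmid: u16, s2_base: u64) -> u64 {
    ((vmid as u64) << VMID_SHIFT) | (s2_base & BADDR_MASK)
}

/// VMID field of a VTTBR_EL2 value.
pub fn vmid(vttbr: u64) -> u16 {
    (vttbr >> VMID_SHIFT) as u16
}

/// Stage-2 translation table base address.
pub fn s2_table_base(vttbr: u64) -> u64 {
    vttbr & BADDR_MASK
}

/// Mirrors the described check: both ELR and VTTBR must be non-zero.
pub fn is_valid_entry(elr: u64, vttbr: u64) -> bool {
    elr != 0 && vttbr != 0
}
```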

633 tests pass across the full RVM workspace.

Co-Authored-By: claude-flow <ruv@ruv.net>
Performance and capability improvements across 4 crates:

Edge weight decay (rvm-coherence):
- decay_weights(decay_bp) decays all edges by N% per epoch
- Auto-prunes edges that reach zero weight
- Engine ticks with 5% decay to prevent stale patterns dominating
- 4 new graph tests (decay, prune, 100%, zero)
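
A minimal sketch of the decay-and-prune behaviour, assuming `decay_bp` is expressed in basis points (so the 5% engine setting would be 500). The `Edge` struct and the `Vec`-based storage are hypothetical; the real graph in rvm-coherence is `no_std` and likely uses fixed-capacity storage.

```rust
/// Hypothetical communication edge between two partitions.
pub struct Edge {
    pub from: u32,
    pub to: u32,
    pub weight: u64,
}

/// Scale every edge weight by (10_000 - decay_bp) / 10_000, rounding down,
/// then drop edges whose weight has reached zero.
pub fn decay_weights(edges: &mut Vec<Edge>, decay_bp: u16) {
    let keep_bp = 10_000u128.saturating_sub(decay_bp as u128);
    for e in edges.iter_mut() {
        // Widen to u128 so the multiplication cannot overflow.
        e.weight = (e.weight as u128 * keep_bp / 10_000) as u64;
    }
    edges.retain(|e| e.weight > 0);
}
```

With integer rounding, a weight of 1 decays straight to 0 and is pruned, which keeps rarely used edges from lingering in the graph.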

Coherence score propagation (rvm-kernel):
- sync_partition_scores() pushes engine scores into Partition objects
- Called automatically in tick() — downstream consumers see fresh values
- PartitionManager::get_mut() and active_ids() for iteration

Security-gated kernel operations:
- checked_create_partition(config, token) — P1 type + rights check
- checked_ipc_send(edge, msg, token) — capability-gated IPC
- SecurityGate pipeline: type → rights → witness → execute
- ProofRejected witness on denial

Degraded mode (DC-6):
- enter_degraded_mode() / exit_degraded_mode() with witnesses
- Zeroes CutPressure in scheduler — deadline-only scheduling
- DegradedModeEntered / DegradedModeExited witness records
- is_degraded() accessor
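
The effect of degraded mode on scheduling can be sketched like this, using the priority formula mentioned earlier in the PR (deadline_urgency + cut_pressure_boost). The `Scheduler` struct and the `priority` method are hypothetical names for this example; witness emission is omitted.

```rust
/// Hypothetical scheduler state for this sketch.
#[derive(Default)]
pub struct Scheduler {
    degraded: bool,
}

impl Scheduler {
    pub fn enter_degraded_mode(&mut self) {
        self.degraded = true;
    }

    pub fn exit_degraded_mode(&mut self) {
        self.degraded = false;
    }

    pub fn is_degraded(&self) -> bool {
        self.degraded
    }

    /// In degraded mode the cut-pressure term is zeroed, so ordering
    /// depends on deadline urgency alone.
    pub fn priority(&self, deadline_urgency: u32, cut_pressure_boost: u32) -> u32 {
        let boost = if self.degraded { 0 } else { cut_pressure_boost };
        deadline_urgency.saturating_add(boost)
    }
}
```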

645 tests pass across the full RVM workspace (62 in rvm-kernel).

Co-Authored-By: claude-flow <ruv@ruv.net>
- README: updated test count to 645, refreshed crate descriptions
  for rvm-kernel (62 tests, full integration), rvm-coherence (59 tests,
  unified engine), rvm-cap (40 tests, P3 verification), rvm-sched
  (49 tests, VMID-aware switch), rvm-wasm (33 tests, HostContext trait)
- ADR-141: documents the coherence engine runtime pipeline —
  IPC→graph feeding, edge decay, score propagation, split/merge
  execution, security gates, degraded mode, tier integration
- Updated P3 proof description from "stub" to "derivation chain"
- Updated DC-6 status to reflect enter/exit with witnesses

Co-Authored-By: claude-flow <ruv@ruv.net>
…ity hardened

Seven files changed to close every identified gap:

PartitionManager (rvm-partition):
- Added remove() that frees the slot for reuse
- Added active_ids() iterator for score propagation

Kernel destroy_partition (rvm-kernel):
- Now calls remove() to actually deallocate the partition
- Enforces valid_transition() — rejects invalid state changes
- destroy_partition(id) on already-destroyed ID returns PartitionNotFound

Wasm section parser (rvm-wasm):
- Full validate_module() with LEB128 section size decoding
- Validates section ordering (non-decreasing), no duplicates
- Tracks Type/Function/Memory/Export/Code presence
- WasmSectionId enum with 13 standard Wasm section types
- WasmValidationResult summary struct
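
The LEB128 section size decoding can be illustrated with a small, bounds-checked decoder. `read_leb128_u32` and `split_section` are hypothetical helpers written for this sketch, not the functions in rvm-wasm; they cover only the unsigned 32-bit case used for section sizes.

```rust
/// Decode an unsigned LEB128 u32. Returns the value and the bytes consumed,
/// or None if the input is truncated, too long, or overflows 32 bits.
pub fn read_leb128_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &b) in bytes.iter().enumerate().take(5) {
        let low = (b & 0x7f) as u32;
        // The fifth byte may only contribute the top 4 bits of a u32.
        if i == 4 && low > 0x0f {
            return None;
        }
        result |= low << (7 * i as u32);
        if b & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

/// Split one section off the front of `bytes`:
/// returns (section id, section body, remaining bytes).
pub fn split_section(bytes: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let (&id, rest) = bytes.split_first()?;
    let (size, used) = read_leb128_u32(rest)?;
    let end = used.checked_add(size as usize)?;
    if end > rest.len() {
        return None; // declared size runs past the end of the module
    }
    Some((id, &rest[used..end], &rest[end..]))
}
```

Rejecting a declared size that exceeds the remaining input is what prevents a malformed module from causing an out-of-bounds read.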

KernelHostContext (rvm-kernel):
- Routes Wasm Send → IPC manager with sequence numbering
- Routes Wasm Receive → IPC manager receive
- Connects to real kernel subsystems via mutable references

P3 in SecurityGate (rvm-security):
- GateRequest gains require_p3 + p3_chain_valid fields
- Gate pipeline checks P3 derivation chain validity
- DerivationChainBroken error variant
- proof_tier=3 on successful P3 verification

P3 in ProofEngine (rvm-proof):
- verify_p3() accepts chain_valid bool from rvm-cap
- Emits ProofVerifiedP3 witness on success
- Emits ProofRejected witness on failure
- No more Unsupported stub

Device lease integration (rvm-kernel):
- DeviceLeaseManager added to Kernel struct
- register_device(), grant_device_lease(), revoke_device_lease()
- DeviceLeaseGrant/DeviceLeaseRevoke witness records

648 tests pass, 0 warnings, 0 stubs in hot paths.

Co-Authored-By: claude-flow <ruv@ruv.net>
Co-Authored-By: claude-flow <ruv@ruv.net>
…, performance hardening

Complete security audit remediation across all 14 RVM hypervisor crates:

Security (87 findings fixed — 11 critical, 23 high, 30 medium, 23 low):
- HAL: SPSR_EL2 sanitization before ERET, per-partition VMID with TLB flush,
  2MB mapping alignment enforcement, UART TX timeout
- Proof: Real P3 verification replacing stubs (Hash/Witness/ZK tiers),
  SecurityGate self-verifies P3 (no caller-trusted boolean)
- Witness: SHA-256 chain hashing (ADR-142), strict signing default,
  NullSigner test-gated, XOR-fold hash truncation
- IPC: Kernel-enforced sender identity, channel authorization
- Cap: GRANT_ONCE consumption, delegation depth overflow protection,
  owner verification, derivation tree slot leak rollback
- Types: PartitionId validation (reject 0/hypervisor, >4096)
- WASM: Target/length validation on send(), module size limit, quota dedup
- Scheduler: Binary heap run queue, epoch wrapping_add, SMP cpu_count enforcement
- All integer overflow paths use wrapping_add/saturating_add/checked_add
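
The PartitionId rule above (reject 0, which is reserved for the hypervisor, and anything above 4096) maps naturally onto a validating constructor. This is a sketch under the assumption that the ID is a `u32` newtype; the constant name and the `Option` return type are illustrative choices, not the rvm-types API.

```rust
/// Largest partition ID accepted in this sketch.
pub const MAX_PARTITION_ID: u32 = 4096;

/// A partition ID that is known to be in range once constructed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionId(u32);

impl PartitionId {
    /// Returns None for 0 (reserved for the hypervisor) and for values above
    /// MAX_PARTITION_ID, so invalid IDs cannot reach downstream lookups.
    pub fn new(raw: u32) -> Option<Self> {
        if raw == 0 || raw > MAX_PARTITION_ID {
            None
        } else {
            Some(Self(raw))
        }
    }

    pub fn get(self) -> u32 {
        self.0
    }
}
```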

TEE implementation (ADR-142, all 4 phases):
- Phase 1: SHA-256 replaces FNV-1a in witness chain, attestation, measured boot
- Phase 2: WitnessSigner trait with SignatureError enum, HmacSha256WitnessSigner,
  Ed25519WitnessSigner (verify_strict), DualHmacSigner, constant_time.rs
- Phase 3: SoftwareTeeProvider/Verifier, TeeWitnessSigner<P,V> pipeline
- Phase 4: SignedSecurityGate, WitnessLog::signed_append, CryptoSignerAdapter,
  ProofEngine::verify_p3_signed, KeyBundle derivation infrastructure
- subtle crate integration for ConstantTimeEq
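
To show why constant-time comparison matters for MAC and signature checks, here is a dependency-free sketch of the idea behind `subtle::ConstantTimeEq`. The function `ct_eq` is hypothetical and is not the contents of constant_time.rs; the PR uses the subtle crate, which adds stronger protection against compiler optimizations than this example does.

```rust
use std::hint::black_box;

/// Compare two byte slices without exiting early on the first mismatch.
/// Length is treated as public information, as is usual for fixed-size tags.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences instead of branching on each byte.
        diff |= x ^ y;
    }
    // black_box is only a best-effort hint against the optimizer.
    black_box(diff) == 0
}
```

A plain `a == b` may return as soon as one byte differs, which can leak how many leading bytes of a forged tag were correct.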

Performance (26 optimizations):
- O(1) lookups: IPC channel, partition, coherence node, nonce replay
- Binary max-heap scheduler queue (O(log n) enqueue/dequeue)
- Coherence adjacency matrix + cached per-node weights
- BuddyAllocator trailing_zeros bitmap scan + precomputed bit_offset LUT
- Cache-line aligned SwitchContext (hot fields first) and PerCpuScheduler
- DerivationTree O(1) parent_index, combined region overlap+free scan
- #[inline] on 11+ hot-path functions, FNV-1a 8x loop unroll
- CapSlot packing (generation sentinel), RunQueueEntry sentinel, MessageQueue bitmask
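
The binary max-heap run queue can be sketched with the standard library's `BinaryHeap`, which gives O(log n) push and pop. The field layout of `RunQueueEntry` and the `RunQueue` wrapper are assumptions for this example; the actual rvm-sched queue is `no_std` and presumably fixed-capacity rather than heap-allocated.

```rust
use std::collections::BinaryHeap;

/// Hypothetical run-queue entry. Derived ordering compares `priority` first,
/// then `partition`, so ties resolve deterministically.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RunQueueEntry {
    pub priority: u32,
    pub partition: u32,
}

/// Max-heap run queue: the highest-priority partition is dequeued first.
#[derive(Default)]
pub struct RunQueue {
    heap: BinaryHeap<RunQueueEntry>,
}

impl RunQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// O(log n) insertion.
    pub fn enqueue(&mut self, partition: u32, priority: u32) {
        self.heap.push(RunQueueEntry { priority, partition });
    }

    /// O(log n) removal of the highest-priority entry.
    pub fn dequeue(&mut self) -> Option<RunQueueEntry> {
        self.heap.pop()
    }

    pub fn len(&self) -> usize {
        self.heap.len()
    }

    pub fn is_empty(&self) -> bool {
        self.heap.is_empty()
    }
}
```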

Documentation:
- ADR-142: TEE-Backed Cryptographic Verification (with 6 reviewer amendments)
- ADR-135 addendum: P3 no longer deferred
- ADR-132 addendum: DC-3 deferral resolved
- ADR-134 addendum: SHA-256 + HMAC signatures

752 tests, 0 failures across 11 library crates + integration suite.

Co-Authored-By: claude-flow <ruv@ruv.net>
@ruvnet ruvnet merged commit 453aed0 into main Apr 4, 2026
14 of 15 checks passed