
Integrate renacer chaos testing for aprender-shell robustness validation #99

@noahgift

Summary

The renacer syscall tracer now supports a chaos engineering mode via CLI flags (v0.6.3, paiml/renacer#17). This lets us test aprender-shell's robustness under resource pressure, which is critical for keeping suggestion latency reliably below 10ms in production.

Why This Matters for aprender-shell

aprender-shell has strict performance requirements:

  • P99 latency < 10ms for suggestions
  • Memory efficiency for large histories (up to 500MB)
  • Graceful degradation under resource pressure

Chaos testing verifies that these guarantees still hold when:

  • Memory is constrained (embedded systems, containers)
  • CPU is throttled (CI runners, shared hosts)
  • Processes are interrupted by signals

Proposed Integration

1. Add Chaos Test Makefile Target

# In Makefile - add chaos testing target
.PHONY: chaos-test
chaos-test: build-release
	@echo "Running chaos tests with renacer..."
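	# Gentle chaos - CI/CD safe; must exit 0
	renacer --chaos gentle -c -- ./target/release/aprender-shell suggest "git "
	# Aggressive chaos - stress testing; a clean non-zero exit is acceptable,
	# so the leading '-' lets make continue past a failure here
	-renacer --chaos aggressive -c -- ./target/release/aprender-shell suggest "cargo "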
	# Memory pressure test
	renacer --chaos-memory-limit 32M -c -- ./target/release/aprender-shell suggest "docker "
	# CPU throttle test (simulates slow CI runner)
	renacer --chaos-cpu-limit 0.25 -c -- ./target/release/aprender-shell suggest "kubectl "
	@echo "Chaos tests passed!"

2. Add Performance Baseline Script

Create scripts/chaos-baseline.sh:

#!/bin/bash
# Chaos engineering baseline for aprender-shell
# Validates performance under resource pressure

set -euo pipefail

MODEL_PATH="${APRENDER_MODEL:-$HOME/.aprender/shell_model.bin}"
RENACER="${RENACER:-renacer}"

# Run renacer with the given arguments, show the first 20 lines of combined
# output, and return the traced command's exit status. Capturing the output
# before trimming it (rather than piping into head) keeps a failure from
# being masked by head's exit status.
run_chaos() {
    local out status=0
    out=$("$RENACER" "$@" 2>&1) || status=$?
    head -20 <<< "$out"
    echo "(exit status: $status)"
    return "$status"
}

echo "=== aprender-shell Chaos Engineering Baseline ==="
echo "Model: $MODEL_PATH"
echo ""

# Test 1: Gentle chaos (should always pass; a non-zero exit aborts the script)
echo "Test 1: Gentle chaos (memory=512MB, cpu=80%, timeout=120s)"
run_chaos --chaos gentle -c --stats-extended -- \
    aprender-shell suggest "git "

# Tests 2-5 may fail cleanly under pressure: report the status and keep going.

# Test 2: Memory pressure (64MB limit)
echo ""
echo "Test 2: Memory pressure (64MB limit)"
run_chaos --chaos-memory-limit 64M -c -- \
    aprender-shell suggest "cargo build" || true

# Test 3: CPU throttle (25% - simulates slow CI)
echo ""
echo "Test 3: CPU throttle (25%)"
run_chaos --chaos-cpu-limit 0.25 -c -- \
    aprender-shell suggest "docker run" || true

# Test 4: Aggressive chaos (stress test)
echo ""
echo "Test 4: Aggressive chaos (memory=64MB, cpu=25%, timeout=10s, signals=on)"
run_chaos --chaos aggressive -c -- \
    aprender-shell suggest "npm install" || true

# Test 5: Signal handling
echo ""
echo "Test 5: Signal injection"
run_chaos --chaos-signals --chaos-timeout 30s -c -- \
    aprender-shell suggest "make " || true

echo ""
echo "=== All chaos tests completed ==="

3. CI Integration

Add this job under jobs: in .github/workflows/ci.yml:

  chaos-test:
    name: Chaos Engineering
    runs-on: ubuntu-latest
    needs: [build]
    steps:
      - uses: actions/checkout@v4
      
      - name: Install renacer
        # The chaos flags require renacer v0.6.3 or later
        run: cargo install renacer --version '>=0.6.3'
        
      - name: Build release
        run: cargo build --release -p aprender-shell
        
      - name: Train test model
        run: |
          echo "git status" > /tmp/history.txt
          echo "cargo build" >> /tmp/history.txt
          echo "docker ps" >> /tmp/history.txt
          ./target/release/aprender-shell train /tmp/history.txt
          
      - name: Run chaos tests
        run: |
          # Gentle chaos (should always pass in CI)
          renacer --chaos gentle -c -- ./target/release/aprender-shell suggest "git "
          
          # Memory pressure
          renacer --chaos-memory-limit 64M -c -- ./target/release/aprender-shell suggest "cargo "
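
The test history above has only three commands, so the trained model stays tiny and the 64M memory-pressure step barely exercises the large-history case this issue is concerned with. A minimal sketch of a generator for a bigger synthetic history follows. make_history and its command vocabulary are hypothetical, chosen only for illustration, and the sketch assumes aprender-shell train accepts a plain one-command-per-line file, as in the step above.

```shell
# Hypothetical helper: write N synthetic history lines to a file.
# Usage: make_history <lines> <output-file>
make_history() {
    local n=$1 out=$2 i
    # Illustrative vocabulary only; not taken from aprender-shell.
    local -a cmds=("git status" "git commit -m wip" "cargo build --release"
                   "cargo test" "docker ps" "docker run --rm alpine"
                   "kubectl get pods" "make test" "npm install")
    for ((i = 0; i < n; i++)); do
        # Cycle through the vocabulary and append a varying argument
        # so the file is not just the same nine lines repeated.
        printf '%s arg%d\n' "${cmds[i % ${#cmds[@]}]}" "$(( i % 1000 ))"
    done > "$out"
}
```

For example, running make_history 50000 /tmp/history.txt before the train step would produce a 50,000-line history file to train on.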

renacer Chaos CLI Reference

Presets

Preset      Memory  CPU  Timeout  Signals
gentle      512MB   80%  120s     off
aggressive  64MB    25%  10s      on

Individual Flags

--chaos <PRESET>             # Use a preset (gentle, aggressive)
--chaos-memory-limit <SIZE>  # e.g., 64M, 128M, 1G
--chaos-cpu-limit <FRACTION> # 0.0-1.0 (e.g., 0.5 = 50%)
--chaos-timeout <DURATION>   # e.g., 10s, 2m, 1h
--chaos-signals              # Enable SIGALRM/SIGUSR1 injection
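
When a chaos run fails, it can help to reproduce the limits without the tracer attached, to separate the effect of the limit from ptrace overhead. The sketch below approximates the aggressive preset's memory and timeout settings with standard Linux tools; run_limited is a hypothetical helper, and this is not necessarily how renacer enforces its limits. ulimit -v caps virtual address space in KiB, which is usually stricter than a resident-memory cap of the same size because mapped but untouched memory counts against it. GNU timeout exits with status 124 when the deadline passes. CPU throttling and signal injection are not covered.

```shell
# Hypothetical helper: run a command with an address-space cap and a deadline.
# Usage: run_limited <memory-MiB> <timeout-seconds> <command> [args...]
run_limited() {
    local mem_mb=$1 secs=$2
    shift 2
    (
        # The limit applies to this subshell and everything it execs.
        ulimit -v $(( mem_mb * 1024 )) || exit 1
        # timeout exits with 124 if the command is still running at the deadline.
        exec timeout "${secs}s" "$@"
    )
}
```

For example, run_limited 64 10 ./target/release/aprender-shell suggest "git " applies roughly the aggressive preset's memory and timeout values to an untraced run.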

Example Output

$ renacer --chaos aggressive -c -- aprender-shell suggest "git "
⚠️  Chaos mode enabled: memory=64MB, cpu=25%, timeout=10s, signals=on

git status  1.000

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.23    0.012345         123       100           read
 30.12    0.008234          82       100           write
...
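
Under memory pressure, failed allocations typically show up as syscall errors (for example mmap returning ENOMEM) in the summary table rather than in the suggestion output. Assuming renacer's -c summary keeps the strace-style layout shown above, where the errors column is blank when a syscall never failed, a small filter can list the syscalls that did fail. syscall_errors is a hypothetical helper name.

```shell
# Hypothetical helper: read a -c summary on stdin and print "<syscall> <errors>"
# for every row with a non-zero errors column. Data rows start with a numeric
# "% time" value; rows without errors have only 5 fields, so NF == 6 selects
# rows that report errors. The "total" row is skipped.
syscall_errors() {
    awk '$1 ~ /^[0-9.]+$/ && NF == 6 && $NF != "total" && $5 > 0 { print $6, $5 }'
}
```

For example: renacer --chaos-memory-limit 64M -c -- aprender-shell suggest "docker " 2>&1 | syscall_errors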

Expected Outcomes

Pass Criteria

  • suggest completes within timeout
  • No segfaults or panics
  • Graceful error messages if limits exceeded
  • Exit code 0 for gentle chaos
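
These criteria can be checked mechanically from the exit status. The sketch below assumes that renacer propagates the traced command's exit status (as strace does), that a shell reports death by signal as 128 plus the signal number (134 for SIGABRT, 139 for SIGSEGV), and that a Rust panic on the main thread exits with status 101. classify_exit is a hypothetical helper; it treats any other non-zero status as acceptable only under the aggressive preset, leaving the quality of the error message to a human reader.

```shell
# Hypothetical helper: classify an exit status against the pass criteria.
# Usage: classify_exit <status> <gentle|aggressive>
# Prints a verdict; returns 0 if the status is acceptable for the preset.
classify_exit() {
    local status=$1 preset=$2
    case $status in
        0)   echo "pass"; return 0 ;;
        101) echo "fail: Rust panic"; return 1 ;;
        134) echo "fail: aborted (SIGABRT)"; return 1 ;;
        139) echo "fail: segfault (SIGSEGV)"; return 1 ;;
    esac
    # Other non-zero statuses are tolerated only under aggressive chaos.
    if [[ $preset == aggressive ]]; then
        echo "acceptable: exit $status under aggressive chaos (check the error message)"
        return 0
    fi
    echo "fail: exit $status under $preset chaos"
    return 1
}
```

For example: status=0; renacer --chaos gentle -c -- aprender-shell suggest "git " || status=$?; classify_exit "$status" gentle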

Acceptable Failures (Aggressive)

  • Clean exit with resource limit error
  • Meaningful error message
  • No zombie processes or resource leaks
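
For the zombie and leak criterion, one rough check is to look for aprender-shell processes that are still around, running or defunct, after renacer returns. The sketch assumes procps ps is installed (as on GitHub's ubuntu-latest runners). check_leftovers is a hypothetical helper that matches on the command name, which the kernel truncates to 15 characters (aprender-shell is 14, so it fits).

```shell
# Hypothetical helper: report leftover processes with the given command name.
# Usage: check_leftovers [name]   (defaults to aprender-shell)
# Prints "pid stat comm" rows and returns 1 if any matching process exists.
check_leftovers() {
    local name=${1:-aprender-shell}
    local rows
    # pid=, stat=, comm= suppress the header; a stat starting with Z is a zombie.
    rows=$(ps -eo pid=,stat=,comm= | awk -v n="$name" '$3 == n')
    if [[ -n $rows ]]; then
        printf '%s\n' "$rows"
        return 1
    fi
}
```

Running check_leftovers right after each aggressive run turns "no zombie processes" into a pass/fail check.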

Acceptance Criteria

  • Add a make chaos-test target to the Makefile
  • Create the scripts/chaos-baseline.sh script
  • Add a chaos job to the CI workflow
  • Document chaos testing in the README
  • Verify that P99 suggestion latency stays below 10ms under gentle chaos
  • Validate graceful degradation under aggressive chaos
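
For the P99 criterion, a simple timing loop is enough to get a number. The sketch below is a hypothetical helper, p99_ms, that runs a command N times and reports the nearest-rank 99th percentile (the ceil(0.99 * N)-th smallest sample) in milliseconds. It assumes bash 5.0 or later for the EPOCHREALTIME variable, which avoids forking date around every sample.

```shell
# Hypothetical helper: print the P99 wall-clock latency of a command in ms.
# Usage: p99_ms <runs> <command> [args...]
# Returns 1 if any run fails and 2 if runs is not positive.
p99_ms() {
    local runs=$1
    shift
    (( runs > 0 )) || return 2
    local -a samples=()
    local i start end rank
    for ((i = 0; i < runs; i++)); do
        start=${EPOCHREALTIME/[.,]/}   # microseconds since the epoch
        "$@" > /dev/null 2>&1 || return 1
        end=${EPOCHREALTIME/[.,]/}
        samples+=( $(( end - start )) )
    done
    # Nearest rank: ceil(99 * runs / 100) in integer arithmetic.
    rank=$(( (99 * runs + 99) / 100 ))
    printf '%s\n' "${samples[@]}" | sort -n | sed -n "${rank}p" |
        awk '{ printf "%.3f\n", $1 / 1000 }'
}
```

For example, p99_ms 100 ./target/release/aprender-shell suggest "git " prints the untraced P99. Wrapping the command in renacer --chaos gentle -c also times renacer's startup and ptrace overhead, so compare against the untraced number before judging the result against the 10ms budget.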
