
Pipeline Plan 441

ezigus edited this page Apr 27, 2026 · 1 revision

Plan Complete: Ruflo Memory Process Leak Fix

I've completed a comprehensive root cause analysis and created a detailed implementation plan for issue #441. Here's the summary:

Root Cause (High Confidence: 90%)

The ruflo_with_timeout function (scripts/lib/ruflo-adapter.sh:404) starts the command in the background without setsid, then relies on pkill -P to clean up on timeout. The problem: pkill -P signals only the direct children of the given PID, not grandchildren. When the ruflo binary spawns Node processes that fork/exec further, those grandchildren survive the timeout as orphaned processes (about 385 had accumulated).

Key Findings

  1. Current Implementation (line 404):

    ( "$@" ) >"$_rft_tmp" &  # No setsid, so no process group isolation

  2. Inadequate Timeout Kill (line 416):

    pkill -TERM -P "$bg_pid"  # Only kills direct children, not grandchildren

  3. Missing Fallback: No SIGKILL escalation if the processes ignore SIGTERM.
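
The failure mode in findings 1 and 2 can be reproduced in isolation. The following is a minimal sketch, not code from ruflo-adapter.sh: the function name demo_direct_child_kill and the variable DEMO_PIDFILE are made up for illustration, and it assumes pkill (procps or the BSD equivalent) is installed. It builds a three-level process tree, applies the same pkill -P call, and reports whether the grandchild is still running.

```shell
#!/usr/bin/env bash
# Illustration only: demo_direct_child_kill and DEMO_PIDFILE are hypothetical names.
demo_direct_child_kill() {
  local pidfile bg_pid grandchild
  pidfile="$(mktemp)"
  export DEMO_PIDFILE="$pidfile"

  # Process tree: outer bash ($!) -> inner bash (child) -> sleep (grandchild).
  # The trailing "; true" keeps the outer bash from exec'ing the inner one.
  bash -c 'bash -c "sleep 30 & echo \$! >\"\$DEMO_PIDFILE\"; wait"; true' \
    >/dev/null 2>&1 &
  bg_pid=$!
  sleep 0.5                          # give the tree time to start

  grandchild="$(cat "$pidfile")"
  pkill -TERM -P "$bg_pid"           # signals only the inner bash
  sleep 0.5

  if kill -0 "$grandchild" 2>/dev/null; then
    echo "grandchild survived"
    kill -KILL "$grandchild" 2>/dev/null   # clean up the demo's own orphan
  else
    echo "grandchild gone"
  fi
  rm -f "$pidfile"
}
```

Calling demo_direct_child_kill is expected to report that the grandchild survived, which is the same pattern as Node processes outliving the ruflo binary's direct children.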

Fix Strategy

Use process groups (setsid) + two-phase kill:

  • Start the command in a new session and process group with setsid (line 404)
  • Capture process group ID (new lines after 405)
  • Phase 1: Kill entire group with kill -TERM -- -$bg_pgid
  • Phase 2: If still alive after 1s, use kill -KILL -- -$bg_pgid

Why this works: a signal sent to a process group reaches every process in that group, so all descendants that stay in the group are signalled at once, not just direct children.
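
The strategy above could look roughly like the sketch below. It is not the actual ruflo_with_timeout: the name run_with_group_timeout, the 0.1-second polling interval, and the exit code 124 are assumptions chosen for illustration. It also assumes a non-interactive shell (no job control), where setsid does not fork and $! is therefore the PID of the new session leader, which equals its process group ID.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_group_timeout is a hypothetical stand-in for
# ruflo_with_timeout; the real function may differ.
run_with_group_timeout() {
  local timeout_s="$1"; shift
  local out_file bg_pgid rc polls
  out_file="$(mktemp)"

  # Start the command as leader of a new session and process group.
  # setsid needs an external command; a shell function would need a wrapper.
  setsid "$@" >"$out_file" &
  bg_pgid=$!   # assumption: no job control, so this PID is also the PGID

  # Poll every 0.1 s until the command exits or the timeout elapses.
  polls=$(( timeout_s * 10 ))
  while kill -0 "$bg_pgid" 2>/dev/null && (( polls > 0 )); do
    sleep 0.1
    polls=$(( polls - 1 ))
  done

  if kill -0 "$bg_pgid" 2>/dev/null; then
    # Phase 1: SIGTERM to every process in the group (negative PID).
    kill -TERM -- "-$bg_pgid" 2>/dev/null || true
    # Give the group up to 1 s to exit.
    polls=10
    while kill -0 -- "-$bg_pgid" 2>/dev/null && (( polls > 0 )); do
      sleep 0.1
      polls=$(( polls - 1 ))
    done
    # Phase 2: SIGKILL whatever is left in the group.
    kill -KILL -- "-$bg_pgid" 2>/dev/null || true
    wait "$bg_pgid" 2>/dev/null || true
    rm -f "$out_file"
    return 124   # same convention as timeout(1)
  fi

  # Command finished in time: pass through its output and exit status.
  rc=0
  wait "$bg_pgid" || rc=$?
  cat "$out_file"
  rm -f "$out_file"
  return "$rc"
}
```

For example, `run_with_group_timeout 5 some_command arg` would print the command's output on success and return 124 on timeout. The sketch does not cover the setsid-unavailable fallback, and it leaves stderr untouched.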

Task Checklist (10 tasks total)

See TodoList for tracking. Tasks flow as:

  • Tasks 1-4: Code changes (sequential)
  • Tasks 5-7: Testing (parallel after code)
  • Tasks 8-10: Validation (parallel after tests)

Key Risks & Mitigations

Risk                            Likelihood   Mitigation
setsid unavailable              15%          Fall back gracefully to old behavior
Process group syntax rejected   10%          Test on both macOS bash 3.2 and Linux bash 4.x
KILL signal breaks locks        25%          Limit grace period to 1 second, clean temp files first
1-second latency regression     30%          Measure before/after, optimize if needed

Definition of Done

✅ pgrep -fc "ruflo memory" ≤ 2 after 10 iterations
✅ All tests pass (npm test)
✅ Memory leak eliminated (385 procs → 0)
✅ Code documented with comments explaining setsid + process group kill
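
The process-count criterion above could be checked with a small helper along these lines. This is a hedged sketch, not the contents of scripts/test-ruflo-leak.sh: count_procs and assert_proc_count_at_most are hypothetical names, and the sketch counts lines of pgrep -f output rather than using -c, on the assumption that -c may not be available in every pgrep implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a leak check; names are illustrative only.

# Print how many processes have a command line matching the pattern.
count_procs() {
  # pgrep exits non-zero when nothing matches, so ignore its status
  # and count the PIDs it printed instead.
  { pgrep -f "$1" || true; } | wc -l | tr -d ' '
}

# Succeed if at most $2 processes match pattern $1, otherwise fail.
assert_proc_count_at_most() {
  local pattern="$1" max="$2" n
  n="$(count_procs "$pattern")"
  if (( n > max )); then
    echo "FAIL: $n processes match '$pattern' (max $max)" >&2
    return 1
  fi
  echo "OK: $n processes match '$pattern' (max $max)"
}

# Intended use (the operation under test is left as a placeholder):
#   for i in 1 2 3 4 5 6 7 8 9 10; do <run the memory operation>; done
#   assert_proc_count_at_most "ruflo memory" 2
```

Returning a status instead of exiting lets the calling test script decide how to report the failure.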

Files to Modify

  1. scripts/lib/ruflo-adapter.sh — Update ruflo_with_timeout (lines 404-428)
  2. scripts/test-ruflo-leak.sh — Create new test script

Ready to proceed to build stage.
