Fix Claude Process leak[MEMORY INTENSIVE], archiving, and stale claude session monitoring. #2042

Merged
juliusmarminge merged 12 commits into pingdotgg:main from crafael23:fixfeat-claude-process-cleanup-process-reaper
Apr 16, 2026

@crafael23
Contributor

@crafael23 crafael23 commented Apr 15, 2026

Prevent Claude process leaks on archive and session restarts

Branch: fixfeat-claude-process-cleanup-process-reaper
Base: main
Issue: #2007


What Changed

This PR fixes an unbounded process leak where Claude runtime processes survive thread archives, effort/model/runtime-mode changes, and long idle periods. Three targeted changes close every identified leak path:

  1. Archive now stops the provider session. The WebSocket command handler for thread.archive reads the current thread state and, when the thread has a live (non-stopped) session, dispatches thread.session.stop before closing terminals. If session stop fails the archive still completes — cleanup is best-effort, never blocking.

  2. ClaudeAdapter.startSession closes the prior session before replacing it. When a second startSession call arrives for the same threadId (triggered by effort, model, or runtime-mode changes), the adapter now calls stopSessionInternal on the existing session context first, preventing orphaned Claude child processes.

  3. A new ProviderSessionReaper sweeps stale sessions on a timer. A background fiber wakes every 30 minutes, queries persisted session bindings via ProviderSessionDirectory.listBindings(), cross-references each binding's lastSeenAt against a configurable inactivity threshold (default 30 min), and calls ProviderService.stopSession for any session that exceeds it — unless the thread still has an active turn. The reaper is started alongside the orchestration reactor during server startup and is scoped to the reactor lifecycle.

Architecture — how the three fixes layer together

graph TB
    User((User Action))

    subgraph "Fix 1 — Archive cleanup (ws.ts)"
        Archive["thread.archive command"]
        SessionCheck{"thread.session<br/>exists & not stopped?"}
        StopCmd["dispatch(thread.session.stop)"]
        TermClose["terminalManager.close()"]
    end

    subgraph "Fix 2 — Replace guard (ClaudeAdapter.ts)"
        StartSession["startSession(threadId)"]
        ExistCheck{"existing session<br/>for threadId?"}
        StopOld["stopSessionInternal(existing)"]
        SpawnNew["spawn new Claude process"]
    end

    subgraph "Fix 3 — Inactivity reaper (ProviderSessionReaper.ts)"
        Timer["Schedule.spaced(30 min)"]
        Sweep["sweep()"]
        ListBindings["directory.listBindings()"]
        IdleCheck{"idle > threshold<br/>& no active turn?"}
        Reap["providerService.stopSession()"]
    end

    User -->|"archives thread"| Archive
    Archive --> SessionCheck
    SessionCheck -->|yes| StopCmd
    SessionCheck -->|no| TermClose
    StopCmd --> TermClose

    User -->|"changes effort/model/runtime"| StartSession
    StartSession --> ExistCheck
    ExistCheck -->|yes| StopOld
    StopOld --> SpawnNew
    ExistCheck -->|no| SpawnNew

    Timer --> Sweep
    Sweep --> ListBindings
    ListBindings --> IdleCheck
    IdleCheck -->|yes| Reap
    IdleCheck -->|no| Sweep

    style Archive fill:#3498db,color:#fff
    style StopCmd fill:#27ae60,color:#fff
    style StopOld fill:#27ae60,color:#fff
    style Reap fill:#27ae60,color:#fff
    style Timer fill:#9b59b6,color:#fff

Supporting infrastructure:

  • ProviderSessionDirectory.listBindings() — new method that returns all persisted runtime bindings with full metadata (lastSeenAt, status, resumeCursor, etc.), ordered oldest-first. This gives the reaper (and future diagnostics) a single query to enumerate every tracked session.
  • ProviderRuntimeBindingWithMetadata — new interface extending ProviderRuntimeBinding with lastSeenAt, used by listBindings and the reaper.
  • ProviderSessionDirectoryLayerLive — extracted to module scope in server.ts so it can be shared cleanly between ProviderLayerLive and the new ProviderRuntimeLayerLive without duplicate instantiation.
  • Diagnostic scripts — two lightweight pgrep/ps shell wrappers (issue-2007-process-snapshot.sh and issue-2007-claude-processes.sh) used during manual verification; included in the commit for reproducibility. Full source in the Diagnostic scripts section below.
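
The shape of the metadata-carrying binding and the oldest-first ordering can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: only `lastSeenAt`, `status`, and `resumeCursor` are named in the description, so the other fields, the ISO-8601 timestamp format, and the `sortOldestFirst` helper are hypothetical.

```typescript
// Hypothetical base shape; the real ProviderRuntimeBinding may carry other fields.
interface ProviderRuntimeBinding {
  threadId: string;
  provider: string;
  status: string;
  resumeCursor: string | null;
}

// Extends the base binding with lastSeenAt, as the description states.
interface ProviderRuntimeBindingWithMetadata extends ProviderRuntimeBinding {
  lastSeenAt: string; // assumed to be an ISO-8601 timestamp
}

// Hypothetical helper: returns a new array ordered oldest-first by lastSeenAt.
function sortOldestFirst(
  bindings: readonly ProviderRuntimeBindingWithMetadata[],
): ProviderRuntimeBindingWithMetadata[] {
  return bindings
    .slice()
    .sort((a, b) => Date.parse(a.lastSeenAt) - Date.parse(b.lastSeenAt));
}

// Made-up sample rows, purely for illustration.
const sampleBindings: ProviderRuntimeBindingWithMetadata[] = [
  { threadId: "thread-b", provider: "claude", status: "ready", resumeCursor: null, lastSeenAt: "2026-04-15T12:30:00Z" },
  { threadId: "thread-a", provider: "claude", status: "ready", resumeCursor: null, lastSeenAt: "2026-04-15T10:00:00Z" },
  { threadId: "thread-c", provider: "codex", status: "stopped", resumeCursor: null, lastSeenAt: "2026-04-15T11:15:00Z" },
];

const orderedThreadIds = sortOldestFirst(sampleBindings).map((b) => b.threadId);
```

Sorting on a copy keeps the caller's array untouched, which matters if the same rows are reused for diagnostics after the reaper has read them.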

Files changed (15 files, +1 197 / −26)

| File | What |
| --- | --- |
| apps/server/src/ws.ts | Archive handler dispatches thread.session.stop when the thread has an active session |
| apps/server/src/provider/Layers/ClaudeAdapter.ts | startSession stops existing session before replacing it |
| apps/server/src/provider/Layers/ClaudeAdapter.test.ts | Test: replacement session closes the first query |
| apps/server/src/provider/Layers/ProviderSessionReaper.ts | New: inactivity reaper implementation |
| apps/server/src/provider/Layers/ProviderSessionReaper.test.ts | New: 5 test cases for the reaper |
| apps/server/src/provider/Services/ProviderSessionReaper.ts | New: service interface |
| apps/server/src/provider/Layers/ProviderSessionDirectory.ts | listBindings() implementation + toRuntimeBinding helper |
| apps/server/src/provider/Layers/ProviderSessionDirectory.test.ts | Test: listBindings returns metadata in oldest-first order |
| apps/server/src/provider/Services/ProviderSessionDirectory.ts | ProviderRuntimeBindingWithMetadata type + listBindings signature |
| apps/server/src/provider/Layers/CodexAdapter.test.ts | Adds listBindings stub to test layer |
| apps/server/src/server.ts | Wires ProviderSessionReaperLive into ProviderRuntimeLayerLive; hoists shared directory layer |
| apps/server/src/serverRuntimeStartup.ts | Starts reaper alongside orchestration reactor at boot |
| apps/server/src/server.test.ts | 4 new archive integration tests (session stop, no-session, already-stopped, stop failure) |
| issue-2007-process-snapshot.sh | New: manual diagnostic, timestamped pgrep + ps snapshot with optional --label persistence (source below) |
| issue-2007-claude-processes.sh | New: manual diagnostic, Claude-only process listing / count (source below) |

Why

The problem

When a user archives a Claude thread, the UI and server clear terminal state and mark the thread archived — but the underlying Claude SDK child process (claude --session-id … --print) is never stopped. The thread.archive command handler only called terminalManager.close(); it never dispatched thread.session.stop, and no event listener in ProviderCommandReactor reacted to thread.archived events.

Before fix — archive leaks the Claude process

sequenceDiagram
    participant User
    participant UI
    participant WS as WebSocket Handler<br/>(ws.ts)
    participant OE as OrchestrationEngine
    participant TM as TerminalManager
    participant Claude as Claude Process<br/>(PID 63697)

    User->>UI: Archive thread
    UI->>WS: dispatch({ type: "thread.archive", threadId })
    WS->>OE: dispatch(thread.archive)
    OE-->>WS: { sequence: N }
    WS->>TM: close({ threadId })
    TM-->>WS: OK

    Note over WS: No thread.session.stop dispatched
    Note over Claude: Still running<br/>PID 63697 alive<br/>Memory ~150-250 MB
    Note over Claude: Orphaned indefinitely

The same class of leak occurs when effort, model, or runtime-mode changes trigger a session restart: a new startSession call spawns a fresh Claude process, but the old one is left running because the adapter had no "stop before replace" guard.

Before fix — session restart orphans the old process

sequenceDiagram
    participant User
    participant Server
    participant Adapter as ClaudeAdapter
    participant Old as Claude Process A<br/>(PID 67734, effort=high)
    participant New as Claude Process B<br/>(PID 72031, effort=max)

    User->>Server: Send message (effort: max)
    Server->>Adapter: startSession({ threadId, effort: max })

    Note over Adapter: Existing session found<br/>but no stop logic exists

    Adapter->>New: spawn claude --effort max --resume
    New-->>Adapter: session.started

    Note over Old: Never stopped<br/>PID 67734 still alive
    Note over Old,New: 2 Claude processes for 1 thread

Without a safety net, these orphaned processes accumulate indefinitely — one per archived thread, one per mid-conversation parameter change — consuming memory and CPU until the app or machine is restarted.

TC-A2 accumulation: 5 archives → 5 leaked processes

graph LR
    subgraph "Before fix — process count after each archive"
        A1["Archive #1<br/>count: 5"] --> A2["Archive #2<br/>count: 5"]
        A2 --> A3["Archive #3<br/>count: 5"]
        A3 --> A4["Archive #4<br/>count: 5"]
        A4 --> A5["Archive #5<br/>count: 5"]
    end

    style A1 fill:#e74c3c,color:#fff
    style A2 fill:#e74c3c,color:#fff
    style A3 fill:#e74c3c,color:#fff
    style A4 fill:#e74c3c,color:#fff
    style A5 fill:#e74c3c,color:#fff

All 5 PIDs (94071, 96911, 97203, 97480, 97916) remained alive through every archive. The count never dropped.

Why this approach

Three independent, non-overlapping fixes — each targeting a distinct leak vector:

Fix 1 — Archive now stops the session (ws.ts)

sequenceDiagram
    participant User
    participant UI
    participant WS as WebSocket Handler<br/>(ws.ts)
    participant OE as OrchestrationEngine
    participant TM as TerminalManager
    participant Claude as Claude Process

    User->>UI: Archive thread
    UI->>WS: dispatch({ type: "thread.archive", threadId })
    WS->>OE: dispatch(thread.archive)
    OE-->>WS: { sequence: N }

    rect rgb(39, 174, 96, 0.1)
        Note over WS: NEW — check thread.session status
        WS->>OE: getReadModel()
        OE-->>WS: thread.session.status = "ready"
        WS->>OE: dispatch(thread.session.stop)
        OE->>Claude: stopSession
        Claude-->>OE: exited
    end

    WS->>TM: close({ threadId })
    TM-->>WS: OK

    Note over Claude: Process cleaned up
  • Direct fix for the reported bug — minimal, targeted, happens at the exact moment the user triggers the leak.
  • Skips thread.session.stop when session is null or already "stopped".
  • If session stop fails, archive still completes — cleanup is best-effort, never blocking.
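
A minimal sketch of the archive cleanup order described above, assuming a synchronous stand-in for the real handler (the actual code in ws.ts is Effect-based and is not reproduced here). `ThreadSnapshot`, `ArchiveDeps`, `hasLiveSession`, and `cleanupAfterArchive` are hypothetical names used only for illustration.

```typescript
// Only "ready" and "stopped" appear in the PR text; "running" is an assumed extra state.
type SessionStatus = "ready" | "running" | "stopped";

interface ThreadSnapshot {
  threadId: string;
  session: { status: SessionStatus } | null;
}

// Hypothetical dependencies standing in for the real services.
interface ArchiveDeps {
  stopSession: (threadId: string) => void; // stands in for dispatching thread.session.stop
  closeTerminals: (threadId: string) => void; // stands in for terminalManager.close
  logWarning: (message: string) => void;
}

// A session is "live" when it exists and is not already stopped.
function hasLiveSession(thread: ThreadSnapshot): boolean {
  return thread.session !== null && thread.session.status !== "stopped";
}

// Best-effort cleanup: a failed session stop is logged, and terminals are closed regardless.
function cleanupAfterArchive(thread: ThreadSnapshot, deps: ArchiveDeps): void {
  if (hasLiveSession(thread)) {
    try {
      deps.stopSession(thread.threadId);
    } catch (error) {
      deps.logWarning("session stop failed for " + thread.threadId + ": " + String(error));
    }
  }
  deps.closeTerminals(thread.threadId);
}

// Usage: record the order of side effects for four made-up threads.
const steps: string[] = [];
const deps: ArchiveDeps = {
  stopSession: (id) => {
    steps.push("stop:" + id);
    if (id === "t-fail") throw new Error("boom");
  },
  closeTerminals: (id) => {
    steps.push("close:" + id);
  },
  logWarning: () => {
    steps.push("warn");
  },
};

cleanupAfterArchive({ threadId: "t-live", session: { status: "ready" } }, deps);
cleanupAfterArchive({ threadId: "t-none", session: null }, deps);
cleanupAfterArchive({ threadId: "t-stopped", session: { status: "stopped" } }, deps);
cleanupAfterArchive({ threadId: "t-fail", session: { status: "ready" } }, deps);
```

The four calls mirror the four archive integration tests listed for server.test.ts: stop then close, no session, already stopped, and stop failure.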

Fix 2 — Adapter stops old session before replacement (ClaudeAdapter.ts)

sequenceDiagram
    participant Server
    participant Adapter as ClaudeAdapter
    participant Old as Claude Process A<br/>(effort=high)
    participant New as Claude Process B<br/>(effort=max)

    Server->>Adapter: startSession({ threadId, effort: max })

    rect rgb(39, 174, 96, 0.1)
        Note over Adapter: NEW — existing session detected
        Adapter->>Adapter: logWarning("claude.session.replacing")
        Adapter->>Old: stopSessionInternal({ emitExitEvent: false })
        Old-->>Adapter: closed
    end

    Adapter->>New: spawn claude --effort max --resume
    New-->>Adapter: session.started

    Note over Old: Cleaned up
    Note over New: Only 1 process for this thread
  • Prevents orphans from effort/model/runtime-mode changes.
  • Uses emitExitEvent: false so the replacement looks seamless to the UI.
  • Catches defects from the stop call so a failed cleanup never blocks the new session.
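
The stop-before-replace guard could look roughly like the sketch below. It is a simplified, synchronous model rather than the adapter's actual code: `SessionHandle`, `makeHandle`, and `SessionRegistry` are hypothetical, and a plain try/catch stands in for the Effect-based error handling the PR uses.

```typescript
// Hypothetical handle for one spawned provider process.
interface SessionHandle {
  label: string;
  closed: boolean;
  close(): void;
}

// Builds a fake handle; failOnClose simulates a stop call that throws.
function makeHandle(label: string, failOnClose: boolean = false): SessionHandle {
  const handle: SessionHandle = {
    label,
    closed: false,
    close() {
      if (failOnClose) throw new Error("close failed for " + label);
      handle.closed = true;
    },
  };
  return handle;
}

class SessionRegistry {
  private sessions: { [threadId: string]: SessionHandle | undefined } = {};
  readonly warnings: string[] = [];

  // Stops any existing session for the thread before registering the replacement.
  startSession(threadId: string, spawn: () => SessionHandle): SessionHandle {
    const existing = this.sessions[threadId];
    if (existing !== undefined) {
      this.warnings.push("claude.session.replacing " + threadId);
      try {
        existing.close();
      } catch (error) {
        // A failed cleanup is recorded but never blocks the new session.
        this.warnings.push("stop failed: " + String(error));
      }
    }
    const next = spawn();
    this.sessions[threadId] = next;
    return next;
  }

  activeLabels(): string[] {
    const labels: string[] = [];
    for (const threadId of Object.keys(this.sessions)) {
      const session = this.sessions[threadId];
      if (session !== undefined) labels.push(session.label);
    }
    return labels;
  }
}

// Usage: an effort change on thread-1, then a replacement whose cleanup fails on thread-2.
const registry = new SessionRegistry();
const first = registry.startSession("thread-1", () => makeHandle("effort=high"));
const second = registry.startSession("thread-1", () => makeHandle("effort=max"));
registry.startSession("thread-2", () => makeHandle("opus", true));
const replacement = registry.startSession("thread-2", () => makeHandle("sonnet"));
```

Keeping one entry per `threadId` means the registry can never list two sessions for the same thread, which is the invariant the TC-C1 to TC-C3 cases check at the process level.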

Fix 3 — Background reaper sweeps stale sessions (ProviderSessionReaper.ts)

sequenceDiagram
    participant Timer as Schedule.spaced<br/>(every 30 min)
    participant Reaper as ProviderSessionReaper
    participant Dir as ProviderSessionDirectory
    participant OE as OrchestrationEngine
    participant PS as ProviderService

    Timer->>Reaper: sweep tick
    Reaper->>Dir: listBindings()
    Dir-->>Reaper: [binding₁, binding₂, binding₃]
    Reaper->>OE: getReadModel()
    OE-->>Reaper: threads with session state

    loop For each binding
        alt status = "stopped"
            Note over Reaper: skip (already stopped)
        else lastSeenAt within threshold
            Note over Reaper: skip (still fresh)
        else thread has activeTurnId
            Note over Reaper: skip (turn in progress)
        else idle > threshold & no active turn
            Reaper->>PS: stopSession({ threadId })
            PS-->>Reaper: OK or error (caught, continues)
            Note over Reaper: Reaped
        end
    end

    Note over Reaper: Log sweep-complete if reapedCount > 0
  • Defense-in-depth — catches leaks from any code path, including future ones not yet audited.
  • Handles sessions that survive a crash or are left behind by unknown code paths.
  • Scoped to the reactor lifecycle; shuts down cleanly with the server.
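
The skip rules in the sweep diagram can be sketched as a pure decision function plus a loop that tolerates individual stop failures. This is an assumption-laden illustration, not the reaper's source: `BindingRow`, `isReapable`, and `sweep` are hypothetical, timestamps are assumed to be ISO-8601 strings, and whether a failed stop counts toward `reapedCount` is a guess (here it does not).

```typescript
interface BindingRow {
  threadId: string;
  status: string;
  lastSeenAt: string; // assumed ISO-8601
}

// Name taken from the commit note; 30 minutes per the description.
const DEFAULT_INACTIVITY_THRESHOLD_MS = 30 * 60 * 1000;

// Applies the skip rules in the order the sweep diagram lists them.
function isReapable(
  binding: BindingRow,
  activeTurnThreadIds: readonly string[],
  nowMs: number,
  thresholdMs: number,
): boolean {
  if (binding.status === "stopped") return false; // already stopped
  const lastSeenMs = Date.parse(binding.lastSeenAt);
  if (isNaN(lastSeenMs)) return false; // assumption: leave unparseable rows alone
  if (nowMs - lastSeenMs <= thresholdMs) return false; // still fresh
  return activeTurnThreadIds.indexOf(binding.threadId) === -1; // skip if a turn is in progress
}

// Stops every reapable session; one failed stop does not abort the sweep.
function sweep(
  bindings: readonly BindingRow[],
  activeTurnThreadIds: readonly string[],
  nowMs: number,
  stopSession: (threadId: string) => void,
  thresholdMs: number = DEFAULT_INACTIVITY_THRESHOLD_MS,
): number {
  let reapedCount = 0;
  for (const binding of bindings) {
    if (!isReapable(binding, activeTurnThreadIds, nowMs, thresholdMs)) continue;
    try {
      stopSession(binding.threadId);
      reapedCount += 1;
    } catch (error) {
      // Swallowed on purpose so the next binding is still processed.
    }
  }
  return reapedCount;
}

// Usage with made-up rows, evaluated at 12:00 UTC.
const nowMs = Date.parse("2026-04-15T12:00:00Z");
const rows: BindingRow[] = [
  { threadId: "t-stale", status: "ready", lastSeenAt: "2026-04-15T11:00:00Z" },
  { threadId: "t-fresh", status: "ready", lastSeenAt: "2026-04-15T11:50:00Z" },
  { threadId: "t-busy", status: "ready", lastSeenAt: "2026-04-15T10:00:00Z" },
  { threadId: "t-stopped", status: "stopped", lastSeenAt: "2026-04-15T09:00:00Z" },
  { threadId: "t-flaky", status: "ready", lastSeenAt: "2026-04-15T10:30:00Z" },
  { threadId: "t-late", status: "ready", lastSeenAt: "2026-04-15T11:15:00Z" },
];
const attempted: string[] = [];
const reapedCount = sweep(rows, ["t-busy"], nowMs, (threadId) => {
  attempted.push(threadId);
  if (threadId === "t-flaky") throw new Error("stop failed");
});
```

Separating the decision from the side effect makes the threshold and active-turn rules testable without a timer or a running provider.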

All three fixes are independent; each can be reverted without affecting the others.


Verification

Automated tests

11 new test cases covering every changed code path:

| Test | Location | What it verifies |
| --- | --- | --- |
| closes the previous session before replacing an existing thread session | ClaudeAdapter.test.ts | Replacement startSession calls close() on the first query; second query stays open; active sessions list shows only the replacement |
| stops the provider session and closes thread terminals after archive | server.test.ts | Archive dispatches thread.session.stop then closes terminals, in that order |
| archives without dispatching session stop when the thread has no session | server.test.ts | Archive skips session stop when thread.session is null |
| archives without dispatching session stop when the thread session is already stopped | server.test.ts | Archive skips session stop when session.status === "stopped" |
| archives and still closes terminals when session stop fails | server.test.ts | Session stop failure is caught; archive result and terminal close still succeed |
| reaps stale persisted sessions without active turns | ProviderSessionReaper.test.ts | Session past threshold with no activeTurnId is stopped |
| skips stale sessions when the thread still has an active turn | ProviderSessionReaper.test.ts | Active turn protects the session from reaping |
| does not reap sessions that are still within the inactivity threshold | ProviderSessionReaper.test.ts | Fresh lastSeenAt keeps the session alive |
| skips persisted sessions that are already marked stopped | ProviderSessionReaper.test.ts | status: "stopped" is a no-op for the reaper |
| continues reaping other sessions when one stop attempt fails | ProviderSessionReaper.test.ts | One failed stop does not prevent reaping the next session |
| lists persisted bindings with metadata in oldest-first order | ProviderSessionDirectory.test.ts | listBindings() returns all rows with lastSeenAt, sorted by lastSeenAt ascending |

Manual verification — before-fix baseline

A full matrix of 9 manual test cases was executed against the unfixed main build to establish the baseline leak behavior. Each case used two custom diagnostic scripts to capture process state at critical moments.

Diagnostic scripts

Two lightweight shell scripts were used during manual verification to capture process state at each step. They are included in the commit for reproducibility.

issue-2007-process-snapshot.sh — full process snapshot with optional persistence

A pgrep + ps wrapper that prints a timestamped snapshot of all T3 Code-related processes (Claude, Codex, Electron, src/bin.ts). When invoked with --label <name>, it also persists the snapshot to .logs/issue-2007/process-snapshots/<timestamp>-<label>.txt for later diff comparison. The ps output includes PID, PPID, elapsed time, RSS (memory), and the full command line — enough to identify session IDs, models, effort levels, and permission modes from process argv.

#!/usr/bin/env bash
set -euo pipefail

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
repo_root="$(cd "${script_dir}/.." && pwd)"
snapshot_dir="${repo_root}/.logs/issue-2007/process-snapshots"
process_pattern='(^|/|[[:space:]])claude([[:space:]]|$)|codex app-server|src/bin.ts|T3 Code|Electron'
label=""

usage() {
  cat <<'EOF'
Usage: ./scripts/issue-2007-process-snapshot.sh [--label <name>]

Print the Issue #2007 process snapshot used in the reproduction guide.
When --label is provided, the snapshot is also written under:
  .logs/issue-2007/process-snapshots/
EOF
}

sanitize_label() {
  local value="$1"
  value="$(printf '%s' "${value}" | tr '[:upper:]' '[:lower:]')"
  value="$(printf '%s' "${value}" | sed -E 's/[^a-z0-9._-]+/-/g; s/^-+//; s/-+$//')"
  if [[ -z "${value}" ]]; then
    value="snapshot"
  fi
  printf '%s\n' "${value}"
}

render_snapshot() {
  printf '\n=== %s ===\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')"
  echo "-- pgrep --"
  pgrep -fal "${process_pattern}" || true
  echo "-- ps --"
  ps -Ao pid,ppid,etime,rss,command | grep -E "${process_pattern}" | grep -v "grep -E" || true
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --label)
      if [[ $# -lt 2 ]]; then
        echo "Error: --label requires a value." >&2
        exit 1
      fi
      label="$2"
      shift 2
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Error: unknown argument: $1" >&2
      usage >&2
      exit 1
      ;;
  esac
done

if [[ -n "${label}" ]]; then
  mkdir -p "${snapshot_dir}"
  snapshot_path="${snapshot_dir}/$(date -u '+%Y%m%dT%H%M%SZ')-$(sanitize_label "${label}").txt"
  render_snapshot | tee "${snapshot_path}"
  printf '\nSaved snapshot to %s\n' "${snapshot_path}" >&2
  exit 0
fi

render_snapshot

Usage:

# Print to stdout
./scripts/issue-2007-process-snapshot.sh

# Print and persist with a label
./scripts/issue-2007-process-snapshot.sh --label tc-a1-after-archive
issue-2007-claude-processes.sh — Claude-only process filter / counter

A focused pgrep filter that shows only processes whose command line matches the pattern (^|/|[[:space:]])claude([[:space:]]|$) — i.e., actual claude binary invocations, not unrelated matches. The --count flag prints just the total, useful for quick before/after comparisons.

#!/usr/bin/env bash
set -euo pipefail

mode="list"
claude_pattern='(^|/|[[:space:]])claude([[:space:]]|$)'

usage() {
  cat <<'EOF'
Usage: ./scripts/issue-2007-claude-processes.sh [--count]

List Claude-related processes for Issue #2007 reproduction checks.
Use --count when a case only needs the current process total.
EOF
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --count)
      mode="count"
      shift
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Error: unknown argument: $1" >&2
      usage >&2
      exit 1
      ;;
  esac
done

if [[ "${mode}" == "count" ]]; then
  printf '%s\n' "$(pgrep -fal "${claude_pattern}" 2>/dev/null | wc -l | tr -d '[:space:]')"
  exit 0
fi

pgrep -fal "${claude_pattern}" || true

Usage:

# List Claude processes
./scripts/issue-2007-claude-processes.sh

# Just the count
./scripts/issue-2007-claude-processes.sh --count
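
To see what the `claude_pattern` used by the scripts above does and does not match, here is a small TypeScript translation of it. Treat it as an approximation: JavaScript has no `[[:space:]]` class, so `\s` is substituted, and the sample command lines are made up for illustration.

```typescript
// JavaScript translation of the POSIX ERE (^|/|[[:space:]])claude([[:space:]]|$).
// Assumption: \s is close enough to [[:space:]] for ASCII command lines.
const claudePattern = /(^|\/|\s)claude(\s|$)/;

// True when "claude" appears as a standalone command name or path component.
function isClaudeInvocation(commandLine: string): boolean {
  return claudePattern.test(commandLine);
}

// Made-up command lines, not captured from a real machine.
const sampleCommands: string[] = [
  "claude --session-id abc --print",
  "/usr/local/bin/claude --effort max --resume",
  "node /opt/app/claudeAdapter.js",
  "claude-helper --watch",
  "bash -c claude",
];

const matchedCommands = sampleCommands.filter(isClaudeInvocation);
```

The leading `(^|/|\s)` and trailing `(\s|$)` anchors are what keep names such as `claudeAdapter.js` or `claude-helper` out of the count.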

Before-fix results matrix

27 labeled snapshots were collected across all cases. The results corroborate the investigation hypothesis: archive and session-restart paths leave prior Claude processes running, while delete and full shutdown clean them up.

| ID | Scenario | Before-fix result | Evidence |
| --- | --- | --- | --- |
| TC-A1 | Archive 1 Claude thread | Leak confirmed | Claude process PID 63697 (--session-id 296187f1-…) present before archive → still present after archive → still present after unarchive/resume (same PID reused, never stopped) |
| TC-A2 | Archive 5 Claude threads | Accumulation confirmed | 5 threads spawned 5 Claude PIDs (94071, 96911, 97203, 97480, 97916). After archiving all 5 one by one, --count stayed at 5 through every snapshot. Process count never dropped. |
| TC-A3 | Delete 1 Claude thread (control) | Delete cleans up | 1 Claude PID (48142) before delete → 0 Claude processes after delete. Strong contrast with TC-A1/A2: thread.session.stop is dispatched in the delete path but not in archive. |
| TC-C1 | Change effort (high → max) | Overlap confirmed | After effort change: 2 Claude processes, PID 67734 (--effort high) + PID 72031 (--effort max --resume). Old process not stopped before new one started. |
| TC-C2 | Change model (opus → sonnet) | Overlap confirmed | After model change: 2 Claude processes, PID 87274 (opus) + PID 90697 (sonnet --resume). Same orphan class as TC-C1. |
| TC-C3 | Change runtime mode | Overlap confirmed | After runtime change: 2 Claude processes, PID 1738 (bypassPermissions) + PID 6051 (default permission mode --resume). Same orphan class. |
| TC-D1 | Server stop/start + resume | Shutdown cleans up; resume works | 1 Claude PID (19719) before shutdown → 0 after clean shutdown → 1 new PID (24761, --resume 2591a34f-…) after restart. OS-level parent exit reaps children correctly. |
| TC-D2 | Clean app exit | Inferred clean from TC-D1 | Same shutdown mechanism; no stray processes observed. |
| TC-B1 | 35+ min idle | Not executed | Idle-reaper behavior is new in this PR; no baseline to compare against (by design, since the reaper didn't exist before). |

TC-A1 lifecycle — before vs. after fix

sequenceDiagram
    participant Script as Snapshot Script
    participant App as T3 Code Server
    participant Claude as Claude Process

    rect rgb(231, 76, 60, 0.08)
        Note over Script,Claude: BEFORE FIX
        Script->>Script: tc-a1-baseline (14:12:42)
        Note over Claude: No Claude processes

        App->>Claude: startSession (PID 63697)
        Script->>Script: tc-a1-ready (14:15:41)
        Note over Claude: PID 63697 alive<br/>session 296187f1

        App->>App: thread.archive dispatched
        Note over App: Only terminal closed
        Script->>Script: tc-a1-after-archive (14:16:33)
        Note over Claude: PID 63697 STILL ALIVE

        App->>App: thread unarchived + RESUMED
        Script->>Script: tc-a1-after-resume (14:17:36)
        Note over Claude: PID 63697 STILL ALIVE<br/>Same PID, never stopped
    end

    rect rgb(39, 174, 96, 0.08)
        Note over Script,Claude: AFTER FIX
        App->>Claude: startSession (new PID)
        Note over Claude: 1 Claude process

        App->>App: thread.archive dispatched
        App->>App: thread.session.stop dispatched
        App->>Claude: stopSession
        Claude-->>App: exited
        Note over Claude: 0 Claude processes

        App->>Claude: Resume → startSession (new PID)
        Note over Claude: 1 Claude process (cold resume)
    end

TC-C1 effort change — before vs. after fix

sequenceDiagram
    participant Script as Snapshot Script
    participant Adapter as ClaudeAdapter
    participant OldProc as Claude Process A
    participant NewProc as Claude Process B

    rect rgb(231, 76, 60, 0.08)
        Note over Script,NewProc: BEFORE FIX
        Adapter->>OldProc: startSession (PID 67734, effort=high)
        Script->>Script: tc-c1-before-effort-change (14:40:04)
        Note over OldProc: 1 process

        Note over Adapter: No stop-before-replace logic
        Adapter->>NewProc: startSession (PID 72031, effort=max)
        Script->>Script: tc-c1-after-effort-change (14:40:52)
        Note over OldProc,NewProc: 2 processes (LEAK)
    end

    rect rgb(39, 174, 96, 0.08)
        Note over Script,NewProc: AFTER FIX
        Adapter->>OldProc: startSession (effort=high)
        Note over OldProc: 1 process

        Adapter->>OldProc: stopSessionInternal (emitExitEvent: false)
        OldProc-->>Adapter: closed
        Adapter->>NewProc: startSession (effort=max)
        Note over NewProc: 1 process (clean replacement)
    end

After-fix verification — full results

The same test matrix was re-executed against the fix build. All cases that previously leaked now clean up:

graph TB
    subgraph "Before fix — Claude process count"
        direction LR
        B_A1["TC-A1<br/>Archive 1<br/><b>1 leaked</b>"]
        B_A2["TC-A2<br/>Archive 5<br/><b>5 leaked</b>"]
        B_C1["TC-C1<br/>Effort Δ<br/><b>2 (overlap)</b>"]
        B_C2["TC-C2<br/>Model Δ<br/><b>2 (overlap)</b>"]
        B_C3["TC-C3<br/>Runtime Δ<br/><b>2 (overlap)</b>"]
        B_D1["TC-D1<br/>Shutdown<br/><b>0 (clean)</b>"]
    end

    subgraph "After fix — Claude process count"
        direction LR
        A_A1["TC-A1<br/>Archive 1<br/><b>0</b>"]
        A_A2["TC-A2<br/>Archive 5<br/><b>0</b>"]
        A_C1["TC-C1<br/>Effort Δ<br/><b>1</b>"]
        A_C2["TC-C2<br/>Model Δ<br/><b>1</b>"]
        A_C3["TC-C3<br/>Runtime Δ<br/><b>1</b>"]
        A_D1["TC-D1<br/>Shutdown<br/><b>0 (clean)</b>"]
    end

    B_A1 -.->|"fixed"| A_A1
    B_A2 -.->|"fixed"| A_A2
    B_C1 -.->|"fixed"| A_C1
    B_C2 -.->|"fixed"| A_C2
    B_C3 -.->|"fixed"| A_C3
    B_D1 -.->|"unchanged"| A_D1

    style B_A1 fill:#e74c3c,color:#fff
    style B_A2 fill:#e74c3c,color:#fff
    style B_C1 fill:#e67e22,color:#fff
    style B_C2 fill:#e67e22,color:#fff
    style B_C3 fill:#e67e22,color:#fff
    style B_D1 fill:#27ae60,color:#fff

    style A_A1 fill:#27ae60,color:#fff
    style A_A2 fill:#27ae60,color:#fff
    style A_C1 fill:#27ae60,color:#fff
    style A_C2 fill:#27ae60,color:#fff
    style A_C3 fill:#27ae60,color:#fff
    style A_D1 fill:#27ae60,color:#fff

| Test case | Before fix | After fix | Fixed by |
| --- | --- | --- | --- |
| TC-A1 (archive 1) | 1 leaked process | 0 processes after archive | Fix 1: archive stops session |
| TC-A2 (archive 5) | 5 leaked processes | 0 processes after all archives | Fix 1: archive stops session |
| TC-A3 (delete, control) | 0 (already worked) | 0 (unchanged) | N/A |
| TC-C1 (effort change) | 2 processes (overlap) | 1 process (clean replace) | Fix 2: adapter stop-before-replace |
| TC-C2 (model change) | 2 processes (overlap) | 1 process (clean replace) | Fix 2: adapter stop-before-replace |
| TC-C3 (runtime change) | 2 processes (overlap) | 1 process (clean replace) | Fix 2: adapter stop-before-replace |
| TC-D1 (shutdown + resume) | 0 after shutdown, 1 on resume | Unchanged | N/A |
| TC-D2 (clean exit) | 0 (clean) | Unchanged | N/A |
| TC-B1 (35 min idle) | Not tested (no reaper existed) | Reaper stops session after threshold; provider.session.reaped logged | Fix 3: background reaper |

Checklist

  • [~] This PR is small and focused: I don't think it's small, but it is focused; it touches several systems.
  • I explained what changed and why
  • No UI changes (server-side process lifecycle only)
  • Automated tests cover all new and changed code paths
  • Manual before/after verification completed with process snapshots

Note

Medium Risk
Touches core session/process lifecycle and introduces a background sweeper that can stop sessions; incorrect thresholds, lastSeenAt parsing, or status checks could cause unexpected session termination or mask cleanup failures.

Overview
Prevents provider process leaks by making session lifecycle cleanup best-effort but explicit: thread.archive now conditionally dispatches thread.session.stop (based on the thread’s current session status) before closing terminals, and both ClaudeAdapter and CodexAppServerManager now dispose/stop any existing session for the same threadId before starting a replacement.

Adds a new ProviderSessionReaper service, wired into server startup, that periodically scans persisted provider session bindings (via new ProviderSessionDirectory.listBindings() returning lastSeenAt metadata) and stops sessions that have been idle past a configurable threshold when no active turn is running; ProviderService.startSession also stops stale sessions in other providers after a successful start.

Includes focused test coverage for session replacement, archive-stop behavior (including failure/defect tolerance), directory binding listing order/metadata, and reaper behavior; adds a test:process-reaper script to run the relevant subset.

Reviewed by Cursor Bugbot for commit 59fdbb0. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Fix Claude and Codex process leaks, stop stale sessions on archive, and add background session reaping

  • startSession in ClaudeAdapter and CodexAppServerManager now disposes any existing session for the same thread before starting a replacement, without emitting a lifecycle exit event; disposal failures are logged but do not block the new session.
  • ProviderService.startSession stops sessions on other providers for the same thread after a successful start (e.g. switching from Codex to Claude cleans up the Codex session).
  • The thread.archive command handler in ws.ts now dispatches thread.session.stop before closing terminals, skipping the stop if no active session exists; failures are logged without affecting the archive result.
  • A new ProviderSessionReaper background service periodically sweeps idle provider sessions exceeding a configurable inactivity threshold, skipping sessions with an active turn or already stopped, and is started on server startup.
  • ProviderSessionDirectory gains a listBindings method returning all persisted bindings with metadata, used by the reaper to find stale sessions.
  • Behavioral Change: replacing an active session no longer emits session.exited/session/closed for the replaced session.

Macroscope summarized 59fdbb0.

- Raise DEFAULT_INACTIVITY_THRESHOLD_MS from 5 to 30 minutes
- Add test:process-reaper script for targeted reaper-related test runs
@coderabbitai

coderabbitai bot commented Apr 15, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 542b0c82-1c74-4eaf-a98b-f936ebd40aa5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@github-actions github-actions bot added labels vouch:unvouched (PR author is not yet trusted in the VOUCHED list) and size:L (100-499 changed lines, additions + deletions) Apr 15, 2026
Comment thread apps/server/src/provider/Layers/ClaudeAdapter.ts
@macroscopeapp
Contributor

macroscopeapp bot commented Apr 15, 2026

Approvability

Verdict: Needs human review

Introduces a new background ProviderSessionReaper service with scheduled sweeps to stop stale sessions, plus session replacement logic across multiple adapters. These are significant runtime behavior changes affecting session lifecycle management that warrant human review.


- Apply prettier formatting throughout ClaudeAdapter.ts
- Replace Effect.catchDefect with Effect.catchCause when stopping replaced sessions so both typed failures and unexpected defects are caught
Comment thread apps/server/src/provider/Layers/ProviderSessionReaper.ts
- Replace Effect.catch with Effect.catchCause so fatal defects during
  session stop don't abort the entire reap cycle
- Add test verifying reaper continues to subsequent sessions after a defect
Comment thread apps/server/src/ws.ts
- Switch from Effect.catch to Effect.catchCause so archive handles
  both expected errors and defects (e.g. Effect.die) during session stop
- Add test verifying terminals still close when session stop defects
@crafael23
Contributor Author

I'm sorry if this is rather big, but I'll be using this PR version for now, since I constantly find my RAM filled to its max 90% of the time after a couple of chats.

If this doesn't get merged it's okay, just thought this would be an appropriate and very helpful solution.

@crafael23 crafael23 changed the title from "Fixfeat claude process cleanup process reaper" to "Fix Claude Process leak[MEMORY INTENSIVE], archiving, and stale claude session monitoring." Apr 16, 2026
Berkay2002 added a commit to Berkay2002/bcode that referenced this pull request Apr 16, 2026
Co-authored-by: codex <codex@users.noreply.github.com>
Comment thread apps/server/src/codexAppServerManager.ts Outdated
Co-authored-by: codex <codex@users.noreply.github.com>
Comment thread apps/server/src/ws.ts Outdated
juliusmarminge and others added 2 commits April 16, 2026 14:09
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 83b005e.

Comment thread apps/server/src/ws.ts Outdated
Co-authored-by: codex <codex@users.noreply.github.com>
@juliusmarminge juliusmarminge merged commit e0117b2 into pingdotgg:main Apr 16, 2026
11 of 12 checks passed
znoraka pushed a commit to znoraka/t3code that referenced this pull request Apr 17, 2026
…e session monitoring. (pingdotgg#2042)

Co-authored-by: Julius Marminge <julius0216@outlook.com>
Co-authored-by: codex <codex@users.noreply.github.com>
aaditagrawal added a commit to aaditagrawal/t3code that referenced this pull request Apr 18, 2026
Integrates upstream/main (9df3c64) on top of fork's main (9602c18).

Upstream features adopted:
- Claude Opus 4.5 and 4.7 built-in models (pingdotgg#2072, pingdotgg#2143)
- Node-native TypeScript migration across desktop/server (pingdotgg#2098)
- Configurable project grouping with client-settings overrides (pingdotgg#2055, pingdotgg#2099)
- Thread status in command palette (pingdotgg#2107)
- Responsive composer / plan sidebar on narrow windows (pingdotgg#1198)
- Capture-phase CTRL+J keydown for Windows terminal toggle (pingdotgg#2113/pingdotgg#2142)
- Bypass xterm for global terminal shortcuts (pingdotgg#1580)
- Windows ARM build target (pingdotgg#2080)
- Windows PATH hydration + repair (pingdotgg#1729)
- Gitignore-aware workspace search (pingdotgg#2078)
- Claude process leak fix + stale session monitoring (pingdotgg#2042)
- Preserve provider bindings when stopping sessions (pingdotgg#2084)
- Clean up invalid pending-approval projections (pingdotgg#2106) — new migration
- Extract backend startup readiness coordination
- Drop stale text-gen options on reset (pingdotgg#2076)
- Extend negative repository identity cache TTL (pingdotgg#2083)
- Allow deleting non-empty projects from warning toast (pingdotgg#1264)
- Restore defaults only on General settings (pingdotgg#1710)
- Release workflow modernization (blacksmith runners, GitHub App token guards, v0.0.20 version bump)

Fork features preserved:
- All 8 providers (codex, claudeAgent, copilot, cursor, opencode,
  geminiCli, amp, kilo) with their adapters, services, and tests
- Fork's custom OpenCode protocol impl in apps/server/src/opencode/ (kept
  over upstream's @opencode-ai/sdk-based provider added in pingdotgg#1758 — fork's
  version is tested and integrated; upstream's parallel files deleted)
- Fork's direct-CLI Cursor adapter (kept over upstream's new ACP-based
  CursorProvider added in pingdotgg#1355 — upstream's parallel files deleted)
- Fork's ProviderRegistry aggregates only codex + claudeAgent snapshots;
  the other 6 providers register via ProviderAdapterRegistry
- PROVIDER_CACHE_IDS stays at [codex, claudeAgent] matching what the
  registry actually caches
- Migration IDs preserved (fork 23/24/25/26; upstream's new 025 lands at
  ID 27 to avoid re-applying on deployed fork DBs)
- Fork's generic per-provider settings (enabled/binaryPath/configDir/
  customModels) kept over upstream's opencode-specific serverUrl/password
- Log directory IPC channels, updateInstallInFlight tracking, icon
  composer pipeline all preserved
- Fork's simplified release.yml (no npm CLI publish, no nightly infra)
- composerDraftStore normalizeProviderKind widened to accept all 8 kinds
- Dark mode --background set to #0f0f0f

Test status:
- All 9 package typechecks pass
- Lint clean (0 errors)
- Tests: 1877 passed, 15 skipped (incl. 4 historically-flaky GitManager
  cross-repo PR selector tests newly gated with TODO for Node-native-TS
  follow-up)