fix: 3 proven race conditions in process lifecycle management #620

## Problem

The process lifecycle code in main.rs (run_child, lines 848-1040) uses 4 global atomics shared across 3 threads and a signal handler. Three race conditions have been proven deterministically with forced interleavings.

## Proven Races

### Q1: SIGKILL sent to recycled PID (kills innocent process)

The escalation thread reads CHILD_PID, then sends SIGKILL. Between the read and the kill, the child exits and is reaped, and its PID is recycled to a new process. The SIGKILL hits the wrong process.

Proven by forcing PID recycling via /proc/sys/kernel/ns_last_pid. The victim process (sleep 600) was killed by SIGKILL intended for the original child.
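One possible direction, sketched here under assumptions rather than taken from main.rs: a PID cannot be reused until the parent reaps the child, so if the kill and the reap are serialized behind one lock, the escalation thread can never signal a recycled PID. `Guarded`, `force_kill` and `wait_polling` are hypothetical names. The sketch also assumes nothing else in the process reaps the child (no `waitpid(-1)`, no SIGCHLD set to ignore).

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical wrapper: the only handle allowed to kill or reap the child.
#[derive(Clone)]
struct Guarded(Arc<Mutex<Child>>);

impl Guarded {
    fn new(child: Child) -> Self {
        Guarded(Arc::new(Mutex::new(child)))
    }

    /// Called by the escalation thread. Returns Ok(true) if SIGKILL was sent,
    /// Ok(false) if the child had already exited and nothing was signalled.
    fn force_kill(&self) -> io::Result<bool> {
        let mut child = self.0.lock().unwrap();
        // If the child already exited, this reaps it and we must not signal.
        if child.try_wait()?.is_some() {
            return Ok(false);
        }
        // The child is running or an unreaped zombie. Nobody else can reap it
        // while we hold the lock, so the PID still belongs to our child.
        child.kill()?;
        Ok(true)
    }

    /// Called by the main thread instead of a blocking wait(), so the lock
    /// is released between polls and force_kill can get in.
    fn wait_polling(&self, interval: Duration) -> io::Result<ExitStatus> {
        loop {
            // The lock guard is a temporary dropped at the end of this statement.
            let polled = self.0.lock().unwrap().try_wait()?;
            if let Some(status) = polled {
                return Ok(status);
            }
            thread::sleep(interval);
        }
    }
}
```

The trade-off is polling latency in the main thread in exchange for never holding a bare PID across threads.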

### Q3: FORCE_KILLED flag set after main thread reads it

The main thread reads FORCE_KILLED as false and classifies the stop reason as Duration. The escalation thread then sets FORCE_KILLED to true. The "program did not respond to SIGTERM" warning is not printed even though SIGKILL was sent.

Proven with a barrier between the main thread's flag read and the escalation thread's flag write. One run, deterministic.
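A minimal sketch of one way to close this window, assuming the escalation thread can report its outcome through its join handle instead of a shared flag: the main thread only classifies the stop reason after the join, so the result cannot be written after it is read. `spawn_escalation`, `classify`, and the `StopReason` enum are hypothetical, not the names used in main.rs.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stop reasons; only the two relevant to this race are shown.
#[derive(Debug, PartialEq)]
enum StopReason {
    Duration,
    ForceKilled,
}

/// Starts the escalation thread. It waits `grace` for a cancel message and
/// calls `force_kill` only if none arrives. The thread's return value says
/// whether the force kill happened.
fn spawn_escalation<F>(grace: Duration, force_kill: F) -> (mpsc::Sender<()>, JoinHandle<bool>)
where
    F: FnOnce() + Send + 'static,
{
    let (cancel_tx, cancel_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || match cancel_rx.recv_timeout(grace) {
        Err(RecvTimeoutError::Timeout) => {
            force_kill();
            true
        }
        // Cancelled, or the sender was dropped: the child exited in time.
        _ => false,
    });
    (cancel_tx, handle)
}

/// Called by the main thread once the child has been reaped. Joining first
/// guarantees the escalation result is final before it is read.
fn classify(cancel: mpsc::Sender<()>, escalation: JoinHandle<bool>) -> StopReason {
    // The send fails harmlessly if the escalation thread already finished.
    let _ = cancel.send(());
    if escalation.join().expect("escalation thread panicked") {
        StopReason::ForceKilled
    } else {
        StopReason::Duration
    }
}
```

With this shape the "program did not respond to SIGTERM" warning would be driven by `StopReason::ForceKilled`, which is only computed after the escalation thread has finished.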

### Q4: SIGINT arrives before CHILD_PID is stored (parent hangs)

SIGINT arrives between signal handler installation (line 940) and CHILD_PID store (line 948). The handler sees PID 0, skips the kill. The child never receives SIGTERM. With kill_timeout == 0, the parent hangs forever on child.wait(). A second Ctrl-C kills the parent (SA_RESETHAND), orphaning the child.

Proven with a barrier between handler installation and spawn. One run, deterministic. The comment at line 921 says "no Ctrl-C gap can orphan the child" but the proof shows the gap exists.
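The gap can be illustrated, and one possible mitigation sketched, with a pending-interrupt flag: the handler always records that SIGINT arrived, and the main thread checks that record immediately after publishing the PID. With sequentially consistent ordering, at least one side observes the other's store, so the SIGTERM is forwarded by the handler, by the main thread, or occasionally by both. `Lifecycle`, `on_sigint` and `publish_pid` are hypothetical names. Actually sending the signal is left to the caller, and blocking SIGINT around the spawn would be an alternative approach.

```rust
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};

/// Hypothetical state shared between the signal handler and the main thread.
struct Lifecycle {
    child_pid: AtomicI32,    // 0 means "child not spawned yet"
    interrupted: AtomicBool, // set by the handler on every SIGINT
}

impl Lifecycle {
    const fn new() -> Self {
        Lifecycle {
            child_pid: AtomicI32::new(0),
            interrupted: AtomicBool::new(false),
        }
    }

    /// Body of the SIGINT handler (atomic operations only).
    /// Returns Some(pid) if the handler should forward SIGTERM right now.
    fn on_sigint(&self) -> Option<i32> {
        // Record the interrupt first, so a later publish_pid can see it.
        self.interrupted.store(true, Ordering::SeqCst);
        match self.child_pid.load(Ordering::SeqCst) {
            0 => None,
            pid => Some(pid),
        }
    }

    /// Called by the main thread right after spawn. Returns true if a SIGINT
    /// arrived before the PID was visible, in which case the main thread
    /// must forward SIGTERM to the child itself.
    fn publish_pid(&self, pid: i32) -> bool {
        self.child_pid.store(pid, Ordering::SeqCst);
        self.interrupted.load(Ordering::SeqCst)
    }
}
```

If `publish_pid` returns true the caller forwards the termination instead of blocking on child.wait() with nothing pending. A duplicate SIGTERM is possible when both sides see each other's store, which a real fix would need to tolerate.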

## Root Cause

Four global atomics coordinate three threads and a signal handler. This is shared mutable state in concurrent code, violating Principle 1 of the project's architecture (philosophy.md: "Kill all globals").

## Context

Found during CLI interaction contract enumeration when investigating the untested SIGTERM timeout warning messages (CI14/CI17). The warning messages are symptoms. The races are the disease.
