fix: prevent data races and panics in SDK stop/heartbeat handling #52
The preflight SDK shares the same two concurrency bugs just fixed in action-kit (#457):
- The shared stopEvents slice was appended to, resliced and ranged over from the HTTP stop and status handlers, the heartbeat-timeout goroutine and the signal handler with no lock, which is a data race. Guard it with a mutex.
- heartbeat.Monitor.Stop closed the pulse channel non-idempotently and RecordHeartbeat did a blocking send, so two racing stop paths could double-close and a concurrent RecordHeartbeat could send on a closed channel. Both panic in a non-recovered goroutine, crashing the process. Make Stop idempotent and RecordHeartbeat a non-blocking, closed-safe send, and use LoadAndDelete in stopMonitorHeartbeat.

Adds race tests for both paths.
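The stopEvents guard described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: the `stopEvent` fields, the `maxStopEvents` cap and the function signatures are assumptions made for the example; only the names `stopEvents`, `markAsStopped` and `getStopEvent` come from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// stopEvent is a hypothetical record of why an execution stopped.
type stopEvent struct {
	executionID string
	reason      string
}

// maxStopEvents is an assumed cap; the real limit is not stated in the PR.
const maxStopEvents = 100

var (
	stopEventsMu sync.Mutex // guards stopEvents
	stopEvents   []stopEvent
)

// markAsStopped appends an event and drops the oldest one once the cap is exceeded.
func markAsStopped(executionID, reason string) {
	stopEventsMu.Lock()
	defer stopEventsMu.Unlock()
	stopEvents = append(stopEvents, stopEvent{executionID, reason})
	if len(stopEvents) > maxStopEvents {
		stopEvents = stopEvents[1:]
	}
}

// getStopEvent returns a copy of the matching event, if one exists.
func getStopEvent(executionID string) (stopEvent, bool) {
	stopEventsMu.Lock()
	defer stopEventsMu.Unlock()
	for _, e := range stopEvents {
		if e.executionID == executionID {
			return e, true
		}
	}
	return stopEvent{}, false
}

func main() {
	// Writers and readers run concurrently, as the HTTP handlers,
	// the heartbeat-timeout goroutine and the signal handler would.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("exec-%d", i)
			markAsStopped(id, "heartbeat timeout")
			getStopEvent(id)
		}(i)
	}
	wg.Wait()
	_, ok := getStopEvent("exec-3")
	fmt.Println(ok)
}
```

Returning a copy of the event rather than a pointer into the slice keeps callers from touching shared memory after the lock is released.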
Claude finished @joshiste's task in 2m 18s. Claude's Review
Summary: The fix correctly mirrors the action-kit#457 pattern and addresses both races described in the PR body. No bugs found.
Not a new issue, just confirming scope: the broader map/goroutine leak on
Verification: No blocking issues. LGTM.
monitorHeartbeatWithCallback stored a new monitor for an execution id without stopping any monitor already registered under that id, so a repeated Start (same execution id) leaked the previous monitor's two goroutines. Swap-and-Stop the previous monitor (Stop is idempotent). Adds a goroutine-leak regression test.
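A rough sketch of the swap-and-stop idea, assuming a `sync.Map` keyed by execution id. The `monitor` type, `startMonitor` and `stopMonitor` are hypothetical stand-ins for the SDK's `heartbeat.Monitor`, `monitorHeartbeatWithCallback` and `stopMonitorHeartbeat`; `sync.Map.Swap` requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// monitor is a hypothetical stand-in for heartbeat.Monitor.
type monitor struct {
	once sync.Once
	done chan struct{}
}

func newMonitor() *monitor {
	m := &monitor{done: make(chan struct{})}
	go func() { <-m.done }() // stands in for the monitor's background goroutines
	return m
}

// Stop is idempotent: only the first call closes done.
func (m *monitor) Stop() { m.once.Do(func() { close(m.done) }) }

// stopped reports whether Stop has been called.
func (m *monitor) stopped() bool {
	select {
	case <-m.done:
		return true
	default:
		return false
	}
}

var monitors sync.Map // execution id -> *monitor

// startMonitor registers a new monitor and stops any previous one under the same id.
func startMonitor(id string) *monitor {
	m := newMonitor()
	if prev, loaded := monitors.Swap(id, m); loaded {
		prev.(*monitor).Stop()
	}
	return m
}

// stopMonitor removes and stops the monitor atomically, so two racing
// stop paths cannot both obtain the same monitor from the map.
func stopMonitor(id string) {
	if prev, loaded := monitors.LoadAndDelete(id); loaded {
		prev.(*monitor).Stop()
	}
}

func main() {
	first := startMonitor("exec-1")
	second := startMonitor("exec-1") // repeated Start with the same id
	fmt.Println(first.stopped(), second.stopped())
	stopMonitor("exec-1")
	stopMonitor("exec-1") // second call finds nothing and is a no-op
	fmt.Println(second.stopped())
}
```

Because `Swap` stores and returns in one atomic step, two concurrent Starts for the same id cannot both miss the previous monitor, which a separate Load followed by Store could allow.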



Problem
The kit audit found that the preflight SDK carries the same two concurrency bugs just fixed in action-kit (action-kit#457), since its stop/heartbeat plumbing is a near-verbatim copy, plus a related monitor-goroutine leak. All three can affect any extension using preflights; the first two crash the process (data race / panic in a non-recovered goroutine, not caught by `PanicRecovery`).

1. `stopEvents` slice data race (MAJOR): `markAsStopped` (append + `stopEvents[1:]`) and `getStopEvent` (range) run with no lock from the HTTP `status`/`cancel` handlers, the heartbeat-timeout goroutine (`CancelPreflight`) and the signal handler.
2. `Monitor.Stop` closed `pulse` non-idempotently and `RecordHeartbeat` did a blocking send; two racing stop paths → `close of closed channel`, a concurrent record → `send on closed channel`.
3. `monitorHeartbeatWithCallback` stored a new monitor under an execution id without stopping any monitor already registered there, leaking the previous monitor's two goroutines on a retried Start.

Fix
1. Guard `stopEvents` with a package `sync.Mutex`.
2. Make `heartbeat.Monitor` self-guarding: `mu` + `closed` flag → idempotent `Stop` and a non-blocking, closed-safe `RecordHeartbeat`. `stopMonitorHeartbeat` uses `sync.Map.LoadAndDelete`.
3. `monitorHeartbeatWithCallback` uses `sync.Map.Swap` to stop-and-replace any existing monitor for the same execution id.

Adds race tests for the stop/heartbeat paths and a goroutine-leak regression test for repeated Start.
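The self-guarding monitor in fix (2) might look something like the sketch below. It is a simplified illustration under assumptions: the real `heartbeat.Monitor` has more fields and goroutines, and the buffer size, the `NewMonitor` constructor and the boolean return value of `RecordHeartbeat` are invented here for demonstration. Only the `mu`, `closed` and `pulse` names come from the PR description.

```go
package main

import (
	"fmt"
	"sync"
)

// Monitor is a simplified, hypothetical version of heartbeat.Monitor.
type Monitor struct {
	mu     sync.Mutex
	closed bool          // set once by Stop, read under mu
	pulse  chan struct{} // buffer size of 1 is an assumption
}

func NewMonitor() *Monitor {
	return &Monitor{pulse: make(chan struct{}, 1)}
}

// RecordHeartbeat never blocks and never sends on a closed channel.
// It reports whether the heartbeat was accepted (assumed return value).
func (m *Monitor) RecordHeartbeat() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return false
	}
	select {
	case m.pulse <- struct{}{}:
		return true
	default:
		return false // buffer full: drop the pulse instead of blocking
	}
}

// Stop closes pulse exactly once, however many callers race to stop.
func (m *Monitor) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return
	}
	m.closed = true
	close(m.pulse)
}

func main() {
	m := NewMonitor()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); m.Stop() }()
		go func() { defer wg.Done(); m.RecordHeartbeat() }()
	}
	wg.Wait()
	fmt.Println(m.RecordHeartbeat()) // false: the monitor is closed
}
```

Holding the mutex across the send is safe here only because the send is non-blocking; with a blocking send, a stalled receiver would keep `Stop` waiting on the lock.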
Verification
`go build ./...`, `go vet ./...` and `go test -race ./...` pass (module `go/preflight_kit_sdk`); existing tests still pass.

Fixes (1) and (2) mirror action-kit#457; (3) is the preflight-specific monitor leak. This closes all three preflight-kit findings from the kit audit.