fix: recover debug-gather goroutine panics and free run state on stop #93
The debug action spawns a detached goroutine that runs RunSteadybitDebug with no panic recovery; a panic there crashed the whole extension process, taking down every other in-flight action. Recover in the goroutine, log the failure, and mark the run finished so Status completes instead of polling forever. Also delete the per-execution entry from the debugRuns map on stop; entries were stored in Prepare/Start but never removed, leaking one entry per execution for the lifetime of the process.
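A minimal sketch of the two changes, not the actual diff. It assumes `debugRuns` is a package-level `sync.Map` keyed by execution ID and that `DebugRun` has a `Finished` field; `startGather`, `stop`, `isFinished`, and the `done` channel are hypothetical names introduced only so the example can run on its own, with `gather` standing in for `RunSteadybitDebug`.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// DebugRun mirrors the per-execution state named in the PR. Only the
// Finished field comes from the description; anything else is omitted.
type DebugRun struct {
	Finished bool
}

// debugRuns stands in for the extension's map of runs, keyed by execution ID.
var debugRuns sync.Map

// startGather is a hypothetical stand-in for the part of Start that spawns
// the detached goroutine. The returned channel is closed when the goroutine
// exits; it exists only so this example can wait for it.
func startGather(executionId string, gather func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() {
			// Panic path: log and mark the run finished so a status
			// poll terminates instead of waiting forever.
			if r := recover(); r != nil {
				log.Printf("debug gather for %s panicked: %v", executionId, r)
				debugRuns.Store(executionId, DebugRun{Finished: true})
			}
		}()
		gather()
		// Normal path: mutually exclusive with the Store in the recover branch.
		debugRuns.Store(executionId, DebugRun{Finished: true})
	}()
	return done
}

// stop is a hypothetical stand-in for Stop: it frees the per-execution entry.
func stop(executionId string) {
	debugRuns.Delete(executionId)
}

// isFinished reports whether an entry exists and is marked finished.
func isFinished(executionId string) bool {
	v, ok := debugRuns.Load(executionId)
	if !ok {
		return false
	}
	run, ok := v.(DebugRun)
	return ok && run.Finished
}

func main() {
	<-startGather("exec-1", func() { panic("boom") })
	fmt.Println("finished after panic:", isFinished("exec-1"))

	stop("exec-1")
	_, present := debugRuns.Load("exec-1")
	fmt.Println("entry present after stop:", present)
}
```

The recovering `defer` is registered after `defer close(done)`, so it runs first and the entry is stored before any waiter is released.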



Addresses the extension-debug audit MAJOR findings.

1. Unrecovered goroutine panic crashes the whole extension
Start spawns a detached goroutine running RunSteadybitDebug(...) with no panic recovery. A panic there is fatal to the entire extension process, taking down every other in-flight action it is serving.
Fix: recover in the goroutine, log the failure, and store DebugRun{Finished: true} so Status completes instead of polling forever. Status already handles a missing result zip gracefully (empty artifacts).

2. debugRuns map-entry leak
Entries are stored in Prepare/Start but never deleted, so the sync.Map grows by one entry per execution for the process lifetime.
Fix: debugRuns.Delete(executionId) in Stop.

Testing
go build ./..., go vet, gofmt clean.

/simplify
3-agent pass (efficiency N/A for a recover+delete): all clean. No safe-go/recover helper exists to reuse (exthttp.PanicRecovery is HTTP-handler-only), the two Store paths are mutually exclusive, and both fixes are at the right altitude.

Follow-up (deferred, single root cause)
RunSteadybitDebug takes no context, so two residuals remain. They are best fixed together by making it cancellable and awaiting it in Stop before os.RemoveAll:
- If the goroutine finishes after Stop, its debugRuns.Store re-inserts an entry that won't be deleted (a re-leak in the aborted/slow case).
- Stop's os.RemoveAll(WorkingDir) can race the still-running goroutine writing into that directory.

Neither is a regression (the goroutine already wrote to WorkingDir); both need a cancellation redesign of RunSteadybitDebug. Happy to do that as a follow-up.