fix: run the memory watchdog for every command; plug the connect() adapter leak#274
Merged
Merged
Conversation
…apter leak Three playground processes grew past 40 GB each and froze the machine. Two causes, both fixed: 1. connect() leaked its session-probe TerminalAdapter on the existing-session and probe-failure paths, and nobody owned the QR-path adapter after the init TUI exited. The leaked statement-store WebSocket + subscriptions keep the event loop alive and are the machinery that can enter the documented polkadot-api microtask-flood state. 2. The 4 GB memory watchdog only ran for deploy/mod/contract. The worker-thread watchdog is the only guard that survives event-loop starvation (signal handlers, hardExit timers, and index.ts's final process.exit all stop firing), so it now defaults ON for every command, with explicit opt-out preserved. New regression coverage: src/utils/auth.connect.test.ts pins the adapter lifecycle contract (destroy on existing/throw paths, ownership transfer on the QR path); cli-runtime.test.ts pins the watchdog default.
Contributor
|
Dev build ready — try this branch: |
Contributor
E2E Test Pass · ✅ PASSTag:
Sentry traces: view spans for this run |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Incident
Three
playgroundprocesses grew to 40+ GB each (45.8 / 42.1 / 41.7 GB in Activity Monitor) and swapped the laptop to a standstill; they had to be force-quit. The commands had looked finished but never returned to the shell.Root cause
Two layers:
connect()leaked its session-probe adapter. It creates aTerminalAdapter(statement-store WebSocket + subscriptions) to check for an existing session. On the existing-session path it returned plain address data without destroying the adapter — and since the adapter isn't part of the result, no caller could. The probe-failure path leaked it too, and on the QR path nobody ownedlogin.adapterafter the init TUI exited.getSessionSigner()andfindSession()already destroy their probe adapters on early-return paths;connect()was the one that forgot.inithad no memory watchdog. The leaked subscription machinery is exactly what can enter the documented polkadot-api microtask-flood state (seeprocess-guard.ts), where the event loop is starved: signal handlers,hardExittimers, andindex.ts's finalprocess.exit()all stop firing. Ctrl+C and tab-close are dead; the frozen TUI frame looks "done". The worker-thread watchdog is the only guard that survives that state — and it only ran fordeploy/mod/contract, notinit/decentralise/build/update/logout.Verified the watchdog itself works in compiled binaries: a probe SEA binary was SIGKILLed at 4.04 GB RSS, and the installed
pg modemits[mem +Ns]samples underDOT_MEMORY_TRACE=1.Changes
src/utils/auth.ts—connect()destroys the probe adapter on the existing-session and probe-failure paths (fire-and-forget.destroy().catch(() => {}), same Bun-SEA rationale asgetSessionSigner()); QR path still transfers ownership viaLoginHandlesrc/commands/init/index.ts— init ownslogin.adapterand destroys it in thefinallyafter the TUI exitssrc/cli-runtime.ts—watchdogoption defaults to true for every command; explicit opt-out preservedsrc/utils/auth.connect.test.ts— new file pinning the adapter lifecycle contract (written failing-first: 2 of 3 cases failed pre-fix)src/cli-runtime.test.ts— default-on + opt-out regression testsCLAUDE.md— watchdog invariant updated to the new defaultVerification
pnpm format:check,pnpm lint:license,pnpm test(584/584),pnpm buildall passplayground initnow emits watchdog samples underDOT_MEMORY_TRACE=1(previously impossible — no watchdog) and still exits on its own after "setup complete"createAdapter()call sites: every path now destroys or transfers exactly once;waitForLogincontains no destroy call, so no double-destroyCaveat
The exact upstream allocation site of the flood was not reproduced live (idle init/mod stayed flat at ~300 MB over 30+ min probes). The defense is structural: if the state recurs, the process dies at 4 GB with a message pointing at
DOT_MEMORY_TRACE=1, which will pin it.