fix(init): cancel in-flight Mastra requests on teardown #825
Merged
Create an `AbortController` alongside the `MastraClient` in `runWizard`, pass its signal via `abortSignal` in `ClientOptions`, and abort it from a `using` disposable so any in-flight fetches are canceled on every exit path (success, error, cancellation).

Why this matters:

- `MastraClient` has no `close()`/`dispose()` API. Without an explicit abort, keep-alive sockets in Bun's fetch dispatcher can hold the event loop alive past the wizard's natural exit, causing the shell to appear stuck. The original `process.exit(0)` workaround (removed in #802) papered over this symptom; this PR addresses the root cause.
- Explicit cancellation means we no longer rely on Bun's fetch dispatcher to auto-unref idle sockets. Cross-runtime robust.

Implementation notes:

- `AbortController` is created per-`runWizard` call. Scope matches the `MastraClient`.
- `using _mastraCleanup` disposable calls `abortController.abort()` on every exit path. Idempotent (signal's `aborted` flag guards against double-abort).
- Custom `fetch` wrapper preserves `init.signal` via the spread, so MastraClient's per-request signals still reach the underlying `fetch` call.
- No `run.cancel()`: the server observes the dropped fetch connection and cancels the run server-side without an extra round-trip during teardown.

Tests:

- Capture `ClientOptions` from each MastraClient instance via a prototype `getWorkflow` hook that reads `this.options`.
- Assert `abortSignal` is aborted after success, tool-error, and WizardCancelledError paths.
- Assert the signal is forwarded live to MastraClient at construction (not pre-aborted).
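The teardown pattern in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `createAbortScope` and `runWizardSketch` are hypothetical names, the `MastraClient` is replaced by a plain callback that receives the signal, and `try`/`finally` stands in for the `using` declaration so the sketch also runs on runtimes without explicit resource management.

```typescript
// Hypothetical helper: owns one AbortController for the lifetime of a run.
interface AbortScope {
  readonly signal: AbortSignal;
  dispose(): void;
}

function createAbortScope(): AbortScope {
  const controller = new AbortController();
  return {
    signal: controller.signal,
    // abort() is idempotent, so calling dispose() twice is harmless.
    dispose: () => controller.abort(),
  };
}

// Stand-in for runWizard: `body` plays the role of the code that would
// construct a client with `abortSignal: scope.signal` and start requests.
async function runWizardSketch(
  body: (signal: AbortSignal) => Promise<void>,
): Promise<void> {
  const scope = createAbortScope();
  try {
    await body(scope.signal);
  } finally {
    // Runs on success, on thrown errors, and on cancellation alike.
    scope.dispose();
  }
}
```

Tying the controller's lifetime to a single run keeps the abort from affecting any other client instance, which appears to be the intent of "Scope matches the `MastraClient`".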
Codecov Results 📊

✅ 138 passed | Total: 138 | Pass Rate: 100% | Execution Time: 0ms

📊 Comparison with Base Branch

✨ No test changes detected. All tests are passing successfully.

✅ Patch coverage is 100.00%. Project has 1949 uncovered lines.

Coverage diff:

| Metric   | main   | #PR    | +/-    |
|----------|--------|--------|--------|
| Coverage | 95.28% | 95.26% | -0.02% |
| Files    | 284    | 284    | 0      |
| Lines    | 41071  | 41076  | +5     |
| Branches | 0      | 0      | 0      |
| Hits     | 39131  | 39127  | -4     |
| Misses   | 1940   | 1949   | +9     |
| Partials | 0      | 0      | 0      |

Generated by Codecov Action
Pre-merge review feedback cleanup:

- Remove the `if (!signal.aborted)` guard around `abortController.abort()`. `AbortController.abort()` is spec-idempotent, so the guard added a line for zero behavior. The comment now states the idempotence property directly.
- Rewrite the "forwards a live abortSignal" test to actually prove what it claims. The previous version captured the signal from `startAsyncMock` and only re-asserted identity against `capturedClientOptions[0]`, which is tautological. The new version reads `signal.aborted` from the `getWorkflow` spy (which runs synchronously at `new MastraClient(...)` time, before any fetch dispatch) and asserts it's `false`. It then asserts that by the time `runWizard` returns, `aborted` on that same signal is `true`. This proves both that the signal is live during construction AND that teardown aborts it.
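The shape of the rewritten test can be sketched without Mastra at all. In the sketch below, `FakeClient`, `SketchClientOptions`, and `runWithFakeClient` are hypothetical stand-ins (the real test hooks `getWorkflow` on the `MastraClient` prototype instead); it only illustrates recording `signal.aborted` at construction time and reading it again after teardown.

```typescript
// Hypothetical options type; only the field relevant to the check is modeled.
interface SketchClientOptions {
  abortSignal?: AbortSignal;
}

interface Observation {
  signal: AbortSignal | undefined;
  abortedAtConstruction: boolean | undefined;
}

const observations: Observation[] = [];

// Stand-in client: records what it sees synchronously in the constructor,
// before any request could have been dispatched.
class FakeClient {
  constructor(options: SketchClientOptions) {
    observations.push({
      signal: options.abortSignal,
      abortedAtConstruction: options.abortSignal?.aborted,
    });
  }
}

function runWithFakeClient(): void {
  const abortController = new AbortController();
  try {
    new FakeClient({ abortSignal: abortController.signal });
  } finally {
    abortController.abort();
    abortController.abort(); // a second call is a no-op, so no guard is needed
  }
}
```

Because the snapshot and the later read come from the same signal object, a `false` followed by a `true` can only be explained by teardown aborting a signal that was live when the client received it.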
BYK added a commit that referenced this pull request on Apr 23, 2026
…831)

## Summary

At the end of a `sentry init` flow, the process hangs until the user presses a key. Follow-up to #802, #824, #825: addresses the third and final contributor to the post-wizard hang.

## Root cause

Our no-op patch of `process.stdin.pause` silently swallowed clack's `rl.close() → input.pause()` call. Stdin stayed in flowing/ref'd mode from `readline.createInterface()`'s internal `input.resume()`, keeping the libuv event loop alive until any keypress delivered a `data` event.

### Why the patch was needed

`stdin-reopen.ts` replaces `process.stdin.pause`/`resume` with no-ops at install time to dodge Bun's fd-0 `EINVAL` on pause/resume transitions (see the comment at the install site). That fix is correct and must stay.

### Why the bug was invisible

`rl.close()` is the only place clack ever pauses stdin; it relies entirely on Node's standard readline cleanup discipline. Our no-op patch swallowed every call without any visible error, so there was no log line, no warning, no failed assertion.

### Why PRs #802/#824/#825 didn't catch it

- #802 fixed the `/dev/tty` ReadStream contribution (explicit `.destroy()`).
- #824 adopted `using`/`Symbol.dispose` for guaranteed teardown + termios restore.
- #825 aborted the MastraClient signal to release keep-alive sockets.

Post-teardown state after all three fixes:

- `/dev/tty` ReadStream: destroyed ✓
- MastraClient AbortController: aborted ✓
- **`process.stdin`: still ref'd and flowing ✗** ← the remaining anchor

PR #782's original `process.exit(0)` workaround masked this by killing the process unconditionally. Each subsequent PR peeled off one contributor; stdin was the last one standing.

## Fix

One call to the just-restored `original.pause` at the end of `closeFreshTtyForwarding()`:

```ts
// Release the libuv handle on fd 0. Clack's prompt lifecycle relies on
// `rl.close() → rl.pause() → this.input.pause()` to pause stdin, but we
// replaced `process.stdin.pause` with a no-op at install time... Now that
// the original `.pause()` is restored, invoke it directly so stock
// Node/Bun cleanup can finish. Idempotent: safe when stdin was already
// paused.
try {
  original.pause.call(process.stdin);
} catch {
  // Defensive: swallow errors from runtimes that throw if stdin is
  // already destroyed.
}
```

Rationale for `.pause()` over alternatives:

- Exactly what Node's `rl.close()` would have called, so it matches clack's implicit contract.
- Idempotent on already-paused streams.
- Doesn't destroy the stream (unlike `.destroy()`); any future code that wanted to read stdin (none does in `init`; it's a terminal command) could still resume.

## Regression tests

Two new tests in `test/lib/init/stdin-reopen.test.ts`:

### Unit: teardown invokes restored pause exactly once

Replaces the beforeEach stub with a counting spy BEFORE install (so install captures the spy as `original.pause`). Verifies:

- During install, `process.stdin.pause` is the patched no-op (not the spy).
- Calls made mid-wizard hit the no-op (spy count stays 0).
- After teardown, `process.stdin.pause` is restored to the spy AND the spy was invoked exactly once.

### Integration: stdin is not flowing after teardown

Puts `process.stdin` into flowing mode via real `Readable.prototype.resume` (simulating what clack does via `readline.createInterface`). Runs install + teardown. Asserts `process.stdin.readableFlowing !== true` after disposal. Without the fix, this assertion would fail because the no-op pause never actually pauses.

## Test plan

- [x] `bun test test/lib/init/stdin-reopen.test.ts`: 15 pass (13 existing + 2 new)
- [x] `bun test test/lib/init/ test/commands/init.test.ts`: 193 pass
- [x] `bun test --timeout 15000 test/lib test/commands test/types`: 5777 pass, 0 fail
- [x] `bun run typecheck`: clean
- [x] `bun run lint`: clean (only pre-existing markdown.ts warning)
- [ ] Manual: `curl -fsSL https://cli.sentry.dev/install | SENTRY_INIT=1 bash`. The shell prompt should return immediately after "Setup complete" without a keypress.

## Risk

Very low. Single call to a restored function (known state) guarded by try/catch. No API changes, no new dependencies, no test fixture churn. Two new tests exercise the exact regression.

## Out of scope

- Revisiting whether the no-op `pause`/`resume` patch is still necessary with current Bun (Bun's fd-0 EINVAL may be fixed; worth investigating later as a simplification, but not as part of this hot-fix).
- E2E spawn test asserting process-exits-without-keypress (requires pty fixture infrastructure we don't currently have).
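The patch-then-restore sequence described in this commit can be illustrated on an ordinary stream instead of `process.stdin`. The sketch below is an assumption-laden miniature, not the contents of `stdin-reopen.ts`: `installNoopPause` is a hypothetical name, and a `PassThrough` stands in for stdin so the effect on `readableFlowing` can be observed safely.

```typescript
import { PassThrough, Readable } from "node:stream";

// Hypothetical miniature of the install/teardown pair: replace `pause`
// with a no-op, and on teardown restore it AND call it once.
function installNoopPause(stream: Readable): { dispose(): void } {
  const originalPause = stream.pause;
  stream.pause = function noopPause(this: Readable) {
    return this; // swallow the call, as the patch described above does
  };
  return {
    dispose() {
      stream.pause = originalPause;
      try {
        // The call that the no-op swallowed earlier is made here.
        originalPause.call(stream);
      } catch {
        // Defensive, mirroring the commit: ignore runtimes that throw.
      }
    },
  };
}

// Demonstration stream standing in for stdin.
const demoStream = new PassThrough();
demoStream.resume(); // flowing mode, like readline's internal input.resume()
const demoPatch = installNoopPause(demoStream);
demoStream.pause(); // hits the no-op; the stream keeps flowing
const flowingWhilePatched = demoStream.readableFlowing;
demoPatch.dispose(); // restores pause and invokes it
const flowingAfterTeardown = demoStream.readableFlowing;
```

Dropping the `originalPause.call(stream)` line from `dispose` leaves the stream in flowing mode after teardown, which is the condition the integration test above guards against.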
BYK added a commit that referenced this pull request on Apr 23, 2026
## Summary

After "Sentry SDK installed successfully!", `sentry init` still hangs until a keypress despite #802/#824/#825/#831. Root cause is a Bun 1.3.11 libuv refcount bug that userland cannot fix. Restores PR #782's `process.exit` workaround, but properly wrapped in `setTimeout(..., 100).unref()` so it's transparent in the happy path and terminal only when the Bun bug bites.

## Root cause (verified)

Opening our fresh `/dev/tty` ReadStream (the `curl | bash` TTY-delivery workaround in `stdin-reopen.ts`) combined with clack's internal `readline.createInterface(process.stdin)` leaks a libuv handle that NO userland cleanup releases. Verified by systematic matrix test against real `/dev/tty` under a pty:

| Scenario | Result |
|----------|--------|
| `fresh` alone | FAST ✓ |
| `readline.createInterface` alone | FAST ✓ |
| `readline` + our pause/resume patch | FAST ✓ |
| `fresh + readline` | HANG 7s |
| `fresh + readline + fresh.destroy()` | HANG 7s |
| `fresh + readline + rl.close()` | HANG 7s |
| `fresh + readline + process.stdin.destroy()` | HANG 7s |
| `fresh + readline + removeAllListeners` | HANG 7s |
| `fresh + readline + setTimeout(exit, 100).unref()` | FAST (via forced exit) |

`process.stdin.unref()` is `undefined` on Bun 1.3.11, so Node's canonical "let the process exit" escape hatch isn't available.

## Why PRs #802/#824/#825/#831 didn't fix it

Each peeled off a **legitimate contributing cause**, and all should stay:

- #802: `/dev/tty` ReadStream being ref'd (explicit `fresh.destroy()`)
- #824: hardened teardown via `using`/`Symbol.dispose` + termios restore
- #825: MastraClient keep-alive sockets (AbortController)
- #831: `process.stdin` flowing state (restored `pause()` call)

But the libuv refcount bug is a Bun-internal issue, not a stream-state or socket issue. No amount of userland cleanup fixes it.

## Fix

Restore a force-exit safety net in `src/commands/init.ts`, wrapped in `setTimeout(..., 100).unref()`:

```ts
if (process.env.NODE_ENV !== "test") {
  setTimeout(() => {
    process.exit(process.exitCode ?? 0);
  }, 100).unref();
}
```

Properties:

- **Transparent in the happy path**: when the loop drains naturally (future Bun versions that fix the refcount bug, non-TTY flows, `--yes` with no prompts), the `.unref()` timer doesn't hold the loop. Process exits before the timer fires.
- **Terminal when needed**: when the Bun bug bites, the timer fires after a 100ms grace period. Imperceptible to the user.
- **100ms grace period**: enough for Sentry telemetry flush and stdio buffer flush to complete first. Matches best practices for terminal commands.

## Test gate

`NODE_ENV !== "test"` guard: `bun test` sets `NODE_ENV=test` automatically. Without this guard, each call to `initCommand.func` in tests would schedule an unref'd 100ms timer; accumulated timers fire across test files and terminate the test runner mid-suite. The guard avoids this while leaving the safety net active in all real-world invocations (including `bun run dev`, compiled binary, npm bundle).

## Test plan

- [x] `bun test test/commands/init.test.ts test/lib/init/`: 193 pass
- [x] `bun test --timeout 15000 test/lib test/commands test/types`: 5777 pass, 0 fail
- [x] `bun run typecheck`: clean
- [x] `bun run lint`: clean (only pre-existing markdown.ts warning)
- [x] Manual repro: the production scenario (real `/dev/tty` under a pty) hangs for 7s without this fix, exits in 286ms with it.
- [ ] User validation via `curl -fsSL https://cli.sentry.dev/install | SENTRY_INIT=1 bash` after merge.

## Follow-ups

- Exploration task: find an alternative to the fresh `/dev/tty` ReadStream approach for the `curl | bash` TTY-delivery workaround (the original bug #767 was fixing). If we can make that work without a second ReadStream on stdin, the Bun refcount bug is sidestepped entirely and the safety net becomes redundant.
- File a Bun upstream issue with the systematic matrix repro.

## Risk

Low. Single-file change. `.unref()` ensures the timer is transparent in healthy flows. Guarded against test-runner interference. All prior fixes remain in place because each addresses a legit cause.
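The "transparent in the happy path" property rests on the timer being unref'd. A small sketch of that idea, under stated assumptions: `scheduleForcedExit` is a hypothetical helper (the commit inlines the `setTimeout` call instead), and the exit function and exit-code getter are injected so the sketch can be exercised without terminating the process.

```typescript
// Hypothetical helper: real code would pass process.exit and a getter
// for process.exitCode rather than the injected callbacks used here.
function scheduleForcedExit(
  exit: (code: number) => void,
  getExitCode: () => number | undefined,
  graceMs = 100,
): ReturnType<typeof setTimeout> {
  const timer = setTimeout(() => {
    exit(getExitCode() ?? 0);
  }, graceMs);
  // An unref'd timer does not keep the event loop alive by itself, so a
  // process whose loop drains naturally exits before it ever fires.
  timer.unref();
  return timer;
}
```

On Node-compatible runtimes the returned timer's `hasRef()` reports whether it still holds the loop open, which gives a cheap way to assert the property in a unit test.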
BYK added a commit that referenced this pull request on Apr 23, 2026
…ety net (#835)

## Summary

Delete `src/lib/init/stdin-reopen.ts` entirely and the `setTimeout().unref()` safety net from #833. Net **−838 / +1 lines**.

The `forwardFreshTtyToStdin` workaround was created to fix a Bun single-file-binary bug where TTY fds inherited via `curl | bash` → `exec sentry init </dev/tty` (in install.sh) accepted `setRawMode(true)` but never delivered keypress events. Research shows that bug is fixed on Bun 1.3.11, and the workaround is actively causing the newer hang patched by #833.

## Empirical findings

Reproduction harness: Python `pty.fork()` mirroring install.sh's exact `exec bin </dev/tty` flow against `bun build --compile --target=bun-linux-x64` binaries on Bun 1.3.11.

### The original bug is gone

| Observable | Original bug | Bun 1.3.11 |
|------------|--------------|------------|
| `process.stdin.isTTY` after `</dev/tty` | `undefined` | `true` |
| `setRawMode(true)` | no effect | works |
| `data` events on keystroke | **never** | delivered |
| Clack `text/confirm/select` prompts | hung forever | completes |

Verified with three binaries running sequential clack prompts through the exact `exec bin </dev/tty` invocation. All exit cleanly on Enter without any workaround.

### The workaround IS the cause of the current hang

| Scenario (real `/dev/tty` under pty) | Result |
|--------------------------------------|--------|
| Clack prompts + fetch, **no workaround** | exits clean, 4.26s |
| Clack prompts (no fetch) + workaround | exits clean, 4.19s |
| Clack prompts + fetch + **workaround** | **HANG 30s** |

Upstream: oven-sh/bun#29126. Bun's `tty.ReadStream` extends `fs.ReadStream` with default highWaterMark; any `new ReadStream(tty_fd)` holds the libuv loop open and `destroy()` doesn't release the handle. Our workaround opened a second `tty.ReadStream` on `/dev/tty` alongside clack's `readline.createInterface(process.stdin)`, leaking that handle.

## Changes

**Deleted:**

- `src/lib/init/stdin-reopen.ts` (320 lines)
- `test/lib/init/stdin-reopen.test.ts` (452 lines)
- `using _tty = forwardFreshTtyToStdin()` + namespace import in `wizard-runner.ts`
- 6 × `expect(closeFreshTtyForwardingSpy).toHaveBeenCalledTimes(1)` assertions + spy setup in `wizard-runner.test.ts`
- The `setTimeout(process.exit, 100).unref()` safety net in `init.ts` (from #833; no longer needed once the root cause is removed)

**Kept (orthogonal & legitimate):**

- PR #824's `using`/`Symbol.dispose` pattern for the MastraClient `AbortController`
- PR #825's MastraClient `AbortController` cleanup

## Validation plan

This cleanup deletes the workaround based on PTY-harness testing. The real-world `curl | bash` flow has subtle differences (different terminal types, macOS vs Linux glibc vs Alpine, bash vs zsh, etc.), so a phased rollout is recommended:

1. **Merge to main.** Triggers nightly GHCR publish.
2. **Nightly smoke test**: install from cli.sentry.dev/install with `SENTRY_VERSION=nightly SENTRY_INIT=1` on:
   - macOS (system Terminal.app)
   - Linux glibc (Ubuntu)
   - Linux musl (Alpine)
   - WSL
3. **Monitor Sentry telemetry** for `channel=nightly` users for a few days for any keystroke-delivery regressions.
4. **Promote to stable** after the nightly window confirms clean.

If any platform regresses the original keystroke bug, the revert is a single commit away and we'll scope the workaround narrowly (e.g. `process.platform === "darwin"` only) instead of always-on.

## Test plan

- [x] `bun test test/lib/init/ test/commands/init.test.ts`: 178 pass (15 deleted stdin-reopen tests accounted for)
- [x] `bun test --timeout 15000 test/lib test/commands test/types`: 5762 pass, 0 fail
- [x] `bun run typecheck`: clean
- [x] `bun run lint`: clean (only pre-existing markdown.ts warning)
- [ ] Manual nightly verification: `curl -fsSL https://cli.sentry.dev/install | SENTRY_VERSION=nightly SENTRY_INIT=1 bash` on each target platform.

## Follow-ups

- File a Bun upstream issue specifically about `tty.ReadStream + process.stdin` handle leak (distinct from but related to #29126).
- Once nightly telemetry confirms no regressions, propagate the pattern deletion to any other commands that might have adopted similar stdin workarounds (none currently; `init` was the only one).
## Summary

Create an `AbortController` alongside the `MastraClient` in `runWizard`, pass its signal via `abortSignal` in `ClientOptions`, and abort it from a `using` disposable so any in-flight fetches are canceled on every exit path (success, error, cancellation).

Companion PR to #824. Both are independent follow-ups to #802 and can land in either order.

## Why

- `MastraClient` has no `close()`/`dispose()` API (verified in `node_modules/@mastra/client-js/dist/client.d.ts`). Without an explicit abort, keep-alive sockets in Bun's fetch dispatcher can hold the event loop alive past the wizard's natural exit, causing the shell to appear stuck. The original `process.exit(0)` workaround (removed in #802) papered over this symptom by forcing exit; this PR addresses the root cause so `sentry init` releases cleanly under any runtime.
- Explicit cancellation also removes the implicit dependency on Bun's fetch dispatcher auto-unref'ing idle sockets, making the fix robust across future Bun versions and alternative runtimes.

## Implementation

- `AbortController` created per-`runWizard` call. Scope matches the `MastraClient`.
- `using _mastraCleanup` disposable calls `abortController.abort()` on every exit path. Idempotent: guarded by `signal.aborted` to avoid double-abort diagnostics.
- Custom `fetch` wrapper preserves `init.signal` via the existing object spread, so MastraClient's per-request signals still reach the underlying `fetch` call.
- No `run.cancel()`. The server observes the dropped fetch connection and cancels the run server-side. Avoids an extra HTTP round-trip during teardown, which could be slow if the server is why we're erroring.

## Tests

- Capture `ClientOptions` from each `MastraClient` instance via a prototype `getWorkflow` hook that reads `this.options` (exposed via `BaseResource`).
- Assert `abortSignal` is aborted after success, tool-error, and `WizardCancelledError` paths.
- Assert the signal is forwarded live to `MastraClient` at construction (not pre-aborted) so in-flight fetches during the run are actually gated on it.

## Test plan

- `bun test test/lib/init/wizard-runner.test.ts`: 19 pass, 0 fail
- `bun test test/lib/init/ test/commands/init.test.ts`: green
- `bun run typecheck`: clean
- `bun run lint`: clean (only pre-existing markdown.ts warning)

## Notes

- Uses `using` (TS 5 / Bun native) in exactly one place, matching the pattern adopted in #824 (fix(init): harden /dev/tty teardown and adopt Symbol.dispose) for `/dev/tty` teardown.
- `tsconfig.json` already has `target: "ESNext"` so no build-config change needed.
- If #824 lands first, this `using` declaration is the second one in `wizard-runner.ts`, which is idiomatically consistent. If this lands first, it's the first one.
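The point about the `fetch` wrapper preserving `init.signal` through an object spread can be shown in isolation. The sketch below does not reproduce the wrapper in `wizard-runner.ts`: `withExtraHeaders`, `SketchInit`, and `SketchFetch` are hypothetical, and adding headers is only an assumed example of what such a wrapper might do.

```typescript
// Minimal stand-ins for the fetch types, so the sketch is self-contained.
interface SketchInit {
  signal?: AbortSignal;
  headers?: Record<string, string>;
}
type SketchFetch = (url: string, init?: SketchInit) => Promise<string>;

// Hypothetical wrapper: adds headers but spreads `init` first, so any
// per-request signal the caller supplied reaches the underlying fetch.
function withExtraHeaders(
  baseFetch: SketchFetch,
  extra: Record<string, string>,
): SketchFetch {
  return (url, init) =>
    baseFetch(url, {
      ...init, // carries init.signal (and any other field) through unchanged
      headers: { ...init?.headers, ...extra },
    });
}
```

A wrapper that built a fresh init object from scratch would silently drop the caller's signal, and an abort at teardown would then never reach the request.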