
fix(init): cancel in-flight Mastra requests on teardown#825

Merged
BYK merged 2 commits into `main` from `fix/init-mastra-abort-on-teardown` on Apr 23, 2026


BYK (Member) commented on Apr 23, 2026

Summary

Create an `AbortController` alongside the `MastraClient` in `runWizard`, pass its signal via `abortSignal` in `ClientOptions`, and abort it from a `using` disposable so any in-flight fetches are canceled on every exit path (success, error, cancellation).
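Here is a minimal sketch of the pattern. It assumes a hypothetical `FakeMastraClient` that models only the `abortSignal` field of `ClientOptions`. `runWizardSketch` and its `step` callback are also hypothetical. The `try`/`finally` stands in for the PR's `using _mastraCleanup` disposable, which likewise runs on success, on thrown errors, and on cancellation.

```ts
// Hypothetical stand-in for MastraClient: models only the `abortSignal` option.
class FakeMastraClient {
  constructor(
    readonly options: { baseUrl: string; abortSignal?: AbortSignal },
  ) {}
}

// Hypothetical runner: one AbortController per call, scoped like the client.
async function runWizardSketch(
  step: (client: FakeMastraClient) => Promise<void>,
): Promise<void> {
  const abortController = new AbortController();
  const client = new FakeMastraClient({
    baseUrl: "https://example.invalid", // placeholder, not a real endpoint
    abortSignal: abortController.signal, // live while the run is in progress
  });
  try {
    await step(client);
  } finally {
    // Runs whether `step` resolves or throws, like a `using` disposable.
    // abort() is a no-op if the signal was already aborted.
    abortController.abort();
  }
}
```

The controller is created inside the call, so each run gets a fresh, live signal, and one run's teardown never affects the next.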

Companion PR to #824 — both are independent follow-ups to #802 and can land in either order.

Why

`MastraClient` has no `close()`/`dispose()` API (verified in `node_modules/@mastra/client-js/dist/client.d.ts`). Without an explicit abort, keep-alive sockets in Bun's fetch dispatcher can hold the event loop alive past the wizard's natural exit, causing the shell to appear stuck. The original `process.exit(0)` workaround (removed in #802) papered over this symptom by forcing exit; this PR addresses the root cause so `sentry init` releases cleanly under any runtime.

Explicit cancellation also removes the implicit dependency on Bun's fetch dispatcher auto-unref'ing idle sockets, making the fix robust across future Bun versions and alternative runtimes.

Implementation

  • `AbortController` created per `runWizard` call. Its scope matches the `MastraClient`.
  • `using _mastraCleanup` disposable calls `abortController.abort()` on every exit path. Idempotent: guarded by `signal.aborted` to avoid double-abort diagnostics.
  • Custom `fetch` wrapper preserves `init.signal` via the existing object spread, so `MastraClient`'s per-request signals still reach the underlying `fetch` call.
  • No `run.cancel()`. The server observes the dropped fetch connection and cancels the run server-side. This avoids an extra HTTP round-trip during teardown, which could be slow if the server is the reason we're erroring.
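The spread in the custom `fetch` wrapper is what keeps per-request signals intact. The sketch below illustrates the idea. It uses a hypothetical `withExtraHeader` wrapper and simplified `InitLike`/`FetchLike` types, not the real wrapper in the codebase.

```ts
// Simplified request-init shape (hypothetical); only `signal` and `headers` matter here.
interface InitLike {
  signal?: AbortSignal;
  headers?: Record<string, string>;
  method?: string;
}

type FetchLike = (url: string, init?: InitLike) => Promise<unknown>;

// Hypothetical wrapper: adds one header, but spreads the caller's init first,
// so a per-request `signal` supplied by the client passes through untouched.
function withExtraHeader(
  baseFetch: FetchLike,
  name: string,
  value: string,
): FetchLike {
  return (url, init) =>
    baseFetch(url, {
      ...init, // keeps signal, method, and any other caller-supplied fields
      headers: { ...init?.headers, [name]: value },
    });
}
```

Spreading `init` first means every caller-supplied field survives, `signal` included. A wrapper that built a fresh init object from hand-picked fields would silently drop it.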

Tests

  • Capture `ClientOptions` from each `MastraClient` instance via a prototype `getWorkflow` hook that reads `this.options` (exposed via `BaseResource`).
  • Assert `abortSignal` is aborted after the success, tool-error, and `WizardCancelledError` paths.
  • Assert the signal is forwarded live to `MastraClient` at construction (not pre-aborted), so in-flight fetches during the run are actually gated on it.
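The prototype-hook technique can be sketched as follows. `FakeClient`, `runSketch`, and the `"sentry-init"` workflow id are hypothetical stand-ins for `MastraClient`, `runWizard`, and whatever the wizard actually requests. The hook records the signal's state at capture time, which a test can then compare with its state after teardown.

```ts
// Hypothetical stand-in: options stored on the instance, getWorkflow on the prototype.
class FakeClient {
  constructor(readonly options: { abortSignal?: AbortSignal }) {}
  getWorkflow(id: string): string {
    return id;
  }
}

// Hypothetical code under test: build the client, use it, abort on teardown.
async function runSketch(): Promise<void> {
  const controller = new AbortController();
  try {
    const client = new FakeClient({ abortSignal: controller.signal });
    client.getWorkflow("sentry-init"); // hypothetical workflow id
    await Promise.resolve(); // stand-in for the wizard's async work
  } finally {
    controller.abort();
  }
}

// Test-side hook: wrap the prototype method so each call records the
// instance's options and whether the signal was already aborted at that moment.
type Capture = { options: FakeClient["options"]; abortedAtCapture: boolean };
const captures: Capture[] = [];
const originalGetWorkflow = FakeClient.prototype.getWorkflow;
FakeClient.prototype.getWorkflow = function (this: FakeClient, id: string): string {
  captures.push({
    options: this.options,
    abortedAtCapture: this.options.abortSignal?.aborted ?? false,
  });
  return originalGetWorkflow.call(this, id);
};
```

A test then does three things. It awaits `runSketch()`. It restores `FakeClient.prototype.getWorkflow = originalGetWorkflow`. Finally, it asserts that `captures[0].abortedAtCapture` is `false` while `captures[0].options.abortSignal?.aborted` is now `true`.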

Test plan

  • `bun test test/lib/init/wizard-runner.test.ts`: 19 pass, 0 fail
  • `bun test test/lib/init/ test/commands/init.test.ts`: green
  • `bun run typecheck`: clean
  • `bun run lint`: clean (only pre-existing `markdown.ts` warning)


github-actions[bot] (Contributor) commented on Apr 23, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://cli.sentry.dev/_preview/pr-825/

Built to branch gh-pages at 2026-04-23 09:43 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

github-actions[bot] (Contributor) commented on Apr 23, 2026

Codecov Results 📊

138 passed | Total: 138 | Pass Rate: 100% | Execution Time: 0ms

📊 Comparison with Base Branch


✨ No test changes detected

All tests are passing successfully.

✅ Patch coverage is 100.00%. Project has 1949 uncovered lines.
❌ Project coverage is 95.26%. Comparing base (base) to head (head).

Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
- Coverage    95.28%    95.26%    -0.02%
==========================================
  Files          284       284         —
  Lines        41071     41076        +5
  Branches         0         0         —
==========================================
+ Hits         39131     39127        -4
- Misses        1940      1949        +9
- Partials         0         0         —

Generated by Codecov Action

Pre-merge review feedback cleanup:

- Remove the `if (!signal.aborted)` guard around
  `abortController.abort()`. `AbortController.abort()` is
  spec-idempotent — the guard added a line for zero behavior.
  The comment now states the idempotence property directly.

- Rewrite the "forwards a live abortSignal" test to actually
  prove what it claims. The previous version captured the
  signal from `startAsyncMock` and only re-asserted identity
  against `capturedClientOptions[0]` — tautological.

  New version reads `signal.aborted` from the `getWorkflow`
  spy (which runs synchronously at `new MastraClient(...)`
  time, before any fetch dispatch) and asserts it's `false`.
  Then asserts that by the time `runWizard` returns, the same
  signal is `true`. Proves both that the signal is live
  during construction AND that teardown aborts it.
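The idempotence the cleanup now relies on is specified behavior. Under the DOM `AbortController` spec, `abort()` returns early when the signal is already aborted. A second call therefore neither changes the signal nor re-fires the `abort` event. A short check:

```ts
const controller = new AbortController();
let abortEvents = 0;
controller.signal.addEventListener("abort", () => {
  abortEvents += 1; // dispatched synchronously inside the first abort()
});

controller.abort(); // marks the signal aborted and fires "abort" once
controller.abort(); // no-op: the signal is already aborted

console.log(controller.signal.aborted, abortEvents); // true 1
```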

BYK merged commit 70fae42 into `main` on Apr 23, 2026 (26 checks passed).
BYK deleted the `fix/init-mastra-abort-on-teardown` branch on April 23, 2026 at 09:50.
BYK added a commit that referenced this pull request Apr 23, 2026
…831)

## Summary

At the end of a `sentry init` flow, the process hangs until the user
presses a key. Follow-up to #802, #824, #825 — addresses the third and
final contributor to the post-wizard hang.

## Root cause

Our no-op patch of `process.stdin.pause` silently swallowed clack's
`rl.close() → input.pause()` call. Stdin stayed in flowing/ref'd mode
from `readline.createInterface()`'s internal `input.resume()`, keeping
the libuv event loop alive until any keypress delivered a `data` event.
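The swallowed-pause failure mode can be reproduced with a plain `Readable` standing in for `process.stdin`. This assumes Node's `node:stream` semantics. The variable names are illustrative. `resume()` plays the role of `readline.createInterface()`'s internal resume, and the arrow function plays the role of the install-time no-op patch.

```ts
import { Readable } from "node:stream";

// A readable that never produces data; stands in for process.stdin.
const input = new Readable({ read() {} });

// What readline.createInterface() effectively does to its input.
input.resume();
const flowingAfterResume = input.readableFlowing;

// Install-time patch: pause() becomes a no-op that just returns the stream.
const originalPause = input.pause;
input.pause = () => input;

input.pause(); // e.g. rl.close() -> input.pause(), silently swallowed
const flowingAfterNoopPause = input.readableFlowing;

// Restoring and invoking the original actually leaves flowing mode.
input.pause = originalPause;
input.pause();
const flowingAfterRealPause = input.readableFlowing;

console.log(flowingAfterResume, flowingAfterNoopPause, flowingAfterRealPause); // true true false
```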

### Why the patch was needed

`stdin-reopen.ts` replaces `process.stdin.pause`/`resume` with no-ops at
install time to dodge Bun's fd-0 `EINVAL` on pause/resume transitions
(see the comment at the install site). That fix is correct and must
stay.

### Why the bug was invisible

`rl.close()` is the only place clack ever pauses stdin — it relies
entirely on Node's standard readline cleanup discipline. Our no-op patch
swallowed every call without any visible error, so there was no log
line, no warning, no failed assertion.

### Why PRs #802/#824/#825 didn't catch it

- #802 fixed the `/dev/tty` ReadStream contribution (explicit
`.destroy()`).
- #824 adopted `using`/`Symbol.dispose` for guaranteed teardown +
termios restore.
- #825 aborted the MastraClient signal to release keep-alive sockets.

Post-teardown state after all three fixes:
- `/dev/tty` ReadStream: destroyed ✓
- MastraClient AbortController: aborted ✓
- **`process.stdin`: still ref'd and flowing ✗** ← the remaining anchor

PR #782's original `process.exit(0)` workaround masked this by killing
the process unconditionally. Each subsequent PR peeled off one
contributor; stdin was the last one standing.

## Fix

One call to the just-restored `original.pause` at the end of
`closeFreshTtyForwarding()`:

```ts
// Release the libuv handle on fd 0. Clack's prompt lifecycle relies on
// `rl.close() → rl.pause() → this.input.pause()` to pause stdin, but we
// replaced `process.stdin.pause` with a no-op at install time... Now that
// the original `.pause()` is restored, invoke it directly so stock
// Node/Bun cleanup can finish. Idempotent: safe when stdin was already
// paused.
try {
  original.pause.call(process.stdin);
} catch {
  // Defensive: swallow errors from runtimes that throw if stdin is
  // already destroyed.
}
```

Rationale for `.pause()` over alternatives:
- Exactly what Node's `rl.close()` would have called — matches clack's
implicit contract.
- Idempotent on already-paused streams.
- Doesn't destroy the stream (unlike `.destroy()`); any future code that
wanted to read stdin (none does in `init`; it's a terminal command)
could still resume.

## Regression tests

Two new tests in `test/lib/init/stdin-reopen.test.ts`:

### Unit: teardown invokes restored pause exactly once

Replaces the `beforeEach` stub with a counting spy BEFORE install (so
install captures the spy as `original.pause`). Verifies:
- During install, `process.stdin.pause` is the patched no-op (not the
spy).
- Calls made mid-wizard hit the no-op (spy count stays 0).
- After teardown, `process.stdin.pause` is restored to the spy AND the
spy was invoked exactly once.
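The install/teardown contract this unit test checks can be sketched with a plain object instead of `process.stdin`. `Pausable`, `installNoopPause`, and `fakeStdin` are hypothetical names. The sketch assumes only what the test description states: install captures whatever `pause` is present as the original, and teardown restores it and calls it once.

```ts
// Minimal shape for the sketch; process.stdin has a compatible pause() at runtime.
interface Pausable {
  pause(): unknown;
}

// Hypothetical install/teardown pair modeled on the test description.
function installNoopPause(stream: Pausable): { teardown(): void } {
  const originalPause = stream.pause; // captured at install time
  stream.pause = () => stream; // mid-wizard pause() calls are swallowed
  return {
    teardown() {
      stream.pause = originalPause; // restore the captured original...
      try {
        originalPause.call(stream); // ...and invoke it exactly once
      } catch {
        // Defensive, mirroring the PR: ignore runtimes that throw here.
      }
    },
  };
}

// Counting spy installed BEFORE install, so install captures it as the original.
let spyCalls = 0;
const spy = (): void => {
  spyCalls += 1;
};
const fakeStdin: Pausable = { pause: spy };

const handle = installNoopPause(fakeStdin);
const patchedDuringRun = fakeStdin.pause !== spy; // the no-op is active
fakeStdin.pause(); // a mid-wizard call hits the no-op, not the spy
const callsBeforeTeardown = spyCalls;
handle.teardown();

console.log(patchedDuringRun, callsBeforeTeardown, fakeStdin.pause === spy, spyCalls); // true 0 true 1
```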

### Integration: stdin is not flowing after teardown

Puts `process.stdin` into flowing mode via real
`Readable.prototype.resume` (simulating what clack does via
`readline.createInterface`). Runs install + teardown. Asserts
`process.stdin.readableFlowing !== true` after disposal — without the
fix, this assertion would fail because the no-op pause never actually
pauses.

## Test plan

- [x] `bun test test/lib/init/stdin-reopen.test.ts` — 15 pass (13
existing + 2 new)
- [x] `bun test test/lib/init/ test/commands/init.test.ts` — 193 pass
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
5777 pass, 0 fail
- [x] `bun run typecheck` — clean
- [x] `bun run lint` — clean (only pre-existing markdown.ts warning)
- [ ] Manual: `curl -fsSL https://cli.sentry.dev/install | SENTRY_INIT=1
bash` — shell prompt should return immediately after "Setup complete"
without a keypress.

## Risk

Very low. Single call to a restored function (known state) guarded by
try/catch. No API changes, no new dependencies, no test fixture churn.
Two new tests exercise the exact regression.

## Out of scope

- Revisiting whether the no-op `pause`/`resume` patch is still necessary
with current Bun (Bun's fd-0 EINVAL may be fixed — worth investigating
later as a simplification, but not as part of this hot-fix).
- E2E spawn test asserting process-exits-without-keypress (requires pty
fixture infrastructure we don't currently have).
BYK added a commit that referenced this pull request Apr 23, 2026
## Summary

After "Sentry SDK installed successfully!", `sentry init` still hangs
until a keypress despite #802/#824/#825/#831. Root cause is a Bun 1.3.11
libuv refcount bug that userland cannot fix.

Restores PR #782's `process.exit` workaround, but properly wrapped in
`setTimeout(..., 100).unref()` so it's transparent in the happy path and
terminal only when the Bun bug bites.

## Root cause (verified)

Opening our fresh `/dev/tty` ReadStream (the `curl | bash` TTY-delivery
workaround in `stdin-reopen.ts`) combined with clack's internal
`readline.createInterface(process.stdin)` leaks a libuv handle that NO
userland cleanup releases. Verified by systematic matrix test against
real `/dev/tty` under a pty:

| Scenario                                           | Result                 |
|----------------------------------------------------|------------------------|
| `fresh` alone                                      | FAST ✓                 |
| `readline.createInterface` alone                   | FAST ✓                 |
| `readline` + our pause/resume patch                | FAST ✓                 |
| `fresh + readline`                                 | HANG 7s                |
| `fresh + readline + fresh.destroy()`               | HANG 7s                |
| `fresh + readline + rl.close()`                    | HANG 7s                |
| `fresh + readline + process.stdin.destroy()`       | HANG 7s                |
| `fresh + readline + removeAllListeners`            | HANG 7s                |
| `fresh + readline + setTimeout(exit, 100).unref()` | FAST (via forced exit) |

`process.stdin.unref()` is `undefined` on Bun 1.3.11, so Node's
canonical "let the process exit" escape hatch isn't available.
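Code that wants to call `unref()` where it exists has to feature-detect it rather than assume Node's API. Below is a hedged sketch using a hypothetical `tryUnref` helper. Per the findings above, skipping `unref()` on Bun is only a graceful fallback, not a fix for the leaked handle.

```ts
// Hypothetical helper: call unref() only when the runtime actually provides it.
function tryUnref(handle: object): boolean {
  const unref: unknown = (handle as { unref?: unknown }).unref;
  if (typeof unref === "function") {
    // In Node, an unref'd handle no longer keeps the event loop alive on its own.
    unref.call(handle);
    return true;
  }
  return false; // e.g. process.stdin on Bun 1.3.11, per the note above
}
```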

## Why PRs #802/#824/#825/#831 didn't fix it

Each peeled off a **legitimate contributing cause** — all should stay:

- #802: `/dev/tty` ReadStream being ref'd (explicit `fresh.destroy()`)
- #824: hardened teardown via `using`/`Symbol.dispose` + termios restore
- #825: MastraClient keep-alive sockets (AbortController)
- #831: `process.stdin` flowing state (restored `pause()` call)

But the libuv refcount bug is a Bun-internal issue, not a stream-state
or socket issue. No amount of userland cleanup fixes it.

## Fix

Restore a force-exit safety net in `src/commands/init.ts`, wrapped in
`setTimeout(..., 100).unref()`:

```ts
if (process.env.NODE_ENV !== "test") {
  setTimeout(() => {
    process.exit(process.exitCode ?? 0);
  }, 100).unref();
}
```

Properties:
- **Transparent in the happy path** — when the loop drains naturally
(future Bun versions that fix the refcount bug, non-TTY flows, `--yes`
with no prompts), the `.unref()` timer doesn't hold the loop. Process
exits before the timer fires.
- **Terminal when needed** — when the Bun bug bites, the timer fires
after a 100ms grace period. Imperceptible to the user.
- **100ms grace period** — enough for Sentry telemetry flush and stdio
buffer flush to complete first. Matches best practices for terminal
commands.

## Test gate

`NODE_ENV !== "test"` guard: `bun test` sets `NODE_ENV=test`
automatically. Without this guard, each call to `initCommand.func` in
tests would schedule an unref'd 100ms timer; accumulated timers fire
across test files and terminate the test runner mid-suite. The guard
avoids this while leaving the safety net active in all real-world
invocations (including `bun run dev`, compiled binary, npm bundle).
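One way to make the safety net unit-testable without tripping the runner is to inject the exit function and the environment. `scheduleSafetyExit` and its parameters are hypothetical. The PR itself calls `process.exit(process.exitCode ?? 0)` directly and reads `process.env`.

```ts
// Hypothetical, injectable variant of the safety net: `exit` and `env` are
// parameters so a test can observe the behavior without killing the process.
function scheduleSafetyExit(
  exit: () => void,
  env: Record<string, string | undefined>,
  delayMs = 100,
): ReturnType<typeof setTimeout> | undefined {
  if (env.NODE_ENV === "test") {
    return undefined; // mirrors the PR's guard: never arm the timer under tests
  }
  const timer = setTimeout(exit, delayMs);
  // Node's Timeout supports unref(); the PR relies on Bun offering the same.
  // An unref'd timer only fires if something else is still holding the loop.
  timer.unref();
  return timer;
}
```

A test can keep the loop alive with an ordinary (ref'd) timer that outlasts `delayMs`, then check that `exit` ran exactly once.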

## Test plan

- [x] `bun test test/commands/init.test.ts test/lib/init/` — 193 pass
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
5777 pass, 0 fail
- [x] `bun run typecheck` — clean
- [x] `bun run lint` — clean (only pre-existing markdown.ts warning)
- [x] Manual repro: the production scenario (real `/dev/tty` under a
pty) hangs for 7s without this fix, exits in 286ms with it.
- [ ] User validation via `curl -fsSL https://cli.sentry.dev/install |
SENTRY_INIT=1 bash` after merge.

## Follow-ups

- Exploration task: find an alternative to the fresh `/dev/tty`
ReadStream approach for the `curl | bash` TTY-delivery workaround (the
original bug #767 was fixing). If we can make that work without a second
ReadStream on stdin, the Bun refcount bug is sidestepped entirely and
the safety net becomes redundant.
- File a Bun upstream issue with the systematic matrix repro.

## Risk

Low. Single-file change. `.unref()` ensures the timer is transparent in
healthy flows. Guarded against test-runner interference. All prior fixes
remain in place because each addresses a legit cause.
BYK added a commit that referenced this pull request Apr 23, 2026
…ety net (#835)

## Summary

Delete `src/lib/init/stdin-reopen.ts` entirely and the
`setTimeout().unref()` safety net from #833. Net **−838 / +1 lines**.

The `forwardFreshTtyToStdin` workaround was created to fix a Bun
single-file-binary bug where TTY fds inherited via `curl | bash` → `exec
sentry init </dev/tty` (in install.sh) accepted `setRawMode(true)` but
never delivered keypress events. Research shows that bug is fixed on Bun
1.3.11 — and the workaround is actively causing the newer hang patched
by #833.

## Empirical findings

Reproduction harness: Python `pty.fork()` mirroring install.sh's exact
`exec bin </dev/tty` flow against `bun build --compile
--target=bun-linux-x64` binaries on Bun 1.3.11.

### The original bug is gone

| Observable                              | Original bug | Bun 1.3.11 |
|-----------------------------------------|--------------|------------|
| `process.stdin.isTTY` after `</dev/tty` | `undefined`  | `true`     |
| `setRawMode(true)`                      | no effect    | works      |
| `data` events on keystroke              | **never**    | delivered  |
| Clack `text/confirm/select` prompts     | hung forever | completes  |

Verified with three binaries running sequential clack prompts through
the exact `exec bin </dev/tty` invocation. All exit cleanly on Enter
without any workaround.

### The workaround IS the cause of the current hang

| Scenario (real `/dev/tty` under pty)       | Result             |
|--------------------------------------------|--------------------|
| Clack prompts + fetch, **no workaround**   | exits clean, 4.26s |
| Clack prompts (no fetch) + workaround      | exits clean, 4.19s |
| Clack prompts + fetch + **workaround**     | **HANG 30s**       |

Upstream:
oven-sh/bun#29126: Bun's
`tty.ReadStream` extends `fs.ReadStream` with default highWaterMark; any
`new ReadStream(tty_fd)` holds the libuv loop open and `destroy()`
doesn't release the handle. Our workaround opened a second
`tty.ReadStream` on `/dev/tty` alongside clack's
`readline.createInterface(process.stdin)`, leaking that handle.

## Changes

**Deleted:**
- `src/lib/init/stdin-reopen.ts` (320 lines)
- `test/lib/init/stdin-reopen.test.ts` (452 lines)
- `using _tty = forwardFreshTtyToStdin()` + namespace import in
`wizard-runner.ts`
- 6 × `expect(closeFreshTtyForwardingSpy).toHaveBeenCalledTimes(1)`
assertions + spy setup in `wizard-runner.test.ts`
- The `setTimeout(process.exit, 100).unref()` safety net in `init.ts`
(from #833 — no longer needed once the root cause is removed)

**Kept (orthogonal & legitimate):**
- PR #824's `using`/`Symbol.dispose` pattern for the MastraClient
`AbortController`
- PR #825's MastraClient `AbortController` cleanup

## Validation plan

This cleanup deletes the workaround based on PTY-harness testing. The
real-world `curl | bash` flow has subtle differences (different terminal
types, macOS vs Linux glibc vs Alpine, bash vs zsh, etc.), so a phased
rollout is recommended:

1. **Merge to main.** Triggers nightly GHCR publish.
2. **Nightly smoke test** — install from cli.sentry.dev/install with
`SENTRY_VERSION=nightly SENTRY_INIT=1` on:
   - macOS (system Terminal.app)
   - Linux glibc (Ubuntu)
   - Linux musl (Alpine)
   - WSL
3. **Monitor Sentry telemetry** for `channel=nightly` users for a few
days for any keystroke-delivery regressions.
4. **Promote to stable** after the nightly window confirms clean.

If any platform regresses the original keystroke bug, the revert is a
single commit away and we'll scope the workaround narrowly (e.g.
`process.platform === "darwin"` only) instead of always-on.

## Test plan

- [x] `bun test test/lib/init/ test/commands/init.test.ts` — 178 pass
(15 deleted stdin-reopen tests accounted for)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
5762 pass, 0 fail
- [x] `bun run typecheck` — clean
- [x] `bun run lint` — clean (only pre-existing markdown.ts warning)
- [ ] Manual nightly verification: `curl -fsSL
https://cli.sentry.dev/install | SENTRY_VERSION=nightly SENTRY_INIT=1
bash` on each target platform.

## Follow-ups

- File a Bun upstream issue specifically about `tty.ReadStream +
process.stdin` handle leak (distinct from but related to #29126).
- Once nightly telemetry confirms no regressions, propagate the pattern
deletion to any other commands that might have adopted similar stdin
workarounds (none currently; `init` was the only one).