Recover from failed Ghostty surface creation (#217) #218
Merged
dhilgaertner merged 1 commit into main on Apr 29, 2026
Auto-review sessions could hang on "Waiting for terminal" forever when `ghostty_surface_new` returned nil: the broken `GhosttySurfaceView` stayed cached in `TerminalManager`, no retry was attempted, and the UI had no failure path. This adds bounded retries inside `GhosttySurfaceView` (4 attempts with backoff up to ~7.5s), makes `TerminalManager.surface(for:)` treat a cached view with a nil surface as a miss, propagates a new `.failed` terminal-readiness state through `SessionService`, and surfaces an error overlay with a Retry button in `ReadinessAwareTerminal` that destroys the broken view and re-preInitializes a fresh one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dgershman (Collaborator) approved these changes on Apr 29, 2026
Code & Security Review
Critical Issues
None found.
Security Review
Strengths:
- All new state and callbacks are `@MainActor`-isolated, consistent with the existing concurrency model.
- No new external inputs or command injection surfaces introduced; retry logic operates purely on internal state.
- `TerminalManager.retry()` properly destroys the old surface before re-creating, preventing resource leaks.
- The `onSurfaceCreationFailed` callback is cleared in `destroy()`, avoiding retain-cycle or use-after-free issues.
Concerns:
- None. The changes are internal to the app's process and don't introduce new trust boundaries.
Code Quality
Well-structured changes:
- Clean separation of concerns: `GhosttySurfaceView` owns retry mechanics, `TerminalManager` bridges to `SessionService`, and `SessionService` owns the recovery/re-arm logic.
- The `TerminalReadiness.failed` enum case is inserted with `sortOrder: -1`, correctly ordering it below all other states. The `Comparable` conformance means the existing `< .shellReady` guard in the loading overlay naturally covers the new case without modification.
- `retryDelays` as a static constant with exponential backoff (0.5s → 4.0s, ~7.5s total) is a reasonable budget: long enough for transient Ghostty init issues, short enough to not leave users hanging.
- The `surface(for:)` cache-miss path correctly discards broken views (`hasSurface == false`), preventing callers from getting permanently stuck with a nil-surface view.
- `DispatchQueue.main.asyncAfter` retry with `[weak self]` guard + `self.surface == nil` check is correct: it prevents retrying after success or deallocation.
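
To illustrate the ordering point above, here is a minimal sketch of how a `Comparable` readiness enum can let an existing `< .shellReady` check cover a new `.failed` case. Only `.failed`, `.shellReady`, and the `-1` sort order come from this review; the intermediate cases, their sort orders, and the `showsOverlay` helper are hypothetical stand-ins, not the PR's actual code.

```swift
// Hypothetical reconstruction: only `.failed`, `.shellReady`, and the
// sort order of -1 are named in the review. Other cases are assumed.
enum TerminalReadiness: Comparable {
    case failed
    case uninitialized   // assumed intermediate state
    case surfaceCreated  // assumed intermediate state
    case shellReady

    // `.failed` sorts below every other state.
    var sortOrder: Int {
        switch self {
        case .failed: return -1
        case .uninitialized: return 0
        case .surfaceCreated: return 1
        case .shellReady: return 2
        }
    }

    static func < (lhs: TerminalReadiness, rhs: TerminalReadiness) -> Bool {
        lhs.sortOrder < rhs.sortOrder
    }
}

// Hypothetical helper standing in for the loading-overlay guard.
func showsOverlay(_ state: TerminalReadiness) -> Bool {
    state < .shellReady
}
```

Because `.failed` compares below `.shellReady`, the same guard that keeps the overlay up while the terminal is loading also keeps it up after a failure, which is where an error view can be rendered instead of a spinner.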
Minor observations (non-blocking):
- `createAttempts` is reset to 0 both on success (line 93) and after exhausting retries (line 118). This means a view that fails and is then `destroy()`'d + re-created will correctly restart from attempt 0. Good.
- The `wireTerminalReadiness` callback guard `guard let currentState = self.appState.terminalReadiness[terminalID]` would skip the `.failed` state update if the terminal isn't tracked. However, the `.failed` case inside the switch is only reachable after the guard passes, and all managed terminals are tracked, so this is correct.
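
The `createAttempts` bookkeeping noted above can be sketched as a small value type. This is an illustrative model only: `SurfaceRetryPolicy`, `Outcome`, `recordSuccess`, and `recordFailure` are hypothetical names, and the sketch assumes the four delays apply to four retries after an initial failed attempt. The delay values and the reset-on-success / reset-on-exhaustion behavior are taken from the PR text.

```swift
import Foundation

// Hypothetical model of the retry bookkeeping; not the PR's actual code.
struct SurfaceRetryPolicy {
    // Delays named in the PR description: 0.5s, 1.0s, 2.0s, 4.0s (7.5s total).
    static let retryDelays: [TimeInterval] = [0.5, 1.0, 2.0, 4.0]

    private(set) var createAttempts = 0

    enum Outcome: Equatable {
        case retry(after: TimeInterval)
        case exhausted
    }

    // A successful creation resets the counter.
    mutating func recordSuccess() {
        createAttempts = 0
    }

    // A failed creation either schedules the next delay or gives up.
    // The counter is also reset on exhaustion, so a later manual retry
    // starts again from the first delay.
    mutating func recordFailure() -> Outcome {
        guard createAttempts < Self.retryDelays.count else {
            createAttempts = 0
            return .exhausted
        }
        let delay = Self.retryDelays[createAttempts]
        createAttempts += 1
        return .retry(after: delay)
    }
}
```

In the real view, per the review, the delay is scheduled with `DispatchQueue.main.asyncAfter` and the closure checks `[weak self]` and `self.surface == nil` before retrying; the sketch leaves scheduling out so the counter logic can be read on its own.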
Summary Table
| Priority | Issue |
|---|---|
| 🟢 | Retry backoff schedule is reasonable but not configurable — fine for now |
| 🟢 | The fixed 5s shell-readiness delay (existing code, not new) is still a TODO — not introduced by this PR |
Recommendation: Approve — well-scoped, clean implementation with correct resource management and no security concerns.
This was referenced May 2, 2026
dhilgaertner added a commit that referenced this pull request on May 4, 2026
…xes (#238)

Closes #235

## Summary
- Adds `docs/automation.md` as the canonical guide to Settings → Automation and the full auto-flow lifecycle.
- Updates `docs/architecture.md` with the dual Ghostty/tmux backend (#229), the new `TerminalRouter` dispatch, the Settings tab split (#228), and the Review Board surface (#188, #205, #207, #210, #212, #220, #226, #231).
- Adds `crow rename-terminal` (#206) to `docs/cli-reference.md`.
- Adds troubleshooting rows for tmux backend missing, Ghostty surface retry (#218), GitLab nested groups (#233), `GITLAB_HOST` silent skip (#215), auto-respond not firing, and the silent no-op `hook-event` behavior (#234).
- Adds `CROW_TMUX_BACKEND` and `CROW_HOOK_DEBUG` to the `docs/configuration.md` env-var table.
- Backfills `CHANGELOG.md` `[Unreleased]` with every PR from #137 through #234, grouped by theme.
- Updates `README.md` features list and docs index for the new automation suite, review board, terminal renaming, and tmux opt-in.
- Appends an "Implementation Status (2026-05)" footer to `docs/terminal-runtime-research.md` noting #229 shipped the headless-PTY backend recommended in the original research.

The audit checklist called out as deliverable #1 in the issue is posted as a [comment on #235](#235 (comment)).

## Test plan
- [ ] `git diff --stat main` shows only the listed `docs/`, `CHANGELOG.md`, and `README.md` files
- [ ] Render each modified doc on GitHub and confirm anchors / cross-links resolve
- [ ] Confirm `crow --help` matches the command list in `docs/cli-reference.md` (only `rename-terminal` was missing pre-PR)
- [ ] Walk every PR number in the CHANGELOG against `git log --since=2026-04-15 main --oneline` and confirm each one resolves
- [ ] Re-export `docs/crow-screenshot.jpeg` against the current Settings/Review-Board UI: **deferred to a follow-up**, called out in the audit comment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #217
Auto-review sessions could hang on "Waiting for terminal" forever when `ghostty_surface_new` returned nil: the broken `GhosttySurfaceView` stayed cached in `TerminalManager`, no retry was attempted, and the UI had no failure path. This implements all three remediations called out in the ticket:

- `GhosttySurfaceView.createSurface()`: 4 attempts at 0.5s / 1.0s / 2.0s / 4.0s before giving up. On exhaustion it fires a new `onSurfaceCreationFailed` callback.
- `TerminalManager.surface(for:)` rejects broken cached views: if the cached entry's `hasSurface` is false, it's destroyed and replaced instead of returned.
- `TerminalReadiness.failed` state (also `SurfaceState.failed`) flows from `surfaceDidFail` → `wireTerminalReadiness` → `appState.terminalReadiness`, and `ReadinessAwareTerminal` renders an error overlay with a Retry button. Retry routes through a new `appState.onRetryTerminal` callback to `SessionService.retryTerminal(terminalID:)`, which resets readiness, re-arms auto-launch, and asks `TerminalManager` to destroy the broken view and re-`preInitialize`.

The session-list status dot also gets a red "failed" variant so the problem is visible from the sidebar.
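
The cached-view rejection described above can be sketched roughly as follows. `FakeSurfaceView` and `FakeTerminalManager` are hypothetical stand-ins (the real `GhosttySurfaceView` and `TerminalManager` are not reproduced on this page), the `String` terminal ID and the `makeView` factory are assumptions, and only the `hasSurface` check and the destroy-then-replace behavior are taken from the description.

```swift
// Hypothetical stand-in for GhosttySurfaceView.
final class FakeSurfaceView {
    let hasSurface: Bool
    private(set) var isDestroyed = false

    init(hasSurface: Bool) {
        self.hasSurface = hasSurface
    }

    func destroy() {
        isDestroyed = true
    }
}

// Hypothetical stand-in for TerminalManager's view cache.
final class FakeTerminalManager {
    private var cache: [String: FakeSurfaceView] = [:]
    private let makeView: () -> FakeSurfaceView  // assumed factory

    init(makeView: @escaping () -> FakeSurfaceView) {
        self.makeView = makeView
    }

    func surface(for terminalID: String) -> FakeSurfaceView {
        if let cached = cache[terminalID] {
            if cached.hasSurface {
                return cached
            }
            // A cached view without a surface is treated as a cache miss:
            // destroy it and fall through to build a replacement.
            cached.destroy()
            cache[terminalID] = nil
        }
        let fresh = makeView()
        cache[terminalID] = fresh
        return fresh
    }
}
```

The point of the design is that callers never need to know a view was broken: asking for the surface again is enough to get a fresh one.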
Out of scope: a session-level "failed" `SessionStatus`. The terminal-level state is enough to recover; adding a new session status would ripple through CLI, persistence, and validation.
Test plan
after `ghostty_surface_new` in `GhosttySurfaceView.createSurface`)
and confirm the retry log fires 4 times with backoff, then
`[Ghostty] createSurface() exhausted retries` is logged.
Retry button instead of an indefinite "Waiting for terminal" spinner.
`GhosttySurfaceView`, reaches `.shellReady`, and auto-launches Claude.
logs `cached view has nil surface, discarding` and rebuilds the view.
preinit still reach `.shellReady` within ~5s and auto-launch Claude.
🤖 Generated with Claude Code