Phase 4 (#135): instrumentation + remove starving 300ms Task.sleep#139
Merged
Phase 4 (#135): instrumentation + remove starving 300ms Task.sleep#139
Conversation
Phase 4A of issue #135. Phases 1-3 catch the daemon post-reload wedge (CLI stall-detects at 30s, test harness pthread-terminates at 30s), but we still don't know WHERE the handler hangs. The server stderr dump from PR #134 currently goes silent after "Reloaded!", leaving root-cause analysis speculative. This PR adds ~8 targeted `fputs` markers through `handlePreviewSnapshot` and one after the reload's `MainActor.run` block. Each marker writes to the server subprocess's stderr — captured in the per-instance temp file and dumped by PR #134's post-failure CI step. When the next flake hits, the CI log will show exactly how far the handler got before the wedge: [snapshot] enter [snapshot] pre session-check ← MainActor queue blocked here? [snapshot] post session-check [snapshot] post 300ms sleep ← Task.sleep problem? [snapshot] main-actor capture enter ← MainActor work starts [snapshot] main-actor capture done ← Snapshot.capture returned [snapshot] encoding [snapshot] returning Plus on the reload side, after the MainActor.run block containing `loadPreview`: [reload] main-actor block returned No behavior change. `fputs` + `fflush` is synchronous libc, no Swift concurrency involvement, no allocations beyond the literal strings. Verified locally: the full 7-stage sequence appears on the happy path of `hotReloadStructural`. Phase 4B (the actual fix) is gated on seeing what a real CI failure shows. This PR is the diagnostic prerequisite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-1 review of #139 flagged three optional polishes worth applying before merge: - The `[snapshot] main-actor capture enter` marker fires on entry but the `captureFailed` throw path (when `host.window(for:)` returns nil) exits the MainActor block without a matching marker. Added a do/catch around the block with a `[snapshot] main-actor capture threw: ...` marker. This way the CI dump shows we exited the block one way or another, even on error paths. - Added a `TODO(#135-4B)` tag on the instrumentation's doc comment so the cleanup obligation is grep-visible when Phase 4B lands. - Expanded the doc comment with two clarifications the reviewer verified: (1) markers are not user-visible because `DaemonClient.registerStderrLogForwarder` forwards MCP `LogMessageNotification` payloads, not raw daemon stderr; (2) macOS snapshot handler only — iOS returns early above. No behavior change. `swift test --filter hotReloadStructural` passes in 8.5s; manually verified the new throw-path marker does not fire on the happy path (only on the failure path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4A (same PR, prior commit) added markers that caught the
post-reload wedge on the first CI run after merge. Run 72439165932's
server stderr dump showed:
[snapshot] enter
[snapshot] pre session-check
[snapshot] post session-check ← last marker before silence
The next marker `[snapshot] post 300ms sleep` never fires. The ONLY
code between those two markers is:
guard isMacOSSession else { ... } // no await
try await Task.sleep(for: .milliseconds(300)) // ← wedges here
The hang is `Task.sleep`, which ultimately depends on Swift's
cooperative scheduler to resume. On a hot-reload test that polls
snapshots rapidly without swiftc breaks (the literal-only path in
particular — dozens of rapid snapshot calls per `awaitSnapshotChange`
loop), the pool accumulates enough load that the timer's continuation
stays unscheduled. Test `hotReloadLiteralOnly` hit this at around
the 8th snapshot cycle on CI run 72439165932.
This is exactly the failure mode issue #135 described — and the
reason PR #136 built a pthread-based timeout for the test harness.
Same pattern, different place.
The sleep itself is unnecessary: `Snapshot.capture` calls
`NSView.cacheDisplay(in:to:)`, which forces layout synchronously
when the view is dirty. The MCP client→server round trip already
gives the UI plenty of time to settle between view mount and
capture. Verified: all 7 MacOSMCPTests pass without the sleep —
including `hotReloadLiteralOnly` and `hotReloadStructural`, both
of which assert image bytes actually change after the reload.
The sleep was added in the initial commit (820b0cd) with the
generic comment "Wait briefly for SwiftUI to finish layout". No
evidence it was ever load-bearing; appears to have been defensive
paranoia that turned load-bearing only as an unintended starvation
trigger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes issue #135.
Origin and plan change
Opened as Phase 4A — diagnostic-only instrumentation to localize the post-reload wedge that Phases 1-3 caught-but-didn't-fix. The first CI run on this PR (run 72439165932) caught the wedge in action and pinpointed it to one specific line, so Phase 4B (the actual fix) got added to this same PR rather than a follow-up.
What the markers caught
From the CI dump, server `61077F6F`'s stderr after ~8 successful snapshot cycles:
```
[snapshot] enter
[snapshot] pre session-check
[snapshot] post session-check ← last marker before 10-minute silence
```
The ONLY code between `[snapshot] post session-check` and the never-fired `[snapshot] post 300ms sleep` is:
```swift
guard isMacOSSession else { ... } // no await
try await Task.sleep(for: .milliseconds(300)) // ← wedges here
```
`Task.sleep` ultimately depends on Swift's cooperative scheduler to resume. Under rapid snapshot polling (literal-only path has no swiftc breaks to drain the pool), the timer's continuation stays unscheduled — classic cooperative-pool starvation, the same class of failure PR #136 addressed for the test harness.
The fix
Remove the 300ms sleep. It was added in the initial commit (820b0cd) with the generic comment "Wait briefly for SwiftUI to finish layout." It's unnecessary: `Snapshot.capture` calls `NSView.cacheDisplay(in:to:)`, which forces layout synchronously when the view is dirty. The client→server round-trip already gives the UI plenty of time to settle.
All 7 `MacOSMCPTests` pass without the sleep — including both hot-reload tests, which assert image bytes actually change after the reload (would fail if capture returned stale frames).
Diagnostic markers stay
Keeping the `[snapshot]` and `[reload]` markers in place for at least one release cycle. They fire on every snapshot call on the macOS path (test-only visibility — `DaemonClient.registerStderrLogForwarder` doesn't forward raw daemon stderr), and they're the only reason we found the root cause. Tagged with `TODO(#135-4B)` for future cleanup. Follow-up PR can remove them once we're confident no other failure mode lurks in this handler.
Test plan
Related
Closes #135. Builds on all prior work:
previewsmcp logssubcommand and promote Debugging section #133 — the original logs subcommand that this whole epic unblocked🤖 Generated with Claude Code