fix(tui): drain vt emulator reply pipe to prevent compositor deadlock#362
Merged
dpup merged 4 commits intoMay 28, 2026
Merged
Conversation
When the child inside the container emitted any terminal query that required a reply (CSI c Primary Device Attributes, CSI 6n cursor position, etc.), moat's compositor froze on the first paint. From the user's perspective the screen rendered the initial banner and then never updated again — input still reached the child (so Ctrl+C typed twice into a "frozen" Claude actually terminated it), but no further output reached the host terminal. Root cause: vt.NewEmulator creates an internal io.Pipe for replies the emulator generates on the child's behalf. The emulator's handlers write replies to the pipe writer, expecting a consumer to drain the reader side. Moat never drained it. The first reply-bearing handler blocked inside io.PipeWriter.Write — which is reached from Writer.Write under w.mu — so the renderLoop goroutine could never reacquire the mutex to paint another frame. Fix: add an Injector interface and a SetInjector method on tui.Writer. On compositor entry, spawn a drain goroutine that reads from the emulator and forwards bytes to the injector (the existing InjectableReader in exec.go's stdin chain). If no injector is wired up, replies are drained to /dev/null — so the deadlock can never recur regardless of caller plumbing. The child still sees its replies when an injector is present. Shutdown: closing the emulator's input pipe writer (an io.PipeWriter, documented thread-safe) unblocks Read so the drain goroutine exits. We deliberately avoid vt.Emulator.Close(): it sets e.closed unsynchronized with concurrent Read calls and trips the race detector. Tests: TestWriter_CompositorMode_DeviceAttributesQuery_DoesNotDeadlock fails reliably without the fix (2s timeout via SIGQUIT goroutine dump shows the exact stack from production); a second test verifies the Primary DA reply reaches the injector. Full unit suite passes under -race.
Addresses review findings on majorcontext#362. The drain goroutine called inj.Inject directly, which delegates to io.PipeWriter.Write on the InjectableReader's pipe — that pipe is drained by Docker's stdin-copy goroutine, which back-pressures if the container's stdin reader stalls (paused child, PTY input buffer full, etc.). Under back-pressure Inject would block, drain would stall, and the NEXT reply-bearing emulator handler (CSI 6n, InBandResize from Writer.Resize on SIGWINCH, a second DA query) would block inside outputLocked under w.mu — the same deadlock this PR fixes, one indirection deeper. The stale-bytes leak on compositor exit→re-enter shared the same root cause: drain could be parked in Inject holding old-session bytes when the exit closed the emulator out from under it. Fix: interpose a bounded buffered channel between drain and a small forwarder goroutine. Drain reads em.pr → pushes onto the channel non-blocking (drop-on-full); forwarder reads channel → calls Inject. The emulator pipe is always drained promptly regardless of downstream liveness. Under sustained back-pressure we drop replies — acceptable for capability queries: real terminals occasionally drop these too, and the child falls back to defaults. New test TestWriter_CompositorMode_BlockedInjector_DoesNotBlockEmulator reproduces the back-pressure scenario (blocking injector + 100 CSI c queries in a row) and fails reliably without the decoupling. Also in this commit: - Fix the SetInjector docstring contradiction (paragraph 1 claimed not-calling-it causes deadlock; paragraph 2 correctly said replies are always drained). Reworded to lead with the always-drain invariant. - Make closeEmulatorReplyPipe log.Warn when InputPipe no longer satisfies io.Closer instead of silently no-opping — a vt-package API change would otherwise silently re-introduce the freeze. - Make TestWriter_CompositorMode_DeviceAttributesQuery_DoesNotDeadlock skip Cleanup on the timeout path so a future regression fails at 2s instead of waiting for go test's outer -timeout. Cleanup needs w.mu which the leaked sub-goroutine holds during a deadlock.
… name Addresses two follow-up review findings on majorcontext#362. closeEmulatorReplyPipe: rephrase the log.Warn for the io.Closer type-assertion-failure path. The previous message claimed a freeze would recur, which would mislead on-call diagnostics. The actual fallout is per-session leakage — the in-flight drain stays parked on em.Read once we orphan the emulator (nothing further writes to its pipe) — but each new compositor entry spawns a fresh emulator+drain pair that works correctly. The symptom is growing goroutine count, not a screen freeze. Also expand the surrounding godoc to spell this out so future readers don't repeat the misdiagnosis. enterCompositorLocked: replace a stale reference to a never-defined "stopCompositorLocked" with the real call sites (closeEmulatorReplyPipe, invoked from exitCompositorLocked / Reset / Cleanup). Greppability.
Collaborator
|
Nice find. Have you been able to manual test in both docker and apple containers? |
Contributor
Author
|
Claude starts with both docker and apple containers. |
|
@dpup would it be possible to get a release cut with this fix ASAP? Sucks to not be able to use the new version of claude w/ moat rn. |
Collaborator
|
@vanakema yeah, let me do that now. Sorry. The weird thing is it stopped hitting me... |
|
I see the release was cut; thank you so much @dpup! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes moat hanging on startup every session. Not sure what changed, the issue started today and I can't use moat without this fix. Description by Claude below.
Summary
When the child inside the container emits any terminal query that requires a reply (CSI c Primary Device Attributes, CSI 6n cursor position, in-band resize, color queries, etc.), moat's compositor freezes on the first paint. From the user's perspective the screen rendered the initial banner and then never updated again. Input still reached the child — so Ctrl+C typed twice into a "frozen" Claude actually terminated the container — but no further output reached the host terminal. With Claude Code 2.1.150+ this reproduces 100% of the time.
Root cause
vt.NewEmulatorconstructs an internalio.Pipefor replies the emulator generates on the child's behalf. The handlers write replies into the pipe writer, expecting a consumer to drain the reader side. Moat never drained it. The first reply-bearing handler blocked insideio.PipeWriter.Write— reached fromtui.Writer.Writeunderw.mu— sorenderLoopcould never reacquire the mutex to paint another frame.Goroutine dump from a stuck session showed the exact chain:
And, in another goroutine,
renderLoop [sync.Mutex.Lock, 3 minutes].Changes
internal/tui/writer.goInjectorinterface (single methodInject([]byte) error) and aninjectorfield onWriter, plusSetInjectorto wire it up.drainEmulatorRepliesgoroutine that copiesem.Read→injector.Inject. Bytes are still drained if no injector is wired, so the deadlock can't recur regardless of caller plumbing — the child just doesn't see its replies in that case and falls back to defaults.Reset/Cleanup: close the emulator's input pipe viaem.InputPipe().(io.Closer).Close()to unblockReadand let the drain goroutine exit. We deliberately avoidvt.Emulator.Close()— it writese.closedunsynchronized with concurrentReadcalls and trips-race. The pipe close is documented as safe for parallel Close-vs-Read and achieves the same effective shutdown.cmd/moat/cli/exec.goInjectableReaderfor the stdin chain, callstatusWriter.SetInjector(injectable). The injectable already exists for synthetic keystrokes (Ctrl+L fromresetTUI), so emulator replies travel the same recorded path and appear in trace dumps alongside user input.Test plan
TestWriter_CompositorMode_DeviceAttributesQuery_DoesNotDeadlock— sendsCSI cin compositor mode, fails (deadlock) without the fix, passes with it.TestWriter_CompositorMode_InjectsEmulatorReplies— verifies the Primary DA reply (begins withCSI ?) reaches the configured injector.go test -race ./internal/tui/clean.make test-unitpasses.moat claudein a project with amoat.yamlthat triggered the freeze 100% of the time before this PR — Claude now starts and accepts input normally.