Skip to content

fix(tui): drain vt emulator reply pipe to prevent compositor deadlock#362

Merged
dpup merged 4 commits into
majorcontext:mainfrom
abezzub-dr:fix/tui-compositor-emulator-reply-deadlock
May 28, 2026
Merged

fix(tui): drain vt emulator reply pipe to prevent compositor deadlock#362
dpup merged 4 commits into
majorcontext:mainfrom
abezzub-dr:fix/tui-compositor-emulator-reply-deadlock

Conversation

@abezzub-dr
Copy link
Copy Markdown
Contributor

@abezzub-dr abezzub-dr commented May 28, 2026

Fixes moat hanging on startup every session. Not sure what changed, the issue started today and I can't use moat without this fix. Description by Claude below.

Summary

When the child inside the container emits any terminal query that requires a reply (CSI c Primary Device Attributes, CSI 6n cursor position, in-band resize, color queries, etc.), moat's compositor freezes on the first paint. From the user's perspective the screen rendered the initial banner and then never updated again. Input still reached the child — so Ctrl+C typed twice into a "frozen" Claude actually terminated the container — but no further output reached the host terminal. With Claude Code 2.1.150+ this reproduces 100% of the time.

Root cause

vt.NewEmulator constructs an internal io.Pipe for replies the emulator generates on the child's behalf. The handlers write replies into the pipe writer, expecting a consumer to drain the reader side. Moat never drained it. The first reply-bearing handler blocked inside io.PipeWriter.Write — reached from tui.Writer.Write under w.mu — so renderLoop could never reacquire the mutex to paint another frame.

Goroutine dump from a stuck session showed the exact chain:

docker.StartAttached.func1                ← io.Copy(opts.Stdout, resp.Reader)
  io.multiWriter.Write → trace.RecordingWriter.Write
    tui.Writer.Write                       (writer.go:171, HOLDS w.mu)
      processDataLocked → outputLocked
        vt.Emulator.Write → ansi.Parser.advance → handleCsi 'c'
          io.PipeWriter.Write              ← BLOCKED FOREVER (no reader)

And, in another goroutine, renderLoop [sync.Mutex.Lock, 3 minutes].

Changes

internal/tui/writer.go

  • Add Injector interface (single method Inject([]byte) error) and an injector field on Writer, plus SetInjector to wire it up.
  • On compositor entry, spawn drainEmulatorReplies goroutine that copies em.Readinjector.Inject. Bytes are still drained if no injector is wired, so the deadlock can't recur regardless of caller plumbing — the child just doesn't see its replies in that case and falls back to defaults.
  • On compositor exit / Reset / Cleanup: close the emulator's input pipe via em.InputPipe().(io.Closer).Close() to unblock Read and let the drain goroutine exit. We deliberately avoid vt.Emulator.Close() — it writes e.closed unsynchronized with concurrent Read calls and trips -race. The pipe close is documented as safe for parallel Close-vs-Read and achieves the same effective shutdown.

cmd/moat/cli/exec.go

  • After constructing the InjectableReader for the stdin chain, call statusWriter.SetInjector(injectable). The injectable already exists for synthetic keystrokes (Ctrl+L from resetTUI), so emulator replies travel the same recorded path and appear in trace dumps alongside user input.

Test plan

  • New test TestWriter_CompositorMode_DeviceAttributesQuery_DoesNotDeadlock — sends CSI c in compositor mode, fails (deadlock) without the fix, passes with it.
  • New test TestWriter_CompositorMode_InjectsEmulatorReplies — verifies the Primary DA reply (begins with CSI ?) reaches the configured injector.
  • go test -race ./internal/tui/ clean.
  • make test-unit passes.
  • Manual: rebuilt binary, ran moat claude in a project with a moat.yaml that triggered the freeze 100% of the time before this PR — Claude now starts and accepts input normally.

When the child inside the container emitted any terminal query that
required a reply (CSI c Primary Device Attributes, CSI 6n cursor
position, etc.), moat's compositor froze on the first paint. From the
user's perspective the screen rendered the initial banner and then
never updated again — input still reached the child (so Ctrl+C typed
twice into a "frozen" Claude actually terminated it), but no further
output reached the host terminal.

Root cause: vt.NewEmulator creates an internal io.Pipe for replies the
emulator generates on the child's behalf. The emulator's handlers
write replies to the pipe writer, expecting a consumer to drain the
reader side. Moat never drained it. The first reply-bearing handler
blocked inside io.PipeWriter.Write — which is reached from Writer.Write
under w.mu — so the renderLoop goroutine could never reacquire the
mutex to paint another frame.

Fix: add an Injector interface and a SetInjector method on tui.Writer.
On compositor entry, spawn a drain goroutine that reads from the
emulator and forwards bytes to the injector (the existing
InjectableReader in exec.go's stdin chain). If no injector is wired
up, replies are drained to /dev/null — so the deadlock can never
recur regardless of caller plumbing. The child still sees its replies
when an injector is present.

Shutdown: closing the emulator's input pipe writer (an io.PipeWriter,
documented thread-safe) unblocks Read so the drain goroutine exits.
We deliberately avoid vt.Emulator.Close(): it sets e.closed
unsynchronized with concurrent Read calls and trips the race detector.

Tests: TestWriter_CompositorMode_DeviceAttributesQuery_DoesNotDeadlock
fails reliably without the fix (2s timeout via SIGQUIT goroutine dump
shows the exact stack from production); a second test verifies the
Primary DA reply reaches the injector. Full unit suite passes under
-race.
Addresses review findings on majorcontext#362.

The drain goroutine called inj.Inject directly, which delegates to
io.PipeWriter.Write on the InjectableReader's pipe — that pipe is
drained by Docker's stdin-copy goroutine, which back-pressures if the
container's stdin reader stalls (paused child, PTY input buffer full,
etc.). Under back-pressure Inject would block, drain would stall, and
the NEXT reply-bearing emulator handler (CSI 6n, InBandResize from
Writer.Resize on SIGWINCH, a second DA query) would block inside
outputLocked under w.mu — the same deadlock this PR fixes, one
indirection deeper. The stale-bytes leak on compositor exit→re-enter
shared the same root cause: drain could be parked in Inject holding
old-session bytes when the exit closed the emulator out from under it.

Fix: interpose a bounded buffered channel between drain and a small
forwarder goroutine. Drain reads em.pr → pushes onto the channel
non-blocking (drop-on-full); forwarder reads channel → calls Inject.
The emulator pipe is always drained promptly regardless of downstream
liveness. Under sustained back-pressure we drop replies — acceptable
for capability queries: real terminals occasionally drop these too,
and the child falls back to defaults.

New test TestWriter_CompositorMode_BlockedInjector_DoesNotBlockEmulator
reproduces the back-pressure scenario (blocking injector + 100 CSI c
queries in a row) and fails reliably without the decoupling.

Also in this commit:

- Fix the SetInjector docstring contradiction (paragraph 1 claimed
  not-calling-it causes deadlock; paragraph 2 correctly said replies
  are always drained). Reworded to lead with the always-drain invariant.
- Make closeEmulatorReplyPipe log.Warn when InputPipe no longer
  satisfies io.Closer instead of silently no-opping — a vt-package API
  change would otherwise silently re-introduce the freeze.
- Make TestWriter_CompositorMode_DeviceAttributesQuery_DoesNotDeadlock
  skip Cleanup on the timeout path so a future regression fails at 2s
  instead of waiting for go test's outer -timeout. Cleanup needs w.mu
  which the leaked sub-goroutine holds during a deadlock.
… name

Addresses two follow-up review findings on majorcontext#362.

closeEmulatorReplyPipe: rephrase the log.Warn for the io.Closer
type-assertion-failure path. The previous message claimed a freeze
would recur, which would mislead on-call diagnostics. The actual
fallout is per-session leakage — the in-flight drain stays parked on
em.Read once we orphan the emulator (nothing further writes to its
pipe) — but each new compositor entry spawns a fresh emulator+drain
pair that works correctly. The symptom is growing goroutine count,
not a screen freeze. Also expand the surrounding godoc to spell this
out so future readers don't repeat the misdiagnosis.

enterCompositorLocked: replace a stale reference to a never-defined
"stopCompositorLocked" with the real call sites (closeEmulatorReplyPipe,
invoked from exitCompositorLocked / Reset / Cleanup). Greppability.
@dpup
Copy link
Copy Markdown
Collaborator

dpup commented May 28, 2026

Nice find. Have you been able to manual test in both docker and apple containers?

@abezzub-dr
Copy link
Copy Markdown
Contributor Author

Claude starts with both docker and apple containers.

@dpup dpup merged commit 7674d46 into majorcontext:main May 28, 2026
4 checks passed
@vanakema
Copy link
Copy Markdown

vanakema commented Jun 1, 2026

@dpup would it be possible to get a release cut with this fix ASAP? Sucks to not be able to use the new version of claude w/ moat rn.

@dpup
Copy link
Copy Markdown
Collaborator

dpup commented Jun 1, 2026

@vanakema yeah, let me do that now. Sorry. The weird thing is it stopped hitting me...

@dpup dpup mentioned this pull request Jun 1, 2026
2 tasks
@vanakema
Copy link
Copy Markdown

vanakema commented Jun 1, 2026

I see the release was cut; thank you so much @dpup!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants