
Fix: drain cloudflared stdout/stderr to prevent tunnel error 1033 #1

Merged
paoloanzn merged 2 commits into main from fix/drain-cloudflared-stdout on Mar 22, 2026

@paoloanzn paoloanzn commented Mar 22, 2026

Summary

Fixes a critical bug where Cloudflare Tunnel connections would consistently die after 10-15 minutes, returning error 1033 ("Argo Tunnel error — tunnel not running") to all visitors.

Root Cause

The cloudflared subprocess was being launched with its stdout/stderr piped through Go's io.Pipe, which has zero internal buffering. Since nothing in our code was reading from the pipe, the following sequence occurred:

  1. cloudflared writes log output continuously during normal operation
  2. Every Write() on the io.Pipe writer blocks until a matching Read() consumes the data, and no goroutine is calling Read() on the other end
  3. The cloudflared process freezes entirely — it can no longer send heartbeats to Cloudflare's edge network
  4. After ~10-15 minutes without heartbeats, Cloudflare declares the tunnel dead
  5. All subsequent requests to the tunnel hostname return Cloudflare error 1033

This is a well-known Go footgun: io.Pipe is a synchronous, unbuffered pipe — unlike OS pipes which have kernel-level buffering (typically 64KB+).
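
The difference between the two pipe types can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the helper names (writeCompletes, ioPipeWriteCompletes, osPipeWriteCompletes) and the 200 ms timeout are made up for illustration. It attempts one small write on each kind of pipe with no reader attached and reports whether the write returned.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// writeCompletes reports whether a single small Write to w returns within timeout.
func writeCompletes(w io.Writer, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		_, _ = w.Write([]byte("a log line\n"))
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// ioPipeWriteCompletes tries the write on an in-memory io.Pipe that nobody reads.
func ioPipeWriteCompletes() bool {
	pr, pw := io.Pipe()
	// Closing the read side on return unblocks the stuck writer goroutine.
	defer pr.Close()
	return writeCompletes(pw, 200*time.Millisecond)
}

// osPipeWriteCompletes tries the same write on a kernel pipe that nobody reads.
func osPipeWriteCompletes() (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()
	defer w.Close()
	return writeCompletes(w, 200*time.Millisecond), nil
}

func main() {
	fmt.Println("io.Pipe write completed:", ioPipeWriteCompletes())
	ok, err := osPipeWriteCompletes()
	if err != nil {
		fmt.Println("os.Pipe failed:", err)
		return
	}
	fmt.Println("os.Pipe write completed:", ok)
}
```

The kernel pipe only absorbs writes up to its buffer capacity, so it still needs a reader eventually; that is why the fix pairs os.Pipe() with goroutines that keep draining it.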

A secondary issue was that waitForShutdown() only listened for ctx.Done() (Ctrl+C) or TTL expiry. If cloudflared died for any reason, the CLI would hang silently forever without noticing.

Changes

internal/exec/runner.go

  • Replaced io.Pipe with OS pipes (os.Pipe()) — these have kernel-level buffering and won't block the subprocess
  • Added a 1 MB ring buffer (ringBuffer) that two goroutines continuously drain stdout/stderr into — the subprocess can never block on log output regardless of volume
  • Added exitCh channel — closed when the process exits, allowing callers to detect death immediately without polling
  • Rewrote Stop(), Wait(), and Running() to use the exit channel instead of calling cmd.Wait() multiple times (os/exec supports only one Wait() per command; a repeated call returns an error instead of the exit status)
  • Logs() now returns a snapshot of the ring buffer contents on demand
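
For readers unfamiliar with the pattern, here is a rough sketch of a bounded ring buffer fed by drain goroutines. It is an illustration under assumptions, not the code in runner.go: only the ringBuffer name comes from this PR, while newRingBuffer, Snapshot, and drain are hypothetical, and the demo uses a 16-byte buffer and in-memory readers in place of the 1 MB buffer and the subprocess pipes.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// ringBuffer keeps only the most recent len(buf) bytes written to it.
type ringBuffer struct {
	mu    sync.Mutex
	buf   []byte
	start int // index of the oldest stored byte
	n     int // number of valid bytes currently stored
}

func newRingBuffer(size int) *ringBuffer {
	return &ringBuffer{buf: make([]byte, size)}
}

// Write never waits for a reader and never fails; the oldest bytes are overwritten.
func (r *ringBuffer) Write(p []byte) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	size := len(r.buf)
	if size == 0 {
		return len(p), nil
	}
	if len(p) >= size {
		// Only the tail of p fits; it replaces the whole buffer.
		copy(r.buf, p[len(p)-size:])
		r.start, r.n = 0, size
		return len(p), nil
	}
	for _, b := range p {
		r.buf[(r.start+r.n)%size] = b
		if r.n < size {
			r.n++
		} else {
			r.start = (r.start + 1) % size
		}
	}
	return len(p), nil
}

// Snapshot returns a copy of the buffered bytes, oldest first.
func (r *ringBuffer) Snapshot() []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]byte, r.n)
	for i := 0; i < r.n; i++ {
		out[i] = r.buf[(r.start+i)%len(r.buf)]
	}
	return out
}

// drain copies every source into dst on its own goroutine and returns once
// all sources reach EOF. A real runner would leave these goroutines running
// in the background for the lifetime of the subprocess.
func drain(dst io.Writer, srcs ...io.Reader) {
	var wg sync.WaitGroup
	for _, src := range srcs {
		wg.Add(1)
		go func(src io.Reader) {
			defer wg.Done()
			_, _ = io.Copy(dst, src)
		}(src)
	}
	wg.Wait()
}

func main() {
	rb := newRingBuffer(16)
	// Stand-ins for the read ends of the stdout and stderr pipes.
	drain(rb, strings.NewReader("stdout: hello\n"), strings.NewReader("stderr: world\n"))
	fmt.Printf("last bytes: %q\n", rb.Snapshot())
}
```

Because the buffer has a fixed size, memory use stays bounded regardless of log volume, at the cost of discarding the oldest output.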

internal/tunnel/cloudflared.go

  • Added ExitCh() to the Connector interface
  • Implemented ExitCh() on ProcessConnector — delegates to the runner's exit channel
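
One way the exit channel behind ExitCh() could be built is sketched below. This is an assumption-laden illustration rather than the PR's runner: the runner type and startRunner are hypothetical, and the wait argument stands in for cmd.Wait so the example runs without a subprocess. A single goroutine calls wait exactly once, records the result, and closes the channel; everything else only observes the channel.

```go
package main

import (
	"errors"
	"fmt"
)

// runner tracks a process-like task through a channel that is closed on exit.
type runner struct {
	exitCh  chan struct{}
	waitErr error // written once before exitCh is closed
}

// startRunner calls wait exactly once in the background. In a real runner,
// wait would be cmd.Wait for an already started *exec.Cmd.
func startRunner(wait func() error) *runner {
	r := &runner{exitCh: make(chan struct{})}
	go func() {
		r.waitErr = wait()
		close(r.exitCh)
	}()
	return r
}

// ExitCh is closed when the task has exited.
func (r *runner) ExitCh() <-chan struct{} { return r.exitCh }

// Running reports whether the task is still alive, without blocking.
func (r *runner) Running() bool {
	select {
	case <-r.exitCh:
		return false
	default:
		return true
	}
}

// Wait blocks until exit and returns the recorded result. It is safe to
// call any number of times because it never calls wait itself.
func (r *runner) Wait() error {
	<-r.exitCh
	return r.waitErr
}

func main() {
	release := make(chan struct{})
	r := startRunner(func() error {
		<-release
		return errors.New("exit status 1")
	})
	fmt.Println("running:", r.Running())
	close(release)
	fmt.Println("first wait:", r.Wait())
	fmt.Println("second wait:", r.Wait())
	fmt.Println("running:", r.Running())
}
```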

internal/pipeline/serve.go

  • Updated waitForShutdown() to also select on connector.ExitCh() — if cloudflared exits unexpectedly, the pipeline now immediately prints a warning and begins teardown instead of hanging forever
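
The shape of that select can be sketched as follows. The signature, the shutdownReason type, and the simulate driver are assumptions made for illustration; the actual waitForShutdown() in serve.go may take different arguments and handle teardown itself.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// shutdownReason names why the wait ended (hypothetical type for this sketch).
type shutdownReason string

const (
	reasonCancelled shutdownReason = "context cancelled"
	reasonTTL       shutdownReason = "ttl expired"
	reasonExited    shutdownReason = "connector exited"
)

// waitForShutdown blocks until the context is cancelled, the TTL elapses,
// or the connector's exit channel is closed, whichever happens first.
// A ttl of zero or less disables the TTL case.
func waitForShutdown(ctx context.Context, ttl time.Duration, exitCh <-chan struct{}) shutdownReason {
	var ttlCh <-chan time.Time // nil channel: this case never fires
	if ttl > 0 {
		t := time.NewTimer(ttl)
		defer t.Stop()
		ttlCh = t.C
	}
	select {
	case <-ctx.Done():
		return reasonCancelled
	case <-ttlCh:
		return reasonTTL
	case <-exitCh:
		return reasonExited
	}
}

// simulate drives waitForShutdown with a background context and an exit
// channel that is either already closed or never closed.
func simulate(processExited bool, ttl time.Duration) shutdownReason {
	exitCh := make(chan struct{})
	if processExited {
		close(exitCh)
	}
	return waitForShutdown(context.Background(), ttl, exitCh)
}

func main() {
	fmt.Println(simulate(true, time.Minute))
	fmt.Println(simulate(false, 10*time.Millisecond))
}
```

Returning a reason lets the caller print a warning only for the unexpected-exit case before starting teardown.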

internal/testutil/mocks.go

  • Added FnExitCh and ExitCh() to MockConnector — defaults to a channel that never closes (simulating a healthy process)
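
A sketch of how such a mock might look is shown below. Only the MockConnector, FnExitCh, and ExitCh() names come from this PR; the field type and the isClosed helper are assumptions, and the real mock has other methods not shown here.

```go
package main

import "fmt"

// MockConnector is a test double with only the exit-channel hook shown.
type MockConnector struct {
	// FnExitCh, when set, overrides the default exit channel.
	FnExitCh func() <-chan struct{}
}

// ExitCh returns the override if present; otherwise it returns a fresh
// channel that is never closed, which looks like a healthy process.
func (m *MockConnector) ExitCh() <-chan struct{} {
	if m.FnExitCh != nil {
		return m.FnExitCh()
	}
	return make(chan struct{})
}

// isClosed reports, without blocking, whether ch has been closed.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	healthy := &MockConnector{}
	fmt.Println("healthy mock exited:", isClosed(healthy.ExitCh()))

	// Simulate a connector whose process has already died.
	dead := make(chan struct{})
	close(dead)
	crashed := &MockConnector{FnExitCh: func() <-chan struct{} { return dead }}
	fmt.Println("crashed mock exited:", isClosed(crashed.ExitCh()))
}
```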

Test plan

  • go build ./... — compiles cleanly
  • go test ./... — all tests pass (pipeline, cmd, config, origin, session)
  • Manual E2E: run flare serve builtin:static --path ./demo --subdomain test and verify that the tunnel stays alive beyond 15 minutes
  • Verify that killing cloudflared externally (kill <pid>) triggers immediate teardown instead of silent hang

🤖 Generated with Claude Code

paoloanzn and others added 2 commits March 22, 2026 19:07
… 1033)

The root cause of Cloudflare error 1033 ("tunnel not running") after 10-15
minutes was that cloudflared's stdout/stderr were piped through io.Pipe which
has zero buffering. Once cloudflared tried to write a log line and no reader
was draining it, the process would block indefinitely, stop sending heartbeats
to Cloudflare's edge, and Cloudflare would kill the connection.

Changes:
- Replace io.Pipe with OS pipes (kernel-buffered) + a 1 MB ring buffer that
  continuously drains stdout/stderr so the subprocess never blocks
- Add exitCh channel to Runner so callers can detect process death immediately
- Expose ExitCh() on Connector interface so the pipeline can monitor cloudflared
- Update waitForShutdown to detect cloudflared dying and trigger teardown
  instead of sitting silently forever

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rror 1033)

Root cause: exec.CommandContext sends SIGKILL to the subprocess the instant
the parent context is cancelled. Since cloudflared's context comes from
signal.NotifyContext(SIGINT, SIGTERM), any signal to the Go process would
immediately kill cloudflared with no grace period and no logs.

Changes:
- Replace exec.CommandContext with exec.Command — we manage process
  lifecycle ourselves via Stop() (SIGTERM → wait → SIGKILL)
- Add --no-autoupdate to cloudflared args to prevent self-restart
- Persist cloudflared logs to ~/.config/flare-cli/logs/ for post-mortem
  diagnostics when the process dies unexpectedly
- Dump last 50 lines of cloudflared output on unexpected exit
- Add LogFilePath() to Connector interface for log file discovery

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
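
The SIGTERM → wait → SIGKILL sequence from the second commit can be sketched as below. This is a Unix-only illustration under assumptions, not the PR's Stop(): startWithExitCh, stopGracefully, and demoStop are hypothetical names, the 2-second grace period is arbitrary, and sleep stands in for cloudflared.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// startWithExitCh starts the command with exec.Command (no context, so no
// automatic kill on cancellation) and returns a channel closed on exit.
func startWithExitCh(name string, args ...string) (*exec.Cmd, <-chan struct{}, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, nil, err
	}
	exitCh := make(chan struct{})
	go func() {
		_ = cmd.Wait() // the only Wait call for this command
		close(exitCh)
	}()
	return cmd, exitCh, nil
}

// stopGracefully sends SIGTERM, waits up to grace for the process to exit,
// and falls back to SIGKILL. It returns true if SIGKILL was not needed.
func stopGracefully(cmd *exec.Cmd, exitCh <-chan struct{}, grace time.Duration) bool {
	_ = cmd.Process.Signal(syscall.SIGTERM)
	select {
	case <-exitCh:
		return true
	case <-time.After(grace):
		_ = cmd.Process.Kill()
		<-exitCh
		return false
	}
}

// demoStop runs the sequence against a long sleep as a stand-in subprocess.
func demoStop() (bool, error) {
	cmd, exitCh, err := startWithExitCh("sleep", "30")
	if err != nil {
		return false, err
	}
	return stopGracefully(cmd, exitCh, 2*time.Second), nil
}

func main() {
	graceful, err := demoStop()
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("exited on SIGTERM:", graceful)
}
```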
@paoloanzn paoloanzn merged commit da66cd1 into main Mar 22, 2026