Fix: drain cloudflared stdout/stderr to prevent tunnel error 1033 #1
Merged
… 1033)
The root cause of Cloudflare error 1033 ("tunnel not running") after 10-15
minutes was that cloudflared's stdout/stderr were piped through io.Pipe which
has zero buffering. Once cloudflared tried to write a log line and no reader
was draining it, the process would block indefinitely, stop sending heartbeats
to Cloudflare's edge, and Cloudflare would kill the connection.
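The blocking behavior described above can be reproduced in isolation. The following is a minimal Go sketch, not code from this PR. It writes one log-sized line into an io.Pipe and into an os.Pipe without ever reading from either, then reports whether each Write returns within a short timeout. The log line, the 200 ms timeout, and the helper names writeCompletes and comparePipes are illustrative assumptions.

```go
package main

import (
	"io"
	"os"
	"time"
)

// writeCompletes reports whether one Write of msg to w returns within
// timeout while nobody is reading from the other end of the pipe.
func writeCompletes(w io.Writer, msg []byte, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		_, _ = w.Write(msg)
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// comparePipes writes a single line into each kind of pipe without reading.
func comparePipes() (ioPipeOK, osPipeOK bool, err error) {
	msg := []byte("INF Registered tunnel connection\n") // hypothetical log line
	const timeout = 200 * time.Millisecond

	// io.Pipe has no buffer: every Write waits for a matching Read.
	ir, iw := io.Pipe()
	ioPipeOK = writeCompletes(iw, msg, timeout)
	ir.Close() // unblocks the pending Write, which returns io.ErrClosedPipe
	iw.Close()

	// os.Pipe is backed by a kernel buffer, so a small Write returns at once.
	or, ow, err := os.Pipe()
	if err != nil {
		return ioPipeOK, false, err
	}
	defer or.Close()
	defer ow.Close()
	osPipeOK = writeCompletes(ow, msg, timeout)
	return ioPipeOK, osPipeOK, nil
}
```

comparePipes should report false for io.Pipe and true for os.Pipe. The kernel buffer is finite, though: once it fills, a write into an os.Pipe blocks as well. That is why the change below also keeps a reader continuously draining both streams.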
Changes:
- Replace io.Pipe with OS pipes (kernel-buffered) + a 1 MB ring buffer that
continuously drains stdout/stderr so the subprocess never blocks
- Add exitCh channel to Runner so callers can detect process death immediately
- Expose ExitCh() on Connector interface so the pipeline can monitor cloudflared
- Update waitForShutdown to detect cloudflared dying and trigger teardown
instead of sitting silently forever
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rror 1033)
Root cause: exec.CommandContext sends SIGKILL to the subprocess the instant the parent context is cancelled. Since cloudflared's context comes from signal.NotifyContext(SIGINT, SIGTERM), any signal to the Go process would immediately kill cloudflared with no grace period and no logs.
Changes:
- Replace exec.CommandContext with exec.Command; we manage the process lifecycle ourselves via Stop() (SIGTERM → wait → SIGKILL)
- Add --no-autoupdate to cloudflared args to prevent self-restart
- Persist cloudflared logs to ~/.config/flare-cli/logs/ for post-mortem diagnostics when the process dies unexpectedly
- Dump the last 50 lines of cloudflared output on unexpected exit
- Add LogFilePath() to the Connector interface for log file discovery
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes a critical bug where Cloudflare Tunnel connections would consistently die after 10-15 minutes, returning error 1033 ("Argo Tunnel error — tunnel not running") to all visitors.
Root Cause
The cloudflared subprocess was being launched with its stdout/stderr piped through Go's io.Pipe, which has zero internal buffering. Since nothing in our code was reading from the pipe, the following sequence occurred:
1. cloudflared writes log output continuously during normal operation
2. io.Pipe.Write() blocks immediately because no goroutine is calling Read() on the other end
This is a well-known Go footgun: io.Pipe is a synchronous, unbuffered pipe, unlike OS pipes, which have kernel-level buffering (64 KB by default on Linux).
A secondary issue was that waitForShutdown() only listened for ctx.Done() (Ctrl+C) or TTL expiry. If cloudflared died for any reason, the CLI would hang silently forever without noticing.
Changes
internal/exec/runner.go
- Replace io.Pipe with OS pipes (os.Pipe()), which have kernel-level buffering and won't block the subprocess
- Add a 1 MB ring buffer (ringBuffer) into which two goroutines continuously drain stdout/stderr, so the subprocess can never block on log output regardless of volume
- Add an exitCh channel that is closed when the process exits, allowing callers to detect its death immediately without polling
- Rework Stop(), Wait(), and Running() to use the exit channel instead of calling cmd.Wait() multiple times (exec.Cmd.Wait may only be called once; later calls return an error)
- Logs() now returns a snapshot of the ring buffer contents on demand
internal/tunnel/cloudflared.go
- Add ExitCh() to the Connector interface
- Implement ExitCh() on ProcessConnector, delegating to the runner's exit channel
internal/pipeline/serve.go
- Update waitForShutdown() to also select on connector.ExitCh(): if cloudflared exits unexpectedly, the pipeline now immediately prints a warning and begins teardown instead of hanging forever
internal/testutil/mocks.go
- Add FnExitCh and ExitCh() to MockConnector, defaulting to a channel that never closes (simulating a healthy process)
Test plan
- go build ./... compiles cleanly
- All tests pass under go test ./... (pipeline, cmd, config, origin, session)
- Run flare serve builtin:static --path ./demo --subdomain test and verify the tunnel stays alive beyond 15 minutes
- Verify that killing cloudflared (kill <pid>) triggers immediate teardown instead of a silent hang
🤖 Generated with Claude Code