streaming: prevent reader Close from deadlocking on a fatal read error #78

Merged
raphael merged 3 commits into main from fix/reader-close-self-deadlock on May 28, 2026

@raphael raphael commented May 28, 2026

Summary

Follow-up to #77. That PR fixed Close deadlocking on a stalled subscriber; this fixes a sibling deadlock in the same area that I spotted while reviewing it.

On a fatal read error — handleReadError returns the error when Redis reports the stream key no longer exists — the reader's read loop calls Close() on itself:

if err := handleReadError(err, r.logger); err != nil {
    r.logger.Error(...)
    r.Close()
    return
}

But Close calls r.wait.Wait(), which blocks until the read goroutine runs its deferred cleanup() (which calls wait.Done()). Because that goroutine is the one blocked inside Close, cleanup never runs and the call deadlocks: the read goroutine and its Redis connection leak, and any external Close() blocks forever too. Same goroutine + connection leak as #77, different trigger.
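The self-deadlock can be reproduced in isolation. The following is a minimal sketch, not the package's code: the `reader` type, `readLoop`, and `selfCloseHangs` are hypothetical stand-ins that model only the wait-group interaction described above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reader is a hypothetical stand-in for the real Reader; only the
// wait group is modelled.
type reader struct {
	wait sync.WaitGroup
}

// Close blocks until the read goroutine has called wait.Done.
func (r *reader) Close() {
	r.wait.Wait()
}

// readLoop mimics the pre-fix fatal path: it calls Close on itself
// while its own deferred Done is still pending.
func (r *reader) readLoop(exited chan<- struct{}) {
	defer close(exited)
	defer r.wait.Done()
	r.Close() // waits for the Done above, which cannot run until Close returns
}

// selfCloseHangs reports whether the read goroutine is still stuck
// after a short timeout.
func selfCloseHangs() bool {
	r := &reader{}
	r.wait.Add(1)
	exited := make(chan struct{})
	go r.readLoop(exited)
	select {
	case <-exited:
		return false
	case <-time.After(100 * time.Millisecond):
		return true // the goroutine is leaked, blocked inside Close
	}
}

func main() {
	fmt.Println("self-close hangs:", selfCloseHangs())
}
```

The timeout is only there so the sketch can observe the hang from outside; the stuck goroutine itself never recovers.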

Fix

Trigger the shutdown asynchronously (pulse.Go(r.logger, r.Close)) and let the read goroutine return so cleanup releases the wait group. Close is already idempotent via sync.Once (from #77), so this is safe alongside a concurrent external Close.
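A rough sketch of the shape of the fix, under stated assumptions: the `reader` type and its fields are hypothetical, and a plain `go` statement stands in for `pulse.Go(r.logger, r.Close)`, so nothing here should be read as the package's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reader is a hypothetical stand-in for the real Reader.
type reader struct {
	wait      sync.WaitGroup
	closeOnce sync.Once
	done      chan struct{} // closed once to signal shutdown
}

func newReader() *reader {
	r := &reader{done: make(chan struct{})}
	r.wait.Add(1)
	go r.readLoop()
	return r
}

// Close is idempotent: the shutdown signal fires once, and every
// caller then waits for the read goroutine to finish.
func (r *reader) Close() {
	r.closeOnce.Do(func() {
		close(r.done)
	})
	r.wait.Wait()
}

// readLoop simulates an immediate fatal read error. Close is started
// on a separate goroutine so this one can return and run its deferred
// Done, which is what unblocks Close.
func (r *reader) readLoop() {
	defer r.wait.Done()
	go r.Close() // plain goroutine used here in place of pulse.Go
}

// asyncCloseReturns reports whether an external Close, racing the
// internal one, returns within a generous timeout.
func asyncCloseReturns() bool {
	r := newReader()
	returned := make(chan struct{})
	go func() {
		r.Close()
		close(returned)
	}()
	select {
	case <-returned:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("async close returns:", asyncCloseReturns())
}
```

The design relies on Close being safe to call twice: the internal and external calls may overlap, and `sync.Once` keeps the shutdown signal from firing more than once.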

Only Reader is affected — Sink.read logs and continues on read errors rather than closing itself.

Note on the test

The fatal path is only reached when Redis returns "stream key no longer exists", which it does not emit deterministically for a blocking XREAD (deleting the key doesn't unblock the call, so a Redis-driven test would be timing-dependent). To get a deterministic regression test that exercises the real read() loop, I added a small package-level xreadFn seam that defaults to (*Reader).xread and is only overridden in tests to inject the fatal error. Happy to drop it if you'd rather not carry the seam.
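For readers unfamiliar with the pattern, a package-level seam of this kind might look roughly like the sketch below. The `xread` signature, `readOnce`, `errStreamGone`, and `readWithInjectedError` are assumptions made for illustration; only the names `xreadFn` and `(*Reader).xread` come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// Reader is a hypothetical stand-in; the real type has more fields.
type Reader struct{}

// xread stands in for the real blocking read. Its signature here is
// an assumption.
func (r *Reader) xread() (string, error) {
	return "event", nil
}

// xreadFn is the seam: a method expression that defaults to the real
// read and is only reassigned by tests.
var xreadFn = (*Reader).xread

// errStreamGone is a hypothetical error used to simulate the fatal case.
var errStreamGone = errors.New("stream key no longer exists")

// readOnce goes through the seam instead of calling r.xread directly.
func (r *Reader) readOnce() (string, error) {
	return xreadFn(r)
}

// readWithInjectedError shows how a test could override the seam and
// restore it afterwards.
func readWithInjectedError() error {
	orig := xreadFn
	defer func() { xreadFn = orig }()
	xreadFn = func(*Reader) (string, error) {
		return "", errStreamGone
	}
	_, err := (&Reader{}).readOnce()
	return err
}

func main() {
	fmt.Println("injected:", readWithInjectedError())
	v, err := (&Reader{}).readOnce()
	fmt.Println("default:", v, err)
}
```

Because the variable is shared package state, tests that override it should restore the default and avoid running in parallel with other tests that read through it.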

Test plan

  • go test -race ./streaming -run TestReaderCloseOnFatalReadError -count=10 — new regression test; hangs on pre-fix code (verified by reverting the fix), passes here.
  • go test -race ./streaming -count=1

raphael added 3 commits May 28, 2026 11:56
On a fatal read error (e.g. the underlying stream key being destroyed)
the reader read loop called Close synchronously. Close waits on the read
goroutine via wait.Wait, so calling it from that same goroutine deadlocked
and leaked the reader and its Redis connection; any external Close blocked
forever too.

Trigger the shutdown asynchronously so the read goroutine can return and
release the wait group. Close is already idempotent via sync.Once, so this
is safe alongside a concurrent external Close.

Adds a regression test that drives the real read loop through a simulated
fatal error (via a small xreadFn seam) and asserts the reader closes
instead of hanging. The test hangs on pre-fix code.

Move the test-only read hook from a Reader struct field to a package
variable so it no longer pollutes Reader. Behavior is unchanged; it
defaults to (*Reader).xread and is only overridden in tests.
@raphael raphael enabled auto-merge (squash) May 28, 2026 19:09
@raphael raphael merged commit 4dd6a29 into main May 28, 2026
5 checks passed
@raphael raphael deleted the fix/reader-close-self-deadlock branch May 28, 2026 19:10