Buffered requestResignChan #1207

Merged
brandur merged 1 commit into master from brandur-buffered-request-resign-chan
Apr 13, 2026

@brandur brandur commented Apr 12, 2026

Aims to fix another intermittent test failure:

https://github.com/riverqueue/river/actions/runs/24312716933/job/70985012980?pr=1205

--- FAIL: TestElector_WithNotifier (0.00s)
    --- FAIL: TestElector_WithNotifier/RequestResignStress (20.41s)
        elector_test.go:350: Generated postgres schema "leadership_2026_04_12t17_52_43_schema_04" with migrations [1 2 3 4 5 6] on line "main" in 229.626141ms [4 generated] [0 reused]
        elector_test.go:352: Starting test_client_id
        elector_test.go:375: Requesting leadership resign
        elector_test.go:375: Requesting leadership resign
        elector_test.go:375: Requesting leadership resign
        elector_test.go:375: Requesting leadership resign
        elector_test.go:375: Requesting leadership resign
        test_signal.go:95: timed out waiting on test signal after 10s
        test_signal.go:95: timed out waiting on test signal after 10s
        riverdbtest.go:293: Checked in postgres schema "leadership_2026_04_12t17_52_43_schema_04"; 1 idle schema(s) [5 generated] [3 reused]
FAIL
FAIL	github.com/riverqueue/river/internal/leadership	20.948s

The problem is that when `requestResignChan` is unbuffered, a send in a
`select` with a `default` case succeeds only if a receiver is already
waiting. If `keepLeadershipLoop` hasn't yet entered its `select`, the
`default` case below causes all senders (there are 5 competing senders
in the `RequestResignStress` test case) to fall through without sending
anything:

select {
case <-ctx.Done():
case e.requestResignChan <- struct{}{}:
default:
	// if context is not done and requestResignChan has an item in it
	// already, do nothing
}

With a buffered channel, a send can land in the buffer even when no
receiver is waiting, so at least one sender is guaranteed to get a
message through and the test no longer hangs.

brandur commented Apr 12, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.

@brandur brandur requested a review from bgentry April 12, 2026 18:19

brandur commented Apr 12, 2026

@bgentry Again, kind of hoping that we can rely on Claude/Codex to cross-reference their work and root out any tricky concurrency problems here.

@bgentry bgentry left a comment

Good fix. Not something we can reasonably add a test for, right?

@brandur brandur force-pushed the brandur-buffered-request-resign-chan branch from cc43053 to bedab2f Compare April 13, 2026 04:46

brandur commented Apr 13, 2026

> Good fix. Not something we can reasonably add a test for, right?

Yeah, actually turns out we can write a test for this! Added.

@brandur brandur merged commit 27a4c30 into master Apr 13, 2026
15 checks passed
@brandur brandur deleted the brandur-buffered-request-resign-chan branch April 13, 2026 05:04