Fix client state not being reset after failed Start() by mfly · Pull Request #1136 · riverqueue/river

mfly · 2026-01-28T14:05:11Z

When Client.Start() failed (e.g., due to database connection errors or missing tables), the internal isRunning flag remained true. This caused subsequent Start() calls to return nil immediately without actually attempting to start the client, leaving the application in a non-functional state where jobs were never processed.

The fix adds a new StartFailed() method to BaseStartStop that properly resets internal state after a startup failure. This is called when Start() encounters a real error (not when Stop() cancels the context).

mfly · 2026-01-29T10:02:18Z

Fixed the deadlock revealed by the stress tests; they pass locally now.

When Client.Start() failed (e.g., due to database connection errors or missing tables), the internal isRunning flag remained true. This caused subsequent Start() calls to return nil immediately without actually attempting to start the client, leaving the application in a non-functional state where jobs were never processed. The fix calls baseStartStop.Stop() after closing the stopped channel on startup failure, which properly resets the client's internal state so that Start() can be called again.

When Start() failed due to a real error (e.g., database connection failure), the client's internal state was left marked as running, so subsequent Start() calls never actually started the client. Add a StartFailed() method to BaseStartStop that properly resets internal state after a startup failure. This is separate from Stop() handling: when Stop() cancels the context (ErrStop), Stop() itself handles cleanup via finalizeStop(). Fixes the issue where a client could not be restarted after a transient startup failure.

bgentry · 2026-02-06T02:28:29Z

Awesome find and fix, thank you @mfly! While we're reviewing this, would you mind adding yourself to the CLA?

bgentry

@brandur I suspect you're going to have some strong opinions on how you'd like to see this fixed, so I'm going to assign it to you 😄

brandur · 2026-02-08T22:52:22Z

@mfly Thanks for this! Let me spend a little time looking a bit more closely. The idea behind the startstop infra is that it gives you a reliable way to make sure all these edge conditions are handled correctly. It'd be nice if we could get a more hands-free resolution that'd work anywhere this is used.

mfly · 2026-02-13T09:41:25Z

> @mfly Thanks for this! Let me spend a little time looking a bit more closely. The idea behind the startstop infra is that it gives you a reliable way to make sure all these edge conditions are handled correctly. It'd be nice if we could get a more hands-free resolution that'd work anywhere this is used.

Yes, please do 🙏 - I did ponder a bit on a more automated solution, but opted for being more explicit. I'm obviously lacking the context here.

brandur · 2026-03-26T06:00:19Z

Sorry for the delay on this one. I put up a variant at #1187 and copied your test out to make sure it also resolves the problem.

This one's presented as an alternative to #1136. Basically, a current problem with the start/stop infrastructure is that in the event of a partial start, where a service returns from its start function without `Stop` having been called on it, the start/stop's `isRunning` flag can still be set to true, so when the start/stop is started again, it falls through thinking it's already running. Here, we check for this condition on subsequent starts: if the `stopped` channel is non-nil but already closed, we reset all internal state, including `isRunning`, so the service can start again. To prove this works, I pull in the test case added in #1136 verbatim, and also add one more specific test in `start_stop_test.go` that checks the condition more precisely.
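The check described above might look roughly like the following. This is a simplified sketch, not the code from #1187: the `startStop` type and `startInit` method are hypothetical, assuming only an `isRunning` flag and a `stopped` channel that the service closes when its run loop returns.

```go
package main

import (
	"fmt"
	"sync"
)

// startStop is a hypothetical, simplified stand-in for the start/stop
// infrastructure.
type startStop struct {
	mu        sync.Mutex
	isRunning bool
	stopped   chan struct{}
}

// startInit returns a fresh stopped channel and true when the caller should
// start. If a previous start left isRunning set but its stopped channel is
// already closed, the stale state is reset first so the start can proceed.
func (s *startStop) startInit() (chan struct{}, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.isRunning && s.stopped != nil {
		select {
		case <-s.stopped:
			// The earlier run already exited without Stop being called.
			s.isRunning = false
			s.stopped = nil
		default:
			// Still genuinely running; leave state untouched.
		}
	}

	if s.isRunning {
		return nil, false
	}

	s.isRunning = true
	s.stopped = make(chan struct{})
	return s.stopped, true
}

func main() {
	s := &startStop{}

	stopped, ok := s.startInit()
	fmt.Println("first start proceeds:", ok)

	// Simulate a partial start: the service exits and closes stopped, but
	// Stop is never called, so isRunning stays true.
	close(stopped)

	_, ok = s.startInit()
	fmt.Println("restart after partial start proceeds:", ok)

	_, ok = s.startInit()
	fmt.Println("start while running proceeds:", ok)
}
```

The appeal of this design, as the description suggests, is that callers need no explicit failure hook: the stale state is detected wherever the start/stop helper is used.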

brandur · 2026-03-26T06:04:27Z

(And thanks for the original fix!)

mfly · 2026-03-26T21:54:08Z

> Sorry for the delay on this one. I put up a variant at #1187 and copied your test out to make sure it also resolves the problem.

Great, thanks! Closing this one!

mfly and others added 2 commits February 4, 2026 16:27

mfly force-pushed the fix-client-start-failure-state branch from ff3ea11 to f01b895 on February 4, 2026 15:27

bgentry reviewed Feb 6, 2026

bgentry assigned brandur Feb 6, 2026

brandur mentioned this pull request Mar 26, 2026

Startstop: Start successfully again even in the event of a partial start #1187

Merged

mfly closed this Mar 26, 2026

mfly wants to merge 2 commits into riverqueue:master from mfly:fix-client-start-failure-state

3 participants
