Skip to content

Conversation

@ndyakov
Copy link
Member

@ndyakov ndyakov commented Nov 28, 2025

A regression was reported where with aggressive timeout configuration the pool was stuck. Traced back to #3518

This PR fixes #3619 and #3623

@ndyakov ndyakov force-pushed the ndyakov/freeturn-fix branch from 96d0e65 to 4274cff Compare November 28, 2025 22:52
@ndyakov ndyakov force-pushed the ndyakov/freeturn-fix branch from 17acb71 to c80f50b Compare November 28, 2025 23:06
@ndyakov
Copy link
Member Author

ndyakov commented Nov 28, 2025

@cyningsun feel free to review if you can.

@ndyakov ndyakov requested a review from Copilot November 28, 2025 23:30
Copilot finished reviewing on behalf of ndyakov November 28, 2025 23:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical double-free bug in the connection pool that caused the pool to get stuck with aggressive timeout configurations. The bug occurred when a dial goroutine created a connection, the original waiter timed out, and the connection was delivered to another waiter - both the dial goroutine and the recipient would call freeTurn(), causing the semaphore to be released twice and allowing more concurrent operations than the pool size permits.

Key Changes

  • Added gotConn atomic boolean to track whether a waiter successfully received a connection
  • Modified dial goroutine logic to conditionally call freeTurn() based on whether the original waiter received a connection
  • Enhanced context timeout handling in redis.go to respect caller deadlines during connection initialization waits
  • Added comprehensive test coverage for the double-free scenario

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
redis.go Enhanced timeout logic to use minimum of remaining command timeout and DialTimeout when waiting for connection initialization
internal/pool/want_conn.go Added atomic gotConn field to track whether waiter received a connection, enabling proper turn management
internal/pool/pool.go Modified dial goroutine to conditionally free turns based on waiter's connection receipt status, preventing double-free
internal/pool/double_freeturn_test.go Added test demonstrating the double-free bug scenario with slow dials and timeouts
internal/pool/double_freeturn_simple_test.go Added simple test to detect double-free by checking queue length during critical window

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ndyakov ndyakov requested a review from Copilot November 28, 2025 23:53
Copilot finished reviewing on behalf of ndyakov November 29, 2025 00:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ndyakov ndyakov force-pushed the ndyakov/freeturn-fix branch from 1b6e627 to 9710b24 Compare November 29, 2025 00:49
@ndyakov ndyakov requested a review from Copilot November 29, 2025 10:52
Copilot finished reviewing on behalf of ndyakov November 29, 2025 10:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@cyningsun
Copy link
Contributor

Thanks for the excellent work @ndyakov! Your test coverage and solution look comprehensive.

I've prepared #3626 as an alternative approach focusing on turn ownership semantics. It's offered in case it provides any additional useful insights, but please feel free to close it if #3625 already covers all the necessary ground.

@ndyakov
Copy link
Member Author

ndyakov commented Dec 1, 2025

#3626 was merged instead

@ndyakov ndyakov closed this Dec 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v9.17.0 ReadOnly cluster freezes

4 participants