More connection locking #28777
@KeithP this should hopefully tame that last deadlock you were seeing.
Hrm, something isn't quite right. My test app runs one test, does a rollback, and then just sits there, with no raise or anything. All the tests in my app should run, but only one runs before it stalls. Here are the logs.
```ruby
checkin conn
if conn.owner != owner_thread && @sharing_threads[conn.owner]
  synchronize do
    @sharing_threads[conn.owner].delete(Thread.current)
```
Should probably be `.delete(owner_thread)`.
`@@ -392,7 +417,14 @@ def active_connection?`
```ruby
# #checkout will not be automatically released.
def release_connection(owner_thread = Thread.current)
```
I feel there should be a hard check and a raise (akin to what is in the `remove()` method), and I'd want it to happen before `@thread_cached_conns.delete(...)`:
```ruby
def release_connection(owner_thread = Thread.current)
  raise "Cannot release a shared connection" if @locked_thread == owner_thread
  if conn = @thread_cached_conns.delete(connection_cache_key(owner_thread))
    # ...
  end
end
```
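A minimal sketch of why the guard has to come first. `TinyPool`, `locked_thread=`, `available_count`, and keying the cache by the `Thread` object (instead of `connection_cache_key`) are all simplifications assumed for illustration. This is not Rails' `ConnectionPool`.

```ruby
require "monitor"

# Hypothetical, stripped-down pool used only to show where the guard belongs.
# Assumption: the per-thread cache is keyed by the Thread itself rather than
# by Rails' connection_cache_key.
class TinyPool
  attr_reader :thread_cached_conns
  attr_writer :locked_thread

  def initialize
    @lock = Monitor.new
    @thread_cached_conns = {}
    @available = []
    @locked_thread = nil
  end

  # Hands out (and caches) a stand-in connection for the given thread.
  def connection(owner_thread = Thread.current)
    @lock.synchronize { @thread_cached_conns[owner_thread] ||= Object.new }
  end

  def release_connection(owner_thread = Thread.current)
    # Check first: raising after the delete would drop the shared
    # connection from the cache without ever checking it back in.
    raise "Cannot release a shared connection" if @locked_thread == owner_thread

    conn = @lock.synchronize { @thread_cached_conns.delete(owner_thread) }
    @lock.synchronize { @available << conn } if conn
  end

  def available_count
    @lock.synchronize { @available.size }
  end
end
```

Because the guard runs before the `delete`, a refused release leaves the thread's cache entry intact, and the shared connection is still reachable through it.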
`@@ -392,7 +417,14 @@ def active_connection?`
```ruby
# #checkout will not be automatically released.
def release_connection(owner_thread = Thread.current)
  if conn = @thread_cached_conns.delete(connection_cache_key(owner_thread))
    checkin conn
    if conn.owner != owner_thread && @sharing_threads[conn.owner]
```
Maybe add a small comment (unless I'm wrong on this one):
```ruby
# if this branch is taken, then it must be that `@sharing_threads[conn.owner].include?(owner_thread)` # => true
```
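Below is a small, hypothetical model of this branch. It combines the `.delete(owner_thread)` fix suggested earlier with the invariant above, turned into an explicit check. `SharingBook`, `share`, and `borrowers_of` are invented for the sketch, and the real `checkin` step is omitted.

```ruby
require "monitor"
require "set"

# Hypothetical model: @sharing_threads maps the thread that owns a shared
# connection to the set of threads currently borrowing it.
class SharingBook
  Conn = Struct.new(:owner)

  def initialize
    @monitor = Monitor.new
    @sharing_threads = {}
    @thread_cached_conns = {}
  end

  # Records that `borrower` is using the connection owned by `owner`.
  def share(owner, borrower)
    conn = Conn.new(owner)
    @monitor.synchronize do
      (@sharing_threads[owner] ||= Set.new) << borrower
      @thread_cached_conns[borrower] = conn
    end
    conn
  end

  def release_connection(owner_thread = Thread.current)
    conn = @monitor.synchronize { @thread_cached_conns.delete(owner_thread) }
    return unless conn
    # (checkin of conn omitted in this sketch)

    if conn.owner != owner_thread && @sharing_threads[conn.owner]
      @monitor.synchronize do
        # If this branch is taken, owner_thread must be one of the borrowers.
        unless @sharing_threads[conn.owner].include?(owner_thread)
          raise "#{owner_thread.inspect} is not borrowing this connection"
        end
        # Remove owner_thread, not Thread.current: the caller may be
        # releasing on behalf of another thread.
        @sharing_threads[conn.owner].delete(owner_thread)
      end
    end
  end

  def borrowers_of(owner)
    @monitor.synchronize { @sharing_threads.fetch(owner, Set.new).dup }
  end
end
```

When one thread releases on another's behalf, `Thread.current` and `owner_thread` differ. Deleting `Thread.current` in that case would leave the real borrower recorded in the set indefinitely.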
@matthewd, it looks like we get the same failure as Eileen:
```
Failure/Error: @cond.wait(@monitor.instance_variable_get(:@mon_mutex), timeout)
```
The branch was updated from 5ca6281 to 1c2de33, and then from 1c2de33 to 8e11ab4.
I've pulled out the deadlock fix in ce2abff and pushed that to master / 5-1-stable. We'll punt the larger change here to 5.1.1. I think I've just fixed the problem @eileencodes and @KeithP described above, but this is too risky to drop in after rc2, when we're not expecting many users to encounter the original issue.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Address two failure modes for connection pool thread locking (yay) by making it more complicated (boo).
First, a deadlock between the pool lock and the connection lock, due to inconsistent lock acquisition order: `clear_query_cache` requires the connection lock, and is invoked by a checkin callback while holding the pool lock. (I don't think we should actually be invoking callbacks while holding the pool lock, but that's a matter for another time.)

Second, the less proximate cause of the above failure: the main thread could checkin its connection while another thread was still using it. The second thread would then unexpectedly change connections between two queries, even while inside a transaction on the first one.
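For the first failure mode, the following is a minimal sketch of the direction the parenthetical points at: running checkin callbacks only after the pool lock is released. It is not the fix that actually landed. `SketchConnection`, `SketchPool`, `query`, and `stat` are hypothetical. The point is that every code path takes the connection lock before the pool lock, never the reverse.

```ruby
require "monitor"

# Hypothetical connection with its own lock and a query cache.
class SketchConnection
  attr_reader :cache

  def initialize
    @lock = Monitor.new
    @cache = { "SELECT 1" => [1] }
  end

  # Needs the connection lock, like clear_query_cache in the description.
  def clear_query_cache
    @lock.synchronize { @cache.clear }
  end

  # A "query" holds the connection lock and, while holding it, consults
  # the pool: connection lock -> pool lock.
  def query(pool)
    @lock.synchronize { pool.stat }
  end
end

class SketchPool
  def initialize
    @lock = Monitor.new
    @available = []
    @checkin_callbacks = [:clear_query_cache]
  end

  def stat
    @lock.synchronize { @available.size }
  end

  def checkin(conn)
    @lock.synchronize { @available << conn } # pool lock held only here
    # Callbacks run with the pool lock already released, so no thread
    # ever waits for the connection lock while holding the pool lock.
    @checkin_callbacks.each { |cb| conn.public_send(cb) }
  end
end
```

If `checkin` ran the callbacks inside `@lock.synchronize`, one thread could hold the pool lock while waiting for the connection lock, and another could hold the connection lock while waiting for the pool lock. That is the inversion described above.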
We now keep track of who has borrowed the connection, and don't complete the "unlock" until they have released it. This also means that the lock only applies to connection acquisition: after that (and until it attempts to release it) the borrowing thread treats the connection as its own.
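The following is a hypothetical sketch of the borrowing rule. `ConnectionLease`, `borrow`, `give_back`, and `unlock` are illustrative names, not the PR's API. `unlock` refuses new borrowers immediately but returns only once every current borrower has given the connection back. Between `borrow` and `give_back`, the borrower holds no lock at all.

```ruby
require "monitor"
require "set"

# Hypothetical model: the lock governs acquisition only; unlocking waits
# for outstanding borrowers instead of yanking the connection away.
class ConnectionLease
  def initialize(conn)
    @conn = conn
    @monitor = Monitor.new
    @returned = @monitor.new_cond
    @borrowers = Set.new
    @locked = true
  end

  # Once borrowed, the thread uses the connection as its own until give_back.
  def borrow(thread = Thread.current)
    @monitor.synchronize do
      raise "lease is unlocked; no more borrowing" unless @locked
      @borrowers << thread
      @conn
    end
  end

  def give_back(thread = Thread.current)
    @monitor.synchronize do
      @borrowers.delete(thread)
      @returned.broadcast if @borrowers.empty?
    end
  end

  # Stop new borrowing now, but only finish once all borrowers are done.
  def unlock
    @monitor.synchronize do
      @locked = false
      @returned.wait_until { @borrowers.empty? }
    end
  end
end
```

Because `unlock` waits on a condition variable, a borrower that is mid-transaction keeps the same connection until it explicitly gives it back. This avoids the connection switching between two queries.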