New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't reap connections that have already been reassigned #25707

Merged
merged 4 commits into from Jul 6, 2016

Conversation

Projects
None yet
5 participants
@matthewd
Member

matthewd commented Jul 5, 2016

Fixes #25585

@thedarkone any brilliant ideas for testing this?

So far my only thoughts are to either Mock Everything, or blindly hammer it in a loop and hope for the best/worst. The latter is how I manually proved it locally, but for a proper test, neither sounds particularly exciting. 馃槙

matthewd added some commits Jul 5, 2016

Check connection ownership before allowing a thread to release it
A thread can only release a connection if it owns it, or it's owned by a
thread that has died.
Re-check that the connection is still stale before we reap it
A concurrent thread may have also detected it to be stale, and already
released (or even reassigned) it by now.

Fixes #25585
Reduce locking by taking ownership of stale connections
This way, we aren't racing other threads, so we don't need to re-check
the conditional. And we no longer need to hold the lock while calling
remove (which can choose to make a new connection while we wait).
@thedarkone

This comment has been minimized.

Contributor

thedarkone commented Jul 6, 2016

any brilliant ideas for testing this?

Nope, testing races is an ugly business, it is usually done via concurrent torture test suites. They are heavyweight and I don't think we have a capacity to implement them right now (and keep running them on CI).

PS: that is a mindbogglingly sloppy race in my code, sorry.

@sonalkr132

This comment has been minimized.

Contributor

sonalkr132 commented Jul 6, 2016

This works 馃槉 Thanks @matthewd
Any idea which release it will go in?

@sonalkr132 sonalkr132 referenced this pull request Jul 6, 2016

Closed

Rails 5 update #1341

@matthewd matthewd added this to the 5.0.1 milestone Jul 6, 2016

@matthewd

This comment has been minimized.

Member

matthewd commented Jul 6, 2016

Nope, testing races is an ugly business

Yeah, I don't see any way to inject latches to force the worst-case sequencing. I guess I'll just go with the increased runtime assertions.

PS: that is a mindbogglingly sloppy race in my code, sorry.

I think the phrase you're looking for is "I can't believe I wrote that", in which case... well... 馃槼

Your reduction of locking obviously made it much more likely to occur, but I think the race was present (in an actual, possible-to-manifest, way) in 4.2 too: the acquire_connection reaps couldn't overlap because they were covered by the large checkout synchronize, but one could have occurred in the middle of a scheduled reap (if the scheduled reaper had been enabled). So, pretty definitely my fault. 馃榿

@jrafanie

This comment has been minimized.

Contributor

jrafanie commented Jul 6, 2016

@thedarkone @matthewd thanks for your work on these race conditions. It sounds horrible to diagnose and debug, and incredibly simple once you propose the solution. I have no idea how much time you must have racked your brain trying to recreate it!

@matthewd matthewd merged commit aeba05d into rails:master Jul 6, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

matthewd added a commit that referenced this pull request Jul 6, 2016

Merge pull request #25707 from matthewd/double-reap
Don't reap connections that have already been reassigned
@matthewd

This comment has been minimized.

Member

matthewd commented Jul 6, 2016

Backported as 2537d08

@matthewd matthewd deleted the matthewd:double-reap branch Jun 11, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment