New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Occasional deadlocks between Dependencies::Interlock and db adapter lock #34310
Comments
|
Further tracking finds that the symmetric-encryption gem manually calls |
|
Thanks, this is a super clear report! Yeah, talking to the database inside a model definition is bad form, but it shouldn't hang. It sounds like thread B needs to do the same "try to take lock, if unavailable then surrender other lock and wait" on its attempt to reclaim the share lock that it does on the DB lock. |
|
Hm. But AFAICT you'd then have a thread constantly oselating between which lock it's waiting on, which either spins the CPU or requires manual sleeping in some way. And presumably loosing it's place in the queue each time. Which is not ideal but would perhaps solve the problem. A bunch of the theory around deadlock prevention talks about acquiring all the locks in the same order. The way the |
|
Okay, I checked out what happens if I revert the @lock = LoadInterlockAwareMonitor.new
...
@lock.synchronize do
ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
execute_sql_query
end
endSo with thread A being roughly the same, thread B:
One possibility would be to just never hold both locks at the same time, eg: class LoadInterlockAwareMonitor < Monitor
def synchronize
ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
super do
yield
end
end
end
endBut this makes the rather strong assumption that no code inside Another possibility would be to not reclaim the interlock until after the lock is released, eg @lock = Monitor.new # without the current overrides in LoadInterlockAwareMonitor
@lock.synchronize do
do_stuff_that_can_autoload
ActiveSupport::Dependencies.interlock.permit_concurrent_loads_until_mon_exit(@lock)
do_known_safe_stuff
end # <- releases @lock then reacquires interlock shareThis is more of a pain to code/maintain/understand but at least the caller has to opt-in. And a quick scan of the current uses of What do you think? I'd be happy to give that last option a go if you think it might be worth trying. |
|
I was feeling pretty enthusiastic about the idea of just legislating that the locks shall not meet... until I remembered The Tricky One: rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb Lines 285 to 288 in a52c698
IIRC that's the only place we invoke user-application code while holding the DB lock... but it's enough to complicate our lives substantially. For everything else we could probably get away with something like your permit-around-synchronize suggestion, but this one needs the interlock to be able to run the application code. Your latter idea feels like it could almost still work here: My earlier suggestion also doesn't work: when we're inside a transaction, we cannot give up the DB lock for any reason, because the connection is in a state no-one else should see. |
|
Ah gross transactions. Yeah, that'll be a problem. The more I think about it, the more I wonder if this is actually solvable in the general case. Reminding myself of the 4 necessary conditions for deadlocks:
Which is all somewhat depressing. Maybe this is actually a problem of mitigation and/or detection. My plan would probably still make deadlocks less likely, though whether it's worth it for the added complexity is a question for a maintainer to decide. Googling suggests true deadlock detection requires knowing about all the locks, who has them and who's blocked by them which is a tall order in this situation. Unless we decide to only care about this pair of locks. I guess at the end of the day there's always timeouts and exceptions... |
|
This issue has been automatically marked as stale because it has not been commented on for at least three months. |
|
Came across a similar issue with Because Essentially any gem that has a DB call as part of a class/module definition that can be autoloaded will fall prey to deadlock issues. I would like to reopen this and see what we can do. It does seem like #34308 was making some progress against a related issue and this issue just plain stopped despite it being fairly important to larger existing apps adopting the new rails testing systems. @matthewd I am bit new to the threading systems in rails do you know what we can do to help? |
|
I just realised I never put our workaround in this thread. For what it's worth, we avoided this by eager loading classes in CI. This worked well enough for us. Locally it hangs rarely enough that people can just restart the test and we saw no performance change in CI.
config.eager_load = !!ENV['CI']I also noticed this news post which may or may not make this bug redundant in rails 6. But I haven't looked into it in particular detail. https://weblog.rubyonrails.org/2019/2/22/zeitwerk-integration-in-rails-6-beta-2/ |
|
Haha, thats funny. After trying to apply the patch and debug this thing for a few hours myself I came to a very similar conclusion independently and also forgot to drop it here. We did ours in a slightly different way because we could reproduce the deadlock locally not just in a CI environment and call Maybe also a quick reference to @eileencodes as I know she has been so helpful in getting the system test system online (#28083) so she is aware of the issue too. Also for people who stumble upon this in the future I found this a related bug for others reference: rails/spring#519 |
Deadlocks in feature specs may be worked around by enabling eager loading. See rails/rails#34310 for more details. Attempt to fix https://gitlab.com/gitlab-org/gitlab-ce/issues/60953
In our rspec test environment, some feature specs occasionally hang. Generating stack traces using the sigdump gem and
ActionDispatch::DebugLocksI was able to narrow it down to roughly the following.There are two request threads, A and B, and another thread accessing the database C. When requests start they call
ActiveSupport::Dependencies.interlock.start_sharingand if they need to autoload any files, they upgrade tointerlock.exclusive. When threads use the database, they callAbstractAdapter#logwhich acquires a separate lock for the database connection (which is aLoadInterlockAwareMonitor). Then I have the following:AbstractAdapter#logAbstractAdapter#logAnd then A and B wait forever. Also of note, rspec cleanup requires rolling back a transaction which requires the db connection lock so rspec also hangs forever.
Steps to reproduce
It's hard to actually reproduce this as it's so timing based but I created this unit test 7ab4f35. I also added the full
ActionDispatch::DebugLocksoutput at the end.Expected behavior
All requests complete
Actual behavior
The test suite hangs
System configuration
Rails version: 5.1.6 (the test also reproduces against master)
Ruby version: 2.3.6 (the test also reproduces against 2.5.1)
ActionDispatch::DebugLocks output
The text was updated successfully, but these errors were encountered: