Mutex gets 'broken' sporadically (since 9.2.1) #5520
Comments
think this might have been a false alarm, I will need to compare some changes in behavior in 9.2
CDL impl + occasional
require 'thread'
require 'timeout'

class CountDownLatch
  def initialize(count)
    raise ArgumentError if count < 0
    @count = count
    @mutex = Mutex.new
    @conditional = ConditionVariable.new
  end

  def countdown!
    @mutex.synchronize do
      @count -= 1 if @count > 0
      @conditional.broadcast if @count == 0
    end
  end

  def count
    @mutex.synchronize { @count }
  end

  def wait(timeout = nil)
    # begin
    Timeout::timeout timeout do
      @mutex.synchronize do
        @conditional.wait @mutex if @count > 0
      end
    end
    true
    # rescue Timeout::Error
    #   false
    # end
  end
end

def wait_for_latch(count = 10, timeout = 5)
  latch = CountDownLatch.new count
  count.times do
    Thread.new { latch.countdown! }
  end
  latch.wait(timeout)
end

10_000.times do |i|
  puts i
  result = wait_for_latch
end
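For comparison, here is a minimal sketch of the same latch that bounds the wait with `ConditionVariable#wait(mutex, timeout)` and a monotonic deadline instead of wrapping the wait in `Timeout::timeout`. This is a swapped-in technique, not the reproducer from this report, and the class name `DeadlineLatch` is made up for illustration. It returns `false` on timeout rather than raising.

```ruby
# Hypothetical variant of the latch above; not part of the original report.
class DeadlineLatch
  def initialize(count)
    raise ArgumentError if count < 0
    @count = count
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  def countdown!
    @mutex.synchronize do
      @count -= 1 if @count > 0
      @cond.broadcast if @count == 0
    end
  end

  # Returns true once the count reaches zero, false if the timeout elapses first.
  def wait(timeout = nil)
    deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      # Re-check the count in a loop so an early wakeup cannot end the wait.
      while @count > 0
        if deadline
          remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
          return false if remaining <= 0
          @cond.wait(@mutex, remaining)
        else
          @cond.wait(@mutex)
        end
      end
      true
    end
  end
end
```

Because no asynchronous `Timeout::Error` is delivered into the waiting thread, this variant does not exercise the interrupt path the reproducer is stressing, so it is not a substitute for the test case above.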
This seems to have manifested in a spec failure.
yy, it has been manifesting from time-to-time, for a while now.
I'm on it.
@kares Should we mark this fixed or move it to 9.2.7? My small patch addresses the base issue.
A recent mspec update changed how the Mutex#lock specs test for blocking. The new logic loops while checking thread status for "sleep" to indicate the thread is blocking. With our impl of lock, a thread could briefly appear to be in "sleep" status even if the lock acquisition was immediately successful. Using tryLock first reduces the chance that the thread will look asleep, though the race could still happen if another thread unlocks the mutex after tryLock has been called. See ruby/mspec@4c660ce. See related issue #5520.
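The try-first idea can be sketched at the Ruby level with `Mutex#try_lock`. This is only an illustration of the pattern: the actual change lives inside JRuby's own Mutex implementation, and the helper name `lock_with_fast_path` and its `:fast`/`:slow` return values are made up here.

```ruby
# Hypothetical helper, for illustration only; not JRuby's actual code.
def lock_with_fast_path(mutex)
  # Fast path: an uncontended mutex is acquired here without ever blocking,
  # so the thread never needs to be marked as sleeping.
  return :fast if mutex.try_lock

  # Slow path: another thread holds the mutex, so block until it is released.
  mutex.lock
  :slow
end

m = Mutex.new
path = lock_with_fast_path(m)
puts path
m.unlock
```

The remaining race described above is visible in the sketch: if the holder releases the mutex between the failed `try_lock` and the call to `lock`, the thread takes the slow path even though it will not actually have to wait.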
@kares As I mentioned on IRC, I think the sporadic failure of that spec was due to a spec update. In ruby/mspec@4c660ce mspec's blocking-call detector was modified to use a thread status of "sleep" to indicate that the thread was blocking, but our mutex always set "sleep" briefly around the call to
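A status-based blocking check of the kind described can be sketched roughly as follows. This is not mspec's code: the helper name `blocks_within?`, the timeout, and the polling interval are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for a status-based blocking check; not mspec's actual code.
def blocks_within?(thread, timeout = 2.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    # A thread parked on a lock reports "sleep"; treat that as "blocking".
    return true if thread.status == "sleep"
    # A thread that already finished can never be observed blocking.
    return false unless thread.alive?
    sleep 0.001
  end
  false
end

mutex = Mutex.new
mutex.lock
waiter = Thread.new { mutex.synchronize { :acquired } }
puts blocks_within?(waiter)
mutex.unlock
waiter.join
```

A check like this cannot tell a thread that is genuinely waiting from one that reports "sleep" only for an instant during an uncontended acquisition, which is why the spec could fail sporadically.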
FWIW this issue did find the deadlock in the interrupts list, so that's a great thing to have fixed :-)
interesting, mspec driven approach :) ... guess mspec is biased these days. anyway, the fix is good as is but I still do not like iterating while the collection might get modified ...
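One common way to avoid iterating a collection that other code may modify is to copy it under a lock and walk the copy. The sketch below shows that pattern in Ruby; `WaiterRegistry` and its methods are hypothetical names and do not mirror JRuby's internals or the fix that was applied.

```ruby
# Hypothetical registry, for illustration only; it does not mirror JRuby's internals.
class WaiterRegistry
  def initialize
    @lock = Mutex.new
    @waiters = []
  end

  def add(waiter)
    @lock.synchronize { @waiters << waiter }
  end

  def remove(waiter)
    @lock.synchronize { @waiters.delete(waiter) }
  end

  # Copy the list while holding the lock, then iterate over the copy outside it,
  # so the block may add or remove entries without disturbing the iteration
  # and without re-entering the lock.
  def each_snapshot
    snapshot = @lock.synchronize { @waiters.dup }
    snapshot.each { |waiter| yield waiter }
  end
end
```

The trade-off is that the block may see an entry that was removed after the snapshot was taken, so callers have to tolerate slightly stale entries.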
... as a way of 'double' resolving GH-5520
we've seen occasional mutex failures in CI, for a while now.
turns out they may not be false alarms, but the implementation regressing (since 9.2.1)
the issue can be spotted with concurrent threads in some user use-cases.
e.g. a .rb CountDownLatch: https://raw.githubusercontent.com/benlangfeld/countdownlatch/master/lib/countdownlatch.rb
... guessing AR's connection pool might also be affected by this.
already bisected and found the cause to be: 66d2905
which makes no sense ... going to try to set up a PR with a concurrent test-case and a proposed fix.