
mutex: Raise a ThreadError when detecting a fiber deadlock #6680

Merged: 1 commit merged into ruby:master on Nov 8, 2022

Conversation

@casperisfine (Contributor) commented Nov 5, 2022:

[Bug #19105]

If no fiber scheduler is registered and the fiber that owns the lock and the one trying to acquire it both belong to the same thread, we're in a deadlock case.

Ref: https://bugs.ruby-lang.org/issues/17827#note-10
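The failure mode can be sketched with a minimal script (illustrative, not taken from the PR's test suite; assumes a Ruby that includes this change, i.e. 3.2+):

```ruby
# No fiber scheduler is registered, so both fibers share one thread and
# nothing can ever release the lock while the child fiber blocks on it.
mutex = Mutex.new
mutex.lock # owned by the root fiber

error = nil
Fiber.new do
  mutex.lock # with this patch: raises ThreadError instead of deadlocking
rescue ThreadError => e
  error = e
end.resume

p error.class #=> ThreadError
```

Before this change, the second `lock` would block the only thread forever (or trip the global deadlock detector) instead of failing fast.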

cc @eregon @ioquatix WDYT?

@eregon (Member) left a comment:

Looks good to me!

test/fiber/test_mutex.rb (outdated)
@@ -194,7 +194,7 @@ def test_queue_pop_waits
end

def test_mutex_deadlock
-    error_pattern = /No live threads left. Deadlock\?/
+    error_pattern = /lock already owned by another fiber/
@eregon (Member):

FWIW the No live threads left. Deadlock? error is actually an eFatal IIRC.
But I think using ThreadError as you did seems fine too.

There is some other detection using that message too, though; maybe it would be good to be consistent with it in the exception class?

@eregon (Member):

Probably we also need to respect GET_THREAD()->vm->thread_ignore_deadlock (which defaults to false) like in rb_check_deadlock(), i.e., the lock could be interrupted by a signal and unblock things.

@casperisfine (Contributor Author):

FWIW the No live threads left. Deadlock? error is actually an eFatal IIRC.

Interesting, the other deadlock-related errors in this function are all rb_eThreadError.

Probably we also need to respect GET_THREAD()->vm->thread_ignore_deadlock

Done.

@eregon (Member) commented Nov 5, 2022:

Interesting, the other deadlock-related errors in this function are all rb_eThreadError.

Right, ThreadError seems fair then (especially since it's quite similar to the "deadlock; recursive locking" case).

[Bug #19105]

If no fiber scheduler is registered and the fiber that owns the lock
and the one trying to acquire it both belong to the same thread,
we're in a deadlock case.
@ioquatix (Member) left a comment:

I would also suggest adding a test using Fiber.blocking{}. We don't always go through the scheduler code path, even when a scheduler is present, if the fiber is blocking. I believe in this case it should still error out, but without writing the code & test myself, I am not 100% certain.

@casperisfine (Contributor Author):

I would also suggest adding a test using Fiber.blocking{}.

I need to read up on that one. I'm not familiar with Fiber.blocking.

@ioquatix (Member) commented Nov 6, 2022:

Fiber.blocking disables the fiber scheduler hooks (by disabling the fiber scheduler for the duration of that call).

@ioquatix ioquatix merged commit eacedcf into ruby:master Nov 8, 2022
@technicalpickles (Contributor):

Would it be possible to get a 3.0 and/or 3.1 release with this fix in it? This is blocking our upgrade from 2.7. Happy to help however I can!

@eregon (Member) commented Nov 8, 2022:

@technicalpickles The ticket is already marked for backport. But I don't see how this would unblock an upgrade: it will raise instead of deadlocking, but the code trying to relock the Mutex on the same thread still won't work correctly.

@technicalpickles (Contributor):

@eregon Apologies! I'm not familiar with how backporting and releases happen. Thanks for confirming!

This will help us flush out where the deadlock is happening and confirm when we've fixed it.

@casperisfine casperisfine deleted the monitor-deadlock branch November 8, 2022 21:17
@casperisfine (Contributor Author):

Yeah, backporting is tracked on Redmine and handled by the release maintainer for each version:

[screenshot: Redmine backport status, 2022-11-09]

A new version will be cut eventually, but probably not anytime soon. I don't know how hard it is with your infra, but I suggest applying the patch yourself when you compile your Ruby.

matzbot pushed a commit that referenced this pull request Nov 13, 2022
	mutex: Raise a ThreadError when detecting a fiber deadlock (#6680)

	[Bug #19105]

	If no fiber scheduler is registered and the fiber that owns the lock
	and the one trying to acquire it both belong to the same thread,
	we're in a deadlock case.

	Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
	---
	 test/fiber/test_mutex.rb | 22 +++++++++++++++++++++-
	 thread_sync.c            |  4 ++++
	 2 files changed, 25 insertions(+), 1 deletion(-)
@ojab commented Nov 17, 2022:

This broke rspec-support: it has its own implementation of a reentrant mutex and a corresponding spec.
Right now it fails on @mutex.lock unless @mutex.owned? inside the Fiber, because it now raises an exception instead of waiting for the lock.

Any hint on how to deal with it?

@casperisfine (Contributor Author):

@ojab could you provide a small reproduction script? I'll happily look at it.

@casperisfine (Contributor Author):

Also isn't ReentrantMutex basically a Monitor?
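For context, Monitor from the stdlib is indeed reentrant, which is what ReentrantMutex reimplements. An illustrative snippet (not from the PR):

```ruby
require "monitor"

mon = Monitor.new
# A Monitor may be re-acquired by the fiber that already holds it,
# unlike a plain Mutex, whose relock raises ThreadError.
result = mon.synchronize do
  mon.synchronize { :reentered }
end
p result #=> :reentered
```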

@eregon (Member) commented Nov 17, 2022:

Also isn't ReentrantMutex basically a Monitor?

It is AFAIK. But RSpec tries to support very very old Ruby versions (1.8), see rspec/rspec-support#552 (comment).

I think we just need to change the RSpec spec here so that it accepts this outcome (in addition to the existing one).
That spec would be stuck forever, except that it gets itself unstuck via Thread#raise.
It seems a pretty artificial case, which makes sense as a test but not in real code.

Another option would be to use Thread.ignore_deadlock = true in that spec, that's designed for the case a deadlock is solved by Thread#raise or a signal, so it seems fair enough.
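A sketch of that option (variable names and timing are illustrative; assumes a Ruby where the fiber deadlock check respects Thread.ignore_deadlock, i.e. 3.2+ with this patch):

```ruby
# With Thread.ignore_deadlock = true, a fiber blocking on a Mutex held by
# another fiber of the same thread waits instead of raising ThreadError,
# on the premise that something external (Thread#raise, a signal) will
# unstick it.
Thread.ignore_deadlock = true

mutex = Mutex.new
mutex.lock # held by the root fiber

main = Thread.current
rescuer = Thread.new do
  sleep 0.2
  main.raise(Interrupt) # unstick the blocked fiber from outside
end

result =
  begin
    Fiber.new { mutex.lock }.resume # blocks rather than raising
    :locked
  rescue Interrupt
    :interrupted
  end

rescuer.join
Thread.ignore_deadlock = false
p result #=> :interrupted
```

The Interrupt is delivered to the thread's currently blocked fiber and propagates out of Fiber#resume, which is exactly the "deadlock resolved externally" scenario this flag is designed for.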

Self-note: link to my PR fixing ReentrantMutex on Ruby 3.0: rspec/rspec-support#503

@casperisfine (Contributor Author):

RSpec tries to support very very old Ruby versions (1.8)

Monitor was present in 1.4: https://github.com/ruby/ruby/blame/a36e0c78c90917c4d5cc78f67b3808913795f264/lib/monitor.rb

Based on the RSpec thread, the reason seems to be that they don't want to require anything from the stdlib, so as not to impact the tested code.

@eregon (Member) commented Nov 17, 2022:

Right, we'd need to make Monitor core then for RSpec to use it. But anyway RSpec::Support::ReentrantMutex is fine and fairly simple (in fact much simpler than Monitor).

@ojab commented Nov 18, 2022:

@casperisfine hopefully a reproducer is not needed, since ReentrantMutex is pretty simple:

class ReentrantMutex
  def initialize
    @owner = nil
    @count = 0
    @mutex = Mutex.new
  end

  def synchronize
    enter
    yield
  ensure
    exit
  end

  private

  def enter
    @mutex.lock unless @mutex.owned?
    @count += 1
  end

  def exit
    unless @mutex.owned?
      raise ThreadError, "Attempt to unlock a mutex which is locked by another thread/fiber"
    end
    @count -= 1
    @mutex.unlock if @count == 0
  end
end

Could you please elaborate on why the fiber would be stuck? Something like

mutex = ReentrantMutex.new

mutex.synchronize do 
  f = Fiber.new do 
    mutex.synchronize { do_stuff }
  end
  f.resume
  do_other_stuff
end

looks reasonable to me, because we're not trying to acquire the mutex a second time. And I guess it could work if Mutex#owned? returned true inside Fiber.new, which is not the case now.

AFAIU there is currently no way to know whether we could lock or unlock the mutex right now without causing a ThreadError.
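The behavior described here follows from the per-fiber ownership introduced for [Bug #17827]: since Ruby 3.0, Mutex#owned? is scoped to the current fiber, so it returns false inside a child fiber even on the thread that holds the lock. An illustrative check (not from the thread):

```ruby
mutex = Mutex.new
mutex.lock # owned by the root fiber

inside = nil
Fiber.new { inside = mutex.owned? }.resume

p mutex.owned? #=> true  (we are the owning fiber)
p inside       #=> false (ownership does not extend to other fibers)
```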

@casperisfine (Contributor Author):

@ojab your repro already fails on 3.0.3. What causes it to fail is not this change but https://bugs.ruby-lang.org/issues/17827

@ojab commented Nov 18, 2022:

Oh, right: now I get that the rspec-support spec is hanging because there is no f.resume there. Thanks for the explanation, and sorry for the noise.

ojab added a commit to ojab/rspec-support that referenced this pull request Nov 18, 2022
pirj pushed a commit to ojab/rspec-support that referenced this pull request Nov 19, 2022
@pirj commented Dec 29, 2022:

Please accept my apologies if I'm saying some complete nonsense, but I wonder: is it really an irresolvable deadlock?
What if an attempt to lock a mutex owned by a different Fiber of the same Thread would just yield?

PS There seems to be a difference when the mutex is locked by the root Fiber: if there are no other Fibers, it might never be resumed.

I understand the tradeoff: the possibility that the Fiber gets stuck, instead of failing quickly with a ThreadError.

@eregon (Member) commented Dec 29, 2022:

@pirj The description already mentions that case with the Fiber scheduler; there is no change when a Fiber scheduler is present.
When there is no Fiber scheduler, switching Fibers on Mutex#lock would be a semantically illegal change.
