Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutex#lock and Mutex#synchronize are not interruptible by Thread#kill #5476

Closed
eregon opened this issue Nov 27, 2018 · 14 comments

Comments

@eregon
Copy link
Member

commented Nov 27, 2018

Environment

  • JRuby version: jruby 9.2.5.0-SNAPSHOT (2.5.0) 2018-11-27 ebbb750 OpenJDK 64-Bit Server VM 25.171-b10 on 1.8.0_171-b10 +jit [linux-x86_64]

Run:

ruby -e 'm=Mutex.new; m.lock; t=Thread.new{m.lock}; sleep 1; t.kill; t.join; p :done' 

Expected Behavior

:done

Actual Behavior

It hangs on JRuby.
3 Mutex specs have been tagged in #5475 to avoid hanging.

@headius

This comment has been minimized.

Copy link
Member

commented Nov 28, 2018

Huh. Must have regressed...I'm sure this worked before.

@headius

This comment has been minimized.

Copy link
Member

commented Nov 28, 2018

Regressed "on purpose" for a user report, due to thread interrupts popping locks unexpectedly. They may need a better way to interrupt or we may need a better way to lock interruptibly that doesn't depend on Thread.interrupt.

@headius

This comment has been minimized.

Copy link
Member

commented Nov 28, 2018

User report was #4261.

alexis779 pushed a commit to alexis779/jruby that referenced this issue Dec 2, 2018

@andy-twosticks

This comment has been minimized.

Copy link

commented Apr 1, 2019

Hangs for me on 9.1.8.0, 9.1.17.0, 9.2.0.0 and 9.2.1.0-dev.

I'd rather like this to work on 9.1.17.0, since that's the "stable" release. I think it might be tangentally responsible for a problem I have, where threads are being held onto between rspec sessions despite being supposedly killed at the end of the test?

Update: hangs in 9.2.6.0 for me, too.

@eregon

This comment has been minimized.

Copy link
Member Author

commented Apr 1, 2019

Couldn't the semantics just be try to lock in an interruptible manner, and if it is interrupted, check for Thread#raise, otherwise just try to lock (checking interrupts) again? That's essentially what we do in TruffleRuby.

@headius

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

@andy-twosticks: Do you have a real-world case freezing due to this? I am leaning toward going back to interruptible with some status check. As @eregon mentions, checking for an interrupt event may be sufficient, but I'm not sure I know all the cases that are expected to wake a lock-blocked thread.

@headius

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

@eregon The main concern I have with that approach is all the other JVM interrupt cases that aren't related to Ruby. Since we're a JVM language, we have to make decisions about whether to ignore non-Ruby interrupts or allow them to cancel the blocking operation being performed. There's no perfect answer.

We have had multiple user reports in the past where blocking operations were or were not interruptible via standard JVM mechanisms. Often we have to make a judgment call which way to go, but simply forcing Ruby semantics on users is not usually the first choice.

@headius

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

Seems like we would need to either support Java interrupt semantics or Ruby interrupt semantics. It would be possible for us to extend Thread and override interrupt to set a Ruby thread event, so Java interrupts would properly interrupt a lock, but this basically puts us back in the boat we were trying to fix in #4261 (where an interrupt intended to interrupt regex processing woke up a lock instead).

So summarizing options:

  • No interrupt: prevents threads waiting on a mutex from being interrupted in the usual Ruby ways.
  • Allow interrupt, no status check: allows normal JVM interrupting to work but risks spurious wakeups from other JVM events and signals.
  • Allow interrupt, check for Ruby event: prevents normal JVM interrupting from working unless a Ruby thread event has been set. Ruby interrupts work as expected in this issue's description.
@andy-twosticks

This comment has been minimized.

Copy link

commented Apr 2, 2019

@headius -- I don't have a real-world freeze. Although of course I have no way of knowing if any of the libraries I am using are trying to kill a thread that's held by a Mutex. And folks here are now worried, and justifiably, I think, of encountering this one in the wild.

What I have is an rspec test that waits forever for threads to be killed, sometimes (or would, if I hadn't added a timeout to the test) and other times just crashes from some other part of the suite (and the trace says the crash happened inside a Mutex).

My current working theory is that somehow repeated runs of the test suite build up dead threads until I hit some sort of limit and I either get a hang or a crash. I know that sounds wild, but it's the best I have right now.

So you see why I said my problems might be tangentally related to this one.

(I'm currently trying to duplicate this outside of rspec. If I come up with anything concrete I'll raise a seperate issue, but please don't hold your breath; it's proving highly problematic, and I'm going to have to switch to another project any time now.)

@headius

This comment has been minimized.

Copy link
Member

commented Apr 2, 2019

@andy-twosticks The fix isn't difficult...it basically restores the original logic that does a thread poll, but with the additional wrinkle that if the thread was not interrupted for a Ruby threading reason it will resume attempting to acquire the lock. I'm on the fence about that, because it doesn't seem like we can make both Java interrupters and Ruby interrupters happy.

diff --git a/core/src/main/java/org/jruby/ext/thread/Mutex.java b/core/src/main/java/org/jruby/ext/thread/Mutex.java
index 196390653d..e3d55f23a4 100644
--- a/core/src/main/java/org/jruby/ext/thread/Mutex.java
+++ b/core/src/main/java/org/jruby/ext/thread/Mutex.java
@@ -97,7 +97,18 @@ public class Mutex extends RubyObject implements DataType {
             // failed to acquire, proceed to sleep and block
             try {
                 thread.enterSleep();
-                thread.lock(lock);
+
+                // Loop until lock is acquired or we are interrupted with a thread event.
+                while (true) {
+                    try {
+                        thread.lockInterruptibly(lock);
+                        break;
+                    } catch (InterruptedException ie) {
+                        // Check thread events; if one is present, it will propagate. Otherwise interrupt was spurious.
+                        // See jruby/jruby#5476
+                        thread.pollThreadEvents(context);
+                    }
+                }
             } finally {
                 thread.exitSleep();
             }

headius added a commit to headius/jruby that referenced this issue Apr 9, 2019

Allow contended Mutex to be interrupted by Ruby.
This flips the script a bit, allowing waiting on a contended Mutex
to be interrupted by Ruby thread events but Java interrupts are
considered spurious. This is a third option for supporting
interrupts, after allowing both and allowing neither.

Intended to address jruby#5476.
@headius

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

I have filed #5682 which applies the above patch.

At this point I'm leaning back toward making all locks be interruptible by both Java and Ruby. The original accommodation for LogStash was in order to allow them to interrupt threads that appeared hung (based on some heuristic) without also interrupting threads validly blocked on a lock. If the contract of a Mutex in Ruby were that it's not interruptible from either Ruby or C, the current logic makes sense, but my reading of the world says that both Ruby and C can interrupt a thread blocking on a contended Mutex, which is how it should work in Java.

You out there @andrewvc? Or perhaps @colinsurprenant knows who else could discuss the original issue #4261?

@andrewvc

This comment has been minimized.

Copy link

commented Apr 9, 2019

@headius what you say makes sense. I no longer work on Logstash, but I agree it makes sense.

@headius headius added this to the JRuby 9.2.8.0 milestone Apr 9, 2019

headius added a commit to headius/jruby that referenced this issue Apr 9, 2019

Return to interruptible lock acquisition in Mutex.
Fixes jruby#5476.

Revert "Merge branch 'uninterruptible-lock'"

This reverts commit 6a436e9, reversing
changes made to 44564e5.

headius added a commit to headius/jruby that referenced this issue Apr 10, 2019

Return to interruptible lock acquisition in Mutex.
Fixes jruby#5476.

This reverts the original change from jruby#4261 that disabled lock
interrupts, and uses Task logic from RubyThread to handle the
interrupting properly.

headius added a commit to headius/jruby that referenced this issue Apr 10, 2019

Centralize some threading logic for Mutex, CondVar.
This better models the way MRI uses the Mutex. Rather than
directly doing the unlock, sleep, lock steps, this version just
adds the thread to a listeners queue and asks the mutex to sleep.
This allows us to reuse the sleep logic already in Mutex, which
now properly defers to RubyThread methods to do its own unlock,
sleep, lock cycle. This centralizes all of the key Mutex behaviors
in RubyThread so they can be properly boxed with thread status
changes like "sleep".

This also fixes jruby#5476 by restoring full interruptibility to Mutex
contended lock calls. Both interrupts from Java and from Ruby will
wake the thread. This returns us to matching MRI's Mutex lock
interrupt behavior.
@headius

This comment has been minimized.

Copy link
Member

commented Apr 10, 2019

I'm going ahead with the changes in #5683, which align Mutex and ConditionVariable much more closely with CRuby behavior.

@colinsurprenant

This comment has been minimized.

Copy link

commented Jul 17, 2019

@headius just came across this issue, for some reason it evaded my radar. This looks reasonable, I am actually following up on some interruption issues, will let you know if we have concerns related to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.