Mutexes should use .lock instead of .lockInterruptibly #4261

Closed
andrewvc opened this Issue Nov 2, 2016 · 8 comments

Projects

None yet

2 participants

@andrewvc
andrewvc commented Nov 2, 2016

Since ruby does not have checked exceptions the behavior here: https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/ext/thread/Mutex.java#L91 is very surprising. If users really need this behavior they should use a ReentrantLock directly.

The only question I have now is if its removal might be even more surprising to existing users. Thoughts?

@andrewvc
andrewvc commented Nov 2, 2016

To be clear, this caused a nasty bug for Logstash that took a while to show up.

@headius
Member
headius commented Nov 2, 2016

I was about to comment that if we make this change, Thread blocked on Mutex acquisition would no longer be interruptible as in MRI:

[] ~/projects/jruby $ cat blah.rb
m = Mutex.new

m.lock

t = Thread.new { puts 'before'; m.lock rescue puts 'after' }

sleep 1
1 while t.status != "sleep"

t.raise
t.join

[] ~/projects/jruby $ ruby23 blah.rb
before
after

However, JRuby does not appear to be actually interrupt the mutex correctly, and never reaches "after".

@headius headius added this to the JRuby 9.1.6.0 milestone Nov 2, 2016
@andrewvc
andrewvc commented Nov 2, 2016

@headius I suppose the confusion is that I'm not using Thread#raise in my own usage, but rather am calling #interrupt on the java thread itself.

I'm not sure if there's a way to disentangle those things, but I do think that'd make sense. That said, we learned our lesson and used ReentrantLock directly so its not a pressing concern for me. I do think that it is confusing if you are using java's Thread#interrupt, but I think its all arguable either way.

At the very least I think it'd be desirable for the situation where there was no ruby raise called to either not throw anything or throw an InterruptedException.

@headius
Member
headius commented Nov 2, 2016

It's difficult (maybe impossible) to know whether the interrupt is from someone in normal Java space doing Thread.interrupt or from JRuby trying to wake the thread for a raise or kill event.

@andrewvc: To clarify: you had a thread waiting on a mutex that was interrupted by Java code? I'd like to understand your bug better so we can figure out the best path forward.

Meanwhile, I will push the lock change to a branch, so we can see how CI likes it.

@headius headius added a commit that referenced this issue Nov 2, 2016
@headius headius Don't lock uninterruptibly. #4261 2501c07
@andrewvc
andrewvc commented Nov 2, 2016

@headius The gist of the issue is that the grok filter in logstash now monitors grokking threads for timeouts with a TimeoutEnforcer object/thread. This thread periodically sees if threads have been running for too long and interrupts ones that have relying on Joni's interrupt behavior. The threads coordinated with Mutexes which is where the problem lay, that mutex would sometimes be the thing that was interrupted.

Here was the actual bug we hit, + the fix (moving to direct use of a ReentrantLock). logstash-plugins/logstash-filter-grok#98 The ticket is well documented. The PR has a few commits ,but here's the only one that matters: logstash-plugins/logstash-filter-grok@82a75e9 Let me know if you'd like additional information added clarify the ticket.

@headius
Member
headius commented Nov 3, 2016

Well it looks like CI is ok.

So the remaining question for me (cc @enebo) is whether we should be looking to fix the uninterruptibility. As far as I can tell, MRI will not break out of a blocked lock request unless you raise in the target thread; simply doing wakeup does not work. Of course neither work for us, but this tells me that being able to interrupt a thread waiting to lock is not typically used by Rubyists, since they'd only be able to do it by raising an error.

I think we'll go with the patch you suggest for now and deal with any fallout.

@headius
Member
headius commented Nov 3, 2016

FWIW, here's what I get from MRI if I change that lock-interrupting script to try to wakeup instead of raise.

$ ruby23 blah.rb
before
blah.rb:11:in `join': No live threads left. Deadlock? (fatal)
    from blah.rb:11:in `<main>'
@headius headius closed this Nov 3, 2016
@headius
Member
headius commented Nov 3, 2016

Merged to master in 6a436e9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment