New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutexes should use .lock instead of .lockInterruptibly #4261

Closed
andrewvc opened this Issue Nov 2, 2016 · 8 comments

Comments

Projects
None yet
2 participants
@andrewvc

andrewvc commented Nov 2, 2016

Since ruby does not have checked exceptions the behavior here: https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/ext/thread/Mutex.java#L91 is very surprising. If users really need this behavior they should use a ReentrantLock directly.

The only question I have now is if its removal might be even more surprising to existing users. Thoughts?

@andrewvc

This comment has been minimized.

Show comment
Hide comment
@andrewvc

andrewvc Nov 2, 2016

To be clear, this caused a nasty bug for Logstash that took a while to show up.

andrewvc commented Nov 2, 2016

To be clear, this caused a nasty bug for Logstash that took a while to show up.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 2, 2016

Member

I was about to comment that if we make this change, Thread blocked on Mutex acquisition would no longer be interruptible as in MRI:

[] ~/projects/jruby $ cat blah.rb
m = Mutex.new

m.lock

t = Thread.new { puts 'before'; m.lock rescue puts 'after' }

sleep 1
1 while t.status != "sleep"

t.raise
t.join

[] ~/projects/jruby $ ruby23 blah.rb
before
after

However, JRuby does not appear to be actually interrupt the mutex correctly, and never reaches "after".

Member

headius commented Nov 2, 2016

I was about to comment that if we make this change, Thread blocked on Mutex acquisition would no longer be interruptible as in MRI:

[] ~/projects/jruby $ cat blah.rb
m = Mutex.new

m.lock

t = Thread.new { puts 'before'; m.lock rescue puts 'after' }

sleep 1
1 while t.status != "sleep"

t.raise
t.join

[] ~/projects/jruby $ ruby23 blah.rb
before
after

However, JRuby does not appear to be actually interrupt the mutex correctly, and never reaches "after".

@headius headius added this to the JRuby 9.1.6.0 milestone Nov 2, 2016

@andrewvc

This comment has been minimized.

Show comment
Hide comment
@andrewvc

andrewvc Nov 2, 2016

@headius I suppose the confusion is that I'm not using Thread#raise in my own usage, but rather am calling #interrupt on the java thread itself.

I'm not sure if there's a way to disentangle those things, but I do think that'd make sense. That said, we learned our lesson and used ReentrantLock directly so its not a pressing concern for me. I do think that it is confusing if you are using java's Thread#interrupt, but I think its all arguable either way.

At the very least I think it'd be desirable for the situation where there was no ruby raise called to either not throw anything or throw an InterruptedException.

andrewvc commented Nov 2, 2016

@headius I suppose the confusion is that I'm not using Thread#raise in my own usage, but rather am calling #interrupt on the java thread itself.

I'm not sure if there's a way to disentangle those things, but I do think that'd make sense. That said, we learned our lesson and used ReentrantLock directly so its not a pressing concern for me. I do think that it is confusing if you are using java's Thread#interrupt, but I think its all arguable either way.

At the very least I think it'd be desirable for the situation where there was no ruby raise called to either not throw anything or throw an InterruptedException.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 2, 2016

Member

It's difficult (maybe impossible) to know whether the interrupt is from someone in normal Java space doing Thread.interrupt or from JRuby trying to wake the thread for a raise or kill event.

@andrewvc: To clarify: you had a thread waiting on a mutex that was interrupted by Java code? I'd like to understand your bug better so we can figure out the best path forward.

Meanwhile, I will push the lock change to a branch, so we can see how CI likes it.

Member

headius commented Nov 2, 2016

It's difficult (maybe impossible) to know whether the interrupt is from someone in normal Java space doing Thread.interrupt or from JRuby trying to wake the thread for a raise or kill event.

@andrewvc: To clarify: you had a thread waiting on a mutex that was interrupted by Java code? I'd like to understand your bug better so we can figure out the best path forward.

Meanwhile, I will push the lock change to a branch, so we can see how CI likes it.

headius added a commit that referenced this issue Nov 2, 2016

@andrewvc

This comment has been minimized.

Show comment
Hide comment
@andrewvc

andrewvc Nov 2, 2016

@headius The gist of the issue is that the grok filter in logstash now monitors grokking threads for timeouts with a TimeoutEnforcer object/thread. This thread periodically sees if threads have been running for too long and interrupts ones that have relying on Joni's interrupt behavior. The threads coordinated with Mutexes which is where the problem lay, that mutex would sometimes be the thing that was interrupted.

Here was the actual bug we hit, + the fix (moving to direct use of a ReentrantLock). logstash-plugins/logstash-filter-grok#98 The ticket is well documented. The PR has a few commits ,but here's the only one that matters: logstash-plugins/logstash-filter-grok@82a75e9 Let me know if you'd like additional information added clarify the ticket.

andrewvc commented Nov 2, 2016

@headius The gist of the issue is that the grok filter in logstash now monitors grokking threads for timeouts with a TimeoutEnforcer object/thread. This thread periodically sees if threads have been running for too long and interrupts ones that have relying on Joni's interrupt behavior. The threads coordinated with Mutexes which is where the problem lay, that mutex would sometimes be the thing that was interrupted.

Here was the actual bug we hit, + the fix (moving to direct use of a ReentrantLock). logstash-plugins/logstash-filter-grok#98 The ticket is well documented. The PR has a few commits ,but here's the only one that matters: logstash-plugins/logstash-filter-grok@82a75e9 Let me know if you'd like additional information added clarify the ticket.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 3, 2016

Member

Well it looks like CI is ok.

So the remaining question for me (cc @enebo) is whether we should be looking to fix the uninterruptibility. As far as I can tell, MRI will not break out of a blocked lock request unless you raise in the target thread; simply doing wakeup does not work. Of course neither work for us, but this tells me that being able to interrupt a thread waiting to lock is not typically used by Rubyists, since they'd only be able to do it by raising an error.

I think we'll go with the patch you suggest for now and deal with any fallout.

Member

headius commented Nov 3, 2016

Well it looks like CI is ok.

So the remaining question for me (cc @enebo) is whether we should be looking to fix the uninterruptibility. As far as I can tell, MRI will not break out of a blocked lock request unless you raise in the target thread; simply doing wakeup does not work. Of course neither work for us, but this tells me that being able to interrupt a thread waiting to lock is not typically used by Rubyists, since they'd only be able to do it by raising an error.

I think we'll go with the patch you suggest for now and deal with any fallout.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 3, 2016

Member

FWIW, here's what I get from MRI if I change that lock-interrupting script to try to wakeup instead of raise.

$ ruby23 blah.rb
before
blah.rb:11:in `join': No live threads left. Deadlock? (fatal)
    from blah.rb:11:in `<main>'
Member

headius commented Nov 3, 2016

FWIW, here's what I get from MRI if I change that lock-interrupting script to try to wakeup instead of raise.

$ ruby23 blah.rb
before
blah.rb:11:in `join': No live threads left. Deadlock? (fatal)
    from blah.rb:11:in `<main>'

@headius headius closed this Nov 3, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 3, 2016

Member

Merged to master in 6a436e9.

Member

headius commented Nov 3, 2016

Merged to master in 6a436e9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment