New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9.1.0.0 hangs on syswrite #3868

Closed
mohamedhafez opened this Issue May 10, 2016 · 5 comments

Comments

Projects
None yet
3 participants
@mohamedhafez
Contributor

mohamedhafez commented May 10, 2016

9.1.0.0 has been hanging on syswrite once every few days for me while making an http request, which wasn't happening with 9.0.4.0 or earlier. I'm almost certain its due to some kind of strange/unexpected/incorrect behavior of the server I'm making the request of, as it doesn't happen for days and then all of a sudden I get a handful at the same time. Here's the stack trace:

/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/jopenssl23/openssl/buffering.rb:322:in `syswrite'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/jopenssl23/openssl/buffering.rb:322:in `do_write'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/jopenssl23/openssl/buffering.rb:340:in `write'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/protocol.rb:211:in `write0'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/protocol.rb:185:in `block in write'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/protocol.rb:202:in `writing'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/protocol.rb:184:in `write'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/http/generic_request.rb:188:in `send_request_with_body'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/http/generic_request.rb:121:in `exec'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/http.rb:1446:in `block in transport_request'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/http.rb:1445:in `catch'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/http.rb:1445:in `transport_request'
/usr/local/rvm/rubies/jruby-9.1.0.0/lib/ruby/stdlib/net/http.rb:1418:in `request'
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius May 10, 2016

Member

Ok, I started to analyze this and then realized I was looking at IO.syswrite instad of SSLSocket.syswrite. The latter has a different implementation all balled up inside jruby-openssl.

There are three places it could be hung: selecting for write, writing the bytes it needs, and subsequently flushing the IO. We need to see the JVM trace.

Can you get a JVM stack trace from one of these hung processes? Should be Ctrl+\ in the same console, or send SIGQUIT, or use the jstack command.

Is there any way you get get us something reproducible?

Member

headius commented May 10, 2016

Ok, I started to analyze this and then realized I was looking at IO.syswrite instad of SSLSocket.syswrite. The latter has a different implementation all balled up inside jruby-openssl.

There are three places it could be hung: selecting for write, writing the bytes it needs, and subsequently flushing the IO. We need to see the JVM trace.

Can you get a JVM stack trace from one of these hung processes? Should be Ctrl+\ in the same console, or send SIGQUIT, or use the jstack command.

Is there any way you get get us something reproducible?

@mohamedhafez

This comment has been minimized.

Show comment
Hide comment
@mohamedhafez

mohamedhafez May 10, 2016

Contributor

I'll get back to you with the JVM trace, I'm going to be calling

JRuby.reference(t).native_thread.getStackTrace

where t is the stuck thread, I'm guessing that'll produce the same result as a SIGQUIT would (I've got 2000 threads running in this process so I figure this is easier, but just let me know if there is something wrong with this approach)

Regarding a reproducible test case, I'm making about ~10,000 requests a day to this particular server, which I don't have access to, and this is only happening once every 3 days, so I'm afraid I have no idea of what a reproducible case would be

Contributor

mohamedhafez commented May 10, 2016

I'll get back to you with the JVM trace, I'm going to be calling

JRuby.reference(t).native_thread.getStackTrace

where t is the stuck thread, I'm guessing that'll produce the same result as a SIGQUIT would (I've got 2000 threads running in this process so I figure this is easier, but just let me know if there is something wrong with this approach)

Regarding a reproducible test case, I'm making about ~10,000 requests a day to this particular server, which I don't have access to, and this is only happening once every 3 days, so I'm afraid I have no idea of what a reproducible case would be

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius May 11, 2016

Member

@mohamedhafez Yes, that should work fine too, and we'll get just the thread we want.

As long as you're able to access the environment we'll have an easier time investigating. You may also want to grab a heap dump (jmap tool, jvisualvm tool) when you next see a hanging thread so we have a memory snapshot to explore.

cc @kares and @mkristian since this is OpenSSL related.

It may be possible to downgrade the jruby-openssl gem to see if that fixes this issue. I do not have any theories at the moment.

Member

headius commented May 11, 2016

@mohamedhafez Yes, that should work fine too, and we'll get just the thread we want.

As long as you're able to access the environment we'll have an easier time investigating. You may also want to grab a heap dump (jmap tool, jvisualvm tool) when you next see a hanging thread so we have a memory snapshot to explore.

cc @kares and @mkristian since this is OpenSSL related.

It may be possible to downgrade the jruby-openssl gem to see if that fixes this issue. I do not have any theories at the moment.

@mohamedhafez

This comment has been minimized.

Show comment
Hide comment
@mohamedhafez

mohamedhafez May 26, 2016

Contributor

Is there a way to maybe record any relevant data just for the thread in question? Unfortunately the whole heap for the process is about 30 GB, and if I remember correctly it hangs or at least takes way too long to get a dump with visual vm...

So just as an update, this hasn't happened since the last time I posted here, so I guess its a lot more uncommon than I thought... I'll report back with the JVM trace next time this happens, and if it doesn't happen for a while I'm fine closing the issue until I can provide more info if you'd prefer

Contributor

mohamedhafez commented May 26, 2016

Is there a way to maybe record any relevant data just for the thread in question? Unfortunately the whole heap for the process is about 30 GB, and if I remember correctly it hangs or at least takes way too long to get a dump with visual vm...

So just as an update, this hasn't happened since the last time I posted here, so I guess its a lot more uncommon than I thought... I'll report back with the JVM trace next time this happens, and if it doesn't happen for a while I'm fine closing the issue until I can provide more info if you'd prefer

@mohamedhafez

This comment has been minimized.

Show comment
Hide comment
@mohamedhafez

mohamedhafez Jun 4, 2016

Contributor

Haven't seen this happen since... huh. I'll close until/if it happens again, and I'll post a JVM stack trace if so

Contributor

mohamedhafez commented Jun 4, 2016

Haven't seen this happen since... huh. I'll close until/if it happens again, and I'll post a JVM stack trace if so

@enebo enebo added this to the JRuby 9.1.3.0 milestone Aug 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment