Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jruby-openssl spins endlessly trying to write to a socket which is in state CLOSE_WAIT #1156

Closed
xb opened this Issue Oct 24, 2013 · 6 comments

Comments

Projects
None yet
5 participants
@xb
Copy link

xb commented Oct 24, 2013

On

$ jruby --version
jruby 1.7.5 (1.9.3p392) 2013-10-07 74e9291 on OpenJDK 64-Bit Server VM 1.7.0_40-b31 [linux-amd64]

with the jruby-openssl-0.9.4 gem installed, when running an application which accessess a HTTPS site as a HTTPS client, I repeatably observe following patterns after some running time:

  1. The CPU usage of the application is slightly above 100% times n, where n is a positive integer.
  2. The number of sockets which are actually HTTPS sockets and in CLOSE_WAIT state (as of "lsof") is equal to n.
  3. The number of threads calling org.jruby.ext.openssl.SSLSocket.syswrite(SSLSocket.java:664) in at a particular time (as evidenced by "jstack" or by "kill -3") is almost always n.
  4. The thread ids for these n affected threads being the same over time.

I conclude that there seems to be a bug in jruby-openssl that is characterized by a SSL client socket getting into the CLOSE_WAIT state (e.g. the remote party or the network terminating the connection unexpectedly) and jruby-openssl apparently not noticing this state, and, rather than that, attempting to write to the socket in an endless loop.

Some Java stack traces which were captured:

   java.lang.Thread.State: RUNNABLE
        at java.lang.Object.hashCode(Native Method)
        at java.util.HashMap.hash(HashMap.java:366)
        at java.util.HashMap.put(HashMap.java:496)
        at java.util.HashSet.add(HashSet.java:217)
        at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:165)
        at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
        - locked <0x00000000e3ce1d60> (a java.util.Collections$UnmodifiableSet)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
        - locked <0x00000000e42aa638> (a java.lang.Object)
        - locked <0x00000000e42aa648> (a java.lang.Object)
        at java.nio.channels.SelectableChannel.register(SelectableChannel.java:277)
        at org.jruby.ext.openssl.SSLSocket.waitSelect(SSLSocket.java:288)
        at org.jruby.ext.openssl.SSLSocket.do_syswrite(SSLSocket.java:644)
        at org.jruby.ext.openssl.SSLSocket.syswrite(SSLSocket.java:664)
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollCtl(Native Method)
        at sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:287)
        - locked <0x00000000e3cdf488> (a java.lang.Object)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:256)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x00000000e3cdf410> (a sun.nio.ch.Util$2)
        - locked <0x00000000e3cdf400> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000e3cdf1c8> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
        at org.jruby.ext.openssl.SSLSocket$2.run(SSLSocket.java:316)
        at org.jruby.RubyThread.executeBlockingTask(RubyThread.java:1063)
        at org.jruby.ext.openssl.SSLSocket.waitSelect(SSLSocket.java:295)
        at org.jruby.ext.openssl.SSLSocket.do_syswrite(SSLSocket.java:644)
        at org.jruby.ext.openssl.SSLSocket.syswrite(SSLSocket.java:664)
   java.lang.Thread.State: RUNNABLE
        at org.jruby.util.io.ChannelStream.fflush(ChannelStream.java:661)
        - locked <0x00000000e4430e08> (a org.jruby.util.io.ChannelStream)
        at org.jruby.RubyIO.flush(RubyIO.java:2168)
        at org.jruby.ext.openssl.SSLSocket.do_syswrite(SSLSocket.java:654)
        at org.jruby.ext.openssl.SSLSocket.syswrite(SSLSocket.java:664)

The bug seems to be similar to https://jira.codehaus.org/browse/JRUBY-6973 .

@xb

This comment has been minimized.

Copy link
Author

xb commented Oct 24, 2013

The bug is probably that in https://github.com/jruby/jruby/blob/1.7.6/lib/ruby/shared/jopenssl19/openssl/buffering.rb?source=cc#L318

      while remain > 0
        str = @wbuffer[nwritten,remain]
        begin
          nwrote = syswrite(str)
        rescue Errno::EAGAIN
          retry
        end
        remain -= nwrote
        nwritten += nwrote
      end

nwrote is either 0 each time or Errno::EAGAIN is raised each time although the call is a blocking call (which means that neither nwrote==0 is allowed to happen nor that Errno::EAGAIN is allowed to be raised).

@xb

This comment has been minimized.

Copy link
Author

xb commented Oct 25, 2013

Analyzing this problem further, it turns out that in https://github.com/jruby/jruby/blob/1.7.6/ext/openssl/src/main/java/org/jruby/ext/openssl/SSLSocket.java#L484

    public int write(ByteBuffer src, boolean blocking) throws SSLException, IOException {
        if(initialHandshake) {
            throw new IOException("Writing not possible during handshake");
        }

        SelectableChannel selectable = getSocketChannel();
        boolean blockingMode = selectable.isBlocking();
        if (!blocking) selectable.configureBlocking(false);

        try {
            if(netData.hasRemaining()) {
                flushData(blocking);
            }
            netData.clear();
            SSLEngineResult res = engine.wrap(src, netData);
            netData.flip();
            flushData(blocking);
            return res.bytesConsumed();
        } finally {
            if (!blocking) selectable.configureBlocking(blockingMode);
        }
    }

the line

SSLEngineResult res = engine.wrap(src, netData); 

Returns an SSLEngineResult object "res". This SSLEngineResult has a status (obtainable by res.getStatus()) One of the possible states is "CLOSED". In case of this bug, the status is actually closed.

It turns out the place where this CLOSED state is created is http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/sun/security/ssl/SSLEngineImpl.java

    /*
     * Makes additional checks for unwrap, but this time more
     * specific to this packet and the current state of the machine.
     */
    private SSLEngineResult writeAppRecord(EngineArgs ea) throws IOException {

        Status status = null;
        HandshakeStatus hsStatus = null;

        /*
         * See if the handshaker needs to report back some SSLException.
         */
        checkTaskThrown();

        /*
         * short circuit if we're closed/closing.
         */
        if (writer.isOutboundDone()) {
            return new SSLEngineResult(Status.CLOSED, getHSStatus(null), 0, 0);
        }

So in this case, writer.isOutboundDone() returns true and thus the SSLEngineResult contains status Status.CLOSED.

The netData buffer then contains 0 bytes and when attempting to write it, 0 bytes are actually written. Thus, the syswrite() method returns 0, and spins forever.

@tbeauvais

This comment has been minimized.

Copy link

tbeauvais commented Nov 15, 2013

Yes, I'm seeing the same thing. Several threads(5-6) in similar state and each causes the CPU to spinning at 600% or so.

Is there a Jira ticket for this? any workarounds?

Thread 31345: (state = IN_VM)
 - org.jruby.internal.runtime.ThreadService.getCurrentContext() @bci=0, line=189 (Interpreted frame)
 - org.jruby.Ruby.getCurrentContext() @bci=4, line=2557 (Interpreted frame)
 - org.jruby.RubyThread.pollThreadEvents() @bci=5, line=489 (Interpreted frame)
 - org.jruby.RubyThread.executeBlockingTask(org.jruby.RubyThread$BlockingTask) @bci=10, line=1004 (Interpreted frame)
 - org.jruby.ext.openssl.SSLSocket.waitSelect(int, boolean) @bci=100, line=295 (Interpreted frame)
 - org.jruby.ext.openssl.SSLSocket.do_syswrite(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, boolean) @bci=13, line=644 (Interpreted frame)
 - org.jruby.ext.openssl.SSLSocket.syswrite(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject) @bci=4, line=664 (Interpreted frame)
Thread 13372: (state = IN_VM)
 - org.jruby.internal.runtime.ThreadService.getCurrentContext() @bci=0, line=189 (Interpreted frame)
 - org.jruby.Ruby.getCurrentContext() @bci=4, line=2557 (Interpreted frame)
 - org.jruby.ext.openssl.SSLSocket.waitSelect(int, boolean) @bci=21, line=283 (Interpreted frame)
 - org.jruby.ext.openssl.SSLSocket.do_syswrite(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, boolean) @bci=13, line=644 (Interpreted frame)
 - org.jruby.ext.openssl.SSLSocket.syswrite(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject) @bci=4, line=664 (Interpreted frame)

xb pushed a commit to xb/jruby that referenced this issue Nov 19, 2013

@tbeauvais

This comment has been minimized.

Copy link

tbeauvais commented Nov 26, 2013

Can we get this merged into the jruby repo?

@mohamedhafez

This comment has been minimized.

Copy link
Contributor

mohamedhafez commented Dec 16, 2013

@xb, would you mind doing a pull request into the main jruby repo for your fix?

headius added a commit that referenced this issue Dec 19, 2013

Fix bug #1156 .
Signed-off-by: Charles Oliver Nutter <headius@headius.com>
@headius

This comment has been minimized.

Copy link
Member

headius commented Dec 19, 2013

I went ahead and cherry-picked your commit to jruby-1_7 and master branches. Perhaps you can come up with a test case PR too?

@headius headius closed this Dec 19, 2013

@enebo enebo modified the milestones: JRuby 1.7.10, JRuby 1.7.11 Feb 24, 2014

kares added a commit to kares/jruby-openssl that referenced this issue Jun 25, 2014

Fix bug jruby/jruby#1156 .
Signed-off-by: Charles Oliver Nutter <headius@headius.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.