net/http lib throws java.nio.channels.NotYetConnectedException #4049

Closed
sumitmah opened this Issue Aug 4, 2016 · 5 comments

Comments

Projects
None yet
2 participants
@sumitmah

sumitmah commented Aug 4, 2016

Environment

  • JRuby version

    jruby 9.1.0.0 (2.3.0) 2016-05-02 a633c63 Java HotSpot(TM) 64-Bit Server VM 25.91-b14 on 1.8.0_91-b14 +jit [linux-x86_64]

    Able to reproduce on following version as well.

    jruby 9.1.3.0-SNAPSHOT (2.3.0) 2016-08-04 b104023 Java HotSpot(TM) 64-Bit Server VM 25.91-b14 on 1.8.0_91-b14 +jit [linux-x86_64]

  • Operating system and platform (e.g. uname -a)
    Linux 3.19.0-64-generic #72~14.04.1-Ubuntu SMP Fri Jun 24 17:59:48 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  • We are able to reproduce issue using both open-jdk 1.7 as well as oracle 1.8

Expected Behavior

  • Should never throw NotYetConnectedException when you make Net::HTTP.get call.

Actual Behavior

  • here is the script that I used to reproduce issue on linux-ubuntu based machine. Just run rails server in development mode and do any HTTP GET call(I don't this is tied to GET it can happen for any HTTP method). Following is simple ping api we have which just returns http ok. We have observed exception comes during the high load.
require 'net/http'
puts Process.pid
#this number you might have to tweak to reproduce the issue. 
threads = 1000.times.collect do
  Thread.new do
    puts "#{Thread.current.object_id } - Net::HTTP.get"
    string = URI.parse("http://localhost:3000/ping?i=#{Thread.current.object_id}")
    puts "#{Thread.current.object_id } - #{Net::HTTP.get(string)}"
    puts "#{Thread.current.object_id } - Net::HTTP.get"
  end
end
threads.each do |thread|
  thread.join
end
  • To reproduce I ran above script simultaneously for 3 times, which might not be necessary though.
  • We think issue is in RubyTCPSocket.java file. Here is snippet from the file,
    channel.configureBlocking(false);
    channel.connect( new InetSocketAddress(InetAddress.getByName(remoteHost), remotePort) );
    context.getThread().select(channel, this, SelectionKey.OP_CONNECT);
    channel.finishConnect();
    // only try to set blocking back if we succeeded to finish connecting
    channel.configureBlocking(true);

    System.out.println(context.getThread().id() + "channel.isConnected()"); // false when we receive exception
    System.out.println(context.getThread().id() + " " + socket.isClosed()); // false when we receive exception
    System.out.println(context.getThread().id() + " " + channel.isConnectionPending()); // true when we receive exception
    System.out.println(context.getThread().id() + " " + channel.isOpen()); // true when we receive exception
  • In above snippet of code, we are not checking the return values of select and finishConnect and continue because of that later when we do read or write on socket it throws NotYetConnectedException.
    Here is gist link for stackstrace
  • on linux context.getThread().select(channel, this, SelectionKey.OP_CONNECT); call uses EPollSelectorImpl#doSelect method to wait for events on socket.
  • Even though it does the blocking select above, we have observed doSelect returns 0. Internally it calls native implementation of epollWait for blocking select. Still epollWait returns 0, not sure why need to debug more here.
  • On OSX it uses KQueueSelectorImpl we observed same doSelect returns 0. Here internally it is using kevent for blocking select, from native implementation of kevent from openjdk looks like when EINTR is received kevent ignores the event and returns 0. Here is snippet
   result = kevent(kq, NULL, 0, kevs, kevCount, tsp);

    if (result < 0) {
        if (errno == EINTR) {
            // ignore EINTR, pretend nothing was selected
            result = 0;
        } else {
            JNU_ThrowIOExceptionWithLastError(env, "KQueueArrayWrapper: kqueue failed");
        }
    }
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 11, 2016

Member

Wow thanks for doing your homework on this one! I think you are right...this code needs to check whether the select was actually successful.

Member

headius commented Aug 11, 2016

Wow thanks for doing your homework on this one! I think you are right...this code needs to check whether the select was actually successful.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 11, 2016

Member

I've pushed a fix to master that checks both return values. It might be overkill to check finishConnect, but it's better to be safe I think.

I'm not sure how to write a test for this, since it's a rather tricky race that would be hard to force.

Member

headius commented Aug 11, 2016

I've pushed a fix to master that checks both return values. It might be overkill to check finishConnect, but it's better to be safe I think.

I'm not sure how to write a test for this, since it's a rather tricky race that would be hard to force.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 11, 2016

Member

Fixed in af1d26d.

Member

headius commented Aug 11, 2016

Fixed in af1d26d.

@headius headius closed this Aug 11, 2016

@headius headius added this to the JRuby 9.1.3.0 milestone Aug 11, 2016

@sumitmah

This comment has been minimized.

Show comment
Hide comment
@sumitmah

sumitmah Aug 12, 2016

Thank you very much @headius for the fix.

Thank you very much @headius for the fix.

@sumitmah

This comment has been minimized.

Show comment
Hide comment
@sumitmah

sumitmah Aug 12, 2016

btw @headius why we are doing non blocking connect in first place? I'm just curious.

btw @headius why we are doing non blocking connect in first place? I'm just curious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment