UNIX Sockets raising Errno::ECONNRESET or EOFError ( 9.0.0.0 && 1.7.19 ) #2750

Closed
digitalextremist opened this Issue Mar 24, 2015 · 27 comments

4 participants
@digitalextremist
Contributor

digitalextremist commented Mar 24, 2015

The breakage is demonstrated by the break-unix-sockets branch of Reel, which has been held back until all Rubies support UNIX socket connections properly.

The test passes under Rubinius and MRI, but under JRuby it fails with:

Failures:

1) Reel::Server::UNIX allows connections over UNIX sockets
Failure/Error: response = Net::HTTPResponse.read_new(sock)
Errno::ECONNRESET:
  Connection reset by peer - Connection reset by peer
# org/jruby/RubyIO.java:2858:in `read_nonblock'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:141:in `rbuf_fill'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:122:in `readuntil'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:132:in `readline'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2571:in `read_status_line'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2560:in `read_new'
# ./spec/reel/unix_server_spec.rb:28:in `(root)'
# RVM/rubies/jruby-1.7.19/lib/ruby/shared/tmpdir.rb:0:in `create'
# ./spec/reel/unix_server_spec.rb:21:in `(root)'
# org/jruby/RubyBasicObject.java:1562:in `instance_exec'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:177:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/hooks.rb:474:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/hooks.rb:612:in `run_around_example_hooks_for'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/hooks.rb:474:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:174:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:548:in `run_examples'
# org/jruby/RubyArray.java:2412:in `map'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:544:in `run_examples'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:512:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:110:in `run_specs'
# org/jruby/RubyArray.java:2412:in `map'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:110:in `run_specs'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/configuration.rb:1526:in `with_suite_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:109:in `run_specs'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/reporter.rb:62:in `report'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:108:in `run_specs'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:86:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:70:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:38:in `invoke'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/exe/rspec:4:in `(root)'

This is the only issue remaining for our 0.6.0 release, and we'd be very excited to include UNIX socket servers after a year or so of holding that functionality back.

@enebo

Member

enebo commented Mar 24, 2015

marked against 1.7.20 so we do not forget to evaluate what is wrong here before next release...

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

Thanks @enebo. Short of brushing up on my Java, is there any way I could troubleshoot this further and help you guys surround it?

@enebo

Member

enebo commented Mar 24, 2015

@digitalextremist if you could try JRuby 9.0.0.0pre1 and see if it works there it would help. Our IO subsystem was re-written and it would be good to know if we potentially have one or two problems.

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

@enebo it's definitely not working with 9.0.0.0 either.

I've tested with -SNAPSHOT ... you can see both strains failing here:

https://travis-ci.org/celluloid/reel/builds/55636058

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

For the record: you'll notice jruby-openssl is also failing under 9.0.0.0 but that's tangential.

Important tidbit:

  • 1.7.19 fails with Errno::ECONNRESET
  • 9.0.0.0 fails with EOFError

Both fail at the same call though, at different locations per version:

  • org/jruby/RubyIO.java:2858:in read_nonblock 1.7.19
  • org/jruby/RubyIO.java:2768:in read_nonblock 9.0.0.0

Both of those are failing at this line in the test: response = Net::HTTPResponse.read_new(sock)
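
For reference, a minimal sketch of that kind of exchange. This is not the Reel spec: the helper name `unix_http_round_trip`, the socket file name, and the canned response are all assumptions made for illustration. It serves one HTTP response over a UNIX socket and parses it on the client with the same `Net::HTTPResponse.read_new(sock)` call:

```ruby
require "socket"
require "tmpdir"
require "net/http"

# Hypothetical helper (not from Reel): serve one canned HTTP response over a
# UNIX socket and parse it on the client with Net::HTTPResponse.read_new.
def unix_http_round_trip
  Dir.mktmpdir do |dir|
    path = File.join(dir, "example.sock")   # assumed socket file name
    server = UNIXServer.new(path)

    responder = Thread.new do
      conn = server.accept
      conn.readpartial(1024)                # consume the request
      conn.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
      conn.close
    end

    client = UNIXSocket.new(path)
    client.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")

    sock = Net::BufferedIO.new(client)
    response = Net::HTTPResponse.read_new(sock)   # the call that raises in this issue
    response.reading_body(sock, true) { response.body }

    responder.join
    client.close
    server.close
    [response.code, response.body]
  end
end
```

Calling `unix_http_round_trip` returns the status code and body as strings when the round trip succeeds; under the failure described here, the exception would surface from the `read_new` line.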

Here is the complete test:

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

@enebo I think it's raising an exception near here for 9.0.0.0:

It's finding ret to be nil after getPartial

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

@enebo and for 1.7.19 it seems like it's here, not sure why ECONNRESET though:

I'm sure it's in/after/during read_nonblock but not sure why there's different behavior for each.

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

The only place where I see ECONNRESET happen is related to UDP sockets.

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

The extremely confusing thing is that we're wrapping the calls in a rescue covering both those.

@digitalextremist digitalextremist changed the title from "UNIX Sockets raising Errno::ECONNRESET under 1.7.19" to "UNIX Sockets raising Errno::ECONNRESET or EOFError ( 9.0.0.0 && 1.7.19 )" Mar 24, 2015

@enebo

Member

enebo commented Mar 24, 2015

@digitalextremist This could be as simple as something we are missing in UNIX domain socket support causing getPartial to return nil. I am pretty sure we bypass Java and use our native callouts for UDS.

@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

@enebo, right on. Is there a crash-test-dummy level of exposure I could get in actually attempting to modify the Java and test that on-the-fly without needing to rebuild jruby every time I modify a file?

@digitalextremist digitalextremist referenced this issue Mar 24, 2015

Closed

Release 0.6.0.pre1 #174

6 of 6 tasks complete
@digitalextremist

Contributor Author

digitalextremist commented Mar 24, 2015

@enebo, @headius I've picked up an issue further down that's "cascading" into the ones I've shown. Those failures were visible right away; this one I had to dig to find:

Entire chain for 1.7.19 ...

ArgumentError: mode not supported for this object: r
    org/nio4r/Nio4r.java:172:in `register'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:43:in `wait'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:22:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io.rb:53:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/unix_server.rb:19:in `accept'
    /home/de/FOSS/reel/lib/reel/server.rb:49:in `run'
    org/jruby/RubyKernel.java:1507:in `loop'
    /home/de/FOSS/reel/lib/reel/server.rb:47:in `run'
    org/jruby/RubyKernel.java:1958:in `public_send'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:26:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:137:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:60:in `invoke'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:71:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/actor.rb:357:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/tasks.rb:57:in `initialize'
    /home/de/FOSS/celluloid/lib/celluloid/tasks/task_fiber.rb:14:in `create'
Errno::ECONNRESET: Connection reset by peer - Connection reset by peer
     read_nonblock at org/jruby/RubyIO.java:2858
     read_nonblock at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/forwardable.rb:201
         rbuf_fill at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:141
         readuntil at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:122
          readline at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:132
  read_status_line at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2571
          read_new at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2560
        __ensure__ at 180.rb:51
            (root) at 180.rb:45
            create at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/shared/tmpdir.rb:0
            (root) at 180.rb:44

Entire chain for 9.0.0.0-pre ...

ArgumentError: mode not supported for this object: r
    org/nio4r/Nio4r.java:172:in `register'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:43:in `wait'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:22:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io.rb:53:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/unix_server.rb:19:in `accept'
    /home/de/FOSS/reel/lib/reel/server.rb:49:in `run'
    org/jruby/RubyKernel.java:1300:in `loop'
    /home/de/FOSS/reel/lib/reel/server.rb:47:in `run'
    org/jruby/RubyKernel.java:1832:in `public_send'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:26:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:137:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:60:in `invoke'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:71:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/actor.rb:357:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/tasks.rb:57:in `initialize'
    /home/de/FOSS/celluloid/lib/celluloid/tasks/task_fiber.rb:14:in `create'
EOFError: No message available
               read_nonblock at org/jruby/RubyIO.java:2751
               read_nonblock at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/forwardable.rb:183
                   rbuf_fill at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/protocol.rb:153
                   readuntil at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/protocol.rb:134
                    readline at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/protocol.rb:144
            read_status_line at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/http/response.rb:39
                    read_new at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/http/response.rb:28
  180.rb_CLOSURE_2__180.rb_1 at 180.rb:51
                      create at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/tmpdir.rb:146
                  __script__ at 180.rb:44
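
The cascade in the chains above can be sketched in isolation. This is a simplified stand-in, not what Reel or Celluloid::IO do: the helper name and socket file name are hypothetical, and the server here accepts and then drops the connection, whereas in the traces the server fails before its accept completes. Either way the client sees only a read error and nothing about the server-side cause:

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper: the server side drops the connection without writing
# anything, standing in for a server that failed before it could respond.
def client_error_after_server_abort
  Dir.mktmpdir do |dir|
    path = File.join(dir, "abort.sock")   # assumed socket file name
    server = UNIXServer.new(path)
    client = UNIXSocket.new(path)

    server.accept.close                   # no response is ever written
    server.close

    IO.select([client], nil, nil, 5)      # wait until the close is visible
    begin
      client.read_nonblock(1024)
      nil                                 # only reached if data was somehow read
    rescue EOFError, Errno::ECONNRESET => e
      e.class                             # which one depends on platform and runtime
    ensure
      client.close
    end
  end
end
```

The helper returns the exception class the client ran into. Which of the two it is can vary by platform and runtime, which fits the 1.7.19 and 9.0.0.0 traces differing while sharing one root cause.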

headius added a commit to jnr/jnr-enxio that referenced this issue Mar 24, 2015

headius added a commit to jnr/jnr-unixsocket that referenced this issue Mar 24, 2015

@headius

Member

headius commented Mar 25, 2015

I've pushed revisions to jnr-enxio and jnr-unixsocket that modify both to allow READ among the select operations for server sockets. I've also pushed a change to jruby-1_7 to update to these snapshot versions.

Let me know how it goes!
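
As a plain-Ruby illustration of why READ matters for server sockets: a listening socket is reported readable once a connection is pending, which is what a reactor waits on before calling accept. A minimal sketch using only core `IO.select`; the helper name and socket file name are hypothetical:

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper: report whether a listening UNIX socket selects as
# readable before and after a client connects, and whether accept succeeds.
def server_readability
  Dir.mktmpdir do |dir|
    path = File.join(dir, "select.sock")          # assumed socket file name
    server = UNIXServer.new(path)

    before = IO.select([server], nil, nil, 0)     # nothing to accept yet
    client = UNIXSocket.new(path)                 # queues a pending connection
    after  = IO.select([server], nil, nil, 5)     # server should now be readable

    conn = server.accept_nonblock                 # a connection is pending
    accepted = conn.is_a?(UNIXSocket)
    [conn, client, server].each(&:close)

    [before.nil?, !after.nil? && after[0].include?(server), accepted]
  end
end
```

If a runtime refuses read interest on a server socket, a reactor built on this pattern cannot wait for incoming connections at all.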

@digitalextremist

Contributor Author

digitalextremist commented Mar 25, 2015

Alright! Well I built a custom jruby, 1.7.20-SNAPSHOT and mounted it in rvm. Thank you very much for the patched release @headius. I'm getting the same error though.

ArgumentError: mode not supported for this object: r

You said perhaps nio4r or even Celluloid::IO might be misconfiguring the socket, but it was unlikely. How can I test that? From what you did, it ought to be readable, correct? Are there any setsockopt configurations we need to do?

Thank you for giving so much of your time today. We really, really appreciate it.

It was really cool to build, mount, and run my own jruby binary.

@headius headius closed this in 052e0d0 Mar 26, 2015

@digitalextremist

Contributor Author

digitalextremist commented Mar 26, 2015

Awesome! Thanks @headius. I'll check this in a bit and pin Reel 0.6.0.pre1 to the version that fixes this. What point release will that be?

On March 26, 2015 4:45:38 AM PDT, Charles Oliver Nutter notifications@github.com wrote:

Closed #2750 via 052e0d0.


@digitalextremist

Contributor Author

digitalextremist commented Mar 26, 2015

@headius I think this closed automatically but we've tested it and it didn't work, right? Can this be reopened until it does pass?

@headius headius reopened this Mar 26, 2015

@digitalextremist

Contributor Author

digitalextremist commented Mar 26, 2015

@headius, thank you sir.

@headius

Member

headius commented Apr 2, 2015

Back on this one today...

@digitalextremist

Contributor Author

digitalextremist commented Apr 2, 2015

@headius so happy to hear that. Thank you.

@headius

Member

headius commented Apr 2, 2015

Ah-ha!

I believe the remaining issue is a bug in nio4r; it uses a selector from the wrong provider, and that error gets misinterpreted as a bad selection operation.

When I add printStackTrace to Nio4r.java:172, I get this:

java.nio.channels.IllegalSelectorException
    at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:128)
    at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:212)
    at java.nio.channels.SelectableChannel.register(SelectableChannel.java:280)
    at org.nio4r.Nio4r$Selector.register(Nio4r.java:170)

UNIX sockets in JRuby come from jnr-enxio, which has its own selector provider. Selectors and selectable channels must come from the same provider.
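
To make the misinterpretation concrete: in Java, `IllegalSelectorException` is a subclass of `IllegalArgumentException`, so a handler written for the broader exception also catches the provider mismatch. The Ruby sketch below only models that shape. The classes and both methods are hypothetical stand-ins, and it assumes the register wrapper rescues the broader exception, which the message in the trace suggests but this thread does not show:

```ruby
# Hypothetical stand-ins for the Java exceptions. In Java,
# IllegalSelectorException really is a subclass of IllegalArgumentException.
class IllegalArgumentError < StandardError; end
class IllegalSelectorError < IllegalArgumentError; end

# Hypothetical low-level register: raises the specific error when the
# channel and the selector come from different providers.
def low_level_register(channel_provider, selector_provider)
  unless channel_provider == selector_provider
    raise IllegalSelectorError, "selector from the wrong provider"
  end
  :registered
end

# Assumed shape of the wrapper: the broad rescue also catches the subclass,
# so a provider mismatch is reported as an unsupported mode.
def register(channel_provider, selector_provider, interest)
  low_level_register(channel_provider, selector_provider)
rescue IllegalArgumentError
  raise ArgumentError, "mode not supported for this object: #{interest}"
end
```

With matching providers `register` succeeds; with mismatched ones the caller sees only the "mode not supported" message, and the original cause is lost unless the wrapper rescues the subclass separately or keeps the cause attached.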

I'll see if I can come up with a patch for nio4r.

@digitalextremist

Contributor Author

digitalextremist commented Apr 2, 2015

Awesome! Excited to see what you turn up next. Will be ready to check-in a nio4r patch. /cc: @tarcieri

@headius

Member

headius commented Apr 2, 2015

This will take a bit more work than I'd hoped; nio4r needs to duplicate logic we have in JRuby for dealing with selectors from different providers.

@headius

Member

headius commented Apr 2, 2015

In the interim I will test 1.7 with the updated jnr-unixsocket stuff and see if it has reduced to the same problem.

@tarcieri

tarcieri commented Apr 2, 2015

@headius is there any kind of API we can standardize on to avoid the duplication?

@headius

Member

headius commented Apr 2, 2015

I've confirmed the Errno::ECONNRESET in 1.7 is now also caused by this illegal selector error. The difference in exception (EOFError on 9k) is probably due to 9k having recent ports of MRI's IO logic.

I'm going to resolve this as fixed, since jnr-unixsocket and jnr-enxio and jruby itself appear to be doing the right thing. We'll deal with the nio4r issue separately.

@headius

Member

headius commented May 4, 2015

@tarcieri Should I file an issue about this, or shall we discuss realtime a bit more? In any case I'm closing this because JRuby should be doing the right thing if you use Ruby APIs, and the work to be done is in nio4r.

@headius headius closed this May 4, 2015

@tarcieri

tarcieri commented May 4, 2015

Maybe open an nio4r issue about this and we can discuss there. FWIW I feel like nio4r is somewhat coupled to JRuby internals, and maybe needs some APIs surfaced (even just in Java-land) to bind to for this sort of thing.
