New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JRuby's native `SortedSet` lies about adding an object #5035

Closed
chuckremes opened this Issue Feb 7, 2018 · 12 comments

Comments

Projects
None yet
3 participants
@chuckremes

chuckremes commented Feb 7, 2018

Environment

Provide at least:

  • JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc)
    jruby 9.2.0.0-SNAPSHOT (2.4.1) 2018-02-01 aa2ea83 Java HotSpot(TM) 64-Bit Server VM 25.144-b01 on 1.8.0_144-b01 +jit [darwin-x86_64]

  • Operating system and platform (e.g. uname -a)
    Darwin Charless-Air 16.7.0 Darwin Kernel Version 16.7.0: Mon Nov 13 21:56:25 PST 2017; root:xnu-3789.72.11~1/RELEASE_X86_64 x86_64

Other relevant info you may wish to add:

  • Installed or activated gems
  • Application/framework version (e.g. Rails, Sinatra)
  • Environment variables
    RUN ulimit -n 2000 before following the directions to duplicate the error.

Expected Behavior

  • Describe your expectation of how JRuby should behave, perhaps by showing how CRuby/MRI behaves.

Script should exit after about 60 seconds back to a command prompt.

Actual Behavior

The program hangs indefinitely. After diving into the guts of the issue, a Timers class that I wrote is the culprit. It uses SortedSet to keep track of timers. At some point during execution, several timers that are supposedly added to the set are not actually recorded. I discovered this when asserting in my code that @timers.size == @timers.to_a.size and having it fail. I tried to add a similar assert to the Java code but while it compiles it doesn't actually seem to work. Since not all timers fire, another check within my program prevents it from exiting, however the bug is caused by a flaw in SortedSet.

Here's how to reproduce the issue.

  1. % git clone git@github.com:chuckremes/ruby-io.git
  2. `cd ruby-io/examples
  3. git checkout b2be9780d2272c6a2e178f48a74333193ea536b2
  4. ulimit -n 2000 # give yourself a few more file descriptors than the 256 default
  5. ruby echo_tcp.rb
  6. Program will exit when the @Timers object gets out of synch. I recommend adding an assert to the Java code to figure out why the size of the hash and the size of the TreeSet are diverging.
@kares

This comment has been minimized.

Member

kares commented Feb 8, 2018

Thanks Chuck, wouth checking I'm hoping there isn't much concurrency ...
turning out difficult running, in the mean time did you verify the example works fine under JRuby 9.1?

../jruby/bin/jruby -rfcntl -Ilib examples/echo_tcp.rb 
NameError: uninitialized constant Fcntl::O_NOCTTY
       const_missing at org/jruby/RubyModule.java:3392
  <module:Constants> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common_constants.rb:56
  <module:Platforms> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common_constants.rb:3
          <class:IO> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common_constants.rb:2
              <main> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common_constants.rb:1

followed by :

FFI::NotFoundError: Function 'disconnectx' not found in [libc.so.6]
            attach_function at /home/kares/workspace/oss/jruby/lib/ruby/stdlib/ffi/library.rb:241
         <module:Platforms> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common/ffi.rb:32
                 <class:IO> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common/ffi.rb:2
                     <main> at /home/kares/workspace/oss/ruby-io/lib/io/platforms/common/ffi.rb:1
@headius

This comment has been minimized.

Member

headius commented Feb 8, 2018

Perhaps it's time we finally dumped the CRuby impl of set.rb and made a proper one that's concurrency-friendly and not based on Hash.

@kares

This comment has been minimized.

Member

kares commented Feb 8, 2018

@headius this is a bug with the set.rb rewrite (to .java) on master - hopefully not concurrency related

@headius

This comment has been minimized.

Member

headius commented Feb 8, 2018

@chuckremes

This comment has been minimized.

chuckremes commented Feb 8, 2018

@kares Unfortunately I can't run it on any 9.1.x release because of issue 4920. That Fiber fix hasn't been part of any release yet.

As for the O_NOCTTY, I'm not seeing that on my OSX or Linux box. The disconnectx error must be from you running this on linux. I'll check this morning (I only test on my linux image about once per week) and push a fix. Sorry for distracting from the real problem!

If you have access to an OSX machine, the current code should run fine.

@chuckremes

This comment has been minimized.

chuckremes commented Feb 8, 2018

And I'll add a comment here like I did in IRC. There is no concurrent access to this SortedSet. It is only ever accessed by the thread running the IOLoop so there should not be any races.

From my testing yesterday, what I see happening is that the superclass's hash is adding the value. That's why size returns the values that it does, because SortedSet relies on the hash.size(). But when running to_a, it generates the array from the TreeSet order variable. This tells me that the value is somehow being dropped from the TreeSet. Perhaps it thinks it is adding a duplicate and just overwrites another value? Anyway, the problem is the disparity between the hash and the treeset.

@kares

This comment has been minimized.

Member

kares commented Feb 8, 2018

Thanks Chuck, I assumed there isn't much concurrency but wanted to double check.
will definitely fix this one, we just need to be able to reproduce - ideally on Linux.
I do have a MacBook box around but its not setup - thus I rather wait if you're able to get this going...

@chuckremes

This comment has been minimized.

chuckremes commented Feb 8, 2018

@kares Pull latest master and try again. Until I figure out my epoll issues on linux I have forced all platforms to use select. On my Ubuntu 17.10 box the example code now runs to completion under MRI. I do not have jruby installed and working... I keep getting some complaint about trustedanchors that I haven't been able to fix yet.

Let me know if you need anything else.

@kares

This comment has been minimized.

Member

kares commented Feb 8, 2018

all reproducing now - the issue relates to trying to implement a proper SortedSet by JRuby ...
a one that does sort its elements upon insertion (unlike MRI's set.rb), doing so Java's TreeSet has a glitch: compare should align with equals. which is not the case for IO's Timer :

        def <=>(other)
          @fire_time <=> other.fire_time
        end

        def ==(other)
          # need a more specific equivalence test since multiple timers could be
          # scheduled to go off at exactly the same time
          #      @fire_time == other.fire_time &&
          #      @callback == other.callback &&
          #      @periodical == other.periodical?
          object_id == other.object_id
        end

so for the underlying Hash instances are never equal but for the TreeSet (order) they might be (<=>)
... which I can confirm if I adjust (no more failures) :

        def <=>(other)
          cmp = @fire_time <=> other.fire_time
          cmp.zero? ? object_id <=> other.object_id : cmp
        end

this is a bug (regression) of not having proper MRI compat on master -> will either need to blindly follow set.rb's SortedSet inefficiencies and not play smart or adjust the TreeSet impl to do additional equal check

@chuckremes

This comment has been minimized.

chuckremes commented Feb 8, 2018

I will make your modification to my code for now. Do you plan to make the native Java SortedSet conform to the Ruby semantics?

@kares

This comment has been minimized.

Member

kares commented Feb 9, 2018

Do you plan to make the native Java SortedSet conform to the Ruby semantics?

yep MRI compat first - will use your original <=> + == implementation as a guideline/test-case ... thus :

will either need to blindly follow set.rb's SortedSet inefficiencies and not play smart or adjust the TreeSet impl to do additional equal check

@headius

This comment has been minimized.

Member

headius commented Feb 9, 2018

@kares I guess the two options here are to implement our own Set that matches MRI semantics (i.e. not wrap slightly-off Java sets) or add a linked list for insertion ordering. The latter would still be more efficient than a fully-wrapped Hash but the former would be the most efficient. We might also take this opportunity to implement direct addressing as added in Ruby 2.4, which we could them reapply to our Hash.

@kares kares added this to the JRuby 9.2.0.0 milestone Feb 12, 2018

@kares kares closed this in af82b10 Feb 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment