Skip to content

Conversation

nevans
Copy link
Collaborator

@nevans nevans commented Oct 3, 2025

Combined with #528, this replaces #470, and fixes #454.

@nevans nevans force-pushed the add-jruby-to-ci branch 4 times, most recently from 2445096 to bd25f1a Compare October 3, 2025 22:47
@nevans nevans force-pushed the add-jruby-to-ci branch 2 times, most recently from f391498 to b144058 Compare October 3, 2025 22:55
@nevans nevans changed the title ✅ Add JRuby to CI ✅🚧 Add JRuby to CI Oct 3, 2025
@headius
Copy link

headius commented Oct 8, 2025

Hey checking back in after a long time. I'd like to help get this green... have you looked into the failures at all?

@nevans
Copy link
Collaborator Author

nevans commented Oct 9, 2025

@headius Thanks!

I haven't looked into it recently, beyond rebasing the branch and verifying it still had the same issues... the tests that (reliably) fail in CI are all passing locally for me. That certainly slows down my debugging ability.

I don't remember for sure, but I think that some of the other tests that are already omitted or pending in this branch may also pass locally (or maybe they're flaky?). I'd prefer to fix all of them and omit none: omitting every test that actually makes a connection doesn't give me much confidence. But, for the ratchet effect, getting JRuby into CI with a bunch of critical tests omitted is probably better than not testing it at all.

The following is going off of memory the last time I looked into this... and I didn't have time to fully test my hypothesis, so I may be completely wrong, buuuut (caveat's aside)... I think the biggest source of test failures are exceptions from IO#close. My interpretation of the CRuby IO#close code is that I don't think it can raise that exception.

When IO#close is called, rb_io_close_m checks (with the GVL) whether or not it's already closed. And if not, io_close_fptr goes ahead and closes it, without dropping the GVL (as far as I can tell) or adding itself to the list of threads that are blocking on it. At this point, (if it were to release the GVL) if any other thread attempts to close the IO, it would just return immediately without exception. After that, it notifies any other threads that are blocking on that IO to raise that exception.

But I might be misunderstanding what the code does and biased by what I want it to do. 😉 I found some issues in the b.r-l.o tracker that led me to think maybe it didn't or doesn't work the way I assume. So, I looked to see if there was a test or a spec, and there was (and I think it was written by you, so thanks!). But (as far as I can tell) the spec for this doesn't distinguish between an error that's raised by IO#close or an error that's caused by IO#close and raised by another thread that's blocked on IO#close.

  it 'raises an IOError with a clear message' do
    matching_exception = nil

    -> do
      IOSpecs::THREAD_CLOSE_RETRIES.times do
        read_io, write_io = IO.pipe
        going_to_read = false

        thread = Thread.new do
          begin
            going_to_read = true
            read_io.read
          rescue IOError => ioe
            if ioe.message == IOSpecs::THREAD_CLOSE_ERROR_MESSAGE
              matching_exception = ioe
            end
            # try again
          end
        end

        # best attempt to ensure the thread is actually blocked on read
        Thread.pass until going_to_read && thread.stop?
        sleep(0.001)

        read_io.close    # <--------------------------- does this raise the exception?
        thread.join
        write_io.close   # <--------------------------- does this raise the exception?

        matching_exception&.tap {|ex| raise ex} # <---- or does this raise the exception?
      end
    end.should raise_error(IOError, IOSpecs::THREAD_CLOSE_ERROR_MESSAGE)
  end

My hypothesis is that CRuby's IO#close can't raise this exception, but JRuby's can. I'd like to update the spec to make that distinction. But... I had to switch to other projects before I could test this hypothesis.

With all of that said, even if JRuby's IO#close does behaves differently, it's highly likely that there's still threading bugs in the tests or in the client that need to be fixed. We can still improve the code to not trigger that condition, not just as a workaround but as an improvement that may fix other issues too.

@nevans
Copy link
Collaborator Author

nevans commented Oct 9, 2025

Okay, so looking into it just now, I think my memory or interpretation (or both!) was off by a bit: any blocking threads are notified before the file descriptor is completely cleared. That does introduce the possibility that multiple threads could call IO#close while waiting for the blocking threads to be notified. And, the fiber scheduler also gets to play a role in this, which opens up even more possibilities.

Nevertheless, it still seems to me that, the closer doesn't add itself blocking list for that IO, so it should never be interrupted with that exception. And, after all of the (other) blocking threads have been removed from the blocking list, the closer thread gets to do its thing while keeping the GVL.

Regardless, I'll repeat myself on this:

even if JRuby's IO#close does behaves differently, it's highly likely that there's still threading bugs in the tests or in the client that need to be fixed. We can still improve the code to not trigger that condition, not just as a workaround but as an improvement that may fix other issues too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add JRuby and TruffleRuby to CI

2 participants