New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fiber#alive? returns true even when already finished #4838

Closed
satosho opened this Issue Nov 3, 2017 · 8 comments

Comments

Projects
None yet
2 participants
@satosho

satosho commented Nov 3, 2017

It looks to be related to #1789 and #1949.

Environment

jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 Java HotSpot(TM) 64-Bit Server VM 25.151-b12 on 1.8.0_151-b12 +jit [mswin32-x86_64]
Windows 10 Home

Expected Behavior

Fiber#alive? returns false for all the calls in the following code.

require "fiber"

def check(&block)
  1000.times.map do
    fiber = Fiber.new(&block)
    fiber.resume
    fiber.alive?
  end
end

results = [
  check {},
  check { "Text" },
  check { Fiber.new {} },
  check { sleep(0.001) },
]

puts ["", "FALSE", "TRUE"].join("\t")
results.each_with_index do |r, i|
  puts ["test#{i}", r.count(false), r.count(true)].join("\t")
end

On CRuby:

$ ruby check_fiber_alive.rb
        FALSE   TRUE
test0   1000    0
test1   1000    0
test2   1000    0
test3   1000    0

Actual Behavior

On JRuby, the results can be true.

$ jruby check_fiber_alive.rb
        FALSE   TRUE
test0   998     2
test1   999     1
test2   288     712
test3   987     13

The code sometimes runs into NullPointerException.

$ jruby check_fiber_alive.rb
Unhandled Java exception: java.lang.NullPointerException
java.lang.NullPointerException: null
                 alive at org/jruby/ext/fiber/ThreadFiber.java:243
             __alive__ at org/jruby/ext/fiber/ThreadFiber.java:229
                  call at org/jruby/ext/fiber/ThreadFiber$INVOKER$i$0$0$__alive__.gen:-1
                  call at org/jruby/internal/runtime/methods/AliasMethod.java:56
                  call at org/jruby/runtime/callsite/CachingCallSite.java:129
  invokeOther16:alive? at check_fiber_alive.rb:7
        block in check at check_fiber_alive.rb:7
           yieldDirect at org/jruby/runtime/CompiledIRBlockBody.java:156
                 yield at org/jruby/runtime/BlockBody.java:114
                 yield at org/jruby/runtime/Block.java:165
                  call at org/jruby/RubyEnumerable.java:846
         yieldSpecific at org/jruby/runtime/CallBlock19.java:77
         yieldSpecific at org/jruby/runtime/Block.java:134
                 times at org/jruby/RubyFixnum.java:299
                  call at org/jruby/RubyFixnum$INVOKER$i$0$0$times.gen:-1
                  call at org/jruby/internal/runtime/methods/JavaMethod.java:498
               finvoke at org/jruby/RubyClass.java:522
                invoke at org/jruby/runtime/Helpers.java:395
            callMethod at org/jruby/RubyBasicObject.java:393
                  each at org/jruby/RubyEnumerator.java:323
                  call at org/jruby/RubyEnumerator$INVOKER$i$each.gen:-1
               finvoke at org/jruby/RubyClass.java:512
                invoke at org/jruby/runtime/Helpers.java:383
            callEach19 at org/jruby/RubyEnumerable.java:116
         collectCommon at org/jruby/RubyEnumerable.java:838
                   map at org/jruby/RubyEnumerable.java:830
                  call at org/jruby/RubyEnumerable$INVOKER$s$0$0$map.gen:-1
             callBlock at org/jruby/runtime/callsite/CachingCallSite.java:139
                  call at org/jruby/runtime/callsite/CachingCallSite.java:145
     invokeOther20:map at check_fiber_alive.rb:4
                 check at check_fiber_alive.rb:4
                  call at org/jruby/internal/runtime/methods/CompiledIRMethod.java:90
          cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:328
             callBlock at org/jruby/runtime/callsite/CachingCallSite.java:141
                  call at org/jruby/runtime/callsite/CachingCallSite.java:145
   invokeOther28:check at check_fiber_alive.rb:15
                <main> at check_fiber_alive.rb:15
   invokeWithArguments at java/lang/invoke/MethodHandle:-1
                  load at org/jruby/ir/Compiler.java:95
             runScript at org/jruby/Ruby.java:828
           runNormally at org/jruby/Ruby.java:747
           runNormally at org/jruby/Ruby.java:765
           runFromMain at org/jruby/Ruby.java:578
         doRunFromMain at org/jruby/Main.java:417
           internalRun at org/jruby/Main.java:305
                   run at org/jruby/Main.java:232
                  main at org/jruby/Main.java:204

Workaround:
Fiber#alive? looks to return false as expected by adding a short sleep between resume and alive? calls.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 9, 2017

Member

The NPE is a real bug that we should fix.

The behavior of alive? on JRuby may be subject to thread scheduling. Each Fiber in JRuby is backed by a real native thread, so portions of their bookkeeping may run in parallel with the parent thread. In this case, I would guess that the final result has been returned from the Fiber, but it has not completely died by the time you call alive? once in a while.

I'll take a look at both issues. In the future, it would be best to file them separately.

Member

headius commented Nov 9, 2017

The NPE is a real bug that we should fix.

The behavior of alive? on JRuby may be subject to thread scheduling. Each Fiber in JRuby is backed by a real native thread, so portions of their bookkeeping may run in parallel with the parent thread. In this case, I would guess that the final result has been returned from the Fiber, but it has not completely died by the time you call alive? once in a while.

I'll take a look at both issues. In the future, it would be best to file them separately.

headius added a commit that referenced this issue Nov 9, 2017

Improvements to Fiber#alive?. Fixes #4838.
* Store thread between null check and alive check to avoid NPE.
* Use || for the conditions so any of them will indicate deadness.
  This allows the Fiber's queue shutdown, which happens before
  the last value is returned, to indicate that the thread is no
  longer alive (or soon will be truly dead).

@headius headius closed this in 3e01848 Nov 9, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 9, 2017

Member

This is fixed for JRuby 9.1.15.0.

A better, albeit cumbersome workaround for you...this basically implements what I committed, but in Ruby:

class org::jruby::ext::fiber::ThreadFiber
  field_reader :thread
  field_reader :data

  class FiberData
    field_reader :queue
  end
end


class Fiber
  def alive?
    thread = JRuby.reference(self).thread

    if thread == nil || !JRuby.reference(thread).isAlive || JRuby.reference(self).data.queue.isShutdown
      return false
    end

    true
  end
end
Member

headius commented Nov 9, 2017

This is fixed for JRuby 9.1.15.0.

A better, albeit cumbersome workaround for you...this basically implements what I committed, but in Ruby:

class org::jruby::ext::fiber::ThreadFiber
  field_reader :thread
  field_reader :data

  class FiberData
    field_reader :queue
  end
end


class Fiber
  def alive?
    thread = JRuby.reference(self).thread

    if thread == nil || !JRuby.reference(thread).isAlive || JRuby.reference(self).data.queue.isShutdown
      return false
    end

    true
  end
end

@headius headius added this to the JRuby 9.1.15.0 milestone Nov 9, 2017

@satosho

This comment has been minimized.

Show comment
Hide comment
@satosho

satosho Nov 11, 2017

Thank you, I have tried both the workaround and nightly build.
I'm afraid alive? still returns true occasionally in my environment.

$ java -jar jruby-complete-9.1.15.0-SNAPSHOT.jar -v
jruby 9.1.15.0-SNAPSHOT (2.3.3) 2017-11-11 65a004f Java HotSpot(TM) 64-Bit Server VM 25.151-b12 on 1.8.0_151-b12 +jit [mswin32-x86_64]

$ java -jar jruby-complete-9.1.15.0-SNAPSHOT.jar check_fiber_alive.rb
        FALSE   TRUE
test0   999     1
test1   1000    0
test2   162     838
test3   960     40

satosho commented Nov 11, 2017

Thank you, I have tried both the workaround and nightly build.
I'm afraid alive? still returns true occasionally in my environment.

$ java -jar jruby-complete-9.1.15.0-SNAPSHOT.jar -v
jruby 9.1.15.0-SNAPSHOT (2.3.3) 2017-11-11 65a004f Java HotSpot(TM) 64-Bit Server VM 25.151-b12 on 1.8.0_151-b12 +jit [mswin32-x86_64]

$ java -jar jruby-complete-9.1.15.0-SNAPSHOT.jar check_fiber_alive.rb
        FALSE   TRUE
test0   999     1
test1   1000    0
test2   162     838
test3   960     40

@headius headius reopened this Nov 21, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 21, 2017

Member

Hmm strange. My local env does have a couple spurious alive?s but only in the Fiber.new { } case.

I'll try to figure out how these remaining cases would be happening.

Member

headius commented Nov 21, 2017

Hmm strange. My local env does have a couple spurious alive?s but only in the Fiber.new { } case.

I'll try to figure out how these remaining cases would be happening.

headius added a commit that referenced this issue Nov 21, 2017

Begin the shutdown process before releasing the last waiter.
Once we push the final result on the resume queue the waiting
thread may immediately resume, and see a Fiber that appears to
be alive but which will actually soon be dead. By moving one of
the death conditions before the final push, we will at least
ensure the Fiber appears dead.

There is an opposite side effect here, in that now a Fiber may
appear to be dead a brief moment before it returns its last
value, but this can only be observed from a third thread, which
could not atomically know that both the final value had returned
and the Fiber had declared itself dead.

Fixes #4838.
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 21, 2017

Member

I have committed an additional fix that makes your script run nearly (I'll come back to this) perfect. It should end up in tonight's nightly builds. Unfortunately, the fix is very deep inside our Fiber logic, which means there's no workaround I know of.

I have on occasion, with or without this patch, seen an a FiberError claiming resume was called on a dead thread. I am still investigating that.

Member

headius commented Nov 21, 2017

I have committed an additional fix that makes your script run nearly (I'll come back to this) perfect. It should end up in tonight's nightly builds. Unfortunately, the fix is very deep inside our Fiber logic, which means there's no workaround I know of.

I have on occasion, with or without this patch, seen an a FiberError claiming resume was called on a dead thread. I am still investigating that.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 21, 2017

Member

It appears to be the check { Fiber.new {} } case that's triggering failure. I do not have an explanation yet.

Member

headius commented Nov 21, 2017

It appears to be the check { Fiber.new {} } case that's triggering failure. I do not have an explanation yet.

@satosho

This comment has been minimized.

Show comment
Hide comment
@satosho

satosho Nov 22, 2017

Regarding the return value, I have confirmed that alive? behaves as expected also in my local environment. Thank you again.
(jruby 9.1.15.0-SNAPSHOT (2.3.3) 2017-11-22 ff9a618)

satosho commented Nov 22, 2017

Regarding the return value, I have confirmed that alive? behaves as expected also in my local environment. Thank you again.
(jruby 9.1.15.0-SNAPSHOT (2.3.3) 2017-11-22 ff9a618)

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 22, 2017

Member

@satosho Thank you. I have not found a conclusive reason for the other failure, so I may file that as a separate issue and call this one fixed.

Member

headius commented Nov 22, 2017

@satosho Thank you. I have not found a conclusive reason for the other failure, so I may file that as a separate issue and call this one fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment