Fiber finalizer fix #1075

Closed
wants to merge 1 commit into
from

Projects

None yet

3 participants

@bbrowning
Contributor

ThreadFiber's finalizer was triggering the creation of a new ThreadFiber instance. So, for every ThreadFiber instance that got garbage collected another one was created.

This behavior can lead to the finalizer queue filling up as quickly as it's emptied under heavy memory pressure and the JVM grinding to a halt.

This fixes the issue with TorqueBox integration tests under JRuby 1.7.5 hanging when they didn't under 1.7.4.

@bbrowning bbrowning ThreadFiber's finalizer was triggering the creation of a new
ThreadFiber instance. So, for every ThreadFiber instance that got
garbage collected another one was created.

This behavior can lead to the finalizer queue filling up as quickly as
it's emptied under heavy memory pressure and the JVM grinding to a
halt.
d0ecf53
@bbrowning
Contributor

There may be a better way to fix this than what I'm doing here. This is just one solution I came up with late at night.

@headius
Member
headius commented Oct 3, 2013

Yeah, I'm doing the uberfix that resolves this issue and also cleans up the fiber threads properly. With tests, even :-D

@headius headius added a commit that referenced this pull request Oct 3, 2013
@headius headius Ensure abandoned Fiber instances finalize properly. Fixes #1075
There are several fixes here to ensure that fibers finalize in
a consistent and clean way.

* Direct delivery of the kill event to the target thread.

Without this change, the delivery process caused another
ThreadFiber object to come to life, keeping objects alive and
endlessly finalizing. Delivering the event via ThreadService
caused the finalizer thread to participate in Ruby thread events,
which we do not want. The new version delivers the event directly
via a special "dieFromFinalizer" method added to RubyThread.

* Make a better effort to kill the thread.

The original code just shut down the fiber's queue and delivered
a kill event. The new logic does both of those as well as
interrupting both the RubyThread and the java.lang.Thread in case
the fiber is waiting on some other blocking call. We could go the
next step and try to forcibly Thread.stop, but that has not been
necessary yet.

* Ensure that the fiber thread never has on-stack references to
  the fiber object

In order for the fiber object to be GCable and finalizable, there
must be no live references to it. This includes references on
the fiber thread's execution stack, such as those surrounding the
yield method's call to queue.pop. In the new logic, no hard
references to the ThreadFiber object appear anywhere in code,
and the fiber object is eventually GCed and finalized as expected.

* Test to ensure that fibers are getting cleaned up

This commit also adds a test that attempts to spin up 10000 fibers
and checks that they are cleaned up by comparing pre-test and
post-test native thread counts. This should help ensure we do not
regress on fiber lifecycle in the future.
669f6a8
@headius headius closed this Oct 3, 2013
@headius
Member
headius commented Oct 3, 2013

Uberfix has landed!

@chrisseaton
Member

This seems to be broken. It hangs for a while and then we get an exception in what looks like the fiber system shutdown. I've commented it out in 383df3c and it was causing the build to error (timeout).

@chrisseaton chrisseaton reopened this Feb 1, 2014
@headius headius added a commit that referenced this pull request Feb 22, 2014
@headius headius Fix regression in #1075.
Fiber started to get GCed and shut down, but then woke up and
tried to yield. If a fiber yielding control has its own queue
shut down, allow exception to propagate normally.
969a3c4
@headius
Member
headius commented Feb 26, 2014

There were issues with the GH-1075 spec but I'm not sure they were related. In any case, we've had everything green since this PR was reopened, so I'm re-closing.

@headius headius closed this Feb 26, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment