Error: This Fiber is a zombie #131

Open
glasser opened this Issue · 7 comments

2 participants

@glasser

If I'm able to catch an exception with the text Error: this Fiber is a zombie, then that's a fibers bug, right? Not user error?

Working on finding a minimal reproduction, but while we're working on that, curious if we're correct that this is definitely a Fibers bug... (In 1.0.0; going to try 1.0.1 next.)

@glasser

OK, I figured out what we're doing.

We're calling Fiber.yield in a fiber that we don't keep any reference to. So it eventually gets GCd. This causes stack unwinding, which throws from the Fiber.yield. But our Fiber.yield is wrapped (very indirectly) in a try/finally. And the finally clause itself has a f.wait in it.

The effect for us (on OSX at least) is that the main fiber actually gets re-run from the top!

Here's a reproduction:

var Fiber = require('fibers');

// Run this every so often so that GC and DestroyOrphans happens.
setInterval(function () {
  Fiber(function () { global.gc() }).run();
}, 1000);

Fiber(function () {
  console.log("TOP");
  try {
    Fiber.yield();
  } finally {
    console.log("finally");
    var f = Fiber.current;
    setTimeout(function () {
      console.log("re-running");
      f.run();
      console.log("re-ran");
    }, 10);
    console.log("yielding");
    Fiber.yield();
    console.log("yeld");
  }
}).run();
console.log("finishing main program");

For me, with Node 0.8.24 and Fibers 1.0.0, this prints:

$ node --expose-gc zombie.js 
TOP
finishing main program
finally
yielding
re-running
TOP
re-ran
finally
yielding
re-running
TOP
re-ran
finally
yielding
re-running
TOP
re-ran
finally
yielding
re-running
TOP
re-ran

and so forth.

@glasser

So I guess the short answer is, I shouldn't be letting a fiber that I care about get GCed. And in fact, the strategy of using try { Fiber.yield } finally { cleanup which involves waiting } is problematic.

But it certainly was surprising that the fiber got re-run from the top!

@glasser

Also occurs with Node 0.10.13, Fibers 1.0.1.

@glasser

At the very least, I think doing this should cause your code to crash, not re-run the fiber from the top. We would have figured out what was going on much earlier if that had happened.

@glasser

Ah, OK. The second call to Fiber.yield immediately re-throws the "zombie" exception, so by the time the callback runs, the fiber has completely unwound. And you can run fibers multiple times... and that's what the f.run() does.

Now, for me, the f.run() was actually hidden inside a future.return(). I think it is very surprising that future.return() (or throw) can call fiber.run on a fiber where fiber.started is false (i.e., a fiber that may have been running when the wait was called but no longer is). Can this happen for any reason other than the Future.wait getting terminated by a zombie exception? If that's the only way this can happen, can cb in Future.wait check fiber.started and exit with a message otherwise?
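
To make the suggested check concrete, here is a minimal sketch of a guard that refuses to resume a fiber whose started flag is false. It is not the library's code: resumeIfStarted is a hypothetical helper, and the two fiber objects are plain stand-ins so the snippet runs without the fibers package. Only the started property and run() come from the discussion above.

```javascript
// Hypothetical helper, not part of the fibers package.
// `fiber` is any object exposing `started` and `run()`, as described in this thread.
function resumeIfStarted(fiber) {
  if (!fiber.started) {
    // The fiber has already unwound; calling run() would start it again from the top.
    console.error("Refusing to resume a fiber that is no longer started");
    return false;
  }
  fiber.run();
  return true;
}

// Stand-in objects (assumptions for illustration) that count how often run() is called.
var runCount = 0;
var liveFiber = { started: true, run: function () { runCount++; } };
var unwoundFiber = { started: false, run: function () { runCount++; } };

var resumedLive = resumeIfStarted(liveFiber);       // resumes the waiting fiber
var resumedUnwound = resumeIfStarted(unwoundFiber); // skipped, logs a message instead
```

With a guard like this in the wait callback, the reproduction above would presumably log a message at the f.run() step instead of printing TOP again.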

@laverdet
Owner

> So I guess the short answer is, I shouldn't be letting a fiber that I care about get GCed. And in fact, the strategy of using try { Fiber.yield } finally { cleanup which involves waiting } is problematic.

Actually if you grab a reference to the fiber in the finally it should interrupt the unrolling process and you'll be just fine.

> But it certainly was surprising that the fiber got re-run from the top!

Yeah, the overloading of run() to both start and resume a fiber was a questionable design decision in retrospect. I think the main issue that needs to be addressed here is the fact that future.return() can re-run a thrown fiber future (if I'm understanding correctly). I will have to look into this in more detail for sure.

@glasser

Right, I agree that grabbing a reference to the Fiber in finally kept it from being GCed.

I don't think I was actually using FiberFuture, if that's what you meant. But yeah, I think the main issue was that a future.return() re-ran a fiber which had been waiting on that future but which got unwound. (Or well, a fiber which tried to wait on that future but when it called yield, it hit the "I'm trying to unwind, immediately rethrow zombie" check.)

@AlexeyMK referenced this issue from a commit in AlexeyMK/meteor
@Slava: Fix ssh-tunnel reconnect problem by keeping reference to Fiber we want to yield to.

For more info look here: laverdet/node-fibers#131
c185b2b
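
The commit message above describes keeping a reference to the fiber that will be yielded to. The commit's actual code is not shown here, so the following is only a rough sketch of that idea: a small registry that keeps a yielded fiber strongly reachable until it is resumed, so the garbage collector cannot reclaim it and trigger the zombie unwinding. The names parkedFibers and holdFiber are hypothetical, and fakeFiber stands in for Fiber.current so the snippet runs without the fibers package.

```javascript
// Hypothetical registry; these names are illustrative, not from fibers or Meteor.
var parkedFibers = [];

// Keep `fiber` reachable until the returned release function is called.
function holdFiber(fiber) {
  parkedFibers.push(fiber);
  var released = false;
  return function release() {
    if (released) return; // calling release twice is harmless
    released = true;
    parkedFibers.splice(parkedFibers.indexOf(fiber), 1);
  };
}

// Stand-in for Fiber.current (an assumption for illustration only).
var fakeFiber = { name: "ssh-tunnel" };

var release = holdFiber(fakeFiber);
var heldWhileParked = parkedFibers.indexOf(fakeFiber) !== -1;
release();
release(); // second call is a no-op
var heldAfterRelease = parkedFibers.indexOf(fakeFiber) !== -1;
```

In real code the pattern would be to call holdFiber(Fiber.current) before Fiber.yield() and call the release function once the fiber has been resumed.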