Error: This Fiber is a zombie #131

Open
glasser opened this Issue Jul 16, 2013 · 7 comments

glasser commented Jul 16, 2013

If I'm able to catch an exception with the text Error: this Fiber is a zombie, then that's a fibers bug, right? Not user error?

Working on finding a minimal reproduction, but while we're working on that, curious if we're correct that this is definitely a Fibers bug... (In 1.0.0; going to try 1.0.1 next.)

glasser commented Jul 16, 2013

OK, I figured out what we're doing.

We're calling Fiber.yield in a fiber that we don't keep any reference to, so it eventually gets GCed. This causes stack unwinding, which throws from the Fiber.yield. But our Fiber.yield is wrapped (very indirectly) in a try/finally, and the finally clause itself has an f.wait in it.

The effect for us (on OSX at least) is that the main fiber actually gets re-run from the top!

Here's a reproduction:

var Fiber = require('fibers');

// Run this every so often so that GC and DestroyOrphans happen.
setInterval(function () {
  Fiber(function () { global.gc() }).run();
}, 1000);

Fiber(function () {
  console.log("TOP");
  try {
    Fiber.yield();
  } finally {
    console.log("finally");
    var f = Fiber.current;
    setTimeout(function () {
      console.log("re-running");
      f.run();
      console.log("re-ran");
    }, 10);
    console.log("yielding");
    Fiber.yield();
    console.log("yeld");
  }
}).run();
console.log("finishing main program");

For me, with Node 0.8.24 and Fibers 1.0.0, this prints:

$ node --expose-gc zombie.js 
TOP
finishing main program
finally
yielding
re-running
TOP
re-ran
finally
yielding
re-running
TOP
re-ran
finally
yielding
re-running
TOP
re-ran
finally
yielding
re-running
TOP
re-ran

and so forth.

glasser commented Jul 16, 2013

So I guess the short answer is, I shouldn't be letting a fiber that I care about get GCed. And in fact, the strategy of using try { Fiber.yield } finally { cleanup which involves waiting } is problematic.

But it certainly was surprising that the fiber got re-run from the top!
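
One way to act on the first point is to hold a strong reference to a fiber for as long as it is parked in Fiber.yield, so the collector never treats it as an orphan. The sketch below is an illustration only: `makeParkedYield` and the `parked` set are hypothetical names, not part of fibers, and the `Fiber` argument is assumed to expose `current` and `yield()` the way the reproduction above uses them.

```javascript
// Hypothetical helper (not part of the fibers package).
// `Fiber` is passed in, e.g. require('fibers'); only Fiber.current and
// Fiber.yield() are used, as in the reproduction in this thread.
function makeParkedYield(Fiber, parked = new Set()) {
  return function parkedYield() {
    const self = Fiber.current;
    // Strong reference: while the fiber is in the set it stays reachable,
    // so it should not be collected and unwound while suspended.
    parked.add(self);
    try {
      return Fiber.yield();
    } finally {
      // Synchronous cleanup only; nothing in here yields or waits.
      parked.delete(self);
    }
  };
}
```

The trade-off is that a fiber nobody ever resumes stays in the set indefinitely, so this swaps a surprising unwind for a leak that is at least easy to inspect.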

glasser commented Jul 16, 2013

Also occurs with Node 0.10.13, Fibers 1.0.1.

glasser commented Jul 16, 2013

At the very least, I think doing this should cause your code to crash, not re-run the fiber from the top. We would have figured out what was going on much earlier if that had happened.

glasser commented Jul 16, 2013

Ah, OK. The second call to Fiber.yield immediately re-throws the "zombie" exception, so by the time the callback runs, the fiber has completely unwound. And you can run fibers multiple times... and that's what the f.run() does.

Now, for me, the f.run() was actually hidden inside a future.return(). I think it is very surprising that future.return() (or throw) can call fiber.run on a fiber where fiber.started is false (i.e., a fiber that may have been running when the wait was called but is not any more). Can this happen for any reason other than that the Future.wait got terminated by a zombie exception? If that's the only way that this can happen, can cb in Future.wait check fiber.started and exit with a message otherwise?
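
A minimal sketch of the kind of guard being asked about, assuming `fiber.started` behaves as described in this comment (false once the fiber has finished or been unwound). `resumeIfStarted` is a hypothetical name, and this is not how Future.wait is actually written.

```javascript
// Hypothetical guard (not part of fibers or its Future helper).
// Resumes `fiber` only if it is still started; otherwise fails loudly
// instead of letting run() start the fiber function again from the top.
function resumeIfStarted(fiber, value) {
  if (!fiber.started) {
    throw new Error('refusing to run a fiber that is no longer started');
  }
  // For a started, suspended fiber, run(value) resumes it and `value`
  // becomes the result of the pending Fiber.yield() call.
  return fiber.run(value);
}
```

Throwing follows the suggestion earlier in the thread that this situation should crash rather than silently re-run; logging a message and returning would be the gentler variant.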

Owner

laverdet commented Jul 17, 2013

> So I guess the short answer is, I shouldn't be letting a fiber that I care about get GCed. And in fact, the strategy of using try { Fiber.yield } finally { cleanup which involves waiting } is problematic.

Actually, if you grab a reference to the fiber in the finally, it should interrupt the unrolling process and you'll be just fine.

> But it certainly was surprising that the fiber got re-run from the top!

Yeah, the overloading of run() to both start and resume a fiber was a questionable design decision in retrospect. I think the main issue that needs to be addressed here is the fact that future.return() can re-run a thrown fiber future (if I'm understanding correctly). I will have to look into this in more detail for sure.

glasser commented Jul 17, 2013

Right, I agree that grabbing a reference to the Fiber in finally kept it from being GCed.

I don't think I was actually using FiberFuture, if that's what you meant. But yeah, I think the main issue was that a future.return() re-ran a fiber which had been waiting on that future but which got unwound. (Or well, a fiber which tried to wait on that future but when it called yield, it hit the "I'm trying to unwind, immediately rethrow zombie" check.)

AlexeyMK pushed a commit to AlexeyMK/meteor that referenced this issue Nov 13, 2013
