Fiber.poolSize bug (or expected behaviour?) #169

yortus opened this Issue Apr 10, 2014 · 6 comments


yortus commented Apr 10, 2014

In running various benchmarks for asyncawait, I found that in certain scenarios there were severe memory leaks. After a great deal of forensic work, I pinpointed the problem. The memory leaks occurred in a scenario with both high concurrency and recursion, where the number of currently active fibers exceeded the value of Fiber.poolSize.

Upon examining the node-fibers source code I found this in coroutine.cc (line 197 onward):

        if (fiber_pool.size() < pool_size) {
            fiber_pool.push_back(this);
        } else {
            // TODO?: This assumes that we didn't capture any keys with dtors. This may not always be
            // true, and if it is not there will be memory leaks.

            // Can't delete right now because we're currently on this stack!
            assert(delete_me == NULL);
            delete_me = this;
        }

My C++ is pretty rusty so I haven't tried to fully comprehend what's going on here, but it's pretty clear that my code is hitting the TODO? branch of the if statement and getting the memory leak.
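To make the branch clearer, here is a toy JavaScript model of the pooling logic quoted above (an illustration only, not the real node-fibers implementation): finished coroutine stacks are returned to a pool up to `poolSize`, and any overflow is instead marked for deferred deletion, since a stack cannot be freed while code is still running on it.

```javascript
// Toy model of the C++ branch above. 'recycle' mirrors the quoted code:
// pool the stack if there is room, otherwise park it in deleteMe.
function makePool(poolSize) {
    var pool = [];
    var deleteMe = null; // at most one stack may be pending deletion

    function recycle(stack) {
        if (pool.length < poolSize) {
            pool.push(stack);
        } else {
            // Mirrors assert(delete_me == NULL): only one pending delete allowed.
            if (deleteMe !== null) throw new Error('assert(delete_me == NULL) failed');
            deleteMe = stack; // freed later, once we are off this stack
        }
    }

    function reapDeferred() {
        // Called at the next safe point, mirroring the deferred delete.
        deleteMe = null;
    }

    return {
        recycle: recycle,
        reapDeferred: reapDeferred,
        pooled: function () { return pool.length; }
    };
}
```

In this model, once more than `poolSize` fibers finish between safe points, their stacks cannot all be pooled, which is the situation my benchmarks were hitting.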

I have mitigated the problem by automatically adjusting Fiber.poolSize at runtime as required.
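For reference, the mitigation amounts to growing `Fiber.poolSize` before each batch of fibers starts, so the pool always covers the number of live fibers. A minimal sketch (the `ensurePoolSize` helper is hypothetical, not asyncawait's actual code):

```javascript
// Hypothetical helper: grow poolSize so it always covers the live fiber
// count. Grow-only, since shrinking mid-run could reintroduce the problem.
function ensurePoolSize(fiberLib, activeFiberCount) {
    if (activeFiberCount > fiberLib.poolSize) {
        fiberLib.poolSize = activeFiberCount;
    }
    return fiberLib.poolSize;
}

// With the real module this would be called before spawning a batch:
//   var Fiber = require('fibers');
//   ensurePoolSize(Fiber, currentlyActive + aboutToStart);
```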

I have three questions:

  1. Is this a bug or working as designed?
  2. If it is a bug, can you explain what's going on here? (The comment seems to hint that it's hard to fix.)
  3. If not a bug, and managing Fiber.poolSize is the right way to go, can this be documented somewhere to save others the many hours to track down such a leak? I'm not even sure if Fiber.poolSize is an official part of the API (is it?).
Owner

laverdet commented Jul 1, 2014

Shouldn't be a memory leak. The TODO refers to pthread dtors, which have been eliminated from fibers entirely. The coroutine is memoized in delete_me and deleted when we can, and great care is taken to ensure that no more than one coroutine is waiting to be deleted (that's what the assert is for). It's still very possible there's a leak somewhere but I don't think that's it.

yortus commented Jul 2, 2014

Hi, thanks for looking at this. I've made a repro that demonstrates the problem on my setup. Here it is:

var Fiber = require('fibers');


//========================================
// node-fibers issue #169 repro (https://github.com/laverdet/node-fibers/issues/169)
// Expected behaviour:
//     prints dots on the console at a constant rate until terminated, regardless of settings below.
// Actual behaviour:
//     gets progressively slower if numberOfFibers > Fiber.poolSize (see comments below).

// Uncomment one of the following lines to test that many concurrent fibers.
// NB: The problem behaviour starts at 120 fibers, which is node-fiber's default pool size.
var numberOfFibers = 100; // Works with consistent GC performance
//var numberOfFibers = 130; // Works, but gets progressively slower (mem leak?)
//var numberOfFibers = 200; // Works, but gets progressively slower (mem leak?)
//var numberOfFibers = 500; // Works, but gets progressively slower (mem leak?)

// Uncomment one of the following lines to adjust node-fiber's pool size. The default is 120.
// NB: The problem cases above work fine if the pool size is greater than the number of fibers.
Fiber.poolSize = 120;
//Fiber.poolSize = 500;
//========================================


// Create the specified number of fibers, let them finish, then repeat...
setInterval(function () {
    process.stdout.write('.');
    startConcurrentTasks(numberOfFibers, 5);
}, 10);


/** Create 'count' concurrent fibers whose functions each finish after 'ms' milliseconds. */
function startConcurrentTasks(count, ms) {
    for (var i = 0; i < count; ++i) {
        startTask(ms);
    }
}


/** Create a fiber whose function finishes after 'ms' milliseconds. */
function startTask (ms) {
    Fiber(function () {
        pause(ms);
    }).run();
}


/** Pause the current fiber for 'ms' milliseconds. */
function pause(ms) {
    var fiber = Fiber.current;
    setTimeout(function () { fiber.run(); }, ms);
    Fiber.yield();
}

I'm running this on Windows 7 x64 with Node 0.10.25.

Owner

laverdet commented Dec 18, 2014

Sorry again for the delay. I'm really bad at github.

Anyway this is a very interesting bug. I've tracked it down to an issue in v8 and have submitted a bug report to their tracker directly: http://code.google.com/p/v8/issues/detail?id=3777

I have a workaround in mind, so I should at least be able to get fibers running fast in this case.

Owner

laverdet commented Dec 18, 2014

Could you try out master and let me know if you notice any differences? 92897f8 is the commit that takes care of it

yortus commented Dec 21, 2014

@laverdet your changes in master seem to have fixed the problem as far as I can tell. I ran the benchmarks in asyncawait that used to show the slowdown over time, and they perform consistently fast now. Nice one.

Owner

laverdet commented Dec 21, 2014

Yay

@laverdet laverdet closed this Dec 21, 2014
