
file stream resume() can cause duplicate data emitted, leading to crash #3258

Closed
davepacheco opened this Issue May 11, 2012 · 2 comments

David Pacheco

Here's my test program:

var mod_fs = require('fs');

var file = mod_fs.createReadStream('tmpfile');

file.on('open', function () {
    console.log('file opened');
    process.nextTick(function () {
        console.log('pausing for 500ms (bufsize = %d)',
            file.buffer ? file.buffer.length : 0);
        file.pause();
        setTimeout(function () {
            console.log('ending delay (bufsize = %d)',
                file.buffer ? file.buffer.length : 0);
            file.resume();
        }, 500);
    });
});

file.on('data', function (chunk) {
    console.log('data (length %s) (buffer = %d)', chunk.length,
        file.buffer ? file.buffer.length : 0);
    file.resume();
});

Run it like this (be sure to create "tmpfile" first -- which can be a tiny file):

$ echo line1 > tmpfile
$ node resume-test.js 
file opened
pausing for 500ms (bufsize = 0)
ending delay (bufsize = 6)
data (length 6) (buffer = 6)
data (length 6) (buffer = 6)
data (length 6) (buffer = 6)
...
data (length 6) (buffer = 6)

timers.js:96
        if (!process.listeners('uncaughtException').length) throw e;
                                                                  ^
RangeError: Maximum call stack size exceeded

As you can see, the program enters an infinite loop, but one that keeps pushing new stack frames, so it eventually exhausts the stack and crashes.

When I first saw this, not knowing a good way to debug it, I modified V8 to dump core when an uncaught exception was thrown. (Importantly, it must dump core when the exception is thrown; doing it in an uncaughtException handler doesn't work because the stack has been unwound by that point. The change is trivial: just add a call to abort(3C) in Isolate::DoThrow, around here: https://github.com/joyent/node/blob/master/deps/v8/src/isolate.cc#L1156. To reproduce this, though, you need to use something other than setTimeout or process.nextTick in the above example, because those catch and rethrow, so you lose the stack by the time you abort.) Then I used mdb_v8 to see what the stack was, and found that my call to resume() was emitting the buffered data here: https://github.com/joyent/node/blob/master/lib/fs.js#L1288.

The problem is that resume() emits the buffered data and only then clears this.buffer. If the 'data' event handler triggers another resume() (as it does here), that reentrant call sees the buffer still set, emits the same data again (which is wrong), and the cycle repeats until the process runs out of stack space and crashes.

Obviously this program is somewhat contrived, but this is the simplest case I could come up with. I initially hit this bug in a more complex program (node-bunyan), where I was calling resume() very indirectly from the on('data') handler, which seems to me like a totally reasonable thing to do.

Koichi Kobayashi
Collaborator
koichik commented May 11, 2012

@davepacheco - Thanks for investigating.

Can someone review koichik/node@5f9ffa1 (for v0.6)?

Koichi Kobayashi koichik closed this in 5f9ffa1 May 12, 2012
Isaac Z. Schlueter isaacs referenced this issue from a commit May 14, 2012
Isaac Z. Schlueter 2012.05.15 Version 0.6.18 (stable)
* windows: skip GetFileAttributes call when opening a file (Bert Belder)

* crypto: add PKCS12/PFX support (Sambasiva Suda)

* #3240: child_process: delete NODE_CHANNEL_FD from env in spawn (Ben Noordhuis)

* windows: add test for path.normalize with UNC paths (Bert Belder)

* windows: make path.normalize convert all slashes to backslashes (Bert Belder)

* fs: Automatically close FSWatcher on error (Bert Belder)

* #3258: fs.ReadStream.pause() emits duplicate data event (koichik)

* pipe_wrap: don't assert() on pipe accept errors (Ben Noordhuis)

* Better exception output for module load and process.nextTick (Felix Geisendörfer)

* zlib: fix error reporting (Ben Noordhuis)

* http: Don't destroy on timeout (isaacs)

* #3231: http: Don't try to emit error on a null'ed req object (isaacs)

* #3236: http: Refactor ClientRequest.onSocket (isaacs)
bb25001
Isaac Z. Schlueter isaacs referenced this issue from a commit May 14, 2012
Isaac Z. Schlueter 2012.05.15 Version 0.6.18 (stable)
4bc1d39
Craig
craigyk commented May 18, 2012

Did this bug creep in recently? I have been noticing that a resumable file uploader I wrote to handle large uploads has recently been saving corrupted data under heavy load, and I don't remember this happening before. I'll upgrade to .18 and see if it is still happening.
