This repository has been archived by the owner on Apr 22, 2023. It is now read-only.
I hope this is the right place to leave this comment/suggestion. In the last few days I've been implementing a fairly complex system of streams to process high volumes of data. I've been pleased with the speed and ease the API affords, but got caught up today with a memory issue related to `objectMode`. In my scenario I had a series of `pipe`d streams doing some data processing:

1. `fs.createReadStream()`
2. `zlib.createGzip()`
3. `parse` (via csv-parse)
4. `Transform({ objectMode: true })` (fast)
5. `WriteStream({ objectMode: true })` (slow)
In my case, the `Transform` stream is where I started to have issues. The purpose of the stream was to group incoming objects into `Array`s of 1,000 for bulk processing in the `WriteStream`. Since the default `highWaterMark` had been working for me elsewhere I didn't really think about it, but when we started processing CSV files with 40M+ lines we saw incredibly fast memory growth and eventually our processes died with `FATAL ERROR: Malloced operator new Allocation failed - process out of memory`. The issue was fairly easy to find and obvious in hindsight: having an array of 1,000 objects in each buffered chunk increased our potential memory usage by three orders of magnitude. The elements in the array were `bson.ObjectId` objects, which according to sizeof take up roughly 240 bytes of memory each — a grand total of 3.66GB if the buffer fills (16,384 chunks × 1,000 objects × 240 bytes).
There is no bug in Node that caused this, just my own oversight. However, since the default `highWaterMark` of 16,384 was picked for its insignificant use of 16KB of memory in Buffer mode, it may be worth using a different default for streams in `objectMode`. Considering that even "simple" objects with a couple of keys referencing string or integer values can easily take up 20-30 bytes, it's safe to assume that in general `objectMode` is going to consume much more memory than its Buffer equivalent.
An alternative or complementary solution to lowering the default `highWaterMark` for `objectMode` would be to note in the docs that an appropriate `highWaterMark` should almost certainly be set based on the likelihood of the buffer filling and the size of the buffered objects.
Thanks for listening!
Edit: I should also note that I'm happy to send a pull request for either or both of these suggested changes if the maintainers find them agreeable.
bloudermilk changed the title from "highWaterMark default of 16,384 may not be appropriate for objectMode" to "Default highWaterMark value of 16,384 may not be appropriate for objectMode" on Feb 28, 2015.
Well, it seems I'm a bit late to this party. Checking the Node 0.12.x docs I can see that the default is now a more sensible 16. We're on 0.10.x, clearly 😄
node.h was modified by nodejs#9304 to include tracing/trace_event.h, but tracing/trace_event.h was not added to the headers installed by tools/install.py.