This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Default highWaterMark value of 16,384 may not be appropriate for objectMode #9304

Closed
bloudermilk opened this issue Feb 28, 2015 · 1 comment

Comments

@bloudermilk

I hope this is the right place to leave this comment/suggestion. In the last few days I've been implementing a fairly complex system of streams to process high volumes of data. I've been pleased with the speed and ease the API affords, but got caught up today with a memory issue related to objectMode. In my scenario I had a series of piped streams doing some data processing:

  • fs.createReadStream()
  • zlib.createGzip()
  • parse (via csv-parse)
  • Transform({ objectMode: true }) (fast)
  • WriteStream({ objectMode: true }) (slow)

In my case, the Transform stream is where I started to have issues. The purpose of the stream was to group incoming objects into Arrays of 1,000 for bulk processing in the WriteStream. Since the default highWaterMark had been working for me elsewhere I didn't really think about it, but when we started processing CSV files with 40M+ lines we saw incredibly fast memory growth and eventually our processes died from FATAL ERROR: Malloced operator new Allocation failed - process out of memory. The issue was fairly easy to find and obvious in hindsight: buffering arrays of 1,000 objects instead of single objects increased our potential memory usage by three orders of magnitude. The elements in the array were bson.ObjectId objects, which according to sizeof take up roughly 240 bytes of memory each. With a highWaterMark of 16,384 arrays, that is 16,384 × 1,000 × 240 bytes, a grand total of about 3.66GB of memory if the buffer fills.
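For anyone checking the numbers, here is a rough sketch of that worst-case estimate. The variable names are made up for illustration, and the 240-byte figure is only the approximate sizeof measurement mentioned above, so treat the result as an estimate rather than a measured value.

```javascript
// Worst-case memory held by one full objectMode buffer, using the figures above.
const highWaterMark = 16384; // 0.10.x default; in objectMode it counts objects, not bytes
const idsPerBatch = 1000;    // each buffered chunk is an array of 1,000 ids
const bytesPerId = 240;      // rough sizeof estimate for one bson.ObjectId (approximate)

const worstCaseBytes = highWaterMark * idsPerBatch * bytesPerId;
const worstCaseGiB = worstCaseBytes / Math.pow(1024, 3);

console.log(worstCaseBytes);          // 3932160000
console.log(worstCaseGiB.toFixed(2)); // 3.66
```

The same estimate with a highWaterMark of 16 comes to under 4MB, which is why the size of the buffered chunk matters as much as the count.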

There is no bug in Node that caused this, just my own oversight. However, since the default highWaterMark of 16,384 was picked for its insignificant use of 16KB of memory when used in Buffer mode, it may be worth using a different default when using streams in objectMode. Considering that even "simple" objects with a couple of keys referencing string or integer values can easily take up 20-30 bytes, it's safe to assume that in general objectMode is going to consume much more memory than its Buffer equivalent.

An alternative or complementary solution to lowering the default highWaterMark for objectMode would be to note in the docs that an appropriate highWaterMark should almost certainly be set based on the potential of the buffer filling and the size of the buffered objects.
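As an illustration of setting the value explicitly, here is a minimal sketch of a batching Transform with a small highWaterMark. It is not the code from my project: createBatcher and its parameters are hypothetical names, the limit of 4 is an arbitrary choice, and it uses the simplified constructor options from newer Node releases, so on 0.10.x you would subclass Transform instead.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: groups incoming objects into arrays of `size`.
function createBatcher(size, highWaterMark) {
  let batch = [];
  return new Transform({
    objectMode: true,
    // In objectMode this counts buffered chunks (here: whole batches), not bytes.
    highWaterMark,
    transform(obj, _encoding, callback) {
      batch.push(obj);
      if (batch.length >= size) {
        const full = batch;
        batch = [];
        callback(null, full); // emit one complete batch downstream
      } else {
        callback();           // keep accumulating
      }
    },
    flush(callback) {
      // Emit whatever is left over when the input ends.
      if (batch.length > 0) this.push(batch);
      batch = [];
      callback();
    },
  });
}

// At most roughly 4 batches of 1,000 objects sit in each side's buffer
// before backpressure is signalled.
const batcher = createBatcher(1000, 4);
```

Picking the limit then comes down to the estimate described above: buffered chunk count times the approximate size of one chunk.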

Thanks for listening!

Edit: I should also note that I'm happy to send a pull request for either or both of these suggested changes if the maintainers find them agreeable.

@bloudermilk bloudermilk changed the title highWaterMark default of 16,384 may not be appropriate for objectMode Default highWaterMark value of 16,384 may not be appropriate for objectMode Feb 28, 2015
@bloudermilk
Author

Well, it seems I'm a bit late to this party. Checking the Node 0.12.x docs I can see that the default for objectMode is now a more sensible 16. We're on 0.10.x, clearly 😄
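A quick way to confirm the objectMode default on a given runtime is to read it back from a stream instance. This is a small sketch that assumes a newer Node release where the readableHighWaterMark property exists; it is not available on 0.10.x or 0.12.x.

```javascript
const { Readable } = require('stream');

// No highWaterMark is passed, so the runtime's objectMode default applies.
const probe = new Readable({ objectMode: true, read() {} });

// Expected to be 16 (objects) on releases with the objectMode-specific default.
console.log(probe.readableHighWaterMark);
```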

richardlau added a commit to ibmruntimes/node that referenced this issue Jan 21, 2017
node.h was modified by nodejs#9304 to include tracing/trace_event.h but
tracing/trace_event.h was not added to the headers installed by
tools/install.py.