This repository has been archived by the owner on Apr 22, 2023. It is now read-only.
I hope this is the right place to leave this comment/suggestion. In the last few days I've been implementing a fairly complex system of streams to process high volumes of data. I've been pleased with the speed and ease the API affords, but got caught up today with a memory issue related to `objectMode`. In my scenario I had a series of `pipe`d streams doing some data processing:

1. `fs.createReadStream()`
2. `zlib.createGzip()`
3. `parse` (via csv-parse)
4. `Transform({ objectMode: true })` (fast)
5. `WriteStream({ objectMode: true })` (slow)
In my case, the `Transform` stream is where I started to have issues. The purpose of the stream was to group incoming objects into `Array`s of 1,000 for bulk processing in the `WriteStream`. Since the default `highWaterMark` had been working for me elsewhere I didn't really think about it, but when we started processing CSV files with 40M+ lines we saw incredibly fast memory growth and eventually our processes died with `FATAL ERROR: Malloced operator new Allocation failed - process out of memory`. The issue was fairly easy to find and obvious in hindsight: having an array of 1,000 objects in each buffered chunk increased our potential memory usage by three orders of magnitude. The elements in the array were `bson.ObjectId` objects, which according to sizeof take up roughly 240 bytes of memory each — a grand total of 3.66GB if the buffer fills (16,384 chunks × 1,000 objects × 240 bytes).
There is no bug in Node that caused this, just my own oversight. However, since the default `highWaterMark` of 16,384 was picked for its insignificant use of 16KB of memory in Buffer mode, it may be worth using a different default for streams in `objectMode`. Considering that even "simple" objects with a couple of keys referencing string or integer values can easily take up 20-30 bytes, it's safe to assume that in general `objectMode` is going to consume much more memory than its Buffer equivalent.
An alternative or complementary solution to lowering the default `highWaterMark` for `objectMode` would be to note in the docs that an appropriate `highWaterMark` should almost certainly be set based on the likelihood of the buffer filling and the size of the buffered objects.
Thanks for listening!
Edit: I should also note that I'm happy to send a pull request for either or both of these suggested changes if the maintainers find them agreeable.
bloudermilk changed the title from "highWaterMark default of 16,384 may not be appropriate for objectMode" to "Default highWaterMark value of 16,384 may not be appropriate for objectMode" on Feb 28, 2015.
Well, it seems I'm a bit late to this party. Checking the Node 0.12.x docs I can see that the default is now a more sensible 16. We're on 0.10.x, clearly 😄
node.h was modified by nodejs#9304 to include tracing/trace_event.h, but tracing/trace_event.h was not added to the headers installed by tools/install.py.