fs: partition readFile to avoid threadpool exhaustion #17054

Closed
wants to merge 1 commit into nodejs:master from davisjam:PartitionReadFile

Conversation

@davisjam
Contributor

davisjam commented Nov 15, 2017

Problem
Node implements fs.readFile as a call to stat, followed by a C++ -> libuv request
to read the entire file based on the size reported by stat.

Why is this bad?
The effect is to place on the libuv threadpool a potentially-large read request,
occupying the libuv thread until it completes.
While readFile certainly requires buffering the entire file contents,
it can partition the read into smaller buffers (as is done on other read paths)
along the way to avoid threadpool squatting.

If the file is relatively large or stored on a slow medium,
reading the entire file in one shot seems particularly harmful,
and presents a possible DoS vector.
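
For intuition, here is a minimal sketch of the clogging effect (file paths and sizes are illustrative; libuv's default threadpool has 4 threads):

const fs = require('fs');

// With the current one-shot readFile, a handful of large reads can occupy
// every libuv worker until they complete.
for (let i = 0; i < 5; i++) {
  fs.readFile('/tmp/large-file-1gb', (err, data) => { // illustrative path
    if (err) throw err;
    console.log('large read finished');
  });
}

// This small operation queues behind the large reads instead of interleaving.
fs.readFile('/tmp/small-file', (err, data) => {
  if (err) throw err;
  console.log('small read finished');
});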

Downsides to partitioning?

  1. Correctness: I don't think partitioning the read like this raises any additional risk of read-write races on the FS. If the application is concurrently readFile'ing and modifying the file, it will already see funny behavior. Though libuv uses preadv where available, this doesn't guarantee read atomicity in the presence of concurrent writes.

  2. Performance implications:
    a. Downside: Partitioning means that a single large readFile will be broken into many "out and back" requests to libuv, introducing overhead.
    b. Upside: In between each "out and back", other work pending on the threadpool can take a turn. In short, although partitioning will slow down a large request, it will lead to better throughput if the threadpool is handling more than one type of request.

Related
It might be that writeFile has similar behavior. The writeFile path is a bit more complex and I didn't investigate carefully.

Fix approach
Simple -- instead of reading in one shot, partition the read length using kReadFileBufferLength.
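
The mechanics are roughly as follows (a minimal sketch, not the actual lib/fs.js code; readFilePartitioned and the context object are illustrative, and the 8 KB value matches the buffer size discussed below):

'use strict';
const fs = require('fs');

const kReadFileBufferLength = 8 * 1024;

function readFilePartitioned(path, callback) {
  fs.open(path, 'r', (err, fd) => {
    if (err) return callback(err);
    fs.fstat(fd, (err, st) => {
      if (err) return fs.close(fd, () => callback(err));
      const ctx = { fd, size: st.size, bytesRead: 0, buffer: Buffer.alloc(st.size), callback };
      readChunk(ctx);
    });
  });
}

function readChunk(ctx) {
  // At most one partition per trip to the threadpool, submitted hand-over-hand.
  const length = Math.min(kReadFileBufferLength, ctx.size - ctx.bytesRead);
  fs.read(ctx.fd, ctx.buffer, ctx.bytesRead, length, ctx.bytesRead, (err, bytesRead) => {
    if (err) return fs.close(ctx.fd, () => ctx.callback(err));
    ctx.bytesRead += bytesRead;
    if (bytesRead === 0 || ctx.bytesRead >= ctx.size) {
      // Whole file buffered, exactly as with the one-shot read.
      fs.close(ctx.fd, () => ctx.callback(null, ctx.buffer));
    } else {
      readChunk(ctx); // in between chunks, other threadpool work can take a turn
    }
  });
}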

Test
I introduced a new test to ensure that fs.readFile works for files smaller and larger than kReadFileBufferLength. It works.

Performance:

  1. Machine details:
    $ uname -a
    Linux jamie-Lenovo-K450e 4.8.0-56-generic #61~16.04.1-Ubuntu SMP Wed Jun 14 11:58:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  2. Excerpts from lscpu:
    Architecture: x86_64
    CPU(s): 8
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s): 1
    Model name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
    CPU MHz: 1499.194

  3. benchmark/fs/readfile.js

Summary
Benchmarks using benchmark/fs/readfile.js are unfavorable. I ran three iterations with my change and three with an unmodified version. Performance within a version was similar across the three iterations, so I report only the third iteration for each.

  • comparable performance on the 1KB file
  • significant performance degradation on the 16MB file (4-5x decrease)

With partitioned read:

$ for i in `seq 1 3`; do /tmp/node-part/node benchmark/fs/readfile.js; done
...
fs/readfile.js concurrent=1 len=1024 dur=5: 42,836.45194074361
fs/readfile.js concurrent=10 len=1024 dur=5: 94,170.12611909183
fs/readfile.js concurrent=1 len=16777216 dur=5: 71.79583090225451
fs/readfile.js concurrent=10 len=16777216 dur=5: 163.98033223174818

Without change:

$ for i in `seq 1 3`; do /tmp/node-orig/node benchmark/fs/readfile.js; done
...
fs/readfile.js concurrent=1 len=1024 dur=5: 43,815.347866646596
fs/readfile.js concurrent=10 len=1024 dur=5: 93,783.59180605657
fs/readfile.js concurrent=1 len=16777216 dur=5: 339.77196820103387
fs/readfile.js concurrent=10 len=16777216 dur=5: 592.325183524534

  4. benchmark/fs/readfile-clogging.js

As discussed above, the readfile.js benchmark doesn't tell the whole story. The contention of this PR is that the 16MB reads will clog the threadpool, disadvantaging other work contending for the threadpool. I've introduced a new benchmark to characterize this.

Benchmark summary: I copied readfile.js and added a small asynchronous zlib operation to compete for the threadpool. If a non-partitioned readFile is clogging the threadpool, there will be a relatively small number of zips.
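
To illustrate, the competing work is along these lines (a simplified sketch, not the exact readfile-clogging.js code; benchEnded and zips mirror the counters discussed below):

const zlib = require('zlib');

const small = Buffer.alloc(1024, 'x'); // small payload so each zip is cheap
let zips = 0;
let benchEnded = false;

function zip() {
  if (benchEnded) return;
  zlib.deflate(small, (err) => { // async deflate also runs on the libuv threadpool
    if (err) throw err;
    zips++;
    zip(); // keep a zip request in flight, competing with the readFile requests
  });
}
zip();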

Performance summary:

  • Small file: No difference whether 1 read or 10
  • Large file: With 1 read, some effect (1 thread is always reading, but 3 threads remain for zip). With 10 reads, huge effect (zips get a fair share of the threadpool when partitioned): 61K zips with partitioning, 700 without.

Partitioned:

$ for i in `seq 1 3`; do /tmp/node-part/node benchmark/fs/readfile-clogging.js; done
...
bench ended, reads 96464 zips 154582
fs/readfile-clogging.js concurrent=1 len=1024 dur=5: 19,289.8420223229
fs/readfile-clogging.js concurrent=1 len=1024 dur=5: 30,909.421907455828
bench ended, reads 332932 zips 62896
fs/readfile-clogging.js concurrent=10 len=1024 dur=5: 66,572.28049862666
fs/readfile-clogging.js concurrent=10 len=1024 dur=5: 12,575.639939453387
bench ended, reads 149 zips 149574
fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 29.793230608569676
fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 29,905.935378334147
bench ended, reads 623 zips 61745
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 124.57446300744513
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 12,345.553950958118

Non-partitioned:

$ for i in `seq 1 3`; do /tmp/node-orig/node benchmark/fs/readfile-clogging.js; done
...
bench ended, reads 92559 zips 153226
fs/readfile-clogging.js concurrent=1 len=1024 dur=5: 18,510.65052192176
fs/readfile-clogging.js concurrent=1 len=1024 dur=5: 30,641.12621937156
bench ended, reads 332066 zips 62739
fs/readfile-clogging.js concurrent=10 len=1024 dur=5: 66,396.6979771542
fs/readfile-clogging.js concurrent=10 len=1024 dur=5: 12,543.801322137173
bench ended, reads 1554 zips 98886
fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 310.708121371412
fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 19,769.932924561737
bench ended, reads 2759 zips 703
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 550.9968714783075
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 140.38479443398438

Issue:
This commit addresses #17047.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

fs

@davisjam davisjam force-pushed the davisjam:PartitionReadFile branch from 49d3297 to 16195e0 Nov 15, 2017

@davisjam


Contributor

davisjam commented Nov 15, 2017

Working on linter errors.

@benjamingr benjamingr changed the title from Partition readFile to avoid threadpool exhaustion to fs: partition readFile to avoid threadpool exhaustion Nov 16, 2017

@benjamingr


Member

benjamingr commented Nov 16, 2017

Thanks for following up, pinging @nodejs/fs for review.

@bnoordhuis


Member

bnoordhuis commented Nov 16, 2017

I can see how this is a concern in a theoretical sense but I don't remember any bug reports where it was an actual issue. Seems like premature (de)optimization.

@davisjam


Contributor

davisjam commented Nov 16, 2017

@bnoordhuis I wouldn't call this a de-optimization. It optimizes the throughput of the threadpool in its entirety by increasing the number of requests that a readFile makes. It's an optimization for throughput, at the cost of the latency of large readFiles.

I think this is in the spirit of Node.js: handle many client requests simultaneously on a small number of threads, and don't do too much work in one shot on any of the threads. This is already the approach taken by a readStream.

The benchmark/fs/readfile-clogging.js demonstrates this:
Reading a 16MB file:

  • 1 thread: partitioning yields 149K zips vs. 98K zips currently
  • 10 threads: partitioning yields 60K zips vs. 700 zips currently

FWIW, the latency cost can be largely mitigated by using a more reasonably sized buffer. The 1-thread numbers for readfile.js improve to a 30% degradation and the 10-thread numbers are comparable to the non-partitioned performance.

benchmark/fs/readfile.js

Here's the (rounded) readFile throughput for various partition sizes on the 16MB file:

With an 8KB buffer:

fs/readfile.js concurrent=1 len=16777216 dur=5: 71
fs/readfile.js concurrent=10 len=16777216 dur=5: 163

With a 64KB buffer:

fs/readfile.js concurrent=1 len=16777216 dur=5: 222
fs/readfile.js concurrent=10 len=16777216 dur=5: 534

With the full 16MB file in one shot:

fs/readfile.js concurrent=1 len=16777216 dur=5: 339
fs/readfile.js concurrent=10 len=16777216 dur=5: 592

benchmark/fs/readfile-clogging.js

Here's the (rounded) zip throughput for various partition sizes:

With an 8KB buffer:

fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 29,905
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 12,345

With a 64KB buffer:

fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 33,201
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 6,995

With the full 16MB file in one shot:

fs/readfile-clogging.js concurrent=1 len=16777216 dur=5: 19,769
fs/readfile-clogging.js concurrent=10 len=16777216 dur=5: 140

Conclusion
If we use a 64KB buffer size for readFile, there will be a 10% increase in readFile latency but a 50x increase in the ability of small concurrent threadpool operations to get a turn.

Admittedly the zip operation I'm using in readfile-clogging.js is tiny, so this is a particularly favorable comparison. I'm happy to try it with a more "reasonable" competing operation if anyone would like to suggest one.

@mscdex mscdex added the performance label Nov 16, 2017

console.log(`bench ended, reads ${reads} zips ${zips}`);
bench_ended = true;
bench.end(reads);
bench.end(zips);


@mscdex

mscdex Nov 16, 2017

Contributor

Calling this twice does not make sense and will break compare.js which is expecting one bench.end() per benchmark.


@davisjam

davisjam Nov 16, 2017

Contributor

@mscdex Thanks for pointing this out. How ought I report throughput for two separate variables like this?


@mscdex

mscdex Nov 16, 2017

Contributor

You can't. Perhaps just combine both values for total fulfilled requests per second?


@davisjam

davisjam Nov 16, 2017

Contributor

OK. I'll leave in the console.log then so the distinction between request type is clear.
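
For reference, the end-of-benchmark reporting would then look roughly like this (illustrative; reads, zips, and bench come from the benchmark scaffolding):

console.log(`bench ended: reads ${reads}, zips ${zips}, total ops ${reads + zips}`);
benchEnded = true;
bench.end(reads + zips); // a single bench.end(), as compare.js expects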

bench.end(reads);
bench.end(zips);
try { fs.unlinkSync(filename); } catch (e) {}
process.exit(0);


@mscdex

mscdex Nov 16, 2017

Contributor

This isn't really safe since process.send() used by bench.end() is not synchronous. It's better to just return early in afterRead() and afterZip() when bench_ended === true and let the process exit naturally.
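
A minimal sketch of that pattern (names such as afterRead and benchEnded follow the discussion and are illustrative, not the exact benchmark code):

const fs = require('fs');
const filename = '/tmp/readfile-bench.txt'; // illustrative; the benchmark creates its own file

let benchEnded = false;
let reads = 0;

function read() {
  fs.readFile(filename, afterRead);
}

function afterRead(err, data) {
  if (err) throw err;
  if (benchEnded) return; // stop scheduling new reads; the process then exits naturally
  reads++;
  read();
}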


@davisjam

davisjam Nov 16, 2017

Contributor

OK, this process.exit(0) is just a copy/paste from benchmark/fs/readfile.js. I'll fix this in both places.

var reads = 0;
var zips = 0;
var bench_ended = false;


@mscdex

mscdex Nov 16, 2017

Contributor

Minor nit, but lower camelCase is typically used in JS portions of node core and underscores are typically used in C++ portions.

@bnoordhuis


Member

bnoordhuis commented Nov 16, 2017

I wouldn't call this a de-optimization. It optimizes the throughput of the threadpool in its entirety by increasing the number of requests that a readFile makes. It's an optimization for throughput, at the cost of the latency of large readFiles.

I understand that. My point is that no one complained so far - people aren't filing bug reports. To me that suggests it's mostly a theoretical issue.

Meanwhile, the proposed changes will almost certainly regress some workloads and people are bound to take notice of that.

@davisjam


Contributor

davisjam commented Nov 16, 2017

Meanwhile, the proposed changes will almost certainly regress some workloads and people are bound to take notice of that.

True. But other workloads should be accelerated -- it's the readfile.js vs. readfile-clogging.js tradeoff.

@mscdex


Contributor

mscdex commented Nov 16, 2017

I think I agree with Ben here. Anyone who wants a file read in chunks can just use fs.createReadStream().

@YurySolovyov


YurySolovyov commented Nov 16, 2017

maybe add a separate API instead?
fs.readFile is simpler to use than streams because you don't have to manage assembling the final result.
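
For illustration, reading a whole file via a stream means assembling the chunks yourself (a minimal sketch):

const fs = require('fs');

function readFileViaStream(path, cb) {
  const chunks = [];
  fs.createReadStream(path) // reads in highWaterMark-sized chunks (64 KB by default)
    .on('data', (chunk) => chunks.push(chunk))
    .on('error', cb)
    .on('end', () => cb(null, Buffer.concat(chunks)));
}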

@davisjam


Contributor

davisjam commented Nov 16, 2017

Anyone who wants a file read in chunks can just use fs.createReadStream().

And if they want to make giant reads, they can use fs.read().

But if they've opted for the simplicity of fs.readFile(), I think the framework should Do The Right Thing -- namely, optimize threadpool throughput, not single request latency. Presumably some kind of chunking/partitioning is done by crypto and compression as well?

@mscdex


Contributor

mscdex commented Nov 16, 2017

On an unrelated note, please do not @ mention me in commit messages.

@davisjam davisjam force-pushed the davisjam:PartitionReadFile branch from 6598119 to b9971e4 Nov 16, 2017

@davisjam


Contributor

davisjam commented Nov 16, 2017

Fixed, sorry.

@refack


Member

refack commented Nov 16, 2017

Hello @davisjam and thank you for the contribution 🎩

Sure looks like you did a lot of research and experimentation, and I really appreciate that.

I think the framework should Do The Right Thing -- namely, optimize threadpool throughput, not single request latency

There are several assumptions and rule-of-thumb optimizations around the uv threadpool [addition: based on empirical experience and feedback]. One of those is that since the pool serves I/O-bound operations, a small pool is enough. As such, doing multiple interleaved FS operations is an anti-pattern.
As for "doing the right thing", I would go a different way, with less concurrency, not more: check that the uv threadpool is not all consumed doing the same operation.

@davisjam


Contributor

davisjam commented Nov 16, 2017

One of those is that since the pool serves I/O bound operations, a small pool is enough. As such doing multiple interleaved FS operations is an anti-pattern. As for "doing the right thing", I would go in a different way that has less concurrency, not more. Check that the uv threadpool is not all consumed doing the same operation.

@refack Perhaps I misunderstand you, but this PR does not increase concurrency. With my patch, an fs.readFile results in more requests to the threadpool, but each such request is submitted when the previous one completes, hand-over-hand. [addition: Of course, a server might fs.readFile on behalf of different clients concurrently, but there will be one task per ongoing fs.readFile in the queue.]

I agree that a small pool is good for certain activities, but reading large files in one shot is not one of them. A small pool suffices so long as each task doesn't take too long, but a long-running task on a small pool monopolizes its assigned thread, degrading the task throughput of the pool. Then indeed (one thread in) "the uv threadpool is all consumed doing the same operation." [addition: The trouble is that on Linux, a thread still performs the I/O-bound task synchronously, since approaches like KAIO have been rejected (see here). So if the task is long-running, the thread blocks for a long time.]

If the threadpool is used solely for "large" tasks, there's no problem -- each task takes a long time anyway, and partitioning them just adds overhead. But if the threadpool is used for a mix of larger and smaller tasks (e.g. serving different sized files to different clients, running compression and file I/O concurrently, etc.), then the larger tasks will harm the throughput of the smaller tasks. In my benchmark/fs/readfile-clogging.js benchmark, the small task throughput improves by 50x if you partition the large reads.

@jasnell


Member

jasnell commented Nov 16, 2017

I definitely appreciate the work here, but I think I'm also falling on the -1 side on this. I think a better approach would be to simply increase efforts to warn devs away from using fs.readFile() for large files that cannot be read in a single uv roundtrip. Anything beyond that should be deferred to using either fs.read() or fs.createReadStream(). Using fs.readFile() to read anything larger than that is an anti-pattern that I really do not think we should be encouraging.

@davisjam


Contributor

davisjam commented Nov 16, 2017

@jasnell Thanks for your input!

  1. I agree that fs.readFile is not a good idea in the server context, though of course it's fine for scripting purposes. I'm planning to include this discussion as part of this proposed guide, if there's interest from the nodejs.org folks.

  2. However, all the documentation in the world won't stop a new developer from making a mistake. If we agree that "anything larger than [small files] should be deferred to fs.read() or fs.createReadStream()", then surely partitioning fs.readFile() in the style of fs.createReadStream() is an appropriate step. I don't think doing so encourages bad developer behavior -- it's just ensuring that if the developer has made a mistake, they won't pay too much for it. Do The Right Thing and so on.

My 64KB benchmark suggests that scripts that use fs.readFile() shouldn't suffer overmuch from partitioning (they still read 8GB's worth of the same 16MB file in 5 seconds), and that this partitioning stands to benefit some kinds of servers.

@davisjam


Contributor

davisjam commented Nov 16, 2017

warn dev's away from using fs.readFile() for large files that cannot be read in a single uv roundtrip

Right, but the current fs.readFile() behavior is to read any file in one uv roundtrip, regardless of its size. If a dev is reading small files with fs.readFile(), this PR will have no effect on performance. If the dev is reading large files with fs.readFile(), (1) they shouldn't be, but (2) we can still help them out.

@davisjam


Contributor

davisjam commented Nov 17, 2017

Found a few minutes for some deeper benchmarking...I've collected measurements across a range of partition sizes to give a better sense of the tradeoffs between degrading readFile performance and improving threadpool throughput.

I looked at the following partition sizes in KB: 4 8 16 32 64 128 256 512 1024 4096 16384. At each stage I doubled the partition size until I reached 1024 KB (1MB), at which point I quadrupled to 4MB and again to 16MB. The final partition size, 16384KB (16MB), is the size of the file being read, so this last size is the baseline, equivalent to the current behavior of Node.

The numbers I'm reporting represent a single run of the benchmarks on the machine described above, which is otherwise idle. Since it's just one run for each partition size, these numbers are just an estimate.

Excerpting the "1 and 10 concurrent readFile's on a 16MB file" performance from benchmark/fs/readfile.js:

4 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 39
fs/readfile.js concurrent=10 len=16777216 dur=5: 88
8 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 67
fs/readfile.js concurrent=10 len=16777216 dur=5: 162
16 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 111
fs/readfile.js concurrent=10 len=16777216 dur=5: 364
32 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 146
fs/readfile.js concurrent=10 len=16777216 dur=5: 514
64 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 214
fs/readfile.js concurrent=10 len=16777216 dur=5: 575
128 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 212
fs/readfile.js concurrent=10 len=16777216 dur=5: 566
256 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 292
fs/readfile.js concurrent=10 len=16777216 dur=5: 538
512 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 205
fs/readfile.js concurrent=10 len=16777216 dur=5: 523
1024 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 246
fs/readfile.js concurrent=10 len=16777216 dur=5: 492
4096 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 274
fs/readfile.js concurrent=10 len=16777216 dur=5: 477
16384 KB
fs/readfile.js concurrent=1 len=16777216 dur=5: 356
fs/readfile.js concurrent=10 len=16777216 dur=5: 577

Excerpting the "10 concurrent readFile's on a 16MB file" performance from benchmark/fs/readfile-clogging.js:

for size in 4 8 16 32 64 128 256 512 1024 4096 16384; do echo "$size KB"; echo; PARTITION_SIZE_KB=$size /tmp/node-part-cfg/node benchmark/fs/readfile-clogging.js | tee /tmp/o_clogging_$size; echo; done
4 KB
bench ended: reads 300, zips 62511, total ops 62811
8 KB
bench ended: reads 600, zips 61582, total ops 62182
16 KB
bench ended: reads 1139, zips 59401, total ops 60540
32 KB
bench ended: reads 1779, zips 47918, total ops 49697
64 KB
bench ended: reads 2258, zips 35383, total ops 37641
128 KB
bench ended: reads 2305, zips 19380, total ops 21685
256 KB
bench ended: reads 2743, zips 11553, total ops 14296
512 KB
bench ended: reads 2409, zips 5553, total ops 7962
1024 KB
bench ended: reads 2268, zips 2889, total ops 5157
4096 KB
bench ended: reads 2298, zips 1053, total ops 3351
16384 KB
bench ended: reads 2767, zips 693, total ops 3460

Summarizing these results:

  1. readfile.js:
    a. The relationship between partition size and read rate in the 1-reader case is unclear. The best performance is at 16MB (356 reads/second at one read per readFile), but other pretty good points were 256KB (292 reads/second) and 4096KB (274 reads/second).
    b. The relationship between partition size and read rate in the 10-reader case is also unclear. The high point was again 16MB (577 reads/second), but 64KB (575 reads/sec) and 128KB (566 reads/sec) were also contenders.
  2. readfile-clogging.js: Unsurprisingly, the number of zips is roughly inversely proportional to the partition size, i.e., linearly proportional to the number of partitions. The more partitions, the more turns the zip job gets.

Recommendation:

The 1-reader case seems pretty unrealistic, so let's focus on the 10-reader case. It looks to me like if we go with a 64KB partition, for pure readFile we face somewhere between a 10% drop in throughput (reported earlier) and a negligible drop in throughput (in this data). For this we get a 50x improvement in throughput for contending threadpool jobs. For better readFile performance, a larger blocksize could be used while still improving overall threadpool throughput.

Since the patch is a one-liner, nothing fancy, this seems like a pretty good trade to me.

As has been discussed, best practice is certainly not to use fs.readFile for serving files. But for users who are doing so, I think this patch could give them nice performance improvements for free.

Docs:
I agree with @jasnell that urging developers to avoid fs.readFile in server contexts is a good idea. I'm also happy to pursue a docs change and/or a longer guide in this direction.

davisjam added a commit to davisjam/node that referenced this pull request Nov 20, 2017

doc: fs.readFile is async but not partitioned
This change was suggested during the discussion of nodejs#17054.

@davisjam davisjam referenced this pull request Nov 20, 2017

Closed

doc: fs.readFile is async but not partitioned #17154

@davisjam


Contributor

davisjam commented Nov 21, 2017

But if they've opted for the simplicity of fs.readFile(), I think the framework should Do The Right Thing -- namely, optimize threadpool throughput, not single request latency. Presumably some kind of chunking/partitioning is done by crypto and compression as well?

Actually, I just checked the crypto module. It does not chunk/partition large requests.

The following example will not print "Short buf finished" until there are no more long requests in the threadpool queue and one of the workers picks up the short request.

const crypto = require('crypto');

const nBytes = 10 * 1024 * 1024; /* 10 MB */
const nLongRequests = 20;

for (let i = 0; i < nLongRequests; i++) {
  // Each large request occupies a threadpool worker until it completes.
  crypto.randomBytes(nBytes, (err, buf) => {
    console.log('Long buf finished');
  });
}

// The tiny request queues behind the large ones.
crypto.randomBytes(1, (err, buf) => {
  console.log('Short buf finished');
});

console.log('begin');

Thoughts on a similar PR to partition large crypto requests like this, or a doc-change PR like #17154 with a warning?

For the FS there are alternatives to fs.readFile if you are making a large request. I don't see comparable framework alternatives for large crypto requests. Thoughts on a new API for this?

@Trott


Member

Trott commented Nov 22, 2017

@bnoordhuis


Member

bnoordhuis commented Nov 22, 2017

Same as #17054 (comment), with the addendum that crypto.randomBytes() doesn't do I/O; it's purely CPU-bound. Users can partition requests themselves if they want.

As well, there is hardly ever a reason to request more than a few hundred bytes of randomness at a time. I don't think large requests are a practical concern.
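
For completeness, user-level partitioning of a large randomBytes request could look like this (a minimal sketch; the function name and chunk size are illustrative):

const crypto = require('crypto');

function randomBytesPartitioned(total, chunkSize, cb) {
  const out = Buffer.alloc(total);
  let offset = 0;
  (function next() {
    const n = Math.min(chunkSize, total - offset);
    crypto.randomBytes(n, (err, buf) => {
      if (err) return cb(err);
      buf.copy(out, offset);
      offset += n;
      if (offset < total) next();
      else cb(null, out);
    });
  })();
}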

MylesBorins added a commit that referenced this pull request Jan 22, 2018

test: add test description to fs.readFile tests
PR-URL: #17610
Refs: #17054 (comment)
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>

@davisjam


Contributor

davisjam commented Jan 24, 2018

aada57b, d181977: Hoist by my own petard. I will rebase.

fs: partition readFile against threadpool exhaustion
Problem:

Node implements fs.readFile as:
- a call to stat, then
- a C++ -> libuv request to read the entire file using the stat size

Why is this bad?
The effect is to place on the libuv threadpool a potentially-large
read request, occupying the libuv thread until it completes.
While readFile certainly requires buffering the entire file contents,
it can partition the read into smaller buffers
(as is done on other read paths)
along the way to avoid threadpool exhaustion.

If the file is relatively large or stored on a slow medium, reading
the entire file in one shot seems particularly harmful,
and presents a possible DoS vector.

Solution:

Partition the read into multiple smaller requests.

Considerations:

1. Correctness

I don't think partitioning the read like this raises
any additional risk of read-write races on the FS.
If the application is concurrently readFile'ing and modifying the file,
it will already see funny behavior. Though libuv uses preadv where
available, this doesn't guarantee read atomicity in the presence of
concurrent writes.

2. Performance

Downside: Partitioning means that a single large readFile will
  be broken into many "out and back" requests to libuv,
  introducing overhead.
Upside: In between each "out and back", other work pending on the
  threadpool can take a turn.

In short, although partitioning will slow down a large request,
it will lead to better throughput if the threadpool is handling
more than one type of request.

Test:

I added test/parallel/test-fs-readfile.js.

Benchmark:

I introduced benchmark/fs/readfile-partitioned.js to characterize
the performance tradeoffs.
See PR for details.

Related:

It might be that writeFile has similar behavior.
The writeFile path is a bit more complex and I didn't
investigate carefully.

Fixes: #17047

@davisjam davisjam force-pushed the davisjam:PartitionReadFile branch from 37dadf5 to 30ac024 Jan 24, 2018

@maclover7 maclover7 force-pushed the nodejs:master branch from bb5575a to 993b716 Jan 26, 2018

@cjihrig cjihrig force-pushed the nodejs:master branch from 993b716 to 082f952 Jan 26, 2018

@addaleax


Member

addaleax commented Jan 27, 2018

@davisjam


Contributor

davisjam commented Jan 28, 2018

@addaleax Yes.

@BridgeAR


Member

BridgeAR commented Feb 1, 2018

Landed in 67a4ce1

@BridgeAR BridgeAR closed this Feb 1, 2018

BridgeAR added a commit to BridgeAR/node that referenced this pull request Feb 1, 2018

fs: partition readFile against pool exhaustion
Problem:

Node implements fs.readFile as:
- a call to stat, then
- a C++ -> libuv request to read the entire file using the stat size

Why is this bad?
The effect is to place on the libuv threadpool a potentially-large
read request, occupying the libuv thread until it completes.
While readFile certainly requires buffering the entire file contents,
it can partition the read into smaller buffers
(as is done on other read paths)
along the way to avoid threadpool exhaustion.

If the file is relatively large or stored on a slow medium, reading
the entire file in one shot seems particularly harmful,
and presents a possible DoS vector.

Solution:

Partition the read into multiple smaller requests.

Considerations:

1. Correctness

I don't think partitioning the read like this raises
any additional risk of read-write races on the FS.
If the application is concurrently readFile'ing and modifying the file,
it will already see funny behavior. Though libuv uses preadv where
available, this doesn't guarantee read atomicity in the presence of
concurrent writes.

2. Performance

Downside: Partitioning means that a single large readFile will
  be broken into many "out and back" requests to libuv,
  introducing overhead.
Upside: In between each "out and back", other work pending on the
  threadpool can take a turn.

In short, although partitioning will slow down a large request,
it will lead to better throughput if the threadpool is handling
more than one type of request.

Fixes: nodejs#17047

PR-URL: nodejs#17054
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
@BridgeAR


Member

BridgeAR commented Feb 1, 2018

This broke our CI. I did not realize it right away and landed a couple of other commits afterwards; otherwise I would have reverted this. A change that landed a few hours before this one changed the tmpDir behavior and broke the test from this PR.

I am submitting a fix.

@BridgeAR BridgeAR referenced this pull request Feb 1, 2018

Closed

test: fix builds #18500


kcaulfield94 added a commit to kcaulfield94/node that referenced this pull request Feb 2, 2018

fs: partition readFile against pool exhaustion

msoechting added a commit to hpicgs/node that referenced this pull request Feb 5, 2018

doc: fs.readFile is async but not partitioned
This change was suggested during the discussion of nodejs#17054.

PR-URL: nodejs#17154
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>

msoechting added a commit to hpicgs/node that referenced this pull request Feb 5, 2018

doc: non-partitioned async crypto operations
Neither crypto.randomBytes nor crypto.randomFill
partitions the work submitted to the threadpool.

This change was suggested during the discussion of nodejs#17054.
See also nodejs#17154.

PR-URL: nodejs#17250
Refs: nodejs#17154
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>

msoechting added a commit to hpicgs/node that referenced this pull request Feb 5, 2018

test: add test description to fs.readFile tests
PR-URL: nodejs#17610
Refs: nodejs#17054 (comment)
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>

msoechting added a commit to hpicgs/node that referenced this pull request Feb 5, 2018

fs: partition readFile against pool exhaustion



MayaLekova added a commit to MayaLekova/node that referenced this pull request May 8, 2018

fs: partition readFile against pool exhaustion