minimize fs calls #2

es128 · 2015-12-10T20:16:16Z

For performance reasons, the less calls to fs methods, the better.

Currently, path entries go through a chain of stat -> realPath -> readdir (the last one only for confirmed dirs). But I think it would be better to refactor to something more like:

fs.readdir(path, function(err, entries) {
  if (err.Error === 'ENOTDIR') {
    // path exists, and it is a file or other non-dir fs entry
  } else if (err) {
    return callback(err);
  } else {
    // this is a dir, recurse to the entries
  }
});

Then after that do stat and/or realPath if requested based on opt-in parameters (or keep it simple and just leave it up to the consumer to do on their own).

It could even be possible to allow a really aggressive strategy of assuming that any entry that is named like a file (/\.\w{1,4}$/) is a file and skipping the fs.readdir.

I discussed the same sort of idea in paulmillr/readdirp#24, but when I really looked at it it turned out to seem to be too complex a change to slide into readdirp. I had planned to work on it as a new module soon after knocking down some of the bugs in chokidar that had been reported during my less OSS-active period this year (one of those bugs btw was directly caused by readdirp always using realPath, which is another motivation for not doing it here or making it optional). And then you came along and made this.

It would represent a really significant change to the strategy employed here so far (almost a complete rewrite possibly), but if you're cool with the concept I'd be happy to collaborate on moving in this direction here rather than work on it in a new project.

The text was updated successfully, but these errors were encountered:

bpasero · 2016-03-11T08:49:28Z

@es128 @kmalakoff I have had bad experiences doing eager fs.readdir() calls when using more recent versions of node.js. It seems to me that readdir is doing a whole bunch of things you do not want for files. We have similar code to walk a file system and found the current approach to work quite efficient (feedback welcome):

always begin with a lstat
if file is a link, stat it (you need this because lstat will not tell you if the file is a directory or not unless you stat it)
if file is a link, realpath it to get the real target and prevent endless loops if you have links pointing to each other (you will need to add this to walk-filtered to fix this issue)
continue with readdir for folders

kmalakoff · 2018-01-03T04:23:28Z

I tried both of your advice, but I couldn't get a faster running verion.

original

Default options x 6.03 ops/sec ±3.70% (33 runs sampled)
Serial (fs) x 3.31 ops/sec ±1.34% (21 runs sampled)
Parallel (fs) x 5.67 ops/sec ±1.81% (31 runs sampled)
Parallel (gfs) x 5.35 ops/sec ±3.06% (30 runs sampled)
Parallel limit (fs, 10) x 6.07 ops/sec ±2.14% (33 runs sampled)
Parallel limit (fs, 50) x 5.99 ops/sec ±2.24% (33 runs sampled)
Parallel limit (fs, 100) x 5.48 ops/sec ±7.40% (31 runs sampled)
Parallel limit (gfs, 10) x 5.93 ops/sec ±2.04% (33 runs sampled)
Parallel limit (gfs, 50) x 5.68 ops/sec ±2.91% (31 runs sampled)
Parallel limit (gfs, 100) x 5.74 ops/sec ±2.23% (32 runs sampled)

No lstat using failing readdir with a check for error.code === 'ENOTDIR'
https://github.com/kmalakoff/walk-filtered/tree/no-stats

Default options x 1.32 ops/sec ±7.84% (11 runs sampled)
Serial (fs) x 0.62 ops/sec ±19.78% (8 runs sampled)
Parallel (fs) x 1.15 ops/sec ±2.16% (10 runs sampled)
Parallel (gfs) x 1.12 ops/sec ±2.33% (10 runs sampled)
Parallel limit (fs, 10) x 1.21 ops/sec ±2.61% (11 runs sampled)
Parallel limit (fs, 50) x 1.22 ops/sec ±2.08% (10 runs sampled)
Parallel limit (fs, 100) x 1.19 ops/sec ±2.35% (10 runs sampled)
Parallel limit (gfs, 10) x 1.24 ops/sec ±1.50% (11 runs sampled)
Parallel limit (gfs, 50) x 1.22 ops/sec ±2.19% (10 runs sampled)
Parallel limit (gfs, 100) x 1.20 ops/sec ±2.98% (10 runs sampled)

always begin with a lstat, etc from above
https://github.com/kmalakoff/walk-filtered/tree/optimizations

Default options x 5.98 ops/sec ±3.07% (33 runs sampled)
Serial (fs) x 3.31 ops/sec ±1.53% (21 runs sampled)
Parallel (fs) x 5.67 ops/sec ±2.65% (32 runs sampled)
Parallel (gfs) x 5.55 ops/sec ±3.07% (31 runs sampled)
Parallel limit (fs, 10) x 5.95 ops/sec ±2.84% (33 runs sampled)
Parallel limit (fs, 50) x 5.79 ops/sec ±3.65% (32 runs sampled)
Parallel limit (fs, 100) x 5.87 ops/sec ±2.68% (32 runs sampled)
Parallel limit (gfs, 10) x 5.88 ops/sec ±2.52% (33 runs sampled)
Parallel limit (gfs, 50) x 5.83 ops/sec ±3.25% (32 runs sampled)
Parallel limit (gfs, 100) x 5.71 ops/sec ±2.92% (32 runs sampled)

The original still seems to perform the best. Any ideas? Feel free to restructure and submit a pull request to try to beat it.

kmalakoff · 2020-04-21T05:07:36Z

I've published a refactored library using an asyncIterator API. See fs-iterator

Also, I ran performance tests against readdirp.

es128 mentioned this issue Mar 11, 2016

Mac: High CPU usage for large folders paulmillr/chokidar#447

Closed

kmalakoff added a commit that referenced this issue Jan 3, 2018

Tried advice from: #2

5be7015

kmalakoff closed this as completed Apr 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

minimize fs calls #2

minimize fs calls #2

es128 commented Dec 10, 2015

bpasero commented Mar 11, 2016

kmalakoff commented Jan 3, 2018 •

edited

kmalakoff commented Apr 21, 2020

minimize fs calls #2

minimize fs calls #2

Comments

es128 commented Dec 10, 2015

bpasero commented Mar 11, 2016

kmalakoff commented Jan 3, 2018 • edited

kmalakoff commented Apr 21, 2020

kmalakoff commented Jan 3, 2018 •

edited