Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: buffer dir entries in opendir() #29893

Closed
wants to merge 6 commits into from

Conversation

@addaleax
Copy link
Member

addaleax commented Oct 8, 2019

Read up to 32 directory entries in one batch when dir.readSync()
or dir.read() are called.

This increases performance significantly, although it introduces
quite a bit of edge case complexity.

                                                             confidence improvement accuracy (*)    (**)    (***)
 fs/bench-opendir.js mode='async' dir='lib' n=100                  ***    155.93 %      ±30.05% ±40.34%  ±53.21%
 fs/bench-opendir.js mode='async' dir='test/parallel' n=100        ***    479.65 %      ±56.81% ±76.47% ±101.32%
 fs/bench-opendir.js mode='sync' dir='lib' n=100                           10.38 %      ±14.39% ±19.16%  ±24.96%
 fs/bench-opendir.js mode='sync' dir='test/parallel' n=100         ***     63.13 %      ±12.84% ±17.18%  ±22.58%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
@addaleax addaleax requested a review from Fishrock123 Oct 8, 2019
@addaleax addaleax added the performance label Oct 8, 2019
@nodejs-github-bot

This comment has been minimized.

@mscdex

This comment has been minimized.

Copy link
Contributor

mscdex commented Oct 9, 2019

So this means that instead of 1 Dirent the APIs will now return an array of Dirents? If so, doesn't the documentation need to be updated? Also, wouldn't it be a good idea to have the number be configurable (via options passed to dir.read()/dir.readSync())?

@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Oct 9, 2019

So this means that instead of 1 Dirent the APIs will now return an array of Dirents? If so, doesn't the documentation need to be updated?

No, the API is unaffected and will still only return one item at a time.

Also, wouldn't it be a good idea to have the number be configurable (via options passed to dir.read()/dir.readSync()`)?

Maybe? If you intentionally want a large number of directory entries at once you probably wouldn’t use the streaming API…

src/node_dir.cc Outdated Show resolved Hide resolved
src/node_dir.h Show resolved Hide resolved
@cjihrig
cjihrig approved these changes Oct 9, 2019
@nodejs-github-bot

This comment has been minimized.

Copy link
Member

BridgeAR left a comment

This is pretty great work!

benchmark/fs/bench-opendir.js Outdated Show resolved Hide resolved
lib/internal/fs/dir.js Outdated Show resolved Hide resolved
lib/internal/fs/dir.js Outdated Show resolved Hide resolved
src/node_dir.cc Outdated Show resolved Hide resolved
Fishrock123 added a commit to Fishrock123/node that referenced this pull request Oct 9, 2019
This is unlikely to be necessary in any case, and causes much
unwarrented complexity when implementing further
optimizations.

Refs: nodejs#29893 (comment)
@Fishrock123 Fishrock123 mentioned this pull request Oct 9, 2019
3 of 3 tasks complete
@addaleax addaleax added blocked and removed author ready labels Oct 9, 2019
Fishrock123 added a commit to Fishrock123/node that referenced this pull request Oct 9, 2019
This is unlikely to be necessary in any case, and causes much
unwarrented complexity when implementing further
optimizations.

Refs: nodejs#29893 (comment)
Fishrock123 added a commit to Fishrock123/node that referenced this pull request Oct 9, 2019
This is unlikely to be necessary in any case, and causes much
unwarrented complexity when implementing further
optimizations.

Refs: nodejs#29893 (comment)
Fishrock123 added a commit that referenced this pull request Oct 9, 2019
This is unlikely to be necessary in any case, and causes much
unwarrented complexity when implementing further
optimizations.

Refs: #29893 (comment)

PR-URL: #29908
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
@Fishrock123

This comment has been minimized.

Copy link
Member

Fishrock123 commented Oct 9, 2019

Ok I landed #29908, lets see what this looks like without all that extra encoding nonsense. 😅

@addaleax addaleax force-pushed the addaleax:opendir-buffering branch from 0ccb46c to 4165094 Oct 9, 2019
@addaleax addaleax removed the blocked label Oct 9, 2019
@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Oct 9, 2019

@cjihrig @devnexen @Fishrock123 I’ve rebased this and it’s quite a bit simpler now, feel free to take another look

@nodejs-github-bot

This comment has been minimized.

@jasnell

This comment has been minimized.

Copy link
Member

jasnell commented Oct 9, 2019

Just thinking about it... given that opendir() is an async iteration over the directory entries, it may be good to document what happens when new entries are created while the iterator is still iterating...

e.g.

const fs = require('fs');

let n = 0;

async function print(path) {
  const dir = await fs.promises.opendir(path);
  for await (const dirent of dir) {
    fs.mkdirSync(`./foo${n++}`);
    console.log(dirent.name);
  }
}
print('./').catch(console.error);

and

const fs = require('fs');

const dir = fs.opendirSync(__dirname);

console.log(dir.readSync());
fs.mkdirSync('foo');
console.log(dir.readSync());

(the answer, of course, is that the newly created entries are not included in the iteration, but that should be documented :-) ...)

@jasnell
jasnell approved these changes Oct 9, 2019
BridgeAR added a commit that referenced this pull request Oct 10, 2019
This is unlikely to be necessary in any case, and causes much
unwarrented complexity when implementing further
optimizations.

Refs: #29893 (comment)

PR-URL: #29908
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Copy link
Member

Fishrock123 left a comment

(Also if you could amend the commit to note that it it also includes misc cleanup that would be ideal)

lib/internal/fs/dir.js Outdated Show resolved Hide resolved
@Fishrock123

This comment has been minimized.

Copy link
Member

Fishrock123 commented Oct 10, 2019

@jasnell I believe that also applies to uv_fs_scandir aka fs.readdir(), since other threads could operate on the directory on disk between I/O calls.

addaleax added 3 commits Oct 8, 2019
Read up to 32 directory entries in one batch when `dir.readSync()`
or `dir.read()` are called.

This increases performance significantly, although it introduces
quite a bit of edge case complexity.

                                                                 confidence improvement accuracy (*)    (**)    (***)
     fs/bench-opendir.js mode='async' dir='lib' n=100                  ***    155.93 %      ±30.05% ±40.34%  ±53.21%
     fs/bench-opendir.js mode='async' dir='test/parallel' n=100        ***    479.65 %      ±56.81% ±76.47% ±101.32%
     fs/bench-opendir.js mode='sync' dir='lib' n=100                           10.38 %      ±14.39% ±19.16%  ±24.96%
     fs/bench-opendir.js mode='sync' dir='test/parallel' n=100         ***     63.13 %      ±12.84% ±17.18%  ±22.58%
@addaleax addaleax force-pushed the addaleax:opendir-buffering branch from 4165094 to c66840d Oct 10, 2019
@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Oct 10, 2019

Just thinking about it... given that opendir() is an async iteration over the directory entries, it may be good to document what happens when new entries are created while the iterator is still iterating...

[...]

(the answer, of course, is that the newly created entries are not included in the iteration, but that should be documented :-) ...)

That is not the answer, sorry – POSIX says:

If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.

I think it’s fair for our docs to reflect that, though. I’ve pushed c66840d for that.

@jasnell I believe that also applies to uv_fs_scandir aka fs.readdir(), since other threads could operate on the directory on disk between I/O calls.

(I assume the same goes for that, too.)

(Also if you could amend the commit to note that it it also includes misc cleanup that would be ideal)

I’m not really seeing anything unrelated to the buffering, besides maybe removing the GetAsyncWrap() fn?

@Trott Trott requested a review from Fishrock123 Oct 11, 2019

read();
});
} {

This comment has been minimized.

Copy link
@Fishrock123

Fishrock123 Oct 11, 2019

Member

@addaleax This patch seems wrong? There should be an else here... I think the benchmark you ran included readSync() along with the callback one...

This comment has been minimized.

Copy link
@addaleax

addaleax Oct 11, 2019

Author Member

Oops, yes … funny the linter wouldn’t complain about this? The correct benchmark results for changing to setImmediate are

                                                                confidence improvement accuracy (*)    (**)   (***)
 fs/bench-opendir.js mode='callback' dir='lib' n=100                           3.40 %      ±15.95% ±21.22% ±27.62%
 fs/bench-opendir.js mode='callback' dir='test/parallel' n=100        ***    -43.16 %      ±10.70% ±14.27% ±18.66%

This comment has been minimized.

Copy link
@Fishrock123

Fishrock123 Oct 11, 2019

Member

This result seems strange to me, how is that worse than before?

Regardless though after further though I think that nextTick is the reasonable thing to do.

This comment has been minimized.

Copy link
@addaleax

addaleax Oct 11, 2019

Author Member

@Fishrock123 I think it makes sense – as you noted, the readSync() part was run unconditionally before, so its (unchanged) performance was included in the -30 %. If we only benchmark what has been slowed down, seeing more negative impact seems about right to me.

@addaleax addaleax force-pushed the addaleax:opendir-buffering branch from c23f280 to 751a8b6 Oct 11, 2019
@nodejs-github-bot

This comment has been minimized.

addaleax added a commit that referenced this pull request Oct 11, 2019
Read up to 32 directory entries in one batch when `dir.readSync()`
or `dir.read()` are called.

This increases performance significantly, although it introduces
quite a bit of edge case complexity.

                                                                 confidence improvement accuracy (*)    (**)    (***)
     fs/bench-opendir.js mode='async' dir='lib' n=100                  ***    155.93 %      ±30.05% ±40.34%  ±53.21%
     fs/bench-opendir.js mode='async' dir='test/parallel' n=100        ***    479.65 %      ±56.81% ±76.47% ±101.32%
     fs/bench-opendir.js mode='sync' dir='lib' n=100                           10.38 %      ±14.39% ±19.16%  ±24.96%
     fs/bench-opendir.js mode='sync' dir='test/parallel' n=100         ***     63.13 %      ±12.84% ±17.18%  ±22.58%

PR-URL: #29893
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
addaleax added a commit that referenced this pull request Oct 11, 2019
PR-URL: #29893
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Oct 11, 2019

Landed in 5c93aab 7812a61

@addaleax addaleax closed this Oct 11, 2019
@addaleax addaleax deleted the addaleax:opendir-buffering branch Oct 11, 2019
targos added a commit that referenced this pull request Nov 8, 2019
Read up to 32 directory entries in one batch when `dir.readSync()`
or `dir.read()` are called.

This increases performance significantly, although it introduces
quite a bit of edge case complexity.

                                                                 confidence improvement accuracy (*)    (**)    (***)
     fs/bench-opendir.js mode='async' dir='lib' n=100                  ***    155.93 %      ±30.05% ±40.34%  ±53.21%
     fs/bench-opendir.js mode='async' dir='test/parallel' n=100        ***    479.65 %      ±56.81% ±76.47% ±101.32%
     fs/bench-opendir.js mode='sync' dir='lib' n=100                           10.38 %      ±14.39% ±19.16%  ±24.96%
     fs/bench-opendir.js mode='sync' dir='test/parallel' n=100         ***     63.13 %      ±12.84% ±17.18%  ±22.58%

PR-URL: #29893
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
targos added a commit that referenced this pull request Nov 8, 2019
PR-URL: #29893
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
targos added a commit that referenced this pull request Nov 10, 2019
Read up to 32 directory entries in one batch when `dir.readSync()`
or `dir.read()` are called.

This increases performance significantly, although it introduces
quite a bit of edge case complexity.

                                                                 confidence improvement accuracy (*)    (**)    (***)
     fs/bench-opendir.js mode='async' dir='lib' n=100                  ***    155.93 %      ±30.05% ±40.34%  ±53.21%
     fs/bench-opendir.js mode='async' dir='test/parallel' n=100        ***    479.65 %      ±56.81% ±76.47% ±101.32%
     fs/bench-opendir.js mode='sync' dir='lib' n=100                           10.38 %      ±14.39% ±19.16%  ±24.96%
     fs/bench-opendir.js mode='sync' dir='test/parallel' n=100         ***     63.13 %      ±12.84% ±17.18%  ±22.58%

PR-URL: #29893
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
targos added a commit that referenced this pull request Nov 10, 2019
PR-URL: #29893
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.