New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs: add recursive opendir/readdir #41439
fs: add recursive opendir/readdir #41439
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed behavior looks good to me, although the output design can be changed for the sake of easier control and management over the output for developers, but the current output design can also stay if desired.
Imagine a file structure as the following:
foo
|_ bar
|_ baz
|_ a.txt
|_ fizz
|_ b.txt
|_ fuzz
|_ c.txt
|_ d.txt
Design 1 (The proposed design aka the current design, only resolve paths to files):
import { readdirSync } from 'node:fs';
const output = readdirSync('./foo', { recursive: true });
console.log(output);
// ['bar/a.txt', 'fizz/b.txt', 'fuzz/c.txt', 'd.txt']
Design 2 (Resolve paths to both directories and files):
import { readdirSync } from 'node:fs';
const output = readdirSync('./foo', { recursive: true });
console.log(output);
// ['bar', 'bar/baz', 'bar/a.txt', 'fizz', 'fizz/b.txt', 'fuzz', 'fuzz/c.txt', 'd.txt']
Design 3 (Instead of returning an array of paths, return array of objects with information to each recursively read directories):
import { readdirSync } from 'node:fs';
const output = readdirSync('./foo', { recursive: true });
console.log(output);
/*
[
{
type: 'directory',
name: 'bar',
children: [
{
type: 'directory',
name: 'baz',
children: []
},
{
type: 'file',
name: 'a.txt'
}
]
},
{
type: 'directory',
name: 'fizz',
children: [
{
type: 'file',
name: 'b.txt'
}
]
},
{
type: 'directory',
name: 'fuzz',
children: [
{
type: 'file',
name: 'c.txt'
}
]
},
{
type: 'file',
name: 'd.txt'
}
]
*/
Also remember to fix the formatting issues reported by our linter(s) :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the result is potentially huge and the process itself might take virtually forever, especially on network drives etc, we should probably encourage streaming APIs instead of returning arrays. If we just return arrays, the process might crash after hours or so when the array allocations run into an out-of-memory situation, never having produced any results.
Can you fix linting please? It's hard to review with all the lint comments. Thanks! |
I am wondering if it wouldn't be nice for users to change the recursive option to be either await fs.readdir('.', { recursive: 'breadth-first' }) // ['a', 'b', 'a/x']
await fs.readdir('.', { recursive: 'depth-first' }) // ['a', 'a/x', b']
await fs.readdir('.', { recursive: true }) // throw new Error('Specify recursive as either "breadth-first" or "depth-first")
A Considering |
Thanks for all the great feedback folks. I'm going to update the tests to exemplify an API design for this feature that includes things like:
|
We also discussed on the call to keep this initial implementation simple. Omit things like a I personally would like there to be some sort of default behavior when using just |
I would prefer to also see |
Summary of recent updates: Default case uses just I included examples with the I included another way to configure the {
result?: 'stream',
algorithm?: 'depth-first' | 'breadth-first'
} Unless folks can come up with another I did not update the promise tests - want to focus on the API shape for now. I did not include async tests for all of the sync operations. After API agreement I'll add those in. I did not include error case tests yet. Will think about that more later |
When implementing a readdir stream a while back I had to do Different thing: Would it be a good idea to support abort signals? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are looking quite clean, good start. Added a recommendation that we add a test for symlink loops.
It might be worth looking at the test suite for fs-extra copy/recursive, they have a bunch of test cases around symlink behavior that we might be able to use for inspiration.
If i'm not mistaken the
Yes, for the async operations we could do this |
After some consideration, I think supporting alternative algorithms is going to make this PR more complicated than I can handle at this time. What would be the best way to defer the algorithm feature of this PR? One option: I can put the test files I've written for them into a new branch and open an issue to track. @bcoe @bnb |
imo getting a basic implementation in and expanding on that with additional algorithms in later PRs would be fine. |
See linked issue #41709 for |
Also for the sake of initial implementation, I'm dropping the promise API. I'd like to see us land the sync/async versions and then progress from there |
@Ethan-Arrowood What about the concerns raised in #41439 (review) and #41439 (review) ? |
Ahh I missed those somehow. My initial thought on the discussion is it seems like a fine idea. Focus on a stream based recursive readdir, and then also add a recursive option to opendir that would return an array. |
Okay new API design for initial implementation:
Future work:
What do we think? |
Does it mean that |
Good catch. I meant that the callback will be passed the same readable that is returned by the sync version |
Would anything really async happen before the stream is constructed and returned in the callback version? |
Not sure. I guess it'd be the difference between opendir and opendirSync under the hood. The promise version of this is simple, I plan on using an async iterable. The sync and callback versions I haven't given tooo much thought beyond implementing a Readable and using _read to exhaust recursive calls to something. |
Then why don't we make |
Ah i may be lost now. opendir just returns a Dir (via callback, promise, or sync) that's an async iterable of Dirents. Are you suggesting that when recursive is enabled, the async iterable will return everything in it as Dirents? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing
Landed in 7b39e80 |
Congrats @Ethan-Arrowood |
Thank you thank you! |
WOOOO CONGRATS @Ethan-Arrowood |
Adds a naive, linear recursive algorithm for the following methods: readdir, readdirSync, opendir, opendirSync, and the promise based equivalents. Fixes: #34992 PR-URL: #41439 Refs: nodejs/tooling#130 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Notable changes: crypto: * update root certificates to NSS 3.89 (Node.js GitHub Bot) #47659 dns: * (SEMVER-MINOR) expose getDefaultResultOrder (btea) #46973 doc: * add KhafraDev to collaborators (Matthew Aitken) #47510 fs: * add recursive option to readdir and opendir (Ethan Arrowood) #41439 * (SEMVER-MINOR) add support for mode flag to specify the copy behavior (Tetsuharu Ohzeki) #47084 stream: * (SEMVER-MINOR) preserve object mode in compose (Raz Luvaton) #47413 test_runner: * (SEMVER-MINOR) support combining coverage reports (Colin Ihrig) #47686 wasi: * (SEMVER-MINOR) make returnOnExit true by default (Michael Dawson) #47390 PR-URL: TODO
Notable changes: assert: * deprecate `CallTracker` (Moshe Atlow) #47740 crypto: * update root certificates to NSS 3.89 (Node.js GitHub Bot) #47659 dns: * (SEMVER-MINOR) expose `getDefaultResultOrder` (btea) #46973 doc: * add KhafraDev to collaborators (Matthew Aitken) #47510 fs: * (SEMVER-MINOR) add `recursive` option to `readdir` and `opendir` (Ethan Arrowood) #41439 * (SEMVER-MINOR) add support for `mode` flag to specify the copy behavior of the `cp` methods (Tetsuharu Ohzeki) #47084 http: * (SEMVER-MINOR) add `highWaterMark` option `http.createServer` (HinataKah0) #47405 stream: * (SEMVER-MINOR) preserve object mode in `compose` (Raz Luvaton) #47413 test_runner: * (SEMVER-MINOR) add `testNamePatterns` to `run` API (Chemi Atlow) #47648 * (SEMVER-MINOR) execute `before` hook on test (Chemi Atlow) #47586 * (SEMVER-MINOR) support combining coverage reports (Colin Ihrig) #47686 wasi: * (SEMVER-MINOR) make `returnOnExit` true by default (Michael Dawson) #47390 PR-URL: #47820
Notable changes: assert: * deprecate `CallTracker` (Moshe Atlow) #47740 crypto: * update root certificates to NSS 3.89 (Node.js GitHub Bot) #47659 dns: * (SEMVER-MINOR) expose `getDefaultResultOrder` (btea) #46973 doc: * add KhafraDev to collaborators (Matthew Aitken) #47510 fs: * (SEMVER-MINOR) add `recursive` option to `readdir` and `opendir` (Ethan Arrowood) #41439 * (SEMVER-MINOR) add support for `mode` flag to specify the copy behavior of the `cp` methods (Tetsuharu Ohzeki) #47084 http: * (SEMVER-MINOR) add `highWaterMark` option `http.createServer` (HinataKah0) #47405 stream: * (SEMVER-MINOR) preserve object mode in `compose` (Raz Luvaton) #47413 test_runner: * (SEMVER-MINOR) add `testNamePatterns` to `run` API (Chemi Atlow) #47648 * (SEMVER-MINOR) execute `before` hook on test (Chemi Atlow) #47586 * (SEMVER-MINOR) support combining coverage reports (Colin Ihrig) #47686 wasi: * (SEMVER-MINOR) make `returnOnExit` true by default (Michael Dawson) #47390 PR-URL: #47820
Notable changes: assert: * deprecate `CallTracker` (Moshe Atlow) #47740 crypto: * update root certificates to NSS 3.89 (Node.js GitHub Bot) #47659 dns: * (SEMVER-MINOR) expose `getDefaultResultOrder` (btea) #46973 doc: * add KhafraDev to collaborators (Matthew Aitken) #47510 fs: * (SEMVER-MINOR) add `recursive` option to `readdir` and `opendir` (Ethan Arrowood) #41439 * (SEMVER-MINOR) add support for `mode` flag to specify the copy behavior of the `cp` methods (Tetsuharu Ohzeki) #47084 http: * (SEMVER-MINOR) add `highWaterMark` option `http.createServer` (HinataKah0) #47405 stream: * (SEMVER-MINOR) preserve object mode in `compose` (Raz Luvaton) #47413 test_runner: * (SEMVER-MINOR) add `testNamePatterns` to `run` API (Chemi Atlow) #47628 * (SEMVER-MINOR) execute `before` hook on test (Chemi Atlow) #47586 * (SEMVER-MINOR) support combining coverage reports (Colin Ihrig) #47686 wasi: * (SEMVER-MINOR) make `returnOnExit` true by default (Michael Dawson) #47390 PR-URL: #47820
Notable changes: assert: * deprecate `CallTracker` (Moshe Atlow) #47740 crypto: * update root certificates to NSS 3.89 (Node.js GitHub Bot) #47659 dns: * (SEMVER-MINOR) expose `getDefaultResultOrder` (btea) #46973 doc: * add KhafraDev to collaborators (Matthew Aitken) #47510 fs: * (SEMVER-MINOR) add `recursive` option to `readdir` and `opendir` (Ethan Arrowood) #41439 * (SEMVER-MINOR) add support for `mode` flag to specify the copy behavior of the `cp` methods (Tetsuharu Ohzeki) #47084 http: * (SEMVER-MINOR) add `highWaterMark` option `http.createServer` (HinataKah0) #47405 stream: * (SEMVER-MINOR) preserve object mode in `compose` (Raz Luvaton) #47413 test_runner: * (SEMVER-MINOR) add `testNamePatterns` to `run` API (Chemi Atlow) #47628 * (SEMVER-MINOR) execute `before` hook on test (Chemi Atlow) #47586 * (SEMVER-MINOR) support combining coverage reports (Colin Ihrig) #47686 wasi: * (SEMVER-MINOR) make `returnOnExit` true by default (Michael Dawson) #47390 PR-URL: #47820
Wow, this looks like a big feature Does this represent a built-in replacement for some use cases of the following libraries? (npm trends)
|
Yes it does! It may not be 1-to-1 replacements but it achieves similar functionality with all of those |
Implements a recursive option for the
readdir
andreaddirSync
methods.Closes: #34992
Ref: nodejs/tooling#130
Starting with tests to just get a sense for how things should behave. Will dive into the implementation next. Any feedback/concerns is welcome at any time.