Cookbook ideas for walkdir crate #112

Closed
dtolnay opened this Issue May 17, 2017 · 25 comments

10 participants
@dtolnay
Member

dtolnay commented May 17, 2017

Come up with ideas for nice introductory examples of using walkdir, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment here with your ideas! You don't necessarily have to write the example code yourself but PRs are always welcome!

@budziq

Collaborator

budziq commented May 18, 2017

  • recursively find all files with given predicate (date / extension / size / executable flag)
  • find file duplicates in directory recursively
@dtolnay

Member

dtolnay commented May 18, 2017

Thanks! I agree both of those would make great examples.

@Michael-F-Bryan

Contributor

Michael-F-Bryan commented May 20, 2017

What about recursively walking the directory structure and printing it to the screen somewhat like the tree command does?

$ tree src
src
├── document.rs
 ...
├── section.rs
└── visitor
    ├── mod.rs
    └── printer.rs
@budziq

Collaborator

budziq commented May 20, 2017

  • recursively calculate file and directory sizes at given min and max depth
  • sort files by a complex predicate effectively
@brson

Contributor

brson commented May 27, 2017

Maybe just "recursively calculate size of all files in a directory".

  • Walk over a directory while skipping dotfiles. Common example featured in the walkdir docs.
@Henning-K

Henning-K commented Jun 1, 2017

How about (recursively) counting up all file types/endings from current directory on and printing them to the screen?
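
A rough sketch of what that counting could look like, using only the standard library rather than walkdir so that it stands alone. The function name `count_extensions` is made up for illustration, and files without an extension are grouped under the empty string, which is an assumption rather than anything proposed in this thread.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Recursively counts regular files under `dir`, grouped by extension.
/// Files without an extension are counted under the empty string.
/// (Hypothetical helper; symlinks are neither followed nor counted.)
fn count_extensions(dir: &Path, counts: &mut BTreeMap<String, usize>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // `DirEntry::file_type` does not follow symlinks.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            count_extensions(&path, counts)?;
        } else if file_type.is_file() {
            let ext = path
                .extension()
                .map(|e| e.to_string_lossy().into_owned())
                .unwrap_or_default();
            *counts.entry(ext).or_insert(0) += 1;
        }
    }
    Ok(())
}
```

Calling it with an empty `BTreeMap` and then iterating over the map prints the extensions in sorted order, which is why a `BTreeMap` is used here instead of a `HashMap`.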

@clarcharr

clarcharr commented Jun 1, 2017

Searching for all Rust files in a directory, maybe? I do that with the collapse-crate crate.

@cetra3

Contributor

cetra3 commented Jun 1, 2017

Word count of a directory!

@vandenoever

vandenoever commented Jun 1, 2017

Compare two directories by using sorted iterators.

    let a = WalkDir::new("dir_a").sort_by(|a,b| a.cmp(b));
    let b = WalkDir::new("dir_b").sort_by(|a,b| a.cmp(b));
    for entry in a {
        ....
    }
@KodrAus

KodrAus commented Jun 1, 2017

From the internals thread

@imbaczek has suggested restructuring a directory based on mtime:

https://github.com/imbaczek/organize-by-mtime

@brson

Contributor

brson commented Jun 1, 2017

Thanks for all the ideas. Let's settle on some and keep moving.

How about (recursively) counting up all file types/endings from current directory on and printing them to the screen?

@Henning-K There's a "recursively find files with given predicate" example here. Seems about what you are after.

Searching for all Rust files in a directory, maybe?

@clarcharr Also seems functionally similar to previous example.

@vandenoever Comparing two directories seems plausible.

I wonder how many recipes we need for walkdir. I think we don't need too many. Most important to show off capabilities of the library for real tasks. The two we have (1 2) actually don't seem to show distinct capabilities of walkdir between them.

The "compare two directories" idea at least shows sort_by.

Walkdir's own docs basically show off filter_entry and follow_links, and amusingly neither of ours does. Can either of our existing examples be modified to use filter_entry instead?

I wonder what the advantage of filter_entry is over into_iter().filter_map(). Hm, I'm a bit confused.

Other walkdir features that seem prominent that we're not showing off are min_depth / max_depth.

@brson

Contributor

brson commented Jun 1, 2017

I added "Recursively calculate file sizes at given depth" to the op based on @budziq since it leverages min_depth / max_depth.
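
To make the depth bounds concrete, here is a minimal std-only sketch of the idea, not walkdir code. It assumes the convention that the root directory is depth 0 and its direct children are depth 1, and the name `total_size_at_depth` is invented for this illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the sizes of regular files whose depth lies in `min_depth..=max_depth`.
/// Assumed convention: `root` is depth 0, its direct children are depth 1.
/// Directory entries themselves are not counted, only the files inside them.
fn total_size_at_depth(root: &Path, min_depth: usize, max_depth: usize) -> io::Result<u64> {
    // `depth` is the depth of the entries listed directly inside `dir`.
    fn visit(dir: &Path, depth: usize, min_depth: usize, max_depth: usize) -> io::Result<u64> {
        if depth > max_depth {
            // Everything below this point is too deep, so stop descending.
            return Ok(0);
        }
        let mut total = 0;
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                total += visit(&entry.path(), depth + 1, min_depth, max_depth)?;
            } else if file_type.is_file() && depth >= min_depth {
                total += entry.metadata()?.len();
            }
        }
        Ok(total)
    }
    visit(root, 1, min_depth, max_depth)
}
```

Note the asymmetry: the upper bound lets the walk stop early, while the lower bound only decides which files are counted, because deeper files still have to be reached through the shallower directories.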

@vandenoever

vandenoever commented Jun 1, 2017

If you want to highlight the efficiency of walkdir vs find, you could compare find | sort vs walkdir.sort_by. The latter is much more efficient in terms of memory and CPU.

@brson

Contributor

brson commented Jun 2, 2017

@BurntSushi is not sure what the benefit of filter_entry over filter_map is.

@budziq

Collaborator

budziq commented Jun 2, 2017

@brson

what the benefit of filter_entry over filter_map is.

filter_entry does not descend into nodes failing the predicate, while filter_map still does.

Can either of our existing examples be modified to use filter_entry instead?

I guess that we could add filter_entry / follow_links / sort_by to the already existing examples with some tweaking.

@KodrAus

KodrAus commented Jun 2, 2017

I wonder what the advantage of filter_entry is over into_iter().filter_map(). Hm, I'm a bit confused.

@brson It looks like there are a few ergonomic differences. filter_entry takes a predicate like:

FnMut(&DirEntry) -> bool

Whereas filter_map would take a predicate like:

FnMut(Result<DirEntry>) -> Option<B>

With filter_entry, if the DirEntry is a directory and the predicate doesn't match, that directory isn't walked, whereas filter/filter_map will still walk it (I think).

@brson

Contributor

brson commented Jun 2, 2017

filter_entry will actually skip subdirectories when it can - if you return false on a directory, the entire subtree is skipped. So it seems like it's a best practice to use it, and we should probably prefer it for the recipes.

@KodrAus

KodrAus commented Jun 2, 2017

So it seems like it's a best practice to use it, and we should probably prefer it for the recipes.

@brson I agree, let's always prefer filter_entry over the standard iterator filters in the cookbook examples, and move the discussion over to the internals thread if we want to explore walkdir's filter_entry vs Iterator's filter/filter_map a bit more.

@budziq

Collaborator

budziq commented Jun 2, 2017

@KodrAus

let's always prefer filter_entry over the standard iterator filters

Please note that filter_entry has different semantics from filter, which might be surprising and unwanted by the user (we might want to filter over files but descend into all dirs, or filter dirs with a separate predicate). These two would coexist. But I kinda find that the filter_entry name breaks the rule of least surprise.
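
The difference can be shown with a small std-only sketch that mimics the two behaviours. It is not walkdir's implementation, and the names `is_hidden`, `walk_pruned` and `walk_then_filter` are made up for the illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical predicate helper: true if the final path component starts with a dot.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map(|name| name.starts_with('.'))
        .unwrap_or(false)
}

/// Pruning walk (the filter_entry-like behaviour): an entry failing `keep`
/// is not yielded, and if it is a directory its subtree is never read.
fn walk_pruned(
    dir: &Path,
    keep: &dyn Fn(&Path) -> bool,
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if !keep(&path) {
            continue; // skip the entry and everything below it
        }
        let is_dir = entry.file_type()?.is_dir();
        out.push(path.clone());
        if is_dir {
            walk_pruned(&path, keep, out)?;
        }
    }
    Ok(())
}

/// Filtering after the fact (the plain filter-like behaviour): every
/// directory is descended into, and `keep` only decides what is reported.
fn walk_then_filter(dir: &Path, keep: &dyn Fn(&Path) -> bool) -> io::Result<Vec<PathBuf>> {
    let mut all = Vec::new();
    walk_pruned(dir, &|_: &Path| true, &mut all)?;
    all.retain(|p| keep(p.as_path()));
    Ok(all)
}
```

With a predicate such as `|p: &Path| !is_hidden(p)` on a tree containing `.git/config` and `src/main.rs`, the pruning walk reports only `src` and `src/main.rs`, while the filter-afterwards version also reports `.git/config`, because `config` itself is not a dotfile and its parent directory was still walked.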

@KodrAus

KodrAus commented Jun 2, 2017

@brson Should I raise another issue in this repo for using filter_entry in our walkdir samples?

@brson

Contributor

brson commented Jun 2, 2017

@budziq These semantic points would be good to point out in the recipe text, and it would be good to have some pretext in the cookbook for one example needing the short-circuiting behavior and another needing to avoid the short-circuiting behavior. Can we think of how to rephrase one of the existing examples to want the short-circuiting, and the other to not, or otherwise come up with a new idea that fulfills these two criteria?

@KodrAus yes please. But do capture @budziq's point about the semantic differences not being clear cut.

@brson brson closed this Jun 2, 2017

@brson brson reopened this Jun 2, 2017

@brson

Contributor

brson commented Jun 2, 2017

I guess that we could add filter_entry/ follow_links / sort_by to the already existing examples with some tweaking.

@budziq hm, I feel like "Recursively find all files with given predicate" is pretty great as stated - it is a really simple, common task. Let's not overcomplicate that one.

filter_entry would work well for a "Traverse directories while skipping dotfiles" example, which also skips dot-directories. I feel pretty good about this as a recipe. Common task. The text can explain the short-circuiting behavior on dot-directories. What do you think?

Per @BurntSushi's comment about follow_links, it does seem like a good one to get in an example, with some text color about how it normalizes behavior across platforms.

I like the idea of tweaking "Recursively find duplicate file names" to follow symlinks - seems like a decent way to extend that example to show some unique functionality. Though "Recursively find duplicate file names while following symlinks" doesn't strike me as the perfect recipe - what do duplicate file names have to do with symlinks? Still, I think it is ok.

@BurntSushi

Member

BurntSushi commented Jun 2, 2017

Though "Recursively find duplicate file names while following symlinks" doesn't strike me as the perfect recipe - what does duplicate file names have to do with symlinks

Well, it's not really about deduplication. You have two choices when recursively traversing a directory while following symlinks, where symlinks are used to create a loop:

  1. Loop forever.
  2. Detect loops, return an error, and refuse to continue descending ad infinitum.

This actually happens. Example on my system:

$ find -L /proc
...
find: File system loop detected; ‘/proc/self/task/18490/root/dev/fd/3/clones/fts-rs/test/cyclic/cyclic’ is part of the same file system loop as ‘/proc/self/task/18490/root/dev/fd/3/clones/fts-rs/test/cyclic’.
...

The only way to detect loops is to determine whether path X and path Y point to the same file. By its very nature, X != Y, so you can't just compare file names.
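
As an illustration of that last point, here is a minimal std-only sketch that resolves the paths before comparing them. It assumes that comparing the results of `std::fs::canonicalize` is an acceptable stand-in for "same file"; walkdir itself may well use a different mechanism. The name `resolves_to_ancestor` is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if `path`, once fully resolved, is the same location as one
/// of its own ancestors, i.e. descending into it would revisit a directory
/// that is already on the current walk.
/// Assumption: equal canonical paths are treated as "the same file".
fn resolves_to_ancestor(path: &Path) -> io::Result<bool> {
    let target = fs::canonicalize(path)?;
    // `ancestors()` yields `path` itself first, so skip that one.
    for ancestor in path.ancestors().skip(1) {
        if ancestor.as_os_str().is_empty() {
            continue; // a relative path ends with an empty ancestor
        }
        if fs::canonicalize(ancestor)? == target {
            return Ok(true);
        }
    }
    Ok(false)
}
```

A walker that follows symlinks could run a check along these lines before descending into a directory and report an error instead of recursing, which is option 2 in the list above.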

@budziq

Collaborator

budziq commented Jun 2, 2017

"Recursively find duplicate file names while following symlinks" doesn't strike me as the perfect recipe

Well, the "while following symlinks" part does indeed sound artificial at best. I would suggest keeping the example title as it was, "Recursively find duplicate file names", and just adding an implementation and textual description regarding symlinks.

@BurntSushi

Member

BurntSushi commented Jun 2, 2017

I would object to calling anything that walkdir does "recursively find duplicate file names." That's not what it's doing. It's specifically detecting loops in the file system when it's asked to traverse symbolic links. For example, if two distinct symbolic links point to the same file, then walkdir will happily emit both of them.

Ignore this comment. Sorry. Serves me right for not actually looking at the cookbook example. :-(

@budziq budziq closed this Sep 24, 2017
