Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #535 --prune option #546

Closed
wants to merge 3 commits into from
Closed

Conversation

neuronull
Copy link

  • Added --prune option which will not descend into directories
    that are a match on pattern.
  • Added test to cover --prune option.

- Added --prune option which will not descend into directories
  that are a match on pattern.
- Added test to cover --prune option.
@neuronull
Copy link
Author

$ tree
.
├── bar
│   └── zap
│       ├── foo
│       │   └── zip
│       └── zig
└── foo
    ├── bar
    └── foo

7 directories, 1 file

$ fd foo
bar/zap/foo
foo
foo/foo

$ fd --prune foo
bar/zap/foo
foo

$ fd -P foo
bar/zap/foo
foo

src/walk.rs Outdated Show resolved Hide resolved
src/app.rs Outdated Show resolved Hide resolved
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your contribution.

If we are going to introduce this new option in fd, I would definitely like to see it more thoroughly tested.

I certainly believe that the current implementation does not really work in all cases. We need to check possible interference with other command line options.

For example, consider what happens if we use --prune in combination with --type <filetype>. If a user specifies --type file, only files will be matched. With the current implementation, this would mean that --prune would not work anymore (I think).

There are also other command-line flags where we should check for interference:

  • -p, --full-path
  • -e, --extension <ext>... and other filters such as -S, --size <size>..., --changed-within <date|dur>, ...
  • -d, --max-depth <depth>
  • -E, --exclude <pattern>...
  • How does this all work with symlinks? With and without -L, --follow.

@neuronull
Copy link
Author

Thank you very much for your contribution.

Thanks for the feedback!

For example, consider what happens if we use --prune in combination with --type <filetype>. If a user specifies --type file, only files will be matched. With the current implementation, this would mean that --prune would not work anymore (I think).

You are correct, I just checked this case against the current implementation and --type + --prune can result in --prune not pruning matched directories.
Will address this in next iteration.

There are also other command-line flags where we should check for interference:

I will exercise these combinations and include some #test cases for them.

- Fixed logic for --prune to consider other command-line flags.
- Added additional test cases to cover combinations of --prune
  and other command-line flags.
@neuronull
Copy link
Author

For example, consider what happens if we use --prune in combination with --type <filetype>. If a user specifies --type file, only files will be matched. With the current implementation, this would mean that --prune would not work anymore (I think).

There are also other command-line flags where we should check for interference:

  • -p, --full-path
  • -e, --extension <ext>... and other filters such as -S, --size <size>..., --changed-within <date|dur>, ...
  • -d, --max-depth <depth>
  • -E, --exclude <pattern>...
  • How does this all work with symlinks? With and without -L, --follow.

In diff2, I attempted to cover all of these items.
One thing I couldn't quite figure out was how symlink testing is done.
I saw there was some symlink in the tests.rs file but was not able to figure out how exactly that works.
For the symlinks and --follow, I performed this manual test:

$ ln -s zap sym_zap
$ tree
.
├── a.foo
├── sym_zap -> zap
└── zap
    ├── bar
    │   └── zot
    ├── b.foo
    └── foo
        └── c.foo

4 directories, 4 files

$ fd -t f foo
a.foo
zap/b.foo
zap/foo/c.foo

$ fd --prune -t f foo
a.foo
zap/b.foo

$ fd --follow --prune -t f foo
a.foo
sym_zap/b.foo
zap/b.foo

@neuronull neuronull requested a review from sharkdp March 15, 2020 22:40
@neuronull
Copy link
Author

Sorry a n00b question here. Why you didn't use and && instead of nested if condition. something like what would C programmer do.

(Rust noob here.)
Not doing config.prune && was purely an oversight and my bad. Not combining the second and third of that nested ifs with entry.file_type().map_or(false, |e| e.is_dir()) was purely due to my newness to the language and not being 100% adept with its syntax and flow.

src/walk.rs Outdated Show resolved Hide resolved
tests/tests.rs Outdated Show resolved Hide resolved
tests/tests.rs Outdated Show resolved Hide resolved
tests/tests.rs Outdated Show resolved Hide resolved
tests/tests.rs Outdated Show resolved Hide resolved
tests/tests.rs Outdated Show resolved Hide resolved
@neuronull neuronull requested a review from sharkdp March 23, 2020 01:02
// when pruning, don't descend into matching dirs
// note that prune requires pattern
let walk_action = if config.prune
&& pattern.as_str().len() != 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why this is necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it so that the prune is only evaluated if there is a pattern specified, thinking that the prune option should only be used with a pattern. (and I added a test case for this). But, I suppose it could be allowed and then the expected behavior would be that no directories would be descended into. What do you think/prefer?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, "not having a pattern" does not only refer to cases where --type is used, right? It could also be a unconstrained search, a --size search or a --changed-{within,before} search.

I agree that --prune maybe doesn't make sense in these cases, but we should take a closer look, I think. We could use the find behavior as a guideline.

If two options in combination do not make sense, we should also think about disallowing that via .conflicts_with.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, "not having a pattern" does not only refer to cases where --type is used, right? It could also be a unconstrained search, a --size search or a --changed-{within,before} search.

Good point! I hadn't considered that...

I agree that --prune maybe doesn't make sense in these cases, but we should take a closer look, I think. We could use the find behavior as a guideline.

Lets see. If we are using find as an example, it seems like the -prune option actually takes an argument on what to prune.
(I found this explanation helpful: https://stackoverflow.com/questions/1489277/how-to-use-prune-option-of-find-in-sh)

Currently this implementation uses the <pattern> as what to also prune. But I suppose we could make --prune take an argument, stating what to prune. In this case, the user has more control over what to prune. for example:

$ fd foo --prune=bar

, will look for matches with foo but not descend into dirs matching bar.

This would eliminate the need to do the check for pattern on line 369.
What do you think?
Yet it almost sounds like the functionality of --exclude <pattern> but specific to directories. hmmmm... are they too similar?

If two options in combination do not make sense, we should also think about disallowing that via .conflicts_with.

That is good to know about!
After reading your comment, and looking at the find documentation
http://man7.org/linux/man-pages/man1/find.1.html

  • The find -prune seems to conflict with -depth, so that could be something to consider, to mark conflicts _with on --max-depth?

  • I'm not convinced we'd need to mark prune as conflicts with --size or --changed-{} , if we added an argument to it as I proposed above. Someone could want to see files of a certain size but omitting some directories, maybe? But if we don't go with adding an argument to prune, marking as conflicts with --size and --changed-{} might be a better way to enforce the need for a to prune on.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet it almost sounds like the functionality of --exclude but specific to directories. hmmmm... are they too similar?

FWIW, the -E option invokes ripgreps gitignore style excludes. To ignore a directory -E foo/ should work. Eg.

[squirt ~/tmp/tests]
$ tree -p
.
├── [drwxr-xr-x]  bar
│   └── [drwxr-xr-x]  lower_dir
│       └── [-rw-r--r--]  bar
└── [drwxr-xr-x]  foo
    └── [drwxr-xr-x]  lower_dir
        └── [-rw-r--r--]  bar

4 directories, 2 files
[squirt ~/tmp/tests]
$ fd -E bar/
foo
foo/lower_dir
foo/lower_dir/bar

@sharkdp
Copy link
Owner

sharkdp commented Mar 23, 2020

Since this adds a new condition in the "hot path" of fd, we should check whether this leads to a performance regression (see regression.sh in https://github.com/sharkdp/fd-benchmarks), if --prune is not enabled. Branch prediction will probably make this negligible, but it's better to measure it.

@neuronull
Copy link
Author

Since this adds a new condition in the "hot path" of fd, we should check whether this leads to a performance regression (see regression.sh in https://github.com/sharkdp/fd-benchmarks), if --prune is not enabled. Branch prediction will probably make this negligible, but it's better to measure it.

I followed the instructions for generating the report (attached)
report.txt

@neuronull neuronull requested a review from sharkdp March 24, 2020 22:14
@sharkdp sharkdp mentioned this pull request Mar 28, 2020
Comment on lines +355 to +358
doc!(h, "prune"
, "Do not descend into matched directories"
, "When a directory matches the pattern, fd does not descend into it but still shows \
all matches at the same level as that directory.");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this is slightly ambiguous for people who aren't familiar with find's -prune option. In particular, I think it should be more explicitly stated that the matched directories are included in the output. Maybe something like:

"Do not descend into matched directories"
"When a directory matches the pattern, fd does not descend into it. The directory itself is not excluded."

Copy link

@jonathan-s jonathan-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this locally, and it works according to my expectations. So @neuronull I would really encourage you to stick it out with this PR.

For reference this is what I ran.

fd 'node_modules' . -t d --prune -I -E Library

@jonathan-s
Copy link

jonathan-s commented Jun 8, 2020

And to help with clarity perhaps @sharkdp could make clear what still needs to be done on this PR to get this merged? I would love to see this in.

@sharkdp
Copy link
Owner

sharkdp commented Jun 11, 2020

And to help with clarity perhaps @sharkdp could make clear what still needs to be done on this PR

This is a complex new feature. What is much more important for me than getting this in early is that we make sure that (1) it doesn't break anything (2) it doesn't make any existing functionality slower, (3) it works as expected and the design is well thought-out such that we don't have to rewrite it immediately after shipping it to the first users (4) it plays nicely with all kinds of existing features.

Also, this PR needs to be in a mergable state.

The four points listed are prioritized. We can merge this PR as soon as I have a feeling that we have enough evidence for all the listed points. Any help is very much appreciated. Examples:

For point (1): this is probably the hardest. We have a large test coverage but things can always go wrong. Thinking hard about how the new code could break things and writing new, sensible tests is one thing to help with this point.

For point (2): We have some benchmark results but we may need to repeat them after the PR is in a mergeable state again.

For point (3) and (4): What do we actually want from this feature? How do we expect it to behave it different situations? How do we expect it to behave in combination with other features? We have posed some of these questions here (and not all of them have been answered), but I'm sure there are more.

See also: #613

@sharkdp
Copy link
Owner

sharkdp commented Jul 26, 2020

closing for now (cleaning up list of open PRs).

@sharkdp sharkdp closed this Jul 26, 2020
@reima reima mentioned this pull request Oct 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants