Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi exec #960

Merged
merged 7 commits into from
Mar 8, 2022
Merged

Multi exec #960

merged 7 commits into from
Mar 8, 2022

Conversation

tmccombs
Copy link
Collaborator

No description provided.

@sharkdp
Copy link
Owner

sharkdp commented Feb 19, 2022

Oh - very cool! Thank you for working on this. Let us know when it's ready for review.

@tmccombs
Copy link
Collaborator Author

One issue here is if you run something like:

fd --exec echo directory  '{//}' \; --exec  echo basename '{/}' \;

then you can end up printing baches of directories then batches of basenames instead of writing the directory and base name together, because the commands are run in parallel.

There are a few things we can do about this:

  1. Do nothing, and warn users that if they care about the order of output they need to use --threads=1
  2. Hold the lock for stdout while running all the --exec commands. Which is not great, since it means commands can't be run in parallel.
  3. Concatenate the output from all commands into a string before writing to stdout. Would require some refactoring.
  4. Create a Vec of the outputs from the commands, then write ouch each entry with the stdout lock held. Would require some refactoring.

I'm leaning towards 3 or 4, what are your thoughts?

@tmccombs
Copy link
Collaborator Author

My latest commit implements 4. All that remains is writing some tests.

@tmccombs
Copy link
Collaborator Author

tmccombs commented Mar 1, 2022

Biggest downsides to using 3 or 4 are that it can result in higher memory usage, and potentially delay output from the first exec command(s). There will also be slightly higher overhead for allocating the buffers. That could probably be mitigated somewhat by using a smallvec instead of a Vec. Or possibly using a thread-local arena.

@tmccombs tmccombs marked this pull request as ready for review March 1, 2022 15:40
@tmccombs tmccombs force-pushed the multi-exec branch 2 times, most recently from ae98814 to 2427b1f Compare March 2, 2022 07:36
@tavianator
Copy link
Collaborator

I think 4 makes sense since we already buffer anyway, might as well buffer everything for each file.

Maybe we should implement a limit to the buffering? Right now fd -x yes will just eat all your memory. But no need to fix that in this PR, it's preexisting.

@sharkdp
Copy link
Owner

sharkdp commented Mar 4, 2022

Thank you, I'm planning to take a look soon.

So that if multiple `--exec` options are given, and the commands are run
in parallel, the buffered output for related commands will be
consecutive.
src/exec/command.rs Outdated Show resolved Hide resolved
@sharkdp
Copy link
Owner

sharkdp commented Mar 6, 2022

I found the following problematic case:

> touch foo
> fd -x echo \; -x
[fd error]: Problem while executing command: Permission denied (os error 13)

(maybe we could improve the error messages here)

Using strace -f -e execve we can see that it's trying to execute:

[pid 107001] execve("./foo", ["./foo"], 0x7ffc494d0718 /* 66 vars */) = -1 EACCES (Permission denied)

If foo is executable, it's actually executed.

@sharkdp
Copy link
Owner

sharkdp commented Mar 6, 2022

Benchmarks look good (no change when running a single command) 👍

Command execution

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-8.3.2 'ab' '/home/shark/Informatik/' --exec echo 357.7 ± 4.7 353.5 370.4 1.00 ± 0.01
./fd-960 'ab' '/home/shark/Informatik/' --exec echo 356.0 ± 1.7 354.3 359.3 1.00

Command execution (large output)

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-8.3.2 -tf 'ab' '/home/shark/Informatik/' --exec cat 325.7 ± 1.7 323.1 327.9 1.00
./fd-960 -tf 'ab' '/home/shark/Informatik/' --exec cat 326.8 ± 2.6 324.2 333.8 1.00 ± 0.01

Comment on lines +1388 to +1391
// TODO test for windows
if cfg!(windows) {
return;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem here? Only the path separator (/ vs \)? I think we have code somewhere to replace them automatically?

echo should be available on Windows as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think echo is only a shell builtin on Windows

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem to indicate otherwise, right?

fd/tests/tests.rs

Lines 1480 to 1501 in 0fd7ec5

/// Shell script execution (--exec) with a custom --path-separator
#[test]
fn test_exec_with_separator() {
let (te, abs_path) = get_test_env_with_abs_path(DEFAULT_DIRS, DEFAULT_FILES);
te.assert_output(
&[
"--path-separator=#",
"--absolute-path",
"foo",
"--exec",
"echo",
],
&format!(
"{abs_path}#a.foo
{abs_path}#one#b.foo
{abs_path}#one#two#C.Foo2
{abs_path}#one#two#c.foo
{abs_path}#one#two#three#d.foo
{abs_path}#one#two#three#directory_foo",
abs_path = abs_path.replace(std::path::MAIN_SEPARATOR, "#"),
),
);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that test fails on my Windows VM. I bet GIt Bash or MSYS or something installs an echo.exe binary that gets picked up on CI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just following what we did in the test_exec and test_exec_batch, and (specificaly) test_exec_batch_with_limit.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'm fine with skipping Windows for now.

In this case, the only open point I see right now is the problem mentioned above (#960 (comment)).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you already reported it upstream. Thanks!

Accepting multiple occurances means we need to check this ourselves. See
clap-rs/clap#3542.
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

One thing that I would like to discuss (independent from this PR) is the syntax for passing multiple commands. This might be a personal thing, but I really dislike the ; syntax for "ending" the list of arguments. It is ugly, needs escaping, and is hard to type (on German keyboards). It's also potentially confusing since ; has special meaning in a shell so users really need to understand the escaping part here.

fd … -x echo \; -x do_something

Is there any way we can provide something easier here? Using ; for separation has precedent in find, of course, but it's not like that's something universal. It's just a convention.

One option would be to come up with a different value terminator character or sequence (https://docs.rs/clap/latest/clap/struct.Arg.html#method.value_terminator). -- would be one common option, but that has special meaning already. So this would likely not work(?). But let's say we use something that's easier to type and doesn't need escaping:

fd … -x echo :: -x do_something

This would only cause problems if someone actually wants to pass :: to a command. But the same is true for the semicolon.

Another idea might be to provide a way to get rid of both: the value separator and the additional -x/--exec option. For example, we could choose a special character sequence (let's say + or ++) to directly combine two commands:

fd … -x echo + do_something

What do you think?

@tmccombs
Copy link
Collaborator Author

tmccombs commented Mar 8, 2022

personally I prefer the ;, because it is the same as what find uses, and seems pretty self-explanatory. to be fair, I use an american keyboard, where \; and ';' are pretty easy to type (the semicolon is on the home row).

I do like the idea of having a delimiter so you don't have to specify -x multiple times like fd ... -x echo ++ do_something. ++ is probably rare enough that needing to escape it would be rare.

What do you think of keeping \; to end the options to -x but also allowing you to specify multiple commands delimited by ++ or similar?

@sharkdp
Copy link
Owner

sharkdp commented Mar 8, 2022

What do you think of keeping \; to end the options to -x but also allowing you to specify multiple commands delimited by ++ or similar?

I agree. Let's keep \; for now. The other (++) syntax would be interesting to explore. But that can be done in a separate PR. @tavianator Any opinion on this?

@sharkdp sharkdp merged commit c577b08 into sharkdp:master Mar 8, 2022
@tmccombs tmccombs deleted the multi-exec branch March 8, 2022 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants