Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the --exec flag #116

Merged
merged 9 commits into from
Oct 15, 2017
Merged

Implement the --exec flag #116

merged 9 commits into from
Oct 15, 2017

Conversation

mmstick
Copy link
Contributor

@mmstick mmstick commented Oct 14, 2017

With this change, the --exec flag will be fully functional with a syntax similar to GNU Parallel. The argument that follows the --exec flag will be used as the command template, and each path discovered is distributed across a job pool for command generation & execution.

The README has been updated to demonstrate the new feature.

Closes #84

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

I'm guessing Rust 1.16 is a required target? Will need to do a few minor changes to get it building on that one.

@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

I'm guessing Rust 1.16 is a required target? Will need to do a few minor changes to get it building on that one.

I was actually thinking about increasing this for some time. What would be required for this PR?

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Nothing's truly required. It's largely just syntax enhancements that have come about to make Rust more user-friendly, like not requiring that match arm scopes have commas, and eprintln.

@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

Sorry.. I meant: which Rust version would be required as new minimum version? 1.19?

@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

For some actual feedback: This is incredible, thank you very much!

I will review the code in detail (although I have a feeling that it will be more of a lesson for me than a proper review for you 😄), but first some general question:

I was trying this out, and noticed the following:

> fd main.rs --exec 'echo'
src/main.rs
> fd main.rs --exec 'echo {}'
src/main.rs
> fd main.rs --exec 'echo {.}'
src/main src/main.rs

The last one looks like {} is appended in addition to {.}. Is this intended?

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Edit: I'll fix that.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Older builds should be working now, and fixed the double placement issue.

Edit: Or not, Arc::from_raw and field shorthands aren't supported in 1.16.

README.md Outdated

```sh
# Demonstration of parallel job execution
fd '*.flac' --exec 'sleep 1; echo $\{SHELL}: {}'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: By default, fd treats the pattern as a regular expression, not a glob. So this should be either fd -e flac --exec ... or fd '\.flac$' --exec. Same for the other examples.

src/app.rs Outdated
@@ -95,6 +95,11 @@ pub fn build_app() -> App<'static, 'static> {
.takes_value(true)
.hidden(true),
)
.arg(
arg("exec")
.long("exec")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short option -e is already taken, but we could consider adding -x as a short version of --exec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright -- have added "x" as a short option.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Rust 1.19 will be a requirement.

@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

Rust 1.19 will be a requirement.

Okay. We should update .travis.yml to reflect that.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Done.

Rust 1.19 will be a requirement, however.
@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

One thing that should be worked out (not necessarily within this PR) is the interplay between --exec and --absolute-path. Currently, the search-threads in fd send relative paths across the mpsc::channel. When --absolute-path is activated, these paths will be relative to /. Currently, the following happens:

> touch /tmp/foo
> fd -a foo /tmp -x echo 
tmp/foo

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Do you want outputs to be printed serially, in the order that inputs are processed?

@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

Do you want outputs to be printed serially, in the order that inputs are processed?

I don't think that's needed since there is no well-defined order in which we traverse the file-system (due to there being several threads walking the filesystem).

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

I can easily go ahead and fix the absolute path part, if you want me to.

@sharkdp
Copy link
Owner

sharkdp commented Oct 14, 2017

I can easily go ahead and fix the absolute path part, if you want me to.

That would be great.

@@ -1,36 +1,44 @@
[package]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it's a problem, but I'm curious: did you use a tool to re-format Cargo.toml?

Copy link
Contributor Author

@mmstick mmstick Oct 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use the cargo-edit tool to add dependencies, which also automatically formats it. Like, cargo add <dep>

README.md Outdated
@@ -34,7 +35,7 @@ subdirectories and about a million files. For averaging and statistical analysis
cache". Results for a cold cache are similar.

Let's start with `find`:
```
```sh
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if we could roll this back. Otherwise time in the output is highlighted because it's a shell-builtin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do that as I submit the next commit to get absolute paths working.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Absolute paths are working now.

mmstick:~/S/fd# target/debug/fd -a --exec 'echo {}'
/home/mmstick/Sources/fd/Cargo.lock
/home/mmstick/Sources/fd/LICENSE
/home/mmstick/Sources/fd/Cargo.toml
/home/mmstick/Sources/fd/CONTRIBUTING.md
/home/mmstick/Sources/fd/README.md
/home/mmstick/Sources/fd/appveyor.yml
/home/mmstick/Sources/fd/src
/home/mmstick/Sources/fd/tests/tests.rs
/home/mmstick/Sources/fd/tests
/home/mmstick/Sources/fd/src/app.rs
/home/mmstick/Sources/fd/src/internal.rs
/home/mmstick/Sources/fd/src/exec
/home/mmstick/Sources/fd/src/main.rs
/home/mmstick/Sources/fd/src/fshelper
/home/mmstick/Sources/fd/src/lscolors
/home/mmstick/Sources/fd/src/walk.rs
/home/mmstick/Sources/fd/src/exec/paths.rs
/home/mmstick/Sources/fd/src/output.rs
/home/mmstick/Sources/fd/src/fshelper/mod.rs
/home/mmstick/Sources/fd/src/exec/mod.rs
/home/mmstick/Sources/fd/build.rs
/home/mmstick/Sources/fd/src/exec/token.rs
/home/mmstick/Sources/fd/src/exec/job.rs
/home/mmstick/Sources/fd/src/lscolors/mod.rs
/home/mmstick/Sources/fd/src/exec/ticket.rs
/home/mmstick/Sources/fd/tests/testenv/mod.rs
/home/mmstick/Sources/fd/tests/testenv

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

I could also implement grouping of outputs, if you want outputs to be grouped by job (only print the stdout/stderr of one job at a time). Doing this on UNIX OS's is a lot easier than Windows though, since Windows doesn't really have RawFd support, or tmpfs support.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Implemented a mechanism for grouping outputs. It uses std::process::Output to act as a buffer for piped outputs, and a Arc<Mutex<()>> to serve as a lock for ensuring that only thread has write access at a given time.

It's the easiest way to implement this in a cross-platform manner. Only downside is that it has to perform heap allocations to buffer outputs, and will only print outputs when a job is complete.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 14, 2017

Am working on an optimized variant of grouping for *nix systems.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

Alright, *nix-specific specialization have been implemented. Shouldn't be any heap allocations on that end, beyond allowing the kernel to handle all buffering of outputs.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

Here's a performance comparison. It would seem that find has a better latency. Will have to look into what they are doing differently. They apparently don't use the SHELL variable like I am, which makes the following possible:

> env SHELL=ion fd .rs --exec 'echo $to_uppercase("{}")'
SRC/APP.RS
TESTS/TESTS.RS
BUILD.RS
TESTS/TESTENV/MOD.RS
SRC/INTERNAL.RS
SRC/MAIN.RS
SRC/OUTPUT.RS
SRC/EXEC/PATHS.RS
SRC/WALK.RS
SRC/FSHELPER/MOD.RS
SRC/EXEC/MOD.RS
SRC/EXEC/TICKET.RS
SRC/EXEC/TOKEN.RS
SRC/EXEC/JOB.RS
SRC/LSCOLORS
SRC/LSCOLORS/MOD.RS

GNU find

> perf stat -r 100 find -name '*.rs' -exec echo {} \; > /dev/null

 Performance counter stats for 'find -name *.rs -exec echo {} ;' (100 runs):

         45.645229      task-clock:u (msec)       #    0.947 CPUs utilized            ( +-  0.89% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
             1,390      page-faults:u             #    0.030 M/sec                    ( +-  0.07% )
        24,016,632      cycles:u                  #    0.526 GHz                      ( +-  0.70% )  (85.24%)
         8,914,105      stalled-cycles-frontend:u #   37.12% frontend cycles idle     ( +-  0.80% )  (86.50%)
         3,929,859      stalled-cycles-backend:u  #   16.36% backend cycles idle      ( +-  2.81% )  (38.19%)
        18,868,794      instructions:u            #    0.79  insn per cycle         
                                                  #    0.47  stalled cycles per insn  ( +-  0.98% )  (59.13%)
         4,510,604      branches:u                #   98.819 M/sec                    ( +-  0.77% )  (72.28%)
           195,020      branch-misses:u           #    4.32% of all branches          ( +-  0.64% )  (84.44%)

       0.048211013 seconds time elapsed                                          ( +-  1.01% )

fd

> env SHELL=dash perf stat -r 100 fd -x 'echo {}' .rs > /dev/null

 Performance counter stats for 'fd -x echo {} .rs' (100 runs):

         75.216770      task-clock:u (msec)       #    2.363 CPUs utilized            ( +-  1.08% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
             3,364      page-faults:u             #    0.045 M/sec                    ( +-  0.25% )
        42,427,488      cycles:u                  #    0.564 GHz                      ( +-  3.92% )  (87.17%)
         9,996,856      stalled-cycles-frontend:u #   23.56% frontend cycles idle     ( +-  0.73% )  (89.05%)
         8,013,443      stalled-cycles-backend:u  #   18.89% backend cycles idle      ( +-  1.48% )  (32.36%)
        42,041,689      instructions:u            #    0.99  insn per cycle         
                                                  #    0.24  stalled cycles per insn  ( +-  8.39% )  (59.72%)
        10,112,065      branches:u                #  134.439 M/sec                    ( +-  8.44% )  (72.51%)
           124,204      branch-misses:u           #    1.23% of all branches          ( +-  0.89% )  (85.12%)

       0.031826863 seconds time elapsed                                          ( +-  0.85% )

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

So it seems the reason that find has an advantage is because it acts as it's own shell. It doesn't use any shell at all -- parses arguments and executes commands natively. That would make sense.

@sharkdp
Copy link
Owner

sharkdp commented Oct 15, 2017

I like that this spawns a shell, makes it more powerful.

Do you still want to work on this or do you think it's ready to be integrated?

I'm in favor of merging this in early, but unfortunately there are a few conflicts now due to #113.

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

I fixed the conflicts. It should be ready to merge now, and fully feature-complete.

@sharkdp
Copy link
Owner

sharkdp commented Oct 15, 2017

Possibly naive question: There are quite a lot of unsafe blocks now. Are these strictly necessary or due to optimizations?

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

All the unsafe blocks are just within the then_execute method for *nix systems, because any call to a function in libc (pipe/dup2/close) requires you to use the unsafe keyword. It's not actually unsafe -- they are just kernel system calls to create and manage anonymous file descriptors. Rust doesn't have these functions in the standard library precisely because it's specific to POSIX-like kernels.

The worst that'd happen is that if the programmer didn't write that section correctly (ie: remove some of those close calls), the piped file descriptors would cause each thread to hang at io::copy(), which is easily debugged (if, for whatever reason, someone decides to modify that code).

@sharkdp
Copy link
Owner

sharkdp commented Oct 15, 2017

Thank you for the clarification.

The unsafe-block in main.rswalk.rs is not FFI-specific though, right?

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

The unsafe block in walk.rs is a workaround to get the reference to the TokenizedCommand to be shared across threads, by converting the reference into an Arc. The borrow checker, at this time, just isn't smart enough to realize that the data lives well beyond the join() calls for the thread handles of each worker, and that the original data is never mutated during that timeframe (as it's an immutable reference).

pub fn basename(input: &str) -> &str {
let mut index = 0;
for (id, character) in input.bytes().enumerate() {
if character == b'/' {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use something like std::path::MAIN_SEPARATOR here instead of /. This doesn't seem to work on Windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched the module and it's tests to use MAIN_SEPARATOR and char_indices.

@sharkdp
Copy link
Owner

sharkdp commented Oct 15, 2017

Doen't quite seem to work on Windows (see inline comment). I think we should add a few integration tests after all (see tests/tests.rs). Maybe simply in the style of fd -x echo ==fd where. What do you think?

@mmstick
Copy link
Contributor Author

mmstick commented Oct 15, 2017

I'll look into writing integration tests when I get back home.

@sharkdp
Copy link
Owner

sharkdp commented Oct 15, 2017

Thank you very much for all of this! I'm going to merge this now and we can add the integration tests later.

@sharkdp sharkdp merged commit e0eab07 into sharkdp:master Oct 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants