Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--exec does not escape cmd.exe metacharacters (Windows) #155

Closed
reima opened this issue Oct 30, 2017 · 16 comments
Closed

--exec does not escape cmd.exe metacharacters (Windows) #155

reima opened this issue Oct 30, 2017 · 16 comments
Labels

Comments

@reima
Copy link
Contributor

reima commented Oct 30, 2017

On Windows, --exec launches the command through cmd.exe. This works fine as long as the file names do not contain any of the metacharacters (, ), %, !, ^, ", <, >, &, or |. These metacharacters are not escaped by fd. As a consequence, they are interpreted by cmd.exe with their special meaning.

This may even lead to unwanted processes being started by fd:

E:\Development\Rust\Projects\fd\test>..\target\debug\fd.exe
x&whoami

E:\Development\Rust\Projects\fd\test>..\target\debug\fd.exe --exec "echo {}"
x
reima

reima is my Windows user name, which is printed by calling whoami.

There is also an issue with the escaping of file names with spaces (which was not fixed with #135, at least on Windows):

E:\Development\Rust\Projects\fd\test>..\target\debug\fd.exe
a b c

E:\Development\Rust\Projects\fd\test>..\target\debug\fd.exe --exec "echo {}"
\"a b c\"
@reima
Copy link
Contributor Author

reima commented Oct 30, 2017

This makes we wonder why we take the detour with cmd.exe/sh at all. It incurs the runtime overhead of launching an additional process and it increases the implementation complexity.

On the other hand this enables features like shell output redirection without explicitly invoking a shell.

I'm not certain this is worth all of the trouble of multiple layers of escaping. find and xargs don't launch an intermediary shell either.

Looking forward to other thoughts on this.

@mmstick
Copy link
Contributor

mmstick commented Oct 31, 2017

The main reason for shelling out to a shell is because it's incredibly complex to properly parse shell syntax, and in doing so you will end up restricting yourself to whatever subset of shell functionality that your implementation supports.

I'm not certain this is worth all of the trouble of multiple layers of escaping

Escaping currently only happens with a single line of code, so there aren't multiple layers -- just one.

escape(Cow::Borrowed(self.data))

Had a feeling this would happen with that crate though. Probably best that we do the escaping ourselves. The crate doesn't really handle the whole special character set on both platforms.

@mmstick
Copy link
Contributor

mmstick commented Oct 31, 2017

We should simply check for any bytes within the range of 0...42, 59...64, 91...96, and 123...127, and then escape them accordingly. On *nix systems, that's the \ character, and with cmd on Windows it is the ^ character.

@mmstick
Copy link
Contributor

mmstick commented Oct 31, 2017

Something like this should do the job efficiently:

use std::borrow::Cow;

#[cfg(windows)]
const ESCAPE_CHAR: u8 = b'^';

#[cfg(not(windows))]
const ESCAPE_CHAR: u8 = b'\\';

fn needs_escape(byte: u8) -> bool {
    byte < 43 || (byte > 58 && byte < 65)
        || (byte > 90 && byte < 97)
        || (byte > 122 && byte <= 127)
}

pub fn escape<'a>(input: &'a str) -> Cow<'a, str> {
    let chars_to_escape = input.as_bytes().iter().filter(|&&x| needs_escape(x)).count();
    if chars_to_escape == 0 {
        Cow::Borrowed(input)
    } else {
        let mut output = Vec::with_capacity(input.len() + chars_to_escape);
        for &character in input.as_bytes() {
            if needs_escape(character) {
                output.push(ESCAPE_CHAR);
            }
            output.push(character);
        }
        let output = unsafe { String::from_utf8_unchecked(output) };
        Cow::Owned(output)
    }
 }

mmstick added a commit to mmstick/fd that referenced this issue Oct 31, 2017
mmstick added a commit to mmstick/fd that referenced this issue Oct 31, 2017
@mmstick
Copy link
Contributor

mmstick commented Oct 31, 2017

Might help if you can try out #158 and see if that fixes your issues.

@reima
Copy link
Contributor Author

reima commented Nov 1, 2017

The main reason for shelling out to a shell is because it's incredibly complex to properly parse shell syntax, and in doing so you will end up restricting yourself to whatever subset of shell functionality that your implementation supports.

I agree, implementing shell syntax would be crazy. But that's not what I was thinking about. I was thinking about dropping the intermediary shell and with it all support for shell redirection etc. inside the --exec argument.

Escaping currently only happens with a single line of code, so there aren't multiple layers -- just one.

Exactly. Currently there is only one layer of escaping. The second layer (cmd escaping) is missing—that's what this issue is about. See https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/ for details.

We should simply check for any bytes within the range of 0...42, 59...64, 91...96, and 123...127, and then escape them accordingly.

Could you please elaborate how you came up with these ranges? There's still the issue with properly quoting file names with spaces, which cannot be solved by only escaping on a character by character basis.

Might help if you can try out #158 and see if that fixes your issues.

Thanks for the PR. It escapes all the cmd metacharacters I tested. But unfortunately files with spaces are still an issue:

C:\Users\reima\Development\Rust\fd\test>..\target\debug\fd.exe
a b c
a&whoami

C:\Users\reima\Development\Rust\fd\test>..\target\debug\fd.exe --exec ls
ls: cannot access 'a': No such file or directory
ls: cannot access 'b': No such file or directory
ls: cannot access 'c': No such file or directory
a&whoami

@mmstick
Copy link
Contributor

mmstick commented Nov 1, 2017

I was thinking about dropping the intermediary shell and with it all support for shell redirection etc. inside the --exec argument.

This would majorly limit the capabilities of what you can do with command executions. There are many things you can do beyond redirections that having an intermediary shell provides, such as executing multiple statements, conditionals, shell expansions (such as process, brace, variable, array, etc. expansions), and entire scripts.

So without the shell, you'd have to replicate these at some level, and that's effectively writing your own shell. And it'd be a shame to give up these capabilities just because cmd.exe doesn't correctly parse arguments with spaces.

Could you please elaborate how you came up with these ranges? There's still the issue with properly quoting file names with spaces, which cannot be solved by only escaping on a character by character basis.

The range was created from the ASCII table, which includes all special characters that matter within the 7-bit range.

It's surprising that ^ doesn't work to escape spaces. Looking into it further, according to this, it's impossible to escape spaces within cmd. The closest you can get, in your example, would be for cmd to take "a b c" as the input and hope that whatever command you are using will strip the " characters from the input. Very confusing.

@mmstick
Copy link
Contributor

mmstick commented Nov 1, 2017

Best course of action may just be to drop cmd.exe support and use Powershell instead. Powershell allows you to escape spaces, along with all other characters, using their ` escape character.

@mmstick
Copy link
Contributor

mmstick commented Nov 1, 2017

@reima Try the latest commit, which is using powershell -NoProfile -Command instead of cmd /C.

@sharkdp sharkdp added the bug label Nov 1, 2017
@reima
Copy link
Contributor Author

reima commented Nov 1, 2017

This would majorly limit the capabilities of what you can do with command executions. There are many things you can do beyond redirections that having an intermediary shell provides, such as executing multiple statements, conditionals, shell expansions (such as process, brace, variable, array, etc. expansions), and entire scripts.

That's something I don't see myself using a single command line for. Anything as complex as conditionals I would put into a script that is spawned by --exec. Which is of course still possible without an enforced intermediary shell. But I can see your point. It certainly can be convenient to be able to use shell features directly. It's just that it's not that important to me that I can embed shell commands directly into --exec, as there are other ways to get the same functionality (using scripts or calling the shell explicitly).

Another argument against spawning a shell is that every time I don't use shell features, I pay for something I don't use. If I use fd to kick off a Python script for each file, the shell process is superfluous.

It's also worth noting that someone coming from find would not expect that --exec spawns a shell, as find doesn't do that either. That's a difference that should at least be documented somewhere.

Maybe we could compromise on making the shell optional via a command line switch?

And it'd be a shame to give up these capabilities just because cmd.exe doesn't correctly parse arguments with spaces.

There is no issue with cmd.exe not parsing correctly. It's that that tunneling arguments through cmd.exe is notoriously hard to do right.

The range was created from the ASCII table, which includes all special characters that matter within the 7-bit range.

Yes, but why did you chose these ranges specifically? E.g. why is 42 included, but 43 is not? What I'm getting at is that without further explanation these are just a bunch of magic numbers. There should at least be a comment in the source code about how these ranges came to be (e.g. what class of escapeable characters they contain).

It's surprising that ^ doesn't work to escape spaces. Looking into it further, according to this, it's impossible to escape spaces within cmd. The closest you can get, in your example, would be for cmd to take "a b c" as the input and hope that whatever command you are using will strip the " characters from the input. Very confusing.

Sadly, that's how command line arguments work on Windows. In *nix, the kernel API for spawing a process (execve and friends) expects a list of strings for the command line arguments. So it is the responsibility of the calling process to supply each argument on its own. In Windows, the kernel API (CreateProcess) expects the complete command line in a single string. The called process can interpret this string as it sees fit. So it is the responsibility of the called process to split the command line in separate arguments. This is usually done with CommandLineToArgvW, but there is no guarantee at all that any given program does this. Yeah, it's a bit of a mess. But unfortunately that's something you have to deal with to support Windows.

@reima
Copy link
Contributor Author

reima commented Nov 1, 2017

@reima Try the latest commit, which is using powershell -NoProfile -Command instead of cmd /C.

Escaping works just fine now. But the performance impact of spinning up a new PowerShell for each file is huge (times are in seconds, ~60 files per run, each command was run multiple times in different orders, and the numbers are representative samples):

E:\Development\Rust\Projects\fd> Measure-Command {.\target\debug\fd.exe} | Select -ExpandProperty TotalSeconds
0,0242955
E:\Development\Rust\Projects\fd> Measure-Command {.\target\debug\fd.exe -x ls.exe} | Select -ExpandProperty TotalSeconds
2,8479149

Using --exec will of course always be more expensive, but with the previous commit (that still used cmd) the overhead is much lower:

E:\Development\Rust\Projects\fd> Measure-Command {.\target\debug\fd.exe -x ls.exe} | Select -ExpandProperty TotalSeconds
0,3226748

I also tested with a WIP version where no shell is spawned, which is a little faster (as expected):

E:\Development\Rust\Projects\fd> Measure-Command {.\target\debug\fd.exe -x ls.exe} | Select -ExpandProperty TotalSeconds
0,2148832

@tmccombs
Copy link
Collaborator

tmccombs commented Nov 2, 2017

I tend to agree that executing the command in a shell is unnecessary.

@mmstick
Copy link
Contributor

mmstick commented Nov 2, 2017

Am working on a the logic to enable execution without a shell. It's just rather complicated on the Linux side of things. Namely, you need to ensure that you write an argument splitter yourself, and must use a different set of rules for escaping inputs, so you're kind of re-inventing a basic shell parser.

You can't really use quotes with inputs, because they can conflict with quotes provided by the user, but you can't use backslashes either, because then you'll just have additional backslashes in your input. So it's very, very tricky to handle properly.

@reima
Copy link
Contributor Author

reima commented Nov 3, 2017

@mmstick I've created PR #160 with my WIP version of removing the shell entirely. I decided to use the same syntax as find/xargs/parallel etc. and allow multiple arguments for the --exec command. These are then exactly the arguments that are being passed to the called process, thus eliminating the need for manual argument splitting. E.g. instead of fd --exec "ls -l {}" you would simply use fd --exec ls -l {}. A single ; argument can be used to terminate the command.

@sharkdp
Copy link
Owner

sharkdp commented Nov 8, 2017

@reima @mmstick Sorry for the late reply. Thank you both for looking into this and for your contributions.

That's something I don't see myself using a single command line for. Anything as complex as conditionals I would put into a script that is spawned by --exec. Which is of course still possible without an enforced intermediary shell. But I can see your point. It certainly can be convenient to be able to use shell features directly. It's just that it's not that important to me that I can embed shell commands directly into --exec, as there are other ways to get the same functionality (using scripts or calling the shell explicitly).

I believe I also agree with this. An additional shell layer would be nice to have in terms of functionality, but dropping it will speed things up and also dramatically simplify the involved logic concerning quoting. I'm afraid that we would keep running into similar issues as reported in this ticket - and each of these is, in effect, a rather serious problem (or even a vulnerability). Accidental spawning of unwanted processes might seem contrived, but people will run fd in folders with millions of files and one of them might indeed have a very unfortunate name 😃.

@sharkdp
Copy link
Owner

sharkdp commented Nov 18, 2017

closed via #160.

@sharkdp sharkdp closed this as completed Nov 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants