Skip to content

Add Filename.quote_command function#1492

Merged
damiendoligez merged 3 commits intoocaml:trunkfrom
xavierleroy:quote_command
Jun 25, 2019
Merged

Add Filename.quote_command function#1492
damiendoligez merged 3 commits intoocaml:trunkfrom
xavierleroy:quote_command

Conversation

@xavierleroy
Copy link
Copy Markdown
Contributor

As reported in MPR#6107 and MPR#7672, it is not easy to properly quote the command name and its arguments in a shell command executed through Sys.command or the Unix.open_process* functions. For a POSIX shell it is enough to apply Filename.quote on every argument. But for the Windows shell cmd.exe, additional quoting is required on top of that, as explained here.

This pull requests grants the feature wish from MPR#7672 by adding a Filename.quote_command function that takes a command and its arguments (as a list of strings) and produces a command line appropriately quoted for the shell that is going to parse and execute it.

Documentation of Sys.command, Unix.system and Unix.open_process* was updated to encourage the use of Filename.quote_command.

Preventive FAQ

Q: why string list -> string and not string array -> string for consistency with Unix.exec?

A: it felt right. MPR#7672 suggested string list. It's easier to build a list (with :: and @) than an array. Perhaps it's Unix.exec that is wrong in its use of a string array.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Nov 27, 2017

This is a great leap forward, though the essay I've just spent an hour writing still needs more investigation, so for now a holding comment to say that I've been looking at this, and to link to the original issue ocaml/dune#322 which led to MPR#7672 and which isn't (yet) addressed by this GPR.

The TL;DR is that this still doesn't entirely deal with nasty escaping cases - to give a brief example, consider the correctly formatted Sys.command "\"\"C:\\Evil \"^%\"PATH\"^%\"PATH\"^%\" Dir\\foo.exe\"\"" which calls the program C:\Evil %PATH%PATH% Dir\foo.exe which can't (I think) at present be arrived at using Filename.quote_command. For why this might matter, consider that dir "25%-50% growth data.txt" looks for a file called 25%-50% growth data.txt but

set -50="&rd/s/q %USERPROFILE%\Documents&"
dir "25%-50% growth data.txt"

tries to erase your home profile...

@xavierleroy
Copy link
Copy Markdown
Contributor Author

Your example is too complicated for me to follow. Can you make it shorter? Like 2 characters long? Also, is the problem in the quoting algorithm given here or in my implementation of it?

@dra27
Copy link
Copy Markdown
Member

dra27 commented Nov 30, 2017

I can't make it much shorter, no! The issue is the combination of:

  • embedded % characters to defined environment variables (which, I think is mentioned in the article you link but then overlooked - I think that the implementation shown there is therefore buggy)
  • spaces (so an executable filename which requires quotes to be executed)
  • (quoted arguments, though I already simplified that example by not including one)

The third point is the original problem in MPR#7672 which is that the command "C:\Some Directory\command.exe" "quoted argument" requires an extra set of quotes around the entire command when passed to cmd /c because the slightlyphenomenally arcane rules for the cmd command itself eliminate the effective outer quotes when there are additional quotes within string, as the cmd /? "explains":

If /C or /K is specified, then the remainder of the command line after
the switch is processed as a command line, where the following logic is
used to process quote (") characters:

    1.  If all of the following conditions are met, then quote characters
        on the command line are preserved:

        - no /S switch
        - exactly two quote characters
        - no special characters between the two quote characters,
          where special is one of: &<>()@^|
        - there are one or more whitespace characters between the
          two quote characters
        - the string between the two quote characters is the name
          of an executable file.

    2.  Otherwise, old behavior is to see if the first character is
        a quote character and if so, strip the leading character and
        remove the last quote character on the command line, preserving
        any text after the last quote character.

For that horrific example, your implementation escapes the outermost quotes, which doesn't work - I'm not yet certain that that counts as a bug in your implementation per se, because you explain that Filename.quote_command has to be used carefully with redirections and the like. However, I can't see a way with any permutation of Filename.quote or Filename.quote_command of correctly quoting that executable name, because the escaping of a % character differs depending on whether it's within a quoted string or not.

Unfortunately, I imagine that what may be required is a more expressive datatype than string list which very quickly leads to my normal opinion on this - just don't use Sys.command! Personally, I'd prefer to put the effort into moving some of the more advanced process management functions from Unix to Sys, as virtually all uses of Sys.command can either be replaced by better Unix process calls or are non-portable anyway.

It's worth remembering that even cmd gets all this wrong - tab completion on a filename with % is incorrectly handled.

I'm concerned that if we end up adding a function which is hard to use correctly then we end up appearing to be "doing it right", when in reality unexpected results like that "attack" I outline above are still possible.

@Chris00
Copy link
Copy Markdown
Member

Chris00 commented Nov 30, 2017

I'd prefer to put the effort into moving some of the more advanced process management functions from Unix to Sys

I think that would be a good solution as well — export something like open_process_full so that the program is searched in the path.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

xavierleroy commented Dec 2, 2017

All right, let me make one last attempt at being constructive and designing a proper solution. (As opposed to the terribly negative "it's completely broken" replies above.)

To make the discussion more understandable, keep in mind that my current proposal encodes ["cmd"; "arg 1"; "arg 2"] as

    ^"cmd^" ^"arg 1^" ^"arg 2^"

Special characters in cmd and arg are further escaped, but not spaces, as shown above.

First, I claim that this encoding, taken from here, is correct for the arguments to the command. cmd.exe strips the ^ escapes and then starts the command, which performs CommandLineToArgv, which understands the " and the remaining escapes.

In particular, %VAR% sequences in the arguments are correctly escaped as ^%VAR^%, because for cmd.exe they do NOT occur between double quotes -- those double quotes are ^-escaped! Just run cmd.exe and try it:

   set VAR=xxx
   echo ^"^%VAR^%^"              // output: "%VAR%", what we want
   echo "^%VAR^%"                // output: "^%VAR^%", not what we want

Now, there is a problem with the first word of the command line: the path to the file to be executed. This path is parsed entirely by cmd.exe, so the CommandLineToArgv rules don't apply. If the executable file is some file.exe, the current encoding is

    ^"some file.exe^"

and cmd.exe thinks that the executable is "some.

So, we need to use cmd.exe quoting to protect this first word of the command. Here is a summary of cmd.exe quoting conventions. What we can do:

  • Enclose the first word in double quotes.
  • To protect against unwanted stripping of the first and last double quotes, preventively enclose the whole command line in an extra pair of double quotes. ("Case 2" in the "explanation" above.)
  • Within the first word, escape % as %% to disable variable expansion. [EDIT: this quoting doesn't work.]
  • Within the first word, the only other special characters are double quotes and newlines, which I don't know how to escape, but it doesn't matter because they are not allowed in NTFS file names (nor in FAT32, actually), so Filename.quote_command can just fail if it sees those characters in the first word of the command.

With this approach the encoding of ["cmd"; "arg 1"; "arg 2"] would be

    ""cmd" ^"arg 1^" ^"arg 2^""

I'll try to code this approach when I find enough energy.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Dec 2, 2017

I'm not totally sure how you get from "This is a great leap forward" to "it's completely broken", but let's not fight that out here - I'll owe you a coffee/beer at the next dev meeting instead!

I agree that the quoting for arguments completely works - the only reason we might want to avoid the extra escaping of quote characters is to keep the command line length down so we don't hit the limit.

I'm not certain how that SS64 reference arrives at "%%" as a means of escaping "%" characters (I'd like to know, because it might help to refute it) but the only place where I know of that working is batch parameters inside shell scripts (i.e. it's for %i at the prompt but for %%i in a batch file). Sys.command (Filename.quote_command ["C:\\%VAR%\\foo.exe"]) passes the literal ""C:\%%VAR%%\foo.exe"" to Sys.command but that still isn't correct (""C:\"^%"VAR"^%"\foo.exe"" works)

Adding the quotes around the whole command creates an issue with your example for redirections (Filename.quote_command ["ls"; "-l"; f] ^ ">" ^ Filename.quote_command [f']). The new behaviour for Filename.quote_command correctly handles the fact that anything after a shell separator (>, &&, etc.) needs to use command rather than argument escaping, but the extra quotes are now a problem:

(* This would be literally ""echo" ^"foo^"" *)
let command = Filename.quote_command ["echo"; "foo"];;
(* This would be literally ""C:\"^%"PATH"^%" Bar"" *)
let file = Filename.quote_command ["C:\\%PATH% Bar"];;
(* As given, the Sys.command below will echo
foo">C:\^%PATH^% Val
  to the console. If instead command and file have
  their outer quotes removed and the outer quotes go
  over the whole command then it works. *)
Sys.command (command ^ ">" ^ file)

So I think that either quoting the entire argument should be by a new version of Sys.command or Filename.quote_command needs to take a slightly different data structure. Sys.command cannot compatibly be updated just to add the quotes all the time because built-ins gets messed around by wrong quoting (echo is a built-in, findstr is a program):

# Sys.command "echo";;
ECHO is on.
- : int = 0
# Sys.command "\"echo\"";;
ECHO is on.
- : int = 0
# Sys.command "\"\"echo\"\"";;

- : int = 0
# Sys.command "findstr";;
FINDSTR: Bad command line
- : int = 2
# Sys.command "\"findstr\"";;
FINDSTR: Bad command line
- : int = 2
# Sys.command "\"\"findstr\"\"";;
FINDSTR: Bad command line
- : int = 2

perhaps Filename.(quote_command [C ["echo"; "foo"]; S ">"; C ["C:\\%PATH% Val"]]) yielding the literal ""echo" ^"foo^">"C:\"^%"PATH"^%" Val" could work? (Sys.command being unchanged)?

@xavierleroy
Copy link
Copy Markdown
Contributor Author

xavierleroy commented Dec 3, 2017

Thanks a lot for the feedback, very useful indeed.

I think we're close. Not quoting builtin commands such as echo could be achieved in a systematic way by simply not quoting words that don't need quoting. As to how to escape % in the first word, my limited experiments (with Windows 10) agree with the theory that %% within double quotes gives %. [EDIT: this quoting doesn't work] But otherwise we could just reject % in the first word as being illegal (or at least damn dangerous!) in a file name. Nobody puts % characters in executable file names.

I also think we have a show-stopper, namely the issue with redirections (> and <) that interact badly with the idea of putting the whole output of quote_command between a pair of double quotes. I agree that a change in API for quote_command is in order.

At any rate, I wasn't too proud of the documentation I wrote explaining how to call quote_command twice with an unquoted > redirection between. So that's another reason for a better API.

I can think of two APIs:

  • The ocamlbuild-like API: quote_command takes a list of strings tagged with constructors saying whether to quote the string or to leave it verbatim.
  • A very simplified API where quote_command takes two optional arguments, to_file and from_file, or maybe to and from, that add the appropriate >/< redirections and quoting. In this approach we don't support pipes or other fancy shell operators, just the minimal redirections that a portable OCaml program such as the OCaml compilers themselves can expect to work on all OSes.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Dec 3, 2017

What's the detail of your experiments with %% quoting - everything I'm trying is definitely giving me two % characters, but it may be that the case I'm dealing with is not equivalent. I agree that just banning % characters in the executable name is a good solution ... we could re-discuss if/when the Mantis report appears 🙂

One other might be to do the ocamlbuild-like API (describing it that way makes me shudder a little!) but also have a simpler function. Say quote_command_full which has all the constructors and also let quote_command c = quote_command_full [C c]?

@dra27
Copy link
Copy Markdown
Member

dra27 commented Dec 4, 2017

Actually, having run a search for any filenames containing % on my local machines, there is one conceivable place where this could happen, which is poorly http escaping resulting in My%20Program%20with%20spaces.exe (not necessarily a reason to support this, but you could conceivably have a process which attempts to run Sys.argv.(0) and doesn't expect that to fail because of a supposed illegal character)

@xavierleroy
Copy link
Copy Markdown
Contributor Author

My observations concerning %% quoting were done over an ssh connection, so probably cmd.exe was thinking it was running a batch file. With a true console or with cmd /c the quoting doesn't work for me either.

@mshinwell
Copy link
Copy Markdown
Contributor

@dra27 @xavierleroy How are we doing on this one -- could it make 4.07?

@alainfrisch
Copy link
Copy Markdown
Contributor

FWIW, I've had recently to generate .bat scripts of the form "SET foo=bar", with bar coming from an OCaml string, with the intent of getting the variable defined to that string. Hence the need to escape. And I observed that % characters in the OCaml string need to be escaped as %%, and ^% does not work. I don't know if there is specific to the SET command, and if the same rules apply to command-line arguments for calling external programs from cmd, but I thought I would report this finding, just in case.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

xavierleroy commented Mar 15, 2018

Thanks for the ping! I swapped out a lot of the background knowledge (that's a known strategy to survive Windows-induced trauma), so I need to swap it in back... But I think we can get something to work, we just have to choose between two possible APIs:

  • An ocamlbuild-like API where each word in the command line is tagged with a constructor meaning "file name to be interpreted by cmd.exe" or "argument to be interpreted by the C runtime" or "special character, don't touch", e.g.
   quote_command [ File "ls"; Arg "C:\\temp"; Verbatim ">"; File "log.txt" ]
  • A simplified API where we have optional arguments for stdin, stdout and perhaps stderr redirection to files, and otherwise the first word of the command line is the executable and everything else is treated as argument, e.g.
  quote_command ~out:"log.txt" [ "ls"; "C:\\temp" ]

@ygrek
Copy link
Copy Markdown
Contributor

ygrek commented Mar 30, 2018

we can have both, simplified api being a wrapper for precise one?

@damiendoligez
Copy link
Copy Markdown
Member

@alainfrisch

I don't know if there is specific to the SET command

Unfortunately, the SET command has its own quoting rules.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

I closed this PR by accident. Actually I meant to push a new implementation of Filename.quote_command that addresses most if not all of the problems discussed above.

The API is now

   Filename.quote_command cmd ?input ?output args

with the command cmd separated from the list of arguments args, and < and > redirections expressed with optional arguments ?input and ?output, then properly encoded by Filename.quote_command.

In a nutshell, args is escaped as before, first using Filename.quote, then (under Win32) using ^ to quote characters that are special for cmd.exe.

The novelty is that, under Win32, cmd and the optional redirection files are checked for the absence of double-quote and percent characters (which we have problems quoting!), then quoted using double quotes.

Finally, for good measure, the whole command line is wrapped in double quotes to please cmd.exe.

@xavierleroy xavierleroy reopened this Jun 8, 2018
@damiendoligez damiendoligez self-assigned this Jun 11, 2018
@dra27
Copy link
Copy Markdown
Member

dra27 commented Jun 30, 2018

I will try to find some time for a proper review again of this... in the meantime, there are some check-typo nits:

./otherlibs/unix/unix.mli:580.81: [long-line] line is over 80 columns
./otherlibs/unix/unix.mli:787.58: [white-at-eol] whitespace at end of line
./stdlib/filename.mli:192.1: [white-at-eof] empty line(s) at EOF
./stdlib/sys.mli:78.11: [white-at-eol] whitespace at end of line
./stdlib/sys.mli:83.51: [white-at-eol] whitespace at end of line
./testsuite/tests/lib-filename/quotecommand.ml:79.1: [white-at-eof] empty line(s) at EOF

Copy link
Copy Markdown
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, although I can't say I master all the convolutions of the Windows quoting conventions, so let's wait for @dra27's review.

A question though: is it on purpose that you don't allow redirecting standard error output?

@xavierleroy
Copy link
Copy Markdown
Contributor Author

is it on purpose that you don't allow redirecting standard error output?

I vaguely remembered that Windows's cmd.exe would not support stderr redirection. I was very wrong: https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-xp/bb490982(v=technet.10)

I can certainly add a ?error argument. I'm less sure about redirecting both stdout and stderr to the same file: it is useful in practice but I'm not sure what API is best.

Also, and independently: the optional arguments could be labeled stdin, stdout and stderr instead of input, output, and error. Opinions welcome.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

Rebased on trunk and fixed typo.

@damiendoligez
Copy link
Copy Markdown
Member

I vote for stdin/stdout/stderr.

For redirecting both, I see two solutions:

  1. Yet another optional argument, stdout_and_stderr, but then you have to raise Invalid_argument if it is used at the same time as stdout or stderr.
  2. Detect that stdout and stderr are the same string and do the right thing. It doesn't make much sense to redirect both independently to the same file anyway.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

Thanks for the review. I added stderr redirection following @damiendoligez's approach #2 above, and also renamed the optional arguments.

The PR was rebased and neatly squashed. From my viewpoint, it is ready for merging, and I will not develop it further. There's a long URL in a comment that causes check-typo to throw a fit, but I'll let the check-typo masters handle this.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 19, 2018

@xavierleroy: check-typo lets you write long URLs as long as there is nothing after them on the line. To get rid of the alarm, just put the terminating *) bracket on a new line.

I initially thought about relaxing this restriction, but I see why it makes sense. For example, someone using a screen reader can decide to skip to the next line when encountering an URL, but they must be able to assume that there won't remain any useful text after the URL on the same line.

if any are quoted using {!Filename.quote}, then concatenated.
Under Win32, additional quoting is performed as required by the
[cmd.exe] shell that is called by {!Sys.command}.
*)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make it explicit that quote_command can raise (and in which condition)?

Also, I'm not fully convinced that Invalid_arg should be raised, and not Failure. Invalid_arg reports programming errors, and should not usually be caught; in this case, this would mean that the caller needs to validate the redirection filenames, since they could easily be derived from some program inputs. The validation is not overly difficult, but not straightforward and it is platform dependent. In practice, programmers will not implement the validation.

If one decides to raise Failure, one could go away with a more informal contract with the client ("The function will raise Failure if some arguments cannot be properly escaped on the current platform"), and programmers might actually decide to catch the exception.

@dbuenzli : as our champion for Failure vs Invalid_arg debate, feel free to comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be Failure rather than Invalid_argument because it's a failure of the library itself rather than the program: in principle, every file name should be quotable, it's just that we don't (yet) know how to do it on Windows.

@damiendoligez
Copy link
Copy Markdown
Member

Rebased and changed the exception to Failure. Ready for formal review.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

A review and a decision, please? This would close #6107 among others.

xavierleroy and others added 3 commits May 21, 2019 15:07
This function takes care of quoting the command and its arguments
so that they are correctly parsed by the system shell
(/bin/sh for Unix, cmd.exe for Win32).

Redirections for std input (< file) and std output (> file) and
std error (2> file) can also be specified as optional arguments.
A 2>&1 redirection is used if stdout and stderr are redirected to the
same file.

The result is a string that can be passed directly to Sys.command or
to the Unix functions that expect shell command lines.
@damiendoligez
Copy link
Copy Markdown
Member

Rebased again. @dra27 could you review the Windows-specific parts? The rest is OK for me.

Copy link
Copy Markdown
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to get this nice PR merged, which helps solve a very real and tricky problem when trying to write platform-independent code in OCaml.

I looked over the code and it looks fine to me; the intricacies of quoting on Windows means that it may not be perfect (and indeed, the docstring of Filename.quote_command makes it clear that it is a best-effort solution). On the basis of that, I am approving the PR and move that we merge it.

@damiendoligez damiendoligez merged commit a8daa89 into ocaml:trunk Jun 25, 2019
@damiendoligez
Copy link
Copy Markdown
Member

Thanks everyone!

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 7, 2020

#9289 proposes a change of quote_command behavior for arguments that themselves contain double-quotes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants