Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite run_external.rs #12921

Merged
merged 23 commits into from
May 23, 2024
Merged

Conversation

YizhePKU
Copy link
Contributor

@YizhePKU YizhePKU commented May 21, 2024

This PR is a complete rewrite of run_external.rs. The main goal of the rewrite is improving readability, but it also fixes some bugs related to argument handling and the PATH variable (fixes #6011).

I'll discuss some technical details to make reviewing easier.

Argument handling

Quoting arguments for external commands is hard. Like, really hard. We've had more than a dozen issues and PRs dedicated to quoting arguments (see Appendix) but the current implementation is still buggy.

Here's a demonstration of the buggy behavior:

let foo = "'bar'"
^touch $foo            # This creates a file named `bar`, but it should be `'bar'`
^touch ...[ "'bar'" ]  # Same

I'll describe how this PR deals with argument handling.

First, we'll introduce the concept of bare strings. Bare strings are string literals that are either unquoted or quoted by backticks 1. Strings within a list literal are NOT considered bare strings, even if they are unquoted or quoted by backticks.

When a bare string is used as an argument to external process, we need to perform tilde-expansion, glob-expansion, and inner-quotes-removal, in that order. "Inner-quotes-removal" means transforming from --option="value" into --option=value.

.bat files and CMD built-ins

On Windows, .bat files and .cmd files are considered executable, but they need CMD.exe as the interpreter. The Rust standard library supports running .bat files directly and will spawn CMD.exe under the hood (see documentation). However, other extensions are not supported 2.

Nushell also supports a selected number of CMD built-ins. The problem with CMD is that it uses a different set of quoting rules. Correctly quoting for CMD requires using Command::raw_arg() and manually quoting CMD special characters, on top of quoting from the Nushell side. I decided that this is too complex and chose to reject special characters in CMD built-ins instead 3. Hopefully this will not affact real-world use cases. I've implemented escaping that works reasonably well.

which-support feature

The which crate is now a hard dependency of nu-command, making the which-support feature essentially useless. The which crate is already a hard dependency of nu-cli, and we should consider removing the which-support feature entirely.

Appendix

Here's a list of quoting-related issues and PRs in rough chronological order.

Footnotes

  1. The idea that backtick-quoted strings act like bare strings was introduced by Kubouch and briefly mentioned in the language reference.

  2. The documentation also said "running .bat scripts in this way may be removed in the future and so should not be relied upon", which is another reason to move away from this. But again, quoting for CMD is hard.

  3. If anyone wants to try, the best resource I found on the topic is this.

@fdncred
Copy link
Collaborator

fdncred commented May 21, 2024

wow, complete rewrite indeed. thanks for this. i'm anxious to try it on windows.

i only had one real point of concern listed above, others may see other interesting changes. i really appreciate all the comments.

i'm wondering if it's a good idea to add some tests that specifically exercise the windows built-in commands, not that they run right but that we pass the arguments right to them. I do see some tests in ./crates/nu-command/tests/commands/run_external.rs but they're pretty basic.

@fdncred
Copy link
Collaborator

fdncred commented May 21, 2024

I'm also thinking it may be good to keep these comments somewhere
https://github.com/nushell/nushell/pull/12921/files#diff-0845dc2fd7fb9dbed776b0561cade84ae09f084410b3365681a9b9fd8adc7cabL198-L211 (from run-external.rs)

I went through the supported cmd.exe built-ins and seems like these are the "have-to-have" supported syntax.

  • getting help with any dos cmdline app - we should at least allow /?
  • mklink will definitely need /parameter, \, and " support since it's making links to files and folders
  • start will need /parameter, \, and " support but will only be called with ^start since we have our own start command, but it has a lot of parameters
  • the other supported built-ins don't really need parameters with odd syntax. you'll need /? on all of them but after that, it's nothing weird.

@YizhePKU
Copy link
Contributor Author

I thought we already escaped pipes with ^|.

The original implementation was broken beyond repair. Let me show you just how broken it was.

# All of these behaviors are wrong
^echo '^foo'           # foo
^echo 'foo bar'        # "foo bar"
^echo 'foo | bar'      # "foo ^| bar"
^echo '"foo'           # \"foo
^echo '\"foo'          # \\\"foo
^echo '&shutdown/s'    # (shuts down the PC)

@YizhePKU
Copy link
Contributor Author

YizhePKU commented May 21, 2024

/ and \ are NOT special characters in CMD (I added them to the list mostly by mistake). Rust sometimes trys to escape \, but that can be avoided by calling raw_arg() instead of arg().

mklink will definitely need /parameter, , and " support since it's making links to files and folders

" is not legal in Windows paths. In fact, I think the only place where " is needed is the "title" parameter of START, which explictly requires double quotes.

@fdncred
Copy link
Collaborator

fdncred commented May 21, 2024

" is not legal in Windows paths.

I meant that " is needed when doing something like this mklink /J "c:\Program Files" target

Copy link
Member

@IanManske IanManske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning up this code! Most of it looks good. Some comments:

tests/repl/test_known_external.rs Outdated Show resolved Hide resolved
crates/nu-command/src/system/run_external.rs Outdated Show resolved Hide resolved
crates/nu-command/src/system/run_external.rs Outdated Show resolved Hide resolved
crates/nu-command/src/system/run_external.rs Outdated Show resolved Hide resolved
crates/nu-command/src/system/run_external.rs Outdated Show resolved Hide resolved
crates/nu-command/src/system/run_external.rs Outdated Show resolved Hide resolved
@YizhePKU
Copy link
Contributor Author

YizhePKU commented May 21, 2024

Arguments to CMD internal commands are now "properly" escaped. It should work 99% of the time, and when it doesn't work (echo sometimes prints extra quotes) it shouldn't be anything crazy like shutting down your PC.

@YizhePKU
Copy link
Contributor Author

I'm also thinking it may be good to keep these comments somewhere

The part about "fallback path" is no longer accurate. One of the main goal of the rewrite is getting rid of all of fallback logic (#6011 (comment)). I've added the bits that are still relavant.

Copy link
Collaborator

@WindSoilder WindSoilder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The general implementation looks good to me, just left a comment

crates/nu-command/src/system/run_external.rs Outdated Show resolved Hide resolved
@YizhePKU
Copy link
Contributor Author

So far, the only remaining concern is quote-removal, discussed here. How much should we follow Bash's behavior? Should we require the existence of = or -- to trigger quote-removal? What about multiple quotes and nested quotes?

@fdncred
Copy link
Collaborator

fdncred commented May 22, 2024

I'd vote for whatever is most consistent. By that, I mean that, if possible, whatever we do here works the same on MacOS, Linux, and Windows.

The hard part about doing what is "least surprising" is that sometimes that is in the eye of the beholder. A hard-core Linux person may interpret "least surprising" in a way that a hard-core Windows user would've never considered. So, this is always a tricky question for cross platform software. If we're consistent, we can at least say, "it always does this, on every platform", and it's an easier rule to remember.

@IanManske
Copy link
Member

IanManske commented May 22, 2024

Maybe for now we should just match what we do for internal commands. Namely, we only do quote removal for long flags (--flag) with a =, and we split on the first = not in quotes. (E.g., --a=b=c => a = b=c.) It looks like we also only remove the outermost set of quotes.

@YizhePKU
Copy link
Contributor Author

we split on the first = not in quotes

What does "not in quotes" mean? Does the first = ever gets quoted?

@YizhePKU
Copy link
Contributor Author

Here's what I've implemented based on @IanManske's description. Does this look like what you want?

/// Transforms `--option="value"` into `--option=value`. `value` can be quoted
/// with double quotes, single quotes, or backticks. Only removes the outermost
/// pair of quotes after the equal sign.
fn remove_inner_quotes(arg: &str) -> Cow<'_, str> {
    // Check that `arg` is a long option.
    if !arg.starts_with("--") {
        return Cow::Borrowed(arg);
    }
    // Split `arg` on the first `=`.
    let splits: Vec<&str> = arg.splitn(2, '=').collect();
    let [option, value] = splits.as_slice() else {
        return Cow::Borrowed(arg);
    };
    // Remove the outermost pair of quotes from `value`.
    let value = remove_quotes(value);
    Cow::Owned(format!("{option}={value}"))
}

@IanManske
Copy link
Member

IanManske commented May 22, 2024

Yes thanks, that looks good.

What does "not in quotes" mean? Does the first = ever gets quoted?

I was thinking of something like --flag"=" or --flag"=""". This might be annoying to implement (e.g., --some""flag="value"). I think it would be fine for now to just check that the left hand side does not contain quotes. Unless you have another idea.

One last thing, you can use split_once instead of splitn to avoid the collect.

@YizhePKU
Copy link
Contributor Author

Thanks everyone for the feedback! I believe we can merge this PR now.

@IanManske
Copy link
Member

IanManske commented May 23, 2024

Thanks! Let's give this a try 🚀

@IanManske IanManske merged commit 6c64980 into nushell:main May 23, 2024
14 checks passed
@hustcer hustcer added this to the v0.94.0 milestone May 23, 2024
@YizhePKU YizhePKU deleted the rework-run-external-mk3 branch May 23, 2024 04:55
WindSoilder pushed a commit that referenced this pull request May 25, 2024
# Description
Instead of returning an error, this PR changes `expand_glob` in
`run_external.rs` to return the original string arg if glob creation
failed. This makes it so that, e.g.,
```nushell
^echo `[`
^echo `***`
```
no longer fail with a shell error. (This follows from #12921.)
@fdncred
Copy link
Collaborator

fdncred commented May 25, 2024

@YizhePKU I'm not asking for changes with this comment but wanted to put this "found code" here where Microsoft talks about parsing command line options. https://github.com/microsoft/sudo/blob/90edcaa04d9f58fa4702a75c034105bae2e49aa9/sudo/src/helpers.rs#L358 and https://github.com/microsoft/sudo/blob/90edcaa04d9f58fa4702a75c034105bae2e49aa9/sudo/src/run_handler.rs#L60

This is from their just published sudo command, which is written in rust. There are lots of interesting windows bits in this helper.rs file.

@fdncred
Copy link
Collaborator

fdncred commented May 29, 2024

@YizhePKU I just did a git bisect and found that this PR is what caused ~/some/path/exe in the command position, to stop working. Meaning, with that command line exe is not ran. It just gives an error "Command blah not found`. The core team has discussed this, and we think we need to fix this and issue a release patch.

@IanManske
Copy link
Member

IanManske commented May 29, 2024

To clarify, we like the new behavior where ^"~/some_exe" does not expand the tilde, but using bare words no longer expands the tilde which is causing the issue:

~/some_exe
`~/some_exe`
^~/some_exe
^`~/some_exe`

(And again, thanks for all of your work so far!)

@devyn
Copy link
Contributor

devyn commented May 30, 2024

I've found that I can restore the behaviour we used to have where tilde is expanded for the command name, but the bare word detection and quote removal just feels a bit strange to me. I think this is the fault of how nu-parser generates the external args and how those end up getting passed to run-external, but for example I find this to be a bit strange:

> let s = "\"foo\""
> ^echo $s
"foo"
> run-external echo $s
"foo"
> run-external echo "foo"
foo
> run-external echo "\"foo\""
foo

And note that that doesn't currently apply to the command. I think to be consistent for now I'll end up doing that

@fdncred
Copy link
Collaborator

fdncred commented May 30, 2024

commented on the quoting stuff here #13001 (comment)

devyn added a commit that referenced this pull request May 30, 2024
# Description

Fix a regression introduced by #12921, where tilde expansion was no
longer done on the external command name, breaking things like

```nushell
> ~/.cargo/bin/exa
```

This properly handles quoted strings, so they don't expand:

```nushell
> ^"~/.cargo/bin/exa"
Error: nu::shell::external_command

  × External command failed
   ╭─[entry #1:1:2]
 1 │ ^"~/.cargo/bin/exa"
   ·  ─────────┬────────
   ·           ╰── Command `~/.cargo/bin/exa` not found
   ╰────
  help: `~/.cargo/bin/exa` is neither a Nushell built-in or a known external command

```

This required a change to the parser, so the command name is also parsed
in the same way the arguments are - i.e. the quotes on the outside
remain in the expression. Hopefully that doesn't break anything else. 🤞

Fixes #13000. Should include in patch release 0.94.1

cc @YizhePKU

# User-Facing Changes
- Tilde expansion now works again for external commands
- The `command` of `run-external` will now have its quotes removed like
the other arguments if it is a literal string
- The parser is changed to include quotes in the command expression of
`ExternalCall` if they were present

# Tests + Formatting
I would like to add a regression test for this, but it's complicated
because we need a well-known binary within the home directory, which
just isn't a thing. We could drop one there, but that's kind of a bad
behavior for a test to do. I also considered changing the home directory
for the test, but that's so platform-specific - potentially could get it
working on specific platforms though. Changing `HOME` env on Linux
definitely works as far as tilde expansion works.

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`
@fdncred
Copy link
Collaborator

fdncred commented May 31, 2024

Another thing potentially broken by this PR #13020

@kubouch
Copy link
Contributor

kubouch commented May 31, 2024

This PR also caused #13021

devyn added a commit that referenced this pull request Jun 20, 2024
…he parser (#13089)

# Description

We've had a lot of different issues and PRs related to arg handling with
externals since the rewrite of `run-external` in #12921:

- #12950
- #12955
- #13000
- #13001
- #13021
- #13027
- #13028
- #13073

Many of these are caused by the argument handling of external calls and
`run-external` being very special and involving the parser handing
quoted strings over to `run-external` so that it knows whether to expand
tildes and globs and so on. This is really unusual and also makes it
harder to use `run-external`, and also harder to understand it (and
probably is part of the reason why it was rewritten in the first place).

This PR moves a lot more of that work over to the parser, so that by the
time `run-external` gets it, it's dealing with much more normal Nushell
values. In particular:

- Unquoted strings are handled as globs with no expand
- The unescaped-but-quoted handling of strings was removed, and the
parser constructs normal looking strings instead, removing internal
quotes so that `run-external` doesn't have to do it
- Bare word interpolation is now supported and expansion is done in this
case
- Expressions typed as `Glob` containing `Expr::StringInterpolation` now
produce `Value::Glob` instead, with the quoted status from the expr
passed through so we know if it was a bare word
- Bare word interpolation for values typed as `glob` now possible, but
not implemented
- Because expansion is now triggered by `Value::Glob(_, false)` instead
of looking at the expr, externals now support glob types

# User-Facing Changes

- Bare word interpolation works for external command options, and
otherwise embedded in other strings:
  ```nushell
  ^echo --foo=(2 + 2) # prints --foo=4
  ^echo -foo=$"(2 + 2)" # prints -foo=4
  ^echo foo="(2 + 2)" # prints (no interpolation!) foo=(2 + 2)
  ^echo foo,(2 + 2),bar # prints foo,4,bar
  ```

- Bare word interpolation expands for external command head/args:
  ```nushell
  let name = "exa"
  ~/.cargo/bin/($name) # this works, and expands the tilde
  ^$"~/.cargo/bin/($name)" # this doesn't expand the tilde
  ^echo ~/($name)/* # this glob is expanded
  ^echo $"~/($name)/*" # this isn't expanded
  ```

- Ndots are now supported for the head of an external command
(`^.../foo` works)

- Glob values are now supported for head/args of an external command,
and expanded appropriately:
  ```nushell
  ^("~/.cargo/bin/exa" | into glob) # the tilde is expanded
  ^echo ("*.txt" | into glob) # this glob is expanded
  ```

- `run-external` now works more like any other command, without
expecting a special call convention
  for its args:
  ```nushell
  run-external echo "'foo'"
  # before PR: 'foo'
  # after PR:  foo
  run-external echo "*.txt"
  # before PR: (glob is expanded)
  # after PR:  *.txt
  ```

# Tests + Formatting
Lots of tests added and cleaned up. Some tests that weren't active on
Windows changed to use `nu --testbin cococo` so that they can work.
Added a test for Linux only to make sure tilde expansion of commands
works, because changing `HOME` there causes `~` to reliably change.

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting
- [ ] release notes: make sure to mention the new syntaxes that are
supported
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

changing Path environment variable after shell starts have no effect on windows
7 participants