Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glob expansion should not happen for external commands when quoted #4631

Closed
lukexor opened this issue Feb 24, 2022 · 16 comments
Closed

Glob expansion should not happen for external commands when quoted #4631

lukexor opened this issue Feb 24, 2022 · 16 comments
Labels
🐛 bug Something isn't working glob-expansion Specific behavior around file-system globbing with regular commands or `glob` needs-design this feature requires design priority these issues are important Stale used for marking issues and prs as stale

Comments

@lukexor
Copy link

lukexor commented Feb 24, 2022

Describe the bug

Currently it's impossible to pass a glob-like string to an external command, as nushell expands globs even when quoted.

Some commands like the UI testing framework cypress allow passing in filter globs, I believe jest offers similar arguments.

How to reproduce

> npx run cypress --spec "cypress/integration/**/*"

(or any external command that takes a glob string)

Expected behavior

Expected:

npx run cypress "--spec" "cypress/integration/**/*"

Actual:

npx run cypress "--spec" "<file1>" "<file2>" "<file3>" "<etc>"

Screenshots

No response

Configuration

| key                | value                                    |
| ------------------ | ---------------------------------------- |
| version            | 0.59.0                                   |
| branch             | main                                     |
| short_commit       | 3c62d27c                                 |
| commit_hash        | 3c62d27c280dd7a7ba276760c239bfaa1b5080cd |
| commit_date        | 2022-02-24 19:02:28 +00:00               |
| build_os           | macos-x86_64                             |
| rust_version       | rustc 1.58.1 (db9d1b20b 2022-01-20)      |
| rust_channel       | stable-x86_64-apple-darwin               |
| cargo_version      | cargo 1.58.0 (f01b232bc 2022-01-19)      |
| pkg_version        | 0.59.0                                   |
| build_time         | 2022-02-24 12:20:29 -08:00               |
| build_rust_channel | release                                  |
| features           | default, which                           |
| installed_plugins  |                                          |

Additional context

Single quotes, double quotes, or unquoted all result in the same

@sophiajt sophiajt added good first issue Good for newcomers and removed good first issue Good for newcomers labels Feb 24, 2022
@sophiajt
Copy link
Contributor

@lukexor - thinking about this for a minute - currently using quotes is how we allow you to get to filepaths that have spaces in them. I wonder if we need a way of saying "do not expand this string"

@lukexor
Copy link
Author

lukexor commented Feb 24, 2022

I agree there but do we need to reinvent the wheel? quoting things is generally how you say "use this as-is" - even google search uses quotes for exact match

@sophiajt
Copy link
Contributor

@lukexor - I'd characterise it more as trying to find the right balance for the design. Some of the things we're trying to do don't have clear precedence.

Nushell is crossplatform with Windows, so we support both forward and backslash path separators. This means, for example, we can't use the escaping that some shells use for paths because they don't support Windows natively.

We moved to using quotes instead of escaping because, like you alluded to, quoting is intuitive and helps with a "all of this goes together" feel. That said, how do you have spaces and glob patterns without breaking crossplatform paths?

That's why I said earlier that we need to do a bit of design to make this possible. There just aren't enough shells that treat all platforms as equals to draw inspiration from (and are well-known enough that their solutions would be familiar to most users). Which leaves us trying to come up with something that is intuitive, readable, and predictable while also meeting our other requirements.

@lukexor
Copy link
Author

lukexor commented Feb 25, 2022

Thanks for the detail - that's certainly all fair. I know all too well trying to satisfy multiple conflicting requirements and I don't know enough about the history of workings with spaces in the parser - but are you saying with the current way spaces are being interpreted inside quoted strings that it's incompatible with ignoring globs?

I think my intuition on this would be that quoted arguments for external commands shouldn't need any special parsing right? Only nushell commands like ls, find, etc need to parse paths and spaces in strings unless there is some feature nushell is providing that I'm missing. I think the experience that parameters to externals aren't passed exactly as typed is a very unexpected behavior and it would seem impractical to try and account for all edge cases.

@sophiajt
Copy link
Contributor

sophiajt commented Feb 25, 2022

An example that people might expect to work would be something like: cat *.toml. Similar externals will expect that shell to do the expansion of the glob into separate arguments for them, and then they just iterate over the arguments they're given.

If the path to get to those files contain spaces, we have to be able to represent that path to the user and allow for the glob patterns inside it, so that we can also do the proper expansions on the shell side. Say something like:

> cat "my files/*.toml"

Nushell would need to expand that into full paths we can pass to the external, using whatever escaping makes sense on that platform. We don't currently have another way to write a filepath that has both a space and a glob (and hence need some way to represent when not to expand the glob)

I guess I'm thinking of it like a matrix of sorts, with the different options:

  • A glob that expands automatically
  • A glob looking thing that doesn't expand automatically
  • A path with a space and a glob that expands automatically
  • A path with a space and a glob that doesn't expand automatically

And some way of writing each one.

@onthebridgetonowhere onthebridgetonowhere added 🐛 bug Something isn't working needs-design this feature requires design labels Feb 25, 2022
@lukexor
Copy link
Author

lukexor commented Feb 25, 2022

I see. Certainly a quandry. Could we offer single quotes or backticks as alternatives - or perhaps something similar to the $"" syntax to indicate an escaped string? I'm curious how windows-based shells like powershell or cygwin do escaping for regexes - \ is a pretty standard escape character across several platforms so I'd be surprised there isn't a windows work-around to avoid pathname collisions.

I'm wary of any nushell-specific glob syntax given that external commands generally require a POSIX or Bash-style format, not to mention other regex parameters that include * but if we can figure out a decent quoting mechanism for indicating "leave this string alone" that would be great.

@lukexor
Copy link
Author

lukexor commented Feb 26, 2022

I just came across another case that was a bit awkward to figure out - there is a work-around, but it's not obvious how one would know to do that given years of using find in other shells:

> find . -name '*.txt' -exec cat {} \;
find: -exec: no terminating ";" or "+"

Workaround is to quote the semicolon instead of trying to escape it

> find . -name '*.txt' -exec cat {} ";"

@uasi
Copy link
Contributor

uasi commented Mar 27, 2022

FWIW, POSIX sh meets all of those requirements with somewhat awkward syntax:

  • A glob that expands automatically: *.toml
  • A glob looking thing that doesn't expand automatically: "*.toml"
  • A path with a space and a glob that expands automatically: "my files/"*".toml"
  • A path with a space and a glob that doesn't expand automatically: "my files/*.toml"

@sophiajt sophiajt added the priority these issues are important label Apr 30, 2022
@sophiajt
Copy link
Contributor

@uasi I think the one thing we'll need to figure out on the design side is how to handle spaces. Right now we use quotes to allow spaces. Unix-based shells use escaping, but that wouldn't work for Windows paths.

Some way that allows both for spaces and turning off the globs would fix this I think.

@uasi
Copy link
Contributor

uasi commented May 2, 2022

@jntrnr How about using the following rules?

  • bare, single-, or raw-quoted \ does not have special meaning
  • double-quoted \ escapes the following character
  • bare * performs glob matching
  • quoted * does not have special meaning
  • adjacent strings (bare or quoted) are concatenated

For example, a glob pattern that matches all TOML files in the C:\my files\*important*\ folder could be written as

  • "C:\\my files\\*important*"\*.toml,
  • 'C:\my files\*important*'\*.toml,
  • or even C:\"my files"\'*'important`*`\*.toml.

@uasi
Copy link
Contributor

uasi commented May 2, 2022

These are also what POSIX-compliant shells do, except for the first rule.

Note that the string concatenation rule would conflict with extended string literals such as b"bytestring" or r#"raw string"#.

@BatmanAoD
Copy link
Contributor

Note that the reason the @uasi's POSIX space-handling examples don't work has nothing to do with escaping; it's about how Nu decides whether to treat " and ' as the beginning of a quoted string or as a literal character.

POSIX (and related) shells let quoted strings begin or end without a space on either side. @uasi's example was "my files/"*".toml", but I have a habit of using this all the time for my Git commits, writing git commit -m"message" rather than git commit -m "message".

If Nu permitted this, it would provide an easy workaround for #5196 as well as for this issue.

@paulhey
Copy link

paulhey commented Jul 21, 2022

I've run into a similarly related issue trying to install a cargo package:

❯ cargo install pijul --version '~1.0.0-beta' --force
error: the `--version` provided, `/home/1.0.0-beta`, is not a valid semver version: cannot parse '/home/1.0.0-beta' as a semver

I also tried to escape the ~ character:

❯ cargo install pijul --version '\~1.0.0-beta' --force
error: the `--version` provided, `\~1.0.0-beta`, is not a valid semver version: cannot parse '\~1.0.0-beta' as a semver

Would it make sense to do something like PowerShell where single-quotes are string literals and double-quotes are able to be interpolated? Or like C# where you do @"..." for a verbatim string literal?

EDIT: For reference, this works under other shells (eg. fish, bash, etc)

@github-actions github-actions bot added the Stale used for marking issues and prs as stale label Jan 17, 2023
@paulhey
Copy link

paulhey commented Jun 28, 2023

This now works in nu 0.82.0:

❯ cargo install pijul --version '~1.0.0-beta' --locked
...

@BatmanAoD
Copy link
Contributor

It looks like 0.82.0 has completely different behavior with globbing now; it's entirely suppressed by either single or double quotes.

@sholderbach sholderbach added the glob-expansion Specific behavior around file-system globbing with regular commands or `glob` label Jul 9, 2023
@sholderbach
Copy link
Member

Thanks for the update @paulhey and @BatmanAoD

Closing as completed

IanManske pushed a commit that referenced this issue May 23, 2024
This PR is a complete rewrite of `run_external.rs`. The main goal of the
rewrite is improving readability, but it also fixes some bugs related to
argument handling and the PATH variable (fixes
#6011).

I'll discuss some technical details to make reviewing easier.

## Argument handling

Quoting arguments for external commands is hard. Like, *really* hard.
We've had more than a dozen issues and PRs dedicated to quoting
arguments (see Appendix) but the current implementation is still buggy.

Here's a demonstration of the buggy behavior:

```nu
let foo = "'bar'"
^touch $foo            # This creates a file named `bar`, but it should be `'bar'`
^touch ...[ "'bar'" ]  # Same
```

I'll describe how this PR deals with argument handling.

First, we'll introduce the concept of **bare strings**. Bare strings are
**string literals** that are either **unquoted** or **quoted by
backticks** [^1]. Strings within a list literal are NOT considered bare
strings, even if they are unquoted or quoted by backticks.

When a bare string is used as an argument to external process, we need
to perform tilde-expansion, glob-expansion, and inner-quotes-removal, in
that order. "Inner-quotes-removal" means transforming from
`--option="value"` into `--option=value`.

## `.bat` files and CMD built-ins

On Windows, `.bat` files and `.cmd` files are considered executable, but
they need `CMD.exe` as the interpreter. The Rust standard library
supports running `.bat` files directly and will spawn `CMD.exe` under
the hood (see
[documentation](https://doc.rust-lang.org/std/process/index.html#windows-argument-splitting)).
However, other extensions are not supported [^2].

Nushell also supports a selected number of CMD built-ins. The problem
with CMD is that it uses a different set of quoting rules. Correctly
quoting for CMD requires using
[Command::raw_arg()](https://doc.rust-lang.org/std/os/windows/process/trait.CommandExt.html#tymethod.raw_arg)
and manually quoting CMD special characters, on top of quoting from the
Nushell side. ~~I decided that this is too complex and chose to reject
special characters in CMD built-ins instead [^3]. Hopefully this will
not affact real-world use cases.~~ I've implemented escaping that works
reasonably well.

## `which-support` feature

The `which` crate is now a hard dependency of `nu-command`, making the
`which-support` feature essentially useless. The `which` crate is
already a hard dependency of `nu-cli`, and we should consider removing
the `which-support` feature entirely.

## Appendix

Here's a list of quoting-related issues and PRs in rough chronological
order.

* #4609
* #4631
* #4601
  * #5846
* #5978
  * #6014
* #6154
  * #6161
* #6399
  * #6420
  * #6426
* #6465
* #6559
  * #6560

[^1]: The idea that backtick-quoted strings act like bare strings was
introduced by Kubouch and briefly mentioned in [the language
reference](https://www.nushell.sh/lang-guide/chapters/strings_and_text.html#backtick-quotes).

[^2]: The documentation also said "running .bat scripts in this way may
be removed in the future and so should not be relied upon", which is
another reason to move away from this. But again, quoting for CMD is
hard.

[^3]: If anyone wants to try, the best resource I found on the topic is
[this](https://daviddeley.com/autohotkey/parameters/parameters.htm).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working glob-expansion Specific behavior around file-system globbing with regular commands or `glob` needs-design this feature requires design priority these issues are important Stale used for marking issues and prs as stale
Projects
None yet
Development

No branches or pull requests

8 participants