Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filesystem commands to access files with glob metachars in name #10694

Merged
merged 10 commits into from Oct 18, 2023

Conversation

bobhy
Copy link
Contributor

@bobhy bobhy commented Oct 12, 2023

(squashed version of #10557, clean commit history and review thread)

Fixes #10571, also potentially: #10364, #10211, #9558, #9310,

Description

Changes processing of arguments to filesystem commands that are source paths or globs.
Applies to cp, cp-old, mv, rm, du but not ls (because it uses a different globbing interface) or glob (because it uses a different globbing library).

The core of the change is to lookup the argument first as a file and only glob if it is not. That way,
a path containing glob metacharacters can be referenced without glob quoting, though it will have to be single quoted to avoid nushell parsing.

Before: A file path that looks like a glob is not matched by the glob specified as a (source) argument and takes some thinking about to access. You might say the glob pattern shadows a file with the same spelling.

> ls a*
╭───┬────────┬──────┬──────┬────────────────╮
│ # │  name  │ type │ size │    modified    │
├───┼────────┼──────┼──────┼────────────────┤
│ 0 │ a[bc]d │ file │  0 B │ 34 seconds ago │
│ 1 │ abd    │ file │  0 B │ now            │
│ 2 │ acd    │ file │  0 B │ now            │
╰───┴────────┴──────┴──────┴────────────────╯

> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd

> ## Note -- a[bc]d *not* copied, and seemingly hard to access.
> cp --verbose 'a\[bc\]d' dest
Error:   × No matches found
   ╭─[entry #33:1:1]
 1 │ cp --verbose 'a\[bc\]d' dest
   ·              ─────┬────
   ·                   ╰── no matches found
   ╰────

> #.. but is accessible with enough glob quoting.
> cp --verbose 'a[[]bc[]]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d

Before_2: if file has glob metachars but isn't a valid pattern, user gets a confusing error:

> touch 'a[b'
> cp 'a[b' dest
Error:   × Pattern syntax error near position 30: invalid range pattern
   ╭─[entry #13:1:1]
 1 │ cp 'a[b' dest
   ·    ──┬──
   ·      ╰── invalid pattern
   ╰────

After: Args to cp, mv, etc. are tried first as literal files, and only as globs if not found to be files.

> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
> cp --verbose '[a][bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd

After_2: file with glob metachars but invalid pattern just works. (though Windows does not allow file name to contain *.).

> cp --verbose 'a[b' dest
copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b

So, with this fix, a file shadows a glob pattern with the same spelling. If you have such a file and really want to use the glob pattern, you will have to glob quote some of the characters in the pattern. I think that's less confusing to the user: if ls shows a file with a weird name, s/he'll still be able to copy, rename or delete it.

User-Facing Changes

Could break some existing scripts. If user happened to have a file with a globbish name but was using a glob pattern with the same spelling, the new version will process the file and not expand the glob.

Tests + Formatting

After Submitting

remove deprecated MatchOptions::new() from  doc comments.
command args that might be path or glob;
fix 10571 by checking first for existing file before globbing.
@bobhy bobhy marked this pull request as draft October 12, 2023 08:42
@bobhy bobhy changed the title Allow filesystem commands to access files with glob metachars in name Allow filesystem commands to access files with glob metachars in name (v2) Oct 12, 2023
@fdncred
Copy link
Collaborator

fdncred commented Oct 12, 2023

I just ran toolkit check pr and it succeeds on this PR but if you run cargo test --workspace --profile ci --exclude nu_plugin_*, which is what our ci runs, this PR fails on Windows. We need to get toolkit.nu inline with CI I think.

I think the problem is that you're using an illegal character in Windows ?. https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file

@bobhy bobhy marked this pull request as ready for review October 12, 2023 22:34
@bobhy
Copy link
Contributor Author

bobhy commented Oct 12, 2023

Hoorah, tests all pass!
Please take a look and let me know what you think.

@fdncred
Copy link
Collaborator

fdncred commented Oct 13, 2023

I added some thoughts and context here #10364 (comment)

@amtoine amtoine changed the title Allow filesystem commands to access files with glob metachars in name (v2) Allow filesystem commands to access files with glob metachars in name Oct 14, 2023
@amtoine
Copy link
Member

amtoine commented Oct 14, 2023

@bobhy
when you say

Fixes #10571, also potentially: #10364, #10211, #9558, #9310

could you be bit more precise?
these issues are not super straighforward to understand and i'm not sure if this PR should close them or not when i look at the new tests you've added 😇

@bobhy
Copy link
Contributor Author

bobhy commented Oct 14, 2023

Well, I think the PR does close those issues, but it's up to the openers there to say. I left a comment in each, inviting the opener to take a look

@amtoine
Copy link
Member

amtoine commented Oct 14, 2023

Well, I think the PR does close those issues, but it's up to the openers there to say. I left a comment in each, inviting the opener to take a look

ooh right, i missed them, my bad 🙏

@bobhy
Copy link
Contributor Author

bobhy commented Oct 15, 2023

I just ran toolkit check pr and it succeeds on this PR but if you run cargo test --workspace --profile ci --exclude nu_plugin_*, which is what our ci runs, this PR fails on Windows. We need to get toolkit.nu inline with CI I think.

@fdncred
I agree toolkit.nu should match CI tests. Is there something checked into the project that defines the CI tests? I can't find it and am never sure I'm understanding what I see in CI run logs on github.

But, even it were perfectly aligned, toolkit check pr on linux is not going to detect a windows-only problem.

@fdncred
Copy link
Collaborator

fdncred commented Oct 15, 2023

@bobhy You need to look at the ci.yml.

For clippy, I currently don't like how this is separated because it makes it hard to diagnose when there are problems, but if you look here

CLIPPY_OPTIONS: "-D warnings -D clippy::unwrap_used"

and then here

run: cargo clippy --workspace ${{ matrix.flags }} --exclude nu_plugin_* -- $CLIPPY_OPTIONS

For tests here

run: cargo clippy --tests --workspace ${{ matrix.flags }} --exclude nu_plugin_* -- -D warnings

Plugins have their own section.

As it relates to Windows, you can definitely use toolkit check pr on Windows, but maybe you mean that the checks run in the environment you're currently using?

@fdncred fdncred merged commit 09b3dab into nushell:main Oct 18, 2023
20 checks passed
@fdncred
Copy link
Collaborator

fdncred commented Oct 18, 2023

Thanks @bobhy. Anxious to see how people like this.

@bobhy
Copy link
Contributor Author

bobhy commented Oct 18, 2023

Yah, me too. We'll never hear from the folks for whom it just does the "right" thing. I want to hear from the folks who say, "I typed rm foo and it deleted file foo, and that was the wrong thing to do"!

@fdncred
Copy link
Collaborator

fdncred commented Oct 18, 2023

The one thing that has always concerned me about this PR is metadata. Getting a file's metadata is notoriously slow. What concerns me is that now I think we're doing it multiple times. It's not like ls is super optimized but maybe something good to look at in the future is only calling the .metadata() method once per file for ls. The same holds true for cp, rm, mv, etc.

@bobhy
Copy link
Contributor Author

bobhy commented Oct 19, 2023

OK, but it's small potatoes. I'm doing 1 stat. If it succeeds, that's the only one. If fails, all these commands are going to expand the glob and do at least 1 stat per file they find (in nu_glob::fill_todo() to check for directory).

gaetschwartz pushed a commit to gaetschwartz/nushell that referenced this pull request Oct 20, 2023
…nushell#10694)

(squashed version of nushell#10557, clean commit history and review thread)

Fixes nushell#10571, also potentially: nushell#10364, nushell#10211, nushell#9558, nushell#9310,


# Description
Changes processing of arguments to filesystem commands that are source
paths or globs.
Applies to `cp, cp-old, mv, rm, du` but not `ls` (because it uses a
different globbing interface) or `glob` (because it uses a different
globbing library).

The core of the change is to lookup the argument first as a file and
only glob if it is not. That way,
a path containing glob metacharacters can be referenced without glob
quoting, though it will have to be single quoted to avoid nushell
parsing.

Before: A file path that looks like a glob is not matched by the glob
specified as a (source) argument and takes some thinking about to
access. You might say the glob pattern shadows a file with the same
spelling.
```
> ls a*
╭───┬────────┬──────┬──────┬────────────────╮
│ # │  name  │ type │ size │    modified    │
├───┼────────┼──────┼──────┼────────────────┤
│ 0 │ a[bc]d │ file │  0 B │ 34 seconds ago │
│ 1 │ abd    │ file │  0 B │ now            │
│ 2 │ acd    │ file │  0 B │ now            │
╰───┴────────┴──────┴──────┴────────────────╯

> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd

> ## Note -- a[bc]d *not* copied, and seemingly hard to access.
> cp --verbose 'a\[bc\]d' dest
Error:   × No matches found
   ╭─[entry nushell#33:1:1]
 1 │ cp --verbose 'a\[bc\]d' dest
   ·              ─────┬────
   ·                   ╰── no matches found
   ╰────

> #.. but is accessible with enough glob quoting.
> cp --verbose 'a[[]bc[]]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
```
Before_2: if file has glob metachars but isn't a valid pattern, user
gets a confusing error:

```
> touch 'a[b'
> cp 'a[b' dest
Error:   × Pattern syntax error near position 30: invalid range pattern
   ╭─[entry nushell#13:1:1]
 1 │ cp 'a[b' dest
   ·    ──┬──
   ·      ╰── invalid pattern
   ╰────
```

After: Args to cp, mv, etc. are tried first as literal files, and only
as globs if not found to be files.

```
> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
> cp --verbose '[a][bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd
```
After_2: file with glob metachars but invalid pattern just works.
(though Windows does not allow file name to contain `*`.).

```
> cp --verbose 'a[b' dest
copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b
```

So, with this fix, a file shadows a glob pattern with the same spelling.
If you have such a file and really want to use the glob pattern, you
will have to glob quote some of the characters in the pattern. I think
that's less confusing to the user: if ls shows a file with a weird name,
s/he'll still be able to copy, rename or delete it.

# User-Facing Changes
Could break some existing scripts. If user happened to have a file with
a globbish name but was using a glob pattern with the same spelling, the
new version will process the file and not expand the glob.

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use std testing; testing run-tests --path
crates/nu-std"` to run the tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

---------

Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
…nushell#10694)

(squashed version of nushell#10557, clean commit history and review thread)

Fixes nushell#10571, also potentially: nushell#10364, nushell#10211, nushell#9558, nushell#9310,


# Description
Changes processing of arguments to filesystem commands that are source
paths or globs.
Applies to `cp, cp-old, mv, rm, du` but not `ls` (because it uses a
different globbing interface) or `glob` (because it uses a different
globbing library).

The core of the change is to lookup the argument first as a file and
only glob if it is not. That way,
a path containing glob metacharacters can be referenced without glob
quoting, though it will have to be single quoted to avoid nushell
parsing.

Before: A file path that looks like a glob is not matched by the glob
specified as a (source) argument and takes some thinking about to
access. You might say the glob pattern shadows a file with the same
spelling.
```
> ls a*
╭───┬────────┬──────┬──────┬────────────────╮
│ # │  name  │ type │ size │    modified    │
├───┼────────┼──────┼──────┼────────────────┤
│ 0 │ a[bc]d │ file │  0 B │ 34 seconds ago │
│ 1 │ abd    │ file │  0 B │ now            │
│ 2 │ acd    │ file │  0 B │ now            │
╰───┴────────┴──────┴──────┴────────────────╯

> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd

> ## Note -- a[bc]d *not* copied, and seemingly hard to access.
> cp --verbose 'a\[bc\]d' dest
Error:   × No matches found
   ╭─[entry nushell#33:1:1]
 1 │ cp --verbose 'a\[bc\]d' dest
   ·              ─────┬────
   ·                   ╰── no matches found
   ╰────

> #.. but is accessible with enough glob quoting.
> cp --verbose 'a[[]bc[]]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
```
Before_2: if file has glob metachars but isn't a valid pattern, user
gets a confusing error:

```
> touch 'a[b'
> cp 'a[b' dest
Error:   × Pattern syntax error near position 30: invalid range pattern
   ╭─[entry nushell#13:1:1]
 1 │ cp 'a[b' dest
   ·    ──┬──
   ·      ╰── invalid pattern
   ╰────
```

After: Args to cp, mv, etc. are tried first as literal files, and only
as globs if not found to be files.

```
> cp --verbose 'a[bc]d' dest
copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d
> cp --verbose '[a][bc]d' dest
copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd
copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd
```
After_2: file with glob metachars but invalid pattern just works.
(though Windows does not allow file name to contain `*`.).

```
> cp --verbose 'a[b' dest
copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b
```

So, with this fix, a file shadows a glob pattern with the same spelling.
If you have such a file and really want to use the glob pattern, you
will have to glob quote some of the characters in the pattern. I think
that's less confusing to the user: if ls shows a file with a weird name,
s/he'll still be able to copy, rename or delete it.

# User-Facing Changes
Could break some existing scripts. If user happened to have a file with
a globbish name but was using a glob pattern with the same spelling, the
new version will process the file and not expand the glob.

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use std testing; testing run-tests --path
crates/nu-std"` to run the tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

---------

Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
}

// user wasn't referring to a specific thing in filesystem, try to glob it.
match glob_with_parent(&pattern.item, options, cwd) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @bobhy
Sorry for a simple question, why are you using glob_with_parent rather than glob_with?

WindSoilder added a commit that referenced this pull request Jan 26, 2024
…`du` commands (#11621)

# Description
This pr is a follow up to
[#11569](#11569 (comment))
> Revert the logic in #10694 and
apply the logic in this pr to mv, cp, rv will require a larger change, I
need to think how to achieve the bahavior

And sorry @bobhy for reverting some of your changes.

This pr is going to unify glob behavior on the given commands:
* open
* rm
* cp-old
* mv
* umv
* cp
* du

So they have the same behavior to `ls`, which is:
If given parameter is quoted by single quote(`'`) or double quote(`"`),
don't auto-expand the glob pattern. If not quoted, auto-expand the glob
pattern.

Fixes: #9558  Fixes: #10211 Fixes: #9310 Fixes: #10364 

# TODO
But there is one thing remains: if we give a variable to the command, it
will always auto-expand the glob pattern, e.g:
```nushell
let path = "a[123]b"
rm $path
```
I don't think it's expected. But I also think user might want to
auto-expand the glob pattern in variables.

So I'll introduce a new command called `glob escape`, then if user
doesn't want to auto-expand the glob pattern, he can just do this: `rm
($path | glob escape)`

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

# Tests + Formatting
Done

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

## NOTE
This pr changes the semantic of `GlobPattern`, before this pr, it will
`expand path` after evaluated, this makes `nu_engine::glob_from` have no
chance to glob things right if a path contains glob pattern.

e.g: [#9310
](#9310 (comment))
#10211

I think changing the semantic is fine, because it makes glob works if
path contains something like '*'.

It maybe a breaking change if a custom command's argument are annotated
by `: glob`.
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
…`du` commands (nushell#11621)

# Description
This pr is a follow up to
[nushell#11569](nushell#11569 (comment))
> Revert the logic in nushell#10694 and
apply the logic in this pr to mv, cp, rv will require a larger change, I
need to think how to achieve the bahavior

And sorry @bobhy for reverting some of your changes.

This pr is going to unify glob behavior on the given commands:
* open
* rm
* cp-old
* mv
* umv
* cp
* du

So they have the same behavior to `ls`, which is:
If given parameter is quoted by single quote(`'`) or double quote(`"`),
don't auto-expand the glob pattern. If not quoted, auto-expand the glob
pattern.

Fixes: nushell#9558  Fixes: nushell#10211 Fixes: nushell#9310 Fixes: nushell#10364 

# TODO
But there is one thing remains: if we give a variable to the command, it
will always auto-expand the glob pattern, e.g:
```nushell
let path = "a[123]b"
rm $path
```
I don't think it's expected. But I also think user might want to
auto-expand the glob pattern in variables.

So I'll introduce a new command called `glob escape`, then if user
doesn't want to auto-expand the glob pattern, he can just do this: `rm
($path | glob escape)`

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

# Tests + Formatting
Done

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

## NOTE
This pr changes the semantic of `GlobPattern`, before this pr, it will
`expand path` after evaluated, this makes `nu_engine::glob_from` have no
chance to glob things right if a path contains glob pattern.

e.g: [nushell#9310
](nushell#9310 (comment))
nushell#10211

I think changing the semantic is fine, because it makes glob works if
path contains something like '*'.

It maybe a breaking change if a custom command's argument are annotated
by `: glob`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants