Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nu-cli/completions: better fix for files with special characters #5254

Merged
merged 19 commits into from
Apr 28, 2022

Conversation

herlon214
Copy link
Contributor

@herlon214 herlon214 commented Apr 20, 2022

Description

Creating this PR for for discussing a better implementation for handling paths with special characters.

Related:

Also, created a function so that logic can be reused.

Tests

Make sure you've run and fixed any issues with these commands:

  • cargo fmt --all -- --check to check standard code formatting (cargo fmt --all applies these changes)
  • cargo clippy --all --all-features -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect to check that you're using the standard code style
  • cargo build; cargo test --all --all-features to check that all the tests pass

@herlon214
Copy link
Contributor Author

herlon214 commented Apr 20, 2022

I'm not sure if we should match other cases:

path.contains(' ') || path.contains('\\') || path.contains('"') || path.contains('`')

Let me know what you guys think about it.

@herlon214 herlon214 force-pushed the fix/completion-quote-general-case branch from 19115e4 to 8ec1f6e Compare April 20, 2022 15:34
@jansol
Copy link

jansol commented Apr 20, 2022

Since you are already splitting the function into a check and a separate escape replacement step I'd just iterate over over the string once and check against all characters that trigger quoting in one go rather than iterating over the string separately for each character like the current code does:

let needs_escape = path.chars().fold( false, |acc, x|
    acc
    || " \\\"'`".contains(x) // or write out the comparisons against each possible char
);

In fact that can still be sped up without compromising anything:

  • On a quick glance it looks like the code operates on a String, which is guaranteed to be UTF-8
  • UTF-8 consists essentially of two parts: good old 7-bit ASCII and a scheme where bytes with the leading bit set to 1 to signify that they are part of a multibyte sequence.
  • All characters that have a special meaning in the nu language are ASCII* so we can pretend multibyte sequences don't even exist for the purpose of determining whether to quote or not

*not sure how non-breaking, zero-width and fullwidth spaces are handled in nu

So let's iterate & compare individual bytes instead:

let needs_escape = path.bytes().fold( false, |acc, x|
    acc
    || b" \\\"'`".contains(x)
);

However if we don't mind being a bit more zealous with triggering the escape than perhaps necessary (may or may not be a useful thing, I'm not sure how much string interpolation etc happens after this part) we can further optimize this: ASCII is quite deliberate with its character layout in terms of bit patterns. If you look closely you'll notice that a lot of the usual special characters in shells have their top bits fixed to 0b0010.

let needs_escape = path.bytes().fold(false, |acc, x|
    acc
    || (x >> 4) == 0b0010
    || x == '\\' as u8 // 0x5c
    || x == '`' as u8 // 0x60
);

Or simply go with the lazy shotgun approach:

let needs_escape = path.bytes().fold( false, |acc, x|  acc || x.is_ascii_punctuation() );

For this particular case that is not the most efficient solution as it appears that simply checks each individual punctuation character instead of shortening the check based on the bit patterns. Not that it should really matter in this day and age. 😆

@jansol
Copy link

jansol commented Apr 20, 2022

In case you are not familiar with fold it is a functional way to accumulate some value over an iterator by invoking a function for each element, essentially:

fn fold(iterator, init, f) -> typeof(init) {
  accumulator = init
  for x in iterator {
    accumulator = f(accumulator, x)
  }
  return accumulator
}

@herlon214
Copy link
Contributor Author

Could you check now @jansol?

@herlon214
Copy link
Contributor Author

herlon214 commented Apr 21, 2022

The tests are failing, looks like it's escaping more than it should:

  left: `"/Users/runner/work/nushell/nushell/tests/fixtures/completions/nushell"`,
 right: `"\"/Users/runner/work/nushell/nushell/tests/fixtures/completions/nushell\""`', crates/nu-cli/tests/test_completions.rs:66:9

This case shouldn't be escaped, right?

It's due to the || (x >> 4) == 0b0010 the character / fits on it.

@jansol
Copy link

jansol commented Apr 21, 2022

Yeah that case does not really need quoting. This is what I meant with the comment about being unnecessarily trigger-happy with quoting.

@jansol
Copy link

jansol commented Apr 21, 2022

If you still want to quote the rest of the characters in that column, forward slash is easy to special case out of that condition:
|| ((x >> 4) == 0b0010 && x != b'/'

@herlon214
Copy link
Contributor Author

wow, nice one @jansol !
I added more tests and your implementation

@herlon214 herlon214 force-pushed the fix/completion-quote-general-case branch from a8ba019 to 95e689c Compare April 21, 2022 10:25
Copy link

@jansol jansol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trashes non-ascii characters in its current form.

crates/nu-cli/src/completions/file_completions.rs Outdated Show resolved Hide resolved
crates/nu-cli/src/completions/file_completions.rs Outdated Show resolved Hide resolved
@fdncred
Copy link
Collaborator

fdncred commented Apr 21, 2022

@jansol @herlon214 we may want to take a step back. let's try this. let's just get something working, land the pr, then think more about optimizing it. so, this pr should be for fixing the current situation. I'm also not very interested in unsafe code.

@herlon214
Copy link
Contributor Author

@fdncred I'm not sure iif we can reproduce the tests on windows, the CI is showing error:

  Error: error: invalid path 'tests/fixtures/completions/`needs_escape""'
  Error: The process 'C:\Program Files\Git\bin\git.exe' failed with exit code 128

@fdncred
Copy link
Collaborator

fdncred commented Apr 21, 2022

I'm on windows and I can't check out this pr. I get this

remote: Enumerating objects: 101, done.
remote: Counting objects: 100% (101/101), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 90 (delta 63), reused 81 (delta 54), pack-reused 0
Unpacking objects: 100% (90/90), 16.71 KiB | 63.00 KiB/s, done.
From https://github.com/nushell/nushell
 * [new ref]           refs/pull/5254/head -> fix/completion-quote-general-case
error: invalid path 'tests/fixtures/completions/`needs_escape""'
exit status 1

@fdncred
Copy link
Collaborator

fdncred commented Apr 21, 2022

ah, double quotes are forbidden in file names (and paths i assume)
https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names
this is the same list but from microsoft
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

so we can't have a folder with "" in it like it is above.

@jansol
Copy link

jansol commented Apr 21, 2022

I would also add test cases for a path with multibyte characters that needs escaping and one that does not.

Let's say いろは'ファイル and いろは_ファイル

For the sake of cross-platform testing the escaping function should probably just get a unit test that just works with hardcoded paths instead of touching any files so we don't need to worry about the limitations of windows.

@herlon214
Copy link
Contributor Author

@jansol that's exactly what I'm doing right now.

@herlon214
Copy link
Contributor Author

@fdncred I don't think it will be possible to create the test for windows. Once the path will always be parsed AND the path string from Path will never be the same as the parsed.

@fdncred
Copy link
Collaborator

fdncred commented Apr 22, 2022

@herlon214 if you're trying to create a test to parse an path in windows that has illegal characters like double quotes, then a test isn't necessary because the OS will prevent that from happening and it'll never get to nushell to figure out what to do. so, in that case, it doesn't matter if there's a test.

@jansol
Copy link

jansol commented Apr 22, 2022

How does that interact with things like WSL? Linux allows almost all characters in filenames (except forward slash '/') after all, provided that the underlying filesystem supports them.

@fdncred
Copy link
Collaborator

fdncred commented Apr 22, 2022

How does that interact with things like WSL? Linux allows almost all characters in filenames (except forward slash '/') after all, provided that the underlying filesystem supports them.

it would work exactly the same way it work on any linux installation since WSL is, in fact, running in a linux vm on the ext4 file system.

@herlon214
Copy link
Contributor Author

@herlon214 if you're trying to create a test to parse an path in windows that has illegal characters like double quotes, then a test isn't necessary because the OS will prevent that from happening and it'll never get to nushell to figure out what to do. so, in that case, it doesn't matter if there's a test.

@fdncred no, no.
If we check the test that failed in the CI, it's basically because we're escaping the \ from the Windows paths.
So rust's Path will be D:\a\nushell\nushell\tests\fixtures\completions\test_a\ for this case the escape shouldn't be necessary (if OS == windows). The thing is: the escape function isn't aware of OS, so it escapes the path to "D:\\a\\nushell\\nushell\\tests\\fixtures\\completions\\test_a\\"

So we have:

  • D:\a\nushell\nushell\tests\fixtures\completions\test_a\ != "D:\\a\\nushell\\nushell\\tests\\fixtures\\completions\\test_a\\"

Therefore, the same will apply to all the other tests.

I think for windows it should only escape when the \:

  • Is not in the beginning of the file/folder
  • Is not in the end of the file/folder

And for other OS it doesn't need to be escaped, is that correct?

@jansol
Copy link

jansol commented Apr 22, 2022

If '\' is the control character used for escaping special characters in nu strings it should always be escaped.

@herlon214 herlon214 force-pushed the fix/completion-quote-general-case branch from 83b74d9 to c20aa31 Compare April 23, 2022 22:30
@fdncred
Copy link
Collaborator

fdncred commented Apr 27, 2022

@herlon214 can we make the test that's failing just work on *nix and bypass windows. i know we do that on a few tests.

@merelymyself
Copy link
Contributor

The test that’s failing isn’t failing when it’s called, but when it’s compiled. It’s trying to call a val that doesn’t exist.

Copy link
Contributor

@merelymyself merelymyself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should resolve the errors.

crates/nu-cli/tests/test_completions.rs Outdated Show resolved Hide resolved
crates/nu-cli/tests/test_completions.rs Show resolved Hide resolved
@herlon214
Copy link
Contributor Author

@merelymyself yeah that's the latest error, this is easy to solve, but isn't the error we've been discussing. I was just trying to test some ideas and made that compile error

@herlon214
Copy link
Contributor Author

@fdncred yeah that should be doable, we can proceed with that at least for now.

@herlon214
Copy link
Contributor Author

@fdncred tests passing now, ignoring windows.

@fdncred
Copy link
Collaborator

fdncred commented Apr 28, 2022

ok, let's land this then. thanks to the team for all the hard work and effort.

@fdncred fdncred merged commit 3cf3329 into nushell:main Apr 28, 2022
@herlon214 herlon214 deleted the fix/completion-quote-general-case branch April 29, 2022 09:03
fdncred added a commit that referenced this pull request Apr 29, 2022
fdncred added a commit that referenced this pull request Apr 29, 2022
fennewald pushed a commit to fennewald/nushell that referenced this pull request Jun 27, 2022
…hell#5254)

* nu-cli/completions: fix paths with special chars

* add backticks

* fix replace

* added single quotes to check list

* check escape using fold

* fix clippy errors

* fix comment line

* fix conflicts

* change to vec

* skip sort checking

* removed invalid windows path

* remove comment

* added tests for escape function

* fix fn import

* fix fn import error

* test windows issue fix

* fix windows backslash path in the tests

* show expected path on error

* skip test for windows
fennewald pushed a commit to fennewald/nushell that referenced this pull request Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants