Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index expansion operators can be escaped when unquoted #7969

Open
kopischke opened this issue May 3, 2021 · 11 comments
Open

Index expansion operators can be escaped when unquoted #7969

kopischke opened this issue May 3, 2021 · 11 comments
Labels
bug Something that's not working as intended
Milestone

Comments

@kopischke
Copy link

kopischke commented May 3, 2021

Running fish version 3.2.2 (installed via Homebrew) on macOS 11.3 in the stock Terminal.app. I tried fish without third-party customisations by executing sh -c 'env HOME=$(mktemp -d) fish' and checked whether it affected the behaviour I am reporting (it did not).


While exploring some finer points of the fish syntax (still working on that fish syntax plugin for the Nova editor – believe me, nobody’s sorrier than me to keep on coming up with these issues) I have run into a peculiarity of index range expansion I am pretty sure should be considered a bug. Specifically, the index range expansion operators [ and ] can be character escaped if the index range expansion is unquoted:

set range (seq 3)
echo $foo[1]        # => 1
echo $foo\1331\135  # => 1

The second echo statement’s output is both inconsistent with how all other operator-ish characters, paired or not, are treated by fish, and with user expectations as to what escaping should achieve (for instance, echo foo\133bar will output foo[bar, escaping the unpaired bracket syntax error).

Am I correct in assuming this is not intended behaviour? I could model this in my syntax plugin if I must, but this screams “bug” at me …

@zanchey zanchey added the bug Something that's not working as intended label May 3, 2021
@zanchey zanchey added this to the fish-future milestone May 3, 2021
@ridiculousfish
Copy link
Member

Nice find, also affects escaped chars:

set foo 1 2 3
echo $fo\157    # 1 2 3

@ridiculousfish
Copy link
Member

This is mercifully easy to fix - we can just prepend INTERNAL_SEPARATOR like we do for quotes.

@ridiculousfish
Copy link
Member

Fixed as 555af37. Thanks for filing!

@zanchey zanchey modified the milestones: fish-future, fish 3.3.0 May 6, 2021
@kopischke
Copy link
Author

kopischke commented May 6, 2021

@ridiculousfish from the point of view of someone working on a syntax model for fish, this is fantastic, as it hugely simplifies parsing variable expansion. However, I am wondering about a few points related to this change:

  • What about valid variable identifier characters that cannot easily be input as literals in the user’s locale? After all, the valid character set is a pretty huge Unicode subset. I had assumed the idea in allowing escapes was to allow including such characters as Unicode escapes …
  • What about variable definitions? fish allows the full range of escapes in variable names when defined, i.e. in set expressions and for … in loops (but, interestingly, not in in variable overrides à la foo=bar cmd $foo).

@ridiculousfish
Copy link
Member

  • What about valid variable identifier characters that cannot easily be input as literals in the user’s locale? After all, the valid character set is a pretty huge Unicode subset. I had assumed the idea in allowing escapes was to allow including such characters as Unicode escapes …

This is a good theoretical point; in practical terms, I'm not sure anyone would define a non-ASCII variable name. Maybe there's a good reason I'm missing.

  • What about variable definitions? fish allows the full range of escapes in variable names when defined, i.e. in set expressions and for … in loops (but, interestingly, not in in variable overrides à la foo=bar cmd $foo).

Yeah, a set command will undergo normal expansion before executing the builtin. For example set "foo" bar is fine; the quotes get removed as normal. There's no special behavior there.

The variable-assignment style like foo=bar is an interesting case, but I think it is correct to not allow escapes in the variable name. That syntax always struck me as a little funny since it overlaps with valid command names.

There's a practical consideration, which is when constructing the ast we don't want to go through the full set of expansions. For example we recognize function --help but not function '--help' for speed and simplicity.

@kopischke
Copy link
Author

kopischke commented May 11, 2021

I'm not sure anyone would define a non-ASCII variable name. Maybe there's a good reason I'm missing.

How quintessentially American a thing to say. As a non English speaker, I have always felt that fish not adhering to the philosophy goes a long way towards filling “friendly” with life …

But assuming the intent is not to reduce the valid variable character set to [a-zA-Z0-9_](that would be a very unfriendly change), my point is that the current solution has traded in a benign, if partly undefined, behaviour for heightened complexity. Instead of “you can / cannot use character escapes in variable names, period“, which would be ideal consistency-wise, or of “you can use character escapes in variable names, except in overrides” (the state pre commit), we now have “you cannot use character escapes in variable names in expansions, but you can when setting variables, unless it’s in an override“). User experience wise, even if you ignore the fact that set accepts character escapes in lieu of indexing brackets, I find that hard to consider an improvement.

The variable-assignment style like foo=bar is an interesting case, but I think it is correct to not allow escapes in the variable name. That syntax always struck me as a little funny since it overlaps with valid command names.

Well, with invalid command names, actually: fish will complain about a naked foo=bar being a POSIX shell-ism and pointedly note you should be using set.

There's a practical consideration, which is when constructing the ast we don't want to go through the full set of expansions. For example we recognize function --help but not function '--help' for speed and simplicity.

Duly noted, and can’t I relate to that, but I have to point out that, at the same time, fish has the most, ah, expansive expansion rules for keywords of any shell I know, so 'func'ti\157n --help will work. I understand these are processed at different points in the parser, but honestly, if I had to explain where escaping works in fish, “it does everywhere, except where it doesn’t“ is the best I could do, and I have spent the last months so deeply ensconced in fish syntax that I’m starting to dream fish code.

Anyway, I have managed to correctly handle all the nuances of brace expansion versus literal brace usage and The Case of the Amazing Illegal Opening Bracket in my extension, so changing the parsing for variable expansion is no biggie; I just wanted to be sure the tradeoffs are considered acceptable enough for this to stay in fish.

@faho
Copy link
Member

faho commented May 12, 2021

How quintessentially American a thing to say.

This seems rather condescending, and I would prefer not to have that sort of talk here.

Note also that I am very much not american, and I agree with @ridiculousfish.

You should be naming your variables in ASCII, because that makes it much easier to work with other people and because that removes the considerations of locale from variable names. To be honest iswalnum is awful to use here because it relies on locale, which means set -l ⓣ 22; set -l breaks with LC_ALL=C. I'm also pretty sure it depends on the OS' supported unicode version. We should probably stop using it, and that basically means restricting variable names further.

Well, with invalid command names, actually

A valid command name is any valid filename. You can call a file "foo=bar" and make it executable. You won't be able to call it via $PATH lookup, because it overlaps with "var=val" syntax, but that's _fish's point.

@kopischke
Copy link
Author

kopischke commented May 12, 2021

How quintessentially American a thing to say.

This seems rather condescending, and I would prefer not to have that sort of talk here.

No condescension intended, and I sincerely apologise to @ridiculousfish for my flippancy. Believe it or not, what I was actually trying to do is keep my cool, because decades of (almost invariably American) developers bemusedly wondering why anyone would want more than the character set that works for them have worn my nerves paper thin and reduced my willingness to explain why to nil. My choice of reaction obviously backfired, so, again: my apologies. I’m not here to pick a fight.

You should be naming your variables in ASCII, because that makes it much easier to work with other people and because that removes the considerations of locale from variable names.

I’m afraid we will have to disagree on that one: most of my scripts are for my personal use, so portability (if there even is such a thing in the shell universe) is not a consideration, and I haven't used a locale that was not UTF-8 compatible for at least 15 years. Under these conditions, I prefer expressiveness, which means, among other things, using my own language for identifiers. I’m pretty sure I am not the only one, as I have come across localised identifiers in other people’s scripts and snippets; not everybody is fluent in English, and not everybody values portability over expressiveness. In 2021, in the age of ubiquitous Unicode, trying to accommodate this should be a given.

To be honest iswalnum is awful to use here because it relies on locale, which means set -l ⓣ 22; set -l breaks with LC_ALL=C. I'm also pretty sure it depends on the OS' supported unicode version. We should probably stop using it, and that basically means restricting variable names further.

No quibbles with the assessment of iswalnum – it is pretty horrible. However, that is a C++ idiosyncrasy, and as such I’d qualify it as technical debt; allowing identifier characters beyond those included in a standard from 1963 is not a design flaw (in 1963, I’d be hacking this into a teletype, except there’d be no point, because ARPANET would still be a glimmer in the eye of its inventors). Let’s not throw out the baby with the bath water.

Well, with invalid command names, actually

A valid command name is any valid filename. You can call a file "foo=bar" and make it executable. You won't be able to call it via $PATH lookup, because it overlaps with "var=val" syntax, but that's _fish's point.

I’ll grant you that pointing this out in my previous comment was needlessly flippant, too, and I am quite willing to apologise a third time, should @ridiculousfish wish for it. As to the matter itself, it boils down to my poor choice of words.

What I meant by “invalid” was not that foo=bar is an invalid command name as such, but that, without quoting or escaping the equal sign, it is invalid in command position. As it happens, it has been so for quite some time, independently of the override syntax: parse_error_bare_variable_assignment, the error triggered by trying to execute foo=bar, predates #6287, which introduced the variable override feature (that PR also disabled the error handling, requiring its restoration in f6a6251). To this day, the error message fails to account for the change, and only mentions one really should be using set.

But that is a moot point. As I said: I’m not here to pick a fight; I’ll try to pick my words more carefully next time, instead.

@krobelus
Copy link
Member

ok so unicode characters are already valid, and escapes don't seem really necessary here so the current state is fine IMO

To this day, the error message fails to account for the change, and only mentions one really should be using set.

yeah, because the equivalent to bash's a=b (not followed by a command) is fish's set a b.
We can't tell if the user actually wanted to use variable overrides. Also set --help does include an example with overrides.

@kopischke
Copy link
Author

kopischke commented May 13, 2021

To this day, the error message fails to account for the change, and only mentions one really should be using set.

yeah, because the equivalent to bash's a=b (not followed by a command) is fish's set a b.

We can't tell if the user actually wanted to use variable overrides. Also set --help does include an example with overrides.

Of course not, but changing the message to something along the lines of “Variable assignments with '=' are only supported when followed by a command. In fish, please use set a=b.” might be clearer and invite discovery of the override feature (personally, I would never have found it through that error message. I read up on fish specific syntax when I switched, and what I assumed a=b would do was, actually, to call a command with that name).

@faho faho reopened this Jun 10, 2021
@faho
Copy link
Member

faho commented Jun 10, 2021

This was reverted in 7059eaa because of #7980.

@faho faho modified the milestones: fish 3.3.0, fish-future Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that's not working as intended
Projects
None yet
Development

No branches or pull requests

5 participants