Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted brackets aren't matched by % and # parameter expansion #290

Closed
Crestwave opened this issue Apr 26, 2019 · 12 comments
Closed

Quoted brackets aren't matched by % and # parameter expansion #290

Crestwave opened this issue Apr 26, 2019 · 12 comments

Comments

@Crestwave
Copy link
Contributor

Crestwave commented Apr 26, 2019

$ var=[
$ echo "${var#[}"

$ echo "${var#"["}"
[
@Crestwave Crestwave changed the title Quoted brackets aren't matched by % and # Quoted brackets aren't matched by % and # parameter expansion Apr 26, 2019
@andychu
Copy link
Contributor

andychu commented Apr 27, 2019

Thanks for the report! I'm not sure what's going on yet, but I discovered that bash behaves differently than other shells here.

if var is [foo], then

  • dash gives [foo] and foo] for unquoted and quoted
  • mksh gives the same thing
  • zsh gives a "bad pattern error"
  • bash gives foo] and foo]
  • osh gives foo] and [foo]. I will figure out something more reasonable; I suspect it has something to do with the fact that [ is a glob metacharacter and it's unbalanced here.

andychu pushed a commit that referenced this issue Apr 27, 2019
If var='[foo]', then we will get 'foo]'.

This edge case is pretty confusing (even by shell standards.)

For some reason, dash, mksh, and zsh disagree.  They must treat [ as an
invalid glob and leave it as '[foo]'?  In other words, they require
${var%'['}.

But bash and busybox ash agree, and the code seems to make sense -- i.e.
we perform glob unescaping if the arg word doesn't look like a glob,
because we already glob-escaped it.  LooksLikeGlob() is necessarily a
heuristic.

Fixes issue #290.
@andychu
Copy link
Contributor

andychu commented Apr 27, 2019

OK I just fixed this.

This case was pretty confusing! dash, mksh, and zsh actually agree on a different behavior.

But bash and busybox ash agree, and that behavior seems to make more sense, so I made OSH agree with them.

Thanks for the report -- keep them coming!

@Crestwave
Copy link
Contributor Author

This case was pretty confusing! dash, mksh, and zsh actually agree on a different behavior.

But bash and busybox ash agree, and that behavior seems to make more sense, so I made OSH agree with them.

Don't all shells agree on the reported issue (${var%"["})? There are inconsistencies when unquoted, as you mentioned, but OSH already behaved like Bash in that way.

Thanks for the report -- keep them coming!

Seems that single quoted brackets still aren't matched; this also doesn't work on / expansion. Also, I forgot to mention a similar problem: ${var/[[]} doesn't match. In this case, POSIX itself specifies that an opening bracket after the opening bracket in a regex match (opening opening bracket?) loses its special meaning. Also, strangely, it works in % and # parameter expansion; why do they behave differently? Conversely, the original issue didn't apply to / expansion.

@andychu
Copy link
Contributor

andychu commented Apr 27, 2019

Yes that's true, OSH already disagreed with dash/mksh/zsh.

I think the [[] issue you're talking about is known failing and covered here:

http://www.oilshell.org/release/0.6.pre18/test/spec.wwz/glob.html (compare cases 29-30 with 30-31)

I haven't decided what to do here yet, but I don't consider it high priority until a real and unpatchable script uses it.

[[] and []] should be written as [\[] and [\]], which removes all the ambiguity.


The fundamental reason for the difference is long and old... basically OSH uses libc pattern matching like glob() and fnmatch(). I don't implement my own string pattern matching algorithms.

But ksh originated % %% # ## and /, and ksh is not written against libc's pattern matching. It has its own engine with a different API. Those constructs were basically designed with "not libc" in mind.

In order to implement /, you need the position of matches, analogous to m.start(n) and m.end(n) in Python. You don't need this for % %% etc.

So OSH translates the / glob patterns to a regex, but it doesn't compile % glob patterns to regexes. In the latter case it can just use fnmatch(), rather than translating and regexec().

So there are some inconsistencies there, particularly with regard to Unicode. It may be worth opening another bug about this, but like I said I usually put them at high priority only when a "real" script tickles the behavior. (This one seemed to be worth fixing simply because I was confused at the behavior, and it was a bug.)


FWIW all shells have to jump through some hoops to implement ksh constructs... but most of them also have their own string matching algorithms, so it's not quite the same as the issue with OSH.

I'm pretty sure % etc. are worst-case quadratic in all shells because of the position info issue.

@andychu
Copy link
Contributor

andychu commented Apr 27, 2019

Another possibility is to generate a runtime error for ${var/[[]} rather than silently doing something non-POSIX... on the other hand, even recognizing to generate the error is probably hard.

@Crestwave
Copy link
Contributor Author

Ah, that makes sense. Other issues I've noticed are with array subscripts in arithmetic expansion (e.g., single-quoted/escape ones aren't expanded, associative arrays expand to the associative array in JSON or something), as well as "${var%"${var#?}"}" where var contains an ANSI escape code yielding nothing.

[[] and []] should be written as [\[] and [\]], which removes all the ambiguity.

Yes, this one I just stumbled upon interactively; I use ['[]'] in Bash scripts, although as I mentioned earlier, it's still broken in OSH. A single-quoted - also fails, and matching ['+-'] causes a fatal error.

I'm not sure what you consider a "real" script, but all of the issues I've reported have been from actual Bash programs, though not very practical ones, except for the one I mentioned. I tried to patch them, but there were too many issues.

I also tried an sh program, and encountered this issue and the problem with escape codes; I was only able to get it to work by patching it to use Bash features. I understand that these use more obscure features and aren't high priority, just want to let you know that they are used in some programs.

@andychu
Copy link
Contributor

andychu commented Apr 27, 2019

@Crestwave I don't recognize any of those bugs, except that I think the associative array issue may be fixed by #287.

I looked at the past issues you've filed and I don't see them.

Can you provide a repro for each one? Like a short code snippet, what OSH does, and what you think should happen?

What I mean by "real" is something that people use, i.e. rather than a contrived test case. You pointed to some on #274.

@andychu
Copy link
Contributor

andychu commented Apr 27, 2019

I use ['[]'] in Bash scripts, although as I mentioned earlier, it's still broken in OSH

In particular I don't recall this, and I am not even familiar with that syntax ... ?

@Crestwave
Copy link
Contributor Author

Crestwave commented Apr 30, 2019

Can you provide a repro for each one? Like a short code snippet, what OSH does, and what you think should happen?

Sure. Escaped/single-quoted variables in array subscripts in arithmetic expressions don't expand:

$ i=0
$ arr[i]=0
$ (( ++arr['$i''] ))
Line 3 of '/home/user/.config/oil/oshrc'                                                             
  (( ++arr['$i'] ))
           ^
osh warning: Invalid integer constant '$i'

If you're wondering what use this has, it's usually done with associative arrays, as you have to use the $ prefix for variables there and using it unescaped expands it twice, allowing code execution.

For completeness, associative arrays expand to JSON when expanded by arithmetic expressions:

$ declare -A arr
$ arr[0]=0
$ (( var = ${arr[0]} ))
$ (( var = arr[0] ))
*** Error has no source location info ***
fatal: Expected array in index expression, got {'0': '0'}

Escape sequences are incorrectly handled when combining % and # parameter expansion:

$ var=$'\n'
$ printf '%q\n' "${var#?}"
''
$ printf '%q\n' "${var%''}"
''$'\n'
$ printf '%q\n' "${var%"${var#?}"}"
''

Single-quotes are literal in quoted parameter expansion:

$ var=a
$ echo ${var#'a'}

$ echo "${var#'a'}"
a
$ var="'a'"
$ echo "${var#'a'}"

Matching ['+-'] in / parameter expansion results in a fatal error:

$ var=+-
$ echo "${var//['+-']}"
Line 16 of '/home/user/.config/oil/oshrc'                                                            
  echo "${var//['+-']}"
             ^
fatal: Error matching regex "(['+-'])": Invalid regex syntax (func_regex_first_group_match)

What I mean by "real" is something that people use, i.e. rather than a contrived test case.

The reports I've given are from actual programs I've tried, except for the one issue I mentioned, although this particular batch isn't exactly very popular: [sh] [bash].

In particular I don't recall this, and I am not even familiar with that syntax ... ?

It's simply single-quoted instead of double-quoted/escaped.

@andychu
Copy link
Contributor

andychu commented Apr 30, 2019

Thanks, these are good test cases! I reproduced all of them as a difference between OSH and bash. bash and mksh seem to agree on most of them, but not all.

Some of them may boil down to OSH's more static evaluation model, but I will think about them and either fix or document them after some hacking.

Let me create some new bugs to discuss them in isolation.

@andychu
Copy link
Contributor

andychu commented May 3, 2019

Reviewing the rest of the list you sent, I filed #294, that is indeed a bug.

The case

  (( ++arr['$i'] ))

is an intentional incompatibility, documented here:

https://github.com/oilshell/oil/blob/master/doc/osh-manual.md

http://www.oilshell.org/blog/2016/10/20.html

That can be written ++arr[i], ++arr[$i], or ++arr["$i"] instead.

There may two related issues there: the undecidable parse issue, and also the recursive arithmetic evaluation rule #3, which is also intentionally not implemented. Arithmetic expressions are only evaluated once unlike in ksh/bash. It's more predictable and also more secure. The ksh/bash behavior is like a silent eval that treats data as code, and can be dangerous / surprising.


Please file more issues as you find them!

@andychu
Copy link
Contributor

andychu commented May 14, 2019

This fix is released! Latest version of OSH: http://www.oilshell.org/release/0.6.pre20/

@andychu andychu closed this as completed May 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants