Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common bash-compatible syntax to avoid expansion of (associative) array subscripts (ble.sh) #645

Closed
akinomyoga opened this issue Mar 9, 2020 · 13 comments

Comments

@akinomyoga
Copy link
Collaborator

Issue #620

((_ble_builtin_history_rskip_dict[$file]+=$2))

maybe that can be worked around with dict['$file']

Related to the above issue, I noticed that the interpretation is different in Oil and Bash. In Bash, the subscripts of arrays and associative arrays in arithmetic expressions are subject of the shell expansions in the arithmetic evaluation phase. As a consequence, for example, the following commands produces the same results in Bash:

$ bash -c 'declare -A d=([a]=123); x=a; echo $((d[$x]))'
123
$ bash -c 'declare -A d=([a]=123); x=a; echo $((d["$x"]))'
123
$ bash -c 'declare -A d=([a]=123); x=a; echo $((d[\$x]))'
123
$ bash -c 'declare -A d=([a]=123); x=a; echo $((d['\''$x'\'']))'
123

Oil produces syntax errors for the latter two cases:

$ bin/osh -c 'declare -A d=([a]=123); x=a; echo $((d[$x]))'
123
$ bin/osh -c 'declare -A d=([a]=123); x=a; echo $((d["$x"]))'
123
$ bin/osh -c 'declare -A d=([a]=123); x=a; echo $((d[\$x]))'
  declare -A d=([a]=123); x=a; echo $((d[\$x]))
                                         ^
[ -c flag ]:1: Unexpected token in arithmetic context
$ bin/osh -c 'declare -A d=([a]=123); x=a; echo $((d['\''$x'\'']))'
[??? no location ???] warning: Undefined value in arithmetic context
0

One of the reason why Bash performs the shell expansions on the array subscript in the arithmetic evaluation phase is related to the case where the array subscript variable $x contains a value resembling arithmetic expressions. In Bash, normal shell expansions are processed before parsing the arithmetic expression. So the array subscript needs to be deferred by quoting. See the following example:

$ bash -c 'x=1],e[; declare -A d=(["1],e["]=123); echo $((a=d[$x]))'
bash: e[]: bad array subscript
bash: e[]: bad array subscript
0
$ bash -c 'x=1],e[; declare -A d=(["1],e["]=123); echo $((a=d["$x"]))'
bash: e[]: bad array subscript
bash: e[]: bad array subscript
0
$ bash -c 'x=1],e[; declare -A d=(["1],e["]=123); echo $((a=d['\''$x'\'']))'
123
$ bash -c 'x=1],e[; declare -A d=(["1],e["]=123); echo $((a=d[\$x]))'
123
$

For this reason, the parameter expansion of the variables (whose value is not under control of the script) appearing in the array subscripts of the arithmetic expression must be quoted always in Bash scripts. If some Bash script has unquoted parameter expansion of array subscripts in the arithmetic expression, it's a bug of that script. This Bash behavior seems strange and not user friendly, but that's a sad compatibility burden coming from the shell history.

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

I think this it the same issue as #648. Arithmetic is always static, and there's no recursive evaluation, regardless of where it appears -- in array indices, or in (( )). There are about a dozen places where arithmetic appears and they all obey the same rules.

@andychu
Copy link
Contributor

andychu commented Mar 10, 2020

OK now that I look carefully I see what's going on...

This is very confusing and exactly why I didn't reproduce that behavior in Oil!

However I see that it would be better to have a single syntax that works in BOTH bash and Oil. That is, a common subset.

Let me think about this...


BTW is A[\$key] documented anywhere? That syntax does not match my idea of bash syntax for a variable that shouldn't be expanded.

@andychu andychu changed the title Expansion of (associative) array subscripts (ble.sh) Common bash-compatible syntax to avoid expansion of (associative) array subscripts (ble.sh) Mar 10, 2020
@andychu
Copy link
Contributor

andychu commented Mar 10, 2020

If some Bash script has unquoted parameter expansion of array subscripts in the arithmetic expression, it's a bug of that script. This Bash behavior seems strange and not user friendly, but that's a sad compatibility burden coming from the shell history.

This feels like a bug in bash to me. The bash maintainer has a habit of claiming that bugs were intentional behavior when they were really accidents of the implementation.

Is it documented anywhere?

Not that it really changes anything, because what's done is done, but still I would call it more like working around a bug in bash than anything.


Also "$x" seems fine in bash 4.3 ? Using your exact example:

$ bash --version
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
...

andy@lisa:~/git/oilshell/oil$  bash -c 'x=1],e[; declare -A d=(["1],e["]=123); echo $((a=d["$x"]))'
123

I do NOT get bash: e[]: bad array subscript

@akinomyoga
Copy link
Collaborator Author

BTW is A[$key] documented anywhere? That syntax does not match my idea of bash syntax for a variable that shouldn't be expanded.

It's not a special construct of the form A[\$key] but just a normal quote \$ inside an array subscript. It's just the same usage as the following code for the case you don't want the variable to be expanded,

$ key=123; echo $key \$key
123 $key
$

This feels like a bug in bash to me. The bash maintainer has a habit of claiming that bugs were intentional behavior when they were really accidents of the implementation.

Actually I have a similar feeling, but I don't know better solutions as far as one wants to support (associative) arrays by extending the POSIX arithmetic expression. POSIX specifies the order of the shell expansions and the arithmetic evaluation for the construct $(()):

From POSIX XCU 2.6.4

... The shell shall expand all tokens in the expression for parameter expansion, command substitution, and quote removal.

Next, the shell shall treat this as an arithmetic expression and substitute the value of the expression.

In principle, we might not have to follow the POSIX when extensions such as the associative arrays are used, but it's even worse that we have different syntactic treatment of arithmetic expressions depending on whether the extension is contained or not in the expression. As far as we follow the POSIX evaluation order, x=1],e[; $((a=d[$x])) and $((a=d["$x"])) must be equivalent to $((a=d[1],e[])).

Zsh behaves similarly to Bash in this situation (but not exactly the same). Zsh also performs extra array-subscript expansions. I just tried zsh to obtain the following results. Zsh only works as expected with d[\$x].

$ zsh --version
zsh 5.7.1 (x86_64-redhat-linux-gnu)
$ zsh -c 'x=1],e[; declare -A d=(["$x"]=123); echo $((d[$x]))'
0
$ zsh -c 'x=1],e[; declare -A d=(["$x"]=123); echo $((d["$x"]))'
0
$ zsh -c 'x=1],e[; declare -A d=(["$x"]=123); echo $((d[\$x]))'
123
$ zsh -c 'x=1],e[; declare -A d=(["$x"]=123); echo $((d['\''$x'\'']))'
0

I tried ksh and found that ksh seems to have given up complete compatibility with POSIX. It seems like evaluating $x as an independent arithmetic expression. Nevertheless ksh performs extra array subscript expansion (although it doesn't support associative array. Maybe there are additional rationale for this behavior other than associative array).

$ ksh -c 'x=1],e[1; d[9]=123; echo $((d[$x]))'
ksh: 1],e[1: arithmetic syntax error
$ ksh -c 'x=1],e[1; d[9]=123; echo $((d["$x"]))'
ksh: 1],e[1: arithmetic syntax error
$ ksh -c 'x=9; d[9]=123; echo $((d[\$x]))'
123
$ ksh -c 'x=9; d[9]=123; echo $((d['\''$x'\'']))'
123

Actually this topic pops up occasionally in bug-bash mailing list, but the behavior is unlikely to be changed. Also I don't have an idea on what would be the better solution. For example, see the following discussions in the mailing list:

https://lists.gnu.org/archive/html/bug-bash/2014-06/msg00003.html
https://lists.gnu.org/archive/html/bug-bash/2014-10/msg00154.html
https://lists.gnu.org/archive/html/bug-bash/2015-02/msg00066.html

Is it documented anywhere?

I would say nowhere. In the above 2014-06 discussion, the Bash maintainer, Chet, explains the behavior quoting the Bash documents, but actually I haven't yet understood his logic for this. Chet tends to say something like "it's implied by the description here".


Also "$x" seems fine in bash 4.3 ? Using your exact example:

Oh, I didn't know that. Thank you!

@andychu
Copy link
Contributor

andychu commented Mar 10, 2020

Yeah bash 4.3 is already 7 years old, so I think this is just an old bug in bash. And yes those threads are good examples of the confusion over bash -- i.e. undocumented behavior and post-hoc rationalizations of the implementation. That is one of the main reasons for Oil.

In any case the \$file construct seemingly appears in only one place in your program. It can also be worked around like:

(( a = 1 + A[\$key] ))    

# rewrite
tmp=${A["$key"]}
(( a = 1 + tmp ))

(( A[\$key] += 1 ))

# rewrite
tmp=${A["$key"]}
A["$key"]=$(( tmp + 1 ))

Since you're avoiding associative arrays within (( )), I think that should work with any version of bash.

@andychu
Copy link
Contributor

andychu commented Mar 10, 2020

Also, '$file' and \$file are the same in command context. But A['$file'] and $A[\file] are apparently not the same in some version of bash, which doesn't make sense.

@andychu
Copy link
Contributor

andychu commented Mar 11, 2020

OK I don't think there's anything to be done here (whew)

@andychu andychu closed this as completed Mar 11, 2020
@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Mar 11, 2020

But A['$file'] and $A[\file] are apparently not the same in some version of bash, which doesn't make sense.

I think ((A["$file"])) is just a bug. The other behavior is consistent as far as I know.

It actually depends on the context. In the construct $((expression)), POSIX requires that the expression must be treated as if it is enclosed in double quotations "..." in the phase of shell expansions and quote removal (except for "..."). So $((A['$file'])) and $((A[\$file])) are always different in the same way as echo "'$file'" and echo "\$file" are different.

But POSIX doesn't define the arithmetic command ((expression)), so Bash has been treating expression as if it is written in the normal command context. So ((A['$file'])) and ((A[\$file])) are always equivalent as echo '$file' and echo \$file are equivalent. But I have to add that recently Bash has changed the treatment of ' in ((...)) to make it consistent with $((...)) in devel branch. It's not yet released, but only A[\$file] will be the working solution in future.


Edit: I'm sorry... I noticed the inconsistency of my discussion. I was confused. $(('$file')) and $((\$file)) are different but $((A['$file'])) and $((A[\$file])) can be the same because of the extra shell expansions of array subscripts. I checked with versions of Bash and found that Bash has changed its behavior from Bash-4.4:

for bash in $(compgen -c -- bash- | sort); do
  echo -n $bash:
  $bash -c 'i=9; a[9]=123; : $((b=a['\''$i'\'']));echo $b'
done
bash-2.05b.13:123
bash-3.0.22:123
bash-3.1.23:123
bash-3.2.57:123
bash-4.0.44:123
bash-4.1.17:123
bash-4.2.53:123
bash-4.3.48:123
bash-4.4.23:bash-4.4.23: '9': syntax error: operand expected (error token is "'9'")
bash-5.0.7:bash-5.0.7: '9': syntax error: operand expected (error token is "'9'")
bash-dev:bash-dev: '9': syntax error: operand expected (error token is "'9'")

OK I don't think there's anything to be done here (whew)

I think it's useful to warn about this issue somewhere (maybe in the section of static arithmetic parsing of known-differences?).

Thank you.

@andychu
Copy link
Contributor

andychu commented Mar 11, 2020

Hm well if you want to draft some text about how to use associative arrays in a way that's compatible with (newer versions of) bash and Oil, I will include it :) Or feel free to send me a PR.

I'm a little confused -- personally I would just use the workarounds so that you have $(( )) on the right, like A["$key"]=$(( 1 + 2 ))

@andychu
Copy link
Contributor

andychu commented Mar 11, 2020

e.g. personally I would limit my dialect inside and outside of (( )), and inside and outside of A[] to:

  • "$key"
  • 'const'

\$key is too confusing to me, and so are the alternatives with '\''. I didn't even try to read it.

That's basically how I designed Oil... I DID encounter a lot of bugs in bash while I was implementing this feature, e.g. in

https://www.oilshell.org/release/0.8.pre2/test/spec.wwz/assoc.html

Basically I don't want to use anything that requires me to remember specific rules inside (( and [ ... I would just write the equivalent OUTSIDE of (( and [ which is always possible.

I believe Oil is pretty "context independent". There are only a few obvious differences between command context and arithmetic context.

The additional A[] context is the cause of

http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem

because the exploit is allowed within A[] but not normal arithmetic. To be fair ksh also behaves that way...

@andychu
Copy link
Contributor

andychu commented Mar 11, 2020

I guess I could write that last comment as the advice in Oil's docs, but if you have something better, let me know :)

And I appreciate the testing because this was one place where I knew it may be hard to come up with a common subset of bash and Oil. But if bash itself is changing in undocumented ways, then there's limited hope to begin with.

@akinomyoga
Copy link
Collaborator Author

I guess I could write that last comment as the advice in Oil's docs, but if you have something better, let me know :)

I think that's enough (just something like "Do not reference associative array elements in arithmetic expressions").

@andychu
Copy link
Contributor

andychu commented Mar 11, 2020

Another way to say it is that Oil is supposed to obey the invariant that

(( a [ X ] = Y ))

is the same as

A[X]=$(( Y ))

for all arithmetic expressions X and Y. Bash doesn't obey that and I'd argue that all the differences are undocumented bugs.

I didn't test that invariant but I'm pretty sure it should hold ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants