Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problems with associative arrays in arithmetic expressions #152

Closed
stephane-chazelas opened this issue Jan 5, 2021 · 14 comments
Closed
Labels
bug Something is not working

Comments

@stephane-chazelas
Copy link

$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Linux
typeset -A a=([1]=1)

The uname command was run, or in other words ((a[$var]++)) is an arbitrary command execution vulnerability.

That problem also affects bash and zsh and is kind of known already.

Yet in ksh93 (not bash nor zsh), this is OK:

$ var='1`uname>&2`' ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
typeset -A a=(['1`uname>&2`']=1)

Which suggests that some form of work around for that sort of issue is (partly) implemented in ksh93. It seems, it's the presence of $ characters that cause the second interpretation:

$ var='$0`uname>&2`' ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Linux
typeset -A a=([ksh]=1)

In bash and zsh, a work around is to use:

let 'a[$var]++'

instead (note the single quotes).

But, in ksh93, it appears to be buggy for the empty key or for keys that contain ] or \ characters:

~$ var='' ksh -c "typeset -A a; let 'a[\$var]++'; typeset -p a"
typeset -A a=()
~$ var='\' ksh -c "typeset -A a; let 'a[\$var]++'; typeset -p a"
typeset -A a=([']']=1)
~$ var=']' ksh -c "typeset -A a; let 'a[\$var]++'; typeset -p a"
typeset -A a=()
~$ var='x]foo' ksh -c "typeset -A a; let 'a[\$var]++'; typeset -p a"
typeset -A a=([x]=1)

See more details at https://unix.stackexchange.com/questions/627474/how-to-use-associative-arrays-safely-inside-arithmetic-expressions

@McDutchie McDutchie added the bug Something is not working label Jan 5, 2021
@hyenias
Copy link

hyenias commented Jan 6, 2021

Additional two examples:

$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Darwin
typeset -A a=([1]=1)
$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a[\$var]++)); typeset -p a'
typeset -A a=(['1$(uname>&2)']=1)
$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a["$var"]++)); typeset -p a'
typeset -A a=(['1$(uname>&2)']=1)

@hyenias
Copy link

hyenias commented Jan 6, 2021

Another couple of examples:

$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Darwin
typeset -A a=([1]=1)
$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a["\$var"]++)); typeset -p a'
typeset -A a=(['1$(uname>&2)']=1)
$
$ var='$0`uname>&2`' ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Darwin
typeset -A a=([ksh]=1)
$ var='$0`uname>&2`' ksh -c 'typeset -A a; ((a["\$var"]++)); typeset -p a'
typeset -A a=(['$0`uname>&2`']=1)

@hyenias
Copy link

hyenias commented Jan 6, 2021

Here are some examples using ((arithmetic)) instead of let:

$ var='' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=(['']=1)
$ var='\' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=(['\']=1)
$ var=']' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=([']']=1)
$ var='x]foo' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=(['x]foo']=1) 

@hyenias
Copy link

hyenias commented Jan 6, 2021

Here are some examples that have escaped the special characters in var='x]+b[1$(uname>&2)' so that they can be used in a subscript/key.

$ var="x\]+b\[\$(uname>&2)" ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Darwin
typeset -A a=(['x]+b[']=1)
$ var="x\]+b\[\$(uname>&2)" ksh -c 'typeset -A a; ((a[\$var]++)); typeset -p a'
typeset -A a=(['x]+b[$(uname>&2)']=1)
$ var="x\]+b\[\$(uname>&2)" ksh -c 'typeset -A a; ((a["$var"]++)); typeset -p a'
typeset -A a=(['x]+b[$(uname>&2)']=1)
$ var="x\]+b\[\$(uname>&2)" ksh -c 'typeset -A a; ((a["\$var"]++)); typeset -p a'
typeset -A a=(['x]+b[$(uname>&2)']=1)

@hyenias
Copy link

hyenias commented Jan 7, 2021

As a note, one can use printf with its q option to escape quote arbitrary strings for use as keys in associative arrays.

$ var='x]+b[$(uname>&2)' ksh -c 'var=${ printf "%q" "$var" }; typeset -A a; ((a[$var]++)); typeset -p a var'
typeset -A a=(['x]+b[$(uname>&2)']=1)
typeset -x var=$'\'x]+b[$(uname>&2)\''
$
$ var='x]+b[$(uname>&2)' ksh -c 'var=$( printf "%q" "$var" ); typeset -A a; ((a[$var]++)); typeset -p a var'
typeset -A a=(['x]+b[$(uname>&2)']=1)
typeset -x var=$'\'x]+b[$(uname>&2)\''
$
$ var='x]+b[$(uname>&2)' zsh -c 'var=$( printf "%q" "$var" ); typeset -A a; ((a[$var]++)); typeset -p a var'
typeset -A a=( ['x]+b[$(uname\>\&2)']=1 )
export var='x\]+b\[\$\(uname\>\&2\)'
$
$ var='x]+b[$(uname>&2)' bash -c 'var=$( printf "%q" "$var" ); typeset -A a; ((a[$var]++)); typeset -p a var'
declare -A a=(["x]+b[\$(uname>&2)"]="1" )
declare -x var="x\\]+b\\[\\\$\\(uname\\>\\&2\\)"
$

@stephane-chazelas
Copy link
Author

stephane-chazelas commented Jan 7, 2021

As a note, one can use printf with its q option to escape quote arbitrary strings for use as keys in associative arrays.
[...]

No, that's not reliable either. See:

$ var=$']\n' ./ksh -c 'typeset -A a; (( a[${ printf %q "$var"; }]++ )); typeset -p a'
typeset -A a=(['']=1)

Still an ACE vulnerability:

$ var=$'\n\'$(uname>&2)' ./ksh -c 'typeset -A a; (( a[${ printf %q "$var"; }]++ )); typeset -p a'
Linux
typeset -A a=(['n]']=1)

It doesn't work in bash nor zsh either. See https://stackoverflow.com/questions/61701799/bad-array-subscript-error-when-trying-to-increment-an-associative-array-element/65578230#65578230

More generally, I wouldn't trust the output of printf %q (or any quoting operator using anything but single quotes) for reinput to the shell. See https://unix.stackexchange.com/questions/379181/escape-a-variable-for-use-as-content-of-another-script/600214#600214

@ormaaj
Copy link

ormaaj commented Jan 30, 2021

I thought this was a sorta known issue. I attempted to describe it (tersely) here long ago and brought it up on bug-bash a few times, back around the time the (IMO wrong) assoc_expand_once bodge was put in. The fact that Chet thinks the manpage description of that feature makes any sense tells me he never fully grok'd the issue because the arithmetic processor NEVER directly performs expansions of any kind! People believe it does because they fail to identify the nuanced interplay between the arithmetic evaluation scheme and the variable name resolver - the latter being where the array subscript expansion should actually be taking place, always exactly once for both typeset -a and typeset -A types. The semantics are obfuscated by:

  • the seperate and unrelated recursive arithmetic name lookup behaviour.
  • the also seperate and unrelated pre-expansion of the string passed to the arithmetic evaluator described in the BashPitfall.
  • the further complexification when nameref gets involved.
  • the weird broken / half-implemented ksh93-specific parsing scheme described (sorta) by Stephane (but really that's not new either, people of course wouldn't know these undocumented quirks unless they read my / stephane's ramblings or independently discover them... ugh).

ksh93 has a lot more going on with its parsing of array indicies because:

  • Non-shitty functions that take arbitrary non-string type arguments via nameref (and static scope holy f!)
  • More non-shitty functions incl. user-definable arithmetic funcs callable from the arithmetic evaluator, also accepting arbitrary type args that may have subscripts.
  • Massively more complexification with compound, typeset -T, enum, etc, permutations of which should be combinable with any array type and passable to aforementioned non-shitty functions even in math contexts. But I was pretty much the world's only actual user of those features, it would seem. (until I just fully gave up on these mother-effing hopeless shell toys and switched to the racket xrepl with a pile of macros for proper shellification :/)

Proper interpretation of issues such as this should bear in mind that proper implementation was probably supposed to be contingent upon completion of the sadly half-baked afterthought that is the typeset -T mechanism, which would logically be the critical prerequisite to a proper generic user-defined collection type scheme and iterator protocol. Quick hax to work around the considerable fallout arising from that gaping hole are ultimately ugly bodges.

@McDutchie
Copy link

McDutchie commented Mar 3, 2021

If we make the echo command abort() and produce a crash dump...

diff --git a/src/cmd/ksh93/bltins/print.c b/src/cmd/ksh93/bltins/print.c
index 814f9fb8..8137b23f 100644
--- a/src/cmd/ksh93/bltins/print.c
+++ b/src/cmd/ksh93/bltins/print.c
@@ -109,6 +109,7 @@ static char* 	nullarg[] = { 0, 0 };
 	prdata.raw = prdata.echon = 0;
 	prdata.sh = context->shp;
 	NOT_USED(argc);
+abort();
 	/* This mess is because /bin/echo on BSD is different */
 	if(!prdata.sh->universe)
 	{

...and then execute a version of Stéphane's reproducer that uses echo...

$ var='1$(echo DANGER >&2)' arch/*/bin/ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
arch/darwin.i386-64/bin/ksh: 4516: Abort
typeset -A a=([1]=1)

...the resulting crash dump can probably give us an idea of where to look for a fix.

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x00007fff7209b2c2 __pthread_kill + 10
1   libsystem_pthread.dylib             0x00007fff72156bf1 pthread_kill + 284
2   libsystem_c.dylib                   0x00007fff720056a6 abort + 127
3   ksh                                 0x000000010c8173ea B_echo + 58
4   ksh                                 0x000000010c8a3501 sh_exec + 7729 (xec.c:1343)
5   ksh                                 0x000000010c89d585 sh_subshell + 2837 (subshell.c:685)
6   ksh                                 0x000000010c86f0a1 comsubst + 2737 (macro.c:2172)
7   ksh                                 0x000000010c86fc5a varsub + 1418 (macro.c:1162)
8   ksh                                 0x000000010c86c502 copyto + 3298 (macro.c:625)
9   ksh                                 0x000000010c86b76b sh_mactrim + 411 (macro.c:179)
10  ksh                                 0x000000010c83a480 scope + 2272 (arith.c:165)
11  ksh                                 0x000000010c8396d2 arith + 5058 (arith.c:456)
12  ksh                                 0x000000010c89840b arith_exec + 1563 (streval.c:253)
13  ksh                                 0x000000010c89b6bf strval + 143 (streval.c:968)
14  ksh                                 0x000000010c83824c sh_strnum + 380 (arith.c:545)
15  ksh                                 0x000000010c839b12 sh_arith + 34 (arith.c:560)
16  ksh                                 0x000000010c8a71b2 sh_exec + 23266 (xec.c:2277)
17  ksh                                 0x000000010c8a6110 sh_exec + 19008 (xec.c:2026)
18  ksh                                 0x000000010c82aaa4 exfile + 3284 (main.c:582)
19  ksh                                 0x000000010c82b9e0 sh_main + 3376 (main.c:351)
20  ksh                                 0x000000010c811206 main + 38 (pmain.c:45)
21  libdyld.dylib                       0x00007fff71f603d5 start + 1

This quickly shows us that sh_mactrim is called from line 165 in arith.c. This must be where it decides to do expansion of strings containing $ in array subscripts. As you can see in the backtrace, this eventually leads to comsubst() which does the command substitution.

Sure enough, those lines are:

if(strchr(sub,'$'))
sub = sh_mactrim(shp,sub,0);

It only looks for $, which explains why ` is not a problem. So it's the opposite of what it seemed to be to Stéphane: there is not a workaround that only works for `, but a (mis)feature that only looks for $.

I find that if I comment those two lines out or delete them, I get zero regression test failures upon running bin/shtests, and this bug/misfeature goes away:

$ var='1$(echo DANGER >&2)' arch/*/bin/ksh -c 'typeset -A a; ((a[$var]++)); typeset -p a'
typeset -A a=(['1$(echo DANGER >&2)']=1)

And regular variable expansions continue to work just fine for subscripts, as they are evidently handled elsewhere:

$ arch/*/bin/ksh -c 'typeset A foo=([bar]=OK); v=bar; echo ${foo[$v]}'     
OK
$ arch/*/bin/ksh -c 'typeset A foo=([bar]=OK); v=bar; echo "${foo[$v]}"' 
OK
$ arch/*/bin/ksh -c 'typeset A foo=([bar]=OK); v=bar; echo ${foo["$v"]}' 
OK

So, it looks like this is case number I-lost-count where a ksh93 bug is fixed by removing some evidently misguided code. I should find those again and make a list.

diff --git a/src/cmd/ksh93/sh/arith.c b/src/cmd/ksh93/sh/arith.c
index 93de13cd..c78d5c37 100644
--- a/src/cmd/ksh93/sh/arith.c
+++ b/src/cmd/ksh93/sh/arith.c
@@ -161,8 +161,6 @@ static Namval_t *scope(register Namval_t *np,register struct lval *lvalue,int as
 				sfprintf(shp->strbuf,"%s%s%c",nv_name(np),sub,0);
 				sub = sfstruse(shp->strbuf);
 			}
-			if(strchr(sub,'$'))
-				sub = sh_mactrim(shp,sub,0);
 			*cp = flag;
 			if(c || hasdot)
 			{

@McDutchie
Copy link

McDutchie commented Mar 3, 2021

@hyenias wrote:

Here are some examples using ((arithmetic)) instead of let:

$ var='' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=(['']=1)
$ var='\' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=(['\']=1)
$ var=']' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=([']']=1)
$ var='x]foo' ksh -c "typeset -A a; ((a[\$var]++)); typeset -p a"
typeset -A a=(['x]foo']=1) 

These reproducers are not actually valid, because you're using double quotes for the -c script, and double quotes remove the backslashes. The results will be identical when using single quotes, which is a bug – actually no they're not, that bug is not there, I did something wrong. Need more caffeine.

@McDutchie
Copy link

Additional two examples:

$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a[\$var]++)); typeset -p a'
typeset -A a=(['1$(uname>&2)']=1)

This one appears to be fixed by the patch above:

$ var='1$(uname>&2)' arch/*/bin/ksh -c 'typeset -A a; ((a[\$var]++)); typeset -p a'
typeset -A a=(['$var']=1)
$ var='1$(uname>&2)' ksh -c 'typeset -A a; ((a["$var"]++)); typeset -p a'
typeset -A a=(['1$(uname>&2)']=1)

This is the expected result. A variable in double quotes is expanded, and that is not different within an arithmetic command or arithmetic expansion. Unfortunately, the patch inserts some extra backslashes in the result:

$ var='1$(uname>&2)' arch/*/bin/ksh -c 'typeset -A a; ((a["$var"]++)); typeset -p a'   
typeset -A a=(['1$\(uname>\&2\)']=1)

@hyenias
Copy link

hyenias commented Mar 3, 2021

Wow! You have been busy. I experimented with those two lines deleted as described by your diff above. Seems like promising progress.

@McDutchie
Copy link

Unfortunately, the patch inserts some extra backslashes in the result:

$ var='1$(uname>&2)' arch/*/bin/ksh -c 'typeset -A a; ((a["$var"]++)); typeset -p a'   
typeset -A a=(['1$\(uname>\&2\)']=1)

This problem is fixed by removing the NV_SUBQUOTE flag from this nv_endsubscript() call. That flag causes a level of quoting to be added using backslashes. If we remove that flag in addition to the previous patch, the above reproducer passes, so do all the others, and there are no regression test failures.

In fact I wonder if anything would break if we remove that NV_SUBQUOTE flag altogether. It is handled in nv_endubscript() in array.c but is only passed in one other place in arith.c. And, if we think about it logically, the arithmetic subsystem simply has no business messing around with shell quoting at all, because shell quoting doesn't belong to it. Shell quotes are properly resolved by the main shell language before the arithmetic subsystem is even invoked. If we no longer remove quotes in the arithmetic subsystem (thus resolving this arbitrary code execution vulnerability), we should no longer have a need to add any, either.

So that's my theory/hypothesis now, let's see how that pans out...

@McDutchie
Copy link

Sure enough: with the patch below, I get zero regression test failures. This includes all the reproducers in this thread, which I've turned into regression tests by now, and which will be committed along with the fix.

diff --git a/src/cmd/ksh93/include/name.h b/src/cmd/ksh93/include/name.h
index f2fef412..bd727dfb 100644
--- a/src/cmd/ksh93/include/name.h
+++ b/src/cmd/ksh93/include/name.h
@@ -150,8 +150,6 @@ struct Ufunction
 #define	nv_funtree(n)	((n)->nvalue.rp->ptree)
 #define	funptr(n)	((n)->nvalue.bfp)
 
-#define NV_SUBQUOTE	(NV_ADD<<1)	/* used with nv_endsubscript */
-
 /* NAMNOD MACROS */
 /* ... for attributes */
 
diff --git a/src/cmd/ksh93/sh/arith.c b/src/cmd/ksh93/sh/arith.c
index 93de13cd..cfdbf747 100644
--- a/src/cmd/ksh93/sh/arith.c
+++ b/src/cmd/ksh93/sh/arith.c
@@ -161,8 +161,6 @@ static Namval_t *scope(register Namval_t *np,register struct lval *lvalue,int as
 				sfprintf(shp->strbuf,"%s%s%c",nv_name(np),sub,0);
 				sub = sfstruse(shp->strbuf);
 			}
-			if(strchr(sub,'$'))
-				sub = sh_mactrim(shp,sub,0);
 			*cp = flag;
 			if(c || hasdot)
 			{
@@ -171,9 +169,9 @@ static Namval_t *scope(register Namval_t *np,register struct lval *lvalue,int as
 			}
 #if SHOPT_FIXEDARRAY
 			ap = nv_arrayptr(np);
-			cp = nv_endsubscript(np,sub,NV_ADD|NV_SUBQUOTE|(ap&&ap->fixed?NV_FARRAY:0));
+			cp = nv_endsubscript(np,sub,NV_ADD|(ap&&ap->fixed?NV_FARRAY:0));
 #else
-			cp = nv_endsubscript(np,sub,NV_ADD|NV_SUBQUOTE);
+			cp = nv_endsubscript(np,sub,NV_ADD);
 #endif /* SHOPT_FIXEDARRAY */
 			if(*cp!='[')
 				break;
@@ -270,7 +268,7 @@ static Sfdouble_t arith(const char **ptr, struct lval *lvalue, int type, Sfdoubl
 				dot=NV_NOADD;
 				if((c = *++str) !='[')
 					continue;
-				str = nv_endsubscript((Namval_t*)0,cp=str,NV_SUBQUOTE)-1;
+				str = nv_endsubscript((Namval_t*)0,cp=str,0)-1;
 				if(sh_checkid(cp+1,(char*)0))
 					str -=2;
 			}
diff --git a/src/cmd/ksh93/sh/array.c b/src/cmd/ksh93/sh/array.c
index 777c140d..4317bd00 100644
--- a/src/cmd/ksh93/sh/array.c
+++ b/src/cmd/ksh93/sh/array.c
@@ -1480,7 +1480,7 @@ char *nv_endsubscript(Namval_t *np, register char *cp, int mode)
 	/* first find matching ']' */
 	while(count>0 && (c= *++cp))
 	{
-		if(c=='\\' && (!(mode&NV_SUBQUOTE) || (c=cp[1])=='[' || c==']' || c=='\\' || c=='*' || c=='@'))
+		if(c=='\\')
 		{
 			quoted=1;
 			cp++;

@McDutchie
Copy link

For the historical record, now that we have @JohnoKing's ksh93-history repo, we can see that this vulnerability was introduced in version 2010-06-05 ksh93u-. The most relevant changelog entry seems to be:

10-06-03  A bug in the quoting for name reference declarations which did
         not properly handle [ and ] in subscripts for associative arrays.

Not very informative, particularly since no relevant regression test seems to have been committed along with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

4 participants