Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression - glob expansion with brace expansion and parameter exapansion #660

Closed
dicktyr opened this issue Jun 11, 2023 · 11 comments
Closed
Labels
blocker This had better be fixed before releasing bug Something is not working

Comments

@dicktyr
Copy link

dicktyr commented Jun 11, 2023

release 1.0.5 appears to have introduced a regression

globs are not expanded with the following:

% echo "/"{bin,sbin}"/*"
/bin/* /sbin/*

% v=/; echo "$v"bin"/*"
/bin/*

but globbing is unexpectedly performed if both parameter substitution and brace expansion are present:

% v=/; echo "$v"{bin,sbin}"/*"
[output omitted for the sake of brevity]
@McDutchie McDutchie added the bug Something is not working label Jun 11, 2023
@McDutchie
Copy link

Yikes. Thanks for the report. Will look into it ASAP.

@McDutchie McDutchie added the blocker This had better be fixed before releasing label Jun 11, 2023
@McDutchie
Copy link

McDutchie commented Jun 11, 2023

Bug introduced by commit 6d76174 (dev), 4361b6a (1.0).

@McDutchie
Copy link

$ cd /
$ unset x
$ ls -d {b,d}"*"
ls: b*: No such file or directory
ls: d*: No such file or directory
$ ls -d $x{b,d}"*"
bin dev
$ ls -d $x"*"{n,r}
bin  sbin usr  var
$ ls -d "*"$x{n,r}   
bin  sbin usr  var
$ ls -d $x"*"     
ls: *: No such file or directory

The nature of the bug seems to be that, if both a variable expansion (quoted or not, set or not) and a brace expansion are used, quoted wildcards are subject to expansion.

@McDutchie
Copy link

McDutchie commented Jun 12, 2023

The closest I've been able to come to fixing this so far is the patch below. All the above reproducers act correctly with this patch and all the current regression tests pass.

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..8717b62e7 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2583,7 +2583,7 @@ static void endfield(Mac_t *mp,int split)
 		mp->atmode = 0;
 		if(mp->patfound)
 		{
-			int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+			int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
 			sh.argaddr = 0;
 #if SHOPT_BRACEPAT
 			/* in POSIX mode, disallow brace expansion for unquoted expansions */

Unfortunately the patch also reintroduces #549 for cases where any part of the word is quoted -- even if the quoted part is empty. For example:

$ mkdir testdir
$ cd testdir
$ touch a b c ./-
$ p='[a\-c]'
$ echo $p  # OK
- a c
$ echo ""$p  # BUG
a b c
$ echo $p""  # BUG
a b c

The fundamental problem is that, by the time endfield() is reached, we have a complete word and we can no longer distinguish between the quoted and unquoted parts of it. The Mac_t flags apply to the entire word.

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jun 12, 2023
From issue #660:

Globs are not expanded with the following:

% echo "/"{bin,sbin}"/*"
/bin/* /sbin/*

% v=/; echo "$v"bin"/*"
/bin/*

But globbing is unexpectedly performed if both parameter substitution
and brace expansion are present:

	% v=/; echo "$v"{bin,sbin}"/*"
	[output omitted for the sake of brevity]

Obtained from:	ksh93/ksh#660
MFH:		2023Q3
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jun 12, 2023
From issue #660:

Globs are not expanded with the following:

% echo "/"{bin,sbin}"/*"
/bin/* /sbin/*

% v=/; echo "$v"bin"/*"
/bin/*

But globbing is unexpectedly performed if both parameter substitution
and brace expansion are present:

	% v=/; echo "$v"{bin,sbin}"/*"
	[output omitted for the sake of brevity]

Obtained from:  ksh93/ksh#660
@McDutchie
Copy link

McDutchie commented Jun 13, 2023

Patch version two. In mac_copy(), don't internally backslash-escape a backslash in a glob pattern bracket expression. This also fixes the #549 regression reintroduced by the previous patch, at least for standard glob patterns. For this, we need to use the bracketexpr flag (introduced as a copyto() local variable in 6c73c8c) in mac_copy(), so we move it to the Mac_t struct, making it globally accessible. Initialisation is automatic. Please test this.

Patch v2
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..29ac26aed 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -76,6 +76,7 @@ typedef struct  _mac_
 	char		macsub;		/* set to 1 when running mac_substitute */
 	int		dotdot;		/* set for .. in subscript */
 	void		*nvwalk;	/* for name space walking */
+	char		bracketexpr; 	/* set when in [brackets] within a non-ERE glob pattern */
 } Mac_t;
 
 #undef ESCAPE
@@ -437,7 +438,6 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 	char		oldquote = mp->quote;	/* save "double quoted" state */
 	char		ansi_c = 0;		/* set when processing ANSI C escape codes */
 	int32_t		ere = 0;		/* bitmask of pattern options indicating an extended regular expression */
-	char		bracketexpr = 0; 	/* set when in [brackets] within a non-ERE glob pattern */
 	Sfio_t		*sp = mp->sp;
 	Stk_t		*stkp = sh.stk;
 	char		*resume = 0;
@@ -533,7 +533,7 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 			if(mp->pattern)
 			{
 				/* preserve \ for escaping glob pattern bracket expression operators */
-				if(bracketexpr && n==S_BRAOP)
+				if(mp->bracketexpr && n==S_BRAOP)
 					break;
 				/* preserve \digit for pattern matching */
 				/* also \alpha for extended patterns */
@@ -636,8 +636,8 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 				mp->pattern = c;
 			break;
 		    case S_ENDCH:
-			if(bracketexpr && cp[-1]==RBRACT && !(mp->quote || mp->lit))
-				bracketexpr--;
+			if(mp->bracketexpr && cp[-1]==RBRACT && !(mp->quote || mp->lit))
+				mp->bracketexpr--;
 			if((mp->lit || cp[-1]!=endch || mp->quote!=newquote))
 				goto pattern;
 			if(endch==RBRACE && mp->pattern && brace)
@@ -738,12 +738,12 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 				cp = first = fcseek(0);
 				break;
 			}
-			if(mp->pattern==1 && !ere && !bracketexpr)
+			if(mp->pattern==1 && !ere && !mp->bracketexpr)
 			{
-				bracketexpr++;
+				mp->bracketexpr++;
 				/* a ] following [, as in []abc], should not close the bracket expression */
 				if(cp[0]==RBRACT && cp[1])
-					bracketexpr++;
+					mp->bracketexpr++;
 			}
 			/* FALLTHROUGH */
 		    case S_PAT:
@@ -883,7 +883,7 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 			break;
 		    case S_BRAOP:
 			/* escape a quoted !^- within a bracket expression */
-			if(!bracketexpr || !(mp->quote || mp->lit))
+			if(!mp->bracketexpr || !(mp->quote || mp->lit))
 				continue;
 			if(c)
 				sfwrite(stkp,first,c);
@@ -2481,7 +2481,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				continue;
 			}
 			if(n==S_ESC)
-				sfputc(stkp,ESCAPE);
+			{
+				if(!mp->bracketexpr)
+					sfputc(stkp,ESCAPE);
+			}
 			else if(n==S_EPAT)
 			{
 				/* don't allow extended patterns in this case */
@@ -2583,7 +2586,7 @@ static void endfield(Mac_t *mp,int split)
 		mp->atmode = 0;
 		if(mp->patfound)
 		{
-			int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+			int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
 			sh.argaddr = 0;
 #if SHOPT_BRACEPAT
 			/* in POSIX mode, disallow brace expansion for unquoted expansions */

edit: It's broken, do not apply

@dicktyr
Copy link
Author

dicktyr commented Jun 13, 2023

a quick test runs as expected
(I'm not able to test more thoroughly at this time)

I noticed this bug in an automated script
as I use ksh extensively in my own projects
(actually in preference to other languages)
so I'm very grateful for your work on ksh

thank you!

@McDutchie
Copy link

It still isn't right. With patch v2 (and before):

$ touch '\'
$ pat='[x&y]'
$ echo $pat
\

That pattern should not resolve to the file \; the pattern didn't even contain a backslash.

@McDutchie
Copy link

McDutchie commented Jun 13, 2023

That is just going to have to be a separate bug. And I'm not sure it will ever be fixed without redesigning the entire split/glob mechanism from the ground up.

@McDutchie
Copy link

Patch version two. In mac_copy(), don't internally backslash-escape a backslash in a glob pattern bracket expression. This also fixes the #549 regression reintroduced by the previous patch, at least for standard glob patterns.

Argh. No it doesn't fix those. I don't know what made me think that. I must have been testing the wrong binary.

@McDutchie
Copy link

So, patch v2 is total nonsense. We cannot use the bracketexpr flag from copyto() in mac_copy(), becasue mac_copy() is called at S_EOF, meaning at the end of processing.

I have a new patch where mac_copy() does its own detection of a bracket expression, making that fix work:

Patch v3 (also broken, do not apply)
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..9d19df04a 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2451,6 +2451,7 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 	}
 	else if(!mp->quote && mp->split && (mp->ifs||mp->pattern))
 	{
+		char	bracketexpr = 0; 	/* set when in [brackets] */
 		/* split words at ifs characters */
 		state = sh.ifstable;
 		if(mp->pattern)
@@ -2467,6 +2468,8 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				if(state[c]==0)
 					state[c] = S_PAT;
 			}
+			if(state[RBRACT]==0 && state[LBRACT]==S_PAT)
+				state[RBRACT] = S_ENDCH;
 			if(state[ESCAPE]==0)
 				state[ESCAPE] = S_ESC;
 		}
@@ -2481,7 +2484,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				continue;
 			}
 			if(n==S_ESC)
-				sfputc(stkp,ESCAPE);
+			{
+				if(!bracketexpr)
+					sfputc(stkp,ESCAPE);
+			}
 			else if(n==S_EPAT)
 			{
 				/* don't allow extended patterns in this case */
@@ -2491,7 +2497,21 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 					mp->noextpat = 1;
 			}
 			else if(n==S_PAT)
+			{
 				mp->patfound = mp->pattern;
+				if(!bracketexpr && mp->pattern)
+				{
+					bracketexpr++;
+					/* a ] following [, as in []abc], should not close the bracket expression */
+					if(cp[0]==RBRACT && cp[1])
+						bracketexpr++;
+				}
+			}
+			else if(n==S_ENDCH)
+			{
+				if(bracketexpr)
+					bracketexpr--;
+			}
 			else if(n && mp->ifs)
 			{
 				if(mbwide() && n==S_MBYTE)
@@ -2557,8 +2577,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				if(state[c]==S_PAT)
 					state[c] = 0;
 			}
-			if(sh.ifstable[ESCAPE]==S_ESC)
-				sh.ifstable[ESCAPE] = 0;
+			if(state[RBRACT]==S_ENDCH)
+				state[RBRACT] = 0;
+			if(state[ESCAPE]==S_ESC)
+				state[ESCAPE] = 0;
 		}
 	}
 	else
@@ -2583,7 +2605,7 @@ static void endfield(Mac_t *mp,int split)
 		mp->atmode = 0;
 		if(mp->patfound)
 		{
-			int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+			int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
 			sh.argaddr = 0;
 #if SHOPT_BRACEPAT
 			/* in POSIX mode, disallow brace expansion for unquoted expansions */
diff --git a/src/cmd/ksh93/tests/glob.sh b/src/cmd/ksh93/tests/glob.sh
index ae4a8e101..d95eeae42 100755
--- a/src/cmd/ksh93/tests/glob.sh
+++ b/src/cmd/ksh93/tests/glob.sh
@@ -471,6 +471,8 @@ test_glob '<~(K)[]-z]>' ~(K)["]-z"]
 unquoted_patvar='[\!N]'; test_glob '<[\!N]>' $unquoted_patvar
 unquoted_patvar='[\^N]'; test_glob '<[\^N]>' $unquoted_patvar
 unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar
+unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' ""$unquoted_patvar
+unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar""
 : > -; test_glob '<->' $unquoted_patvar
 : > a > c; test_glob '<-> <a> <c>' $unquoted_patvar
 : > !; unquoted_patvar='[\!N]'; test_glob '<!>' $unquoted_patvar

But that doesn't fix the #549 regression reintroduced by patch v1 either, it just breaks it in the opposite way:

$ mkdir testdir
$ cd testdir
$ touch a b c ./-
$ p='[a\-c]'
$ echo $p      # BUG
a b c
$ echo ""$p    # OK
- a c
$ echo $p""    # OK
- a c

And, as a bonus, we get a lot of regression test failures as well:

test glob begins at 2023-06-13+13:12:12
	glob.sh[477]: FAIL: glob -- expected '<[\!N]>', got '<\> <b>'
	glob.sh[478]: FAIL: glob -- expected '<[\^N]>', got '<\> <b>'
	glob.sh[479]: FAIL: glob -- expected '<[a\-c]>', got '<b>'
	glob.sh[480]: FAIL: glob -- expected '<[a\-c]>', got '<[a-c]>'
	glob.sh[481]: FAIL: glob -- expected '<[a\-c]>', got '<[a-c]>'
	glob.sh[482]: FAIL: glob -- expected '<->', got '<b>'
	glob.sh[483]: FAIL: glob -- expected '<-> <a> <c>', got '<a> <b> <c>'
	glob.sh[484]: FAIL: glob -- expected '<!>', got '<!> <-> <\> <a> <b> <c>'
	glob.sh[485]: FAIL: glob -- expected '<^>', got '<!> <-> <\> <^> <a> <b> <c>'
	glob.sh[489]: FAIL: glob -- expected '<~(S)[\!N]>', got '<\> <b>'
	glob.sh[490]: FAIL: glob -- expected '<~(S)[\^N]>', got '<\> <b>'
	glob.sh[491]: FAIL: glob -- expected '<~(S)[a\-c]>', got '<b>'
	glob.sh[492]: FAIL: glob -- expected '<->', got '<b>'
	glob.sh[493]: FAIL: glob -- expected '<-> <a> <c>', got '<a> <b> <c>'
	glob.sh[494]: FAIL: glob -- expected '<!>', got '<!> <-> <\> <a> <b> <c>'
	glob.sh[495]: FAIL: glob -- expected '<^>', got '<!> <-> <\> <^> <a> <b> <c>'
	glob.sh[497]: FAIL: glob -- expected '<~(K)[\!N]>', got '<\> <b>'
	glob.sh[498]: FAIL: glob -- expected '<~(K)[\^N]>', got '<\> <b>'
	glob.sh[499]: FAIL: glob -- expected '<~(K)[a\-c]>', got '<b>'
	glob.sh[500]: FAIL: glob -- expected '<->', got '<b>'
	glob.sh[501]: FAIL: glob -- expected '<-> <a> <c>', got '<a> <b> <c>'
	glob.sh[502]: FAIL: glob -- expected '<!>', got '<!> <-> <\> <a> <b> <c>'
	glob.sh[503]: FAIL: glob -- expected '<^>', got '<!> <-> <\> <^> <a> <b> <c>'
test glob failed at 2023-06-13+13:12:12 with exit code 23 [ 254 tests 23 errors ]
test quoting begins at 2023-06-13+13:13:15
	quoting.sh[53]: FAIL: nested $(print -r - '') differs from ''
	quoting.sh[56]: FAIL: "nested $(print -r - '')" differs from ''
test quoting failed at 2023-06-13+13:13:16 with exit code 2 [ 96 tests 2 errors ]

@McDutchie
Copy link

As far as I can tell, the "design" of the split/glob mechanism, which is decades of kludge upon kludge upon kludge etc., is simply incompatible with correct operation. Any change in the internal backslash escaping mechanism will introduce regressions. At least I sure don't know how to fix things. And they never did get it right at AT&T. This is simply hopeless. I'm throwing in the towel.

The first, simple patch above at least improves things, and fixes the unacceptable regression that @dicktyr reported. So I'm just going to commit it, make an urgent 1.0.6 release, and hope there aren't any other serious issues left to be discovered.

And we're just going to have to live with those corner case bugs involving backslashes. I cannot fix them. A comprehensive redesign is needed where that internal backslash escaping mechanism is replaced by something completely different. But that is a pipe dream, at least for the time being.

McDutchie added a commit that referenced this issue Jun 13, 2023
@dicktyr (Richard Taityr) reports:
> globs are not expanded with the following:
>
>   % echo "/"{bin,sbin}"/*"
>   /bin/* /sbin/*
>
>   % v=/; echo "$v"bin"/*"
>   /bin/*
>
> but globbing is unexpectedly performed if both parameter
> substitution and brace expansion are present:
>
>   % v=/; echo "$v"{bin,sbin}"/*"
>   [output omitted for the sake of brevity]

So, quoted pattern characters are expanded. No es bueno.

The closest I've been able to come to fixing this so far is:

src/cmd/ksh93/sh/macro.c: endfield():
- Do not set musttrim if mp->quoted is set, i.e., if the argument
  node contains any quoted parts.

This fixes the major bug. But it also partially reintroduces the
previously fixed bug <#549> for
cases where any part of the word is quoted -- even if the quoted
part is empty. For example:

    $ mkdir testdir
    $ cd testdir
    $ touch a b c ./-
    $ p='[a\-c]'
    $ echo $p    # OK
    - a c
    $ echo ""$p  # BUG
    a b c
    $ echo $p""  # BUG
    a b c

The fundamental problem is that, by the time endfield() is reached,
we have a complete word and we can no longer distinguish between
the quoted and unquoted parts of it. The Mac_t flags, such as
mp->quoted, apply to the entire word.

This is a fundamental flaw in the current design, where quotes are
internally translated to backslash escapes for certain characters.
To the best of my knowledge (and I've tried hard), it is not
possible to fix this without introducing other regressions.

A radical redesign of this entire mechanism is needed where this
internal backslash escaping is replaced by a way for every 'struct
argnod' argument node to track for each individual character
whether it is quoted or not, without modifying the argument string
itself. Because it is incorrect to modify the argument string.

But, for the foreseeable future, that is a pipe dream, because
there is no one who fully understands all this code with all its
myriad kludges and workarounds going decades back.

Resolves: #660
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jun 13, 2023
From the announcement:

This is an urgent bugfix release that fixes a serious regression in
pathname expansion, see: ksh93/ksh#660
The previous 1.0.5 release is withdrawn and should not be used.

Main changes between 1.0.5 and 1.0.6:

- Fixed a serious regression in pathname expansion where quoted wildcard
   characters were incorrectly expanded if a pattern contains both a brace
   expansion and a variable expansion.
- Fixed a bug where the command to launch a full-screen editor (^X^E in
   emacs and 'v' in vi) could cause the wrong command line to be edited
   if two shell sessions share a .sh_history file.

1.0.5 had a large amount of bugfixes compared to 93u+m/1.0.4. In summary:

- Fixed various bugs causing crashes.
- Fixed many bugs in the emacs and vi line editors, in command completion,
   and in file name completion.
- Fixed various bugs in the handling of quotes, backslash escapes and braces
   when processing shell glob patterns (e.g. in pathname expansion and
'case').
- ksh now throws a panic and exits if a read error (such as an I/O error)
   occurs while trying to read the next command(s) from a running script.
- Fixed many bugs in 'printf' and 'print -f' built-in commands, including:
   . Multiple bugs causing incorrect output for relative date
specifications,
     e.g., printf %T\\n 'exactly 20 months ago' now outputs a correct
result.
   . More printf bugs with mix and match of % and %x$.
   . A data corruption bug when using %B with 'printf -v varname'.
   . A bug causing double evaluation of arithmetic expressions.
- Fixed a bug where 'unset -f commandname', executed in a subshell, hides
   any built-in command by the same name for the duration of that subshell.
- Fixed ${var/#/string} and ${var/%/string} (with anchored empty pattern)
   to work as on mksh, bash and zsh; these are no longer ineffective.
- Fixed incorrect result of array slicing ${array[@]:offset:length} where
   'length' is a nested expansion involving an array.
- Command names can now end in ':' as they can on other shells.
- Fixed a spurious syntax error in compound assignments upon encountering a
   pair of repeated opening parentheses '(('.
- Fixed spurious syntax error in ${parameter:offset:length}: the arithmetic
   expressions 'offset' and 'length' may now contain the operators ( ) & |.
- Fixed a parsing bug in the declaration of .sh.math.* arithmetic functions.
- Fixed nameref self-reference loop detection for more than two namerefs.
- Several improvements to the POSIX compatibility mode.
- Many more minor and/or esoteric bugfixes.
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jun 13, 2023
This includes an urgent bugfix release that fixes a serious regression
in pathname expansion, see: ksh93/ksh#660
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This had better be fixed before releasing bug Something is not working
Projects
None yet
Development

No branches or pull requests

2 participants