Skip to content

Commit

Permalink
Fix multiple bugs in .sh.match (#455)
Browse files Browse the repository at this point in the history
This commit backports all of the relevant .sh.match bugfixes from
ksh93v-. Most of the .sh.match rewrite is from versions 2012-08-24
and 2012-10-04, with patches from later releases of 93v- and
ksh2020 also applied. Note that there are still some remaining bugs
in .sh.match, although now the total count of .sh.match bugs should
be less that before.

These are the relevant changes in the ksh93v- changelog that were
backported:
12-08-07  .sh.match no longer gets set for patterns in PS4 during
          set -x.
12-08-10  Rewrote .sh.match expansions fixing several bugs and
          improving performance.
12-08-22  .sh.match now handles subpatterns that had no matches with
          ${var//pattern} correctly.
12-08-21  A bug in setting .sh.match after ${var//pattern/string}
          when string is empty has been fixed.
12-08-21  A bug in setting .sh.match after [[ string == pattern ]]
          has been fixed.
12-08-31  A bug that could cause a core dump after
          typeset -m var=.sh.match has been fixed.
12-09-10  Fixed a bug in typeset -m the .sh.match is being renamed.
12-09-07  Fixed a bug in .sh.match code that coud cause the shell
          to quitely
13-02-21  The 12-01-16 bug fix prevented .sh.match from being used
          in the replacement string. The previous code was restored
          and a different fix which prevented .sh.match from being
          computed for nested replacement has been used instead.
13-05-28  Fixed two bug for typeset -c and typeset -m for variable
          .sh.match.

Changes:
- The SHOPT_2DMATCH option has been removed. This was already the
  default behavior previously, and now it's documented in the man
  page.
- init.c: Backported the sh_setmatch() rewrite from 93v- 2012-08-24
  and 2012-10-04.
- Backported the libast 93v- strngrpmatch() function, as the
  .sh.match rewrite requires this API.
- Backported the sh_match regression tests from ksh93v-, with many
  other sh_match tests backported from ksh2020. Much of the sh_match
  script is based on code from Roland Mainz:
  https://marc.info/?l=ast-developers&m=134606574109162&w=2
  https://marc.info/?l=ast-developers&m=134490505607093
- tests/{substring,treemove}.sh: Backported other relevant .sh.match
  fixes, with tests added to the substring and treemove test scripts.
- tests/types.sh: One of the (now reverted) memory leak bugfixes
  introduced a CI test failure in this script, so for that test the
  error message has been improved.
- string/strmatch.c: The original ksh93v- code for the strngrpmatch()
  changes introduced a crash that could occur because strlen would
  be used on a null pointer. This has been fixed by avoiding strlen
  if the string is null.

One nice side effect of these changes is a considerable performance
improvement in the shbench[1] gsub benchmark (results from 20
iterations with CCFLAGS=-Os):
--------------------------------------------------
name      /tmp/ksh-current     /tmp/ksh-matchfixes
--------------------------------------------------
gsub.ksh  0.883 [0.822-0.959]  0.457 [0.442-0.505]
--------------------------------------------------

Despite all of the many fixes and improvements in the backported
93v- .sh.match code, there are a few remaining bugs:

- .sh.match is printed with a default [0] subscript (see also
  #308 (comment)):
     $ arch/*/bin/ksh -c 'echo ${!.sh.match}'
       .sh.match[0]
  This bug appears to have been introduced by the changes from
  ksh93v- 2012-08-24.
- The wrong variable name is given for 'parameter not set' errors
  (from https://marc.info/?l=ast-developers&m=134489094602596):
     $ arch/*/bin/ksh -u
     $ x=1234
     $ true "${x//~(X)([012])|([345])/}"
     $ compound co
     $ typeset -m co.array=.sh.match
     $ printf "%q\n" "${co.array[2][0]}"
     arch/linux.i386-64/bin/ksh: co.array[2][(null)]: parameter not set
- .sh.match leaks out of subshells. Further information and a
  reproducer can be found here:
  https://marc.info/?l=ast-developers&m=136292897330187

[1]: https://github.com/ksh-community/shbench
  • Loading branch information
JohnoKing committed Feb 9, 2022
1 parent 0af8199 commit b1b8d6f
Show file tree
Hide file tree
Showing 24 changed files with 1,354 additions and 142 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ jobs:
LANG=ja_JP.SJIS script -q -e -c "bin/shtests --locale --nocompile" &&
: disable most SHOPTs, rebuild ksh &&
sed --regexp-extended --in-place=.orig \
'/^SHOPT (2DMATCH|AUDIT|BGX|BRACEPAT|DEVFD|DYNAMIC|EDPREDICT|ESH|FIXEDARRAY|HISTEXPAND|MULTIBYTE|NAMESPACE|OPTIMIZE|SPAWN|STATS|SUID_EXEC|VSH)=/ s/=1?/=0/' \
'/^SHOPT (AUDIT|BGX|BRACEPAT|DEVFD|DYNAMIC|EDPREDICT|ESH|FIXEDARRAY|HISTEXPAND|MULTIBYTE|NAMESPACE|OPTIMIZE|SPAWN|STATS|SUID_EXEC|VSH)=/ s/=1?/=0/' \
src/cmd/ksh93/SHOPT.sh &&
bin/package make &&
: default regression tests with SHOPTs disabled &&
Expand Down
3 changes: 3 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,9 @@ Any uppercase BUG_* names are modernish shell bug IDs.
associative array when read back in by the shell. Elements that are sparse
indexed arrays are now prefixed with "typeset -a".

- The rewritten .sh.match code from ksh93v- has been backported to ksh93u+m,
fixing many bugs and improving performance by a considerable amount.

2022-02-05:

- Fixed: for indexed arrays, given an unset array member a[i] with i > 0,
Expand Down
2 changes: 0 additions & 2 deletions src/cmd/ksh93/README
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,6 @@ options where no feature probe is available, probe is the same as off.

The options have the following defaults and meanings:

2DMATCH on Two-dimensional ${.sh.match} for ${var//pat/str}.

ACCT off Shell accounting.
Noted by "L" in the version string when enabled.
See README-AUDIT.md.
Expand Down
1 change: 0 additions & 1 deletion src/cmd/ksh93/SHOPT.sh
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
# For a more complete description of the options, see src/cmd/ksh93/README.
#

SHOPT 2DMATCH=1 # two dimensional ${.sh.match} for ${var//pat/str}
SHOPT ACCT=0 # accounting
SHOPT ACCTFILE=0 # per-user accounting info
SHOPT AUDIT=1 # enable auditing per SHOPT_AUDITFILE
Expand Down
6 changes: 3 additions & 3 deletions src/cmd/ksh93/bltins/test.c
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ static int e3(struct test*);

static int test_strmatch(const char *str, const char *pat)
{
regoff_t match[2*(MATCH_MAX+1)],n;
int match[2*(MATCH_MAX+1)],n;
register int c, m=0;
register const char *cp=pat;
while(c = *cp++)
Expand All @@ -99,9 +99,9 @@ static int test_strmatch(const char *str, const char *pat)
match[0] = 0;
if(m > elementsof(match)/2)
m = elementsof(match)/2;
n = strgrpmatch(str, pat, match, m, STR_GROUP|STR_MAXIMAL|STR_LEFT|STR_RIGHT);
n = strgrpmatch(str, pat, (ssize_t*)match, m, STR_GROUP|STR_MAXIMAL|STR_LEFT|STR_RIGHT|STR_INT);
if(m==0 && n==1)
match[1] = strlen(str);
match[1] = (int)strlen(str);
if(n)
sh_setmatch(str, -1, n, match, 0);
return(n);
Expand Down
4 changes: 2 additions & 2 deletions src/cmd/ksh93/include/defs.h
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,8 @@
#define defs_h_defined

#include <ast.h>
#if !defined(AST_VERSION) || AST_VERSION < 20220201
#error libast version 20220201 or later is required
#if !defined(AST_VERSION) || AST_VERSION < 20220208
#error libast version 20220208 or later is required
#endif
#if !_lib_fork
#error In 2021, ksh joined the 21st century and started requiring fork(2).
Expand Down
1 change: 1 addition & 0 deletions src/cmd/ksh93/include/shell.h
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,7 @@ struct Shell_s
char instance; /* in set_instance */
char decomma; /* decimal_point=',' */
char redir0; /* redirect of 0 */
char intrace; /* set when trace expands PS4 */
char *readscript; /* set before reading a script */
int subdup; /* bitmask for dups of 1 */
int *inpipe; /* input pipe pointer */
Expand Down
7 changes: 7 additions & 0 deletions src/cmd/ksh93/sh.1
Original file line number Diff line number Diff line change
Expand Up @@ -1695,6 +1695,13 @@ element stores the complete match and the
element stores the
.IR i\^ -th
submatch.
For
.B //
the array is two dimensional with the first subscript indicating the
most recent match and subpattern match and the second script indicating
which match with
.B 0
representing the first match.
The
.B .sh.match
variable
Expand Down
6 changes: 5 additions & 1 deletion src/cmd/ksh93/sh/array.c
Original file line number Diff line number Diff line change
Expand Up @@ -1176,7 +1176,7 @@ Namval_t *nv_putsub(Namval_t *np,register char *sp,register long mode)
if(!ap || !ap->header.fun)
#endif /* SHOPT_FIXEDARRAY */
{
if(sp)
if(sp && sp!=Empty)
{
if(ap && ap->xp && !strmatch(sp,"+([0-9])"))
{
Expand All @@ -1185,7 +1185,11 @@ Namval_t *nv_putsub(Namval_t *np,register char *sp,register long mode)
size = nv_getnum(mp);
}
else
{
Dt_t *root = sh.last_root;
size = (int)sh_arith((char*)sp);
sh.last_root = root;
}
}
if(size <0 && ap)
size += array_maxindex(np);
Expand Down
1 change: 1 addition & 0 deletions src/cmd/ksh93/sh/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -585,6 +585,7 @@ void sh_exit(register int xno)
if(!pp)
sh_done(sig);
sh.arithrecursion = 0;
sh.intrace = 0;
sh.prefix = 0;
#if SHOPT_TYPEDEF
sh.mktype = 0;
Expand Down
189 changes: 125 additions & 64 deletions src/cmd/ksh93/sh/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -176,10 +176,13 @@ struct match
const char *v;
char *val;
char *rval[2];
regoff_t *match;
char node[NV_MINSZ+sizeof(char*)+sizeof(Dtlink_t)];
regoff_t first;
int *match;
char *nodes;
char *names;
int first;
int vsize;
int vlen;
int msize;
int nmatch;
int index;
int lastsub[2];
Expand Down Expand Up @@ -775,117 +778,167 @@ static void put_lastarg(Namval_t* np,const char *val,int flags,Namfun_t *fp)
np->nvenv = 0;
}

static void match2d(struct match *mp)
{
Namval_t *np;
int i;
Namarr_t *ap;
nv_disc(SH_MATCHNOD, &mp->hdr, NV_POP);
np = nv_namptr(mp->nodes, 0);
for(i=0; i < mp->nmatch; i++)
{
np->nvname = mp->names + 3 * i;
if(i > 9)
{
*np->nvname = '0' + i / 10;
np->nvname[1] = '0' + (i % 10);
}
else
*np->nvname = '0' + i;
nv_putsub(np, (char*)0, 1);
nv_putsub(np, (char*)0, 0);
nv_putsub(SH_MATCHNOD, (char*)0, i);
nv_arraychild(SH_MATCHNOD, np, 0);
np = nv_namptr(np + 1, 0);
}
if(ap = nv_arrayptr(SH_MATCHNOD))
ap->nelem = mp->nmatch;
}

/*
* store the most recent value for use in .sh.match
* treat .sh.match as a two dimensional array
*/
void sh_setmatch(const char *v, int vsize, int nmatch, regoff_t match[],int index)
void sh_setmatch(const char *v, int vsize, int nmatch, int match[], int index)
{
struct match *mp = &((Init_t*)sh.init_context)->SH_MATCH_init;
Namval_t *np = (Namval_t*)(&(mp->node[0]));
register int i,n,x;
unsigned int savesub = sh.subshell;
Init_t *ip = sh.init_context;
struct match *mp = &ip->SH_MATCH_init;
register int i,n,x, savesub=sh.subshell;
Namarr_t *ap = nv_arrayptr(SH_MATCHNOD);
Namarr_t *ap_save = ap;
/* do not crash if .sh.match is unset */
if(!ap)
Namval_t *np;
if(sh.intrace)
return;
sh.subshell = 0;
#if !SHOPT_2DMATCH
index = 0;
#else
if(index==0)
#endif /* !SHOPT_2DMATCH */
if(index<0)
{
if(ap->hdr.next != &mp->hdr)
{
free((void*)ap);
ap = nv_arrayptr(np);
SH_MATCHNOD->nvfun = &ap->hdr;
}
if(ap)
np = nv_namptr(mp->nodes,0);
if(mp->index==0)
match2d(mp);
for(i=0; i < mp->nmatch; i++)
{
ap->nelem &= ~ARRAY_SCAN;
i = array_elem(ap);
ap->nelem++;
while(--i>= 0)
nv_disc(np,&mp->hdr,NV_LAST);
nv_putsub(np,(char*)0,mp->index);
for(x=mp->index; x >=0; x--)
{
n = i + x*mp->nmatch;
if(mp->match[2*n+1]>mp->match[2*n])
nv_putsub(np,Empty,ARRAY_ADD|x);
}
if((ap=nv_arrayptr(np)) && array_elem(ap)==0)
{
nv_putsub(SH_MATCHNOD, (char*)0,i);
nv_putsub(SH_MATCHNOD,(char*)0,i);
_nv_unset(SH_MATCHNOD,NV_RDONLY);
}
ap->nelem--;
np = nv_namptr(np+1,0);
}
if(!nv_hasdisc(SH_MATCHNOD,mp->hdr.disc))
nv_disc(SH_MATCHNOD,&mp->hdr,NV_LAST);
if(nmatch)
nv_putsub(SH_MATCHNOD, NIL(char*), (nmatch-1)|ARRAY_FILL|ARRAY_SETSUB);
ap_save->nelem = mp->nmatch = nmatch;
mp->v = v;
mp->first = match[0];
sh.subshell = savesub;
return;
}
#if SHOPT_2DMATCH
else
mp->index = index;
if(index==0)
{
if(index==1)
if(mp->nodes)
{
np->nvalue.cp = Empty;
np->nvfun = SH_MATCHNOD->nvfun;
nv_onattr(np,NV_NOFREE|NV_ARRAY);
SH_MATCHNOD->nvfun = 0;
np = nv_namptr(mp->nodes,0);
for(i=0; i < mp->nmatch; i++)
{
nv_putsub(SH_MATCHNOD, (char*)0, i);
nv_arraychild(SH_MATCHNOD, np,0);
if(np->nvfun && np->nvfun != &mp->hdr)
{
free((void*)np->nvfun);
np->nvfun = 0;
}
np = nv_namptr(np+1,0);
}
ap_save->nelem = mp->nmatch;
free((void*)mp->nodes);
mp->nodes = 0;
}
mp->vlen = 0;
if(ap && ap->hdr.next != &mp->hdr)
free((void*)ap);
SH_MATCHNOD->nvalue.cp = 0;
SH_MATCHNOD->nvfun = 0;
if(!(mp->nmatch=nmatch) && !v)
{
sh.subshell = savesub;
return;
}
ap = nv_arrayptr(np);
nv_putsub(np, NIL(char*), index|ARRAY_FILL|ARRAY_SETSUB);
mp->nodes = sh_calloc(mp->nmatch*(NV_MINSZ+sizeof(void*)+3),1);
mp->names = mp->nodes + mp->nmatch*(NV_MINSZ+sizeof(void*));
np = nv_namptr(mp->nodes,0);
nv_disc(SH_MATCHNOD,&mp->hdr,NV_LAST);
for(i=nmatch; --i>=0;)
{
if(match[2*i]>=0)
nv_putsub(SH_MATCHNOD,Empty,ARRAY_ADD|i);
}
mp->v = v;
mp->first = match[0];
}
else
{
if(index==1)
match2d(mp);
}
#endif /* SHOPT_2DMATCH */
sh.subshell = savesub;
index *= 2*mp->nmatch;
if(mp->nmatch)
{
for(n=mp->first+(mp->v-v),vsize=0,i=0; i < 2*nmatch; i++)
{
if(match[i]>=0 && (match[i] - n) > vsize)
vsize = match[i] -n;
}
index *= 2*mp->nmatch;
i = (index+2*mp->nmatch)*sizeof(match[0]);
if((i+vsize) >= mp->vsize)
if(i >= mp->msize)
{
if(mp->msize)
mp->match = (int*)sh_realloc(mp->match,2*i);
else
mp->match = (int*)sh_malloc(2*i);
mp->msize = 2*i;
}
if(vsize >= mp->vsize)
{
if(mp->vsize)
mp->match = (int*)sh_realloc(mp->match,i+vsize+1);
mp->val = (char*)sh_realloc(mp->val,x=2*vsize);
else
mp->match = (int*)sh_malloc(i+vsize+1);
mp->vsize = i+vsize+1;
mp->val = (char*)sh_malloc(x=vsize+1);
mp->vsize = x;
}
mp->val = ((char*)mp->match)+i;
memcpy(mp->match+index,match,nmatch*2*sizeof(match[0]));
for(x=0,i=0; i < 2*nmatch; i++)
for(i=0; i < 2*nmatch; i++)
{
if(match[i]>=0)
mp->match[index+i] -= n;
else
x=1;

}
ap_save->nelem -= x;
while(i < 2*mp->nmatch)
mp->match[index+i++] = -1;
memcpy(mp->val,v+n,vsize);
mp->val[vsize] = 0;
if(index==0)
v+= mp->first;
memcpy(mp->val+mp->vlen,v,vsize-mp->vlen);
mp->val[mp->vlen=vsize] = 0;
mp->lastsub[0] = mp->lastsub[1] = -1;
}
}
}

static char* get_match(register Namval_t* np, Namfun_t *fp)
{
struct match *mp = (struct match*)fp;
int sub,sub2=0,n,i =!mp->index;
char *val;
sub = nv_aindex(SH_MATCHNOD);
if(sub<0)
sub = 0;
if(np!=SH_MATCHNOD)
sub2 = nv_aindex(np);
if(sub>=mp->nmatch)
Expand Down Expand Up @@ -915,7 +968,15 @@ static char* get_match(register Namval_t* np, Namfun_t *fp)
return(mp->rval[i]);
}

static const Namdisc_t SH_MATCH_disc = { sizeof(struct match), 0, get_match };
static char *name_match(Namval_t *np, Namfun_t *fp)
{
int sub = nv_aindex(SH_MATCHNOD);
sfprintf(sh.strbuf,".sh.match[%d]",sub);
return(sfstruse(sh.strbuf));
}

static const Namdisc_t SH_MATCH_disc = { sizeof(struct match), 0, get_match,
0,0,0,0,name_match };

static char* get_version(register Namval_t* np, Namfun_t *fp)
{
Expand Down
Loading

0 comments on commit b1b8d6f

Please sign in to comment.