-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash when running set
in a subshell/comsub with discipline function defined
#616
Comments
ASan stack trace (not that I can make head or tail of what the heck is going on):
|
That's... weird. I put it in a .sh, run it, the subshell doesn't crash, I add GIT_BRANCH.get to my shell startup, and I have two other .get disciplines and one .set discipline in my normal environment, and dev@754234d7 here. |
If I run it My head hurts. I have this in my normal environment
and |
The problem seems to be the use of the stack (the stkcopy() call in nvdisc.c:426). That may not be compatible with virtual subshells. If we replace edit: No, it doesn't. It just becomes intermittent, at least on my system. |
These crashes are due to "undefined behaviour" which is notoriously unpredictable. |
But this discipline function crashes it and the two I normally have in there don't? |
In my testing here so far, I only get the fault if Something weird I noticed. I replaced my |
Sure my head hurt too. To me it all comes from
Looks like comsubst() is a trouble zone and in trouble zone, trouble are found (happy silicon valley). To fix this I need explanations from guru's out there. To me all this jazz can be streamlined to
Apparently we enter here with
Jeez we enter here with type=3 and no explanation anywere what 3 could mean. but sure enough at line
Since we enter with 3 we go the I made a 'patch' that is not a fix but a demonstrator of all this explanation. above the
I add this
I know looks ugly, don't punch me on this, this is just a demonstrator
We got to do it on 1st comsubst() occurence, not the second one where it is too late, we are hosed already. The patch Enlight me about this Cheers, |
I think the type==3 thing is a red herring. It was introduced by backporting a couple of Red Hat patches, here: 970069a I radically simplified the amazing mess of hacks that were the multiple ksh command substitutions in 42becab (note: the "type" referred to there is not the same as the type parameter passed to Right now my head is hurting as well and I'm not clear that all the current comments about what the types are, are correct. But I'm pretty sure that diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 4513be322..11c37dd37 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -377,7 +377,7 @@ void sh_machere(Sfio_t *infile, Sfio_t *outfile, char *string)
break;
}
case S_PAR:
- comsubst(mp,NULL,3);
+ comsubst(mp,NULL,1);
break;
case S_EOF:
if((c=fcfill()) > 0)
@@ -1200,7 +1200,7 @@ retry1:
case S_PAR:
if(type)
goto nosub;
- comsubst(mp,NULL,3);
+ comsubst(mp,NULL,1);
return 1;
case S_DIG:
var = 0;
@@ -2229,7 +2229,6 @@ static void comsubst(Mac_t *mp,Shnode_t* t, int type)
char *cp = (char*)sh_malloc(IOBSIZE+1);
sp = sfnew(NULL,cp,IOBSIZE,fd,SF_READ|SF_MALLOC);
}
- type = 3;
}
else
{ |
Your demonstrator patch is a workaround that causes sh_subfork() to be called if the command in $(...) is
then the bug goes away as any Perhaps we're forking at the wrong point in the code path for subshares within virtual subshells. |
Actually, the forking of virtual subshells containing subshares is a red herring, too. If we remove it: demonstration patch (breaks things, do not commit)diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 4513be322..15b834612 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2232,11 +2232,7 @@ static void comsubst(Mac_t *mp,Shnode_t* t, int type)
type = 3;
}
else
- {
- if(type==2 && sh.subshell && !sh.subshare)
- sh_subfork(); /* subshares within virtual subshells are broken, so fork first */
sp = sh_subshell(t,sh_isstate(SH_ERREXIT),type);
- }
fcrestore(&save);
}
else …then it crashes just the same. |
It effectivly strange to do things like You right this kinda patch, simply avoid doing set in subshell, its a kludge to at least avoid this core dump. I think the test about I didn't knew this RH thing, interestig I like your historicall analysis :-) |
I've found the following patch to be an effective workaround for this bug: diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index b5b0330d5..544d46110 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -1432,6 +1432,15 @@ static int print_namval(Sfio_t *file,Namval_t *np,int flag, struct tdata *tp)
int indent=tp->indent, outname=0, isfun;
char tempexport=0;
sh_sigcheck();
+ /*
+ * Printing a name-value pair can cause a discipline shell function to be executed
+ * as the value is obtained via nv_getval(). This can cause messy interactions with
+ * further levels of virtual subshell or subshare that may cause the shell to crash
+ * in certain corner cases. To be safe, fork a virtual subshell early.
+ * https://github.com/ksh93/ksh/issues/616
+ */
+ if(sh.subshell && !sh.subshare)
+ sh_subfork();
if(flag)
flag = '\n';
if(tp->noref && nv_isref(np)) |
I'm away from my laptop right now but looks appealing will test it tonight.:-) |
I think we can make the forking workaround more fine-grained than that. It is only necessary if we need to call a shell discipline function to obtain the value. It is not needed for internal C discipline functions for special variables, like IFS. There is currently no way to check if a variable has a shell discipline function to get the value. What is needed is to check if the The patch below renames these for legibility (to sh_disc_getstring() and sh_disc_getnum(), respectively) and turns them into externs. For consistency, assign() is similarly renamed to sh_disc_assign(); we might want to check for this somewhere in future. We can then loop through each variable's linked list of disciplines (algorithm copied from nv_hasdisc() in nvdisc.c) and check if a shell More fine-grained patchdiff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index b5b0330d5..097cdc750 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -1432,6 +1432,25 @@ static int print_namval(Sfio_t *file,Namval_t *np,int flag, struct tdata *tp)
int indent=tp->indent, outname=0, isfun;
char tempexport=0;
sh_sigcheck();
+ if(sh.subshell && !sh.subshare)
+ {
+ /*
+ * If obtaining the value (using nv_getval() below) will trigger the execution of a .get or .getn shell
+ * discipline function, and we're already in a virtual subshell, then corner case interactions with any
+ * nested virtual subshells may cause a heap-use-after-free in lookup() in nvdisc.c. Avoid by forking the
+ * subshell. TODO: somehow fix the heap-use-after-free instead. https://github.com/ksh93/ksh/issues/616
+ */
+ Namfun_t *fp;
+ for(fp = np->nvfun; fp; fp = fp->next)
+ {
+ if(fp->disc && (fp->disc->getval==sh_disc_getstring || fp->disc->getnum==sh_disc_getnum))
+ {
+error(ERROR_warn(0),"[DEBUG] forking at %s",nv_name(np));
+ sh_subfork();
+ break;
+ }
+ }
+ }
if(flag)
flag = '\n';
if(tp->noref && nv_isref(np))
diff --git a/src/cmd/ksh93/include/variables.h b/src/cmd/ksh93/include/variables.h
index 842ddb2d6..d3a83cea8 100644
--- a/src/cmd/ksh93/include/variables.h
+++ b/src/cmd/ksh93/include/variables.h
@@ -43,6 +43,11 @@ extern void sh_save_rand_seed(struct rand *, int);
1 \
)
+/* for discipline shell functions (nvdisc.c) */
+extern char* sh_disc_getstring(Namval_t*,Namfun_t*);
+extern Sfdouble_t sh_disc_getnum(Namval_t*,Namfun_t*);
+extern void sh_disc_assign(Namval_t*,const char*,int,Namfun_t*);
+
/* The following defines must be kept synchronous with shtab_variables[] in data/variables.c */
#define PATHNOD (sh.bltin_nodes)
diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index 4fc45380f..7745468c8 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -29,8 +29,6 @@
#include "io.h"
#include "shlex.h"
-static void assign(Namval_t*,const char*,int,Namfun_t*);
-
int nv_compare(Dt_t* dict, void *sp, void *dp, Dtdisc_t *disc)
{
if(sp==dp)
@@ -138,7 +136,7 @@ void nv_putv(Namval_t *np, const char *value, int flags, Namfun_t *nfp)
if(!nv_isattr(np,NV_NODISC) || fp==(Namfun_t*)nv_arrayptr(np))
break;
}
- if(!value && (flags&NV_TYPE) && fp && fp->disc->putval==assign)
+ if(!value && (flags&NV_TYPE) && fp && fp->disc->putval==sh_disc_assign)
fp = 0;
if(fp && fp->disc->putval)
(*fp->disc->putval)(np,value, flags, fp);
@@ -236,7 +234,7 @@ static void chktfree(Namval_t *np, struct vardisc *vp)
/*
* This function performs an assignment disc on the given node <np>
*/
-static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
+void sh_disc_assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
{
int type = (flags&NV_APPEND)?APPEND:ASSIGN;
struct vardisc *vp = (struct vardisc*)handle;
@@ -452,12 +450,12 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
return cp;
}
-static char* lookups(Namval_t *np, Namfun_t *handle)
+char* sh_disc_getstring(Namval_t *np, Namfun_t *handle)
{
return lookup(np,LOOKUPS,NULL,handle);
}
-static Sfdouble_t lookupn(Namval_t *np, Namfun_t *handle)
+Sfdouble_t sh_disc_getnum(Namval_t *np, Namfun_t *handle)
{
Sfdouble_t d;
lookup(np,LOOKUPN, &d ,handle);
@@ -479,7 +477,7 @@ char *nv_setdisc(Namval_t* np,const char *event,Namval_t *action,Namfun_t *fp)
char *empty = "";
while(vp)
{
- if(vp->fun.disc && (vp->fun.disc->setdisc || vp->fun.disc->putval == assign))
+ if(vp->fun.disc && (vp->fun.disc->setdisc || vp->fun.disc->putval == sh_disc_assign))
break;
vp = (struct vardisc*)vp->fun.next;
}
@@ -536,7 +534,7 @@ char *nv_setdisc(Namval_t* np,const char *event,Namval_t *action,Namfun_t *fp)
/* Handle GET/SET/APPEND/UNSET disc */
if(np==SH_VALNOD || np==SH_LEVELNOD)
return NULL;
- if(vp && vp->fun.disc->putval!=assign)
+ if(vp && vp->fun.disc->putval!=sh_disc_assign)
vp = 0;
if(!vp)
{
@@ -548,7 +546,7 @@ char *nv_setdisc(Namval_t* np,const char *event,Namval_t *action,Namfun_t *fp)
vp->fun.disc = dp;
memset(dp,0,sizeof(*dp));
dp->dsize = sizeof(struct vardisc);
- dp->putval = assign;
+ dp->putval = sh_disc_assign;
if(nv_isarray(np) && !nv_arrayptr(np))
nv_putsub(np,NULL, 1);
nv_stack(np, (Namfun_t*)vp);
@@ -562,9 +560,9 @@ char *nv_setdisc(Namval_t* np,const char *event,Namval_t *action,Namfun_t *fp)
{
Namdisc_t *dp = (Namdisc_t*)vp->fun.disc;
if(type==LOOKUPS)
- dp->getval = lookups;
+ dp->getval = sh_disc_getstring;
else if(type==LOOKUPN)
- dp->getnum = lookupn;
+ dp->getnum = sh_disc_getnum;
vp->disc[type] = action;
}
else Of course, it would be even better if a way could be found to fix the damn heap-use-after-free... |
Trying to write a good regression test. This crash is such a heisenbug, I have not found a way to reproduce it in a regression test yet. Seems like this crash depends on something in my .kshrc. Minimal reproducer: GIT_BRANCH.get()
{
.sh.value=${ echo foo; }
}
v=$(set) On my end (with unpatched ksh):
|
OK, so the trigger seems to be that my .kshrc defines other discipline functions as well. Though systematic elimination I've found that we need at least two discipline functions to trigger the crash. New minimal reproducer:
This one crashes (on my end) by simply running |
As expected, it also crashes with |
I like your patch, will double checkit. The idea of 'mine' was trying to keep as much as the existing behavior as possible, i.e limit the early
I think you will be puzzled by this one :-) still on prod ksh, --rc vs --norc seems non relevant (with my rc that may be is not as complicated as yours :-) so I keep --norc here for getting comparable results. Note that my examples I use
The later don't crash :-)
To me this has to do with the variable name But in all case
Yes make sense, any caller to |
Using the patch along with my normal startup, I get this now:
But I don't get the memory fault.
|
The |
Well, dunno why I didn't thought earlier, when I discovered that the name of the variable did matter, i.e I decided to make a brutal kludge, i.e skipping over
And to my grand surprise, no more core dump, and no regression in QA, even though now Anyway this means that may be all the things with fork() is still good, only the Is that a trail to follow ? EDIT: |
Very interesting trail that, I will pursue it. I do disagree that it is legitimate to skip |
BTW, for future reference, something like if( (np->nvname[0]=='_')&&(np->nvname[1]==0) ) should be written as
See src/cmd/ksh93/include/variables.h and src/cmd/ksh93/data/variables.c. |
Cool learned something :) |
So, printing out name-value pairs with foo.get()
{
.sh.value=foo;
}
bar.get()
{
.sh.value=${ echo foo; }
}
(
: $bar
: $foo
)
echo $_ |
lldb backtrace with the above reproducer:
|
Relevant information from sh.1:
|
"When a discipline function is invoked, Here is a test patch that disables this functionality. It makes the crash go away on my end.diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index da9245d47..f6abf4437 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -750,11 +750,11 @@ static int set_instance(Namval_t *nq, Namval_t *node, struct Namref *nr)
#endif /* SHOPT_NAMESPACE */
}
nv_putval(SH_NAMENOD, cp, NV_NOFREE);
- memcpy(node,L_ARGNOD,sizeof(*node));
- L_ARGNOD->nvalue.nrp = nr;
- L_ARGNOD->nvflag = NV_REF|NV_NOFREE;
- L_ARGNOD->nvfun = 0;
- L_ARGNOD->nvenv = 0;
+// memcpy(node,L_ARGNOD,sizeof(*node));
+// L_ARGNOD->nvalue.nrp = nr;
+// L_ARGNOD->nvflag = NV_REF|NV_NOFREE;
+// L_ARGNOD->nvfun = 0;
+// L_ARGNOD->nvenv = 0;
if(sp)
{
nv_putval(SH_SUBSCRNOD,nr->sub=sp,NV_NOFREE);
@@ -765,9 +765,9 @@ static int set_instance(Namval_t *nq, Namval_t *node, struct Namref *nr)
static void unset_instance(Namval_t *nq, Namval_t *node, struct Namref *nr,long mode)
{
- L_ARGNOD->nvalue.nrp = node->nvalue.nrp;
- L_ARGNOD->nvflag = node->nvflag;
- L_ARGNOD->nvfun = node->nvfun;
+// L_ARGNOD->nvalue.nrp = node->nvalue.nrp;
+// L_ARGNOD->nvflag = node->nvflag;
+// L_ARGNOD->nvfun = node->nvfun;
if(nr->sub)
{
nv_putsub(nr->np, nr->sub, mode); |
Ha you beat me on this one, I learned how to monitor L_ARGNOD, thank you for that, and I discovered this.
I am not fluent enough with the setjmp/longjmp pushcontext/popcontext, I dunno how they are linked together etc... but I think if you manage set a setjmp/longjmp point so on error in sh_funct() we longjmp back to the restore point for L_ARGNOD, you would be safe and no need update set_instance() unset_instance() unless theire are bugged. |
Progress: the following patch fixes the crash introduced in 430e478 on my end, leaving only the earlier breakage introduced earlier in 88a1f3d. Learning that the But that still didn't fix the crash for me. The crash turns out to be caused by an omission in our own robustification of discipline functions (see 430e478, 2322f93). We added a Again, after this patch, the reproducer still leaves Patch with half the fixdiff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index 4fc45380f..c38eecd0e 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -245,6 +245,7 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
Namval_t node;
union Value *up = np->nvalue.up;
Namval_t *tp, *nr; /* for 'typeset -T' types */
+ int jmpval = 0;
if(val && (tp=nv_type(np)) && (nr=nv_open(val,sh.var_tree,NV_VARNAME|NV_ARRAY|NV_NOADD|NV_NOFAIL)) && tp==nv_type(nr))
{
char *sub = nv_getsub(np);
@@ -280,7 +281,6 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
int bflag;
@@ -370,6 +370,8 @@ done:
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
}
/*
@@ -384,10 +386,10 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
char *cp=0;
Namval_t node;
union Value *up = np->nvalue.up;
+ int jmpval = 0;
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
/* disciplines like PS2 may run at parse time; save, reinit and restore the lexer state */
@@ -449,6 +451,8 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
return cp;
}
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index da9245d47..8bae45040 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3224,6 +3224,7 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
char *prefix = sh.prefix;
int n=0;
char *av[3];
+ int jmpval = 0;
Fcin_t save;
fcsave(&save);
if((offset=staktell())>0)
@@ -3241,7 +3242,6 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
mode = set_instance(nq,&node, &nr);
if(is_abuiltin(np))
{
- int jmpval;
struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
Shbltin_t *bp = &sh.bltindata;
sh_pushcontext(buffp,SH_JMPCMD);
@@ -3258,8 +3258,6 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
sh.exitval = (funptr(np))(n,argv,bp);
}
sh_popcontext(buffp);
- if(jmpval>SH_JMPCMD)
- siglongjmp(*sh.jmplist,jmpval);
}
else
sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
@@ -3269,6 +3267,8 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
if(offset>0)
stakset(base,offset);
sh.prefix = prefix;
+ if(jmpval > SH_JMPCMD)
+ siglongjmp(*sh.jmplist,jmpval);
return sh.exitval;
}
|
Great analysis there, thank you! You've made me realise my patch above still did not fix all of the problem. The sigsetjmp/siglongjmp logic in sh_fun() is still wrong. That sh_funct() call should be fully included in it. This new version of the patch does that... and... looks like the bug is FIXED!! :-)diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index 4fc45380f..c38eecd0e 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -245,6 +245,7 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
Namval_t node;
union Value *up = np->nvalue.up;
Namval_t *tp, *nr; /* for 'typeset -T' types */
+ int jmpval = 0;
if(val && (tp=nv_type(np)) && (nr=nv_open(val,sh.var_tree,NV_VARNAME|NV_ARRAY|NV_NOADD|NV_NOFAIL)) && tp==nv_type(nr))
{
char *sub = nv_getsub(np);
@@ -280,7 +281,6 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
int bflag;
@@ -370,6 +370,8 @@ done:
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
}
/*
@@ -384,10 +386,10 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
char *cp=0;
Namval_t node;
union Value *up = np->nvalue.up;
+ int jmpval = 0;
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
/* disciplines like PS2 may run at parse time; save, reinit and restore the lexer state */
@@ -449,6 +451,8 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
return cp;
}
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index da9245d47..4024dff60 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3224,6 +3224,8 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
char *prefix = sh.prefix;
int n=0;
char *av[3];
+ int jmpval = 0;
+ struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
Fcin_t save;
fcsave(&save);
if((offset=staktell())>0)
@@ -3239,15 +3241,13 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
n++;
if(nq)
mode = set_instance(nq,&node, &nr);
- if(is_abuiltin(np))
+ sh_pushcontext(buffp,SH_JMPCMD);
+ jmpval = sigsetjmp(buffp->buff,1);
+ if(jmpval == 0)
{
- int jmpval;
- struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
- Shbltin_t *bp = &sh.bltindata;
- sh_pushcontext(buffp,SH_JMPCMD);
- jmpval = sigsetjmp(buffp->buff,1);
- if(jmpval == 0)
+ if(is_abuiltin(np))
{
+ Shbltin_t *bp = &sh.bltindata;
bp->bnode = np;
bp->ptr = nv_context(np);
errorpush(&buffp->err,0);
@@ -3257,18 +3257,18 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
sh.exitval = 0;
sh.exitval = (funptr(np))(n,argv,bp);
}
- sh_popcontext(buffp);
- if(jmpval>SH_JMPCMD)
- siglongjmp(*sh.jmplist,jmpval);
+ else
+ sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
}
- else
- sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
+ sh_popcontext(buffp);
if(nq)
unset_instance(nq, &node, &nr, mode);
fcrestore(&save);
if(offset>0)
stakset(base,offset);
sh.prefix = prefix;
+ if(jmpval > SH_JMPCMD)
+ siglongjmp(*sh.jmplist,jmpval);
return sh.exitval;
}
|
Well, it did look like it was fixed properly, but there are a few regression test failures with the latest version of the patch. I hope we can find a way to fix those, because the patch is definitely progress.
|
I found the cause of the regressions with the latest patch. It doesn't like it when the checkpoint buffer is allocated onto the AST stack with Three times is a charm?diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index 4fc45380f..c38eecd0e 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -245,6 +245,7 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
Namval_t node;
union Value *up = np->nvalue.up;
Namval_t *tp, *nr; /* for 'typeset -T' types */
+ int jmpval = 0;
if(val && (tp=nv_type(np)) && (nr=nv_open(val,sh.var_tree,NV_VARNAME|NV_ARRAY|NV_NOADD|NV_NOFAIL)) && tp==nv_type(nr))
{
char *sub = nv_getsub(np);
@@ -280,7 +281,6 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
int bflag;
@@ -370,6 +370,8 @@ done:
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
}
/*
@@ -384,10 +386,10 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
char *cp=0;
Namval_t node;
union Value *up = np->nvalue.up;
+ int jmpval = 0;
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
/* disciplines like PS2 may run at parse time; save, reinit and restore the lexer state */
@@ -449,6 +451,8 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
return cp;
}
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index da9245d47..e7fda8ff5 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3224,6 +3224,10 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
char *prefix = sh.prefix;
int n=0;
char *av[3];
+ int jmpval = 0;
+ /* allocate checkpoint on heap, not AST stack, to avoid
+ * possible conflicting use of the stack via sh_funct() */
+ struct checkpt *buffp = sh_malloc(sizeof(struct checkpt));
Fcin_t save;
fcsave(&save);
if((offset=staktell())>0)
@@ -3239,15 +3243,13 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
n++;
if(nq)
mode = set_instance(nq,&node, &nr);
- if(is_abuiltin(np))
+ sh_pushcontext(buffp,SH_JMPCMD);
+ jmpval = sigsetjmp(buffp->buff,1);
+ if(jmpval == 0)
{
- int jmpval;
- struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
- Shbltin_t *bp = &sh.bltindata;
- sh_pushcontext(buffp,SH_JMPCMD);
- jmpval = sigsetjmp(buffp->buff,1);
- if(jmpval == 0)
+ if(is_abuiltin(np))
{
+ Shbltin_t *bp = &sh.bltindata;
bp->bnode = np;
bp->ptr = nv_context(np);
errorpush(&buffp->err,0);
@@ -3257,18 +3259,19 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
sh.exitval = 0;
sh.exitval = (funptr(np))(n,argv,bp);
}
- sh_popcontext(buffp);
- if(jmpval>SH_JMPCMD)
- siglongjmp(*sh.jmplist,jmpval);
+ else
+ sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
}
- else
- sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
+ sh_popcontext(buffp);
+ free(buffp);
if(nq)
unset_instance(nq, &node, &nr, mode);
fcrestore(&save);
if(offset>0)
stakset(base,offset);
sh.prefix = prefix;
+ if(jmpval > SH_JMPCMD)
+ siglongjmp(*sh.jmplist,jmpval);
return sh.exitval;
}
|
So, please let me know if you can break the latest patch… |
I found the cause of why my second-to-last patch didn't play well with the stack in sh_fun(). The stack state was being saved ( The previous patch appears to work perfectly, but using the heap is less efficient than using the stack (as the stack allocates memory in larger chunks and is periodically auto-freed), so it's worth making the stack version work correctly. Patch version four (edit: broken; removed) |
Patch version four suddenly started crashing for me. Version three is still okay. Urgh. I'm going to sleep on it before debugging further. |
Oh, found the cause already. I was being too smart. Patch version four contains sh_pushcontext(checkpoint = stakalloc(sizeof(struct checkpt)), SH_JMPCMD); and that looks like it should work, but Patch version fivediff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index 4fc45380f..c38eecd0e 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -245,6 +245,7 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
Namval_t node;
union Value *up = np->nvalue.up;
Namval_t *tp, *nr; /* for 'typeset -T' types */
+ int jmpval = 0;
if(val && (tp=nv_type(np)) && (nr=nv_open(val,sh.var_tree,NV_VARNAME|NV_ARRAY|NV_NOADD|NV_NOFAIL)) && tp==nv_type(nr))
{
char *sub = nv_getsub(np);
@@ -280,7 +281,6 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
int bflag;
@@ -370,6 +370,8 @@ done:
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
}
/*
@@ -384,10 +386,10 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
char *cp=0;
Namval_t node;
union Value *up = np->nvalue.up;
+ int jmpval = 0;
if(nq && !isblocked(bp,type))
{
struct checkpt checkpoint;
- int jmpval;
int savexit = sh.savexit;
Lex_t *lexp = (Lex_t*)sh.lex_context, savelex;
/* disciplines like PS2 may run at parse time; save, reinit and restore the lexer state */
@@ -449,6 +451,8 @@ static char* lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
nq->nvalue.rp->running=0;
_nv_unset(nq,0);
}
+ if(jmpval >= SH_JMPSUB)
+ siglongjmp(*sh.jmplist,jmpval);
return cp;
}
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index da9245d47..b93a4f1b6 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3210,21 +3210,23 @@ static void sh_funct(Namval_t *np,int argn, char *argv[],struct argnod *envlist,
}
/*
- * external interface to execute a function without arguments
+ * external interface to execute a function
* <np> is the function node
* If <nq> is not-null, then sh.name and sh.subscript will be set
*/
int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
{
- int offset = 0;
- char *base;
- Namval_t node;
+ struct checkpt *checkpoint;
+ int jmpval = 0;
+ int offset = 0;
+ char *base;
+ Namval_t node;
struct Namref nr;
long mode = 0;
char *prefix = sh.prefix;
- int n=0;
- char *av[3];
- Fcin_t save;
+ int n=0;
+ char *av[3];
+ Fcin_t save;
fcsave(&save);
if((offset=staktell())>0)
base=stakfreeze(0);
@@ -3239,36 +3241,35 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
n++;
if(nq)
mode = set_instance(nq,&node, &nr);
- if(is_abuiltin(np))
+ checkpoint = stakalloc(sizeof(struct checkpt));
+ sh_pushcontext(checkpoint, SH_JMPCMD);
+ jmpval = sigsetjmp(checkpoint->buff,1);
+ if(jmpval == 0)
{
- int jmpval;
- struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
- Shbltin_t *bp = &sh.bltindata;
- sh_pushcontext(buffp,SH_JMPCMD);
- jmpval = sigsetjmp(buffp->buff,1);
- if(jmpval == 0)
+ if(is_abuiltin(np))
{
+ Shbltin_t *bp = &sh.bltindata;
bp->bnode = np;
bp->ptr = nv_context(np);
- errorpush(&buffp->err,0);
+ errorpush(&checkpoint->err,0);
error_info.id = argv[0];
opt_info.index = opt_info.offset = 0;
opt_info.disc = 0;
sh.exitval = 0;
sh.exitval = (funptr(np))(n,argv,bp);
}
- sh_popcontext(buffp);
- if(jmpval>SH_JMPCMD)
- siglongjmp(*sh.jmplist,jmpval);
+ else
+ sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
}
- else
- sh_funct(np,n,argv,NULL,sh_isstate(SH_ERREXIT));
+ sh_popcontext(checkpoint);
if(nq)
unset_instance(nq, &node, &nr, mode);
fcrestore(&save);
if(offset>0)
stakset(base,offset);
sh.prefix = prefix;
+ if(jmpval >= SH_JMPFUN)
+ siglongjmp(*sh.jmplist,jmpval);
return sh.exitval;
}
|
Jeez you did all this while I was sleeping :) lemme review it, it may take a while... Ho BTW how do you insert the git diff with highlighted text in a reply ? |
Ha ok I ramp up my knowledge on the
The inner function may escape the restore except if the restore things is stuffed into a 'struct chkpt May be a fix could be to add a layer of save(np)/pushcontext/setjmp in sh_fun() and then catch longjmp/resore(np)/forward longjmp to prev layer of context... dunno if that make sense though :-) OR add a save(np)/restore(np) in any caller to sh_fun() there are not that much and remove the save/restore in Now that I look the EDIT: I meant, not doing a complete refactoring, well may be one is underway for C99, C23 anyway, but I meant more fixing macro to static inline when make sense as we go fixing. |
patch-five runs like a champ :) you are a star! |
Start with
Yeah, that is the effect of my patch for the offending sh_funct() call.
The code base we forked was still trying to be compatible with both K&R C and C++ (but badly failing at it due to bit rot). We've abandoned that but still use C89/C90 as a lowest common denominator (see a34e831, 1064933, 427ea54, and other commits referencing those). So we can still build with gcc 2.95.3, for example. I don't want to modernise beyond that yet, as I like to occasionally test ksh on ancient and obscure systems; they sometimes expose bugs others just quietly tolerate... Looks like the inline keyword was introduced in C99, so we cannot use it except subject to a feature test. We already use a couple of modern optimisation features ( Or I could just tweak the macros to avoid the multiple evaluations. I'm pretty sure this particular pair of macros is only ever called as if they were void functions, so we should be able to use { } instead of ( ) and declare a local variable to which to assign the first macro parameter, evaluating it just once.
Welcome to this codebase, you'll find many other things like that (and there were many other such things you won't find as we've cleaned them up already) :P Like, the multibyte character getter macro expansion Honestly though, it's not too bad as long as you make sure you stay aware of this issue. I've learned to make intensive use of Exuberant ctags to immediately jump to the declarations of things and take a quick look at them before doing anything even a little smart. Highly recommended for getting to know the code. You do need an editor that supports ctags, but the major ones (vi, emacs) do and so does my own favourite, joe. I spent the first couple of years awkwardly grepping the code to find things (becoming quite adept at regexing this code…) and I wish I'd discovered ctags much sooner. To create/update the tags file, I use:
Neat. Thanks for testing and analysing! You really helped, I'm not sure I'd have found the sh_fun() breakage without your analysis. |
Ha OK got it, the good old system testing, I used to do just that on my old projects, testing BE and perf on oldies, I had so slow HW at the time, I was tuning the perf like crazy, for the benefit of the latest/greatest HW. I'll keep the gcc 2.95.3 in mind then. I personally use cscope(1), an old habit, even though I am using emacs as editor... my .emacs is as old as my .kshrc (ksh88) :-) Thanx for the ```lang MD thingy will play with that a bit :) |
Forgot to say, patch-five run like a champ on ubuntu 22.04 s390x as well. |
Tested successful as described in issue with patch 5. I also checked my memory fault notes and did not find anything else to add to the mix. |
FWIW, I have also tested patch 5 and haven't found any major breakage (i.e., no crashes), although I did encounter a compiler warning:
This is fairly easy to fix (apply on top of patch 5): --- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3241,7 +3241,7 @@ int sh_fun(Namval_t *np, Namval_t *nq, char *argv[])
n++;
if(nq)
mode = set_instance(nq,&node, &nr);
- checkpoint = stakalloc(sizeof(struct checkpt));
+ checkpoint = (struct checkpt*)stakalloc(sizeof(struct checkpt));
sh_pushcontext(checkpoint, SH_JMPCMD);
jmpval = sigsetjmp(checkpoint->buff,1);
if(jmpval == 0)
|
Thanks, good point. I should really fix the stak/stk interface to use void pointers. |
The _ variable is used for various context-dependent things and is sometimes a name reference (see man page under Variables). The $_ state changes are handled by set_instance() and unset_instance() in xec.c. In the following reproducer, $_ became corrupted, crashing the shell: foo.get() { .sh.value=foo; } bar.get() { .sh.value=${ echo foo; }; } (: $bar; : $foo) echo "$_" So this involves two .get discipline functions, a virtual subshell, and a shared-state command substitution in the second function. The $_ corruption in the reproducer was happening as of 88a1f3d, which began to robustify discipline functions by introducing a context push and sigsetjmp/siglongjmp to ensure they can restore state if an error occurs. As of that commit, the reproducer prints the empty string, and no value can be assigned to _ afterwards. While the state is restored correctly, it fails to actually execute a siglongjmp when necessary. For example, if sh_subfork() is called to fork a virtual subshell (causing jmpval == SH_JMPSUB), we need to siglongjmp in the parent process, or entirely incorrect things are going to happen, such as unset_instance() not being called when it should be. Commit 0a34324 introduced just such a sh_subfork() call, exposing this problem. The bug was further exacerbated by 0a34324. As of that commit, the reproducer crashes the shell. The real problem here is not in that commit, though, but in faulty sigsetjmp/siglongjmp logic in sh_fun(), the main interface to call a function, which is used by the discipline function routines. If a shell command run via the sh_funct() call executed a siglongjmp, it failed (among a few other things) to call unset_instance() to restore the state of $_. Moral of the story: 1. After sigsetjmp and sh_pushcontext, never fail to siglongjmp for a sufficiently high jmpval (see SH_JMP* in fault.h, and all the uses of sh_pushcontext()). 2. Always make sure the state is fully restored before siglongjmp. Thanks to @phidebian for all the help analysing the root cause of this problem. src/cmd/ksh93/sh/xec.c: sh_fun(): - Fix sigsetjmp logic by (a) making it cover the sh_funct() call, so that a siglongjmp from elsewhere will return here, and (b) delaying our own siglongjmp call until state is fully restored. - Push context with the jmpval SH_JMPCMD for a built-in command (which can also be run via this function) and SH_JMPFUN for a function; this could be needed for correct siglongjmp behaviour if an error occurs elsewhere, as various code paths will be able to detect whether it was a command or a function that failed. - While we're here, switch AST stack use from old stak(3) to new stk(3) API. This was already done in many places and will eventually be done in the entire codebase for consistency. src/cmd/ksh93/sh/nvdisc.c: assign(), lookup(): - Push context with SH_JMPFUN as we're going to run a function. - Add the missing siglongjmp if jmpval >= SH_JMPFUN. Do not siglongjmp until the state is fully restored. Resolves: #616
set
outputs the values of all the variables. This crashes in the following reproducer:The text was updated successfully, but these errors were encountered: