Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`+=' operator picks up global value of variable inside function even when variable is declared locally without assigning a value #533

Closed
jghub opened this issue Sep 9, 2022 · 20 comments
Labels
bug Something is not working

Comments

@jghub
Copy link

jghub commented Sep 9, 2022

consider

v=outside
function f { typeset v; print ">$v<"; v+=" inside"; print ">$v<"; }
f

if executed with ksh93u+ it works as expected. with 93u+m the += inside the function appends to the value of the global variable with the same name as the local one.

with 93u+ the output thus is correctly:

><
> inside<

while with 93u+m one gets

><
>outside inside<

correct behaviour is restored with 93u+m if the function definition is changed to

function f { typeset v=; print ">$v<"; v+=" inside"; print ">$v<"; }

i.e. if the local variable is not only declared but explicitly assigned an empty value.

@JohnoKing
Copy link

JohnoKing commented Sep 9, 2022

This bug was introduced in commit 75796a9. That change was backported from ksh93v- (which also has this bug). From what I can tell the code in question can't tell the difference between a function scope and an invocation-local scope, which results in ksh copying $v from outside of the function invocation-local scope before appending to the variable. A bugfix will likely involve avoiding the nv_clone call if no invocation-local scope is detected.

@McDutchie McDutchie added the bug Something is not working label Sep 9, 2022
@JohnoKing
Copy link

I have created a patch that should fix this bug. Please test:

diff --git a/src/cmd/ksh93/include/shell.h b/src/cmd/ksh93/include/shell.h
index 77eb6b828..34f1dbb88 100644
--- a/src/cmd/ksh93/include/shell.h
+++ b/src/cmd/ksh93/include/shell.h
@@ -334,6 +334,7 @@ struct Shell_s
 	struct dolnod	*arglist;
 	int16_t		fn_depth;	/* scoped ksh-style function call depth */
 	int16_t		dot_depth;	/* dot-script and POSIX function call depth */
+	char		invoc_local;	/* set when inside of an invocation-local scope */
 	int		hist_depth;
 	int		xargmin;
 	int		xargmax;
diff --git a/src/cmd/ksh93/sh/name.c b/src/cmd/ksh93/sh/name.c
index 55387a66b..56811222b 100644
--- a/src/cmd/ksh93/sh/name.c
+++ b/src/cmd/ksh93/sh/name.c
@@ -1560,7 +1560,7 @@ void nv_putval(register Namval_t *np, const char *string, int flags)
 	 * When we are in an invocation-local scope and we are using the
 	 * += operator, clone the variable from the previous scope.
 	 */
-	if((flags&NV_APPEND) && nv_isnull(np) && sh.var_tree->view)
+	if((flags&NV_APPEND) && nv_isnull(np) && sh.var_tree->view && sh.invoc_local)
 	{
 		Namval_t *mp = nv_search(np->nvname,sh.var_tree->view,0);
 		if(mp)
@@ -2287,6 +2287,8 @@ void sh_scope(struct argnod *envlist, int fun)
 		newroot = nv_dict(sh.namespace);
 #endif /* SHOPT_NAMESPACE */
 	newscope = dtopen(&_Nvdisc,Dtoset);
+	if(!fun)
+		sh.invoc_local = 1;
 	if(envlist)
 	{
 		dtview(newscope,(Dt_t*)sh.var_tree);
@@ -3500,6 +3502,7 @@ void sh_unscope(void)
 		sh.var_tree=dp;
 		dtclose(root);
 	}
+	sh.invoc_local = 0;
 }
 
 /*

@jghub
Copy link
Author

jghub commented Sep 9, 2022

I did: yes, this looks good :)

@jghub
Copy link
Author

jghub commented Sep 9, 2022

and of course: thanks a lot for this quick fix: that error cost me some 1-2 hours of hunting it down today in a 800 LOC script that I have used but not modified (or looked at) for a couple of years and suddenly got faulty behaviour. it took me quite some time to realise that a global variable hanging around with the same name as the local one in the affected function caused the issue (the more so as in that special context the global var had the same value as the local one and the user-visible effect was "stuttering" the value 2 times so I was initially only searching within the function :) ).

@JohnoKing
Copy link

JohnoKing commented Sep 9, 2022

The patch isn't yet complete, as I've encountered a related bug. Unlike the previous one, this bug can be reproduced in ksh93u+ and not in ksh93v- (it's not in the dev branch, but my patch reintroduces it):

$ cat /tmp/bar
function foo {
	echo $bar
}
function bar {
	bar=bar_
	bar+=foo foo
}
bar

$ ksh93u /tmp/bar # Or 93u+m with patch applied
foo

$ ksh93v /tmp/bar # Or 93u+m unpatched dev branch
bar_foo
Patch version 2 (doesn't fix the other bug)
diff --git a/src/cmd/ksh93/include/defs.h b/src/cmd/ksh93/include/defs.h
index f2c52c585..748607f13 100644
--- a/src/cmd/ksh93/include/defs.h
+++ b/src/cmd/ksh93/include/defs.h
@@ -149,7 +149,7 @@ extern const char	*_sh_translate(const char*);
 extern int		sh_trace(char*[],int);
 extern void		sh_trim(char*);
 extern int		sh_type(const char*);
-extern void             sh_unscope(void);
+extern void             sh_unscope(int fun);
 #if SHOPT_NAMESPACE
     extern Namval_t	*sh_fsearch(const char *,int);
 #endif /* SHOPT_NAMESPACE */
diff --git a/src/cmd/ksh93/include/shell.h b/src/cmd/ksh93/include/shell.h
index 77eb6b828..34f1dbb88 100644
--- a/src/cmd/ksh93/include/shell.h
+++ b/src/cmd/ksh93/include/shell.h
@@ -334,6 +334,7 @@ struct Shell_s
 	struct dolnod	*arglist;
 	int16_t		fn_depth;	/* scoped ksh-style function call depth */
 	int16_t		dot_depth;	/* dot-script and POSIX function call depth */
+	int		invoc_local;	/* set when inside of an invocation-local scope */
 	int		hist_depth;
 	int		xargmin;
 	int		xargmax;
diff --git a/src/cmd/ksh93/sh/name.c b/src/cmd/ksh93/sh/name.c
index 55387a66b..3f3cce8bd 100644
--- a/src/cmd/ksh93/sh/name.c
+++ b/src/cmd/ksh93/sh/name.c
@@ -1560,7 +1560,7 @@ void nv_putval(register Namval_t *np, const char *string, int flags)
 	 * When we are in an invocation-local scope and we are using the
 	 * += operator, clone the variable from the previous scope.
 	 */
-	if((flags&NV_APPEND) && nv_isnull(np) && sh.var_tree->view)
+	if(sh.invoc_local && (flags&NV_APPEND) && nv_isnull(np))
 	{
 		Namval_t *mp = nv_search(np->nvname,sh.var_tree->view,0);
 		if(mp)
@@ -2287,6 +2287,8 @@ void sh_scope(struct argnod *envlist, int fun)
 		newroot = nv_dict(sh.namespace);
 #endif /* SHOPT_NAMESPACE */
 	newscope = dtopen(&_Nvdisc,Dtoset);
+	if(!fun)
+		sh.invoc_local++;
 	if(envlist)
 	{
 		dtview(newscope,(Dt_t*)sh.var_tree);
@@ -3485,10 +3487,12 @@ Shscope_t *sh_setscope(Shscope_t *scope)
 	return(old);
 }
 
-void sh_unscope(void)
+void sh_unscope(int fun)
 {
 	register Dt_t *root = sh.var_tree;
 	register Dt_t *dp = dtview(root,(Dt_t*)0);
+	if(!fun)
+		sh.invoc_local--;
 	if(dp)
 	{
 		table_unset(root,NV_RDONLY|NV_NOSCOPE,dp);
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index efdbf9778..47057d097 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -1316,7 +1316,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 					if(buffp->olist)
 						free_list(buffp->olist);
 					if(scope)
-						sh_unscope();
+						sh_unscope(0);
 					bp->ptr = (void*)save_ptr;
 					bp->data = (void*)save_data;
 					sh.redir0 = 0;
@@ -3121,7 +3121,7 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 		UNREACHABLE();
 	}
 	sh_popcontext(buffp);
-	sh_unscope();
+	sh_unscope(1);
 	sh.namespace = nspace;
 	sh.var_tree = (Dt_t*)prevscope->save_tree;
 	sh_argreset(argsav,saveargfor);
@@ -3510,7 +3510,7 @@ static pid_t sh_ntfork(const Shnode_t *t,char *argv[],int *jobid,int topfd)
 		sigreset(1);	/* restore ignored signals */
 	if(scope)
 	{
-		sh_unscope();
+		sh_unscope(0);
 		if(jmpval==SH_JMPSCRIPT)
 			nv_setlist(t->com.comset,NV_EXPORT|NV_IDENT|NV_ASSIGN,0);
 	}
diff --git a/src/cmd/ksh93/tests/variables.sh b/src/cmd/ksh93/tests/variables.sh
index e6abcbeef..0bb65815e 100755
--- a/src/cmd/ksh93/tests/variables.sh
+++ b/src/cmd/ksh93/tests/variables.sh
@@ -1460,5 +1460,34 @@ do
 	done
 done
 
+# ======
+# The += operator shouldn't copy variables outside of a function's scope
+# https://github.com/ksh93/ksh/issues/533
+unset v foo bar
+v=outside
+function f {
+	typeset v
+	print -n "$v"
+	v+="inside"
+	print "$v"
+}
+function foo {
+	echo $bar
+}
+function bar {
+	bar=bar_
+	bar+=foo foo
+	bar+=foo "$SHELL" -c 'echo $bar'
+	bar+=foo
+	echo $bar
+}
+exp='inside
+bar_foo
+bar_foo
+bar_foo'
+got=$(f && bar)
+[[ $exp == "$got" ]] || err_exit "+= operator used in function copies variable from outside of the function's scope" \
+	"(expected $(printf %q "$exp"), got $(printf %q "$got"))"
+
 # ======
 exit $((Errors<125?Errors:125))

@JohnoKing
Copy link

JohnoKing commented Sep 9, 2022

I fixed the above issue, but found yet another bug and haven't found a bugfix for it (this one appears to be in all versions of ksh edit: it's a regression introduced in ksh93t- 2008-06-02). The += operator assigns to the wrong scope when used in a nested function. Reproducer:

$ cat /tmp/baz
function three {
	true
}
function two {
	var+='wrong ' sh -c 'true'
	var+='wrong ' true
	var+='wrong ' three
	echo $var
}
function one {
	var=' ' two
}
one

$ ksh /tmp/baz
wrong wrong wrong

# Output from ksh compiled with SHOPT_SPAWN disabled
$ ksh-nospawnveg /tmp/baz
wrong wrong
Patch version 3
diff --git a/src/cmd/ksh93/include/shell.h b/src/cmd/ksh93/include/shell.h
index 77eb6b828..34f1dbb88 100644
--- a/src/cmd/ksh93/include/shell.h
+++ b/src/cmd/ksh93/include/shell.h
@@ -334,6 +334,7 @@ struct Shell_s
 	struct dolnod	*arglist;
 	int16_t		fn_depth;	/* scoped ksh-style function call depth */
 	int16_t		dot_depth;	/* dot-script and POSIX function call depth */
+	char		invoc_local;	/* set when inside of an invocation-local scope */
 	int		hist_depth;
 	int		xargmin;
 	int		xargmax;
diff --git a/src/cmd/ksh93/sh/name.c b/src/cmd/ksh93/sh/name.c
index 55387a66b..9755a55b8 100644
--- a/src/cmd/ksh93/sh/name.c
+++ b/src/cmd/ksh93/sh/name.c
@@ -1560,7 +1560,7 @@ void nv_putval(register Namval_t *np, const char *string, int flags)
 	 * When we are in an invocation-local scope and we are using the
 	 * += operator, clone the variable from the previous scope.
 	 */
-	if((flags&NV_APPEND) && nv_isnull(np) && sh.var_tree->view)
+	if(sh.invoc_local && (flags&NV_APPEND) && nv_isnull(np))
 	{
 		Namval_t *mp = nv_search(np->nvname,sh.var_tree->view,0);
 		if(mp)
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index efdbf9778..eefb84be7 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -1240,6 +1240,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 						if(argp)
 						{
 							scope++;
+							sh.invoc_local++;
 							sh_scope(argp,0);
 						}
 						opt_info.index = opt_info.offset = 0;
@@ -1316,7 +1317,10 @@ int sh_exec(register const Shnode_t *t, int flags)
 					if(buffp->olist)
 						free_list(buffp->olist);
 					if(scope)
+					{
 						sh_unscope();
+						sh.invoc_local--;
+					}
 					bp->ptr = (void*)save_ptr;
 					bp->data = (void*)save_data;
 					sh.redir0 = 0;
@@ -1335,6 +1339,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 				if(!command && np && nv_isattr(np,NV_FUNCTION))
 				{
 					volatile int indx;
+					volatile char scope = 0;
 					struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
 #if SHOPT_NAMESPACE
 					Namval_t node, *namespace=0;
@@ -1392,6 +1397,11 @@ int sh_exec(register const Shnode_t *t, int flags)
 					}
 					if(jmpval == 0)
 					{
+						if(argp)
+						{
+							sh.invoc_local++;
+							scope++;
+						}
 						if(io)
 							indx = sh_redirect(io,execflg);
 #if SHOPT_NAMESPACE
@@ -1419,6 +1429,8 @@ int sh_exec(register const Shnode_t *t, int flags)
 						sh_popcontext(buffp);
 						sh_iorestore(indx,jmpval);
 					}
+					if(scope)
+						sh.invoc_local--;
 					if(nq)
 						unset_instance(nq,&node,&nr,mode);
 					sh_funstaks(slp->slchild,-1);
@@ -2999,6 +3011,7 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 	int			isig,jmpval;
 	volatile int		r = 0;
 	int			n;
+	char			save_invoc_local;
 	char 			**savsig, *save_debugtrap = 0;
 	struct funenv		*fp = 0;
 	struct checkpt	*buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
@@ -3079,6 +3092,8 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 		nv_putval(SH_PATHNAMENOD,sh.st.filename,NV_NOFREE);
 		nv_putval(SH_FUNNAMENOD,sh.st.funname,NV_NOFREE);
 	}
+	save_invoc_local = sh.invoc_local;
+	sh.invoc_local = 0;
 	jmpval = sigsetjmp(buffp->buff,0);
 	if(jmpval == 0)
 	{
@@ -3113,6 +3128,7 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 			r = sh.exitval;
 		}
 	}
+	sh.invoc_local = save_invoc_local;
 	sh.fn_depth--;
 	update_sh_level();
 	if(sh.fn_depth==1 && jmpval==SH_JMPERRFN)
@@ -3369,6 +3385,7 @@ static pid_t sh_ntfork(const Shnode_t *t,char *argv[],int *jobid,int topfd)
 		if(t->com.comset)
 		{
 			scope++;
+			sh.invoc_local++;
 			sh_scope(t->com.comset,0);
 		}
 		if(!strchr(path=argv[0],'/')) 
@@ -3511,6 +3528,7 @@ static pid_t sh_ntfork(const Shnode_t *t,char *argv[],int *jobid,int topfd)
 	if(scope)
 	{
 		sh_unscope();
+		sh.invoc_local--;
 		if(jmpval==SH_JMPSCRIPT)
 			nv_setlist(t->com.comset,NV_EXPORT|NV_IDENT|NV_ASSIGN,0);
 	}
diff --git a/src/cmd/ksh93/tests/variables.sh b/src/cmd/ksh93/tests/variables.sh
index e6abcbeef..9842e17e7 100755
--- a/src/cmd/ksh93/tests/variables.sh
+++ b/src/cmd/ksh93/tests/variables.sh
@@ -1460,5 +1460,52 @@ do
 	done
 done
 
+# ======
+# The += operator shouldn't copy variables outside of a function's scope
+# https://github.com/ksh93/ksh/issues/533
+unset v foo bar
+v=outside
+function f {
+	typeset v
+	print -n "$v"
+	v+="inside"
+	print "$v"
+}
+function foo {
+	echo $bar
+}
+function bar {
+	bar=bar_
+	bar+=foo foo
+	bar+=foo "$SHELL" -c 'echo $bar'
+	bar+=foo
+	echo $bar
+}
+exp='inside
+bar_foo
+bar_foo
+bar_foo'
+got=$(f && bar)
+[[ $exp == "$got" ]] || err_exit "+= operator used in function copies variable from outside of the function's scope" \
+	"(expected $(printf %q "$exp"), got $(printf %q "$got"))"
+
+unset var
+function three {
+        :
+}
+function two {
+	var+='wrong ' sh -c 'true'
+	var+='wrong ' true
+	var+='wrong ' three
+	echo $var
+}
+function one {
+	var=one_ two
+}
+exp=one_
+got=$(one)
+[[ $exp == "$got" ]] || err_exit "+= operator in a nested function appends variable in the wrong scope" \
+	"(expected $(printf %q "$exp"), got $(printf %q "$got"))"
+
 # ======
 exit $((Errors<125?Errors:125))

@JohnoKing
Copy link

After experimentation I have found out that this diff in ksh93t- is what broke the += operator when used in nested functions:

@@ -486,11 +663,48 @@ Namval_t *nv_create(const char *name, Dt_t *root, int flags, Namfun_t *dp)
 			isref = 0;
 			dp->last = cp;
 			mode =  (c=='.' || (flags&NV_NOADD))?add:NV_ADD;
-			if(flags&NV_NOSCOPE)
+			if((flags&NV_NOSCOPE) && c!='.')
 				mode |= HASH_NOSCOPE;
+			np=0;
 			if(top)
-				nq = nv_search(name,sh.var_base,0);
-			if(np = nv_search(name,root,mode))
+			{
+				struct Ufunction *rp;
+				if((rp=shp->st.real_fun) && !rp->sdict && (flags&NV_STATIC))
+				{
+					Dt_t *dp = dtview(shp->var_tree,(Dt_t*)0);
+					rp->sdict = dtopen(&_Nvdisc,Dtoset);
+					dtview(rp->sdict,shp->var_base);
+					dtview(shp->var_tree,rp->sdict);
+				}
+				if(np = nv_search(name,shp->var_tree,0))
+				{
+					if(shp->var_tree->walk == shp->var_base)
+					{
+						nq = np;
+						if(flags&NV_NOSCOPE)
+							np = 0;
+					}
+					else
+					{
+						root = shp->var_tree->walk;
+						flags |= NV_NOSCOPE;
+					}
+				}
+				if(rp && rp->sdict && (flags&NV_STATIC))
+				{
+					root = rp->sdict;
+					if(np && shp->var_tree->walk==shp->var_tree)
+					{
+						_nv_unset(np,0);
+						dtdelete(shp->var_tree,np);
+						free((void*)np);
+						np = 0;
+					}
+					if(!np || shp->var_tree->walk!=root)
+						np =  nv_search(name,root,HASH_NOSCOPE|NV_ADD);
+				}
+			}
+			if(np ||  (np = nv_search(name,root,mode)))
 			{
 				isref = nv_isref(np);
 				if(top)

Below is my attempt at fixing the remaining bug (apply this after applying my patch). It seems to work, although I'm not sure if it's the correct fix.

diff --git a/src/cmd/ksh93/sh/name.c b/src/cmd/ksh93/sh/name.c
index 9755a55b8..76d40eb95 100644
--- a/src/cmd/ksh93/sh/name.c
+++ b/src/cmd/ksh93/sh/name.c
@@ -711,7 +711,7 @@ Namval_t *nv_create(const char *name,  Dt_t *root, int flags, Namfun_t *dp)
 {
 	char			*sub=0, *cp=(char*)name, *sp, *xp;
 	register int		c;
-	register Namval_t	*np=0, *nq=0;
+	register Namval_t	*np=0, *nq=0, *mp;
 	Namfun_t		*fp=0;
 	long			mode, add=0;
 	int			copy=0,isref,top=0,noscope=(flags&NV_NOSCOPE);
@@ -780,18 +780,18 @@ Namval_t *nv_create(const char *name,  Dt_t *root, int flags, Namfun_t *dp)
 					dtview(rp->sdict,dp);
 					dtview(sh.var_tree,rp->sdict);
 				}
-				if(np = nv_search(name,sh.var_tree,0))
+				if(mp = nv_search(name,sh.var_tree,0))
 				{
 #if SHOPT_NAMESPACE
-					if(sh.var_tree->walk==sh.var_base || (sh.var_tree->walk!=sh.var_tree && sh.namespace &&  nv_dict(sh.namespace)==sh.var_tree->walk))
+					if(sh.var_tree->walk==sh.var_base || (sh.var_tree->walk!=sh.var_tree && sh.namespace && nv_dict(sh.namespace)==sh.var_tree->walk))
 #else
 					if(sh.var_tree->walk==sh.var_base)
 #endif /* SHOPT_NAMESPACE */
 					{
 #if SHOPT_NAMESPACE
-						if(!(nq = nv_search((char*)np,sh.var_base,NV_REF)))
+						if(!(nq = nv_search((char*)mp,sh.var_base,NV_REF)))
 #endif /* SHOPT_NAMESPACE */
-						nq = np;
+						nq = mp;
 						sh.last_root = sh.var_tree->walk;
 						if((flags&NV_NOSCOPE) && *cp!='.')
 						{
@@ -799,17 +799,19 @@ Namval_t *nv_create(const char *name,  Dt_t *root, int flags, Namfun_t *dp)
 								root = sh.var_tree->walk;
 							else
 							{
-								nv_delete(np,(Dt_t*)0,NV_NOFREE);
-								np = 0;
+								nv_delete(mp,(Dt_t*)0,NV_NOFREE);
+								mp = 0;
 							}
 						}
+						np = mp;
 					}
-					else
+					else if(!sh.invoc_local)
 					{
 						if(sh.var_tree->walk)
 							root = sh.var_tree->walk;
 						flags |= NV_NOSCOPE;
 						noscope = 1;
+						np = mp;
 					}
 				}
 				if(rp && rp->sdict && (flags&NV_STATIC))

@jghub
Copy link
Author

jghub commented Sep 10, 2022

ok, I did that (first applied "Patch version 3" and the one immediately preceding this post):

@McDutchie
Copy link

The += assignments in function two precede an external command, a regular built-in command, and a ksh function call. In all three cases, the assignment should remain in effect only during that invocation.

@McDutchie
Copy link

Just a note to say I'm quite busy at the moment and I'll try to get to testing the patches, and actually understanding what's going on here, as soon as I can. Thanks for finding and reporting the problem, @jghub.

@jghub
Copy link
Author

jghub commented Sep 10, 2022

I do wish you much success with that :). this one really bit me and it was rather hard to "corner" in the original long script where I noted it (for me anyway). and I also wonder: I always thought that all three of

typeset par
typeset par=
typeset par=''

per definition where declaring a new, empty variable in an equivalent/indistinguishable way. is this a misconception? if it is not, then I am even more surprised that changing the local declaration to `typeset v=' in the original example restored correct behaviour

@JohnoKing
Copy link

JohnoKing commented Sep 10, 2022

I do not really understand, I am afraid: with the patched version I do not get any output (and this seems what you wanted to achieve?)

The correct result is no output. That is a bit confusing though, so I've updated the reproducer script to output 'one ' instead for the correct output.

New reproducer script
function three {
	true
}
function two {
	var+='wrong ' sh -c 'true'
	var+='wrong ' true
	var+='wrong ' three
	echo $var  # This should print 'one '
}
function one {
	var='one ' two
}
one
echo $var  # This should print a blank newline

but looking at the code I actually would expect to see the "wrong wrong wrong" echo since the global var is assigned to in two followed by echo $var. so this seemed correct to me . what am I missing?

var is only assigned in invocation-local scopes, not the global scope. The scope of var should be limited to each command that is run (i.e., var+='wrong ' preceding a command should only change var while the command is running). This behavior is observed in bash and zsh, and is also observed outside of ksh functions, e.g.:

$ cat /tmp/baz
var=correct
var+='wrong ' sh -c 'true'
var='wrong ' true
echo $var
$ ksh /tmp/baz
correct
$ bash /tmp/baz  # or zsh and mksh
correct

Although var is in the global scope in the above example, preceding var= and var+= assignments create a local scope that limits the changed var to the running command. There is no reason why nested functions should behave differently in this regard.

@jghub
Copy link
Author

jghub commented Sep 10, 2022

yes, got it . thanks

@McDutchie
Copy link

I always thought that all three of

typeset par
typeset par=
typeset par=''

per definition where declaring a new, empty variable in an equivalent/indistinguishable way. i

They're not equivalent. The first one declares a variable in the current scope and makes it unset, i.e., it has no value (unless it was previously set in the current scope, in which case that command does nothing). The second and third declares a variable in the current scope and assigns it the empty string value.

@jghub
Copy link
Author

jghub commented Sep 11, 2022

thanks. I really believed that there is no real NULL value or, rather, that NULL and empty string are the same thing/value in shells. but on reading up on it I understand that ${x+word} behaves differently for both cases (question: what would be the "canonical" test for "variable exists but unset": the ${x+word} .

another thing: while browsing the ksh93u+ manpage (the one which I readily can access right now) for unset, null etc. I have noticed that under "File Name Generation" it says

 *      Matches any string, including the null string. 

should this not, actually, better read "empty string"? and there are a few more instances where null string or similar is used where null might cause confusion, no?

@McDutchie
Copy link

should this not, actually, better read "empty string"?

Agreed. I'll fix it.

@McDutchie
Copy link

(question: what would be the "canonical" test for "variable exists but unset": the ${x+word} .

There is no distinction in shell grammar between nonexistent and unset. All nonexistent variables are considered unset and treated as such. If a variable is unset in a local scope while set in a parent scope, then within that local scope it is treated as if it doesn't exist. Any behaviour that differs from that is a bug.

@JohnoKing
Copy link

I have made a minor update to my patch, and have included a commit message in it. In the commit message I have an explanation for why the fix in #533 (comment) works. Feel free to test the latest patch.

New patch
commit 1bca09a8e493f2faaa4effbca8b05888467a0536
Author: Johnothan King <johnothanking@protonmail.com>
Date:   Sat Sep 24 07:56:03 2022 -0700

    Fix += operator in invocation-local scopes
    
    This commit fixes multiple related bugs in invocation local scopes when
    using the += operator. Changes:
    
    - In sh_exec and sh_ntfork(), increment sh.invoc_local when entering an
      invocation-local scope, and decrement it when exiting the local scope.
      The nv_putval() function makes use of this variable when copying the
      parent scope variable into the local scope, which avoids setting the
      wrong scope for independent foo+= statements.
    
    - To avoid possible function scoping bugs, reset sh.invoc_local in
      function scopes created with sh_funscope(). Set sh.invoc_local back to
      it's previous value when exiting the function's static scope.
    
    - In nv_create(), fix a regression introduced in ksh93t- 2008-06-02.
      The regression causes nv_search to use the wrong scope, because
      nv_search doesn't use the NV_NOSCOPE flag (formerly HASH_NOSCOPE).
      As a result, nv_search will not find the locally scoped variable
      (since nv_putval has yet to create it) and searches the parent scope.
      This results in the variable node pointing to the parent scope's
      variable, resulting in the variable no longer having a local scope.
      To avoid this problem without introducing regression test failures,
      the resulting node found by nv_search is initially set to an mp
      pointer. The np pointer will only be set to mp (the result from
      nv_search) when needed (most of the time, this isn't in an
      invocation-local scope).

diff --git a/src/cmd/ksh93/include/shell.h b/src/cmd/ksh93/include/shell.h
index aea8cfd04..fd09acb21 100644
--- a/src/cmd/ksh93/include/shell.h
+++ b/src/cmd/ksh93/include/shell.h
@@ -337,6 +337,7 @@ struct Shell_s
 	struct dolnod	*arglist;
 	int16_t		fn_depth;	/* scoped ksh-style function call depth */
 	int16_t		dot_depth;	/* dot-script and POSIX function call depth */
+	char		invoc_local;	/* set when inside of an invocation-local scope */
 	int		hist_depth;
 	int		xargmin;
 	int		xargmax;
diff --git a/src/cmd/ksh93/sh/name.c b/src/cmd/ksh93/sh/name.c
index 55387a66b..8b90cac8a 100644
--- a/src/cmd/ksh93/sh/name.c
+++ b/src/cmd/ksh93/sh/name.c
@@ -773,6 +773,7 @@ Namval_t *nv_create(const char *name,  Dt_t *root, int flags, Namfun_t *dp)
 			if(top)
 			{
 				struct Ufunction *rp;
+				Namval_t *mp;
 				if((rp=sh.st.real_fun) && !rp->sdict && (flags&NV_STATIC))
 				{
 					Dt_t *dp = dtview(sh.var_tree,(Dt_t*)0);
@@ -780,18 +781,18 @@ Namval_t *nv_create(const char *name,  Dt_t *root, int flags, Namfun_t *dp)
 					dtview(rp->sdict,dp);
 					dtview(sh.var_tree,rp->sdict);
 				}
-				if(np = nv_search(name,sh.var_tree,0))
+				if(mp = nv_search(name,sh.var_tree,0))
 				{
 #if SHOPT_NAMESPACE
-					if(sh.var_tree->walk==sh.var_base || (sh.var_tree->walk!=sh.var_tree && sh.namespace &&  nv_dict(sh.namespace)==sh.var_tree->walk))
+					if(sh.var_tree->walk==sh.var_base || (sh.var_tree->walk!=sh.var_tree && sh.namespace && nv_dict(sh.namespace)==sh.var_tree->walk))
 #else
 					if(sh.var_tree->walk==sh.var_base)
 #endif /* SHOPT_NAMESPACE */
 					{
 #if SHOPT_NAMESPACE
-						if(!(nq = nv_search((char*)np,sh.var_base,NV_REF)))
+						if(!(nq = nv_search((char*)mp,sh.var_base,NV_REF)))
 #endif /* SHOPT_NAMESPACE */
-						nq = np;
+						nq = mp;
 						sh.last_root = sh.var_tree->walk;
 						if((flags&NV_NOSCOPE) && *cp!='.')
 						{
@@ -799,17 +800,19 @@ Namval_t *nv_create(const char *name,  Dt_t *root, int flags, Namfun_t *dp)
 								root = sh.var_tree->walk;
 							else
 							{
-								nv_delete(np,(Dt_t*)0,NV_NOFREE);
-								np = 0;
+								nv_delete(mp,(Dt_t*)0,NV_NOFREE);
+								mp = 0;
 							}
 						}
+						np = mp;
 					}
-					else
+					else if(!sh.invoc_local)
 					{
 						if(sh.var_tree->walk)
 							root = sh.var_tree->walk;
 						flags |= NV_NOSCOPE;
 						noscope = 1;
+						np = mp;
 					}
 				}
 				if(rp && rp->sdict && (flags&NV_STATIC))
@@ -1560,7 +1563,7 @@ void nv_putval(register Namval_t *np, const char *string, int flags)
 	 * When we are in an invocation-local scope and we are using the
 	 * += operator, clone the variable from the previous scope.
 	 */
-	if((flags&NV_APPEND) && nv_isnull(np) && sh.var_tree->view)
+	if(sh.invoc_local && (flags&NV_APPEND) && nv_isnull(np))
 	{
 		Namval_t *mp = nv_search(np->nvname,sh.var_tree->view,0);
 		if(mp)
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index efdbf9778..eefb84be7 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -1240,6 +1240,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 						if(argp)
 						{
 							scope++;
+							sh.invoc_local++;
 							sh_scope(argp,0);
 						}
 						opt_info.index = opt_info.offset = 0;
@@ -1316,7 +1317,10 @@ int sh_exec(register const Shnode_t *t, int flags)
 					if(buffp->olist)
 						free_list(buffp->olist);
 					if(scope)
+					{
 						sh_unscope();
+						sh.invoc_local--;
+					}
 					bp->ptr = (void*)save_ptr;
 					bp->data = (void*)save_data;
 					sh.redir0 = 0;
@@ -1335,6 +1339,7 @@ int sh_exec(register const Shnode_t *t, int flags)
 				if(!command && np && nv_isattr(np,NV_FUNCTION))
 				{
 					volatile int indx;
+					volatile char scope = 0;
 					struct checkpt *buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
 #if SHOPT_NAMESPACE
 					Namval_t node, *namespace=0;
@@ -1392,6 +1397,11 @@ int sh_exec(register const Shnode_t *t, int flags)
 					}
 					if(jmpval == 0)
 					{
+						if(argp)
+						{
+							sh.invoc_local++;
+							scope++;
+						}
 						if(io)
 							indx = sh_redirect(io,execflg);
 #if SHOPT_NAMESPACE
@@ -1419,6 +1429,8 @@ int sh_exec(register const Shnode_t *t, int flags)
 						sh_popcontext(buffp);
 						sh_iorestore(indx,jmpval);
 					}
+					if(scope)
+						sh.invoc_local--;
 					if(nq)
 						unset_instance(nq,&node,&nr,mode);
 					sh_funstaks(slp->slchild,-1);
@@ -2999,6 +3011,7 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 	int			isig,jmpval;
 	volatile int		r = 0;
 	int			n;
+	char			save_invoc_local;
 	char 			**savsig, *save_debugtrap = 0;
 	struct funenv		*fp = 0;
 	struct checkpt	*buffp = (struct checkpt*)stkalloc(sh.stk,sizeof(struct checkpt));
@@ -3079,6 +3092,8 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 		nv_putval(SH_PATHNAMENOD,sh.st.filename,NV_NOFREE);
 		nv_putval(SH_FUNNAMENOD,sh.st.funname,NV_NOFREE);
 	}
+	save_invoc_local = sh.invoc_local;
+	sh.invoc_local = 0;
 	jmpval = sigsetjmp(buffp->buff,0);
 	if(jmpval == 0)
 	{
@@ -3113,6 +3128,7 @@ int sh_funscope(int argn, char *argv[],int(*fun)(void*),void *arg,int execflg)
 			r = sh.exitval;
 		}
 	}
+	sh.invoc_local = save_invoc_local;
 	sh.fn_depth--;
 	update_sh_level();
 	if(sh.fn_depth==1 && jmpval==SH_JMPERRFN)
@@ -3369,6 +3385,7 @@ static pid_t sh_ntfork(const Shnode_t *t,char *argv[],int *jobid,int topfd)
 		if(t->com.comset)
 		{
 			scope++;
+			sh.invoc_local++;
 			sh_scope(t->com.comset,0);
 		}
 		if(!strchr(path=argv[0],'/')) 
@@ -3511,6 +3528,7 @@ static pid_t sh_ntfork(const Shnode_t *t,char *argv[],int *jobid,int topfd)
 	if(scope)
 	{
 		sh_unscope();
+		sh.invoc_local--;
 		if(jmpval==SH_JMPSCRIPT)
 			nv_setlist(t->com.comset,NV_EXPORT|NV_IDENT|NV_ASSIGN,0);
 	}
diff --git a/src/cmd/ksh93/tests/variables.sh b/src/cmd/ksh93/tests/variables.sh
index e9d3353ea..c17323ba7 100755
--- a/src/cmd/ksh93/tests/variables.sh
+++ b/src/cmd/ksh93/tests/variables.sh
@@ -1474,5 +1474,52 @@ got=$("$SHELL" <<-\EOF
 ((exp == got)) || err_exit 'LINENO is wrong after a multi-line compound assignment' \
 	"(expected $exp, got $(printf %q "$got"))"
 
+# ======
+# The += operator shouldn't copy variables outside of a function's scope
+# https://github.com/ksh93/ksh/issues/533
+unset v foo bar
+v=outside
+function f {
+	typeset v
+	print -n "$v"
+	v+="inside"
+	print "$v"
+}
+function foo {
+	echo $bar
+}
+function bar {
+	bar=bar_
+	bar+=foo foo
+	bar+=foo "$SHELL" -c 'echo $bar'
+	bar+=foo
+	echo $bar
+}
+exp='inside
+bar_foo
+bar_foo
+bar_foo'
+got=$(f && bar)
+[[ $exp == "$got" ]] || err_exit "+= operator used in function copies variable from outside of the function's scope" \
+	"(expected $(printf %q "$exp"), got $(printf %q "$got"))"
+
+unset var
+function three {
+        :
+}
+function two {
+	var+='wrong ' sh -c 'true'
+	var+='wrong ' true
+	var+='wrong ' three
+	echo $var
+}
+function one {
+	var=one_ two
+}
+exp=one_
+got=$(one)
+[[ $exp == "$got" ]] || err_exit "+= operator in a nested function appends variable in the wrong scope" \
+	"(expected $(printf %q "$exp"), got $(printf %q "$got"))"
+
 # ======
 exit $((Errors<125?Errors:125))

@McDutchie
Copy link

Very nice. That all makes sense, thank you. I will test it more thoroughly.

The only problem I can see (without testing it) is that sh.invoc_local is not reset if the current shell environment calls sh_exit() and longjmps due to an error. That's easily fixed by resetting it in sh_exit().

McDutchie added a commit that referenced this issue Sep 25, 2022
This commit fixes multiple related bugs in invocation-local scopes
when using the += operator. Changes:

- In sh_exec and sh_ntfork(), increment sh.invoc_local when
  entering an invocation-local scope, and decrement it when exiting
  the local scope. The nv_putval() function makes use of this
  variable when copying the parent scope variable into the local
  scope, which avoids setting the wrong scope for independent foo+=
  statements.

- To avoid possible function scoping bugs, reset sh.invoc_local in
  function scopes created with sh_funscope(). Set sh.invoc_local
  back to its previous value when exiting the function's static
  scope.

- In nv_create(), fix a regression introduced in ksh93t-
  2008-06-02. The regression causes nv_search to use the wrong
  scope, because nv_search doesn't use the NV_NOSCOPE flag
  (formerly HASH_NOSCOPE, see 1884f57). As a result, nv_search
  will not find the locally scoped variable (since nv_putval has
  yet to create it) and searches the parent scope. This results in
  the variable node pointing to the parent scope's variable,
  resulting in the variable no longer having a local scope. To
  avoid this problem without introducing regression test failures,
  the resulting node found by nv_search is initially set to an mp
  pointer. The np pointer will only be set to mp (the result from
  nv_search) when needed (most of the time, this isn't in an
  invocation-local scope).

Co-authored-by: Martijn Dekker <martijn@inlv.org>
Resolves: #533
@McDutchie
Copy link

I've committed it so we can test it more widely for a few days before releasing 1.0.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

3 participants