Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

:= doesn't work as expected #157

Closed
oguz-ismail opened this issue Jan 20, 2021 · 18 comments
Closed

:= doesn't work as expected #157

oguz-ismail opened this issue Jan 20, 2021 · 18 comments
Labels
bug Something is not working

Comments

@oguz-ismail
Copy link

oguz-ismail commented Jan 20, 2021

$ typeset -i a b=42
$ echo ${a:=b}
0    # expecting `42', or at least `b'
$ echo $a
     # definitely expecting `42', zsh prints `0' here
$ 

This works fine on mksh, and partly on bash.

@McDutchie McDutchie added the bug Something is not working label Jan 20, 2021
@McDutchie
Copy link

Confirmed on all versions of ksh93 down to s+ 1993-12-28 through ksh2020. Anther one that is going to be challenging to track down... Thanks for the report.

@saper
Copy link

saper commented Feb 16, 2021

Let's try:

$ typeset -i b=42
$ c=
$ echo "${c:=b}"
b

Why should we expect for integer b to expand to its value and not to the string "b"?

The status of "a" after typeset -i a is an interesting one though. It seems to be "unset" for the purpose of parameter substitution but can be happily evaluated arithmetically.

@hyenias
Copy link

hyenias commented Feb 17, 2021

Agreed, we should not expect the variable b to have its value of 42 expanded as the b in echo ${a:=b} refers to the string of "b" as @saper pointed out. If you wanted 42 then echo ${a:=$b} should be used.

However, I did some testing and it appears that the variable expansion of ${variable:=word} is not working for numerics with a null value as demonstrated by @oguz-ismail.

$ ksh -c 'typeset -i a; typeset -p a; echo ${a:=42}; typeset -p a'
typeset -i a
0
typeset -i a
$ ksh -c 'typeset -F a; typeset -p a; echo ${a:=42}; typeset -p a'
typeset -F a
0
typeset -F a
$ ksh -c 'typeset -E a; typeset -p a; echo ${a:=42}; typeset -p a'
typeset -E a
0
typeset -E a
$ ksh -c 'typeset -X a; typeset -p a; echo ${a:=42}; typeset -p a'
typeset -X 32 a
0
typeset -X 32 a

When the variable is text, it works as expected which @saper has shown above.

$ ksh -c 'typeset a=; typeset -p a; echo ${a:=42}; typeset -p a'
a=''
42
a=42

@McDutchie
Copy link

Agreed, we should not expect the variable b to have its value of 42 expanded as the b in echo ${a:=b} refers to the string of "b" as @saper pointed out. If you wanted 42 then echo ${a:=$b} should be used.

I do not agree, because in the reproducer, that variable has been declared an integer with typeset -i. And that means assignment values for that variable are parsed as arithmetic expressions, even without $(()) being used. In arithmetic expressions, variable names that are not expanded by the shell (i.e. those without a leading $) are replaced by their numeric values by the arithmetic subsystem. So, for an a of an integer type, a simple assignment a=b will assign the numeric value of b to a. Since ${a:=b} and ${a=b} are just another form of assignment (though they are performed conditionally and yield a value), they should work the same way.

Compare with mksh, which does this right:

$ arch/*/bin/ksh -c 'echo $KSH_VERSION; typeset -i a b=42; a=b; echo $a'
Version AJM 93u+m/1.0.0-alpha+dev 2021-02-15
42
$ arch/*/bin/ksh -c 'echo $KSH_VERSION; typeset -i a b=42; echo ${a:=b}'
Version AJM 93u+m/1.0.0-alpha+dev 2021-02-15
0
$ mksh -c 'echo $KSH_VERSION; typeset -i a b=42; a=b; echo $a'
@(#)MIRBSD KSH R59 2020/05/05
42
$ mksh -c 'echo $KSH_VERSION; typeset -i a b=42; echo ${a:=b}'          
@(#)MIRBSD KSH R59 2020/05/05
42

@saper
Copy link

saper commented Feb 17, 2021

In arithmetic expressions, variable names that are not expanded by the shell (i.e. those without a leading $) are replaced by their numeric values by the arithmetic subsystem.

I think we should more careful here... I don't think that declaring a variable with typeset changes a whole "parameter expansion" system.

p. 136 of The New Kornshell explains "Arithmetic Expressions" lists situations where the arithmetic expression can be used (such as ((...)), let etc. etc.) but there is no mention of parameter expansion here.

p.137 says "If a variable in an arithmetic expression has the integer attribute (see attributes on page 178), then ksh uses the value of the variable".

Then we have p.157 which is I think is the most important page in the book and it says:

ksh performs tilde expansion, command substitution, parameter expansion and arithmetic expansion on a word from left to right first.

So definitely "parameter expansion" is done before any arithmetic expansion. And in this context, p. 158 "Arithmetic expansion" explains that

Each $((..)) is replaced by the value of the arithmetic expression within the double parentheses.

Therefore I don't think the following applies:

And that means assignment values for that variable are parsed as arithmetic expressions, even without $((…)) being used

Arithmetic expansion can be done only with $((...)) during the command processing stage according to page 158.

I think the problem is that typeset variables have a dual behaviour, similar to how tcl treats its variables. For string-like parameter expansion they are unset, but an attempt to extract numeric values gives numeric zero (which is also is not Null for the purposes of := or :?).

@oguz-ismail
Copy link
Author

oguz-ismail commented Feb 17, 2021

@saper Totally irrelevant. The rule that applies here is that ${a:=b} assigns b to a, and expands to a's value, not b. And if a has its integer attribute set, then b is evaluated arithmetically (not as part of arithmetic expansion, but as a step in the assignment, which is carried out as part of parameter expansion), and the result is a's new value.

Besides it wouldn't make any sense if ${a:=b} expanded to a non-integer value when a is marked as an integer.

@saper
Copy link

saper commented Feb 17, 2021

if a has its integer attribute set, then b is evaluated arithmetically

I don't see nothing in the spec that says this should happen... I know this makes little sense from the point of view of the desired effect but I we didn't tell ksh to extract the value of b. I was wrong, see below from @McDutchie

@McDutchie
Copy link

McDutchie commented Feb 17, 2021

KornShell book, "Attributes" chapter, section "-i or -ibase Integer", p. 178: "Whenever you assign a value to a variable, the value is evaluated as an arithmetic integer expression."

The manual page (sh.1), section Parameter Expansion: "If any of the floating point attributes, -E, -F, or -X, or the integer attribute, -i, is set for vname, then the value is subject to arithmetic evaluation as described below."

we didn't tell ksh to extract the value of b.

We did; the integer attribute tells ksh to automatically evaluate every value as an arithmetic expression. Arithmetic expressions are C-style and do not need $ to replace variables with their values.

@avih
Copy link

avih commented Feb 17, 2021

the integer attribute tells ksh to automatically evaluate every value as an arithmetic expression. Arithmetic expressions are C-style and do not need $ to replace variables with their values.

Huh.. not judging whether it's good or bad, but IMHO that's counter intuitive and makes reading code harder, because one needs to know the type of the assignee in order to understand how the value is evaluated...

Also, it behaves very unlike posix for a syntax which is well defined in posix - and depends on the type of the var which might have been set elsewhere at the code...

@oguz-ismail
Copy link
Author

@avih Actually, POSIX leaves room for such extensions.

If parameter is unset or null, quote removal shall be performed on the expansion of word and the result (or an empty string if word is omitted) shall be assigned to parameter. In all cases, the final value of parameter shall be substituted.

@avih
Copy link

avih commented Feb 18, 2021

Actually, POSIX leaves room for such extensions.

This is going off topic, but I was referring only to what I quoted, not the := part. I.e. even the most basic form of an assignment like a=b can take b literally (posix) or as $b (ksh when a is an integer - which might have been set elsewhere at the code). That's the part I was referring to.

The := behavior is an obvious extension of the basic rule quoted earlier that value assigned to an integer var is evaluated as an arithmetic expression.

FWIW, this behavior, e.g. typeset -i a; b=5; a=b; echo $a works the same and prints 5 also in bash and mksh (both even in posix mode) and in other pdksh derivatives like pdksh itself and openbsd sh.

@McDutchie
Copy link

McDutchie commented Feb 18, 2021

Right. There's no need to disable this in POSIX mode because typeset is not a POSIX feature. So if you use it, you've already forgone POSIX compatibility and we should just maintain the expected ksh behaviour.

There is one shell with a POSIX mode that tries to disable almost everything not specified by POSIX: yash. All the other shells with a POSIX mode (bash, mksh, zsh, and now ksh 93u+m) make the minimum adaptations necessary to avoid violating the standard so they are compatible with pure POSIX scripts, but they do not disable extensions.

@saper
Copy link

saper commented Feb 18, 2021

@McDutchie - you are right. That means that whenever typeset -i a; a="something" "something" should be evaluated as an arithmetic expression, for example like this (this works):

radziecki> typeset -i a; a="2+2"
radziecki> echo $a
4

Now I begin to think that problem is somewhere else:

radziecki> set -o nounset
radziecki> typeset -i a
radziecki> echo "${a}"
ksh93: a: parameter not set
radziecki> echo "${a?please set a}"
0
radziecki> echo "${a:?please give a value}"
0
radziecki> echo "${b:?please give b value}"
ksh93: b: please give b value
radziecki> echo "${b?please set b}"        
ksh93: b: please set b
radziecki> echo "${b+xx}"

radziecki> echo "${a+xx}"

Looks like sometimes (? modifier, :? modifier, := modifier) a is assumed to have a zero integer value, otherwise a is unset.
We simply do not reach the assignment phase because because ksh concluded a already has a value (well, sometimes)

@McDutchie
Copy link

McDutchie commented Feb 18, 2021

Huh.. not judging whether it's good or bad, but IMHO that's counter intuitive and makes reading code harder, because one needs to know the type of the assignee in order to understand how the value is evaluated...

I agree. The POSIX shell is fundamentally a typeless language in which variables are simple strings. ksh tacked types on top of it, which inevitably causes some unintuitive behaviour. However, it's been this way ever since there has been typeset and pretty much the entire installed base of ksh scripts depends on this, so it's not going to change no matter how much we would like it to.

@McDutchie
Copy link

Looks like sometimes (? modifier, :? modifier, := modifier) a is assumed to have a zero integer value, otherwise a is unset.
We simply do not reach the assignment phase because because ksh concluded a already has a value (well, sometimes)

Good testing there. I agree this may give a clue to the cause of the bug. It will be a while before I can get to hunting this one down properly. In the meantime, if anyone else feels so called, please go right ahead.

@McDutchie
Copy link

The root cause is that this check in varsub() for whether the variable is set or unset appears to be broken:

ksh/src/cmd/ksh93/sh/macro.c

Lines 1405 to 1408 in f8f2c4b

/*
* Check if the parameter is set or unset.
*/
if(np && (type==M_TREE || !c || !ap))

This only checks if the pointer is non-null, which is true if this nv_open() call succeeded, i.e., if the node exists. However, declaring a variable and assigning it a type creates a node, without assigning it a value. So we need another check for a non-null value. That is done with the nv_isnull() macro.

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 545d09a5..c750a6f2 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -1405,7 +1405,7 @@ retry1:
 		/*
 		 * Check if the parameter is set or unset.
 		 */
-		if(np && (type==M_TREE || !c || !ap))
+		if(np && !nv_isnull(np) && (type==M_TREE || !c || !ap))
 		{
 			/* The parameter is set. */
 			char *savptr;

With this diff, the reproducer works as expected:

$ unset a b
$ (typeset -i a b=42; echo ${a:=b}; echo $a)
42
42

However, several regressions show up when running bin/shtests -p:

	functions.sh[793]: optimization error with undefined variable
	nameref.sh[165]: for loop optimization with namerefs not working
	nameref.sh[470]: name reference to unset type instance not redirected to .deleted
	variables.sh[830]: ${var+s} expansion fails in loops (expected 'USUSUsusus', got 'UUUUUsusus')

So, the hunt continues.

@McDutchie
Copy link

The functions.sh and variables.sh regressions reminded me of commit 9529441. Sure enough, those two are solved by moving the nv_optimize() call added in that commit forward to before the new nv_isnull() expansion, as that macro does not work properly before updating the optimisation state or whatever (I don't understand the loop invariants optimisation stuff yet).

The other two regressions are solved by only checking for a non-null value if the expansion is of a regular M_BRACE type. All the other types of expansion are special and 'set' or 'unset' does not apply.

With the diff below, it now all works for me.

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 545d09a5..3f7f6949 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -1405,9 +1405,13 @@ retry1:
 		/*
 		 * Check if the parameter is set or unset.
 		 */
-		if(np && (type==M_TREE || !c || !ap))
+#if SHOPT_OPTIMIZE
+		if(np && type==M_BRACE && mp->shp->argaddr)
+			nv_optimize(np);  /* needed before calling nv_isnull() */
+#endif /* SHOPT_OPTIMIZE */
+		if(np && (type==M_BRACE ? !nv_isnull(np) : (type==M_TREE || !c || !ap)))
 		{
-			/* The parameter is set. */
+			/* Either the parameter is set, or it's a special type of expansion where 'unset' doesn't apply. */
 			char *savptr;
 			c = *((unsigned char*)stkptr(stkp,offset-1));
 			savptr = stkfreeze(stkp,0);
@@ -1443,13 +1447,7 @@ retry1:
 					if(ap)
 						v = nv_arrayisset(np,ap)?(char*)"x":0;
 					else
-					{
-#if SHOPT_OPTIMIZE
-						if(mp->shp->argaddr)
-							nv_optimize(np);  /* avoid BUG_ISSETLOOP */
-#endif /* SHOPT_OPTIMIZE */
 						v = nv_isnull(np)?0:(char*)"x";
-					}
 				}
 				else
 					v = nv_getval(np);

@oguz-ismail
Copy link
Author

@McDutchie Can't find a flaw, thanks for the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

5 participants