Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declaration utilities perform assignments immediately before expanding remaining command arguments #626

Open
hvdijk opened this issue Apr 6, 2023 · 13 comments
Labels
backburner Low priority (but feel free to fix it and do a PR) question Further information is requested

Comments

@hvdijk
Copy link

hvdijk commented Apr 6, 2023

Please consider the following

a=1
export a=2 b=$a
echo a=$a b=$b

This is supposed to print 'a=2 b=1'. Under ksh, it prints 'a=2 b=2'.

The reason it is supposed to print 'a=2 b=1' is that the word expansions are supposed to be performed first for each command argument, and only then does the command run, which performs the assignments. Therefore, when 'b=$a' is expanded, 'a' is supposed to still have its previous value of '1'. In ksh, when the command is recognised as a declaration utility, the assignment is performed for each argument immediately after it has been expanded.

This applies to 'declare', 'export', 'local', 'readonly', 'typeset'. Not all are covered by POSIX, but I expect it would be more work to treat these differently than it would be to change all of them in one go.

Workaround: It is possible to avoid the issue by writing the command in such a way that it does not get detected as a declaration utility, e.g.

a=1
${$+export} a=2 b=$a
echo a=$a b=$b
@McDutchie
Copy link

The issue is more fundamental than that. Consider preceding assignments:

$ b=1; a=2 b=$a sh -c 'echo $a $b'
2 2

(which is also the behaviour of most other shells)

In ksh, declaration utilities are implemented by interpreting the assignment-arguments as real assignments. When you do

export a=2 b=$a

what actually gets executed is

a=2 b=$a export a b

Same with typeset, etc.

This is fundamental. Changing it would require a comprehensive rewrite/redesign.

@hvdijk
Copy link
Author

hvdijk commented Apr 7, 2023

The issue is more fundamental than that. Consider preceding assignments:

$ b=1; a=2 b=$a sh -c 'echo $a $b'
2 2

(which is also the behaviour of most other shells)

This is something POSIX leaves unspecified. It is not a bug for this to result in '2 2'.

Variable assignments shall be performed as follows:
[...]
If the command name is not a special built-in utility or function, [...] In this case it is unspecified:

  • Whether or not the assignments are visible for subsequent expansions in step 4

(I cannot comment on the second part of your reply, as I am not sufficiently familiar with ksh internals.)

@McDutchie
Copy link

Perhaps this will clarify. On the dev branch, I now use deparse.c for typeset -f output instead of dumping text from a recorded offset in the source file (which is highly problematic as the source file may change or go away). Which means this sort of internals get exposed:

$ echo ${.sh.version}
Version AJM 93u+m/1.1.0-alpha+2d2913e5/MOD 2023-04-05
$ f() { b=1; export a=1 b=$a; }                                                                                                    
$ typeset -f f
f()
{	b=1
	a=1 b=$a export a b
}

So, that shows you how ksh has hacked declaration built-ins.

@hvdijk
Copy link
Author

hvdijk commented Apr 7, 2023

Thanks, that makes sense.

A trivial proof of concept modification to not do that works for this case:

--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -1484,33 +1484,13 @@ static Shnode_t *simple(Lex_t *lexp,int flag, struct ionod *io)
                if(associative && (!(argp->argflag&ARG_ASSIGN) || argp->argval[0]!='['))
                        sh_syntax(lexp);
                /* check for assignment argument */
-               if((argp->argflag&ARG_ASSIGN) && assignment!=2)
+               if((argp->argflag&ARG_ASSIGN) && assignment==0)
                {
                        *settail = argp;
                        settail = &(argp->argnxt.ap);
                        lexp->assignok = (flag&SH_ASSIGN)?SH_ASSIGN:1;
-                       if(assignment)
-                       {
-                               struct argnod *ap=argp;
-                               char *last, *cp;
-                               if(assignment==1)
-                               {
-                                       last = strchr(argp->argval,'=');
-                                       if(last && (last[-1]==']'|| (last[-1]=='+' && last[-2]==']')) && (cp=strchr(argp->argval,'[')) && (cp < last) && cp[-1]!='.')
-                                               last = cp;
-                                       stkseek(stkp,ARGVAL);
-                                       sfwrite(stkp,argp->argval,last-argp->argval);
-                                       ap=(struct argnod*)stkfreeze(stkp,1);
-                                       ap->argflag = ARG_RAW;
-                                       ap->argchn.ap = 0;
-                               }
-                               *argtail = ap;
-                               argtail = &(ap->argnxt.ap);
-                               if(argno>=0)
-                                       argno++;
-                       }
-                       else /* alias substitutions allowed */
-                               lexp->aliasok = 1;
+                       /* alias substitutions allowed */
+                       lexp->aliasok = 1;
                }
                else
                {

This works because declaration utilities already have the logic for parsing assignments that were passed in as command line arguments, as they are needed to handle e.g. export ${$+a=b}.

However, this cannot be easily extended to not also break e.g. export a=(b c d), as that cannot be passed to the 'export' utility as a string.

@ormaaj
Copy link

ormaaj commented May 20, 2023

The reason it is supposed to print 'a=2 b=1' is that the word expansions are supposed to be performed first for each command argument, and only then does the command run, which performs the assignments. 

IMO the other shells have the bug. The way ksh does this is consistent with the way assignments always work - the assignment words should be evaluated in order. I don't know why the others don't do that.

@hvdijk
Copy link
Author

hvdijk commented May 20, 2023

IMO the other shells have the bug.

My comment was based on what POSIX specifies. It requires the behaviour of other shells, for the reason I gave. I have no issue if you wish to argue that POSIX is defective, but IMO this is the wrong place to raise that, if you feel that way I would encourage you to report a bug at https://austingroupbugs.net/main_page.php. :)

@ormaaj
Copy link

ormaaj commented May 20, 2023

What requirement are you referring to? 2.9.1.1 section 2 says:

If there is a command name and it is recognized as a declaration utility,
then any remaining words after the word that expanded to produce the
command name, that would be recognized as a variable assignment in
isolation, shall be expanded as a variable assignment (tilde expansion
after the first <equals-sign> and after any unquoted <colon>, parameter
expansion, command substitution, arithmetic expansion, and quote removal,
but no field splitting or pathname expansion); while remaining words
that would not be a variable assignment in isolation shall be subject to
regular expansion (tilde expansion for only a leading <tilde>, parameter
expansion, command substitution, arithmetic expansion, field splitting,
pathname expansion, and quote removal). For all other command names,
words after the word that produced the command name shall be subject only
to regular expansion. All fields resulting from the expansion of the word
that produced the command name and the subsequent words, except for the
field containing the command name, shall be the arguments for the command.

This section, confusingly, redundantly overviews the order of expansions for an assignment, but it definitely says nothing about modification of the overall evaluation of assignments for declaration builtins. I would interpret this as meaning assignments should be processed as usual one assignment word at a time.

Also, the builtin itself almost never performs the assignments. This is usually reflected in xtrace output. If the assignments were being performed by the builtin then arguments would first need to be passed to the builtin, then any assignments exported to the builtin would have to evaluate, and last the builtin would evaluate the arguments with exported assignments also affecting these assignments.

No shell does this for regular assignments, but a notable exception with bash serves to illustrate. Bash may conditionally modify evaluation such that assignment are performed by the builtin for non-assignment words that expand to valid assignments after the initial assignment word pass. This applies only to compound array assignments in bash.

(ins)ormaaj 92 (2980524) 1 ~ # ( bash /proc/self/fd/9 9<<\EOF )
set -x
function f {
        typeset x=foo
        x=bar typeset -a 'a=("$x")' b=("$x")
        typeset -p a b
}

f
EOF
+ f
+ typeset x=foo
+ b=('foo')
+ x=bar
+ typeset -a 'a=("$x")' b
+ typeset -p a b
declare -a a=([0]="bar")
declare -a b=([0]="foo")

Note the literal assignment word is evaluated first, before the builtin is executed, next exported assignments are evaluated, and last the remaining word is passed to the builtin which re-evaluates the word as an assignment, except now with a new x being in scope.

@hvdijk
Copy link
Author

hvdijk commented May 20, 2023

This section, confusingly, redundantly overviews the order of expansions for an assignment, but it definitely says nothing about modification of the overall evaluation of assignments for declaration builtins.

That's because despite being parsed as assignment words, they have no such effect. Nothing in 2.9.1 says that 'export a=b' assigns to the variable 'a' the value 'b', that is entirely covered by the description of the 'export' command itself.

Also, the builtin itself almost never performs the assignments.

You're describing ksh implementation behaviour here, not POSIX. Per POSIX's description of the 'export' command:

If the name of a variable is followed by = word, then the value of that variable shall be set to word.

If the assignments were being performed by the builtin then arguments would first need to be passed to the builtin, then any assignments exported to the builtin would have to evaluate, and last the builtin would evaluate the arguments with exported assignments also affecting these assignments.

Yes, this is exactly what POSIX specifies.

No shell does this for regular assignments

False, this is exactly how it works in dash and dash-derived shells. I have not dug into the behaviour of other shells but the fact that ksh is alone in the behaviour I originally reported makes me doubt your claim here for other shells too.

@ormaaj
Copy link

ormaaj commented May 20, 2023

The POSIX description is quite poor and doesn't really capture how things actually work. I would not infer just from the in an assignment context sentence anything about the way multiple assignments are evaluated, or anything about the environment of the assignment. It's too vague.

You're describing ksh implementation behaviour here, not POSIX. Per POSIX's description of the 'export' command

That's because the way ksh and its derivatives process these was the entire precedent for adding declaration builtins as a separate class to the spec. The term "declaration command" literally originates from the ksh manual:

Commands  that  are  preceded  by  a ‡ symbol below are declaration
commands.  Any following words that are in the format of a variable
assignment are expanded with the same rules as a variable assignment.
This means that tilde expansion is performed after the = sign, array
assignments of the form varname=(assign_list) are supported, and field
splitting and pathname expansion are not performed.

In dash, declaration builtins were a total afterthought. Dash is not a shining example. I remember the patch which made that change and it was a clumbsy half-hearted attempt at converting the argument processing into something that looks roughly like assignment processing.

@hvdijk
Copy link
Author

hvdijk commented May 20, 2023

@ormaaj: You're talking to the developer of a dash-based shell who is intimately familiar with its internals. My view of what you just wrote is that that is wrong and insulting, and that continuing this conversation is a waste of time of everyone involved. I will withdraw from this conversation with you.

@McDutchie: If you believe that the current ksh behaviour is permitted by POSIX, I will be happy to have that conversation if we can do it in a productive way. I believe it is clear that it is not.

@ormaaj
Copy link

ormaaj commented May 20, 2023

I'm not wrong. And nothing about that is intended to offend.

Historically most ash derivatives did not treat their arguments as special assignments. That was a recent development for dash in reaction to POSIX. Ash gained the ability to assign with export var=value early in its original release https://www.in-ulm.de/~mascheck/various/ash/DIFFERENCES. That is a distinct concept

The sentences you're referring to are unchanged from issue 7, which doesn't even define special processing for assignment arguments. Unsurprisingly, words treated as ordinary arguments in that context are expanded as usual and also unsurprisingly the builtin would have to capture and evaluate them under the assumptions of a typical builtin. POSIX now mashes up these ideas and doesn't explain the design ideas underlying them.

@McDutchie McDutchie added question Further information is requested backburner Low priority (but feel free to fix it and do a PR) labels Jun 3, 2023
@McDutchie
Copy link

However, this cannot be easily extended to not also break e.g. export a=(b c d), as that cannot be passed to the 'export' utility as a string.

Not only that, it will expose those pseudo-assignment arguments to field splitting and pathname expansion, which is contra new POSIX.

I might get the Austin Group involved in this when I have enough time/energy to be bothered. Ksh invented declaration builtins, it doesn't seem right that we should have to redesign them.

I'm not a fan of them though, I think that (no matter how they are implemented) they're a kludge that papers over the real and fundamental problem, which is global field splitting and pathname expansion – and they do this at the expense of creating even more inconsistency and confusion (as in, we now have to explain to scripters why myowncommand foo=$bar is dangerous even though export foo=$bar is not).

So, for now, I'm having a hard time getting excited about this issue.

@hvdijk
Copy link
Author

hvdijk commented Jun 4, 2023

Not only that, it will expose those pseudo-assignment arguments to field splitting and pathname expansion, which is contra new POSIX.

Indeed. A simple test case for this:

a="x y"
export b=$a
echo "a='$a' b='$b'"

This is supposed to print

a='x y' b='x y'

under new POSIX rules, and prints

a='x y' b='x'

with my proof of concept patch.

I might get the Austin Group involved in this when I have enough time/energy to be bothered. Ksh invented declaration builtins, it doesn't seem right that we should have to redesign them.

If you can propose POSIX wording to make this unspecified, while leaving enough specified for the concept of declaration utilities to remain sufficiently useful to standardise, I am certainly not going to object to that. I didn't open this issue because it's breaking existing scripts, I opened it because I happened to come across it comparing my own shell to others in corner cases.

(as in, we now have to explain to scripters why myowncommand foo=$bar is dangerous even though export foo=$bar is not)

At the moment, for your shell and mine, export foo=$bar is not dangerous, but for portable shell scripts, it still is: there are still shells in use that implement the old rules (e.g. yash). So that needs to be export foo="$bar". Or export "foo=$bar". And this issue then makes it so that script authors have to realise there is a difference in ksh between the two. I understand why, and I understand that fixing this is not a priority, but that is an unfortunate situation we ended up in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backburner Low priority (but feel free to fix it and do a PR) question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants