Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the bash compatibility mode #9

Closed
McDutchie opened this issue Jun 14, 2020 · 16 comments
Closed

Fix the bash compatibility mode #9

McDutchie opened this issue Jun 14, 2020 · 16 comments
Labels
TODO Things to be done before releasing wontfix This will not be worked on

Comments

@McDutchie
Copy link

Does anyone care about ksh's bash compatibility mode?

It's not compiled in my default. The RELEASE file says it's incomplete and experimental.

bash is pretty ubiquitous. I think if people want to run bash scripts, they'll run bash.

Would anyone object to getting rid?

@McDutchie McDutchie added the question Further information is requested label Jun 14, 2020
@posguy99
Copy link

posguy99 commented Jun 14, 2020 via email

@McDutchie
Copy link
Author

McDutchie commented Jun 14, 2020

Are distributions enabling it?

It would surprise me if there are any that do. It's easy to check: ksh sets a $BASH_VERSION variable if it's enabled and adds a shopt builtin.

…Oh. Now it would surprise me even more, because I just found out it doesn't even compile:

$ bin/package make CCFLAGS='-Os -DSHOPT_BASH=1'
[...]
/usr/local/src/ksh93/ksh/src/cmd/ksh93/sh/init.c:1397:14: warning: comparison of distinct pointer types ('Shell_t *' (aka 'struct Shell_s *') and 'Shinit_f' (aka 'void (*)(struct Shell_s *, int)')) [-Wcompare-distinct-pointer-types]
                        shp>userinit = userinit = bash_init;
                        ~~~^~~~~~~~~
/usr/local/src/ksh93/ksh/src/cmd/ksh93/sh/init.c:1397:24: error: expression is not assignable
                        shp>userinit = userinit = bash_init;
                        ~~~~~~~~~~~~ ^
1 warning and 1 error generated.

…that's a straight-up typo: that should be shp->userinit. It's clear the Bell Labs folks didn't bother to give this the most rudimentary testing before releasing ksh 93u+.

After fixing the typo, compiling ksh succeeds but linking it fails:

[...]
+ cc -O -L. -L/usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib -o ksh pmain.o libshell.a /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libdll.a /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libcoshell.a /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libcmd.a -lutil /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv -lutil /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv -liconv /usr/local/src/ksh93/ksh/arch/darwin.i386-64/lib/libast.a -liconv
Undefined symbols for architecture x86_64:
  "_e_bash_rc", referenced from:
      _sh_main in libshell.a(main.o)
  "_sh_bash1", referenced from:
      _infof in libshell.a(args.o)
  "_sh_bash2", referenced from:
      _infof in libshell.a(args.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
mamake [cmd/ksh93]: *** exit code 1 making ksh
mamake: *** exit code 1 making cmd/ksh93
mamake: *** exit code 1 making all

@DavidMorano
Copy link

Does anyone care about ksh's bash compatibility mode?

I am (possibly one of the people) continuing the development of ksh2020 and I have not bothered to even attempt to turn on the BASH compatibility mode, nor do I have any desire to do so. But I am interested in any other responses that people might make here.

@McDutchie

This comment has been minimized.

@DavidMorano

This comment has been minimized.

@McDutchie

This comment has been minimized.

@DavidMorano

This comment has been minimized.

@McDutchie McDutchie added TODO Things to be done before releasing and removed question Further information is requested labels Jun 16, 2020
@McDutchie McDutchie changed the title RFD: bash compatibility mode Fix the bash compatibility mode Jun 16, 2020
@McDutchie
Copy link
Author

There has been an objection on the korn-shell list against removing the broken bash compatibility mode. Here is my reply.

In short: It falls to the people interested in using this to fix it, or have it fixed. Either someone fixes this before we release, or it gets removed shortly before release.

So, this is now a TODO item.

@JohnoKing
Copy link

Well, FWIW it is probably worth the effort to fix the local builtin present in the Bash compatibility mode. This is quoted from @bk2204's comment in att#220:

I'll point out a few things about the local keyword:

  1. There's a request to the Austin Group (the POSIX folks) about standardizing it as
    part of POSIX which has garnered a lot of comments in favor. ksh has typically
    been capable of running POSIX commands, so it may be worthwhile to implement
    for that reason.

  2. Debian requires all shells that can be used as /bin/sh to implement local (and
    they don't build with SHOPT_BASH). Consequently, it isn't possible to use ksh
    as /bin/sh on a Debian or Ubuntu system.

  3. Git requires a shell providing local in order to run the testsuite and shell
    commands. Consequently, it isn't possible to use ksh as the shell for Git.

  4. All other open source POSIX-compliant shells that I'm aware of implement
    local: bash, dash, mksh, pdksh (and derivatives, like OpenBSD's ksh),
    and zsh.

The first point seems to be referencing this request on the Austin Group Defect Tracker. The other points are all perfectly valid reasons to have a local builtin, although this project does have a 'bug fixes only' policy, as in the ksh93u+m README:

  1. No new features; bug fixes only (but see items 3 and 4). Feature development is for a future separate branch.

The feature development branch hasn't been created yet, so improvements to the local builtin will have to be placed behind the SHOPT_BASH ifdef with the rest of the Bash compatibility code, as the local builtin isn't enabled by default.

@McDutchie
Copy link
Author

McDutchie commented Jun 16, 2020

Well, I am not a policy absolutist. I think policies should be broken if there are solid reasons why following them is worse than not following them.

Adding a local builtin just might be such a case, precisely because POSIX is likely to mandate one, one of these years. POSIX compliance was one of Korn's stated core objectives, and is one of ours.

However, to be anything like compatible with other shells, local should implement dynamic scoping, not static scoping like typeset. See Part III, Q28 in this FAQ, to know the difference.

In standard ksh mode, local should then only work in POSIX functions (defined like this() { ...; }), and it should implement dynamic scoping. The behaviour of typeset would be unchanged.

In bash compatibility mode (if it gets fixed), local should simply be a synonym of typeset without options, and both should implement dynamic scoping, even in ksh functions defined with the function keyword – because that is how typeset a.k.a. declare behaves in bash, mksh, yash, and zsh. Which means ksh static scoping would not be available if ksh is compiled in bash compatibility mode, but that is the price. You can't have both in one typeset. (edit: unless we add a new shell option for that instead…)

@McDutchie McDutchie added the help wanted Extra attention is needed label Jun 16, 2020
@JohnoKing
Copy link

I have created a WIP patch to add local and declare as special builtins. At the moment dynamic scoping isn't enabled as it is hidden behind SH_BASH (like in ksh93v-), although I did fix a few bugs (local now works correctly in POSIX functions and isn't broken when ${.sh.fun} is unset, unlike in ksh2020). The patch is included below:
local_builtin.patch.txt

@McDutchie
Copy link
Author

BTW, I don't want to add a declare builtin except as part of a bash compatibility mode, as that has zero chance of ever being added to POSIX.

Nobody is showing up to fix SHOPT_BASH, so it's looking increasingly likely to be removed before release. In which case we should add local only. I don't think we should add bashisms as part of standard ksh; we're not trying to change the ksh language here.

@posguy99
Copy link

posguy99 commented Aug 7, 2020

Does ksh need local? There doesn't seem to be any problem with having local variables right now, although you have to use the non-POSIX function declaration.

@McDutchie
Copy link
Author

Yes, local is set to be added to POSIX so we're going to need it if we're to keep the goal of POSIX compliance. It is practically standard already, ksh93 is the only shell that doesn't have it. It needs to work with POSIX functions and have dynamic scoping, not static scoping like typeset with ksh functions (see Part III, Q28 in this FAQ for the difference).

@JohnoKing
Copy link

JohnoKing commented Aug 8, 2020

I have been trying to get local working with dynamic scoping limited to functions, but I haven't made any progress with POSIX functions (although dynamic scoping now works in ksh functions). The local builtin (present in my local-builtin branch) currently makes all variables assigned in POSIX functions visible outside of the function:

foo() {
    local var=bar
}
foo; echo $var

In most shells that support local (dash, bash, etc.) this script only prints a newline. The current ksh local builtin outputs 'bar' (a bug also present in ksh93v- and ksh2020) because local makes $var visible outside of functions. This is due to a limitation in POSIX functions. Ksh does not give POSIX functions a function-local scope at all, only ksh functions get one:

ksh/src/cmd/ksh93/sh/xec.c

Lines 3201 to 3224 in e805c7d

if(nv_isattr(np,NV_FPOSIX))
{
char *save;
int loopcnt = shp->st.loopcnt;
shp->posix_fun = np;
save = argv[-1];
argv[-1] = 0;
shp->st.funname = nv_name(np);
shp->last_root = nv_dict(DOTSHNOD);
nv_putval(SH_FUNNAMENOD, nv_name(np),NV_NOFREE);
opt_info.index = opt_info.offset = 0;
error_info.errors = 0;
shp->st.loopcnt = 0;
b_dot_cmd(argn+1,argv-1,&shp->bltindata);
shp->st.loopcnt = loopcnt;
argv[-1] = save;
}
else
{
fun.env = envlist;
fun.node = np;
fun.nref = 0;
sh_funscope(argn,argv,0,&fun,execflg);
}

Ksh93v- made the following change to give POSIX functions proper scoping in the Bash compatibility mode:

@@ -3387,7 +3411,7 @@ static void sh_funct(Shell_t *shp,Namval_t *np,int argn, char *argv[],struct arg
 	SH_LEVELNOD->nvalue.s = lp->maxlevel;
 	shp->st.lineno = error_info.line;
 	np->nvalue.rp->running  += 2;
-	if(nv_isattr(np,NV_FPOSIX))
+	if(nv_isattr(np,NV_FPOSIX) && !sh_isoption(shp,SH_BASH))
 	{
 		char *save;
 		int loopcnt = shp->st.loopcnt;

This change breaks backward compatibility because it also affects typeset, so a different fix will be necessary.

@McDutchie
Copy link
Author

McDutchie commented Aug 31, 2020

Since the local built-in is shaping up to be a big project of its own, it's probably best dealt with in a separate issue. It remains to be seen whether we can make it work before the first release. I'd like to put out a first beta test version in the next month or so.

So, since no one has shown up to fix SHOPT_BASH in 2.5 months, I'm working on getting rid, and that will close this issue.

A couple of things are worth keeping. IMO this includes the handy and popular &>file shorthand for >file 2>&1. As a matter of fact, ksh93 already supports this natively, but only while running rc/profile/login scripts, and it issues a warning. I'd like to make it globally available and remove the warning. mksh fully supports it by default, as do zsh and (of course) bash, so I think it's time to fall in line there.

We should also keep the -o posix option (see #20) and it should disable the &>file shorthand, as it is technically a grammatical incompatibility with POSIX. For instance, POSIXly,

ls &>files.txt

is equivalent to

ls &
>files.txt

– meaning ls is run as a background job while an empty file files.txt is created. This is also what ksh93 currently does (except when running an rc/profile/login script). But nobody would write it like in the one-line version above – in real-world scripts there would be a newline separating those, or at least a space. So I seriously doubt that the &> operator would ever bite anyone. (In fact, bash and zsh don't even bother to disable &> in their POSIX modes, though mksh does disable it and so should we.)

@McDutchie McDutchie removed the help wanted Extra attention is needed label Aug 31, 2020
@McDutchie McDutchie added the wontfix This will not be worked on label May 27, 2022
McDutchie added a commit that referenced this issue Jun 13, 2022
I didn't trust this back in e3d7bf1 (which disabled it for
interactive shells) and I trust it less now. In af6a32d/6b380572,
this was also disabled for virtual subshells as it caused program
flow corruption there. Now, on macOS 10.14.6, a crash occurs when
repeatedly running a command with this optimisation:

$ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
0 1 2 3 4 5 6 7 Illegal instruction

Oddly enough it seems that I can only reproduce this crash on macOS
-- not on Linux, OpenBSD, or Solaris. It could be a macOS bug,
particularly given the odd message in the stack trace below.

I've had enough, though. Out it comes. Things now work fine, the
reproducer is fixed on macOS, and it didn't optimise much anyway.

The double-fork issue discussed in e3d7bf1 remains.
________
For future reference, here's an lldb debugger session with a stack
trace. It crashes on calling calloc() (via sh_calloc(), via
sh_newof()) in jobsave_create(). This is not an invalid pointer
problem as we're allocating new memory, so it does look like an OS
bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting.

$ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
(lldb) target create "arch/darwin.i386-64/bin/ksh"
Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64).
(lldb) settings set -- target.run-args  "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done"
(lldb) run
error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'.
(lldb) process launch
Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64)
0 1 2 3 4 5 6 7 8 9 Process 35038 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
->  0x7fff70deb1c2 <+23>: ud2

libsystem_platform.dylib`_os_unfair_lock_unowned_abort:
    0x7fff70deb1c4 <+0>:  movl   %edi, %eax
    0x7fff70deb1c6 <+2>:  leaq   0x1a8a(%rip), %rcx        ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread"
    0x7fff70deb1cd <+9>:  movq   %rcx, 0x361cb16c(%rip)    ; gCRAnnotations + 8
Target 0: (ksh) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
    frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239
    frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188
    frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66
    frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99
    frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30
    frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13
    frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8
    frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9
    frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3
    frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29
    frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12
    frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17
    frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16
    frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4
    frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9
    frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4
    frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4
    frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2
    frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9
    frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
McDutchie added a commit that referenced this issue Jun 13, 2022
I didn't trust this back in e3d7bf1 (which disabled it for
interactive shells) and I trust it less now. In af6a32d/6b380572,
this was also disabled for virtual subshells as it caused program
flow corruption there. Now, on macOS 10.14.6, a crash occurs when
repeatedly running a command with this optimisation:

$ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
0 1 2 3 4 5 6 7 Illegal instruction

Oddly enough it seems that I can only reproduce this crash on macOS
-- not on Linux, OpenBSD, or Solaris. It could be a macOS bug,
particularly given the odd message in the stack trace below.

I've had enough, though. Out it comes. Things now work fine, the
reproducer is fixed on macOS, and it didn't optimise much anyway.

The double-fork issue discussed in e3d7bf1 remains.
________
For future reference, here's an lldb debugger session with a stack
trace. It crashes on calling calloc() (via sh_calloc(), via
sh_newof()) in jobsave_create(). This is not an invalid pointer
problem as we're allocating new memory, so it does look like an OS
bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting.

$ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
(lldb) target create "arch/darwin.i386-64/bin/ksh"
Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64).
(lldb) settings set -- target.run-args  "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done"
(lldb) run
error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'.
(lldb) process launch
Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64)
0 1 2 3 4 5 6 7 8 9 Process 35038 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
->  0x7fff70deb1c2 <+23>: ud2

libsystem_platform.dylib`_os_unfair_lock_unowned_abort:
    0x7fff70deb1c4 <+0>:  movl   %edi, %eax
    0x7fff70deb1c6 <+2>:  leaq   0x1a8a(%rip), %rcx        ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread"
    0x7fff70deb1cd <+9>:  movq   %rcx, 0x361cb16c(%rip)    ; gCRAnnotations + 8
Target 0: (ksh) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
    frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239
    frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188
    frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66
    frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99
    frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30
    frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13
    frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8
    frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9
    frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3
    frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29
    frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12
    frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17
    frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16
    frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4
    frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9
    frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4
    frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4
    frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2
    frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9
    frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
McDutchie added a commit that referenced this issue Jun 14, 2022
I didn't trust this back in e3d7bf1 (which disabled it for
interactive shells) and I trust it less now. In af6a32d/6b380572,
this was also disabled for virtual subshells as it caused program
flow corruption there. Now, on macOS 10.14.6, a crash occurs when
repeatedly running a command with this optimisation:

$ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
0 1 2 3 4 5 6 7 Illegal instruction

Oddly enough it seems that I can only reproduce this crash on macOS
-- not on Linux, OpenBSD, or Solaris. It could be a macOS bug,
particularly given the odd message in the stack trace below.

I've had enough, though. Out it comes. Things now work fine, the
reproducer is fixed on macOS, and it didn't optimise much anyway.

The double-fork issue discussed in e3d7bf1 remains.
________
For future reference, here's an lldb debugger session with a stack
trace. It crashes on calling calloc() (via sh_calloc(), via
sh_newof()) in jobsave_create(). This is not an invalid pointer
problem as we're allocating new memory, so it does look like an OS
bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting.

$ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
(lldb) target create "arch/darwin.i386-64/bin/ksh"
Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64).
(lldb) settings set -- target.run-args  "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done"
(lldb) run
error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'.
(lldb) process launch
Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64)
0 1 2 3 4 5 6 7 8 9 Process 35038 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
->  0x7fff70deb1c2 <+23>: ud2

libsystem_platform.dylib`_os_unfair_lock_unowned_abort:
    0x7fff70deb1c4 <+0>:  movl   %edi, %eax
    0x7fff70deb1c6 <+2>:  leaq   0x1a8a(%rip), %rcx        ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread"
    0x7fff70deb1cd <+9>:  movq   %rcx, 0x361cb16c(%rip)    ; gCRAnnotations + 8
Target 0: (ksh) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
    frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239
    frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188
    frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66
    frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99
    frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30
    frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13
    frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8
    frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9
    frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3
    frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29
    frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12
    frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17
    frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16
    frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4
    frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9
    frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4
    frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4
    frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2
    frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9
    frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
JohnoKing added a commit to JohnoKing/ksh that referenced this issue Aug 19, 2022
The ASan crash in basic.sh when sourcing multiple files is caused by
a bug that is similar to the crash fixed in 59a5672. This is the
trace for the regression test crash (note that in order to see the
trace, the 2>/dev/null redirect must be disabled):

==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100
WRITE of size 8 at 0x6150000005b0 thread T0
    #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967
    ksh93#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349
    ksh93#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642
    ksh93#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613
    ksh93#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561
    ksh93#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586
    ksh93#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438
    ksh93#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635
    ksh93#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    ksh93#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    ksh93#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932
    ksh93#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651
    ksh93#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    ksh93#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    ksh93#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604
    ksh93#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369
    ksh93#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41
    ksh93#17 0x7f637b4db2cf  (/usr/lib/libc.so.6+0x232cf)
    ksh93#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389)
    ksh93#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115

Code in question:
https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968

To avoid any more similar crashes, all of the fixes introduced
in 7e317c5 that set slp->slptr to null have been improved with the
fix in 59a5672.
McDutchie pushed a commit that referenced this issue Aug 19, 2022
The ASan crash in basic.sh when sourcing multiple files is caused by
a bug that is similar to the crash fixed in 59a5672. This is the
trace for the regression test crash (note that in order to see the
trace, the 2>/dev/null redirect must be disabled):

==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100
WRITE of size 8 at 0x6150000005b0 thread T0
    #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967
    #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349
    #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642
    #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613
    #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561
    #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586
    #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438
    #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635
    #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932
    #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651
    #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604
    #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369
    #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41
    #17 0x7f637b4db2cf  (/usr/lib/libc.so.6+0x232cf)
    #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389)
    #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115

Code in question:
https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968

To avoid any more similar crashes, all of the fixes introduced
in 7e317c5 that set slp->slptr to null have been improved with the
fix in 59a5672.
McDutchie pushed a commit that referenced this issue Aug 19, 2022
The ASan crash in basic.sh when sourcing multiple files is caused by
a bug that is similar to the crash fixed in f24040e. This is the
trace for the regression test crash (note that in order to see the
trace, the 2>/dev/null redirect must be disabled):

==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100
WRITE of size 8 at 0x6150000005b0 thread T0
    #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967
    #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349
    #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642
    #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613
    #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561
    #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586
    #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438
    #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635
    #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932
    #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651
    #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604
    #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369
    #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41
    #17 0x7f637b4db2cf  (/usr/lib/libc.so.6+0x232cf)
    #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389)
    #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115

Code in question:
https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968

To avoid any more similar crashes, all of the fixes introduced
in 69d37d5 that set slp->slptr to null have been improved with the
fix in f24040e.
JohnoKing added a commit to JohnoKing/ksh that referenced this issue Sep 23, 2022
The isaname, isaletter, isadigit, isexp and ismeta macros don't check if
c is a negative value before accessing sh_lexstates. This can result in
ASan crashing because of a buffer overflow in quoting2.sh when running
in a multibyte locale:
  test quoting2(C.UTF-8) begins at 2022-09-23+14:03:12
  =================================================================
  ==262224==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557b201a451f at pc 0x557b1fe5e6fc bp 0x7fffcf1ac700 sp 0x7fffcf1ac6f8
  READ of size 1 at 0x557b201a451f thread T0
      #0 0x557b1fe5e6fb in sh_fmtq /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:341:5
      ksh93#1 0x557b1fe6098c in sh_fmtqf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:473:10
      ksh93#2 0x557b1ff08dc0 in extend /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:998:14
      ksh93#3 0x557b2008a56c in sfvprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfvprintf.c:531:8
      ksh93#4 0x557b2005b7f7 in sfprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfprintf.c:31:7
      ksh93#5 0x557b1ff04272 in b_print /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:343:4
      ksh93#6 0x557b1ff04ebf in b_printf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:148:9
      ksh93#7 0x557b1fe8d9a7 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1261:21
      ksh93#8 0x557b1fe7a7cf in sh_subshell /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/subshell.c:652:4
      ksh93#9 0x557b1fdedc0d in comsubst /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:2207:9
      ksh93#10 0x557b1fdefc79 in varsub /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:1181:3
      ksh93#11 0x557b1fde3bef in copyto /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:620:21
      ksh93#12 0x557b1fde0b07 in sh_mactrim /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:169:2
      ksh93#13 0x557b1fe05ab6 in nv_setlist /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:280:9
      ksh93#14 0x557b1fe8a7e8 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1051:7
      ksh93#15 0x557b1fe95b85 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1940:5
      ksh93#16 0x557b1fe99ea6 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2271:10
      ksh93#17 0x557b1fd23b04 in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604:4
      ksh93#18 0x557b1fd1fe10 in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369:2
      ksh93#19 0x557b1fd1d585 in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41:9
      ksh93#20 0x7f55d5b5028f  (/usr/lib/libc.so.6+0x2328f) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82)
      ksh93#21 0x7f55d5b50349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82)
      ksh93#22 0x557b1fc158d4 in _start /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115

src/cmd/ksh93/include/lexstates.h:
- Check if c is negative before accessing sh_lexstates. Backported from
  ksh2020: att@a7013320.
  I'll note that later in ksh2020 these macros became functions:
  att@adc589de. I didn't backport that
  commit because it requires the C99 bool type to avoid compiler
  warnings.
McDutchie added a commit that referenced this issue Feb 29, 2024
When ksh executes a script without a #! path (note that the AT&T
team had a real disliking for #! paths), ksh forks and goes through
a quick reinitialisation procedure. This is much faster than
invoking a fully new shell but should have the same effect if it
all works well. Unfortunately it's not worked all that well so far.
Even after recent improvements (see referenced commits) I've been
finding corner case problems.

FYI, running a script without #! basically goes like this:
* in path_spawn(), execve() fails with ENOEXEC because the file is
  not a binary executable and does not start with #!
* this triggers 'case ENOEXEC:' which:
  * forks ksh
  * calls exscript()
* exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT)
* SH_JMPSCRIPT is the highest longjmp value, so *all* the previous
  sigsetjmp/sh_pushcontext calls are unwinded in reverse order,
  triggering all sorts of cleanup, state restoration, removal of
  local scopes, etc.
* eventually, this lands us at the top sigsetjmp in sh_main()
* sh_main() calls sh_reinit(), then resumes as if the shell had
  just been started

This commit makes the following interrelated changes for the
correct functioning of this procedure:
1. exscript() now exports the environment into a dedicated Stk_t
   buffer and sets environ[] to that.
2. Instead of re-using existing variables, sh_reinit() deletes
   everything and reinits all name-value trees from scratch,
   then re-imports the environment from environ[].
3. Variable values that were imported from the environment are no
   longer treated specially with an NV_IMPORT attribute and the
   np->nvenv pointer to their value in environ[], fixing at least
   one crash.[*1]

Details of the changes follow:

src/cmd/ksh93/sh/path.c:
- exscript(): Generate a new environ[] by activating a dedicated
  AST stack that will not be overwritten before calling
  sh_envgen(). This will allow sh_reinit() to delete all variables
  and then reimport the environment. The exporting must be done
  here, before siglongjmp, otherwise locally scoped exported
  variables won't be included (siglongjmp with SH_JMPSCRIPT
  triggers cleanup of all scopes).

src/cmd/ksh93/sh/init.c:
- sh_reinit(): Largely rewritten as follows.
  - Reset shell options first. This has the beneficial side effect
    of unsetting SH_RESTRICTED which interferes with unsetting
    certain variables, like PATH.
  - Remove workarounds for FPATH, SHLVL and tilde expansion
    disciplines; these will not be needed now.
  - Properly unset and delete all functions and built-ins. Since we
    now unset a function before deleting it, this should now free
    up their memory. (See nvdisc.c below for a change allowing
    removal of special built-ins.)
  - Properly unset all variables (which includes any associated
    discipline functions). Incorporate here the needed logic from
    sh_envnolocal() in name.c; most of it is unneeded (that
    function was previously used to cleanup local variables but has
    not been used for that for decades). So sh_envnolocal() is now
    unused.
  - Delete variables in a separate pass after unsetting variables
    and unsetting and deleting functions; this avoids use-after-
    free problems as well as possible "no parent" problems with
    namespace variables (e.g., .rc.status in our new kshrc.sh).
  - After all that, close and free up all function, alias, tracked
    alias, type and variable trees.
  - Free the contiguous built-in node space and the Init_t init
    context (with all the special variable discipline pointers).
  - Call nv_init (previously only called from sh_init) to
    reinitialise all of the above name-value stuff from scratch.
    It's the only way to be sure.
  - Re-import the environment as stored by exscript() above.
- env_init():
  - Per item 3 above and footnote 1 below, no longer set NV_IMPORT
    attribute and no longer point np->nvenv to the item in environ.
  - POSIX says, for 'environ': "Any application that directly
    modifies the pointers to which the environ variable points has
    undefined behavior."[*2] Yet, env_init() is indeed juggling the
    environ[] pointers to deal with variables that cannot be
    imported because their names are invalid (because they still
    need to be saved to be passed on to child processes). Replace
    the current approach with one where those env vars get
    allocated on the heap, pointed to by sh.save_env and counted by
    sh.save_env_n (renamed from sh.nenv). This only needs to be
    done once as ksh cannot use or change these variables.

src/cmd/ksh93/sh/name.c:
- sh_envgen(): Update to match env_init() change above.
- pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute
  check as per above and never get the value from the nvenv pointer
  -- simply always use nv_getval(). As of this change, the
  NV_IMPORT attribute is unused. The next commit will remove it
  and do related cleanups.
- staknam(): is only called if value!=NULL, so remove that 'if'.
- sh_envnolocal(): Removed.

src/cmd/ksh93/sh/nvdisc.c:
- assign(): Remove a check for the SH_INIT state bit that avoids
  freeing functions during sh_reinit(). This works fine now.
- sh_addbuiltin(): Allow sh_reinit() to delete special builtins by
  checking for the SH_INIT state bit before throwing an error.

src/cmd/ksh93/sh/nvtree.c:
- outval(): Add a workaround for a use-after-free, introduced by
  the changes above, that occurred in the types.sh tests for
  #!-less scripts (types.sh:675-722). The use-after-free occurred
  here (abridged ASan trace follows; line numbers are current as of
  this commit):
  ==30849==ERROR: AddressSanitizer: heap-use-after-free [...]
    #0 in dttree dttree.c:393
    #1 in sh_reinit init.c:1637
    #2 in sh_main main.c:136
    [...]
  The pointer was freed in the same loop via nv_delete() in outval:
    #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...])
    #1 in nv_delete name.c:1318
    #2 in outval nvtree.c:731
    #3 in genvalue nvtree.c:905
    #4 in walk_tree nvtree.c:1042
    #5 in put_tree nvtree.c:1108
    #6 in nv_putv nvdisc.c:144
    #7 in _nv_unset name.c:2437
    #8 in sh_reinit init.c:1645
    #9 in sh_main main.c:136
    [...]
  So, what happened was that the nv_delete() call on name.c:1318
  (eventually resulting from the _nv_unset call on init.c:1645)
  freed the node pointed to by np, so that the next loop iteration
  crashed on line 1637 as the dtnext() macro now gets a freed np.
      Now, why on earth should _nv_unset() *ever* indirectly call
  nv_delete()? That's a question for another day; I suspect it may
  be a bug, or it may be needed for compound variables for some
  reason. For now, I'm adding a workaround: simply avoid calling
  nv_delete() if the SH_INIT state bit is on, indicating
  sh_reinit() is in the call stack. This allows the variables unset
  loop in sh_reinit() to continue without crashing. sh_reinit()
  handles deletion later anyway.

src/cmd/ksh93/sh/main.c:
- sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these
  are known to be 0, coming from either sh_init() or sh_reinit().

________
[*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient
     ksh code; the imported value is easily available as a normal
     shell variable value via nv_getval(). Plus, the nvenv pointer
     is overloaded with too many other purposes: so far I've
     discovered it's used for pointers to subarrays of arrays
     (multidimentional arrays), compound variables, builtins, and
     other things.
     This mess caused at least one crash in set_instance() (xec.c)
     due to incorrectly using that nvenv pointer. The current kshrc
     script triggers this. Reproducer:
        $ export PS1
        $ bin/package use
        «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1
     ...and crash. That is now fixed.

[*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
McDutchie added a commit that referenced this issue Feb 29, 2024
When ksh executes a script without a #! path (note that the AT&T
team had a real disliking for #! paths), ksh forks and goes through
a quick reinitialisation procedure. This is much faster than
invoking a fully new shell but should have the same effect if it
all works well. Unfortunately it's not worked all that well so far.
Even after recent improvements (see referenced commits) I've been
finding corner case problems.

FYI, running a script without #! basically goes like this:
* in path_spawn(), execve() fails with ENOEXEC because the file is
  not a binary executable and does not start with #!
* this triggers 'case ENOEXEC:' which:
  * forks ksh
  * calls exscript()
* exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT)
* SH_JMPSCRIPT is the highest longjmp value, so *all* the previous
  sigsetjmp/sh_pushcontext calls are unwinded in reverse order,
  triggering all sorts of cleanup, state restoration, removal of
  local scopes, etc.
* eventually, this lands us at the top sigsetjmp in sh_main()
* sh_main() calls sh_reinit(), then resumes as if the shell had
  just been started

This commit makes the following interrelated changes for the
correct functioning of this procedure:
1. exscript() now exports the environment into a dedicated Stk_t
   buffer and sets environ[] to that.
2. Instead of re-using existing variables, sh_reinit() deletes
   everything and reinits all name-value trees from scratch,
   then re-imports the environment from environ[].
3. Variable values that were imported from the environment are no
   longer treated specially with an NV_IMPORT attribute and the
   np->nvenv pointer to their value in environ[], fixing at least
   one crash.[*1]

Details of the changes follow:

src/cmd/ksh93/sh/path.c:
- exscript(): Generate a new environ[] by activating a dedicated
  AST stack that will not be overwritten before calling
  sh_envgen(). This will allow sh_reinit() to delete all variables
  and then reimport the environment. The exporting must be done
  here, before siglongjmp, otherwise locally scoped exported
  variables won't be included (siglongjmp with SH_JMPSCRIPT
  triggers cleanup of all scopes).

src/cmd/ksh93/sh/init.c:
- sh_reinit(): Largely rewritten as follows.
  - Reset shell options first. This has the beneficial side effect
    of unsetting SH_RESTRICTED which interferes with unsetting
    certain variables, like PATH.
  - Remove workarounds for FPATH, SHLVL and tilde expansion
    disciplines; these will not be needed now.
  - Properly unset and delete all functions and built-ins. Since we
    now unset a function before deleting it, this should now free
    up their memory. (See nvdisc.c below for a change allowing
    removal of special built-ins.)
  - Properly unset all variables (which includes any associated
    discipline functions). Incorporate here the needed logic from
    sh_envnolocal() in name.c; most of it is unneeded (that
    function was previously used to cleanup local variables but has
    not been used for that for decades). So sh_envnolocal() is now
    unused.
  - Delete variables in a separate pass after unsetting variables
    and unsetting and deleting functions; this avoids use-after-
    free problems as well as possible "no parent" problems with
    namespace variables (e.g., .rc.status in our new kshrc.sh).
  - After all that, close and free up all function, alias, tracked
    alias, type and variable trees.
  - Free the contiguous built-in node space and the Init_t init
    context (with all the special variable discipline pointers).
  - Call nv_init (previously only called from sh_init) to
    reinitialise all of the above name-value stuff from scratch.
    It's the only way to be sure.
  - Re-import the environment as stored by exscript() above.
- env_init():
  - Per item 3 above and footnote 1 below, no longer set NV_IMPORT
    attribute and no longer point np->nvenv to the item in environ.
  - POSIX says, for 'environ': "Any application that directly
    modifies the pointers to which the environ variable points has
    undefined behavior."[*2] Yet, env_init() is indeed juggling the
    environ[] pointers to deal with variables that cannot be
    imported because their names are invalid (because they still
    need to be saved to be passed on to child processes). Replace
    the current approach with one where those env vars get
    allocated on the heap, pointed to by sh.save_env and counted by
    sh.save_env_n (renamed from sh.nenv). This only needs to be
    done once as ksh cannot use or change these variables.

src/cmd/ksh93/sh/name.c:
- sh_envgen(): Update to match env_init() change above.
- pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute
  check as per above and never get the value from the nvenv pointer
  -- simply always use nv_getval(). As of this change, the
  NV_IMPORT attribute is unused. The next commit will remove it
  and do related cleanups.
- staknam(): is only called if value!=NULL, so remove that 'if'.
- sh_envnolocal(): Removed.

src/cmd/ksh93/sh/nvdisc.c:
- assign(): Remove a check for the SH_INIT state bit that avoids
  freeing functions during sh_reinit(). This works fine now.
- sh_addbuiltin(): Allow sh_reinit() to delete special builtins by
  checking for the SH_INIT state bit before throwing an error.

src/cmd/ksh93/sh/nvtree.c:
- outval(): Add a workaround for a use-after-free, introduced by
  the changes above, that occurred in the types.sh tests for
  #!-less scripts (types.sh:675-722). The use-after-free occurred
  here (abridged ASan trace follows; line numbers are current as of
  this commit):
  ==30849==ERROR: AddressSanitizer: heap-use-after-free [...]
    #0 in dttree dttree.c:393
    #1 in sh_reinit init.c:1637
    #2 in sh_main main.c:136
    [...]
  The pointer was freed in the same loop via nv_delete() in outval:
    #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...])
    #1 in nv_delete name.c:1318
    #2 in outval nvtree.c:731
    #3 in genvalue nvtree.c:905
    #4 in walk_tree nvtree.c:1042
    #5 in put_tree nvtree.c:1108
    #6 in nv_putv nvdisc.c:144
    #7 in _nv_unset name.c:2437
    #8 in sh_reinit init.c:1645
    #9 in sh_main main.c:136
    [...]
  So, what happened was that the nv_delete() call on name.c:1318
  (eventually resulting from the _nv_unset call on init.c:1645)
  freed the node pointed to by np, so that the next loop iteration
  crashed on line 1637 as the dtnext() macro now gets a freed np.
      Now, why on earth should _nv_unset() *ever* indirectly call
  nv_delete()? That's a question for another day; I suspect it may
  be a bug, or it may be needed for compound variables for some
  reason. For now, I'm adding a workaround: simply avoid calling
  nv_delete() if the SH_INIT state bit is on, indicating
  sh_reinit() is in the call stack. This allows the variables unset
  loop in sh_reinit() to continue without crashing. sh_reinit()
  handles deletion later anyway.

src/cmd/ksh93/sh/main.c:
- sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these
  are known to be 0, coming from either sh_init() or sh_reinit().

________
[*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient
     ksh code; the imported value is easily available as a normal
     shell variable value via nv_getval(). Plus, the nvenv pointer
     is overloaded with too many other purposes: so far I've
     discovered it's used for pointers to subarrays of arrays
     (multidimentional arrays), compound variables, builtins, and
     other things.
     This mess caused at least one crash in set_instance() (xec.c)
     due to incorrectly using that nvenv pointer. The current kshrc
     script triggers this. Reproducer:
        $ export PS1
        $ bin/package use
        «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1
     ...and crash. That is now fixed.

[*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO Things to be done before releasing wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants