>(Process substitution) b0rken when combined with redirection #2

McDutchie · 2020-06-10T12:31:27Z

Process substitution of the form >(list) is utterly broken:

echo ok > >(sed s/ok/good/)

Expected output: good (as on bash and zsh)

Actual output:

either none, with a file with a bizarre binary file name created
in the current working directory that contains ok plus a newline
(which is what happens on Linux and Solaris);
or (depending on the file system and/or OS and/or locale) an
error message indicating the refusal to create that file:
ksh: ?: cannot create [Illegal byte sequence]
(which is what happens on my Mac).

Before I spend hours searching for the cause, does anyone have any idea where to begin debugging this?

JohnoKing · 2020-06-10T12:54:45Z

I ran some quick tests. On Linux, ksh93v- does not produce a strange file when running that command, although ksh93v- on FreeBSD still creates an odd file. Both ksh93u+ and ksh2020 produce a file. No version since ksh93u+ produces any output on Linux or FreeBSD.

JohnoKing · 2020-06-10T13:18:46Z

Update: Apparently ksh93v- is still creating a file, but the file name was not showing up when I used ls or echo *. Using Dolphin (the KDE Plasma 5 file manager) shows a file without a name.

ksh segfaults in job_chksave after receiving SIGCHLD

https://bugs.launchpad.net/ubuntu/+source/ksh/+bug/1697501

Eric Desrochers wrote on 2017-06-12:

[Impact]

* The compiler optimization dropped parts from the ksh job locking mechanism from the binary code. As a consequence, ksh could terminate unexpectedly with a segmentation fault after it received the SIGCHLD signal.

[Test Case]

Unfortunately, there is no clear and easy way to reproduce the segfault.

* But the original reporter of this bug can randomly reproduce the problem using an in-house ksh script that only works inside his infrastructure as follow : "ksh <in-house-script.ksh>" and then once in a while ksh will segfault as follow :

    (gdb) bt
    #0 job_chksave (pid=pid@entry=19003) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
    #1 0x00000000004282ab in job_reap (sig=17) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:428
    #2 <signal handler called>
    ...

[Regression Potential]

* Regression risk : low/none expected, the package has been highly/intensively tested by a user who run over 18M ksh scripts a day on each of their clusters.

[...]

* The fix has been written by RH and has been proven to work for them for the last 3 years.
* A test package including the RH fix has been intensively tested and verified (pre-SRU) by an affected user with positive feedbacks using a reproducer that segfault without the RH patch.
* Test package (pre-SRU) feedbacks : https://bugs.launchpad.net/ubuntu/xenial/+source/ksh/+bug/1697501/comments/7

[Other Info]

* Details about the RH bug :
  - https://bugzilla.redhat.com/show_bug.cgi?id=1123467
  - https://bugzilla.redhat.com/show_bug.cgi?id=1112306
  - https://access.redhat.com/solutions/1253243
  - http://rhn.redhat.com/errata/RHBA-2014-1015.html
  - ksh.spec
    * Fri Jul 25 2014 Michal Hlavinka <email address hidden> - 20120801-10.8
    * job locking mechanism did not survive compiler optimization (#1123467)
  - patch
    * ksh-20120801-locking.patch

Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181

[Original Description]

# gdb

    [New LWP 3882]
    Core was generated by `/bin/ksh <KSH_SCRIPT>.ksh'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 job_chksave (pid=pid@entry=19385) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
    1948 if(jp->pid==pid)
    (gdb) p *jp
    Cannot access memory at address 0xb
    (gdb) p *jp->pid
    Cannot access memory at address 0x13
    (gdb) p pid
    $2 = 19385
    (gdb) p *jpold
    $1 = {next = 0xb, pid = -604008960, exitval = 11124}

The struct is corrupted at some point looking at the next,pid and exitval struct members values which isn't valid data.

# assembly code

    => 0x0000000000427159 <+41>: cmp %edi,0x8(%rdx)
    (gdb) p $edi ## pid variable
    $1 = 19385
    (gdb) p *($rdx + 8) ## jp->pid struct
    Cannot access memory at address 0x13

-- ksh is segfaulting because it can't access struct "jp" ($rdx) thus cannot de-reference the struct member "jp->pid" ($rdx + 8) at line : src/cmd/ksh93/sh/jobs.c:1948 when looking if jp->pid is equal to pid ($edi) variable.

McDutchie · 2020-06-18T11:18:02Z

It only breaks when you combine it with an output redirection. When used without that, e.g. when passing the process substitution as a filename to tee, it works correctly:

$ ksh -c 'echo hi | tee >(cat)'
hi
hi

(output is as on bash and zsh)

JohnoKing · 2020-06-22T00:27:13Z

I found the reason why process substitution doesn't work with redirections. The code for handling process substitution with redirections is inside of an if statement that assumes IORAW is not set:

ksh/src/cmd/ksh93/sh/io.c

Line 1175 in de2b4a6

if(!(iof&IORAW))

ksh/src/cmd/ksh93/sh/io.c

Lines 1185 to 1196 in de2b4a6

    
    else if(iof&IOPROCSUB)
    {
    	struct argnod *ap = (struct argnod*)stakalloc(ARGVAL+strlen(iop->ioname));
    	memset(ap, 0, ARGVAL);
    	if(iof&IOPUT)
    		ap->argflag = ARG_RAW;
    	else if(shp->subshell)
    		sh_subtmpfile(shp);
    	ap->argchn.ap = (struct argnod*)fname;
    	ap = sh_argprocsub(shp,ap);
    	fname = ap->argval;
    }

This is never run for redirections as IOPROCSUB usually causes IORAW to be set. This is because process_sub sets the bit for ARG_RAW when the given token is '>' (i.e. OPROCSYM):

ksh/src/cmd/ksh93/sh/parse.c

Line 1386 in de2b4a6

int mode = (tok==OPROCSYM);

ksh/src/cmd/ksh93/sh/parse.c

Line 1391 in de2b4a6

argp->argflag = (ARG_EXP|mode);

ksh/src/cmd/ksh93/sh/parse.c

Lines 1743 to 1747 in de2b4a6

    
    else if(((token==IPROCSYM && !(iof&IOPUT)) || (token==OPROCSYM && (iof&IOPUT))) && !(iof&(IOLSEEK|IOREWRITE|IOMOV|IODOC)))
    {
    	lexp->arg = process_sub(lexp,token);
    	iof |= IOPROCSUB;
    }

ksh/src/cmd/ksh93/sh/parse.c

Lines 1781 to 1782 in de2b4a6

    
    if(lexp->arg->argflag&ARG_RAW)
    	iof |= IORAW;

The IOPROCSUB section in io.c can be moved out of the IORAW if statement as a workaround for this, although ksh will then print the process ID of the asynchronous process:

$ echo ok > >(sed s/ok/good/)
[1] 19032
$ good
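
The control-flow problem described above can be reduced to a few lines of C. This is only a sketch: the flag values and the `resolve_nested`, `resolve_hoisted` and `demo_flags` functions are made up for illustration and are not taken from io.c; only the flag names come from the snippets quoted earlier.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values for illustration only; the real definitions live in the ksh headers. */
enum { IORAW = 1 << 0, IOPUT = 1 << 1, IOPROCSUB = 1 << 2 };

/* Before: the process substitution branch sits inside the !IORAW test. */
const char *resolve_nested(int iof)
{
	if (!(iof & IORAW))
	{
		if (iof & IOPROCSUB)
			return "procsub";	/* never reached when IORAW is also set */
		return "expanded";
	}
	return "raw";			/* the name is used verbatim as a file name */
}

/* After: the IOPROCSUB test is hoisted out, so IORAW can no longer hide it. */
const char *resolve_hoisted(int iof)
{
	if (iof & IOPROCSUB)
		return "procsub";
	if (!(iof & IORAW))
		return "expanded";
	return "raw";
}

/* Usage: the flag combination the parser produces for "> >(cmd)". */
void demo_flags(void)
{
	int iof = IOPUT | IOPROCSUB | IORAW;
	assert(strcmp(resolve_nested(iof), "raw") == 0);
	assert(strcmp(resolve_hoisted(iof), "procsub") == 0);
}
```

With the nested layout, the redirection target is treated as a literal file name, which would match the oddly named file reported at the top of this thread.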

McDutchie · 2020-06-22T00:52:52Z

That's some great sleuthing! Process substitution is actually documented to run the process asynchronously (meaning: as a background process), so that bit is fine – it's just that those processes should be exempt from job control, so the [1] 19032 should not have been printed.

McDutchie · 2020-06-22T01:06:47Z

After moving the IOPROCSUB section as you described above, it now appears to work correctly on non-interactive shells, e.g.:

$ ksh -c 'echo ok > >(sed s/ok/good/); echo $!; wait'
79626
good

The wait is for the parent shell to wait for the asynchronous process to be finished before exiting. $! is set to its PID, as expected.

Unfortunately, moving the block introduces a memory fault somewhere in the io.sh regression tests, which I can't localise right now as it's 3am here. Thanks and goodnight.

JohnoKing · 2020-06-22T01:35:40Z

Ksh prints the process ID of the asynchronous process when process substitution is used in general (in interactive shells), which suggests it is a separate bug:

$ echo ok >(/bin/true)
[1]	11717
ok /dev/fd/5

This doesn't happen in Bash or zsh.

The code for handling process substitution with redirection was never being run because IORAW is usually set when IOPROCSUB is set. This commit fixes the problem by moving the required code out of the !IORAW if statement. The following command now prints 'good' instead of writing 'ok' to a bizarre file:

    $ ksh -c 'echo ok > >(sed s/ok/good/); wait'
    good

This commit also fixes a bug that caused the process ID of the asynchronous process to print when the shell was in interactive mode. The following command no longer prints a process ID, behaving like in Bash and zsh:

    $ echo >(/bin/true)
    /dev/fd/5

src/cmd/ksh93/sh/args.c:
- Temporarily turn off the interactive state while in a process substitution to prevent the shell from printing the PID of the asynchronous process.

src/cmd/ksh93/sh/io.c:
- Move the code for process substitution with redirection into a separate if statement.

src/cmd/ksh93/tests/io.sh:
- Add two tests, one for each process substitution bug fixed by this commit.

src/cmd/ksh93/tests/shtests:
- Update shtests with a patch from Martijn Dekker to use pretty-printing for the output from the times builtin (if it is available).

Fixes ksh93#2

Hopefully this doesn't introduce new bugs, but it does fix at least the following:

1. When whence -v/-a found an "undefined" (i.e. autoloadable) function in $FPATH, it actually loaded the function as a side effect of reporting on its existence (!). Now it only reports.

2. 'whence' will now canonicalise paths properly. Examples:

    $ whence ///usr/lib/../bin//./env
    /usr/bin/env
    $ (cd /; whence -v dev/../usr/bin//./env)
    dev/../usr/bin//./env is /usr/bin/env

3. 'whence' no longer prefixes a spurious double slash when doing something like 'cd / && whence bin/echo'. On Cygwin, an initial double slash denotes a network server, so this was not just a cosmetic problem.

4. 'whence -a' now reports a "tracked alias" (a.k.a. hash table entry, i.e. cached $PATH search) even if an actual alias by the same name exists. This needed fixing because in fact the hash table entry continues to be used when bypassing the alias. Aliases and "tracked aliases" are not remotely the same thing; confusing nomenclature is not a reason to report wrong results.

5. When using 'hash' or 'alias -t' on a command that is also a builtin to force caching a $PATH search for the external command, 'whence -a' double-reported the path:

    $ hash printf; whence -a printf
    printf is a shell builtin
    printf is /usr/bin/printf
    printf is a tracked alias for /usr/bin/printf

   This is now fixed so that the second output line is gone. Plus, if there were multiple versions of the command on $PATH, the tracked alias was reported at the end, which is the wrong order. This is also fixed.

src/cmd/ksh93/bltins/whence.c: whence():
- Refactor the do...while loop that handles whence -v/-a for path searches in such a way that the code actually makes sense and stops looking like higher esotericism. Just doing this fixed #2, #4 and #5 above (the latter two before I even noticed them). For instance, the path_fullname() call to canonicalise paths was already there; it was just never used.
- Remove broken 'notrack' flaggery for deciding whether to report a hash table entry a.k.a. "tracked alias"; instead, check the hash table (shp->track_tree).

src/cmd/ksh93/sh/path.c:
- path_search(): Re #3: When prefixing the PWD, first check if we're in '/' and if so, don't prefix it; otherwise, adding the next slash causes an initial double slash. (Since '/' is the only valid single-character absolute path, all we need to do is check if the second character pwd[1] is non-null.)
- path_search(): Re #1: Stop autoloading when called by 'whence':
  * The 'flag==2' check to avoid autoloading a function was broken. The flag value is 2 on the first whence() loop iteration, but 3 on subsequent ones. Change to 'flag >= 2'.
  * However, this only fixes it if the function file does not have the x permission bit, as executable files are handled by path_absolute() which unconditionally autoloads functions! So, pass on our flag parameter when calling path_absolute().
- path_absolute(): Re #1: Add flag parameter. Do not autoload functions if flag >= 2.

src/cmd/ksh93/include/path.h, src/cmd/ksh93/bltins/typeset.c, src/cmd/ksh93/sh/main.c, src/cmd/ksh93/sh/xec.c:
- Re #1: Update path_absolute() calls, adding a 0 flag parameter.

src/cmd/ksh93/include/name.h:
- Remove now-unused pathcomp member from union Value. It was introduced in 9906535 to allow examining the value of a tracked alias. This commit uses nv_getval() instead.

src/cmd/ksh93/tests/builtins.sh, src/cmd/ksh93/tests/path.sh:
- Add and tweak various related tests.

Fixes: #84
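
The pwd[1] check described for the double-slash fix can be illustrated with a small stand-alone helper. This is a sketch under assumptions: `join_pwd` and `demo_join_pwd` are hypothetical names, not functions from path.c, and the helper assumes pwd is a non-empty absolute path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Join an absolute working directory and a relative name into buf.
 * Returns 0 on success, -1 if the result does not fit.
 * join_pwd is a hypothetical helper, not a function from path.c.
 */
int join_pwd(char *buf, size_t size, const char *pwd, const char *name)
{
	/* "/" is the only one-character absolute path, so an empty pwd[1]
	   means we are in the root directory and must not add another slash */
	const char *sep = pwd[1] ? "/" : "";
	int n = snprintf(buf, size, "%s%s%s", pwd, sep, name);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

void demo_join_pwd(void)
{
	char buf[64];
	assert(join_pwd(buf, sizeof buf, "/", "bin/echo") == 0);
	assert(strcmp(buf, "/bin/echo") == 0);		/* no leading double slash */
	assert(join_pwd(buf, sizeof buf, "/usr", "bin/env") == 0);
	assert(strcmp(buf, "/usr/bin/env") == 0);
}
```

Checking a single character avoids a strcmp() against "/" and relies only on the fact stated in the commit message, that '/' is the one valid single-character absolute path.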

For one Red Hat customer, the following reproducer consistently crashed, though I was not able to reproduce it and neither was RH. However, the crash analysis is sound (see below).

    function dlog
    {
        fc -ln -0
    }
    trap dlog DEBUG
    >/tmp/blah

Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-arraylen.patch

The Red Hat bug thread is closed to the public as it also contains some correspondence with their customer. But it has an excellent crash analysis from Thomas Gardner which I'm including here for the record (the line numbers are for their ksh at the time, not 93u+m).

===begin analysis===
> The creation of an empty file instead of a command that executes
> anything causes the coredump.
[...]
> Here is my analysis on the core that was provided by the customer:
>
> (gdb) bt
> #0 sh_fmtq (string=0x1 <Address 0x1 out of bounds>)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/string.c:340
> #1 0x0000000000457e96 in out_string (cp=<value optimized out>, c=32,
>     quoted=<value optimized out>, iop=<value optimized out>)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:444
> #2 0x000000000045804c in sh_debug (shp=0x76d180, trap=0x7f2f13a821e0 "dlog",
>     name=<value optimized out>, subscript=<value optimized out>,
>     argv=0x76e070, flags=<value optimized out>)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:548
> #3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> [...need go no further...]
>
> In frame 2, we can see it cycling through your classic
> (char **)argv array like:
>
> 543 while(cp = *argv++)
> 544 {
> 545     if((flags&ARG_EXP) && argv[1]==0)
> 546         out_pattern(iop, cp,' ');
> 547     else
> 548         out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv);
> 549 }
> 550 if(flags&ARG_ASSIGN)
> 551     sfputc(iop,')');
> 552 else if(iop==stkstd)
>
> (we seg-fault after going down the out_string function in line
> 548 up there). The string pointer that points to = 0x1 up in
> frame #0 (sh_fmtq) traces back to the "cp" variable in line 548
> up there. The "argv" variable being referenced up there just gets
> passed in as the fifth argument to this function.
>
> In frame #3 (sh_exec, line 1265), the line that makes the call
> that takes us to frame 2 is:
>
> 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
>
> so "com" (the fifth argument) is what's going wrong as it
> descends down through these calls. Looking at where it comes
> from, well, it's assigned here:
>
> 1241 if(argn==0)
> 1242 {
> 1243     /* fake 'true' built-in */
> 1244     np = SYSTRUE;
> 1245     *argv = nv_name(np);
> 1246     com = argv;
> 1247 }
>
> because as we can see:
>
> (gdb) f 3
> #3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
> (gdb) p argn
> $2 = 0
> (gdb)
>
> argn is == 0 here. The tip-off here is that nv_name clearly
> returns a simple pointer to an array of characters, not an array
> of pointers to arrays of characters as is evidenced by the fact
> that the assignment is "*argv = nv_name(np);" not "argv =
> nv_name(np);". Looking at the function nv_name proves that it
> does indeed return a single pointer to an array of characters,
> not a pointer to an array of pointers to arrays of characters.
> Now, com is defined as a 'char **':
>
> 1002 char *cp=0, **com=0, *comn;
>
> (as it is expected to be in the calls that follow) also, that
> argv is also defined as the effective equivalent a 'char **':
>
> 1237 static char *argv[1];
>
> Yup, argv is actually an array of pointers (char ** equivalent),
> but that array is restricted to having exactly one element.
> Recalling the assignment in the previously quoted line:
>
> 1245 *argv = nv_name(np);
>
> we see that the one and only element in that argv array is
> getting assigned a pointer to an array of characters here.
> Nothing necessarily wrong with that, but remember the loop we
> looked at earlier in frame #2 (sh_debug). It went like:
>
> 543 while(cp = *argv++)
> 544 {
> 545     if((flags&ARG_EXP) && argv[1]==0)
> 546         out_pattern(iop, cp,' ');
> 547     else
> 548         out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv);
> 549 }
>
> which is clearly expecting argv in this context (com in frame 3,
> which really points to that static local single element array
> that is also pointed to by argv in frame 2) to be an array of
> pointers of indefinite size, each element being a pointer, but
> whose last element will be a null pointer. Well, in frame 3 it is
> clearly an array with only a single element, and that one element
> is NOT pointing to null. Watch this:
>
> (gdb) f 3
> #3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
> (gdb) p com
> $8 = (char **) 0x76e060
> (gdb) p &argv
> $9 = (char *(*)[1]) 0x76e060
> (gdb) p com[0]
> $11 = 0x5009c6 "true"
> (gdb) p com[1]
> $10 = 0x1 <Address 0x1 out of bounds>
> (gdb) p argv[0]
> $12 = 0x5009c6 "true"
> (gdb) p argv[1]
> $13 = 0x1 <Address 0x1 out of bounds>
> (gdb)
>
> So, as expected, com and &argv point to the same place, the first
> element points to the constant string "true", but since the array
> is defined as having only one element, when you refer to a second
> element in that array, you get well, whatever random crap happens
> to be in that memory location. When we try to reproduce this
> problem, apparently we're getting 0 there (or we're not quite
> following this same code path, which is also possible), but the
> customer happens to have a "1" in that memory location.
===end analysis===

src/cmd/ksh93/sh/xec.c: sh_exec():
- When processing TCOM (simple command) with an empty/null command, increase the size of the static dummy argv[1] array to argv[2], ensuring a terminating NULL element so that 'while(cp = *argv++)' loops don't crash. (Note that static objects are automatically initialised to zero in C.)

src/cmd/ksh93/tests/io.sh:
- Adapt the reproducer, testing a null-command redirection 1000x.
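
The fix described above comes down to giving the dummy array a second slot that stays NULL. The following is a minimal sketch of that idea, not the actual xec.c code: `count_args`, `fake_true_command` and `demo_fake_true` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count entries the way the quoted sh_debug() loop walks them: stop at the NULL terminator. */
size_t count_args(char **argv)
{
	size_t n = 0;
	while (*argv++ != NULL)
		n++;
	return n;
}

/* Hypothetical stand-in for the "fake 'true' built-in" branch of sh_exec(). */
char **fake_true_command(void)
{
	static char name[] = "true";
	/* Two slots: [0] holds the name; [1] is never written and stays NULL,
	   because objects with static storage duration are zero-initialised. */
	static char *argv[2];
	argv[0] = name;
	return argv;
}

void demo_fake_true(void)
{
	char **com = fake_true_command();
	assert(com[1] == NULL);		/* the terminator the loop depends on */
	assert(count_args(com) == 1);
}
```

With a one-element array, the read of the second slot would go past the end of the object, which is exactly the out-of-bounds 0x1 value seen in the gdb session.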

Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch

Prior discussion:
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01037.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01038.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01042.html
https://bugzilla.redhat.com/1212992

On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote:
> I have a user with a ksh crashing problem, and that has
> some "Write error: No space left on device" messages
> in /var/log/messages.
>
> After some debugging, and creating a chroot on a file
> disk image, and a test user, and slowly filling the
> "on file" filesystem, e.g.
>
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2
>
> until leaving just around 12K, I managed to reproduce the
> problem, and be able to debug it with valgrind and vgdb;
> debugging on these conditions is tricky, as cannot tell
> valgrind to spawn gdb, because then gdb itself would fail
> to start.
>
> So, after following the code enough, I learned that at places
> it handles SH_JMPEXIT, there was almost non existing
> handling of SH_JMPERREXIT.
>
> ksh would evently cause a crash due to the struct
> subshell allocated on stack, in sh/subshell.c:sh_subshell
> kept set to the global subshell_data, after it siglongjmp
> back the stack due to, not fully handling the out of disk
> space errors. It would print a few messages, everytime
> a pipe was created, e.g.:
>
> /etc/profile: line 28: write to 3 failed [No space left on device]
>
> until eventually crashing due to corrupted memory; e.g. the
> references to stack data from sh_subsell in the global
> subshell_data. One strange thing to me in coredump analysis
> was that subshell_data prev field was pointing to itself when
> it eventually crashed, what later was understood and expected...
>
> The attached patch handles SH_JMPERREXIT in the code
> paths SH_JMPEXIT is handled, and the failed login, on
> full disk, ends in a pause() call:
>
> ---terminal 1---
> $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full
> --vgdb-error=0 /bin/ksh -l
> ==17730== (action at startup) vgdb me ...
> ==17730==
> ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
> ==17730== /path/to/gdb /bin/ksh
> ==17730== and then give GDB the following command
> ==17730== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
> ==17730== --pid is optional if only one valgrind process is running
> ==17730==
> ==17730== Syscall param mount(type) points to unaddressable byte(s)
> ==17730== at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
> ==17730== by 0x493E58: fs3d_mount (fs3d.c:115)
> ==17730== by 0x493C8B: fs3d (fs3d.c:57)
> ==17730== by 0x423E41: sh_init (init.c:1302)
> ==17730== by 0x405CD3: sh_main (main.c:141)
> ==17730== by 0x405B84: main (pmain.c:45)
> ==17730== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17730==
> ==17730== (action on error) vgdb me ...
> ==17730== Continuing ...
> /etc/profile: line 28: write to 3 failed [No space left on device]
> ---8<---
>
> ---terminal 2---
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> #1 0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
> #2 0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
> #3 0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0,
> file=0x524804 <e_sysprofile> "/etc/profile")
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
> #4 0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
> #5 0x0000000000405b85 in main (argc=2, argv=0xfff000498) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
> (gdb)
> ---8<---

Currently, running ksh under ASan without the ASAN_OPTIONS variable set to 'detect_leaks=0' usually ends with ASan complaining about a memory leak in defpathinit() (this leak doesn't grow in a loop, so no regression test was added to leaks.sh).

Reproducer:

    $ ENV=/dev/null arch/*/bin/ksh
    $ cp -?
    cp: invalid option -- '?'
    Try 'cp --help' for more information.
    $ exit

    =================================================================
    ==225132==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 85 byte(s) in 1 object(s) allocated from:
        #0 0x7f6dab42d459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x5647b77fe144 in sh_calloc /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/init.c:265
        #2 0x5647b788fea9 in path_addcomp /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/path.c:1567
        #3 0x5647b78911ed in path_addpath /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/path.c:1705
        #4 0x5647b7888e82 in defpathinit /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/path.c:442
        #5 0x5647b78869f3 in ondefpath /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/path.c:67
    --- cut ---
    SUMMARY: AddressSanitizer: 174 byte(s) leaked in 2 allocation(s).

Analysis: The previous code was leaking memory because defpathinit() returns a pointer from path_addpath(), which has memory allocated to it in path_addcomp(). This is the code ASan traced as having allocated memory:

    442: return(path_addpath((Pathcomp_t*)0,(defpath),PATH_PATH));

In path_addpath():

    1705: first = path_addcomp(first,old,cp,type);
    [...]
    1729: return(first);

In path_addcomp():

    1567: pp = sh_newof((Pathcomp_t*)0,Pathcomp_t,1,len+1);

The ondefpath() function doesn't save a reference to the pointer returned by defpathinit(), which causes the memory leak:

    66: if(!defpath)
    67:     defpathinit();

The changes in this commit avoid this problem by setting the defpath variable without also calling path_addpath().

src/cmd/ksh93/sh/path.c:
- Move the code for allocating defpath from defpathinit() into its own dedicated function called std_path(). This function is called by defpathinit() and ondefpath() to obtain the current string stored in the defpath variable.

This bugfix is adapted from a fork of ksh2020: l0stman/ksh@db5c83a6
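
The shape of this fix can be sketched in isolation. The function names std_path(), defpathinit() and ondefpath() come from the commit message, but the bodies below are assumptions made for illustration: `struct pathcomp`, `path_free`, the `live_lists` counter and the hard-coded path string are all hypothetical and much simpler than the real path.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pathcomp { const char *name; };	/* hypothetical, much-reduced Pathcomp_t */

static const char *defpath;	/* cached default path string */
static int live_lists;		/* live path lists, tracked for illustration only */

/* Only computes and caches the string; allocates nothing. */
const char *std_path(void)
{
	if (!defpath)
		defpath = "/bin:/usr/bin";	/* assumed value; the real shell asks the system */
	return defpath;
}

/* Builds a path list from the cached string; the caller owns the result. */
struct pathcomp *defpathinit(void)
{
	struct pathcomp *pp = malloc(sizeof *pp);
	if (!pp)
		return NULL;
	pp->name = std_path();
	live_lists++;
	return pp;
}

/* Hypothetical cleanup helper so the sketch can show balanced ownership. */
void path_free(struct pathcomp *pp)
{
	if (pp)
	{
		free(pp);
		live_lists--;
	}
}

/* Needs only the string. Calling defpathinit() here and ignoring its
   result would allocate a list that nothing ever frees. */
const char *ondefpath(void)
{
	return std_path();
}

void demo_defpath(void)
{
	assert(strcmp(ondefpath(), "/bin:/usr/bin") == 0);
	assert(live_lists == 0);		/* no list was allocated */
	struct pathcomp *pp = defpathinit();	/* a caller that wants the list keeps the pointer */
	assert(pp != NULL && live_lists == 1);
	path_free(pp);
	assert(live_lists == 0);
}
```

Separating "get the string" from "build the list" means a caller that only needs the string never takes ownership of an allocation it would then have to free.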

This commit adds two fixes for the trap command:
l0stman/ksh@7da7c97
l0stman/ksh@2033375

The changes are as follows:

- sh_sigreset(): Fixed a few memory leaks inside of this function. One of the leaks can be reproduced under ASan by trying to run a non-existent test with the shtests script:

    $ bin/shtests nosuchtest
    Direct leak of 22 byte(s) in 1 object(s) allocated from:
        #0 0x7f56a380d279 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:145
        #1 0x55ae1dc22ac3 in _ast_strdup /home/johno/GitRepos/KornShell/ksh/src/lib/libast/string/strdup.c:64
        #2 0x55ae1da6cf5e in sh_strdup /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/init.c:270
        #3 0x55ae1db7e964 in b_trap /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/trap.c:172
    --- cut ---

- job_chldtrap(): Fixed a use after free bug in the for loop. The string pointed to by sh.st.trapcom[SIGCHLD] may be freed from memory after sh_trap(), so it must be reread each time sh_trap() is called from within the for loop.
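
The use-after-free pattern in the second item can be shown with a reduced model. Everything here is an assumption made for illustration: `trapcom` stands in for sh.st.trapcom[SIGCHLD], `run_trap` for sh_trap(), and `set_trap`, `chld_trap` and `demo_chld_trap` are hypothetical names that do not exist in jobs.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *trapcom;	/* hypothetical stand-in for sh.st.trapcom[SIGCHLD] */
static int runs;	/* how many times the trap ran */

/* Replace the trap action with a heap copy of s, freeing the old string. */
void set_trap(const char *s)
{
	char *copy = malloc(strlen(s) + 1);
	if (!copy)
		return;
	strcpy(copy, s);
	free(trapcom);
	trapcom = copy;
}

/* Stand-in for sh_trap(): running the action may itself change the trap. */
void run_trap(const char *action)
{
	runs++;
	if (strcmp(action, "first") == 0)
		set_trap("second");	/* frees the string that action points to */
}

/* Re-read trapcom on every pass instead of caching it before the loop. */
void chld_trap(int pending)
{
	while (pending-- > 0)
	{
		char *cur = trapcom;	/* fresh read: the previous pass may have freed the old string */
		if (!cur)
			break;
		run_trap(cur);
	}
}

void demo_chld_trap(void)
{
	set_trap("first");
	chld_trap(3);
	assert(runs == 3);
	assert(strcmp(trapcom, "second") == 0);
	free(trapcom);
	trapcom = NULL;
}
```

Had the loop cached the pointer once before the first pass, the second pass would have handed a freed string to the trap runner.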

I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation:

    $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
    0 1 2 3 4 5 6 7 Illegal instruction

Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below.

I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway.

The double-fork issue discussed in e3d7bf1 remains.
________
For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting.

    $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
    (lldb) target create "arch/darwin.i386-64/bin/ksh"
    Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64).
    (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done"
    (lldb) run
    error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'.
    (lldb) process launch
    Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64)
    0 1 2 3 4 5 6 7 8 9 Process 35038 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
        frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
    libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
    ->  0x7fff70deb1c2 <+23>: ud2
    libsystem_platform.dylib`_os_unfair_lock_unowned_abort:
        0x7fff70deb1c4 <+0>: movl %edi, %eax
        0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread"
        0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8
    Target 0: (ksh) stopped.
    (lldb) bt
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
      * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
        frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239
        frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188
        frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66
        frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99
        frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30
        frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13
        frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8
        frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9
        frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3
        frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29
        frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12
        frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17
        frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16
        frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4
        frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9
        frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4
        frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4
        frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2
        frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9
        frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1

The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in 59a5672. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled):

    ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100
    WRITE of size 8 at 0x6150000005b0 thread T0
    #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967
    #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349
    #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642
    #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613
    #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561
    #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586
    #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438
    #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635
    #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932
    #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651
    #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604
    #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369
    #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41
    #17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf)
    #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389)
    #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115

Code in question:
https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968

To avoid any more similar crashes, all of the fixes introduced in 7e317c5 that set slp->slptr to null have been improved with the fix in 59a5672.

The isaname, isaletter, isadigit, isexp and ismeta macros don't check if c is a negative value before accessing sh_lexstates. This can result in ASan crashing because of a buffer overflow in quoting2.sh when running in a multibyte locale:

    test quoting2(C.UTF-8) begins at 2022-09-23+14:03:12
    =================================================================
    ==262224==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557b201a451f at pc 0x557b1fe5e6fc bp 0x7fffcf1ac700 sp 0x7fffcf1ac6f8
    READ of size 1 at 0x557b201a451f thread T0
    #0 0x557b1fe5e6fb in sh_fmtq /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:341:5
    #1 0x557b1fe6098c in sh_fmtqf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:473:10
    #2 0x557b1ff08dc0 in extend /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:998:14
    #3 0x557b2008a56c in sfvprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfvprintf.c:531:8
    #4 0x557b2005b7f7 in sfprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfprintf.c:31:7
    #5 0x557b1ff04272 in b_print /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:343:4
    #6 0x557b1ff04ebf in b_printf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:148:9
    #7 0x557b1fe8d9a7 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1261:21
    #8 0x557b1fe7a7cf in sh_subshell /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/subshell.c:652:4
    #9 0x557b1fdedc0d in comsubst /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:2207:9
    #10 0x557b1fdefc79 in varsub /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:1181:3
    #11 0x557b1fde3bef in copyto /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:620:21
    #12 0x557b1fde0b07 in sh_mactrim /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:169:2
    #13 0x557b1fe05ab6 in nv_setlist /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:280:9
    #14 0x557b1fe8a7e8 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1051:7
    #15 0x557b1fe95b85 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1940:5
    #16 0x557b1fe99ea6 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2271:10
    #17 0x557b1fd23b04 in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604:4
    #18 0x557b1fd1fe10 in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369:2
    #19 0x557b1fd1d585 in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41:9
    #20 0x7f55d5b5028f (/usr/lib/libc.so.6+0x2328f) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82)
    #21 0x7f55d5b50349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82)
    #22 0x557b1fc158d4 in _start /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115

src/cmd/ksh93/include/lexstates.h:
- Check if c is negative before accessing sh_lexstates.

Backported from ksh2020: att@a7013320. I'll note that later in ksh2020 these macros became functions: att@adc589de. I didn't backport that commit because it requires the C99 bool type to avoid compiler warnings.

The name.c change in that commit was ineffective at fixing the crash; I was misled by the crash being intermittent. The same crash in path_unsetfpath() continues to occur sometimes when FPATH is set in bin/package.

This commit special-cases FPATH in sh_reinit(), which is used to (partially) reinitialise the shell after forking and before running a shell script without a hashbang path. It is a bit of a hack, but then again the entire sh_reinit() thing is a hack that I plan to get rid of eventually; for now, it just needs to not crash.

src/cmd/ksh93/sh/name.c: sh_envnolocal():
- Revert the ineffective fix.

src/cmd/ksh93/sh/init.c: sh_reinit():
- Before starting reinit, save FPATH's value if it has the export attribute, then unset FPATH.
- After reinit is done, restore FPATH's value and export attribute if the value was saved. This should reinitialise sh.fpathdict.

Fingers crossed...

When starting up ksh with ASan the following crash occurs during init:

    ==61757==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffff44090e0 at pc 0x55555560080b bp 0x7fffffffbbf0 sp 0x7fffffffbbe0
    READ of size 16 at 0x7ffff44090e0 thread T0
    #0 0x55555560080a in put_rand /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/init.c:688
    #1 0x5555555b5747 in nv_putv /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:144
    #2 0x555555668b17 in nv_putval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:1641
    #3 0x55555560bac7 in nv_init /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/init.c:1914
    #4 0x555555605ff9 in sh_init /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/init.c:1365
    --- cut ---
    Address 0x7ffff44090e0 is located in stack of thread T0 at offset 32 in frame
    #0 0x55555560a6d9 in nv_init /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/init.c:1836
    This frame has 1 object(s):
    [32, 40) 'd' (line 1837) <== Memory access at offset 32 partially overflows this variable

src/cmd/ksh93/sh/init.c:
- nv_init(): The 'd' variable used for $SECONDS and $RANDOM must also be Sfdouble_t to avoid misalignment and buffer overflows.

Under ASan a buffer overflow in tab completion causes an intermittent crash when using command completion on certain multibyte filenames. This can cause a test in pty.sh to fail, albeit rarely. A reproducer has been provided below:

    $ touch /tmp/ダーツ
    $ ls /tmp/ダー<Press Tab once or twice>
    ==135434==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5555558fe87c at pc 0x55555574751b bp 0x7fffffffb100 sp 0x7fffffffb0f0
    READ of size 1 at 0x5555558fe87c thread T0
    #0 0x55555574751a in find_begin /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/edit/completion.c:229
    #1 0x55555574801c in ed_expand /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/edit/completion.c:323
    #2 0x5555556e12df in escape /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/edit/emacs.c:979
    #3 0x5555556dd834 in ed_emacsread /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/edit/emacs.c:628
    --- cut ---

This crash can also manifest in the pty tests, as seen below (it's the third test failure):

    test pty(C.UTF-8) begins at 2024-01-22+11:40:14
    pty.sh[1027]: FAIL: suspend a blocked write to a FIFO: line 1040: expected "^\^C.*: testfifo: cannot create \[.*\]\r\n$", got "^C"
    pty.sh[1027]: FAIL: suspend a blocked write to a FIFO: line 1041: read timeout
    pty.sh[1188]: FAIL: vi completion from wide produces corrupt characters: line 1195: expected "^:test-1: cd vitest/aあb/\r\n$", got ":test-1: cd vitest/a=================================================================\r\n"
    test pty(C.UTF-8) failed at 2024-01-22+11:41:17 with exit code 3 [ 60 tests 3 errors ]

(As a side note, the other test failures and the lockup that occur are caused by ksh using signal(2) rather than POSIX's sigaction(2). That will be fixed in a separate commit.)

In the find_begin function the crash occurs at the ismeta macro, which attempts to access the ST_NAME lexstate using a value of c == 12540 (outside of the valid range):

    edit/completion.c:
        if(!inquote && ismeta(c))
    include/lexstates.h:
        #define ismeta(c) ((c) < 0 ? 0 : sh_lexstates[ST_NAME][c] == S_BREAK)
    data/lexstates.c:
        /*
         * ST_NAME
         * This state is for identifiers
         */
        static const char sh_lexstate1[256] =

src/cmd/ksh93/include/lexstates.h:
- Modify the macros to reject excessive values for 'c' (re: 2f7faf6).
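A lookup macro that rejects both negative and too-large values of c could look like the sketch below. It is not the actual lexstates.h change: demo_table, S_BREAK_DEMO, in_table(), ismeta_demo() and is_meta_checked() are hypothetical names, and the table contents are made up for illustration.

```c
#include <assert.h>

enum { S_BREAK_DEMO = 1 };

/* Hypothetical 256-entry state table, standing in for sh_lexstate1[256]. */
static const char demo_table[256] = {
    ['|'] = S_BREAK_DEMO,
    [';'] = S_BREAK_DEMO,
    ['&'] = S_BREAK_DEMO,
};

/* True only when c is a valid index into the table. */
#define in_table(c)     ((c) >= 0 && (c) < (int)sizeof(demo_table))

/* Range-checked lookup: out-of-range values are simply "not meta". */
#define ismeta_demo(c)  (in_table(c) && demo_table[(c)] == S_BREAK_DEMO)

/* Function wrapper so the macro is evaluated with an int argument,
 * as it would be for a wide character code. */
int is_meta_checked(int c)
{
    return ismeta_demo(c);
}
```

With this shape, the value 12540 from the trace above yields 0 instead of indexing past the end of the table, and a negative c is handled by the same test.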

When ksh executes a script without a #! path (note that the AT&T team had a real disliking for #! paths), ksh forks and goes through a quick reinitialisation procedure. This is much faster than invoking a fully new shell but should have the same effect if it all works well. Unfortunately it's not worked all that well so far. Even after recent improvements (see referenced commits) I've been finding corner case problems.

FYI, running a script without #! basically goes like this:
* in path_spawn(), execve() fails with ENOEXEC because the file is not a binary executable and does not start with #!
* this triggers 'case ENOEXEC:' which:
  * forks ksh
  * calls exscript()
* exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT)
* SH_JMPSCRIPT is the highest longjmp value, so *all* the previous sigsetjmp/sh_pushcontext calls are unwound in reverse order, triggering all sorts of cleanup, state restoration, removal of local scopes, etc.
* eventually, this lands us at the top sigsetjmp in sh_main()
* sh_main() calls sh_reinit(), then resumes as if the shell had just been started

This commit makes the following interrelated changes for the correct functioning of this procedure:

1. exscript() now exports the environment into a dedicated Stk_t buffer and sets environ[] to that.
2. Instead of re-using existing variables, sh_reinit() deletes everything and reinits all name-value trees from scratch, then re-imports the environment from environ[].
3. Variable values that were imported from the environment are no longer treated specially with an NV_IMPORT attribute and the np->nvenv pointer to their value in environ[], fixing at least one crash.[*1]

Details of the changes follow:

src/cmd/ksh93/sh/path.c:
- exscript(): Generate a new environ[] by activating a dedicated AST stack that will not be overwritten before calling sh_envgen(). This will allow sh_reinit() to delete all variables and then reimport the environment. The exporting must be done here, before siglongjmp, otherwise locally scoped exported variables won't be included (siglongjmp with SH_JMPSCRIPT triggers cleanup of all scopes).

src/cmd/ksh93/sh/init.c:
- sh_reinit(): Largely rewritten as follows.
  - Reset shell options first. This has the beneficial side effect of unsetting SH_RESTRICTED which interferes with unsetting certain variables, like PATH.
  - Remove workarounds for FPATH, SHLVL and tilde expansion disciplines; these will not be needed now.
  - Properly unset and delete all functions and built-ins. Since we now unset a function before deleting it, this should now free up their memory. (See nvdisc.c below for a change allowing removal of special built-ins.)
  - Properly unset all variables (which includes any associated discipline functions). Incorporate here the needed logic from sh_envnolocal() in name.c; most of it is unneeded (that function was previously used to cleanup local variables but has not been used for that for decades). So sh_envnolocal() is now unused.
  - Delete variables in a separate pass after unsetting variables and unsetting and deleting functions; this avoids use-after-free problems as well as possible "no parent" problems with namespace variables (e.g., .rc.status in our new kshrc.sh).
  - After all that, close and free up all function, alias, tracked alias, type and variable trees.
  - Free the contiguous built-in node space and the Init_t init context (with all the special variable discipline pointers).
  - Call nv_init (previously only called from sh_init) to reinitialise all of the above name-value stuff from scratch. It's the only way to be sure.
  - Re-import the environment as stored by exscript() above.
- env_init():
  - Per item 3 above and footnote 1 below, no longer set NV_IMPORT attribute and no longer point np->nvenv to the item in environ.
  - POSIX says, for 'environ': "Any application that directly modifies the pointers to which the environ variable points has undefined behavior."[*2] Yet, env_init() is indeed juggling the environ[] pointers to deal with variables that cannot be imported because their names are invalid (because they still need to be saved to be passed on to child processes). Replace the current approach with one where those env vars get allocated on the heap, pointed to by sh.save_env and counted by sh.save_env_n (renamed from sh.nenv). This only needs to be done once as ksh cannot use or change these variables.

src/cmd/ksh93/sh/name.c:
- sh_envgen(): Update to match env_init() change above.
- pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute check as per above and never get the value from the nvenv pointer -- simply always use nv_getval(). As of this change, the NV_IMPORT attribute is unused. The next commit will remove it and do related cleanups.
- staknam(): is only called if value!=NULL, so remove that 'if'.
- sh_envnolocal(): Removed.

src/cmd/ksh93/sh/nvdisc.c:
- assign(): Remove a check for the SH_INIT state bit that avoids freeing functions during sh_reinit(). This works fine now.
- sh_addbuiltin(): Allow sh_reinit() to delete special builtins by checking for the SH_INIT state bit before throwing an error.

src/cmd/ksh93/sh/nvtree.c:
- outval(): Add a workaround for a use-after-free, introduced by the changes above, that occurred in the types.sh tests for #!-less scripts (types.sh:675-722). The use-after-free occurred here (abridged ASan trace follows; line numbers are current as of this commit):

    ==30849==ERROR: AddressSanitizer: heap-use-after-free [...]
    #0 in dttree dttree.c:393
    #1 in sh_reinit init.c:1637
    #2 in sh_main main.c:136
    [...]

  The pointer was freed in the same loop via nv_delete() in outval:

    #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...])
    #1 in nv_delete name.c:1318
    #2 in outval nvtree.c:731
    #3 in genvalue nvtree.c:905
    #4 in walk_tree nvtree.c:1042
    #5 in put_tree nvtree.c:1108
    #6 in nv_putv nvdisc.c:144
    #7 in _nv_unset name.c:2437
    #8 in sh_reinit init.c:1645
    #9 in sh_main main.c:136
    [...]

  So, what happened was that the nv_delete() call on name.c:1318 (eventually resulting from the _nv_unset call on init.c:1645) freed the node pointed to by np, so that the next loop iteration crashed on line 1637 as the dtnext() macro now gets a freed np. Now, why on earth should _nv_unset() *ever* indirectly call nv_delete()? That's a question for another day; I suspect it may be a bug, or it may be needed for compound variables for some reason. For now, I'm adding a workaround: simply avoid calling nv_delete() if the SH_INIT state bit is on, indicating sh_reinit() is in the call stack. This allows the variables unset loop in sh_reinit() to continue without crashing. sh_reinit() handles deletion later anyway.

src/cmd/ksh93/sh/main.c:
- sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these are known to be 0, coming from either sh_init() or sh_reinit().
________
[*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient ksh code; the imported value is easily available as a normal shell variable value via nv_getval(). Plus, the nvenv pointer is overloaded with too many other purposes: so far I've discovered it's used for pointers to subarrays of arrays (multidimensional arrays), compound variables, builtins, and other things. This mess caused at least one crash in set_instance() (xec.c) due to incorrectly using that nvenv pointer. The current kshrc script triggers this. Reproducer:

    $ export PS1
    $ bin/package use
    «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1

...and crash. That is now fixed.

[*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
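The env_init() change described above (copying un-importable entries to the heap rather than rearranging the environ[] pointers) can be sketched roughly as follows. This is an illustration, not the ksh code: struct saved_env is a hypothetical analogue of sh.save_env plus sh.save_env_n, and importable() uses a simplified identifier rule (letters, digits and underscores, not starting with a digit) that is an assumption and may differ from what ksh accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if entry looks like NAME=value with NAME a plain identifier.
 * Simplified rule, assumed for this sketch only. */
int importable(const char *entry)
{
    const unsigned char *p = (const unsigned char *)entry;
    if (!(isalpha(*p) || *p == '_'))
        return 0;
    while (isalnum(*p) || *p == '_')
        p++;
    return *p == '=';
}

/* Hypothetical analogue of sh.save_env and sh.save_env_n. */
struct saved_env {
    char **items;   /* heap copies of entries that cannot be imported */
    size_t n;       /* number of valid pointers in items */
};

/* Copies every un-importable entry to the heap. The envp array and the
 * strings it points to are only read, never modified or reordered. */
struct saved_env save_unimportable(char *const envp[])
{
    struct saved_env saved = { NULL, 0 };
    size_t total = 0;

    while (envp[total] != NULL)
        total++;
    saved.items = calloc(total + 1, sizeof *saved.items);
    assert(saved.items != NULL);

    for (size_t i = 0; i < total; i++) {
        if (importable(envp[i]))
            continue;
        size_t len = strlen(envp[i]) + 1;
        char *copy = malloc(len);
        assert(copy != NULL);
        memcpy(copy, envp[i], len);
        saved.items[saved.n++] = copy;
    }
    return saved;
}

void free_saved_env(struct saved_env *saved)
{
    for (size_t i = 0; i < saved->n; i++)
        free(saved->items[i]);
    free(saved->items);
    saved->items = NULL;
    saved->n = 0;
}
```

When the environment for a child process is generated later, the saved entries can be appended after the exported shell variables, so they are still passed on even though the shell never imports them.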

The referenced commit left one test unexecuted because it crashes. Minimal reproducer:

    typeset -a arr=((a b c) 1)
    got=$( typeset -a arr=( ( ((a b c)1))) )

The crash occurs when the array is redefined in a subshell. Here are abridged ASan stack traces for the crash, for the use after free, and for when it was freed:

    =================================================================
    ==73147==ERROR: AddressSanitizer: heap-use-after-free [snippage]
    READ of size 8 at 0x000107403eb0 thread T0
    #0 0x104fded40 in nv_search nvdisc.c:1007
    #1 0x104fbeb1c in nv_create name.c:860
    #2 0x104fb8b9c in nv_open name.c:1440
    #3 0x104fb1edc in nv_setlist name.c:309
    #4 0x104fb4a30 in nv_setlist name.c:475
    #5 0x105055b58 in sh_exec xec.c:1079
    #6 0x105045cd4 in sh_subshell subshell.c:654
    #7 0x104f92c1c in comsubst macro.c:2266
    [snippage]
    0x000107403eb0 is located 0 bytes inside of 80-byte region [snippage]
    freed by thread T0 here:
    #0 0x105c5ade4 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ede4)
    #1 0x105261da0 in dtclose dtclose.c:52
    #2 0x104f178cc in array_putval array.c:671
    #3 0x104fd7f4c in nv_putv nvdisc.c:144
    #4 0x104fbc5f0 in _nv_unset name.c:2435
    #5 0x104fb3250 in nv_setlist name.c:364
    #6 0x105055b58 in sh_exec xec.c:1079
    #7 0x105045cd4 in sh_subshell subshell.c:654
    #8 0x104f92c1c in comsubst macro.c:2266
    [snippage]

So the crash is caused because array_putval (array.c:671) calls dtclose, freeing ap->table, which is then reused after a recursive nv_setlist call via nv_open() -> nv_create() -> nv_search(). This only happens when we're in a virtual subshell.

src/cmd/ksh93/sh/array.c:
- array_putval(): When redefining an array in a virtual subshell, do not free the old ap->table; it will be needed by the parent shell environment.

McDutchie transferred this issue from another repository Jun 12, 2020

McDutchie added the bug (Something is not working), help wanted (Extra attention is needed) and blocker (This had better be fixed before releasing) labels Jun 12, 2020

McDutchie changed the title ~~>(Process substitution) utterly b0rken~~ >(Process substitution) b0rken when combined with redirection Jun 18, 2020

McDutchie closed this as completed in 0aa9e03 Jun 23, 2020

posguy99 mentioned this issue Jan 27, 2021

Current HEAD (8e45daea) fails to build on Catalina 10.15.7 #164

Closed

McDutchie mentioned this issue Feb 8, 2021

redirect output to process substitution fails ksh-community/ksh#18

Closed

posguy99 mentioned this issue Feb 24, 2021

Tab completion menu causes inconsistent editor state in emacs #71

Closed

JohnoKing mentioned this issue Apr 4, 2021

Process substitution as file name to redirection is not compiled by shcomp #165

Closed
