-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ksh segfaults in job_chksave after receiving SIGCHLD #14
Conversation
Many thanks for the pull request and the explanation @aweeraman, but could you include the explanation (including all the reference URLs) in your commit message itself and do a force-push, please? See policy point 6 in the main README.md. I know the issue number does get included when I merge (if I choose to have a merge commit), but a link to a GitHub issue that is unlikely to be around in 20 years is insufficient. We need to make sure people can |
Upstreaming Debian patch: Prior to this update, the compiler optimization dropped parts from the ksh job locking mechanism from the binary code. As a consequence, ksh could terminate unexpectedly with a segmentation fault after it received the SIGCHLD signal. This update implements a fix to ensure the compiler does not drop parts of the ksh mechanism and the crash no longer occurs. Author: Michal Hlavinka mhlavink@redhat.com Origin: Red Hat fix, ksh-20120801-locking.patch Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1123467 Bug-Ubuntu: https://bugs.launchpad.net/bugs/1697501 Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181
dc85e2a
to
07cc71b
Compare
Makes sense. Just did a force push with the updated commit message. |
Also, I don't really get it (this is probably where it shows that my primary competence is the shell language itself, and in C I pretty much pretend to know what I'm doing sometimes ;)). I mean, isn't this really working around a compiler bug, not a bug in ksh? If something like |
So, given the documentation and the testing, it seems safe enough to take this one on faith; I've merged it with more data from the Ubuntu bug – in the merge commit message, so you don't have to jump through another hoop. I still would like to know why this isn't actually a compiler bug, though… |
Thanks. The new description is much better. This would be a case of the compiler optimizing the unary pre/post-fix increment / decrement operations in a way that affects the mutex. Difficult to say whether it is a compiler bug or not without a reproducible test case... |
The more I think about it, the more it seems obvious that commit 07cc71b (PR #14) is quite simply a workaround for a GCC optimiser bug, and (who knows?) possibly an old, long-fixed one, as the bug report is years old. The commit also caused ksh to fail to build on HP-UX B.11.11 with GCC 4.2.3 (hosted at polarhome.com), because it doesn't have __sync_fetch_and_add() and __sync_fetch_and_sub(). It may fail on other systems. The GCC documentation says these are legacy: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html HELP WANTED: what I would like best is if someone could come up with some way of detecting this optimiser bug and then error out with a message along the lines of "please upgrade your broken compiler". It would probably need to be a new iffe test. Meanwhile, let's try it this way for a while and see what happens: src/cmd/ksh93/include/jobs.h: - Restore original ksh version of job_lock()/job_unlock() macros. - Use the workaround version only if the compiler has the builtins __sync_fetch_and_add() and __sync_fetch_and_sub().
Thanks. After thinking more about what those macros actually do concretely, I now find it very hard to imagine that it is not an optimiser bug. |
@aweeraman, FYI: b7f48e8 (please click for full commit message) |
I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation: $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' 0 1 2 3 4 5 6 7 Illegal instruction Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below. I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway. The double-fork issue discussed in e3d7bf1 remains. ________ For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting. $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' (lldb) target create "arch/darwin.i386-64/bin/ksh" Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64). (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done" (lldb) run error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'. (lldb) process launch Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64) 0 1 2 3 4 5 6 7 8 9 Process 35038 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 libsystem_platform.dylib`_os_unfair_lock_recursive_abort: -> 0x7fff70deb1c2 <+23>: ud2 libsystem_platform.dylib`_os_unfair_lock_unowned_abort: 0x7fff70deb1c4 <+0>: movl %edi, %eax 0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread" 0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8 Target 0: (ksh) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239 frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188 frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66 frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99 frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30 frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13 frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8 frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9 frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3 frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29 frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12 frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17 frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16 frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4 frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9 frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4 frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4 frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2 frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9 frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation: $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' 0 1 2 3 4 5 6 7 Illegal instruction Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below. I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway. The double-fork issue discussed in e3d7bf1 remains. ________ For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting. $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' (lldb) target create "arch/darwin.i386-64/bin/ksh" Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64). (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done" (lldb) run error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'. (lldb) process launch Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64) 0 1 2 3 4 5 6 7 8 9 Process 35038 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 libsystem_platform.dylib`_os_unfair_lock_recursive_abort: -> 0x7fff70deb1c2 <+23>: ud2 libsystem_platform.dylib`_os_unfair_lock_unowned_abort: 0x7fff70deb1c4 <+0>: movl %edi, %eax 0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread" 0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8 Target 0: (ksh) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239 frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188 frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66 frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99 frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30 frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13 frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8 frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9 frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3 frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29 frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12 frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17 frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16 frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4 frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9 frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4 frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4 frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2 frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9 frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation: $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' 0 1 2 3 4 5 6 7 Illegal instruction Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below. I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway. The double-fork issue discussed in e3d7bf1 remains. ________ For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting. $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' (lldb) target create "arch/darwin.i386-64/bin/ksh" Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64). (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done" (lldb) run error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'. (lldb) process launch Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64) 0 1 2 3 4 5 6 7 8 9 Process 35038 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 libsystem_platform.dylib`_os_unfair_lock_recursive_abort: -> 0x7fff70deb1c2 <+23>: ud2 libsystem_platform.dylib`_os_unfair_lock_unowned_abort: 0x7fff70deb1c4 <+0>: movl %edi, %eax 0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread" 0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8 Target 0: (ksh) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239 frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188 frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66 frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99 frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30 frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13 frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8 frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9 frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3 frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29 frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12 frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17 frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16 frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4 frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9 frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4 frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4 frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2 frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9 frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in 59a5672. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 ksh93#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 ksh93#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 ksh93#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 ksh93#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 ksh93#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 ksh93#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 ksh93#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 ksh93#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh93#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh93#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 ksh93#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 ksh93#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh93#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh93#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 ksh93#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 ksh93#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 ksh93#17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) ksh93#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) ksh93#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 7e317c5 that set slp->slptr to null have been improved with the fix in 59a5672.
The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in 59a5672. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 #17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 7e317c5 that set slp->slptr to null have been improved with the fix in 59a5672.
The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in f24040e. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 #17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 69d37d5 that set slp->slptr to null have been improved with the fix in f24040e.
The isaname, isaletter, isadigit, isexp and ismeta macros don't check if c is a negative value before accessing sh_lexstates. This can result in ASan crashing because of a buffer overflow in quoting2.sh when running in a multibyte locale: test quoting2(C.UTF-8) begins at 2022-09-23+14:03:12 ================================================================= ==262224==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557b201a451f at pc 0x557b1fe5e6fc bp 0x7fffcf1ac700 sp 0x7fffcf1ac6f8 READ of size 1 at 0x557b201a451f thread T0 #0 0x557b1fe5e6fb in sh_fmtq /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:341:5 ksh93#1 0x557b1fe6098c in sh_fmtqf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:473:10 ksh93#2 0x557b1ff08dc0 in extend /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:998:14 ksh93#3 0x557b2008a56c in sfvprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfvprintf.c:531:8 ksh93#4 0x557b2005b7f7 in sfprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfprintf.c:31:7 ksh93#5 0x557b1ff04272 in b_print /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:343:4 ksh93#6 0x557b1ff04ebf in b_printf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:148:9 ksh93#7 0x557b1fe8d9a7 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1261:21 ksh93#8 0x557b1fe7a7cf in sh_subshell /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/subshell.c:652:4 ksh93#9 0x557b1fdedc0d in comsubst /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:2207:9 ksh93#10 0x557b1fdefc79 in varsub /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:1181:3 ksh93#11 0x557b1fde3bef in copyto /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:620:21 ksh93#12 0x557b1fde0b07 in sh_mactrim /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:169:2 ksh93#13 0x557b1fe05ab6 in nv_setlist /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:280:9 ksh93#14 0x557b1fe8a7e8 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1051:7 ksh93#15 0x557b1fe95b85 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1940:5 ksh93#16 0x557b1fe99ea6 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2271:10 ksh93#17 0x557b1fd23b04 in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604:4 ksh93#18 0x557b1fd1fe10 in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369:2 ksh93#19 0x557b1fd1d585 in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41:9 ksh93#20 0x7f55d5b5028f (/usr/lib/libc.so.6+0x2328f) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82) ksh93#21 0x7f55d5b50349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82) ksh93#22 0x557b1fc158d4 in _start /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115 src/cmd/ksh93/include/lexstates.h: - Check if c is negative before accessing sh_lexstates. Backported from ksh2020: att@a7013320. I'll note that later in ksh2020 these macros became functions: att@adc589de. I didn't backport that commit because it requires the C99 bool type to avoid compiler warnings.
Upstreaming Debian patch:
Prior to this update, the compiler optimization dropped parts from the ksh job
locking mechanism from the binary code. As a consequence, ksh could terminate
unexpectedly with a segmentation fault after it received the SIGCHLD signal.
This update implements a fix to ensure the compiler does not drop parts of the
ksh mechanism and the crash no longer occurs.
Author: Michal Hlavinka mhlavink@redhat.com
Origin: Red Hat fix, ksh-20120801-locking.patch
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1123467
Bug-Ubuntu: https://bugs.launchpad.net/bugs/1697501
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181