Skip to content

Commit

Permalink
Implement a portable fix for SIGCHLD crashes (#18)
Browse files Browse the repository at this point in the history
As previously reported in rhbz#1112306 (https://bugzilla.redhat.com/show_bug.cgi?id=1112306),
ksh may crash when receiving SIGCHLD because GCC's optimizer will fail to generate
`addl` and `sub` instructions to increment and decrement `job.in_critical` in the
`job_subsave` function. This bug *does* occur in GCC 10 with `-O2`, but not `-O1`;
it doesn't appear this bug has been fixed. As a reference, here is the relevant
debug assembly output of `job_subsave` when KSH is compiled with `CCFLAGS` set to
`-g -O1`:

0000000000034c97 <job_subsave>:

void *job_subsave(void)
{
   34c97:       53                      push   %rbx
        struct back_save *bp = new_of(struct back_save,0);
   34c98:       bf 18 00 00 00          mov    $0x18,%edi
   34c9d:       e8 34 4a 0a 00          callq  d96d6 <_ast_malloc>
   34ca2:       48 89 c3                mov    %rax,%rbx
        job_lock();
   34ca5:       83 05 3c 50 13 00 01    addl   $0x1,0x13503c(%rip)        # 169ce8 <job+0x28>
        *bp = bck;
   34cac:       66 0f 6f 05 4c 5a 13    movdqa 0x135a4c(%rip),%xmm0        # 16a700 <bck>
   34cb3:       00
   34cb4:       0f 11 00                movups %xmm0,(%rax)
   34cb7:       48 8b 05 52 5a 13 00    mov    0x135a52(%rip),%rax        # 16a710 <bck+0x10>
   34cbe:       48 89 43 10             mov    %rax,0x10(%rbx)
        bp->prev = bck.prev;
   34cc2:       48 8b 05 47 5a 13 00    mov    0x135a47(%rip),%rax        # 16a710 <bck+0x10>
   34cc9:       48 89 43 10             mov    %rax,0x10(%rbx)
        bck.count = 0;
   34ccd:       c7 05 29 5a 13 00 00    movl   $0x0,0x135a29(%rip)        # 16a700 <bck>
   34cd4:       00 00 00
        bck.list = 0;
   34cd7:       48 c7 05 26 5a 13 00    movq   $0x0,0x135a26(%rip)        # 16a708 <bck+0x8>
   34cde:       00 00 00 00
        bck.prev = bp;
   34ce2:       48 89 1d 27 5a 13 00    mov    %rbx,0x135a27(%rip)        # 16a710 <bck+0x10>
        job_unlock();
   34ce9:       8b 05 f9 4f 13 00       mov    0x134ff9(%rip),%eax        # 169ce8 <job+0x28>
   34cef:       83 e8 01                sub    $0x1,%eax
   34cf2:       89 05 f0 4f 13 00       mov    %eax,0x134ff0(%rip)        # 169ce8 <job+0x28>
   34cf8:       75 2b                   jne    34d25 <job_subsave+0x8e>
   34cfa:       8b 3d ec 4f 13 00       mov    0x134fec(%rip),%edi        # 169cec <job+0x2c>
   34d00:       85 ff                   test   %edi,%edi
   34d02:       74 21                   je     34d25 <job_subsave+0x8e>
   34d04:       c7 05 da 4f 13 00 01    movl   $0x1,0x134fda(%rip)        # 169ce8 <job+0x28>

When `-O2` is used instead of `-O1`, the `addl` and `sub` instructions for
incrementing and decrementing the lock are removed. GCC instead generates a
broken `mov` instruction for `job_lock` and removes the initial `sub` instruction
in job_unlock (this is also seen in Red Hat's bug report):

       job_lock();
       *bp = bck;
  37d7c:       66 0f 6f 05 7c 79 14    movdqa 0x14797c(%rip),%xmm0        # 17f700 <bck>
  37d83:       00
       struct back_save *bp = new_of(struct back_save,0);
  37d84:       49 89 c4                mov    %rax,%r12
       job_lock();
  37d87:       8b 05 5b 6f 14 00       mov    0x146f5b(%rip),%eax        # 17ece8 <job+0x28>
...
        job_unlock();
  37dc6:       89 05 1c 6f 14 00       mov    %eax,0x146f1c(%rip)        # 17ece8 <job+0x28>
  37dcc:       85 c0                   test   %eax,%eax
  37dce:       75 2b                   jne    37dfb <job_subsave+0x8b>

The original patch works around this bug by using the legacy `__sync_fetch_and_add/sub`
GCC builtins. This forces GCC to generate instructions that change the lock with
`lock addl`, `lock xadd` and `lock subl`:

       job_lock();
  37d9f:       f0 83 05 41 6f 14 00    lock addl $0x1,0x146f41(%rip)        # 17ece8 <job+0x28>
  37da6:       01
...
       job_unlock();
  37deb:       f0 0f c1 05 f5 6e 14    lock xadd %eax,0x146ef5(%rip)        # 17ece8 <job+0x28>
  37df2:       00
  37df3:       83 f8 01                cmp    $0x1,%eax
  37df6:       74 08                   je     37e00 <job_subsave+0x70>
...
  37e25:       74 11                   je     37e38 <job_subsave+0xa8>
  37e27:       f0 83 2d b9 6e 14 00    lock subl $0x1,0x146eb9(%rip)        # 17ece8 <job+0x28>

While this does work, it isn't portable. This patch implements a different
workaround for this compiler bug. If `job_lock` is put at the beginning of
`job_subsave`, GCC will generate the required `addl` and `sub` instructions:

       job_lock();
  37d67:       83 05 7a 5f 14 00 01    addl   $0x1,0x145f7a(%rip)        # 17dce8 <job+0x28>
...
        job_unlock();
  37dbb:       83 e8 01                sub    $0x1,%eax
  37dbe:       89 05 24 5f 14 00       mov    %eax,0x145f24(%rip)        # 17dce8 <job+0x28>

It is odd that moving a single line of code fixes this problem, although
GCC _should_ have generated these instructions from the original code anyway.
I'll note that this isn't the only way to get these instructions to generate.
The problem also seems to go away when inserting almost anything else inside
of the code for `job_subsave`. This is just a simple workaround for a strange
compiler bug.
  • Loading branch information
JohnoKing committed Jun 16, 2020
1 parent 764acef commit c258a04
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion src/cmd/ksh93/sh/jobs.c
Original file line number Diff line number Diff line change
Expand Up @@ -1973,8 +1973,14 @@ static int job_chksave(register pid_t pid)

void *job_subsave(void)
{
struct back_save *bp = new_of(struct back_save,0);
/*
* We must make a lock first before doing anything else,
* otherwise GCC will remove the job locking mechanism
* as a result of compiler optimization.
*/
job_lock();

struct back_save *bp = new_of(struct back_save,0);
*bp = bck;
bp->prev = bck.prev;
bck.count = 0;
Expand Down

0 comments on commit c258a04

Please sign in to comment.