assertion failure in default garbage collector after bad call to set-cdr! #686

Closed
WillClinger opened this issue Jan 17, 2015 · 4 comments

@WillClinger
Member

This bug occurs in IAssassin versions on both Linux and MacOSX, but I have
not been able to reproduce it in Petit Larceny.

In v0.97 and the most recent development version of v0.98b1, the following
input produces an assertion failure in the default garbage collector:

;;; Use this as standard input to recreate a gc bug.

(define x (list 1 2 3 4 5))
(begin (set-cdr! cdddr x) (cdr x))
q ; quits out of the debugger
(begin (set-cdr! (cdddr x) (cdr x)) 0)
x ; initiates printing of a circular list
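
The last two forms build and print a circular list: the second set-cdr! call points the cdr of the fourth pair back at the second pair, so the printer cycles through 2 3 4 indefinitely. A minimal Python model of that structure follows; the Pair class and the helper names are illustrative assumptions, not part of Larceny.

```python
from itertools import islice


class Pair:
    """Toy cons cell: car holds a value, cdr holds the next Pair or None."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def make_list(*items):
    """Build a proper list of Pairs, like (list 1 2 3 4 5)."""
    head = None
    for item in reversed(items):
        head = Pair(item, head)
    return head


def walk(pair):
    """Yield cars in order; never terminates on a circular list."""
    while pair is not None:
        yield pair.car
        pair = pair.cdr


x = make_list(1, 2, 3, 4, 5)
# Model of (set-cdr! (cdddr x) (cdr x)): the cell holding 4 now points at the cell holding 2.
x.cdr.cdr.cdr.cdr = x.cdr

# Take a bounded prefix, since the list no longer ends.
first_ten = list(islice(walk(x), 10))
print(first_ten)  # [1, 2, 3, 4, 2, 3, 4, 2, 3, 4]
```

The element 5 becomes unreachable after the mutation, which is why it never shows up in the printed output.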

The output looks like this, and is 6221 lines long including the error messages
written to standard error:

% ./larceny < cheneyBUG.sch
Larceny v0.98b1 (Jan 16 2015 18:21:45, precise:Posix:unified)
larceny.heap, built on Fri Jan 16 18:23:02 EST 2015

> x

> 

Error: set-cdr!: 0 is not a pair.
Entering debugger; type "?" for help.
debug> 
> 0

> (1
 2
 3
 4
 2
 3
 4
 2
 3
 4
[...]
 2
 3
 4
 2
 3
 4
 Larceny Panic: Sys/cheney.c;831: Assertion failed.
Abort trap

The output for v0.97 is almost exactly the same, but replaces the last "2 3 4"
and error message with this uninformative line:

Segmentation fault

If I omit the bogus call to set-cdr!, I am unable to reproduce the problem.

It looks to me as though the exception code for set-cdr!, possibly in
collaboration with the debugger, may be corrupting the heap.

@WillClinger
Member Author

Following the last set of changes made on 16 January 2015, the bug described
above is no longer reproduced by the given sequence of input forms.

That set of changes altered src/Lib/Common/print.sch and several other files.
The altered write procedure may not reproduce the allocation pattern that reveals
the bug.

To resolve this issue, we'll probably have to work with a version somewhere in the range
from v0.97 through the version created by my next-to-last push on 16 January 2015.

@WillClinger
Member Author

This looks like a bug in the IAssassin code generator. From src/Asm/Sassy/sassy-instr.sch:

;;; single_tag_test ptrtag
;;; Leave zero flag set if hwreg contains a value with the given
;;;     3-bit tag.

(define-sassy-instr (ia86.single_tag_test hwreg x)
  (assert-intel-reg hwreg)
  `(lea ,$r.temp (& ,hwreg ,(- 8 x)))
  `(test    ,$r.temp.low 7))

;;; single_tag_test_ex ptrtag, exception_code
;;; Unless in unsafe mode, test the pointer in RESULT for the
;;; tag and signal an exception if it does not match.

(define-sassy-instr (ia86.single_tag_test_ex hwreg x y)
  (assert-intel-reg hwreg)
  (cond 
   ((not (unsafe-code))
    (let ((l0 (fresh-label))
          (l1 (fresh-label)))
      `(label ,l0)
      (ia86.single_tag_test hwreg x)
      `(jz short ,l1)
      (ia86.exception_continuable y 'short l0)
      `(label ,l1)))))

$r.temp is the same as SECOND, which is the eax register. Although the
machine register eax is not a root for garbage collection, the virtual register
SECOND is a root for garbage collection. Copying eax to the SECOND
slot of the globals structure is the first thing done by the millicode exception
handler.

So the code shown above is creating a corrupt tagged pointer in eax, which
is saved in a rootable location by the millicode exception handler, which breaks
the garbage collector's inviolable expectation that all tagged pointers are valid.
Here's the disassembled code for the set-cdr! procedure:

00000000  83FB08            cmp ebx,byte +0x8
00000003  7411              jz 0x16
00000005  C7452C08000000    mov dword [ebp+0x2c],0x8
0000000C  FF951C020000      call near [ebp+0x21c]
00000012  90                nop
00000013  90                nop
00000014  EBEA              jmp short 0x0
00000016  8D4107            lea eax,[ecx+0x7]
00000019  A807              test al,0x7
0000001B  7409              jz 0x26
0000001D  FF5504            call near [ebp+0x4]
00000020  0300              add eax,[eax]
00000022  0000              add [eax],al
00000024  EBF0              jmp short 0x16
00000026  8D4103            lea eax,[ecx+0x3]
00000029  894530            mov [ebp+0x30],eax
0000002C  89CB              mov ebx,ecx
0000002E  89D0              mov eax,edx
00000030  FF9504020000      call near [ebp+0x204]
00000036  90                nop
00000037  90                nop
00000038  895103            mov [ecx+0x3],edx
0000003B  C3                ret

The tag test starts at 0016. The jz 0x26 skips over the exception-raising code, which is
at 001D. (Ignore the two apparent instructions following the call instruction at 001D. Those
four bytes, starting at 0020, are not instructions: they hold the exception code (03 00 00 00)
as inline data. Don't worry, the exception handler adjusts its return address to skip past
those four bytes.)
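
The arithmetic of the tag test can be checked with a small model. This Python sketch mimics the lea/test pair generated by ia86.single_tag_test on 32-bit words. The pair tag value of 1 is inferred from the `lea eax,[ecx+0x7]` in the listing (8 - 1 = 7). The sample pointer 0x1001 and the representation of the fixnum 0 as the machine word 0 are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wraparound
PAIR_TAG = 1         # inferred from `lea eax,[ecx+0x7]`, i.e. 8 - 1


def single_tag_test(word, tag):
    """Model `lea temp,[hwreg + (8 - tag)]` followed by `test temp.low, 7`.

    Returns (zero_flag, temp), where temp is what is left in eax afterwards.
    """
    temp = (word + (8 - tag)) & MASK32
    zero_flag = (temp & 7) == 0
    return zero_flag, temp


# Hypothetical pair pointer: 8-byte-aligned address 0x1000 plus the pair tag.
print(single_tag_test(0x1000 | PAIR_TAG, PAIR_TAG))  # (True, 4104)

# First argument 0, as in "set-cdr!: 0 is not a pair" (assumed to be word 0).
print(single_tag_test(0, PAIR_TAG))                  # (False, 7)
```

In the failing case the test correctly reports a mismatch, but eax is left holding 7. Its low three bits are 111, which is not the pair tag and not a fixnum pattern, and the rest of the word does not point at any heap object. That leftover value is what the millicode exception handler copies into the SECOND slot.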

@WillClinger
Member Author

The hypothesis presented in my comment above is confirmed. The bug remains
when I insert a useless or instruction before the call to the millicode exception
handler. In the following code for set-cdr!, that useless or instruction is at 001D:

00000000  83FB08            cmp ebx,byte +0x8
00000003  7411              jz 0x16
00000005  C7452C08000000    mov dword [ebp+0x2c],0x8
0000000C  FF951C020000      call dword near [ebp+0x21c]
00000012  90                nop
00000013  90                nop
00000014  EBEA              jmp short 0x0
00000016  8D4107            lea eax,[ecx+0x7]
00000019  A807              test al,0x7
0000001B  740D              jz 0x2a
0000001D  09C0              or eax,eax
0000001F  FF5504            call dword near [ebp+0x4]
00000022  0300              add eax,[eax]
00000024  0000              add [eax],al
00000026  90                nop
00000027  90                nop
00000028  EBEC              jmp short 0x16
0000002A  8D4103            lea eax,[ecx+0x3]
0000002D  894530            mov [ebp+0x30],eax
00000030  89CB              mov ebx,ecx
00000032  89D0              mov eax,edx
00000034  FF9504020000      call dword near [ebp+0x204]
0000003A  90                nop
0000003B  90                nop
0000003C  895103            mov [ecx+0x3],edx
0000003F  C3                ret

Changing the or to an xor clears eax before calling the exception handler,
which eliminates the bug for the specific test shown in the original report.

It doesn't fully eliminate the bug, however, because the bug is actually in the
code generated by ia86.single_tag_test, and that bug might still show up in
procedures other than set-cdr!. The added instruction also increases heap
size by 0.6% and adds one more instruction to the code executed in the normal case.

So the bug isn't fixed yet, but I now know what's going on.
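
The difference between the or and the xor can be seen in a toy model of the collector's expectation about roots. Everything in this Python sketch is a simplifying assumption made for illustration: the heap bounds are hypothetical, and the rule that any odd word is a tagged pointer is a stand-in for Larceny's real tag layout, which is not shown in this thread.

```python
HEAP_START, HEAP_END = 0x10000, 0x20000  # hypothetical heap bounds


def looks_like_pointer(word):
    """Toy rule (assumption): a word with its low bit set is a tagged pointer."""
    return word & 1 == 1


def root_is_valid(word):
    """A root is acceptable if it is a non-pointer or points into the heap."""
    if not looks_like_pointer(word):
        return True
    address = word & ~7  # strip the 3-bit tag
    return HEAP_START <= address < HEAP_END


def saved_second(eax, clear_first):
    """Value the handler would save in the SECOND slot.

    clear_first=False models `or eax,eax` (eax unchanged);
    clear_first=True models `xor eax,eax` (eax becomes 0).
    """
    return 0 if clear_first else eax


leftover = 7  # eax after the failed tag test on the word 0 (0 + 7)
print(root_is_valid(saved_second(leftover, clear_first=False)))  # False
print(root_is_valid(saved_second(leftover, clear_first=True)))   # True
```

With the or, the junk value reaches the root slot unchanged and fails the check. With the xor, the slot receives 0, which the model treats as a harmless non-pointer. This only covers the path through the exception handler, which matches the point above that the underlying problem in ia86.single_tag_test remains.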

@WillClinger
Member Author

Fixed in 8d35e47
