debug no-optimize OS X make check-fast exposes segfault in rustc #9129

pnkfelix · 2013-09-11T21:14:46Z

Probably related to #9127 (or at least the stack traces look similar to me):

CFG_CONFIGURE_ARGS   := --enable-debug --disable-optimize --enable-ccache --enable-clang --prefix=~/opt/rust-dbg-nopt

% x86_64-apple-darwin/stage2/bin/rustc --cfg stage2   --cfg debug -Z no-debug-borrows --target=x86_64-apple-darwin  --lib -o x86_64-apple-darwin/stage2/lib/rustc/x86_64-apple-darwin/lib/librun_pass_stage2.dylib tmp/run_pass_stage2.rc
Bus error: 10

Update: Here is a isolated test case (importantly, it does not depend on pulling in libsyntax):

pub trait bomb { fn boom(@self, Ident); }
pub struct S;
impl bomb for S { fn boom(@self, _: Ident) { } }

pub struct Ident { name: uint }

// macro_rules! int3( () => ( unsafe { asm!( "int3" ); } ) )
macro_rules! int3( () => ( { } ) )

fn Ident_new() -> Ident {
    int3!();
    Ident {name: 0x6789ABCD }
}

pub fn light_fuse(fld: @bomb) {
    int3!();
    let f = || {
        int3!();
        fld.boom(Ident_new()); // *** 1
    };
    f();
}

fn main() {
    let b = @S as @bomb;
    light_fuse(b);
}

The text was updated successfully, but these errors were encountered:

pnkfelix · 2013-09-13T11:53:20Z

Bisecting the contents of run_pass_stage2.rc, the problem down to one particular test:

% x86_64-apple-darwin/stage2/bin/rustc --cfg stage2   --cfg debug -Z no-debug-borrows \
  --target=x86_64-apple-darwin  --lib \
  -o x86_64-apple-darwin/stage2/lib/rustc/x86_64-apple-darwin/lib/librun_pass_stage2.dylib \
  ../src/test/run-pass/issue-2216.rs
Segmentation fault: 11

pnkfelix · 2013-09-13T12:06:10Z

Further narrowing reveals that the problem seems to be when we have either of labelled ExprBreak or ExprAgain statements:

% cat /tmp/issue-2216.rs
#[cfg(labelled_loop)]
pub fn main() { 'foo: loop { loop 'foo; } }

#[cfg(labelled_break)]
pub fn main() { 'foo: loop { break 'foo; } }
%  x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_break  /tmp/issue-2216.rs
Bus error: 10
%  x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_loop  /tmp/issue-2216.rs
Bus error: 10
% grep CONFIGURE_ARGS config.mk
CFG_CONFIGURE_ARGS   := --enable-debug --disable-optimize --enable-clang --prefix=~/opt/rust-dbg-nopt
%

huonw · 2013-09-13T12:18:46Z

Is the segfault/bus error due to infinite recursion?

pnkfelix · 2013-09-13T12:30:51Z

@huonw no, I already considered that, but the stack isn't that large, especially once you try on one of the narrowed test cases.

My current hypothesis is that it has something to do with these bits of code, like:

(update; correction to below, I have cut-and-pasted some code into config.rs for instrumentation, but the origin of the code in question is fold.rs)
~~config.rs~~ fold.rs says:

        ExprBreak(ref opt_ident) => {
            // FIXME #6993: add fold_name to fold.... then cut out the
            // bogus Name->Ident->Name conversion.
            ExprBreak(opt_ident.map_move(|x| fld.fold_ident(Ident::new(x)).name))
        }
        ExprAgain(ref opt_ident) => {
            // FIXME #6993: add fold_name to fold....
            ExprAgain(opt_ident.map_move(|x| fld.fold_ident(Ident::new(x)).name))
        }

but I have not figured out what is actually wrong there yet, if anything.

pnkfelix · 2013-09-13T17:28:11Z

Further discovery: the problem seems to be the subexpression Ident::new(x) as a direct argument to fold_indent. If I lift the expression out and bind it before the call, things work.

The cute thing here is that since I am seeing the same bug on both ExprBreak and on ExprAgain, I can quite directly illustrate the difference (with much the same test case that I used before):

% cat /tmp/issue-2216.rs

#[cfg(labelled_again)]
pub fn main() { 'foo: loop { loop 'foo; } }

#[cfg(labelled_break)]
pub fn main() { 'foo: loop { break 'foo; } }

#[cfg(labelled_for)]
pub fn main() { 'foo: for i in range(1,10) { } }

The relevant modification to the code in fold.rs (as well as anywhere else I happened to cut-and-paste that code):

        ExprBreak(ref opt_ident) => {
            debug!("fold ExprBreak input: %?", opt_ident);
            let ret = ExprBreak(opt_ident.map_move(|x| {
                        // let i1 = Ident::new(x);
                        let i2 = fld.fold_ident(Ident::new(x)); // *** 1
                        i2.name
                    }));
            debug!("fold ExprBreak output: %?", ret);
            ret
        }
        ExprAgain(ref opt_ident) => {
            debug!("fold ExprAgain input: %?", opt_ident);
            let ret = ExprAgain(opt_ident.map_move(|x| {
                        let i1 = Ident::new(x);
                        let i2 = fld.fold_ident(i1); // *** 2
                        i2.name
                    }));
            debug!("fold ExprAgain input: %?", ret);
            ret
        }

The first starred case, with ExprBreak, will break. The second starred case, with ExprAgain, will work:

%   x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_break  /tmp/issue-2216.rs
Bus error: 10
%   x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_again  /tmp/issue-2216.rs
warning: no debug symbols in executable (-arch x86_64)
%

I'm guessing one next step would be to compare the generated code for the two different closures.

pnkfelix · 2013-09-13T17:37:16Z

cc @jbclements for the injection. cc @nikomatsakis for the potential bogosity in borrowck (right?)

metajack · 2013-09-13T17:41:15Z

This small program also causes a similar bus error, and @pnkfelix seemed to think it might be related:

fn main() {
    macro_rules! breakme(
        ($value: expr) => {
            break 'foo;
        }
    )

    breakme!("foo");
}

jbclements · 2013-09-13T18:13:03Z

@pnkfelix does this still occur for you on master? I just failed to duplicated it under OS X using 248765a, and I'm wondering if my fix for #9110 might also have fixed this bug.

jbclements · 2013-09-13T18:31:17Z

@metajack I can't reproduce your crash on 248765a ... in your case, I'm thinking maybe the next merge was the issue?

pnkfelix · 2013-09-13T19:56:44Z

@jbclements I'll check about master. Just to confirm, you did use the same set of CONFIGURE_ARGS that I had in the description, right? (Just double-checking.)

metajack · 2013-09-13T22:02:34Z

I think I am using only --enable-debug in my compiles.

brson · 2013-09-13T22:50:09Z

I also can't reproduce the problem with either @pnkfelix or @metajack's test cases, though I can reproduce the one in servo.

Servo is hitting this problem, so this is a workaround for a lack of a real solution.

nikomatsakis · 2013-09-13T23:46:54Z

@pnkfelix do we have any better idea what the generated code is doing that is causing to crash? could be a bug in borrowck, but it's hard to say

pnkfelix · 2013-09-13T23:51:41Z

@nikomatsakis no, I don't have a real idea yet. Kind of scary that so few of us can replicate in the small isolated cases (so far just me and @lkuper have said so). My plan for next step is to actually compare the two generated code sequences within the closure, but I did not get a chance to do that today. I will either attempt to do it over the weekend (might be tricky, got family in town) or on Monday.

Servo is hitting this problem, so this is a workaround for lack of a real solution. No tests because I couldn't actually reproduce the problem with either of the testcases in #9129

pnkfelix · 2013-09-16T11:08:37Z

Status report: I've made side-by-side comparisons of both the generated LLVM bitcode (as .ll disassembly) and also the generated machine code. Its too much material to post here, but this is the link: https://gist.github.com/pnkfelix/6579174

(You might notice some small differences between e.g. the .rs input and the generated LLVM, because the .rs input has some changes to e.g. insert int3 instructions so that I could look at the dynamic execution under gdb. The machine code I present in the gist should actually correspond to the .rs code.)

I do not know .ll well enough to immediately tell whether the bug lies in trans (i.e. if we are generated bogus input to LLVM) or if this is an LLVM bug (i.e. they are mistranslating our input). The problem seems to be that the generated code is not correctly extracting the address for the fld.fold_ident method in the broken case.

In particular: for the LLVM input of the broken case, the main differences are that:

the invocation of Ident::new occurs a bit later in the control flow, basically right before we do call void %17(%"struct.syntax::ast::Ident[#2]"* %i2, { i64, %tydesc*, i8*, i8*, i8 }* %11, %"struct.syntax::ast::Ident[#2]"* %2) which I infer is the invocation of fld.fold_ident.
In the working variant, the invocation of Ident::new is an LLVM call instruction; there is no explicit exception handling landing pad in the bitcode. In the broken variant, the invocation of Ident::new is an LLVM invoke instruction, with an explicit exception handler, i.e. unwind and cleanup code.
In both variants there is some amount of code separating the load of the address of fld.fold_ident into a temp variable and the subsequent call on that temp variable. In the working code, the intervening code is a pair of bitcasts and a call to @llvm.memcpy.p0i8.p0i8.i64 (but perhaps that ends up being an LLVM intrinsic?); in the broken code, the intervening code is a load of the argument to Ident::new and the actual invocation of Ident::new.

pnkfelix · 2013-09-16T13:41:55Z

At this point I'm pretty confident this is exposing a bug in LLVM.

After making a standalone test case (see updated bug description), I was able to generate bitcode from --save-temps, and then hand-edit the resulting .ll file to figure out which differences seem to actually matter to LLVM, using llc (as well as linking the result via the command line presented by rustc -Z print-link-args).

Starting from the broken .ll code, I discovered:

Compiling with -O0 was necessary to see breakage; putting in -O1 makes the problem go away.
Replacing the invoke instruction with a call instruction makes the problem go away.
I can use the -filetype=asm argument to llc to get inspectable assembly, with some helpful annotations indicating the purpose of certain instructions (yay), and then use ediff to get side-by-side comparisons of the two results (comparing the result when using call instruction for the working variant and the invoke instruction for the broken variant):

Here's the source input for this experiment: https://gist.github.com/pnkfelix/6580915 (the basis of the .ll code is the test in this bug description).

Working code fragment:

    .section    __TEXT,__text,regular,pure_instructions
    .align  4, 0x90
__ZN10light_fuse4anon7expr_fn2anE:      ## @_ZN10light_fuse4anon7expr_fn2anE
    .cfi_startproc
## BB#0:                                ## %function top level
    subq    $56, %rsp
Ltmp12:
    .cfi_def_cfa_offset 64
    movabsq $0, %rax
    leaq    40(%rsp), %rsi
    movq    32(%rdi), %rdi
    movq    (%rdi), %rcx
    movq    %rcx, 40(%rsp)
    movq    8(%rdi), %rcx
    movq    %rcx, 48(%rsp)
    movq    %rax, %rdi
    callq   __ZN20_$SP$bomb.$x27static9glue_take19h9f15e2dd62f678e4arE
    leaq    32(%rsp), %rdi
                                        ## implicit-def: RSI
    movq    48(%rsp), %rax
    movq    40(%rsp), %rcx
    movq    8(%rcx), %rcx
    movq    %rcx, 8(%rsp)           ## 8-byte Spill
    movq    %rax, (%rsp)            ## 8-byte Spill
    callq   __ZN9Ident_new16h4688b2c57373e544v0.0E
    leaq    32(%rsp), %rsi
    movq    (%rsp), %rdi            ## 8-byte Reload
    movq    8(%rsp), %rax           ## 8-byte Reload
    callq   *%rax

Broken code fragment

    .section    __TEXT,__text,regular,pure_instructions
    .align  4, 0x90
__ZN10light_fuse4anon7expr_fn2anE:      ## @_ZN10light_fuse4anon7expr_fn2anE
    .cfi_startproc
    .cfi_personality 155, _upcall_rust_personality
Leh_func_begin5:
    .cfi_lsda 16, Lexception5
## BB#0:                                ## %function top level
    subq    $88, %rsp
Ltmp15:
    .cfi_def_cfa_offset 96
    movq    32(%rdi), %rdi
    vmovups (%rdi), %xmm0
    vmovaps %xmm0, 64(%rsp)
    xorl    %eax, %eax
    movl    %eax, %edi
    leaq    64(%rsp), %rsi
    callq   __ZN20_$SP$bomb.$x27static9glue_take19h9f15e2dd62f678e4arE
    movq    64(%rsp), %rsi
    movq    72(%rsp), %rdi
    movq    8(%rsi), %rsi
Ltmp11:
    leaq    56(%rsp), %rcx
    movq    %rdi, 32(%rsp)          ## 8-byte Spill
    movq    %rcx, %rdi
                                        ## implicit-def: RCX
    movq    %rsi, 24(%rsp)          ## 8-byte Spill
    movq    %rcx, %rsi
    callq   __ZN9Ident_new16h4688b2c57373e544v0.0E
Ltmp12:
    jmp LBB5_1
LBB5_1:                                 ## %normal return
    leaq    56(%rsp), %rsi
    movq    32(%rsp), %rdi          ## 8-byte Reload
    movq    16(%rsp), %rax          ## 8-byte Reload
    callq   *%rax

Look at the spill and reload instructions above: In the working fragment, %rcx is spilled to 8(%rsp) and then that location is reloaded to %rax. In the broken fragment, if you try to trace backward to figure out what %rax is, it seems like nonsense to me: it tries to load it from 16(%rsp), but there is no spill to that location that I can see in this function. It seems to me that in the broken fragment, the corresponding register with the right value prior to the invocation of Indent::new is %rsi, but that register is spilled to 24(%rsp), not 16(%rsp).

(How much do you want to bet that this is some code in the implementation of the LLVM invoke instruction that is doing an incorrect arithmetic calculation...)

alexcrichton · 2013-09-16T14:08:36Z

@pnkfelix, I just updated LLVM last night, would you mind trying that to see if this problem is fixed now? Just in the rare case that they caught this so far :)

pnkfelix · 2013-09-16T15:07:44Z

@alexcrichton I'll do you one better: I grabbed a more recent LLVM and built a newer llc on its own.

Problem still reproduces with the latest version I can build. So I'll probably look into filing an LLVM bug; I'll probably venture onto their chat room soon.

Most recently tested LLVM version (which still has the problem):

commit e5c8c5a1bcecff7e2aa60672be6af2062ad63e6a
Author: Evgeniy Stepanov <eugeni.stepanov@gmail.com>
Date:   Mon Sep 16 13:24:32 2013 +0000

    [msan] Check return value of main().


    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190782 91177308-0d34-0410-b5e6-96231b3b80d8

alexcrichton · 2013-09-16T21:04:16Z

Ah well, one can always hope at least. Thanks for checking though!

pnkfelix · 2013-09-17T01:57:05Z

At this point I have filed a bug against LLVM, here:
http://llvm.org/bugs/show_bug.cgi?id=17261

Not much else we can do right now except (1.) stop using invoke, or (2.) fix the LLVM problem ourselves.

alexcrichton · 2013-12-17T07:43:51Z

The code in the initial report now works just fine, flagging as needstest

Closes #5806 Closes #5950 Closes #7178 Closes #8259 Closes #8578 Closes #8851 Closes #9129 Closes #10412

pnkfelix mentioned this issue Sep 13, 2013

debug no-optimize make check yields three failures #9127

Closed

brson added a commit to brson/rust that referenced this issue Sep 13, 2013

Work around a compiler crash folding labeled break. rust-lang#9129

c62919f

Servo is hitting this problem, so this is a workaround for a lack of a real solution.

brson mentioned this issue Sep 13, 2013

Work around a compiler crash folding labeled break. #9129 #9176

Closed

alexcrichton closed this as completed in 3e04d2e Dec 18, 2013

alexcrichton added a commit that referenced this issue Dec 19, 2013

Adding tests for closed issues

8c6315f

Closes #5806 Closes #5950 Closes #7178 Closes #8259 Closes #8578 Closes #8851 Closes #9129 Closes #10412

Endilll mentioned this issue Aug 8, 2023

invoke of fcn ptr under -O0 isnt correctly restoring spilled reg before callq llvm/llvm-project#17635

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

debug no-optimize OS X make check-fast exposes segfault in rustc #9129

debug no-optimize OS X make check-fast exposes segfault in rustc #9129

pnkfelix commented Sep 11, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

huonw commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

metajack commented Sep 13, 2013

jbclements commented Sep 13, 2013

jbclements commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

metajack commented Sep 13, 2013

brson commented Sep 13, 2013

nikomatsakis commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 16, 2013

pnkfelix commented Sep 16, 2013

alexcrichton commented Sep 16, 2013

pnkfelix commented Sep 16, 2013

alexcrichton commented Sep 16, 2013

pnkfelix commented Sep 17, 2013

alexcrichton commented Dec 17, 2013

debug no-optimize OS X make check-fast exposes segfault in rustc #9129

debug no-optimize OS X make check-fast exposes segfault in rustc #9129

Comments

pnkfelix commented Sep 11, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

huonw commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

metajack commented Sep 13, 2013

jbclements commented Sep 13, 2013

jbclements commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

metajack commented Sep 13, 2013

brson commented Sep 13, 2013

nikomatsakis commented Sep 13, 2013

pnkfelix commented Sep 13, 2013

pnkfelix commented Sep 16, 2013

pnkfelix commented Sep 16, 2013

alexcrichton commented Sep 16, 2013

pnkfelix commented Sep 16, 2013

alexcrichton commented Sep 16, 2013

pnkfelix commented Sep 17, 2013

alexcrichton commented Dec 17, 2013