Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug no-optimize OS X make check-fast exposes segfault in rustc #9129

Closed
pnkfelix opened this issue Sep 11, 2013 · 21 comments
Closed

debug no-optimize OS X make check-fast exposes segfault in rustc #9129

pnkfelix opened this issue Sep 11, 2013 · 21 comments
Labels
E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added.

Comments

@pnkfelix
Copy link
Member

Probably related to #9127 (or at least the stack traces look similar to me):

CFG_CONFIGURE_ARGS   := --enable-debug --disable-optimize --enable-ccache --enable-clang --prefix=~/opt/rust-dbg-nopt
% x86_64-apple-darwin/stage2/bin/rustc --cfg stage2   --cfg debug -Z no-debug-borrows --target=x86_64-apple-darwin  --lib -o x86_64-apple-darwin/stage2/lib/rustc/x86_64-apple-darwin/lib/librun_pass_stage2.dylib tmp/run_pass_stage2.rc
Bus error: 10

Update: Here is a isolated test case (importantly, it does not depend on pulling in libsyntax):

pub trait bomb { fn boom(@self, Ident); }
pub struct S;
impl bomb for S { fn boom(@self, _: Ident) { } }

pub struct Ident { name: uint }

// macro_rules! int3( () => ( unsafe { asm!( "int3" ); } ) )
macro_rules! int3( () => ( { } ) )

fn Ident_new() -> Ident {
    int3!();
    Ident {name: 0x6789ABCD }
}

pub fn light_fuse(fld: @bomb) {
    int3!();
    let f = || {
        int3!();
        fld.boom(Ident_new()); // *** 1
    };
    f();
}

fn main() {
    let b = @S as @bomb;
    light_fuse(b);
}
@pnkfelix
Copy link
Member Author

Bisecting the contents of run_pass_stage2.rc, the problem down to one particular test:

% x86_64-apple-darwin/stage2/bin/rustc --cfg stage2   --cfg debug -Z no-debug-borrows \
  --target=x86_64-apple-darwin  --lib \
  -o x86_64-apple-darwin/stage2/lib/rustc/x86_64-apple-darwin/lib/librun_pass_stage2.dylib \
  ../src/test/run-pass/issue-2216.rs
Segmentation fault: 11

@pnkfelix
Copy link
Member Author

Further narrowing reveals that the problem seems to be when we have either of labelled ExprBreak or ExprAgain statements:

% cat /tmp/issue-2216.rs
#[cfg(labelled_loop)]
pub fn main() { 'foo: loop { loop 'foo; } }

#[cfg(labelled_break)]
pub fn main() { 'foo: loop { break 'foo; } }
%  x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_break  /tmp/issue-2216.rs
Bus error: 10
%  x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_loop  /tmp/issue-2216.rs
Bus error: 10
% grep CONFIGURE_ARGS config.mk
CFG_CONFIGURE_ARGS   := --enable-debug --disable-optimize --enable-clang --prefix=~/opt/rust-dbg-nopt
% 

@huonw
Copy link
Member

huonw commented Sep 13, 2013

Is the segfault/bus error due to infinite recursion?

@pnkfelix
Copy link
Member Author

@huonw no, I already considered that, but the stack isn't that large, especially once you try on one of the narrowed test cases.

My current hypothesis is that it has something to do with these bits of code, like:

(update; correction to below, I have cut-and-pasted some code into config.rs for instrumentation, but the origin of the code in question is fold.rs)
config.rs fold.rs says:

        ExprBreak(ref opt_ident) => {
            // FIXME #6993: add fold_name to fold.... then cut out the
            // bogus Name->Ident->Name conversion.
            ExprBreak(opt_ident.map_move(|x| fld.fold_ident(Ident::new(x)).name))
        }
        ExprAgain(ref opt_ident) => {
            // FIXME #6993: add fold_name to fold....
            ExprAgain(opt_ident.map_move(|x| fld.fold_ident(Ident::new(x)).name))
        }

but I have not figured out what is actually wrong there yet, if anything.

@pnkfelix
Copy link
Member Author

Further discovery: the problem seems to be the subexpression Ident::new(x) as a direct argument to fold_indent. If I lift the expression out and bind it before the call, things work.

The cute thing here is that since I am seeing the same bug on both ExprBreak and on ExprAgain, I can quite directly illustrate the difference (with much the same test case that I used before):

% cat /tmp/issue-2216.rs

#[cfg(labelled_again)]
pub fn main() { 'foo: loop { loop 'foo; } }

#[cfg(labelled_break)]
pub fn main() { 'foo: loop { break 'foo; } }

#[cfg(labelled_for)]
pub fn main() { 'foo: for i in range(1,10) { } }

The relevant modification to the code in fold.rs (as well as anywhere else I happened to cut-and-paste that code):

        ExprBreak(ref opt_ident) => {
            debug!("fold ExprBreak input: %?", opt_ident);
            let ret = ExprBreak(opt_ident.map_move(|x| {
                        // let i1 = Ident::new(x);
                        let i2 = fld.fold_ident(Ident::new(x)); // *** 1
                        i2.name
                    }));
            debug!("fold ExprBreak output: %?", ret);
            ret
        }
        ExprAgain(ref opt_ident) => {
            debug!("fold ExprAgain input: %?", opt_ident);
            let ret = ExprAgain(opt_ident.map_move(|x| {
                        let i1 = Ident::new(x);
                        let i2 = fld.fold_ident(i1); // *** 2
                        i2.name
                    }));
            debug!("fold ExprAgain input: %?", ret);
            ret
        }

The first starred case, with ExprBreak, will break. The second starred case, with ExprAgain, will work:

%   x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_break  /tmp/issue-2216.rs
Bus error: 10
%   x86_64-apple-darwin/stage2/bin/rustc --cfg labelled_again  /tmp/issue-2216.rs
warning: no debug symbols in executable (-arch x86_64)
% 

I'm guessing one next step would be to compare the generated code for the two different closures.

@pnkfelix
Copy link
Member Author

cc @jbclements for the injection. cc @nikomatsakis for the potential bogosity in borrowck (right?)

@metajack
Copy link
Contributor

This small program also causes a similar bus error, and @pnkfelix seemed to think it might be related:

fn main() {
    macro_rules! breakme(
        ($value: expr) => {
            break 'foo;
        }
    )

    breakme!("foo");
}

@jbclements
Copy link
Contributor

@pnkfelix does this still occur for you on master? I just failed to duplicated it under OS X using 248765a, and I'm wondering if my fix for #9110 might also have fixed this bug.

@jbclements
Copy link
Contributor

@metajack I can't reproduce your crash on 248765a ... in your case, I'm thinking maybe the next merge was the issue?

@pnkfelix
Copy link
Member Author

@jbclements I'll check about master. Just to confirm, you did use the same set of CONFIGURE_ARGS that I had in the description, right? (Just double-checking.)

@metajack
Copy link
Contributor

I think I am using only --enable-debug in my compiles.

@brson
Copy link
Contributor

brson commented Sep 13, 2013

I also can't reproduce the problem with either @pnkfelix or @metajack's test cases, though I can reproduce the one in servo.

brson added a commit to brson/rust that referenced this issue Sep 13, 2013
Servo is hitting this problem, so this is a workaround for a lack of a real solution.
@nikomatsakis
Copy link
Contributor

@pnkfelix do we have any better idea what the generated code is doing that is causing to crash? could be a bug in borrowck, but it's hard to say

@pnkfelix
Copy link
Member Author

@nikomatsakis no, I don't have a real idea yet. Kind of scary that so few of us can replicate in the small isolated cases (so far just me and @lkuper have said so). My plan for next step is to actually compare the two generated code sequences within the closure, but I did not get a chance to do that today. I will either attempt to do it over the weekend (might be tricky, got family in town) or on Monday.

bors added a commit that referenced this issue Sep 14, 2013
Servo is hitting this problem, so this is a workaround for lack of a real solution.

No tests because I couldn't actually reproduce the problem with either of the testcases in #9129
@pnkfelix
Copy link
Member Author

Status report: I've made side-by-side comparisons of both the generated LLVM bitcode (as .ll disassembly) and also the generated machine code. Its too much material to post here, but this is the link: https://gist.github.com/pnkfelix/6579174

(You might notice some small differences between e.g. the .rs input and the generated LLVM, because the .rs input has some changes to e.g. insert int3 instructions so that I could look at the dynamic execution under gdb. The machine code I present in the gist should actually correspond to the .rs code.)

I do not know .ll well enough to immediately tell whether the bug lies in trans (i.e. if we are generated bogus input to LLVM) or if this is an LLVM bug (i.e. they are mistranslating our input). The problem seems to be that the generated code is not correctly extracting the address for the fld.fold_ident method in the broken case.

In particular: for the LLVM input of the broken case, the main differences are that:

  1. the invocation of Ident::new occurs a bit later in the control flow, basically right before we do call void %17(%"struct.syntax::ast::Ident[#2]"* %i2, { i64, %tydesc*, i8*, i8*, i8 }* %11, %"struct.syntax::ast::Ident[#2]"* %2) which I infer is the invocation of fld.fold_ident.
  2. In the working variant, the invocation of Ident::new is an LLVM call instruction; there is no explicit exception handling landing pad in the bitcode. In the broken variant, the invocation of Ident::new is an LLVM invoke instruction, with an explicit exception handler, i.e. unwind and cleanup code.
  3. In both variants there is some amount of code separating the load of the address of fld.fold_ident into a temp variable and the subsequent call on that temp variable. In the working code, the intervening code is a pair of bitcasts and a call to @llvm.memcpy.p0i8.p0i8.i64 (but perhaps that ends up being an LLVM intrinsic?); in the broken code, the intervening code is a load of the argument to Ident::new and the actual invocation of Ident::new.

@pnkfelix
Copy link
Member Author

At this point I'm pretty confident this is exposing a bug in LLVM.

After making a standalone test case (see updated bug description), I was able to generate bitcode from --save-temps, and then hand-edit the resulting .ll file to figure out which differences seem to actually matter to LLVM, using llc (as well as linking the result via the command line presented by rustc -Z print-link-args).

Starting from the broken .ll code, I discovered:

  1. Compiling with -O0 was necessary to see breakage; putting in -O1 makes the problem go away.
  2. Replacing the invoke instruction with a call instruction makes the problem go away.
  3. I can use the -filetype=asm argument to llc to get inspectable assembly, with some helpful annotations indicating the purpose of certain instructions (yay), and then use ediff to get side-by-side comparisons of the two results (comparing the result when using call instruction for the working variant and the invoke instruction for the broken variant):

Here's the source input for this experiment: https://gist.github.com/pnkfelix/6580915 (the basis of the .ll code is the test in this bug description).

Working code fragment:

    .section    __TEXT,__text,regular,pure_instructions
    .align  4, 0x90
__ZN10light_fuse4anon7expr_fn2anE:      ## @_ZN10light_fuse4anon7expr_fn2anE
    .cfi_startproc
## BB#0:                                ## %function top level
    subq    $56, %rsp
Ltmp12:
    .cfi_def_cfa_offset 64
    movabsq $0, %rax
    leaq    40(%rsp), %rsi
    movq    32(%rdi), %rdi
    movq    (%rdi), %rcx
    movq    %rcx, 40(%rsp)
    movq    8(%rdi), %rcx
    movq    %rcx, 48(%rsp)
    movq    %rax, %rdi
    callq   __ZN20_$SP$bomb.$x27static9glue_take19h9f15e2dd62f678e4arE
    leaq    32(%rsp), %rdi
                                        ## implicit-def: RSI
    movq    48(%rsp), %rax
    movq    40(%rsp), %rcx
    movq    8(%rcx), %rcx
    movq    %rcx, 8(%rsp)           ## 8-byte Spill
    movq    %rax, (%rsp)            ## 8-byte Spill
    callq   __ZN9Ident_new16h4688b2c57373e544v0.0E
    leaq    32(%rsp), %rsi
    movq    (%rsp), %rdi            ## 8-byte Reload
    movq    8(%rsp), %rax           ## 8-byte Reload
    callq   *%rax

Broken code fragment

    .section    __TEXT,__text,regular,pure_instructions
    .align  4, 0x90
__ZN10light_fuse4anon7expr_fn2anE:      ## @_ZN10light_fuse4anon7expr_fn2anE
    .cfi_startproc
    .cfi_personality 155, _upcall_rust_personality
Leh_func_begin5:
    .cfi_lsda 16, Lexception5
## BB#0:                                ## %function top level
    subq    $88, %rsp
Ltmp15:
    .cfi_def_cfa_offset 96
    movq    32(%rdi), %rdi
    vmovups (%rdi), %xmm0
    vmovaps %xmm0, 64(%rsp)
    xorl    %eax, %eax
    movl    %eax, %edi
    leaq    64(%rsp), %rsi
    callq   __ZN20_$SP$bomb.$x27static9glue_take19h9f15e2dd62f678e4arE
    movq    64(%rsp), %rsi
    movq    72(%rsp), %rdi
    movq    8(%rsi), %rsi
Ltmp11:
    leaq    56(%rsp), %rcx
    movq    %rdi, 32(%rsp)          ## 8-byte Spill
    movq    %rcx, %rdi
                                        ## implicit-def: RCX
    movq    %rsi, 24(%rsp)          ## 8-byte Spill
    movq    %rcx, %rsi
    callq   __ZN9Ident_new16h4688b2c57373e544v0.0E
Ltmp12:
    jmp LBB5_1
LBB5_1:                                 ## %normal return
    leaq    56(%rsp), %rsi
    movq    32(%rsp), %rdi          ## 8-byte Reload
    movq    16(%rsp), %rax          ## 8-byte Reload
    callq   *%rax

Look at the spill and reload instructions above: In the working fragment, %rcx is spilled to 8(%rsp) and then that location is reloaded to %rax. In the broken fragment, if you try to trace backward to figure out what %rax is, it seems like nonsense to me: it tries to load it from 16(%rsp), but there is no spill to that location that I can see in this function. It seems to me that in the broken fragment, the corresponding register with the right value prior to the invocation of Indent::new is %rsi, but that register is spilled to 24(%rsp), not 16(%rsp).

(How much do you want to bet that this is some code in the implementation of the LLVM invoke instruction that is doing an incorrect arithmetic calculation...)

@alexcrichton
Copy link
Member

@pnkfelix, I just updated LLVM last night, would you mind trying that to see if this problem is fixed now? Just in the rare case that they caught this so far :)

@pnkfelix
Copy link
Member Author

@alexcrichton I'll do you one better: I grabbed a more recent LLVM and built a newer llc on its own.

Problem still reproduces with the latest version I can build. So I'll probably look into filing an LLVM bug; I'll probably venture onto their chat room soon.

Most recently tested LLVM version (which still has the problem):

commit e5c8c5a1bcecff7e2aa60672be6af2062ad63e6a
Author: Evgeniy Stepanov <eugeni.stepanov@gmail.com>
Date:   Mon Sep 16 13:24:32 2013 +0000

    [msan] Check return value of main().


    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190782 91177308-0d34-0410-b5e6-96231b3b80d8

@alexcrichton
Copy link
Member

Ah well, one can always hope at least. Thanks for checking though!

@pnkfelix
Copy link
Member Author

At this point I have filed a bug against LLVM, here:
http://llvm.org/bugs/show_bug.cgi?id=17261

Not much else we can do right now except (1.) stop using invoke, or (2.) fix the LLVM problem ourselves.

@alexcrichton
Copy link
Member

The code in the initial report now works just fine, flagging as needstest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
Projects
None yet
Development

No branches or pull requests

7 participants