Rubinius crash during a normal test run (that used to pass on 2.0.0dev) #1992

Closed
jfredett opened this Issue Nov 7, 2012 · 24 comments

5 participants

@jfredett
Contributor
jfredett commented Nov 7, 2012

The rbx report feature failed to post the crash dump; I'm hoping I'll be able to get the dump up here at the end of the issue.

To reproduce:

  1. git clone git://github.com/jfredett/exegesis

  2. The .rvmrc should be committed; it points to rbx-head@exegesis. rbx-head --version for me gives:

rubinius 2.0.0rc1 (1.9.3 release 2012-11-02 JI) [x86_64-apple-darwin12.0.0]

  3. Run rake to run the tests. It should result in a single failure, which passes when you run the file individually via rspec (the file is spec/unit/flyweight_spec).

Info about my system:


uname -a
#=> Darwin JGF.local 12.0.0 Darwin Kernel Version 12.0.0: Sun Jun 24 23:00:16 PDT 2012; root:xnu-2050.7.9~1/RELEASE_X86_64 x86_64 i386 MacBookPro8,2 Darwin

^-- late 2011 model MBP, i7, 4GB RAM. Running OS X 10.8

@jfredett
Contributor
jfredett commented Nov 7, 2012

The crash dump won't paste for some reason; I think it must contain an invalid character. I have it saved as a dump which will probably email okay, if someone wants to see the original dump.

@jc00ke
Member
jc00ke commented Nov 7, 2012

Can you gist the dump?

@jfredett
Contributor
jfredett commented Nov 7, 2012

For some reason, it will not post. Gist complains that the post is empty (as do pastie and pastebin). I think there might be an invalid character or something in the dump.

I have it saved locally, and vim can open it fine -- I suppose I could screenshot it… better than nothing, right?

/Joe

@jc00ke
Member
jc00ke commented Nov 7, 2012

Sure, a screenshot would work 😉

@Locke23rus
Member

@jfredett you can try cat ~/.rubinius_last_error and then paste to gist. It works for me. ;)

@jfredett
Contributor
jfredett commented Nov 7, 2012

I'll give that a shot; if not, a screenshot should definitely work.

@jfredett
Contributor
jfredett commented Nov 7, 2012
/Users/jfredett/.rvm/rubies/rbx-head/bin/rbx -S rspec ./spec/integration/flyweight_registerable_spec.rb ./spec/integration/visitor_spec.rb ./spec/unit/base_directory_spec.rb ./spec/unit/directory_spec.rb ./spec/unit/file_searcher_spec.rb ./spec/unit/flyweight_spec.rb ./spec/unit/helpers_spec.rb ./spec/unit/source_file_spec.rb -c -f progress
........................................................F...................*
---------------------------------------------
CRASH: A fatal error has occurred.

Backtrace:
0   rbx                                 0x000000010f4055d0 _ZN8rubiniusL12segv_handlerEi + 544
1   libsystem_c.dylib                   0x00007fff93f2092a _sigtramp + 26
2   ???                                 0x00007fcc818e2dc8 0x0 + 140516323634632
3   rbx                                 0x000000010f412a5a _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 10186
4   rbx                                 0x000000010f47e0fd _ZN8rubinius11MachineCode19execute_specializedINS_17SplatOnlyArgumentEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1005
5   rbx                                 0x000000010f412a5a _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 10186
6   rbx                                 0x000000010f47e0fd _ZN8rubinius11MachineCode19execute_specializedINS_17SplatOnlyArgumentEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1005
7   rbx                                 0x000000010f40f18d _ZN8rubinius11InlineCache11empty_cacheEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 629
8   rbx                                 0x000000010f4126f6 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9318
9   rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
10  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
11  rbx                                 0x000000010f51b2e0 _ZN8rubinius16BlockEnvironment10call_underEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 372
12  rbx                                 0x000000010f493903 _ZN8rubinius10Primitives16block_call_underEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 171
13  rbx                                 0x000000010f4126f6 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9318
14  rbx                                 0x000000010f47d807 _ZN8rubinius11MachineCode19execute_specializedINS_16GenericArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1063
15  rbx                                 0x000000010f40ed88 _ZN8rubinius11InlineCache19empty_cache_privateEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 554
16  rbx                                 0x000000010f41283c _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9644
17  rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
18  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
19  rbx                                 0x000000010f51b460 _ZN8rubinius16BlockEnvironment4callEPNS_5StateEPNS_9CallFrameERNS_9ArgumentsEi + 68
20  rbx                                 0x000000010f4137e7 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 13655
21  rbx                                 0x000000010f47d807 _ZN8rubinius11MachineCode19execute_specializedINS_16GenericArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1063
22  rbx                                 0x000000010f40f18d _ZN8rubinius11InlineCache11empty_cacheEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 629
23  rbx                                 0x000000010f41283c _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9644
24  rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
25  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
26  rbx                                 0x000000010f51a450 _ZN8rubinius13BlockAsMethod14block_executorEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 430
27  rbx                                 0x000000010f40eb58 _ZN8rubinius11InlineCache17empty_cache_vcallEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 554
28  rbx                                 0x000000010f4125d2 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9026
29  rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
30  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
31  rbx                                 0x000000010f51b2e0 _ZN8rubinius16BlockEnvironment10call_underEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 372
32  rbx                                 0x000000010f493903 _ZN8rubinius10Primitives16block_call_underEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 171
33  rbx                                 0x000000010f4126f6 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9318
34  rbx                                 0x000000010f47d807 _ZN8rubinius11MachineCode19execute_specializedINS_16GenericArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1063
35  rbx                                 0x000000010f40ed88 _ZN8rubinius11InlineCache19empty_cache_privateEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 554
36  rbx                                 0x000000010f41283c _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9644
37  rbx                                 0x000000010f47dc94 _ZN8rubinius11MachineCode19execute_specializedINS_11NoArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1044
38  rbx                                 0x000000010f40eb58 _ZN8rubinius11InlineCache17empty_cache_vcallEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 554
39  rbx                                 0x000000010f4125d2 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9026
40  rbx                                 0x000000010f47d807 _ZN8rubinius11MachineCode19execute_specializedINS_16GenericArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1063
41  rbx                                 0x000000010f40ed88 _ZN8rubinius11InlineCache19empty_cache_privateEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 554
42  rbx                                 0x000000010f4126f6 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9318
43  rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
44  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
45  rbx                                 0x000000010f51b2e0 _ZN8rubinius16BlockEnvironment10call_underEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 372
46  rbx                                 0x000000010f493903 _ZN8rubinius10Primitives16block_call_underEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 171
47  rbx                                 0x000000010f4126f6 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9318
48  rbx                                 0x000000010f47d807 _ZN8rubinius11MachineCode19execute_specializedINS_16GenericArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1063
49  rbx                                 0x000000010f40f18d _ZN8rubinius11InlineCache11empty_cacheEPNS_5StateEPS0_PNS_9CallFrameERNS_9ArgumentsE + 629
50  rbx                                 0x000000010f41283c _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9644
51  rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
52  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
53  rbx                                 0x000000010f51b460 _ZN8rubinius16BlockEnvironment4callEPNS_5StateEPNS_9CallFrameERNS_9ArgumentsEi + 68
54  rbx                                 0x000000010f4137e7 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 13655
55  rbx                                 0x000000010f47dc94 _ZN8rubinius11MachineCode19execute_specializedINS_11NoArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1044
56  rbx                                 0x000000010f41283c _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9644
57  rbx                                 0x000000010f47ea34 _ZN8rubinius11MachineCode19execute_specializedINS_12TwoArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1060
58  rbx                                 0x000000010f4126f6 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 9318
59  rbx                                 0x000000010f51afce _ZN8rubinius16BlockEnvironment19execute_interpreterEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 1150
60  rbx                                 0x000000010f51b140 _ZN8rubinius16BlockEnvironment6invokeEPNS_5StateEPNS_9CallFrameEPS0_RNS_9ArgumentsERNS_15BlockInvocationE + 260
61  rbx                                 0x000000010f51b460 _ZN8rubinius16BlockEnvironment4callEPNS_5StateEPNS_9CallFrameERNS_9ArgumentsEi + 68
62  rbx                                 0x000000010f4137e7 _ZN8rubinius11MachineCode11interpreterEPNS_5StateEPS0_PNS_20InterpreterCallFrameE + 13655
63  rbx                                 0x000000010f47dc94 _ZN8rubinius11MachineCode19execute_specializedINS_11NoArgumentsEEEPNS_6ObjectEPNS_5StateEPNS_9CallFrameEPNS_10ExecutableEPNS_6ModuleERNS_9ArgumentsE + 1044


Wrote full error report to: /Users/jfredett/.rubinius_last_error
Run 'rbx report' to submit this crash report!
rake aborted!
/Users/jfredett/.rvm/rubies/rbx-head/bin/rbx -S rspec ./spec/integration/flyweight_registerable_spec.rb ./spec/integration/visitor_spec.rb ./spec/unit/base_directory_spec.rb ./spec/unit/directory_spec.rb ./spec/unit/file_searcher_spec.rb ./spec/unit/flyweight_spec.rb ./spec/unit/helpers_spec.rb ./spec/unit/source_file_spec.rb -c -f progress failed

Tasks: TOP => default => spec
(See full trace by running task with --trace)
@jfredett
Contributor
jfredett commented Nov 7, 2012

Victory! @Locke23rus's idea worked. Odd that copy-pasting from vim, MacVim, and TextMate all failed, but catting worked...

@dbussink
Member
dbussink commented Nov 8, 2012

I just pushed 6e5968d, could you try whether that fixes the problem for you?

@jfredett
Contributor
jfredett commented Nov 8, 2012

@dbussink It doesn't appear to. I should add some more information I gathered last night with the help of @brixen:

  1. It does not appear to happen on archlinux w/ a fresh install of 2.0.0rc1
  2. It continues to happen on 10.8, even with the current version of master as of last night, as well as your recent commit.
  3. When running under gdb, the error originates from line 431 of vm/inline_cache.cpp, due to an EXC_BAD_ACCESS at memory address 0x<62-zeros>42. I don't know enough about how this big pile of C++ works to make any sort of headway as to why (off topic: I would love to be directed to resources so I can learn).

I'm going to attempt to replicate the bug on another machine at work this morning, to see if it happens consistently on 10.8. @brixen noted that he is on 10.6, so perhaps it is a Mountain Lion-specific bug.

@dbussink
Member
dbussink commented Nov 8, 2012

Well, I run 10.8 here too and it works fine locally, so I don't expect that to be the problem. http://rubini.us/2012/01/04/debugging-rubinius/ has some stuff where I am debugging an issue and looking at things in memory, so that might help.

If you can catch the crash, feel free to hop into irc to look at it. Another option would perhaps be doing a screen-sharing session to debug, if you want.

@jfredett
Contributor
jfredett commented Nov 8, 2012

Okay, well -- since it works fine on 10.8 for you, that isolates the
problem to me locally. Given that I installed via rvm, which compiles rbx
on install, I would suppose the issue is in one of the external,
C-level dependencies -- perhaps a loose version specification for
something?

Curious that the crash is so consistent on my end, and yet irreproducible
for every other place I tried.

@dbussink
Member
dbussink commented Nov 8, 2012

The problem is that this error doesn't point at anything external, but at Rubinius itself. There is also no reason to believe it is related to rvm. Did you try building Rubinius itself from a clone on that machine with DEV=1?

@jfredett
Contributor
jfredett commented Nov 8, 2012

Yes. That's how I got to the GDB output. I'll dig around some more and see if I can find where the actual bad access is coming from -- all the arguments to the method being called (as well as the thing it's being called on) seem to be okay.

I'll dig around and hopefully find something more interesting.

@dbussink
Member
dbussink commented Nov 8, 2012

That all the arguments seem correct is actually already very important information and raises a suspicion :). If you hop into irc we could investigate further.

@jfredett
Contributor
jfredett commented Nov 9, 2012

@dbussink https://gist.github.com/4043418

I may have stubbed a #new after doing SomeClass.private_class_method :new, :allocate -- which is something normal (well, sane) people wouldn't do. To be fair, I think I copied it out of the impl of Singleton in the rubinius source...

That gist contains a minimal replication of the bug I encountered (actually just replicates the assertion error, not the full crash... I'll have to double check it in the morning).
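
For readers without access to the gist, here is a rough sketch of the pattern described above. It is not the gist's code: `Registry`, `fake`, and the hand-rolled stub via `define_singleton_method` are stand-ins I'm assuming for illustration (the real spec presumably used rspec's stubbing instead).

```ruby
# Hypothetical class; the names here are illustrative, not from the exegesis repo.
class Registry
  def self.instance
    @instance ||= new
  end

  # Hide the constructors, in the style of the Singleton pattern.
  private_class_method :new, :allocate
end

# Calling the now-private constructor from outside raises NoMethodError.
blocked =
  begin
    Registry.new
    false
  rescue NoMethodError
    true
  end
puts "direct .new blocked: #{blocked}"

# Stand-in for a test stub: redefine .new on the singleton class so it
# returns a canned object. This is the "stub a #new after making it private" step.
fake = Object.new
Registry.define_singleton_method(:new) { fake }

puts "stubbed .new returns the fake: #{Registry.new.equal?(fake)}"
```

On MRI this simply replaces the private `new` with a public one; whether it exercises the same inline cache path that crashed here is not something this sketch establishes.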

@dbussink
Member
dbussink commented Nov 9, 2012

This is really weird. I just ran it locally and it works fine, so there is no nil entry in the cache for me.

@jfredett
Contributor

Some more things I found (that I have mentioned in irc), that I'm going to leave here for posterity:

The first 'bad' commit for me is 5067035, a change to fix #1813 which, to my understanding, alters how aliasing a method works (to prevent an infinite recursion).

Given that this problem apparently happens only for me, I don't see it being resolved until someone else can actually replicate it. For my part, the problem is resolved by removing some admittedly evil code intended to prevent clients of a particular module from using #new to create objects including that module (the goal being to allow you to create flyweights by including a module). I don't suspect many people are willing to go to such extremes to abuse ruby, so I don't see this being a problem for anyone else.

It's probably okay to close this issue as 'cannot reproduce' or whatever; I don't know what the procedure around here is.
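
To make the "evil code" concrete, a minimal sketch of the general idea follows: a module that, when included, hides `new` and hands out cached instances instead. The names (`Flyweight`, `ClassMethods`, `create`, `Glyph`) are hypothetical and this is an assumption about the shape of the code, not what exegesis actually contained.

```ruby
# Hypothetical module; including it turns a class into a flyweight factory.
module Flyweight
  def self.included(base)
    base.extend(ClassMethods)
    # Prevent clients from building instances directly.
    base.private_class_method :new, :allocate
  end

  module ClassMethods
    # Return the cached instance for +key+, building it on first use.
    # The implicit-receiver call to `new` is allowed even though it is private.
    def create(key)
      @instances ||= {}
      @instances[key] ||= new(key)
    end
  end
end

# Hypothetical client class.
class Glyph
  include Flyweight

  attr_reader :char

  def initialize(char)
    @char = char
  end
end

a = Glyph.create("a")
b = Glyph.create("a")
puts "same object for same key: #{a.equal?(b)}"
```

The design choice worth noting is that `create` is the only public way in, so any test that stubs `new` on such a class is stubbing a private singleton method.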

@brixen
Member
brixen commented Feb 26, 2013

@jfredett closing based on your feedback; please let us know if you have any issues that we can investigate.

@brixen brixen closed this Feb 26, 2013
@jfredett
Contributor

It occurred to me the other day that I stumbled across an issue in another project, on another ruby (MRI 1.9.3), which has similar, heisenbuggy characteristics to this one, namely:

https://github.com/banister/binding_of_caller/issues/14#issuecomment-24855695

Similarly to my case, it fails only for some people. I likely did have Binding-of-caller included in the minimal reproduction case I did, because it's part of pry (which is why this bug has been hitting people at all).

Weirdly, it doesn't happen for everyone, and even more odd is that it doesn't necessarily happen on two otherwise apparently identical machines running the same code. These characteristics are similar enough to my case that it leads me to believe that maybe they are related.

It should be noted that this bug has only popped up on MRI, and not on RBX (according to that issue), but B-of-C does use C extensions, and as I recall Rubinius does a pretty good job of emulating MRI behavior for those, so it's not out of the realm of possibility that you might experience similar failures when a good gem goes bad.

Ultimately, it's not critical for me to have this fixed, but as someone who loathes an unopened box, I figured it might be worthwhile to post this potential connection here for posterity.

@ghost
ghost commented Sep 23, 2013

@jfredett binding_of_caller on rubinius is actually a pure ruby implementation.

@jfredett
Contributor

Drats, I was hoping that might be the box that needed opening. Oh well, perhaps it's ours not to reason why.

@ghost
ghost commented Sep 23, 2013

It may be that binding_of_caller is finding a bug in one of Rubinius's APIs. Dunno, though.
