Memory corruption causing sudden death of processes. #3658

Closed
digitalextremist opened this issue May 28, 2016 · 5 comments
Comments

@digitalextremist
Member

digitalextremist commented May 28, 2016

Seems to be similar error output to: #2674


At seemingly random points, one of the following errors is printed to the calling terminal (not STDOUT or STDERR), and there could be more:

*** Error in `ruby': free(): corrupted unsorted chunks: 0x00007fb21c000c20 ***
*** Error in `ruby': malloc(): memory corruption: 0x00007fb21c001040 ***
*** Error in `ruby': corrupted double-linked list: 0x00007fe72dbd6250 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f19a448a930 ***
*** Error in `ruby': malloc(): memory corruption: 0x00007f38beb4b888 ***
*** Error in `ruby': double free or corruption (out): 0x00007f2740001070 ***

No discernible way to reproduce.

Configuration Details

  • rubinius 3.33 (2.2.2 db6f477e 2016-05-23 3.6.0) [x86_64-linux-gnu]
  • Linux #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
  • Ubuntu 14.04.4 LTS

No stacktrace; sudden death.

Seems related to the thread context issue @brixen is already aware of. I have experienced this in 3.33, as reported here, as well as in 3.32 and 3.31, but cannot verify the exact messages those versions produced.
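
Editorial sketch, not part of the original report: the messages above bypass STDOUT and STDERR most likely because glibc versions of this era write fatal diagnostics to /dev/tty unless the `LIBC_FATAL_STDERR_` environment variable is set. Assuming that behavior applies to the glibc shipped with Ubuntu 14.04, a wrapper along these lines could collect the messages in a log file. The function name `run_with_fatal_stderr` and the log path are hypothetical.

```python
import os
import subprocess


def run_with_fatal_stderr(argv, log_path):
    """Run argv with glibc fatal messages routed to stderr, appended to log_path.

    Returns the child's return code (negative if it was killed by a signal,
    for example -6 for SIGABRT on Linux).
    """
    env = dict(os.environ)
    # Assumption: the glibc in use honors LIBC_FATAL_STDERR_. When it is set
    # to a non-empty value, "*** Error in ..." lines go to stderr, not /dev/tty.
    env["LIBC_FATAL_STDERR_"] = "1"
    with open(log_path, "ab") as log:
        proc = subprocess.run(argv, env=env, stderr=log)
    return proc.returncode
```

A hypothetical call would be `run_with_fatal_stderr(["ruby", "worker.rb"], "crash.log")`, where `worker.rb` stands in for one of the affected processes.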

@digitalextremist
Member Author

@chuckremes is there any chance this could be influenced by ffi-rzmq, or are you pretty confident it's Rubinius itself? I ask because it seems like these are pointer-related, and I recall seeing a lot of memory pointers being defined to properly integrate with libzmq3 underneath.

The above errors happen on any one of 7 different processes built with Celluloid; each communicates by 0MQ via Celluloid::ZMQ using ffi-rzmq. There are anywhere from 1-9 different 0MQ connections open on the processes.

I tend toward thinking it's not ffi-rzmq because from time to time I also see strange errors which feel like a race condition in thread context switching. For example, totally impossible values pop up during I/O operations, where the selector spits up junk data. I am watching for that to happen again to catch it and add it to the ticket.

@digitalextremist
Member Author

digitalextremist commented May 30, 2016

More examples:

*** Error in `ruby': double free or corruption (!prev): 0x00007f5014575b20 ***
*** Error in `ruby': corrupted double-linked list: 0x00007f9ec80ee5c0 ***
*** Error in `ruby': corrupted double-linked list: 0x00007fb1c005fd60 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007faef0988bf0 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f86300cf800 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000000a38a00 ***
*** Error in `ruby': free(): invalid pointer: 0x00007f0510056bc0 ***
*** Error in `ruby': corrupted double-linked list: 0x00007f051001a270 ***
*** Error in `ruby': corrupted double-linked list: 0x00007fbc0233bee0 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x000000000151aa00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001aa4a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001358a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001c50a00 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f9bc84bdbc0 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000000f955e0 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001c80a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x00000000026fea00 ***
*** Error in `ruby': munmap_chunk(): invalid pointer: 0x00007fa11c163100 ***
*** Error in `ruby': corrupted double-linked list: 0x00007fa11c11e120 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x000000000270fa00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x00000000020e9a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001983a00 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f502cd29d40 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001eaaa00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000002681a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000002899a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x000000000195da00 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f8484002080 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001149a00 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f88ed993410 ***
*** Error in `ruby': double free or corruption (!prev): 0x00007f4db80d0560 ***
*** Error in `ruby': malloc(): memory corruption: 0x00007fdbcca36858 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000002654a00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000000c9fa00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x0000000001a8ea00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x00000000016efa00 ***
*** Error in `ruby': free(): corrupted unsorted chunks: 0x000000000286ba00 ***
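
Editorial sketch, not part of the original comment: to see which failure mode dominates in a list like the one above, the lines can be grouped by message kind. This assumes every line follows the `*** Error in ...` shape shown in this thread. The names `PATTERN` and `tally` are hypothetical.

```python
import re
from collections import Counter

# Matches glibc lines such as:
#   *** Error in `ruby': free(): invalid pointer: 0x00007f0510056bc0 ***
# and captures the message kind between the program name and the address.
PATTERN = re.compile(
    r"\*\*\* Error in `[^']*': (?P<kind>.+): 0x[0-9a-fA-F]+ \*\*\*"
)


def tally(lines):
    """Count glibc heap-corruption messages by kind, ignoring other lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts
```

Feeding the pasted output through `tally(text.splitlines()).most_common()` would rank the kinds by frequency, which may help narrow down which allocator path is failing most often.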

@brixen
Member

brixen commented May 31, 2016

@digitalextremist any chance you may be able to run one of these processes under a debugger? There are a number of places that free() could be called, so finding out where this is being called would help tremendously.

A few places that free() may be called include: 1. C-API via a C-ext, 2. FFI disposing of memory, 3. in the large object region during sweep, 4. at various places in Rubinius when heap memory is needed, including parts of the MachineCode instances and InlineCache mechanism (and a lot of other places).

So, this could be FFI related (and that would possibly implicate C-API as I'm pretty sure you're using the ffi gem, but correct me if not), C-API related other than FFI, GC related, interpreter related, concurrency related, etc.

If it's not possible to run under a debugger, would it be possible to 1. try to extract a repro, or 2. give me some sort of access to attempt to repro myself?
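
Editorial sketch, not part of the original comment: one way to run a process under a debugger without an interactive session is gdb's batch mode, which stops when glibc aborts the process and can then print a backtrace for every thread. The helper below only builds the command line. It assumes gdb is installed, and the function name `gdb_backtrace_command` and the script name `worker.rb` are hypothetical.

```python
def gdb_backtrace_command(argv):
    """Build a gdb invocation that runs argv and dumps all thread backtraces.

    -batch exits gdb after the listed commands, each -ex supplies one command,
    and --args passes the remaining arguments to the program being debugged.
    """
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "thread apply all bt",
        "--args", *argv,
    ]
```

For example, `subprocess.run(gdb_backtrace_command(["ruby", "worker.rb"]))` would start the process under gdb. If it dies on one of the errors above, the backtrace of the aborting thread should show which caller reached free() or malloc().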

@digitalextremist
Member Author

Ticket update... as of 3.38 I believe, the specific symptoms of this issue abated; but the underlying issue feels like it's there -- manifesting differently. See #3664 for continuation from here on; leaving open though, until #3664 is resolved.

@brixen
Member

brixen commented Jan 4, 2020

Much of the internals of Rubinius have been completely or mostly rewritten in the past couple years. This includes the garbage collector, concurrency facilities, Fibers, much of the instruction set, and a migration away from "primitive" functions that implement Ruby features.

Since a number of segfaults or process hangs have occurred in these features over time, this issue may be fixed.

The focus for Rubinius in the near term is on the following capabilities:

  1. Instruction set
  2. Debugger
  3. Profiler
  4. Just-in-time compiler
  5. Concurrency
  6. Garbage collector

Contributions in the form of PRs for any of the areas of focus above are appreciated. Once these capabilities are more robust, it will be possible to more efficiently debug and fix any process crashes.

Other than these core capabilities, PRs to fix any specific issue are always welcome.

@brixen closed this as completed Jan 4, 2020