There are still two things to do for this:
1. Use a separate thread for writing compilation cache files.
2. Prune the contents of ~/.rbx when it exceeds a threshold.
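As a rough illustration of item 2, pruning could be sketched as below. This is only a sketch under assumptions: the helper name prune_cache, the byte-size threshold, and the oldest-first eviction order are all hypothetical and not something Rubinius defines.

```ruby
require "find"

# Hypothetical helper: delete the least recently modified files under +root+
# until the total size is at most +max_bytes+. Returns the deleted paths.
def prune_cache(root, max_bytes)
  files = []
  Find.find(root) do |path|
    files << path if File.file?(path)
  end

  total = files.sum { |path| File.size(path) }
  removed = []

  # Assumption: evict oldest files first so recently written entries survive.
  files.sort_by { |path| File.mtime(path) }.each do |path|
    break if total <= max_bytes
    total -= File.size(path)
    File.delete(path)
    removed << path
  end

  removed
end
```

A call such as prune_cache(File.join(Dir.home, ".rbx"), 50 * 1024 * 1024) would trim the cache to roughly 50 MB; the threshold value here is arbitrary.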
Rubinius uses .rbc files to cache on disk the compiled bytecode for a Ruby
source code file. Typically, these cache files exist alongside the
corresponding .rb file; however, it is possible to collect all the cache files
into a single directory (and subdirectories) by hashing the full path to the
Ruby source file as a key to find a file in the cache directory.
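The hashing idea can be sketched in a few lines of Ruby. This is a minimal sketch, not the actual Rubinius implementation: the helper name cache_path_for, the choice of SHA-1, and the two-character subdirectory fan-out are assumptions made for illustration.

```ruby
require "digest/sha1"

# Hypothetical helper: map a Ruby source path to a cache file path under
# +cache_dir+. The hash function and directory layout are assumptions.
def cache_path_for(source_path, cache_dir)
  # Hash the full (absolute) path so the key is stable regardless of how
  # the file was referenced when it was loaded.
  key = Digest::SHA1.hexdigest(File.expand_path(source_path))

  # Use the first two hex characters as a subdirectory to keep any single
  # directory from growing too large.
  File.join(cache_dir, key[0, 2], key[2..-1])
end
```

The same source path always maps to the same cache file, and two different source paths map to different cache files unless their hashes collide.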
In Rubinius 2.0, we have multiple language modes. The bytecode for 1.8
language mode differs from the bytecode for 1.9 language mode. The .rbc file
format was extended to include the language version. This ensures that running the
same Ruby file in different modes will not use the wrong version of bytecode.
When Rubinius is installed, we pre-compile all the Ruby files in the lib/
directory. This ensures that if Rubinius is installed to a directory where a
user does not have write access, the cache files will still be created and can
be used to speed loading of standard library code.
If the .rbc files are placed alongside the .rb files, the existing arrangement
must be changed to provide different .rbc files depending on language mode. In
other words, just versioning the .rbc file is no longer sufficient as the
version of the .rbc files created for lib/**/*.rb files would be one or the
other. The same situation exists for the pre-installed gems, which are not
split into different gem directories for 1.8 and 1.9 mode.
An additional problem with creating .rbc files alongside the .rb files is that
people object to cluttering their source with the cache files.
There have repeatedly been requests for distributing Ruby applications without
Ruby source code. The existing .rbc files can be used for this, but they are
quite primitive and offer no easy way to abstract over other storage
configurations (e.g. encryption).
Finally, there are potentially numerous advantages to storing the compilation
cache in a proper database that would permit storing a great deal of
additional metadata for building tools for Ruby. Abstracting the cache from
the existing .rbc files to the directory of files using the -Xrbc.db option is
a good first step.
To summarize the problems with the existing .rbc mechanism:
1. Multiple different files are required to permit .rbc files in different
language modes to exist alongside a single .rb file, as is the case with
pre-compiling the standard library files on install.
2. People object to the files cluttering their source code.
3. The files don't easily permit extending them to store additional, valuable
metadata.
4. Related to 3, the files don't provide a suitably powerful mechanism for
distributing Ruby applications without source code.
The existing -Xrbc.db option is a direct replacement for storing the .rbc
files alongside the .rb files and immediately solves problem #1 above. One
issue with the rbc.db option is what to provide for a default value. I propose
the following:
1. If the user explicitly provides a path with -Xrbc.db, cache all files in
that directory.
2. If the user does not provide a path, use two separate paths as follows:
   a. on boot, record the current working directory (referred to as CWD below).
   b. if the file being loaded has CWD as a prefix, store the cache for the
      file in CWD/.rbx/<wherever>
   c. if the file being loaded does not have CWD as a prefix, store the cache
      for the file in ~/.rbx/<wherever>
3. When hashing the file path to determine the cache file, add the language
mode so that 1.8 and 1.9 files are separated. This does not replace the use
of the language version information embedded in the .rbc format, but avoids
recompile thrashing for e.g. running the specs under 1.8 mode and then
under 1.9 mode.
4. Only read and write to the cache if the cache directory is owned by the
user. This avoids a potential security hole where a superuser could be
running bytecode that was put into the cache maliciously and prevents the
superuser from creating files that the user would not be able to overwrite.
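The four rules above could be sketched as follows. This is an illustration under assumptions, not Rubinius code: the class name CacheLocator, its method names, the SHA-1 hash, and the way the language mode is mixed into the key are all hypothetical.

```ruby
require "digest/sha1"

# Hypothetical sketch of the default cache location rules.
class CacheLocator
  def initialize(explicit_dir: nil, cwd: Dir.pwd, home: Dir.home)
    @explicit_dir = explicit_dir      # rule 1: path given with -Xrbc.db
    @cwd = File.expand_path(cwd)      # rule 2a: recorded once at boot
    @home = File.expand_path(home)
  end

  # Rules 1 and 2: pick the cache root for a source file.
  def root_for(source_path)
    return @explicit_dir if @explicit_dir

    full = File.expand_path(source_path)
    # Compare against CWD plus a separator so /work/application does not
    # count as being inside /work/app.
    if full.start_with?(@cwd + File::SEPARATOR)
      File.join(@cwd, ".rbx")         # rule 2b
    else
      File.join(@home, ".rbx")        # rule 2c
    end
  end

  # Rule 3: include the language mode in the hashed key so 1.8 and 1.9
  # entries never overwrite each other.
  def cache_file_for(source_path, language_mode)
    full = File.expand_path(source_path)
    key = Digest::SHA1.hexdigest("#{language_mode}:#{full}")
    File.join(root_for(source_path), key)
  end

  # Rule 4: only use a cache directory owned by the effective user.
  def usable?(dir)
    File.directory?(dir) && File.stat(dir).owned?
  end
end
```

With cwd set to /work/app, a file such as /work/app/lib/foo.rb would be cached under /work/app/.rbx, while a standard library file outside that tree would land under ~/.rbx. File::Stat#owned? compares the directory owner with the effective user ID of the process, which is the check rule 4 relies on in this sketch.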
With the changes above, we have a reasonable default for all files. The
standard library files cache would exist in ~/.rbx/, which is reasonable for a
file installed with Rubinius that isn't going to be changing. The application
files would by default be cached within the application directory, but would
not litter the directories where source code files live. If the user explicitly
requests an rbc.db directory, all files are written there, but are still
segregated based on language version.
As a related but separate change, since we have full Ruby concurrency in
Rubinius 2.0, I propose making the Writer stage of the bytecode compiler use a
separate thread. Once the CompiledMethod is created, it is enqueued for
writing to the cache and immediately returned. The program can start executing
the method while the separate cache thread figures out where to put it and
marshals the contents to disk.
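A minimal sketch of such a background writer, using a plain Ruby Queue and Thread, might look like this. The class name CacheWriter is hypothetical, and Ruby's Marshal stands in for the real .rbc serialization purely for illustration.

```ruby
require "thread"

# Hypothetical stand-in for a threaded Writer stage.
class CacheWriter
  def initialize
    @queue = Queue.new
    @thread = Thread.new { drain }
  end

  # Enqueue the compiled object for writing and return it immediately so
  # the caller can start executing it without waiting on disk I/O.
  def enqueue(path, compiled)
    @queue << [path, compiled]
    compiled
  end

  # Write out everything still queued, then stop the writer thread.
  def shutdown
    @queue << :stop
    @thread.join
  end

  private

  def drain
    while (job = @queue.pop) != :stop
      path, compiled = job
      begin
        # Assumption: Marshal is only a placeholder for the .rbc format.
        File.binwrite(path, Marshal.dump(compiled))
      rescue SystemCallError => e
        # A failed cache write is not fatal; the file is recompiled next time.
        warn "cache write failed for #{path}: #{e.message}"
      end
    end
  end
end
```

Because the queue is first-in, first-out, calling shutdown only returns after every previously enqueued entry has been written or has failed with a warning.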
This fixes a crash where the JIT was running independently of the
GC and the GC was deallocating JIT memory at the same time. We don't
want to make the whole JIT generation GC-dependent, since that causes
performance issues, so we guard all memory allocations here with a
The crash would be exposed with these backtraces where things were
Thread 6 (process 70553):
#0 rubinius::jit::FreeRangeHeader::AddToFreeList () at /Users/dirkjan/Code/rubinius/vm/llvm/jit_memory_manager.hpp:151
#1 0x000000010989f037 in rubinius::jit::MemoryRangeHeader::TrimAllocationToSize (this=0x10f7ec688, FreeList=0x10f7ec688, NewSize=5064) at vm/llvm/jit_memory_manager.cpp:211
#2 0x000000010989bb75 in rubinius::jit::RubiniusRequestJITMemoryManager::endFunctionBody (this=<value temporarily unavailable, due to optimizations>, F=<value temporarily unavailable, due to optimizations>, FunctionStart=<value temporarily unavailable, due to optimizations>, FunctionEnd=0x13c8 <Address 0x13c8 out of bounds>) at jit_memory_manager.hpp:317
#3 0x0000000109b4f852 in (anonymous namespace)::JITEmitter::finishFunction ()
#4 0x0000000109946106 in (anonymous namespace)::Emitter<llvm::JITCodeEmitter>::runOnMachineFunction ()
#5 0x0000000109bbbc30 in llvm::MachineFunctionPass::runOnFunction ()
#6 0x0000000109f1beb2 in llvm::FPPassManager::runOnFunction ()
#7 0x0000000109f1b9f9 in llvm::FunctionPassManagerImpl::run ()
#8 0x0000000109f1b8a1 in llvm::FunctionPassManager::run ()
#9 0x0000000109b461ab in llvm::JIT::runJITOnFunctionUnlocked ()
#10 0x0000000109b46148 in llvm::JIT::runJITOnFunction ()
#11 0x0000000109898fcc in rubinius::jit::Compiler::generate_function (this=0x10d485d38, indy=true) at vm/llvm/jit_compiler.cpp:118
#12 0x00000001098ada93 in rubinius::BackgroundCompilerThread::perform (this=0x7fce81633240) at vm/llvm/state.cpp:345
#13 0x00000001098ad4ef in rubinius::utilities::thread::Thread::delete_on_exit () at /Users/dirkjan/Code/rubinius/vm/util/thread.hpp:79
#14 0x00000001098ad4ef in rubinius::utilities::thread::Thread::trampoline (arg=0x7fce81633240) at thread.hpp:211
#15 0x00007fff8e73c7a2 in _pthread_start ()
#16 0x00007fff8e7291e1 in thread_start ()
Thread 5 (process 70553):
#0 0x00007fff952b5386 in __semwait_signal ()
#1 0x00007fff8e7c6800 in nanosleep ()
#2 0x00007fff8e7c668a in sleep ()
#3 0x000000010969c9dd in rubinius::segv_handler (sig=11) at vm/environment.cpp:211
#4 <signal handler called>
#5 rubinius::jit::FreeRangeHeader::AddToFreeList () at /Users/dirkjan/Code/rubinius/vm/llvm/jit_memory_manager.hpp:151
#6 0x000000010989ee53 in rubinius::jit::MemoryRangeHeader::FreeBlock (this=0x10f7c88f0, FreeList=<value temporarily unavailable, due to optimizations>) at jit_memory_manager.hpp:155
#7 0x00000001098ac3e7 in rubinius::LLVMState::remove (this=<value temporarily unavailable, due to optimizations>, func=<value temporarily unavailable, due to optimizations>) at jit_memory_manager.hpp:426
#8 0x000000010983dde9 in rubinius::CodeManager::sweep (this=0x7fce8180a2d8) at vm/gc/code_manager.cpp:107
#9 0x0000000109750e7e in rubinius::ObjectMemory::mark () at /Users/dirkjan/Code/rubinius/vm/objectmemory.hpp:634
#10 0x0000000109750e7e in rubinius::ObjectMemory::collect_mature_finish (this=0x7fce8180a200, state=0x10c94fec8, data=0x7fce8528b220) at vm/objectmemory.cpp:636
#11 0x0000000109843d8a in rubinius::State::memory () at /Users/dirkjan/Code/rubinius/vm/state.hpp:171
#12 0x0000000109843d8a in rubinius::ImmixMarker::perform (this=0x7fce8163a720, state=0x10c94fec8) at vm/gc/immix_marker.cpp:172
#13 0x0000000109843b71 in rubinius::immix_marker_tramp (state=0x10f7ec688) at vm/gc/immix_marker.cpp:18
#14 0x00000001098094c0 in rubinius::Thread::in_new_thread (ptr=0x7fce86a23e70) at vm/builtin/thread.cpp:250
#15 0x00007fff8e73c7a2 in _pthread_start ()
#16 0x00007fff8e7291e1 in thread_start ()