References created during parsing were held in memory that the MRI GC could not see. If the process ran long enough, those references would be collected and the process would crash. In rbx, the C-API handles prevented the objects from being collected because the method that called into C was still on the stack, and the handles are associated with that native method frame. The solution is to keep a list of references created during parsing and root that list in the Melbourne instance used to process the C-struct parse tree into an AST of Ruby objects.
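A minimal Ruby sketch of the rooting scheme described above, assuming a Melbourne-like processor class; the method names here are illustrative, not the actual Rubinius API:

```ruby
# Hypothetical sketch: root parser-created objects in the processor instance.
class Melbourne
  def initialize
    # Objects appended here stay reachable from this (rooted) instance for
    # the lifetime of the parse, so the GC cannot collect them prematurely.
    @references = []
  end

  # Called for every Ruby object created while walking the C parse tree.
  def add_reference(obj)
    @references << obj
    obj
  end
end
```

Because the Melbourne instance itself is reachable for the whole parse, everything on its list is too; the list is simply dropped with the instance when parsing finishes.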
Notes:

* If at all possible, features should not modify the parser. Doing so introduces potential incompatibilities and bugs, and complicates maintenance.
* The __END__ element is arguably part of the AST. It could be considered a child of the toplevel execution context, but it must be handled before everything else in that context because code there may rely on DATA being set. Wrapping the toplevel node in the EndData node captures this better than making EndData a child. Consider the script 'p DATA'. The toplevel here is not a Block node but a single SendWithArguments.
* It is possible to support __END__ with no parser changes at all because parsing is halted by the __END__ marker.
* The use case for __END__ is quite limited, and requiring the .rb file to be available is not unreasonable given that use case (i.e. a single script that carries some data with it, not a Ruby source file that is part of a larger application).
* Providing DATA as an IO on the .rb source file ensures that code behaves the same on rbx and MRI.
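The DATA semantics above can be demonstrated with a small standalone script; it is written to a temp file here so it runs as a main program, since DATA is only defined for the file actually being executed:

```ruby
require "tempfile"
require "rbconfig"

# Everything after the __END__ marker is exposed to the script as DATA,
# an IO opened on the .rb file itself, positioned just past __END__.
script = Tempfile.new(["end_data", ".rb"])
script.write(<<~RUBY)
  print DATA.read
  __END__
  hello from DATA
RUBY
script.close

# Run the script with the current Ruby interpreter.
ruby = File.join(RbConfig::CONFIG["bindir"], RbConfig::CONFIG["ruby_install_name"])
output = `#{ruby} #{script.path}`
puts output
```

This also illustrates why the .rb file must be available on disk: DATA is an IO on the source file, not a copy of its contents.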
The precompile scheme for installing Rubinius would invoke MRI with -Ilib. If the user had RUBYOPT=-rubygems set, MRI would require rubygems, which would end up trying to load the Rubinius lib/etc.rb. This approach essentially hardcodes the paths relative to the actual files. That is desirable because the compiler needs to be loaded on various Ruby implementations (probably MRI, but not necessarily so) and we should ensure that only these particular files are loaded.
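A hedged illustration of the hardcoded-path approach: requiring a file by an explicit absolute path bypasses $LOAD_PATH entirely, so neither -I flags nor RUBYOPT can redirect which file gets loaded. The file name and constant below are made up for the demonstration:

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  # Stand-in for a compiler file such as lib/etc.rb.
  File.write(File.join(dir, "etc.rb"), "ETC_LOADED = true")
  # An absolute-path require ignores $LOAD_PATH, so no -Ilib entry or
  # same-named stdlib file can shadow this exact file.
  require File.join(dir, "etc")
end

puts ETC_LOADED  # => true
```

Building such paths relative to __FILE__ in the loader gives the same guarantee for the real compiler files.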
The parser now communicates any magic comments it sees to the compiler. Currently, the compiler enables any available transform, using the text of the comment as the transform name. The current transform using this is Array Zen, a list comprehension transform. It is limited at the moment but should be expanded to handle a number of additional forms.
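The dispatch described above can be sketched as follows; the registry, transform names, and method are hypothetical, not the actual Rubinius compiler API:

```ruby
# Hypothetical registry of transforms available to the compiler.
AVAILABLE_TRANSFORMS = [:array_zen]

# The magic comment's text is treated directly as a transform name;
# unknown names enable nothing.
def transforms_for(magic_comment)
  name = magic_comment.strip.to_sym
  AVAILABLE_TRANSFORMS.include?(name) ? [name] : []
end

p transforms_for("array_zen")  # => [:array_zen]
p transforms_for("unknown")    # => []
```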
The transforms were previously implemented as compiler plugins. Here, the functionality is split into two parts: 1) recognizing a form that will be transformed and returning a node that will emit different bytecode, and 2) emitting the bytecode. The recognition step is essentially stateless and is implemented as class methods on each transform class. For now, only the CALL parse tree node is processed for possible transforms. The transform classes register themselves under a category. For now, the categories are :default and :kernel. The category of transforms to be applied is selected for each compiler instance. Instead of applying all the transforms in a category, it is also possible to apply only a selected set of transforms.
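An illustrative sketch of this two-part design: a stateless recognition step as a class method, plus registration under a category. The class and method names are hypothetical, not the actual Rubinius compiler classes:

```ruby
class Transform
  # Category => list of transform classes registered under it.
  def self.registry
    @registry ||= Hash.new { |h, k| h[k] = [] }
  end

  # Subclasses call this in their class body to register themselves
  # under a category such as :default or :kernel.
  def self.transform(category)
    Transform.registry[category] << self
  end

  # Recognition step: given the pieces of a CALL node, return a
  # replacement node that will emit different bytecode, or nil to
  # leave the call untouched. Stateless by design.
  def self.match?(line, receiver, name, arguments)
    nil
  end
end

# A made-up transform that recognizes calls to :new.
class FastNew < Transform
  transform :kernel

  def self.match?(line, receiver, name, arguments)
    name == :new ? new : nil
  end
end

p Transform.registry[:kernel]  # => [FastNew]
```

A compiler instance configured for the :kernel category would run each registered class's match? against every CALL node, substituting the returned node when recognition succeeds.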
The basic machinery is in place but the processor is all stubs. Both String#to_ast and File.to_ast "work". The next steps are:

1. Get a harness in place to run the specs
2. Start adding functionality to the processor methods
3. Fix the arguments passed to processor methods from the visitor
4. Augment the compiler Node classes with additional methods as needed
5. Get all specs passing
6. Remove the dependencies on libmpa, libmquark, libcchash