cascading.jruby is a DSL for Cascading, which is a dataflow API written in Java. With cascading.jruby, Ruby programmers can rapidly script efficient MapReduce jobs for Hadoop.
To give you a quick idea of what a cascading.jruby job looks like, here's word count:
    require 'rubygems'
    require 'cascading'

    input_path = ARGV.shift || (raise 'input_path required')

    cascade 'wordcount', :mode => :local do
      flow 'wordcount' do
        source 'input', tap(input_path)

        assembly 'input' do
          split_rows 'line', 'word', :pattern => /[.,]*\s+/, :output => 'word'
          group_by 'word' do
            count
          end
        end

        sink 'input', tap('output/wordcount', :sink_mode => :replace)
      end
    end.complete
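To make the dataflow easier to picture, here is a rough plain-Ruby sketch of what the job computes: split each line into words, group by word, and count each group. It uses neither cascading.jruby nor Hadoop. The `word_count` helper and the sample lines are hypothetical and exist only for illustration. Dropping empty tokens is an assumption of this sketch, not a statement about how `split_rows` behaves.

```ruby
# Illustration only: plain Ruby, not the cascading.jruby API.
PATTERN = /[.,]*\s+/ # same pattern the job passes to split_rows

# Hypothetical helper that mirrors the split -> group -> count shape of the job.
def word_count(lines)
  lines
    .flat_map { |line| line.split(PATTERN) }              # one token per 'word'
    .reject(&:empty?)                                     # assumption: ignore empty tokens
    .group_by { |word| word }                             # analogous to group_by 'word'
    .map { |word, occurrences| [word, occurrences.size] } # analogous to count
    .to_h
end

# Made-up sample input.
counts = word_count(['the quick, brown fox', 'the lazy dog. the end'])
counts.each { |word, n| puts "#{word}\t#{n}" }
```

The real job expresses the same three steps declaratively, and Cascading plans them as MapReduce work instead of running them in a single process.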
Note that the Ruby code you write merely constructs a Cascading job, so no JRuby runtime is required on your cluster. This contrasts with Hadoop streaming jobs written in Ruby, which need a Ruby interpreter on the cluster. To run cascading.jruby applications on a Hadoop cluster, you must use Jading to package them into a job jar.
cascading.jruby has been tested on JRuby versions 1.2.0, 1.4.0, 1.5.3, 1.6.5, and 22.214.171.124.