Skip to content
This repository

This branch is 0 commits ahead and 0 commits behind master

Fetching latest commit…

Octocat-spinner-32-eaf2f5

Cannot retrieve the latest commit at this time

Octocat-spinner-32 lib
Octocat-spinner-32 samples
Octocat-spinner-32 spec
Octocat-spinner-32 tasks
Octocat-spinner-32 test
Octocat-spinner-32 .gitignore
Octocat-spinner-32 .travis.yml
Octocat-spinner-32 Gemfile
Octocat-spinner-32 Gemfile.lock
Octocat-spinner-32 LICENSE.txt
Octocat-spinner-32 README.md
Octocat-spinner-32 Rakefile
Octocat-spinner-32 build.xml
Octocat-spinner-32 cascading.jruby.gemspec
Octocat-spinner-32 ivy.xml
Octocat-spinner-32 ivysettings.xml
README.md

Cascading.JRuby Build Status

cascading.jruby is a DSL for Cascading, which is a dataflow API written in Java. With cascading.jruby, Ruby programmers can rapidly script efficient MapReduce jobs for Hadoop.

To give you a quick idea of what a cascading.jruby job looks like, here's word count:

require 'rubygems'
require 'cascading'

input_path = ARGV.shift || (raise 'input_path required')

cascade 'wordcount', :mode => :local do
  flow 'wordcount' do
    source 'input', tap(input_path)

    assembly 'input' do
      split_rows 'line', 'word', :pattern => /[.,]*\s+/, :output => 'word'
      group_by 'word' do
        count
      end
    end

    sink 'input', tap('output/wordcount', :sink_mode => :replace)
  end
end.complete

cascading.jruby provides a clean Ruby interface to Cascading, but doesn't attempt to add abstractions on top of it. Therefore, you should be acquainted with the Cascading API before you begin.

For operations you can apply to your dataflow within a pipe assembly, see the Assembly class. For operations available within a block passed to a group_by, union, or join, see the Aggregations class.

Note that the Ruby code you write merely constructs a Cascading job, so no JRuby runtime is required on your cluster. This stands in contrast with writing Hadoop streaming jobs in Ruby. To run cascading.jruby applications on a Hadoop cluster, you must use Jading to package them into a job jar.

cascading.jruby has been tested on JRuby versions 1.2.0, 1.4.0, 1.5.3, 1.6.5, and 1.6.7.2.

Something went wrong with that request. Please try again.