Run Hadoop jobs written as Ruby scripts, powered by JRuby. This is not Hadoop Streaming.
bin
examples
lib
spec
src/java/org/apache/hadoop/ruby
test/java/org/apache/hadoop/ruby
.gitignore
README.rdoc
Rakefile
VERSION
build.xml
jruby-on-hadoop.gemspec

README.rdoc

JRuby on Hadoop

JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper / Reducer API. We recommend using it together with hadoop-papyrus, which is available on GitHub and GemCutter.

Description

Install

All required gems are available on GemCutter.

  1. Upgrade RubyGems to 1.3.5

  2. Install the gem

$ gem install jruby-on-hadoop

Usage

  1. Start a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.

  2. Put the input files into HDFS, e.g. test/inputs/file1

  3. Run 'joh' as shown below:

$ joh examples/wordcount.rb test/inputs test/outputs

The Hadoop job results are written to HDFS under test/outputs/part-*
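Once the part-* files have been fetched, the word counts can be loaded back into Ruby. The following is a minimal sketch, not part of jruby-on-hadoop: it assumes each output line has the form "key<TAB>value" (Hadoop's default text output), and the helper name parse_part_lines is hypothetical.

```ruby
# Hypothetical helper: turn lines of a part-* file into a Hash of word => count.
# Assumption: every line is "key<TAB>value", as written by Hadoop's default
# text output format.
def parse_part_lines(lines)
  counts = Hash.new(0)
  lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    next if key.nil? || value.nil? # skip blank or malformed lines
    counts[key] += value.to_i      # merge counts if a key appears in several files
  end
  counts
end

# Sample lines standing in for real job output.
sample = ["hadoop\t2\n", "jruby\t1\n", "\n"]
parse_part_lines(sample).sort.each { |word, n| puts "#{word}: #{n}" }
```

In practice the lines could come from standard input, for example by piping the output of `hadoop fs -cat test/outputs/part-*` into a script that calls `parse_part_lines($stdin.each_line)`.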

Example

See also examples/wordcount.rb.

def setup(conf)
  # setup jobconf
end

def map(key, value, output, reporter)
  # mapper process
  # (wordcount example)
  value.split.each do |word|
    output.collect(word, 1)
  end
end

def reduce(key, values, output, reporter)
  # reducer process
  # (wordcount example)
  sum = 0
  values.each {|v| sum += v }
  output.collect(key, sum)
end
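To see how map and reduce fit together without a cluster, the flow can be simulated in plain Ruby. This is only a sketch under stated assumptions: FakeCollector and run_locally are hypothetical names, the collector merely mimics the collect(key, value) call used above, and on a real cluster the keys and values handed to these methods may be Hadoop types rather than plain Ruby strings and integers.

```ruby
# Hypothetical stand-in for Hadoop's output collector: it just records pairs.
class FakeCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same bodies as the wordcount example above.
def map(key, value, output, reporter)
  value.split.each { |word| output.collect(word, 1) }
end

def reduce(key, values, output, reporter)
  sum = 0
  values.each { |v| sum += v }
  output.collect(key, sum)
end

# Map every line, group the emitted values by key (a simplified "shuffle"),
# then reduce each group in sorted key order.
def run_locally(lines)
  mapped = FakeCollector.new
  lines.each_with_index { |line, i| map(i, line, mapped, nil) }

  grouped = Hash.new { |h, k| h[k] = [] }
  mapped.pairs.each { |k, v| grouped[k] << v }

  reduced = FakeCollector.new
  grouped.keys.sort.each { |k| reduce(k, grouped[k], reduced, nil) }
  reduced.pairs
end

p run_locally(["hello hadoop", "hello jruby"])
# prints [["hadoop", 1], ["hello", 2], ["jruby", 1]]
```

The reporter argument is passed as nil because the wordcount methods never use it; a script that reports progress would need a stub for it as well.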

Build

You can build hadoop-ruby.jar with “ant”.

ant

The HADOOP_HOME environment variable must be set on your system. The build assumes Hadoop version 0.19.2.
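Because both the build and 'joh' depend on HADOOP_HOME, a small pre-flight check can give a clearer error than a failed build. The sketch below is illustrative only: hadoop_home! is a hypothetical helper, not something provided by jruby-on-hadoop.

```ruby
# Hypothetical helper: return HADOOP_HOME from the given environment,
# raising a descriptive error when it is unset or blank.
def hadoop_home!(env = ENV)
  home = env['HADOOP_HOME']
  if home.nil? || home.strip.empty?
    raise ArgumentError, 'HADOOP_HOME is not set; point it at your Hadoop installation'
  end
  home
end

begin
  puts "Using Hadoop at #{hadoop_home!}"
rescue ArgumentError => e
  warn e.message
end
```

Passing the environment as an argument keeps the check easy to exercise with a plain Hash instead of the real ENV.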

Author

Koichi Fujikawa <fujibee@gmail.com>

Copyright

License: Apache License
