---
layout: default
title: Install Notes
collapse: false
---

{{ site.gemname }} :: install

Get the code

Wukong is still under active development. The newest version is available via Git from its github repository:

$ git clone git://github.com/mrflip/{{ site.gemname }}

A gem is available from github:

$ sudo gem install mrflip-{{ site.gemname }} --source=http://gems.github.com

or from gemcutter:

$ sudo gem install {{ site.gemname }} --source=http://gemcutter.org

You can instead download this project as a zip ({{ site.gemname }}/zipball/master) or tar ({{ site.gemname }}/tarball/master) archive.
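You can confirm which copy of the gem ended up installed with a short Ruby check against RubyGems. This is a sketch and not part of Wukong. The gem names you pass in are up to you, and they differ by source: the github build is published as `mrflip-{{ site.gemname }}`, while gemcutter uses the plain name.

```ruby
require 'rubygems'

# Lists the locally installed versions of +gem_name+, newest first.
# Returns an empty array when no version of the gem is installed.
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name)
                    .map(&:version)
                    .sort
                    .reverse
                    .map(&:to_s)
end
```

If `site.gemname` is `wukong` (an assumption), calling `installed_versions('wukong')` or `installed_versions('mrflip-wukong')` returns an empty array when that install did not succeed.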

Get the Dependencies

  • Hadoop, Pig
  • extlib, YAML, JSON
  • Optional gems: trollop, addressable/uri, htmlentities
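The Ruby libraries above can be checked by trying to `require` each one. The following is a minimal sketch that assumes the require paths match the names listed (for example `addressable/uri`). It covers only the Ruby side; Hadoop and Pig are installed separately.

```ruby
# Require paths for the dependencies listed above (assumed to match the names).
REQUIRED_LIBS = %w[extlib yaml json].freeze
OPTIONAL_LIBS = %w[trollop addressable/uri htmlentities].freeze

# Returns the subset of +libs+ that cannot be loaded in this Ruby.
def missing_libs(libs)
  libs.reject do |lib|
    begin
      require lib
      true            # loaded now, or already loaded earlier
    rescue LoadError
      false           # not installed
    end
  end
end
```

For example, `missing_libs(REQUIRED_LIBS)` should return an empty array before you go further. A non-empty `missing_libs(OPTIONAL_LIBS)` only means some extras are unavailable.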

Setup

1. Allow Wukong to discover where his elephant friend lives by setting the $HADOOP_HOME environment variable to your Hadoop installation directory, for example:

   $ export HADOOP_HOME="/usr/local/share/hadoop"

2. Add Wukong’s bin/ directory to your $PATH if you’d like to use the wutils command-line helpers.
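You can sanity-check both setup steps from Ruby. This sketch assumes a standard Hadoop layout in which the launcher script lives at `$HADOOP_HOME/bin/hadoop`. The helper names are hypothetical and are not part of Wukong.

```ruby
# Returns the hadoop launcher path under HADOOP_HOME, or nil if HADOOP_HOME
# is unset, empty, or has no bin/hadoop file (assumed standard layout).
def hadoop_launcher(env = ENV)
  home = env['HADOOP_HOME']
  return nil if home.nil? || home.empty?
  launcher = File.join(home, 'bin', 'hadoop')
  File.file?(launcher) ? launcher : nil
end

# True if +dir+ matches (after path expansion) one of the entries in PATH.
def on_path?(dir, env = ENV)
  env.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |entry|
    File.expand_path(entry) == File.expand_path(dir)
  end
end
```

`hadoop_launcher` returning nil points to step 1. `on_path?('/path/to/wukong/bin')` (a placeholder path) returning false means step 2 has not taken effect in the current shell.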

(see also: Ruby Hadoop Quickstart)

Installing and Running Wukong with Hadoop

Wukong was primarily developed for Hadoop, and we think it’s the best way to use Hadoop (it’s certainly the most fun!).

Run Wukong on the Amazon AWS EC2 Cloud

Hadoop Infrastructure

Even if you have a bunch of machines with spare cycles, lots of RAM, and a shared filesystem… do yourself a favor and start out using the Cloudera AMIs on Amazon’s EC2 cloud. There are an overwhelming number of fiddly little parameters, and you’ll be glad to have some hands-on experience before you get into server setup. If it’s still mid-to-late 2009 when you read this, ignore prudence and jump straight to Hadoop 0.20: a) it’s more fun, b) it’s much more robust (trust me, at “v0.20” you want to live on the bleeding edge), and c) you won’t have to suffer through migrating your HDFS two weeks after setup.

To set up Hadoop, your best bet is the Cloudera AMIs on Amazon’s EC2 compute cloud.

EC2 means anyone with a $10 bill can rent a 10-machine cluster with 1TB of distributed storage for 8 hours.
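As a quick check on the $10 figure, the arithmetic below works out the per-machine-hour price that budget allows. EC2 pricing has changed many times since this was written, so the rates are inputs rather than real EC2 prices. The helper names are illustrative.

```ruby
# Total cost of running +machines+ instances for +hours+ at +rate+
# dollars per machine-hour.
def cluster_cost(machines, hours, rate)
  machines * hours * rate
end

# Highest per-machine-hour rate that keeps the run within +budget+ dollars.
def max_hourly_rate(budget, machines, hours)
  budget.to_f / (machines * hours)
end
```

`max_hourly_rate(10, 10, 8)` returns `0.125`, so the claim holds as long as each machine costs at most about 12.5 cents an hour. That figure ignores any separate storage or data-transfer charges.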

Run Wukong using Amazon AWS Elastic MapReduce

AWS Elastic MapReduce saves the trouble of even setting up a cluster: click, bam, there it is.

Phil Ripperger has prepared a Ruby Hadoop Quickstart explaining how to get started with Wukong, Hadoop and the Amazon Elastic MapReduce cloud — it’s better than anything we could put here. Thanks Phil!

Set up a Hadoop cluster

If you have a local cluster, or just want to experiment with a single-machine install, check out the Cloudera packages for both Debian/Ubuntu-based and Red Hat/RPM-based Linux systems.

More Hadoop Notes

I’ve braindumped some random notes on configuring and using Hadoop over here.

Wukong isn’t just Hadoop: Datamapper, ActiveRecord, command-line usage and more

Wukong is used by many in non-Hadoop environments: anywhere you can stream data records, you can unleash its monkey power.
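To make "streaming data records" concrete, here is a plain-Ruby word count written in the Hadoop streaming style. It does not use Wukong's API. It only illustrates the convention: tab-separated key/value lines on STDIN/STDOUT, with records sorted by key between the map and reduce steps. The function names are hypothetical.

```ruby
# Mapper: emit one "word\t1" record for every word on every input line.
def map_words(lines)
  lines.each_with_object([]) do |line, out|
    line.downcase.scan(/\w+/) { |word| out << "#{word}\t1" }
  end
end

# Reducer: sums the counts for each key. Input must already be sorted by
# key (Hadoop's shuffle does this; locally, `sort` does).
def reduce_counts(sorted_lines)
  out = []
  current, total = nil, 0
  sorted_lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    if key != current
      out << "#{current}\t#{total}" unless current.nil?
      current, total = key, 0
    end
    total += value.to_i
  end
  out << "#{current}\t#{total}" unless current.nil?
  out
end
```

To run the same logic outside Hadoop, wrap each function in a small script that calls `puts map_words(STDIN)` or `puts reduce_counts(STDIN)`. Then chain the scripts with an ordinary Unix `sort` in between.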

Please see the usage notes for more!
