
modernizing docs and handlers

1 parent db1a8cf commit 88548d60766e29a49a72598498143728e6e9f688 Philip (flip) Kromer committed Aug 12, 2012
Showing with 145 additions and 400 deletions.
  1. +0 −5 .document
  2. +9 −0 .yardopts
  3. +25 −28 CHANGELOG.textile →
  4. +40 −0 Gemfile
  5. +0 −89 INSTALL.textile
  6. +43 −55 LICENSE.textile →
  7. +0 −141
  8. +28 −71 Rakefile
  9. +0 −11 TODO.textile
@@ -1,5 +0,0 @@
@@ -0,0 +1,9 @@
+--readme README.textile
+--markup markdown
@@ -1,47 +1,44 @@
-h2. Wukong v2.0.0
+## Wukong v2.0.0
-h4. Important changes
+#### Important changes
-* Passing options to streamers is now deprecated. Use @Settings@ instead.
+* Passing options to streamers is now deprecated. Use `Settings` instead.
* Streamer by default has a periodic monitor that logs (to STDERR by default) every 10_000 lines or 30 seconds
* Examples cleaned up, should all run
-h4. Simplified syntax
+#### Simplified syntax
* you can now pass an *instance* of Streamer to use as mapper or reducer
* Adding an experimental sugar:
- <pre>
#!/usr/bin/env ruby
require 'wukong/script' do |line|
emit line.reverse
- </pre>
Note that you can now tweet a wukong script.
-* It's now recommended that at the top of a wukong script you say
- <pre>
- require 'wukong/script'
- </pre>
+* It's now recommended that at the top of a wukong script you say `require 'wukong/script'`
Among other benefits, this lets you refer to wukong streamers without prefix.
-h2. Wukong v1.5.4
+## Wukong v1.5.4
* EMR support now works very well
-h2. Wukong v1.5.3
+## Wukong v1.5.3
* A couple of bugfixes. Sorry about that.
* Documentation fixes
-h2. Wukong v1.5.0
+## Wukong v1.5.0
-h4. Elastic Map-Reduce
+#### Elastic Map-Reduce
Use --run=emr to launch a job onto the Amazon Elastic MapReduce cloud.
@@ -51,21 +48,21 @@ Use --run=emr to launch a job onto the Amazon Elastic MapReduce cloud.
It's still **way** shaky and I don't think anything but the sample app will run. That sample app runs, tho.
-h4. Greatly simplified script launching.
+#### Greatly simplified script launching.
Incompatible changes to option handling and script launching:
* Script doesn't use extra_options any more. You should relocate them to the initializer or to configliere.
* there is no more default_mapper or default_reducer
-h2. Wukong v.14.12 2010-08-31
+## Wukong v.14.12 2010-08-31
* Improvements to the pig conversion methods
-* @hdp-rm@ respects the -skipTrash method
+* `hdp-rm` respects the -skipTrash method
-h2. Wukong v1.4.11 2010-07-30
+## Wukong v1.4.11 2010-07-30
-* added the @max_(maps|reduces)_per_(node|cluster)@ jobconfs.
+* added the `max_(maps|reduces)_per_(node|cluster)` jobconfs.
* added jobconfs for io_job_mb and friends.
* added a loadable module to convert output data to pig bags and tuples
* pulled in several methods from active_support, incl. Enumerable#sum
@@ -74,33 +71,33 @@ h2. Wukong v1.4.11 2010-07-30
source into a generic sink. Several stores have been landed in the code, but
many are in a half- or un-baked state. Please ignore this for the moment.
-h2. Wukong v1.4.8 2010-06-05
+## Wukong v1.4.8 2010-06-05
* made scripts inject a helpful job name using
* Hash.compact_blank! and HashLike.compact_blank! -- eliminate all key-values whose value is blank?
-h2. Wukong v1.4.8 2010-05-17
+## Wukong v1.4.8 2010-05-17
* Bug in passing commandline args down to map and reduce child processes
-h2. Wukong v1.4.7 2010-03-05
+## Wukong v1.4.7 2010-03-05
Lots more examples:
* examples/stats/avg_value_frequency.rb does an Average Value Frequency histogram
* examples/server_logs has a quite useful apache log file parser
* Made the base streamer use each_record, opening the door for alternative record injection (eg Datamapper!)
* wukong/streamer/counting_reducer.rb is an um reducer and it counts things.
-h2. Wukong v1.4.6 2010-01-26
+## Wukong v1.4.6 2010-01-26
* A HELLA AWESOME working example from retail web analytics by @lenbust
-h2. Wukong v1.4.5 2010-01-18
+## Wukong v1.4.5 2010-01-18
-* In @--run=local@ mode, you can use '-' alone as a filename to indicate STDIN / STDOUT as input/output respectively.
+* In `--run=local` mode, you can use '-' alone as a filename to indicate STDIN / STDOUT as input/output respectively.
* Minor tweaks to contrib/jeans
-h2. Wukong v1.4.4 2010-01-15
+## Wukong v1.4.4 2010-01-15
-* Moved settings management & command line handling over to "Configliere": (
-* Added "example script and notes": from Fredrik Möllerstrand (@lenbust)
+* Moved settings management & command line handling over to [Configliere]( (
+* Added [example script and notes]( from Fredrik Möllerstrand (@lenbust)
40 Gemfile
@@ -0,0 +1,40 @@
+source ''
+gem 'configliere', :github => 'infochimps-labs/configliere', :branch => 'master'
+gem 'gorillib', :github => 'infochimps-labs/gorillib', :branch => 'version_1'
+gem 'addressable'
+gem 'htmlentities'
+gem 'multi_json', ">= 1.1"
+gem 'home_run', :platform => :mri, :require=>'date'
+# Only gems that you want listed as development dependencies in the gemspec
+group :development do
+ gem 'bundler', "~> 1.1"
+ gem 'rake'
+ gem 'yard', ">= 0.7"
+ gem 'simplecov', ">= 0.5", :platform => :ruby_19
+ gem 'rspec', "~> 2.8"
+ gem 'RedCloth', "~> 4.2"
+ gem 'redcarpet', ">= 2.1"
+ #
+ gem 'oj', ">= 1.2"
+ gem 'json', :platform => :jruby
+end
+# Gems you would use if hacking on this gem (rather than with it)
+group :support do
+ gem 'jeweler', ">= 1.6"
+ gem 'pry'
+ gem 'perftools.rb', :platform => :mri
+end
+# Gems for testing and coverage
+group :test do
+ gem 'guard', ">= 1.0"
+ gem 'guard-rspec', ">= 0.6"
+ gem 'guard-yard'
+ if RUBY_PLATFORM.include?('darwin')
+ gem 'rb-fsevent', ">= 0.9"
+ end
+end
@@ -1,89 +0,0 @@
-layout: default
-title: Install Notes
-collapse: false
-h1(gemheader). {{ site.gemname }} %(small):: install%
-** "Get the code":#getcode
-** "Setup":#setup
-** "Installing and Running Wukong with Hadoop":#gethadoop
-** "Installing and Running Wukong with Datamapper, ActiveRecord, the command-line and more":#others
-<notextile><div class="toggle"></notextile>
-h2(#getcode). Get the code
-Wukong is still under active development. The newest version is available via "Git": on "github:":{{ site.gemname }}
-pre. $ git clone git://{{ site.gemname }}
-A gem is available from "github:":
-pre. $ sudo gem install mrflip-{{ site.gemname }} --source=
-or from "gemcutter":
-pre. $ sudo gem install {{ site.gemname }} --source=
-You can instead download this project in either "zip":{{ site.gemname }}/zipball/master or "tar":{{ site.gemname }}/tarball/master formats.
-h3. Get the Dependencies
-* Hadoop, pig
-* extlib, YAML, JSON
-* Optional gems: trollop, addressable/uri, htmlentities
-<notextile></div><div class="toggle"></notextile>
-h2(#setup). Setup
-1. Allow Wukong to discover where his elephant friend lives by setting a $HADOOP_HOME environment variable: @export HADOOP_HOME="/usr/local/share/hadoop"@
-2. Add wukong's @bin/@ directory to your $PATH if you'd like to use the "wutils":wutils.html
-<i>(see also: "Ruby Hadoop Quickstart":</i>
-<notextile></div><div class="toggle"></notextile>
-h2(#gethadoop). Installing and Running Wukong with Hadoop
-Wukong was primarily developed for Hadoop, and we think it's the best way to use Hadoop (it's certainly the most fun!).
-h3. Run Wukong on the Amazon AWS EC2 Cloud
-h3. Hadoop Infrastructure
-Even if you have a bunch of machines with spare cycles, lots of RAM, and a shared filesystem... do yourself a favor and start out using the "Cloudera AMIs on Amazon's EC2 cloud.": There are an overwhelming number of fiddly little parameters and you'll be glad for the user experience before you get into server setup. If it's still mid-late 2009 when you read this, ignore prudence and jump straight to using Hadoop 0.20. It will be a) more fun, b) much more robust (trust me, at "v0.20" you want to live on the bleeding edge), and c) you won't have to suffer through migrating your HDFS two weeks after setup.
-To set up hadoop, your best bet are the Cloudera AMIs on Amazon's EC2 compute cloud:
-EC2 means anyone with a $10 bill can rent a 10-machine cluster with 1TB of distributed storage for 8 hours.
-h3. Run Wukong using Amazon AWS Elastic MapReduce
-AWS Elastic MapReduce saves the trouble of even setting up a cluster: click, bam, there it is.
-Phil Ripperger has prepared a "Ruby Hadoop Quickstart": explaining how to get started with Wukong, Hadoop and the Amazon Elastic MapReduce cloud -- it's better than anything we could put here. Thanks Phil!
-h3. Set up a Hadoop cluster
-If you have a local cluster, or just want to experiment with a single-machine install, check out the Cloudera packages for both Debian/Ubuntu-based and Redhat/RPM-based Linux systems.
-h3. More Hadoop Notes
-I've braindumped some random notes on configuring and using hadoop "over here":hadoop-tips.html
-<notextile></div><div class="toggle"></notextile>
-h2(#others). Wukong isn't just Hadoop: Datamapper, ActiveRecord, command-line usage and more
-Wukong is used by many in an non-Hadoop environment -- anywhere you can stream data records, you can unleash its monkey power.
-Please see the "usage notes":usage.html#playnice for more!