
updating README from gh-pages: formatting of code snippets

Philip (flip) Kromer committed Oct 12, 2009 · commit b6d96d0ac116c1d33e294725827ce197cf105fce · 1 parent e20a0e9
Showing with 13 additions and 72 deletions.
  1. +13 −72 README.textile
@@ -10,8 +10,7 @@ Treat your dataset as a
Wukong is friends with "Hadoop":http://hadoop.apache.org/core the elephant, "Pig":http://hadoop.apache.org/pig/ the query language, and the @cat@ on your command line. (It's even passing familiar with "martinis":http://datamapper.org and "express trains":http://wiki.rubyonrails.org/rails/pages/ActiveRecord)
-The main documentation -- including tutorials and tips for working with big data -- lives on the "Wukong Pages.":http://mrflip.github.com/wukong Please feel free to add supplemental information to the "wukong wiki.":http://wiki.github.com/mrflip/wukong
-
+The **main documentation** lives on the "Wukong Pages.":http://mrflip.github.com/wukong Please feel free to add supplemental information to the "wukong wiki.":http://wiki.github.com/mrflip/wukong
* "Install and set up wukong":http://mrflip.github.com/wukong/INSTALL.html
* "Tutorial":http://mrflip.github.com/wukong/tutorial.html
@@ -23,22 +22,28 @@ The main documentation -- including tutorials and tips for working with big data
h2. Install
-Wukong is still under active development. The newest version is available at
+** "Main Install and Setup Documentation":http://mrflip.github.com/wukong/INSTALL.html **
+
+Wukong is still under active development. The newest version is available via "Git":http://git-scm.com on "github:":http://github.com/mrflip/{{ site.gemname }}
- http://github.com/mrflip/wukong
+pre. $ git clone git://github.com/mrflip/{{ site.gemname }}
A gem is available from "github:":http://gems.github.com
- gem install mrflip-wukong --source=http://gems.github.com
+pre. $ sudo gem install mrflip-{{ site.gemname }} --source=http://gems.github.com
or from "gemcutter":http://gemcutter.org
- gem install wukong --source=http://gemcutter.org
+pre. $ sudo gem install {{ site.gemname }} --source=http://gemcutter.org
+
+You can instead download this project in either "zip":http://github.com/mrflip/{{ site.gemname }}/zipball/master or "tar":http://github.com/mrflip/{{ site.gemname }}/tarball/master formats.
-Phil Ripperger has prepared "instructions on getting wukong to work on the Amazon AWS cloud.":http://blog.pdatasolutions.com/post/191978092/ruby-on-hadoop-quickstart Thanks Phil!
+To finish setting up, see "setup instructions":http://mrflip.github.com/wukong/INSTALL.html and then read the "usage notes":http://mrflip.github.com/wukong/usage.html
h2. How to write a Wukong script
+** "Tutorial By Example":http://mrflip.github.com/wukong/tutorial.html **
+
Here's a script to count words in a text stream:
<pre><code> require 'wukong'
@@ -121,11 +126,7 @@ You can also use structs to treat your dataset as a stream of objects:
h3. Advanced Patterns
-Wukong has a good collection of map/reduce patterns. For example, it's quite common to accumulate all records for a given key and emit some result based on the whole group.
-
-The AccumulatingReducer calls start! on the first record for each key, calls accumulate() on every example for that key (including the first), and calls finalize() once the last record for that key is seen.
-
-Here's an AccumulatingReducer that takes a long list of key-value pairs and emits, for each key, all its corresponding values in one line.
+Wukong has a good collection of map/reduce patterns. Here's an AccumulatingReducer that takes a long list of key-value pairs and emits, for each key, all its corresponding values in one line.
<pre><code> #
# Roll up all values for each key into a single line
@@ -174,62 +175,6 @@ You'd end up with
@newman @elaine @jerry @kramer
</code></pre>
-h3. More info
-
-There are many useful examples (including an actually-useful version of the WordCount script) in examples/ directory.
-
-h2. Setup
-
-1. Allow Wukong to discover where his elephant friend lives: either
-
- * set a @$HADOOP_HOME@ environment variable,
-
- * or create a file 'config/wukong-site.yaml' with a line that points to the top-level directory of your hadoop install:
-
- @:hadoop_home: /usr/local/share/hadoop@
-
-2. Add wukong's @bin/@ directory to your $PATH, so that you may use its filesystem shortcuts.
-
-h2. How to run a Wukong script
-
-To run your script using local files and no connection to a hadoop cluster,
-
- @your/script.rb --run=local path/to/input_files path/to/output_dir@
-
-To run the command across a Hadoop cluster,
-
- @your/script.rb --run=hadoop path/to/input_files path/to/output_dir@
-
-You can set the default in the config/wukong-site.yaml file, and then just use @--run@ instead of @--run=something@ --it will just use the default run mode.
-
-If you're running @--run=hadoop@, all file paths are HDFS paths. If you're running @--run=local@, all file paths are local paths. (your/script path, of course, lives on the local filesystem).
-
-You can supply arbitrary command line arguments (they wind up as key-value pairs in the options path your mapper and reducer receive), and you can use the hadoop syntax to specify more than one input file:
-
- ./path/to/your/script.rb --any_specific_options --options=can_have_vals \
- --run "input_dir/part_*,input_file2.tsv,etc.tsv" path/to/output_dir
-
-Note that all @--options@ must precede (in any order) all non-options.
-
-h2. How to test your scripts
-
-To run mapper on its own:
-
- cat ./local/test/input.tsv | ./examples/word_count.rb --map | more
-
-or if your test data lies on the HDFS,
-
- hdp-cat test/input.tsv | ./examples/word_count.rb --map | more
-
-Next graduate to running @--run=local@ mode so you can inspect the reducer.
-
-
-h2. What's up with Wukong::AndPig?
-
-@Wukong::AndPig@ is a small library to more easily generate code for the
-"Pig":http://hadoop.apache.org/pig data analysis language. See its
-"README":wukong/and_pig/README.textile for more.
-
h2. Why is it called Wukong?
Hadoop, as you may know, is "named after a stuffed elephant.":http://en.wikipedia.org/wiki/Hadoop Since Wukong was started by the "infochimps":http://infochimps.org team, we needed a simian analog. A Monkey King who journeyed to the land of the Elephant seems to fit the bill:
@@ -239,7 +184,3 @@ bq. Sun Wukong (孙悟空), known in the West as the Monkey King, is the main ch
bq. Sun Wukong possesses incredible strength, being able to lift his 13,500 jīn (8,100 kg) Ruyi Jingu Bang with ease. He also has superb speed, traveling 108,000 li (54,000 kilometers) in one somersault. Sun knows 72 transformations, which allows him to transform into various animals and objects; he is, however, shown with slight problems transforming into other people, since he is unable to complete the transformation of his tail. He is a skilled fighter, capable of holding his own against the best generals of heaven. Each of his hairs possesses magical properties, and is capable of transforming into a clone of the Monkey King himself, or various weapons, animals, and other objects. He also knows various spells in order to command wind, part water, conjure protective circles against demons, freeze humans, demons, and gods alike. -- ["Sun Wukong's Wikipedia entry":http://en.wikipedia.org/wiki/Wukong]
The "Jaime Hewlett / Damon Albarn short":http://news.bbc.co.uk/sport1/hi/olympics/monkey that the BBC made for their 2008 Olympics coverage gives the general idea.
-
-h2. What tools does Wukong work with?
-
-Wukong is friends with "Hadoop":http://hadoop.apache.org/core the elephant, "Pig":http://hadoop.apache.org/pig/ the query language, and the @cat@ on your command line. We're looking forward to being friends with "martinis":http://datamapper.org and "express trains":http://wiki.rubyonrails.org/rails/pages/ActiveRecord down the road.
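The Advanced Patterns text removed in this commit describes the AccumulatingReducer lifecycle: it calls `start!` on the first record for each key, `accumulate` on every record for that key (including the first), and `finalize` once the last record for that key has been seen. Below is a minimal plain-Ruby sketch of that lifecycle. It assumes the records arrive already sorted by key, as a Hadoop reducer would receive them. It is not Wukong's own class. `SketchAccumulatingReducer`, `ListReducer`, and `run` are hypothetical names, and Wukong's real method signatures may differ.

```ruby
# Hypothetical sketch of the accumulate-per-key reducer lifecycle.
# NOT Wukong's AccumulatingReducer: class names and the #run method are
# invented for illustration; only start!/accumulate/finalize come from
# the README text.
class SketchAccumulatingReducer
  # records: [key, value] pairs already sorted by key, the order in which
  # Hadoop's shuffle/sort delivers them to a reducer.
  # Returns the lines emitted by finalize, one per distinct key.
  def run(records)
    output = []
    current_key = nil
    records.each do |key, value|
      if key != current_key
        # A new key means the last record for the previous key was seen.
        finalize { |line| output << line } unless current_key.nil?
        current_key = key
        start!(key, value)
      end
      # Called for every record of the key, including the first one.
      accumulate(key, value)
    end
    # Flush the final key group, if there was any input at all.
    finalize { |line| output << line } unless current_key.nil?
    output
  end
end

# Rolls up all values for each key into a single tab-separated line.
class ListReducer < SketchAccumulatingReducer
  def start!(key, _value)
    @key = key
    @values = []
  end

  def accumulate(_key, value)
    @values << value
  end

  def finalize
    yield [@key, *@values].join("\t")
  end
end

# Illustrative input: tab-separated key/value lines, sorted as a reducer sees them.
lines = ["a\t1", "b\t2", "a\t3"]
records = lines.map { |line| line.split("\t", 2) }.sort
puts ListReducer.new.run(records) # prints "a\t1\t3" then "b\t2"
```

Keeping the group-boundary detection in the base class means a subclass only has to say what to do per key. That is the division of labor the README attributes to the AccumulatingReducer.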
