Documentation for script.rb
Philip (flip) Kromer committed Feb 16, 2009
1 parent 264b3d6 commit d3b1330
Showing 3 changed files with 64 additions and 2 deletions.
53 changes: 51 additions & 2 deletions README.ile.txt
@@ -1,10 +1,19 @@

Wukong makes using Hadoop so easy a chimpanzee can use it.

== How to run a Wukong script

your/script.rb --go path/to/input_files path/to/output_dir

All of the file paths are HDFS paths except your script path, of course, which
is on the local filesystem.

You can supply arbitrary command-line arguments (they wind up as key-value pairs
in the options hash your mapper and reducer receive), and you can use the Hadoop
syntax to specify more than one input file:

./path/to/your/script.rb --any_specific_options --options=can_have_vals \
--go "input_dir/part_*,input_file2.tsv,etc.tsv" path/to/output_dir
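
A rough sketch of the idea in plain Ruby follows. The @parse_options@ and @input_paths@ helpers are hypothetical, not Wukong's actual option parser; they only show how @--flag@ and @--key=val@ arguments could land in an options hash, and how a comma-separated input argument breaks into individual paths (Hadoop itself is what expands globs like @part_*@).

```ruby
# Hypothetical sketch, NOT Wukong's parser: fold "--key=val" and "--flag"
# arguments into an options hash and keep everything else as positional args.
def parse_options(argv)
  options = {}
  positional = []
  argv.each do |arg|
    if (m = arg.match(/\A--([^=]+)=(.*)\z/))
      options[m[1].to_sym] = m[2]      # e.g. --options=can_have_vals
    elsif (m = arg.match(/\A--(.+)\z/))
      options[m[1].to_sym] = true      # e.g. --any_specific_options, --go
    else
      positional << arg                # input path spec, then output dir
    end
  end
  [options, positional]
end

# The input argument may list several comma-separated paths or globs.
# Glob expansion is left to Hadoop; here we only split on commas.
def input_paths(spec)
  spec.split(",").map(&:strip).reject(&:empty?)
end

opts, rest = parse_options(%w[--any_specific_options --options=can_have_vals --go
                               input_dir/part_*,input_file2.tsv,etc.tsv path/to/output_dir])
```

With the example command above, @opts@ comes out as @{any_specific_options: true, options: "can_have_vals", go: true}@, and @input_paths(rest.first)@ yields @input_dir/part_*@, @input_file2.tsv@ and @etc.tsv@.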


== How to test your scripts
@@ -13,3 +22,43 @@ To run mapper on its own:
cat ./local/test/input.tsv | ./examples/word_count.rb --map | more
or, if your test data lives on the HDFS,
hdp-cat test/input.tsv | ./examples/word_count.rb --map | more
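
Piping a file through @--map@ works because a streaming mapper just reads lines on STDIN and writes tab-separated records on STDOUT. Below is a plain-Ruby stand-in for that contract, assuming a simple word count; @map_line@ and @run_mapper@ are hypothetical names, and this is not the code in examples/word_count.rb.

```ruby
# Hypothetical stand-in for "word_count.rb --map" (plain Ruby, no Wukong):
# read lines from an input, emit one "word<TAB>1" record per word.
def map_line(line)
  line.downcase.scan(/[a-z0-9']+/).map { |word| [word, 1] }
end

def run_mapper(input, output)
  input.each_line do |line|
    map_line(line.chomp).each { |word, count| output.puts "#{word}\t#{count}" }
  end
end

# Only consume STDIN when invoked with --map, as in the commands above.
run_mapper($stdin, $stdout) if ARGV.include?("--map")
```

Saved as, say, @wc_sketch.rb@ (a hypothetical filename), it can be exercised the same way: @cat ./local/test/input.tsv | ruby wc_sketch.rb --map | more@.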


== What's up with Wukong::AndPig?

Wukong::AndPig is a small library that makes it easier to generate code for the
"Pig":http://hadoop.apache.org/pig data analysis language. See
wukong/and_pig/README.textile for more.

== Why is it called Wukong?

Hadoop, as you may know, is "named after a stuffed
elephant.":http://en.wikipedia.org/wiki/Hadoop Wukong, the Monkey King, known for
his power and agility, is the hero of a famous Chinese fairy tale in which he
journeys to the land of the Elephant.

Quoting the "Wikipedia page on Wukong:":http://en.wikipedia.org/wiki/Wukong

bq..:http://en.wikipedia.org/wiki/Wukong Sun Wukong (traditional Chinese: 孫悟空;
simplified Chinese: 孙悟空; pinyin: Sūn Wùkōng; Wade-Giles: Sun1 Wu4-k'ung1;
Japanese 孫悟空 (Son Gokū)), known in the West as the Monkey King, is the main
character in the classical Chinese epic novel Journey to the West. In the novel,
he accompanies the monk Xuanzang on the journey to retrieve Buddhist sutras from
India.

Sun Wukong possesses incredible strength, being able to lift his 13,500 jīn
(8,100 kg) Ruyi Jingu Bang with ease. He also has superb speed, traveling
108,000 li (54,000 kilometers) in one somersault. Sun knows 72 transformations,
which allows him to transform into various animals and objects; he is, however,
shown with slight problems transforming into other people, since he is unable to
complete the transformation of his tail. He is a skilled fighter, capable of
holding his own against the best generals of heaven. Each of his hairs possesses
magical properties, and is capable of transforming into a clone of the Monkey
King himself, or various weapons, animals, and other objects. He also knows
various spells in order to command wind, part water, conjure protective circles
against demons, freeze humans, demons, and gods alike. (Journey to the West, Wu
Cheng'en (1500-1582), Translated by Foreign Languages Press, Beijing 1993.)

p. Sounds about right to us :) The "BBC-produced Jamie Hewlett / Damon Albarn
short":http://news.bbc.co.uk/sport1/hi/olympics/monkey made for the 2008
Olympics is highly recommended.
1 change: 1 addition & 0 deletions README.textile
12 changes: 12 additions & 0 deletions wukong/and_pig/README.textile
@@ -0,0 +1,12 @@
Wukong::AndPig is a small library that makes it easier to generate code for the
"Pig":http://hadoop.apache.org/pig data analysis language.

Wukong::AndPig lets you use the structs from your Wukong scripts to
generate Pig instructions that know their types and structure, even through
multiple Pig commands. For example, if you use @FOREACH ... GENERATE@ to select
only a few of a struct's fields, Wukong::AndPig will know that the result has only
those fields.
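
To make the field-tracking idea concrete, here is a toy version. Everything in it (the @Tweet@ struct, the @Relation@ class and its @load@/@generate@ methods) is hypothetical and is not the Wukong::AndPig API; it only shows how a relation can carry its schema forward so that a later command knows which fields survived a projection.

```ruby
# Hypothetical sketch, NOT the Wukong::AndPig API: a relation remembers the
# fields it carries, so a FOREACH ... GENERATE projection yields a relation
# whose schema lists only the selected fields.
Tweet = Struct.new(:id, :user, :text, :created_at)   # hypothetical record type

class Relation
  attr_reader :name, :fields

  def initialize(name, fields)
    @name = name
    @fields = fields
  end

  # Build a relation from a Struct class; its members become the schema.
  def self.load(name, path, struct)
    rel = new(name, struct.members)
    [rel, "#{name} = LOAD '#{path}' AS (#{struct.members.join(', ')});"]
  end

  # Project a subset of fields, refusing any field this relation lacks.
  def generate(new_name, *wanted)
    unknown = wanted - fields
    raise ArgumentError, "unknown fields: #{unknown.join(', ')}" unless unknown.empty?
    [Relation.new(new_name, wanted),
     "#{new_name} = FOREACH #{name} GENERATE #{wanted.join(', ')};"]
  end
end

tweets, load_stmt = Relation.load(:tweets, "tweets.tsv", Tweet)
slim, gen_stmt = tweets.generate(:slim, :user, :text)
puts load_stmt   # prints: tweets = LOAD 'tweets.tsv' AS (id, user, text, created_at);
puts gen_stmt    # prints: slim = FOREACH tweets GENERATE user, text;
```

Because @slim@ only knows about @user@ and @text@, a follow-up such as @slim.generate(:again, :id)@ raises an ArgumentError instead of emitting Pig that would fail later.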

We're still trying to figure out whether this is a stupid and crazy idea, or just a
crazy idea: yeah, we're using a functional/OO scripting language to generate code for an
imperative query language that generates Java code for ad-hoc map-reduce operations.
