Wukong makes Hadoop so easy a chimpanzee can use it.

h2. How to write a Wukong script

Here's a super-script to count words in a text stream:

    #!/usr/bin/env ruby
    require 'wukong'

    module WordCount
      class Mapper < Wukong::Streamer::LineStreamer
        # Emit each word in the line.
        def process line
          words = line.strip.split(/\W+/).reject(&:blank?)
          words.each{|word| yield [word, 1] }
        end
      end
    end
    # Execute the script
    Wukong::Script.new(WordCount::Mapper, Wukong::Streamer::UniqCountKeysReducer).run

There are many useful examples (including an actually-useful version of this
WordCount script) in the examples/ directory.

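
To see what the two stages do to the data without a Hadoop cluster (or Wukong itself), here is a minimal plain-Ruby sketch of the same word count. The names @map_line@ and @count_keys@ are hypothetical helpers invented for this illustration. @count_keys@ only approximates what a count-by-key reducer such as @UniqCountKeysReducer@ presumably does; it is not Wukong's implementation.

```ruby
# Plain-Ruby sketch of the word-count data flow; does not use Wukong.
# map_line and count_keys are hypothetical names for illustration only.

# Map step: emit a [word, 1] pair for each word in the line.
def map_line(line)
  line.strip.split(/\W+/).reject(&:empty?).map { |word| [word, 1] }
end

# Reduce step: Hadoop streaming hands the reducer pairs sorted by key, so
# adjacent pairs that share a key can be counted in a single pass.
def count_keys(pairs)
  pairs.sort_by { |word, _| word }
       .chunk_while { |a, b| a[0] == b[0] }
       .map { |group| [group.first[0], group.size] }
end

lines  = ["the quick brown fox", "jumps over the lazy dog"]
pairs  = lines.flat_map { |line| map_line(line) }
counts = count_keys(pairs)

# Print tab-separated word/count rows, one per line.
counts.each { |word, count| puts "#{word}\t#{count}" }
```

The explicit sort stands in for the shuffle-and-sort phase that Hadoop performs between the mapper and the reducer.
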

h2. How to run a Wukong script

    your/script.rb --go path/to/input_files path/to/output_dir

All of the file paths are HDFS paths except your script path, of course, which
is on the local filesystem.

You can supply arbitrary command line arguments (they wind up as key-value pairs
in the options hash your mapper and reducer receive), and you can use the Hadoop
syntax to specify more than one input file:

    ./path/to/your/script.rb --any_specific_options --options=can_have_vals \
      --go "input_dir/part_*,input_file2.tsv,etc.tsv" path/to/output_dir

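
As an illustration of how flags like the ones above could end up as key-value pairs, here is a minimal sketch in plain Ruby. @parse_options@ is a hypothetical helper written for this example, not Wukong's actual option parser, and mapping a bare flag to @true@ is an assumption.

```ruby
# Hypothetical sketch: split an argument list into an options hash and the
# remaining positional arguments. This is not Wukong's real parser.
def parse_options(argv)
  options = {}
  rest    = []
  argv.each do |arg|
    if arg =~ /\A--([\w-]+)(?:=(.*))?\z/m
      # "--key=val" stores the value; a bare "--key" is assumed to mean true.
      options[$1.to_sym] = $2.nil? ? true : $2
    else
      rest << arg
    end
  end
  [options, rest]
end

argv = ["--any_specific_options", "--options=can_have_vals", "--go",
        "input_dir/part_*,input_file2.tsv,etc.tsv", "path/to/output_dir"]
options, paths = parse_options(argv)

# The comma-separated input spec can be split into individual paths.
input_files = paths.first.split(",")
```

Under these assumptions, @options@ holds the three flags and @paths@ holds the input spec followed by the output directory.
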

h2. How to test your scripts

To run the mapper on its own:

    cat ./local/test/input.tsv | ./examples/word_count.rb --map | more

or, if your test data lives on HDFS:

    hdp-cat test/input.tsv | ./examples/word_count.rb --map | more


h2. What's up with Wukong::AndPig?

@Wukong::AndPig@ is a small library to more easily generate code for the
"Pig":http://hadoop.apache.org/pig data analysis language. See its
"README":wukong/and_pig/README.textile for more.


h2. Why is it called Wukong?

Hadoop, as you may know, is "named after a stuffed
elephant":http://en.wikipedia.org/wiki/Hadoop. Wukong (the Monkey King), known
for his power and agility, is the hero of a famous Chinese fairy tale in which he
journeys to the land of the Elephant:

Quoting "Sun Wukong's Wikipedia entry:":http://en.wikipedia.org/wiki/Wukong

bq. Sun Wukong (traditional Chinese: 孫悟空;
simplified Chinese: 孙悟空; pinyin: Sūn Wùkōng; Wade-Giles: Sun1 Wu4-k'ung1;
Japanese 孫悟空 (Son Gokū?)), known in the West as the Monkey King, is the main
character in the classical Chinese epic novel Journey to the West. In the novel,
he accompanies the monk Xuanzang on the journey to retrieve Buddhist sutras from
India.

bq. Sun Wukong possesses incredible strength, being able to lift his 13,500 jīn
(8,100 kg) Ruyi Jingu Bang with ease. He also has superb speed, traveling
108,000 li (54,000 kilometers) in one somersault. Sun knows 72 transformations,
which allows him to transform into various animals and objects; he is, however,
shown with slight problems transforming into other people, since he is unable to
complete the transformation of his tail. He is a skilled fighter, capable of
holding his own against the best generals of heaven. Each of his hairs possesses
magical properties, and is capable of transforming into a clone of the Monkey
King himself, or various weapons, animals, and other objects. He also knows
various spells in order to command wind, part water, conjure protective circles
against demons, freeze humans, demons, and gods alike. (Journey to the West, Wu
Cheng'en (1500-1582), Translated by Foreign Languages Press, Beijing 1993.)

p. Sounds about right to us :) The "BBC-produced Jamie Hewlett / Damon Albarn
short":http://news.bbc.co.uk/sport1/hi/olympics/monkey made for the 2008
Olympics is highly recommended.