layout | title | collapse |
---|---|---|
default | mrflip.github.com/wukong | true |
{{ site.description }}
Treat your dataset like a
- stream of lines when it’s efficient to process by lines
- stream of field arrays when it’s efficient to deal directly with fields
- stream of lightweight objects when it’s efficient to deal with objects
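
The three views above can be sketched in plain Ruby. This is only an illustration and does not use Wukong's own classes: the helper names `line_stream`, `field_stream`, and `object_stream`, the `Tweet` struct, and the tab-separated input are all assumptions made for the example.

```ruby
require "stringio"

# Hypothetical lightweight record type (not part of Wukong).
Tweet = Struct.new(:user, :text)

# View 1: a lazy stream of raw lines, with the trailing newline removed.
def line_stream(io)
  io.each_line.lazy.map(&:chomp)
end

# View 2: a lazy stream of field arrays, assuming tab-separated input.
# The -1 limit keeps trailing empty fields instead of dropping them.
def field_stream(io)
  line_stream(io).map { |line| line.split("\t", -1) }
end

# View 3: a lazy stream of lightweight objects built from the fields.
def object_stream(io, klass)
  field_stream(io).map { |fields| klass.new(*fields) }
end

# Small demonstration on in-memory data.
input = "alice\thello world\nbob\tgoodbye\n"
object_stream(StringIO.new(input), Tweet).each do |tweet|
  puts "#{tweet.user}: #{tweet.text}"
end
```

Each view is layered on the one before it, so a script can stop at whichever level is cheapest for the job: raw lines for simple filtering, field arrays for column arithmetic, objects when named attributes make the logic easier to read.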
Wukong is friends with Hadoop the elephant, Pig the query language, and the cat
on your command line.
Send Wukong questions to the Infinite Monkeywrench mailing list
- Tutorial
- Count Words
- Structured data
- Accumulators including a UniqByLastReducer and a GroupBy reducer.
- Wutils: command-line utilities for working with data
- Overview of wutils — command listing
- Stupid command-line tricks using the wutils
- wu-lign — present a tab-separated file as aligned columns
- Dear Lazyweb, please build us a tab-oriented version of the Textutils library
- Links and tips for configuring and working with Hadoop
- Some opinionated thoughts on working with big data: why you should drop ACID, treat exceptions as records, and happily embrace variable-length strings as primary keys.
- Wukong is licensed under the Apache License (same as Hadoop)
- Work in progress: an intro to data processing with Wukong:
{% include intro.textile %}
Wukong was written by Philip (flip) Kromer (flip@infochimps.org / @mrflip) for the infochimps project
Patches submitted by:
- gemified by Ben Woosley (ben.woosley@gmail.com)
- ruby interpreter path fix by Yuichiro MASUI – masui@masuidrive.jp – http://blog.masuidrive.jp/
Thanks to:
- Brad Heintz for his early feedback
- Phil Ripperger for his tutorial on running Wukong in the Amazon AWS cloud.
{% include news.html %}