Python module that allows one to easily write and run Hadoop programs.

DESCRIPTION
"""""""""""

Originally, Dumbo was just a simple Python module that made writing
and running Hadoop Streaming programs very easy, but now it also
includes some helper code in Java (although it can still be used
without the Java code). More generally, Dumbo can be considered a
convenient Python API for writing MapReduce programs.


INSTALLATION
""""""""""""

The Java code gets built together with the rest of Hadoop when the
"dumbo/" directory is put in Hadoop's "src/contrib/", and the Python
module can be installed by running

sudo ant -f build-pymod.xml install_pymod

in the "src/contrib/dumbo" directory. If the "dumbo/" directory is a
subdirectory of Hadoop's "src/contrib/", then the -f option can be omitted:

sudo ant install_pymod


USAGE
"""""

/usr/local/hadoop/bin/hadoop dfs -put examples/brian.txt brian.txt

python examples/wordcount.py -hadoop /path/to/hadoop \
-file excludes.txt -input brian.txt -output brian-wc

/usr/local/hadoop/bin/hadoop dfs -getmerge brian-wc brian-wc.txt
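
The contents of examples/wordcount.py are not reproduced in this README, so
the following is only a minimal sketch of what a word count program in the
mapper/reducer style might look like. The function names, the run_locally
helper, and the assumption that excludes.txt lists words to skip are all
illustrative and are not taken from the Dumbo sources. The helper merely
imitates the map, group and reduce steps in a single process, so that the
logic can be tried without Hadoop.

```python
from collections import defaultdict


def mapper(key, value):
    # key: position of the input line (unused here); value: one line of text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # key: a word; values: all the counts emitted for that word.
    yield key, sum(values)


def run_locally(lines, excludes=()):
    """Hypothetical single-process stand-in for a MapReduce run.

    `excludes` is an assumed set of words to drop, mirroring what a file
    such as excludes.txt might contain.
    """
    groups = defaultdict(list)
    # Map phase: feed each line to the mapper and group counts by word.
    for offset, line in enumerate(lines):
        for word, count in mapper(offset, line):
            if word not in excludes:
                groups[word].append(count)
    # Reduce phase: one reducer call per distinct word.
    result = {}
    for word in sorted(groups):
        for out_key, total in reducer(word, groups[word]):
            result[out_key] = total
    return result


if __name__ == "__main__":
    print(run_locally(["a b a", "b c"], excludes={"c"}))
```

On a real cluster the grouping between the two phases is done by Hadoop
itself, and the program is started with the command line shown above rather
than with a local helper.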


MORE INFO
"""""""""

http://github.com/klbostee/dumbo/wikis