
Python module that allows one to easily write and run Hadoop programs.


Dumbo began as a simple Python module that made it easy to write and
run Hadoop Streaming programs; it now also includes some helper code
in Java. More generally, Dumbo can be considered a convenient Python
API for writing MapReduce programs.
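
To make the MapReduce shape concrete, here is a minimal word count
sketch in plain Python. It does not import Dumbo; the names "mapper",
"reducer" and "run_local" are illustrative assumptions, and "run_local"
is a hypothetical stand-in that only mimics the map, sort and reduce
phases on one machine.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: input record key (ignored here); value: one line of text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values: iterable of the counts emitted for this word.
    yield key, sum(values)


def run_local(mapper, reducer, records):
    # Hypothetical local driver (not part of Dumbo):
    # map every record, sort by key, group equal keys, then reduce.
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in group)))
    return output


if __name__ == "__main__":
    lines = ["always look on the bright side", "the bright side of life"]
    for word, count in run_local(mapper, reducer, enumerate(lines)):
        print(word, count)
```

On a real cluster the sorting and grouping are done by Hadoop itself,
so only the two generator functions would carry program logic.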


Dumbo is built together with the rest of Hadoop when the "dumbo/"
directory is placed in Hadoop's "src/contrib/" directory. More
precisely, running "ant package" in Hadoop's root directory should
generate a "build/hadoop-*/contrib/dumbo/" directory.


contrib/dumbo/bin/put examples/brian.txt brian.txt

contrib/dumbo/bin/start examples/ \
-input brian.txt -output brian-wc -inputformat text

contrib/dumbo/bin/cat brian-wc > brian-wc.txt
