Benchmark Python mapreduce implementations following A Guide to Python Frameworks and HiBench Hadoop benchmark suite.
As of 20140814T030000Z:
Only wordcount and sort have been implemented. Only Disco and Hadoop streaming Python examples have been completed.
This repository is follow-up to posting on Disco user group: https://groups.google.com/forum/#!topic/disco-dev/u3EsnGgLOPM
Versions used:
- Python v2.7 from the ContinuumIO Anaconda Python distribution
- Disco v0.4.4 from the ContinuumIO Anaconda Python distribution
- Hadoop v2.3.0-cdh5.0.3 from the Cloudera distribution