STREAMING

Streaming support for MongoDB requires that your Hadoop distribution include the patches for the following issues:

For the mainline Apache Hadoop distribution, these patches were merged for the 0.21.0 release. We have also verified that the Cloudera distribution (although still based on 0.20.x) includes these patches in CDH3 Update 1. Anecdotal evidence, which still needs confirmation, suggests they may have been present since CDH2 and likely exist in CDH3 as well.

By default, the Mongo-Hadoop project builds against Apache Hadoop 0.20.203, which does not include these patches. To build and enable Streaming support, you must build against either Cloudera CDH3u1 or Hadoop 0.21.0. You can change the Hadoop version used by the Maven build by specifying the hadoop.release property:

    mvn -Dhadoop.release=cdh3 
    mvn -Dhadoop.release=cloudera

Both of these build against Cloudera CDH3u1, while:

    mvn -Dhadoop.release=apache-hadoop-0.21

builds against Hadoop 0.21 from the mainline Apache distribution. Unfortunately, we are not aware of any Maven repositories that currently contain artifacts for Hadoop 0.21, so you may need to resolve these dependencies by hand if you choose the 'vanilla' route.
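If you script your builds, the mapping between `hadoop.release` values and Hadoop distributions described above can be captured in a small helper. The following is only a sketch, not part of the project: the names `STREAMING_RELEASES` and `maven_command` are hypothetical, and the optional `goals` argument (for example the standard Maven `package` phase) is an assumption, since this README shows only the `-Dhadoop.release` flag.

```python
# Values of the hadoop.release property named in this README, mapped to the
# Hadoop distribution each one builds against. (Helper names are hypothetical.)
STREAMING_RELEASES = {
    "cdh3": "Cloudera CDH3u1",
    "cloudera": "Cloudera CDH3u1",
    "apache-hadoop-0.21": "Apache Hadoop 0.21",
}


def maven_command(release, goals=()):
    """Return the mvn argument list for a streaming-capable build.

    `goals` is an assumed extra, e.g. ("package",); the README itself
    shows only the -Dhadoop.release flag.
    """
    if release not in STREAMING_RELEASES:
        raise ValueError(
            "hadoop.release %r is not listed as streaming-capable; expected one of %s"
            % (release, sorted(STREAMING_RELEASES))
        )
    return ["mvn", "-Dhadoop.release=" + release] + list(goals)


if __name__ == "__main__":
    # Print one command line per known release, with the target distribution.
    for name in sorted(STREAMING_RELEASES):
        print(" ".join(maven_command(name)), "#", STREAMING_RELEASES[name])
```

The returned list can be passed to `subprocess.call` as-is. Rejecting unknown values up front avoids silently falling back to the default 0.20.203 build, which lacks the streaming patches.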
