mongo-spark

An example application showing how to use the mongo-hadoop connector with Apache Spark.

Read more details at http://codeforhire.com/2014/02/18/using-spark-with-mongodb/

Prerequisites

  • MongoDB installed and running on localhost
  • Scala 2.10 and SBT installed

Running

Import the data into the database, run either JavaWordCount or ScalaWordCount, and print the results:

mongoimport -d beowulf -c input beowulf.json
sbt 'run-main JavaWordCount'
sbt 'run-main ScalaWordCount'
mongo beowulf --eval 'printjson(db.output.find().toArray())' | less
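
The word count itself reads documents from MongoDB through the mongo-hadoop connector's MongoInputFormat and writes the results back via MongoOutputFormat. The following is a minimal sketch of what the Scala version can look like; it assumes each input document stores its text in a field named text, and the collection names match the beowulf.input / beowulf.output URIs used above.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.bson.BSONObject
import com.mongodb.hadoop.{MongoInputFormat, MongoOutputFormat}

object ScalaWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "ScalaWordCount")

    val config = new Configuration()
    config.set("mongo.input.uri", "mongodb://localhost:27017/beowulf.input")
    config.set("mongo.output.uri", "mongodb://localhost:27017/beowulf.output")

    // Each MongoDB document arrives as an (id, BSONObject) pair.
    val mongoRDD = sc.newAPIHadoopRDD(config,
      classOf[MongoInputFormat], classOf[Object], classOf[BSONObject])

    // Split the assumed "text" field into words and count occurrences.
    val countsRDD = mongoRDD
      .flatMap(pair => pair._2.get("text").toString.split("\\W+"))
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)

    // The path argument is ignored by MongoOutputFormat; the output
    // destination comes from mongo.output.uri in the configuration.
    countsRDD.saveAsNewAPIHadoopFile("file:///unused",
      classOf[Object], classOf[Object],
      classOf[MongoOutputFormat[Object, Object]], config)
  }
}
```

The output collection then holds one document per distinct word, which is what the `mongo --eval` command above prints.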

License

The code itself is released into the public domain under the Creative Commons CC0 dedication.

The example files are based on Beowulf from Project Gutenberg and are covered by its corresponding license.