MongoDB to Spark connector example

This project demonstrates how to use the MongoDB to Spark connector.

The queries are adapted from the aggregation pipeline example from the MongoDB documentation.
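The referenced aggregation pipeline example groups the zips dataset by state and keeps states whose total population exceeds a threshold. As a rough sketch of that same grouping logic in plain Scala (hypothetical sample data and names, not this project's code or the connector API):

```scala
// Plain-Scala analogue of the MongoDB zips aggregation example:
// group by state, sum populations, keep states above a threshold
// (the $group + $match stages of the pipeline).
object ZipsAggregationSketch {
  case class Zip(city: String, state: String, pop: Long)

  // Total population per state, keeping only states whose total
  // exceeds `minPop`.
  def statesAbove(zips: Seq[Zip], minPop: Long): Map[String, Long] =
    zips.groupBy(_.state)
      .map { case (state, rows) => state -> rows.map(_.pop).sum }
      .filter { case (_, total) => total > minPop }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows in the shape of the zips dataset.
    val sample = Seq(
      Zip("SPRINGFIELD", "MA", 60000L),
      Zip("BOSTON", "MA", 50000L),
      Zip("PROVIDENCE", "RI", 40000L)
    )
    // Only MA (110000) clears the 100000 threshold.
    println(statesAbove(sample, 100000L))
  }
}
```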

How to run:

Prerequisite:

  • Install docker and docker-compose
  • Install maven

Run MongoDB and import data

From the project root:

docker-compose -f docker/docker-compose.yml up -d
docker exec -it mongo_container sh /scripts/import-data.sh

Checking:

Verify that the data has been loaded into MongoDB by connecting to the container and running a count:

docker exec mongo_container mongo --eval "db.zips.count()"

This should return something like:

MongoDB shell version: 3.2.11
connecting to: test
29353

Import the project

  • Import the maven project in your favorite IDE
  • Run the MongoSparkMain class
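For orientation, a minimal main class using the connector might look like the sketch below. This is an assumption about the project's structure, not its actual code: the object name, database/collection URI, and the aggregation shown are all hypothetical (it needs the MongoDB container from the previous step running, plus the `mongo-spark-connector` and Spark dependencies from `pom.xml`).

```scala
import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark

// Hedged sketch of a connector entry point; names and the exact
// query are assumptions, not this project's MongoSparkMain.
object MongoSparkMainSketch {
  def main(args: Array[String]): Unit = {
    // Point the connector at the zips collection imported earlier
    // (database/collection names here are assumptions).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mongo-spark-example")
      .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.zips")
      .getOrCreate()

    // Load the collection as a DataFrame and aggregate it with Spark SQL.
    val zips = MongoSpark.load(spark)
    zips.groupBy("state").sum("pop").show()

    spark.stop()
  }
}
```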