This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

Project Layout

Luke Lovett edited this page Oct 7, 2015 · 3 revisions

This page describes the overall structure of the mongo-hadoop project.

The mongo-hadoop project provides integrations for a number of different Hadoop frameworks. Each of these integrations is contained in its own module at the root level of the project. These are:

  • core - MapReduce integration and abstractions that are reusable by the other modules.
  • hive - Apache Hive integration
  • pig - Apache Pig integration
  • spark - Apache Spark integration (currently necessary only for PySpark support)
  • flume - Apache Flume integration
  • streaming - Hadoop Streaming integration
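Since the project is built with Gradle (see the gradle directory below), each of these modules is typically registered as a subproject. As a hedged illustration only, the declaration might look roughly like this; the repository's actual settings.gradle is authoritative:

```groovy
// Hypothetical sketch of a multi-module settings.gradle for mongo-hadoop.
// The module names match the directories listed above; the exact file
// contents in the repository may differ.
include 'core', 'hive', 'pig', 'spark', 'flume', 'streaming'
```

With a layout like this, an individual module can be built on its own, e.g. `./gradlew :hive:build`.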

There are also a few directories that don't contain any MongoDB Hadoop connector source code. These are:

  • examples - Example code demonstrating how to use the MongoDB Hadoop connector.
  • docs - Old documentation. This will probably be deleted soon. Please consult the GitHub wiki instead (the one you're reading right now!).
  • config - The checkstyle and findbugs configuration XML files. Please use these when developing the Hadoop Connector.
  • tools - Contains the bson_splitter script (Python script for splitting large BSON files into smaller pieces). This may go away soon.
  • clusterConfigs - Contains some of the Hadoop configuration files that are used during the tests. Note that several other configuration files are contained in the build/resources directories under certain modules so that they'll be added to the CLASSPATH.
  • gradle - Gradle scripts used to run tests.
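To show how the config directory's checkstyle XML could be picked up during development, here is a hedged sketch of a build.gradle fragment; the file path and plugin wiring are assumptions, not the project's confirmed configuration:

```groovy
// Hypothetical build.gradle fragment wiring in the checkstyle rules
// stored under config/. The exact filename under config/ is assumed.
apply plugin: 'checkstyle'

checkstyle {
    // Point the plugin at the shared rules in the config directory
    configFile = file("$rootDir/config/checkstyle.xml")
}
```

Running `./gradlew checkstyleMain` would then apply those rules to the main source set of each module that includes this configuration.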