This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

Project Layout

Luke Lovett edited this page Oct 7, 2015 · 3 revisions

This page describes the overall structure of the mongo-hadoop project.

The mongo-hadoop project provides integrations for a number of different Hadoop frameworks. Each of these integrations is contained in its own module at the root level of the project. These are:

  • core - MapReduce integration and abstractions that are reusable by the other modules.
  • hive - Apache Hive integration
  • pig - Apache Pig integration
  • spark - Apache Spark integration (currently necessary only for PySpark support)
  • flume - Apache Flume integration
  • streaming - Hadoop Streaming integration
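Since the project is built with Gradle (see the gradle directory below), each of these modules is typically registered as a subproject. As a hedged illustration only, the declaration might look roughly like this; the repository's actual settings.gradle is authoritative:

```groovy
// Hypothetical sketch of a multi-module settings.gradle for mongo-hadoop.
// The module names match the directories listed above; the exact file
// contents in the repository may differ.
include 'core', 'hive', 'pig', 'spark', 'flume', 'streaming'
```

With a layout like this, an individual module can be built on its own, e.g. `./gradlew :hive:build`.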

There are also a few directories that don't contain any MongoDB Hadoop connector source code. These are:

  • examples - Example code demonstrating how to use the MongoDB Hadoop connector.
  • docs - Old documentation. This will probably be deleted soon. Please consult the GitHub wiki instead (the one you're reading right now!).
  • config - The checkstyle and findbugs configuration XML files. Please use these when developing the Hadoop Connector.
  • tools - Contains the bson_splitter script (Python script for splitting large BSON files into smaller pieces). This may go away soon.
  • clusterConfigs - Contains some of the Hadoop configuration files that are used during the tests. Note that several other configuration files are contained in the build/resources directories under certain modules so that they'll be added to the CLASSPATH.
  • gradle - Gradle scripts used to run tests.
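To show how the config directory's checkstyle XML could be picked up during development, here is a hedged sketch of a build.gradle fragment; the file path and plugin wiring are assumptions, not the project's confirmed configuration:

```groovy
// Hypothetical build.gradle fragment wiring in the checkstyle rules
// stored under config/. The exact filename under config/ is assumed.
apply plugin: 'checkstyle'

checkstyle {
    // Point the plugin at the shared rules in the config directory
    configFile = file("$rootDir/config/checkstyle.xml")
}
```

Running `./gradlew checkstyleMain` would then apply those rules to the main source set of each module that includes this configuration.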