# MongoDB Connector for Hadoop

## Purpose

The MongoDB Connector for Hadoop is a library that allows MongoDB (or backup files in its data format, BSON) to be used as an input source or output destination for Hadoop MapReduce tasks. It is designed for greater flexibility and performance, and makes it easy to integrate data in MongoDB with other parts of the Hadoop ecosystem, including the following:
* [**Pig**][pig-usage]
* [**Spark**][spark-usage]
* [**MapReduce**][mapreduce-usage]
* [**Hadoop Streaming**][streaming-usage]
* [**Hive**][hive-usage]
* [**Flume**][flume-usage]

Check out the [releases](https://github.com/mongodb/mongo-hadoop/releases) page for the latest stable release.

## Features

* Can create data splits to read from standalone, replica set, or sharded configurations
* Source data can be filtered with queries using the MongoDB query language
* Supports Hadoop Streaming, allowing job code to be written in any language (Python, Ruby, and Node.js are currently supported)
* Can read data from MongoDB backup files residing on S3, HDFS, or local filesystems
* Can write data out in .bson format, which can then be imported into any MongoDB database with `mongorestore`
* Works with BSON/MongoDB documents in other Hadoop tools such as [**Pig**][pig-usage] and [**Hive**][hive-usage]

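As a quick sketch of the last two features, a job's BSON output can be pulled out of HDFS and loaded into a live MongoDB deployment with `mongorestore`. The paths, database, and collection names below are hypothetical placeholders.

```shell
# Copy the job's BSON output file out of HDFS (path is hypothetical).
hadoop fs -get /user/hadoop/output/results.bson /tmp/results.bson

# Import it into the "results" collection of the "mydb" database.
mongorestore --db mydb --collection results /tmp/results.bson
```
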
## Download

See the [releases](https://github.com/mongodb/mongo-hadoop/releases) page.

## Building

Run `./gradlew jar` to build the jars. The jars will be placed in `build/libs` for each module; for example, the core module's jar is generated in the `core/build/libs` directory.

After building successfully, you must copy the jars to the `lib` directory on each node in your Hadoop cluster. This is usually one of the following locations, depending on which Hadoop release you are using:

* `$HADOOP_HOME/lib/`
* `$HADOOP_HOME/share/hadoop/mapreduce/`
* `$HADOOP_HOME/share/hadoop/lib/`
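
One way to script the copy step is to generate the `scp` commands for every node and then run them. A minimal sketch, where the node hostnames and the destination directory are placeholders for your own cluster:

```shell
# Emit one scp command per node; review the output, then pipe it to sh
# to actually run the copies. Hostnames and paths are placeholders.
build_copy_commands() {
    local jar=$1
    shift
    local node
    for node in "$@"; do
        echo "scp $jar $node:/usr/lib/hadoop/lib/"
    done
}

# Example: distribute the core jar to three (hypothetical) nodes.
build_copy_commands core/build/libs/mongo-hadoop-core.jar node1 node2 node3
```

Once the hostnames are real, `build_copy_commands ... | sh` would execute the copies.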

mongo-hadoop should work on any distribution of Hadoop. Should you run into an issue, please file a [Jira](https://jira.mongodb.org/browse/HADOOP/) ticket.

## Documentation

For full documentation, see the [Hadoop Connector Wiki][wiki]. The documentation includes installation instructions, configuration options, and specific instructions and examples for each Hadoop application the connector supports.

## Usage with Amazon Elastic MapReduce

Amazon Elastic MapReduce is a managed Hadoop framework that allows you to submit jobs to a cluster of customizable size and configuration without having to provision nodes and install software yourself.

Using EMR with the MongoDB Connector for Hadoop allows you to run MapReduce jobs against MongoDB backup files stored in S3.

Submitting jobs that use the MongoDB Connector for Hadoop to EMR simply requires that the bootstrap actions fetch the dependencies (the MongoDB Java driver, the mongo-hadoop-core libraries, etc.) and place them into the Hadoop distribution's `lib` folders.
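
Such a bootstrap action can be a short shell script. The sketch below uses placeholder URLs, and the EMR lib path varies between AMI versions, so treat every path here as an assumption to adapt.

```shell
#!/bin/bash
# Hypothetical EMR bootstrap action: download the connector and driver
# jars into Hadoop's lib directory on each node. The URLs are
# placeholders; point them at the release artifacts you actually use.
set -e
LIB_DIR=/home/hadoop/lib   # assumed EMR lib path; varies by AMI version

wget -P "$LIB_DIR" https://example.com/jars/mongo-hadoop-core.jar
wget -P "$LIB_DIR" https://example.com/jars/mongo-java-driver.jar
```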

For a full example (running the Enron example on Elastic MapReduce), see [the EMR example README](examples/elastic-mapreduce/README.md).

## Notes for Contributors

If your code introduces new features, add tests that cover them where possible, and make sure that `./gradlew check` still passes. For instructions on how to run the tests, see the [Running the Tests](https://github.com/mongodb/mongo-hadoop/wiki/Running-the-Tests) section of the [wiki][wiki]. If you're not sure how to write a test for a feature, or have trouble with a test failure, please post to the [mongodb-user Google Group](http://groups.google.com/group/mongodb-user/) with details and we will try to help. _Note_: Until FindBugs updates its dependencies, running `./gradlew check` on Java 8 will fail.

### Maintainers

Luke Lovett (luke.lovett@mongodb.com)

### Contributors

* Mike O'Brien (mikeo@10gen.com)
* Brendan McAdams (brendan@10gen.com)
* Eliot Horowitz (erh@10gen.com)
* Ryan Nitz (ryan@10gen.com)
* Russell Jurney (@rjurney) (lots of significant Pig improvements)
* Sarthak Dudhara (sarthak.83@gmail.com) (BSONWritable comparable interface)
* Priya Manda (priyakanth024@gmail.com) (test harness code)
* Rushin Shah (rushin10@gmail.com) (test harness code)
* Joseph Shraibman (jks@iname.com) (sharded input splits)
* Sumin Xia (xiasumin1984@gmail.com) (sharded input splits)
* Jeremy Karn
* bpfoster
* Ross Lawley
* Carsten Hufe
* Asya Kamsky
* Thomas Millar
* Justin Lee
* Luke Lovett

### Support

Issue tracking: https://jira.mongodb.org/browse/HADOOP/

Discussion: http://groups.google.com/group/mongodb-user/

[pig-usage]: https://github.com/mongodb/mongo-hadoop/wiki/Pig-Usage
[hive-usage]: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
[flume-usage]: https://github.com/mongodb/mongo-hadoop/wiki/Flume-Usage
[streaming-usage]: https://github.com/mongodb/mongo-hadoop/wiki/Streaming-Usage
[spark-usage]: https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage
[mapreduce-usage]: https://github.com/mongodb/mongo-hadoop/wiki/MapReduce-Usage
[wiki]: https://github.com/mongodb/mongo-hadoop/wiki