GridDB connector for Apache Hadoop MapReduce
The GridDB connector for Apache Hadoop MapReduce is a Java library that lets Hadoop MapReduce jobs use GridDB as an input source and an output destination. With it, MapReduce jobs can take direct advantage of GridDB's in-memory processing performance.
Building the library and running the sample programs have been verified in the following environment.
OS: CentOS 6.7 (x64)
Java: JDK 1.8.0_60
Maven: apache-maven-3.3.9
Hadoop: CDH 5.7.1 (YARN)
Build a GridDB Java client and place the created gridstore.jar under the lib directory.
Then run the mvn command as follows to create the jar files:
$ mvn package
Running the sample program
The following example runs the WordCount program using GridDB. GridDB and Hadoop (HDFS and YARN) must be started in advance. Run the commands in an environment where both are reachable and the hadoop command is available.
$ cd gs-hadoop-mapreduce-examples
$ ./exec-example.sh \
> --job wordcount \
> --define notificationAddress=<GridDB notification address(default is 126.96.36.199)> \
> --define notificationPort=<GridDB notification port(default is 31999)> \
> --define clusterName=<GridDB cluster name> \
> --define user=<GridDB user> \
> --define password=<GridDB password> \
> pom.xml 2> /dev/null | sort -r
5 <dependency>
5 </dependency>
3 <groupId>org.apache.hadoop</groupId>
3 <groupId>com.toshiba.mwcloud.gs.hadoop</groupId>
...
In each output line, the number on the left is the number of occurrences and the text on the right is a word from the file (pom.xml) specified as the processing target. See gs-hadoop-mapreduce-examples/README.md for details about the sample programs.
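To make the output format concrete, here is a minimal plain-Java sketch of the computation that a WordCount job performs. It is not the connector's code and uses neither Hadoop nor GridDB. The class name WordCountSketch and its methods are hypothetical. Splitting on whitespace is an assumption, though it matches the sample output, where a whole tag such as <groupId>org.apache.hadoop</groupId> is counted as one word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical illustration only: imitates the result of a WordCount job in memory.
public class WordCountSketch {

    // Counts whitespace-separated tokens (assumed tokenization, see the lead-in).
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Formats each entry as "<count> <word>" and sorts the lines in reverse
    // lexical order, roughly what piping the job output through `sort -r` does.
    public static List<String> report(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            lines.add(entry.getValue() + " " + entry.getKey());
        }
        lines.sort(Comparator.reverseOrder());
        return lines;
    }

    public static void main(String[] args) {
        String sample = "<dependency> <groupId>a</groupId> </dependency> "
                + "<dependency> </dependency>";
        report(count(sample)).forEach(System.out::println);
    }
}
```

Because the reverse sort compares lines as text rather than as numbers, a count of 10 would sort below a count of 5. The same applies to the sort -r call in the command above.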
Use the GitHub issue function if you have any requests, questions, or bug reports.
Use the GitHub pull request function if you want to contribute code. You'll need to agree to the GridDB Contributor License Agreement (CLA_rev1.1.pdf). By using the GitHub pull request function, you shall be deemed to have agreed to the GridDB Contributor License Agreement.
The source code of the GridDB connector for Apache Hadoop MapReduce is licensed under the Apache License, version 2.0.
Apache Hadoop and Hadoop are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.