GridDB connector for Apache Hadoop MapReduce

Overview

The GridDB connector for Apache Hadoop MapReduce is a Java library that lets Hadoop MapReduce jobs use GridDB as an input source and an output destination. Because GridDB processes data in memory, MapReduce jobs can take advantage of its performance directly.

Operating environment

Building the library and running the sample programs have been verified in the following environment.

OS:         CentOS6.7(x64)
Java:       JDK 1.8.0_60
Maven:      apache-maven-3.3.9
Hadoop:     CDH5.7.1(YARN)

QuickStart

Preparations

Build a GridDB Java client and place the created gridstore.jar under the lib directory.

Build

Run the mvn command as follows:

$ mvn package

This creates the following jar files.

gs-hadoop-mapreduce-client/target/gs-hadoop-mapreduce-client-1.0.0.jar
gs-hadoop-mapreduce-examples/target/gs-hadoop-mapreduce-examples-1.0.0.jar

Running the sample program

The following example runs the WordCount program using GridDB. Start GridDB and Hadoop (HDFS and YARN) in advance, then run the commands below in an environment where both are reachable and the hadoop command is available.

$ cd gs-hadoop-mapreduce-examples
$ ./exec-example.sh \
> --job wordcount \
> --define notificationAddress=<GridDB notification address(default is 239.0.0.1)> \
> --define notificationPort=<GridDB notification port(default is 31999)> \
> --define clusterName=<GridDB cluster name> \
> --define user=<GridDB user> \
> --define password=<GridDB password> \
> pom.xml 2> /dev/null | sort -r

   5        <dependency>
   5        </dependency>
   3        <groupId>org.apache.hadoop</groupId>
   3        <groupId>com.toshiba.mwcloud.gs.hadoop</groupId>
...

In each output line, the number on the left is the occurrence count and the text on the right is a word from the input file (pom.xml). See gs-hadoop-mapreduce-examples/README.md for details about the sample programs.
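To make the output format concrete, here is a minimal local sketch in plain Java that counts words and prints them in the same "count, then word" layout, sorted in reverse like `sort -r`. It does not use Hadoop, GridDB, or the connector. The class name `WordCountSketch`, its methods, and the column width are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local illustration; not part of the connector or its examples.
class WordCountSketch {

    // Counts whitespace-separated tokens across all input lines.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Renders "count<TAB>word" lines and sorts them in reverse string order,
    // which approximates piping the job output through `sort -r`.
    // The 4-character count column is an assumption for this sketch.
    static List<String> format(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(String.format("%4d\t%s", e.getValue(), e.getKey()));
        }
        out.sort(Comparator.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<dependency>",
                "<groupId>org.apache.hadoop</groupId>",
                "</dependency>",
                "<dependency>",
                "</dependency>");
        for (String row : format(count(lines))) {
            System.out.println(row);
        }
    }
}
```

In the actual sample, the counting is performed by a MapReduce job that reads from and writes to GridDB. This sketch only mirrors how to read the resulting lines.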

Community

Issues
Use the GitHub issue function if you have any requests, questions, or bug reports.
PullRequest
Use the GitHub pull request function if you want to contribute code. You will need to agree to the GridDB Contributor License Agreement (CLA_rev1.1.pdf). By using the GitHub pull request function, you shall be deemed to have agreed to the GridDB Contributor License Agreement.

License

The source code of the GridDB connector for Apache Hadoop MapReduce is licensed under the Apache License, Version 2.0.

Trademarks

Apache Hadoop and Hadoop are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
