book-examples

This repository contains examples (and errata) for Learning Hadoop 2.

Requirements

Throughout the book we use Cloudera CDH 5.0 and Amazon EMR as reference systems. All examples target, and have been tested with, Java 7.

Build the examples

The easiest way to build the examples, with CDH 5.0 dependencies, is to use the provided Gradle and sbt scripts.

Gradle

We use Gradle to compile Java code and collect the required class files into a single JAR file.

$ ./gradlew jar

JARs can then be submitted to Hadoop with:

$ hadoop jar <job jarfile> <main class> <argument 1> ... <argument n>

Example - Chapter 3 (MapReduce and beyond)

To build the ch3 examples:

$ git clone https://github.com/learninghadoop2/book-examples
$ cd book-examples/ch3
$ ./gradlew jar

The wrapper script downloads a Gradle distribution from the official repo (https://services.gradle.org/distributions/gradle-2.0-bin.zip) and uses it to build the code under src/main/java/com/learninghadoop2/mapreduce/. You will find the resulting JAR in build/libs/mapreduce-example.jar.

We can run the WordCount example as described in Chapter 3:

$ hadoop jar build/libs/mapreduce-example.jar \
com.learninghadoop2.mapreduce.WordCount \
input.txt \
output
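For readers who want to see what the job computes before running it on a cluster, the word count logic can be sketched in plain Java with no Hadoop dependencies. This is not the code from the repository: the class name WordCountSketch and both method names are hypothetical, and the sketch assumes words are separated by whitespace. It targets Java 7, like the rest of the examples.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, for illustration only; not part of book-examples.
class WordCountSketch {

    // Plays the role of the map step: split one line into tokens.
    // Assumption: tokens are separated by runs of whitespace.
    static String[] tokenize(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Plays the role of the reduce step: add 1 for every occurrence of a word.
    // A TreeMap keeps the words sorted, so iteration order is deterministic.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                Integer current = counts.get(word);
                counts.put(word, current == null ? 1 : current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two in-memory lines stand in for the contents of input.txt.
        List<String> lines = Arrays.asList("hello hadoop", "hello yarn");
        for (Map.Entry<String, Integer> entry : countWords(lines).entrySet()) {
            // One word and its count per line, separated by a tab.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

On a cluster the same two steps run as separate, distributed map and reduce tasks, and the results are written to files under the output directory rather than printed.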

For more information on how Gradle is bootstrapped to run the build, refer to https://docs.gradle.org/current/userguide/gradle_wrapper.html. The Gradle wrapper is distributed with the examples (gradle/wrapper/gradle-wrapper.jar).

SBT

We use sbt to build, manage, and execute the Spark examples in Chapter 5.

The build.sbt file controls the codebase metadata and software dependencies.

The source code for all examples can be compiled with:

$ cd ch5
$ sbt compile

Or, it can be packaged into a JAR file with:

$ sbt package

For Spark in standalone mode, a helper script to execute compiled classes can be generated with:

$ sbt add-start-script-tasks
$ sbt start-script

The helper can be invoked as follows:

$ target/start <class name> <master> <param1> … <param n>

YARN on CDH5

To run the examples on a YARN cluster on CDH5, you can build a JAR file using:

$ sbt package

and then ship it to the ResourceManager using the spark-submit command:

$ ./bin/spark-submit --class application.to.execute --master yarn-cluster [options] target/scala-2.10/chapter-4_2.10-1.0.jar [<param1> … <param n>]

Unlike standalone mode, we don't need to specify a master URI: the ResourceManager address is picked up from the Hadoop configuration.

More information on launching Spark on YARN can be found at http://spark.apache.org/docs/latest/running-on-yarn.html.

About

Examples from Learning Hadoop 2 (Packt Publishing, 2015)
