Full Stack of Hadoop 2.9.1 for Compilation and Modification

Nap-Hadoop-2.9.1

A modified, network-aware version of Hadoop 2.9.1 used for the Nap project (see more here).

Motivation

Create a network-aware version of Hadoop. To do so, we write the locations of the map and reduce containers (the node that runs each container/task) to HDFS and to the Hadoop log; our network-aware modification (Nap) can then read these locations from HDFS.

With this modification, the locations are written to the Hadoop log and to HDFS, under the "/mappersLocations" and "/reducersLocations" directories respectively. The NameNode machine must have the hostname master.

Technologies

  • Hadoop 2.9.1

Modification & Compilation

Writing Containers Location

All three changes to the Hadoop source code were made in a single file, Nap-Hadoop-2.9.1/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java. They are:

  • Before the constructor, add four imports for working with HDFS and URI, plus a flag for starting or stopping the writing to HDFS.
  • In the heartbeat() function, check whether all the mappers and reducers have been allocated. If so, find their locations by iterating over the requested LinkedHashMap and write them to HDFS under the "/mappersLocations" or "/reducersLocations" directory. In addition, write the status of the mapper and reducer locations to the log on every heartbeat.
  • In the assignContainers(List allocatedContainers) function, write the number of mappers and reducers to the log before the tasks are assigned to nodes (slaves). After the assignment, write the locations of the mappers and reducers to the log.

Each changed line is marked with an "OR_Change" comment, which makes the changes easy to find.
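The heartbeat logic described above can be sketched in plain Java roughly as follows. This is not the code from RMContainerAllocator.java: the class LocationWriterSketch, the LocationSink interface, the onHeartbeat method, and the task and host names are all hypothetical, and the HDFS write is replaced by a callback so that the example compiles without Hadoop on the classpath. It only illustrates the idea of waiting until every mapper and reducer has a location, writing once, and then using a flag to stop further writes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; it does not use or reproduce any Hadoop classes. */
final class LocationWriterSketch {

    /** Hypothetical stand-in for the HDFS write done in the real modification. */
    interface LocationSink {
        void write(String directory, List<String> hosts);
    }

    private final int expectedMappers;
    private final int expectedReducers;
    private final LocationSink sink;
    private boolean written = false; // flag: stop writing after the first full snapshot

    LocationWriterSketch(int expectedMappers, int expectedReducers, LocationSink sink) {
        this.expectedMappers = expectedMappers;
        this.expectedReducers = expectedReducers;
        this.sink = sink;
    }

    /**
     * Called once per heartbeat with task -> host maps (assumed shape).
     * Returns true only on the heartbeat that actually writes the locations.
     */
    boolean onHeartbeat(Map<String, String> mapperHosts, Map<String, String> reducerHosts) {
        if (written) {
            return false; // already written once
        }
        if (mapperHosts.size() < expectedMappers || reducerHosts.size() < expectedReducers) {
            return false; // not every container has been allocated yet
        }
        // A LinkedHashMap keeps insertion order, so hosts come out in allocation order.
        sink.write("/mappersLocations", new ArrayList<>(mapperHosts.values()));
        sink.write("/reducersLocations", new ArrayList<>(reducerHosts.values()));
        written = true;
        return true;
    }

    public static void main(String[] args) {
        LocationWriterSketch sketch = new LocationWriterSketch(2, 1,
                (dir, hosts) -> System.out.println(dir + " <- " + hosts));
        Map<String, String> mappers = new LinkedHashMap<>();
        Map<String, String> reducers = new LinkedHashMap<>();

        mappers.put("map_0", "slave1");
        sketch.onHeartbeat(mappers, reducers); // incomplete: nothing is written

        mappers.put("map_1", "slave2");
        reducers.put("reduce_0", "slave1");
        sketch.onHeartbeat(mappers, reducers); // complete: both directories are written
        sketch.onHeartbeat(mappers, reducers); // flag is set: nothing is written again
    }
}
```

Keeping the write behind a small interface is a choice made for this sketch only; in the actual modification the write goes to HDFS directly from heartbeat().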

Compilation

  1. Download the Hadoop source code from here, or clone this project.
  2. Make the changes in the desired Java files, e.g., RMContainerAllocator.java.

Note: the build instructions for Hadoop are in BUILDING.txt (in the parent directory).

  3. Install cmake, Google Protocol Buffers (protobuf 2.5), and Maven (for compiling).
  4. Run one of the following commands in the hadoop-project directory (Nap-Hadoop-2.9.1/hadoop-project/):
mvn clean install -Pdist -Pnative -Dtar -DskipTests
mvn package -Pdist,native -DskipTests -Dtar

Contact

Created by Or Raz (razo7) as part of his master's thesis work, which was partly published in the following article at NCA 19 (IEEE). Feel free to get in touch on LinkedIn or by email (razo@post.bgu.ac.il)!