
MapReduce Infrastructure

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication.
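
As a concrete illustration of the programming model, the classic word-count job can be expressed as a pair of such user functions. The sketch below is only a minimal C++ example; the function names and signatures are assumptions for illustration, not this repository's actual API.

    // Illustrative word-count sketch; signatures are assumptions, not this repo's API.
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // Map: split one input line into words and emit <word, "1"> pairs.
    std::vector<std::pair<std::string, std::string>>
    word_count_map(const std::string& line) {
        std::vector<std::pair<std::string, std::string>> intermediate;
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            intermediate.emplace_back(word, "1");
        }
        return intermediate;
    }

    // Reduce: merge every count emitted for the same word into a single total.
    std::string word_count_reduce(const std::string& /*word*/,
                                  const std::vector<std::string>& counts) {
        std::size_t total = 0;
        for (const auto& count : counts) {
            total += std::stoul(count);
        }
        return std::to_string(total);
    }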

Install Dependency

How to setup the project

Source Code Structure

Code walk through

How to Run MapReduce Job

This project runs successfully on both Linux and Mac OS X.

  1. First, make sure you have already installed gRPC and its dependency, Protocol Buffers v3.0; check out the Install Dependency section for more details.

  2. Compile code and generate libraries

    • Go to the src directory and run the make command; two libraries will be created in the external directory: libmapreduce.a and libmr_worker.a.

          cd src && make
    • Now go to the test directory and run the make command; two binaries will be created: mrdemo and mr_worker.

          cd test && make
  3. Now that you have created all the binaries and libraries, run the demo.

    • Clear any existing files in the output directory:

          rm test/output/*
    • Start all the worker processes in the following fashion (see the connectivity sketch after this list):

          ./mr_worker localhost:50051 & ./mr_worker localhost:50052 & ./mr_worker localhost:50053 & ./mr_worker localhost:50054 & ./mr_worker localhost:50055 & ./mr_worker localhost:50056 &
    • Then start your main MapReduce process:

          ./mrdemo
    • Once ./mrdemo finishes, kill all the worker processes you started (the same command works on both Mac OS X and Linux):

          killall mr_worker
    • Check the output directory to see if you have the correct results (assuming, of course, that you have implemented your library properly):

      .
      ├── output0.txt
      ├── output1.txt
      ├── output2.txt
      ├── output3.txt
      ├── output4.txt
      ├── output5.txt
      ├── output6.txt
      ├── output7.txt
      ├── temp0.txt
      ├── temp1.txt
      ├── temp2.txt
      ├── temp3.txt
      ├── temp4.txt
      ├── temp5.txt
      ├── temp6.txt
      └── temp7.txt
      
      0 directories, 16 files
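
Note that the workers and mrdemo communicate over gRPC, so mrdemo can only make progress if the worker addresses started above (localhost:50051 through localhost:50056) are reachable. The snippet below is a minimal connectivity check written against the public gRPC C++ API only; it does not assume this repository's own service or message definitions.

    // Minimal gRPC connectivity probe for one worker address.
    // Uses only the public gRPC C++ API; this project's service types are not assumed.
    #include <grpcpp/grpcpp.h>
    #include <chrono>
    #include <iostream>

    int main() {
        auto channel = grpc::CreateChannel("localhost:50051",
                                           grpc::InsecureChannelCredentials());
        // Wait up to two seconds for the worker to accept the connection.
        auto deadline = std::chrono::system_clock::now() + std::chrono::seconds(2);
        bool reachable = channel->WaitForConnected(deadline);
        std::cout << (reachable ? "worker reachable" : "worker not reachable") << std::endl;
        return reachable ? 0 : 1;
    }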
      

Reference

Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 2004
