Skip to content

matt-merman/DistributedGrep

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Distributed Grep

CircleCI Go Report Card

Goal

Realize a distributed grep using the MapReduce paradigm: Grep returns the lines of text of a large input file given in input that match a specific pattern (i.e., regular expression) specified.

Main Ideas

  1. Map phase:

    Process the N files/blocks in parallel on workers, applying the map function (i.e., local grep) to each file/block:

    • Master assigns the N files/blocks to workers, that execute the map task;
    • Each map task send its results to the master task;
  2. Shuffle and sort phase:

    It is centralized (on master) and does nothing;

  3. Reduce phase:

    Each reduce task processes its input and sends it to the master:

    • In our case, the reduce phase uses the identity function;

Running

#Build and run server, mappers and reducer in background, and build client
make server_run_client_build

#Run client
make client_run

#Terminate server, mappers and reducer
make kill

#Remove .out files
make clean

#Execute classical grep
make grep

NOTICE: 3 mappers are generated by default. The search word is cats in the file cats.txt. You can edit input in the make file. The number of mappers is defined as constant in the file server.go:

const NUMBER_MAPPER = 3

Implementation

NOTICE: as you can see in the example below, the sentences are returned in random order due to a failure to manage the order in which the goroutines (threadMapper()) return to the Server node.

Example

Performance

A 2.2MB .txt file (lorem.txt) has been used to measure performance. Obviously the classical grep has taken less time for the execution.

About

A distributed grep using the MapReduce paradigm

Topics

Resources

Stars

Watchers

Forks