Distributed Grep

Goal

Realize a distributed grep using the MapReduce paradigm: Grep returns the lines of text of a large input file given in input that match a specific pattern (i.e., regular expression) specified.

Main Ideas

Map phase:

Process the N files/blocks in parallel on workers, applying the map function (i.e., local grep) to each file/block:
- Master assigns the N files/blocks to workers, that execute the map task;
- Each map task send its results to the master task;
Shuffle and sort phase:

It is centralized (on master) and does nothing;
Reduce phase:

Each reduce task processes its input and sends it to the master:
- In our case, the reduce phase uses the identity function;

Running

#Build and run server, mappers and reducer in background, and build client
make server_run_client_build

#Run client
make client_run

#Terminate server, mappers and reducer
make kill

#Remove .out files
make clean

#Execute classical grep
make grep

NOTICE: 3 mappers are generated by default. The search word is cats in the file cats.txt. You can edit input in the make file. The number of mappers is defined as constant in the file server.go:

const NUMBER_MAPPER = 3

Implementation

NOTICE: as you can see in the example below, the sentences are returned in random order due to a failure to manage the order in which the goroutines (threadMapper()) return to the Server node.

Example

Performance

A 2.2MB .txt file (lorem.txt) has been used to measure performance. Obviously the classical grep has taken less time for the execution.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.circleci		.circleci
code		code
images		images
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Distributed Grep

Goal

Main Ideas

Running

Implementation

Example

Performance

About

Languages

matt-merman/DistributedGrep

Folders and files

Latest commit

History

Repository files navigation

Distributed Grep

Goal

Main Ideas

Running

Implementation

Example

Performance

About

Topics

Resources

Stars

Watchers

Forks

Languages