This project implements the following MapReduce algorithms for a variety of common data processing tasks. (The MapReduce programming model enables processing very large datasets in parallel.)
- Creates an inverted index (a dictionary in which each word is associated with a list of the identifiers of the documents in which that word appears) for a given set of documents.
- Implements a relational join as a MapReduce query.
- Implements a MapReduce algorithm that derives the number of friends each person has from a simple social network dataset consisting of key-value pairs, where each key is a person and each value is a friend of that person.
- Implements a MapReduce algorithm that checks whether the friend relationship is symmetric (if A is a friend of B, then B is a friend of A) and generates a list of all non-symmetric friend relationships.
- Implements a MapReduce algorithm that computes the matrix product A * B, where A and B are two matrices in sparse matrix format.
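
The inverted index and the two friend-relationship tasks can be sketched as map and reduce functions as follows. This is a minimal illustration, not the project's actual code: `map_reduce` is a hypothetical single-process driver standing in for a real MapReduce framework, and the record layouts (`(doc_id, text)` and `(person, friend)`) and sample data are assumptions.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key, groups[key]))
    return output


# Inverted index: each record is assumed to be (doc_id, text).
def index_map(record):
    doc_id, text = record
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(word, doc_ids):
    yield word, sorted(set(doc_ids))


# Friend count: each record is assumed to be (person, friend).
def count_map(record):
    person, _friend = record
    yield person, 1


def count_reduce(person, ones):
    yield person, sum(ones)


# Non-symmetric friendships: key both directions of an edge by the same
# unordered pair, so the reducer sees (A, B) and (B, A) together.
def asym_map(record):
    person, friend = record
    yield tuple(sorted((person, friend))), (person, friend)


def asym_reduce(_pair, edges):
    for person, friend in sorted(set(edges)):
        if (friend, person) not in edges:
            yield person, friend


# Made-up sample data.
docs = [("d1", "big data big ideas"), ("d2", "data pipelines")]
friends = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("dan", "ann")]

index = map_reduce(docs, index_map, index_reduce)
friend_counts = map_reduce(friends, count_map, count_reduce)
one_way = map_reduce(friends, asym_map, asym_reduce)
```

With this sample data, `one_way` contains the two edges that have no reverse edge, `("ann", "cat")` and `("dan", "ann")`. Note that `friend_counts` only lists people who appear as a key, so `cat` is absent.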
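
The relational join and the sparse matrix product can be sketched in the same style. Again this is only an illustration under stated assumptions: `map_reduce` is a hypothetical in-memory driver, the table names `customer` and `order` are invented, matrix entries are assumed to arrive as `(matrix_name, row, col, value)` tuples, and the dimensions `N_ROWS_A` and `N_COLS_B` are assumed to be known before the job starts. The matrix version shown is a one-pass variant that replicates each entry to every output cell it contributes to; other formulations exist.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key, groups[key]))
    return output


# Inner join: each record is assumed to be (table_name, join_key, payload).
def join_map(record):
    table, key, payload = record
    yield key, (table, payload)


def join_reduce(key, rows):
    customers = [payload for table, payload in rows if table == "customer"]
    orders = [payload for table, payload in rows if table == "order"]
    for customer in customers:
        for order in orders:
            yield key, customer, order


# Sparse matrix product C = A * B.
N_ROWS_A, N_COLS_B = 2, 2  # assumed known up front


def matmul_map(record):
    name, row, col, value = record
    if name == "A":
        # A[i, j] contributes to every cell C[i, k].
        for k in range(N_COLS_B):
            yield (row, k), ("A", col, value)
    else:
        # B[j, k] contributes to every cell C[i, k].
        for i in range(N_ROWS_A):
            yield (i, col), ("B", row, value)


def matmul_reduce(cell, entries):
    a = {j: v for name, j, v in entries if name == "A"}
    b = {j: v for name, j, v in entries if name == "B"}
    total = sum(a[j] * b[j] for j in a.keys() & b.keys())
    if total != 0:  # keep the result sparse
        yield cell, total


# Made-up sample data.
tables = [
    ("customer", 1, "Ann"), ("customer", 2, "Bob"),
    ("order", 1, "book"), ("order", 1, "pen"), ("order", 3, "lamp"),
]
# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], zeros omitted.
matrices = [
    ("A", 0, 0, 1), ("A", 0, 1, 2), ("A", 1, 1, 3),
    ("B", 0, 0, 4), ("B", 1, 0, 5), ("B", 1, 1, 6),
]

joined = map_reduce(tables, join_map, join_reduce)
product = map_reduce(matrices, matmul_map, matmul_reduce)
```

In the join, the reducer for each key sees rows from both tables and emits their cross product, so keys present in only one table (`2` and `3` here) produce no output. In the matrix product, each reducer call computes one cell of the result as a dot product over the shared index.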