The objective of this project is to get started with big data processing on Hadoop. The goal is to implement basic text processing tasks from scratch on the Hadoop framework.

manishabiswas/Big-Data-Processing-with-Hadoop

Big-Data-Processing-with-Hadoop

Task 1: Implement a MapReduce algorithm that produces the count of every word in the document.
Task 2: Implement a MapReduce algorithm that produces modified tri-grams around the key words, after replacing each key word with ‘$’.
Task 3: Implement a MapReduce algorithm that produces an inverted index for the dataset.
Task 4: Implement a MapReduce algorithm that joins two datasets on a primary key.
Task 5: Implement the KNN algorithm using MapReduce on the test and train data.
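
To make the map, shuffle, and reduce flow behind these tasks concrete, here is a minimal in-memory sketch in plain Python for Tasks 1, 3, and 4. It does not use Hadoop and is not the code from this repository: `run_mapreduce`, the mapper and reducer names, the `"L"`/`"R"` source tags, and the sample data are all hypothetical, chosen only for illustration. A real Hadoop job would express the same logic as Mapper and Reducer classes or Hadoop Streaming scripts.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in memory (no Hadoop involved)."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    mapped = chain.from_iterable(mapper(record) for record in records)
    # Shuffle phase: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, keys visited in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Task 1 (word count): emit (word, 1) per token, then sum per word.
def wordcount_mapper(record):
    _doc_id, text = record
    for word in text.lower().split():
        yield word, 1


def wordcount_reducer(_word, counts):
    return sum(counts)


# Task 3 (inverted index): emit (word, doc_id), then keep distinct ids.
def index_mapper(record):
    doc_id, text = record
    for word in text.lower().split():
        yield word, doc_id


def index_reducer(_word, doc_ids):
    return sorted(set(doc_ids))


# Task 4 (reduce-side inner join): tag each row with its source dataset,
# key it by the primary key, and pair left rows with right rows per key.
def join_mapper(record):
    tag, key, value = record  # tag is "L" or "R" (assumed convention)
    yield key, (tag, value)


def join_reducer(_key, tagged_values):
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    return [(l, r) for l in left for r in right]


# Hypothetical sample inputs.
docs = [("d1", "big data on hadoop"), ("d2", "hadoop processes big files")]
rows = [
    ("L", 1, "alice"),
    ("R", 1, "order-17"),
    ("R", 1, "order-42"),
    ("L", 2, "bob"),
]

word_counts = run_mapreduce(docs, wordcount_mapper, wordcount_reducer)
inverted_index = run_mapreduce(docs, index_mapper, index_reducer)
joined = run_mapreduce(rows, join_mapper, join_reducer)
```

In this sketch the three tasks differ only in what the mapper emits and how the reducer combines the grouped values; the shuffle step is identical. Key 2 in `joined` maps to an empty list because it has no matching right-hand row, which corresponds to inner-join behavior.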
