The project consists of five sub-projects, each using a different technology.
Part 1: Map-Reduce: Finding words present in different files
Objective: Find, for each word, how many files it appears in and the names of those files, using the MapReduce framework.
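As a rough illustration of the Part 1 objective, the sketch below mimics the map, shuffle, and reduce steps in memory with plain Python. It is not the project's actual Hadoop job; the function names and the two sample files are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def map_phase(filename, text):
    """Map step: emit one (word, filename) pair per distinct word in a file."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [(word, filename) for word in words]


def shuffle(pairs):
    """Shuffle step: group all emitted filenames by word."""
    grouped = defaultdict(list)
    for word, filename in pairs:
        grouped[word].append(filename)
    return grouped


def reduce_phase(filenames):
    """Reduce step: return (number of files, sorted file names) for one word."""
    unique = sorted(set(filenames))
    return len(unique), unique


def word_file_index(files):
    """Run map, shuffle, and reduce over a dict of {filename: text}."""
    pairs = []
    for filename, text in files.items():
        pairs.extend(map_phase(filename, text))
    grouped = shuffle(pairs)
    return {word: reduce_phase(names) for word, names in grouped.items()}


# Hypothetical sample input, standing in for files stored on HDFS.
sample_files = {
    "a.txt": "Hadoop stores data",
    "b.txt": "Spark reads data from Hadoop",
}
index = word_file_index(sample_files)
print(index["data"])   # (2, ['a.txt', 'b.txt'])
print(index["spark"])  # (1, ['b.txt'])
```

In a real MapReduce job the shuffle is performed by the framework between the mapper and the reducer; it is written out here only to make the data flow visible.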
Part 2: Sqoop Task: Loading Data from RDBMS to HDFS
Objective: Load data from an RDBMS (MySQL) into HDFS, then load that data from HDFS into Hive tables, using Apache Sqoop.
Part 3: Stocks Analysis using Hive
Objective: Run queries on the stocks dataset loaded with Sqoop to understand it better, then perform analytics on it.
Part 4: Pig Analytics
Objective: Run basic queries on two datasets, stocks and dividends, then join them using Pig Latin to understand how the Apache Pig framework works.
Part 5: Twitter’s Top 10 Popular Hashtags, Streamed per Second using Apache Spark
Objective: Find the top 10 most popular hashtags on Twitter, using Spark and Scala to perform web scraping and stream the data on a per-second basis.
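To make the Part 5 objective concrete, the sketch below counts hashtags in one-second windows and keeps the most frequent ones in each window. It is a plain-Python stand-in, not the project's Spark and Scala streaming code; the function name, the (timestamp, text) input format, and the sample tweets are assumptions made only for illustration.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")


def top_hashtags_per_second(tweets, k=10):
    """Return {second: [(hashtag, count), ...]} with the k most common tags.

    `tweets` is an iterable of (timestamp_in_seconds, text) pairs; this
    input shape is an assumption, not the Twitter API's format.
    """
    windows = defaultdict(Counter)
    for timestamp, text in tweets:
        second = int(timestamp)  # bucket each tweet into a one-second window
        windows[second].update(tag.lower() for tag in HASHTAG.findall(text))
    return {second: counts.most_common(k) for second, counts in windows.items()}


# Hypothetical sample stream.
sample_tweets = [
    (0.1, "Learning #Spark and #Scala"),
    (0.7, "#spark streaming demo"),
    (1.2, "#Hadoop and #Hive"),
    (1.9, "More #hive queries #Hive"),
]
top = top_hashtags_per_second(sample_tweets, k=10)
print(top[0])  # [('#spark', 2), ('#scala', 1)]
print(top[1])  # [('#hive', 3), ('#hadoop', 1)]
```

A streaming engine applies the same idea incrementally: each micro-batch or window is reduced to per-hashtag counts, and only the top entries are reported.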