GitHub - nrohit78/PigHive_StackExhangeData: Data is fetched from StackExchange, transformed using Pig, queried and stored in Hive. Additionally, the TF-IDF of the top 10 users is calculated using Hive.

Task Details:

1. Acquire the top 200,000 posts by view count from Stack Exchange (https://data.stackexchange.com/stackoverflow/queries).
2. Using Pig or MapReduce, extract, transform, and load the data as applicable.
3. Using Hive and/or MapReduce, get:
   I. The top 10 posts by score
   II. The top 10 users by post score
   III. The number of distinct users who used the word “Hadoop” in one of their posts
4. Using MapReduce/Pig/Hive, calculate the per-user TF-IDF (just submit the top 10 terms for each of the top 10 users from Query 3.II).
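
The per-user TF-IDF step can be illustrated with a minimal Python sketch. This is not the repository's Hive implementation; it only shows the arithmetic under some stated assumptions: each user's posts are merged into one document, term frequency is the term count divided by the user's total term count, and inverse document frequency is `log(N / df)`, where `N` is the number of users and `df` is the number of users who used the term. The function names, the `(user_id, text)` input shape, and the simple regex tokenizer are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def per_user_tfidf(posts, top_n=10):
    """Return the top_n (term, score) pairs per user.

    posts: iterable of (user_id, text) pairs (hypothetical input shape).
    All posts by one user are treated as a single document.
    """
    # Count how often each user used each term.
    term_counts = defaultdict(Counter)
    for user_id, text in posts:
        term_counts[user_id].update(tokenize(text))

    # Document frequency: the number of users who used each term at least once.
    n_users = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    result = {}
    for user_id, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_users / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[user_id] = ranked[:top_n]
    return result


if __name__ == "__main__":
    sample_posts = [
        (1, "hadoop hive hive"),
        (2, "hadoop pig"),
    ]
    for user, terms in per_user_tfidf(sample_posts).items():
        print(user, terms)
```

With this definition, a term used by every user scores zero, so common words drop to the bottom of each user's ranking without a separate stop-word list. To restrict the output to the top 10 users from Query 3.II, the input would be filtered to those users' posts only after the document frequencies have been computed over the desired corpus.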

