priyanka21sk/Yelp-Data-Analysis-Using-Hadoop-

Yelp Data Analysis Using Hadoop

Objective

Yelp is an internet-based company that hosts user-submitted reviews of millions of restaurants and other businesses. The Yelp Dataset Challenge data used in this project contains 2.7M reviews by 687K users for about 86K businesses, and covers the U.K., Germany, Canada and 5 cities in the US from 2014 to June 2016.

In our project, we considered only the 5 US cities and restricted our analysis to the Food and Restaurant business category, using 3 data files: business, user and review.
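The project's own filtering is done in Hive and Pig. As a rough Python illustration of the same selection step, the sketch below keeps only business records from a given set of cities that carry a Food or Restaurants category. It assumes one JSON object per line with `city` and `categories` fields, where `categories` is a list of strings; the function name `filter_businesses`, the category labels and the sample records are hypothetical, and the actual 5 cities would have to be supplied by the caller.

```python
import json


def filter_businesses(lines, cities, wanted_categories=("Food", "Restaurants")):
    """Yield business records located in `cities` that have a wanted category.

    Assumes each line is a JSON object with "city" and "categories" fields
    (field names are an assumption about the data file layout).
    """
    wanted = set(wanted_categories)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        categories = record.get("categories") or []
        if record.get("city") in cities and wanted.intersection(categories):
            yield record


# Made-up sample records, for illustration only.
sample = [
    '{"business_id": "b1", "city": "Springfield", "categories": ["Food", "Bakeries"]}',
    '{"business_id": "b2", "city": "Springfield", "categories": ["Dentists"]}',
    '{"business_id": "b3", "city": "Shelbyville", "categories": ["Restaurants"]}',
]

selected = [r["business_id"] for r in filter_businesses(sample, cities={"Springfield"})]
print(selected)
```

With real data, `lines` could be an open file handle on the business file, and `cities` the set of US city names chosen for the study.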

Prerequisites:

Everything you need to go through the scripts and queries is already provisioned with the cluster. To export the analyzed data to Microsoft Excel, you must meet the following requirements:

 You must have Microsoft Excel 2010, 2013 or 2016 installed.
 You must have Excel Power View and 3D Maps enabled.
 You must have Tableau 9.2 or 9.3 installed to visualize the analyzed data.
 You must have the Microsoft Hive ODBC Driver to import data from Hive into Excel. Select the 32-bit or 64-bit version to match your version of Microsoft Excel. Note that, as of Sept 2016, BigInsights does not yet support this driver.

Insights:

The analyses performed on the Yelp dataset are:

 In which year did Yelp gain the maximum number of users?

Insight1

 Forecast of users joining Yelp in coming years.

Insight2

 Who are the most active users on Yelp?

Insight3

 Which user's review is the most popular based on votes?

Insight4

 Which city has the maximum number of closed businesses?

Insight5

 Sentiment Analysis on Top 8 food chains in the USA based on user reviews.

Insight6

 Best suited restaurants for tourists.

Insight7

Please refer to the hive+pig code.
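The queries behind these insights are written in Hive and Pig. Purely as an illustration of the aggregation behind the first insight (the year with the most new users), here is a minimal Python sketch. It assumes the user file holds one JSON object per line with a join date in a field such as `yelping_since`, formatted with the year first (for example `2014-01`); the function name `users_per_year` and the sample records are hypothetical.

```python
import json
from collections import Counter


def users_per_year(lines, date_field="yelping_since"):
    """Count users by the year in which they joined.

    Assumes the join date is a string that starts with a 4-digit year;
    the field name is an assumption about the user file layout.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        joined = record.get(date_field)
        if joined:
            counts[joined[:4]] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    '{"user_id": "u1", "yelping_since": "2012-09"}',
    '{"user_id": "u2", "yelping_since": "2014-01"}',
    '{"user_id": "u3", "yelping_since": "2014-07"}',
]

counts = users_per_year(sample)
peak_year, peak_count = counts.most_common(1)[0]
print(peak_year, peak_count)
```

The same group-and-count pattern, keyed on city and restricted to closed businesses, would apply to the closed-business insight.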
