Yelp is an internet-based company that hosts user-provided reviews for millions of restaurants and other businesses. The Yelp Dataset Challenge data used in this project contains 2.7M reviews by 687K users for about 86K businesses and focuses on the U.K., Germany, Canada, and 5 cities in the US from 2014 to June 2016.
In our project, we considered only the 5 US cities and restricted our analysis to the Food and Restaurant business category, using 3 data files: business, user, and review.
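The filtering described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's Hive/Pig code. It assumes the business file is JSON lines with a `state` field and a `categories` list, which is our understanding of the public Yelp business file. The state codes, the helper names, and the sample records are hypothetical placeholders, since the five US cities are not named here.

```python
import json

# Hypothetical placeholders: the five US cities are not named in this document,
# so these state codes are illustrative only.
US_STATES = {"AZ", "NV"}
TARGET_CATEGORIES = {"Food", "Restaurants"}


def is_us_food_business(record, states=US_STATES, categories=TARGET_CATEGORIES):
    """Return True if the business is in a selected state and has a target category."""
    record_categories = set(record.get("categories") or [])
    return record.get("state") in states and bool(record_categories & categories)


def filter_businesses(lines):
    """Parse JSON-lines business records and keep only the matching ones."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if is_us_food_business(record):
            kept.append(record)
    return kept


# Made-up sample records for illustration, not real Yelp data.
sample = [
    '{"business_id": "b1", "state": "AZ", "categories": ["Restaurants", "Pizza"]}',
    '{"business_id": "b2", "state": "EDH", "categories": ["Food"]}',
    '{"business_id": "b3", "state": "NV", "categories": ["Auto Repair"]}',
]
kept_ids = [r["business_id"] for r in filter_businesses(sample)]
print(kept_ids)
```

In this sketch only the first sample record passes both the location and the category check. The resulting business IDs could then be used to restrict the review and user files to the same subset.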
Everything you need to run the scripts and queries is already provisioned on the cluster. To export the analyzed data to Microsoft Excel and visualize it, you must meet the following requirements:
You must have Microsoft Excel 2010, 2013 or 2016 installed.
You must have Power View and 3D Maps enabled in Excel.
You must have Tableau 9.2 or 9.3 installed to visualize the analyzed data.
You must have the Microsoft Hive ODBC Driver installed to import data from Hive into Excel. Select the 32-bit or 64-bit version to match your version of Microsoft Excel. Note that, as of September 2016, BigInsights did not yet support this driver.
The analyses performed on the Yelp dataset are:
Please refer to the hive+pig code.