Yelp Academic Dataset Analysis
To run the notebooks, you will need a standard installation of Anaconda with Python 3.x.
Additional libraries needed:
- sklearn
- s2sphere
Data used in the analysis can be downloaded at https://www.yelp.com/dataset.
This project was done as part of Udacity's Data Scientist Nanodegree. A yelp academic dataset was explored and the following questions were answered:
- Are there differences in the star ratings by major categories (Restaurants, Shopping, Nightlife, ...)? Any differences in Restaurant categories?
- What affects if a business will close?
- Does vicinity to other similar business help or hurt ratings?
There are four notebooks that contain the analysis of the dataset.
- Yelp Academic Dataset.ipynb: Start here. Contains the introduction, data exploration, and data preparation.
- Question1.ipynb
- Question2.ipynb
- Question3.ipynb
There is also a ProcessData.py file which contains functions for processing data which are needed by the notebooks.
The results of the analysis are presented in a Medium Post.
Main findings:
- Restaurants have the lowest stars ratings compared to other types of businesses. Cafes have the highest average ratings within restaurants.
- The location of the business has the largest effect on whether it will stay open or not.
- You don’t need to be around other restaurants to get a good rating.