Yelp Data Analysis

Created By: Richard Huang, Sebastian Guerrero, Ankit Gupta, Li (Sophia) He

The web application can be found at:

GitHub repo:


We are a group of four information technology students from Carnegie Mellon University: two undergraduates and two master's students. Our career paths led us all to the Data Pipeline master's course offered by the Human Computer Interaction Master Program. In this course we learned many steps of the data pipeline, all leading up to this final project.

For our final project, which had to encompass the whole data pipeline, we chose the Yelp Dataset Challenge. We were attracted to this dataset by recent publicity about how the city of Pittsburgh is considered one of the top food cities. Our task in this project is to explore the data and draw useful insights from it. To that end, we are taking on the challenge of what it takes to beat Yelp.

Our goals for this project are aimed mainly at current restaurant and business owners. We wanted to show users, through clear data visualizations, how Yelp does its rankings, and to use machine learning to figure out which features of a restaurant (parking, take-out, noise level, etc.) matter the most. Users can also find out which features to offer, for their category of restaurant, to increase their overall ranking. There is also a form where users can fill in different features of their restaurant and get back a prediction of its potential ranking.
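
As a rough illustration of how answers from such a form could be turned into model input, here is a minimal sketch. The field names (`parking`, `take_out`, `noise_level`), the noise-level categories, and the helper functions are all hypothetical; the project's real feature schema may differ.

```python
# Hypothetical attribute schema, for illustration only.
BOOLEAN_FEATURES = ["parking", "take_out"]
NOISE_LEVELS = ["quiet", "average", "loud"]


def feature_names():
    """Column names matching the vector produced by encode_form()."""
    return BOOLEAN_FEATURES + ["noise_level=" + level for level in NOISE_LEVELS]


def encode_form(form):
    """Turn a dict of form answers into a flat numeric feature vector."""
    # Yes/no attributes become 1.0 or 0.0; a missing answer counts as "no".
    vector = [1.0 if form.get(name) else 0.0 for name in BOOLEAN_FEATURES]
    # The categorical noise level is one-hot encoded.
    noise = form.get("noise_level")
    vector.extend(1.0 if noise == level else 0.0 for level in NOISE_LEVELS)
    return vector


example = encode_form({"parking": True, "take_out": False, "noise_level": "average"})
```

A vector like `example` is what a trained regression model would receive when the form is submitted.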


This web application is built with Flask and currently runs on Google App Engine. On the data side, we did data collection and cleaning, modelling and feature selection, machine learning with a gradient boosting machine, and lastly regression for prediction. For the visualizations, we used both Tableau and D3 (Sankey) to present our findings. For the front end, we used Bootstrap and some JavaScript libraries for the UI/UX.
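
To give a feel for how a gradient boosting machine can rank restaurant features, here is a small standard-library sketch, not the project's actual code. It boosts one-split decision stumps under squared-error loss and scores each feature by the error reduction it contributes. The toy data, the column meanings, and every function name are assumptions made up for this example.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that most reduces squared error."""
    n = len(X)
    total = sum(residuals)
    base_sse = sum(r * r for r in residuals) - total * total / n
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, t, lm, rm)
    return best


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        best = fit_stump(X, residuals)
        if best is None or best[0] <= 1e-12:
            break  # nothing left to explain
        gain, j, t, lm, rm = best
        stumps.append((j, t, lm, rm))
        importance[j] += gain
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]
    total = sum(importance) or 1.0
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate,
            "importance": [g / total for g in importance]}


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    out = model["base"]
    for j, t, lm, rm in model["stumps"]:
        out += model["learning_rate"] * (lm if row[j] <= t else rm)
    return out


# Made-up toy data. Columns: parking, take_out, loud (1 = yes, 0 = no).
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0],
     [1, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
# Made-up ratings that depend strongly on parking and weakly on take-out.
y = [4.2, 4.0, 3.2, 3.0, 4.2, 3.0, 4.0, 3.2]

model = fit_gbm(X, y)
ranking = sorted(zip(["parking", "take_out", "loud"], model["importance"]),
                 key=lambda pair: pair[1], reverse=True)
```

In practice a library implementation would replace this hand-rolled loop; the sketch only shows where the importance scores come from and how the same model can then produce a rating prediction for a new restaurant.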
