# Yelp Restaurant Rating Prediction

## Project Description
The goal of this project is to predict the overall Yelp rating of
restaurants in New York City from restaurant features that we will
extract using the Yelp API. Our motivation is to help estimate how
successful a new restaurant business might be, given certain known
characteristics of it.

We will use the Yelp API to acquire raw restaurant data and pandas to
parse it. From the parsed data we will extract reasonable features such
as location, opening hours, whether the restaurant takes reservations,
whether it offers delivery, whether it has parking space, and whether it
provides free wifi. We will then combine these features with the overall
rating, a numerical value ranging from 0 to 5, which serves as the
label.

We will model how the rating distribution varies across the features
that we extract and create, and analyse how much each feature shifts
that distribution. Based on these results we will select the most
informative features for training the machine learning models.
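
As a rough illustration of what "how much each feature shifts the distribution" could mean for a binary feature, the sketch below compares the mean rating of restaurants with and without the feature. The column names (`has_delivery`, `takes_reservations`), the helper `rating_shift`, and the toy data are all hypothetical; a fuller analysis would look at the whole distribution rather than only the mean.

```python
import pandas as pd

# Hypothetical toy data; real rows would come from the parsed Yelp responses.
toy = pd.DataFrame({
    "rating": [4.5, 4.0, 3.0, 3.5, 5.0, 2.5],
    "has_delivery": [True, True, False, False, True, False],
    "takes_reservations": [True, False, False, True, True, False],
})

def rating_shift(df, feature, label="rating"):
    """Mean rating with the feature minus mean rating without it."""
    mask = df[feature].astype(bool)
    return df.loc[mask, label].mean() - df.loc[~mask, label].mean()

# Rank features by the absolute size of the shift in mean rating.
shifts = {f: rating_shift(toy, f) for f in ["has_delivery", "takes_reservations"]}
ranked = sorted(shifts, key=lambda f: abs(shifts[f]), reverse=True)
print(ranked, shifts)
```

Features with a shift close to zero would be candidates for removal, although a small difference in means does not rule out an effect that only shows up in combination with other features.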

Using the labeled features that we construct, we will train several
machine learning models, including linear regression, nonlinear
regression, logistic regression, and neural networks. We will then make
predictions with each model and compare their accuracy.
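
One possible way to compare models on equal footing is k-fold cross validation. The sketch below is only an assumption about how such a comparison might look: it uses synthetic data, scores models by root mean squared error rather than a classification accuracy, and compares a mean-rating baseline with a least-squares linear regression. The helper names (`k_fold_rmse`, `fit_linear`, and so on) are made up for this example.

```python
import numpy as np

def k_fold_rmse(X, y, fit, predict, k=5, seed=0):
    """Average held-out RMSE of a model over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        errors = predict(model, X[test_idx]) - y[test_idx]
        scores.append(np.sqrt(np.mean(errors ** 2)))
    return float(np.mean(scores))

# Baseline: always predict the mean training rating.
def fit_mean(X, y):
    return y.mean()

def predict_mean(model, X):
    return np.full(len(X), model)

# Linear regression via least squares, with an intercept column.
def fit_linear(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

# Synthetic data standing in for real features and ratings.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 3)).astype(float)
y = 3.0 + 0.8 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(0, 0.2, size=200)

rmse_mean = k_fold_rmse(X, y, fit_mean, predict_mean)
rmse_linear = k_fold_rmse(X, y, fit_linear, predict_linear)
print(rmse_mean, rmse_linear)
```

Any model that exposes a `fit` and `predict` pair with these signatures could be passed to `k_fold_rmse`, which keeps the folds identical across models.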

## Team Members
Jun Hee Kim, Nikhil Rangarajan, Sander Shi

## Procedure
* [Data Gathering from API](#step-1)
* [Feature Extraction with Parsing](#step-2)
* [Feature Analysis and Variable Selection](#step-3)
* [Setup of Models](#step-4)
* [Cross Validation](#step-5)
* [Final Analysis](#step-6)

## Part 0: Imports and Definitions of Constants

We will be using `pandas` to parse the data and `tensorflow` to construct the machine learning models. We will also be using the Yelp API to gather the data.

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import requests

API_URL = "https://api.yelp.com/v3/businesses"
SEARCH_URL = API_URL + "/search"
API_KEY = "./API_KEY"  # path to a local file that holds the API key

<a id="step-1"></a>

## Part 1: Data Gathering from API

In this step we will send a business search request to the Yelp API to gather
restaurants in New York City, and keep the raw response for parsing in the next step.

In [6]:
def find_restaurants(url, api_key_url):
    """
    This function requests restaurant data for restaurants in New York City.
    
    @input url: The API url.
    @type url: String.
    
    @input api_key_url: The path of the file containing the API key.
    @type api_key_url: String.
    
    @return: The raw response of the business search request.
    @rtype: requests.Response.
    """
    # Retrieve API key
    with open(api_key_url, 'r') as f:
        api_key = f.readline().strip()
        
    # Set request header and params for search query
    headers = {
        'Authorization': ' '.join(['Bearer', api_key])
    }
    params = {
        'term': 'restaurants',
        'location': 'NYC'
    }
    response = requests.get(url=url, headers=headers, params=params)
    return response

all_restaurants = find_restaurants(SEARCH_URL, API_KEY)

<Response [200]>
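
The response above still has to be decoded and, since a single search request returns only one page of results, more pages have to be requested. The sketch below shows one way this might be handled. It assumes the search endpoint accepts `limit` and `offset` parameters and returns a JSON object with a `businesses` list; the page size of 50 is an assumption that should be checked against the current API documentation. The helper names and the sample payload are hypothetical, and no network request is made here.

```python
import pandas as pd

def paged_params(total, limit=50):
    """Yield one params dict per page of search results."""
    for offset in range(0, total, limit):
        yield {
            "term": "restaurants",
            "location": "NYC",
            "limit": min(limit, total - offset),
            "offset": offset,
        }

def businesses_to_frame(payload):
    """Turn a decoded search response into a DataFrame, one row per business."""
    return pd.DataFrame(payload.get("businesses", []))

# Hand-written sample shaped like a business search response (not real data).
sample_payload = {
    "total": 2,
    "businesses": [
        {"id": "a1", "name": "Example Diner", "rating": 4.0, "review_count": 120},
        {"id": "b2", "name": "Sample Bistro", "rating": 3.5, "review_count": 45},
    ],
}

frame = businesses_to_frame(sample_payload)
pages = list(paged_params(total=120))
print(frame[["name", "rating"]])
print([p["offset"] for p in pages])
```

With a live response, `businesses_to_frame(all_restaurants.json())` would produce the corresponding table, and each dict from `paged_params` could be passed as `params` to `requests.get`.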


<a id="step-2"></a>

## Part 2: Feature Extraction with Parsing

In [3]:
def extract_features(raw_data):
    """
    This function extracts the features from raw restaurant data.
    """
    pass

labeled_features = extract_features(all_restaurants)
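
Since `extract_features` is still a stub, here is a minimal sketch of what it might do. It assumes each business in the decoded search response carries `coordinates`, `price`, `review_count`, `transactions`, and `rating` fields, and that `transactions` uses values such as `"delivery"` and `"restaurant_reservation"`. Features like parking or wifi are not covered, because they may need a different endpoint. The function name and the sample payload are hypothetical.

```python
def extract_features_sketch(payload):
    """Flatten each business in a decoded search response into a feature row."""
    rows = []
    for business in payload.get("businesses", []):
        transactions = business.get("transactions") or []
        coords = business.get("coordinates") or {}
        rows.append({
            "latitude": coords.get("latitude"),
            "longitude": coords.get("longitude"),
            # Price is assumed to be a string of dollar signs, e.g. "$$".
            "price_level": len(business.get("price") or ""),
            "review_count": business.get("review_count", 0),
            "has_delivery": "delivery" in transactions,
            "takes_reservations": "restaurant_reservation" in transactions,
            # The overall rating is the label.
            "rating": business.get("rating"),
        })
    return rows

# Hand-written sample shaped like a search response (not real data).
sample = {
    "businesses": [
        {"rating": 4.5, "price": "$$", "review_count": 310,
         "transactions": ["delivery", "pickup"],
         "coordinates": {"latitude": 40.73, "longitude": -73.99}},
        {"rating": 3.0, "review_count": 12, "transactions": []},
    ]
}
rows = extract_features_sketch(sample)
print(rows)
```

The resulting list of dicts can be passed directly to `pd.DataFrame` for further analysis.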

## Part 3: Logistic Regression

In [4]:
def train_logistic(labeled_data):
    """
    This function trains the features using logistic regression.
    """
    pass

theta = train_logistic(labeled_features)
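
Logistic regression needs a categorical label, so the sketch below assumes the rating is turned into a binary target ("rated 4.0 or higher"). That threshold, the synthetic data, and the function names are assumptions made for illustration; the training loop is plain batch gradient descent on the mean log loss, written with `numpy` rather than `tensorflow` to keep it short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_sketch(X, y, lr=0.5, steps=2000):
    """Fit weights by batch gradient descent on the mean log loss."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    theta = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (sigmoid(A @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def predict_logistic_sketch(X, theta):
    """Predict 1 where the estimated probability is at least 0.5."""
    A = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(A @ theta) >= 0.5).astype(int)

# Synthetic data: the rating depends mostly on the first binary feature.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(300, 2)).astype(float)
ratings = 3.2 + 1.0 * X_demo[:, 0] + rng.normal(0, 0.3, size=300)
y_demo = (ratings >= 4.0).astype(float)  # assumed threshold for "highly rated"

theta_demo = train_logistic_sketch(X_demo, y_demo)
predictions = predict_logistic_sketch(X_demo, theta_demo)
train_accuracy = float(np.mean(predictions == y_demo))
print(theta_demo, train_accuracy)
```

Here `theta_demo[0]` is the intercept and the remaining entries are the per-feature weights, so a positive weight means the feature raises the estimated probability of a high rating.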

## Part 4: Neural Network

In [5]:
def train_neural_net(labeled_data):
    """
    This function trains the features using a neural network.
    """
    pass

thetas_nn = train_neural_net(labeled_features)
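
As a placeholder for the stub above, the following sketch trains a very small network with one tanh hidden layer to predict the rating directly. It is written in `numpy` to make the gradient computation explicit; the layer size, learning rate, step count, and synthetic data are all arbitrary assumptions, and a `tensorflow` model would likely replace this in practice.

```python
import numpy as np

def predict_nn_sketch(X, params):
    """Forward pass: tanh hidden layer followed by a linear output."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def train_neural_net_sketch(X, y, hidden=8, lr=0.1, steps=5000, seed=0):
    """Fit the network by gradient descent on half the mean squared error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    W1 = rng.normal(0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, size=hidden)
    b2 = float(y.mean())  # start the output at the mean rating
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)                  # hidden activations, (n, hidden)
        err = H @ W2 + b2 - y                     # prediction error, (n,)
        dH = np.outer(err, W2) * (1.0 - H ** 2)   # backpropagate through tanh
        grad_W2 = H.T @ err / n
        grad_b2 = err.mean()
        grad_W1 = X.T @ dH / n
        grad_b1 = dH.mean(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2

# Synthetic data: the rating is higher only when both features are present.
rng = np.random.default_rng(1)
X_demo = rng.integers(0, 2, size=(200, 2)).astype(float)
y_demo = 3.0 + 1.0 * X_demo[:, 0] * X_demo[:, 1] + rng.normal(0, 0.1, size=200)

params = train_neural_net_sketch(X_demo, y_demo)
rmse_nn = float(np.sqrt(np.mean((predict_nn_sketch(X_demo, params) - y_demo) ** 2)))
rmse_baseline = float(np.std(y_demo))  # error of always predicting the mean
print(rmse_nn, rmse_baseline)
```

The interaction between the two features is the kind of pattern a purely linear model cannot represent exactly, which is one reason to try a nonlinear model at all.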

## Part 5: Predictions and Testing Accuracy

In [6]:
def predict_logistic(features, trained):
    """
    This function gives a prediction using logistic regression.
    """
    pass

def predict_nn(features, trained):
    """
    This function gives a prediction using the neural net.
    """
    pass

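# Placeholder input: predict on the labeled features until a held-out test set exists.
prediction_logistic = predict_logistic(labeled_features, theta)
prediction_nn = predict_nn(labeled_features, thetas_nn)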

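def report_accuracy_logistic(trained):
    """
    This function reports the accuracy of the logistic regression model.
    """
    pass

def report_accuracy_nn(trained):
    """
    This function reports the accuracy of the neural net.
    """
    pass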
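
The accuracy functions are still stubs, so the sketch below shows one way predictions might be scored. It assumes ratings are reported in half-star steps, rounds each prediction to the nearest half star, and then reports the exact-match rate, the share of predictions within a tolerance, and the root mean squared error. The function names, the tolerance of 0.5, and the example numbers are hypothetical.

```python
import numpy as np

def round_to_half_star(pred):
    """Round predictions to the nearest 0.5 (assumed rating granularity)."""
    return np.round(np.asarray(pred, dtype=float) * 2) / 2

def report_accuracy_sketch(pred, actual, tolerance=0.5):
    """Summarize how close rounded predictions are to the actual ratings."""
    diff = np.abs(round_to_half_star(pred) - np.asarray(actual, dtype=float))
    return {
        "exact": float(np.mean(diff == 0)),
        "within_tolerance": float(np.mean(diff <= tolerance)),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
    }

# Made-up predictions and ratings, only to show the output shape.
report = report_accuracy_sketch([4.2, 3.1, 4.9, 2.0], [4.0, 3.5, 5.0, 3.0])
print(report)
```

Reporting all three numbers together makes it easier to compare a classifier, which is scored naturally by exact matches, with a regressor, which is scored naturally by error size.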