# Project Description

We want to recommend restaurants that the user will **like** based on that user's existing reviews. We use a binary (good or bad) scale in our recommendations, so in our training data we relabel all 4- and 5-star reviews as 1 and reviews with 3 stars or fewer as 0.
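
The relabeling rule can be sketched in a few lines. This is a minimal illustration only; the function name `binarize_rating` is hypothetical and not taken from the project code.

```python
def binarize_rating(stars: int) -> int:
    """Map a 1-5 star rating to the binary label: 1 for 4 or 5 stars, 0 otherwise."""
    return 1 if stars >= 4 else 0


labels = [binarize_rating(s) for s in [5, 4, 3, 2, 1]]
print(labels)  # [1, 1, 0, 0, 0]
```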

There are a few scenarios that we're trying to capture when recommending a good restaurant:

1. A restaurant that has a terrible rating but for some reason the user loves it
2. A restaurant that has a terrible rating and the user also hates it
3. A restaurant that has a really good rating but for some reason the user doesn't really like it
4. A restaurant that has a really good rating and the user loves it

We want to recommend restaurants that match scenarios 1 and 4 and filter out those that match scenarios 2 and 3. We try to do this using the *sentiment* of a user's text, a user's *word choice*, and the *topics* in a user's corpus of reviews.

The general idea is that we can capture the first portion of each scenario, "A restaurant that has [blank] rating", using linguistic tone and word choice. We capture linguistic tone with two features: the percentage of negative words and the percentage of positive words in a review. We count the words that appear in the negative and positive categories of the Hu and Liu (2004) word dictionary and divide each count by the total number of words, where words are identified using regular expression rules. We capture word choice with a TF-IDF feature matrix with an n-gram range of (2,2), that is, bigrams only.
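
As a rough sketch of the tone feature, the snippet below computes positive and negative word percentages for one review and extracts its bigrams. The tiny word sets are hypothetical stand-ins for the Hu and Liu (2004) lists, and the tokenizer regex is an assumption, since the exact regular expression rules are not given here.

```python
import re

# Hypothetical stand-ins for the Hu and Liu (2004) positive/negative word lists.
POSITIVE_WORDS = {"good", "great", "delicious", "friendly"}
NEGATIVE_WORDS = {"bad", "slow", "bland", "rude"}

# Assumed tokenizer rule: runs of lowercase letters and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z']+")


def tokenize(text):
    """Split a review into lowercase word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def sentiment_percentages(text):
    """Return (positive share, negative share) of all tokens in the review."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return pos / len(tokens), neg / len(tokens)


def bigrams(text):
    """Return adjacent token pairs, the units counted by an n-gram range of (2,2)."""
    tokens = tokenize(text)
    return list(zip(tokens, tokens[1:]))


pos_pct, neg_pct = sentiment_percentages("Great food but slow service")
pairs = bigrams("Great food but slow service")
```

One way to turn such bigrams into a TF-IDF matrix is scikit-learn's `TfidfVectorizer(ngram_range=(2, 2))`, though this write-up does not state which library was used.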

We try to capture the second part of each scenario, "the user feels [blank] about it", by using topic models trained on each user's own reviews. Specifically, we use Latent Dirichlet Allocation and Latent Semantic Indexing/Latent Semantic Analysis to create word topics. Then, we reduce the TF-IDF (unique word) representation of each review to these topic representations. Restaurants whose reviews have topic weightings similar to the weightings on the user's good reviews should have qualities that the user will find enjoyable.
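
To make "similar topic weightings" concrete, here is a small sketch that compares topic-weight vectors with cosine similarity. Cosine similarity is an assumption chosen for illustration; the project itself feeds topic representations to classifiers. The 3-topic vectors are made-up toy values, not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy topic weights (hypothetical): the user's good reviews vs. two candidate restaurants.
user_good_profile = [0.7, 0.2, 0.1]
candidate_a = [0.6, 0.3, 0.1]
candidate_b = [0.1, 0.1, 0.8]

sim_a = cosine_similarity(user_good_profile, candidate_a)
sim_b = cosine_similarity(user_good_profile, candidate_b)
# candidate_a shares the user's dominant topic, so sim_a is larger than sim_b.
```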


For more information see:

* [A Yelp Recommendation System](https://www.cs.cornell.edu/~rahmtin/files/YelpClassProject.pdf)
* [Original LDA paper](http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)
* [2013 Yelp RecSys Kaggle Competition](https://www.kaggle.com/c/yelp-recsys-2013)
* [2013 Yelp RecSys Kaggle Competition Runner Up Recommendation System](https://kaggle2.blob.core.windows.net/forum-message-attachments/9102/vsu_RecSys2013.pdf) 
* [Hu & Liu (2004)](https://www.cs.uic.edu/~liub/publications/kdd04-revSummary.pdf)

Our methodology is novel in the sense that we do not rely on restaurant attributes (Price, Location, Type of Food, Outdoor Seating Availability, etc.), but only on the text of each user's reviews. Although the inputs are different, the classification methods are similar, and we should expect similar results if text is as good a predictor as restaurant attributes.


# Collection/Ingest/Infrastructure

# Wrangling

# Model Selection/Feature Engineering

The features we use are:
1. Negative/Positive Percentages
2. TF-IDF with an N-Gram range of (2,2)
3. 50-topic Latent Dirichlet Allocation
4. 50-topic Latent Semantic Indexing

The models that we test on are:
1. Random Forest
2. Naive Bayes
3. Linear Support Vector Machine

The central problem is that we do not have the ability to follow up and get feedback from users after we make a recommendation. Therefore, we have constructed the following testing procedure to select the best feature and model combination for each user in making restaurant recommendations. 

# Testing Procedure

To get around this problem, we exploit the fact that sometimes a user review rating is also that user's restaurant rating. For a given user, if she only has one review for a restaurant then the rating for that review is also her rating for the restaurant.

We can test our recommendation system by using the following test design:

**Build:**
* Let $B$ be the total set of user reviews.
* Let $R$ be the set of restaurants that the user has reviewed only once.
* Take some fraction, $p = 0.25$, of the set $R$ and call this set $R_{test}$. We use 25% as our test slice because it performed best in informal tests; in an actual production setting we should use cross-validation.
* Next, get all the reviews **except for the user's review** for each restaurant in $R_{test}$. Call this set of reviews $K_{train}$.
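* Let the remaining $1-p = 0.75$ fraction of the set $R$ be the training set of restaurants $R_{train}$. Take the subset of reviews from $B$ that correspond to these restaurants and call this set of training reviews $B_{train}$. Analogously, the subset of reviews from $B$ that correspond to the restaurants in $R_{test}$ is the set of test reviews $B_{test}$ used below.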
* Note every restaurant in $R_{train}$ and $R_{test}$ has User's Restaurant Rating = User's Review Rating.
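
The build steps can be sketched for a single user as follows. This is an illustration under stated assumptions: reviews are represented as hypothetical `(restaurant_id, stars)` pairs, the function name `build_split` is invented, and `b_test` is assumed to be the user's reviews of the restaurants in $R_{test}$. Gathering other users' reviews ($K_{train}$) is omitted because it needs the full review database.

```python
import random
from collections import Counter


def build_split(user_reviews, test_fraction=0.25, seed=0):
    """Split one user's reviews (the set B) into train and test parts.

    user_reviews: list of (restaurant_id, stars) pairs for a single user.
    """
    counts = Counter(rid for rid, _ in user_reviews)
    # R: restaurants this user reviewed exactly once.
    once = sorted(rid for rid, n in counts.items() if n == 1)
    rng = random.Random(seed)
    rng.shuffle(once)
    n_test = int(round(len(once) * test_fraction))
    r_test = set(once[:n_test])
    r_train = set(once[n_test:])
    b_train = [rev for rev in user_reviews if rev[0] in r_train]
    b_test = [rev for rev in user_reviews if rev[0] in r_test]
    return r_train, r_test, b_train, b_test


# Toy data: "r5" is reviewed twice, so it is excluded from R.
reviews = [("r1", 5), ("r2", 3), ("r3", 4), ("r4", 2), ("r5", 5), ("r5", 1)]
r_train, r_test, b_train, b_test = build_split(reviews)
```

Fixing the seed keeps the split reproducible; a cross-validated setup would repeat the split over several folds instead.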

**Run:**
1. For each review in $B_{test}$, create the tuple (User Review, Restaurant Rating, Restaurant ID) and replace the instance in $B_{test}$ with the tuple.

2. For each restaurant in $R_{train}$, find the total set of reviews from the Reviews database. Let this set be $Y$, where each element in $Y$ is a tuple (User Review, Restaurant Rating, Restaurant ID).

3. We run each of the algorithms above, using $Y$ as the training set and $B_{test}$ as the test set.
4. Step 3 results in a set $B_{result}$ where each element is characterized by (User Review, Actual Restaurant Rating, Predicted Restaurant Rating, Predicted Restaurant). Note that the cardinality of $B_{result}$ is the same as that of $B_{test}$.
5. Set $y_{pred} =$ I(Predicted Restaurant) and $y_{actual} =$ I(Restaurant), where the indicator function I() is 1 if the user rated the restaurant at least a 4 and 0 if the user rated the restaurant less than a 4.
6. We calculate the following scores for each model/feature combination:
    1. Log-Loss Score:
    $$-\frac{1}{N}\sum_{i=1}^{N} \left[\text{Good}_{i}\log{P(\text{Good}_{i})} + (1 - \text{Good}_{i})\log{\left(1 - P(\text{Good}_{i})\right)}\right]$$

    2. Accuracy-Score:
    $$\frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left[\text{Predicted Rating}_{i} = \text{Actual Rating}_{i}\right]$$

    3. Precision-Score:
    $$\frac{\text{True Positive}}{\text{True Positive + False Positive}}$$

Where $N$ is the number of recommended restaurants in $B_{result}$, $\text{Good}_{i}$ is 1 if the user actually rated restaurant $i$ as good and 0 otherwise, $P(\text{Good}_{i})$ is the predicted probability that restaurant $i$ is good, and $\mathbf{1}[\cdot]$ is 1 when its condition holds and 0 otherwise. Predicted rating is the predicted (binary) restaurant rating and actual rating is the (binary) rating that the user gave to the restaurant. A precision score of 1 means that every restaurant the system predicted as good was actually rated good by the user. In this case, success means that the recommendation system was able to accurately predict that a user will like a restaurant.
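
For reference, the standard definitions of these three scores can be computed as in the sketch below. The toy labels and probabilities are made up for illustration, and the 0.5 threshold used to turn probabilities into predictions is an assumption, not something stated in the procedure.

```python
import math


def log_loss(y_true, p_good, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_good):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def accuracy(y_true, y_pred):
    """Share of predictions that match the actual binary rating."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """True positives divided by all predicted positives (0.0 if none predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


# Toy values (hypothetical): actual binary ratings and predicted P(Good).
y_actual = [1, 0, 1, 1]
p_good = [0.9, 0.2, 0.4, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in p_good]  # assumed 0.5 threshold

print(accuracy(y_actual, y_pred))   # 0.75
print(precision(y_actual, y_pred))  # 1.0
loss = log_loss(y_actual, p_good)
```

In this toy case precision is 1.0 even though accuracy is only 0.75, because the one mistake is a good restaurant that was not recommended rather than a bad one that was.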

# Sample Recommendation and Application

We want to create a list of recommendations according to the following schema:  

|            | _Same Tone_             | _Diff Tone_           |
|------------|-----------------------|----------------------|
| **_Same Word Choice_** | Yes                   | ML Result(Tone, Word Choice) |
| **_Diff Word Choice_** | ML Result(Tone , Word Choice)  | No                   |

After running the above algorithms and choosing the algorithm that best classifies using our testing process, we'll have the following list of classification results:

```python
# Placeholder: fill with the results from the best-performing ML algorithm.
ml_result = []
```
With each element in the list containing the following tuple:

(User Review, User Rating, Classified Restaurant Rating, Classified Restaurant)

We create the following lists from our ML results:

```python
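# Keep results the user rated good (>= 4) whose classified rating is within 1 of it.
good_results = [tup for tup in ml_result if abs(tup[1] - tup[2]) <= 1 and tup[1] >= 4]
# Collect the classified restaurant from each good result.
restaurants = [tup[3] for tup in good_results]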
```
We run the following process to populate a top 5 recommendation list:

```python
from collections import Counter

# Count how often each restaurant appears among the good results.
restaurant_counter = Counter(restaurants)
# most_common(5) returns at most 5 entries and does not raise when fewer exist.
recommendation_list = restaurant_counter.most_common(5)
The elements in `recommendation_list` are (restaurant, count) pairs for the most commonly classified restaurants, based on the user's reviews, in descending order of count. So the first element in `recommendation_list` will be the top recommendation, the second will be the second recommendation, etc.