# Using Yelp Data to Recommend the Location of the Next Lou Malnati's: Preprocessing, Training, and Modeling
Chicago pizza magnate Lou Malnati is looking to expand his national pizza empire. With 59 locations in Illinois, seven in Arizona, four in Wisconsin, and four in Indiana, Malnati is interested in potentially expanding both within Arizona and Indiana and into other states, in particular Florida, Pennsylvania, New Jersey, and Missouri.

Malnati's restaurants are known for their deep dish pizza, and the company is looking for locations that either lack deep dish options or where the existing pizza options are not satisfying consumers. Malnati's team believes that its nationally recognized pizza brand can both introduce deep dish to new customers and lure currently unsatisfied ones.

Malnati's team has requested an analysis of the existing landscape in the four new states, along with Arizona and Indiana. They want to understand which state holds the most promise for one or more new locations. Ideally, they would like to open multiple locations, and they want to know whether one of the new states would be a better option than continuing to open restaurants in Arizona and Indiana.

**The purpose of this notebook is to pre-process the data, split it into training and test sets, and build and evaluate several models that predict the star rating from the review text.**

## Data Sources
All data has been downloaded directly from [Yelp](https://www.yelp.com/dataset):

1. yelp_academic_dataset_business.json: contains business data including location data, attributes, and categories
2. yelp_academic_dataset_review.json: contains full review text data including the user_id that wrote the review and the business_id the review is written for.

The data was loaded and read into pandas dataframes in the 1-ridgway-read-data notebook. The dataframes were filtered to keep only businesses with "pizza" in their categories and then pickled. The pickled datasets were then cleaned and merged, the text feature was prepared (e.g., tokenization, lemmatization), and the result was pickled once again:

- processed.pkl: pickled dataframe containing reviews and select business information for pizza businesses in select states
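The text preparation above happened in an earlier step and is not shown in this notebook. As a rough, dependency-free sketch, assuming the cleaning lowercased the text, deleted punctuation outright, and dropped stopwords, the function below reproduces the start of the `text_clean` values in the preview further down (for example, "I'm not sure why I haven't written a review" becomes `[im, sure, havent, written, review, ...]`). The `STOPWORDS` set and the `clean_tokens` name are hypothetical, and lemmatization (for instance with NLTK's `WordNetLemmatizer`) is deliberately left out.

```python
import string

# Hypothetical, abbreviated stopword set for illustration only; the stopword
# list used by the original pipeline is not shown in this notebook.
STOPWORDS = {"i", "not", "why", "a", "an", "the", "my", "very", "for", "here", "is"}

# Translation table that deletes ASCII punctuation without inserting a space,
# so "restaurant...however" collapses to "restauranthowever".
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_tokens(text: str) -> list[str]:
    """Lowercase, delete punctuation, split on whitespace, drop stopwords."""
    no_punct = text.lower().translate(_STRIP_PUNCT)
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


print(clean_tokens("I'm not sure why I haven't written a review"))
# ['im', 'sure', 'havent', 'written', 'review']
```

Note that deleting "not" as a stopword also removes negation from the text, which matters for any sentiment scoring done on the cleaned columns.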

## Changes
- 03-21-22: Started pre-processing

## Summary of Pre-Processing and Modeling

TBD

## Import Libraries

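```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import nltk
# nltk.download('vader_lexicon')  # only needed for nltk.sentiment.vader;
# the vaderSentiment package imported below ships with its own lexicon
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
```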

## File Locations

```python
# Path to the pickled, preprocessed dataframe (processed.pkl)
processed_df = '../data/processed/processed.pkl'
```

## Load Data

```python
df = pd.read_pickle(processed_df)
```

```python
df.T
```

| | 2527 | 3471 | 3473 | 3480 | 3482 | 3488 | 3490 | 3491 | 3492 | 3493 | ... | 343881 | 344202 | 344465 | 345262 | 345306 | 345311 | 345314 | 345321 | 345353 | 345356 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| review_id | R2oXPTjy1esbV7ZXIPoxBw | X7TNHS4htMTWSQeBp8LLbw | RLWl7Jtw2PVEv7Y0AlF31A | QXxfgfBfJYX8qJMXSISBkQ | 0jiyBbd60YQ7TWa8ISt0_A | 3CqRIOcmzPW13H4gH9INNg | Qe7jB_re-Mxv91utLThn2Q | MQ-_OfltcKLfbfQ5yNADwg | P4jT8lta5I8FwJ_kMaPS-A | yhFWd574BJZeIZim3JrAWw | ... | 1wrZo1Rx6Fq0fHGyPEdK3A | O3PFUw9FJyxhueOmNii37g | IGAXBf3Ku95tp7Ncl-0ppw | VxwUXPrHgFkd5F0NoR9GwA | BjV1u157zRahviPAr79Bsw | K62Blo3Zy2XmgyOutpAZ1A | EJIW9mEsShBR2QAq2WjvnA | vHP5I1Vh7xPLOZ91Audbyg | 8849mypb4iGC4i1dwYbATQ | yVG6NpXYpSLAiqk4IJym3w |
| business_id | cg4JFJcCxRTTMmcg9O9KtA | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | MaYb7qMN6BomP1zQGj3Wjg | ... | cFSyJluKa2SHtgMMvlx6SQ | cFSyJluKa2SHtgMMvlx6SQ | zODRL6SF5Re-8ZRHWsTraQ | Tk9KD_DDpcMeceID_VrutQ | Tk9KD_DDpcMeceID_VrutQ | Tk9KD_DDpcMeceID_VrutQ | Tk9KD_DDpcMeceID_VrutQ | PjjAdxAAaZ_sb1-CN6PXwA | PjjAdxAAaZ_sb1-CN6PXwA | PjjAdxAAaZ_sb1-CN6PXwA |
| user_id | MRZ0kv1a5MsaC19fCdHo-Q | SJvCPlJ5X6Jkbc0TggIiCg | bcHgPhn8sgmgMR-IkmL0Jw | byPpmPDen6VNPNtVh_iKhg | p9rYQFDt1279vWbPO9SqOA | 7fDqaGdUMccXQ4bnPwR6yg | j3vaxZvhFyaCtwdTE_8aug | Ohhrhu1RkqfVciIVx_W5HQ | ETYqQCCB1TkiWP_jsRUSrg | 9iQGmS4GiTQQT6w3B3iHFQ | ... | e1c5D6AIgzvjmEPatPWNZw | 3GXPuSZNRB7l-EB7LU4IDg | rI2T6J1_epO6y6JnaqjYmg | HfGKgfULfqWkfPvB_3299Q | Vjb8dsOwTXQU-4bdmNzfbw | V8GNpkR61qHtGl8DLjaaHA | 5LtOwl4pMb6d9ozE-XULoA | 6dccLKOjy6OE8MKkgwXk4g | YD2JKvHuOnyhKGBKZ-FwAQ | SIpwJauwHleShzfbw5NDEg |
| stars | 3.0 | 5.0 | 4.0 | 4.0 | 5.0 | 5.0 | 3.0 | 4.0 | 2.0 | 4.0 | ... | 5.0 | 2.0 | 3.0 | 5.0 | 4.0 | 4.0 | 3.0 | 5.0 | 5.0 | 5.0 |
| text | Very nice Italian restaurant...however,\n\nMy ... | Today after our Ethiopian food fiasco (see Sel... | Delicious! \n\nI wish they took reservations, ... | I'm not sure why I haven't written a review fo... | The deep dish pizza here is fantastic!! Like s... | Question: What do you do when you're only in ... | Thought this place was just ok. The sauce was ... | First off, I am IN LOVE with the "artwork" han... | The service was the most outstanding part of o... | I lived in Chicago for 5 years, which automati... | ... | Service- super genuine and friendly, prompt.\n... | Where do I start!\n\nWe drove by and saw all t... | Mr. P's is a prototypical pizza joint in a str... | I've only tried the smaller pizzas but they ar... | YUM!! Very tasty sauce and very delicious cru... | My teen son told me about Circles and Squares ... | I was soooo excited to try Pizza Plus. After d... | 5 stars!!\n\nWhoa! I didn't even know what I w... | OMG Was in the mood for pizza. Saw this on Doo... | This place is definitely a hidden gem! My uncl... |
| date | 2021-08-28 15:21:11 | 2011-01-09 05:34:28 | 2009-10-13 20:17:39 | 2014-05-25 20:31:02 | 2016-03-07 02:23:31 | 2011-06-29 14:15:13 | 2015-08-08 00:55:35 | 2009-10-13 19:46:34 | 2014-02-07 04:17:35 | 2008-12-13 21:50:23 | ... | 2019-03-23 18:33:59 | 2020-10-30 19:50:02 | 2015-01-26 16:23:24 | 2020-03-18 18:34:19 | 2020-09-08 22:11:12 | 2021-03-09 00:16:40 | 2021-02-04 01:19:00 | 2020-01-09 21:06:37 | 2020-04-04 22:10:06 | 2021-07-22 19:18:18 |
| binary_rating | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | ... | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| state | FL | MO | MO | MO | MO | MO | MO | MO | MO | MO | ... | PA | PA | PA | PA | PA | PA | PA | IN | IN | IN |
| is_open | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| latitude | 27.936236 | 38.655013 | 38.655013 | 38.655013 | 38.655013 | 38.655013 | 38.655013 | 38.655013 | 38.655013 | 38.655013 | ... | 39.940652 | 39.940652 | 40.044187 | 39.926455 | 39.926455 | 39.926455 | 39.926455 | 39.927925 | 39.927925 | 39.927925 |
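In the preview, every 4- and 5-star review has `binary_rating` 1.0 and every 2- and 3-star review has 0.0, which suggests the label was derived with a cut-off at 4 stars. Assuming that rule, the sketch below checks it against the 20 star/label pairs visible in the `df.T` output; the `preview` frame is a hand-copied toy, not the loaded data.

```python
import pandas as pd

# Toy frame: the 20 star values and binary labels visible in the df.T preview.
stars = [3.0, 5.0, 4.0, 4.0, 5.0, 5.0, 3.0, 4.0, 2.0, 4.0,
         5.0, 2.0, 3.0, 5.0, 4.0, 4.0, 3.0, 5.0, 5.0, 5.0]
binary = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
          1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
preview = pd.DataFrame({"stars": stars, "binary_rating": binary})

# Assumed labeling rule: 4 or 5 stars -> positive (1.0), otherwise 0.0.
derived = (preview["stars"] >= 4).astype(float)

# True if the assumed rule reproduces every label in the preview.
rule_matches = bool((derived == preview["binary_rating"]).all())
print(rule_matches)  # True

# Class balance (14 positive, 6 negative here) matters for splitting and evaluation.
print(preview["binary_rating"].value_counts())
```

On the loaded data, the same check is `((df['stars'] >= 4).astype(float) == df['binary_rating']).all()`.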


## Split into Testing and Training Datasets

https://blog.devgenius.io/training-an-ml-model-for-sentiment-analysis-in-python-63b6b8c68792
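The split itself is not implemented yet in this notebook. A minimal sketch, assuming scikit-learn is available and that `binary_rating` is the target: the toy frame reuses the 20 star values from the `df.T` preview with placeholder text, and `test_size=0.25` and `random_state=42` are arbitrary choices. Stratifying on the label keeps the share of positive reviews (14 of 20 in the preview) roughly the same in both sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame: star values copied from the df.T preview,
# review text replaced by placeholders.
stars = [3, 5, 4, 4, 5, 5, 3, 4, 2, 4, 5, 2, 3, 5, 4, 4, 3, 5, 5, 5]
toy = pd.DataFrame({
    "text": [f"placeholder review {i}" for i in range(len(stars))],
    "stars": [float(s) for s in stars],
})
# Assumption: 4- and 5-star reviews are the positive class.
toy["binary_rating"] = (toy["stars"] >= 4).astype(float)

X_train, X_test, y_train, y_test = train_test_split(
    toy["text"],
    toy["binary_rating"],
    test_size=0.25,                 # 5 of the 20 rows go to the test set
    random_state=42,                # reproducible shuffle
    stratify=toy["binary_rating"],  # preserve the class ratio in both sets
)

print(len(X_train), len(X_test))  # 15 5
print(y_train.value_counts())
print(y_test.value_counts())
```

On the real data, `toy` would be replaced by `df`, with a string column such as `text_for_cloud_lemmatized` (rather than the list-valued token columns) as the feature passed to a vectorizer.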

## Sentiment Analysis
Adapted from https://github.com/nhcamp/Yelp-Burrito-Reviews/blob/master/Capstone%202.ipynb

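```python
# Create the analyzer once; constructing a SentimentIntensityAnalyzer for
# every row reloads the VADER lexicon for each review.
analyzer = SentimentIntensityAnalyzer()


def apply_sentiment_intensity_analysis(sentence):
    """Return VADER polarity scores for one document as a dict with
    'neg', 'neu', 'pos', and 'compound' keys. Used with Series.apply().
    """
    return analyzer.polarity_scores(sentence)


# Note: VADER uses capitalization, punctuation, and words such as "not";
# these are already stripped from text_for_cloud_lemmatized (e.g., "not" is
# missing from review 3480), so scoring the raw `text` column may be more faithful.
df['polarity_score'] = df['text_for_cloud_lemmatized'].apply(apply_sentiment_intensity_analysis)
```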

```python
df.head()
```

| | review_id | business_id | user_id | stars | text | date | binary_rating | state | is_open | latitude | longitude | city_state | year | text_length | deep_dish | text_clean | text_stemmed | text_lemmatized | text_for_cloud_lemmatized | polarity_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2527 | R2oXPTjy1esbV7ZXIPoxBw | cg4JFJcCxRTTMmcg9O9KtA | MRZ0kv1a5MsaC19fCdHo-Q | 3.0 | Very nice Italian restaurant...however,\n\nMy ... | 2021-08-28 15:21:11 | 0.0 | FL | 1 | 27.936236 | -82.482862 | Tampa, FL | 2021 | 2309 | Yes | [nice, italian, restauranthowever, previous, v... | [nice, italian, restauranthowev, previou, visi... | [nice, italian, restauranthowever, previous, v... | nice italian restauranthowever previous visit ... | {'neg': 0.102, 'neu': 0.667, 'pos': 0.231, 'co... |
| 3471 | X7TNHS4htMTWSQeBp8LLbw | MaYb7qMN6BomP1zQGj3Wjg | SJvCPlJ5X6Jkbc0TggIiCg | 5.0 | Today after our Ethiopian food fiasco (see Sel... | 2011-01-09 05:34:28 | 1.0 | MO | 1 | 38.655013 | -90.297761 | Saint Louis, MO | 2011 | 559 | Yes | [today, ethiopian, food, fiasco, see, selam, r... | [today, ethiopian, food, fiasco, see, selam, r... | [today, ethiopian, food, fiasco, see, selam, r... | today ethiopian food fiasco see selam review l... | {'neg': 0.046, 'neu': 0.776, 'pos': 0.179, 'co... |
| 3473 | RLWl7Jtw2PVEv7Y0AlF31A | MaYb7qMN6BomP1zQGj3Wjg | bcHgPhn8sgmgMR-IkmL0Jw | 4.0 | Delicious! \n\nI wish they took reservations, ... | 2009-10-13 20:17:39 | 1.0 | MO | 1 | 38.655013 | -90.297761 | Saint Louis, MO | 2009 | 896 | Yes | [delicious, wish, took, reservations, friend, ... | [delici, wish, took, reserv, friend, fianc, go... | [delicious, wish, took, reservation, friend, f... | delicious wish took reservation friend fiance ... | {'neg': 0.056, 'neu': 0.504, 'pos': 0.44, 'com... |
| 3480 | QXxfgfBfJYX8qJMXSISBkQ | MaYb7qMN6BomP1zQGj3Wjg | byPpmPDen6VNPNtVh_iKhg | 4.0 | I'm not sure why I haven't written a review fo... | 2014-05-25 20:31:02 | 1.0 | MO | 1 | 38.655013 | -90.297761 | Saint Louis, MO | 2014 | 828 | Yes | [im, sure, havent, written, review, pi, times,... | [im, sure, havent, written, review, pi, time, ... | [im, sure, havent, written, review, pi, time, ... | im sure havent written review pi time time too... | {'neg': 0.076, 'neu': 0.593, 'pos': 0.331, 'co... |
| 3482 | 0jiyBbd60YQ7TWa8ISt0_A | MaYb7qMN6BomP1zQGj3Wjg | p9rYQFDt1279vWbPO9SqOA | 5.0 | The deep dish pizza here is fantastic!! Like s... | 2016-03-07 02:23:31 | 1.0 | MO | 1 | 38.655013 | -90.297761 | Saint Louis, MO | 2016 | 1067 | Yes | [deep, dish, pizza, fantastic, like, reviewers... | [deep, dish, pizza, fantast, like, review, men... | [deep, dish, pizza, fantastic, like, reviewer,... | deep dish pizza fantastic like reviewer mentio... | {'neg': 0.022, 'neu': 0.719, 'pos': 0.26, 'com... |
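The `polarity_score` column holds one dict per review, which is awkward to model on or plot. A minimal sketch, using invented scores in the shape VADER's `polarity_scores()` returns, of expanding the dicts into columns and turning `compound` into a label with the ±0.05 cut-offs suggested in the vaderSentiment README. The `label_from_compound` helper and the toy `binary_rating` values are hypothetical.

```python
import pandas as pd

# Hypothetical polarity dicts (values invented for illustration).
scores = pd.Series(
    [
        {"neg": 0.05, "neu": 0.70, "pos": 0.25, "compound": 0.85},
        {"neg": 0.00, "neu": 1.00, "pos": 0.00, "compound": 0.0},
        {"neg": 0.30, "neu": 0.60, "pos": 0.10, "compound": -0.40},
    ],
    index=[101, 102, 103],
    name="polarity_score",
)

# One column per dict key, aligned on the original row index.
expanded = pd.DataFrame(scores.tolist(), index=scores.index)


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to positive / neutral / negative."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


expanded["vader_label"] = expanded["compound"].apply(label_from_compound)

# Hypothetical star-derived labels for the same rows, to measure agreement.
binary_rating = pd.Series([1.0, 0.0, 0.0], index=scores.index)
vader_positive = (expanded["vader_label"] == "positive").astype(float)
agreement = float((vader_positive == binary_rating).mean())

print(expanded)
print(agreement)  # 1.0
```

On the real data, `pd.DataFrame(df['polarity_score'].tolist(), index=df.index)` gives the same expansion, and the agreement rate is one simple baseline to compare the trained models against.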
