### Yelp data via their API

This data was collected from the Yelp Fusion API. You can find more information, including how to collect the data yourself, in the [official documentation](https://www.yelp.com/developers/documentation/v3).

I followed [this great tutorial](https://python.gotrained.com/yelp-fusion-api-tutorial/). Good luck!

You will need to perform two different requests:

1) Pull the business information from Yelp (including each business id) that is needed to request reviews
2) Use the business ids from the first request to request the text reviews

In [1]:
#for API requests
import requests
import json

#for tabular data
import pandas as pd
import numpy as np

Please note: the cell that defines `api_key` and `headers` has been left out of this notebook to protect the Yelp API key used. The request cells below expect `headers` to be defined.

Obtaining your own API key is relatively simple; I recommend following the tutorial mentioned above.
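
For reference, the omitted cell only needs to define `api_key` and `headers`. Below is a minimal sketch, assuming the key is stored in an environment variable. The variable name `YELP_API_KEY` and the helper `build_headers` are illustrative and not part of the original notebook. The Fusion API expects the key as a bearer token in the `Authorization` header.

```python
import os


def build_headers(api_key):
    """Return the request headers for the Yelp Fusion API."""
    # The Fusion API authenticates with a bearer token.
    return {'Authorization': 'Bearer {}'.format(api_key)}


# YELP_API_KEY is an assumed variable name; set it in your shell first.
api_key = os.environ.get('YELP_API_KEY', '')
headers = build_headers(api_key)
```

Reading the key from the environment keeps it out of the notebook, so the notebook can be shared without editing.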

In [2]:
#prep our dataframe by removing non-needed columns
df = pd.read_csv('restaurant-scores-lives-standard.csv')
df = df.drop(columns=['business_id', 'business_phone_number', 'business_city', 'business_state', 'business_location', 'inspection_id'])

In [5]:
#drop null values
df.dropna(inplace=True)

In [7]:
#create a list of all unique businesses to use for the request
business_names = df.business_name.unique().tolist()

## Step 1: Create our businesses df with basic information (no text reviews yet)

This call will create the necessary information we need to pull in the reviews.
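
Because the loop that follows sends one request per business, it can run into the request limit. One way to soften short bursts is to retry after a pause. This is only a sketch, assuming the API signals rate limiting with HTTP status 429. The helper name `get_with_retry`, the retry count and the wait time are illustrative choices, not part of the original notebook.

```python
import time


def get_with_retry(get, url, retries=3, wait=1.0, sleep=time.sleep, **kwargs):
    """Call get(url, **kwargs), retrying while the response status is 429.

    `get` is any callable with the signature of requests.get.
    Returns the last response received.
    """
    response = get(url, **kwargs)
    for attempt in range(retries):
        if response.status_code != 429:
            break
        # Wait a little longer after each rate-limited attempt.
        sleep(wait * (attempt + 1))
        response = get(url, **kwargs)
    return response
```

It could be used as `req = get_with_retry(requests.get, url, params=params, headers=headers)`. A daily quota will not recover within a few seconds, so this mainly helps with per-second limits.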

In [25]:
url = 'https://api.yelp.com/v3/businesses/search'
columns=['name', 'id', 'rating', 'price']
#set up a first dud row
business_df = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan]], columns=columns)

for name in business_names:
    #get our parameters for each value
    location = df.loc[df.business_name == name]['business_address'].reset_index()['business_address'][0]
    latitude = df.loc[df.business_name == name]['business_latitude'].reset_index()['business_latitude'][0]
    longitude = df.loc[df.business_name == name]['business_longitude'].reset_index()['business_longitude'][0]
    #set up our parameters and send the request
    params = {'term' : name, 'location' : location, 'latitude' : latitude, 'longitude' : longitude}
    req = requests.get(url, params=params, headers=headers)
    #we will add each new entry to temp, then add temp to our running df
    temp = []
    #any status code other than 200 (e.g. when you're over your request limit) leaves temp empty
    if (req.status_code == 200):
        parsed = json.loads(req.text)
        businesses = parsed['businesses']
        #append each value to our 'temp' list
        try:
            temp.append(businesses[0]['name'])
        except:
            temp.append(np.nan)
        try:
            temp.append(businesses[0]['id'])
        except:
            temp.append(np.nan)
        try:
            temp.append(businesses[0]['rating'])
        except:
            temp.append(np.nan)
        try:
            temp.append(businesses[0]['price'])
        except:
            temp.append(np.nan)
    
    #create a temp_df w/one record, then append it to the bottom of our running df
    try:
        temp_df = pd.DataFrame([temp], columns=columns)
    except:
        temp = [np.nan, np.nan, np.nan, np.nan]
        temp_df = pd.DataFrame([temp], columns=columns)
        
    business_df = pd.concat([business_df, temp_df])
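
The four `try`/`except` blocks above all do the same thing: take a field from the first search hit, or fall back to NaN. A possible condensed version is sketched below. The function name `first_match` is hypothetical, and `math.nan` stands in for `np.nan` so the sketch runs without third-party imports.

```python
import math

COLUMNS = ['name', 'id', 'rating', 'price']


def first_match(parsed, columns=COLUMNS):
    """Return the fields of the first search hit, NaN where missing."""
    businesses = parsed.get('businesses') or []
    first = businesses[0] if businesses else {}
    # A missing key (e.g. no 'price') or an empty result list yields NaN.
    return [first.get(column, math.nan) for column in columns]
```

Inside the loop this would replace the field-by-field appends with `temp = first_match(parsed)`.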

In [26]:
business_df.head(10)

Unnamed: 0,name,id,rating,price
0,,,,
0,Rainbow Grocery,5NvXIkNdCCqUb235WVfMJg,4.0,$$
0,Parada 22,TlBFKt2N2eSEBpN-UZmDBw,4.0,$$
0,Newtree,thrAX79eegx1Of82TCJhrA,4.0,$$
0,Starbucks,C36BK5luxi-8apVMMhsizQ,3.5,$
0,Dojima-Ann,cseyjQ0XIp6dwC0_TcaMOg,3.5,$$
0,Piccino,i2VhtC1JkV_sZOA4urd1ng,4.0,$$
0,Eric's Restaurant,Ux_bs6eZ7WqIsLepTw1uBw,3.5,$$
0,Taiwan Restaurant,eu3UCrfFkTF73F0idXeJ5Q,3.5,$$
0,Java Beach at the Zoo,iRmdKzcbdFLIp3s9e4xrHA,4.0,$


This looks good. Let's save it for our next request calls.

In [4]:
#don't run again!
business_df.to_csv('businesses.csv', index=False)


## Step 2: Getting the text reviews and average sentiment scores

In [5]:
#load it
business_df = pd.read_csv('businesses.csv')

In [6]:
#drop all of our bad requests from before
business_df.dropna(how='all', inplace=True)

In [11]:
#NLP package for our sentiment analysis
from textblob import TextBlob
#each business id is what the api needs to return text reviews
business_ids = business_df['id'].tolist()

Each request returns up to 3 reviews per business (fewer if a business has fewer). We average the star ratings and the sentiment polarity of those reviews and add the two averages as new features.

In [12]:
#lists to append to our business_df
average_rating = []
average_sentiment = []

review_ids = []
review_text = []

#now time to get some reviews!
for id in business_ids:
    #lists to average and append
    ratings = []
    sentiment = []
    
    #make request
    url = 'https://api.yelp.com/v3/businesses/{}/reviews'.format(id)
    req = requests.get(url, headers=headers)
    
    #check if our request is good before parsing, then loop through reviews
    if (req.status_code == 200):
        reviews = json.loads(req.text).get("reviews", [])
        for review in reviews:
            #append ratings and reviews
            ratings.append(review['rating'])
            
            #each individual review
            testimonial = TextBlob(review['text'])
            #sentiment of individual review
            sentiment.append(testimonial.sentiment.polarity)
            
            #we'll create a 3rd dataset to store each review
            review_ids.append(review['id'])
            review_text.append(review['text'])
            
    #add our averages to our lists (NaN if no reviews came back, to avoid dividing by zero)
    average_rating.append(sum(ratings) / len(ratings) if ratings else np.nan)
    average_sentiment.append(sum(sentiment) / len(sentiment) if sentiment else np.nan)

In [13]:
business_df['review_rating'] = average_rating
business_df['review_sentiment'] = average_sentiment
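
The averaging step can also be factored into a small helper, which makes the empty-review case explicit. This is a sketch only: the names `mean_or_nan` and `summarize_reviews` are hypothetical, and the polarity function is passed in as an argument so the example runs without TextBlob installed.

```python
import math


def mean_or_nan(values):
    """Average a list of numbers, returning NaN for an empty list."""
    return sum(values) / len(values) if values else math.nan


def summarize_reviews(reviews, polarity):
    """Return (average rating, average sentiment) for a list of reviews.

    `polarity` is any callable that maps review text to a float.
    """
    ratings = [review['rating'] for review in reviews]
    sentiment = [polarity(review['text']) for review in reviews]
    return mean_or_nan(ratings), mean_or_nan(sentiment)
```

With TextBlob, the call would look like `summarize_reviews(reviews, lambda text: TextBlob(text).sentiment.polarity)`.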

This will be our df to merge and use with our models.

In [14]:
business_df.to_csv('yelp_business.csv', index=False)

In [15]:
business_df.head()

Unnamed: 0,name,id,rating,price,review_rating,review_sentiment
1,Rainbow Grocery,5NvXIkNdCCqUb235WVfMJg,4.0,$$,3.666667,0.080208
2,Parada 22,TlBFKt2N2eSEBpN-UZmDBw,4.0,$$,3.333333,0.106349
3,Newtree,thrAX79eegx1Of82TCJhrA,4.0,$$,4.666667,0.355787
4,Starbucks,C36BK5luxi-8apVMMhsizQ,3.5,$,3.0,0.27031
5,Dojima-Ann,cseyjQ0XIp6dwC0_TcaMOg,3.5,$$,3.666667,0.021528


## Step 3: Store each individual text review

Note: this step is not necessary, but it is recommended so you can spot check your requests for accuracy.

In [3]:
business_df = pd.read_csv('yelp_business.csv')

In [17]:
columns= ['name', 'id', 'text']

review_df = pd.DataFrame([[np.nan, np.nan, np.nan]], columns=columns)
for i in range(len(business_df['id'])):
    id = business_df['id'][i]
    name = business_df['name'][i]
    
    url = 'https://api.yelp.com/v3/businesses/{}/reviews'.format(id)
    req = requests.get(url, headers=headers)
    
    #only parse the response when the request succeeded
    if (req.status_code == 200):
        reviews = json.loads(req.text).get("reviews", [])
        for review in reviews:
            #one row per review: business name, business id, review text
            temp_df = pd.DataFrame([[name, id, review['text']]], columns=columns)
            review_df = pd.concat([review_df, temp_df])
            

In [18]:
review_df.head()

Unnamed: 0,name,id,text
0,,,
0,Rainbow Grocery,5NvXIkNdCCqUb235WVfMJg,Love love this co-op grocery store! My mom bou...
0,Rainbow Grocery,5NvXIkNdCCqUb235WVfMJg,The produce and selection of products they hav...
0,Rainbow Grocery,5NvXIkNdCCqUb235WVfMJg,This Grocery Store is Vegan and Gluten Free He...
0,Parada 22,TlBFKt2N2eSEBpN-UZmDBw,#pastelon\nSweet #plantains layered dpicadillo...


Save these as a new csv.

In [19]:
review_df.to_csv('reviews.csv', index=False)
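
As an alternative to concatenating one-row dataframes, the rows can be collected in a plain list and written out once, which also avoids the placeholder NaN row. The sketch below uses the standard library `csv` module instead of pandas, and the helper names `review_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io

FIELDS = ['name', 'id', 'text']


def review_rows(name, business_id, reviews):
    """Build one row (a dict) per review for a single business."""
    return [{'name': name, 'id': business_id, 'text': review['text']}
            for review in reviews]


def rows_to_csv(rows, fields=FIELDS):
    """Serialize the collected rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The same list of dicts can be turned into a dataframe in one step with `pd.DataFrame(rows, columns=FIELDS)`.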

In [None]:
#review_df is the only review dataframe defined above
review_df.to_csv('text_reviews.csv', index=False)