# Yelp API - Lab



## Introduction 

Now that you've seen how the Yelp API works, it's time to put those API and Pandas skills to work on some basic business analysis! Taking things a step further, you'll also independently explore how to perform pagination in order to retrieve a full result set from the Yelp API!

## Objectives

You will be able to:
* Create HTTP requests to get data from the Yelp API
* Parse HTTP responses and save the information in a CSV file
* Perform pagination to retrieve troves of data!
* Write Pandas code to answer questions about your data 

## Problem Introduction

For this lab you will analyze the Yelp data for a group of businesses to learn more about an industry. You will choose a type of business (Italian restaurants, nail salons, CrossFit gyms) and a location to analyze. Then you will get data from the Yelp API, store that data in CSV files, and write Pandas code to answer questions about the data. 


### Process:

1. Read through the data questions and the API documentation to determine which pieces of information you need to pull from the Yelp API.

2. Plan out what tables you will need: one for the businesses and one for the reviews.

3. Create code to:
  - Perform a search of businesses using pagination
  - Parse the API response for specific data points
  - Save the data you pull as a csv

4. Use the functions above in a loop that will paginate over the results to retrieve all of the results. 

5. Create functions to:
  - Retrieve the reviews data of one business
  - Parse the reviews response for specific review data
  - Save the data you pull as a csv

6. Take all of the business IDs from your business search and, using the three Python functions you've created, run your business IDs through a loop to get the reviews for each business and save them in a CSV file.

7. Write Pandas code to answer the following questions about your data.


Bonus Steps:  
- Place your helper functions in a package so that your final notebook only has the major steps listed.
- Rewrite your business search functions to be able take an argument for the type of business you are searching for.
- Add another group of businesses to your files.


 
## Data Questions:

- Which are the 5 most reviewed businesses?
- What is the highest rating received in your data set and how many businesses have that rating?
- What percentage of businesses have a rating greater than or equal to 4.5?
- What percentage of businesses have a rating less than 3?
- What is the average rating of restaurants that have a price label of one dollar sign? Two dollar signs? Three dollar signs? 
- Return the text of the reviews for the most reviewed restaurant. 
- Return the name of the business with the most recent review. 
- Find the highest rated business and return the text of its most recent review. If multiple businesses have the same rating, select the one with the most reviews. 
- Find the lowest rated business and return the text of its most recent review. If multiple businesses have the same rating, select the one with the fewest reviews. 


## Part 1 - Understanding your data and question

Look at the questions and determine what data you will need to store in order to answer them. Start to think about what tables you will want to create and what columns you will have for those tables. 

Look at the API documentation, and determine what fields of the API response you will match up with the columns you want in your Pandas Dataframes. 


https://www.yelp.com/developers/documentation/v3/get_started
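
As a rough illustration of that mapping, the sketch below pairs each column name with a small function that pulls the matching field out of one business record. The `sample_business` dictionary is invented for illustration, and `review_count` is included on the assumption that you want it for the "most reviewed" questions; check the field names against the documentation before relying on them.

```python
# Hypothetical business record, shaped like one item of the 'businesses' list.
sample_business = {
    "id": "abc123",
    "name": "Example Curry House",
    "rating": 4.5,
    "price": "$$",
    "review_count": 120,
    "location": {"display_address": ["1 Example St", "New York, NY 10001"]},
}

# Column name -> function that extracts the value from one business record.
FIELD_MAP = {
    "Business ID": lambda b: b["id"],
    "Name": lambda b: b["name"],
    "Address": lambda b: ", ".join(b["location"]["display_address"]),
    "Rating": lambda b: b["rating"],
    # 'price' is not present for every business, so fall back to None.
    "Price": lambda b: b.get("price"),
    "Review Count": lambda b: b.get("review_count"),
}

# Build one row by applying every extractor to the record.
row = {column: extract(sample_business) for column, extract in FIELD_MAP.items()}
print(row)
```

Writing the mapping down first makes it easier to see which questions each column supports before any API calls are made.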

## Part 2 - Create ETL pipeline for the business data from the API

Now that you know what data you need from the API, you want to write code that will execute an API call, parse those results, and then save the results to a CSV file.  

It is helpful to break this up into three different functions (*API call, parse results, and save to CSV*) and then write a function or script that pulls those three functions together. 

Let's first do this for the Business endpoint.

In [196]:
import requests
import json
from keys import client_id, api_key
import pandas as pd
import os
import seaborn as sns
import numpy as np

In [148]:
# Write a function to make a call to the yelp API
url =  'https://api.yelp.com/v3/businesses/search'

headers = {
        'Authorization': 'Bearer {}'.format(api_key),
    }

term = 'Indian Restaurants'

location = 'New York New York'

radius = 8500

url_params = {
                # requests URL-encodes parameter values itself, so pass the
                # plain strings; a literal '+' would be sent as '%2B'
                "term": term,
                "location": location,
                "radius" : radius,
                "limit": 50,
            }

def yelp_call(url_params, api_key):
    url =  'https://api.yelp.com/v3/businesses/search'
    headers = {
        'Authorization': 'Bearer {}'.format(api_key),}
    response = requests.get(url, headers=headers, params=url_params)
    data = response.json()
    return data

In [149]:
# write a function to parse the API response 

def parse_results(results):
    parsed_results = []
    for business in results['businesses']:
        biz_list = [business['id'],
                    business['name'], 
            ', '.join(business['location']['display_address']),business['rating']]
        if 'price' in business.keys(): 
            biz_list.append(business['price'])
        else: 
            biz_list.append('n/a')
        parsed_results.append(biz_list)
    return parsed_results

In [None]:
# Write a function to take your parsed data and insert it into CSV
# import pandas as pd
# df = pd.DataFrame()

In [None]:
df.to_csv('yelp_lab.csv')

In [150]:
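def data_save(parsed_results, csv_filename):
    # Append the newly parsed results to the rows already stored in csv_filename.
    columns = ['Business ID','Name','Address','Rating','Price']
    # Read what has been saved so far (the first CSV column is the old index).
    existing = pd.read_csv(csv_filename, index_col = 0)
    df1 = pd.DataFrame(existing, columns=columns)
    df2 = pd.DataFrame(parsed_results, columns=columns)
    # ignore_index renumbers the rows so the saved index has no duplicates.
    df3 = pd.concat([df1, df2], ignore_index=True)
    # Write back to the file that was read, not a hard-coded name.
    df3.to_csv(csv_filename)
    # Saving after every page of 50 results means a crash partway through
    # does not lose the pages already pulled.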

In [None]:
data_save(parse_results(data),'yelp_lab.csv')

In [154]:
# create a new file, or reset an existing one, so it holds only the header row
def make_or_clear_csv2(csv_filename):
    headers = pd.DataFrame(columns = ['Business ID','Name','Address','Rating','Price'])
    # to_csv opens the file in write mode by default, which overwrites any
    # existing contents, so no separate truncate step is needed
    headers.to_csv(csv_filename)

csv_filename = 'yelp_lab.csv'
make_or_clear_csv2(csv_filename)

In [155]:
# Write a script that combines the three functions above into a single process.
empty_df = pd.DataFrame()
empty_df.to_csv('yelp_lab.csv')

# create a variable  to keep track of which result you are in. 
cur = 0
csv_filename = 'yelp_lab.csv'

#set up a while loop to go through and grab the result 
while cur < 650:
    #set the offset parameter to be where you currently are in the results 
    url_params['offset'] = cur
    #make your API call with the new offset number
    results = yelp_call(url_params, api_key)
  #########  
    #after you get your results you can now use your function to parse those results
    parsed_results = parse_results(results)
    
    # use your function to save your parsed results to the CSV
    data_save(parsed_results, csv_filename)
    #increment the counter by 50 to move on to the next results
    cur += 50

## Part 4 -  Create ETL pipeline for the restaurant review data from the API

You've done this for the Businesses, now you need to do this for reviews. You will follow the same process, but your functions will be specific to reviews.

In [None]:
# file = open('yelp_lab.csv')
# columns = ['Business ID','Name','Address','Rating','Price',]
# business_data = pd.read_csv(file)
# business_df = pd.DataFrame(business_data,columns=columns)

# business_id = business_df['Business ID']
# business_id.values #list of business IDs

In [156]:
# write Pandas code to pull back all of the business ids 
# you will need these ids to pull back the reviews for each restaurant
def get_ids(csv_filename):
    '''
    Return the business IDs stored in the given CSV file
    '''
    columns = ['Business ID','Name','Address','Rating','Price']
    # read from the file passed in rather than a hard-coded filename
    business_data = pd.read_csv(csv_filename)
    business_df = pd.DataFrame(business_data,columns=columns)
    return business_df['Business ID'].values


In [157]:
def get_reviews(bus_id,api_key):
    '''
    Return the parsed JSON response (a dictionary) of reviews for one business
    '''
    # the business ID is part of the URL path, so no query parameters are needed
    url = 'https://api.yelp.com/v3/businesses/{}/reviews'.format(bus_id)
    headers = {
        'Authorization': 'Bearer {}'.format(api_key),
    }
    results = requests.get(url, headers = headers)
    response = results.json()
    return response

In [30]:
testing2 = get_reviews('sCC7-hSdCkNPExejZT9BAQ',api_key)
testing2

{'reviews': [{'id': 'NmHTQPQcab0KXKj97dAVJA',
   'url': 'https://www.yelp.com/biz/the-masalawala-new-york-2?adjust_creative=Y8PxTsoIhnMmiPRmXB-Kqg&hrid=NmHTQPQcab0KXKj97dAVJA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=Y8PxTsoIhnMmiPRmXB-Kqg',
   'text': "Quite possibly the best Indian food I've had in America. We enjoyed pretty much everything we ordered. I highly recommend the chicken biryani and the cheese...",
   'rating': 5,
   'time_created': '2019-12-24 19:12:46',
   'user': {'id': 'v-b1fKvclW_k9Qf0l_hi6Q',
    'profile_url': 'https://www.yelp.com/user_details?userid=v-b1fKvclW_k9Qf0l_hi6Q',
    'image_url': 'https://s3-media2.fl.yelpcdn.com/photo/9JDx3NMwcGDY4L_XrUlUQQ/o.jpg',
    'name': 'Surashree K.'}},
  {'id': 'hReNY3wgAFqyZ2KF04JIlw',
   'url': 'https://www.yelp.com/biz/the-masalawala-new-york-2?adjust_creative=Y8PxTsoIhnMmiPRmXB-Kqg&hrid=hReNY3wgAFqyZ2KF04JIlw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=Y8PxTsoIhnMmiP

In [None]:
testing = get_reviews('LaMbEoM5Vw9CQP5usaD2xA',api_key)

In [21]:
testing

NameError: name 'testing' is not defined

In [None]:
testing['reviews']

In [None]:
review_parse(testing)

In [158]:
# Write a function to parse out the relevant information from the reviews
def review_parse(result):
    list_parsed_reviews = []
    total = result['total']
    for review in result['reviews']:
        list_parsed_reviews_temp = [
            review['text'],
            review['rating'],
            review['time_created'],
            total
        ]
        list_parsed_reviews.append(list_parsed_reviews_temp)
#         list_parsed_reviews.append(result['total'])
    return list_parsed_reviews


# function to insert parsed data into dataframe
def save_reviews(parsed_results,csv_filename):
    '''
    Convert parsed_results to a DataFrame and add it to the data already in csv_filename
    '''
    columns = ['Review','Review Rating','Time','Total Reviews','Business ID']    
    
    parse_frame = pd.DataFrame(parsed_results,columns = columns)
    
    # read the rows saved so far; pandas opens and closes the file itself
    original = pd.read_csv(csv_filename, index_col = 0)
    original_dataframe = pd.DataFrame(original, columns = columns)
    
    updated = pd.concat([original_dataframe,parse_frame])
    
    updated.to_csv(csv_filename)
    
#     df = pd.DataFrame(parsed_results,columns=columns)
#     df.to_csv(csv_filename, mode='a', header = False)

In [102]:
def make_or_clear_csv(csv_filename):
    # reset the CSV so that it holds only the header row
    headers = pd.DataFrame(columns = ['Review','Review Rating','Time','Total Reviews','Business ID'])
    # to_csv opens the file in write mode by default, which overwrites any
    # existing contents, so no separate truncate step is needed
    headers.to_csv(csv_filename)

In [159]:
csv_filename1 = 'yelp_lab.csv'
csv_filename2 = 'yelp_lab_reviews.csv'
# columns = ['Name','Address','Rating','Price','Business ID']
# business_file = get_id(csv_filename)
# business_df = pd.DataFrame(business_file,columns = columns) 



In [160]:
bus_id_list = get_ids('yelp_lab.csv') #returns list
make_or_clear_csv(csv_filename2)

# counter = 0
# while counter < len(bus_id_list):
#     current_dict = {}
#     for id_num in bus_id_list:
#         review_dictionary = get_reviews(id_num,api_key)
#         print(review_dictionary)
#         current_dict[id_num[0]] = review_parse(review_dictionary)
#         print(current_dict)
#         save_reviews(current_dict,csv_filename2)
#     counter += 1


position = 0
while position < len(bus_id_list):
    for id_num in bus_id_list[position:]:
            #random instances of KeyError when a get_review call returns json format of "Internal Error from API"
            #try to execute lines, but if a key error, break and try again with saved position
        try:
            current_dict = {}
            review_dictionary = get_reviews(id_num,api_key)
            current_dict[id_num] = review_parse(review_dictionary)
        except KeyError:
            print('refreshing')
            break
            
        parsed_results = []
        for review in current_dict[id_num]:
            review.append(id_num)
#             review.append()
            parsed_results.append(review)
        save_reviews(parsed_results,csv_filename2)
        print(position, end=' ') 
        position +=1
print('Done')

0 1 2 refreshing
3 4 5 6 7 8 9 10 11 12 refreshing
13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 refreshing
162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 2
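
The loop above retries for as long as a response lacks the expected keys. One alternative, sketched here on the assumption that such failures are transient, is to cap the number of attempts per business. `fetch_with_retry` and `flaky_fetch` are hypothetical names, and the error dictionary is made up; in the lab, `fetch` would wrap `get_reviews`.

```python
import time

def fetch_with_retry(fetch, bus_id, max_attempts=3, delay=0.0):
    """Call fetch(bus_id) until the result has a 'reviews' key or attempts run out."""
    for _ in range(max_attempts):
        result = fetch(bus_id)
        if "reviews" in result:
            return result
        # Wait before retrying; a small delay also eases pressure on rate limits.
        time.sleep(delay)
    return None  # the caller decides whether to skip or log this business

# Stand-in for an API call that fails once before succeeding.
calls = {"count": 0}

def flaky_fetch(bus_id):
    calls["count"] += 1
    if calls["count"] == 1:
        return {"error": "internal error"}  # invented error payload
    return {"reviews": [], "total": 0}

result = fetch_with_retry(flaky_fetch, "abc123")
print(result, calls["count"])
```

Capping the attempts keeps a persistent failure, such as an exhausted daily quota, from turning into an endless loop.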

## Part 5 -  Write Pandas code that will answer the questions posed. 

Now that your data is saved in CSVs, you can answer the questions. 

In [None]:
#import and clean dataframe
data_frame_test = pd.read_csv('yelp_lab_reviews.csv')
# data_frame_test.rename(columns = {'total reviews':'business id','business id':'total reviews'},inplace = True)

data_frame_test.columns #'Unnamed: 0'
review_frame = data_frame_test.drop(['Unnamed: 0'], axis = 1)


In [162]:
review_frame


Unnamed: 0,Review,Review Rating,Time,Total Reviews,Business ID
0,Quite possibly the best Indian food I've had i...,5,2019-12-24 19:12:46,1479,sCC7-hSdCkNPExejZT9BAQ
1,delish neighborhood delight with great variety...,4,2020-03-09 22:11:48,1479,sCC7-hSdCkNPExejZT9BAQ
2,A okay hipster Indian spot in the LES. Food w...,3,2020-01-19 05:53:56,1479,sCC7-hSdCkNPExejZT9BAQ
3,This probably is the nicest and most consisten...,5,2020-03-08 17:19:47,771,HEs6llkn3wIP1ATXeIZQew
4,My husband loves Indian food. I have only had ...,4,2020-03-06 10:35:36,771,HEs6llkn3wIP1ATXeIZQew
...,...,...,...,...,...
1373,The $10 falafel sandwich with complimentary sa...,2,2019-12-06 07:42:54,104,8s5d0nkoNMgu9Oe066q_lg
1374,Unreal! I asked someone for my office to pick...,1,2020-02-24 13:31:21,104,8s5d0nkoNMgu9Oe066q_lg
1375,I am obsessed with this place! My favorite lun...,5,2019-10-22 17:33:22,205,X_AQHQmY2pO_YHaojU_rMg
1376,This has to be the worst Just Salad ! The serv...,1,2020-02-10 09:39:08,205,X_AQHQmY2pO_YHaojU_rMg


In [163]:
yelp_lab = pd.read_csv('yelp_lab.csv')
business_frame = yelp_lab.drop(['Unnamed: 0'], axis = 1)

In [165]:
business_frame

Unnamed: 0,Business ID,Name,Address,Rating,Price
0,sCC7-hSdCkNPExejZT9BAQ,The MasalaWala,"179 Essex St, New York, NY 10002",4.5,$$
1,HEs6llkn3wIP1ATXeIZQew,Pippali,"129 E 27th St, New York, NY 10016",4.0,$$
2,kS2CP48K66yL2TwIhUqy_A,Tamarind,"99 Hudson St, New York, NY 10013",4.0,$$$
3,9pMhRYYgV7E9L7pzECtJKA,The Drunken Munkey - UES,"338 E 92nd St, New York, NY 10128",4.5,$$
4,_7BGw3YFNOTzP1Www3zB7g,Mughlai Grill,"6 Clinton St, New York, NY 10002",4.5,$$
...,...,...,...,...,...
459,KxyH5zvTyXawl17lHjA4SQ,Beyond Sushi,"134 West 37th St, New York, NY 10018",4.5,$$
460,EqZEbtu-91BdCPd4chXPVw,Magic Mix Juicery,"102 Fulton St, New York, NY 10038",4.0,$$
461,FEhAl2T2UGwnsdvjLIQbQA,Just Salad,"320 Park Ave, New York, NY 10020",3.0,$$
462,8s5d0nkoNMgu9Oe066q_lg,Kosher Deluxe,"10 W 46th St, New York, NY 10036",3.0,$$


In [164]:
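# .copy() makes independent backups; plain assignment would only create a
# second name for the same DataFrame
business_frame_backup = business_frame.copy()
review_frame_backup = review_frame.copy()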

In [168]:
                                                                        #merge left and right to one variable
fin_frame = pd.merge(business_frame, review_frame, how = 'left', on = 'Business ID')
fin_frame

Unnamed: 0,Business ID,Name,Address,Rating,Price,Review,Review Rating,Time,Total Reviews
0,sCC7-hSdCkNPExejZT9BAQ,The MasalaWala,"179 Essex St, New York, NY 10002",4.5,$$,Quite possibly the best Indian food I've had i...,5,2019-12-24 19:12:46,1479
1,sCC7-hSdCkNPExejZT9BAQ,The MasalaWala,"179 Essex St, New York, NY 10002",4.5,$$,delish neighborhood delight with great variety...,4,2020-03-09 22:11:48,1479
2,sCC7-hSdCkNPExejZT9BAQ,The MasalaWala,"179 Essex St, New York, NY 10002",4.5,$$,A okay hipster Indian spot in the LES. Food w...,3,2020-01-19 05:53:56,1479
3,HEs6llkn3wIP1ATXeIZQew,Pippali,"129 E 27th St, New York, NY 10016",4.0,$$,This probably is the nicest and most consisten...,5,2020-03-08 17:19:47,771
4,HEs6llkn3wIP1ATXeIZQew,Pippali,"129 E 27th St, New York, NY 10016",4.0,$$,My husband loves Indian food. I have only had ...,4,2020-03-06 10:35:36,771
...,...,...,...,...,...,...,...,...,...
1373,8s5d0nkoNMgu9Oe066q_lg,Kosher Deluxe,"10 W 46th St, New York, NY 10036",3.0,$$,The $10 falafel sandwich with complimentary sa...,2,2019-12-06 07:42:54,104
1374,8s5d0nkoNMgu9Oe066q_lg,Kosher Deluxe,"10 W 46th St, New York, NY 10036",3.0,$$,Unreal! I asked someone for my office to pick...,1,2020-02-24 13:31:21,104
1375,X_AQHQmY2pO_YHaojU_rMg,Just Salad,"134 W 37th St, New York, NY 10018",3.0,$,I am obsessed with this place! My favorite lun...,5,2019-10-22 17:33:22,205
1376,X_AQHQmY2pO_YHaojU_rMg,Just Salad,"134 W 37th St, New York, NY 10018",3.0,$,This has to be the worst Just Salad ! The serv...,1,2020-02-10 09:39:08,205


In [129]:
# fin_frame1 = fin_frame.rename(columns = {'rating': 'Review Rating'})

In [138]:
# fin_frame = fin_frame1
# fin_frame.head(1)

Unnamed: 0,Business ID,Name,Address,Rating,Price,review,Review Rating,time,total reviews,business id
0,sCC7-hSdCkNPExejZT9BAQ,The MasalaWala,"179 Essex St, New York, NY 10002",4.5,$$,Quite possibly the best Indian food I've had i...,5,2019-12-24 19:12:46,1479,sCC7-hSdCkNPExejZT9BAQ


<b>Questions</b>

    1) Which are the 5 most reviewed businesses?
    2) What is the highest rating received in your data set and how many businesses have that rating?
    3) What percentage of businesses have a rating greater than or equal to 4.5?
    4) What percentage of businesses have a rating less than 3?
    5) What is the average rating of restaurants that have a price label of one dollar sign? Two dollar signs? Three dollar signs?
    6) Return the text of the reviews for the most reviewed restaurant.
    7) Return the name of the business with the most recent review.
    8) Find the highest rated business and return the text of its most recent review. If multiple businesses have the same rating, select the one with the most reviews.
    9) Find the lowest rated business and return the text of its most recent review. If multiple businesses have the same rating, select the one with the fewest reviews.


In [174]:
#1 use combined frame
list(fin_frame[['Total Reviews','Name']].sort_values(by = 'Total Reviews', ascending = False)['Name'].unique()[:5])

['The Halal Guys',
 "S'MAC",
 'Bengal Tiger Indian Food',
 'Red Bamboo',
 'by CHLOE']

In [175]:
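#2 use combined frame
# Note: fin_frame has one row per review, so this counts review rows with the
# top rating; run the same expression on business_frame to count businesses.
fin_frame[fin_frame['Rating'] == max(fin_frame['Rating'].values)]['Rating'].value_counts()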

5.0    18
Name: Rating, dtype: int64
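
Because the merged frame has one row per review, counting rows there counts reviews rather than businesses. A small sketch on made-up data shows the difference; the frame below is invented and only mirrors the column names used in this notebook.

```python
import pandas as pd

# Toy merged frame: two businesses, each with two review rows.
toy = pd.DataFrame({
    "Business ID": ["a", "a", "b", "b"],
    "Rating": [5.0, 5.0, 4.0, 4.0],
    "Review": ["r1", "r2", "r3", "r4"],
})

max_rating = toy["Rating"].max()
top_rows = toy[toy["Rating"] == max_rating]

review_rows = len(top_rows)                     # counts review rows
businesses = top_rows["Business ID"].nunique()  # counts distinct businesses
print(max_rating, review_rows, businesses)
```

Here one business holds the top rating, yet a plain row count reports two, which is the kind of overcount to watch for when answering question 2 from the merged frame.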

In [193]:
#3 use business frame only
num_greater = business_frame[business_frame['Rating'] >= 4.5]['Name'].count() #90
num_total = business_frame['Name'].count() #464
percent = num_greater/num_total * 100
percent #19.3966%

90

In [195]:
#4
num_less = business_frame[business_frame['Rating'] < 3]['Name'].count() #24
num_total = business_frame['Name'].count() #464
percent = num_less/num_total * 100
percent #5.1724%
num_less

24

In [None]:
fig = sns.heatmap(fin_frame.isnull())
# only missing some prices

In [201]:
#clean/remap price
price_dict = {
    '$' : 1,
    '$$' : 2,
    '$$$': 3,
    np.nan: np.nan
}

fin_frame['Price'] = fin_frame['Price'].replace(price_dict)
business_frame['Price'] = business_frame['Price'].replace(price_dict)

In [205]:
#5
#one dollar sign
business_frame[business_frame['Price'] == 1]['Rating'].mean() #3.715
#two dollar sign
business_frame[business_frame['Price'] == 2]['Rating'].mean() #3.7969
#three dollar sign
business_frame[business_frame['Price'] == 3]['Rating'].mean() #3.6956
#np.nan doesn't affect the result, because the mean method skips NaN values

3.6956521739130435

In [235]:
#6
top_business_id = fin_frame[['Total Reviews','Review']].sort_values(by = 'Total Reviews', ascending = False).iloc[:3]['Review']
list(top_business_id.values)

["First timing visiting early evening with few people ahead of us. \n\nWe have one in Toronto but this cart is something else. It's a loaded combo and the price...",
 "So, in light of what's happening in NYC with the Covid-19 pandemic, I thought this would be the perfect time to give NYC a lot of love. We vacationed here...",
 'We have The Halal Guys locations in my hometown, but of course we have to try the food truck version in NY! They are open late and it is sooo delicious...']

In [236]:
#7
#convert string timestamps to actual timestamps
fin_frame['Time'] = pd.to_datetime(fin_frame['Time'])

In [262]:
fin_frame[['Name','Time']].sort_values(by = 'Time' ,ascending = False)['Name'].iloc[0]

'Bunna Cafe'

In [304]:
#8) Find the highest rated business and return the text of its most recent review. 
#   If multiple businesses have the same rating, select the one with the most reviews.
max_rating = max(business_frame['Rating'])
max_frame = fin_frame[fin_frame['Rating'] == max_rating].sort_values(by = 'Total Reviews', ascending = False).iloc[:3]
name = max_frame['Name'].unique()
print(name)
list(max_frame['Review'].values)

['Royal Grill Halal Food']


['Quality & delicious food on budget! I was introduced to this food cart by one of my colleagues when I moved to work at midtown. I was new to the cart food...',
 'Street food at 3am? A trip to New York almost guarantees it. Hopping off the train and walking to the hotel, a friend recommended this particular...',
 'Believe the reviews. This place is great-?taste, value, vibe. I love Indian food; spent a year in North India developing this taste. And, this guy brings me...']

In [308]:
#9
#Find the lowest rated business and return the text of its most recent review.
#If multiple businesses have the same rating, select the one with the fewest reviews
min_rating = min(business_frame['Rating'])
min_frame = fin_frame[fin_frame['Rating'] == min_rating].sort_values(by = 'Total Reviews', ascending = True).iloc[:]
min_frame

Unnamed: 0,Business ID,Name,Address,Rating,Price,Review,Review Rating,Time,Total Reviews
1287,85123ck6ZQF5-4rTQbDHjA,Hurry & Tasty Curry,"133 E 45th St, New York, NY 10017",1.0,1.0,Hurry & Tasty thrives on its delivery service ...,1,2007-02-22 20:02:52,2
1288,85123ck6ZQF5-4rTQbDHjA,Hurry & Tasty Curry,"133 E 45th St, New York, NY 10017",1.0,1.0,"If this place had any redeeming qualities, I w...",1,2007-02-17 11:43:41,2
1298,GgVSxkEzFRyAorxuODA1tQ,Masala Indian Cuisine,"789 Newark Ave, Jersey City, NJ 07306",1.0,,"Where do I start?\n\nFirst, an hour and a half...",1,2016-04-30 16:01:57,2
1299,GgVSxkEzFRyAorxuODA1tQ,Masala Indian Cuisine,"789 Newark Ave, Jersey City, NJ 07306",1.0,,Ordered through seamless and probably the wors...,1,2016-03-19 13:54:50,2
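
Questions 8 and 9 each ask for a single review: the most recent one for the selected business. One way to narrow the result, sketched on invented data that reuses this notebook's column names, is to sort by rating, then by review count for the tie-break, and finally by time.

```python
import pandas as pd

# Toy merged frame: two businesses tied on rating, with different review counts.
toy = pd.DataFrame({
    "Name": ["A", "A", "B", "B"],
    "Rating": [5.0, 5.0, 5.0, 5.0],
    "Total Reviews": [10, 10, 40, 40],
    "Time": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-02-01", "2019-12-01"]),
    "Review": ["a-old", "a-new", "b-new", "b-old"],
})

# Highest rating first, ties broken by most reviews, then newest review first.
ordered = toy.sort_values(
    by=["Rating", "Total Reviews", "Time"], ascending=[False, False, False]
)
best = ordered.iloc[0]
print(best["Name"], best["Review"])
```

For the lowest rated business the same idea applies with `ascending=[True, True, False]`, so that the lowest rating and fewest reviews come first while the newest review still leads.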


In [312]:
tester = pd.DataFrame([4,2,np.nan])
tester.mean()

0    3.0
dtype: float64

# Extra Reference help

###  Pagination

Returning to the Yelp API, the [documentation](https://www.yelp.com/developers/documentation/v3/business_search) also provides details regarding the API limits. These often include the number of requests a user is allowed to make within a specified time limit and the maximum number of results to be returned. In this case, we are told that a single request returns at most 50 results and defaults to 20. Furthermore, any search is limited to a total of 1000 results. To retrieve all 1000 of these results, we would have to page through them piece by piece, retrieving 50 at a time. Processes such as these are often referred to as pagination.
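
The arithmetic behind those limits can be sketched as a list of offsets, one per request. `page_offsets` is a hypothetical helper, with the page size of 50 and cap of 1000 taken from the limits described above.

```python
def page_offsets(total, page_size=50, cap=1000):
    """Return the offset for each request needed to cover `total` results."""
    return list(range(0, min(total, cap), page_size))

print(page_offsets(120))        # [0, 50, 100]
print(len(page_offsets(5000)))  # 20
```

A query with 120 matches needs three requests, and even a very large result set needs no more than twenty.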

Now that you have an initial response, you can examine the contents of the JSON container. For example, you might start with ```response.json().keys()```. Here, you'll see a key for `'total'`, which tells you the full number of matching results given your query parameters. Write a loop (or ideally a function) which then makes successive API calls using the offset parameter to retrieve all of the results (or 1000, the search limit, for a particularly large result set) for the original query. As you do this, be mindful of how you store the data. 

**Note: be mindful of the API rate limits. You can only make 5000 requests per day, and it is easy to send requests too quickly. Start by prototyping on a small scale before running a loop that could be faulty. You can also use `time.sleep(n)` to add delays. For more details see https://www.yelp.com/developers/documentation/v3/rate_limiting.**
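
A minimal sketch of adding such delays, assuming a fixed pause between requests is enough: `throttled` is a hypothetical generator that yields each offset and sleeps in between. The 0.01 second delay only keeps the demonstration quick; a real run would likely use a longer pause.

```python
import time

def throttled(items, delay):
    """Yield items one at a time, pausing `delay` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(delay)  # no pause is needed before the first item
        yield item

start = time.monotonic()
seen = [offset for offset in throttled([0, 50, 100], delay=0.01)]
elapsed = time.monotonic() - start
print(seen, round(elapsed, 3))
```

Each yielded offset would be the point at which to make one API call, so the pause sits between consecutive requests.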

***Below is sample code that you can use to help you deal with the pagination parameter and bring all of the functions together.***


***Also, something might cause your code to break while it is running. You don't want to repeatedly re-pull the same data when this happens, so you should save the data to your CSV as you call and parse it, not after you have all of the data.***


In [None]:
# file that accumulates the results
csv_filename = 'yelp_lab.csv'

# make one initial call to find out how many results match the query
num = yelp_call(url_params, api_key)['total']

# create a variable to keep track of which result you are on
cur = 0

# set up a while loop to go through and grab the results;
# Yelp caps any search at 1000 results
while cur < num and cur < 1000:
    #set the offset parameter to be where you currently are in the results 
    url_params['offset'] = cur
    #make your API call with the new offset number
    results = yelp_call(url_params, api_key)
    
    #after you get your results you can now use your function to parse those results
    parsed_results = parse_results(results)
    
    # use your function to save your parsed results to the CSV
    data_save(parsed_results, csv_filename)
    #increment the counter by 50 to move on to the next results
    cur += 50
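
To save each page as it arrives without re-reading the whole file, one option is to append to the CSV and write the header only when the file is new. This is a sketch rather than the lab's required approach; `append_rows` is a hypothetical helper, and the temporary file and two-column layout are only there to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

def append_rows(rows, columns, csv_filename):
    """Append rows to csv_filename, writing the header only for a new file."""
    frame = pd.DataFrame(rows, columns=columns)
    write_header = not os.path.exists(csv_filename)
    frame.to_csv(csv_filename, mode="a", header=write_header, index=False)

columns = ["Business ID", "Name"]
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_rows([["a", "First"]], columns, path)   # first page of results
append_rows([["b", "Second"]], columns, path)  # second page of results

saved = pd.read_csv(path)
print(saved)
```

Appending keeps each save cheap as the file grows, and writing with `index=False` avoids the extra `Unnamed: 0` column when the file is read back.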