# Efficient Yelp API Calls
For this assignment, you will be working with the Yelp API.

As before, you will use the Yelp API to search your favorite city for a cuisine type of your choice.

Extract all of the results from your search and compile them into one dataframe using a for loop, as shown in the lesson "Code for Efficient API Extraction".

Save your notebook, commit the change to your repository and submit the repository URL for this assignment.

## Credentials and Accessing the API

In [53]:
import os, json, math, time

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

In [4]:
with open('/Users/Evan/.secret/yelp_api.json') as f:
    login = json.load(f)
login.keys()



dict_keys(['client-id', 'api-key'])

In [5]:
# Create an instance with your key
yelp_api = YelpAPI(login['api-key'], timeout_s=5.0)
yelp_api

<yelpapi.yelpapi.YelpAPI at 0x1c8f24f72e0>
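The cells above read the API key from a JSON file and index into it directly. A small helper can fail early with a clear message if an expected key is missing. This is a minimal sketch. The name `load_credentials` and its required-key check are assumptions, not part of the yelpapi library. The default key names mirror the `'client-id'`/`'api-key'` layout printed above.

```python
import json
from pathlib import Path


def load_credentials(path, required=("client-id", "api-key")):
    """Hypothetical helper: load a credentials JSON file and check its keys."""
    # expanduser() allows paths such as "~/.secret/yelp_api.json"
    with open(Path(path).expanduser()) as f:
        login = json.load(f)
    missing = [key for key in required if key not in login]
    if missing:
        raise KeyError(f"Missing credential keys: {missing}")
    return login
```

In the notebook this would be used as `login = load_credentials('~/.secret/yelp_api.json')`, followed by `YelpAPI(login['api-key'], timeout_s=5.0)` as before.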

In [14]:
# set our API call parameters 
LOCATION = 'Philadelphia, PA'
TERM = 'Steak'


In [15]:
# use our yelp_api variable's search_query method to perform our API call
search_results = yelp_api.search_query(location=LOCATION,
                                       term=TERM)
print(type(search_results))
search_results.keys()

<class 'dict'>
dict_keys(['businesses', 'total', 'region'])

## Create a results-in-progress JSON file, but only if it doesn't exist.

In [16]:
# Specifying JSON_FILE filename (can include a folder)
# include the search terms in the filename
JSON_FILE = "Data/results_in_progress_Philly_steak.json"
JSON_FILE



'Data/results_in_progress_Philly_steak.json'
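The comment above suggests including the search terms in the filename. One way to avoid hand-typing the name is to build it from `LOCATION` and `TERM`. `make_json_filename` is a hypothetical helper, not from the lesson. It produces lowercase slugs, so its output differs slightly from the hand-written name used here.

```python
import os
import re


def make_json_filename(location, term, folder="Data"):
    """Hypothetical helper: build a results-in-progress filename from search terms."""
    def slug(text):
        # Lowercase, then collapse each run of non-alphanumeric characters into "_"
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    filename = f"results_in_progress_{slug(location)}_{slug(term)}.json"
    return os.path.join(folder, filename)


# On macOS/Linux: Data/results_in_progress_philadelphia_pa_steak.json
print(make_json_filename("Philadelphia, PA", "Steak"))
```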


In [45]:
# Code copied from the Coding Dojo Learning Platform.
# Use delete_if_exists=True to restart the results .json file after an error.

def create_json_file(JSON_FILE, delete_if_exists=False):
    """Create JSON_FILE containing an empty list, unless it already exists."""

    ## Check if JSON_FILE exists
    file_exists = os.path.isfile(JSON_FILE)

    ## If it DOES exist:
    if file_exists:

        ## Check if the user wants to delete the existing file
        if delete_if_exists:
            print(f"[!] {JSON_FILE} already exists. Deleting previous file...")
            ## Delete the old file
            os.remove(JSON_FILE)
            ## Recursive call: the file no longer exists, so a new empty one is created
            create_json_file(JSON_FILE, delete_if_exists=False)
        else:
            print(f"[i] {JSON_FILE} already exists.")

    ## If it does NOT exist:
    else:
        ## Inform the user and save an empty list
        print(f"[i] {JSON_FILE} not found. Saving empty list to new file.")

        ## Create any needed folders
        # Get the folder name only
        folder = os.path.dirname(JSON_FILE)

        ## If JSON_FILE included a folder, create it
        if len(folder) > 0:
            os.makedirs(folder, exist_ok=True)

        ## Save an empty list to start the JSON file
        with open(JSON_FILE, 'w') as f:
            json.dump([], f)
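For comparison, the same create-or-reset behaviour can be written with `pathlib` and without recursion. This is a sketch, not the lesson's code. `ensure_json_file` is a hypothetical name. It returns a bool (whether a new empty file was written) so the outcome can be checked programmatically.

```python
import json
from pathlib import Path


def ensure_json_file(json_file, delete_if_exists=False):
    """Hypothetical pathlib variant of create_json_file.

    Returns True if a new file containing an empty list was written,
    False if an existing file was left untouched.
    """
    path = Path(json_file)
    if path.exists():
        if not delete_if_exists:
            print(f"[i] {path} already exists.")
            return False
        print(f"[!] {path} already exists. Deleting previous file...")
        path.unlink()
    # parents=True creates missing folders; exist_ok=True ignores existing ones
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps([]))
    return True
```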

## Page Results and Variables

In [50]:
## Create a new empty JSON file (deleting the previous one if it exists)
create_json_file(JSON_FILE, delete_if_exists=True)

## Load previous results and use the number of results as the offset
with open(JSON_FILE, 'r') as f:
    previous_results = json.load(f)

## Set the offset based on previous results
n_results = len(previous_results)
print(f'- {n_results} previous results found.')

# use our yelp_api variable's search_query method to perform our API call
results = yelp_api.search_query(location=LOCATION,
                                term=TERM,
                                offset=n_results)

## How many results total?
total_results = results['total']
## How many did we get the details for?
results_per_page = len(results['businesses'])
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((total_results - n_results) / results_per_page)
n_pages

[!] Data/results_in_progress_Philly_steak.json already exists. Deleting previous file...
[i] Data/results_in_progress_Philly_steak.json not found. Saving empty list to new file.
- 0 previous results found.

320
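The page count comes from dividing the remaining results by the page size and rounding up. With this notebook's numbers (6400 total results, 20 per page, 0 already saved), that is ceil(6400 / 20) = 320 pages. The loop below, however, stops once 1000 results are saved, because Yelp's search endpoint does not serve results past an offset of 1000. The hypothetical helper below applies that cap when counting pages. `pages_to_fetch` and its `max_results` parameter are illustrative assumptions.

```python
import math


def pages_to_fetch(total, per_page, already_saved=0, max_results=1000):
    """Hypothetical helper: number of pages still worth requesting.

    max_results caps how far the offset can go (the notebook's loop
    assumes Yelp stops serving results after 1000).
    """
    reachable = min(total, max_results)
    remaining = max(reachable - already_saved, 0)
    return math.ceil(remaining / per_page)


print(pages_to_fetch(6400, 20))                     # capped: 50 pages
print(pages_to_fetch(6400, 20, max_results=6400))   # uncapped: 320 pages
```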

In [19]:
## Load previous results and use the number of results as the offset
with open(JSON_FILE, 'r') as f:
    previous_results = json.load(f)

## Set the offset based on previous results
n_results = len(previous_results)
print(f'- {n_results} previous results found.')

# use our yelp_api variable's search_query method to perform our API call
results = yelp_api.search_query(location=LOCATION,
                                term=TERM,
                                offset=n_results)
results.keys()

- 0 previous results found.
dict_keys(['businesses', 'total', 'region'])
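Using the length of the saved list as the offset is what makes the extraction resumable. If the kernel dies after 300 results are saved, the next call starts at offset 300 instead of 0. The sketch below isolates that idea. `read_offset` is a hypothetical helper, and it treats a missing file as zero saved results.

```python
import json
import os


def read_offset(json_file):
    """Hypothetical helper: return how many results are already saved."""
    if not os.path.isfile(json_file):
        return 0  # nothing saved yet, so start from the first result
    with open(json_file) as f:
        return len(json.load(f))
```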

In [18]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)

## If it does not exist:
if not file_exists:

    ## Create any needed folders
    # Get the folder name only
    folder = os.path.dirname(JSON_FILE)
    ## If JSON_FILE included a folder, create it
    if len(folder) > 0:
        os.makedirs(folder, exist_ok=True)

    ## Inform the user and save an empty list
    print(f'[i] {JSON_FILE} not found. Saving empty list to file.')
    with open(JSON_FILE, 'w') as f:
        json.dump([], f)

# If it exists, inform the user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/results_in_progress_Philly_steak.json already exists.


In [22]:
## How many results total?
total_results = results['total']

## How many did we get the details for?
results_per_page = len(results['businesses'])

print(total_results)
print(results_per_page)

6400
20


In [54]:
for i in tqdm_notebook(range(1, n_pages + 1)):

    ## Read in the results-in-progress file and check its length
    with open(JSON_FILE, 'r') as f:
        previous_results = json.load(f)
    ## Save the number of results to use as the offset
    n_results = len(previous_results)

    ## Yelp's search endpoint only returns the first 1000 results for a query,
    ## so stop before the next page would go past that limit
    if (n_results + results_per_page) > 1000:
        print('Reached the 1000-result limit. Stopping loop.')
        break

    ## Use n_results as the OFFSET
    results = yelp_api.search_query(location=LOCATION,
                                    term=TERM,
                                    offset=n_results)

    ## Append new results and save to file
    previous_results.extend(results['businesses'])
    with open(JSON_FILE, 'w') as f:
        json.dump(previous_results, f)

    # Pause briefly between API calls
    time.sleep(.2)

Reached the 1000-result limit. Stopping loop.
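The loop's logic can be tested without network access by passing the API call in as a function. In the sketch below, `fetch_page` stands in for `yelp_api.search_query`: it can be any callable that takes an `offset` keyword and returns a dict with a `'businesses'` list. `extract_all`, the fake fetchers in the test, and the 1000-result default are assumptions for illustration. Unlike the notebook's loop, this sketch also stops when a page comes back empty.

```python
import json
import time


def extract_all(fetch_page, json_file, per_page=20, max_results=1000, pause=0.0):
    """Hypothetical sketch of the resumable pagination loop.

    fetch_page(offset=...) must return a dict with a 'businesses' list.
    Results are appended to json_file after every page.
    """
    with open(json_file) as f:
        saved = json.load(f)

    # Same stopping rule as the notebook: never request past max_results
    while len(saved) + per_page <= max_results:
        # The number of saved results doubles as the next offset
        page = fetch_page(offset=len(saved))["businesses"]
        if not page:
            break  # the API has no more results for this query
        saved.extend(page)
        with open(json_file, "w") as f:
            json.dump(saved, f)
        time.sleep(pause)
    return saved
```

With the notebook's objects, a call might look like `extract_all(lambda offset: yelp_api.search_query(location=LOCATION, term=TERM, offset=offset), JSON_FILE, pause=0.2)`.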


## Convert .json to Dataframe

In [55]:
# load final results
final_df = pd.read_json(JSON_FILE)
display(final_df.head(), final_df.tail())


|  | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | price | location | phone | display_phone | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0oSSjekU-3GR8gselReWnA | butcher-and-singer-philadelphia | Butcher and Singer | https://s3-media4.fl.yelpcdn.com/bphoto/54_xSX... | False | https://www.yelp.com/biz/butcher-and-singer-ph... | 1425 | [{'alias': 'tradamerican', 'title': 'American ... | 4.5 | {'latitude': 39.9493338, 'longitude': -75.1661... | [pickup, delivery] | $$$$ | {'address1': '1500 Walnut St', 'address2': '',... | 12157324444 | (215) 732-4444 | 7186.326378 |
| 1 | hyFzDuyOWNG2rg5GYJ2wiQ | alpen-rose-philadelphia | Alpen Rose | https://s3-media3.fl.yelpcdn.com/bphoto/z7yQXG... | False | https://www.yelp.com/biz/alpen-rose-philadelph... | 226 | [{'alias': 'steak', 'title': 'Steakhouses'}, {... | 4.5 | {'latitude': 39.949778, 'longitude': -75.16215... | [delivery] |  | {'address1': '116 S 13th St', 'address2': '', ... | 12156000709 | (215) 600-0709 | 6953.276252 |
| 2 | saVXla5i8TjE51S5uCaf6w | steak-48-philadelphia-3 | Steak 48 | https://s3-media4.fl.yelpcdn.com/bphoto/Q66TvQ... | False | https://www.yelp.com/biz/steak-48-philadelphia... | 244 | [{'alias': 'steak', 'title': 'Steakhouses'}, {... | 4.0 | {'latitude': 39.947286, 'longitude': -75.165197} | [] | $$$$ | {'address1': '260 S Broad St', 'address2': Non... | 12155524848 | (215) 552-4848 | 7327.302098 |
| 3 | 4Qi3Ry_lz4V3BM1frq87VA | umami-steak-and-sushi-bar-philadelphia-2 | Umami Steak & Sushi Bar | https://s3-media3.fl.yelpcdn.com/bphoto/oR_TVA... | False | https://www.yelp.com/biz/umami-steak-and-sushi... | 72 | [{'alias': 'japanese', 'title': 'Japanese'}, {... | 4.5 | {'latitude': 39.948221, 'longitude': -75.153647} | [pickup, delivery] |  | {'address1': '727 Walnut Street Basement', 'ad... | 12675345395 | (267) 534-5395 | 6753.167085 |
| 4 | wbDRmtxaKRpBOjutvV6TEA | barclay-prime-philadelphia | Barclay Prime | https://s3-media1.fl.yelpcdn.com/bphoto/KSBuz7... | False | https://www.yelp.com/biz/barclay-prime-philade... | 878 | [{'alias': 'steak', 'title': 'Steakhouses'}] | 4.5 | {'latitude': 39.9487096, 'longitude': -75.1708... | [pickup, delivery] | $$$$ | {'address1': '237 S 18th St', 'address2': None... | 12157327560 | (215) 732-7560 | 7479.015315 |

|  | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | price | location | phone | display_phone | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 995 | Rt4xYQBWC8i2xqLp9dP7XQ | jims-steaks-springfield | Jim's Steaks | https://s3-media3.fl.yelpcdn.com/bphoto/-GbmbO... | False | https://www.yelp.com/biz/jims-steaks-springfie... | 127 | [{'alias': 'steak', 'title': 'Steakhouses'}] | 3.0 | {'latitude': 39.9207126, 'longitude': -75.3230... | [delivery] | $$ | {'address1': '469 Baltimore Pike', 'address2':... | 16105448400.0 | (610) 544-8400 | 19726.827221 |
| 996 | 1j7Y1eek07NRv_SkBOJFgA | the-ridley-house-holmes-2 | The Ridley House | https://s3-media2.fl.yelpcdn.com/bphoto/Wt22zK... | False | https://www.yelp.com/biz/the-ridley-house-holm... | 74 | [{'alias': 'bars', 'title': 'Bars'}, {'alias':... | 3.5 | {'latitude': 39.89753306828397, 'longitude': -... | [] | $$ | {'address1': '2107 Mac Dade Blvd Holmes Pa', '... | 16105225400.0 | (610) 522-5400 | 20202.842356 |
| 997 | Rs5OWq7WbdtPMDHC6c5pzg | city-steaks-burlington | City Steaks | https://s3-media2.fl.yelpcdn.com/bphoto/Is9Zdr... | False | https://www.yelp.com/biz/city-steaks-burlingto... | 54 | [{'alias': 'sandwiches', 'title': 'Sandwiches'}] | 4.5 | {'latitude': 40.0678291, 'longitude': -74.8540... | [delivery, pickup] | $ | {'address1': '818 High St', 'address2': '', 'a... | 16093863600.0 | (609) 386-3600 | 23607.813352 |
| 998 | xxX7E2q0Rpw1LK3aX2b-YQ | mexican-grill-and-pizza-phila | Mexican Grill & Pizza | https://s3-media4.fl.yelpcdn.com/bphoto/6yZgX7... | False | https://www.yelp.com/biz/mexican-grill-and-piz... | 2 | [{'alias': 'mexican', 'title': 'Mexican'}] | 4.5 | {'latitude': 40.00157, 'longitude': -75.22311} | [delivery, pickup] |  | {'address1': '2749 N 47th St', 'address2': Non... |  |  | 8987.056031 |
| 999 | 03cvVjbkCOtiIt51M-MLFg | prime-halal-meat-philadelphia | Prime Halal Meat | https://s3-media4.fl.yelpcdn.com/bphoto/InPurt... | False | https://www.yelp.com/biz/prime-halal-meat-phil... | 69 | [{'alias': 'meats', 'title': 'Meat Shops'}, {'... | 4.5 | {'latitude': 39.9463154, 'longitude': -75.1797... | [delivery] | $ | {'address1': '500 S 23rd St', 'address2': '', ... | 12157358185.0 | (215) 735-8185 | 8162.064074 |
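The location and coordinates columns hold nested dictionaries. If flat columns are more convenient for analysis, `pandas.json_normalize` can expand them into dotted column names such as `location.city`. The two records below are made-up placeholders, not real Yelp results. With the notebook's data you would pass the list loaded from `JSON_FILE` instead. List-valued fields such as `categories` are left as lists.

```python
import pandas as pd

# Made-up placeholder records shaped like Yelp business results
records = [
    {"id": "a1", "name": "Example Steakhouse",
     "coordinates": {"latitude": 39.95, "longitude": -75.16},
     "location": {"address1": "1 Example St", "city": "Philadelphia"}},
    {"id": "b2", "name": "Sample Grill",
     "coordinates": {"latitude": 39.94, "longitude": -75.17},
     "location": {"address1": "2 Sample Ave", "city": "Philadelphia"}},
]

# Nested dicts become dotted columns, e.g. 'coordinates.latitude'
flat_df = pd.json_normalize(records)
print(flat_df[["name", "coordinates.latitude", "location.city"]])
```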


## Check for and remove any duplicate results.

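The location and coordinates columns hold dictionaries, and the categories column holds a list of dictionaries. These values are unhashable, so pandas cannot compare them when looking for duplicates. We will use the id column, which holds a unique string for each business, to check for duplicate rows.

In [58]:
final_df.duplicated(subset='id').sum()

0

- There are no duplicate businesses.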
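The nested columns cannot be used because duplicate detection relies on hashing, and dictionaries and lists are unhashable. The plain-Python sketch below shows the error. It also shows an order-preserving way to drop repeated businesses by id, which would matter if overlapping pages had returned the same business twice. The sample records and the `dedupe_by_id` name are placeholders. In pandas, the equivalent is `final_df.drop_duplicates(subset='id')`.

```python
def dedupe_by_id(businesses):
    """Keep the first occurrence of each business id, preserving order."""
    seen = set()
    unique = []
    for biz in businesses:
        if biz["id"] not in seen:  # ids are strings, so they are hashable
            seen.add(biz["id"])
            unique.append(biz)
    return unique


# Hashing a dict fails, which is why dict-valued columns can't be deduplicated directly
try:
    hash({"latitude": 39.95})
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'dict'

sample = [{"id": "a1", "name": "X"}, {"id": "b2", "name": "Y"}, {"id": "a1", "name": "X"}]
print([b["id"] for b in dedupe_by_id(sample)])  # ['a1', 'b2']
```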

## Save the dataframe to a .csv file or .csv.gz if too big.

In [59]:
# save the final results to a compressed csv
final_df.to_csv('Data/final_results_Philly_Steak.csv.gz', compression='gzip',index=False)
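To confirm the compressed file is readable, it can be loaded back with `pd.read_csv`. That function infers gzip compression from the `.gz` extension; `to_csv` does the same, so `compression='gzip'` above is explicit rather than required. The sketch below uses a small placeholder DataFrame and a temporary folder instead of `final_df` and `Data/`. Note that nested columns such as location are written with `str()` and come back as plain strings.

```python
import os
import tempfile

import pandas as pd

# Placeholder data standing in for final_df
df = pd.DataFrame({"id": ["a1", "b2"], "rating": [4.5, 4.0],
                   "location": [{"city": "Philadelphia"}, {"city": "Philadelphia"}]})

path = os.path.join(tempfile.mkdtemp(), "final_results.csv.gz")
df.to_csv(path, compression="gzip", index=False)

# read_csv infers gzip from the ".gz" suffix
loaded = pd.read_csv(path)
# Dict columns were written as text, so they come back as strings
print(type(loaded.loc[0, "location"]))
```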