## Objective

- For this project, I worked with the Yelp API.
- I used the Yelp API to search for Korean cuisine in the Greater Los Angeles Area.
- Then I used Plotly Express with the Mapbox API to map the results.
    
    

In [1]:
!pip install yelpapi



In [2]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

In [3]:
def write_json(new_data, filename): 
    """Appends a record or a list of records (new_data) to a JSON file (filename). 
    Adapted from: https://www.geeksforgeeks.org/append-to-json-file-using-python/"""  
    
    with open(filename,'r+') as file:
        # First we load the existing data (expected to be a list).
        file_data = json.load(file)
        ## Extend with a list of records, or append a single record
        if isinstance(new_data, list) and isinstance(file_data, list):
            file_data.extend(new_data)
        else:
            file_data.append(new_data)
        # Move back to the start of the file before overwriting.
        file.seek(0)
        # Convert back to JSON and drop any leftover bytes.
        json.dump(file_data, file)
        file.truncate()

In [4]:
def create_json_file(JSON_FILE,  delete_if_exists=False):
    """Creates JSON_FILE containing an empty list, optionally deleting an existing file first."""
    
    ## Check if JSON_FILE exists
    file_exists = os.path.isfile(JSON_FILE)
    
    ## If it DOES exist:
    if file_exists == True:
        
        ## Check if user wants to delete if exists
        if delete_if_exists==True:
            
            print(f"[!] {JSON_FILE} already exists. Deleting previous file...")
            ## delete the existing file
            os.remove(JSON_FILE)
            ## Recursive call to function after old file deleted
            create_json_file(JSON_FILE,delete_if_exists=False)
        else:
            print(f"[i] {JSON_FILE} already exists.")            
            
            
    ## If it does NOT exist:
    else:
        
        ## INFORM USER AND SAVE EMPTY LIST
        print(f"[i] {JSON_FILE} not found. Saving empty list to new file.")
        
        ## CREATE ANY NEEDED FOLDERS
        # Get the Folder Name only
        folder = os.path.dirname(JSON_FILE)
        
        ## If JSON_FILE included a folder:
        if len(folder)>0:
            # create the folder
            os.makedirs(folder,exist_ok=True)
        ## Save empty list to start the json file
        with open(JSON_FILE,'w') as f:
            json.dump([],f)  

## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [5]:
# Load API Credentials
with open('/Users/christianrim/.secret/yelp_api.json','r') as f:   #use your path here!
    login = json.load(f)

In [6]:
login.keys()

dict_keys(['client-id', 'api-key'])
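
The credentials file is assumed to be a small JSON object with the two keys shown in the output above. A minimal sketch of creating and loading such a file follows; the temporary folder and the placeholder values are hypothetical and stand in for a real key stored outside the project folder.

```python
import json
import os
import tempfile

# Placeholder credentials: the key names match the dict_keys output above,
# but the values are made up for illustration.
example_login = {"client-id": "YOUR_CLIENT_ID", "api-key": "YOUR_API_KEY"}

# Write to a temporary folder instead of a real secrets folder (assumption).
tmp_dir = tempfile.mkdtemp()
cred_path = os.path.join(tmp_dir, "yelp_api.json")
with open(cred_path, "w") as f:
    json.dump(example_login, f)

# Load it back the same way the notebook does.
with open(cred_path, "r") as f:
    login = json.load(f)

print(sorted(login.keys()))
```

Keeping the key in a file outside the repository means the notebook can be shared without exposing it.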

In [7]:
# Instantiating YelpAPI variable
yelp_api = YelpAPI(login['api-key'], timeout_s=5.0)

### Define Search Terms and File Paths

In [8]:
# set our API call parameters and filename before the first call
location = 'Los Angeles, CA 90020'
term = 'Korean'

In [9]:
location.split(',')[0]

'Los Angeles'

In [10]:
## Specify folder for saving data

FOLDER = 'Data/'

os.makedirs(FOLDER, exist_ok= True)
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = FOLDER+f"{location.split(',')[0]}~{term}.json"

In [11]:
JSON_FILE

'Data/Los Angeles~Korean.json'

### Check if JSON File exists and Create it if it doesn't

In [12]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)
## If it does not exist: 
if file_exists == False:
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder, exist_ok=True)
        
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    
    ## save an empty list to start the file
    with open(JSON_FILE, 'w') as f:
          json.dump([], f)
        
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/Los Angeles~Korean.json not found. Saving empty list to file.


### Load JSON File and account for previous results

In [13]:
## Load previous results and use len of results for offset
with open(JSON_FILE, 'r') as f:
    previous_results = json.load(f)
## set offset based on previous results
n_results = len(previous_results)
print(f' - {n_results} previous results found.')

 - 0 previous results found.


### Make the first API call to get the first page of data

- We will use this first result to check:
    - How many total results are there?
    - Where is the actual data we want to save?
    - How many results do we get at a time?


In [14]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp_api.search_query(location=location,
                               term=term,
                               offset=n_results)
results.keys()

dict_keys(['businesses', 'total', 'region'])

- The `total` key holds the total number of matches; the records we want to save are in the `businesses` list.

In [16]:
## How many results total?
total_results = results['total']
total_results

994

In [17]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

20

- Calculate how many pages of results are needed to cover total_results.

In [18]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((results['total']-n_results)/ results_per_page)
n_pages

50
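
As a worked check of the page arithmetic, the sketch below plugs in the numbers reported above (994 total results, 20 per page). The second calculation assumes a hypothetical resumed run with 400 results already saved, to show how the offset shrinks the remaining page count.

```python
import math

total_results = 994      # results['total'] from the first call
results_per_page = 20    # len(results['businesses'])

# Fresh run: nothing saved yet.
n_results = 0
n_pages = math.ceil((total_results - n_results) / results_per_page)
print(n_pages)  # 994 / 20 = 49.7, rounded up to 50

# Hypothetical resumed run: 400 results already on disk.
n_results_resumed = 400
n_pages_resumed = math.ceil((total_results - n_results_resumed) / results_per_page)
print(n_pages_resumed)  # 594 / 20 = 29.7, rounded up to 30
```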

In [19]:
# joining new results with old list with extend and saving to file
previous_results.extend(results['businesses'])
with open(JSON_FILE,'w') as f:
    json.dump(previous_results,f)

In [21]:
for i in tqdm_notebook( range(1,n_pages+1)):
    ## Read in results in progress file and check the length
    with open(JSON_FILE, 'r') as f:
        previous_results = json.load(f)
    ## save number of results to use as offset
    n_results = len(previous_results)
    ## use n_results as the OFFSET 
    results = yelp_api.search_query(location=location,
                                    term=term,
                                    offset=n_results) 

    ## append new results and save to file
    previous_results.extend(results['businesses'])

    with open(JSON_FILE,'w') as f:
        json.dump(previous_results,f)

    ## pause briefly between requests
    time.sleep(.2)


In [30]:
os.remove(JSON_FILE)
os.path.isfile(JSON_FILE)

False

In [33]:
## Create a new empty json file (delete the previous one if it exists)
create_json_file(JSON_FILE, delete_if_exists=True)
## Load previous results and use len of results for offset
with open(JSON_FILE,'r') as f:
    previous_results = json.load(f)
    
## set offset based on previous results
n_results = len(previous_results)
print(f'- {n_results} previous results found.')
# use our yelp_api variable's search_query method to perform our API call
results = yelp_api.search_query(location=location,
                                term=term,
                               offset=n_results)
## How many results total?
total_results = results['total']
## How many did we get the details for?
results_per_page = len(results['businesses'])
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((results['total']-n_results)/ results_per_page)
n_pages



[!] Data/Los Angeles~Korean.json already exists. Deleting previous file...
[i] Data/Los Angeles~Korean.json not found. Saving empty list to new file.
- 0 previous results found.


50

In [34]:
for i in tqdm_notebook( range(1,n_pages+1)):
    
    ## Read in results in progress file and check the length
    with open(JSON_FILE, 'r') as f:
        previous_results = json.load(f)
    ## save number of results to use as offset
    n_results = len(previous_results)
    
    ## stop before requesting results past the 1000-result cap
    if (n_results + results_per_page) > 1000:
        print('Reached the 1000-result limit. Stopping loop.')
        break
    
    ## use n_results as the OFFSET 
    results = yelp_api.search_query(location=location,
                                    term=term, 
                                    offset=n_results)
    
    
    
    ## append new results and save to file
    previous_results.extend(results['businesses'])
    
    # display(previous_results)
    with open(JSON_FILE,'w') as f:
        json.dump(previous_results,f)
    
    time.sleep(.2)



  0%|          | 0/50 [00:00<?, ?it/s]
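
The offset-and-cap logic of the loop above can be tried without network access. The sketch below is an illustration under stated assumptions, not the notebook's code: `fake_search` is a hypothetical stand-in for `yelp_api.search_query`, `fetch_all` is a hypothetical helper, and the results are kept in memory instead of being written to the JSON file.

```python
import math

MAX_RESULTS = 1000  # same cap as the check in the loop above


def fake_search(offset, limit=20, total=1200):
    """Hypothetical stand-in for yelp_api.search_query; returns fake records."""
    stop = min(offset + limit, total)
    return {"total": total,
            "businesses": [{"id": f"biz-{i}"} for i in range(offset, stop)]}


def fetch_all(search, limit=20):
    """Collect pages until the total or the MAX_RESULTS cap is reached."""
    collected = []
    first = search(offset=0, limit=limit)
    collected.extend(first["businesses"])
    n_pages = math.ceil(first["total"] / limit)
    for _ in range(1, n_pages):
        # The number of records collected so far is the next offset.
        offset = len(collected)
        # Stop before requesting a page that would cross the cap.
        if offset + limit > MAX_RESULTS:
            break
        page = search(offset=offset, limit=limit)
        if not page["businesses"]:
            break
        collected.extend(page["businesses"])
    return collected


records = fetch_all(fake_search)
print(len(records))
```

With the fake total set above the cap, the loop stops once the next page would pass 1000 results, which mirrors the `break` in the cell above.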

## Open the Final JSON File with Pandas

In [44]:
# load final results 
df = pd.read_json(JSON_FILE)

In [45]:
df.head()

| | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | price | location | phone | display_phone | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | h1R2iKYdm2lwukzMJvJqDw | hangari-kalguksu-los-angeles-4 | Hangari Kalguksu | https://s3-media3.fl.yelpcdn.com/bphoto/X_U65O... | False | https://www.yelp.com/biz/hangari-kalguksu-los-... | 2412 | [{'alias': 'korean', 'title': 'Korean'}, {'ali... | 4.5 | {'latitude': 34.0628602582049, 'longitude': -1... | [delivery, pickup] | $$ | {'address1': '3470 W 6th St', 'address2': 'Ste... | 12133882326 | (213) 388-2326 | 1302.013734 |
| 1 | uzAbw27XQTXTivjgf2bN2w | han-bat-sul-lung-tang-los-angeles-2 | Han Bat Sul Lung Tang | https://s3-media2.fl.yelpcdn.com/bphoto/YttPox... | False | https://www.yelp.com/biz/han-bat-sul-lung-tang... | 2682 | [{'alias': 'korean', 'title': 'Korean'}, {'ali... | 4.5 | {'latitude': 34.065414, 'longitude': -118.3095... | [delivery] | $$ | {'address1': '4163 W 5th St', 'address2': '', ... | 12133839499 | (213) 383-9499 | 170.572376 |
| 2 | WO12S4gvOYkYF8_70RWrQA | haneuem-los-angeles-3 | HanEuem | https://s3-media3.fl.yelpcdn.com/bphoto/Rh_Q54... | False | https://www.yelp.com/biz/haneuem-los-angeles-3... | 198 | [{'alias': 'korean', 'title': 'Korean'}, {'ali... | 4.5 | {'latitude': 34.06427634725579, 'longitude': -... | [delivery, pickup] | | {'address1': '539 S Western Ave', 'address2': ... | 12133888988 | (213) 388-8988 | 267.679696 |
| 3 | XgQ8riUvnMOBVfRG29NCEQ | anju-house-los-angeles-2 | Anju House | https://s3-media1.fl.yelpcdn.com/bphoto/PJMUwa... | False | https://www.yelp.com/biz/anju-house-los-angele... | 4 | [{'alias': 'korean', 'title': 'Korean'}] | 5.0 | {'latitude': 34.0701519575396, 'longitude': -1... | [delivery, pickup] | | {'address1': '234 S Oxford Ave', 'address2': '... | 12133155153 | (213) 315-5153 | 533.129221 |
| 4 | TWHGJkTAbF22hvXeReQp9w | kobawoo-house-los-angeles-2 | Kobawoo House | https://s3-media2.fl.yelpcdn.com/bphoto/KhZfyO... | False | https://www.yelp.com/biz/kobawoo-house-los-ang... | 1410 | [{'alias': 'korean', 'title': 'Korean'}, {'ali... | 4.5 | {'latitude': 34.06013289065783, 'longitude': -... | [delivery, pickup] | $$ | {'address1': '698 S Vermont Ave', 'address2': ... | 12133897300 | (213) 389-7300 | 1971.56136 |


In [38]:
# checking for duplicate results 
df.duplicated(subset='id').sum()

0

In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 988 entries, 0 to 987
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             988 non-null    object 
 1   alias          988 non-null    object 
 2   name           988 non-null    object 
 3   image_url      988 non-null    object 
 4   is_closed      988 non-null    bool   
 5   url            988 non-null    object 
 6   review_count   988 non-null    int64  
 7   categories     988 non-null    object 
 8   rating         988 non-null    float64
 9   coordinates    988 non-null    object 
 10  transactions   988 non-null    object 
 11  price          791 non-null    object 
 12  location       988 non-null    object 
 13  phone          988 non-null    object 
 14  display_phone  988 non-null    object 
 15  distance       988 non-null    float64
dtypes: bool(1), float64(2), int64(1), object(12)
memory usage: 116.9+ KB
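
The `coordinates` column holds one dictionary per row, so a mapping step would need separate latitude and longitude values. Below is a minimal sketch of that flattening on plain Python records. The sample records and the `flatten_coordinates` helper are hypothetical, and the notebook itself may handle this differently.

```python
# Two records shaped like the saved Yelp results (values made up).
records = [
    {"id": "a1", "name": "Example One",
     "coordinates": {"latitude": 34.06, "longitude": -118.30}},
    {"id": "b2", "name": "Example Two",
     "coordinates": {"latitude": None, "longitude": None}},
]


def flatten_coordinates(record):
    """Return a copy of record with latitude/longitude as top-level keys."""
    coords = record.get("coordinates") or {}
    flat = {k: v for k, v in record.items() if k != "coordinates"}
    flat["latitude"] = coords.get("latitude")
    flat["longitude"] = coords.get("longitude")
    return flat


flat_records = [flatten_coordinates(r) for r in records]

# Keep only rows that can actually be placed on a map.
mappable = [r for r in flat_records
            if r["latitude"] is not None and r["longitude"] is not None]
print(len(mappable))
```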


In [47]:
# dropping duplicate ids and confirming no duplicates remain
df = df.drop_duplicates(subset='id')
df.duplicated(subset='id').sum()

0
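
For readers less familiar with `drop_duplicates(subset='id')`, the sketch below shows the same idea in plain Python: keep the first record seen for each `id` and skip later repeats. The sample records are made up for illustration.

```python
# Sample records with one repeated id (values made up).
records = [
    {"id": "a1", "name": "First"},
    {"id": "b2", "name": "Second"},
    {"id": "a1", "name": "First (repeat)"},
]

seen_ids = set()
unique_records = []
for record in records:
    # Keep only the first occurrence of each id.
    if record["id"] not in seen_ids:
        seen_ids.add(record["id"])
        unique_records.append(record)

print([r["name"] for r in unique_records])
```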

In [48]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/Los Angeles~Korean.csv.gz'

In [49]:
## Save it as a compressed csv 
df.to_csv(csv_file, compression='gzip', index=False)