# Part 1 - Extracting and Saving Data from Yelp API

## Objective

- For this CodeAlong, we will be working with the Yelp API.
- You will use the Yelp API to search your hometown for a cuisine type of your choice.
- Next class, we will use Plotly Express with the Mapbox API to create a map that visualizes the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/)




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [68]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

In [61]:
!pip install yelpapi



## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [42]:
# create a relative filepath
relative_path = os.path.join('.secret', 'yelp_api.json')

In [43]:
# Load API Credentials
with open(relative_path) as file:
    login = json.load(file)

print(login.keys())

dict_keys(['client-id', 'api-key'])


In [44]:
# alternate method of loading API credentials
# using the absolute path to the same file (specific to this machine)
with open(r'C:\Users\bandi\OneDrive\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results\.secret\yelp_api.json') as file:
    login = json.load(file)
    
# print the keys of the loaded JSON
print(login.keys())

dict_keys(['client-id', 'api-key'])
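The two loading cells above differ only in the path they open. As a minimal sketch, the loading step could be wrapped in a helper that also confirms the expected keys are present. The function name `load_credentials` and its `required_keys` parameter are hypothetical; the key names come from the output shown above.

```python
import json
from pathlib import Path


def load_credentials(path, required_keys=("client-id", "api-key")):
    """Load a JSON credentials file and confirm the expected keys are present."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")

    with path.open() as file:
        login = json.load(file)

    # Fail early with a clear message instead of a KeyError later in the notebook
    missing = [key for key in required_keys if key not in login]
    if missing:
        raise KeyError(f"Missing keys in {path}: {missing}")
    return login


# A relative path keeps the notebook portable across machines with the same folder layout
credentials_path = Path(".secret") / "yelp_api.json"
if credentials_path.is_file():
    login = load_credentials(credentials_path)
    print(login.keys())
else:
    print(f"[i] {credentials_path} not found; create it before calling the API.")
```

Checking the keys up front is a design choice: it surfaces a malformed credentials file at load time rather than at the first API call.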


In [45]:
# Instantiate YelpAPI Variable
yelp_api = YelpAPI(login['api-key'], timeout_s = 5.0)
yelp_api

<yelpapi.yelpapi.YelpAPI at 0x14083295710>

### Define Search Terms and File Paths

In [46]:
# set our API call parameters and filename before the first call
LOCATION = 'Greenville, SC'
TERM = 'Sushi'

In [47]:
# Quick Test Query
results = yelp_api.search_query(location='Baltimore, MD',
                                term='Crab Cake')
print(type(results))
results.keys()

<class 'dict'>


dict_keys(['businesses', 'total', 'region'])

In [33]:
# Specify folder for saving data
# folder_path = r"C:\Users\bandi\OneDrive\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results"

# # Specifying JSON_FILE filename (can include a folder) => combine the folder path and filename
# JSON_FILE = f'/Data/results_SC_Sushi.json'

# print(f'data will be saved to: {JSON_FILE}')

In [34]:
# Name the file to save results
# JSON_FILE = folder_path
# JSON_FILE

In [48]:
# Name the file to save results
JSON_FILE = r"C:\Users\bandi\OneDrive\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results\results_SC_Sushi.json"
JSON_FILE

'C:\\Users\\bandi\\OneDrive\\Documents\\GitHub\\data-enrichment-wk14-activity-mapping-yelp-api-results\\results_SC_Sushi.json'

### Check if the JSON File Exists and Create It if It Doesn't

In [49]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)

## If it does not exist:
if not file_exists:
    
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder) > 0:
        # create the folder
        os.makedirs(folder, exist_ok = True)
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f'[i] {JSON_FILE} not found. Saving empty list to file.')
    
    ## save an empty list as a placeholder for future results
    with open(JSON_FILE, 'w') as file:
        json.dump([], file)
        
## If it exists, inform user
else:
    print(f'[i] {JSON_FILE} already exists.')

[i] C:\Users\bandi\OneDrive\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results\results_SC_Sushi.json already exists.
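The existence check above can also be expressed as a small reusable function. This is a sketch only, using `pathlib` in place of `os.path`; the name `ensure_json_file` is hypothetical and not part of the original CodeAlong.

```python
import json
from pathlib import Path


def ensure_json_file(json_path):
    """Create json_path (and any parent folders) holding an empty list if it is missing.

    Returns True if the file was created, False if it already existed.
    """
    json_path = Path(json_path)
    if json_path.is_file():
        return False

    # Create any needed folders, then save an empty list as a placeholder
    json_path.parent.mkdir(parents=True, exist_ok=True)
    with json_path.open("w") as file:
        json.dump([], file)
    return True
```

Calling `ensure_json_file(JSON_FILE)` before the first API call would leave an existing results file untouched and create an empty one otherwise.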


### Load JSON File and Account for Previous Results

In [50]:
## Load previous results and use len of results for offset
with open(JSON_FILE, 'r') as file:
    previous_results = json.load(file)
    
## set offset based on previous results
n_results = len(previous_results)

print(f'- {n_results} previous results found.')

- 0 previous results found.


### Make the first API call to get the first page of data

- We will use this first result to check:
    - How many total results are there?
    - Where is the actual data we want to save?
    - How many results do we get at a time?


In [51]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp_api.search_query(location = LOCATION,
                                term = TERM,
                                offset = n_results)

results.keys()

dict_keys(['businesses', 'total', 'region'])

In [52]:
## How many results total?
total_results = results['total']

total_results

110

- Where is the actual data we want to save?

In [53]:
business_data = results['businesses']

# specify the filename where you want to save the data
json_file_path = JSON_FILE

# save the business data to a JSON file
with open(json_file_path, 'w') as file:
    json.dump(business_data, file, indent = 4)

In [54]:
## How many did we get the details for?
results_per_page = len(business_data)
print('number of results retrieved per page', results_per_page)

number of results retrieved per page 20


- Calculate how many pages of results are needed to cover the total_results

In [55]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil(total_results / results_per_page)

print(f'Total number of pages: {n_pages}')

Total number of pages: 6
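To make the page arithmetic concrete, here is a small sketch that turns the totals above into the list of offsets the remaining calls would need. The helper `page_offsets` and its `already_saved` parameter are hypothetical names introduced for illustration.

```python
import math


def page_offsets(total_results, results_per_page, already_saved=0):
    """Return the offset for each API call still needed to cover total_results."""
    if results_per_page <= 0:
        raise ValueError("results_per_page must be positive")

    remaining = max(total_results - already_saved, 0)
    n_pages = math.ceil(remaining / results_per_page)
    return [already_saved + page * results_per_page for page in range(n_pages)]


# Values from the cells above: 110 total results, 20 per page
print(page_offsets(110, 20))                    # [0, 20, 40, 60, 80, 100]

# If the first page (20 results) is already saved, five calls remain
print(page_offsets(110, 20, already_saved=20))  # [20, 40, 60, 80, 100]
```

The second call illustrates why the number of remaining calls depends on how many results are already in the file, not only on the total.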


In [58]:
# Number of results returned by a single API call (the page size)
results_per_call = len(results['businesses'])

# Number of API calls needed to cover total_results. Because results_per_call
# equals results_per_page, this is the same value as n_pages.
total_iterations = min(n_pages, math.ceil(total_results / results_per_call))

In [62]:
for i in tqdm_notebook(range(1, n_pages + 1)):

    ## The block of code we want to TRY to run
    try:        
        # Introduce a short delay to respect API rate limits
        time.sleep(0.2)
        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as file:
            previous_results = json.load(file)
        
        ## Save number of results to use as offset
        n_results = len(previous_results)
        
        ## Use n_results as the OFFSET 
        results = yelp_api.search_query(location = LOCATION,
                                        term = TERM,
                                        offset = n_results)

        ## Append new results and save to file
        previous_results.extend(results['businesses'])
        with open(JSON_FILE, 'w') as file:
            json.dump(previous_results, file)

            
    ## What to do if we get an error/exception.
    except Exception as e:
        # check if we are at rate limit
        if 'Too Many Requests for url' in str(e):
            print('Rate limit exceeded. Stop data collection.')
            break
        else:
            print(f'An error occurred: {e}')
            # optionally handle error differently
            continue

  0%|          | 0/6 [00:00<?, ?it/s]
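The loop above runs a fixed number of iterations and uses the length of the saved file as the offset. The sketch below shows the same resume-from-file idea with an explicit stop condition, so no call is made once every result is saved. It does not call Yelp: `fake_search` and `FAKE_BUSINESSES` are hypothetical stand-ins that only mimic the `businesses` and `total` keys seen in the real response, and `collect_all` is an illustrative name.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the API: 110 fake businesses served 20 at a time
FAKE_BUSINESSES = [{"id": f"biz-{i}"} for i in range(110)]


def fake_search(offset, limit=20):
    """Mimic the shape of a search response without making a network call."""
    return {"businesses": FAKE_BUSINESSES[offset:offset + limit],
            "total": len(FAKE_BUSINESSES)}


def collect_all(json_path, search, max_calls=50):
    """Append pages to json_path until all results are saved or a page is empty."""
    json_path = Path(json_path)
    saved = json.loads(json_path.read_text()) if json_path.is_file() else []

    for _ in range(max_calls):
        # The number of saved results doubles as the offset for the next page
        page = search(offset=len(saved))
        new_items = page["businesses"]
        if not new_items:
            break

        # Save after every page so an interrupted run can resume where it stopped
        saved.extend(new_items)
        json_path.write_text(json.dumps(saved))
        if len(saved) >= page["total"]:
            break
    return saved


with tempfile.TemporaryDirectory() as tmp:
    results_file = Path(tmp) / "results.json"
    all_results = collect_all(results_file, fake_search)
    print(len(all_results))  # 110
```

With the real client, `search` could be a small wrapper around `yelp_api.search_query` that fixes `location` and `term`; the rate-limit handling from the loop above would still be needed around the call.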

## Open the Final JSON File with Pandas

In [63]:
df = pd.read_json(JSON_FILE)
display(df.head(), df.tail())

| | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | location | phone | display_phone | distance | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2jXS4oZkMhAONtd2j7L5Yg | chef-21-sushi-burger-and-korean-bbq-greenville-3 | Chef 21 Sushi Burger & Korean BBQ | https://s3-media4.fl.yelpcdn.com/bphoto/DbV4BU... | False | https://www.yelp.com/biz/chef-21-sushi-burger-... | 37 | [{'alias': 'korean', 'title': 'Korean'}, {'ali... | 4.5 | {'latitude': 34.847671, 'longitude': -82.394229} | [pickup, delivery] | {'address1': '500 E McBee Ave', 'address2': 'S... | 18642633018 | (864) 263-3018 | 3341.861901 | |
| 1 | RGRk1ioORwm_FIX8PM732Q | konnichiwa-greenville | Konnichiwa | https://s3-media3.fl.yelpcdn.com/bphoto/p47H0_... | False | https://www.yelp.com/biz/konnichiwa-greenville... | 70 | [{'alias': 'sushi', 'title': 'Sushi Bars'}, {'... | 4.1 | {'latitude': 34.845952342825115, 'longitude': ... | [] | {'address1': '101 Falls Park Dr', 'address2': ... | 18642524436 | (864) 252-4436 | 4184.255183 | |
| 2 | 7cJxOV-ANX1qLThK3yV96w | otto-izakaya-greenville-4 | Otto Izakaya | https://s3-media1.fl.yelpcdn.com/bphoto/TdPhFy... | False | https://www.yelp.com/biz/otto-izakaya-greenvil... | 448 | [{'alias': 'japanese', 'title': 'Japanese'}, {... | 4.2 | {'latitude': 34.8228218820722, 'longitude': -8... | [delivery] | {'address1': '15 Market Point Dr', 'address2':... | 18645688009 | (864) 568-8009 | 5933.485357 | \$\$ |
| 3 | zG_XOAFi9Y560WJ1RvghBw | sushi-masa-japanese-restaurant-greenville | Sushi-Masa Japanese Restaurant | https://s3-media1.fl.yelpcdn.com/bphoto/zsRavZ... | False | https://www.yelp.com/biz/sushi-masa-japanese-r... | 163 | [{'alias': 'sushi', 'title': 'Sushi Bars'}] | 4.4 | {'latitude': 34.8512725830078, 'longitude': -8... | [delivery] | {'address1': '8590 Pelham Rd', 'address2': 'St... | 18642882227 | (864) 288-2227 | 11481.830881 | \$\$ |
| 4 | Kx1x7Kf6C2gtogQErWSu0A | o-ku-greenville | O-Ku | https://s3-media2.fl.yelpcdn.com/bphoto/7dR0xy... | False | https://www.yelp.com/biz/o-ku-greenville?adjus... | 41 | [{'alias': 'sushi', 'title': 'Sushi Bars'}, {'... | 3.9 | {'latitude': 34.847954222223294, 'longitude': ... | [] | {'address1': '30 W Broad St', 'address2': None... | 18643264812 | (864) 326-4812 | 3931.009612 | |


| | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | location | phone | display_phone | distance | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 105 | Dz1TrhAHtAiCE8ySrQV4vA | publix-super-markets-greenville-2 | Publix Super Markets | https://s3-media3.fl.yelpcdn.com/bphoto/GQkkow... | False | https://www.yelp.com/biz/publix-super-markets-... | 24 | [{'alias': 'grocery', 'title': 'Grocery'}] | 3.6 | {'latitude': 34.86810664795167, 'longitude': -... | [] | {'address1': '215 Pelham Rd', 'address2': '', ... | 18643708210 | (864) 370-8210 | 2477.641189 | \$\$ |
| 106 | xQZIvcjkH2R14yaHr2qQYQ | the-cheesecake-factory-greenville-2 | The Cheesecake Factory | https://s3-media3.fl.yelpcdn.com/bphoto/Wk5Aul... | False | https://www.yelp.com/biz/the-cheesecake-factor... | 472 | [{'alias': 'desserts', 'title': 'Desserts'}, {... | 3.1 | {'latitude': 34.8499166, 'longitude': -82.3335... | [delivery] | {'address1': '700 Haywood Mall', 'address2': '... | 18642884444 | (864) 288-4444 | 2209.333296 | \$\$ |
| 107 | xb9QSdbk63Ani2-S5MrIHQ | harris-teeter-greenville-6 | Harris Teeter | https://s3-media3.fl.yelpcdn.com/bphoto/ZelRSg... | False | https://www.yelp.com/biz/harris-teeter-greenvi... | 27 | [{'alias': 'grocery', 'title': 'Grocery'}, {'a... | 3.6 | {'latitude': 34.8279736, 'longitude': -82.3987... | [] | {'address1': '1720 Augusta St', 'address2': ''... | 18649778041 | (864) 977-8041 | 4335.688854 | \$\$ |
| 108 | GDPBZJ1tDjmHC3v4uxVQzw | publix-super-market-greer-greer | Publix Super Market - Greer | https://s3-media1.fl.yelpcdn.com/bphoto/BzPvjL... | False | https://www.yelp.com/biz/publix-super-market-g... | 17 | [{'alias': 'grocery', 'title': 'Grocery'}] | 4.1 | {'latitude': 34.8715143081717, 'longitude': -8... | [] | {'address1': '411 The Pkwy', 'address2': '', '... | 18648487820 | (864) 848-7820 | 9662.818662 | \$\$ |
| 109 | _BLlWxSpx1mRGW9eFutYdQ | dairy-queen-grill-and-chill-mauldin-2 | Dairy Queen Grill & Chill | https://s3-media1.fl.yelpcdn.com/bphoto/9AJb5X... | False | https://www.yelp.com/biz/dairy-queen-grill-and... | 22 | [{'alias': 'hotdogs', 'title': 'Fast Food'}, {... | 3.1 | {'latitude': 34.780454197801, 'longitude': -82... | [delivery, pickup] | {'address1': '112 N Main St', 'address2': None... | 18643739896 | (864) 373-9896 | 8623.43224 | \$ |


In [65]:
# check for duplicate IDs
df.duplicated(subset = 'id').sum()

0
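The check above found no duplicates. If an interrupted or repeated run ever did produce overlapping pages, the saved list could be de-duplicated by `id` before loading it into pandas. The following is a minimal sketch on made-up records; `drop_duplicate_ids` is a hypothetical helper name.

```python
def drop_duplicate_ids(businesses):
    """Keep the first record for each business id, preserving the original order."""
    seen = set()
    unique = []
    for business in businesses:
        if business["id"] not in seen:
            seen.add(business["id"])
            unique.append(business)
    return unique


# Made-up sample: the id "a" appears twice
sample = [
    {"id": "a", "name": "First"},
    {"id": "b", "name": "Second"},
    {"id": "a", "name": "First again"},
]
print(len(drop_duplicate_ids(sample)))  # 2
```

On the DataFrame itself, `df.drop_duplicates(subset='id')` is the pandas counterpart of the same step.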

In [66]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'C:\\Users\\bandi\\OneDrive\\Documents\\GitHub\\data-enrichment-wk14-activity-mapping-yelp-api-results\\results_SC_Sushi.csv.gz'

## Creating a Relative File Path

In [71]:
# specify directory and filename
directory = 'Data'
filename = 'final_results_SC_Sushi.csv.gz'  # include the extension so the file is recognizable as a gzipped CSV
path = os.path.join(directory, filename)

# ensure that the 'Data' directory exists
os.makedirs(directory, exist_ok = True)

## save it as a compressed csv (to save space)
df.to_csv(path, compression = 'gzip', index = False)

## Bonus: Compare File Sizes with `os.path.getsize`

In [72]:
# Step 1: Correctly Save the JSON File
json_file = 'Data/final_results_SC_Sushi.json'  # Specify the correct JSON file name
os.makedirs('Data', exist_ok = True)  # Ensure the Data directory exists
df.to_json(json_file, orient = 'records', lines = True)  # Save the DataFrame as JSON

# Step 2: Convert and Save as .CSV.GZ
csv_gz_file = json_file.replace('.json', '.csv.gz')  # Create the CSV.GZ file name based on the JSON file name
df.to_csv(csv_gz_file, compression = 'gzip', index = False)  # Save the DataFrame as compressed CSV

In [73]:
# Step 3: Compare File Sizes
if os.path.exists(json_file) and os.path.exists(csv_gz_file):
    size_json = os.path.getsize(json_file)
    size_csv_gz = os.path.getsize(csv_gz_file)

    print(f'JSON FILE: {size_json:,} Bytes')
    print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

    if size_csv_gz > 0:
        compression_ratio = size_json / size_csv_gz
        print(f'The csv.gz file is {compression_ratio:.2f} times smaller than the JSON file.')
    else:
        print("CSV.GZ file size is 0, cannot compare sizes.")
else:
    print("One or both files do not exist, check file paths.")

JSON FILE: 102,226 Bytes
CSV.GZ FILE: 16,057 Bytes
The csv.gz file is 6.37 times smaller than the JSON file.
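The comparison above changes two things at once: the format (JSON to CSV) and the compression (gzip). As a rough sketch of how much gzip alone can contribute, the snippet below compresses a JSON string in memory with the standard library. The records are synthetic and highly repetitive, so the ratio it prints is illustrative only and will not match real Yelp data.

```python
import gzip
import json

# Synthetic, repetitive records (an assumption for illustration, not real results)
records = [
    {"id": f"biz-{i}", "name": "Example Sushi", "rating": 4.0, "is_closed": False}
    for i in range(500)
]

# Serialize to JSON bytes, then gzip the same bytes
raw_bytes = json.dumps(records).encode("utf-8")
compressed_bytes = gzip.compress(raw_bytes)

print(f"JSON:      {len(raw_bytes):,} Bytes")
print(f"JSON + gz: {len(compressed_bytes):,} Bytes")
print(f"Ratio:     {len(raw_bytes) / len(compressed_bytes):.2f}x")
```

Repeated keys and values are what gzip exploits, which is one reason record-oriented JSON tends to compress well.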


## Next Class: Processing the Results and Mapping 