# Part 1 - Extracting and Saving Data from Yelp API

## Obective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/): 




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
# os - for saving and loading files
# json - to work with json files
# math - to round up results
# time - to add a short pause to not overwhelm the server
import os, json, math, time

# to make yelpapi calls
from yelpapi import YelpAPI

# progress bar from tqdm_notebook
from tqdm.notebook import tqdm_notebook

In [2]:
!pip install yelpapi
!pip install tqdm



## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [3]:
# Load API Credentials
# Load API Credentials
with open('/Users/ericakitano/.secret/yelp_api.json', 'r') as f:
    login = json.load(f)

In [4]:
login.keys()

dict_keys(['client-id', 'api-key'])

In [4]:
# Instantiate YelpAPI Variable
yelp = YelpAPI(login['api-key'], timeout_s = 5.0)

### Define Search Terms and File Paths

In [6]:
# set our API call parameters and filename before the first call
location = 'Waikiki, HI'
term = 'acai'

In [7]:
location.split(',')[0]

'Waikiki'

In [6]:
## Specify folder for saving data
FOLDER = 'Data/'

os.makedirs(FOLDER, exist_ok = True)


# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = FOLDER+f"{location.split(',')[0]}-{term}.json"

In [13]:
JSON_FILE

'Data/Waikiki-acai.json'

### Check if Json File exists and Create it if it doesn't

In [14]:
os.path.isfile(JSON_FILE)

False

In [15]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)
## If it does not exist: 
if file_exists == False:  
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder, exist_ok = True)
        
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    
    ## save the first page of results
    with open(JSON_FILE, 'w') as f:
          json.dump([], f)
        
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/Waikiki-acai.json not found. Saving empty list to file.


In [16]:
os.path.isfile(JSON_FILE)

True

In [17]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)
## If it does not exist: 
if file_exists == False:  
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder, exist_ok = True)
        
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    
    ## save the first page of results
    with open(JSON_FILE, 'w') as f:
          json.dump([], f)
        
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/Waikiki-acai.json already exists.


### Load JSON FIle and account for previous results

### Make the first API call to get the first page of data

- We will use this first result to check:
    - how many total results there are?
    - Where is the actual data we want to save?
    - how many results do we get at a time?


In [18]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp.search_query(term = term, location = location)

In [19]:
type(results)

dict

In [20]:
len(results)

3

In [21]:
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [22]:
results['businesses']

[{'id': 'AVSOYqrPYsLo7c-XRDwobg',
  'alias': 'the-sunrise-shack-honolulu',
  'name': 'The Sunrise Shack',
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/kGVUny4Oa838LYk4DZzX-w/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/the-sunrise-shack-honolulu?adjust_creative=L6aEZ7VCBeYnaQEYoR4t_A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=L6aEZ7VCBeYnaQEYoR4t_A',
  'review_count': 784,
  'categories': [{'alias': 'coffee', 'title': 'Coffee & Tea'},
   {'alias': 'juicebars', 'title': 'Juice Bars & Smoothies'}],
  'rating': 4.5,
  'coordinates': {'latitude': 21.277213128719804,
   'longitude': -157.8275935854154},
  'transactions': ['delivery'],
  'price': '$$',
  'location': {'address1': '2335 Kalakaua Ave',
   'address2': '',
   'address3': None,
   'city': 'Honolulu',
   'zip_code': '96815',
   'country': 'US',
   'state': 'HI',
   'display_address': ['2335 Kalakaua Ave', 'Honolulu, HI 96815']},
  'phone': '+18089266460',
  'display_phone': 

In [23]:
results['total']

94

In [24]:
results['region']

{'center': {'longitude': -157.82873153686523, 'latitude': 21.275296393424053}}

In [25]:
## How many results total?
pd.DataFrame(results['businesses'])

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,AVSOYqrPYsLo7c-XRDwobg,the-sunrise-shack-honolulu,The Sunrise Shack,https://s3-media2.fl.yelpcdn.com/bphoto/kGVUny...,False,https://www.yelp.com/biz/the-sunrise-shack-hon...,784,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...",4.5,"{'latitude': 21.277213128719804, 'longitude': ...",[delivery],$$,"{'address1': '2335 Kalakaua Ave', 'address2': ...",18089266460.0,(808) 926-6460,243.572526
1,3vsRGQGSVPec7kKBFGhowA,alo-cafe-hawaii-honolulu-2,ALO Cafe Hawaii,https://s3-media2.fl.yelpcdn.com/bphoto/_tn-bh...,False,https://www.yelp.com/biz/alo-cafe-hawaii-honol...,98,"[{'alias': 'juicebars', 'title': 'Juice Bars &...",4.5,"{'latitude': 21.27713, 'longitude': -157.82433}","[pickup, delivery]",$$,"{'address1': '159 Kaiulani Ave', 'address2': '...",18087797887.0,(808) 779-7887,505.744094
2,QMPFWhM_kMIMSSBQf48-Gg,waffle-and-berry-honolulu,Waffle and Berry,https://s3-media4.fl.yelpcdn.com/bphoto/6VpIBT...,False,https://www.yelp.com/biz/waffle-and-berry-hono...,675,"[{'alias': 'waffles', 'title': 'Waffles'}, {'a...",5.0,"{'latitude': 21.28594, 'longitude': -157.83275}","[pickup, delivery]",$$,"{'address1': '1958 Kalakaua Ave', 'address2': ...",18082068272.0,(808) 206-8272,1256.06174
3,FhShWwhWNm4j6yMjyF0MCw,banán-waikiki-beach-shack-honolulu,Banán - Waikiki Beach Shack,https://s3-media3.fl.yelpcdn.com/bphoto/HBcgiP...,False,https://www.yelp.com/biz/ban%C3%A1n-waikiki-be...,518,"[{'alias': 'vegan', 'title': 'Vegan'}, {'alias...",4.5,"{'latitude': 21.2772821247839, 'longitude': -1...",[],$,"{'address1': '2301 Kalakaua Ave', 'address2': ...",18082001640.0,(808) 200-1640,242.294662
4,vZ6iGKsU7zX8MT5rwNJeZw,tropical-tribe-honolulu,Tropical Tribe,https://s3-media3.fl.yelpcdn.com/bphoto/e7Y1Qu...,False,https://www.yelp.com/biz/tropical-tribe-honolu...,592,"[{'alias': 'acaibowls', 'title': 'Acai Bowls'}...",4.5,"{'latitude': 21.28557897099554, 'longitude': -...","[pickup, delivery]",$$,"{'address1': '1778 Ala Moana Blvd', 'address2'...",18083668226.0,(808) 366-8226,1522.472981
5,vFz6Nwo0LgpbJcU9St-uPA,da-cove-health-bar-and-cafe-honolulu-4,Da Cove Health Bar & Cafe,https://s3-media3.fl.yelpcdn.com/bphoto/aNlP2w...,False,https://www.yelp.com/biz/da-cove-health-bar-an...,1295,"[{'alias': 'juicebars', 'title': 'Juice Bars &...",4.0,"{'latitude': 21.26878, 'longitude': -157.8136199}","[pickup, delivery]",$$,"{'address1': '3045 Monsarrat Ave', 'address2':...",18087328744.0,(808) 732-8744,1718.002818
6,kg9DW8QwQ6NEorFdUSUhgA,nalu-health-bar-and-cafe-urban-honolulu,Nalu Health Bar & Cafe,https://s3-media2.fl.yelpcdn.com/bphoto/tn-UW0...,False,https://www.yelp.com/biz/nalu-health-bar-and-c...,10,"[{'alias': 'juicebars', 'title': 'Juice Bars &...",4.5,"{'latitude': 21.279401, 'longitude': -157.831178}",[],,"{'address1': '226 Lewers St', 'address2': None...",18084254710.0,(808) 425-4710,530.17716
7,iGIjmLHJAiwUP-oHpWP-Sw,aloh-health-bar-and-cafe-honolulu-2,Aloh Health Bar & Cafe,https://s3-media2.fl.yelpcdn.com/bphoto/KuUnIW...,False,https://www.yelp.com/biz/aloh-health-bar-and-c...,82,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...",4.5,"{'latitude': 21.280353794831946, 'longitude': ...","[pickup, delivery]",$$,"{'address1': '407 Seaside Ave', 'address2': No...",18085488116.0,(808) 548-8116,601.092447
8,oqnyvvSvWSNICv4QhzHyQg,island-vintage-shave-ice-honolulu-7,Island Vintage Shave Ice,https://s3-media3.fl.yelpcdn.com/bphoto/3frxsX...,False,https://www.yelp.com/biz/island-vintage-shave-...,99,"[{'alias': 'shavedice', 'title': 'Shaved Ice'}...",4.0,"{'latitude': 21.272783224863293, 'longitude': ...",[],$$,"{'address1': '2552 Kalakaua Ave', 'address2': ...",,,657.64535
9,tUDgSvk3bdNV_UJJ-yf7XQ,aloha-bowls-and-tea-honolulu,Aloha Bowls and Tea,https://s3-media3.fl.yelpcdn.com/bphoto/e_k8Bm...,False,https://www.yelp.com/biz/aloha-bowls-and-tea-h...,48,"[{'alias': 'acaibowls', 'title': 'Acai Bowls'}...",4.5,"{'latitude': 21.28298862765927, 'longitude': -...",[],$$,"{'address1': '2005 Kalia Rd', 'address2': '', ...",18088077733.0,(808) 807-7733,1148.326573


- Where is the actual data we want to save?

In [27]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

20

- Calculate how many pages of results needed to cover the total_results

In [28]:
(results['total'])/ results_per_page

4.7

In [29]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((results['total'])/ results_per_page)
n_pages

5

In [30]:
for i in tqdm_notebook(range(1,n_pages+1)):
    ## The block of code we want to TRY to run
    try:
        
        time.sleep(.2)
        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)
        
        ## save number of results to use as offset
        n_results = len(previous_results)
        
        
        ## use n_results as the OFFSET 
        results = yelp.search_query(location = location, term = term,
                                   offset = n_results+1)

        ## append new results and save to file
        previous_results.extend(results['businesses'])
        
        with open(JSON_FILE, 'w') as f:
            json.dump(previous_results, f)


            
    ## What to do if we get an error/exception.
    except Exception as e:
        print(' [!] ERROR', e)

  0%|          | 0/5 [00:00<?, ?it/s]

## Open the Final JSON File with Pandas

In [31]:
df = pd.read_json(JSON_FILE)

In [32]:
df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,3vsRGQGSVPec7kKBFGhowA,alo-cafe-hawaii-honolulu-2,ALO Cafe Hawaii,https://s3-media2.fl.yelpcdn.com/bphoto/_tn-bh...,False,https://www.yelp.com/biz/alo-cafe-hawaii-honol...,98,"[{'alias': 'juicebars', 'title': 'Juice Bars &...",4.5,"{'latitude': 21.27713, 'longitude': -157.82433}","[delivery, pickup]",$$,"{'address1': '159 Kaiulani Ave', 'address2': '...",18087797887,(808) 779-7887,505.744094
1,QMPFWhM_kMIMSSBQf48-Gg,waffle-and-berry-honolulu,Waffle and Berry,https://s3-media4.fl.yelpcdn.com/bphoto/6VpIBT...,False,https://www.yelp.com/biz/waffle-and-berry-hono...,675,"[{'alias': 'waffles', 'title': 'Waffles'}, {'a...",5.0,"{'latitude': 21.28594, 'longitude': -157.83275}","[delivery, pickup]",$$,"{'address1': '1958 Kalakaua Ave', 'address2': ...",18082068272,(808) 206-8272,1256.06174
2,FhShWwhWNm4j6yMjyF0MCw,banán-waikiki-beach-shack-honolulu,Banán - Waikiki Beach Shack,https://s3-media3.fl.yelpcdn.com/bphoto/HBcgiP...,False,https://www.yelp.com/biz/ban%C3%A1n-waikiki-be...,518,"[{'alias': 'vegan', 'title': 'Vegan'}, {'alias...",4.5,"{'latitude': 21.2772821247839, 'longitude': -1...",[],$,"{'address1': '2301 Kalakaua Ave', 'address2': ...",18082001640,(808) 200-1640,242.294662
3,vZ6iGKsU7zX8MT5rwNJeZw,tropical-tribe-honolulu,Tropical Tribe,https://s3-media3.fl.yelpcdn.com/bphoto/e7Y1Qu...,False,https://www.yelp.com/biz/tropical-tribe-honolu...,592,"[{'alias': 'acaibowls', 'title': 'Acai Bowls'}...",4.5,"{'latitude': 21.28557897099554, 'longitude': -...","[delivery, pickup]",$$,"{'address1': '1778 Ala Moana Blvd', 'address2'...",18083668226,(808) 366-8226,1522.472981
4,vFz6Nwo0LgpbJcU9St-uPA,da-cove-health-bar-and-cafe-honolulu-4,Da Cove Health Bar & Cafe,https://s3-media3.fl.yelpcdn.com/bphoto/aNlP2w...,False,https://www.yelp.com/biz/da-cove-health-bar-an...,1295,"[{'alias': 'juicebars', 'title': 'Juice Bars &...",4.0,"{'latitude': 21.26878, 'longitude': -157.8136199}","[delivery, pickup]",$$,"{'address1': '3045 Monsarrat Ave', 'address2':...",18087328744,(808) 732-8744,1718.002818


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             91 non-null     object 
 1   alias          91 non-null     object 
 2   name           91 non-null     object 
 3   image_url      91 non-null     object 
 4   is_closed      91 non-null     bool   
 5   url            91 non-null     object 
 6   review_count   91 non-null     int64  
 7   categories     91 non-null     object 
 8   rating         91 non-null     float64
 9   coordinates    91 non-null     object 
 10  transactions   91 non-null     object 
 11  price          68 non-null     object 
 12  location       91 non-null     object 
 13  phone          91 non-null     object 
 14  display_phone  91 non-null     object 
 15  distance       91 non-null     float64
dtypes: bool(1), float64(2), int64(1), object(12)
memory usage: 10.9+ KB


In [34]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/Waikiki-acai.csv.gz'

In [13]:
## convert the filename to a .csv.gz


In [14]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression = 'gzip', index = False)

## Bonus: compare filesize with os module's `os.path.getsize`

In [36]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(JSON_FILE.replace('.json','.csv.gz'))

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz} times smaller!')

JSON FILE: 92,598 Bytes
CSV.GZ FILE: 13,000 Bytes
the csv.gz is 7.122923076923077 times smaller!


## Next Class: Processing the Results and Mapping 