# Part 1 - Extracting and Saving Data from Yelp API

## Obective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/): 




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

In [2]:
!pip install yelpapi
!pip install tqdm



## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [3]:
# Load API Credentials

with open('/Users/patelmedzy/.secret/yelp_api.json') as f:
    login = json.load(f)



In [4]:
# Instantiate YelpAPI Variable
login.keys()

dict_keys(['client-id', 'api-key'])

In [5]:
#login.items()
#login['api-key']

In [6]:
yelp =YelpAPI(login['api-key'], timeout_s = 5.0)

### Define Search Terms and File Paths

In [7]:
# set our API call parameters and filename before the first call
location = 'Richmond, TX 77407'
term = 'pizza'

In [8]:
location.split(',')[0]

'Richmond'

In [9]:
## Specify fodler for saving data
FOLDER = 'Data/'

os.makedirs(FOLDER, exist_ok=True)
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = FOLDER+f"{location.split(',')[0]}-{term}.json"

In [10]:
JSON_FILE

'Data/Richmond-pizza.json'

### Check if Json File exists and Create it if it doesn't

In [11]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)
## If it does not exist: 
if file_exists == False:    
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON folder name is not empty:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder, exist_ok = True)
        
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    
    ## save the first page of results
    with open(JSON_FILE, 'w') as f:
          json.dump([], f)
        
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/Richmond-pizza.json already exists.


In [12]:
os.path.isfile(JSON_FILE)

True

### Load JSON FIle and account for previous results

In [13]:
## Load previous results and use len of results for offset

## set offset based on previous results


### Make the first API call to get the first page of data

- We will use this first result to check:
    - how many total results there are?
    - Where is the actual data we want to save?
    - how many results do we get at a time?


In [14]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp.search_query(term = term, location = location)

In [15]:
type(results)

dict

In [16]:
len(results)

3

In [17]:
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [18]:
## How many results total?
results['total']

158

In [19]:
results['businesses']

[{'id': 'cHToQZPO6yEfxJSMFU12sw',
  'alias': 'fat-boys-pizza-richmond-richmond',
  'name': "Fat Boy's Pizza - Richmond",
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/GTS4yw5K-OqTVeOUFZCllQ/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/fat-boys-pizza-richmond-richmond?adjust_creative=Q1Qdfk781tT6qWPSSx9wKg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=Q1Qdfk781tT6qWPSSx9wKg',
  'review_count': 62,
  'categories': [{'alias': 'pizza', 'title': 'Pizza'},
   {'alias': 'desserts', 'title': 'Desserts'},
   {'alias': 'beerbar', 'title': 'Beer Bar'}],
  'rating': 3.5,
  'coordinates': {'latitude': 29.653703149508893,
   'longitude': -95.70841800256004},
  'transactions': ['pickup', 'delivery'],
  'price': '$$',
  'location': {'address1': '10445 W Grand Pkwy S',
   'address2': 'Ste 150',
   'address3': '',
   'city': 'Richmond',
   'zip_code': '77407',
   'country': 'US',
   'state': 'TX',
   'display_address': ['10445 W Grand Pkwy S',
    'S

In [20]:
results['region']

{'center': {'longitude': -95.72628021240234, 'latitude': 29.676708410974392}}

In [21]:
## How many results total?
pd.DataFrame(results['businesses'])

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,cHToQZPO6yEfxJSMFU12sw,fat-boys-pizza-richmond-richmond,Fat Boy's Pizza - Richmond,https://s3-media2.fl.yelpcdn.com/bphoto/GTS4yw...,False,https://www.yelp.com/biz/fat-boys-pizza-richmo...,62,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",3.5,"{'latitude': 29.653703149508893, 'longitude': ...","[pickup, delivery]",$$,"{'address1': '10445 W Grand Pkwy S', 'address2...",17138322182,(713) 832-2182,3085.821153
1,knuJk3YRhUgX-7NtGv3hKg,brooklyn-pizzeria-richmond,Brooklyn Pizzeria,https://s3-media2.fl.yelpcdn.com/bphoto/deghxC...,False,https://www.yelp.com/biz/brooklyn-pizzeria-ric...,193,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.0,"{'latitude': 29.6609608699916, 'longitude': -9...",[delivery],$,"{'address1': '7930 W Grand Pkwy S', 'address2'...",12812323333,(281) 232-3333,2290.908598
2,RF0xAbGNKkTrIvlJwsLsLA,twisted-pizza-and-curries-richmond,Twisted Pizza & Curries,https://s3-media2.fl.yelpcdn.com/bphoto/XFDE4k...,False,https://www.yelp.com/biz/twisted-pizza-and-cur...,50,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 29.6542968, 'longitude': -95.7143...","[pickup, delivery]",,"{'address1': '11131 Harlem Rd', 'address2': 'S...",18325000023,(832) 500-0023,2746.648223
3,nZ9FWG7QlbE4bYCTFJs7DQ,cup-n-char-richmond,Cup N Char,https://s3-media3.fl.yelpcdn.com/bphoto/gx1e3d...,False,https://www.yelp.com/biz/cup-n-char-richmond?a...,18,"[{'alias': 'pizza', 'title': 'Pizza'}]",5.0,"{'latitude': 29.661013325930316, 'longitude': ...",[],,"{'address1': '10450 Fm 1464', 'address2': 'Ste...",12819407440,(281) 940-7440,4228.448569
4,Zzn0kEXv88U24HGjvJSxGQ,daddyo-s-pizza-katy-katy,DaddyO’s Pizza - Katy,https://s3-media1.fl.yelpcdn.com/bphoto/Gy9djJ...,False,https://www.yelp.com/biz/daddyo-s-pizza-katy-k...,131,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 29.7288749792794, 'longitude': -9...","[pickup, delivery]",$,"{'address1': '6356 S Peek Rd', 'address2': 'St...",13463770777,(346) 377-0777,4396.158549
5,_WpiWEiXec2HsBaCDemq9Q,lomontes-italian-restaurant-and-pizzeria-richm...,Lomonte's Italian Restaurant and Pizzeria,https://s3-media2.fl.yelpcdn.com/bphoto/7MfgN7...,False,https://www.yelp.com/biz/lomontes-italian-rest...,510,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 29.624106367820428, 'longitude': ...","[pickup, delivery]",$$,"{'address1': '815 Plantation Dr', 'address2': ...",12812328290,(281) 232-8290,5904.518283
6,_UpQsv4jZdL0jv0eyKO93w,armani-pizza-and-pasta-katy,Armani pizza and pasta,https://s3-media1.fl.yelpcdn.com/bphoto/hzfxMS...,False,https://www.yelp.com/biz/armani-pizza-and-past...,2,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",5.0,"{'latitude': 29.703939043898767, 'longitude': ...","[pickup, delivery]",,"{'address1': '6868 S Mason Rd Katy Tx 77450', ...",12816764441,(281) 676-4441,3460.329886
7,SEMYGKDD8O0ErJ-E8d_sxw,center-court-pizza-and-brew-richmond-3,Center Court Pizza & Brew,https://s3-media2.fl.yelpcdn.com/bphoto/K6WuMA...,False,https://www.yelp.com/biz/center-court-pizza-an...,69,"[{'alias': 'chicken_wings', 'title': 'Chicken ...",4.0,"{'latitude': 29.651672764769724, 'longitude': ...","[pickup, delivery]",$$,"{'address1': '18320 W Airport Blvd', 'address2...",17132347120,(713) 234-7120,3521.05081
8,LCFDyfgqTcDVLN-Eywdf5g,crust-pizza-katy,Crust Pizza,https://s3-media1.fl.yelpcdn.com/bphoto/zUQKtz...,False,https://www.yelp.com/biz/crust-pizza-katy?adju...,97,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.0,"{'latitude': 29.73666847855678, 'longitude': -...",[],,"{'address1': '9920 Gaston Rd', 'address2': 'St...",12817699920,(281) 769-9920,10721.494558
9,7H49Q_dKD5QL_CZivZQV7A,marcos-pizza-richmond,Marco's Pizza,https://s3-media4.fl.yelpcdn.com/bphoto/NX4PSU...,False,https://www.yelp.com/biz/marcos-pizza-richmond...,63,"[{'alias': 'pizza', 'title': 'Pizza'}]",3.5,"{'latitude': 29.67333293426642, 'longitude': -...","[pickup, delivery]",$$,"{'address1': '7101 W Grand Pkwy S', 'address2'...",12812392200,(281) 239-2200,2516.831356


- Where is the actual data we want to save?

In [23]:
results_per_page = len(results['businesses'])
results_per_page

20

In [None]:
## How many did we get the details for?
results_per_page = None
results_per_page

- Calculate how many pages of results needed to cover the total_results

In [29]:
(results['total'])/ results_per_page

7.9

In [30]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((results['total'])/ results_per_page)
n_pages

8

In [31]:
for i in tqdm_notebook( range(1,n_pages+1)):
    ## The block of code we want to TRY to run
    try:
        
        time.sleep(.2)
        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)
        
        ## save number of results for to use as offset
        n_results = len(previous_results)
        
        
        ## use n_results as the OFFSET 
        results = yelp.search_query(location = location, term=term,
                                   offset = n_results+1)

        ## append new results and save to file
        previous_results.extend(results['businesses'])
        
        with open(JSON_FILE, 'w') as f:
            json.dump(previous_results, f)

            
    ## What to do if we get an error/exception.
    except Exception as e:
        print('[!] ERROR', e)


  0%|          | 0/8 [00:00<?, ?it/s]

## Open the Final JSON File with Pandas

In [32]:
df = pd.read_json(JSON_FILE)

In [33]:
df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,knuJk3YRhUgX-7NtGv3hKg,brooklyn-pizzeria-richmond,Brooklyn Pizzeria,https://s3-media2.fl.yelpcdn.com/bphoto/deghxC...,False,https://www.yelp.com/biz/brooklyn-pizzeria-ric...,193,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.0,"{'latitude': 29.6609608699916, 'longitude': -9...",[delivery],$,"{'address1': '7930 W Grand Pkwy S', 'address2'...",12812323333,(281) 232-3333,2290.908598
1,nZ9FWG7QlbE4bYCTFJs7DQ,cup-n-char-richmond,Cup N Char,https://s3-media3.fl.yelpcdn.com/bphoto/gx1e3d...,False,https://www.yelp.com/biz/cup-n-char-richmond?a...,18,"[{'alias': 'pizza', 'title': 'Pizza'}]",5.0,"{'latitude': 29.661013325930316, 'longitude': ...",[],,"{'address1': '10450 Fm 1464', 'address2': 'Ste...",12819407440,(281) 940-7440,4228.448569
2,RF0xAbGNKkTrIvlJwsLsLA,twisted-pizza-and-curries-richmond,Twisted Pizza & Curries,https://s3-media2.fl.yelpcdn.com/bphoto/XFDE4k...,False,https://www.yelp.com/biz/twisted-pizza-and-cur...,50,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 29.6542968, 'longitude': -95.7143...","[delivery, pickup]",,"{'address1': '11131 Harlem Rd', 'address2': 'S...",18325000023,(832) 500-0023,2746.648223
3,_WpiWEiXec2HsBaCDemq9Q,lomontes-italian-restaurant-and-pizzeria-richm...,Lomonte's Italian Restaurant and Pizzeria,https://s3-media2.fl.yelpcdn.com/bphoto/7MfgN7...,False,https://www.yelp.com/biz/lomontes-italian-rest...,510,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 29.624106367820428, 'longitude': ...","[delivery, pickup]",$$,"{'address1': '815 Plantation Dr', 'address2': ...",12812328290,(281) 232-8290,5904.518283
4,Zzn0kEXv88U24HGjvJSxGQ,daddyo-s-pizza-katy-katy,DaddyO’s Pizza - Katy,https://s3-media1.fl.yelpcdn.com/bphoto/Gy9djJ...,False,https://www.yelp.com/biz/daddyo-s-pizza-katy-k...,131,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 29.7288749792794, 'longitude': -9...","[delivery, pickup]",$,"{'address1': '6356 S Peek Rd', 'address2': 'St...",13463770777,(346) 377-0777,4396.158549


In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 157 entries, 0 to 156
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             157 non-null    object 
 1   alias          157 non-null    object 
 2   name           157 non-null    object 
 3   image_url      157 non-null    object 
 4   is_closed      157 non-null    bool   
 5   url            157 non-null    object 
 6   review_count   157 non-null    int64  
 7   categories     157 non-null    object 
 8   rating         157 non-null    float64
 9   coordinates    157 non-null    object 
 10  transactions   157 non-null    object 
 11  price          120 non-null    object 
 12  location       157 non-null    object 
 13  phone          157 non-null    object 
 14  display_phone  157 non-null    object 
 15  distance       157 non-null    float64
dtypes: bool(1), float64(2), int64(1), object(12)
memory usage: 18.7+ KB


In [35]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/Richmond-pizza.csv.gz'

In [36]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression = 'gzip', index= False)

## Bonus: compare filesize with os module's `os.path.getsize`

In [37]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(JSON_FILE.replace('.json','.csv.gz'))

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz} times smaller!')

JSON FILE: 154,333 Bytes
CSV.GZ FILE: 22,744 Bytes
the csv.gz is 6.7856577558916635 times smaller!


## Next Class: Processing the Results and Mapping 