# Part 1 - Extracting and Saving Data from Yelp API

## Obective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/): 




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

In [4]:
#!pip install yelpapi

## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [2]:
# Load API Credentials
with open('/Users/jonme/.secret/yelp_api.json','r') as f:
    login = json.load(f)

In [3]:
login.keys()

dict_keys(['client-id', 'api-key'])

In [5]:
# Instantiate YelpAPI Variable
yelp = YelpAPI(login['api-key'],timeout_s=5.0)

### Define Search Terms and File Paths

In [6]:
# set our API call parameters and filename before the first call
location = 'Minneapolis, MN'
term = 'pizza'

In [7]:
location.split(',')[0]

'Minneapolis'

In [8]:
## Specify fodler for saving data
FOLDER ='Data/'

os.makedirs(FOLDER,exist_ok=True)

# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = FOLDER+f"{location.split(',')[0]}-{term}.json"

In [9]:
JSON_FILE

'Data/Minneapolis-pizza.json'

### Check if Json File exists and Create it if it doesn't

In [10]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)

## If it does not exist: 
if file_exists ==False:
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder,exist_ok=True)
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    
    ## save the first page of results
    with open(JSON_FILE,'w')as f:
        json.dump([],f)
        
## If it exists, inform user
else:
       print(f"[i] {JSON_FILE} already exists.")

[i] Data/Minneapolis-pizza.json not found. Saving empty list to file.


![png](Figures/question.png)

### Make the first API call to get the first page of data

- We will use this first result to check:
    - how many total results there are?
    - Where is the actual data we want to save?
    - how many results do we get at a time?


In [12]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp.search_query(term=term,location = location)

In [13]:
type(results)

dict

In [14]:
len(results)

3

In [15]:
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [16]:
results['total']

1700

In [17]:
results['region']

{'center': {'longitude': -93.26156616210938, 'latitude': 44.97044244293755}}

In [18]:
results['businesses']

[{'id': 'iH3FcZ4Xchx3s7fgIINaYg',
  'alias': 'pizzeria-lola-minneapolis',
  'name': 'Pizzeria Lola',
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/hnnHp28kNSXxQXr20bAIbQ/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/pizzeria-lola-minneapolis?adjust_creative=UC-A5xGcluPD3j5hJu71tw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=UC-A5xGcluPD3j5hJu71tw',
  'review_count': 1236,
  'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
  'rating': 4.5,
  'coordinates': {'latitude': 44.901815399420016,
   'longitude': -93.31847304108392},
  'transactions': ['delivery'],
  'price': '$$',
  'location': {'address1': '5557 Xerxes Ave S',
   'address2': '',
   'address3': '',
   'city': 'Minneapolis',
   'zip_code': '55410',
   'country': 'US',
   'state': 'MN',
   'display_address': ['5557 Xerxes Ave S', 'Minneapolis, MN 55410']},
  'phone': '+16124248338',
  'display_phone': '(612) 424-8338',
  'distance': 8848.533353433979},
 {'id': '5Y2nekkwRr

In [19]:

pd.DataFrame(results['businesses'])

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,iH3FcZ4Xchx3s7fgIINaYg,pizzeria-lola-minneapolis,Pizzeria Lola,https://s3-media2.fl.yelpcdn.com/bphoto/hnnHp2...,False,https://www.yelp.com/biz/pizzeria-lola-minneap...,1236,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.5,"{'latitude': 44.901815399420016, 'longitude': ...",[delivery],$$,"{'address1': '5557 Xerxes Ave S', 'address2': ...",16124248338.0,(612) 424-8338,8848.533353
1,5Y2nekkwRrNMMSx3b-2EhA,young-joni-minneapolis,Young Joni,https://s3-media2.fl.yelpcdn.com/bphoto/0Js61i...,False,https://www.yelp.com/biz/young-joni-minneapoli...,909,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 45.00123, 'longitude': -93.26664}",[delivery],$$,"{'address1': '165 13th Ave NE', 'address2': ''...",16123455719.0,(612) 345-5719,3427.201161
2,H16j-AnU6QNcuNFgPNt2Sw,pizza-lucé-downtown-minneapolis,Pizza Lucé Downtown,https://s3-media2.fl.yelpcdn.com/bphoto/J48o_d...,False,https://www.yelp.com/biz/pizza-luc%C3%A9-downt...,680,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.0,"{'latitude': 44.9817564, 'longitude': -93.2735...",[delivery],$$,"{'address1': '119 N 4th St', 'address2': '', '...",16123337359.0,(612) 333-7359,1572.140788
3,Z9UFvaj_zeaVSg4Ta8D9qQ,red-rabbit-minneapolis,Red Rabbit,https://s3-media3.fl.yelpcdn.com/bphoto/q_0dLY...,False,https://www.yelp.com/biz/red-rabbit-minneapoli...,744,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",4.5,"{'latitude': 44.983694, 'longitude': -93.271747}",[delivery],$$,"{'address1': '201 N Washington Ave', 'address2...",16127678855.0,(612) 767-8855,1677.052018
4,_-DCHZV7n1LzeD6cFVpebQ,og-zaza-roseville,OG Zaza,https://s3-media2.fl.yelpcdn.com/bphoto/4MNgRA...,False,https://www.yelp.com/biz/og-zaza-roseville?adj...,7,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 45.01285904350727, 'longitude': -...",[],,"{'address1': '1595 MN-36', 'address2': 'Ste 10...",16512075746.0,(651) 207-5746,8575.524206
5,khPTVPMHdaAUNOX1LVnBng,bricksworth-beer-co-minneapolis,Bricksworth Beer Co,https://s3-media3.fl.yelpcdn.com/bphoto/mRsAED...,False,https://www.yelp.com/biz/bricksworth-beer-co-m...,4,"[{'alias': 'brewpubs', 'title': 'Brewpubs'}, {...",4.5,"{'latitude': 44.9848362, 'longitude': -93.2761...",[],,"{'address1': '305 N 5th Ave', 'address2': 'Ste...",19526575236.0,(952) 657-5236,1970.249686
6,CSmr2eaOX3SDsL5htqIynw,boludo-minneapolis-4,Boludo,https://s3-media1.fl.yelpcdn.com/bphoto/pSvuJR...,False,https://www.yelp.com/biz/boludo-minneapolis-4?...,105,"[{'alias': 'argentine', 'title': 'Argentine'},...",4.5,"{'latitude': 44.976956, 'longitude': -93.26202}",[delivery],,"{'address1': '530 S 4th St', 'address2': None,...",16124463833.0,(612) 446-3833,703.918303
7,OXy8tz9HWiOK_fbh29TEGg,boludo-minneapolis-3,Boludo,https://s3-media1.fl.yelpcdn.com/bphoto/tgxeHC...,False,https://www.yelp.com/biz/boludo-minneapolis-3?...,321,"[{'alias': 'empanadas', 'title': 'Empanadas'},...",4.5,"{'latitude': 44.93427172240348, 'longitude': -...",[delivery],$$,"{'address1': '8 W 38th St', 'address2': '', 'a...",16123535574.0,(612) 353-5574,4236.897521
8,yUO_BhkMsv-oJdJDdi3UGw,wrecktangle-pizza-minneapolis,Wrecktangle Pizza,https://s3-media3.fl.yelpcdn.com/bphoto/7qmBev...,False,https://www.yelp.com/biz/wrecktangle-pizza-min...,99,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.5,"{'latitude': 44.98795, 'longitude': -93.27803}","[delivery, pickup]",$$,"{'address1': '729 N Washington Ave', 'address2...",16126956525.0,(612) 695-6525,2340.875059
9,BzB0KtBMoml85z9FEHtZzw,italian-butcher-savage,Italian Butcher,https://s3-media2.fl.yelpcdn.com/bphoto/RFgr95...,False,https://www.yelp.com/biz/italian-butcher-savag...,2,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 44.7425923944024, 'longitude': -9...",[delivery],,"{'address1': '14425 State Hwy 13', 'address2':...",19528554665.0,(952) 855-4665,26976.103221


In [None]:
## How many results total?


- Where is the actual data we want to save?

In [20]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

20

- Calculate how many pages of results needed to cover the total_results

In [21]:
# Import additional packages for controlling our loop
import time, math
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((results['total'])/ results_per_page)
n_pages

85

In [22]:
for i in tqdm_notebook( range(1,n_pages+1)):
   # The block of code we want to TRY to run
    try:
        
        time.sleep(.2)
        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)
            
        ## save number of results for to use as offset
        n_results = len(previous_results)
        
        
        ## use n_results as the OFFSET 
        results = yelp.search_query(location=location,
                                        term=term, 
                                        offset=n_results+1)

        ## append new results and save to file
        previous_results.extend(results['businesses'])

        with open(JSON_FILE,'w') as f:
            json.dump(previous_results,f)
            
    #What to do if we get an error/exception.
    except Exception as e: # saving the error message so we can print it.
        print('[!] ERROR: ',e)

  0%|          | 0/85 [00:00<?, ?it/s]

[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: Too many results requested, limit+offset must be <= 1000.
[!] ERROR:  VALIDATION_ERROR: To

![png](Figures/Poll_531.png)

## Open the Final JSON File with Pandas

In [24]:
df = pd.read_json(JSON_FILE)

In [25]:
df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,5Y2nekkwRrNMMSx3b-2EhA,young-joni-minneapolis,Young Joni,https://s3-media2.fl.yelpcdn.com/bphoto/0Js61i...,False,https://www.yelp.com/biz/young-joni-minneapoli...,909,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 45.00123, 'longitude': -93.26664}",[delivery],$$,"{'address1': '165 13th Ave NE', 'address2': ''...",16123455719,(612) 345-5719,3427.201161
1,H16j-AnU6QNcuNFgPNt2Sw,pizza-lucé-downtown-minneapolis,Pizza Lucé Downtown,https://s3-media2.fl.yelpcdn.com/bphoto/J48o_d...,False,https://www.yelp.com/biz/pizza-luc%C3%A9-downt...,680,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.0,"{'latitude': 44.9817564, 'longitude': -93.2735...",[delivery],$$,"{'address1': '119 N 4th St', 'address2': '', '...",16123337359,(612) 333-7359,1572.140788
2,Z9UFvaj_zeaVSg4Ta8D9qQ,red-rabbit-minneapolis,Red Rabbit,https://s3-media3.fl.yelpcdn.com/bphoto/q_0dLY...,False,https://www.yelp.com/biz/red-rabbit-minneapoli...,744,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",4.5,"{'latitude': 44.983694, 'longitude': -93.271747}",[delivery],$$,"{'address1': '201 N Washington Ave', 'address2...",16127678855,(612) 767-8855,1677.052018
3,_-DCHZV7n1LzeD6cFVpebQ,og-zaza-roseville,OG Zaza,https://s3-media2.fl.yelpcdn.com/bphoto/4MNgRA...,False,https://www.yelp.com/biz/og-zaza-roseville?adj...,7,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 45.01285904350727, 'longitude': -...",[],,"{'address1': '1595 MN-36', 'address2': 'Ste 10...",16512075746,(651) 207-5746,8575.524206
4,khPTVPMHdaAUNOX1LVnBng,bricksworth-beer-co-minneapolis,Bricksworth Beer Co,https://s3-media3.fl.yelpcdn.com/bphoto/mRsAED...,False,https://www.yelp.com/biz/bricksworth-beer-co-m...,4,"[{'alias': 'brewpubs', 'title': 'Brewpubs'}, {...",4.5,"{'latitude': 44.9848362, 'longitude': -93.2761...",[],,"{'address1': '305 N 5th Ave', 'address2': 'Ste...",19526575236,(952) 657-5236,1970.249686


In [26]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/Minneapolis-pizza.csv.gz'

In [27]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression='gzip', index=False)

## Bonus: compare filesize with os module's `os.path.getsize`

In [28]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(JSON_FILE.replace('.json','.csv.gz'))

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz} times smaller!')

JSON FILE: 961,136 Bytes
CSV.GZ FILE: 144,664 Bytes
the csv.gz is 6.643919703589006 times smaller!


## Next Class: Processing the Results and Mapping 