# Mapping Yelp Search Results - Part 1

- 04/26/22


## Objective

- For this CodeAlong, we will be working with the Yelp API.
- You will use the Yelp API to search your hometown for a cuisine type of your choice.
- Next class, we will use Plotly Express with the Mapbox API to create a map that visualizes the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` Python package
        - "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/)




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to see which arguments the business search accepts: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [2]:
# Load API Credentials
key_path = '../../.secret/yelp_api.json'
with open(key_path, 'r') as f:
    login = json.load(f)
login.keys()

dict_keys(['client-id', 'api-key'])
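The output above shows that the credentials file is a JSON object with `client-id` and `api-key` keys. As a minimal sketch, the loading step could be wrapped in a helper that fails early with a clear message when a key is missing. The name `load_credentials` and its `required` parameter are hypothetical and not part of the `yelpapi` package.

```python
import json

def load_credentials(path, required=("client-id", "api-key")):
    """Load a JSON credentials file and confirm the expected keys are present."""
    with open(path, "r") as f:
        creds = json.load(f)
    # Collect any expected keys that the file does not contain
    missing = [key for key in required if key not in creds]
    if missing:
        raise KeyError(f"{path} is missing required keys: {missing}")
    return creds
```

With a helper like this, `login = load_credentials(key_path)` would behave like the cell above, but a misnamed key would surface here rather than as an authentication error later.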

In [3]:
# Instantiate YelpAPI Variable
yelp = YelpAPI(login['api-key'], timeout_s=3)
yelp

<yelpapi.yelpapi.YelpAPI at 0x29293c71608>

### Define Search Terms and File Paths

In [4]:
# Set our API call parameters before the first call
LOCATION = "Charlottesville, VA"
TERM = "Brewery"

In [5]:
## Specify folder for saving data
FOLDER = "Data/"
os.makedirs(FOLDER, exist_ok=True)

In [6]:
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = f"{FOLDER}{TERM}{LOCATION.split(',')[0]}.json"
JSON_FILE

'Data/BreweryCharlottesville.json'
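The f-string above works for a one-word term and city, but a term such as "Ice Cream" or a city such as "New York" would put spaces in the filename. One possible sketch of a helper that builds the same name and strips spaces is shown here; `make_json_path` is a hypothetical name introduced only for illustration.

```python
import os

def make_json_path(folder, term, location):
    """Build '<folder>/<term><city>.json', removing spaces from the file name."""
    # Keep only the city portion of "City, ST"
    city = location.split(",")[0]
    stem = f"{term}{city}".replace(" ", "")
    return os.path.join(folder, f"{stem}.json")

print(make_json_path("Data/", "Brewery", "Charlottesville, VA"))
print(make_json_path("Data/", "Ice Cream", "New York, NY"))
```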

### Check if JSON File Exists and Create It if It Doesn't

In [7]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)
## If it does not exist: 
if not file_exists:
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder,exist_ok=True)
                
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    with open(JSON_FILE, 'w') as f:
        json.dump([],f)
    
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists")

[i] Data/BreweryCharlottesville.json already exists
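Because this check is repeated for every new search, it may be convenient to wrap it in a function. The sketch below is one way to do that. The function name `create_json_file` and the `delete_if_exists` option are assumptions added for illustration and do not appear in the cell above.

```python
import json
import os

def create_json_file(json_file, delete_if_exists=False):
    """Make sure json_file exists, starting new files as an empty JSON list.

    Returns True if a new file was written, False if an existing file was kept.
    """
    if os.path.isfile(json_file):
        if not delete_if_exists:
            print(f"[i] {json_file} already exists.")
            return False
        print(f"[!] {json_file} already exists. Deleting previous file...")
        os.remove(json_file)

    # Create any needed folders (dirname is '' when no folder was given)
    folder = os.path.dirname(json_file)
    if folder:
        os.makedirs(folder, exist_ok=True)

    print(f"[i] Saving empty list to {json_file}.")
    with open(json_file, "w") as f:
        json.dump([], f)
    return True
```

Calling `create_json_file(JSON_FILE)` would keep any saved results, while `delete_if_exists=True` would start the search over from an empty file.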


### Load JSON File and Account for Previous Results

In [8]:
## Load previous results and use len of results for offset
with open(JSON_FILE, 'r') as f:
    previous_results = json.load(f)
    
## set offset based on previous results
n_results = len(previous_results)
print(f' - {n_results} previous results found.')

 - 55 previous results found.


### Make the first API call to get the first page of data

- We will use this first result to check:
    - How many total results are there?
    - Where is the actual data we want to save?
    - How many results do we get at a time?


In [9]:
# Use our yelp variable's search_query method to perform our API call
results = yelp.search_query(term = TERM, location = LOCATION)
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [10]:
## How many results total?
results['total']

76

- Where is the actual data we want to save?

In [11]:
type(results['businesses'])

list

In [12]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

20

- Calculate how many pages of results are needed to cover the total number of results.

In [13]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil(results['total']/results_per_page)
n_pages

4
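As a worked check of the arithmetic, the values printed above (76 total results, 20 per page) give 76 / 20 = 3.8, which rounds up to 4 pages. The sketch below also estimates how many pages are still needed when some results are already saved. The `remaining_pages` calculation is an illustrative addition, and it uses the 55 previous results reported earlier.

```python
import math

total = 76             # results['total'] from the output above
results_per_page = 20  # len(results['businesses'])
n_previous = 55        # results already saved in the JSON file

# Pages needed to cover every result from scratch
n_pages = math.ceil(total / results_per_page)

# Pages still needed after accounting for saved results (never negative)
remaining_pages = math.ceil(max(total - n_previous, 0) / results_per_page)

print(n_pages, remaining_pages)
```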

In [14]:
for i in tqdm_notebook( range(0,n_pages+1)):
    ## The block of code we want to TRY to run
    try:
        time.sleep(.2)        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)

        ## set offset based on previous results
        n_results = len(previous_results)
        
        ## use n_results as the OFFSET (the number of results to skip);
        ## adding 1 would skip one business at every page boundary
        results = yelp.search_query(term=TERM,
                                   location=LOCATION,
                                   offset=n_results)

        ## append new results and save to file
        previous_results.extend(results['businesses'])
        
        with open(JSON_FILE, 'w') as f:
            json.dump(previous_results, f)
            
    ## What to do if we get an error/exception.
    except Exception as e:
        print(e)
        break


  0%|          | 0/5 [00:00<?, ?it/s]
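To see how offset-based paging behaves without spending API calls, the loop can be rehearsed against a stand-in function. This is a sketch under the assumption that `offset` means "number of results to skip". `fake_search` is a hypothetical substitute for `yelp.search_query` and returns made-up ids, not real Yelp data.

```python
def fake_search(offset, limit=20, total=76):
    """Hypothetical stand-in for an API call: returns one page of fake businesses."""
    ids = range(offset, min(offset + limit, total))
    return {"total": total, "businesses": [{"id": f"biz-{i}"} for i in ids]}

collected = []
while True:
    # The offset is the number of results already collected
    page = fake_search(offset=len(collected))
    if not page["businesses"]:
        break  # nothing new came back, so stop
    collected.extend(page["businesses"])
    if len(collected) >= page["total"]:
        break  # every result has been retrieved

# Number collected and number of unique ids should match
print(len(collected), len({b["id"] for b in collected}))
```

Stopping when a page comes back empty, or when the collected count reaches `total`, avoids making extra calls once all results are saved.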

## Open the Final JSON File with Pandas

In [15]:
df = pd.read_json(JSON_FILE)
df

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,mr7C4suIQWIU35BL1SgZCA,random-row-brewing-charlottesville,Random Row Brewing,https://s3-media4.fl.yelpcdn.com/bphoto/FUkM4k...,False,https://www.yelp.com/biz/random-row-brewing-ch...,72,"[{'alias': 'breweries', 'title': 'Breweries'}]",4.0,"{'latitude': 38.0350243, 'longitude': -78.4864...",[],$,"{'address1': '608 Preston Ave', 'address2': ''...",14342848466,(434) 284-8466,1312.456426
1,asAthOuraDc_NxBY6Ne9lA,south-street-brewery-charlottesville,South Street Brewery,https://s3-media2.fl.yelpcdn.com/bphoto/QUbgUs...,False,https://www.yelp.com/biz/south-street-brewery-...,328,"[{'alias': 'breweries', 'title': 'Breweries'},...",4.0,"{'latitude': 38.029338, 'longitude': -78.4826883}",[delivery],$$,"{'address1': '106 W South St', 'address2': '',...",14342936550,(434) 293-6550,1766.043547
2,lFnfvVdrkr5ZCNAiqt-Tpg,starr-hill-brewery-charlottesville,Starr Hill Brewery,https://s3-media1.fl.yelpcdn.com/bphoto/UOJDRu...,False,https://www.yelp.com/biz/starr-hill-brewery-ch...,10,"[{'alias': 'breweries', 'title': 'Breweries'}]",5.0,"{'latitude': 38.03777667, 'longitude': -78.491...",[],,"{'address1': '946 Grady Ave', 'address2': None...",14348235671,(434) 823-5671,1392.568242
3,1263xqcmsqqUURfrp427HQ,selvedge-brewing-charlottesville-2,Selvedge Brewing,https://s3-media2.fl.yelpcdn.com/bphoto/Gy9bmy...,False,https://www.yelp.com/biz/selvedge-brewing-char...,72,"[{'alias': 'brewpubs', 'title': 'Brewpubs'}, {...",4.0,"{'latitude': 38.01952965628474, 'longitude': -...",[],,"{'address1': '1837 Broadway St', 'address2': '...",14342700555,(434) 270-0555,3425.095608
4,VnCEHYb3tHaLYdLRIqHcDQ,three-notchd-craft-kitchen-and-brewery-charlot...,Three Notch'd Craft Kitchen & Brewery,https://s3-media3.fl.yelpcdn.com/bphoto/T01te9...,False,https://www.yelp.com/biz/three-notchd-craft-ki...,204,"[{'alias': 'newamerican', 'title': 'American (...",3.5,"{'latitude': 38.0261135531783, 'longitude': -7...",[delivery],$$,"{'address1': '520 2nd St SE', 'address2': None...",14349563141,(434) 956-3141,2108.785792
5,DmRSBrzA1cX-NeXSrqG_GQ,champion-brewing-charlottesville,Champion Brewing,https://s3-media4.fl.yelpcdn.com/bphoto/M-xuHZ...,False,https://www.yelp.com/biz/champion-brewing-char...,75,"[{'alias': 'breweries', 'title': 'Breweries'}]",4.0,"{'latitude': 38.0277683, 'longitude': -78.4784...",[],$$,"{'address1': '324 6th St SE', 'address2': '', ...",14342952739,(434) 295-2739,1894.850786
6,2jkDXrC9y0AS84fVel7qAw,reason-beer-charlottesville,Reason Beer,https://s3-media2.fl.yelpcdn.com/bphoto/2MJVa0...,False,https://www.yelp.com/biz/reason-beer-charlotte...,16,"[{'alias': 'breweries', 'title': 'Breweries'}]",4.5,"{'latitude': 38.0703773245881, 'longitude': -7...",[],$$,"{'address1': '1180 Seminole Trl', 'address2': ...",14342600145,(434) 260-0145,2985.209247
7,aghKYhXjMBiM7vrZldGZIg,rockfish-brewing-company-charlottesville,Rockfish Brewing Company,https://s3-media2.fl.yelpcdn.com/bphoto/K0ubdo...,False,https://www.yelp.com/biz/rockfish-brewing-comp...,23,"[{'alias': 'breweries', 'title': 'Breweries'},...",4.0,"{'latitude': 38.037279, 'longitude': -78.489467}",[],,"{'address1': '900 Preston Ave', 'address2': ''...",14345660969,(434) 566-0969,1301.019922
8,49X75FwJAnJvt0WTGgRrYw,kardinal-hall-charlottesville,Kardinal Hall,https://s3-media4.fl.yelpcdn.com/bphoto/N-PJMO...,False,https://www.yelp.com/biz/kardinal-hall-charlot...,207,"[{'alias': 'beergardens', 'title': 'Beer Garde...",4.0,"{'latitude': 38.03621156211071, 'longitude': -...",[delivery],$$,"{'address1': '722 Preston Ave', 'address2': 'S...",14342954255,(434) 295-4255,1283.554013
9,K759d2qGTh_IQhaEwBxNKQ,north-american-sake-brewery-charlottesville-2,North American Sake Brewery,https://s3-media1.fl.yelpcdn.com/bphoto/aC9myA...,False,https://www.yelp.com/biz/north-american-sake-b...,73,"[{'alias': 'breweries', 'title': 'Breweries'},...",4.0,"{'latitude': 38.0259306513344, 'longitude': -7...",[pickup],$$,"{'address1': '522 2nd St SE', 'address2': '', ...",14347678105,(434) 767-8105,2127.053892


In [16]:
df.duplicated(subset='id').sum()

0
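The check above found no duplicated ids. If an interrupted run ever saved overlapping pages, the raw list could be cleaned before it is written back to the JSON file. The helper below is a minimal sketch using only the standard library, and `dedupe_by_id` is a hypothetical name. In pandas, `df.drop_duplicates(subset='id')` serves the same purpose.

```python
def dedupe_by_id(businesses):
    """Keep the first occurrence of each business id, preserving order."""
    seen = set()
    unique = []
    for biz in businesses:
        if biz["id"] not in seen:
            seen.add(biz["id"])
            unique.append(biz)
    return unique

sample = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
print(dedupe_by_id(sample))
```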

In [17]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/BreweryCharlottesville.csv.gz'

In [18]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression='gzip', index = False)

## Bonus: Compare file sizes with the os module's `os.path.getsize`

In [19]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(csv_file)

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz:.2f} times smaller!')

JSON FILE: 54,974 Bytes
CSV.GZ FILE: 8,537 Bytes
the csv.gz is 6.44 times smaller!
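Much of the saving likely comes from gzip compression, which works well on repetitive text such as repeated keys and URL prefixes. The sketch below illustrates the effect in memory with the standard library's `gzip` module. The records are synthetic and only mimic the shape of the Yelp results.

```python
import gzip
import json

# Synthetic, highly repetitive records (not real Yelp data)
records = [
    {"id": f"biz-{i}", "categories": [{"alias": "breweries", "title": "Breweries"}]}
    for i in range(200)
]

raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw:  {len(raw):,} bytes")
print(f"gzip: {len(compressed):,} bytes")

# Compression is lossless: decompressing returns the original bytes
restored = gzip.decompress(compressed)
```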


## Next Class: Processing the Results and Mapping 