Hana Nur

## Retrieving, cleaning, and analyzing Yelp data using the Yelp Developers API.

Goals:
1. What is the average review rating for fast food restaurants in Washington, DC?
2. How does that compare to the average rating of McDonald's locations in DC?

### Setup

In [103]:
# imports
import pandas as pd
import json, requests, os, dotenv

In [14]:
# set env variables
print(dotenv.load_dotenv())
API_KEY = os.getenv('API_KEY')

True


### Get Data

In [137]:
# get useragent info for header
r = requests.get('https://httpbin.org/user-agent')
useragent = json.loads(r.text)['user-agent']

def get(root, endpoint, headers, parameters, path, size=50):
    '''
    Make get request to Yelp API for given endpoint. Will retrieve all data from
    query.

    INPUT:
        root (str): api root for request
        endpoint (str): api endpoint
        headers (dict): request header
        parameters (dict): request parameters
        path (dict): JSON path for data to create pd.DataFrame
        size (int): size of each page or data limit

    RETURN:
        pd.DataFrame of request output
    '''
    req = requests.get(root + endpoint, params=parameters, headers=headers)
    j = json.loads(req.text)
    df = pd.json_normalize(j, record_path=[path])
    if j['total'] > size:
        pages = round(j['total']/size)
        for page in range(1, pages):
            headers['offset'] = str(page*size)
            headers['offset']
            req = requests.get(root + endpoint, params=parameters, headers=headers)
            j = json.loads(req.text)
            temp = pd.json_normalize(j, record_path=[path])
            print(temp.name[0], headers['offset'])
            df = pd.concat([df, temp])
    return df

In [138]:
# make a request to the business search endpoint
# get data for all fast food businesses in Washington DC

root = 'https://api.yelp.com/v3'
endpoint = '/businesses/search'
header = {'User-Agent': useragent,
        'Authorization': f'Bearer {API_KEY}',
        'sort_by': 'Distance',
        'limit': '50'}
params = {
    'location': 'Washington, DC',
    'term': 'fast food'
}
path = 'businesses'
df = get(root, endpoint, header, params, path)
df

Raising Cane's 50
Raising Cane's 100
Raising Cane's 150


KeyError: "Key 'businesses' not found. If specifying a record_path, all elements of data should have the path."

In [132]:
df.head(21)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,transactions,...,coordinates.longitude,location.address1,location.address2,location.address3,location.city,location.zip_code,location.country,location.state,location.display_address,price
0,O_UtaytkTgvUfkY1Q8MqTg,raising-canes-washington,Raising Cane's,https://s3-media1.fl.yelpcdn.com/bphoto/V3aZXH...,False,https://www.yelp.com/biz/raising-canes-washing...,14,"[{'alias': 'hotdogs', 'title': 'Fast Food'}, {...",4.1,[],...,-77.00621,50 Massachusetts Ave NE,,Union Station,"Washington, DC",20002,US,DC,"[50 Massachusetts Ave NE, Union Station, Washi...",
1,gFxIqs_GjtqUfqg812ZA1Q,honeymoon-chicken-washington,Honeymoon Chicken,https://s3-media2.fl.yelpcdn.com/bphoto/qhRM9Q...,False,https://www.yelp.com/biz/honeymoon-chicken-was...,380,"[{'alias': 'chickenshop', 'title': 'Chicken Sh...",4.0,"[pickup, delivery]",...,-77.025307,4201 Georgia Ave NW,,,"Washington, DC",20011,US,DC,"[4201 Georgia Ave NW, Washington, DC 20011]",$$
2,Hsv9yqJ8lOY3hP-VeQP0EQ,surfside-dupont-washington,Surfside Dupont,https://s3-media2.fl.yelpcdn.com/bphoto/z7MY5M...,False,https://www.yelp.com/biz/surfside-dupont-washi...,645,"[{'alias': 'tex-mex', 'title': 'Tex-Mex'}, {'a...",4.0,[delivery],...,-77.042319,1800 N St NW,,,"Washington, DC",20036,US,DC,"[1800 N St NW, Washington, DC 20036]",$$
3,10-zwHrW7F0ckvYd_Fni0w,swizzler-washington-4,Swizzler,https://s3-media1.fl.yelpcdn.com/bphoto/k4XKs4...,False,https://www.yelp.com/biz/swizzler-washington-4...,158,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",4.5,"[pickup, delivery]",...,-77.006144,1259 1st St SE,,,"Washington, DC",20003,US,DC,"[1259 1st St SE, Washington, DC 20003]",$$
4,ZbVNADWUpTAjbB79qeXtwg,service-bar-washington,Service Bar,https://s3-media3.fl.yelpcdn.com/bphoto/l-oY-D...,False,https://www.yelp.com/biz/service-bar-washingto...,283,"[{'alias': 'cocktailbars', 'title': 'Cocktail ...",4.3,"[pickup, delivery]",...,-77.02501,928 U St NW,,,"Washington, DC",20001,US,DC,"[928 U St NW, Washington, DC 20001]",$$
5,WTA0ediIX-3Vyyua6EWdSQ,man-vs-fries-washington-d-c,Man vs Fries,https://s3-media1.fl.yelpcdn.com/bphoto/gfzQmo...,False,https://www.yelp.com/biz/man-vs-fries-washingt...,18,"[{'alias': 'fooddeliveryservices', 'title': 'F...",2.6,"[pickup, delivery]",...,-77.02118,631 P St NW,,,"Washington, D.C.",20001,US,DC,"[631 P St NW, Washington, D.C., DC 20001]",
6,2aI5hO7hBotYjNZIiI1FdQ,yums-ii-carryout-washington,Yum's II Carryout,https://s3-media3.fl.yelpcdn.com/bphoto/146b2a...,False,https://www.yelp.com/biz/yums-ii-carryout-wash...,208,"[{'alias': 'chinese', 'title': 'Chinese'}, {'a...",3.2,[pickup],...,-77.031757,1413 14th St NW,,,"Washington, DC",20005,US,DC,"[1413 14th St NW, Washington, DC 20005]",$
7,iqV-9rIC-azTw9OQQc8Vhw,district-taco-washington-6,District Taco,https://s3-media3.fl.yelpcdn.com/bphoto/EHCDWb...,False,https://www.yelp.com/biz/district-taco-washing...,1794,"[{'alias': 'mexican', 'title': 'Mexican'}]",3.8,"[pickup, delivery]",...,-77.030135,1309 F St NW,,,"Washington, DC",20004,US,DC,"[1309 F St NW, Washington, DC 20004]",$$
8,mhAC6aVjmMI1G5UNPk-tfA,bub-and-pops-washington,Bub and Pop's,https://s3-media3.fl.yelpcdn.com/bphoto/CbVBw_...,False,https://www.yelp.com/biz/bub-and-pops-washingt...,1015,"[{'alias': 'sandwiches', 'title': 'Sandwiches'...",4.4,"[pickup, delivery]",...,-77.042368,1815 M St NW,,,"Washington, DC",20036,US,DC,"[1815 M St NW, Washington, DC 20036]",$$
9,bCQEdpKSY563iAzq7cMcGA,the-capital-burger-washington-6,The Capital Burger,https://s3-media1.fl.yelpcdn.com/bphoto/7NfKT0...,False,https://www.yelp.com/biz/the-capital-burger-wa...,886,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",4.3,[delivery],...,-77.021605,1005 7th St NW,,,"Washington, DC",20001,US,DC,"[1005 7th St NW, Washington, DC 20001]",$$


### Clean Data

In [113]:
df.isna().sum()

id                            0
alias                         0
name                          0
image_url                     0
is_closed                     0
url                           0
review_count                  0
categories                    0
rating                        0
transactions                  0
phone                         0
display_phone                 0
distance                      0
coordinates.latitude          0
coordinates.longitude         0
location.address1             0
location.address2           252
location.address3           360
location.city                 0
location.zip_code             0
location.country              0
location.state                0
location.display_address      0
price                       216
dtype: int64

In [117]:
# address2 and address3 are unnecessary
# dropping cols to remove NA values
df = df.drop(['location.address2', 'location.address3'], axis=1)

### Analyze Data

##### What is the average review rating for fast food restaurants in Washington, DC?

In [121]:
df.rating.mean()

4.0

##### How does that compare to the average rating of McDonald's locations in DC?

In [128]:
df['name'].value_counts()

name
Raising Cane's                     36
Honeymoon Chicken                  36
Chicken + Whiskey                  36
Raising Cane's Chicken Fingers     36
All About Burger Chinatown         36
Aladdin's Kitchen                  36
Roaming Rooster                    36
Nando's Peri-Peri                  36
Eat Brgz                           36
Astro Doughnuts & Fried Chicken    36
Bub and Pop's                      36
The Capital Burger                 36
District Taco                      36
Yum's II Carryout                  36
Man vs Fries                       36
Z-Burger                           36
Service Bar                        36
Swizzler                           36
Surfside Dupont                    36
Chaia                              36
Name: count, dtype: int64