Hana Nur

## Retrieving, cleaning, and analyzing Yelp data using the Yelp Developers API

Goals:
1. What is the average review rating for fast food restaurants in Washington, DC?
2. How does that compare to the average rating of McDonald's locations in DC?

### Setup

In [1]:
# imports
import pandas as pd
import json, requests, os, dotenv

In [2]:
# set env variables
print(dotenv.load_dotenv())
API_KEY = os.getenv('API_KEY')

True
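`dotenv.load_dotenv()` returns `True` when it finds and loads a `.env` file, but that alone does not guarantee that `API_KEY` was defined in it. A minimal sketch of a guard, assuming a hypothetical helper named `require_env` (not part of the original notebook):

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty.

    `require_env` is a hypothetical helper for illustration.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Demo with a throwaway variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "abc123"
print(require_env("DEMO_API_KEY"))
```

In the notebook this could replace the bare `os.getenv('API_KEY')` call, so a missing key fails here rather than as an authorization error from the API later.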


### Get Data

In [26]:
# look up the User-Agent string that requests sends by default
# (informational only; the header built below does not include it)
r = requests.get('https://httpbin.org/user-agent')
useragent = r.json()['user-agent']

def get(root, endpoint, headers, parameters, path, size=50):
    '''
    Make GET requests to the Yelp API for the given endpoint, paging through
    the results until the query is exhausted or the offset reaches 1000.

    INPUT:
        root (str): api root for request
        endpoint (str): api endpoint
        headers (dict): request header
        parameters (dict): request parameters
        path (str): JSON key holding the records used to build the pd.DataFrame
        size (int): size of each page; should match parameters['limit']

    RETURN:
        pd.DataFrame of request output
    '''
    params = dict(parameters)  # copy so the caller's dict is not mutated
    req = requests.get(root + endpoint, params=params, headers=headers)
    req.raise_for_status()  # fail early on auth or rate-limit errors
    j = req.json()
    frames = [pd.json_normalize(j, record_path=[path])]
    # ceiling division so a final partial page is not skipped
    pages = -(-j['total'] // size)
    for page in range(1, pages):
        offset = page * size
        if offset >= 1000:
            break
        params['offset'] = str(offset)
        req_temp = requests.get(root + endpoint, params=params, headers=headers)
        req_temp.raise_for_status()
        frames.append(pd.json_normalize(req_temp.json(), record_path=[path]))
    # row labels restart at 0 on every page, as in the output shown below
    return pd.concat(frames)
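
The paging arithmetic in `get` can be checked without calling the API. The sketch below uses a hypothetical helper, `page_offsets`, that lists the `offset` values a query would request, assuming the same cutoff of 1000 used above. It computes the page count with ceiling division; a page count computed with `round` would be rounded down whenever the last page is less than half full, which would skip that page.

```python
def page_offsets(total, size=50, max_offset=1000):
    """List the offsets needed to fetch `total` results in pages of `size`.

    Hypothetical helper; `max_offset` mirrors the 1000 cutoff in `get`.
    """
    pages = -(-total // size)  # ceiling division
    return [p * size for p in range(pages) if p * size < max_offset]

print(page_offsets(120))        # three pages: offsets 0, 50, 100
print(round(120 / 50))          # 2, so rounding would miss the third page
print(len(page_offsets(5000)))  # number of requests once the cutoff applies
```

With `size=50` the cutoff allows at most 20 requests, so the DataFrame can never hold more than 1000 rows regardless of the `total` the API reports.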

In [27]:
# make a request to the business search endpoint
# get data for all fast food businesses in Washington DC

root = 'https://api.yelp.com/v3'
endpoint = '/businesses/search'
header = {"accept": "application/json",
        'Authorization': f'Bearer {API_KEY}'}
params = {
    'location': 'Washington, DC',
    'term': 'fast food',
    'sort_by': 'distance',
    'limit': '50'
}
path = 'businesses'
df = get(root, endpoint, header, params, path)
df

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,transactions,...,coordinates.latitude,coordinates.longitude,location.address1,location.address2,location.address3,location.city,location.zip_code,location.country,location.state,location.display_address
0,tBuVOEsVRymPpn1grr0OSQ,chipotle-mexican-grill-washington-16,Chipotle Mexican Grill,https://s3-media2.fl.yelpcdn.com/bphoto/4Z6bco...,False,https://www.yelp.com/biz/chipotle-mexican-gril...,70,"[{'alias': 'mexican', 'title': 'Mexican'}, {'a...",2.1,"[delivery, pickup]",...,38.910016,-77.032254,1508 14th St NW,Floor,,"Washington, DC",20005,US,DC,"[1508 14th St NW, Floor, Washington, DC 20005]"
1,2aI5hO7hBotYjNZIiI1FdQ,yums-ii-carryout-washington,Yum's II Carryout,https://s3-media3.fl.yelpcdn.com/bphoto/146b2a...,False,https://www.yelp.com/biz/yums-ii-carryout-wash...,209,"[{'alias': 'chinese', 'title': 'Chinese'}, {'a...",3.2,[pickup],...,38.909464,-77.031757,1413 14th St NW,,,"Washington, DC",20005,US,DC,"[1413 14th St NW, Washington, DC 20005]"
2,3TUO2cd0EGC0IG3JRq7mrw,shake-shack-washington-8,Shake Shack 14th Street Logan Circle,https://s3-media1.fl.yelpcdn.com/bphoto/XIX8oj...,False,https://www.yelp.com/biz/shake-shack-washingto...,168,"[{'alias': 'hotdogs', 'title': 'Fast Food'}, {...",3.1,"[delivery, pickup]",...,38.908991,-77.032263,1400 14th St NW,,,"Washington, DC",20005,US,DC,"[1400 14th St NW, Washington, DC 20005]"
3,UaegiireX5X8m_PNFpLH9g,popeyes-louisiana-kitchen-washington-11,Popeyes Louisiana Kitchen,https://s3-media2.fl.yelpcdn.com/bphoto/n2zD6Q...,False,https://www.yelp.com/biz/popeyes-louisiana-kit...,111,"[{'alias': 'chicken_wings', 'title': 'Chicken ...",2.2,"[delivery, pickup]",...,38.908010,-77.032360,1322 14th St NW,,,"Washington, DC",20005,US,DC,"[1322 14th St NW, Washington, DC 20005]"
4,3zIz-vD2WIU0WhrBla01KA,mcdonalds-washington,McDonald's,https://s3-media2.fl.yelpcdn.com/bphoto/bTGzbS...,False,https://www.yelp.com/biz/mcdonalds-washington?...,69,"[{'alias': 'hotdogs', 'title': 'Fast Food'}, {...",1.9,[delivery],...,38.916855,-77.032181,1944 14th St NW,,,"Washington, DC",20009,US,DC,"[1944 14th St NW, Washington, DC 20009]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45,AVihKWsWaQAp0f2DPdXe4Q,zorbas-cafe-washington,Zorba's Cafe,https://s3-media3.fl.yelpcdn.com/bphoto/hnudHb...,False,https://www.yelp.com/biz/zorbas-cafe-washingto...,764,"[{'alias': 'desserts', 'title': 'Desserts'}, {...",3.8,"[delivery, pickup]",...,38.911690,-77.045170,1612 20th St NW,,,"Washington, DC",20009,US,DC,"[1612 20th St NW, Washington, DC 20009]"
46,HCbpbBUl3n5cApIzC8ODBg,the-well-dressed-burrito-washington,The Well Dressed Burrito,https://s3-media1.fl.yelpcdn.com/bphoto/hk91l2...,False,https://www.yelp.com/biz/the-well-dressed-burr...,353,"[{'alias': 'mexican', 'title': 'Mexican'}, {'a...",3.7,"[delivery, pickup]",...,38.906535,-77.043809,1220 19th St NW,,,"Washington, DC",20036,US,DC,"[1220 19th St NW, Washington, DC 20036]"
47,_9iGmwzbSvT8Km9RMet9OA,capitol-city-brewing-company-washington-3,Capitol City Brewing Company,https://s3-media4.fl.yelpcdn.com/bphoto/uFpR5Z...,False,https://www.yelp.com/biz/capitol-city-brewing-...,1274,"[{'alias': 'breweries', 'title': 'Breweries'},...",3.3,"[delivery, pickup]",...,38.900190,-77.027560,1100 New York Ave NW,,,"Washington, DC",20005,US,DC,"[1100 New York Ave NW, Washington, DC 20005]"
48,ZU6xnfItMhceZTvM1MFqlA,kochix-washington,KoChix,https://s3-media4.fl.yelpcdn.com/bphoto/982O66...,False,https://www.yelp.com/biz/kochix-washington?adj...,377,"[{'alias': 'korean', 'title': 'Korean'}, {'ali...",4.3,[pickup],...,38.913743,-77.016354,400 Florida Ave NW,,,"Washington, DC",20001,US,DC,"[400 Florida Ave NW, Washington, DC 20001]"
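
Column names such as `coordinates.latitude` and `location.city` come from `pd.json_normalize`, which flattens nested JSON objects into dot-separated columns while leaving lists (such as `categories`) as single cells. A small sketch on a made-up payload (the business below is invented for illustration and is not real API output):

```python
import pandas as pd

# Invented payload shaped like a business search response.
payload = {
    "total": 1,
    "businesses": [
        {
            "name": "Example Burger",
            "rating": 3.5,
            "categories": [{"alias": "burgers", "title": "Burgers"}],
            "coordinates": {"latitude": 38.9, "longitude": -77.03},
            "location": {"city": "Washington, DC", "state": "DC"},
        }
    ],
}

# record_path selects the list of records; nested dicts become dotted columns.
flat = pd.json_normalize(payload, record_path=["businesses"])
print(sorted(flat.columns))
```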


### Clean Data

In [28]:
df.isna().sum()

id                            0
alias                         0
name                          0
image_url                     0
is_closed                     0
url                           0
review_count                  0
categories                    0
rating                        0
transactions                  0
price                       197
phone                         0
display_phone                 0
distance                      0
coordinates.latitude          0
coordinates.longitude         0
location.address1             2
location.address2           206
location.address3           312
location.city                 0
location.zip_code             0
location.country              0
location.state                0
location.display_address      0
dtype: int64

In [30]:
# address2 and address3 are not needed for this analysis, so drop them
# (price and address1 still contain NA values but are not used below)
df = df.drop(['location.address2', 'location.address3'], axis=1)

In [40]:
df['location.city'].unique()

array(['Washington, DC', 'Washington, D.C.', 'Washington Dc',
       'Washington DC', 'Rosslyn', 'Arlington', 'Washington Navy Yard',
       'Fort Myers', 'Ft Myer', 'Takoma Park', 'Hyattsville',
       'Mount Rainier', 'Brentwood', 'Colmar Manor', 'Chevy Chase',
       'Silver Spring', 'Alexandria', 'Bethesda', 'Bladensburg',
       'Adelphi', 'Langley Park', 'Capitol Heights', 'Oxon Hill',
       'Riverdale', 'Hillcrest Heights', 'Falls Church',
       'Baileys Crossroads', 'Riverdale Park', 'Temple Hills',
       'Oxon HIll', 'Camp Springs', 'College Park', 'Forest Heights',
       'Mclean', 'Landover Hills', 'Suitland', 'Landover',
       'District Heights', 'Seat Pleasant', 'Marlow Heights',
       'Forestville', 'Capital Heights', 'McLean', 'National Harbor',
       'Milton', 'Kensington', 'Lanham', 'Wheaton', 'Greenbelt',
       'Berwyn Heights', 'New Carrollton', 'Annandale', 'Largo',
       'Fort Washington', 'Mc Lean', 'Beltsville', 'Rockville', 'Bowie',
       'Vienna', 'And

In [41]:
# limit results to Washington, DC
city_names = ['Washington, DC', 'Washington, D.C.', 'Washington Dc', 'Washington DC', 'Washington Navy Yard', 'Washington D.C.']
dc_df = df[df['location.city'].isin(city_names)]
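
Listing every spelling by hand works, but it can miss variants. One alternative, sketched here on invented rows, is to normalize the city string before comparing, or to filter on `location.state` instead. Whether the state field is reliable enough in the real data is an assumption that would need checking.

```python
import pandas as pd

# Invented rows that mimic the spelling variants seen above.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "location.city": ["Washington, DC", "Washington Dc", "Washington, D.C.",
                      "Arlington", "Silver Spring"],
    "location.state": ["DC", "DC", "DC", "VA", "MD"],
})

# Option 1: strip punctuation and case, then compare to one canonical form.
normalized = (toy["location.city"]
              .str.lower()
              .str.replace(r"[^a-z ]", "", regex=True)
              .str.strip())
by_city = toy[normalized == "washington dc"]

# Option 2: rely on the state code instead of the free-text city.
by_state = toy[toy["location.state"] == "DC"]

print(by_city["name"].tolist())
print(by_state["name"].tolist())
```

Option 1 would not catch a value like `Washington Navy Yard`, so it still needs a short allow-list for names that do not reduce to the canonical form.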

### Analyze Data

##### What is the average review rating for fast food restaurants in Washington, DC?

In [42]:
dc_df.rating.mean()

2.830722891566265

##### How does that compare to the average rating of McDonald's locations in DC?

In [45]:
dc_df['name'].value_counts()

name
Subway                     53
McDonald's                 24
Chipotle Mexican Grill     22
Potbelly Sandwich Shop     20
Chick-fil-A                12
                           ..
Panda of DC                 1
All About Burger            1
Mexican Grill & Cantina     1
Mike Carryout               1
KoChix                      1
Name: count, Length: 154, dtype: int64

In [47]:
dc_df[dc_df['name']=="McDonald's"].rating.mean()

2.158333333333333
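
Because the overall mean includes the McDonald's locations themselves, it can also be useful to compare McDonald's against every other restaurant. A sketch on invented ratings (the numbers below are made up and are not the notebook's data):

```python
import pandas as pd

# Invented ratings for illustration only.
toy = pd.DataFrame({
    "name": ["McDonald's", "McDonald's", "Subway", "KoChix"],
    "rating": [2.0, 2.5, 3.0, 4.5],
})

is_mcd = toy["name"] == "McDonald's"
mcd_mean = toy.loc[is_mcd, "rating"].mean()     # McDonald's only
other_mean = toy.loc[~is_mcd, "rating"].mean()  # everyone except McDonald's

print(round(other_mean - mcd_mean, 2))
```

Applied to `dc_df`, the same mask would give a gap at least as large as the one measured against the overall mean, since removing the lower-rated McDonald's rows can only raise the comparison average.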

### Results

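Overall, McDonald's locations in Washington, DC have an average Yelp rating of 2.16, about 0.67 stars lower than the 2.83 average across all fast food restaurants in the city (an average that itself includes the McDonald's locations).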