# Yelp API Pulled Data Initial Cleaning and Merge (ETL)

This notebook cleans the data pulled from the Yelp Fusion API and merges the 5 per-borough csvs into one file for Yelp data transformation and feature engineering.

#### Extract:
New York City (NYC) has 5 boroughs: Manhattan, Bronx, Brooklyn, Queens and Staten Island. The restaurant data for the 5 boroughs was extracted per zip code through an API pull from Yelp Fusion.
**Code for API pull:** Yelp_Fusion_API.ipynb

**Data Source for Yelp API Clean and Merge:** 5 pickle files from the API pull, one for each borough.

#### Transform: 

Each pickle file was loaded as a dataframe, and each dataframe was cleaned in the following steps:

* Drop duplicate values
* Extract columns stored as a list or dictionary
    - The transactions column (a list) is unraveled and one-hot encoded with `str.get_dummies`, saved as:
        * 'delivery'
        * 'pickup'
        * 'restaurant_reservation'
    - The location column is extracted to get state, city and zip code
    - Latitude and longitude are unraveled from the coordinates column
    - Price
        * The price column was filled with 0 for null values
        * The dollar signs are given numerical values: 1, 2, 3, 4
        * The price column is one-hot encoded with `pd.get_dummies` and saved as:
            - price_value_1.0
            - price_value_2.0
            - price_value_3.0
            - price_value_4.0
* Drop the columns that are not needed
* Save each cleaned borough dataframe as csv

#### Load

* Read each borough csv as dataframe
* Merge all 5 boroughs into one dataframe
* Load data of all 5 boroughs into yelp_api_merged.csv

The final extracted and transformed dataset has the following data fields:
* 'zip_code'
* 'city'
* 'state'
* 'latitude'
* 'longitude'
* 'review_count': Count of Yelp reviews
* 'rating': Yelp star rating
* 'categories': Restaurant categories (needs cleaning)
* 'price': Values converted to 1, 2, 3, 4 per number of dollar signs (0 if missing)
* 'delivery': Transaction type (0/1 flag)
* 'pickup': Transaction type (0/1 flag)
* 'restaurant_reservation': Transaction type (0/1 flag)
* 'price_value_1.0': 1 if the Yelp price is one dollar sign, else 0
* 'price_value_2.0': 1 if the Yelp price is two dollar signs, else 0
* 'price_value_3.0': 1 if the Yelp price is three dollar signs, else 0
* 'price_value_4.0': 1 if the Yelp price is four dollar signs, else 0


# Extract
The following steps were repeated once for each of the 5 NYC boroughs. The code below cleans one borough (Staten Island).

In [1]:
import requests
import pandas as pd
import pickle

In [2]:
# Load the pickled API results for one borough:
with open('NYC_API/data_staten_island.pickle', 'rb') as f:
    df = pickle.load(f)

print(len(df))
df.shape

2805


(2805, 17)

In [3]:
nyc_df = df.copy()

In [4]:

# Drop duplicate businesses (same alias and id), keeping the first occurrence:
nyc_df = df.drop_duplicates(subset=['alias', 'id'], keep='first').copy()

nyc_df.reset_index(inplace=True, drop = True)
nyc_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price,neighborhood
0,YOOZjCcC4s1MOBtFsaMB8w,the-richmond-staten-island,The Richmond,https://s3-media1.fl.yelpcdn.com/bphoto/LRCrYG...,False,https://www.yelp.com/biz/the-richmond-staten-i...,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 40.62508, 'longitude': -74.07463}","[pickup, delivery]","{'address1': '695 Bay St', 'address2': None, '...",17184898805,(718) 489-8805,8012.152777,,10314
1,IPnEOyBpRVUCSMHjtu6qeA,shaking-crab-staten-island-staten-island,Shaking Crab - Staten Island,https://s3-media4.fl.yelpcdn.com/bphoto/ZzgImK...,False,https://www.yelp.com/biz/shaking-crab-staten-i...,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,"{'latitude': 40.57091, 'longitude': -74.11032}",[delivery],"{'address1': '365 New Dorp Ln', 'address2': No...",17165843291,(716) 584-3291,5212.161279,,10314
2,B0eYXTeHk4405VdIplHK_A,dolce-fantasia-staten-island,Dolce Fantasia,https://s3-media1.fl.yelpcdn.com/bphoto/CYSocg...,False,https://www.yelp.com/biz/dolce-fantasia-staten...,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 40.62627, 'longitude': -74.13019}","[pickup, delivery]","{'address1': '1210 Forest Ave', 'address2': No...",17184201411,(718) 420-1411,4228.360504,$$,10314
3,ShQz0wFmdLkFcqI_sb6oBA,taverna-on-the-bay-staten-island-5,Taverna On The Bay,https://s3-media1.fl.yelpcdn.com/bphoto/6BcSmL...,False,https://www.yelp.com/biz/taverna-on-the-bay-st...,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 40.6259299, 'longitude': -74.0750...","[pickup, delivery]","{'address1': '661 Bay St', 'address2': '', 'ad...",17185245686,(718) 524-5686,8013.091896,$$,10314
4,hwe3Jhpt8ojZ9Sdb3BvxLg,ohkami-ramen-staten-island,Ohkami Ramen,https://s3-media1.fl.yelpcdn.com/bphoto/0g3MTM...,False,https://www.yelp.com/biz/ohkami-ramen-staten-i...,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,"{'latitude': 40.544920799721375, 'longitude': ...","[pickup, delivery]","{'address1': '3827 Richmond Ave', 'address2': ...",17187342288,(718) 734-2288,5765.625727,$$,10314


# Extract columns stored as a dictionary

#### Transactions

In [5]:
# One-hot encode the transactions column (a list of strings such as [pickup, delivery])

transactions_dummy = nyc_df['transactions'].str.join(sep=',').str.get_dummies(sep=',')

# Combine new columns with original dataframe:
nyc_df = pd.concat([nyc_df, transactions_dummy], axis=1)
nyc_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price,neighborhood,delivery,pickup,restaurant_reservation
0,YOOZjCcC4s1MOBtFsaMB8w,the-richmond-staten-island,The Richmond,https://s3-media1.fl.yelpcdn.com/bphoto/LRCrYG...,False,https://www.yelp.com/biz/the-richmond-staten-i...,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 40.62508, 'longitude': -74.07463}","[pickup, delivery]","{'address1': '695 Bay St', 'address2': None, '...",17184898805,(718) 489-8805,8012.152777,,10314,1,1,0
1,IPnEOyBpRVUCSMHjtu6qeA,shaking-crab-staten-island-staten-island,Shaking Crab - Staten Island,https://s3-media4.fl.yelpcdn.com/bphoto/ZzgImK...,False,https://www.yelp.com/biz/shaking-crab-staten-i...,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,"{'latitude': 40.57091, 'longitude': -74.11032}",[delivery],"{'address1': '365 New Dorp Ln', 'address2': No...",17165843291,(716) 584-3291,5212.161279,,10314,1,0,0
2,B0eYXTeHk4405VdIplHK_A,dolce-fantasia-staten-island,Dolce Fantasia,https://s3-media1.fl.yelpcdn.com/bphoto/CYSocg...,False,https://www.yelp.com/biz/dolce-fantasia-staten...,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 40.62627, 'longitude': -74.13019}","[pickup, delivery]","{'address1': '1210 Forest Ave', 'address2': No...",17184201411,(718) 420-1411,4228.360504,$$,10314,1,1,0
3,ShQz0wFmdLkFcqI_sb6oBA,taverna-on-the-bay-staten-island-5,Taverna On The Bay,https://s3-media1.fl.yelpcdn.com/bphoto/6BcSmL...,False,https://www.yelp.com/biz/taverna-on-the-bay-st...,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 40.6259299, 'longitude': -74.0750...","[pickup, delivery]","{'address1': '661 Bay St', 'address2': '', 'ad...",17185245686,(718) 524-5686,8013.091896,$$,10314,1,1,0
4,hwe3Jhpt8ojZ9Sdb3BvxLg,ohkami-ramen-staten-island,Ohkami Ramen,https://s3-media1.fl.yelpcdn.com/bphoto/0g3MTM...,False,https://www.yelp.com/biz/ohkami-ramen-staten-i...,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,"{'latitude': 40.544920799721375, 'longitude': ...","[pickup, delivery]","{'address1': '3827 Richmond Ave', 'address2': ...",17187342288,(718) 734-2288,5765.625727,$$,10314,1,1,0
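
The join-then-`get_dummies` step above can be tried on a toy frame. This is a minimal sketch with made-up rows, not the notebook's data; the `toy` and `flags` names are only for illustration. It also shows that a business with an empty transactions list ends up with 0 in every flag column.

```python
import pandas as pd

# Made-up rows standing in for the Yelp results (assumption, not real data).
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "transactions": [["pickup", "delivery"], ["restaurant_reservation"], []],
})

# Join each list into one comma-separated string, then split it into 0/1 columns.
joined = toy["transactions"].str.join(",")
flags = joined.str.get_dummies(sep=",")

# Attach the flag columns to the original frame.
result = pd.concat([toy, flags], axis=1)
print(result)
```

The flag columns come out in alphabetical order, which matches the `delivery`, `pickup`, `restaurant_reservation` order seen in the notebook output.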


#### Location

In [6]:
# Extract dictionary values for location
nyc_df['city'] = nyc_df['location'].apply(lambda x: x.get('city'))
nyc_df['zip_code'] = nyc_df['location'].apply(lambda x: x.get('zip_code'))
nyc_df['state'] = nyc_df['location'].apply(lambda x: x.get('state'))
nyc_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,...,display_phone,distance,price,neighborhood,delivery,pickup,restaurant_reservation,city,zip_code,state
0,YOOZjCcC4s1MOBtFsaMB8w,the-richmond-staten-island,The Richmond,https://s3-media1.fl.yelpcdn.com/bphoto/LRCrYG...,False,https://www.yelp.com/biz/the-richmond-staten-i...,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 40.62508, 'longitude': -74.07463}",...,(718) 489-8805,8012.152777,,10314,1,1,0,Staten Island,10304,NY
1,IPnEOyBpRVUCSMHjtu6qeA,shaking-crab-staten-island-staten-island,Shaking Crab - Staten Island,https://s3-media4.fl.yelpcdn.com/bphoto/ZzgImK...,False,https://www.yelp.com/biz/shaking-crab-staten-i...,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,"{'latitude': 40.57091, 'longitude': -74.11032}",...,(716) 584-3291,5212.161279,,10314,1,0,0,Staten Island,10306,NY
2,B0eYXTeHk4405VdIplHK_A,dolce-fantasia-staten-island,Dolce Fantasia,https://s3-media1.fl.yelpcdn.com/bphoto/CYSocg...,False,https://www.yelp.com/biz/dolce-fantasia-staten...,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 40.62627, 'longitude': -74.13019}",...,(718) 420-1411,4228.360504,$$,10314,1,1,0,Staten Island,10310,NY
3,ShQz0wFmdLkFcqI_sb6oBA,taverna-on-the-bay-staten-island-5,Taverna On The Bay,https://s3-media1.fl.yelpcdn.com/bphoto/6BcSmL...,False,https://www.yelp.com/biz/taverna-on-the-bay-st...,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 40.6259299, 'longitude': -74.0750...",...,(718) 524-5686,8013.091896,$$,10314,1,1,0,Staten Island,10304,NY
4,hwe3Jhpt8ojZ9Sdb3BvxLg,ohkami-ramen-staten-island,Ohkami Ramen,https://s3-media1.fl.yelpcdn.com/bphoto/0g3MTM...,False,https://www.yelp.com/biz/ohkami-ramen-staten-i...,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,"{'latitude': 40.544920799721375, 'longitude': ...",...,(718) 734-2288,5765.625727,$$,10314,1,1,0,Staten Island,10312,NY
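
Using `dict.get` in the cell above means a location that lacks a key yields a missing value instead of raising a `KeyError`. A small sketch on invented addresses (the rows and the loop are assumptions for illustration) shows the behaviour:

```python
import pandas as pd

# Invented location dictionaries; the second one has no zip_code key.
toy = pd.DataFrame({"location": [
    {"address1": "1 Example St", "city": "Staten Island", "zip_code": "10304", "state": "NY"},
    {"address1": "2 Example Ave", "city": "Staten Island", "state": "NY"},
]})

# Pull each field out of the nested dictionary; missing keys become None/NaN.
for field in ("city", "zip_code", "state"):
    toy[field] = toy["location"].apply(lambda loc, f=field: loc.get(f))

print(toy[["city", "zip_code", "state"]])
```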


#### Latitude and Longitude

In [7]:
# Extract dictionary values for latitude and longitude
nyc_df['latitude'] = nyc_df['coordinates'].apply(lambda x: x.get('latitude'))
nyc_df['longitude'] = nyc_df['coordinates'].apply(lambda x: x.get('longitude'))
nyc_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,...,price,neighborhood,delivery,pickup,restaurant_reservation,city,zip_code,state,latitude,longitude
0,YOOZjCcC4s1MOBtFsaMB8w,the-richmond-staten-island,The Richmond,https://s3-media1.fl.yelpcdn.com/bphoto/LRCrYG...,False,https://www.yelp.com/biz/the-richmond-staten-i...,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 40.62508, 'longitude': -74.07463}",...,,10314,1,1,0,Staten Island,10304,NY,40.62508,-74.07463
1,IPnEOyBpRVUCSMHjtu6qeA,shaking-crab-staten-island-staten-island,Shaking Crab - Staten Island,https://s3-media4.fl.yelpcdn.com/bphoto/ZzgImK...,False,https://www.yelp.com/biz/shaking-crab-staten-i...,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,"{'latitude': 40.57091, 'longitude': -74.11032}",...,,10314,1,0,0,Staten Island,10306,NY,40.57091,-74.11032
2,B0eYXTeHk4405VdIplHK_A,dolce-fantasia-staten-island,Dolce Fantasia,https://s3-media1.fl.yelpcdn.com/bphoto/CYSocg...,False,https://www.yelp.com/biz/dolce-fantasia-staten...,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 40.62627, 'longitude': -74.13019}",...,$$,10314,1,1,0,Staten Island,10310,NY,40.62627,-74.13019
3,ShQz0wFmdLkFcqI_sb6oBA,taverna-on-the-bay-staten-island-5,Taverna On The Bay,https://s3-media1.fl.yelpcdn.com/bphoto/6BcSmL...,False,https://www.yelp.com/biz/taverna-on-the-bay-st...,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 40.6259299, 'longitude': -74.0750...",...,$$,10314,1,1,0,Staten Island,10304,NY,40.62593,-74.07506
4,hwe3Jhpt8ojZ9Sdb3BvxLg,ohkami-ramen-staten-island,Ohkami Ramen,https://s3-media1.fl.yelpcdn.com/bphoto/0g3MTM...,False,https://www.yelp.com/biz/ohkami-ramen-staten-i...,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,"{'latitude': 40.544920799721375, 'longitude': ...",...,$$,10314,1,1,0,Staten Island,10312,NY,40.544921,-74.165351
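
As an alternative to one `apply` call per key, `pd.json_normalize` can expand a column of flat dictionaries into columns in one step. This is a sketch of that alternative, not what the notebook runs; the two coordinate pairs are copied from the first rows shown above.

```python
import pandas as pd

# Two coordinate dictionaries taken from the output rows above.
toy = pd.DataFrame({"coordinates": [
    {"latitude": 40.62508, "longitude": -74.07463},
    {"latitude": 40.57091, "longitude": -74.11032},
]})

# Expand the dictionaries into a latitude/longitude frame.
coords = pd.json_normalize(toy["coordinates"].tolist())

# json_normalize returns a fresh 0..n-1 index, so align it before joining.
coords.index = toy.index
toy = toy.join(coords)
print(toy[["latitude", "longitude"]])
```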


#### Price

In [8]:
# Replace NaN with 0
nyc_df["price"] = nyc_df["price"].fillna(0)
nyc_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,...,price,neighborhood,delivery,pickup,restaurant_reservation,city,zip_code,state,latitude,longitude
0,YOOZjCcC4s1MOBtFsaMB8w,the-richmond-staten-island,The Richmond,https://s3-media1.fl.yelpcdn.com/bphoto/LRCrYG...,False,https://www.yelp.com/biz/the-richmond-staten-i...,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 40.62508, 'longitude': -74.07463}",...,0,10314,1,1,0,Staten Island,10304,NY,40.62508,-74.07463
1,IPnEOyBpRVUCSMHjtu6qeA,shaking-crab-staten-island-staten-island,Shaking Crab - Staten Island,https://s3-media4.fl.yelpcdn.com/bphoto/ZzgImK...,False,https://www.yelp.com/biz/shaking-crab-staten-i...,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,"{'latitude': 40.57091, 'longitude': -74.11032}",...,0,10314,1,0,0,Staten Island,10306,NY,40.57091,-74.11032
2,B0eYXTeHk4405VdIplHK_A,dolce-fantasia-staten-island,Dolce Fantasia,https://s3-media1.fl.yelpcdn.com/bphoto/CYSocg...,False,https://www.yelp.com/biz/dolce-fantasia-staten...,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 40.62627, 'longitude': -74.13019}",...,$$,10314,1,1,0,Staten Island,10310,NY,40.62627,-74.13019
3,ShQz0wFmdLkFcqI_sb6oBA,taverna-on-the-bay-staten-island-5,Taverna On The Bay,https://s3-media1.fl.yelpcdn.com/bphoto/6BcSmL...,False,https://www.yelp.com/biz/taverna-on-the-bay-st...,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 40.6259299, 'longitude': -74.0750...",...,$$,10314,1,1,0,Staten Island,10304,NY,40.62593,-74.07506
4,hwe3Jhpt8ojZ9Sdb3BvxLg,ohkami-ramen-staten-island,Ohkami Ramen,https://s3-media1.fl.yelpcdn.com/bphoto/0g3MTM...,False,https://www.yelp.com/biz/ohkami-ramen-staten-i...,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,"{'latitude': 40.544920799721375, 'longitude': ...",...,$$,10314,1,1,0,Staten Island,10312,NY,40.544921,-74.165351


In [9]:
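# Update price to be numerical values.
# Note: the 0 placeholder from the previous cell is not a key in this mapping,
# so map() turns it back into NaN; price is filled with 0 again after the merge.
price = {'$': 1, '$$': 2, '$$$': 3, '$$$$': 4}
nyc_df['price_value'] = nyc_df['price'].map(price)
nyc_df['price'] = nyc_df['price'].map(price)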

In [10]:
nyc_df = pd.get_dummies(nyc_df, columns=["price_value"])
nyc_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,...,restaurant_reservation,city,zip_code,state,latitude,longitude,price_value_1.0,price_value_2.0,price_value_3.0,price_value_4.0
0,YOOZjCcC4s1MOBtFsaMB8w,the-richmond-staten-island,The Richmond,https://s3-media1.fl.yelpcdn.com/bphoto/LRCrYG...,False,https://www.yelp.com/biz/the-richmond-staten-i...,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 40.62508, 'longitude': -74.07463}",...,0,Staten Island,10304,NY,40.62508,-74.07463,0,0,0,0
1,IPnEOyBpRVUCSMHjtu6qeA,shaking-crab-staten-island-staten-island,Shaking Crab - Staten Island,https://s3-media4.fl.yelpcdn.com/bphoto/ZzgImK...,False,https://www.yelp.com/biz/shaking-crab-staten-i...,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,"{'latitude': 40.57091, 'longitude': -74.11032}",...,0,Staten Island,10306,NY,40.57091,-74.11032,0,0,0,0
2,B0eYXTeHk4405VdIplHK_A,dolce-fantasia-staten-island,Dolce Fantasia,https://s3-media1.fl.yelpcdn.com/bphoto/CYSocg...,False,https://www.yelp.com/biz/dolce-fantasia-staten...,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,"{'latitude': 40.62627, 'longitude': -74.13019}",...,0,Staten Island,10310,NY,40.62627,-74.13019,0,1,0,0
3,ShQz0wFmdLkFcqI_sb6oBA,taverna-on-the-bay-staten-island-5,Taverna On The Bay,https://s3-media1.fl.yelpcdn.com/bphoto/6BcSmL...,False,https://www.yelp.com/biz/taverna-on-the-bay-st...,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 40.6259299, 'longitude': -74.0750...",...,0,Staten Island,10304,NY,40.62593,-74.07506,0,1,0,0
4,hwe3Jhpt8ojZ9Sdb3BvxLg,ohkami-ramen-staten-island,Ohkami Ramen,https://s3-media1.fl.yelpcdn.com/bphoto/0g3MTM...,False,https://www.yelp.com/biz/ohkami-ramen-staten-i...,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,"{'latitude': 40.544920799721375, 'longitude': ...",...,0,Staten Island,10312,NY,40.544921,-74.165351,0,1,0,0
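
One thing to watch with `pd.get_dummies` is that it only creates columns for values that actually occur, so a borough with no four-dollar-sign restaurant would have no `price_value_4.0` column. The sketch below is one possible way to always get all four columns, by declaring the levels with a categorical first. It is an assumption-laden variant on made-up prices, not the notebook's code.

```python
import pandas as pd

price_map = {"$": 1, "$$": 2, "$$$": 3, "$$$$": 4}

# Made-up prices: one missing, and no "$$$" or "$$$$" present.
toy = pd.DataFrame({"price": ["$$", None, "$"]})
toy["price"] = toy["price"].map(price_map)  # floats, with NaN for the missing price

# Declare all four levels so every indicator column is created.
levels = pd.Series(
    pd.Categorical(toy["price"], categories=[1.0, 2.0, 3.0, 4.0]),
    index=toy.index,
)
# astype(int) gives 0/1 regardless of the dtype the installed pandas returns.
dummies = pd.get_dummies(levels, prefix="price_value").astype(int)

toy = pd.concat([toy, dummies], axis=1)
toy["price"] = toy["price"].fillna(0)  # fill after mapping so the 0 survives
print(toy)
```

A row with a missing price gets 0 in all four indicator columns, as in the notebook output above.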


#### Remove unneeded columns

In [11]:
nyc_df.columns

Index(['id', 'alias', 'name', 'image_url', 'is_closed', 'url', 'review_count',
       'categories', 'rating', 'coordinates', 'transactions', 'location',
       'phone', 'display_phone', 'distance', 'price', 'neighborhood',
       'delivery', 'pickup', 'restaurant_reservation', 'city', 'zip_code',
       'state', 'latitude', 'longitude', 'price_value_1.0', 'price_value_2.0',
       'price_value_3.0', 'price_value_4.0'],
      dtype='object')

In [12]:
# Remove columns that we will not be working with:
nyc_df = nyc_df.drop(columns=["id", "alias", "image_url", "url", "coordinates", "location",
                              "phone", "display_phone", "distance", "is_closed", "name",
                              "transactions", "neighborhood"])


In [13]:
nyc_df.head()

Unnamed: 0,review_count,categories,rating,price,delivery,pickup,restaurant_reservation,city,zip_code,state,latitude,longitude,price_value_1.0,price_value_2.0,price_value_3.0,price_value_4.0
0,154,"[{'alias': 'newamerican', 'title': 'American (...",4.5,,1,1,0,Staten Island,10304,NY,40.62508,-74.07463,0,0,0,0
1,8,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",5.0,,1,0,0,Staten Island,10306,NY,40.57091,-74.11032,0,0,0,0
2,116,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",5.0,2.0,1,1,0,Staten Island,10310,NY,40.62627,-74.13019,0,1,0,0
3,144,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,2.0,1,1,0,Staten Island,10304,NY,40.62593,-74.07506,0,1,0,0
4,21,"[{'alias': 'ramen', 'title': 'Ramen'}]",4.5,2.0,1,1,0,Staten Island,10312,NY,40.544921,-74.165351,0,1,0,0
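
Because the cleaning cells above are rerun for each borough, they could be bundled into one function and applied in a loop. The sketch below is only one way to do that: `clean_borough`, `PRICE_MAP`, `raw_by_borough` and the toy records are hypothetical, and the price indicator columns are left out for brevity.

```python
import pandas as pd

# Same dollar-sign mapping as in the price cell above.
PRICE_MAP = {"$": 1, "$$": 2, "$$$": 3, "$$$$": 4}


def clean_borough(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that bundles the per-borough cleaning steps."""
    df = raw.drop_duplicates(subset=["alias", "id"]).reset_index(drop=True)

    # One-hot encode the list-valued transactions column.
    flags = df["transactions"].str.join(",").str.get_dummies(sep=",")
    df = pd.concat([df, flags], axis=1)

    # Pull scalar fields out of the nested dictionaries.
    for key in ("city", "zip_code", "state"):
        df[key] = df["location"].apply(lambda loc, k=key: loc.get(k))
    for key in ("latitude", "longitude"):
        df[key] = df["coordinates"].apply(lambda c, k=key: c.get(k))

    # Numeric price; businesses without a price become 0.
    df["price"] = df["price"].map(PRICE_MAP).fillna(0)
    return df.drop(columns=["transactions", "location", "coordinates"])


# Toy records standing in for one borough's pickle (values are made up).
first = {
    "id": "a1", "alias": "cafe-one", "transactions": ["pickup", "delivery"],
    "location": {"city": "Staten Island", "zip_code": "10304", "state": "NY"},
    "coordinates": {"latitude": 40.6, "longitude": -74.1}, "price": None,
}
second = {
    "id": "b2", "alias": "cafe-two", "transactions": [],
    "location": {"city": "Staten Island", "zip_code": "10306", "state": "NY"},
    "coordinates": {"latitude": 40.5, "longitude": -74.2}, "price": "$$",
}
raw = pd.DataFrame([first, dict(first), second])  # the middle row is a duplicate

# One entry per borough; only one toy borough is shown here.
raw_by_borough = {"staten_island": raw}
cleaned_by_borough = {name: clean_borough(frame) for name, frame in raw_by_borough.items()}
print(cleaned_by_borough["staten_island"])
```

With the real data, each value in `raw_by_borough` would be the dataframe loaded from that borough's pickle file.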


#### Export each cleaned borough dataframe as csv

In [164]:
# Export nyc_df to csv (the index is written too and appears as "Unnamed: 0" when the file is read back)
file = "NYC_API/nyc_staten_island.csv"
nyc_df.to_csv(file)
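
The round trip through csv has two side effects that are visible later in this notebook: the saved index comes back as an `Unnamed: 0` column, and zip codes come back as floats such as 10464.0, most likely because a column with missing values is parsed as float. The sketch below reproduces both on a toy frame and shows one possible way to avoid them (`index=False` and an explicit `dtype`). It uses an in-memory buffer instead of a file, and the rows are made up.

```python
import io

import pandas as pd

# Made-up frame with one missing zip code.
toy = pd.DataFrame({"zip_code": ["10304", None], "rating": [4.5, 5.0]})

# Default round trip: the index is written and zip codes are parsed as floats.
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
default = pd.read_csv(buf)

# Alternative: skip the index and read zip codes as text.
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
typed = pd.read_csv(buf, dtype={"zip_code": str})

print(default.columns.tolist())
print(typed.columns.tolist())
```

Switching the notebook to `index=False` would also mean the later cell that drops `Unnamed: 0` is no longer needed.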

# Merge 5 Boroughs 

In [18]:
# Merging NYC Boroughs 
df_manhatten = pd.read_csv("NYC_API/nyc_manhattan.csv")
df_bronx = pd.read_csv("NYC_API/nyc_bronx.csv")
df_brooklyn = pd.read_csv("NYC_API/nyc_brooklyn.csv")
df_queens = pd.read_csv("NYC_API/nyc_queens.csv")
df_staten_island = pd.read_csv("NYC_API/nyc_staten_island.csv")
df_bronx.head()

Unnamed: 0.1,Unnamed: 0,review_count,categories,rating,price,delivery,pickup,restaurant_reservation,city,zip_code,state,latitude,longitude,price_value_1.0,price_value_2.0,price_value_3.0,price_value_4.0
0,0,287,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.0,2.0,1,1,0,Bronx,10464.0,NY,40.84964,-73.78706,0,1,0,0
1,1,576,"[{'alias': 'seafood', 'title': 'Seafood'}]",4.0,2.0,0,0,0,Bronx,10464.0,NY,40.83771,-73.782671,0,1,0,0
2,2,37,"[{'alias': 'newamerican', 'title': 'American (...",4.0,,1,0,0,Bronx,10465.0,NY,40.811256,-73.835281,0,0,0,0
3,3,892,"[{'alias': 'cuban', 'title': 'Cuban'}, {'alias...",4.0,2.0,1,1,0,Bronx,10461.0,NY,40.8379,-73.83437,0,1,0,0
4,4,568,"[{'alias': 'latin', 'title': 'Latin American'}]",4.0,2.0,0,1,0,Bronx,10465.0,NY,40.82432,-73.820381,0,1,0,0


In [19]:
# Drop the saved index column from each borough dataframe:
df_manhatten = df_manhatten.drop(columns=["Unnamed: 0"])
df_bronx = df_bronx.drop(columns=["Unnamed: 0"])
df_brooklyn = df_brooklyn.drop(columns=["Unnamed: 0"])
df_queens = df_queens.drop(columns=["Unnamed: 0"])
df_staten_island = df_staten_island.drop(columns=["Unnamed: 0"])

In [21]:
list_df = [df_brooklyn, df_manhatten, df_queens, df_staten_island]

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way.
yelp_merged_df = pd.concat([df_bronx] + list_df)
print(yelp_merged_df.shape)
yelp_merged_df.head()

(18314, 16)


Unnamed: 0,review_count,categories,rating,price,delivery,pickup,restaurant_reservation,city,zip_code,state,latitude,longitude,price_value_1.0,price_value_2.0,price_value_3.0,price_value_4.0
0,287,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.0,2.0,1,1,0,Bronx,10464.0,NY,40.84964,-73.78706,0,1,0,0
1,576,"[{'alias': 'seafood', 'title': 'Seafood'}]",4.0,2.0,0,0,0,Bronx,10464.0,NY,40.83771,-73.782671,0,1,0,0
2,37,"[{'alias': 'newamerican', 'title': 'American (...",4.0,,1,0,0,Bronx,10465.0,NY,40.811256,-73.835281,0,0,0,0
3,892,"[{'alias': 'cuban', 'title': 'Cuban'}, {'alias...",4.0,2.0,1,1,0,Bronx,10461.0,NY,40.8379,-73.83437,0,1,0,0
4,568,"[{'alias': 'latin', 'title': 'Latin American'}]",4.0,2.0,0,1,0,Bronx,10465.0,NY,40.82432,-73.820381,0,1,0,0
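
Stacking the borough frames keeps each frame's own row labels, so the merged index restarts at 0 for every borough. If unique row labels are wanted, `pd.concat` accepts `ignore_index=True`. A minimal sketch on two made-up frames (not the notebook's data) shows the difference:

```python
import pandas as pd

# Two tiny made-up borough frames.
bronx = pd.DataFrame({"city": ["Bronx", "Bronx"], "rating": [4.0, 4.5]})
queens = pd.DataFrame({"city": ["Queens"], "rating": [3.5]})

# Default: original row labels are kept, so label 0 appears twice.
stacked = pd.concat([bronx, queens])

# ignore_index=True renumbers the rows 0..n-1.
renumbered = pd.concat([bronx, queens], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```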


In [22]:
yelp_merged_df.columns

Index(['review_count', 'categories', 'rating', 'price', 'delivery', 'pickup',
       'restaurant_reservation', 'city', 'zip_code', 'state', 'latitude',
       'longitude', 'price_value_1.0', 'price_value_2.0', 'price_value_3.0',
       'price_value_4.0'],
      dtype='object')

In [23]:
yelp_merged_df = yelp_merged_df[['zip_code','city', 'state', 'latitude',
       'longitude', 'review_count', 'rating', 'categories', 'price', 'delivery', 'pickup',
       'restaurant_reservation', 'price_value_1.0', 'price_value_2.0', 'price_value_3.0',
       'price_value_4.0']]
yelp_merged_df.head()

Unnamed: 0,zip_code,city,state,latitude,longitude,review_count,rating,categories,price,delivery,pickup,restaurant_reservation,price_value_1.0,price_value_2.0,price_value_3.0,price_value_4.0
0,10464.0,Bronx,NY,40.84964,-73.78706,287,4.0,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",2.0,1,1,0,0,1,0,0
1,10464.0,Bronx,NY,40.83771,-73.782671,576,4.0,"[{'alias': 'seafood', 'title': 'Seafood'}]",2.0,0,0,0,0,1,0,0
2,10465.0,Bronx,NY,40.811256,-73.835281,37,4.0,"[{'alias': 'newamerican', 'title': 'American (...",,1,0,0,0,0,0,0
3,10461.0,Bronx,NY,40.8379,-73.83437,892,4.0,"[{'alias': 'cuban', 'title': 'Cuban'}, {'alias...",2.0,1,1,0,0,1,0,0
4,10465.0,Bronx,NY,40.82432,-73.820381,568,4.0,"[{'alias': 'latin', 'title': 'Latin American'}]",2.0,0,1,0,0,1,0,0


In [24]:
yelp_merged_df.shape

(18314, 16)

In [27]:
yelp_merged_df["price"] = yelp_merged_df["price"].fillna(0)

In [28]:
yelp_merged_df.head()

Unnamed: 0,zip_code,city,state,latitude,longitude,review_count,rating,categories,price,delivery,pickup,restaurant_reservation,price_value_1.0,price_value_2.0,price_value_3.0,price_value_4.0
0,10464.0,Bronx,NY,40.84964,-73.78706,287,4.0,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",2.0,1,1,0,0,1,0,0
1,10464.0,Bronx,NY,40.83771,-73.782671,576,4.0,"[{'alias': 'seafood', 'title': 'Seafood'}]",2.0,0,0,0,0,1,0,0
2,10465.0,Bronx,NY,40.811256,-73.835281,37,4.0,"[{'alias': 'newamerican', 'title': 'American (...",0.0,1,0,0,0,0,0,0
3,10461.0,Bronx,NY,40.8379,-73.83437,892,4.0,"[{'alias': 'cuban', 'title': 'Cuban'}, {'alias...",2.0,1,1,0,0,1,0,0
4,10465.0,Bronx,NY,40.82432,-73.820381,568,4.0,"[{'alias': 'latin', 'title': 'Latin American'}]",2.0,0,1,0,0,1,0,0


In [29]:
# Save as csv file
yelp_merged_df.to_csv("NYC_API/yelp_api_merged.csv")

In [30]:
yelp_merged_df.columns

Index(['zip_code', 'city', 'state', 'latitude', 'longitude', 'review_count',
       'rating', 'categories', 'price', 'delivery', 'pickup',
       'restaurant_reservation', 'price_value_1.0', 'price_value_2.0',
       'price_value_3.0', 'price_value_4.0'],
      dtype='object')