# Toronto Restaurants:

## Cleaning the Data

Now that we have a dataset of Toronto restaurants, we'll have to wrangle it into data we can work with. 

**Here is what we'll need to do:** 
1. Remove duplicates (if any)
2. Typeacast quant values (str -> int, float)
3. One-hot encode categorical variables
4. Get geo-coordinates for each restaurant (necessary to visualise on a map)

In [3]:
# import data processing libs 
import pandas as pd
import numpy as np

# import plotting libs
import matplotlib.pyplot as plt
import matplotlib.colors as colors 
import seaborn as sns 
%matplotlib inline 

# import requests
import requests

# import regex
import re

# import map 
import folium 

print(folium.__version__)
print('All libraries successfully imported!')

0.11.0
All libraries successfully imported!


Let's take a look at our raw dataframe

### 1. Get Rid of Duplicates

In [46]:
restaurants = pd.read_csv('raw_restaurants_data.csv', index_col = False)
restaurants.drop(columns = ['Unnamed: 0'], inplace = True)
print(restaurants.shape)
restaurants.head()

(5125, 9)


Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2,featured in
0,CASUAL DINING,Pizzeria Libretto,4.5,"(1,010 Reviews)","Ossington Avenue, Trinity Bellwoods","221 Ossington Avenue, Toronto M6J 2Z8","['Pizza', 'Italian']",CA$50,['Hipster Hangouts']
1,CASUAL DINING,KINKA IZAKAYA,4.6,"(1,166 Reviews)","Church Street, Church And Wellesley","398 Church Street, Toronto M5B 2A2 2A2","['Japanese', 'Asian']",CA$55,['Unique Dining']
2,CASUAL DINING,Pai,4.9,(614 Reviews),"Duncan Street, Entertainment District","18 Duncan Street, Toronto","['Thai', 'Asian']",CA$50,[]
3,QUICK BITES,Banh Mi Boys,4.7,(941 Reviews),"Queen Street West, Fashion District","392 Queen Street West, Toronto",['Sandwich'],CA$25,['Pocket Friendly']
4,QUICK BITES,The Stockyards,4.6,(734 Reviews),"Saint Clair Avenue West, Forest Hill","699 St Clair Avenue West, Toronto M6C 1B2","['BBQ', 'Burger']",CA$25,"['Kickass Burgers, Barbecue & Grill, Grill My ..."


In [48]:
# let's get rid of duplicates (if any)
restaurants.drop_duplicates(subset = ['name', 'address', 'avg review', 'review count'], keep = 'first', inplace = True)
restaurants.shape

(5125, 9)

### 2. Typecast Variables to Appropriate Types

In [49]:
restaurants.dtypes

occasion         object
name             object
avg review      float64
review count     object
neighborhood     object
address          object
cuisines         object
cost for 2       object
featured in      object
dtype: object

Each of our variables except for "avg review" are of type object (str types are labeled object in pandas dataframe). Since most of our data is categorical, this is more or less true. However, there are a couple of quantitative variables we'll want to typecast to numerical values (i.e int or float). Our numerical variables will be: 
1. avg review (already type float)
2. review count
3. cost for 2 

Let's start by dealing with review count. Since each observation consists of the numerical value we are interested in (eg. 1,010) and some other bits that aren't useful to us, we'll need to tease out only the numerical value from the rest (this includes words, commas, and parentheses). 

A regular expression (regex) is likely the most straight-forward and efficient way of achieving this. Since review counts are integers, we'll type cast them to int once we have teased out the numerical value for each observation.

In [50]:
# import regex
import re
def get_int(s):
    numbers = re.sub(r'\D+', '', s)
    numbers = int(numbers)
    return numbers

restaurants['review count'] = restaurants['review count'].apply(get_int)

In [51]:
# we can actually use the same function on cost for 2
# but before we can do that, some observations do not contain any numerical values. We'll have to drop those. 
restaurants['cost for 2'].unique() 

array(['CA$50', 'CA$55', 'CA$25', 'CA$40', 'CA$30', 'CA$75', 'CA$80',
       'CA$150', 'CA$70', 'CA$85', 'CA$65', 'CA$35', 'CA$60', 'CA$45',
       'CA$110', 'CA$100', 'CA$120', 'CA$140', 'CA$20', 'CA$90', 'CA$180',
       'CA$15', 'CA$160', 'CA$155', 'CA$300', 'CA$43', 'CA$130', 'CA$10',
       'CA$115', 'CA$13', 'CA$145', 'CA$125', 'CA$95', 'CA$112', 'CA$600',
       'CA$54', 'CA$5', 'CA$CA$CA$CA$'], dtype=object)

In [52]:
restaurants.drop(restaurants.loc[restaurants['cost for 2'] == 'CA$'*4].index, inplace = True)
restaurants['cost for 2'] = restaurants['cost for 2'].apply(get_int)

In [53]:
restaurants.head()

Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2,featured in
0,CASUAL DINING,Pizzeria Libretto,4.5,1010,"Ossington Avenue, Trinity Bellwoods","221 Ossington Avenue, Toronto M6J 2Z8","['Pizza', 'Italian']",50,['Hipster Hangouts']
1,CASUAL DINING,KINKA IZAKAYA,4.6,1166,"Church Street, Church And Wellesley","398 Church Street, Toronto M5B 2A2 2A2","['Japanese', 'Asian']",55,['Unique Dining']
2,CASUAL DINING,Pai,4.9,614,"Duncan Street, Entertainment District","18 Duncan Street, Toronto","['Thai', 'Asian']",50,[]
3,QUICK BITES,Banh Mi Boys,4.7,941,"Queen Street West, Fashion District","392 Queen Street West, Toronto",['Sandwich'],25,['Pocket Friendly']
4,QUICK BITES,The Stockyards,4.6,734,"Saint Clair Avenue West, Forest Hill","699 St Clair Avenue West, Toronto M6C 1B2","['BBQ', 'Burger']",25,"['Kickass Burgers, Barbecue & Grill, Grill My ..."


Next, let's observe featured in. 

In [54]:
featured = restaurants['featured in']
print(featured.unique())
print(' \n There are {} unique featured values'.format(len(featured.unique())))

["['Hipster Hangouts']" "['Unique Dining']" '[]' "['Pocket Friendly']"
 "['Kickass Burgers, Barbecue & Grill, Grill My Cheese']"
 "['Mexican Magic, Hipster Hangouts']" "['Mamma Mia!']"
 "['Date Night, Happy Herbivores']" "['Barbecue & Grill']"
 "['Great Chinese']" "['Date Night']"
 "['Kickass Burgers, Pocket Friendly']" "['Super Seafood']"
 "['Mexican Magic']" "['Craft Beer']" "['Mexican Magic, Pocket Friendly']"
 "['Weekend Brunches, Great Breakfasts, Happy Herbivores, Grill My Cheese']"
 "['Ramen']" "['Super Sushi']"
 "['Weekend Brunches, Late Night Restaurants']"
 "['Sweet Tooth, Hipster Hangouts']" "['Comfort Food, Unique Dining']"
 "['Date Night, Mexican Magic']" "['Great Breakfasts, Weekend Brunches']"
 "['Kickass Burgers']" "['Date Night, Mamma Mia!']" "['Comfort Food']"
 "['Great Breakfasts']" "['Weekend Brunches']"
 "['Great Chinese, Pocket Friendly']" "['Trending This Week']"
 "['Sweet Tooth']" "['Late Night Restaurants, Hipster Hangouts']"
 "['Late Night Restaurants']" "['Sp

We can see that some of the values for featured in are (for all intents and purposes) null. If the vast majority of our records contain null values for featured in, we might be better off dropping the variable since it won't add to much additional information to our data.

In [55]:
null_vals = restaurants.loc[restaurants['featured in'] == '[]']
print('{} observations contain null values'.format(len(null_vals)))
print('Alternatively, {}% of our data'.format(round(len(null_vals) / len(featured) * 100, 2)))

4822 observations contain null values
Alternatively, 94.49% of our data


Since roughly, 95% of the observations for "featured in" are null, we're probably better off dropping it entirely.

In [56]:
restaurants.drop(columns = ['featured in'], inplace = True)
restaurants.head()

Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2
0,CASUAL DINING,Pizzeria Libretto,4.5,1010,"Ossington Avenue, Trinity Bellwoods","221 Ossington Avenue, Toronto M6J 2Z8","['Pizza', 'Italian']",50
1,CASUAL DINING,KINKA IZAKAYA,4.6,1166,"Church Street, Church And Wellesley","398 Church Street, Toronto M5B 2A2 2A2","['Japanese', 'Asian']",55
2,CASUAL DINING,Pai,4.9,614,"Duncan Street, Entertainment District","18 Duncan Street, Toronto","['Thai', 'Asian']",50
3,QUICK BITES,Banh Mi Boys,4.7,941,"Queen Street West, Fashion District","392 Queen Street West, Toronto",['Sandwich'],25
4,QUICK BITES,The Stockyards,4.6,734,"Saint Clair Avenue West, Forest Hill","699 St Clair Avenue West, Toronto M6C 1B2","['BBQ', 'Burger']",25


Next, cuisine types may look like a list type object but it's actually a string. To make handling these observations easier down the line, it'll be usefull to convert them into comma separated strings. Again, a regex will do the trick.

In [57]:
def cleaner(string): 
    regex = re.compile('[\'\[\]]')
    string = regex.sub('', string)
    
    return string
    
restaurants['cuisines'] = restaurants['cuisines'].apply(cleaner)
restaurants.head()

Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2
0,CASUAL DINING,Pizzeria Libretto,4.5,1010,"Ossington Avenue, Trinity Bellwoods","221 Ossington Avenue, Toronto M6J 2Z8","Pizza, Italian",50
1,CASUAL DINING,KINKA IZAKAYA,4.6,1166,"Church Street, Church And Wellesley","398 Church Street, Toronto M5B 2A2 2A2","Japanese, Asian",55
2,CASUAL DINING,Pai,4.9,614,"Duncan Street, Entertainment District","18 Duncan Street, Toronto","Thai, Asian",50
3,QUICK BITES,Banh Mi Boys,4.7,941,"Queen Street West, Fashion District","392 Queen Street West, Toronto",Sandwich,25
4,QUICK BITES,The Stockyards,4.6,734,"Saint Clair Avenue West, Forest Hill","699 St Clair Avenue West, Toronto M6C 1B2","BBQ, Burger",25


In [58]:
restaurants.dtypes

occasion         object
name             object
avg review      float64
review count      int64
neighborhood     object
address          object
cuisines         object
cost for 2        int64
dtype: object

### 3. One-Hot Encode Categorical Variables

Now, our data looks more or less good. Since we'll be passing it to our learning algorithm later, we'll want to transform our categorical variables into a numerical format our algorithm will be able to handle. In our case, the only categorical variable we really care about is cuisines. We'll one-hot encode it (generate dummy variables for each unique values in cuisines) to get numerical values. 

We'll then assign our dummies to a new dataframe (onehot) which we'll be able to use later.

In [None]:
df = restaurants.cuisines.str.split(',', expand = True).stack().reset_index(level = 1, drop = True).\
to_frame('cuisines')
onehot = pd.get_dummies(df, prefix = 'd', columns = ['cuisines']).groupby(level = 0).sum()

print(onehot.shape)
onehot.head()

In [60]:
onehot.describe()

Unnamed: 0,d_,d_.1,d_ Afghan,d_ African,d_ American,d_ Argentine,d_ Asian,d_ Asian Fusion,d_ BBQ,d_ Bakery,...,d_Tibetan,d_Trinbagonian,d_Turkish,d_Ukrainian,d_Vegetarian,d_Venezuelan,d_Vietnamese,d_West Indian,d_Xinjiang,d_Yunnan
count,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,...,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0,5103.0
mean,0.000196,0.000588,0.002156,0.003135,0.014501,0.000196,0.155595,0.002939,0.003919,0.010582,...,0.000784,0.000196,0.001764,0.000392,0.000392,0.000196,0.022928,0.00196,0.00098,0.000196
std,0.013999,0.0313,0.046383,0.055912,0.119557,0.013999,0.362506,0.054142,0.062487,0.102333,...,0.027989,0.013999,0.041963,0.019795,0.019795,0.013999,0.149688,0.044229,0.03129,0.013999
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


### 4. Geocode Addresses

##### Getting Lat/Lng Values: 

To get a better picture of the restaurant landscape in Toronto, it will be useful to visualise our data on a map. Most map APIs require geo-coordinates (Latitude/Longitude) to plot datapoints. So, we'll have to get the longitude and latitude coordinates for each of our restaurant addresses. 

To that end, we'll need to find a suitable API, and feed it specific geographic information for each restaurant (address, in our case). It's critical that we pay particular attention to the accuracy and consistency of our address data. In my experience, the addresses listed on websites do not always follow a consistent format and, in some instances, may even contain typos. Since our data was scraped from the web, it is liable to issues mentioned above. We will need to either force addresses into a consistent, error-free format (a rather cumbersome task that we will try to avoid if possible) or leverage an API that can handle a wide range of formats. 

To that effect, I have found that the Google Maps API is extremely flexible and can generally handle a wide range of formats (and even typos) rather well.

In [19]:
# Assign Google API key
api_key = 'YOUR API KEY' # you Google Geolocator API key

# Initialize array for transformed addresses
Lat = []
Lng = []
clean_address = []
url = 'https://maps.googleapis.com/maps/api/geocode/json'

iterator = 0

# Loop through addresses, call API, extract lat, lng, clean addresses
for address in restaurants['address']: 
    
    try: 
        params = {
            'address': address, 
            'sensor': 'false',
            'region': 'ca',
            'key': api_key
        }

        req = requests.get(url, params)
        res = req.json()

        result = res['results'][0]

        lat = result['geometry']['location']['lat']
        lng = result['geometry']['location']['lng']
        new_add = result['formatted_address']
        
    except: 
        lat = None
        lng = None
        new_add = None 
    
    Lat.append(lat)
    Lng.append(lng)
    clean_address.append(new_add)
    
    # Increment iterator and print to keep track of progress
    iterator += 1 
    print(' \r {} out of {}'.format(iterator, len(restaurants['address'])), end = "")


restaurants['Lat'] = Lat
restaurants['Lng'] = Lng
restaurants['Google Addresses'] = clean_address

 5103 out of 5103                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

In [20]:
# inspect null values
restaurants.loc[restaurants['Lat'].isnull()].shape

(7, 11)

In [21]:
# drop missing values 
restaurants.dropna(subset = ['Lat', 'Lng'], inplace = True)
restaurants.loc[restaurants['Lat'].isnull()].shape# drop missing values 
restaurants.dropna(subset = ['Lat', 'Lng'], inplace = True)
restaurants.loc[restaurants['Lat'].isnull()].shape

(0, 11)

In [22]:
restaurants.head()

Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2,Lat,Lng,Google Addresses
0,CASUAL DINING,Pizzeria Libretto,4.5,1010,"Ossington Avenue, Trinity Bellwoods","221 Ossington Avenue, Toronto M6J 2Z8","Pizza, Italian",50,43.649002,-79.420343,"221 Ossington Ave, Toronto, ON M6J 2Z8, Canada"
1,CASUAL DINING,KINKA IZAKAYA,4.6,1166,"Church Street, Church And Wellesley","398 Church Street, Toronto M5B 2A2 2A2","Japanese, Asian",55,43.660421,-79.378989,"398 Church St, Toronto, ON M5B 2A2, Canada"
2,CASUAL DINING,Pai,4.9,614,"Duncan Street, Entertainment District","18 Duncan Street, Toronto","Thai, Asian",50,43.647869,-79.388685,"18 Duncan St, Toronto, ON M5H 3G8, Canada"
3,QUICK BITES,Banh Mi Boys,4.7,941,"Queen Street West, Fashion District","392 Queen Street West, Toronto",Sandwich,25,43.648827,-79.39697,"392 Queen St W, Toronto, ON M5V 2A6, Canada"
4,QUICK BITES,The Stockyards,4.6,734,"Saint Clair Avenue West, Forest Hill","699 St Clair Avenue West, Toronto M6C 1B2","BBQ, Burger",25,43.681346,-79.426147,"699 St Clair Ave W, Toronto, ON M6C 1B2, Canada"


##### Neighbourhoods: 

It would be interesting to illustrate our data at the neighbourhood level. For example, we could see which neighbourhoods in Toronto are host to the most expensive, best rated, most diverse set of restaurants (and more!). To do this, we'll need to do the following: 

1. Segment Toronto into neighbourhoods. We'll need to determine the geographic boundaries for each neighbourhood to do this. 
2. Create a new label assigning each restaurant to its respective neighbourhood.

We'll start by getting a geojson file delimiting each neighbourhood in Toronto. The link below provides some good instructions on how to do this: 
https://medium.com/dataexplorations/generating-geojson-file-for-toronto-fsas-9b478a059f04



In [23]:
# Assign geojson delimiting Toronto neighbourhoods
n_geo = r'Toronto_neighbourhoods.geojson'

Now, let's assign a neighbourhood to each restaurant. We can actually achieve this by looking at postal codes. While we already have a neighbourhood label, it isn't consistent wiith how our geojson is encoded. There is a good table on Wikipedia we can scrape to get the right data. 

In [24]:
# get Toronto neighbourhoods data
from pandas.io.html import read_html
Toronto_Neighbourhoods = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
neigh = pd.read_html(Toronto_Neighbourhoods, attrs = {'class':'wikitable'})
df = neigh[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [34]:
# let's extract the first three characters of the postal code from 'Google Address' in our restaurants dataframe
# We can do this efficiently using a regex
import re

def get_postal(address): 
    try: 
        postal = re.compile(r'[A-Z]\d[A-Z]')
        return postal.search(address).group()
    except: 
        return None

restaurants['Postal Codes'] = restaurants['Google Addresses'].apply(get_postal)

len(restaurants['Postal Codes'].unique())

208

In [35]:
# did we get any None? 
print('There are {} None values'.format(len(restaurants.loc[restaurants['Postal Codes'].isna()])))
restaurants.loc[restaurants['Postal Codes'].isna()]

There are 12 None values


Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2,Lat,Lng,Google Addresses,Postal Codes
202,CASUAL DINING,Made in Mexico Restaurant and Cantina,4.2,242,Newmarket,185 Main Street,Mexican,43,42.421402,-71.066762,"185 Main St, Malden, MA 02148, USA",
264,CASUAL DINING,Reds Wine Tavern,3.9,145,"First Canadian Place, Financial District","First Canadian Place, 77 Adelaide Street West,...","Grill, European",70,43.648891,-79.380873,"First Canadian Place, Toronto, ON, Canada",
346,CASUAL DINING,Paisano's Italian Garden Cafe,4.1,117,York,Willowdale Avenue. M3B 1Y6,Italian,50,43.780584,-79.406659,"Willowdale Ave, Toronto, ON, Canada",
1319,CASUAL DINING,Vietnam Noodle Star,3.9,47,"Midland Avenue, Scarborough",4188 Finch Avenue East M1S 4T6,Vietnamese,20,43.804583,-79.287792,"4188 Finch Ave E, Scarborough, ON, Canada",
1452,QUICK BITES,Flip Toss & Thai Kitchen,3.8,60,"Harbord Street, Harbord Village","141 Harbord Street, Toronto","Thai, Asian",30,43.662105,-79.406228,"141 Harbord St, Toronto, ON 30100, Canada",
2146,CASUAL DINING,Full House Chinese Cuisine,3.2,28,"Finch Avenue East, Scarborough",4188 Finch Avenue East M1S 4T6,Cantonese,25,43.804583,-79.287792,"4188 Finch Ave E, Scarborough, ON, Canada",
2191,CASUAL DINING,La Parisienne Creperie,3.5,22,"99 Bronte Road, Oakville",none,"French, Desserts",80,42.230552,-83.735396,"3989 Research Park Dr, Ann Arbor, MI 48108, USA",
2553,CASUAL DINING,Sushi Hama,3.9,26,"12599 Hwy 50, Caledon","King's Highway 50, Caledon","Japanese, Sushi",40,43.907935,-79.777472,"Hwy 50, Caledon, ON, Canada",
2728,CASUAL DINING,Sushi Tower,3.7,28,"Yonge Street, Downtown Yonge","374A Yonge Street, Toronto","Sushi, Japanese, Asian",40,43.658794,-79.382227,"374 Yonge St, Toronto, ON, Canada",
3593,FOOD TRUCK,Food Cabbie,3.3,8,Richmond Hill,"Location Varies, Richmond Hill, Toronto","Mexican, American",20,43.88284,-79.440281,"Richmond Hill, ON, Canada",


For the most part, our None values are the result of either missing postal codes in Google Addresses or addresses that are not in Canada (i.e don't have a valid postal code either). We do not care about addresses outside of Canada and there is no good/quick solution for the missing postal codes. Since there are only 12 missing values out of over 5,000 and those values do not appear to be systemically missing (i.e random), it should be safe to simply drop them. 

In [40]:
restaurants.drop(restaurants.loc[restaurants['Postal Codes'].isna()].index, axis = 0, inplace = True)
print('There are {} None values'.format(len(restaurants.loc[restaurants['Postal Codes'].isna()])))
restaurants.loc[restaurants['Postal Codes'].isna()]

There are 0 None values


Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2,Lat,Lng,Google Addresses,Postal Codes


In [41]:
restaurants.head()

Unnamed: 0,occasion,name,avg review,review count,neighborhood,address,cuisines,cost for 2,Lat,Lng,Google Addresses,Postal Codes
0,CASUAL DINING,Pizzeria Libretto,4.5,1010,"Ossington Avenue, Trinity Bellwoods","221 Ossington Avenue, Toronto M6J 2Z8","Pizza, Italian",50,43.649002,-79.420343,"221 Ossington Ave, Toronto, ON M6J 2Z8, Canada",M6J
1,CASUAL DINING,KINKA IZAKAYA,4.6,1166,"Church Street, Church And Wellesley","398 Church Street, Toronto M5B 2A2 2A2","Japanese, Asian",55,43.660421,-79.378989,"398 Church St, Toronto, ON M5B 2A2, Canada",M5B
2,CASUAL DINING,Pai,4.9,614,"Duncan Street, Entertainment District","18 Duncan Street, Toronto","Thai, Asian",50,43.647869,-79.388685,"18 Duncan St, Toronto, ON M5H 3G8, Canada",M5H
3,QUICK BITES,Banh Mi Boys,4.7,941,"Queen Street West, Fashion District","392 Queen Street West, Toronto",Sandwich,25,43.648827,-79.39697,"392 Queen St W, Toronto, ON M5V 2A6, Canada",M5V
4,QUICK BITES,The Stockyards,4.6,734,"Saint Clair Avenue West, Forest Hill","699 St Clair Avenue West, Toronto M6C 1B2","BBQ, Burger",25,43.681346,-79.426147,"699 St Clair Ave W, Toronto, ON M6C 1B2, Canada",M6C


Our restaurants is now clean, complete, and ready to be analyzed. We also have a dataframe of dummies for our categorical variables (onehot) which will be useful when we begin modelling. 

We'll save these two dataframe as csv files and move onto modelling our data. 

In [42]:
restaurants.to_csv('clean_restaurants_data.csv')
onehot.to_csv('one_hot_restaurants.csv')