<h2>Open a new restaurant in Toronto</h2>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Introduction</a>

2. <a href="#item2">Data</a>

3. <a href="#item3">Methodology</a>

4. <a href="#item4">Results</a>  
    
5. <a href="#item5">Discussion</a>   
    
6. <a href="#item6">Conclusion</a>   
</font>
</div>

## 1 Introduction
An entrepreneur is planning to open a new restaurant in Toronto but is not sure which location would be most appropriate for the new venue. 
Toronto already has a lot of restaurants, so the choice of location matters, and we want to help the entrepreneur find it. 

We have to discover the most important factors that contribute to the restaurant’s success. 

## 2 Data
We can expect these factors to include neighborhood wealth, accessibility, crime rates, visibility and competition, among others.
We can use datasets from the Toronto Open Data website to address some of these considerations. From there, we can access the city’s average house prices by neighborhood. 

We will work with the Wellbeing Toronto - Economics dataset, which includes the average house price by neighborhood. 
We will also use Foursquare location data to retrieve food venues and link them with the average house price data.

<h3> 1 Get the data for Toronto </h3>
<h4>Import Pandas, Beautiful Soup and Requests libraries</h4> 

In [553]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from IPython.display import HTML, display
from pandas import ExcelFile
print('Libraries imported.')

Libraries imported.


<h4>Get the required data and add it to a dataframe</h4>

Get Wellbeing Toronto - Economics data that includes average house price by Neighborhood. 

In [554]:
xls = pd.ExcelFile('http://opendata.toronto.ca/social.development/wellbeing/WB-Economics.xlsx')

In [555]:
df = xls.parse('RawData-Ref Period 2011', skiprows=2, index_col=None, na_values=['NA'])
df.columns = ['Neighborhood','Neighborhood Id','Businesses','Child Care Spaces','Debt Risk Score','Home Prices', 'Local Employment', 'Social Assistance Recipients']
df.head()

Unnamed: 0,Neighborhood,Neighborhood Id,Businesses,Child Care Spaces,Debt Risk Score,Home Prices,Local Employment,Social Assistance Recipients
0,Mount Olive-Silverstone-Jamestown,2,271,60,687,251119,3244,6561
1,Thistletown-Beaumond Heights,3,217,25,718,414216,1311,1276
2,Rexdale-Kipling,4,144,75,721,392271,1178,1323
3,Elms-Old Rexdale,5,67,60,692,233832,903,1683
4,Kingsview Village-The Westway,6,160,129,717,292861,2799,4348


Get the postal code data for Toronto. 
Get the page HTML from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, parse it using Beautiful Soup, find the table with the required data and create the dataframe.

In [556]:
page_html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(page_html, 'html.parser')
code_table = soup.find('table', {'class':'wikitable sortable'})
code_table_rows = code_table.find_all('tr')
rows = []
for tr in code_table_rows:
    # rows without <td> cells (e.g. a header row) yield an empty list and are skipped
    cells = tr.find_all('td')
    row = [cell.text.strip() for cell in cells if cell.text.strip()]
    if row:
        rows.append(row)
df_loc = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])

Add column names then remove all rows where we have <b>Not assigned</b> values in <b>Borough</b> column. Replace the <b>Not assigned</b> values in <b>Neighborhood</b> with the corresponding values of <b>Borough</b>.

In [557]:
df_loc = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])
df_loc = df_loc[df_loc.Borough != 'Not assigned']

df_loc[['Neighborhood']] = df_loc[['Neighborhood']].mask(df_loc[['Neighborhood']].apply(lambda x: x.str.contains('Not assigned')), df_loc['Borough'], axis=0)

df_loc.index = range(len(df_loc))
df_loc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
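The same cleaning step can be written a little more directly with a boolean filter and `Series.where`. The sketch below runs on a small made-up frame; only the column names are taken from `df_loc`, and the three sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Drop rows whose borough is unknown.
cleaned = sample[sample['Borough'] != 'Not assigned'].copy()

# Where the neighborhood is unknown, fall back to the borough name.
cleaned['Neighborhood'] = cleaned['Neighborhood'].where(
    cleaned['Neighborhood'] != 'Not assigned', cleaned['Borough'])

cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```

`where` keeps the values for which the condition is true and takes the replacement from `Borough` everywhere else, so no `apply` or `lambda` is needed.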


## 3 Methodology
We combine the average house price from the <b>Wellbeing Toronto - Economics</b> data with the neighborhood postal code dataset to get house prices per postal code. Then 
we retrieve the venues in the food category using the <b>Foursquare</b> location data. Finally, we cluster the combined data and try to determine the best possible location for the new restaurant.

Add the average house price column from the Toronto Economics dataframe to the postal code dataframe, matching the values to the respective neighborhoods. Convert the average house prices to millions.

In [558]:
def match_Neighborhoods(x):
    df_fil = df.apply(lambda y: y['Home Prices'] if (x['Neighborhood'] in y['Neighborhood']) else None, axis=1)
    df_fil = df_fil.dropna(axis=0, how='all')
    if df_fil.empty:
        df_fil = np.nan
    else:
        df_fil = df_fil.mean()/1000000
        df_fil = round(df_fil,6)
        
    return df_fil

df_loc['AvHomePrice'] = df_loc.apply(match_Neighborhoods, axis=1)
df_loc.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,AvHomePrice
0,M3A,North York,Parkwoods,0.553698
1,M4A,North York,Victoria Village,0.365107
2,M5A,Downtown Toronto,Harbourfront,
3,M5A,Downtown Toronto,Regent Park,0.484444
4,M6A,North York,Lawrence Heights,
5,M6A,North York,Lawrence Manor,
6,M7A,Queen's Park,Queen's Park,
7,M9A,Etobicoke,Islington Avenue,
8,M1B,Scarborough,Rouge,0.42685
9,M1B,Scarborough,Malvern,0.294599
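The matching above is a substring test: a postal-code neighborhood gets the mean price of every Wellbeing row whose name contains it, and `NaN` when nothing matches. A minimal sketch of that behaviour, using four rows copied from the `df.head()` output earlier; the helper name `average_price_millions` is an assumption introduced here, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Rows copied from the df.head() output above.
prices = pd.DataFrame({
    'Neighborhood': ['Mount Olive-Silverstone-Jamestown',
                     'Thistletown-Beaumond Heights',
                     'Rexdale-Kipling',
                     'Elms-Old Rexdale'],
    'Home Prices': [251119, 414216, 392271, 233832],
})

def average_price_millions(name, prices):
    """Mean home price (in millions) over rows whose name contains `name`."""
    hits = prices.loc[
        prices['Neighborhood'].str.contains(name, regex=False), 'Home Prices']
    return np.nan if hits.empty else round(hits.mean() / 1_000_000, 6)

for name in ['Thistletown', 'Rexdale', 'Harbourfront']:
    print(name, average_price_millions(name, prices))
```

"Rexdale" matches two rows here, so its value is an average of both, while "Harbourfront" matches none of these four rows and comes back as `NaN`. This is worth keeping in mind when reading the empty `AvHomePrice` cells in the table above.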


<h4>Group neighborhoods by PostalCode</h4>

In [559]:
aggregations = {
    'Borough':'min',
    'AvHomePrice':'mean',
    'Neighborhood':', '.join
}

df_loc = df_loc.groupby('PostalCode').agg(aggregations)
df_loc = df_loc.dropna(axis=0, how='any')
df_loc.reset_index(inplace=True)
df_loc.head()

Unnamed: 0,PostalCode,Borough,AvHomePrice,Neighborhood
0,M1B,Scarborough,0.360725,"Rouge, Malvern"
1,M1C,Scarborough,0.529278,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,0.347395,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,0.316584,Woburn
4,M1J,Scarborough,0.356096,Scarborough Village
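One detail of this aggregation is easy to miss: `'mean'` skips missing values, so a postal code keeps a price as long as at least one of its neighborhoods was matched. A small sketch with four rows taken from the `df_loc.head(12)` output above:

```python
import numpy as np
import pandas as pd

# Four rows taken from the df_loc.head(12) output above.
rows = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M1B', 'M1B'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Rouge', 'Malvern'],
    'AvHomePrice': [np.nan, 0.484444, 0.42685, 0.294599],
})

grouped = rows.groupby('PostalCode').agg({
    'Borough': 'min',
    'AvHomePrice': 'mean',       # NaN values are skipped by default
    'Neighborhood': ', '.join,   # concatenate the neighborhood names
}).reset_index()
print(grouped)
```

M5A ends up with the Regent Park price alone because Harbourfront had no match, and M1B gets the average of Rouge and Malvern.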


In [560]:
df_loc.shape

(56, 4)

<h4>Import Geospatial_data file</h4>
Read the Geospatial_data file containing the coordinates by postal code

In [561]:
filename = "http://cocl.us/Geospatial_data"
df_crd = pd.read_csv(filename, index_col=0)
df_crd = df_crd.reset_index()
df_crd.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<h4>Add coordinates to Neighborhood dataframe</h4>
Find the respective coordinates by postal code in the Geospatial_data dataframe and add them to the Neighborhood dataframe

In [562]:
for i in df_loc.index:
    # look up the coordinates for this row's postal code
    coordinates = df_crd.loc[df_crd['Postal Code'] == df_loc.loc[i, 'PostalCode'], ['Latitude', 'Longitude']]
    # assign to row i only; indexing with [0, i] would also overwrite row 0 on every pass
    df_loc.loc[i, 'Latitude'] = coordinates.Latitude.iloc[0]
    df_loc.loc[i, 'Longitude'] = coordinates.Longitude.iloc[0]
df_loc.head(12)

Unnamed: 0,PostalCode,Borough,AvHomePrice,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,0.360725,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,0.529278,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,0.347395,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,0.316584,Woburn,43.770992,-79.216917
4,M1J,Scarborough,0.356096,Scarborough Village,43.744734,-79.239476
5,M1K,Scarborough,0.311855,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
6,M1L,Scarborough,0.367679,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
7,M1M,Scarborough,0.532561,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
8,M1P,Scarborough,0.279189,"Dorset Park, Scarborough Town Centre, Wexford ...",43.75741,-79.273304
9,M1R,Scarborough,0.422689,"Maryvale, Wexford",43.750072,-79.295849
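A row-by-row lookup like this is usually expressed as a join in pandas. The following sketch shows the idea with the first rows of both tables as printed above; it is an alternative formulation under that assumption, not the code the notebook ran.

```python
import pandas as pd

# First rows of each table as shown in the outputs above.
loc = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M1E'],
                    'AvHomePrice': [0.360725, 0.529278, 0.347395]})
crd = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'M1E', 'M1G'],
                    'Latitude': [43.806686, 43.784535, 43.763573, 43.770992],
                    'Longitude': [-79.194353, -79.160497, -79.188711, -79.216917]})

# A left join keeps every row of `loc` and looks up its coordinates once.
merged = (loc.merge(crd, how='left', left_on='PostalCode', right_on='Postal Code')
             .drop(columns='Postal Code'))
print(merged)
```

With `how='left'`, a postal code that is missing from the coordinates file would get `NaN` coordinates instead of raising an error inside a loop.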


<h2>Explore Neighborhoods in Toronto</h2>

In [563]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_loc['Borough'].unique()),
        df_loc.shape[0]
    )
)

The dataframe has 9 boroughs and 56 neighborhoods.


In [564]:
df_tor = df_loc[df_loc.Borough.str.contains('Toronto')].reset_index(drop=True)
df_tor

Unnamed: 0,PostalCode,Borough,AvHomePrice,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,0.751945,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,0.67784,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4N,Central Toronto,1.09811,Lawrence Park,43.72802,-79.38879
3,M4T,Central Toronto,1.265389,"Moore Park, Summerhill East",43.689574,-79.38316
4,M4W,Downtown Toronto,1.265389,Rosedale,43.679563,-79.377529
5,M4X,Downtown Toronto,0.537025,"Cabbagetown, St. James Town",43.667967,-79.367675
6,M5A,Downtown Toronto,0.484444,"Harbourfront, Regent Park",43.65426,-79.360636
7,M5H,Downtown Toronto,0.617042,"Adelaide, King, Richmond",43.650571,-79.384568
8,M5P,Central Toronto,0.957688,"Forest Hill North, Forest Hill West",43.696948,-79.411307
9,M5T,Downtown Toronto,0.477989,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049


In [565]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [566]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_tor['Latitude'], df_tor['Longitude'], df_tor['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [567]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


<h5>Create getNearbyVenues function to get nearby venues from Food category for all Toronto neighborhoods. This will give us all restaurants in the selected area.</h5>

In [568]:
def getNearbyVenues(names, prices, latitudes, longitudes, radius=500):
    
    LIMIT = 100
    venues_list=[]
    category_id='4d4b7105d754a06374d81259'
    for name, price, lat, lng in zip(names, prices, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            category_id)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            price,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'AvHomePrice',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
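`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises an error if a response has no groups or a venue has no category. A defensive variant of that extraction step might look like the sketch below. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mirrors the fields the function above reads and is not a real API response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Missing groups yield an empty list and a venue without categories
    gets None, instead of raising IndexError or KeyError.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        venues.append((
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,
        ))
    return venues

# Hypothetical payload shaped like the fields getNearbyVenues reads.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.65, 'lng': -79.38},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))
```

Separating the parsing from the HTTP call also makes it possible to check the extraction logic without network access or credentials.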

<h5>Call getNearbyVenues function on each neighborhood and create a new dataframe called toronto_venues</h5>

In [569]:
toronto_venues = getNearbyVenues(names=df_tor['Neighborhood'],
                                 prices=df_tor['AvHomePrice'],
                                   latitudes=df_tor['Latitude'],
                                   longitudes=df_tor['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
Lawrence Park
Moore Park, Summerhill East
Rosedale
Cabbagetown, St. James Town
Harbourfront, Regent Park
Adelaide, King, Richmond
Forest Hill North, Forest Hill West
Chinatown, Grange Park, Kensington Market
Dovercourt Village, Dufferin
Little Portugal, Trinity
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea


<h5>Check the size of the resulting dataframe, how many venues were returned for each neighborhood and how many unique categories can be curated from all the returned venues</h5>

In [570]:
print(toronto_venues.shape)
toronto_venues.head()

(367, 8)


Unnamed: 0,Neighborhood,AvHomePrice,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,0.751945,43.676357,-79.293031,My Bbq,43.676881,-79.289286,BBQ Joint
1,The Beaches,0.751945,43.676357,-79.293031,Domino's Pizza,43.679058,-79.297382,Pizza Place
2,The Beaches,0.751945,43.676357,-79.293031,Fearless Meat,43.680337,-79.290289,Burger Joint
3,The Beaches,0.751945,43.676357,-79.293031,Seaspray Restaurant,43.678888,-79.298167,Asian Restaurant
4,"The Danforth West, Riverdale",0.67784,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [571]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 75 unique categories.


<h2>Analyze Each Neighborhood</h2>

<h5>Check how many venues in Food category were returned for each neighborhood</h5>

In [572]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,AvHomePrice,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
"Adelaide, King, Richmond",96,96,96,96,96,96,96
"Cabbagetown, St. James Town",28,28,28,28,28,28,28
"Chinatown, Grange Park, Kensington Market",62,62,62,62,62,62,62
"Dovercourt Village, Dufferin",9,9,9,9,9,9,9
"Forest Hill North, Forest Hill West",5,5,5,5,5,5,5
"Harbourfront, Regent Park",26,26,26,26,26,26,26
"High Park, The Junction South",16,16,16,16,16,16,16
Lawrence Park,1,1,1,1,1,1,1
"Little Portugal, Trinity",44,44,44,44,44,44,44
"Moore Park, Summerhill East",2,2,2,2,2,2,2


In [573]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# add AvHomePrice column back to dataframe
toronto_onehot['AvHomePrice'] = toronto_venues['AvHomePrice'] 

# move AvHomePrice column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,AvHomePrice,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,...,South American Restaurant,Southern / Soul Food Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,The Beaches,0.751945,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0.751945,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0.751945,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0.751945,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"The Danforth West, Riverdale",0.67784,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


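<h4>Group rows by neighborhood, taking the mean frequency of occurrence of each category, and print each neighborhood along with its top 5 most common venues</h4>

Note that <b>AvHomePrice</b> is still a column of the grouped dataframe, so it shows up in the listings below as if it were a venue category; its value is the average house price in millions, not a frequency.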

In [574]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
num_top_venues = 5
for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0       AvHomePrice  0.62
1        Restaurant  0.07
2              Café  0.06
3  Asian Restaurant  0.06
4    Sandwich Place  0.06


----Cabbagetown, St. James Town----
                venue  freq
0         AvHomePrice  0.54
1         Pizza Place  0.14
2          Restaurant  0.14
3                Café  0.11
4  Italian Restaurant  0.07


----Chinatown, Grange Park, Kensington Market----
                           venue  freq
0                    AvHomePrice  0.48
1                           Café  0.11
2  Vegetarian / Vegan Restaurant  0.10
3          Vietnamese Restaurant  0.08
4                         Bakery  0.06


----Dovercourt Village, Dufferin----
                  venue  freq
0           AvHomePrice  0.50
1           Pizza Place  0.22
2                Bakery  0.22
3                  Café  0.11
4  Fast Food Restaurant  0.11


----Forest Hill North, Forest Hill West----
               venue  freq
0        AvHomePrice  0.96
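To rank only the venue categories, the non-category columns can be dropped before sorting. A minimal sketch on made-up frequencies follows; the frame mirrors the column layout of `toronto_grouped`, while the values and the helper name `top_categories` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-neighborhood frequencies; column layout mirrors toronto_grouped.
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'AvHomePrice': [0.62, 0.96],
    'Café': [0.10, 0.00],
    'Pizza Place': [0.30, 0.20],
    'Bakery': [0.05, 0.40],
})

def top_categories(grouped, hood, n=2):
    """Top-n venue categories for one neighborhood, ignoring non-category columns."""
    row = grouped.loc[grouped['Neighborhood'] == hood].iloc[0]
    freqs = row.drop(['Neighborhood', 'AvHomePrice']).astype(float)
    return freqs.sort_values(ascending=False).head(n)

print(top_categories(grouped, 'A'))
print(top_categories(grouped, 'B'))
```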


<h4>Create a function to sort the venues in descending order of frequency, then build a new dataframe that shows the top 10 venues for each neighborhood.</h4>

In [575]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [576]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood', 'AvHomePrice']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']
neighborhoods_venues_sorted['AvHomePrice'] = toronto_grouped['AvHomePrice']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(toronto_grouped.iloc[ind, 1:], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",0.617042,Restaurant,Sandwich Place,Café,Asian Restaurant,American Restaurant,Salad Place,Steakhouse,Deli / Bodega,Burger Joint,Thai Restaurant
1,"Cabbagetown, St. James Town",0.537025,Pizza Place,Restaurant,Café,Italian Restaurant,Chinese Restaurant,Sandwich Place,Gastropub,Indian Restaurant,Japanese Restaurant,Diner
2,"Chinatown, Grange Park, Kensington Market",0.477989,Café,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Chinese Restaurant,Bakery,Mexican Restaurant,Dim Sum Restaurant,Comfort Food Restaurant,Burger Joint,Caribbean Restaurant
3,"Dovercourt Village, Dufferin",0.502736,Pizza Place,Bakery,Brazilian Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Café,Fast Food Restaurant,Food,Food Court,Fish & Chips Shop
4,"Forest Hill North, Forest Hill West",0.957688,French Restaurant,Sushi Restaurant,Mexican Restaurant,Restaurant,Sandwich Place,Vietnamese Restaurant,Dumpling Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant
5,"Harbourfront, Regent Park",0.484444,Café,Bakery,Restaurant,Italian Restaurant,Mexican Restaurant,Breakfast Spot,Pizza Place,Sandwich Place,Greek Restaurant,Japanese Restaurant
6,"High Park, The Junction South",0.615948,Mexican Restaurant,Café,Irish Pub,Cajun / Creole Restaurant,Restaurant,Fast Food Restaurant,Sandwich Place,Diner,Fried Chicken Joint,Steakhouse
7,Lawrence Park,1.09811,Dim Sum Restaurant,Deli / Bodega,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Vietnamese Restaurant
8,"Little Portugal, Trinity",0.619435,Asian Restaurant,Restaurant,Vietnamese Restaurant,Café,New American Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Cuban Restaurant,French Restaurant,Bakery
9,"Moore Park, Summerhill East",1.265389,Restaurant,Italian Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant
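The column labels above rely on a three-element suffix list plus an exception for everything else, which is fine for ten columns. If more columns were ever needed, a small helper could produce the suffix directly. The function name `ordinal` is an assumption introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Same column layout as neighborhoods_venues_sorted.
columns = ['Neighborhood', 'AvHomePrice'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns)
```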


<h2>Cluster Neighborhoods</h2>

<h4>Run <i>k</i>-means to cluster the neighborhoods into 5 clusters.</h4>

In [577]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns=['Neighborhood', 'AvHomePrice'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 4, 0, 3], dtype=int32)
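The notebook fixes the number of clusters by hand. One common, informal way to sanity-check such a choice is to compare the k-means inertia (the within-cluster sum of squared distances) across several values of k. The sketch below does this on synthetic data that merely stands in for `toronto_grouped_clustering`; the data, the range of k and the variable names are assumptions, and nothing here reproduces the notebook's results.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for toronto_grouped_clustering: three loose groups of points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(10, 4)) for c in (0.0, 0.5, 1.0)])

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_   # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 3))
```

A value of k after which the inertia stops dropping sharply is often taken as a reasonable choice, although this is a heuristic rather than a rule.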

<h4>Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.</h4>

In [578]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_tor
toronto_merged = toronto_merged.drop(columns='AvHomePrice')

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [579]:
toronto_merged.dropna(how='any', inplace = True) 
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(np.int64)
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,0.751945,Burger Joint,Asian Restaurant,BBQ Joint,Pizza Place,Filipino Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,0.67784,Greek Restaurant,Pizza Place,Italian Restaurant,Sushi Restaurant,Café,Restaurant,Burger Joint,Indian Restaurant,Japanese Restaurant,Diner
2,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,1.09811,Dim Sum Restaurant,Deli / Bodega,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Vietnamese Restaurant
3,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,3,1.265389,Restaurant,Italian Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant
4,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,2,1.265389,Japanese Restaurant,Vietnamese Restaurant,Filipino Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant
5,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,0,0.537025,Pizza Place,Restaurant,Café,Italian Restaurant,Chinese Restaurant,Sandwich Place,Gastropub,Indian Restaurant,Japanese Restaurant,Diner
6,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0,0.484444,Café,Bakery,Restaurant,Italian Restaurant,Mexican Restaurant,Breakfast Spot,Pizza Place,Sandwich Place,Greek Restaurant,Japanese Restaurant
7,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,0,0.617042,Restaurant,Sandwich Place,Café,Asian Restaurant,American Restaurant,Salad Place,Steakhouse,Deli / Bodega,Burger Joint,Thai Restaurant
8,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307,0,0.957688,French Restaurant,Sushi Restaurant,Mexican Restaurant,Restaurant,Sandwich Place,Vietnamese Restaurant,Dumpling Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant
9,M5T,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049,0,0.477989,Café,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Chinese Restaurant,Bakery,Mexican Restaurant,Dim Sum Restaurant,Comfort Food Restaurant,Burger Joint,Caribbean Restaurant


<h4>Show resulting clusters</h4>

In [580]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 4. Results

<h2>Examine Clusters</h2>
<h4>Cluster 0</h4>

In [581]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,East Toronto,0,0.67784,Greek Restaurant,Pizza Place,Italian Restaurant,Sushi Restaurant,Café,Restaurant,Burger Joint,Indian Restaurant,Japanese Restaurant,Diner
5,Downtown Toronto,0,0.537025,Pizza Place,Restaurant,Café,Italian Restaurant,Chinese Restaurant,Sandwich Place,Gastropub,Indian Restaurant,Japanese Restaurant,Diner
6,Downtown Toronto,0,0.484444,Café,Bakery,Restaurant,Italian Restaurant,Mexican Restaurant,Breakfast Spot,Pizza Place,Sandwich Place,Greek Restaurant,Japanese Restaurant
7,Downtown Toronto,0,0.617042,Restaurant,Sandwich Place,Café,Asian Restaurant,American Restaurant,Salad Place,Steakhouse,Deli / Bodega,Burger Joint,Thai Restaurant
8,Central Toronto,0,0.957688,French Restaurant,Sushi Restaurant,Mexican Restaurant,Restaurant,Sandwich Place,Vietnamese Restaurant,Dumpling Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant
9,Downtown Toronto,0,0.477989,Café,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Chinese Restaurant,Bakery,Mexican Restaurant,Dim Sum Restaurant,Comfort Food Restaurant,Burger Joint,Caribbean Restaurant
10,West Toronto,0,0.502736,Pizza Place,Bakery,Brazilian Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Café,Fast Food Restaurant,Food,Food Court,Fish & Chips Shop
11,West Toronto,0,0.619435,Asian Restaurant,Restaurant,Vietnamese Restaurant,Café,New American Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Cuban Restaurant,French Restaurant,Bakery
12,West Toronto,0,0.615948,Mexican Restaurant,Café,Irish Pub,Cajun / Creole Restaurant,Restaurant,Fast Food Restaurant,Sandwich Place,Diner,Fried Chicken Joint,Steakhouse
13,West Toronto,0,0.540739,Breakfast Spot,Cuban Restaurant,Italian Restaurant,Restaurant,Eastern European Restaurant,Burger Joint,Deli / Bodega,Food Court,Food,Fish & Chips Shop


<h4>Cluster 1</h4>

In [582]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,1,0.751945,Burger Joint,Asian Restaurant,BBQ Joint,Pizza Place,Filipino Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant


<h4>Cluster 2</h4>

In [583]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,2,1.265389,Japanese Restaurant,Vietnamese Restaurant,Filipino Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant


<h4>Cluster 3</h4>

In [584]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Central Toronto,3,1.265389,Restaurant,Italian Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant


<h4>Cluster 4</h4>

In [585]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,AvHomePrice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Central Toronto,4,1.09811,Dim Sum Restaurant,Deli / Bodega,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Vietnamese Restaurant


Most of the neighborhoods fall into cluster 0; in the output above, each of the other four clusters contains a single neighborhood. 
In cluster 0, the most common venues in the neighborhoods are cafés and pizza places. 

## 5. Discussion

So where should we open a new restaurant? 
Judging by the housing price maps, the Lawrence Park neighborhood (cluster 4) might be a good candidate. The area appears to be quite densely populated, so we expect a lot of foot and car traffic and therefore good visibility. 
Foursquare also returned only one food venue within 500 m of its center, which suggests little direct competition. 
This neighborhood also has reasonable average house prices.
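The reasoning above can be made a little more explicit by putting the venue counts and average house prices side by side. The sketch below uses the ten neighborhoods shown in the tables earlier; the column name `FoodVenues` and the sort order (fewest existing food venues first) are assumptions chosen for illustration, and this is only one of many possible ways to rank candidates.

```python
import pandas as pd

# Venue counts and average home prices (millions) copied from the tables above.
candidates = pd.DataFrame({
    'Neighborhood': ['Adelaide, King, Richmond', 'Cabbagetown, St. James Town',
                     'Chinatown, Grange Park, Kensington Market',
                     'Dovercourt Village, Dufferin',
                     'Forest Hill North, Forest Hill West',
                     'Harbourfront, Regent Park', 'High Park, The Junction South',
                     'Lawrence Park', 'Little Portugal, Trinity',
                     'Moore Park, Summerhill East'],
    'FoodVenues': [96, 28, 62, 9, 5, 26, 16, 1, 44, 2],
    'AvHomePrice': [0.617042, 0.537025, 0.477989, 0.502736, 0.957688,
                    0.484444, 0.615948, 1.09811, 0.619435, 1.265389],
})

# Fewest competing food venues first; ties broken by higher house price.
ranked = candidates.sort_values(['FoodVenues', 'AvHomePrice'],
                                ascending=[True, False]).reset_index(drop=True)
print(ranked.head(3))
```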

## 6. Conclusion

This is only a first-order solution to the question 'Where to open a new restaurant in Toronto?' 
Using public datasets, we were able to partially address one of the factors mentioned at the beginning: average house prices.
There is certainly a lot of room for improvement. 
For example, we still have to factor in crime rates, competition, etc. 
The Toronto Open Data website should have other datasets that we could use to further improve the results.