<h1> The Battle of the Neighbourhoods: Retail Therapy </h1>

<h2> Introduction </h2>
<p>Toronto is the most populous city in Canada with a recorded population of nearly 3 million people. It is the capital city of the province of Ontario and is widely recognised as one of the most multicultural and cosmopolitan cities in the world. As an international centre of business, finance, arts and culture, Toronto is extremely popular with tourists and residents alike. </p>
<h3> Business Problem </h3>
<p>My client is the chief executive officer (CEO) of a large retail clothing business and has asked me, as a budding data scientist, to decide on the most suitable neighbourhood in Toronto for a new store. My client has stressed that retail success on the high street boils down to four factors: great products, attentive customer service, consistently high footfall, and convenient parking. </p>
<p>While the products themselves and customer service are not my responsibility, I can leverage the Foursquare API to ensure that the recommended neighbourhoods have busy streets and nearby parking locations. Additionally, I will be clustering Toronto’s neighbourhoods by popular venues to determine which of them can be considered hot-spots for retail outlets and eateries. </p>

<h2> The Data </h2>
<h3> Required data </h3>
<p>The following data will be required to provide an accurate recommendation to my client:</p>
<ul>
    <li>A list of Toronto’s neighbourhoods, with latitude and longitude coordinates, calculated by geopy’s Nominatim</li>
    <li>A list of the most popular venues for each postal code region retrieved via the Foursquare API</li>
    <li>A list of suitable car parks for each postal code region retrieved via the Foursquare API</li>
    <li>A list of total population for each postal code region retrieved via Statistics Canada [1]</li>
</ul>
<h3> Assumptions </h3>
<p>Some assumptions will be made to keep this project relatively simple. Firstly, I am assuming that all postal codes cover the same land area, so that population density is directly proportional to total population. Additionally, I am assuming that while total populations may have changed since this data was curated (2016), postal code population sizes will be largely similar in relation to each other. Furthermore, I will assume that all car parks retrieved from the Foursquare API are suitable and that only one car park is required to meet the parking criterion outlined in the first section.</p>

<h3> Import necessary Python libraries </h3>

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import requests
print('Libraries imported.')

Libraries imported.


<h3> Get the table of neighbourhoods and postal codes and read into a dataframe </h3>

In [2]:
link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
tables = pd.read_html(link, header=0)
df=pd.DataFrame(tables[0])

<h3>Ignore cells with a borough that is Not assigned </h3>

In [3]:
df.drop(df[df['Borough']=="Not assigned"].index,axis=0, inplace=True)

<h3> Combine rows with the same Postal Code </h3>

In [4]:
df_pc=df.groupby("Postal Code", as_index=False).agg(lambda neighbourhood:','.join(set(neighbourhood)))
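
To illustrate the grouping step above, here is a minimal sketch on a toy frame. The rows are made up for the example, and the sketch joins the names with `sorted` rather than a bare `set` so that the order of the combined string is repeatable. That is a swapped-in choice, not what the cell above does.

```python
import pandas as pd

# Toy rows (illustrative values only, not the scraped Wikipedia table)
toy = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M1B"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Malvern"],
})

# Keep the first borough per postal code and join the distinct
# neighbourhood names in alphabetical order
combined = toy.groupby("Postal Code", as_index=False).agg({
    "Borough": "first",
    "Neighbourhood": lambda names: ", ".join(sorted(set(names))),
})
print(combined)
```

Sorting the names costs very little and means that two runs of the notebook produce identical strings, which matters later when neighbourhood names are used as join keys.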

<h3>If a cell has a borough but a "Not assigned" neighbourhood, then the neighbourhood will be the same as the borough</h3>

In [5]:
df_pc.loc[df_pc['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = ...
df_pc.loc[df_pc['Neighbourhood'] == 'Not assigned', 'Borough']

Series([], Name: Borough, dtype: object)
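
The right-hand side of the assignment in the cell above is elided, so the sketch below is only one way the rule in the heading could be written. It uses a toy frame with made-up rows and a boolean mask, then counts how many "Not assigned" values are left.

```python
import pandas as pd

# Toy rows (illustrative values only)
toy = pd.DataFrame({
    "Borough": ["Queen's Park", "Scarborough"],
    "Neighbourhood": ["Not assigned", "Woburn"],
})

# Select rows whose neighbourhood is missing and copy the borough across
unassigned = toy["Neighbourhood"] == "Not assigned"
toy.loc[unassigned, "Neighbourhood"] = toy.loc[unassigned, "Borough"]

# Count any rows that are still unassigned
remaining = int((toy["Neighbourhood"] == "Not assigned").sum())
print(toy)
print(remaining)
```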

<h3> Let's take a look at what we have so far! </h3>

In [6]:
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


<h3> Adding Latitude and Longitude </h3>

In [7]:
coordinates_df = pd.read_csv('http://cocl.us/Geospatial_data')
df_pc = pd.merge(df_pc, coordinates_df, on='Postal Code')
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


<h3> Adding population statistics </h3>

In [8]:
population_df = pd.read_csv('Data\\toronto_population.csv')

In [9]:
df_pc = pd.merge(df_pc, population_df, on='Postal Code')
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Population
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,66108
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,35626
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,46943
3,M1G,Scarborough,Woburn,43.770992,-79.216917,29690
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,24383


<h3> Ignore cells with zero population</h3>
<p>I am assuming that it is safe to drop postal codes with a population of zero.</p>
<p>This will also cover postal codes with an unassigned population. M7R is the only postcode in this dataset without a population assigned. This is because it refers to the city of Mississauga, which can safely be considered as outside of Toronto and hence not relevant for this project. </p>

In [10]:
df_pc.drop(df_pc[df_pc['Population']==0].index,axis=0, inplace=True)
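
The point about postal codes without a population can be checked rather than assumed. The sketch below uses a left merge with `indicator=True` to list codes that have no match in the population table. The two small frames are stand-ins, and the assumption that M7R is absent from the population file (rather than present with a value of zero) is mine.

```python
import pandas as pd

# Stand-in frames; the real data comes from the earlier cells
codes = pd.DataFrame({"Postal Code": ["M1B", "M7R"]})
population = pd.DataFrame({"Postal Code": ["M1B"], "Population": [66108]})

# A left merge keeps every postal code; the _merge column records
# whether a matching population row was found
checked = pd.merge(codes, population, on="Postal Code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An inner merge, which is the default for `pd.merge`, would drop such rows silently, so a check like this makes the loss visible before it happens.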

<h3> Plot population for each Neighbourhood and drop those below the mean </h3>

In [11]:
from matplotlib.pyplot import figure
df_pc.plot(x="Neighbourhood", y="Population", kind="bar", figsize=(15,5))

<matplotlib.axes._subplots.AxesSubplot at 0x24ac3183978>

In [12]:
population_mean = df_pc["Population"].mean()
df_pc = df_pc[df_pc.Population > population_mean]
df_pc

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Population
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,66108
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,35626
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,46943
3,M1G,Scarborough,Woburn,43.770992,-79.216917,29690
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,36699
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,48434
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,35081
10,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,45571
11,M1R,Scarborough,"Wexford, Maryvale",43.750072,-79.295849,29858
12,M1S,Scarborough,Agincourt,43.7942,-79.262029,37769
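
Using the mean as the cut-off is a design choice worth a second look, because the mean is pulled upwards by a few very large postal codes. The sketch below compares a mean threshold with a median threshold on made-up populations. The codes A to D and their values are hypothetical.

```python
import pandas as pd

# Hypothetical populations with one large outlier
toy = pd.DataFrame({
    "Postal Code": ["A", "B", "C", "D"],
    "Population": [10000, 20000, 30000, 100000],
})

# Rows strictly above the mean and strictly above the median
above_mean = toy[toy["Population"] > toy["Population"].mean()]
above_median = toy[toy["Population"] > toy["Population"].median()]

print(above_mean["Postal Code"].tolist())
print(above_median["Postal Code"].tolist())
```

With skewed data the mean threshold keeps fewer rows than the median threshold, so the choice between them changes how many neighbourhoods survive this step.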


<h3> Obtain all parking lots in Toronto and identify the number of unique neighbourhoods that have access to a parking lot </h3>

In [13]:
CLIENT_ID = 'GXVOFCDKFJLQYEHM11FA1MBHGT2MVV0OXNYUB5KMC0QS2XZV'
CLIENT_SECRET = 'XKNWJEXJCNMZXPPUA2OFQ2KEVWLQ4BR0D0PKJOPLH3LQFBHX'
VERSION = '20200409' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GXVOFCDKFJLQYEHM11FA1MBHGT2MVV0OXNYUB5KMC0QS2XZV
CLIENT_SECRET:XKNWJEXJCNMZXPPUA2OFQ2KEVWLQ4BR0D0PKJOPLH3LQFBHX
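
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. A common alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices for this example, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables."""
    # Fall back to the process environment when no mapping is supplied
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret

# Usage in the notebook (assumes both variables were exported beforehand):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```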


In [14]:
def getNearbyParking(names, latitudes, longitudes, radius=1000, limit=100):
    
    parking_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query=Parking'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        parking_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_parking = pd.DataFrame([item for neighbourhood_items in parking_list for item in neighbourhood_items])
    nearby_parking.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Car Park', 
                  'Car Park Latitude', 
                  'Car Park Longitude', 
                  'Car Park Category']
    
    return(nearby_parking)
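
The function above indexes straight into the JSON response, so an empty `groups` list or a venue without categories would raise an error part-way through the loop. The sketch below separates the parsing into a hypothetical helper, `extract_venue_rows`, that tolerates those cases. The response layout is taken from the indexing used in the function above, and the payload in the example is fabricated for illustration only.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response dict into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Fall back to an empty category when the list is missing or empty
        categories = venue.get("categories") or [{}]
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name"),
        ))
    return rows

# Fabricated payload, shaped like the response the function above expects
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lot A", "location": {"lat": 43.70, "lng": -79.40},
               "categories": [{"name": "Parking"}]}},
    {"venue": {"name": "Lot B", "location": {"lat": 43.71, "lng": -79.41},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Example", 43.70, -79.40, fake_payload)
print(rows)
```

Keeping the parsing separate from the HTTP call also means it can be exercised without spending any API quota.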

In [15]:
toronto_parking = getNearbyParking(names=df_pc['Neighbourhood'],
                                   latitudes=df_pc['Latitude'],
                                   longitudes=df_pc['Longitude'])
toronto_parking.head()

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Fairview, Henry Farm, Oriole
Willowdale, Newtonbrook
Willowdale, Willowdale East
Willowdale, Willowdale West
Parkwoods
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Downsview
Woodbine Heights
East Toronto, Broadview North (Old East York)
The Danforth West, Riverdale
India Bazaar, The Beaches West
Church and Wellesley
Regent Park, Harbourfront
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Glencairn
Caledonia-Fairbanks
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Plac

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Car Park,Car Park Latitude,Car Park Longitude,Car Park Category
0,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,"Re/Max West Realty Inc., Brokerage",43.783623,-79.169489,Office
1,Scarborough Village,43.744734,-79.239476,Eglinton Go Station Parking Lot,43.739985,-79.231362,Parking
2,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,Kennedy Station - South Parking Lot,43.731911,-79.262755,Parking
3,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,Kennedy Station - North Lot,43.7332,-79.263298,Parking
4,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,Kennedy Station - East Parking Lot,43.733258,-79.263618,Parking


<h3> Drop any results that aren't in the Parking venue category </h3>

In [16]:
toronto_parking.drop(toronto_parking[toronto_parking['Car Park Category'] != 'Parking'].index,axis=0, inplace=True)

In [17]:
print('There are {} unique parking lots in Toronto.'.format(len(toronto_parking['Car Park'].unique())))
print('There are {} neighbourhoods with access to at least one parking lot.'.format(len(toronto_parking['Neighbourhood'].unique())))

There are 44 unique parking lots in Toronto.
There are 24 neighbourhoods with access to at least one parking lot.
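
Counting unique car parks by name alone can undercount when two different lots share a name. The sketch below, on a toy frame with hypothetical lots, compares a count by name with a count by name and coordinates.

```python
import pandas as pd

# Hypothetical car parks: two different lots share the name "Municipal Lot"
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "B"],
    "Car Park": ["Municipal Lot", "Station Lot", "Municipal Lot"],
    "Car Park Latitude": [43.70, 43.71, 43.80],
    "Car Park Longitude": [-79.40, -79.41, -79.30],
})

# Unique by name only
by_name = toy["Car Park"].nunique()

# Unique by name and location
by_location = len(toy.drop_duplicates(
    subset=["Car Park", "Car Park Latitude", "Car Park Longitude"]
))
print(by_name, by_location)
```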


In [18]:
neighbourhood_parking = toronto_parking['Neighbourhood'].unique()
df_pc = df_pc[df_pc['Neighbourhood'].isin(neighbourhood_parking)]
df_pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Population
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,36699
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,48434
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,35081
10,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,45571
12,M1S,Scarborough,Agincourt,43.7942,-79.262029,37769


In [19]:
print('There are {} neighbourhoods with access to at least one parking lot.'.format(len(df_pc['Neighbourhood'].unique())))

There are 24 neighbourhoods with access to at least one parking lot.


<h3> Using the Foursquare API to obtain the top 100 venues for each neighbourhood in Toronto </h3>
<p> I am assuming that the top venues are within walking distance (1 km) of the postal code coordinates. </p>
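
The 1 km assumption can be spot-checked with the haversine formula, which gives the great-circle distance between two latitude and longitude pairs. The sketch below is a minimal version assuming a spherical Earth of radius 6,371 km. The helper name `haversine_m` is mine, and the two points are the Scarborough Village centroid and the Diamond Pizza venue as they appear in this notebook's tables.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points on a sphere."""
    earth_radius_m = 6371000.0
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Scarborough Village centroid to Diamond Pizza
distance = haversine_m(43.744734, -79.239476, 43.743699, -79.245922)
within_radius = distance <= 1000
print(round(distance), within_radius)
```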

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

<h3> Use the above method to obtain the top 100 venues for each neighbourhood and store them in a dataframe </h3>

In [21]:
toronto_venues = getNearbyVenues(names=df_pc['Neighbourhood'],
                                   latitudes=df_pc['Latitude'],
                                   longitudes=df_pc['Longitude'])
toronto_venues.head()

Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Dorset Park, Wexford Heights, Scarborough Town Centre
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Fairview, Henry Farm, Oriole
Willowdale, Newtonbrook
Willowdale, Willowdale East
Don Mills
East Toronto, Broadview North (Old East York)
The Danforth West, Riverdale
India Bazaar, The Beaches West
Church and Wellesley
Regent Park, Harbourfront
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Glencairn
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High Park, The Junction South
Runnymede, Swansea
New Toronto, Mimico South, Humber Bay Shores


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Scarborough Village,43.744734,-79.239476,Diamond Pizza,43.743699,-79.245922,Pizza Place
1,Scarborough Village,43.744734,-79.239476,Tim Hortons,43.738992,-79.238961,Coffee Shop
2,Scarborough Village,43.744734,-79.239476,Dairy Queen,43.739506,-79.236894,Ice Cream Shop
3,Scarborough Village,43.744734,-79.239476,Dairy Queen,43.73958,-79.236991,Ice Cream Shop
4,Scarborough Village,43.744734,-79.239476,Subway,43.738284,-79.236792,Sandwich Place


In [22]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,43,43,43,43,43,43
"Brockton, Parkdale Village, Exhibition Place",100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100
"Clarks Corners, Tam O'Shanter, Sullivan",40,40,40,40,40,40
Don Mills,44,44,44,44,44,44
"Dorset Park, Wexford Heights, Scarborough Town Centre",43,43,43,43,43,43
"Dufferin, Dovercourt Village",68,68,68,68,68,68
"East Toronto, Broadview North (Old East York)",99,99,99,99,99,99


In [23]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 241 unique categories.


<h2>The Method</h2>

We now have a dataframe consisting of 24 unique neighbourhoods (true as of 4th Sept 2020), each with up to 100 of its top venues. We will now one-hot encode the venue categories, which is the first step towards performing k-means clustering on the data set.

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Arts & Crafts Store,...,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,Scarborough Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Scarborough Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Scarborough Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Scarborough Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Scarborough Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
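
To make the encoding concrete, here is a minimal sketch on a three-row toy frame with made-up venues. Each venue becomes one row with a single 1 in the column for its category. The sketch passes `dtype=int` so the result shows 0 and 1 regardless of pandas version, and uses `insert` to place the neighbourhood column first. Both are small deviations from the cell above.

```python
import pandas as pd

# Toy venues (illustrative values only)
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "B"],
    "Venue Category": ["Coffee Shop", "Park", "Park"],
})

# One column per category, one row per venue
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Put the neighbourhood name in the first column
onehot.insert(0, "Neighbourhood", toy["Neighbourhood"])
print(onehot)
```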


<h3> Let's group rows by neighbourhood and take the mean of the frequency of occurrence of each category </h3>

In [25]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Arts & Crafts Store,...,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,...,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Christie,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0
4,Church and Wellesley,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,...,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02
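
Taking the mean of one-hot rows gives, for each neighbourhood, the share of its venues that fall in each category, so every row sums to 1. As a cross-check, the same table can be built directly with `pd.crosstab` and `normalize="index"`. The sketch below does this on made-up venues.

```python
import pandas as pd

# Toy venues (illustrative values only)
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park",
                       "Pizza Place", "Park", "Park"],
})

# Share of each category within each neighbourhood
freq = pd.crosstab(toy["Neighbourhood"], toy["Venue Category"], normalize="index")
print(freq)
```

Because the features are shares rather than counts, a neighbourhood with 15 venues and one with 100 are placed on the same scale, which is worth keeping in mind when interpreting the clusters.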


<h3> Let's print each neighbourhood along with the top 5 most common venues </h3>

In [26]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.14
1         Shopping Mall  0.09
2        Sandwich Place  0.05
3  Caribbean Restaurant  0.05
4                Bakery  0.05


----Brockton, Parkdale Village, Exhibition Place----
                    venue  freq
0                    Café  0.08
1             Coffee Shop  0.06
2                     Bar  0.05
3                  Bakery  0.04
4  Furniture / Home Store  0.04


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0              Café  0.13
1       Coffee Shop  0.13
2   Harbor / Marina  0.13
3    Scenic Lookout  0.07
4  Sculpture Garden  0.07


----Christie----
                venue  freq
0   Korean Restaurant  0.12
1                Café  0.07
2         Coffee Shop  0.07
3       Grocery Store  0.06
4  Mexican Restaurant  0.03


----Church and Wellesley----
                 venue  freq
0          Coffee Shop  0.10
1  Jap

<h3>A function to sort the venues in descending order.</h3>

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
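
When several categories share the same frequency, the order returned by a plain sort is not guaranteed, so tied venues can appear in a different order from one table to the next. The sketch below is a hypothetical variant, `top_categories`, that breaks ties alphabetically so the ranking is repeatable. It is an alternative for illustration, not the function used in this notebook.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    """Return the most frequent categories, breaking ties by name."""
    # Skip the first entry, which holds the neighbourhood name
    freqs = row.iloc[1:].astype(float)
    # Sort by descending frequency, then alphabetically
    ordered = sorted(freqs.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ordered[:num_top_venues]]

# Toy row with a tie between two categories
toy_row = pd.Series({"Neighbourhood": "X", "Park": 0.13, "Cafe": 0.13, "Bar": 0.07})
print(top_categories(toy_row, 2))
```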

<h3>Now let's create the new dataframe and display the top 10 venues for each neighborhood.</h3>

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Shopping Mall,Sandwich Place,Caribbean Restaurant,Bakery,Coffee Shop,Pizza Place,Skating Rink,Seafood Restaurant,Cantonese Restaurant
1,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Bar,Bakery,Furniture / Home Store,Restaurant,Tibetan Restaurant,Gift Shop,Thrift / Vintage Store,Italian Restaurant
2,"CN Tower, King and Spadina, Railway Lands, Har...",Harbor / Marina,Coffee Shop,Café,Sculpture Garden,Airport,Airport Lounge,Dance Studio,Scenic Lookout,Dog Run,Track
3,Christie,Korean Restaurant,Coffee Shop,Café,Grocery Store,Mexican Restaurant,Cocktail Bar,Ice Cream Shop,Diner,Ethiopian Restaurant,Vegetarian / Vegan Restaurant
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Café,Restaurant,Italian Restaurant,Men's Store,Bookstore,Dance Studio


<h2>Use k-means clustering to cluster the neighbourhoods into 7 clusters! </h2>

In [29]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 7

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([0, 5, 1, 5, 0, 5, 0, 0, 5, 5, 0, 6, 4, 5, 0, 3, 5, 2, 5, 5, 6, 5,
       5, 0])
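
The value of `kclusters` is chosen by hand above. One common way to sanity-check such a choice is the silhouette score, which is higher when points sit close to their own cluster and far from the others. The sketch below runs that check on synthetic data with three well-separated groups. It is not run on the Toronto data, and all names in it are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 20 points each
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centres])

# Score each candidate number of clusters
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the candidate with the highest silhouette score
best_k = max(scores, key=scores.get)
print(best_k)
```

The same loop could be pointed at `toronto_grouped_clustering`, bearing in mind that with only 24 rows the scores will be noisy.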

<h3>Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.</h3>

In [30]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_pc

# join the sorted venue table onto df_pc so each row also has latitude/longitude
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head()# check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,36699,6,Ice Cream Shop,Convenience Store,Train Station,Fast Food Restaurant,Bowling Alley,Sandwich Place,Restaurant,Grocery Store,Japanese Restaurant,Pizza Place
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,48434,3,Chinese Restaurant,Discount Store,Coffee Shop,Pizza Place,Fast Food Restaurant,Grocery Store,Asian Restaurant,Light Rail Station,Sandwich Place,Bank
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,35081,4,Intersection,Bus Line,Diner,Coffee Shop,Bakery,Pharmacy,Pub,Bus Station,Soccer Field,Fast Food Restaurant
10,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,45571,0,Furniture / Home Store,Coffee Shop,Asian Restaurant,Light Rail Station,Fast Food Restaurant,Pharmacy,Chinese Restaurant,Indian Restaurant,Restaurant,Bowling Alley
12,M1S,Scarborough,Agincourt,43.7942,-79.262029,37769,0,Chinese Restaurant,Shopping Mall,Sandwich Place,Caribbean Restaurant,Bakery,Coffee Shop,Pizza Place,Skating Rink,Seafood Restaurant,Cantonese Restaurant


In [31]:
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import matplotlib.cm as cm
import matplotlib.colors as colors

address = 'Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


<h2>Examine Clusters</h2>

Now that we have separated the remaining neighbourhoods into 7 distinct clusters based on their most popular venues, we should take a look at the categories of venues in each cluster and determine what makes them unique. This should aid us in making an accurate recommendation to the client. At a cursory glance, the map above might suggest that a neighbourhood in cluster 1 may be a good recommendation due to its close proximity to the centre of Toronto.
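
Before reading the individual tables, it helps to know how large each cluster is. The sketch below counts cluster sizes with `np.bincount`, using the label array copied from the `kmeans.labels_` output earlier in this notebook.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above
labels = np.array([0, 5, 1, 5, 0, 5, 0, 0, 5, 5, 0, 6,
                   4, 5, 0, 3, 5, 2, 5, 5, 6, 5, 5, 0])

# Number of neighbourhoods in each cluster, indexed by cluster label
sizes = np.bincount(labels)

# Clusters that contain a single neighbourhood
singletons = [int(c) for c in np.flatnonzero(sizes == 1)]
print(sizes.tolist())
print(singletons)
```

Clusters 0 and 5 hold most of the neighbourhoods, while clusters 1 to 4 contain one neighbourhood each, so those four behave more like outliers than like groups.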

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Scarborough,45571,0,Furniture / Home Store,Coffee Shop,Asian Restaurant,Light Rail Station,Fast Food Restaurant,Pharmacy,Chinese Restaurant,Indian Restaurant,Restaurant,Bowling Alley
12,Scarborough,37769,0,Chinese Restaurant,Shopping Mall,Sandwich Place,Caribbean Restaurant,Bakery,Coffee Shop,Pizza Place,Skating Rink,Seafood Restaurant,Cantonese Restaurant
18,North York,58293,0,Coffee Shop,Clothing Store,Restaurant,Japanese Restaurant,Juice Bar,Bank,Bakery,Sandwich Place,Chocolate Shop,Movie Theater
22,North York,75897,0,Coffee Shop,Ramen Restaurant,Bubble Tea Shop,Korean Restaurant,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Sandwich Place,Sushi Restaurant,Bank
27,North York,39153,0,Restaurant,Gym,Japanese Restaurant,Beer Store,Supermarket,Coffee Shop,Asian Restaurant,Middle Eastern Restaurant,Intersection,Italian Restaurant
42,East Toronto,32640,0,Indian Restaurant,Coffee Shop,Café,Beach,Park,Restaurant,Burrito Place,Bus Stop,Sandwich Place,Fast Food Restaurant
52,Downtown Toronto,30472,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Café,Restaurant,Italian Restaurant,Men's Store,Bookstore,Dance Studio


In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Downtown Toronto,49195,1,Harbor / Marina,Coffee Shop,Café,Sculpture Garden,Airport,Airport Lounge,Dance Studio,Scenic Lookout,Dog Run,Track


In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
87,Etobicoke,37975,2,Mexican Restaurant,Park,Restaurant,Café,Pizza Place,Skating Rink,Breakfast Spot,Liquor Store,Fried Chicken Joint,Fast Food Restaurant


In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,48434,3,Chinese Restaurant,Discount Store,Coffee Shop,Pizza Place,Fast Food Restaurant,Grocery Store,Asian Restaurant,Light Rail Station,Sandwich Place,Bank


In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Scarborough,35081,4,Intersection,Bus Line,Diner,Coffee Shop,Bakery,Pharmacy,Pub,Bus Station,Soccer Field,Fast Food Restaurant


In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Scarborough,34588,5,Fast Food Restaurant,Coffee Shop,Pharmacy,Bank,Sandwich Place,Intersection,Pizza Place,Convenience Store,Deli / Bodega,Noodle House
21,North York,32320,5,Korean Restaurant,Café,Pizza Place,Park,Middle Eastern Restaurant,Coffee Shop,Diner,Japanese Restaurant,Supermarket,Sushi Restaurant
40,East York,35738,5,Café,Coffee Shop,Greek Restaurant,Pizza Place,Convenience Store,Bakery,Ethiopian Restaurant,Pharmacy,Fast Food Restaurant,Beer Bar
41,East Toronto,31583,5,Greek Restaurant,Coffee Shop,Café,Pub,Italian Restaurant,Fast Food Restaurant,Bank,Bakery,Restaurant,Ramen Restaurant
53,Downtown Toronto,41078,5,Coffee Shop,Café,Diner,Pub,Theater,Park,Bakery,Restaurant,Breakfast Spot,Italian Restaurant
75,Downtown Toronto,32086,5,Korean Restaurant,Coffee Shop,Café,Grocery Store,Mexican Restaurant,Cocktail Bar,Ice Cream Shop,Diner,Ethiopian Restaurant,Vegetarian / Vegan Restaurant
76,West Toronto,44950,5,Coffee Shop,Café,Park,Italian Restaurant,Bar,Sushi Restaurant,Pharmacy,Brewery,Gourmet Shop,Bakery
77,West Toronto,32684,5,Café,Bar,Bakery,Restaurant,Italian Restaurant,Pizza Place,Coffee Shop,Asian Restaurant,Cocktail Bar,Vegetarian / Vegan Restaurant
78,West Toronto,40957,5,Café,Coffee Shop,Bar,Bakery,Furniture / Home Store,Restaurant,Tibetan Restaurant,Gift Shop,Thrift / Vintage Store,Italian Restaurant
82,West Toronto,40035,5,Café,Bar,Coffee Shop,Thai Restaurant,Sushi Restaurant,Grocery Store,Convenience Store,Italian Restaurant,Park,Cajun / Creole Restaurant


In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,36699,6,Ice Cream Shop,Convenience Store,Train Station,Fast Food Restaurant,Bowling Alley,Sandwich Place,Restaurant,Grocery Store,Japanese Restaurant,Pizza Place
72,North York,28522,6,Grocery Store,Fast Food Restaurant,Pizza Place,Coffee Shop,Gas Station,Italian Restaurant,Furniture / Home Store,Park,Pub,Shoe Store


In [39]:
toronto_merged[toronto_merged["Cluster Labels"] == 0]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,M1P,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,45571,0,Furniture / Home Store,Coffee Shop,Asian Restaurant,Light Rail Station,Fast Food Restaurant,Pharmacy,Chinese Restaurant,Indian Restaurant,Restaurant,Bowling Alley
12,M1S,Scarborough,Agincourt,43.7942,-79.262029,37769,0,Chinese Restaurant,Shopping Mall,Sandwich Place,Caribbean Restaurant,Bakery,Coffee Shop,Pizza Place,Skating Rink,Seafood Restaurant,Cantonese Restaurant
18,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,58293,0,Coffee Shop,Clothing Store,Restaurant,Japanese Restaurant,Juice Bar,Bank,Bakery,Sandwich Place,Chocolate Shop,Movie Theater
22,M2N,North York,"Willowdale, Willowdale East",43.77012,-79.408493,75897,0,Coffee Shop,Ramen Restaurant,Bubble Tea Shop,Korean Restaurant,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Sandwich Place,Sushi Restaurant,Bank
27,M3C,North York,Don Mills,43.7259,-79.340923,39153,0,Restaurant,Gym,Japanese Restaurant,Beer Store,Supermarket,Coffee Shop,Asian Restaurant,Middle Eastern Restaurant,Intersection,Italian Restaurant
42,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,32640,0,Indian Restaurant,Coffee Shop,Café,Beach,Park,Restaurant,Burrito Place,Bus Stop,Sandwich Place,Fast Food Restaurant
52,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,30472,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Café,Restaurant,Italian Restaurant,Men's Store,Bookstore,Dance Studio


<h2> References </h2>
<ol>
    <li> Statistics Canada. 2017. Population and dwelling counts, for Canada and forward sortation areas© as reported by the respondents, 2016 Census (table). Population and Dwelling Count Highlight Tables. 2016 Census. </li>
</ol>