# Segmenting and Clustering Neighborhoods in Toronto

This is the week 3 assignment for the Applied Data Science Capstone course.

### Part 1: Getting and cleaning the Neighborhoods of Toronto Data

To start, we must first download a table of the Neighborhoods of Toronto.
For this project we will scrape the data from the following web page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

We will use the _pandas_ library for scraping the table.

In [1]:
# import necessary libraries for scraping

import pandas as pd
import numpy as np

We will use the _pandas_ library to download and create a dataframe from the web page.

In [2]:
# define inputs
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

# use read_html function to read the table
url_df = pd.read_html(url)

# print the number of tables from the web page
print('The url has',len(url_df), 'tables.')

The url has 3 tables.


Since the web page has multiple tables, we must specify which one to use. Here we want the first table on the page, which holds our list of Toronto neighborhoods.

Additionally, several Postal Codes in the table have not been assigned a Borough or Neighborhood. We will remove those rows from the dataframe.

In [3]:
# specify the 1st table in the webpage to get the appropriate dataframe
tor_n_df = url_df[0]

# remove the 'Not assigned' Postal Codes from the data frame. Reset the index to the new dataframe.
tor_n_df = tor_n_df[tor_n_df.Borough != 'Not assigned'].reset_index(drop=True)
tor_n_df = tor_n_df[tor_n_df['Postal Code'] != 'M7R'].reset_index(drop=True)

tor_n_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
98,M4Y,Downtown Toronto,Church and Wellesley
99,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
100,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


Now the _tor_n_df_ dataframe created above has a list of all Toronto Neighborhoods with an assigned postal code.

Below we will print the number of rows in the dataframe to confirm the shape.

In [4]:
tor_n_df.shape

(102, 3)
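
As a small illustration of the filtering step, the sketch below applies the same kind of boolean mask to a hand-made table. The three rows are invented for the example and are not taken from the Wikipedia page.

```python
import pandas as pd

# toy table with the same columns as the scraped one (rows are made up)
toy_df = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# keep only rows whose Borough has been assigned, then renumber the index
toy_clean = toy_df[toy_df['Borough'] != 'Not assigned'].reset_index(drop=True)

print(toy_clean.shape)
```

Resetting the index with `drop=True` discards the old row numbers, so the cleaned table is numbered from 0 again.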

***

### Part 2: Adding Geo Coordinates to the Toronto Neighborhoods dataframe

In the second section of the assignment we will add the latitude and longitude coordinates for each neighborhood to the existing dataframe.

To do this we will use the _pgeocode_ Python library, which can look up Canadian geo coordinates by postal code.

In [5]:
# install and import pgeocode library
%pip install pgeocode
import pgeocode

print('Library installed')

Note: you may need to restart the kernel to use updated packages.
Library installed


Next we will use a for loop to determine the geo coordinates of each neighborhood using the pgeocode library.

We will append the Latitude and Longitude for each Postal Code to the dataframe and then print out the new dataframe.

This will complete Part 2 of the assignment.

In [6]:
# Pull list of postal codes that we need geo coordinates for.
postal_codes = tor_n_df['Postal Code']

# initialize geo coordinate variable to add coordinates to.
latitude = []
longitude = []

# create the geocoder once, then loop through the postal codes to look up each one
nomi = pgeocode.Nominatim('ca')
for ps in postal_codes:
    g = nomi.query_postal_code(ps)
    latitude.append(g.latitude)
    longitude.append(g.longitude)

# add Latitude and Longitude columns to the neighborhoods dataframe
tor_n_df['Latitude'] = latitude
tor_n_df['Longitude'] = longitude

# confirm the dataframe now has the appropriate geo coordinates
tor_n_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.3300
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889
...,...,...,...,...,...
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.6518,-79.5076
98,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.3830
99,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.7804,-79.2505
100,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.6325,-79.4939
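
The lookup loop only needs a callable that turns a postal code into something with `latitude` and `longitude` attributes. Below is a minimal sketch of that pattern with a stand-in lookup, so it runs without installing or downloading anything. `lookup_coordinates`, `fake_query`, and `_FAKE_TABLE` are hypothetical names, and the two coordinate pairs are copied from the table above.

```python
from types import SimpleNamespace

def lookup_coordinates(postal_codes, query):
    """Call `query` once per postal code and collect latitude/longitude lists."""
    latitudes, longitudes = [], []
    for code in postal_codes:
        result = query(code)
        latitudes.append(result.latitude)
        longitudes.append(result.longitude)
    return latitudes, longitudes

# stand-in for a real geocoder lookup (hypothetical, two rows only)
_FAKE_TABLE = {'M3A': (43.7545, -79.3300), 'M4A': (43.7276, -79.3148)}

def fake_query(code):
    lat, lon = _FAKE_TABLE[code]
    return SimpleNamespace(latitude=lat, longitude=lon)

lats, lons = lookup_coordinates(['M3A', 'M4A'], fake_query)
print(lats, lons)
```

In the notebook, the `query` argument would presumably be the `query_postal_code` method of a single `pgeocode.Nominatim('ca')` object.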


***

### Part 3: Clustering Neighborhoods - Gourmet vs Franchise coffee aficionados

This clustering exercise compares neighborhoods based on Gourmet vs Franchise coffee aficionados. To do this we will cluster Toronto neighborhoods based on the mix of Franchise and Gourmet coffee shops near each one.

To get started, install matplotlib and folium for map visualization, geopy to pull Toronto's city coordinates, and scikit-learn for K-Means clustering.

In [7]:
# import necessary libraries for remainder of the work for map visualization
import matplotlib.cm as cm
import matplotlib.colors as colors

%pip install folium
import folium

%pip install geopy
from geopy.geocoders import Nominatim

from sklearn.cluster import KMeans

import requests
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0

print('Libraries imported successfully.')

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Libraries imported successfully.


Now that we have imported all the libraries, we will use geopy's Nominatim geocoder to find the geo coordinates for Toronto's city center. These will be used to center our folium map on Toronto.

In [8]:
# get latitude and longitude of the city of toronto
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent = 'tor_explorer')
location = geolocator.geocode(address)
tor_latitude = location.latitude
tor_longitude = location.longitude

print('{}, {}'.format(tor_latitude, tor_longitude))

43.6534817, -79.3839347


Next we will create the map of Toronto using folium and plot our existing neighborhood data.
To do this, we will use the geo coordinates added to the Toronto Neighborhoods dataframe in Part 2. We will add Borough and Neighborhood labels to our map plot.

In [9]:
# create map of Toronto using lat lon values
map_tor = folium.Map(location=[tor_latitude, tor_longitude], zoom_start = 10)

# add neighborhoods as markers to the map
for lat, lon, borough, neighborhood in zip(tor_n_df['Latitude'], tor_n_df['Longitude'], tor_n_df['Borough'], tor_n_df['Neighbourhood']):
    label = '{},{}'.format(borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        location=[lat,lon],
        radius = 5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.8,
        parse_html=False).add_to(map_tor)

map_tor

### Define credentials and version for Foursquare API
To get more details about what is available in each neighborhood we are going to use the Foursquare API.

In [10]:
# define variables needed to access Foursquare API
client_id = ''
client_secret = ''

### Getting a list of coffee shops for our neighborhoods

First we will build the API route that we will use to pull the coffee shop venue data. 

We will also test it on a single neighborhood before building out the dataframe for all of them.

In [12]:
# define variables for the API route to get neighborhood coffee shops
version = '20201209'
radius = 1500
limit = 100
categoryid = '4bf58dd8d48988d1e0931735'

# get geo coordinates for the test neighborhood; we will use the row at index 5 of the toronto dataframe
n_latitude = tor_n_df.loc[5, 'Latitude']
n_longitude = tor_n_df.loc[5, 'Longitude']

# define API route
fs_api_GET = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    client_id,
    client_secret,
    version,
    n_latitude,
    n_longitude,
    radius,
    limit,
    categoryid
    )

# call the Foursquare API route and print the results
n_cshops = requests.get(fs_api_GET).json()
n_cshops

{'meta': {'code': 200, 'requestId': '5fd6b0648cc76a4d451df53b'},
 'response': {'venues': [{'id': '4b4ccc3bf964a5204dbf26e3',
    'name': 'Second Cup',
    'location': {'address': 'Humbertown Plaza',
     'lat': 43.66227280880637,
     'lng': -79.5192342745383,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.66227280880637,
       'lng': -79.5192342745383}],
     'distance': 844,
     'cc': 'CA',
     'city': 'Etobicoke',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['Humbertown Plaza', 'Etobicoke ON', 'Canada']},
    'categories': [{'id': '4bf58dd8d48988d1e0931735',
      'name': 'Coffee Shop',
      'pluralName': 'Coffee Shops',
      'shortName': 'Coffee Shop',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1607905380',
    'hasPerk': False},
   {'id': '5a667251a92d980767bf3e11',
    'name': 'Starbucks',
    'location': {'address': '4242
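
The request URL above is assembled with one long format string. As a hedged alternative, the sketch below builds the same query from a dictionary using the standard library, which keeps each parameter next to its name. `build_search_url` is a hypothetical helper, and `'ID'` and `'SECRET'` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, limit, category_id):
    """Assemble a venues/search URL from named parameters."""
    base = 'https://api.foursquare.com/v2/venues/search'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
        'categoryId': category_id,
    }
    # urlencode percent-encodes reserved characters such as the comma in 'll'
    return base + '?' + urlencode(params)

url = build_search_url('ID', 'SECRET', '20201209', 43.6662, -79.5282,
                       1500, 100, '4bf58dd8d48988d1e0931735')
print(url)
```

Note that the comma in the `ll` value is emitted as `%2C`, which is the standard percent-encoding of a comma.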

Now let's create a function that defines a coffee shop as either Gourmet or Franchise.
- __Franchise:__ Starbucks, Tim Hortons, or Second Cup
- __Gourmet:__ Everything else
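
The rule above is a membership test against three names. A minimal sketch in plain Python follows; `FRANCHISE_NAMES` and `label_shop` are names made up for this illustration. It also shows why chaining the names with `or` cannot replace a membership test: the expression collapses to its first non-empty string.

```python
FRANCHISE_NAMES = {'Starbucks', 'Tim Hortons', 'Second Cup'}

def label_shop(name):
    """Return 'Franchise' for the three chains listed above, otherwise 'Gourmet'."""
    return 'Franchise' if name in FRANCHISE_NAMES else 'Gourmet'

# chaining strings with `or` yields only the first non-empty string,
# so comparing against it would match 'Tim Hortons' and nothing else
first_only = 'Tim Hortons' or 'Starbucks' or 'Second Cup'

print(first_only)
print([label_shop(n) for n in ['Second Cup', 'Starbucks', 'Reel Espresso Bar']])
```

With pandas, the equivalent membership test on a column is `Series.isin`.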

To start, we will need to clean the venue data from Foursquare so that we keep only the name and the geo coordinates of each coffee shop.

In [13]:
# create object with list of coffee shop venues and transform json into dataframe
coffee_shops = n_cshops['response']['venues']
nearby_coffee_shops = json_normalize(coffee_shops)

# only keep relevant venue data
select_columns = ['name', 'location.lat', 'location.lng']
nearby_coffee_shops = nearby_coffee_shops.loc[:, select_columns]

# add new features for Franchise and Gourmet coffee shops based on the name of the coffee shops listed.
# note: ('Tim Hortons' or 'Starbucks' or 'Second Cup') evaluates to 'Tim Hortons' only, so we use isin instead
franchise_names = ['Tim Hortons', 'Starbucks', 'Second Cup']
nearby_coffee_shops['Franchise'] = np.where(nearby_coffee_shops['name'].isin(franchise_names), 1, 0)
nearby_coffee_shops['Gourmet'] = np.where(nearby_coffee_shops['name'].isin(franchise_names), 0, 1)

# show output to ensure we get the right data table.
nearby_coffee_shops

Unnamed: 0,name,location.lat,location.lng,Franchise,Gourmet
0,Second Cup,43.662273,-79.519234,1,0
1,Starbucks,43.660005,-79.513883,1,0


We have now defined a way to get the coffee shops within a 1.5 km radius of a neighborhood and label each one as either 'Franchise' or 'Gourmet'.

Next we will need to do this for each neighborhood listed. To do so we will write a function that gets the coffee shops for each neighborhood and stores them in a dataframe.

In [14]:
def getNearbyCoffeeShops(names, latitudes, longitudes, radius = 1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create Foursquare API request for venues using same method as above
        fs_api_GET = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            client_id,
            client_secret,
            version,
            lat,
            lng,
            radius,
            limit,
            categoryid
            )
        
        # make the get request and store the results
        n_cshops = requests.get(fs_api_GET).json()['response']['venues']
        
        # return only the relevant information and store in the venues list
        venues_list.append([(
            name,
            lat,
            lng,
            v['name'],
            v['location']['lat'],
            v['location']['lng']) for v in n_cshops])
    
    # create dataframe that includes a list of each venue within the search radius (1.5 km by default) by neighborhood
    nearby_coffee_shops = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_coffee_shops.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Coffee Shop Name', 'Venue Latitude', 'Venue Longitude']
    
    return(nearby_coffee_shops)
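
The core of the function is flattening the nested JSON into one row per venue. Here is a minimal sketch of just that step on a trimmed-down response. `sample_response` keeps only the keys the function reads, with values copied from the test call earlier, and `flatten_venues` is a hypothetical helper name.

```python
# trimmed-down response with the same nesting as the Foursquare output shown earlier
sample_response = {
    'response': {
        'venues': [
            {'name': 'Second Cup', 'location': {'lat': 43.662273, 'lng': -79.519234}},
            {'name': 'Starbucks', 'location': {'lat': 43.660005, 'lng': -79.513883}},
        ]
    }
}

def flatten_venues(neighborhood, response):
    """Return one (neighborhood, venue name, lat, lng) tuple per venue."""
    return [
        (neighborhood, v['name'], v['location']['lat'], v['location']['lng'])
        for v in response['response']['venues']
    ]

rows = flatten_venues('Islington Avenue, Humber Valley Village', sample_response)
print(rows)
```

A list of such tuples can be passed straight to `pd.DataFrame`, which is what the function does after collecting the rows for every neighborhood.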

Now that we have a function to get the list of coffee shops by neighborhood, we can call it on the Toronto Neighborhood list.

In [15]:
# call the getNearbyCoffeeShops function and store in new dataframe
toronto_coffee_shops = getNearbyCoffeeShops(tor_n_df['Neighbourhood'], tor_n_df['Latitude'], tor_n_df['Longitude'])



Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Now let's add the 'Franchise' and 'Gourmet' labels to the dataframe and print out the final result.

In [16]:
# label each venue as either 'Franchise' or 'Gourmet'
franchise_names = ['Tim Hortons', 'Starbucks', 'Second Cup']
toronto_coffee_shops['Franchise'] = np.where(toronto_coffee_shops['Coffee Shop Name'].isin(franchise_names), 1, 0)
toronto_coffee_shops['Gourmet'] = np.where(toronto_coffee_shops['Franchise'] == 1, 0, 1)
    
# print the results to see the final dataframe
print(toronto_coffee_shops.shape)
toronto_coffee_shops

(2385, 8)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Coffee Shop Name,Venue Latitude,Venue Longitude,Franchise,Gourmet
0,Parkwoods,43.7545,-79.3300,Tim Hortons,43.760670,-79.326589,1,0
1,Parkwoods,43.7545,-79.3300,Tim Hortons,43.752814,-79.314067,1,0
2,Parkwoods,43.7545,-79.3300,Tim Hortons,43.740555,-79.323653,1,0
3,Parkwoods,43.7545,-79.3300,Starbucks,43.754199,-79.351382,1,0
4,Parkwoods,43.7545,-79.3300,Tim Hortons,43.755045,-79.351641,1,0
...,...,...,...,...,...,...,...,...
2380,"Mimico NW, The Queensway West, South of Bloor,...",43.6256,-79.5231,Reel Espresso Bar,43.629726,-79.528760,0,1
2381,"Mimico NW, The Queensway West, South of Bloor,...",43.6256,-79.5231,Tim Hortons,43.617767,-79.539474,1,0
2382,"Mimico NW, The Queensway West, South of Bloor,...",43.6256,-79.5231,Starbucks,43.638135,-79.537814,1,0
2383,"Mimico NW, The Queensway West, South of Bloor,...",43.6256,-79.5231,Starbucks,43.616190,-79.525714,1,0


Let's check the number of coffee shops for each neighborhood.

In [17]:
toronto_coffee_shops.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Coffee Shop Name,Venue Latitude,Venue Longitude,Franchise,Gourmet
Agincourt,9,9,9,9,9,9,9
"Alderwood, Long Branch",13,13,13,13,13,13,13
"Bathurst Manor, Wilson Heights, Downsview North",6,6,6,6,6,6,6
Bayview Village,10,10,10,10,10,10,10
"Bedford Park, Lawrence Manor East",19,19,19,19,19,19,19
...,...,...,...,...,...,...,...
"Willowdale, Willowdale West",7,7,7,7,7,7,7
Woburn,10,10,10,10,10,10,10
Woodbine Heights,33,33,33,33,33,33,33
York Mills West,19,19,19,19,19,19,19


### Analyze each neighborhood by number of Gourmet vs Franchise coffee shops

As we already encoded each coffee shop as either 'Gourmet' or 'Franchise' earlier, we will start by removing all columns except the neighborhood, Franchise, and Gourmet.

From there we will group the rows by neighborhood and take the mean of each column, which gives the share of Franchise and Gourmet coffee shops in each neighborhood.

In [18]:
# remove all the columns except neighborhood, franchise, and gourmet
toronto_coffee_shops_encoded = toronto_coffee_shops.drop(labels = ['Neighborhood Latitude','Neighborhood Longitude', 'Coffee Shop Name', 'Venue Latitude', 'Venue Longitude'], axis = 1)

# examine the shape of the new dataframe
print(toronto_coffee_shops_encoded.shape)

# group neighborhoods by taking the mean frequency of coffee shops of Franchise vs Gourmet
toronto_coffee_shops_grouped = toronto_coffee_shops_encoded.groupby('Neighborhood').mean().reset_index()

toronto_coffee_shops_grouped

(2385, 3)


Unnamed: 0,Neighborhood,Franchise,Gourmet
0,Agincourt,0.444444,0.555556
1,"Alderwood, Long Branch",0.384615,0.615385
2,"Bathurst Manor, Wilson Heights, Downsview North",0.666667,0.333333
3,Bayview Village,0.600000,0.400000
4,"Bedford Park, Lawrence Manor East",0.789474,0.210526
...,...,...,...
92,"Willowdale, Willowdale West",0.571429,0.428571
93,Woburn,0.700000,0.300000
94,Woodbine Heights,0.242424,0.757576
95,York Mills West,0.578947,0.421053


__Confirm the new size. This is the dataframe we will use in our clustering model.__

In [19]:
# confirm the new size
toronto_coffee_shops_grouped.shape

(97, 3)
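
To see why the grouped mean gives a proportion, here is a small sketch on invented data: averaging a 0/1 indicator over a group yields the share of rows where it equals 1. The neighborhoods 'A' and 'B' and their rows are made up for the illustration.

```python
import pandas as pd

# six made-up coffee shops in two made-up neighborhoods
toy_shops = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Franchise':    [1, 1, 1, 0, 0, 1],
    'Gourmet':      [0, 0, 0, 1, 1, 0],
})

# the mean of a 0/1 column within each group is that group's share of 1s
toy_grouped = toy_shops.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Because every shop is exactly one of the two types, the Franchise and Gourmet shares in each row sum to 1, so the two columns carry the same information.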

### Cluster Neighborhoods based on Franchise to Gourmet coffee shops

This section will walk through a k-means clustering model to cluster neighborhoods by Gourmet vs Franchise coffee shops.

We will start with 5 clusters to determine the different type of neighborhoods for Coffee Aficionados.

Run a _k-means_ cluster model to create the 5 clusters of Toronto Neighborhoods.

In [20]:
# set variables needed for the model
clusters = 5

toronto_coffee_shops_clustering = toronto_coffee_shops_grouped.drop(columns='Neighborhood')

# run k-means clustering model
kmeans = KMeans(n_clusters = clusters, random_state=0).fit(toronto_coffee_shops_clustering)

# check the cluster labels generated for each row in the toronto_coffee_shops_grouped dataframe
print(kmeans.labels_[0:97])

# add cluster labels to neighborhood dataframe
toronto_coffee_shops_grouped.insert(0,'Cluster', kmeans.labels_)

[4 4 3 0 1 3 1 2 0 0 2 4 0 4 0 3 0 3 0 0 2 0 1 0 2 4 0 0 3 3 3 3 1 3 0 2 1
 3 3 4 4 1 1 0 3 3 1 1 2 3 1 3 4 2 3 0 1 0 1 2 4 0 0 0 3 4 3 1 4 4 3 1 3 0
 3 3 2 4 0 2 4 1 1 3 0 1 3 4 0 1 4 0 0 3 2 0 0]
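
For intuition about what k-means does with these shares, the following is a minimal sketch of Lloyd's algorithm on one-dimensional data. It is not scikit-learn's implementation, which adds smarter initialization among other refinements. The function name, the fixed starting centers, and the toy shares are all assumptions made for the illustration. One dimension is enough here because the Gourmet share is always one minus the Franchise share.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm in one dimension: assign, then re-average, repeatedly."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # assignment step: index of the nearest center for every value
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # update step: move each center to the mean of its assigned values
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # leave an empty cluster's center where it is
                centers[k] = sum(members) / len(members)
    return labels, centers

# made-up Franchise shares for eight neighborhoods
franchise_share = [0.20, 0.25, 0.30, 0.55, 0.60, 0.65, 0.90, 1.00]
labels, centers = kmeans_1d(franchise_share, centers=[0.0, 0.5, 1.0])
print(labels)
print(centers)
```

On this toy input the three groups settle around shares of roughly 0.25, 0.60, and 0.95. Which number each group receives depends only on the order of the starting centers.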


Now that we have the cluster labels for the Neighborhood dataframe, we can make some final modifications so that we can examine and map the different neighborhood clusters.

In [21]:
# create data frame that adds cluster data onto the existing tor_n_df dataframe that has the geo coordinates of each neighborhood
toronto_coffee_shops_cluster_df = tor_n_df

# merge the data together
toronto_coffee_shops_cluster_df = toronto_coffee_shops_cluster_df.join(toronto_coffee_shops_grouped.set_index('Neighborhood'), on='Neighbourhood')

# check to see if the merged table was successful
toronto_coffee_shops_cluster_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
0,M3A,North York,Parkwoods,43.7545,-79.3300,0.0,0.615385,0.384615
1,M4A,North York,Victoria Village,43.7276,-79.3148,1.0,0.769231,0.230769
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0.0,0.540000,0.460000
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504,3.0,0.652174,0.347826
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0.0,0.580000,0.420000
...,...,...,...,...,...,...,...,...
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.6518,-79.5076,1.0,0.750000,0.250000
98,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.3830,0.0,0.600000,0.400000
99,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.7804,-79.2505,0.0,0.550000,0.450000
100,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.6325,-79.4939,1.0,0.875000,0.125000


One of the Neighborhoods returned no coffee shops and has a value of NA as a result. We will drop that neighborhood from the dataframe as it won't be part of the clusters we show.

We will then change the cluster column from a float to an int, which will be needed later for creating the folium map.

In [22]:
# remove rows with NaN values; take a copy so the column assignment below acts on an independent dataframe
toronto_coffee_shops_cluster_df = toronto_coffee_shops_cluster_df.dropna().copy()

# convert 'Cluster' column from float to integer to be used in plotting the map later
toronto_coffee_shops_cluster_df['Cluster'] = toronto_coffee_shops_cluster_df['Cluster'].astype(int)

# display the remaining dataframe
toronto_coffee_shops_cluster_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
0,M3A,North York,Parkwoods,43.7545,-79.3300,0,0.615385,0.384615
1,M4A,North York,Victoria Village,43.7276,-79.3148,1,0.769231,0.230769
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0,0.540000,0.460000
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504,3,0.652174,0.347826
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0,0.580000,0.420000
...,...,...,...,...,...,...,...,...
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.6518,-79.5076,1,0.750000,0.250000
98,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.3830,0,0.600000,0.400000
99,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.7804,-79.2505,0,0.550000,0.450000
100,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.6325,-79.4939,1,0.875000,0.125000
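
The float-to-int step is needed because a left join fills unmatched rows with NaN, and a column holding NaN cannot keep a plain integer dtype. A small sketch on invented data follows; the three-row tables are made up, and taking an explicit `.copy()` after `dropna()` is a choice of this sketch.

```python
import pandas as pd

left = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C']})
right = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster': [0, 3]})

# left join: 'C' has no match, so its Cluster is NaN and the column becomes float
joined = left.join(right.set_index('Neighborhood'), on='Neighbourhood')

# drop the unmatched row, take an explicit copy, then cast back to integers
cleaned = joined.dropna().copy()
cleaned['Cluster'] = cleaned['Cluster'].astype(int)

print(joined['Cluster'].dtype, cleaned['Cluster'].dtype)
```

Integer cluster labels matter later because they are used to index into the list of marker colors.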


### Visualize the clusters on a map

In [23]:
# create the map
toronto_map_cs = folium.Map(location = [tor_latitude, tor_longitude], zoom_start=11)

# set the color scheme for the clusters
x = np.arange(clusters)
ys = [i + x + (i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add neighborhood markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_coffee_shops_cluster_df['Latitude'], toronto_coffee_shops_cluster_df['Longitude'], toronto_coffee_shops_cluster_df['Neighbourhood'], toronto_coffee_shops_cluster_df['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True,
        fill_color = rainbow[cluster-1],
        fill_opacity = 0.8).add_to(toronto_map_cs)

toronto_map_cs

### Examine the clusters

Now we will examine each cluster by its proportion of Franchise to Gourmet coffee shops and assign a "Coffee Aficionado" neighborhood type to each cluster.

__Cluster 1: "Balanced"__

Neighborhoods in this cluster had a relatively even balance between Franchise and Gourmet coffee shops.

In [24]:
toronto_coffee_shops_cluster_df.loc[toronto_coffee_shops_cluster_df['Cluster'] == 0]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
0,M3A,North York,Parkwoods,43.7545,-79.33,0,0.615385,0.384615
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0,0.54,0.46
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0,0.58,0.42
7,M3B,North York,Don Mills,43.745,-79.359,0,0.6,0.4
13,M3C,North York,Don Mills,43.7334,-79.3329,0,0.6,0.4
17,M9C,Etobicoke,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.6437,-79.5767,0,0.571429,0.428571
24,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0,0.62,0.38
33,M2J,North York,"Fairview, Henry Farm, Oriole",43.7801,-79.3479,0,0.571429,0.428571
34,M3J,North York,"Northwood Park, York University",43.7694,-79.4921,0,0.625,0.375
36,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.623,-79.3936,0,0.545455,0.454545


__Cluster 2: "Franchise Aficionados"__

Neighborhoods in this cluster favored a higher proportion of Franchise coffee shops.

In [25]:
toronto_coffee_shops_cluster_df.loc[toronto_coffee_shops_cluster_df['Cluster'] == 1]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
1,M4A,North York,Victoria Village,43.7276,-79.3148,1,0.769231,0.230769
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.6662,-79.5282,1,1.0,0.0
12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7878,-79.1564,1,0.8,0.2
23,M4G,East York,Leaside,43.7124,-79.3644,1,0.789474,0.210526
27,M2H,North York,Hillcrest Village,43.8015,-79.3577,1,0.769231,0.230769
29,M4H,East York,Thorncliffe Park,43.7059,-79.3464,1,0.785714,0.214286
38,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.7298,-79.2639,1,0.8,0.2
44,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.7122,-79.2843,1,0.833333,0.166667
55,M5M,North York,"Bedford Park, Lawrence Manor East",43.7335,-79.4177,1,0.789474,0.210526
58,M1N,Scarborough,"Birch Cliff, Cliffside West",43.6952,-79.2646,1,0.75,0.25


__Cluster 3: "Gourmet Aficionados"__

Neighborhoods in this cluster had the highest proportion of Gourmet Coffee Shops.

In [26]:
toronto_coffee_shops_cluster_df.loc[toronto_coffee_shops_cluster_df['Cluster'] == 2]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
14,M4C,East York,Woodbine Heights,43.6913,-79.3116,2,0.242424,0.757576
19,M4E,East Toronto,The Beaches,43.6784,-79.2941,2,0.275,0.725
21,M6E,York,Caledonia-Fairbanks,43.6889,-79.4507,2,0.315789,0.684211
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.6655,-79.4378,2,0.22449,0.77551
37,M6J,West Toronto,"Little Portugal, Trinity",43.648,-79.4177,2,0.3,0.7
43,M6K,West Toronto,"Brockton, Parkdale Village, Exhibition Place",43.6383,-79.4301,2,0.265306,0.734694
54,M4M,East Toronto,Studio District,43.6561,-79.3406,2,0.342105,0.657895
56,M6M,York,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",43.6934,-79.4857,2,0.333333,0.666667
69,M6P,West Toronto,"High Park, The Junction South",43.6605,-79.4633,2,0.23913,0.76087
75,M6R,West Toronto,"Parkdale, Roncesvalles",43.6469,-79.4521,2,0.210526,0.789474


__Cluster 4: "Franchise Leaning"__

Neighborhoods in this cluster leaned toward Franchise coffee shops, though less strongly than the Franchise Aficionados.

In [27]:
toronto_coffee_shops_cluster_df.loc[toronto_coffee_shops_cluster_df['Cluster'] == 3]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504,3,0.652174,0.347826
6,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.193,3,0.666667,0.333333
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,3,0.68,0.32
10,M6B,North York,Glencairn,43.7081,-79.4479,3,0.714286,0.285714
11,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.6505,-79.5517,3,0.714286,0.285714
15,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,3,0.64,0.36
18,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866,3,0.666667,0.333333
20,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,3,0.68,0.32
22,M1G,Scarborough,Woburn,43.7712,-79.2144,3,0.7,0.3
28,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.7535,-79.4472,3,0.666667,0.333333


__Cluster 5: "Gourmet Leaning"__

Neighborhoods in this cluster leaned toward Gourmet coffee shops.

In [28]:
toronto_coffee_shops_cluster_df.loc[toronto_coffee_shops_cluster_df['Cluster'] == 4]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.7063,-79.3094,4,0.428571,0.571429
16,M6C,York,Humewood-Cedarvale,43.6915,-79.4307,4,0.4,0.6
25,M6G,Downtown Toronto,Christie,43.6683,-79.4205,4,0.387755,0.612245
26,M1H,Scarborough,Cedarbrae,43.7686,-79.2389,4,0.5,0.5
35,M4J,East York,"East Toronto, Broadview North (Old East York)",43.6872,-79.3368,4,0.424242,0.575758
41,M4K,East Toronto,"The Danforth West, Riverdale",43.6803,-79.3538,4,0.371429,0.628571
47,M4L,East Toronto,"India Bazaar, The Beaches West",43.6693,-79.3155,4,0.416667,0.583333
52,M2M,North York,"Willowdale, Newtonbrook",43.7915,-79.4103,4,0.416667,0.583333
63,M6N,York,"Runnymede, The Junction North",43.6748,-79.4839,4,0.454545,0.545455
70,M9P,Etobicoke,Westmount,43.6949,-79.5323,4,0.4,0.6
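
The cluster numbers produced by k-means carry no meaning of their own, so the names above were chosen by reading each table. As a hedged alternative, the sketch below derives the same names by ranking clusters on their mean Franchise share. The per-cluster means are rough illustrative values, not computed from the notebook, and `name_clusters` is a hypothetical helper.

```python
def name_clusters(mean_franchise_share, names_low_to_high):
    """Map each cluster id to a name, ordered from lowest to highest Franchise share."""
    ranked = sorted(mean_franchise_share, key=mean_franchise_share.get)
    return dict(zip(ranked, names_low_to_high))

# illustrative mean Franchise share per cluster id (made-up round numbers)
means = {0: 0.58, 1: 0.80, 2: 0.28, 3: 0.68, 4: 0.42}
names = ['Gourmet Aficionados', 'Gourmet Leaning', 'Balanced',
         'Franchise Leaning', 'Franchise Aficionados']

cluster_names = name_clusters(means, names)
print(cluster_names)
```

Deriving names from the ranking keeps them correct if the model is re-run and the cluster numbers come out in a different order.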


### Add Cluster name labels to the dataframe

Now that we have named each cluster based on the proportion of 'Franchise' to 'Gourmet' coffee shops, let's add the cluster names to the dataframe so that they can be shown on the map, making it easier to tell each neighborhood's Coffee Aficionado type.

In [29]:
# add a new column to the dataframe with Cluster Names
cluster_names = {0: 'Balanced', 1: 'Franchise Aficionados', 2: 'Gourmet Aficionados', 3: 'Franchise Leaning', 4: 'Gourmet Leaning'}
toronto_coffee_shops_cluster_df['Cluster Name'] = toronto_coffee_shops_cluster_df['Cluster'].map(cluster_names)

# print cluster
toronto_coffee_shops_cluster_df


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster,Franchise,Gourmet,Cluster Name
0,M3A,North York,Parkwoods,43.7545,-79.3300,0,0.615385,0.384615,Balanced
1,M4A,North York,Victoria Village,43.7276,-79.3148,1,0.769231,0.230769,Franchise Aficionados
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0,0.540000,0.460000,Balanced
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504,3,0.652174,0.347826,Franchise Leaning
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0,0.580000,0.420000,Balanced
...,...,...,...,...,...,...,...,...,...
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.6518,-79.5076,1,0.750000,0.250000,Franchise Aficionados
98,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.3830,0,0.600000,0.400000,Balanced
99,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.7804,-79.2505,0,0.550000,0.450000,Balanced
100,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.6325,-79.4939,1,0.875000,0.125000,Franchise Aficionados


### Add Cluster Names to Map

Lastly, we will add the Cluster Name to the map popups to visualize the clusters by Franchise vs Gourmet coffee shops.

In [30]:
# create the map
toronto_map_cs = folium.Map(location = [tor_latitude, tor_longitude], zoom_start=11)

# set the color scheme for the clusters
x = np.arange(clusters)
ys = [i + x + (i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add neighborhood markers to the map
markers_colors = []
for lat, lon, poi, cluster, cluster_name in zip(toronto_coffee_shops_cluster_df['Latitude'], toronto_coffee_shops_cluster_df['Longitude'], toronto_coffee_shops_cluster_df['Neighbourhood'], toronto_coffee_shops_cluster_df['Cluster'], toronto_coffee_shops_cluster_df['Cluster Name']):
    label = folium.Popup(str(poi) + ' Cluster: ' + str(cluster_name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True,
        fill_color = rainbow[cluster-1],
        fill_opacity = 0.8).add_to(toronto_map_cs)

toronto_map_cs

# And now we are done. Thank you!