# Neighborhood Clustering Based on The Number of Healthcare Facilities in Toronto, Canada

## Part of IBM Data Science Capstone by Coursera

In this project, we are going to do neighborhoods clustering in Toronto, Canada based on the number of healthcare facilities located nearby its neighborhood centre. The analysis will cover these stages:
* Collecting neighborhoods information
* Collecting location of healthcare facilities in Toronto
* Create clusters of neighborhood based on the number of healthcare located nearby

## Neighborhood Data Collection

We will first extract the iformation neighborhoods, boroughs, and postal codes in Toronto. This data is provided by a Wikipedia page, we will use pandas to extract it and make a dataframe.

In [1]:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

html_df = pd.read_html(url)

df_tor = html_df[0]
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


It appears that some of postal codes are not assigned to any neighborhood, we will see how many of it.

In [2]:
no_borough = df_tor[df_tor['Borough'] == 'Not assigned'].shape[0]

print('{} postal codes have not assigned to any borough'.format(no_borough))

77 postal codes have not assigned to any borough


We got 77 postal codes unassigned. Now, let's see if we have boroughs that do not have postal code assigned.

In [3]:
no_assigned_neigh = df_tor[(df_tor['Borough'] == 'Not assigned') & (df_tor['Neighbourhood'] != 'Not assigned')].shape[0]

if no_assigned_neigh > 0:
    print('{} postal codes have borough without assigned neighborhood')
else:
    print('There is no postal code that has borough without assigned neighborhood')

There is no postal code that has borough without assigned neighborhood


All boroughs have postal codes assigned. Next, we will clean our dataframe by removing records of unassigned postal codes

In [4]:
df_tor = df_tor[df_tor['Borough'] != 'Not assigned'].reset_index(drop = True)
print('Number of neighborhoods = {}'.format(df_tor.shape[0]))
df_tor.head()

Number of neighborhoods = 103


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


We have cleaned our data. We will proceed to extract the coordinates of neighborhood centres.

In [5]:
ref_url = 'http://cocl.us/Geospatial_data'

df_ref = pd.read_csv(ref_url)
df_ref.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [6]:
df_tor = pd.merge(df_tor,
                df_ref,
                left_on = 'Postal Code',
                right_on = 'Postal Code',
                how = 'left')
                
df_tor.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


Our neighborhoods dataframe is all set. Let's see how many records do we have on our dataframe

In [7]:
print('The dataframe has {} rows'.format(df_tor.shape[0]))

The dataframe has 103 rows


So far we have 103 records of our dataframe. It is time to visualize our neighborhoods location on a map

In [8]:
import folium 

tor_lat = 43.6532
tor_long = -79.3832

postal = df_tor['Postal Code']
borough = df_tor['Borough']
neigh = df_tor['Neighbourhood']
neigh_lat = df_tor['Latitude']
neigh_lng = df_tor['Longitude']

tor_map = folium.Map(location = [tor_lat, tor_long], zoom_start = 12)

In [9]:
for postal, borough, neigh, lat, lng in zip(
    df_tor['Postal Code'], 
    df_tor['Borough'], 
    df_tor['Neighbourhood'], 
    df_tor['Latitude'], 
    df_tor['Longitude']
):
    label = '{}, {}, {}'.format(postal, borough, neigh)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(tor_map)

tor_map

Looking good. We are done with neighborhood data collection Time to move on to the next stage.

## Healthcare Facilities Data Collecting

In this stage, we will extract the location of healthcare facilities within the area of Toronto, we will utilize Places Search API provided by Here Map.

More information for using this API can be foud in this page https://developer.here.com/documentation/places/dev_guide/topics/explore-nearby-places.html

First, we will create a function to explore particular places in an area of observation. This function will require these inputs:
* Here API key
* Centre coordinate in latitude and longitude
* Search radius
* Search category 

Then, the function will return:
* Location's name
* Location's category
* Location's coordinate

In [10]:
import requests
from urllib.parse import urlencode

here_api_key = 'Your Here API Key'

def explore_here(latitude, longitude, radius, category):
    
    endpointhere = 'https://places.ls.hereapi.com/places/v1/discover/explore'
    paramshere = {
        'apikey' : here_api_key,
        'in' : '{},{};r={}'.format(latitude, longitude, radius),
        'cat' : '{}'.format(category)
    }
    urlhereparams = '{}&pretty'.format(urlencode(paramshere))
    urlhere = '{}?{}'.format(endpointhere, urlhereparams)

    place_name = []
    category = []
    latitude = []
    longitude = []

    try: 
        result = requests.get(urlhere).json()['results']
        result_item = result['items']

        if 'next' in result.keys():
            while 'next' in result.keys():
                next_url = result['next']
                result = requests.get(next_url).json()
                # next_result = result['results']
                next_result_item = result['items']
                result_item = result_item + next_result_item

        if len(result_item) > 0:
            try:
                for i in range(0, len(result_item)):
                    place_name.append(result_item[i]['title'])
                    category.append(result_item[i]['category']['title'])
                    latitude.append(result_item[i]['position'][0])
                    longitude.append(result_item[i]['position'][1])
            except:
                pass
    
    except:
        pass

    return {
        'Place Name' : place_name,
        'Category' : category,
        'Latitude' : latitude,
        'Longitude' : longitude
    }

We will then use this function to search for heathcare facilities. the category shall be used is "hospital-health-care-facility". We will hover each neighborhoods centre and do the search within the radius of 5000 meters. This is to ensure we cover all area within the city of Toronto.

More information about available categories are explained on this page https://developer.here.com/documentation/places/dev_guide/topics/categories.html

In [11]:
facility_name = []
location_category = []
location_latitude = []
location_longitude = []

for lat, lng in zip(df_tor['Latitude'], df_tor['Longitude']):
    search_result = explore_here(lat, lng, 5000, 'hospital-health-care-facility')

    for i in range(0, len(search_result['Place Name'])):
        facility_name.append(search_result['Place Name'][i])
        location_category.append(search_result['Category'][i])
        location_latitude.append(search_result['Latitude'][i])
        location_longitude.append(search_result['Longitude'][i])

exctracted_facilities = {
    'Facility Name' : facility_name,
    'Category' : location_category,
    'Latitude' : location_latitude,
    'Longitude' : location_longitude
}

print('Healthcare facilities data has been extracted')

Healthcare facilities data has been extracted


Our search has been completed, time to convert it into dataframe

In [12]:
df_hospitals = pd.DataFrame(exctracted_facilities)
df_hospitals.head()

Unnamed: 0,Facility Name,Category,Latitude,Longitude
0,Orangebloom Therapy & Counselling,Hospital or Healthcare Facility,43.75124,-79.32977
1,Kidspeech,Hospital or Healthcare Facility,43.75298,-79.33497
2,Bodywork Massage and Wellness,Hospital or Healthcare Facility,43.748588,-79.331838
3,Lidia Damian Psychotherapy,Business & Services,43.75508,-79.32211
4,Parkwood Medical Centre,Hospital or Healthcare Facility,43.7604,-79.32703


Not done yet, we will make sure that our search returns relevant places to our project goal. We will explore the categories within our dataframe.

In [13]:
df_hospitals['Category'].unique()

array(['Hospital or Healthcare Facility', 'Business & Services',
       'Service', 'Government or Community Facility',
       'Educational Facility', 'Hospital', 'Shop', 'Sport Facility/Venue',
       'Clothing & Accessories', 'Food & Drink', 'Facility', "Chemist's",
       'Recreation', 'Communications/Media', 'Theatre, Music & Culture',
       'Outdoor Sports', 'DIY/garden centre', 'Shopping Centre',
       'Travel Agency', 'Sights & Museums', 'Car Dealer/Repair'],
      dtype=object)

Seems like we have many location categories that we don't need. as we are looking for healthcare facilities data, we will limit the dataframe only for locations with these categories:
* Hospital or Healthcare Facility
* Hospital

In [14]:
df_hospitals = df_hospitals[
    (df_hospitals['Category'] == 'Hospital or Healthcare Facility') | (df_hospitals['Category'] == 'Hospital')
    ]
df_hospitals.reset_index(drop = True, inplace = True)

df_hospitals.head()

Unnamed: 0,Facility Name,Category,Latitude,Longitude
0,Orangebloom Therapy & Counselling,Hospital or Healthcare Facility,43.75124,-79.32977
1,Kidspeech,Hospital or Healthcare Facility,43.75298,-79.33497
2,Bodywork Massage and Wellness,Hospital or Healthcare Facility,43.748588,-79.331838
3,Parkwood Medical Centre,Hospital or Healthcare Facility,43.7604,-79.32703
4,Tina Elio Anti-Aging & Laser Clinic,Hospital or Healthcare Facility,43.74499,-79.325865


We have trimmed our data, but it loks like we have places that don't really as same as clinics or hospitals. We will do further trimming by filtering all locations with word clinic, hospital, or healthcare in its name. We will then see how many records do we have.

In [15]:
df_hospitals = df_hospitals[
    (df_hospitals['Facility Name'].str.contains('Clinic')) | 
    df_hospitals['Facility Name'].str.contains('Hospital') |
    df_hospitals['Facility Name'].str.contains('Healthcare')
    ]

df_hospitals.drop_duplicates(subset = ['Facility Name'], keep = 'first', inplace = True)
df_hospitals.reset_index(drop = True, inplace = True)

print('Number of records for healthcare facilities = {}'.format(df_hospitals.shape[0]))
df_hospitals.head()

Number of records for healthcare facilities = 628


Unnamed: 0,Facility Name,Category,Latitude,Longitude
0,Tina Elio Anti-Aging & Laser Clinic,Hospital or Healthcare Facility,43.74499,-79.325865
1,Cassandra Clinic,Hospital or Healthcare Facility,43.752721,-79.313589
2,Don Mills Denture Clinic,Hospital or Healthcare Facility,43.75892,-79.31494
3,Physioworx Physiotherapy Clinic,Hospital or Healthcare Facility,43.752393,-79.313503
4,Walking Mobility Clinic (North York),Hospital or Healthcare Facility,43.74714,-79.34624


All good, we have 628 healthcare facilities spreaded out within our area of observation. Let's plot it to the map.

In [16]:
tor_hospital_map = folium.Map(location = [tor_lat, tor_long], zoom_start = 12)

for name, lat, lng in zip(df_hospitals['Facility Name'], df_hospitals['Latitude'],df_hospitals['Longitude']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 3,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = 'red',
        fill_opacity = 1,
        parse_html = False
    ).add_to(tor_hospital_map)

for postal, borough, neigh, lat, lng in zip(
    df_tor['Postal Code'], 
    df_tor['Borough'], 
    df_tor['Neighbourhood'], 
    df_tor['Latitude'], 
    df_tor['Longitude']
):
    label = '{}, {}, {}'.format(postal, borough, neigh)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(tor_hospital_map)

tor_hospital_map

Looking good, we will create a heatmap for it.

In [17]:
from folium import plugins
from folium.plugins import HeatMap

tor_hospital_heatmap = folium.Map(location = [tor_lat, tor_long], zoom_start = 12)

heat_df_hospitals = df_hospitals[['Latitude', 'Longitude']]
heat_data = [[row['Latitude'], row['Longitude']] for index, row in heat_df_hospitals.iterrows()]
HeatMap(heat_data).add_to(tor_hospital_heatmap)

for postal, borough, neigh, lat, lng in zip(
    df_tor['Postal Code'], 
    df_tor['Borough'], 
    df_tor['Neighbourhood'], 
    df_tor['Latitude'], 
    df_tor['Longitude']
):
    label = '{}, {}, {}'.format(postal, borough, neigh)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(tor_hospital_heatmap)

tor_hospital_heatmap

Seems like There are more healthcare facilities located in the southern area of Toronto. But we're not there yet. We will continue to the next stage

## Neighborhood Clustering

In this stage we will create clusters of neighborhood basedon the number of healthcare facilities that located nearby the centre. The process will include distance calculating between each facilities to each neighborhood centre. After having count of facilities, we will create a K-Means Clusterin model to create clusters.

Before we can calculate disttance between each neighborhood centre to each facilities, we will need to transform the coordinates first from WGS81 projection inti UTM projection. The consideration is made because measuring euclidean distance will be best if it is calculated based on a 2D coordinate (X abd Y).

In [18]:
import shapely
import pyproj
import math
import utm

def WGS84_to_UTM(lng, lat):
    proj_degree = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_utm = pyproj.Proj(proj="utm", zone=48, datum='WGS84')
    UTM_coord = pyproj.transform(proj_degree, proj_utm, lng, lat)
    return UTM_coord[0], UTM_coord[1]

def get_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt((dx * dx) + (dy * dy))

After creating the function, we will use it to generate X ad Y coordinate to our neighborhoods dataframe.

In [19]:
neigh_x = []
neigh_y = []

for lat, lng in zip(df_tor['Latitude'], df_tor['Longitude']):
    UTM_coord = WGS84_to_UTM(lng, lat)
    X = UTM_coord[0]
    Y = UTM_coord[1]
    neigh_x.append(X)
    neigh_y.append(Y)

df_tor['X'] = neigh_x
df_tor['Y'] = neigh_y
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,X,Y
0,M3A,North York,Parkwoods,43.753259,-79.329656,848576.452374,15142340.0
1,M4A,North York,Victoria Village,43.725882,-79.315572,847601.053538,15145440.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,851650.155783,15153210.0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,859663.027114,15145620.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,853930.191333,15152190.0


And to the healthcare data as well.

In [20]:
loc_x = []
loc_y = []

for lat, lng in zip(df_hospitals['Latitude'], df_hospitals['Longitude']):
    UTM_coord = WGS84_to_UTM(lng, lat)
    X = UTM_coord[0]
    Y = UTM_coord[1]
    loc_x.append(X)
    loc_y.append(Y)

df_hospitals['X'] = loc_x
df_hospitals['Y'] = loc_y
df_hospitals.head()

Unnamed: 0,Facility Name,Category,Latitude,Longitude,X,Y
0,Tina Elio Anti-Aging & Laser Clinic,Hospital or Healthcare Facility,43.74499,-79.325865,848319.220058,15143280.0
1,Cassandra Clinic,Hospital or Healthcare Facility,43.752721,-79.313589,847285.881532,15142470.0
2,Don Mills Denture Clinic,Hospital or Healthcare Facility,43.75892,-79.31494,847358.728064,15141780.0
3,Physioworx Physiotherapy Clinic,Hospital or Healthcare Facility,43.752393,-79.313503,847280.857552,15142510.0
4,Walking Mobility Clinic (North York),Hospital or Healthcare Facility,43.74714,-79.34624,849947.405179,15142950.0


After we have all coordinates converted, we will begin to count number of hospital nearby. We will consider each facility is nearby if the distance between its location and the neighborhood centre is less than 2.5 kilometres.

In [21]:
count = []

for neigh_x, neigh_y in zip(df_tor['X'], df_tor['Y']):
    n = 0

    for loc_x, loc_y in zip(df_hospitals['X'], df_hospitals['Y']):
        if get_distance(neigh_x, neigh_y, loc_x, loc_y) <= 2500:
            n = n + 1

    count.append(n)

df_tor['Hospitals Nearby'] = count
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,X,Y,Hospitals Nearby
0,M3A,North York,Parkwoods,43.753259,-79.329656,848576.452374,15142340.0,22
1,M4A,North York,Victoria Village,43.725882,-79.315572,847601.053538,15145440.0,19
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,851650.155783,15153210.0,75
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,859663.027114,15145620.0,14
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,853930.191333,15152190.0,108


All good, now we have counted the total number of healthcare facilities for each neighborhood. These numbers will be used as a single feature in our clustering analysis.

We will start with scaling our feature value.

In [22]:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

In [23]:
scaler = MinMaxScaler()

feature = df_tor['Hospitals Nearby'].values.reshape(-1, 1)
feature = scaler.fit_transform(feature)
feature[0:10]

array([[0.19642857],
       [0.16964286],
       [0.66964286],
       [0.125     ],
       [0.96428571],
       [0.08035714],
       [0.03571429],
       [0.08928571],
       [0.16071429],
       [0.84821429]])

Then, we will create a K-Means clustering model with total 5 numbers of clusters.

In [24]:
kmeans = KMeans(n_clusters = 5, random_state = 10)
kmeans.fit(feature)

df_tor['Cluster'] = kmeans.labels_ + 1
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,X,Y,Hospitals Nearby,Cluster
0,M3A,North York,Parkwoods,43.753259,-79.329656,848576.452374,15142340.0,22,3
1,M4A,North York,Victoria Village,43.725882,-79.315572,847601.053538,15145440.0,19,3
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,851650.155783,15153210.0,75,2
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,859663.027114,15145620.0,14,1
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,853930.191333,15152190.0,108,5


Looking good, as seen on the previous data, each neighborhood has been assigned to its respective cluster. Now we will see the characteristics of each clusters by calculating minimum, mean, and maximum number of healthcare facilities located nearby the neighborhood within its cluster.

In [25]:
cluster_dict = {}
clust = []
total_neigh = []
min_hospitals = []
mean_hospitals = []
max_hospitals = []

for i in range(1, max(df_tor['Cluster'] + 1)):
    cluster_i = df_tor[df_tor['Cluster'] == i].reset_index(drop = True)
    clust.append(i)
    total_neigh.append(cluster_i.shape[0])
    min_hospitals.append(cluster_i['Hospitals Nearby'].min())
    mean_hospitals.append(int(cluster_i['Hospitals Nearby'].mean()))
    max_hospitals.append(cluster_i['Hospitals Nearby'].max())

    cluster_dict['Cluster {}'.format(i)] = cluster_i

clust_summary = {
    'Cluster' : clust,
    'Total Neighborhoods' : total_neigh,
    'Minimum Hospitals Nearby' : min_hospitals,
    'Average Hospitals Nearby' : mean_hospitals,
    'Maximum Hospitals Nearby' : max_hospitals
}


In [26]:
df_clust_summary = pd.DataFrame(clust_summary)
df_clust_summary

Unnamed: 0,Cluster,Total Neighborhoods,Minimum Hospitals Nearby,Average Hospitals Nearby,Maximum Hospitals Nearby
0,1,39,0,9,15
1,2,12,73,84,95
2,3,36,17,23,31
3,4,11,33,41,59
4,5,5,102,106,112


Now we have figured out the difference between each cluster to one another. But it seems like we don't have a neat cluster's label, we will reorder it by the average hospitals nearby value. So the cluster 1 will have the least average number and cluster 5 will have the most.

In [27]:
df_clust_summary.sort_values(by = ['Average Hospitals Nearby'], inplace = True)
df_clust_summary.reset_index(drop = True, inplace = True)
df_clust_summary['Cluster'] = list(df_clust_summary.index + 1)
df_clust_summary

Unnamed: 0,Cluster,Total Neighborhoods,Minimum Hospitals Nearby,Average Hospitals Nearby,Maximum Hospitals Nearby
0,1,39,0,9,15
1,2,36,17,23,31
2,3,11,33,41,59
3,4,12,73,84,95
4,5,5,102,106,112


All good, we have reordered the clusters. Now, let's reassign all neighborhoods to new clusters based on criterias above.

In [28]:
new_clust = []

for i in range(0, df_tor.shape[0]):
    n_hosp = df_tor['Hospitals Nearby'][i]

    for j in range(0, df_clust_summary.shape[0]):

        clust = df_clust_summary['Cluster'][j]
        n_min = df_clust_summary['Minimum Hospitals Nearby'][j] 
        n_max = df_clust_summary['Maximum Hospitals Nearby'][j]

        if n_hosp >= n_min and n_hosp <= n_max:
            new_clust.append(clust)

df_tor['Cluster'] = new_clust
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,X,Y,Hospitals Nearby,Cluster
0,M3A,North York,Parkwoods,43.753259,-79.329656,848576.452374,15142340.0,22,2
1,M4A,North York,Victoria Village,43.725882,-79.315572,847601.053538,15145440.0,19,2
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,851650.155783,15153210.0,75,4
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,859663.027114,15145620.0,14,1
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,853930.191333,15152190.0,108,5


We have all neighborhood reassigned. For the final part, we will visualize it on the map by assigning different colors for each cluster.

In [29]:
def rgb_to_hex(rgb):
    hex = '%02x%02x%02x' % rgb
    return '#{}'.format(hex)

In [30]:
colors = []
n_clusers = max(df_tor['Cluster'])

for i in range(1, n_clusers + 1):
    r = int(255 - (255 * (i / n_clusers)))
    g = 0
    b = int(255 * (i / n_clusers))
    rgb = (r,g,b)
    colors.append(rgb_to_hex(rgb))


In [31]:
df_tor

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,X,Y,Hospitals Nearby,Cluster
0,M3A,North York,Parkwoods,43.753259,-79.329656,848576.452374,1.514234e+07,22,2
1,M4A,North York,Victoria Village,43.725882,-79.315572,847601.053538,1.514544e+07,19,2
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,851650.155783,1.515321e+07,75,4
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,859663.027114,1.514562e+07,14,1
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,853930.191333,1.515219e+07,108,5
...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,863453.483804,1.515265e+07,22,2
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,853398.516964,1.515183e+07,102,5
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,848449.391620,1.515243e+07,19,2
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,862878.159470,1.515461e+07,20,2


In [32]:
tor_cluster_map = folium.Map(location = [tor_lat, tor_long], zoom_start = 12)

for i in range(1, n_clusers + 1):
    df_plot = df_tor[df_tor['Cluster'] == i]
    df_plot.reset_index(drop = True, inplace = True)

    for j in range(0, df_plot.shape[0]):
        postal = df_plot['Postal Code'][j]
        borough = df_plot['Borough'][j]
        neigh = df_plot['Neighbourhood'][j]
        neigh_lat = df_plot['Latitude'][j]
        neigh_lng = df_plot['Longitude'][j]
        clust = i
        col = colors[i - 1]
        neigh_label = '{}\nCluster = {}'.format(neigh, clust)
        neigh_label = folium.Popup(neigh_label, parse_html = True)

        folium.Circle(
            [neigh_lat, neigh_lng],
            radius = 250,
            popup = neigh_label,
            color = col,
            fill = True,
            fill_color = col,
            fill_opacity = 1,
            parse_html = False
        ).add_to(tor_cluster_map)

tor_cluster_map

All good, from the map we can see that the neighborhoods pointed with blue color have the most healthcare facilities nearby. And opposite to the red one, have the least failities nearby.

# Result

So far in this project, we managed to do neighborhood clustering by the number of healthcare facilities that located nearby its centres. It appears that the neighborhoods near the downtown area have higher accessibility to nearby facilities compared to the neighborhoods on the outskirt of the city. This will conclude our project.