</a>
<h1 align=center><font size = 5>Segmenting and Clustering Hospitals in Toronto, Ontario</font></h1>


## Introduction

You have been referred for treatment at a hospital that requires you to stay as an in-patient for a few weeks. Given the choice of any hospital in Toronto, you would like to choose one based on nearby venues such as coffee shops, restaurants and other places of interest. To conduct the analysis, we use the Foursquare API to explore hospitals in Toronto, ON. We use the **search** endpoint to find hospitals in Toronto, then the **explore** endpoint to get the most common venue categories near each hospital, and use these category frequencies as features to group the hospitals into clusters with the _k_-means clustering algorithm. Finally, we use the Folium library to visualize the hospitals in Toronto and their emerging clusters on a map.

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.


In [1]:
import pandas as pd
import numpy as np
import requests,  os, string
import json
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)
!pip install BeautifulSoup4
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import folium # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


## 1. Download and Explore Dataset


In [3]:
# Foursquare credentials (placeholders: substitute your own and never publish real keys)
CLIENT_ID = 'YOUR_CLIENT_ID' # Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # Foursquare secret
VERSION = '20180605' # Foursquare API version
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # Foursquare OAuth token

In [45]:
LIMIT = 50 # maximum number of results returned per Foursquare API request
radius = 5000 # search radius in metres
categoryID = '4bf58dd8d48988d196941735' # Foursquare category ID for hospitals
NEAR = 'Toronto,ON'


In [46]:
# create URL
url = 'https://api.foursquare.com/v2/venues/search?near={}&radius={}&limit={}&client_id={}&client_secret={}&oauth_token={}&v={}&categoryId={}'.format(NEAR, radius, LIMIT, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION, categoryID)

url # display URL

'https://api.foursquare.com/v2/venues/search?near=Toronto,ON&radius=5000&limit=50&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&oauth_token=YOUR_ACCESS_TOKEN&v=20180605&categoryId=4bf58dd8d48988d196941735'

In [47]:
results = requests.get(url).json()
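
The search URL above is assembled with string formatting, so characters such as the comma in `Toronto,ON` are passed through unescaped. A minimal sketch of an alternative, assuming the same query parameters, is to let the standard library encode them. The helper name `build_search_url` and the placeholder credentials are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode

def build_search_url(near, radius, limit, category_id, client_id, client_secret, version):
    """Return a venues/search URL with every parameter URL-encoded (hypothetical helper)."""
    params = {
        "near": near,
        "radius": radius,
        "limit": limit,
        "categoryId": category_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials; substitute real ones before sending a request.
example_url = build_search_url(
    "Toronto,ON", 5000, 50, "4bf58dd8d48988d196941735",
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
)
print(example_url)
```

The same effect can be had by passing the dictionary to `requests.get(..., params=params)`, which also encodes the values.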

Now we will clean the JSON response and structure it into a pandas dataframe.

In [48]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
med_venues = json_normalize(venues)

# keep only the columns we need
filtered_columns = ['id','name', 'location.lat', 'location.lng', 'location.address', 'location.postalCode']
med_venues = med_venues.loc[:, filtered_columns]

# clean column names: keep only the part after the last dot
med_venues.columns = [col.split(".")[-1] for col in med_venues.columns]

med_venues.head(20)



Unnamed: 0,id,name,lat,lng,address,postalCode
0,5fff116a6903fc06b8275e63,COVID-19 Assessment Centre,43.652553,-79.405875,347 Bathurst St,M5T 2S7
1,4e68b018d22df5c11f69264c,H-wing: Sunnybrook,43.722276,-79.374563,2075 Bayview Avenue,M4N 3N5
2,4f4e38e7e4b04c96fb7ccd3d,D Wing: Sunnybrook,43.721289,-79.374939,,
3,4bce159acc8cd13ac7b6c3cf,Dundas Euclid Animal Hospital,43.651518,-79.409984,840 Dundas St. W.,M6J 1V5
4,543438d1498e9f625c7aab6e,Ultrasound Waiting Area,43.658005,-79.388426,,
5,5de14daedb954c00086c07b2,Sunnybrook Hospital,43.721838,-79.375978,2075 Bayview Ave.,M4N 3M5
6,4e68eff014954826ade3c663,Veterans Centre: Sunnybrook,43.723458,-79.37492,2075 Bayview Avenue,M4N 3M5
7,54873641498e2f6042038c0c,Peter Munk Medical Imaging -TGH,43.65879,-79.389719,,
8,5cf91f0eaf35f300262f0a04,Blood Collection Lab,43.659252,-79.387154,,M5G 2C4
9,5fcbd92af4821b4765aa3d0f,Women’s College Covid Assessment Centre,43.662346,-79.38695,76 Grenville St,M7A 1N8
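
To see why the cell above selects columns such as `location.lat` and then strips the prefix, here is a small sketch of how `json_normalize` flattens nested records. The two sample records are invented for illustration and only mimic the shape of the Foursquare response; they are not real API output.

```python
import pandas as pd

# Invented records that imitate the nested "location" object in the response.
sample_venues = [
    {"id": "a1", "name": "Hospital A",
     "location": {"lat": 43.65, "lng": -79.40, "address": "1 Example St"}},
    {"id": "b2", "name": "Hospital B",
     "location": {"lat": 43.72, "lng": -79.37}},  # no address recorded
]

flat = pd.json_normalize(sample_venues)  # nested keys become "location.lat", ...
flat.columns = [col.split(".")[-1] for col in flat.columns]  # keep the last part only
print(flat)
```

Keys that are missing from a record, like the address of the second venue here, come out as `NaN`, which is what the empty `address` and `postalCode` cells in the table above correspond to.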


In [49]:
med_venues = med_venues.dropna().reset_index()
med_venues

Unnamed: 0,index,id,name,lat,lng,address,postalCode
0,0,5fff116a6903fc06b8275e63,COVID-19 Assessment Centre,43.652553,-79.405875,347 Bathurst St,M5T 2S7
1,1,4e68b018d22df5c11f69264c,H-wing: Sunnybrook,43.722276,-79.374563,2075 Bayview Avenue,M4N 3N5
2,3,4bce159acc8cd13ac7b6c3cf,Dundas Euclid Animal Hospital,43.651518,-79.409984,840 Dundas St. W.,M6J 1V5
3,5,5de14daedb954c00086c07b2,Sunnybrook Hospital,43.721838,-79.375978,2075 Bayview Ave.,M4N 3M5
4,6,4e68eff014954826ade3c663,Veterans Centre: Sunnybrook,43.723458,-79.37492,2075 Bayview Avenue,M4N 3M5
5,9,5fcbd92af4821b4765aa3d0f,Women’s College Covid Assessment Centre,43.662346,-79.38695,76 Grenville St,M7A 1N8
6,10,4ee201b9d5fbcceafebf2847,Sleep Medicine and Research Rounds at Toronto ...,43.659117,-79.388323,200 Elizabeth St.,M5G 2C4
7,11,4c06c62aa0129c74d364d2c9,Casey House Hospice,43.669028,-79.377989,9 Huntley Street,M4Y 2K8
8,15,4f6dcf7be4b0725b61747dba,"Ellicsr, Health, Wellness & Cancer Survivorshi...",43.657861,-79.388314,"200 Elizabeth street, BCS021",M5G 2C4
9,25,59c27e3e98fbfc5bc1672f4e,Urology Department,43.721069,-79.378316,Wellness Way,M4G
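
Calling `dropna()` without arguments removes every row that has any missing value, so hospitals without an address or postal code are dropped even though only the coordinates are needed later. The sketch below, on invented data, contrasts that with dropping only rows that lack coordinates. The `subset` variant is a suggestion, not what the notebook does.

```python
import pandas as pd

# Invented rows: only the first one has every field filled in.
toy = pd.DataFrame({
    "name": ["Hospital A", "Wing B", "Lab C"],
    "lat": [43.65, 43.72, 43.66],
    "lng": [-79.40, -79.37, -79.39],
    "address": ["1 Example St", None, None],
    "postalCode": ["M5T 2S7", None, "M5G 2C4"],
})

strict = toy.dropna().reset_index(drop=True)                       # any missing value drops the row
coords_only = toy.dropna(subset=["lat", "lng"]).reset_index(drop=True)  # only coordinates required

print(len(strict), len(coords_only))
```

Passing `drop=True` to `reset_index` also avoids the extra `index` column that appears in the output above.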


## 2. Explore venues near hospitals in Toronto


Define a function that, for each hospital, retrieves the venues within a 500 m radius (at most `LIMIT` = 50 venues per hospital).

In [50]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of venues within `radius` metres of each hospital."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-hospital lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Hospital', 
                  'Hospital Latitude', 
                  'Hospital Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return nearby_venues
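
The function indexes `categories[0]` directly, which raises an `IndexError` if a venue comes back with an empty category list. A minimal sketch of a more tolerant parsing step follows. The helper name `extract_venue_rows`, the `"Uncategorized"` fallback and the sample items are all assumptions made for illustration; the item shape is inferred from how the function above reads the response.

```python
def extract_venue_rows(hospital, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to a placeholder label instead of raising IndexError.
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((hospital, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Invented sample items; the second has no category.
sample_items = [
    {"venue": {"name": "Corner Cafe",
               "location": {"lat": 43.6521, "lng": -79.4048},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled Spot",
               "location": {"lat": 43.6530, "lng": -79.4060},
               "categories": []}},
]

rows = extract_venue_rows("Example Hospital", 43.6525, -79.4058, sample_items)
for row in rows:
    print(row)
```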

In [51]:
venues = getNearbyVenues(names=med_venues['name'],latitudes=med_venues['lat'],longitudes=med_venues['lng']) # working with hospitals

COVID-19 Assessment Centre
H-wing: Sunnybrook
Dundas Euclid Animal Hospital
Sunnybrook Hospital
Veterans Centre: Sunnybrook
Women’s College Covid Assessment Centre
Sleep Medicine and Research Rounds at Toronto General Hospital
Casey House Hospice
Ellicsr, Health, Wellness & Cancer Survivorship Centre
Urology Department
R. Fraser Elliot Building - Toronto General Hospital
EG12 lecture hall: Sunnybrook
Sunnybrook Health Sciences Centre
Mount Sinai Hospital
CAMH
Toronto Western Hospital
Toronto Rehabilitation Institute
Sherbourne Health Centre
M-wing: Sunnybrook
Princess Margaret Cancer Centre
Toronto General Hospital
Emergency Room: Sunnybrook Hospital
The Hospital for Sick Children (SickKids)
Humber River Hospital
Women's College Hospital


In [52]:
print(venues.shape)

venues.head()

(808, 7)


Unnamed: 0,Hospital,Hospital Latitude,Hospital Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,COVID-19 Assessment Centre,43.652553,-79.405875,#Hashtag Gallery,43.65183,-79.408103,Art Gallery
1,COVID-19 Assessment Centre,43.652553,-79.405875,Market 707,43.652128,-79.404844,Food Court
2,COVID-19 Assessment Centre,43.652553,-79.405875,Kanto,43.652167,-79.404843,Filipino Restaurant
3,COVID-19 Assessment Centre,43.652553,-79.405875,Montauk,43.652084,-79.406898,Bar
4,COVID-19 Assessment Centre,43.652553,-79.405875,Queen Margherita Pizza,43.652054,-79.407263,Italian Restaurant


Group the venues by hospital and count how many venues were returned for each. Counts are capped at `LIMIT` = 50 per hospital.

In [53]:
venues.groupby('Hospital').count()

Hospital,Hospital Latitude,Hospital Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
CAMH,50,50,50,50,50,50
COVID-19 Assessment Centre,50,50,50,50,50,50
Casey House Hospice,50,50,50,50,50,50
Dundas Euclid Animal Hospital,50,50,50,50,50,50
EG12 lecture hall: Sunnybrook,4,4,4,4,4,4
"Ellicsr, Health, Wellness & Cancer Survivorship Centre",50,50,50,50,50,50
Emergency Room: Sunnybrook Hospital,4,4,4,4,4,4
H-wing: Sunnybrook,4,4,4,4,4,4
Humber River Hospital,5,5,5,5,5,5
M-wing: Sunnybrook,4,4,4,4,4,4


In [54]:
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))

There are 138 unique categories.


## 3. Analyse each hospital


In [55]:
# one-hot encode the venue categories
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add the hospital column back to the dataframe
onehot['Hospital'] = venues['Hospital'] 

onehot.head()

Unnamed: 0,Adult Boutique,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Bakery,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's examine the size of the new dataframe.

In [56]:
onehot.shape

(808, 138)

Group the rows by hospital and take the mean of the frequency of occurrence of each category.

In [57]:
grouped = onehot.groupby('Hospital').mean().reset_index()
grouped

Unnamed: 0,Hospital,Adult Boutique,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,CAMH,0.0,0.02,0.0,0.0,0.02,0.06,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.0,0.04,0.04,0.0,0.0,0.0
1,COVID-19 Assessment Centre,0.0,0.02,0.02,0.02,0.0,0.02,0.0,0.02,0.0,...,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0
2,Casey House Hospice,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0
3,Dundas Euclid Animal Hospital,0.0,0.02,0.02,0.0,0.0,0.04,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0
4,EG12 lecture hall: Sunnybrook,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Ellicsr, Health, Wellness & Cancer Survivorshi...",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.02,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
6,Emergency Room: Sunnybrook Hospital,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,H-wing: Sunnybrook,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Humber River Hospital,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0
9,M-wing: Sunnybrook,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
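
The encoding and averaging steps can be followed on a tiny example. The sketch below uses invented venues for two hospitals; each row of the result is the share of that hospital's venues falling in each category, so every row sums to 1.

```python
import pandas as pd

# Invented data: hospital A has four nearby venues, hospital B has two.
toy_venues = pd.DataFrame({
    "Hospital": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Bar", "Park", "Café", "Bar"],
})

# One indicator column per category, stored as floats.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)

# insert() raises ValueError if a category is itself called "Hospital",
# whereas plain assignment would silently overwrite that category column.
toy_onehot.insert(0, "Hospital", toy_venues["Hospital"])

# Mean of the indicators = relative frequency of each category per hospital.
toy_grouped = toy_onehot.groupby("Hospital").mean().reset_index()
print(toy_grouped)
```

With 138 unique categories, one would expect 139 columns once `Hospital` is added. The `(808, 138)` shape reported above suggests, though it does not prove, that a venue category named "Hospital" was overwritten by the assignment in the notebook.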


Let's confirm the new size

In [58]:
grouped.shape

(25, 138)

Print the top 5 venue categories for each hospital.

In [59]:
num_top_venues = 5

for hood in grouped['Hospital']:
    print("----"+hood+"----")
    temp = grouped[grouped['Hospital'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----CAMH----
         venue  freq
0  Men's Store  0.08
1          Bar  0.08
2  Art Gallery  0.06
3   Restaurant  0.04
4         Café  0.04


----COVID-19 Assessment Centre----
                           venue  freq
0                            Bar  0.12
1                           Café  0.12
2                         Bakery  0.06
3                           Park  0.04
4  Vegetarian / Vegan Restaurant  0.04


----Casey House Hospice----
                 venue  freq
0          Coffee Shop  0.18
1        Grocery Store  0.06
2           Restaurant  0.06
3       Breakfast Spot  0.04
4  Japanese Restaurant  0.04


----Dundas Euclid Animal Hospital----
                venue  freq
0                 Bar  0.10
1  Italian Restaurant  0.08
2        Cocktail Bar  0.06
3         Pizza Place  0.04
4         Art Gallery  0.04


----EG12 lecture hall: Sunnybrook----
             venue  freq
0  Thai Restaurant  0.25
1      Coffee Shop  0.25
2       Food Court  0.25
3    Deli / Bodega  0.25
4            

Let's put that into a pandas dataframe. First we need a helper that sorts a hospital's venue categories by frequency.

In [60]:
# write a function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues): 
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Display the top 10 venue categories for each hospital.

In [61]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Hospital']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and above
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
hospital_venues_sorted = pd.DataFrame(columns=columns)
hospital_venues_sorted['Hospital'] = grouped['Hospital']

for ind in np.arange(grouped.shape[0]):
    hospital_venues_sorted.iloc[ind, 1:] = return_most_common_venues(grouped.iloc[ind, :], num_top_venues)

hospital_venues_sorted

Unnamed: 0,Hospital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CAMH,Bar,Men's Store,Art Gallery,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Theater,Seafood Restaurant,Café,Restaurant,Furniture / Home Store
1,COVID-19 Assessment Centre,Café,Bar,Bakery,Park,Mexican Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Ramen Restaurant,Arepa Restaurant,Poutine Place
2,Casey House Hospice,Coffee Shop,Restaurant,Grocery Store,Breakfast Spot,Japanese Restaurant,Men's Store,Pub,Burger Joint,Pie Shop,Pharmacy
3,Dundas Euclid Animal Hospital,Bar,Italian Restaurant,Cocktail Bar,Restaurant,Pizza Place,Sandwich Place,Art Gallery,Taco Place,Hobby Shop,New American Restaurant
4,EG12 lecture hall: Sunnybrook,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
5,"Ellicsr, Health, Wellness & Cancer Survivorshi...",Coffee Shop,Café,Italian Restaurant,French Restaurant,Bubble Tea Shop,Sandwich Place,Thai Restaurant,Middle Eastern Restaurant,Pizza Place,Park
6,Emergency Room: Sunnybrook Hospital,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
7,H-wing: Sunnybrook,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
8,Humber River Hospital,Park,Grocery Store,Vietnamese Restaurant,Pharmacy,Coffee Shop,Distribution Center,Electronics Store,Donut Shop,Doner Restaurant,Dog Run
9,M-wing: Sunnybrook,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
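
Hospitals with only a handful of nearby venues still get ten "most common" entries: the Sunnybrook rows list four categories at a frequency of 0.25 each, and the remaining slots (Yoga Studio, Dog Run, and so on) are categories with a frequency of zero. A minimal sketch of a variant that skips zero-frequency categories is shown below. The function name `top_nonzero_categories` and the sample row are illustrative, not part of the notebook.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names, skipping zero-frequency ones."""
    scores = row.iloc[1:].astype(float)  # drop the leading 'Hospital' entry
    scores = scores[scores > 0].sort_values(ascending=False)
    return scores.index.tolist()[:num_top_venues]

# Invented frequency row in the same layout as a row of `grouped`.
toy_row = pd.Series({
    "Hospital": "Example Wing",
    "Bar": 0.0,
    "Coffee Shop": 0.5,
    "Deli / Bodega": 0.25,
    "Food Court": 0.25,
    "Yoga Studio": 0.0,
})

top = top_nonzero_categories(toy_row, 10)
print(top)
```

Because the result can be shorter than `num_top_venues`, it would need padding (for example with `None`) before being written into a fixed-width dataframe row.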


## 4. Cluster Hospitals


Run _k_-means to group the hospitals into 5 clusters based on their venue-category frequencies.

In [62]:
# set number of clusters
kclusters = 5 

grouped_clustering = grouped.drop(columns='Hospital')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
# add clustering labels
hospital_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
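
The notebook fixes the number of clusters at 5 without comparing alternatives. One common way to sanity-check that choice, which is not part of the original analysis, is to compare silhouette scores for several values of _k_. The sketch below does this on an invented frequency matrix with two obvious groups. `n_init` is set explicitly because its default has changed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Invented category-frequency rows: two "bar/café" hospitals, three "coffee" ones.
toy = np.array([
    [0.50, 0.50, 0.00],
    [0.45, 0.55, 0.00],
    [0.00, 0.10, 0.90],
    [0.00, 0.05, 0.95],
    [0.05, 0.00, 0.95],
])

results = {}
for k in (2, 3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(toy)
    score = silhouette_score(toy, model.labels_)  # closer to 1 = better separated
    results[k] = (model.labels_, score)
    print(k, model.labels_, round(score, 3))
```

Applied to the real data, the same loop would run over `grouped_clustering` instead of `toy`. The label numbers themselves are arbitrary; only the grouping matters.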

In [63]:
# add latitude/longitude for each hospital
df3 = venues.groupby('Hospital').first().reset_index() #group by and take first row
merged = df3[['Hospital', 'Hospital Longitude', 'Hospital Latitude']].join(hospital_venues_sorted.set_index('Hospital'), on='Hospital')
merged = merged.rename({'Hospital Longitude': 'Longitude', 'Hospital Latitude': 'Latitude'}, axis=1)  # rename the longitude/latitude columns
merged

Unnamed: 0,Hospital,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CAMH,-79.418846,43.643438,1,Bar,Men's Store,Art Gallery,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Theater,Seafood Restaurant,Café,Restaurant,Furniture / Home Store
1,COVID-19 Assessment Centre,-79.405875,43.652553,1,Café,Bar,Bakery,Park,Mexican Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Ramen Restaurant,Arepa Restaurant,Poutine Place
2,Casey House Hospice,-79.377989,43.669028,4,Coffee Shop,Restaurant,Grocery Store,Breakfast Spot,Japanese Restaurant,Men's Store,Pub,Burger Joint,Pie Shop,Pharmacy
3,Dundas Euclid Animal Hospital,-79.409984,43.651518,1,Bar,Italian Restaurant,Cocktail Bar,Restaurant,Pizza Place,Sandwich Place,Art Gallery,Taco Place,Hobby Shop,New American Restaurant
4,EG12 lecture hall: Sunnybrook,-79.374107,43.721692,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
5,"Ellicsr, Health, Wellness & Cancer Survivorshi...",-79.388314,43.657861,4,Coffee Shop,Café,Italian Restaurant,French Restaurant,Bubble Tea Shop,Sandwich Place,Thai Restaurant,Middle Eastern Restaurant,Pizza Place,Park
6,Emergency Room: Sunnybrook Hospital,-79.37589,43.720854,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
7,H-wing: Sunnybrook,-79.374563,43.722276,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
8,Humber River Hospital,-79.488066,43.724337,3,Park,Grocery Store,Vietnamese Restaurant,Pharmacy,Coffee Shop,Distribution Center,Electronics Store,Donut Shop,Doner Restaurant,Dog Run
9,M-wing: Sunnybrook,-79.376429,43.721781,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant


In [64]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ON_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [65]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(merged['Latitude'], merged['Longitude'], merged['Hospital'], merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

In [66]:
print(merged.shape)
merged.sort_values(["Cluster Labels"], inplace=True)
merged.head(20)

(25, 14)


Unnamed: 0,Hospital,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Veterans Centre: Sunnybrook,-79.37492,43.723458,0,Food Court,Deli / Bodega,Coffee Shop,Thai Restaurant,Café,Yoga Studio,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
0,CAMH,-79.418846,43.643438,1,Bar,Men's Store,Art Gallery,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Theater,Seafood Restaurant,Café,Restaurant,Furniture / Home Store
1,COVID-19 Assessment Centre,-79.405875,43.652553,1,Café,Bar,Bakery,Park,Mexican Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Ramen Restaurant,Arepa Restaurant,Poutine Place
3,Dundas Euclid Animal Hospital,-79.409984,43.651518,1,Bar,Italian Restaurant,Cocktail Bar,Restaurant,Pizza Place,Sandwich Place,Art Gallery,Taco Place,Hobby Shop,New American Restaurant
13,Sherbourne Health Centre,-79.37229,43.662427,1,Coffee Shop,Restaurant,Pub,Diner,Grocery Store,Gastropub,Japanese Restaurant,Italian Restaurant,Hotel,Breakfast Spot
20,Toronto Western Hospital,-79.406074,43.653434,1,Bar,Café,Vegetarian / Vegan Restaurant,Taco Place,Mexican Restaurant,Burger Joint,Comfort Food Restaurant,Ramen Restaurant,Japanese Restaurant,Poutine Place
9,M-wing: Sunnybrook,-79.376429,43.721781,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
15,Sunnybrook Health Sciences Centre,-79.37621,43.721505,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
16,Sunnybrook Hospital,-79.375978,43.721838,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
6,Emergency Room: Sunnybrook Hospital,-79.37589,43.720854,2,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant


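Now we can examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, each cluster can be given a descriptive name; naming the clusters is left as an exercise for the reader. Note that the headings below number the clusters from 1, while the _k_-means labels start at 0, so Cluster 1 corresponds to label 0.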
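
One simple way to find candidate distinguishing categories, offered here as a heuristic rather than the notebook's method, is to compare each cluster's mean category frequency with the overall mean and pick the category with the largest positive difference. The data, the labels and the name `lift` below are invented for illustration.

```python
import pandas as pd

# Invented frequency table in the same layout as `grouped`.
toy_grouped = pd.DataFrame({
    "Hospital": ["A", "B", "C", "D"],
    "Bar": [0.30, 0.25, 0.00, 0.05],
    "Coffee Shop": [0.10, 0.15, 0.40, 0.35],
    "Park": [0.00, 0.05, 0.10, 0.10],
})
toy_labels = [0, 0, 1, 1]  # stand-in for kmeans.labels_

features = toy_grouped.drop(columns="Hospital")
profile = features.assign(cluster=toy_labels).groupby("cluster").mean()  # mean frequency per cluster
overall = features.mean()                                                # mean frequency over all hospitals
lift = profile - overall                                                 # positive = over-represented
distinguishing = lift.idxmax(axis=1)                                     # strongest category per cluster

print(profile)
print(distinguishing)
```

In this toy example cluster 0 stands out for bars and cluster 1 for coffee shops, which would suggest names along those lines.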

Cluster 1

In [67]:
merged.loc[merged['Cluster Labels'] == 0, merged.columns[[0] + list(range(4, merged.shape[1]))]]

Unnamed: 0,Hospital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Veterans Centre: Sunnybrook,Food Court,Deli / Bodega,Coffee Shop,Thai Restaurant,Café,Yoga Studio,Escape Room,Electronics Store,Donut Shop,Doner Restaurant


Cluster 2

In [68]:
merged.loc[merged['Cluster Labels'] == 1, merged.columns[[0] + list(range(4, merged.shape[1]))]]

Unnamed: 0,Hospital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CAMH,Bar,Men's Store,Art Gallery,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Theater,Seafood Restaurant,Café,Restaurant,Furniture / Home Store
1,COVID-19 Assessment Centre,Café,Bar,Bakery,Park,Mexican Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Ramen Restaurant,Arepa Restaurant,Poutine Place
3,Dundas Euclid Animal Hospital,Bar,Italian Restaurant,Cocktail Bar,Restaurant,Pizza Place,Sandwich Place,Art Gallery,Taco Place,Hobby Shop,New American Restaurant
13,Sherbourne Health Centre,Coffee Shop,Restaurant,Pub,Diner,Grocery Store,Gastropub,Japanese Restaurant,Italian Restaurant,Hotel,Breakfast Spot
20,Toronto Western Hospital,Bar,Café,Vegetarian / Vegan Restaurant,Taco Place,Mexican Restaurant,Burger Joint,Comfort Food Restaurant,Ramen Restaurant,Japanese Restaurant,Poutine Place


Cluster 3

In [69]:
merged.loc[merged['Cluster Labels'] == 2, merged.columns[[0] + list(range(4, merged.shape[1]))]]

Unnamed: 0,Hospital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,M-wing: Sunnybrook,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
15,Sunnybrook Health Sciences Centre,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
16,Sunnybrook Hospital,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
6,Emergency Room: Sunnybrook Hospital,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
21,Urology Department,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
4,EG12 lecture hall: Sunnybrook,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant
7,H-wing: Sunnybrook,Deli / Bodega,Coffee Shop,Thai Restaurant,Food Court,Yoga Studio,Dog Run,Escape Room,Electronics Store,Donut Shop,Doner Restaurant


Cluster 4

In [70]:
merged.loc[merged['Cluster Labels'] == 3, merged.columns[[0] + list(range(4, merged.shape[1]))]]

Unnamed: 0,Hospital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Humber River Hospital,Park,Grocery Store,Vietnamese Restaurant,Pharmacy,Coffee Shop,Distribution Center,Electronics Store,Donut Shop,Doner Restaurant,Dog Run


Cluster 5

In [71]:
merged.loc[merged['Cluster Labels'] == 4, merged.columns[[0] + list(range(4, merged.shape[1]))]]

Unnamed: 0,Hospital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Toronto Rehabilitation Institute,Coffee Shop,Café,Art Gallery,French Restaurant,Bubble Tea Shop,Japanese Restaurant,Movie Theater,Poke Place,Pizza Place,Burrito Place
18,Toronto General Hospital,Coffee Shop,Sushi Restaurant,Chinese Restaurant,Spa,Bubble Tea Shop,Gym / Fitness Center,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Café
17,The Hospital for Sick Children (SickKids),Coffee Shop,Sandwich Place,Italian Restaurant,Café,Bubble Tea Shop,Burger Joint,Donut Shop,Portuguese Restaurant,Poke Place,Pizza Place
12,R. Fraser Elliot Building - Toronto General Ho...,Coffee Shop,Sandwich Place,Burger Joint,Café,Italian Restaurant,Bubble Tea Shop,Yoga Studio,Middle Eastern Restaurant,Poke Place,Pizza Place
23,Women's College Hospital,Coffee Shop,Sushi Restaurant,Café,Sandwich Place,Bookstore,Park,Beer Bar,Distribution Center,Smoothie Shop,Pizza Place
11,Princess Margaret Cancer Centre,Coffee Shop,Café,Ice Cream Shop,Sushi Restaurant,French Restaurant,Art Gallery,Sandwich Place,Japanese Restaurant,Salad Place,Bookstore
10,Mount Sinai Hospital,Coffee Shop,Café,Japanese Restaurant,Gastropub,Art Gallery,Bubble Tea Shop,Sandwich Place,French Restaurant,Ramen Restaurant,Salad Place
5,"Ellicsr, Health, Wellness & Cancer Survivorshi...",Coffee Shop,Café,Italian Restaurant,French Restaurant,Bubble Tea Shop,Sandwich Place,Thai Restaurant,Middle Eastern Restaurant,Pizza Place,Park
2,Casey House Hospice,Coffee Shop,Restaurant,Grocery Store,Breakfast Spot,Japanese Restaurant,Men's Store,Pub,Burger Joint,Pie Shop,Pharmacy
14,Sleep Medicine and Research Rounds at Toronto ...,Coffee Shop,Chinese Restaurant,Sushi Restaurant,Bubble Tea Shop,Burger Joint,Italian Restaurant,Japanese Restaurant,Spa,Yoga Studio,Modern European Restaurant


End of code