# Capstone Project - The Battle of Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The COVID-19 pandemic has severely affected lives, societies, and economies around the world. In Canada, Toronto has been particularly affected as a large metropolitan city home to roughly 3 million people and countless businesses. In the Downtown Toronto area, access to nearby medical centres is not evenly distributed for each neighborhood. It is of interest to Toronto Public Health, policy makers, and community leaders to determine which Toronto neighborhoods may be lacking access to adequate medical centres (clinics, hospitals, etc.) within walking distance. This access is important in the event an individual requires medical care, testing for COVID-19, and eventually for receiving vaccinations once they become available. Intervention may be required for neighborhoods with lower medical centre availability (pop-up centres, increased transportation options, etc.). As pharmacies can also be used for testing and vaccinations, we are also interested in looking at the number of pharmacies in each neighborhood. **The goal of this project will be to segment ___Downtown Toronto___ neighborhoods by their number of medical centres and pharmacies present.**

## Data <a name="data"></a>

To group Downtown Toronto neighbourhoods by their number of medical centres and pharmacies present, the neighbourhoods will be clustered into three groups based on these metrics. This will help interested parties identify neighborhoods of greatest, medium, and least concern.

To accomplish this, the following data is required:
* A list of neighborhoods in Toronto with their corresponding latitude and longitude coordinates
* A list of the medical centres (clinics, hospitals, etc.) present in each neighborhood, within 1 km of each latitude and longitude coordinate set (a reasonable walking distance)
* A list of pharmacies present in each neighborhood, within 1 km of each latitude and longitude coordinate set (a reasonable walking distance)

### Neighborhood Information

A list of Toronto neighborhoods with corresponding boroughs and postal codes are available from the Wikipedia page: \
'List of postal codes of Canada: M' (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M).

The Wikipedia page is scraped to transfer neighborhood information to a Pandas dataframe.

First, importing Pandas:

In [224]:
import pandas as pd

The wikipedia page contains several tables. We are only interested in the first one, which we assign to a pandas dataframe and preview the dataframe:

In [225]:
wiki = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(wiki, header=0)[0]
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,"Malvern, Rouge"


Removing rows with unassigned boroughs:

In [226]:
df = df[df.Borough != 'Not assigned']
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


Determining the number of rows in the dataframe that we are dealing with:

In [227]:
print('The number of rows in the dataframe is', df.shape[0])

The number of rows in the dataframe is 103


#### Importing Latitude and Longitude Data

Importing the latitude and longitude data for each neighbourhood into a pandas dataframe from csv file: http://cocl.us/Geospatial_data:

In [228]:
geo_url = 'http://cocl.us/Geospatial_data'
geo_df = pd.read_csv(geo_url)
geo_df.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Setting indices to 'Postal Code' for both dataframes to allow for merging, and merging the two dataframes into a new one:

In [229]:
geo_df = geo_df.set_index('Postal Code')
df = df.set_index('Postal Code')
all_df = pd.concat([df, geo_df], axis=1, join='outer', sort=False)
all_df.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
M3A,North York,Parkwoods,43.753259,-79.329656
M4A,North York,Victoria Village,43.725882,-79.315572
M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
M3B,North York,Don Mills,43.745906,-79.352188
M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


Resetting the index of the new dataframe and recover 'Postal Code' column title:

In [230]:
all_df.reset_index(inplace=True)
all_df.rename(columns={'index': 'Postal Code'},inplace=True)
all_df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


For this project, we are only interested in looking at Downtown Toronto neighborhoods:

In [231]:
dt_df = all_df[all_df.Borough == 'Downtown Toronto']
dt_df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
36,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
42,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


In [232]:
print('The number of rows in the dataframe is', dt_df.shape[0])

The number of rows in the dataframe is 19


Therefore we are dealing with **19 Downtown Toronto neighborhoods**. \
Note: Some postal codes contain more than one neighborhood, listed together in one row. We will be dealing with these as one data point.

Let's reset the index in our dataframe and drop the old one:

In [233]:
dt_df.reset_index(inplace=True)
dt_df = dt_df.drop(columns='index')
dt_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


Let's visualize the neighborhoods by superimposing them on a map of Toronto.

Importing geopy and folium:

In [234]:
#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!pip install folium==0.5.0
import folium # map rendering library

Obtaining latitude and longitude coordinates of Toronto for mapping:

In [235]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


#### Downtown Toronto Neighborhoods Superimposed on Map

In [273]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, borough, neighborhood in zip(dt_df['Latitude'], dt_df['Longitude'], dt_df['Borough'], dt_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Foursquare

Now that we have our neighborhood location data, let's use Foursquare API to get venue information for each neighborhood.

Defining Foursquare credentials and version:

In [237]:
CLIENT_ID = 'U4FBECMV3D0DKSFWFHCAWQFOGZ0YUEBI2DTZDBCFCYHHO0FI' # my Foursquare ID
CLIENT_SECRET = 'LJZD1ROUJCT5UAMJJO4IDY3WB55BEPXXBOGQTPUAXB2MVWS3' # my Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: U4FBECMV3D0DKSFWFHCAWQFOGZ0YUEBI2DTZDBCFCYHHO0FI
CLIENT_SECRET:LJZD1ROUJCT5UAMJJO4IDY3WB55BEPXXBOGQTPUAXB2MVWS3


We will be obtaining venue data for each neighborhood using a pre-defined radius of 1 km around the listed latitude and longitude coordinates. Our search query, to encompass all types of medical centres (clinics, hospitals, etc.) will be 'medical':

In [238]:
radius = 1000
LIMIT = 100
search_query = 'medical'

As an example, let's obtain a list of medical centres in one Downtown Toronto neighbourhood, St. James Town, whose index number in our dataframe is 3. Let's obtain the latitude and longitude coordinates of the neighborhood from our dataframe:

In [239]:
latitude = dt_df.loc[3,'Latitude']
longitude = dt_df.loc[3,'Longitude']

Generating an URL for our search call to the Foursquare API:

In [240]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=U4FBECMV3D0DKSFWFHCAWQFOGZ0YUEBI2DTZDBCFCYHHO0FI&client_secret=LJZD1ROUJCT5UAMJJO4IDY3WB55BEPXXBOGQTPUAXB2MVWS3&ll=43.6514939,-79.3754179&v=20180604&query=medical&radius=1000&limit=100'

Executing our **get** request:

In [241]:
import requests

In [242]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ec5a578aba297001c3ae87c'},
 'response': {'venues': [{'id': '4cdf5495f8cdb1f738339112',
    'name': "St Michael's Hospital Medical Imaging",
    'location': {'address': '30 Bond St.',
     'lat': 43.65368444376282,
     'lng': -79.37870569300675,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.65368444376282,
       'lng': -79.37870569300675}],
     'distance': 359,
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['30 Bond St.', 'Toronto ON', 'Canada']},
    'categories': [{'id': '4bf58dd8d48988d104941735',
      'name': 'Medical Center',
      'pluralName': 'Medical Centers',
      'shortName': 'Medical',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/medical_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1590011689',
    'hasPerk': False},
   {'id': '55ace30f498e675077de6b47',
    'name': 'Oxford Medical Imaging',
  

How many results, i.e. venues featuring the keyword 'medical', did our search query obtain?

In [243]:
venues = results['response']['venues']
print('There are {} medical centres in St. James Town.'.format(len(venues)))

There are 34 medical centres in St. James Town.


Therefore there are **34 medical centres in St. James Town**. Let's repeat the query for each of our 19 neighborhoods and store the results in a list.

In [244]:
medical_list = []
for row in range(dt_df.shape[0]):
    latitude = dt_df.loc[row,'Latitude']
    longitude = dt_df.loc[row,'Longitude']
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    venues = results['response']['venues']
    medical_list.append(len(venues))
medical_list

[5, 48, 48, 34, 20, 50, 9, 48, 22, 36, 34, 38, 38, 2, 3, 23, 8, 43, 39]

Let's convert this list to a column in our dataframe, dt_df, and preview the changes.

In [245]:
dt_df['Medical Centres'] = medical_list
dt_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Medical Centres
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,48
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,48
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,34
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,20
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,50
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,9
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,48
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,22
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,36


Let's repeat this process to obtain the number of pharmacies for each neighborhood.

In [246]:
search_query = 'pharmacy'

As an example, let's obtain a list of pharmacies in St. James Town:

In [247]:
latitude = dt_df.loc[3,'Latitude']
longitude = dt_df.loc[3,'Longitude']

In [248]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=U4FBECMV3D0DKSFWFHCAWQFOGZ0YUEBI2DTZDBCFCYHHO0FI&client_secret=LJZD1ROUJCT5UAMJJO4IDY3WB55BEPXXBOGQTPUAXB2MVWS3&ll=43.6514939,-79.3754179&v=20180604&query=pharmacy&radius=1000&limit=100'

In [249]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ec5a61d211536001b257dfd'},
 'response': {'venues': [{'id': '5c6474c3033693002ccba1bb',
    'name': 'Wellth Pharmacy',
    'location': {'address': '85 Church St',
     'lat': 43.65176,
     'lng': -79.375062,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.65176,
       'lng': -79.375062}],
     'distance': 41,
     'postalCode': 'M5C 2G2',
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['85 Church St', 'Toronto ON M5C 2G2', 'Canada']},
    'categories': [{'id': '4bf58dd8d48988d10f951735',
      'name': 'Pharmacy',
      'pluralName': 'Pharmacies',
      'shortName': 'Pharmacy',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/pharmacy_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1590011742',
    'hasPerk': False},
   {'id': '4f1de655e4b0e2eeede3ac0e',
    'name': "Guardian - Morelli's Pharmacy",
    'location': {'addres

How many results, i.e. venues featuring the keyword 'pharmacy', did our search query obtain?

In [250]:
venues = results['response']['venues']
print('There are {} pharmacies in St. James Town.'.format(len(venues)))

There are 16 pharmacies in St. James Town.


Therefore there are **16 pharmacies in St. James Town**. Let's repeat the query for each of our 19 neighborhoods and store the results in a list.

In [251]:
pharmacy_list = []
for row in range(dt_df.shape[0]):
    latitude = dt_df.loc[row,'Latitude']
    longitude = dt_df.loc[row,'Longitude']
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    venues = results['response']['venues']
    pharmacy_list.append(len(venues))
pharmacy_list

[9, 22, 24, 16, 10, 27, 9, 22, 4, 15, 16, 16, 25, 1, 1, 10, 9, 19, 17]

Let's convert this list to a column in our dataframe, dt_df, and preview the changes.

In [252]:
dt_df['Pharmacies'] = pharmacy_list
dt_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Medical Centres,Pharmacies
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5,9
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,48,22
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,48,24
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,34,16
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,20,10
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,50,27
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,9,9
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,48,22
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,22,4
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,36,15


Now our data set is complete, cleaned up, and ready for analysis.

## Methodology <a name="methodology"></a>

Now that we have collected the required data, our next step will be to apply the machine learning technique **k-means** clustering to group the neighborhoods according to their number of medical centres and pharmacies present. **Our value of k will be 3 as we are looking to cluster the neighborhoods into 3 groups of greatest, medium, and least concern.**

In [253]:
from sklearn.cluster import KMeans 
clusterNum = 3

Let's declare our variable X which will be used for clustering. It will store the number of medical centres and pharmacies present in each neighbohood as an array. Normalization of the data is not required as the two features (number of medical centres and pharmacies) are of the same magnitude.

In [254]:
X = dt_df.values[:,6:8]

Initializing and fitting our k-means model:

In [255]:
k_means = KMeans(init = "k-means++", n_clusters = clusterNum, n_init = 12)
k_means.fit(X)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=12, n_jobs=None, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)

The next step will be to assign each neighborhood to a cluster based on our k-means model, and add the corresponding cluster label for each neighborhood to the dataframe. Then, the clusters will be superimposed and visualized on a map of Toronto using folium.

## Results <a name="results"></a>

Let's take a look at out cluster labels:

In [256]:
labels = k_means.labels_
print(labels)

[0 2 2 1 0 2 0 2 0 1 1 1 2 0 0 0 0 1 1]


We assign the labels to each row in our dataframe, dt_df, and view the dataframe:

In [257]:
dt_df['Cluster'] = labels
dt_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Medical Centres,Pharmacies,Cluster
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5,9,0
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,48,22,2
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,48,24,2
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,34,16,1
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,20,10,0
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,50,27,2
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,9,9,0
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,48,22,2
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,22,4,0
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,36,15,1


Now, let's visualize our clusters on a map of Toronto.

In [258]:
# obtaining latitude and longitude coordinates of Toronto for mapping
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


Importing useful libraries for plotting:

In [259]:
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

In [279]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(clusterNum)
ys = [i + x + (i*x)**2 for i in range(clusterNum)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
for lat, lng, neighborhood, medical_centres, pharmacies, cluster in zip(dt_df['Latitude'], dt_df['Longitude'], dt_df['Neighborhood'], \
                                                                        dt_df['Medical Centres'], dt_df['Pharmacies'],dt_df['Cluster']):
    label = '{}, Medical Centres: {}, Pharmacies: {}, Cluster: {}'.format(neighborhood, medical_centres, pharmacies, cluster)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Discussion <a name="discussion"></a>

We now have a view of the neighborhoods in Downtown Toronto in which medical centres and pharmacies are concentrated. Based on this analysis we have identified 3 clusters/categories: 

#### 1) Neighborhoods of Greatest Concern

In [261]:
dt_df[dt_df['Cluster'] == 0]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Medical Centres,Pharmacies,Cluster
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5,9,0
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,20,10,0
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,9,9,0
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,22,4,0
13,M5V,Downtown Toronto,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.39442,2,1,0
14,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,3,1,0
15,M5W,Downtown Toronto,Stn A PO Boxes,43.646435,-79.374846,23,10,0
16,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675,8,9,0


In [269]:
print("There are {} neighborhoods in the category 'Greatest Concern'.".format(len(dt_df[dt_df['Cluster'] == 0])))

There are 8 neighborhoods in the category 'Greatest Concern'.


These neighborhoods were identified to have the least number of either medical centres, pharmacies, or both. They are represented by <font color='red'>red circles</font> on our map of Toronto. These are areas where potential interventions (e.g. pop-up testing centres) may be focused by community and public health planning. 

#### 2) Neighborhoods of Medium Concern

In [262]:
dt_df[dt_df['Cluster'] == 1]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Medical Centres,Pharmacies,Cluster
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,34,16,1
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,36,15,1
10,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,34,16,1
11,M5S,Downtown Toronto,"University of Toronto, Harbord",43.662696,-79.400049,38,16,1
17,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,43,19,1
18,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,39,17,1


In [272]:
print("There are {} neighborhoods in the category 'Medium Concern'.".format(len(dt_df[dt_df['Cluster'] == 1])))

There are 6 neighborhoods in the category 'Medium Concern'.


These neighborhoods were identified to have a medium number of medical centres and pharmacies. They are represented by <font color='purple'>purple circles</font> on our map of Toronto. These are areas where potential interventions (e.g. pop-up testing centres) may be needed but should likely not be the primary focus.

#### 3) Neighborhoods of Least Concern

In [263]:
dt_df[dt_df['Cluster'] == 2]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Medical Centres,Pharmacies,Cluster
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,48,22,2
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,48,24,2
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,50,27,2
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,48,22,2
12,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.653206,-79.400049,38,25,2


In [271]:
print("There are {} neighborhoods in the category 'Least Concern'.".format(len(dt_df[dt_df['Cluster'] == 2])))

There are 5 neighborhoods in the category 'Least Concern'.


These neighborhoods were identified to have a high number of medical centres and pharmacies. They are represented by <font color='green'>green circles</font> on our map of Toronto. There are areas where potential interventions (e.g. pop-up centres) are the least needed.

#### Limitations and Further Analysis Required

A major limitation of this analysis is that it did not take into account the population of each neighborhood to properly assess its true medical access requirements. This is a feature that must be added for any further analysis. Some neighborhoods are also much nearer geographically to each other than others, which skews results as some venues may appear more than once while categorized under different neighborhoods. Furthermore, this study should be expanded to encompass all of Toronto and not just the downtown area.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to group Toronto neighborhoods into 3 categories: Greatest Concern, Medium Concern, and Least Concern on the basis of the availability of medical centres and pharmacies in each neighborhood, as a response to the COVID-19 pandemic. This information is needed by Toronto Public Health, policy makers, and community leaders to determine which Toronto neighborhoods may be lacking access to adequate medical centres within walking distance. The data was successfully extracted and clustered into the 3 categories. The number of neighborhood groups falling into each category is as follows:

* Greatest Concern: 8
* Medium Concern: 6
* Least Concern: 5

However, further analysis is required to take into the account the population of each neighborhood and to segregate geographically close neighborhoods to get the full picture of medical access in each neighborhood.