# Capstone Project - Covid Test Centers Segmenting and Clustering in Toronto

## Introduction: Business Problem

COVID-19 has impacted the whole world at unprecedented levels.<br> Governments and health care departments are under extreme pressure to carry out many tasks in the battle against this pandemic.<br> One of these tasks is setting up test centers to assess patients. In this project we segment and cluster the COVID-19 test centers in the city of Toronto.<br>Plotting the test centers on the city map shows how they are spread across this vast city with a population of about 3 million.<br>Finally, using the Foursquare API and k-means clustering, we find the 10 most common venue categories around each test center.<br>This helps in understanding what kinds of businesses or amenities exist around each test center.

Several stakeholders can use these insights:<br>
  a) Government officials, when planning test centers in neighborhoods that do not yet have one.<br>
  b) Patients, to quickly identify what services and amenities exist around each test center, e.g. parking facilities.<br>
  c) Businesses, to take extra precautions against possible exposure due to the increased influx of people in the neighborhood.<br>

## Data
We begin with the official dataset published by the Government of Ontario on its website, which lists the COVID-19 test centers across the province.<br>
https://data.ontario.ca/dataset/8ba078b2-ca9b-44c1-b5db-9674d85421f9/resource/04bede2c-5e30-4a05-b890-cd407043485e/download/assessment-centre-locations.csv
Along with the test center name, the file contains the city, postal_code, latitude and longitude. We use these when plotting maps (using Folium) and when finding the 10 most common venues around each test center in Toronto (using the Foursquare API) before clustering the centers.

## Methodology
In the first step we collected the required data, which contains the test center location_name, city, postal_code, latitude and longitude. We also performed the data wrangling needed for the later analysis.

In the second step we use the Foursquare API to find the nearby venues around each test center.

In the third and final step we find the n most common venues around each test center, group the test centers with k-means clustering, and plot the findings on a map using Folium.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
import os
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim 
!pip install folium
import folium

print('Libraries imported')

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported


### Data preparation and wrangling

In [4]:
# Import the dataset: COVID-19 test centers in Ontario
url = 'https://raw.githubusercontent.com/srikanth-240/Coursera-Capstone-End_Game/master/covid_test_centers_ontario.csv'
cols_list = ['location_name', 'city', 'address', 'postal_code', 'latitude', 'longitude']
# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
df_cvt = pd.read_csv(url, usecols=cols_list, on_bad_lines='skip')

In [5]:
df_cvt.head()

Unnamed: 0,location_name,city,address,postal_code,latitude,longitude
0,Kirkland and District Hospital,Kirkland Lake,145 Government Road East,P2N 3P4,48.153552,-80.014725
1,Collingwood Health Centre,Collingwood,186 Erie Street,L9Y 4T3,44.501231,-80.204437
2,Midland Assessment Centre,Midland,"845 King Street, Unit 3",L4R 0B7,44.73475,-79.870959
3,Milton District Hospital,Milton,725 Bronte Street South,L9T 9K1,43.497418,-79.868476
4,Oakville Trafalgar Memorial Hospital,Oakville,3001 Hospital Gate,L6M 0L8,43.450869,-79.763927


In [6]:
df_cvt.drop_duplicates(inplace=True)

In [7]:
df_cvt.shape

(153, 6)

### Analysis 
Let's perform some basic exploratory data analysis and derive some additional info from our raw data.

In [8]:
df_cvt_grp = df_cvt.groupby(['postal_code']).count()

In [9]:
df_cvt_grp

Unnamed: 0_level_0,location_name,city,address,latitude,longitude
postal_code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
H0M 1A0,1,1,1,1,1
K0A 1A0,1,1,1,1,1
K0A 1M0,1,1,1,1,1
K0C 2K0,1,1,1,1,1
K0J 1B0,1,1,1,1,1
...,...,...,...,...,...
P7B 6V4,1,1,1,1,1
P8N 2Z6,1,1,1,1,1
P8T 1B4,1,1,1,1,1
P9A 2B7,2,2,2,2,2


In [10]:
toronto_data=df_cvt[df_cvt['city'].str.contains("Toronto")]
toronto_data.head()

Unnamed: 0,location_name,city,address,postal_code,latitude,longitude
100,Humber River Hospital Assessment Centre,Toronto,2111 Finch Avenue West,M3N 1N1,43.754813,-79.525921
101,Michael Garron Hospital - Emergency Department,Toronto,825 Coxwell Avenue,M4C 3E7,43.68991,-79.324858
102,Michael Garron Hospital - Outpatient Clinic,Toronto,825 Coxwell Avenue,M4C 3E7,43.68991,-79.324858
103,Market Place Temporary Assessment Centre,Toronto,4 The Market Place,M4C 5M1,43.695869,-79.292138
104,Mount Sinai Hospital,Toronto,600 University Avenue,M5G 1X5,43.657575,-79.390096
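
A note on the filter above: `str.contains("Toronto")` is a case-sensitive substring match, so it would also keep any city value that merely contains the word and would miss differently cased or padded values. The sketch below contrasts it with an exact match on a normalized column. The three rows are made up for illustration and are not taken from the Ontario dataset.

```python
import pandas as pd

# Toy rows; the names and city values are invented to show how the two filters differ.
df = pd.DataFrame({
    "location_name": ["Centre A", "Centre B", "Centre C"],
    "city": ["Toronto", " toronto ", "East Toronto Township"],
})

# Substring match, as used in the notebook: case-sensitive, matches anywhere in the string.
substring_match = df[df["city"].str.contains("Toronto")]

# Exact match after stripping whitespace and normalizing case.
city_clean = df["city"].str.strip().str.casefold()
exact_match = df[city_clean == "toronto"]

print(substring_match["location_name"].tolist())
print(exact_match["location_name"].tolist())
```

Which of the two is appropriate depends on how the city column is actually populated, so it is worth checking `df_cvt['city'].unique()` before choosing.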


#### Now that the required data is ready, let's set up the Foursquare API

In [11]:
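# Foursquare credentials are read from the environment instead of being hardcoded in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']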
VERSION = '20201111'

#### Time to get all nearby venues using the latitude and longitude of each test center obtained earlier

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Return one row per venue found within `radius` meters of each test center."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Build the Foursquare "explore" request for this test center
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # Keep only the list of recommended items from the JSON response
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # Record the test center next to each venue's name, coordinates and first category
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-center lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Test_Center',
                  'Test_Center Latitude',
                  'Test_Center Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
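
`getNearbyVenues` indexes straight into `['response']['groups'][0]['items']` and `['categories'][0]`, so a response with no groups or a venue with no category would raise an exception. The sketch below shows one possible way to flatten such a payload defensively. The helper name `extract_venue_rows` and the sample payload are hypothetical: the payload is hand-written to mimic only the fields the notebook reads and is not real API output.

```python
def extract_venue_rows(payload, center_name):
    """Flatten an explore-style payload into (center, venue, lat, lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # Fall back to a placeholder dict when the category list is missing or empty
        categories = venue.get("categories") or [{}]
        rows.append((
            center_name,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Unknown"),
        ))
    return rows


# Hand-written sample mimicking the fields used above (an assumption, not captured output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tim Hortons",
               "location": {"lat": 43.754344, "lng": -79.527024},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.75, "lng": -79.52},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample, "Humber River Hospital Assessment Centre")
empty = extract_venue_rows({"response": {}}, "Nowhere")
print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the flattening logic without network access or credentials.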

In [13]:
toronto_venues = getNearbyVenues(names=toronto_data['location_name'],
                                   latitudes=toronto_data['latitude'],
                                   longitudes=toronto_data['longitude']
                                  )

Humber River Hospital Assessment Centre
Michael Garron Hospital - Emergency Department
Michael Garron Hospital - Outpatient Clinic
Market Place Temporary Assessment Centre
Mount Sinai Hospital
North York General Hospital - Branson
North York General Hospital - Emergency Department
Scarborough Health Network - Birchmount
Scarborough Health Network Centenary Site
Sunnybrook Health Sciences Centre
Toronto Western Hospital 
St. Joseph's Health Centre 
St Michael's Hospital 
Women's College Hospital


In [16]:
toronto_venues.head()

Unnamed: 0,Test_Center,Test_Center Latitude,Test_Center Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Humber River Hospital Assessment Centre,43.754813,-79.525921,Best Western Plus Toronto North York Hotel & S...,43.756887,-79.528779,Hotel
1,Humber River Hospital Assessment Centre,43.754813,-79.525921,Holiday Inn Express Toronto-North York,43.756232,-79.527348,Hotel
2,Humber River Hospital Assessment Centre,43.754813,-79.525921,Tim Hortons,43.754344,-79.527024,Coffee Shop
3,Humber River Hospital Assessment Centre,43.754813,-79.525921,Hwy 400 at Finch W.,43.754399,-79.526967,Intersection
4,Humber River Hospital Assessment Centre,43.754813,-79.525921,Perkins,43.756567,-79.527475,American Restaurant


In [17]:
toronto_venues.shape

(374, 7)

In [18]:
toronto_venues.groupby('Test_Center').count()

Unnamed: 0_level_0,Test_Center Latitude,Test_Center Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Test_Center,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Humber River Hospital Assessment Centre,7,7,7,7,7,7
Market Place Temporary Assessment Centre,4,4,4,4,4,4
Michael Garron Hospital - Emergency Department,9,9,9,9,9,9
Michael Garron Hospital - Outpatient Clinic,9,9,9,9,9,9
Mount Sinai Hospital,61,61,61,61,61,61
North York General Hospital - Branson,7,7,7,7,7,7
North York General Hospital - Emergency Department,13,13,13,13,13,13
Scarborough Health Network - Birchmount,8,8,8,8,8,8
Scarborough Health Network Centenary Site,9,9,9,9,9,9
St Michael's Hospital,84,84,84,84,84,84


In [23]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot.insert(loc=0, column='Test_Center', value=toronto_venues['Test_Center'] )
toronto_onehot.shape

(374, 134)

In [24]:
toronto_grouped = toronto_onehot.groupby('Test_Center').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Test_Center,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,...,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Wine Bar,Yoga Studio
0,Humber River Hospital Assessment Centre,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Market Place Temporary Assessment Centre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Michael Garron Hospital - Emergency Department,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Michael Garron Hospital - Outpatient Clinic,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Mount Sinai Hospital,0.0,0.0,0.0,0.032787,0.016393,0.0,0.0,0.0,0.0,...,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0
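
To make the two cells above concrete: one-hot encoding the category column and then averaging per test center yields, for each center, the share of its venues that fall in each category. A minimal sketch on made-up data (the centers "A" and "B" and their venues are invented for illustration):

```python
import pandas as pd

# Toy venue table: center A has 4 venues, center B has 2
venues = pd.DataFrame({
    "Test_Center": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Hotel", "Park", "Park"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "Test_Center", venues["Test_Center"])

# The mean of the indicator columns is the relative frequency of each category per center
grouped = onehot.groupby("Test_Center").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why a value such as 0.142857 for Humber River Hospital corresponds to one venue out of seven.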


#### Fetch the most common venue categories based on how often they occur

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Test_Center']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions 4 and above all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Test_Center'] = toronto_grouped['Test_Center']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Test_Center,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Humber River Hospital Assessment Centre,Hotel,American Restaurant,Coffee Shop,Intersection,Diner,Eastern European Restaurant,Donut Shop,Doner Restaurant,Distribution Center,Discount Store
1,Market Place Temporary Assessment Centre,Golf Course,Metro Station,Convenience Store,Park,Creperie,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Electronics Store
2,Michael Garron Hospital - Emergency Department,Coffee Shop,Café,Park,Pizza Place,Farmers Market,Dance Studio,Sandwich Place,Diner,Doner Restaurant,Distribution Center
3,Michael Garron Hospital - Outpatient Clinic,Coffee Shop,Café,Park,Pizza Place,Farmers Market,Dance Studio,Sandwich Place,Diner,Doner Restaurant,Distribution Center
4,Mount Sinai Hospital,Coffee Shop,Café,Japanese Restaurant,Italian Restaurant,Ramen Restaurant,French Restaurant,Art Gallery,Bar,Sandwich Place,Bubble Tea Shop
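
One thing to keep in mind when reading the table above: a center with fewer than 10 distinct venue categories still gets 10 entries, because the sort simply continues into categories whose frequency is zero. Market Place Temporary Assessment Centre, for example, has only 4 venues, so its 5th to 10th entries are not actually present nearby. The sketch below shows one way to keep only categories that occur. The helper name `top_categories` and the sample row are hypothetical.

```python
import pandas as pd


def top_categories(row, num_top_venues):
    """Return up to num_top_venues category names, skipping zero-frequency categories."""
    freqs = row.iloc[1:].astype(float)  # skip the Test_Center column
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return freqs.index[:num_top_venues].tolist()


# Made-up frequency row with three categories present and two absent
row = pd.Series({
    "Test_Center": "Centre X",
    "Coffee Shop": 0.5,
    "Creperie": 0.0,
    "Golf Course": 0.3,
    "Park": 0.2,
    "Yoga Studio": 0.0,
})

print(top_categories(row, 2))
print(top_categories(row, 10))
```

With this variant the result can be shorter than `num_top_venues`, so the remaining columns of the summary table would need to be left empty rather than filled.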


#### Now let's use k-means clustering

In [27]:
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Test_Center')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

kmeans.labels_[0:10]

array([2, 3, 4, 4, 1, 1, 1, 1, 1, 1], dtype=int32)
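
The choice of `kclusters = 5` is not justified in the notebook. A common, if rough, way to sanity-check it is to fit k-means for several values of k and compare the inertia (within-cluster sum of squared distances). The sketch below does this on made-up frequency vectors, not on the Toronto data; `n_init=10` is set explicitly as an assumption so the result does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency vectors: two cafe-heavy centers and two park-heavy centers (invented numbers)
X = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
])

# Inertia for each candidate k; a sharp drop followed by a plateau suggests a reasonable k
inertias = {}
for k in (1, 2, 3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# With k=2 the two cafe-heavy rows and the two park-heavy rows end up in separate clusters
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).labels_
print(inertias)
print(labels)
```

With only 14 test centers, a large k tends to produce clusters containing a single center, so the comparison is worth running on `toronto_grouped_clustering` before settling on a value.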

In [28]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [29]:
toronto_merged = toronto_data
toronto_merged=toronto_merged[toronto_merged['location_name']!='Michael Garron Hospital - Emergency Department']

In [31]:
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Test_Center'), on='location_name')

#### And here we are: each test center with the 10 most common venues around it

In [32]:
toronto_merged.head()

Unnamed: 0,location_name,city,address,postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
100,Humber River Hospital Assessment Centre,Toronto,2111 Finch Avenue West,M3N 1N1,43.754813,-79.525921,2,Hotel,American Restaurant,Coffee Shop,Intersection,Diner,Eastern European Restaurant,Donut Shop,Doner Restaurant,Distribution Center,Discount Store
102,Michael Garron Hospital - Outpatient Clinic,Toronto,825 Coxwell Avenue,M4C 3E7,43.68991,-79.324858,4,Coffee Shop,Café,Park,Pizza Place,Farmers Market,Dance Studio,Sandwich Place,Diner,Doner Restaurant,Distribution Center
103,Market Place Temporary Assessment Centre,Toronto,4 The Market Place,M4C 5M1,43.695869,-79.292138,3,Golf Course,Metro Station,Convenience Store,Park,Creperie,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Electronics Store
104,Mount Sinai Hospital,Toronto,600 University Avenue,M5G 1X5,43.657575,-79.390096,1,Coffee Shop,Café,Japanese Restaurant,Italian Restaurant,Ramen Restaurant,French Restaurant,Art Gallery,Bar,Sandwich Place,Bubble Tea Shop
105,North York General Hospital - Branson,Toronto,555 Finch Avenue West,M2R 1N5,43.772477,-79.448125,1,Bakery,Coffee Shop,Skating Rink,Juice Bar,Pizza Place,Shopping Mall,Grocery Store,Curling Ice,Dance Studio,Deli / Bodega


#### Now it is time for plotting our findings on a map using Folium

In [33]:
address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto city are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto city are 43.6534817, -79.3839347.


In [34]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['latitude'], toronto_merged['longitude'], toronto_merged['location_name'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
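
In the cell above the marker color is looked up with `rainbow[cluster-1]`, so label 0 wraps around to the last color. The colors stay distinct, but the mapping between label and color is shifted by one. The sketch below builds the same kind of palette and indexes it directly by label. The names `palette` and `color_for` are hypothetical helpers, not part of the notebook.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One hex color per cluster, evenly spaced along the rainbow colormap
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]


def color_for(cluster):
    """Map a k-means label in 0..kclusters-1 to its hex color by direct indexing."""
    return palette[int(cluster)]


for label in range(kclusters):
    print(label, color_for(label))
```

Passing `color_for(cluster)` as the `color` and `fill_color` arguments of `folium.CircleMarker` would then give label 0 the first color in the palette.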

#### That was a good visual of the data on a map of Toronto.<br> Below we go through each of the 5 clusters and check the outputs

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + list(range(7, toronto_merged.shape[1]))]]

Unnamed: 0,location_name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
109,Sunnybrook Health Sciences Centre,Restaurant,Coffee Shop,Deli / Bodega,Diner,Eastern European Restaurant,Donut Shop,Doner Restaurant,Distribution Center,Discount Store,Yoga Studio


In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] + list(range(7, toronto_merged.shape[1]))]]

Unnamed: 0,location_name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
104,Mount Sinai Hospital,Coffee Shop,Café,Japanese Restaurant,Italian Restaurant,Ramen Restaurant,French Restaurant,Art Gallery,Bar,Sandwich Place,Bubble Tea Shop
105,North York General Hospital - Branson,Bakery,Coffee Shop,Skating Rink,Juice Bar,Pizza Place,Shopping Mall,Grocery Store,Curling Ice,Dance Studio,Deli / Bodega
106,North York General Hospital - Emergency Depart...,Furniture / Home Store,Coffee Shop,Trail,Intersection,Tennis Court,Train Station,Moving Target,Food & Drink Shop,Department Store,Doner Restaurant
107,Scarborough Health Network - Birchmount,Chinese Restaurant,Park,Coffee Shop,Bus Stop,Caribbean Restaurant,Athletics & Sports,Shopping Mall,Department Store,Deli / Bodega,Eastern European Restaurant
108,Scarborough Health Network Centenary Site,Coffee Shop,Pharmacy,Discount Store,Beer Store,Fast Food Restaurant,Supermarket,Sandwich Place,Café,Doner Restaurant,Distribution Center
110,Toronto Western Hospital,Café,Bar,Vegetarian / Vegan Restaurant,Mexican Restaurant,Bakery,Caribbean Restaurant,Taco Place,Park,Art Gallery,Cocktail Bar
111,St. Joseph's Health Centre,Coffee Shop,Breakfast Spot,Pharmacy,Café,Bakery,Eastern European Restaurant,Shoe Store,Burrito Place,Bus Stop,Restaurant
112,St Michael's Hospital,Clothing Store,Café,Coffee Shop,Gym,Hotel,Restaurant,New American Restaurant,Lingerie Store,Japanese Restaurant,Italian Restaurant


In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + list(range(7, toronto_merged.shape[1]))]]

Unnamed: 0,location_name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
100,Humber River Hospital Assessment Centre,Hotel,American Restaurant,Coffee Shop,Intersection,Diner,Eastern European Restaurant,Donut Shop,Doner Restaurant,Distribution Center,Discount Store


In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0] + list(range(7, toronto_merged.shape[1]))]]

Unnamed: 0,location_name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
103,Market Place Temporary Assessment Centre,Golf Course,Metro Station,Convenience Store,Park,Creperie,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Electronics Store


In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0] + list(range(7, toronto_merged.shape[1]))]]

Unnamed: 0,location_name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
102,Michael Garron Hospital - Outpatient Clinic,Coffee Shop,Café,Park,Pizza Place,Farmers Market,Dance Studio,Sandwich Place,Diner,Doner Restaurant,Distribution Center
113,Women's College Hospital,Coffee Shop,Park,Italian Restaurant,Café,Sandwich Place,Hobby Shop,Distribution Center,Diner,Department Store,Bookstore
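
The five tables above can be condensed into cluster sizes. The sketch below hard-codes the row index and cluster label of each of the 13 centers exactly as they appear in those tables; in the notebook itself the same summary would come from `toronto_merged['Cluster Labels'].value_counts()`.

```python
import pandas as pd

# Row index -> cluster label, transcribed from the per-cluster tables above
cluster_labels = pd.Series(
    {100: 2, 102: 4, 103: 3, 104: 1, 105: 1, 106: 1, 107: 1,
     108: 1, 109: 0, 110: 1, 111: 1, 112: 1, 113: 4},
    name="Cluster Labels",
)

# Number of test centers in each cluster, ordered by cluster label
cluster_sizes = cluster_labels.value_counts().sort_index()
print(cluster_sizes)
```

Three of the five clusters contain a single test center, and cluster 1 holds eight of the thirteen, which suggests the segmentation mostly separates a few outliers from one large group.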


## Results and Discussion

Here are the results of the analysis.<br>
First, using the dataset, we narrowed all the locations listed for the province of Ontario down to the COVID-19 test centers in the city of Toronto.<br>
As observed, the dataset lists 14 test centers across the city in different neighborhoods.<br> To understand how close they are to one another we plotted them on a map, which gave us a visual representation of their spread.<br>
Next, using Foursquare, we identified the most common business amenities around each test center.<br> For this we passed the latitude and longitude of each test center to the Foursquare API to fetch the venues around it.<br>
Finally, using the k-means algorithm, we grouped the test centers into 5 clusters based on the venue categories around them.




## Conclusion 
The purpose of this project was to identify COVID-19 test center locations across Toronto and the venues around them. The results will help the stakeholders listed below:<br>
a) Government officials, when planning test centers in neighborhoods that do not yet have one.<br>
b) Patients, to quickly identify what services and amenities exist around each test center, e.g. parking facilities.<br>
c) Businesses, to take extra precautions against possible exposure due to the increased influx of people in the neighborhood.<br>

Considering the results we achieved by plotting the test centers on the Toronto map and by finding the most common venues around each test center, we can conclude that we have addressed the business problem we started with.