# Covid-19 cases in City of Toronto neighborhoods


## Data

To study COVID-19 cases in Toronto at the neighbourhood level, we use the neighbourhood data published by the City of Toronto ([COVID-19: Status of Cases in Toronto](https://www.toronto.ca/home/covid-19/covid-19-latest-city-of-toronto-news/covid-19-status-of-cases-in-toronto/)). The coordinates of each neighbourhood come from the Nominatim geolocation service, accessed through the [GeoPy](https://geopy.readthedocs.io/en/stable/) library in Python. Toronto neighbourhood population data are extracted from the [Wellbeing Toronto](https://map.toronto.ca/wellbeing/) application. These data let us check whether the number of COVID-19 cases in each neighbourhood is related to population, income level and age. Finally, `Foursquare` venue data are used to find which types of venues are most popular in the neighbourhoods with the highest numbers of COVID-19 cases. Tables 1 to 5 below summarize these data.

In [1]:
import pandas as pd
import numpy as np
import requests
from pypostalcode import PostalCodeDatabase
import geocoder
from geopy.geocoders import Nominatim
import folium
from folium.plugins import MarkerCluster  # also makes folium.plugins available
from sklearn.cluster import KMeans  # k-means for the clustering stage
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

The dataframe `df_covid` below gives the number of COVID-19 cases in each neighbourhood as of July 14, 2020.

In [2]:
path_covid = '~/CityofToronto_COVID-19_NeighbourhoodData.csv'  # pandas expands '~' to the home directory

#### Table 1

In [3]:
df_covid = pd.read_csv(path_covid)
df_covid.iloc[0:140,:].sort_values(by = 'Case Count', ascending = False).head(10)

| | Neighbourhood ID | Neighbourhood Name | Case Count | Rate per 100,000 people |
|---:|---:|:---|---:|---:|
| 78 | | Missing Address/Postal Code | 613 | |
| 45 | 25.0 | Glenfield-Jane Heights | 499 | 1636.54849 |
| 125 | 1.0 | West Humber-Clairville | 471 | 1413.904899 |
| 82 | 2.0 | Mount Olive-Silverstone-Jamestown | 471 | 1429.265036 |
| 33 | 26.0 | Downsview-Roding-CFB | 443 | 1263.836586 |
| 139 | 27.0 | York University Heights | 420 | 1522.125177 |
| 58 | 14.0 | Islington-City Centre West | 351 | 798.362334 |
| 127 | 113.0 | Weston | 318 | 1767.452201 |
| 106 | 131.0 | Rouge | 303 | 651.668961 |
| 13 | 24.0 | Black Creek | 299 | 1375.534802 |


- Acquiring the coordinates of each neighbourhood with the `Nominatim` geolocation service:

In [4]:
from geopy.extra.rate_limiter import RateLimiter

# Create the geolocator once; RateLimiter spaces requests at least 1 s apart,
# in line with Nominatim's usage policy
geolocator = Nominatim(user_agent="foursquare_agent")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

latitude = []
longitude = []
# The last two rows of df_covid are not geocoded; the next cell pads them with 'NaN'
for i in range(len(df_covid) - 2):
    name = df_covid['Neighbourhood Name'][i]
    # Try the full name first, then each hyphen-separated part,
    # e.g. 'Birchcliffe-Cliffside' -> 'Birchcliffe' -> 'Cliffside'
    candidates = [name] + [part for part in name.split('-') if part != name]
    lat, lng = 'NaN', 'NaN'  # string placeholder, filtered out in the cleaning step
    for candidate in candidates:
        location = geocode(candidate + ', Toronto, Ontario, Canada', timeout=10)
        if location is not None:
            lat, lng = location.latitude, location.longitude
            break
    latitude.append(lat)
    longitude.append(lng)
    print(i + 1, lat, lng)

1 43.8080383 -79.2664391
2 43.7853531 -79.2785494
3 43.6017173 -79.5452325
4 43.6703377 -79.407117
5 43.7348039 -79.3572426
6 43.76389295 -79.45636693710946
7 43.6673421 -79.3884571
8 43.7691966 -79.3766617
9 43.7981268 -79.3829726
10 43.7373876 -79.4109253
11 43.6950296 -79.4716835
12 43.7535196 -79.2553355
Birchcliffe, Toronto, Ontario, Canada
Birchcliffe-Cliffside, Toronto, Ontario, Canada
Cliffside, Toronto, Ontario, Canada
13 43.7111699 -79.2481769
14 43.6954005 -79.485495
15 43.781769 -79.4237615
16 43.70304535 -79.45134360044906
17 43.7381512 -79.3725113
18 43.6839242 -79.3569642
19 43.7007782 -79.4945217
20 43.6644734 -79.3669861
21 43.6867527 -79.4595746
22 43.6781015 -79.409415775
23 43.7874914 -79.1507681
24 43.6799189 -79.3886888
25 43.7088231 -79.2959856
26 43.7431757 -79.4517261
27 43.7218363 -79.2362138
28 43.677954 -79.4430828
29 43.6864333 -79.3003555
30 43.6864333 -79.3003555
31 43.7926732 -79.3547219
32 43.7528467 -79.282067
33 43.6653069 -79.4326362
34 43.7492988 -7
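When a full neighbourhood name is not found, the lookup falls back to the parts of a hyphenated name. That fallback can be written as a small pure function and tested without calling Nominatim. This is a minimal sketch: `candidate_queries` is a hypothetical helper name, and it tries every hyphen-separated part rather than only the first two.

```python
def candidate_queries(name, suffix=', Toronto, Ontario, Canada'):
    """Return geocoding queries for a neighbourhood name, most specific first.

    The full name comes first, followed by each hyphen-separated part,
    e.g. 'Birchcliffe-Cliffside' -> full name, 'Birchcliffe', 'Cliffside'.
    """
    parts = [part.strip() for part in name.split('-') if part.strip()]
    names = [name] + [part for part in parts if part != name]
    return [n + suffix for n in names]


print(candidate_queries('Birchcliffe-Cliffside'))
print(candidate_queries('Rouge'))
```

Each query would then be passed in order to the geocoder until one returns a location.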

In [5]:
# Pad the last two rows, which the geocoding loop skips, so the lists match len(df_covid)
latitude.extend(['NaN', 'NaN'])
longitude.extend(['NaN', 'NaN'])

In [6]:
df_covid['Latitude'] = latitude
df_covid['Longitude'] = longitude

#### Table 2

In [7]:
df_covid.iloc[0:140,:].sort_values(by = 'Case Count', ascending = False).head(10)

| | Neighbourhood ID | Neighbourhood Name | Case Count | Rate per 100,000 people | Latitude | Longitude |
|---:|---:|:---|---:|---:|---:|---:|
| 78 | | Missing Address/Postal Code | 613 | | | |
| 45 | 25.0 | Glenfield-Jane Heights | 499 | 1636.54849 | 43.6535 | -79.3839 |
| 125 | 1.0 | West Humber-Clairville | 471 | 1413.904899 | 43.6824 | -79.4808 |
| 82 | 2.0 | Mount Olive-Silverstone-Jamestown | 471 | 1429.265036 | 43.6535 | -79.3839 |
| 33 | 26.0 | Downsview-Roding-CFB | 443 | 1263.836586 | 43.7493 | -79.4622 |
| 139 | 27.0 | York University Heights | 420 | 1522.125177 | | |
| 58 | 14.0 | Islington-City Centre West | 351 | 798.362334 | 43.6488 | -79.549 |
| 127 | 113.0 | Weston | 318 | 1767.452201 | 43.7002 | -79.5162 |
| 106 | 131.0 | Rouge | 303 | 651.668961 | 43.8049 | -79.1658 |
| 13 | 24.0 | Black Creek | 299 | 1375.534802 | 43.6954 | -79.4855 |
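In Table 2, `Glenfield-Jane Heights` and `Mount Olive-Silverstone-Jamestown` share the same point (43.6535, -79.3839, as displayed). This suggests that at least one of them was matched to something other than the neighbourhood itself. The sketch below flags coordinates shared by more than one neighbourhood, using a few rows copied from Table 2. On the full data, the same check would run on `df_covid` after the cleaning step. The 4-decimal rounding is an assumption, chosen to match the displayed precision.

```python
import pandas as pd

# A few rows copied from Table 2 (coordinates as displayed, rounded to 4 decimals)
rows = pd.DataFrame({
    'Neighbourhood Name': ['Glenfield-Jane Heights', 'West Humber-Clairville',
                           'Mount Olive-Silverstone-Jamestown', 'Downsview-Roding-CFB'],
    'Latitude': [43.6535, 43.6824, 43.6535, 43.7493],
    'Longitude': [-79.3839, -79.4808, -79.3839, -79.4622],
})

# Round so that points differing only in far decimals also count as shared,
# then mark every row whose (Latitude, Longitude) pair occurs more than once
coords = rows[['Latitude', 'Longitude']].astype(float).round(4)
shared = rows[coords.duplicated(keep=False)]
print(shared['Neighbourhood Name'].tolist())
```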


- Obtaining the coordinates of the City of Toronto:

In [8]:
address0 = 'City of Toronto, ON'
geolocator0 = Nominatim(user_agent="to_explorer")
location0 = geolocator0.geocode(address0)
latitude0 = location0.latitude
longitude0 = location0.longitude
print('The geographical coordinates of the City of Toronto are {}, {}.'.format(latitude0, longitude0))

The geographical coordinates of the City of Toronto are 43.7170226, -79.41978303501344.


- Cleaning the `df_covid` dataframe by dropping the rows that could not be geocoded:

In [9]:
indx_rm = np.where(df_covid['Latitude'] == 'NaN')[0].tolist()

In [10]:
df_covid.drop(df_covid.index[indx_rm], inplace = True)

In [11]:
df_covid.reset_index(inplace = True)

In [12]:
df_covid.drop(columns = 'index', inplace = True)
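The four cleaning cells above can also be written as one boolean-mask filter followed by `reset_index(drop=True)`. This avoids creating and then deleting the `index` column. Below is a sketch on a toy frame that uses the same `'NaN'` string placeholder as the geocoding loop.

```python
import pandas as pd

df = pd.DataFrame({
    'Neighbourhood Name': ['Weston', 'Missing Address/Postal Code', 'Rouge'],
    'Latitude': [43.7002, 'NaN', 43.8049],
    'Longitude': [-79.5162, 'NaN', -79.1658],
})

# Keep only rows with a real latitude, then renumber the index from 0
df_clean = df[df['Latitude'] != 'NaN'].reset_index(drop=True)
print(df_clean)
```

On the notebook data this becomes `df_covid = df_covid[df_covid['Latitude'] != 'NaN'].reset_index(drop=True)`.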

#### Figure 1

In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude0, longitude0], zoom_start=11)

# add markers to map
for lat, lng, neighborhood, case in zip(df_covid['Latitude'], df_covid['Longitude'], df_covid['Neighbourhood Name'], df_covid['Case Count']):
    label = '{}, {} cases'.format(neighborhood, case)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [14]:
# create map of Toronto using latitude and longitude values
map_toronto2 = folium.Map(location=[latitude0, longitude0], zoom_start=10)

marker_cluster = MarkerCluster().add_to(map_toronto2)

# add markers to map
for lat, lng, neighborhood, case in zip(df_covid['Latitude'], df_covid['Longitude'], df_covid['Neighbourhood Name'], df_covid['Case Count']):
    label = '{}, {} cases'.format(neighborhood, case)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(marker_cluster)  
    
map_toronto2
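Both maps draw every neighbourhood with the same `radius=5`, so marker size says nothing about the case count. One option is to make the marker *area* proportional to the case count, which means scaling the radius with the square root of the count. This is a suggestion, not what the maps above do. `marker_radius` and its `max_radius` default are hypothetical choices.

```python
import math

def marker_radius(cases, max_cases, max_radius=15.0):
    """Radius (in pixels) whose circle area is proportional to `cases`.

    The neighbourhood with `max_cases` gets `max_radius`; a neighbourhood with
    a quarter as many cases gets half that radius.
    """
    if max_cases <= 0:
        return 0.0
    return max_radius * math.sqrt(cases / max_cases)


print(marker_radius(499, 499))
print(marker_radius(124.75, 499))
```

In the map loops, `radius=5` would become `radius=marker_radius(case, df_covid['Case Count'].max())`.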

- Let's check the neighbourhoods with the highest numbers of cases:

In [15]:
df_covid.sort_values(by = 'Case Count', ascending = False).head(10)

| | Neighbourhood ID | Neighbourhood Name | Case Count | Rate per 100,000 people | Latitude | Longitude |
|---:|---:|:---|---:|---:|---:|---:|
| 45 | 25.0 | Glenfield-Jane Heights | 499 | 1636.54849 | 43.6535 | -79.3839 |
| 121 | 1.0 | West Humber-Clairville | 471 | 1413.904899 | 43.6824 | -79.4808 |
| 78 | 2.0 | Mount Olive-Silverstone-Jamestown | 471 | 1429.265036 | 43.6535 | -79.3839 |
| 33 | 26.0 | Downsview-Roding-CFB | 443 | 1263.836586 | 43.7493 | -79.4622 |
| 57 | 14.0 | Islington-City Centre West | 351 | 798.362334 | 43.6488 | -79.549 |
| 123 | 113.0 | Weston | 318 | 1767.452201 | 43.7002 | -79.5162 |
| 102 | 131.0 | Rouge | 303 | 651.668961 | 43.8049 | -79.1658 |
| 13 | 24.0 | Black Creek | 299 | 1375.534802 | 43.6954 | -79.4855 |
| 128 | 137.0 | Woburn | 298 | 557.16556 | 43.7598 | -79.2253 |
| 62 | 6.0 | Kingsview Village-The Westway | 267 | 1213.636364 | 43.6995 | -79.5563 |


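- The data show that `Glenfield-Jane Heights`, `Mount Olive-Silverstone-Jamestown`, `West Humber-Clairville`, `Downsview-Roding-CFB` and `York University Heights` are the Toronto neighbourhoods with the highest numbers of COVID-19 cases as of July 5th, 2020. `York University Heights` does not appear in the table above because it could not be geocoded and was dropped during cleaning.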
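Raw counts favour populous neighbourhoods. Ranking by the `Rate per 100,000 people` column gives a different order. The sketch below uses the nine named neighbourhoods from Table 1; the `Missing Address/Postal Code` row has no rate and is left out.

```python
import pandas as pd

# Rows copied from Table 1 (Missing Address/Postal Code excluded)
top = pd.DataFrame({
    'Neighbourhood Name': ['Glenfield-Jane Heights', 'West Humber-Clairville',
                           'Mount Olive-Silverstone-Jamestown', 'Downsview-Roding-CFB',
                           'York University Heights', 'Islington-City Centre West',
                           'Weston', 'Rouge', 'Black Creek'],
    'Case Count': [499, 471, 471, 443, 420, 351, 318, 303, 299],
    'Rate per 100,000 people': [1636.54849, 1413.904899, 1429.265036, 1263.836586,
                                1522.125177, 798.362334, 1767.452201, 651.668961,
                                1375.534802],
})

# Same rows, ordered by the population-adjusted rate instead of the raw count
by_rate = top.sort_values(by='Rate per 100,000 people', ascending=False)
print(by_rate[['Neighbourhood Name', 'Case Count', 'Rate per 100,000 people']].head(3))
```

Among these rows, Weston is seventh by count but first by rate. On the full `df_covid`, neighbourhoods outside the top ten by count could also rank higher by rate.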

### Age distribution of cases

The data are taken from [COVID-19: Status of Cases in Toronto](https://www.toronto.ca/home/covid-19/covid-19-latest-city-of-toronto-news/covid-19-status-of-cases-in-toronto/).

#### Table 3

In [16]:
df_age = pd.read_csv('case_by_age.csv')
df_age.loc[0, 'Age Group'] = '0-19'  # relabel the first age group; .loc avoids chained assignment
df_age

| | Age Group | % of Total Case Count | Case Count |
|---:|:---|---:|---:|
| 0 | 0-19 | 6.122038 | 912 |
| 1 | 20-29 | 14.298181 | 2130 |
| 2 | 30-39 | 14.063234 | 2095 |
| 3 | 40-49 | 14.07666 | 2097 |
| 4 | 50-59 | 15.808552 | 2355 |
| 5 | 60-69 | 10.706854 | 1595 |
| 6 | 70-79 | 7.068537 | 1053 |
| 7 | 80-89 | 10.263812 | 1529 |
| 8 | 90+ | 7.377324 | 1099 |
| 9 | Unknown | 0.214808 | 32 |


We assume that this age distribution is the same in every Toronto neighbourhood.
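Under this assumption, a neighbourhood's cases can be split into age groups. Each group's estimate is the neighbourhood's case count times that group's share of the city-wide total. The sketch below uses the counts from Table 3 and the 499 cases of `Glenfield-Jane Heights`. `allocate_by_age` is a hypothetical helper, and its output is an estimate that rests on the assumption above, not observed data.

```python
import pandas as pd

# City-wide case counts by age group (Table 3)
age_counts = pd.Series(
    [912, 2130, 2095, 2097, 2355, 1595, 1053, 1529, 1099, 32],
    index=['0-19', '20-29', '30-39', '40-49', '50-59',
           '60-69', '70-79', '80-89', '90+', 'Unknown'],
)

def allocate_by_age(neighbourhood_cases, city_counts):
    """Estimate cases per age group, assuming the city-wide age mix applies."""
    shares = city_counts / city_counts.sum()
    return neighbourhood_cases * shares

estimate = allocate_by_age(499, age_counts)
print(estimate.round(1))
```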

### Severity of cases by age

The data are taken from [COVID-19: Status of Cases in Toronto](https://www.toronto.ca/home/covid-19/covid-19-latest-city-of-toronto-news/covid-19-status-of-cases-in-toronto/).

#### Table 4

In [17]:
df_severity = pd.read_csv('severity_by_age.csv')
df_severity.loc[0, 'Age Group'] = '0-19'  # relabel the first age group; .loc avoids chained assignment
df_severity

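| | Age Group | ICU Cases | Deaths | Hospitalized Cases | Intubated Cases |
|---:|:---|---:|---:|---:|---:|
| 0 | 0-19 | 3 | 1 | 13 | 1 |
| 1 | 20-29 | 11 | 1 | 51 | 7 |
| 2 | 30-39 | 12 | 1 | 85 | 7 |
| 3 | 40-49 | 40 | 9 | 147 | 24 |
| 4 | 50-59 | 96 | 34 | 301 | 77 |
| 5 | 60-69 | 109 | 102 | 337 | 80 |
| 6 | 70-79 | 86 | 208 | 355 | 63 |
| 7 | 80-89 | 39 | 398 | 383 | 27 |
| 8 | 90+ | 4 | 366 | 174 | 1 |
| 9 | Unknown | 0 | 0 | 0 | 0 |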
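Dividing the counts in Table 4 by the case counts in Table 3 gives the share of reported cases in each age group that were hospitalized or died. This is a sketch that assumes the two tables, both taken from the same City of Toronto page, cover the same cases and use the same age groups row by row.

```python
import pandas as pd

groups = ['0-19', '20-29', '30-39', '40-49', '50-59',
          '60-69', '70-79', '80-89', '90+', 'Unknown']
cases = pd.Series([912, 2130, 2095, 2097, 2355, 1595, 1053, 1529, 1099, 32], index=groups)  # Table 3
deaths = pd.Series([1, 1, 1, 9, 34, 102, 208, 398, 366, 0], index=groups)                   # Table 4
hospitalized = pd.Series([13, 51, 85, 147, 301, 337, 355, 383, 174, 0], index=groups)       # Table 4

# Per-case ratios; the 'Unknown' group has 0 deaths and 0 hospitalizations over 32 cases
rates = pd.DataFrame({
    'Deaths per case': deaths / cases,
    'Hospitalized per case': hospitalized / cases,
})
print(rates.round(3))
```

With these numbers, about one in three reported cases aged 90+ died (366 of 1,099). Hospitalizations per case peak in the 70-79 group (355 of 1,053).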


### Wellbeing Toronto demographics data

- Let's add the data from **Wellbeing Toronto** to see whether there is a relationship between the number of cases and the demographic properties of each neighbourhood.

In [18]:
df_wellbeing = pd.read_csv('wellbeing_toronto.csv')

In [19]:
df_wellbeing.drop(columns = ['Combined Indicators'], inplace = True)

#### Table 5

In [20]:
df_wellbeing.head(10)

| | Neighbourhood | Neighbourhood Id | Low Income Families | Healthy Food Index | Total Area | Total Population | Pop 0 - 4 years | Pop 5 - 9 years | Pop 10 - 14 years | Pop 15 - 19 years | ... | Pop 55 - 59 years | Pop 60 - 64 years | Pop 65 - 69 years | Pop 70 - 74 years | Pop 75 - 79 years | Pop 80 - 84 years | Pop 85 - 89 years | Pop 90 - 94 years | Pop 95 - 99 years | Pop 100 years and over |
|---:|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | West Humber-Clairville | 1 | 3790 | 23.82 | 30.09 | 33312 | 1540 | 1720 | 1790 | 2325 | ... | 2195 | 1795 | 1595 | 1185 | 885 | 700 | 615 | 160 | 50 | 5 |
| 1 | Mount Olive-Silverstone-Jamestown | 2 | 5010 | 37.57 | 4.6 | 32954 | 2190 | 2500 | 2415 | 2585 | ... | 1955 | 1520 | 1285 | 885 | 630 | 465 | 300 | 70 | 10 | 0 |
| 2 | Thistletown-Beaumond Heights | 3 | 1080 | 42.26 | 3.4 | 10360 | 540 | 600 | 595 | 650 | ... | 660 | 535 | 490 | 375 | 335 | 320 | 350 | 100 | 20 | 5 |
| 3 | Rexdale-Kipling | 4 | 1110 | 23.31 | 2.5 | 10529 | 560 | 515 | 565 | 635 | ... | 870 | 650 | 520 | 350 | 295 | 270 | 300 | 85 | 15 | 0 |
| 4 | Elms-Old Rexdale | 5 | 1190 | 24.71 | 2.9 | 9456 | 540 | 605 | 660 | 690 | ... | 730 | 525 | 415 | 305 | 235 | 180 | 145 | 40 | 5 | 0 |
| 5 | Kingsview Village-The Westway | 6 | 2800 | 35.69 | 5.1 | 22000 | 1425 | 1485 | 1325 | 1500 | ... | 1330 | 1220 | 1015 | 750 | 660 | 595 | 575 | 170 | 20 | 0 |
| 6 | Willowridge-Martingrove-Richview | 7 | 1610 | 30.46 | 5.5 | 22156 | 1175 | 1230 | 1155 | 1275 | ... | 1600 | 1305 | 1245 | 945 | 920 | 910 | 885 | 240 | 30 | 10 |
| 7 | Humber Heights-Westmount | 8 | 820 | 32.82 | 2.8 | 10948 | 440 | 490 | 515 | 525 | ... | 790 | 720 | 580 | 485 | 495 | 530 | 950 | 375 | 70 | 15 |
| 8 | Edenbridge-Humber Valley | 9 | 900 | 46.41 | 5.5 | 15535 | 565 | 705 | 845 | 885 | ... | 1230 | 1155 | 910 | 635 | 550 | 520 | 665 | 200 | 55 | 0 |
| 9 | Princess-Rosethorn | 10 | 460 | 39.52 | 5.2 | 11051 | 450 | 585 | 730 | 780 | ... | 1045 | 810 | 630 | 435 | 335 | 315 | 325 | 115 | 10 | 0 |


In [21]:
df_wellbeing.sort_values(by = 'Total Population', ascending = False).head(10)

| | Neighbourhood | Neighbourhood Id | Low Income Families | Healthy Food Index | Total Area | Total Population | Pop 0 - 4 years | Pop 5 - 9 years | Pop 10 - 14 years | Pop 15 - 19 years | ... | Pop 55 - 59 years | Pop 60 - 64 years | Pop 65 - 69 years | Pop 70 - 74 years | Pop 75 - 79 years | Pop 80 - 84 years | Pop 85 - 89 years | Pop 90 - 94 years | Pop 95 - 99 years | Pop 100 years and over |
|---:|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 76 | Waterfront Communities-The Island | 77 | 4480 | 31.61 | 7.60 | 65913 | 2135 | 925 | 605 | 1140 | ... | 2560 | 2120 | 1760 | 1225 | 740 | 530 | 365 | 90 | 20 | 5 |
| 136 | Woburn | 137 | 6860 | 27.73 | 12.20 | 53485 | 3140 | 3365 | 3125 | 3400 | ... | 3490 | 2755 | 2220 | 1810 | 1550 | 1330 | 1130 | 305 | 60 | 5 |
| 50 | Willowdale East | 51 | 6260 | 28.90 | 5.00 | 50434 | 2290 | 1930 | 1710 | 2680 | ... | 2985 | 2475 | 2080 | 1445 | 1130 | 780 | 830 | 230 | 40 | 5 |
| 130 | Rouge | 131 | 3760 | 21.95 | 37.60 | 46496 | 2370 | 2810 | 2775 | 3265 | ... | 3555 | 3135 | 2590 | 1670 | 1005 | 680 | 685 | 205 | 65 | 10 |
| 116 | L'Amoreaux | 117 | 5840 | 30.73 | 7.10 | 43993 | 2035 | 2045 | 2050 | 2625 | ... | 3120 | 2810 | 2575 | 1855 | 1770 | 1440 | 1345 | 370 | 70 | 10 |
| 13 | Islington-City Centre West | 14 | 6140 | 31.01 | 16.40 | 43965 | 2225 | 1930 | 1705 | 1920 | ... | 2920 | 2480 | 2050 | 1545 | 1240 | 1080 | 1480 | 450 | 115 | 30 |
| 131 | Malvern | 132 | 5170 | 38.63 | 8.90 | 43794 | 2405 | 2680 | 2815 | 3170 | ... | 2980 | 2555 | 2255 | 1535 | 1045 | 590 | 445 | 110 | 30 | 0 |
| 92 | Dovercourt-Wallace Emerson-Juncti | 93 | 3860 | 35.06 | 3.70 | 36625 | 1740 | 1480 | 1260 | 1405 | ... | 2200 | 1735 | 1375 | 1045 | 890 | 640 | 595 | 170 | 30 | 0 |
| 25 | Downsview-Roding-CFB | 26 | 4060 | 23.72 | 14.90 | 35052 | 2010 | 1870 | 1865 | 2025 | ... | 2505 | 1875 | 1520 | 1105 | 1085 | 950 | 870 | 210 | 35 | 0 |
| 44 | Parkwoods-Donalda | 45 | 3370 | 33.60 | 7.50 | 34805 | 1800 | 1980 | 2055 | 2295 | ... | 2410 | 1855 | 1530 | 1200 | 950 | 800 | 775 | 185 | 40 | 10 |
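Relating cases to demographics requires a common key between the two frames. The Wellbeing Toronto table above truncates `Dovercourt-Wallace Emerson-Juncti`, so a join on names could miss rows, and joining on the numeric ID is safer. Note that `df_covid` stores `Neighbourhood ID` as a float (e.g. `25.0`), while `df_wellbeing` uses an integer `Neighbourhood Id`. The sketch below uses two rows from each table. Dividing `Case Count` by `Total Population` reproduces the `Rate per 100,000 people` values in Table 1 for these rows, which suggests the City's rates use the same population figures.

```python
import pandas as pd

# Two rows from Table 1 plus the row without an ID
covid = pd.DataFrame({
    'Neighbourhood ID': [1.0, 2.0, float('nan')],
    'Neighbourhood Name': ['West Humber-Clairville', 'Mount Olive-Silverstone-Jamestown',
                           'Missing Address/Postal Code'],
    'Case Count': [471, 471, 613],
})
# The matching rows from Table 5
wellbeing = pd.DataFrame({
    'Neighbourhood Id': [1, 2],
    'Total Population': [33312, 32954],
    'Low Income Families': [3790, 5010],
})

# Drop rows without an ID, then make the key an integer to match the Wellbeing table
covid = covid.dropna(subset=['Neighbourhood ID']).astype({'Neighbourhood ID': int})

merged = covid.merge(wellbeing, left_on='Neighbourhood ID',
                     right_on='Neighbourhood Id', how='inner')
merged['Rate per 100,000 (recomputed)'] = merged['Case Count'] / merged['Total Population'] * 100_000
print(merged[['Neighbourhood Name', 'Case Count', 'Total Population',
              'Rate per 100,000 (recomputed)']])
```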


In [22]:
df_wellbeing.rename(columns = {'   In Labour Force':'In Labour Force'}, inplace = True)
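The rename above fixes one header that has leading spaces. Other headers in `wellbeing_toronto.csv` may have the same problem; that is an assumption, since only this one is visible here. If so, stripping whitespace from all column names at once is simpler. A sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame(columns=['Neighbourhood', '   In Labour Force', 'Total Population '])
# Remove leading/trailing whitespace from every column name
df.columns = df.columns.str.strip()
print(list(df.columns))
```

On the notebook data this would be `df_wellbeing.columns = df_wellbeing.columns.str.strip()`.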