# Toronto COVID cases by neighborhood

## 1. Introduction/Business Problem

During the COVID pandemic, governments implemented a range of rules and restrictions to mitigate outbreaks of the disease. These restrictions were usually defined by the type of business and industry of each venue.

In this notebook I will try to rate the efficacy of the restrictions put in place by the government in the city of Toronto by analyzing the total COVID cases and the types of venues, all segmented by postal code and neighborhood.

The target audience is the government agencies involved (Commerce, Health, City Planning), which could use the results to determine whether the imposed restrictions worked and whether any of them should be kept in place. The results could also help define a playbook for future pandemics.

## 2. Data

### 2.1 Data Sources

**2.1.1 Toronto neighborhood data**

I will continue to use the neighborhood data from the Wikipedia page used in previous weeks:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Using this data and the geocoder library, I will create a dataset with the following features:

PostalCode | Borough | Neighborhood | Latitude | Longitude
-----|-----|-----|-----|-----
M3A | North York | Parkwoods | 43.75245 | -79.32991
M4A | North York | Victoria Village | 43.73057 | -79.31306

**2.1.2 COVID-19 Cases in Toronto**

For the COVID cases in Toronto I will use the dataset provided by the Toronto city government in its Open Data portal:
https://open.toronto.ca/dataset/covid-19-cases-in-toronto/

An example of the dataset is below. It includes all the features required for the analysis (age, postal code, neighborhood).

_id  | Assigned_ID |Outbreak Associated | Age Group | Neighbourhood Name | FSA | Source of Infection | Classification | Episode Date | Reported Date | Client Gender | Outcome | Currently Hospitalized | Currently in ICU | Currently Intubated | Ever Hospitalized | Ever in ICU | Ever Intubated
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
3326211 |99865 |Sporadic| 19 and younger| Thorncliffe Park| M4H| Household Contact| CONFIRMED| 2021-02-24| 2021-02-25| FEMALE| RESOLVED| No| No| No |  No |  No |  No
3326212 |  99866 |  Sporadic |  40 to 49 Years |  Taylor-Massey |  M4C |  Household Contact |  CONFIRMED |  2021-02-24 |  2021-02-25 |  FEMALE |  RESOLVED |  No |  No |  No |  No |  No |  No 


**2.1.3 Canada demographic data by Postal Code (2016 Census)**

To gain better insight into the data, I will use demographic data from the 2016 census (the latest data available) from the Government of Canada statistics page:
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/CompFile.cfm?Lang=Eng&T=1201&OFT=FULLCSV

An example of the dataset is below. It includes the total population by postal code.

Geographic code | Geographic name | Province or territory | Incompletely enumerated Indian reserves and Indian settlements, 2016 | Population, 2016 | Total private dwellings, 2016 | Private dwellings occupied by usual residents, 2016
-----|-----|-----|-----|-----|-----|-----
01 | Canada |  | T | 35151728 | 15412443 | 14072079
A0A | A0A | Newfoundland and Labrador |  | 46587 | 26155 | 19426
A0B | A0B | Newfoundland and Labrador |  | 19792 | 13658 | 8792

**2.1.4 Foursquare API**

I will use the Foursquare API to gather venue data by neighborhood and identify the most popular venue categories in each neighborhood.

### 2.2 Data Gathering and Wrangling

First I will import all the required libraries (BeautifulSoup, pandas, numpy, requests, geocoder, folium, matplotlib) and define my Foursquare CLIENT_ID and CLIENT_SECRET in a hidden cell.

In [2]:
# The code was removed by Watson Studio for sharing.


#### 2.2.1 Toronto neighborhood data
First, let's import the Toronto neighborhood data.

In [3]:
#Download the HTML with requests and parse it with BeautifulSoup
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
data  = requests.get(url).text
soup = BeautifulSoup(data,"html5lib")

#Parse html table into a pandas dataframe
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

df=pd.DataFrame(table_contents)
#Clean up borough names that picked up extra text from the table cells
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

df.shape

(103, 3)
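
The string handling inside the parsing loop is easier to follow on a single cell. The sketch below uses a hypothetical helper, `split_cell`, which is not part of the notebook and is a simplified version of the split/strip/replace chain. The sample strings are assumptions about how the cell text is laid out, not scraped data.

```python
def split_cell(span_text):
    """Split 'Borough(Neighborhood / Neighborhood)' into its two parts.

    Hypothetical helper: a simplified version of the split/strip/replace
    chain used in the parsing loop.
    """
    borough, _, rest = span_text.partition('(')
    # Drop the closing parenthesis and turn ' /' separators into commas
    neighborhood = rest.strip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhood


# Sample strings are assumptions about the cell layout
print(split_cell('North York(Parkwoods)'))
print(split_cell('Downtown Toronto(Regent Park / Harbourfront)'))
```

Keeping the logic in a small function like this makes it possible to check the edge cases (several neighborhoods, extra text in the borough) without downloading the page again.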

Let's add the latitude and longitude to each neighborhood

In [4]:
#Collect one row per postal code and build the dataframe in a single step
#(DataFrame.append was removed in pandas 2.0)
column_names = ['PostalCode','Borough', 'Neighborhood', 'Latitude', 'Longitude']
rows = []

#Add latitude and longitude to each neighborhood
for index, row in df.iterrows():

    postal_code = row['PostalCode']
    borough = row['Borough']
    neighborhood = row['Neighborhood']

    #geocoder.google was not retrieving any coordinates, so ArcGIS is used instead;
    #keep asking until the service returns a result
    lat_lng_coords = None
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng

    rows.append({'PostalCode': postal_code,
                 'Borough': borough,
                 'Neighborhood': neighborhood,
                 'Latitude': lat_lng_coords[0],
                 'Longitude': lat_lng_coords[1]})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head(5)

Index | PostalCode | Borough | Neighborhood | Latitude | Longitude
-----|-----|-----|-----|-----|-----
0 | M3A | North York | Parkwoods | 43.75245 | -79.32991
1 | M4A | North York | Victoria Village | 43.73057 | -79.31306
2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65512 | -79.36264
3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.72327 | -79.45042
4 | M7A | Queen's Park | Ontario Provincial Government | 43.66253 | -79.39188
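
The `while` loop in the geocoding cell keeps calling the service until it answers, so a postal code that never resolves would hang the cell. A minimal sketch of a bounded retry follows. The helper name `geocode_with_retry` and its parameters are hypothetical, and the lookup is passed in as a callable so the example runs without network access; in the notebook it could be something like `lambda q: geocoder.arcgis(q).latlng`.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    lookup is any callable returning [lat, lng] or None (hypothetical
    interface chosen for this sketch).
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # brief pause before the next attempt
    return None  # the caller decides how to handle an unresolved postal code


# Stand-in lookup that fails twice and then succeeds, to exercise the retry path
responses = iter([None, None, [43.75245, -79.32991]])
coords = geocode_with_retry(lambda q: next(responses), 'M3A, Toronto, Ontario',
                            delay_seconds=0)
print(coords)
```

Returning `None` after the last attempt lets the calling loop skip or log the postal code instead of blocking.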


#### 2.2.2 COVID-19 Cases in Toronto
We will import the COVID-19 cases in Toronto from the government portal.

In [5]:
path='https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/e5bf35bc-e681-43da-b2ce-0242d00922ad?format=csv'
toronto_cases = pd.read_csv(path)
toronto_cases.shape

(170621, 18)

In [6]:
toronto_cases.head(5)

Index | _id | Assigned_ID | Outbreak Associated | Age Group | Neighbourhood Name | FSA | Source of Infection | Classification | Episode Date | Reported Date | Client Gender | Outcome | Currently Hospitalized | Currently in ICU | Currently Intubated | Ever Hospitalized | Ever in ICU | Ever Intubated
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
0 | 3400296 | 1 | Sporadic | 50 to 59 Years | Willowdale East | M2N | Travel | CONFIRMED | 2020-01-22 | 2020-01-23 | FEMALE | RESOLVED | No | No | No | No | No | No
1 | 3400297 | 2 | Sporadic | 50 to 59 Years | Willowdale East | M2N | Travel | CONFIRMED | 2020-01-21 | 2020-01-23 | MALE | RESOLVED | No | No | No | Yes | No | No
2 | 3400298 | 3 | Sporadic | 20 to 29 Years | Parkwoods-Donalda | M3A | Travel | CONFIRMED | 2020-02-05 | 2020-02-21 | FEMALE | RESOLVED | No | No | No | No | No | No
3 | 3400299 | 4 | Sporadic | 60 to 69 Years | Church-Yonge Corridor | M4W | Travel | CONFIRMED | 2020-02-16 | 2020-02-25 | FEMALE | RESOLVED | No | No | No | No | No | No
4 | 3400300 | 5 | Sporadic | 60 to 69 Years | Church-Yonge Corridor | M4W | Travel | CONFIRMED | 2020-02-20 | 2020-02-26 | MALE | RESOLVED | No | No | No | No | No | No


We will remove all records that have no value for the FSA feature (forward sortation area, i.e. the first three characters of the postal code, based on the case's primary home address).

In [9]:
toronto_cases.dropna(subset=['FSA'], inplace=True)
toronto_cases.shape

(169280, 18)
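
Comparing the two shapes shows that 170621 - 169280 = 1341 records had no FSA and were removed. Before dropping rows it can help to count the missing values explicitly. The sketch below does this on a small made-up dataframe; the rows are illustrative and do not come from the Toronto dataset.

```python
import pandas as pd

# Toy rows (made up for illustration); the second one has no FSA
cases = pd.DataFrame({
    'Assigned_ID': [1, 2, 3],
    'FSA': ['M2N', None, 'M3A'],
    'Classification': ['CONFIRMED', 'CONFIRMED', 'PROBABLE'],
})

missing_before = cases['FSA'].isna().sum()     # rows that would be dropped
cases_with_fsa = cases.dropna(subset=['FSA'])  # returns a new dataframe

print(missing_before, cases_with_fsa.shape)
```

Using the returned dataframe instead of `inplace=True` keeps the original data available for comparison.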

Let's group the case totals by postal code and episode date, and then by postal code alone.

In [13]:
toronto_cases_grouped_by_date = toronto_cases.groupby(["FSA", "Episode Date"])["Classification"].count()
toronto_cases_grouped_by_date.head(5)

FSA  Episode Date
M0H  2021-04-11      1
M0M  2021-01-27      1
M1A  2021-04-08      1
M1B  2020-03-04      1
     2020-03-16      2
Name: Classification, dtype: int64

In [49]:
toronto_cases_grouped_total = toronto_cases.groupby(["FSA"])["Classification"].count().reset_index()
toronto_cases_grouped_total.rename(columns={"Classification": "Total_COVID_Cases"}, inplace=True)

toronto_cases_grouped_total.head(5)

Index | FSA | Total_COVID_Cases
-----|-----|-----
0 | M0H | 1
1 | M0M | 1
2 | M1A | 1
3 | M1B | 5724
4 | M1C | 2185
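
The two groupings can be tried on a few made-up records to see what each one returns. This sketch uses `size()` instead of `count()` on one column; the two agree when the counted column has no missing values. The data is illustrative only.

```python
import pandas as pd

# Toy case records (made up); one row per case, as in the Toronto dataset
cases = pd.DataFrame({
    'FSA': ['M1B', 'M1B', 'M1B', 'M1C'],
    'Episode Date': ['2020-03-04', '2020-03-16', '2020-03-16', '2020-03-20'],
    'Classification': ['CONFIRMED'] * 4,
})

# Cases per postal code and day (a Series with a two-level index)
by_date = cases.groupby(['FSA', 'Episode Date']).size()

# Total cases per postal code, as a flat dataframe
totals = cases.groupby('FSA').size().reset_index(name='Total_COVID_Cases')

print(by_date)
print(totals)
```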


#### 2.2.3 Canada demographic data by Postal Code (2016 Census)
We will import the demographic data from the 2016 census (the latest data available) from the Government of Canada statistics page.

In [50]:
path='https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/CompFile.cfm?Lang=Eng&T=1201&OFT=FULLCSV'
canada_demographics = pd.read_csv(path)
canada_demographics.shape
canada_demographics.head()

Index | Geographic code | Geographic name | Province or territory | Incompletely enumerated Indian reserves and Indian settlements, 2016 | Population, 2016 | Total private dwellings, 2016 | Private dwellings occupied by usual residents, 2016
-----|-----|-----|-----|-----|-----|-----|-----
0 | 01 | Canada |  | T | 35151728.0 | 15412443.0 | 14072079.0
1 | A0A | A0A | Newfoundland and Labrador |  | 46587.0 | 26155.0 | 19426.0
2 | A0B | A0B | Newfoundland and Labrador |  | 19792.0 | 13658.0 | 8792.0
3 | A0C | A0C | Newfoundland and Labrador |  | 12587.0 | 8010.0 | 5606.0
4 | A0E | A0E | Newfoundland and Labrador |  | 22294.0 | 12293.0 | 9603.0


Let's merge the first three datasets into a single dataframe with the postal code, neighborhood, COVID cases and total population features.

In [62]:
neighborhoods_merged = neighborhoods.merge(toronto_cases_grouped_total, left_on='PostalCode', right_on = 'FSA')
neighborhoods_merged = neighborhoods_merged.merge(canada_demographics[['Geographic code','Population, 2016']], left_on='PostalCode', right_on = 'Geographic code')
neighborhoods_merged.drop(columns=['FSA', 'Geographic code'],inplace=True)
neighborhoods_merged.rename(columns={"Population, 2016": "Population_2016"}, inplace=True)
#Delete data for neighborhoods with a population smaller than 100
neighborhoods_merged.drop(neighborhoods_merged[neighborhoods_merged.Population_2016 < 100].index, inplace=True)
neighborhoods_merged.shape

(96, 7)

In [63]:
neighborhoods_merged

Index | PostalCode | Borough | Neighborhood | Latitude | Longitude | Total_COVID_Cases | Population_2016
-----|-----|-----|-----|-----|-----|-----|-----
0 | M3A | North York | Parkwoods | 43.75245 | -79.32991 | 1900 | 34615.0
1 | M4A | North York | Victoria Village | 43.73057 | -79.31306 | 995 | 14443.0
2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65512 | -79.36264 | 2331 | 41078.0
3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.72327 | -79.45042 | 2165 | 21048.0
4 | M9A | Etobicoke | Islington Avenue | 43.66263 | -79.52831 | 1835 | 35594.0
... | ... | ... | ... | ... | ... | ... | ...
94 | M4X | Downtown Toronto | St. James Town, Cabbagetown | 43.66788 | -79.36649 | 1445 | 20822.0
96 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.65319 | -79.51113 | 276 | 10787.0
97 | M4Y | Downtown Toronto | Church and Wellesley | 43.66659 | -79.38133 | 1099 | 30472.0
98 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.63278 | -79.48945 | 874 | 21299.0
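
The merged dataframe has 96 rows, down from 103 in the neighborhood table. Part of that drop comes from the population filter, and part may come from the inner merges, which silently discard postal codes missing from either dataset. One way to see which codes fail to match is a left merge with `indicator=True`, sketched below on made-up frames (`M7Z` is a placeholder code invented for the example).

```python
import pandas as pd

# Toy frames (made up); M7Z is a placeholder code with no case records
neighborhoods = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M7Z']})
case_totals = pd.DataFrame({'FSA': ['M3A', 'M4A'],
                            'Total_COVID_Cases': [1900, 995]})

# A left merge keeps every postal code; the _merge column records whether
# a match was found in the right-hand frame
check = neighborhoods.merge(case_totals, left_on='PostalCode', right_on='FSA',
                            how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'PostalCode'].tolist()

print(unmatched)
```

Listing the unmatched codes before switching back to the inner merge makes it clear whether any populated neighborhood is being lost.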


Let's add a column to indicate the COVID cases per 1,000 population.

In [64]:
neighborhoods_merged['Cases_per_1000'] = (neighborhoods_merged.Total_COVID_Cases  / neighborhoods_merged.Population_2016) * 1000
neighborhoods_merged

Index | PostalCode | Borough | Neighborhood | Latitude | Longitude | Total_COVID_Cases | Population_2016 | Cases_per_1000
-----|-----|-----|-----|-----|-----|-----|-----|-----
0 | M3A | North York | Parkwoods | 43.75245 | -79.32991 | 1900 | 34615.0 | 54.889499
1 | M4A | North York | Victoria Village | 43.73057 | -79.31306 | 995 | 14443.0 | 68.891505
2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65512 | -79.36264 | 2331 | 41078.0 | 56.745703
3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.72327 | -79.45042 | 2165 | 21048.0 | 102.860129
4 | M9A | Etobicoke | Islington Avenue | 43.66263 | -79.52831 | 1835 | 35594.0 | 51.553633
... | ... | ... | ... | ... | ... | ... | ... | ...
94 | M4X | Downtown Toronto | St. James Town, Cabbagetown | 43.66788 | -79.36649 | 1445 | 20822.0 | 69.397752
96 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.65319 | -79.51113 | 276 | 10787.0 | 25.586354
97 | M4Y | Downtown Toronto | Church and Wellesley | 43.66659 | -79.38133 | 1099 | 30472.0 | 36.065897
98 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.63278 | -79.48945 | 874 | 21299.0 | 41.034790
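
As a quick sanity check, the rate for a single row can be recomputed by hand. The sketch below uses the M3A (Parkwoods) values from the table above and should land on the `Cases_per_1000` value shown for that row.

```python
# Values for M3A (Parkwoods) taken from the merged table
total_cases = 1900
population_2016 = 34615.0

# Cases per 1,000 residents = cases / population * 1000
cases_per_1000 = total_cases / population_2016 * 1000
print(round(cases_per_1000, 6))
```

Note that the population comes from the 2016 census while the cases are from 2020 and 2021, so the rate is an approximation for neighborhoods whose population changed in between.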


In [76]:
address = 'Toronto, Ontario'

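# Nominatim comes from geopy; imported here in case the hidden setup cell did not include it
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="ON_explorer")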
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one circle per neighborhood, with radius proportional to cases per 1,000 residents
for lat, lng, borough, neighborhood, Cases_per_1000 in zip(neighborhoods_merged['Latitude'], neighborhoods_merged['Longitude'], neighborhoods_merged['Borough'], neighborhoods_merged['Neighborhood'], neighborhoods_merged['Cases_per_1000']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=Cases_per_1000/7,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In this first visualization, the highest case rates per capita appear to be concentrated in the northwest of Toronto.