# Effectiveness of Covid-19 management and preventative measures in Ontario, Canada.
> "An analysis of how Ontario's covid-19 eradication strategy compares to its reality."

- toc: true
- branch: master
- badges: true
- comments: true
- author: Enobong Udoh
- categories: [Covid-19, Ontario, Canada]
- image: images/project_thumbnails/ontario_covid.png
- hide: false
- search_exclude: true



## Project motivation and background

Covid-19 is an infectious respiratory disease caused by a newly discovered coronavirus. The novel virus, known as SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2, formerly called 2019-nCoV), belongs to a family of viruses named for their crown-like spikes. The virus was first detected amid an outbreak of respiratory illness cases in Wuhan City, China, and was first reported to the World Health Organization on the 31st of December 2019.
    
In this work, exploratory analysis is carried out to assess the impact of Ontario's Covid-19 preventative solutions and restrictive measures (mobility) on the daily changes in covid cases.

In particular, this project will explore the following lines of inquiry with the help of a number of publicly accessible data sets: 

1. Is there an observable relationship between the reported covid activities and the proposed medical solution i.e. vaccination?

2. Do people's activities across the days of the week influence the number of reported cases in Ontario?

3. The government's vaccination plan gave first preference to adults aged 70 and over, as well as to those considered medically compromised. Was this because the 70-and-over age group accounted for a significant share of confirmed positive cases?

4. How does the proportion of affected groups compare with those getting vaccinated?

5. How has the pandemic impacted the community's mobility?  Is there an observable effect on the number of cases in the province?

## Data collection

The following datasets were identified to fulfill the analysis requirement: 

1. Ontario's Covid-19 Pandemic and Vaccination trends from 25-January-2020 to 17-July-2021 (Primary data)
  - [Data Source](https://covid19tracker.ca/vaccinationtracker.html)
  - [Data Dictionary](2021-09-08-eu-ontario-data-preparation.ipynb)

2. Confirmed Positive Cases in cities within Ontario with age
  - [Data Source](https://data.ontario.ca/dataset/confirmed-positive-cases-of-covid-19-in-ontario)
  - [Data Dictionary](https://data.ontario.ca/dataset/confirmed-positive-cases-of-covid-19-in-ontario)

3. Ontario Vaccination data by age
  - [Data Source](https://data.ontario.ca/en/dataset/covid-19-vaccine-data-in-ontario/resource/775ca815-5028-4e9b-9dd4-6975ff1be021)  
  - [Data Dictionary](https://data.ontario.ca/en/dataset/covid-19-vaccine-data-in-ontario)

4. Google Covid-19 mobility report
  - [Data Source](https://www.google.com/covid19/mobility/)
  - [Data Dictionary](https://www.google.com/covid19/mobility/data_documentation.html?hl=en)
  



## Data Preparation

For this project, to gain a better understanding of the covid-19 related activities in Ontario, the datasets above were cleaned for exploration.  
- Ontario's Covid-19 pandemic and vaccination data contained information on the daily changes and totals of covid-19 cases, fatalities, tests, hospitalizations, criticals, recoveries, vaccinations (partial and full) and vaccines distributed in the province.

- Confirmed positive cases data contained records of verified cases with fields such as accurate episode date, case reported date, test reported date, specimen date, age group, client gender and more.

- Ontario vaccination data by age contained fields such as date, age group, total population, the proportion of each age group's population with at least the first dose and second dose of the vaccine per day. 

- The mobility data from Google contained fields such as date and percent changes in movement to retail and recreation, grocery and pharmacy, parks, transit stations, workplaces and residential places. These changes were measured against a baseline, computed by Google as the median value for the corresponding day of the week during the 5-week period Jan 3 to Feb 6, 2020.
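
To make the baseline definition concrete, here is a minimal sketch of how a percent change from a day-of-week median baseline can be computed. The visit counts are synthetic and the column names (`visits`, `dow`, `baseline`, `pct_change_from_baseline`) are hypothetical; the mobility data used in this project already contains the percent changes, so this only illustrates the idea.

```python
import numpy as np
import pandas as pd

# Synthetic daily visit counts (hypothetical, for illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-03", "2020-03-31", freq="D")
df = pd.DataFrame({"visits": rng.integers(800, 1200, size=len(dates))}, index=dates)
df["dow"] = df.index.dayofweek  # Monday=0 ... Sunday=6

# Baseline: median per day of the week over the 5-week window Jan 3 to Feb 6, 2020.
window = df.loc["2020-01-03":"2020-02-06"]
baseline = window.groupby("dow")["visits"].median()

# Percent change of each day relative to the baseline for the same weekday.
df["baseline"] = df["dow"].map(baseline)
df["pct_change_from_baseline"] = (df["visits"] - df["baseline"]) / df["baseline"] * 100

print(df.head(10))
```

Because the baseline is specific to each weekday, a Saturday is only ever compared with a typical pre-pandemic Saturday, not with a weekday.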

Each data set was reviewed independently. Some columns were renamed and unused columns were dropped. Each data set was checked for missing values, duplicates and outliers, and missing values were filled where necessary. In the confirmed cases data, 99.14% of the records were retained after rows with missing values and duplicated information were removed. The data sets were all indexed as time series with a uniform end date of July 17, 2021.
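
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic frame, not the project's actual code: the column names (`change_cases`, `daily_cases`, `notes`) and the forward-fill strategy are assumptions.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for one raw data set (hypothetical columns and values).
raw = pd.DataFrame({
    "date": ["2021-07-15", "2021-07-16", "2021-07-16", "2021-07-17", "2021-07-18"],
    "change_cases": [10, 12, 12, np.nan, 15],
    "total_cases": [100, 112, 112, 124, 139],
    "notes": ["", "", "", "", ""],
})

clean = (
    raw.rename(columns={"change_cases": "daily_cases"})  # rename for clarity
       .drop(columns=["notes"])                          # drop unused columns
       .drop_duplicates()                                # remove duplicated rows
)

# Fill missing values (forward-fill is an assumed strategy for this sketch).
clean["daily_cases"] = clean["daily_cases"].ffill()

# Index as a time series and apply the uniform end date of July 17, 2021.
clean["date"] = pd.to_datetime(clean["date"])
clean = clean.set_index("date").sort_index().loc[:"2021-07-17"]

print(clean)
```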

> Tip: [Tell me more: a deep dive into the data preparation work](2021-09-08-eu-ontario-data-preparation.ipynb)

## Data Exploration

### Is there an observable relationship between the reported covid activities and the proposed medical solution i.e. vaccination?

![image_1](ontario-images/fig_1.png)

> Note: Total cases correlates positively with the totals of all the other activities. Total tests has the highest correlation with total cases, with a correlation coefficient of 0.98, and total hospitalizations has the lowest, with a coefficient of 0.5.
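
For readers who want to reproduce this kind of comparison, the sketch below computes Pearson correlations against a `total_cases` column with pandas. The data is synthetic and the column names are hypothetical, so the numbers it prints are unrelated to the figure above.

```python
import numpy as np
import pandas as pd

# Synthetic series (hypothetical columns, for illustration only).
rng = np.random.default_rng(42)
n = 200
daily_cases = rng.integers(50, 500, size=n)
df = pd.DataFrame({
    "total_cases": daily_cases.cumsum(),
    "total_tests": (daily_cases * 20 + rng.integers(0, 2000, size=n)).cumsum(),
    "total_hospitalizations": rng.integers(100, 900, size=n),
})

# Pearson correlation of every column with total_cases.
corr_with_cases = df.corr(method="pearson")["total_cases"].sort_values(ascending=False)
print(corr_with_cases)
```

One caveat when reading such a matrix: cumulative totals all grow over time, so part of a high correlation between two totals simply reflects their shared upward trend.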

### Do people's activities across the days of the week influence the number of reported cases in Ontario?

![image_2_first](ontario-images/fig_2_first.png)

> Note: The heatmap shows approximately zero correlation between the days of the week and the daily changes in covid cases. The correlation between the days of the week and the other covid activities is also very low.

![image_2](ontario-images/fig_2.png)

> Note: The plots show that, from a cumulative perspective, daily changes in cases fluctuate to an extent across the days of the week, but the impact of the day of the week on daily total cases is barely noticeable.
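
A day-of-week breakdown like the one plotted above can be sketched as follows. The daily counts are synthetic, the column name `daily_cases` is hypothetical, and encoding the weekday as a number (Monday=0) for the correlation is an assumption about how such a heatmap might be built.

```python
import numpy as np
import pandas as pd

# Synthetic daily case changes over the study period (illustration only).
rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-25", "2021-07-17", freq="D")
cases = pd.DataFrame({"daily_cases": rng.integers(0, 4000, size=len(dates))}, index=dates)

# Sum and mean of daily case changes for each day of the week.
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
by_day = (
    cases.groupby(cases.index.day_name())["daily_cases"]
         .agg(["sum", "mean"])
         .reindex(order)
)

# Correlation between a numeric weekday code and the daily case changes.
weekday_code = pd.Series(cases.index.dayofweek, index=cases.index)
weekday_corr = cases["daily_cases"].corr(weekday_code)

print(by_day)
print(weekday_corr)
```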

### The government's vaccination plan gave first preference to adults aged 70 and over, as well as to those considered medically compromised. Was this because the 70-and-over age group accounted for a significant share of confirmed positive cases?

![image_3](ontario-images/fig_3.png)

> Note: The spread of cases is higher amongst people in their 20s.

![image_3_1](ontario-images/fig_3_1.png)

> Note: In both years, adults aged 70 and over accounted for less than 30% of the positive cases.
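
The yearly share of cases per age group can be derived from a case-level table with a normalized cross-tabulation. The sketch below uses a handful of made-up rows; the column names and age-group labels are assumptions and may not match the source data exactly.

```python
import pandas as pd

# Synthetic line list: one row per confirmed case (hypothetical values).
cases = pd.DataFrame({
    "case_reported_date": pd.to_datetime([
        "2020-04-01", "2020-05-10", "2020-09-15", "2020-10-02", "2020-12-20",
        "2021-01-05", "2021-02-11", "2021-03-30", "2021-04-16", "2021-06-01",
    ]),
    "age_group": ["20s", "20s", "30s", "70s", "80s",
                  "20s", "20s", "30s", "40s", "90+"],
})
cases["year"] = cases["case_reported_date"].dt.year

# Percentage of each year's cases that falls in each age group.
share = pd.crosstab(cases["year"], cases["age_group"], normalize="index") * 100

# Combined share of the 70-and-over groups per year.
senior_groups = [g for g in share.columns if g in {"70s", "80s", "90+"}]
share_70_plus = share[senior_groups].sum(axis=1)

print(share.round(1))
print(share_70_plus)
```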

### How does the proportion of affected groups compare with those getting vaccinated?

![image_4](ontario-images/fig_4.png)

> Note: Although those in their 20s and 30s account for more of the covid cases, they fall within the bottom three portions of the pie in terms of partial and full vaccinations.

### How has the pandemic impacted the community's mobility?  Is there an observable effect on the number of cases in the province?

![mobility_image_1](ontario-images/fig_5_1.png)

![mobility_image_2](ontario-images/fig_5_2.png)

![mobility_image_3](ontario-images/fig_5_3.png)

> Note: The two major peaks in daily covid case changes were experienced on days where the province was on lockdown and movement to non-essential places did not see a peak. Movement to grocery stores and pharmacies, residential areas and parks for exercise had lower restrictions, thus, mobility mostly stayed around and above the baseline.

> Tip: [Tell me more: a deep dive into the data exploration and analysis work](2021-09-09-eu-ontario-data-exploration.ipynb)

### Conclusions from exploratory analysis

- The data shows that there is a correlation between covid activities and the preventive solution, vaccination. Although the levels of correlation differ, total cases has a positive correlation with the totals of the other activities.
    - Total cases  vs Total fatalities has a correlation of ~0.96
    - Total cases  vs Total tests has a correlation of ~0.98
    - Total cases  vs Total hospitalizations has a correlation of ~0.50
    - Total cases  vs Total criticals has a correlation of ~0.79
    - Total cases  vs Total recoveries has a correlation of ~0.99
    - Total cases  vs Partial vaccinations has a correlation of ~0.83
    - Total cases  vs Full vaccinations has a correlation of ~0.60
    - Total cases  vs Vaccines distributed has a correlation of ~0.83
    
- While there is a very low correlation between the days of the week and total cases in Ontario, a bar plot shows that daily changes in the number of cases differ across the days of the week, although the effect on total cases is negligible. In the data, Friday, April 16, 2021 is the day with the highest number of cases.

- Furthermore, although the older population in Ontario are said to have a higher risk of contracting the virus, the data shows that there is a higher number of people testing positive amongst young adults in their 20s and 30s.

- Despite cases being higher amongst the younger population, as of July 17, 2021, vaccination efforts had a higher spread amongst the older population. If events progress at this rate, it will likely slow down the speed with which the province overcomes the pandemic.

- Additionally, despite reduced activity and preventative measures such as full lockdowns and restricted movement, the number of cases in the province has continued to rise. It can also be noted that daily changes in cases saw their two major peaks during the government-imposed stay-at-home order.

### Recommendations:
In the event of future pandemics, to overcome their impact faster, it is recommended that:
- large-scale public education on hygiene measures, such as washing hands, wearing masks and sanitizing shared spaces, be continually emphasized to minimize the spread of the virus.
- the general public be continually sensitized to the need to get tested, as this would reduce the chances of asymptomatic people spreading the virus.
- vaccination opportunities be opened early to the general public, including the younger population. This would potentially improve the quality of results derived from the solution.
- the impact of mobility restriction measures be analysed periodically to determine how viable that solution is. If cases tend to increase drastically at the end of lockdowns, it might be due to asymptomatic carriers suddenly mixing with others whenever some degree of freedom is allowed.
- limiting capacity be explored as an alternative to full lockdowns during a pandemic. This would likely reduce the sudden rush for everyone to be outside at the same time and would increase the possibility of knowing who was where and when, e.g. via the barcode registrations currently required by some enclosed spaces.

## Data Modelling



- Regression algorithms were used in modelling data from the primary dataset as the data is continuous. 
- The prediction target is daily total cases and the predictor variables are daily total tests and changes in partial vaccinations. These features were selected because they had a correlation coefficient > 0.75 with the target.
- To support training and testing the model, the data was split in two: a training set of 70% (`train_size`) and a test set of 30% (`test_size`).
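
A minimal sketch of this setup with scikit-learn is shown below. The data is synthetic (generated from arbitrary coefficients), the `random_state` is an assumption added for reproducibility, and the feature names simply mirror the ones used in this post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic predictors and target (arbitrary coefficients, illustration only).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "change_vaccinations": rng.integers(0, 200_000, size=n),
    "total_tests": rng.integers(0, 15_000_000, size=n),
})
y = (1000 + 0.5 * X["change_vaccinations"] + 0.03 * X["total_tests"]
     + rng.normal(0, 5000, size=n))

# 70% of the rows for training, 30% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
r2_test = model.score(X_test, y_test)  # R-squared on the held-out rows

print(model.intercept_, dict(zip(X.columns, model.coef_)), r2_test)
```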

![data_modelling_image](ontario-images/fig_model_first.png)

### Observations:
- The linear regression model had an accuracy score (R-squared) of approximately 0.982. After fitting, the model was used to predict total cases from the test features, and the R-squared score was approximately 0.9815.

- Validating the model using `statsmodels.formula.api` also gave an R-squared score of approximately 0.982, and the confidence level of prediction accuracy is 97.5%.

- Based on the linear regression model, each unit increase in the daily change in partial vaccinations is associated with an increase of approximately 0.799 in total cases in Ontario, and each additional test is associated with an increase of approximately 0.025, holding the other predictor constant.

- Mathematically: 
  
    `total_cases = -11860.454 + (0.799 * change_vaccinations) +  (0.025 * total_tests)`

- The decision tree regressor algorithm was also explored for prediction. This model showed better predictive ability than linear regression, with an R-squared score of 0.993 and lower mean errors (MAE and MSE) for its predictions on the test data.

- A further attempt to predict total cases per day was made using a random forest regressor. This model was shown to have a higher accuracy score and lower mean errors than the previous models (**`Recommended`**).
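
The three-model comparison described above can be sketched as a simple loop that fits each regressor on the same split and records R-squared, MAE and MSE. The data here is synthetic and deliberately linear, and the hyperparameters are assumptions, so the ranking it produces will not necessarily match the one reported in this post.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (arbitrary coefficients, illustration only).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([rng.uniform(0, 200_000, n), rng.uniform(0, 15_000_000, n)])
y = 1000 + 0.5 * X[:, 0] + 0.03 * X[:, 1] + rng.normal(0, 5000, n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit each model on the training rows and score it on the held-out rows.
results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }

for name, metrics in results.items():
    print(name, metrics)
```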

> Tip: [Tell me more: a deep dive into the data modelling work](2021-09-10-eu-ontario-data-modelling-validation.ipynb)

## Model Validation


- A 5-fold KFold split was used to derive 5 distinct arrangements of a dataframe containing the prediction target and predictors. Each set contained a split of the data for training and testing.
- When tested with a linear regression model, `set_1` produced the best predictions of the 5.
- Mathematically, for set_1: 
  
    `total_cases = -11660.203 + (0.860 * change_vaccinations) +  (0.025 * total_tests)`

- Other models were explored using only X and y from `set_1`.
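
As a rough illustration of the procedure above, the sketch below runs a 5-fold split with scikit-learn's `KFold`, fits a linear regression on each fold and records the test R-squared. The data is synthetic, and whether to shuffle (and with which `random_state`) is an assumption; the `set_1` ... `set_5` labels follow the naming used in this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data (arbitrary coefficients, illustration only).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([rng.uniform(0, 200_000, n), rng.uniform(0, 15_000_000, n)])
y = 1000 + 0.5 * X[:, 0] + 0.03 * X[:, 1] + rng.normal(0, 5000, n)

# Five train/test arrangements of the same data.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = {}
for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores[f"set_{i}"] = model.score(X[test_idx], y[test_idx])  # R-squared

# The set whose held-out fold was predicted best.
best_set = max(scores, key=scores.get)
print(scores, best_set)
```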

![model_validation_image](ontario-images/fig_modelling.png)

### Observations:

- The linear regression model has an accuracy score (r-squared) of approximately 0.9771.

- With the decision tree regressor, the prediction accuracy score (r-squared) improved from what was observed with linear regression to ~ 0.9991

- This even got better with the random forest regressor model, where the r-squared score was computed as ~ 0.9998. 

- In both the decision tree and random forest regressors, the error metrics with set_1 are lower than those observed with the train_test_split data.

- However, the random forest model made predictions with the lowest errors of the 3 models and is recommended.

> Tip: [Tell me more: a deep dive into the data modelling work](2021-09-10-eu-ontario-data-modelling-validation.ipynb)

## Acknowledgements:
- I acknowledge that this is my first data science project, completed at the end of a 10-week part-time course, and some concepts may not be presented as a seasoned professional in the field would present them.
- I also acknowledge that Ontario is a large province with rural and urban cities and that this analysis represents the province as a collective; hence, it might not mirror the reality of some cities.
- I acknowledge the government and public health unit of Ontario for the publicly accessible data on confirmed positive cases and vaccination activities.
- Noah Little, Covid-19 Tracker Canada, https://covid19tracker.ca/vaccinationtracker.html, Accessed: 2021-07-18
- Google LLC, "Google COVID-19 Community Mobility Reports", https://www.google.com/covid19/mobility/, Accessed: 2021-08-09