# COVID 19 Cases and Vaccine Visualization

<p>Over the last year and a half, the COVID-19 pandemic has had a great impact on the world. This impact has extended to all industries, including trade, tourism, retail, manufacturing and education. Governments around the world have focused their efforts on containing the virus, and more recently the focus has shifted to distributing the vaccine.</p>

In our analysis and visualization we look at data pertaining to the province of Ontario, Canada. Because data is scarce and the available data is continuously updated, we use three different datasets to analyse and visualize our data.

    - vaccine_covid.csv : This file contains data on vaccine distribution across Ontario so far. 
    - daily_change.csv : This file contains the daily changes in COVID-19 cases across Ontario's public health units, which cover the major cities. 
    - covidtesting.csv : This file contains additional data on testing and the identification of cases.

In [192]:
# Necessary imports 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.io as pio

In [193]:
test_cases = pd.read_csv("covidtesting.csv")
test_cases.head()

Unnamed: 0,Reported Date,Confirmed Negative,Presumptive Negative,Presumptive Positive,Confirmed Positive,Resolved,Deaths,Total Cases,Total patients approved for testing as of Reporting Date,Total tests completed in the last day,...,Number of patients hospitalized with COVID-19,Number of patients in ICU with COVID-19,Number of patients in ICU on a ventilator with COVID-19,Total Positive LTC Resident Cases,Total Positive LTC HCW Cases,Total LTC Resident Deaths,Total LTC HCW Deaths,Total_Lineage_B.1.1.7,Total_Lineage_B.1.351,Total_Lineage_P.1
0,2020-01-26,,,1.0,,,,,,,...,,,,,,,,,,
1,2020-01-27,,,2.0,,,,,,,...,,,,,,,,,,
2,2020-01-28,,,1.0,1.0,,,1.0,,,...,,,,,,,,,,
3,2020-01-30,,,0.0,2.0,,,2.0,,,...,,,,,,,,,,
4,2020-01-31,,,0.0,2.0,,,2.0,,,...,,,,,,,,,,


<p>As noted earlier, one of the main issues with the data is that it is continually evolving. The first rows of the dataframe above date from late January 2020 and are mostly empty: this period has the highest number of missing values, as it corresponds with the onset of the pandemic.</p>

In [194]:
test_cases.shape

(420, 22)

The data has 420 rows and 22 variables. Each row corresponds to a reporting date, starting from 2020-01-26.
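Because rows are reporting dates rather than a guaranteed daily sequence, it can help to check which calendar days have no row at all; the preview above already skips 2020-01-29. The helper below is a minimal sketch (the name `missing_report_dates` is hypothetical, not part of the original notebook) that compares the observed dates with a complete daily range.

```python
import pandas as pd


def missing_report_dates(dates):
    """Return the calendar days between the first and last date that have no row."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates))
    # Every calendar day from the first report to the last one
    full_range = pd.date_range(observed.min(), observed.max(), freq="D")
    # Days present in the full range but absent from the data
    return full_range.difference(observed)


# The five dates from the preview of covidtesting.csv above
preview_dates = ["2020-01-26", "2020-01-27", "2020-01-28", "2020-01-30", "2020-01-31"]
print(missing_report_dates(preview_dates))
```

On the full dataframe it would be called as `missing_report_dates(test_cases['Reported Date'])`.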

The columns that we have are as shown below: 

In [195]:
for i in test_cases.columns:
    print(i)

Reported Date
Confirmed Negative
Presumptive Negative
Presumptive Positive
Confirmed Positive
Resolved
Deaths
Total Cases
Total patients approved for testing as of Reporting Date
Total tests completed in the last day
Percent positive tests in last day
Under Investigation
Number of patients hospitalized with COVID-19
Number of patients in ICU with COVID-19
Number of patients in ICU on a ventilator with COVID-19
Total Positive LTC Resident Cases
Total Positive LTC HCW Cases
Total LTC Resident Deaths
Total LTC HCW Deaths
Total_Lineage_B.1.1.7
Total_Lineage_B.1.351
Total_Lineage_P.1


The data has no accompanying metadata file explaining what each variable means, so we need to make our own assumptions based on the variable names.

    - Reported Date : Date of the report (YYYY-MM-DD), starting from 2020-01-26.
    - Confirmed Negative : Tested individuals that have had a negative result.
    - Presumptive Negative : Untested individuals that have been assumed to be negative due to a lack of symptoms.
    - Presumptive Positive : Untested individuals that have been assumed to be positive due to the presence of COVID-related symptoms. 
    - Confirmed Positive : Currently active confirmed cases. In the rows displayed further below, this value equals Total Cases minus Resolved minus Deaths.
    - Resolved : Individuals that were previously positive but obtained a negative result when they were tested again. 
    - Deaths : Deaths that resulted from COVID-19. 
    - Total Cases : Cumulative number of COVID-19 cases as of a given date. 
    - Total patients approved for testing as of Reporting Date : Cumulative number of patients approved for testing as of the specified reporting date (see Reported Date above).
    - Total tests completed in the last day : Number of tests completed in the day before the reporting date.
    - Percent positive tests in last day : Percentage of the tests completed in the last day that returned a positive result. 
    - Under Investigation : Individuals that have been identified as potential positive cases. 
    - Number of patients hospitalized with COVID-19 : Individuals that have been admitted to hospital due to COVID-19. 
    - Number of patients in ICU with COVID-19 : Individuals that have been moved to a hospital ICU due to COVID-19. 
    - Number of patients in ICU on a ventilator with COVID-19 : ICU patients that have been intubated.
    - Total Positive LTC Resident Cases : Positive cases recorded among residents of Long Term Care (LTC) homes.
    - Total Positive LTC HCW Cases : Positive cases recorded among health care workers (HCW) in Long Term Care homes. 
    - Total LTC Resident Deaths : Deaths recorded among Long Term Care residents.  
    - Total LTC HCW Deaths : Deaths recorded among health care workers in Long Term Care homes.

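The last three variables follow a different naming pattern from the rest. B.1.1.7, B.1.351 and P.1 are the lineage names of SARS-CoV-2 variants of concern first identified in the United Kingdom, South Africa and Brazil respectively, so these columns most likely hold cumulative counts of cases confirmed as each variant.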
<p>These variables are: </p>

   - Total_Lineage_B.1.1.7
   - Total_Lineage_B.1.351
   - Total_Lineage_P.1

#### Missing Values 

Given the high number of missing values, it is essential to identify how many missing values each variable has. 

In [196]:
def missing_values(data):
    """
    -------------------------------------------------------
    Function that computes the number of missing values 
    each variable in a dataset contains.
    -------------------------------------------------------
    Parameters:
       data - A pandas dataframe (pandas.DataFrame)
    Returns:
       null_count - Maps each column name to its number of null values (dictionary)
    -------------------------------------------------------
    """
    new_data = data.copy() # Work on a (deep) copy so the original data is never modified
    columns =  new_data.columns
    null_count = {}
    for i in columns:
        current = new_data[i].isnull().sum()  # Number of missing cells in column i
        null_count[i] = current
    return null_count 
count_null =  missing_values(test_cases)
count_null

{'Reported Date': 0,
 'Confirmed Negative': 373,
 'Presumptive Negative': 408,
 'Presumptive Positive': 395,
 'Confirmed Positive': 2,
 'Resolved': 12,
 'Deaths': 40,
 'Total Cases': 2,
 'Total patients approved for testing as of Reporting Date': 6,
 'Total tests completed in the last day': 69,
 'Percent positive tests in last day': 73,
 'Under Investigation': 0,
 'Number of patients hospitalized with COVID-19': 56,
 'Number of patients in ICU with COVID-19': 56,
 'Number of patients in ICU on a ventilator with COVID-19': 56,
 'Total Positive LTC Resident Cases': 106,
 'Total Positive LTC HCW Cases': 110,
 'Total LTC Resident Deaths': 103,
 'Total LTC HCW Deaths': 103,
 'Total_Lineage_B.1.1.7': 358,
 'Total_Lineage_B.1.351': 362,
 'Total_Lineage_P.1': 374}

Some variables have too many missing values to be of any statistical use to us, so we can remove them from our data. 

In [197]:
test_cases = test_cases.drop(['Total_Lineage_B.1.1.7','Total_Lineage_B.1.351','Total_Lineage_P.1','Presumptive Negative',
                              'Presumptive Positive'],axis = 1)
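The columns above were chosen by inspecting the counts. A rule-based alternative is to drop every column whose share of missing values exceeds a threshold. The sketch below assumes a threshold of 0.5, an illustrative choice rather than the author's; on this dataset such a rule would drop the same five columns and also `Confirmed Negative` (373 of 420 values missing), which the notebook keeps. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(data, max_missing_fraction=0.5):
    """Drop columns whose fraction of missing values exceeds max_missing_fraction."""
    # isnull().mean() gives the share of missing cells in each column
    missing_fraction = data.isnull().mean()
    sparse = missing_fraction[missing_fraction > max_missing_fraction].index
    return data.drop(columns=sparse), list(sparse)


# Hypothetical toy frame: column "b" is 75% missing, column "a" is 25% missing
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                    "b": [np.nan, np.nan, np.nan, 1.0]})
kept, dropped = drop_sparse_columns(toy)
print(dropped)
```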

#### Total and Daily Cases In Ontario

In [198]:
# Set the reporting date as the index of the dataframe 
cases_time_index = test_cases.set_index(pd.DatetimeIndex(test_cases['Reported Date']))
cases_time_index.tail()

Reported Date (index),Reported Date,Confirmed Negative,Confirmed Positive,Resolved,Deaths,Total Cases,Total patients approved for testing as of Reporting Date,Total tests completed in the last day,Percent positive tests in last day,Under Investigation,Number of patients hospitalized with COVID-19,Number of patients in ICU with COVID-19,Number of patients in ICU on a ventilator with COVID-19,Total Positive LTC Resident Cases,Total Positive LTC HCW Cases,Total LTC Resident Deaths,Total LTC HCW Deaths
2021-03-27,2021-03-27,,17519.0,315865.0,7308.0,340692.0,12372873.0,61005.0,4.5,33065,985.0,365.0,192.0,15025.0,6809.0,3893.0,10.0
2021-03-28,2021-03-28,,18405.0,317408.0,7327.0,343140.0,12423100.0,50227.0,4.5,25452,917.0,366.0,217.0,15029.0,6811.0,3893.0,10.0
2021-03-29,2021-03-29,,18965.0,318932.0,7337.0,345234.0,12462570.0,39470.0,6.1,17716,841.0,382.0,236.0,15030.0,6816.0,3893.0,10.0
2021-03-30,2021-03-30,,19810.0,320409.0,7351.0,347570.0,12498641.0,36071.0,6.2,35066,1090.0,387.0,249.0,15037.0,6819.0,3897.0,10.0
2021-03-31,2021-03-31,,20155.0,322382.0,7366.0,349903.0,12551173.0,52532.0,4.8,40446,1111.0,396.0,252.0,15044.0,6820.0,3901.0,10.0


Having the time component as our index allows us to perform time series visualizations on our data and examine trends and shifts over the course of the year. 

We can begin by looking at the progression of total cases over time across Ontario. 

In [199]:
# Visualization of the time series data with x set to the date variable 
fig = px.line(cases_time_index,
              x="Reported Date",
              y="Total Cases",    
              labels={
                     "Reported Date": "Date",
                     "Total Cases": "Total Reported Covid Cases"
                 },
              title='Total Covid Cases Between Jan 26 2020 - Mar 31 2021')
fig.show()

<p>As expected, the total number of COVID cases has grown continuously since late January 2020. However, a cumulative total is not a good way of looking at the trend of growth; it would be more informative to look at daily increases and decreases.</p>

We have data on Total Cases but are missing the daily changes. The daily change on a given reporting date is the total number of cases on that date minus the total on the previous reporting date.

In [200]:
# Create a daily column variable that contains the daily changes in cases 
cases_time_index['Daily Cases'] = cases_time_index['Total Cases'].diff()
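`diff()` subtracts the previous row's value, so when a reporting date is skipped (the preview skips 2020-01-29) the first value after the gap covers more than one day. This minimal sketch uses hypothetical toy totals, not the Ontario figures, to show the change between reports next to the number of days elapsed, and the average change per calendar day.

```python
import pandas as pd

# Hypothetical toy cumulative totals; 2020-01-29 is missing, like the
# gap in the covidtesting.csv preview
totals = pd.Series(
    [1.0, 2.0, 2.0, 4.0],
    index=pd.DatetimeIndex(["2020-01-27", "2020-01-28", "2020-01-30", "2020-01-31"]),
    name="Total Cases",
)

# New cases since the previous report: this total minus the previous total
new_cases = totals.diff()

# Days elapsed since the previous report (NaN for the first row)
days_elapsed = totals.index.to_series().diff().dt.days

# Average new cases per calendar day between consecutive reports
per_day = new_cases / days_elapsed

print(pd.DataFrame({"new": new_cases, "days": days_elapsed, "per_day": per_day}))
```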

<p>The daily patterns will provide insight into whether the measures implemented by the government and other agencies are actually working.</p>

In [201]:
# Daily pattern change of Covid 19 across Ontario 
fig = px.line(cases_time_index,
              x="Reported Date",
              y="Daily Cases",
              labels={
                     "Reported Date": "Date",
                     "Daily Cases": "Daily Reported Covid Cases"
                 },
              title='Daily Covid Cases Between Jan 26 2020 - Mar 31 2021')
fig.show()

<p>From the time series above we note that in late January the daily cases recorded were zero. This may not reflect reality: there were likely more than zero cases, and limited testing may be the main reason none were recorded.</p>

<p>We can also note a consistent rise in the number of daily cases in Ontario beginning March 26 2020, with a peak of 640 daily cases on April 24.</p>

<p>Following the measures introduced by the government and other agencies, we notice a consistent drop in daily cases.</p>

The measures taken by the federal government and the Ontario Government at the time included the following:

- Closure of all recreational facilities, private and public schools, day cares, bars and restaurants, and a ban on large gatherings. Instituted on March 16 2020. 
- 14-day quarantines for individuals flying into the country through Pearson International Airport in Toronto, Ontario. Initiated on March 16 2020.
- A recommendation to employers to shift their employees from in-office work to working from home. Initiated on March 22 2020.

Given that these measures were instituted in mid-to-late March 2020, while daily cases kept rising until the April 24 peak before declining, it may be reasonable to assume that the measures contributed to the decline once they had had time to take effect.

There is also a consistent rise in daily cases beginning Sep 3 2020, which corresponds with the time the government relaxed some of the measures it had previously enacted. The decline in daily cases over the summer may have created a sense of confidence that contributed to this rise and paved the way for the second wave. 

Gatherings and other celebrations that normally take place during the Christmas holiday season and the New Year may also have contributed to the spike in daily cases, which reached 4249 on Jan 8 2021. Earlier this year the Ontario Government once again enacted measures to reduce movement and ban gatherings.
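The peak values quoted above can be located directly from the `Daily Cases` column by restricting it to a date window and taking `idxmax()`. The helper `peak_in_window` below is a hypothetical sketch demonstrated on made-up toy counts; on the notebook's data it would be called with `cases_time_index['Daily Cases']`, whose DatetimeIndex is sorted.

```python
import pandas as pd


def peak_in_window(daily, start, end):
    """Return (date, value) of the largest daily count between start and end, inclusive."""
    # Partial-string slicing on a sorted DatetimeIndex selects the window
    window = daily.loc[start:end]
    # idxmax() and max() skip NaN values, such as the first row produced by diff()
    return window.idxmax(), window.max()


# Hypothetical toy daily counts with two separate "waves"
toy_daily = pd.Series(
    [float("nan"), 3.0, 9.0, 4.0, 2.0, 15.0, 11.0],
    index=pd.DatetimeIndex(["2020-04-22", "2020-04-23", "2020-04-24", "2020-04-25",
                            "2021-01-07", "2021-01-08", "2021-01-09"]),
)

print(peak_in_window(toy_daily, "2020-03-01", "2020-06-30"))
print(peak_in_window(toy_daily, "2020-12-01", "2021-01-31"))
```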

#### COVID Impact on ICU Facilities

The COVID-19 pandemic made Canada, as well as other countries, realize that ICU facilities were ill-equipped to deal with a large number of infected individuals. 
During the height of the pandemic most ICU facilities were filled to capacity, and hospitals had to resort to other, unconventional facilities to make room for the infected. 

With this data we can look at how many COVID-19 patients Ontario ICUs held at their busiest, and compare hospitalization numbers with ICU occupancy.

In [202]:
# We can add columns to include the year and month variables 
cases_time_index['Month'] = pd.to_datetime(cases_time_index['Reported Date']).dt.strftime('%b')
cases_time_index['Year'] = pd.to_datetime(cases_time_index['Reported Date']).dt.year
# Add a column with the proportion of ICU patients who are intubated (on a ventilator)
cases_time_index['Intubated'] = (cases_time_index['Number of patients in ICU on a ventilator with COVID-19'])/(cases_time_index['Number of patients in ICU with COVID-19'])

In [203]:
# Split for 2020 and 2021 data 
cases_time_index_2020 = cases_time_index[cases_time_index['Year'] == 2020]
cases_time_index_2021 = cases_time_index[cases_time_index['Year'] == 2021]

In [233]:
# Daily ICU counts for 2020, stacked by month 
fig = px.bar(cases_time_index_2020,
              x="Month",
              y="Number of patients in ICU with COVID-19",
              labels={
                     "Month": "Month",
                     "Number of patients in ICU with COVID-19": "Number of patients in ICU with COVID-19"
                 },
              title='Number of patients in ICU with COVID-19 Between Jan 26 2020 - Dec 31 2020')
fig.show()

In [205]:
# Daily ICU counts for 2021, stacked by month 
fig = px.bar(cases_time_index_2021,
              x="Month",
              y="Number of patients in ICU with COVID-19",
              labels={
                     "Month": "Month",
                     "Number of patients in ICU with COVID-19": "Number of patients in ICU with COVID-19"
                 },
              title='Number of patients in ICU with COVID-19 Between Jan 01 2021 - Mar 31 2021')
fig.show()

<p>Each bar is stacked from one segment per day of the month, so a bar's height is the sum of that month's daily ICU counts. In 2020 the months with the highest ICU counts were April, May and December.</p>

<p>In 2021 the highest ICU counts were in January; they have declined steadily since, which may be due to the rollout of the COVID-19 vaccine.</p>

One flaw with this approach is that individuals may stay in an ICU for many days, or even across months, so the same patient is counted once for every day they remain there, leading to double and triple counting of certain individuals.
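Summarising each month by its average and peak daily occupancy, rather than stacking daily values, avoids adding up patient-days. A sketch with pandas `resample`, using hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy daily ICU occupancy spanning the end of one month and the start of the next
icu = pd.Series(
    [10, 12, 14, 20, 18],
    index=pd.DatetimeIndex(["2020-04-29", "2020-04-30",
                            "2020-05-01", "2020-05-02", "2020-05-03"]),
    name="Number of patients in ICU with COVID-19",
)

# "MS" groups by calendar month, labelling each group with the month's first day
monthly = icu.resample("MS").agg(["mean", "max"])
print(monthly)
```

On the notebook's data, `cases_time_index['Number of patients in ICU with COVID-19'].resample('MS').agg(['mean', 'max'])` would give one row per month that could be plotted with `px.bar`.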

It is important to note that a patient being in the ICU does not imply that the patient was intubated, so we should identify what proportion of ICU patients are intubated (attached to a ventilator).

In [206]:
fig = px.pie(cases_time_index_2020,
               values='Intubated',
               names='Month',
            title='Number of patients in ICU with COVID-19 in 2020 who are Intubated')
fig.show()

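Note that `px.pie` adds up the `Intubated` values for each month and displays each month's share of that total, so the percentages on the chart are not intubation rates. In 2020 every month's slice was under 14% of the pie. 

May 2020 had the largest share at 13.2% and September the smallest at 9.05%, a month that also had a low case count. Because each slice sums one daily ratio per reported day, a month's share also depends on how many days of ICU data it has. The ratio itself is much higher: on 2021-03-31, for example, 252 of 396 ICU patients (about 64%) were on a ventilator.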
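To read an actual intubation rate per month, one option is to divide the month's total ventilated patient-days by its total ICU patient-days. The sketch below uses short hypothetical column names `icu` and `vent` with toy values; on the notebook's data they would correspond to the two ICU columns used to build `Intubated`.

```python
import pandas as pd

# Hypothetical toy daily counts: patients in ICU and, of those, patients on a ventilator
df = pd.DataFrame(
    {"icu": [100, 100, 80], "vent": [60, 50, 40]},
    index=pd.DatetimeIndex(["2020-04-29", "2020-04-30", "2020-05-01"]),
)

# Add up patient-days per calendar month, then take the ratio
monthly = df.resample("MS").sum()
monthly["intubated_share"] = monthly["vent"] / monthly["icu"]
print(monthly)
```

Weighting by patient-days this way gives busier days more influence than averaging the daily ratios would.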

#### COVID Impact on Long Term Care Homes

<p>Information from press briefings shows that COVID-19 was, and remains, more dangerous to older people than to younger generations. As a result, there was a need to focus on the impact COVID-19 had in Long Term Care homes, where a large majority of residents are of very advanced age.</p>

In [207]:
# Create a Long term care cases variable that contains the daily changes in cases 
cases_time_index['LTC Cases'] = cases_time_index['Total Positive LTC Resident Cases'].diff()

In [208]:
# Scatter plot of daily Long Term Care resident cases over 2020 and 2021
fig = px.scatter(cases_time_index,
                 x="Reported Date",
                 y="LTC Cases",    
                 labels={
                     "Reported Date": "Date",
                     "LTC Cases": "Daily Reported LTC Resident Cases"
                 },
              title='LTC Covid Cases Between Jan 26 2020 - Mar 31 2021')
fig.show()

The pattern of cases in Long Term Care homes in Ontario seems to replicate that of the daily cases across Ontario. As noted earlier, there is a decline from July to early September 2020. After September 2020 a sharp rise follows, peaking in late December 2020 and early January 2021. 
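The visual similarity between the two series can be summarised with a correlation coefficient. A minimal sketch using `Series.corr` (Pearson by default), which aligns the two series on their date index and excludes pairs with a missing value; the toy values are hypothetical. A high correlation only says the two curves rise and fall together; it does not establish that one drives the other.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2020-12-28", "2020-12-29", "2020-12-30", "2020-12-31"])
# Hypothetical toy daily counts for all of Ontario and for LTC residents
daily_cases = pd.Series([100.0, 200.0, 300.0, 400.0], index=dates)
ltc_cases = pd.Series([5.0, 9.0, float("nan"), 17.0], index=dates)

# The 2020-12-30 pair is skipped because the LTC value is missing
r = daily_cases.corr(ltc_cases)
print(round(r, 3))
```

On the notebook's data the call would be `cases_time_index['Daily Cases'].corr(cases_time_index['LTC Cases'])`.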

#### Impact on Ontario Cities 

Ontario is the second-largest province in Canada by area (fourth-largest if the territories are included) and has the largest population of any province (14.57 million). Toronto and other cities in Ontario are densely populated compared with more scattered areas, so the impact of COVID-19 may have been felt unevenly. Identifying the number of cases in each public health unit is one way to examine the hardest-hit areas.

In [209]:
daily = pd.read_csv("daily_change.csv")
daily.head()

Unnamed: 0,Date,Algoma_Public_Health_Unit,Brant_County_Health_Unit,Chatham-Kent_Health_Unit,Durham_Region_Health_Department,Eastern_Ontario_Health_Unit,Grey_Bruce_Health_Unit,Haldimand-Norfolk_Health_Unit,"Haliburton,_Kawartha,_Pine_Ridge_District_Health_Unit",Halton_Region_Health_Department,...,Simcoe_Muskoka_District_Health_Unit,Southwestern_Public_Health,Sudbury_&_District_Health_Unit,Thunder_Bay_District_Health_Unit,Timiskaming_Health_Unit,Toronto_Public_Health,Wellington-Dufferin-Guelph_Public_Health,Windsor-Essex_County_Health_Unit,York_Region_Public_Health_Services,Total
0,2020-03-24,,,,,,,,,,...,,,,,,,,,,0
1,2020-03-25,0.0,1.0,0.0,3.0,0.0,1.0,0.0,0.0,1.0,...,0.0,0.0,1.0,0.0,0.0,17.0,1.0,1.0,5.0,46
2,2020-03-26,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,4.0,...,1.0,0.0,0.0,0.0,1.0,21.0,1.0,2.0,5.0,69
3,2020-03-27,0.0,0.0,0.0,5.0,0.0,1.0,0.0,14.0,1.0,...,4.0,2.0,1.0,0.0,0.0,22.0,0.0,0.0,34.0,124
4,2020-03-28,,,,,,,,,,...,,,,,,,,,,0


In [210]:
count_null_daily =  missing_values(daily)
count_null_daily

{'Date': 0,
 'Algoma_Public_Health_Unit': 3,
 'Brant_County_Health_Unit': 3,
 'Chatham-Kent_Health_Unit': 3,
 'Durham_Region_Health_Department': 3,
 'Eastern_Ontario_Health_Unit': 3,
 'Grey_Bruce_Health_Unit': 3,
 'Haldimand-Norfolk_Health_Unit': 3,
 'Haliburton,_Kawartha,_Pine_Ridge_District_Health_Unit': 3,
 'Halton_Region_Health_Department': 3,
 'Hamilton_Public_Health_Services': 3,
 'Hastings_and_Prince_Edward_Counties_Health_Unit': 3,
 'Huron_Perth_District_Health_Unit': 3,
 'Kingston,_Frontenac_and_Lennox_&_Addington_Public_Health': 3,
 'Lambton_Public_Health': 3,
 'Leeds,_Grenville_and_Lanark_District_Health_Unit': 3,
 'Middlesex-London_Health_Unit': 3,
 'Niagara_Region_Public_Health_Department': 3,
 'North_Bay_Parry_Sound_District_Health_Unit': 3,
 'Northwestern_Health_Unit': 3,
 'Ottawa_Public_Health': 3,
 'Peel_Public_Health': 3,
 'Peterborough_Public_Health': 3,
 'Porcupine_Health_Unit': 3,
 'Region_of_Waterloo,_Public_Health': 3,
 'Renfrew_County_and_District_Health_Unit': 

Analysis of the missing data indicates that it will not be an issue, as only a few rows lack values. The rows are daily and contain sufficient information on the daily change in cases for each public health unit, which together cover the major cities in Ontario.

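Every health-unit column listed above has exactly three missing values. This level of consistency raises a red flag: it suggests that entire days are empty, as in the 2020-03-24 and 2020-03-28 rows of the preview above, where every unit is missing and `Total` is 0. The small number of affected rows means we can drop them and still retain a sufficient amount of data. 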
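One way to check that suspicion is to list the rows in which every health-unit column is empty. The helper name `fully_empty_rows` is hypothetical, and it assumes, as in `daily_change.csv`, that `Date` and `Total` are the only non-unit columns.

```python
import numpy as np
import pandas as pd


def fully_empty_rows(data, exclude=("Date", "Total")):
    """Return the rows in which every column except those in `exclude` is missing."""
    units = data.drop(columns=list(exclude))
    return data[units.isnull().all(axis=1)]


# Hypothetical toy frame in the same wide layout as daily_change.csv
toy = pd.DataFrame({
    "Date": ["2020-03-24", "2020-03-25", "2020-03-26"],
    "Unit_A": [np.nan, 1.0, np.nan],
    "Unit_B": [np.nan, 2.0, 4.0],
    "Total": [0, 3, 4],
})
print(fully_empty_rows(toy))
```

If `fully_empty_rows(daily)` returns exactly three rows, the missing values are whole missing days rather than gaps in individual units.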

In [211]:
daily_no_null = daily.dropna()

In [212]:
# Missing rows removal confirmation test
print("This is the original length : {} and this is the length after removal of all rows with missing data : {}".format(
    len(daily),len(daily_no_null)))
print("As expected the difference in length is : {}".format(len(daily) - len(daily_no_null)))

This is the original length : 373 and this is the length after removal of all rows with missing data : 370
As expected the difference in length is : 3


The data gives the daily number of new cases for each public health unit in Ontario, with one column per unit, so we can compute the total number of cases for each unit over the reporting period by summing its column.

In [252]:
def total_cities(data):
    """
    -------------------------------------------------------
    Function that computes the total number of cases that
    have occurred in each public health unit in Ontario
    -------------------------------------------------------
    Parameters:
       data - A pandas dataframe with one column per health unit (pandas.DataFrame)
    Returns:
       city_sum - Maps each health unit to its total number of cases (dictionary)
    -------------------------------------------------------
    """
    new_data = data.copy() # Work on a (deep) copy so the original data is never modified
    columns =  new_data.columns
    city_sum = {}
    for i in columns:
        current = new_data[i].sum()  # Total of the daily new cases in column i
        city_sum[i] = current
    return city_sum
total_sum =  total_cities(daily_no_null.drop(['Date','Total'], axis=1))
total_sum

{'Algoma_Public_Health_Unit': 228.0,
 'Brant_County_Health_Unit': 2182.0,
 'Chatham-Kent_Health_Unit': 1604.0,
 'Durham_Region_Health_Department': 14109.0,
 'Eastern_Ontario_Health_Unit': 3230.0,
 'Grey_Bruce_Health_Unit': 783.0,
 'Haldimand-Norfolk_Health_Unit': 1599.0,
 'Haliburton,_Kawartha,_Pine_Ridge_District_Health_Unit': 1132.0,
 'Halton_Region_Health_Department': 10543.0,
 'Hamilton_Public_Health_Services': 12822.0,
 'Hastings_and_Prince_Edward_Counties_Health_Unit': 486.0,
 'Huron_Perth_District_Health_Unit': 1432.0,
 'Kingston,_Frontenac_and_Lennox_&_Addington_Public_Health': 898.0,
 'Lambton_Public_Health': 2822.0,
 'Leeds,_Grenville_and_Lanark_District_Health_Unit': 1244.0,
 'Middlesex-London_Health_Unit': 7073.0,
 'Niagara_Region_Public_Health_Department': 9659.0,
 'North_Bay_Parry_Sound_District_Health_Unit': 288.0,
 'Northwestern_Health_Unit': 678.0,
 'Ottawa_Public_Health': 17233.0,
 'Peel_Public_Health': 69538.0,
 'Peterborough_Public_Health': 857.0,
 'Porcupine_Health
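Because each column holds daily new cases, the same totals can be obtained without a loop using `DataFrame.sum()`, which also makes ranking straightforward. A sketch on a hypothetical toy frame (the helper name `ranked_totals` is illustrative):

```python
import pandas as pd


def ranked_totals(data, exclude=("Date", "Total")):
    """Sum each health-unit column and sort from most to fewest cases."""
    # sum() skips NaN by default, like the per-column loop in total_cities
    return data.drop(columns=list(exclude)).sum().sort_values(ascending=False)


toy = pd.DataFrame({
    "Date": ["2020-03-25", "2020-03-26"],
    "Unit_A": [1.0, 2.0],
    "Unit_B": [5.0, 4.0],
    "Unit_C": [0.0, 1.0],
    "Total": [6, 7],
})
print(ranked_totals(toy))
```

`ranked_totals(daily_no_null).head(6)` would list the six units with the most cases, and the result's `index` and `values` can serve as the `x` and `y` of the bar chart.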

In [253]:
# Bar chart of total cases per public health unit
x = list(total_sum.keys())    # Health unit names
y = list(total_sum.values())  # Total cases per health unit
fig = dict({
    "data": [{"type": "bar",
              "x": x,
              "y": y}],
    "layout" : {"title": {"text": "Total Cases Reported by Public Health Units in Ontario"}}
})
pio.show(fig)

As expected, Toronto was the region with the highest number of cases, as reported by Toronto Public Health. Other regions that also reported high case counts were: 
    - Peel Region
    - York Region 
    - Ottawa Region 
    - Durham Region 
    - Windsor and Essex Region

The regions listed above account for the majority of the cases recorded in Ontario. In its current wide layout, with one column per health unit, the data is not directly suited to plotting how cases in each region trend over time.
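Reshaping the data into a long layout, one row per date and health unit, makes per-unit trends easy to plot. The sketch below uses `DataFrame.melt` and adds a rolling mean within each unit to smooth day-to-day noise; the function name `to_long`, the column names `Health Unit`, `Cases` and `Smoothed`, and the window size are all illustrative assumptions, and the window counts reports rather than calendar days.

```python
import pandas as pd


def to_long(daily, units, window=7):
    """Reshape wide daily data into one row per (Date, health unit) with a rolling mean."""
    long_df = daily.melt(id_vars="Date", value_vars=list(units),
                         var_name="Health Unit", value_name="Cases")
    long_df["Date"] = pd.to_datetime(long_df["Date"])
    # Sort so that each unit's rows are in date order before smoothing
    long_df = long_df.sort_values(["Health Unit", "Date"]).reset_index(drop=True)
    # Rolling mean over the last `window` reports within each health unit
    long_df["Smoothed"] = (long_df.groupby("Health Unit")["Cases"]
                           .transform(lambda s: s.rolling(window, min_periods=1).mean()))
    return long_df


# Hypothetical toy frame in the wide layout of daily_change.csv
toy = pd.DataFrame({
    "Date": ["2020-03-25", "2020-03-26", "2020-03-27"],
    "Unit_A": [1.0, 3.0, 5.0],
    "Unit_B": [0.0, 2.0, 4.0],
})
long_df = to_long(toy, ["Unit_A", "Unit_B"], window=2)
print(long_df)
```

The result could then be drawn with `px.line(long_df, x="Date", y="Smoothed", color="Health Unit")`, one line per unit.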

The Governments of Canada and Ontario have been taking precautions in preparation for phase 3 of the COVID-19 pandemic. The most important step has been increasing the number of vaccination centres across Ontario. Identifying the impact these centres have had so far may be a crucial indicator, or even a predictor, of future success or failure. 

### Vaccination in Ontario

As of 10:30 on March 31 2021, according to the Government of Canada, 315,820 individuals had been fully vaccinated. 
The vaccine requires two doses, and about 2,192,252 doses have been administered across Canada. Based on the start date of the vaccination campaign, this implies that approximately 89,000 doses of the vaccine have been administered. 

Despite vaccination, the expectation is that the precautions taken before, such as wearing masks and physical distancing, continue to be followed even after vaccination. 

We would like to use data provided by the Government of Ontario on its vaccination efforts and visualize its progress over the last few months.

In [284]:
vaccine = pd.read_csv("vaccine_covid.csv")
vaccine.head()

Unnamed: 0,report_date,previous_day_doses_administered,total_doses_administered,total_doses_in_fully_vaccinated_individuals,total_individuals_fully_vaccinated
0,2020-12-24,,10756,,
1,2020-12-30,4595.0,18603,,
2,2020-12-31,5463.0,23502,,
3,2021-01-01,5415.0,28887,,
4,2021-01-02,4305.0,33191,,


As we would like to track progress over time, we will set the index to the `report_date` column.

In [268]:
vaccine_index = vaccine.set_index(pd.DatetimeIndex(vaccine['report_date']))

In [274]:
# Daily vaccine doses administered across Ontario 
fig = px.line(vaccine_index,
              x="report_date",
              y="previous_day_doses_administered",
              labels={
                     "report_date": "Date",
                     "previous_day_doses_administered": "Vaccine Doses Administered"
                 },
              title='Vaccine Doses Administered Between Dec 24 2020 - Mar 31 2021')
fig.show()

There is consistent growth in the administration of vaccines. There are occasional dips, which may be attributed to vaccine shortages and delays in distribution by the government. 
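A 7-day rolling average is one way to separate the occasional dips from the underlying trend. Because the reports are not strictly daily (the preview jumps from 2020-12-24 to 2020-12-30), this sketch uses a time-based window, `'7D'`, so that each average covers seven calendar days rather than seven rows. The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy daily doses with a gap between 2021-01-02 and 2021-01-10
doses = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.DatetimeIndex(["2021-01-01", "2021-01-02", "2021-01-10", "2021-01-11"]),
)

# Each value averages the reports in the 7 calendar days ending on that date
smoothed = doses.rolling("7D").mean()
print(smoothed)
```

On the notebook's data this would be `vaccine_index['previous_day_doses_administered'].rolling('7D').mean()`.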

In [282]:
# Take the most recent row; note that the first column name contains a
# trailing space, as it does in this CSV's header
doses_full = vaccine_index['total_doses_in_fully_vaccinated_individuals '].iloc[-1]
total_full = vaccine_index['total_individuals_fully_vaccinated'].iloc[-1]
(doses_full, total_full)

('631,639', '315,820')

The number of doses given to fully vaccinated individuals (631,639) is almost exactly twice the number of fully vaccinated individuals (315,820), which is consistent with a vaccine that requires two doses for full vaccination.
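The two values come back as strings with thousands separators ('631,639'), so they must be converted before doing arithmetic. A minimal sketch of the check follows (the helper name `parse_count` is hypothetical); alternatively, `pd.read_csv(..., thousands=',')` parses such columns as numbers when the file is loaded.

```python
def parse_count(text):
    """Convert a count written with thousands separators, e.g. '631,639', to an int."""
    return int(text.replace(",", ""))


doses_full = parse_count("631,639")   # doses given to fully vaccinated individuals
total_full = parse_count("315,820")   # fully vaccinated individuals
doses_per_person = doses_full / total_full
print(round(doses_per_person, 4))     # 2.0
```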

Overall, Ontario's vaccination progress seems promising. A possible next step would be to attempt to predict future vaccination progress. 


Based on the Government of Canada's website, large quantities of doses are expected in July 2021, so it would be logical to keep tracking progress through and beyond that time.

#### SOURCES AND REFERENCES 

- Government of Ontario announcement page : https://covid-19.ontario.ca/index.html
- Government of Ontario Testing progress : https://covid-19.ontario.ca/covid-19-test-and-testing-location-information
- Info on Population Data : https://www.statcan.gc.ca/eng/subjects-start/population_and_demography
- Government of Canada Vaccine Administration : https://health-infobase.canada.ca/covid-19/vaccine-administration/
- Government of Canada Key Administration : https://health-infobase.canada.ca/covid-19/vaccination-coverage/
- Prevention measures for COVID-19 : https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/prevention-risks.html