### Covid-19 Data 
The values displayed in the table are provided by the Public Health Infobase, managed by the Health Promotion and Chronic Disease Prevention Branch (HPCDPB) of the Public Health Agency of Canada (PHAC). The values in the dataset were up to date as at 09/01/2021.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
### Importing the Dataset for Covid-19 alongside its Data dictionary

df = pd.read_csv('covid19-download.csv',index_col='pruid')
df_dataDict = pd.read_csv('covid19-data-dictionary.csv',index_col='Column Header')

In [None]:
pd.set_option('display.max_columns', 34)
pd.set_option('display.max_rows', 34)

In [None]:
### Covid-19 Data Dictionary

df_dataDict.head(20)

In [None]:
### Display the first 10 rows of the Covid-19 dataset

df.head(10)

In [None]:
df

In [None]:
# Displaying the columns in the dataset

df.columns

In [None]:
# Removing prnameFR column as it is a duplicate of prname in French 
df.drop('prnameFR', axis=1, inplace=True)


In [None]:
data = df['prname'].value_counts()
data

In [None]:
df['prname'].unique()

In [None]:
# Confirmed-case rows for Alberta (pruid 48) and British Columbia (pruid 59)
df2 = df.loc[48,['prname','date','numconf']]
df3 = df.loc[59,['prname','date','numconf']]

The cumulative number of confirmed cases in Alberta (pruid 48) as at 08/01/2021 is 109,652.
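The figure above is the cumulative `numconf` value on the most recent date for pruid 48. A minimal sketch of reading that value for any province, assuming dates are stored as ISO `YYYY-MM-DD` strings (as the date filters later in the notebook suggest) so that sorting them as text also sorts them chronologically. `latest_confirmed` is a hypothetical helper and the rows are invented toy values, not PHAC data.

```python
import pandas as pd

# Invented rows that mimic the layout of covid19-download.csv
# (pruid index, prname, date, numconf); the numbers are NOT real data.
df = pd.DataFrame(
    {
        "pruid": [48, 48, 48, 35, 35, 35],
        "prname": ["Alberta"] * 3 + ["Ontario"] * 3,
        "date": ["2021-01-06", "2021-01-07", "2021-01-08"] * 2,
        "numconf": [100, 110, 125, 200, 230, 260],
    }
).set_index("pruid")


def latest_confirmed(frame: pd.DataFrame, pruid: int) -> tuple:
    """Hypothetical helper: return (date, numconf) for a province's most recent row."""
    # Passing a list keeps the result a DataFrame even if only one row matches
    rows = frame.loc[[pruid]]
    # ISO date strings sort chronologically, so the last row is the latest date
    latest = rows.sort_values("date").iloc[-1]
    return latest["date"], int(latest["numconf"])


print(latest_confirmed(df, 48))
```

Because `numconf` is cumulative, taking the value on the latest date gives the same answer as taking the maximum, but it does not rely on the counts never being revised downwards.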

In [None]:
# Latest cumulative confirmed count for Alberta (pruid 48; pruid 35 is Ontario)
df.loc[48, 'numconf'].max()

In [None]:
filt1 = (df['date'] == '2020-12-31')
filt2 = (df['date'] == '2021-01-08')

In [None]:
df_2020 = df.loc[filt1,['prname','numtotal']]
df_2020.rename(columns={'numtotal':'numtotal_2020'},inplace=True)

In [None]:
df_2021 = df.loc[filt2,['prname','numtotal']]
df_2021.rename(columns={'numtotal':'numtotal_2021'},inplace=True)

In [None]:
df_new = pd.merge(df_2020, df_2021, on='prname')
df_new

In [None]:
# Percentage increase of the cumulative total between 31/12/2020 and 08/01/2021,
# measured relative to the earlier (31/12/2020) total
df_new['Percentage Increase'] = ((df_new['numtotal_2021'] - df_new['numtotal_2020']) / df_new['numtotal_2020']) * 100

In [None]:
df_new.drop(13,axis =0,inplace=True)


In [None]:
#df_new.set_index('prname',inplace=True)


In [None]:
df_new

In [None]:
df_new.to_csv('Modified.csv')

In [None]:
# Column Chart for % Increase in Covid-19 Cases in Canada 

plt.figure(figsize=(20,10))
plt.title('% Increase of Covid-19 Cases in Canada as at 08/01/2021')
plt.bar(df_new['prname'].iloc[1:13], df_new['Percentage Increase'].iloc[1:13], width=0.8, align='center')

plt.ylabel('% Increase of Covid-19 Cases')
plt.xlabel('Provinces')
plt.show()



#### Summary
As seen in the figure above, New Brunswick appears to have had the highest percentage increase in Covid-19 cases in Canada between 31/12/2020 and 08/01/2021. The Northwest Territories and Nunavut recorded no significant change.
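The usual definition of a percentage increase divides the change by the earlier value. A small sketch of that formula on invented numbers; the region names and totals below are placeholders, not PHAC data.

```python
import pandas as pd

# Invented cumulative totals on the two comparison dates
totals = pd.DataFrame(
    {
        "prname": ["Province A", "Province B", "Territory C"],
        "numtotal_2020": [500, 2000, 24],
        "numtotal_2021": [650, 2300, 24],
    }
)

# Percentage increase relative to the earlier total:
# (later - earlier) / earlier * 100
totals["Percentage Increase"] = (
    (totals["numtotal_2021"] - totals["numtotal_2020"]) / totals["numtotal_2020"] * 100
)
print(totals)
```

Province A grows by 150 cases on a base of 500 (30%), Province B by 300 on a base of 2000 (15%), and Territory C not at all (0%). A region with few cases can therefore show a large percentage increase from a small absolute change, which is worth keeping in mind when reading the bar chart.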

In [None]:
plt.figure(figsize=(15,8))

plt.title('Cumulative Covid-19 cases by province, 31/12/2020 vs 08/01/2021')
plt.plot(df_new['prname'].iloc[1:13], df_new['numtotal_2020'].iloc[1:13],'b.-', label='Total as at 31/12/2020')
plt.plot(df_new['prname'].iloc[1:13], df_new['numtotal_2021'].iloc[1:13],'r.-', label='Total as at 08/01/2021')

plt.ylabel('Total number of Covid-19 Cases Confirmed')
plt.xlabel('Provinces')

plt.legend()
plt.show()


#### Summary
The cumulative number of Covid-19 cases recorded up to the end of 2020 (31/12/2020) is compared with the cumulative number recorded as at 08/01/2021. Quebec shows the largest increase, followed by Ontario and then Alberta.
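Reading a ranking off a line chart can be imprecise; computing the absolute increase per province gives the order directly. A sketch on invented data (province names `P1` to `P3` are placeholders), using `merge` with `on=` and `suffixes=` in place of the two `rename` calls earlier in the notebook. This is a swapped-in but equivalent way to build the same two columns.

```python
import pandas as pd

# Invented cumulative totals for two dates, in the same long format as df
df = pd.DataFrame(
    {
        "prname": ["P1", "P2", "P3"] * 2,
        "date": ["2020-12-31"] * 3 + ["2021-01-08"] * 3,
        "numtotal": [1000, 800, 300, 1500, 1100, 350],
    }
)

before = df.loc[df["date"] == "2020-12-31", ["prname", "numtotal"]]
after = df.loc[df["date"] == "2021-01-08", ["prname", "numtotal"]]

# Join the two snapshots on the province name; suffixes tell the columns apart
change = before.merge(after, on="prname", suffixes=("_2020", "_2021"))
change["increase"] = change["numtotal_2021"] - change["numtotal_2020"]

# Largest absolute increase first
ranking = change.sort_values("increase", ascending=False)
print(ranking)
```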

In [None]:
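# Covid-19 trend in Canada (pruid 1 is the Canada-wide aggregate)
canada = df.loc[1]

plt.figure(figsize=(10,5))
plt.title('Covid-19 trend in Canada as at 08/01/2021')
# Convert the date strings to real dates so matplotlib spaces and labels the axis itself
plt.plot(pd.to_datetime(canada['date']), canada['numconf'],'b.-', label='Canada')

plt.ylabel('Number of Covid-19 Cases Confirmed')
plt.xlabel('Date')

# Tilt the date labels so they do not overlap
plt.gcf().autofmt_xdate()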
plt.legend()
plt.show()


#### Summary
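The figure above shows the cumulative number of confirmed Covid-19 cases in Canada. Because the count is cumulative it cannot fall, so the steepness of the curve is what shows how quickly new cases are being added. At this time the curve shows no sign of flattening and stands at its highest value of 644,348 confirmed cases.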
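Since `numconf` only ever rises, its first difference (new cases per reporting interval) is a more direct way to see whether growth is slowing. A minimal sketch on invented values, assuming one row per reporting date for a single region; the reporting interval in the real file may not be exactly one day.

```python
import pandas as pd

# Invented cumulative confirmed counts for one region, one row per reporting date
canada = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-05", "2021-01-06", "2021-01-07", "2021-01-08"]),
        "numconf": [600, 620, 645, 675],
    }
).set_index("date")

# First difference = new cases since the previous report (NaN for the first row)
canada["new_cases"] = canada["numconf"].diff()
print(canada)
```

Here the new-case counts rise from 20 to 25 to 30, so the cumulative curve is getting steeper even though it never "dips".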

In [None]:
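# Trend of Covid-19 cases in Alberta (pruid 48) and Ontario (pruid 35)
alberta = df.loc[48]
ontario = df.loc[35]

plt.figure(figsize=(10,5))

plt.title('Covid-19 trend in Alberta and Ontario (2020-2021)')
# Real dates keep both lines aligned even if one province has rows the other lacks
plt.plot(pd.to_datetime(alberta['date']), alberta['numconf'], label='Alberta')
plt.plot(pd.to_datetime(ontario['date']), ontario['numconf'], label='Ontario')

plt.ylabel('Number of Covid-19 Cases Confirmed')
plt.xlabel('Date')

plt.gcf().autofmt_xdate()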
plt.legend()
plt.show()

#### Summary
The figure above compares the number of confirmed cases in Alberta and Ontario. Ontario has clearly recorded more confirmed cases, with Alberta following behind.
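Another way to compare two provinces is to reshape the long table (one row per province and date) into a wide one with a column per province, so both series share the same date index and can be subtracted directly. A sketch on invented rows using `DataFrame.pivot`; the values are placeholders.

```python
import pandas as pd

# Invented rows in the notebook's long format: one row per (province, date)
df = pd.DataFrame(
    {
        "pruid": [48, 48, 35, 35],
        "prname": ["Alberta", "Alberta", "Ontario", "Ontario"],
        "date": ["2021-01-07", "2021-01-08"] * 2,
        "numconf": [100, 110, 200, 215],
    }
).set_index("pruid")

# One column per province, indexed by date
wide = df.pivot(index="date", columns="prname", values="numconf")
print(wide)

# Gap between the two provinces on each date
gap = wide["Ontario"] - wide["Alberta"]
print(gap)
```

With the data in this shape, `wide.plot()` would draw both provinces as lines on the same date axis.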

In [None]:
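# Histogram for the distribution of Covid-19 cases in Canada
# Every row is counted: all dates, and the Canada-wide rows (pruid 1) as well as provinces and territories
plt.figure(figsize=(10,5))
plt.title('Histogram Distribution for Total Number of Covid-19 Cases in Canada')

plt.hist(df['numtotal'])

plt.xlabel('Total Number of Covid-19 Cases')
plt.ylabel('Number of rows')
plt.show()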

#### Summary
A histogram is plotted to show the distribution of the total number of Covid-19 cases across all rows of the dataset, covering every reporting date for the provinces, the territories and Canada as a whole. The distribution is strongly right-skewed: most of the values lie between 0 and 50,000.
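A histogram shows where values are concentrated, not where the mean lies, and in a right-skewed sample the two can be far apart. A small sketch using `numpy.histogram`, which with `bins=10` uses the same default of 10 equal-width bins as `plt.hist`; the values are invented stand-ins for `df['numtotal']`.

```python
import numpy as np
import pandas as pd

# Invented, right-skewed totals standing in for df['numtotal']
numtotal = pd.Series([0, 5, 12, 40, 80, 150, 300, 900, 5000, 20000])

# 10 equal-width bins over [min, max], matching plt.hist's default
counts, edges = np.histogram(numtotal, bins=10)
print(counts)
print(edges[:2])  # edges of the first (tallest) bin

# The mean is pulled up by the few large values; the median is not
print(numtotal.mean(), numtotal.median())
```

Eight of the ten values fall in the first bin, yet the mean (2648.7) is far above the median (115.0), which is why the tallest bar should be read as "where most values lie" rather than as the mean.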