## Coronavirus 
- Coronaviruses are zoonotic viruses, meaning they are transmitted between animals and people.
- Symptoms include fever, cough, respiratory symptoms, and breathing difficulties.
- In severe cases, infection can cause pneumonia, severe acute respiratory syndrome (SARS), kidney failure, and even death.
- A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans.

## COVID-19
- Coronavirus Disease 2019
- Caused by the SARS-CoV-2 coronavirus.
- First identified in Wuhan, Hubei, China.
- Earliest reported symptoms occurred on 1 December 2019.
- First cases were linked to contact with the Huanan Seafood Wholesale Market, which sold live animals.
- On 30 January 2020, the WHO declared the outbreak a Public Health Emergency of International Concern.

In [1]:
# For Analysis 
import numpy as np 
import pandas as pd 
import geopandas as gpd 

# DateTime Format 
from datetime import datetime, timedelta, timezone

# Calendar heatmaps 
import calmap

# Grammar of graphics 
from plotnine import *

# Map Folium 
import folium 

# Static Plotting 
import matplotlib.pyplot as plt 
import matplotlib.dates as mdates
import seaborn as sns 

# Interactive Visualizations 
import plotly.express as px 

## Data Import

In [2]:
# Load the Excel workbook and store each sheet in a dict of DataFrames keyed by sheet name
xl_file = pd.ExcelFile("./data/data.xls")
dfs = {sheet_name: xl_file.parse(sheet_name) for sheet_name in xl_file.sheet_names}

In [3]:
# Data from each sheet can be accessed via key
keylist = list(dfs.keys())

In [4]:
# Examine the sheet name 
keylist[1:10]

['2020-03-12-05-30',
 '2020-03-12-04-30',
 '2020-03-12-03-30',
 '2020-03-12-02-00',
 '2020-03-12-01-00',
 '2020-03-12-00-30',
 '2020-03-12-00-00',
 '2020-03-11-22-30',
 '2020-03-11-21-00']
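
The sheet names look like timestamps in `YYYY-MM-DD-HH-MM` form. Assuming that pattern holds for every sheet (the notebook does not confirm it), a minimal sketch for turning them into `datetime` objects and ordering them from newest to oldest might look like this. `parse_sheet_name` and `sample_names` are hypothetical names introduced only for illustration.

```python
from datetime import datetime


def parse_sheet_name(name):
    """Parse a sheet name such as '2020-03-12-05-30' into a datetime.

    Assumes the 'YYYY-MM-DD-HH-MM' pattern seen in the output above.
    """
    return datetime.strptime(name, "%Y-%m-%d-%H-%M")


# A few of the sheet names listed above, deliberately shuffled
sample_names = ["2020-03-11-22-30", "2020-03-12-05-30", "2020-03-12-00-00"]

# Sort newest first, using the parsed timestamp as the sort key
newest_first = sorted(sample_names, key=parse_sheet_name, reverse=True)
print(newest_first)
```

Sorting on the parsed value rather than on position in the workbook avoids relying on the sheets being stored in any particular order.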

## Data Exploring

In [5]:
# Examine first few rows 
dfs[keylist[0]].head(20)

Unnamed: 0,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
0,Hubei,Mainland China,3/12/2020 06:00,67781,3056,50318
1,Guangdong,Mainland China,3/12/2020 06:00,1356,8,1289
2,Zhejiang,Mainland China,3/12/2020 06:00,1215,1,1197
3,Shandong,Mainland China,3/12/2020 06:00,760,6,734
4,Henan,Mainland China,3/12/2020 06:00,1273,22,1249
5,Anhui,Mainland China,3/12/2020 06:00,990,6,984
6,Jiangxi,Mainland China,3/12/2020 06:00,935,1,934
7,Hunan,Mainland China,3/12/2020 06:00,1018,4,999
8,Heilongjiang,Mainland China,3/12/2020 06:00,482,13,440
9,Sichuan,Mainland China,3/12/2020 06:00,539,3,496


In [6]:
# Check the shape (rows, columns) of the most recent sheet 
dfs[keylist[0]].shape

(213, 6)

In [7]:
# Basic info about dataset 
dfs[keylist[0]].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213 entries, 0 to 212
Data columns (total 6 columns):
Province/State    96 non-null object
Country/Region    213 non-null object
Last Update       213 non-null object
Confirmed         213 non-null int64
Deaths            213 non-null int64
Recovered         213 non-null int64
dtypes: int64(3), object(3)
memory usage: 10.1+ KB


In [8]:
# Numerical summary of dataset
dfs[keylist[0]].describe()

Unnamed: 0,Confirmed,Deaths,Recovered
count,213.0,213.0,213.0
mean,599.694836,22.13615,320.71831
std,4790.25079,218.429915,3454.529561
min,1.0,0.0,0.0
25%,3.0,0.0,0.0
50%,17.0,0.0,0.0
75%,93.0,1.0,6.0
max,67781.0,3056.0,50318.0


In [9]:
# Check Missing Values 
dfs[keylist[0]].isnull().sum() 

Province/State    117
Country/Region      0
Last Update         0
Confirmed           0
Deaths              0
Recovered           0
dtype: int64

## Data Cleaning

In [10]:
# Data Cleaning 
for key, df in dfs.items():
    # Fill missing counts with 0 (assign the result instead of a chained inplace fillna)
    df = df.fillna({'Confirmed': 0, 'Deaths': 0, 'Recovered': 0})

    # Convert cases into integer
    df = df.astype({'Confirmed': 'int64', 'Deaths': 'int64', 'Recovered': 'int64'})

    # Rename Mainland China and map Australian states to their capital cities
    df = df.replace({'Country/Region': 'Mainland China'}, 'China')
    df = df.replace({'Country/Region': 'Queensland'}, 'Brisbane')
    df = df.replace({'Country/Region': 'New South Wales'}, 'Sydney')
    df = df.replace({'Country/Region': 'Victoria'}, 'Melbourne')
    df = df.replace({'Province/State': 'South Australia'}, 'Adelaide')

    # DateTime Format: zero-pad the month, then parse into datetime objects
    df['Last Update'] = '0' + df['Last Update']
    df['Date_last_updated'] = [datetime.strptime(d, '%m/%d/%Y %H:%M') for d in df['Last Update']]

    # Store the cleaned sheet back under the same key
    dfs[key] = df
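
The cleaning steps can also be packaged as a function that returns a cleaned copy of a single sheet, which makes them easy to try on a small frame. The sketch below is an illustration only: `clean_sheet`, `COUNT_COLS`, and the toy rows are hypothetical and not part of the original notebook, and only the `Mainland China` rename is reproduced.

```python
from datetime import datetime

import pandas as pd

COUNT_COLS = ["Confirmed", "Deaths", "Recovered"]


def clean_sheet(df):
    """Return a cleaned copy of one sheet (hypothetical helper)."""
    # Missing counts become 0, then all counts are cast to integers
    out = df.fillna({col: 0 for col in COUNT_COLS})
    out = out.astype({col: "int64" for col in COUNT_COLS})
    # Rename 'Mainland China' within the Country/Region column only
    out["Country/Region"] = out["Country/Region"].replace({"Mainland China": "China"})
    # strptime's %m and %d accept non-padded values such as '3/12/2020'
    out["Date_last_updated"] = [
        datetime.strptime(d, "%m/%d/%Y %H:%M") for d in out["Last Update"]
    ]
    return out


# Toy rows (illustrative values, not taken from the workbook)
toy = pd.DataFrame({
    "Province/State": ["Hubei", None],
    "Country/Region": ["Mainland China", "Italy"],
    "Last Update": ["3/12/2020 06:00", "3/11/2020 21:00"],
    "Confirmed": [10.0, 5.0],
    "Deaths": [1.0, None],
    "Recovered": [None, 2.0],
})

cleaned = clean_sheet(toy)
print(cleaned.dtypes)
```

Because `fillna` and `astype` return new frames, the input sheet is left untouched and the function can be re-run safely.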

In [11]:
# Take a look at cleaned data 
dfs[keylist[0]].head()

Unnamed: 0,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Date_last_updated
0,Hubei,China,03/12/2020 06:00,67781,3056,50318,2020-03-12 06:00:00
1,Guangdong,China,03/12/2020 06:00,1356,8,1289,2020-03-12 06:00:00
2,Zhejiang,China,03/12/2020 06:00,1215,1,1197,2020-03-12 06:00:00
3,Shandong,China,03/12/2020 06:00,760,6,734,2020-03-12 06:00:00
4,Henan,China,03/12/2020 06:00,1273,22,1249,2020-03-12 06:00:00


## Confirmed Cases

In [12]:
# Construct new dataframe called df_confirmed for line plot 
DateList = []
ChinaList = []
OtherList = []

for key, df in dfs.items():
    # Sum confirmed cases per Country/Region
    dfTemp = df.groupby(['Country/Region'])['Confirmed'].sum()
    # Turn the grouped Series into a DataFrame of region names and counts
    dfTemp = pd.DataFrame({'Code':dfTemp.index, 'Confirmed':dfTemp.values})
    # Sort by Confirmed, largest first
    dfTemp = dfTemp.sort_values(by='Confirmed', ascending=False).reset_index(drop=True)
    # Record this sheet's timestamp, taken from its first row
    DateList.append(df['Date_last_updated'][0])
    # The largest count is taken to be China (assumes China ranks first in every sheet)
    ChinaList.append(dfTemp['Confirmed'][0])
    # All remaining regions are summed as other locations
    OtherList.append(dfTemp['Confirmed'][1:].sum())

# Build df_confirmed from DateList, ChinaList, OtherList 
# Column names: Date, Mainland China, Other locations 
df_confirmed = pd.DataFrame({'Date':DateList,
                             'Mainland China':ChinaList,
                             'Other locations':OtherList})  


# Keep one row per calendar day (the daily maximum), add a Total column, and reset the index 
df_confirmed['date_day'] = [d.date() for d in df_confirmed['Date']]
df_confirmed = df_confirmed.groupby(by=df_confirmed['date_day'], sort=False).transform('max').drop_duplicates(['Date'])
df_confirmed['Total'] = df_confirmed['Mainland China'] + df_confirmed['Other locations']
df_confirmed = df_confirmed.reset_index(drop=True)

In [13]:
# Examine confirmed dataset
df_confirmed.head()

Unnamed: 0,Date,Mainland China,Other locations,Total
0,2020-03-12 06:00:00,80793,46942,127735
1,2020-03-11 22:30:00,80791,45562,126353
2,2020-03-10 23:00:00,80778,38469,119247
3,2020-03-09 22:30:00,80754,33717,114471
4,2020-03-08 22:00:00,80735,29361,110096


## Recovered Cases

In [14]:
# Construct new dataframe called df_recovered for line plot 
DateList = []
ChinaList = []
OtherList = []

for key, df in dfs.items():
    # Sum recovered cases per Country/Region
    dfTemp = df.groupby(['Country/Region'])['Recovered'].sum()
    # Turn the grouped Series into a DataFrame of region names and counts
    dfTemp = pd.DataFrame({'Code':dfTemp.index, 'Recovered':dfTemp.values})
    # Sort by Recovered, largest first
    dfTemp = dfTemp.sort_values(by='Recovered', ascending=False).reset_index(drop=True)
    # Record this sheet's timestamp, taken from its first row
    DateList.append(df['Date_last_updated'][0])
    # The largest count is taken to be China (assumes China ranks first in every sheet)
    ChinaList.append(dfTemp['Recovered'][0])
    # All remaining regions are summed as other locations
    OtherList.append(dfTemp['Recovered'][1:].sum())

# Build df_recovered from DateList, ChinaList, OtherList 
# Column names: Date, Mainland China, Other locations 
df_recovered = pd.DataFrame({'Date':DateList,
                             'Mainland China':ChinaList,
                             'Other locations':OtherList})  


# Keep one row per calendar day (the daily maximum), add a Total column, and reset the index 
df_recovered['date_day'] = [d.date() for d in df_recovered['Date']]
df_recovered = df_recovered.groupby(by=df_recovered['date_day'], sort=False).transform('max').drop_duplicates(['Date'])
df_recovered['Total'] = df_recovered['Mainland China'] + df_recovered['Other locations']
df_recovered = df_recovered.reset_index(drop=True)

In [15]:
df_recovered.head()

Unnamed: 0,Date,Mainland China,Other locations,Total
0,2020-03-12 06:00:00,62811,5502,68313
1,2020-03-11 22:30:00,62780,5492,68272
2,2020-03-10 23:00:00,61459,5134,66593
3,2020-03-09 22:30:00,59884,4168,64052
4,2020-03-08 22:00:00,58587,3669,62256


## Death Cases

In [16]:
# Construct new dataframe called df_deaths for line plot 
DateList = []
ChinaList = []
OtherList = []

for key, df in dfs.items():
    # Sum deaths per Country/Region
    dfTemp = df.groupby(['Country/Region'])['Deaths'].sum()
    # Turn the grouped Series into a DataFrame of region names and counts
    dfTemp = pd.DataFrame({'Code':dfTemp.index, 'Deaths':dfTemp.values})
    # Sort by Deaths, largest first
    dfTemp = dfTemp.sort_values(by='Deaths', ascending=False).reset_index(drop=True)
    # Record this sheet's timestamp, taken from its first row
    DateList.append(df['Date_last_updated'][0])
    # The largest count is taken to be China (assumes China ranks first in every sheet)
    ChinaList.append(dfTemp['Deaths'][0])
    # All remaining regions are summed as other locations
    OtherList.append(dfTemp['Deaths'][1:].sum())

# Build df_deaths from DateList, ChinaList, OtherList 
# Column names: Date, Mainland China, Other locations 
df_deaths = pd.DataFrame({'Date':DateList,
                          'Mainland China':ChinaList,
                          'Other locations':OtherList})  


# Keep one row per calendar day (the daily maximum), add a Total column, and reset the index 
df_deaths['date_day'] = [d.date() for d in df_deaths['Date']]
df_deaths = df_deaths.groupby(by=df_deaths['date_day'], sort=False).transform('max').drop_duplicates(['Date'])
df_deaths['Total'] = df_deaths['Mainland China'] + df_deaths['Other locations']
df_deaths = df_deaths.reset_index(drop=True)

In [17]:
df_deaths.head()

Unnamed: 0,Date,Mainland China,Other locations,Total
0,2020-03-12 06:00:00,3169,1546,4715
1,2020-03-11 22:30:00,3169,1463,4632
2,2020-03-10 23:00:00,3158,1140,4298
3,2020-03-09 22:30:00,3136,890,4026
4,2020-03-08 22:00:00,3119,711,3830
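
The confirmed, recovered, and deaths cells above repeat the same steps with a different column name. One possible refactoring, sketched here under assumptions, is a single helper parameterised by column. `build_timeline`, `china_label`, and the toy sheets are hypothetical; unlike the cells above, this variant selects China by name rather than by taking the largest count, and it omits the per-day deduplication step.

```python
import pandas as pd


def build_timeline(sheets, column, china_label="China"):
    """Summarise one count column per sheet as China vs. other locations.

    `sheets` maps sheet name -> cleaned DataFrame (the same shape as `dfs`).
    Hypothetical helper, not part of the original notebook.
    """
    rows = []
    for df in sheets.values():
        # Sum the chosen column per Country/Region
        by_country = df.groupby("Country/Region")[column].sum()
        china = int(by_country.get(china_label, 0))
        rows.append({
            "Date": df["Date_last_updated"].iloc[0],
            "Mainland China": china,
            "Other locations": int(by_country.sum()) - china,
        })
    out = pd.DataFrame(rows)
    out["Total"] = out["Mainland China"] + out["Other locations"]
    return out


# Toy sheets (illustrative values, not taken from the workbook)
toy_sheets = {
    "2020-03-12-05-30": pd.DataFrame({
        "Country/Region": ["China", "China", "Italy"],
        "Confirmed": [100, 20, 30],
        "Date_last_updated": pd.to_datetime(["2020-03-12 06:00"] * 3),
    }),
    "2020-03-11-22-30": pd.DataFrame({
        "Country/Region": ["China", "Italy"],
        "Confirmed": [110, 25],
        "Date_last_updated": pd.to_datetime(["2020-03-11 22:30"] * 2),
    }),
}

timeline = build_timeline(toy_sheets, "Confirmed")
print(timeline)
```

Selecting China by label keeps the split correct even if another country's count overtakes China's in a later sheet, which ranking by size would not.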


## Total Confirmed, Recovered and Death Cases 

In [22]:
# Total No. of cases of updated dataset 
confirmed_cases = dfs[keylist[0]]['Confirmed'].sum()
deaths_cases = dfs[keylist[0]]['Deaths'].sum()
recovered_cases = dfs[keylist[0]]['Recovered'].sum()

In [23]:
# Print the Total no. of cases in first dataset
print(f"Confirmed = {confirmed_cases}")
print(f"Recovered = {recovered_cases}")
print(f"Deaths = {deaths_cases}")

Confirmed = 127735
Recovered = 68313
Deaths = 4715


## Active Cases 
$\text{Active Cases} = \text{Total Confirmed} - \text{Total Recovered} - \text{Total Deaths}$

In [27]:
active_cases = dfs[keylist[0]]['Confirmed'].sum() - dfs[keylist[0]]['Recovered'].sum() - dfs[keylist[0]]['Deaths'].sum()
print(f"Active Cases = {active_cases}")

Active Cases = 54707
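
As a quick arithmetic check of the formula, the totals printed earlier (127735 confirmed, 68313 recovered, 4715 deaths) can be plugged in directly. The function name `active_cases_from_totals` is a hypothetical one chosen for this sketch.

```python
def active_cases_from_totals(confirmed, recovered, deaths):
    """Active cases = confirmed - recovered - deaths."""
    return confirmed - recovered - deaths


# Totals taken from the printed output above
check = active_cases_from_totals(confirmed=127735, recovered=68313, deaths=4715)
print(f"Active Cases = {check}")  # Active Cases = 54707
```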


## Case Fatality Rate 
$\text{Case Fatality Rate (CFR)} = \frac{\text{Deaths}}{\text{Confirmed Cases}} \times 100$

In [None]:
# Scale to a percentage first, then round to two decimals
CFR = round(deaths_cases / confirmed_cases * 100, 2)
print(CFR)
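
A worked check of the CFR formula with the totals printed earlier (4715 deaths out of 127735 confirmed cases) is sketched below. `case_fatality_rate` and its `ndigits` parameter are hypothetical names used only for this illustration.

```python
def case_fatality_rate(deaths, confirmed, ndigits=2):
    """Return deaths / confirmed as a percentage, rounded to `ndigits`."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    # Multiply by 100 before rounding so the fraction is not rounded away
    return round(deaths / confirmed * 100, ndigits)


# Totals taken from the printed output above
cfr = case_fatality_rate(deaths=4715, confirmed=127735)
print(f"CFR = {cfr}%")  # CFR = 3.69%
```

The order of operations matters: rounding the raw ratio (about 0.037) to an integer before scaling would give 0.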

In [None]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border: 1px black solid !important;
    color: black !important;
}
</style>

- Plotly Template: https://plot.ly/python/templates/