# COVID-19 Data Analysis & Visualization

## What is COVID-19?

> Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus.
Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment.  Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness.
The best way to prevent and slow down transmission is to be well informed about the COVID-19 virus, the disease it causes and how it spreads. Protect yourself and others from infection by washing your hands or using an alcohol based rub frequently and not touching your face. 
The COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, so it’s important that you also practice respiratory etiquette (for example, by coughing into a flexed elbow).

![Coronavirus Image](https://cdn.pixabay.com/photo/2020/03/16/16/29/virus-4937553_960_720.jpg)

# Importing Libraries

In [1]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import plotly.express as px
import plotly.graph_objects as go

from IPython.display import display, HTML

from ipywidgets import interact
import ipywidgets as widgets
import folium

# Loading the data from source

In [2]:
# loading data right from the source:
death_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
country_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')

# Understanding the data

In [3]:
# Checking some of the rows from death_df dataframe
death_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,2782,2792,2802,2812,2836,2855,2869,2881,2899,2919
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,2441,2442,2444,2445,2447,2447,2447,2448,2449,2450
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,3405,3411,3418,3426,3433,3440,3448,3455,3460,3465
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,127,127,127,127,127,127,127,127,127,127
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,709,715,725,731,735,742,745,749,757,764


In [4]:
# Checking some of the rows from confirmed_df dataframe
confirmed_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,65080,65486,65728,66275,66903,67743,68366,69130,70111,70761
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,132153,132176,132209,132215,132229,132244,132264,132285,132297,132309
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,126434,126651,126860,127107,127361,127646,127926,128198,128456,128725
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13569,13569,13569,13569,13664,13671,13682,13693,13693,13693
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,31909,32149,32441,32623,32933,33338,33607,33944,34180,34366


In [5]:
# Checking some of the rows from recovered_df dataframe
recovered_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,55790,55889,56035,56295,56518,56711,56962,57119,57281,57450
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,128425,128601,128732,128826,128907,128978,129042,129097,129215,129308
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,88066,88208,88346,88497,88672,88861,89040,89232,89419,89625
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13234,13234,13234,13234,13263,13381,13405,13416,13416,13416
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,26513,26775,26778,27087,27204,27467,27529,27577,27646,27766


In [6]:
# Checking some of the rows from country_df dataframe
country_df.head()

Unnamed: 0,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Incident_Rate,People_Tested,People_Hospitalized,Mortality_Rate,UID,ISO3
0,Afghanistan,2021-05-31 10:24:02,33.93911,67.709953,70761.0,2919.0,57450.0,10392.0,181.772452,,,4.125154,4,AFG
1,Albania,2021-05-31 10:24:02,41.1533,20.1683,132309.0,2450.0,129308.0,551.0,4597.574536,,,1.851726,8,ALB
2,Algeria,2021-05-31 10:24:02,28.0339,1.6596,128725.0,3465.0,89625.0,35635.0,293.5506,,,2.691785,12,DZA
3,Andorra,2021-05-31 10:24:02,42.5063,1.5218,13693.0,127.0,13416.0,150.0,17722.125154,,,0.927481,20,AND
4,Angola,2021-05-31 10:24:02,-11.2027,17.8739,34366.0,764.0,27766.0,5836.0,104.563134,,,2.223128,24,AGO


In [7]:
# Checking the size of the dataframes
print(death_df.shape)
print(confirmed_df.shape)
print(recovered_df.shape)
print(country_df.shape)

(276, 499)
(276, 499)
(261, 499)
(193, 14)
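
The shapes above show that `recovered_df` has fewer rows (261) than `confirmed_df` and `death_df` (276), so rows cannot safely be matched by position across the three frames. As a minimal sketch, assuming the difference comes from rows that exist in one file but not the other, the row keys can be compared directly. The miniature frames and the `row_keys` helper below are hypothetical and only illustrate the check.

```python
import pandas as pd

# Hypothetical miniature frames in the same layout as the raw time series.
confirmed = pd.DataFrame({
    "Province/State": ["Alberta", "Ontario", None],
    "Country/Region": ["Canada", "Canada", "Peru"],
})
recovered = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Canada", "Peru"],
})

def row_keys(df):
    """Return the set of (state, country) pairs that identify each row."""
    states = df["Province/State"].fillna("")  # missing state -> empty string
    return set(zip(states, df["Country/Region"]))

# Keys present in the confirmed frame but absent from the recovered frame.
missing = sorted(row_keys(confirmed) - row_keys(recovered))
print(missing)
```

Running the same comparison on the real frames would list the rows that account for the 276 versus 261 difference.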


In [8]:
# Checking for the data types of each column
death_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Columns: 499 entries, Province/State to 5/30/21
dtypes: float64(2), int64(495), object(2)
memory usage: 1.1+ MB


In [9]:
# Listing every column with its data type (verbose output)
death_df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Data columns (total 499 columns):
 #    Column          Dtype  
---   ------          -----  
 0    Province/State  object 
 1    Country/Region  object 
 2    Lat             float64
 3    Long            float64
 4    1/22/20         int64  
 5    1/23/20         int64  
 6    1/24/20         int64  
 7    1/25/20         int64  
 8    1/26/20         int64  
 9    1/27/20         int64  
 10   1/28/20         int64  
 11   1/29/20         int64  
 12   1/30/20         int64  
 13   1/31/20         int64  
 14   2/1/20          int64  
 15   2/2/20          int64  
 16   2/3/20          int64  
 17   2/4/20          int64  
 18   2/5/20          int64  
 19   2/6/20          int64  
 20   2/7/20          int64  
 21   2/8/20          int64  
 22   2/9/20          int64  
 23   2/10/20         int64  
 24   2/11/20         int64  
 25   2/12/20         int64  
 26   2/13/20         int64  
 27   2/14/20         in

In [10]:
# Checking for the data types of each column
confirmed_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Columns: 499 entries, Province/State to 5/30/21
dtypes: float64(2), int64(495), object(2)
memory usage: 1.1+ MB


In [11]:
# Listing every column with its data type (verbose output)
confirmed_df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Data columns (total 499 columns):
 #    Column          Dtype  
---   ------          -----  
 0    Province/State  object 
 1    Country/Region  object 
 2    Lat             float64
 3    Long            float64
 4    1/22/20         int64  
 5    1/23/20         int64  
 6    1/24/20         int64  
 7    1/25/20         int64  
 8    1/26/20         int64  
 9    1/27/20         int64  
 10   1/28/20         int64  
 11   1/29/20         int64  
 12   1/30/20         int64  
 13   1/31/20         int64  
 14   2/1/20          int64  
 15   2/2/20          int64  
 16   2/3/20          int64  
 17   2/4/20          int64  
 18   2/5/20          int64  
 19   2/6/20          int64  
 20   2/7/20          int64  
 21   2/8/20          int64  
 22   2/9/20          int64  
 23   2/10/20         int64  
 24   2/11/20         int64  
 25   2/12/20         int64  
 26   2/13/20         int64  
 27   2/14/20         in

In [12]:
# Checking for the data types of each column
recovered_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 261 entries, 0 to 260
Columns: 499 entries, Province/State to 5/30/21
dtypes: float64(2), int64(495), object(2)
memory usage: 1017.6+ KB


In [13]:
# Listing every column with its data type (verbose output)
recovered_df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 261 entries, 0 to 260
Data columns (total 499 columns):
 #    Column          Dtype  
---   ------          -----  
 0    Province/State  object 
 1    Country/Region  object 
 2    Lat             float64
 3    Long            float64
 4    1/22/20         int64  
 5    1/23/20         int64  
 6    1/24/20         int64  
 7    1/25/20         int64  
 8    1/26/20         int64  
 9    1/27/20         int64  
 10   1/28/20         int64  
 11   1/29/20         int64  
 12   1/30/20         int64  
 13   1/31/20         int64  
 14   2/1/20          int64  
 15   2/2/20          int64  
 16   2/3/20          int64  
 17   2/4/20          int64  
 18   2/5/20          int64  
 19   2/6/20          int64  
 20   2/7/20          int64  
 21   2/8/20          int64  
 22   2/9/20          int64  
 23   2/10/20         int64  
 24   2/11/20         int64  
 25   2/12/20         int64  
 26   2/13/20         int64  
 27   2/14/20         in

In [14]:
# Summarizing confirmed_df again (row count, column count, dtypes, memory usage)
confirmed_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Columns: 499 entries, Province/State to 5/30/21
dtypes: float64(2), int64(495), object(2)
memory usage: 1.1+ MB


In [15]:
# Listing every column of confirmed_df with its data type (verbose output)
confirmed_df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Data columns (total 499 columns):
 #    Column          Dtype  
---   ------          -----  
 0    Province/State  object 
 1    Country/Region  object 
 2    Lat             float64
 3    Long            float64
 4    1/22/20         int64  
 5    1/23/20         int64  
 6    1/24/20         int64  
 7    1/25/20         int64  
 8    1/26/20         int64  
 9    1/27/20         int64  
 10   1/28/20         int64  
 11   1/29/20         int64  
 12   1/30/20         int64  
 13   1/31/20         int64  
 14   2/1/20          int64  
 15   2/2/20          int64  
 16   2/3/20          int64  
 17   2/4/20          int64  
 18   2/5/20          int64  
 19   2/6/20          int64  
 20   2/7/20          int64  
 21   2/8/20          int64  
 22   2/9/20          int64  
 23   2/10/20         int64  
 24   2/11/20         int64  
 25   2/12/20         int64  
 26   2/13/20         int64  
 27   2/14/20         in

In [16]:
# Counting how many rows (provinces/states) each country has
confirmed_df['Country/Region'].value_counts()

China             34
Canada            16
United Kingdom    12
France            12
Australia          8
                  ..
Suriname           1
Peru               1
Panama             1
Jamaica            1
Niger              1
Name: Country/Region, Length: 193, dtype: int64

# Preparing the data

In [17]:
# Converting the columns names to lower case
death_df.columns = map(str.lower,death_df.columns)
confirmed_df.columns = map(str.lower,confirmed_df.columns)
recovered_df.columns = map(str.lower,recovered_df.columns)
country_df.columns = map(str.lower,country_df.columns)

In [18]:
# Renaming some of the columns for easy handling
death_df = death_df.rename(columns={'province/state': 'state', 'country/region': 'country'})
confirmed_df = confirmed_df.rename(columns={'province/state': 'state', 'country/region': 'country'})
recovered_df = recovered_df.rename(columns={'province/state': 'state', 'country/region': 'country'})
country_df = country_df.rename(columns={'country_region': 'country'})

In [19]:
death_df.head()

Unnamed: 0,state,country,lat,long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,2782,2792,2802,2812,2836,2855,2869,2881,2899,2919
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,2441,2442,2444,2445,2447,2447,2447,2448,2449,2450
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,3405,3411,3418,3426,3433,3440,3448,3455,3460,3465
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,127,127,127,127,127,127,127,127,127,127
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,709,715,725,731,735,742,745,749,757,764


# Feature Engineering
### Confirmed/Death/Recovered New Cases

In [20]:
confirmed_df.head()

Unnamed: 0,state,country,lat,long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,65080,65486,65728,66275,66903,67743,68366,69130,70111,70761
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,132153,132176,132209,132215,132229,132244,132264,132285,132297,132309
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,126434,126651,126860,127107,127361,127646,127926,128198,128456,128725
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13569,13569,13569,13569,13664,13671,13682,13693,13693,13693
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,31909,32149,32441,32623,32933,33338,33607,33944,34180,34366


In [21]:
# Creating a new feature "NewCases" to capture the difference between the last/latest day count and 2nd last day count
confirmed_df.insert(4,'NewCases',0) # Insert a column named 'NewCases' at column position 4, initialized to 0
confirmed_df['NewCases'] = confirmed_df.iloc[:,-1] - confirmed_df.iloc[:,-2] # For every row, subtract the second-to-last column from the last column

In [22]:
confirmed_df.head()

Unnamed: 0,state,country,lat,long,NewCases,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,650,0,0,0,0,0,...,65080,65486,65728,66275,66903,67743,68366,69130,70111,70761
1,,Albania,41.1533,20.1683,12,0,0,0,0,0,...,132153,132176,132209,132215,132229,132244,132264,132285,132297,132309
2,,Algeria,28.0339,1.6596,269,0,0,0,0,0,...,126434,126651,126860,127107,127361,127646,127926,128198,128456,128725
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13569,13569,13569,13569,13664,13671,13682,13693,13693,13693
4,,Angola,-11.2027,17.8739,186,0,0,0,0,0,...,31909,32149,32441,32623,32933,33338,33607,33944,34180,34366


In [23]:
# Creating a new feature "NewCases" to capture the difference between the last/latest day count and 2nd last day count
recovered_df.insert(4,'NewCases',0)
recovered_df['NewCases'] = recovered_df.iloc[:,-1] - recovered_df.iloc[:,-2]

In [24]:
# Creating a new feature "NewCases" to capture the difference between the last/latest day count and 2nd last day count
death_df.insert(4,'NewCases',0)
death_df['NewCases'] = death_df.iloc[:,-1] - death_df.iloc[:,-2]
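
The `NewCases` column above captures only the most recent day-over-day change. As a hedged sketch, the same idea can be extended to every date with `DataFrame.diff(axis=1)`. The toy frame and the name `daily_new` are illustrative assumptions and are not part of the notebook's data.

```python
import pandas as pd

# Toy cumulative counts in the same wide layout as the time series
# (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "country": ["A", "B"],
    "1/22/20": [0, 1],
    "1/23/20": [2, 1],
    "1/24/20": [5, 4],
})

date_cols = toy.columns[1:]  # every column after the metadata column

# Day-over-day change along each row; the first date has no predecessor,
# so its NaN is replaced with 0 before casting back to integers.
daily_new = toy[date_cols].diff(axis=1).fillna(0).astype(int)
daily_new.insert(0, "country", toy["country"])
print(daily_new)
```

The last date column of `daily_new` corresponds to the `NewCases` value computed in the cells above.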

## Overall Worldwide Counts

In [25]:
# Summing up the total confirmed cases across countries
confirmed_total = country_df['confirmed'].sum()
confirmed_total

170384848.0

In [26]:
# Summing up the total deaths across countries
deaths_total = country_df['deaths'].sum()
deaths_total

3542587.0

In [27]:
# Summing up the total recovered cases across countries
recovered_total = country_df['recovered'].sum()
recovered_total

107683365.0

In [28]:
# Summing up the total active cases across countries
active_total = country_df['active'].sum()
active_total

26492923.0

In [29]:
# displaying the current total stats

display(HTML("<div style = 'background-color: #504e4e; padding: 32px '>" +
             "<span style='color: #fff; font-size:32px;'> Confirmed: "  + str(confirmed_total) +"</span>" +
             "<span style='color: red; font-size:32px;margin-left:22px;'> Deaths: " + str(deaths_total) + "</span>"+
             "<span style='color: lightgreen; font-size:32px; margin-left:22px;'> Recovered: " + str(recovered_total) + "</span>"+
             "<span style='color: #fff; font-size:32px; margin-left:22px;'> Active: "  + str(active_total) +"</span>" +
             "</div>")
       )
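
The totals above are floats, so `str()` renders them with a trailing `.0` (for example `170384848.0`). A small sketch of one way to format them for display follows; the helper name `format_count` is an assumption introduced here and does not appear in the notebook.

```python
def format_count(value):
    """Render a numeric total as an integer with thousands separators."""
    return f"{int(round(value)):,}"

# Totals copied from the cell outputs above.
totals = {"Confirmed": 170384848.0, "Deaths": 3542587.0}
for label, value in totals.items():
    print(f"{label}: {format_count(value)}")
```

In the HTML banner, `format_count(confirmed_total)` could stand in for `str(confirmed_total)`.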

# Data Visualization through Bubble Charts

### Latest Count of Confirmed New Cases

In [30]:
# Printing some rows
confirmed_df.head()

Unnamed: 0,state,country,lat,long,NewCases,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,650,0,0,0,0,0,...,65080,65486,65728,66275,66903,67743,68366,69130,70111,70761
1,,Albania,41.1533,20.1683,12,0,0,0,0,0,...,132153,132176,132209,132215,132229,132244,132264,132285,132297,132309
2,,Algeria,28.0339,1.6596,269,0,0,0,0,0,...,126434,126651,126860,127107,127361,127646,127926,128198,128456,128725
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13569,13569,13569,13569,13664,13671,13682,13693,13693,13693
4,,Angola,-11.2027,17.8739,186,0,0,0,0,0,...,31909,32149,32441,32623,32933,33338,33607,33944,34180,34366


In [31]:
confirmed_df['country'].value_counts()

China             34
Canada            16
United Kingdom    12
France            12
Australia          8
                  ..
Suriname           1
Peru               1
Panama             1
Jamaica            1
Niger              1
Name: country, Length: 193, dtype: int64

In [32]:
# Previewing how groupby sums NewCases across all rows of each country
confirmed_df.groupby("country")['NewCases'].sum()

country
Afghanistan           650
Albania                12
Algeria               269
Andorra                 0
Angola                186
                     ... 
Vietnam               199
West Bank and Gaza    210
Yemen                   6
Zambia                299
Zimbabwe               11
Name: NewCases, Length: 193, dtype: int64
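
Because countries such as China and Canada appear on several rows (one per state or province), the `groupby` call above collapses those rows into a single total per country. A toy illustration, with made-up values and hypothetical variable names, shows the mechanics:

```python
import pandas as pd

# Made-up rows: two states for one country, a single row for another.
toy = pd.DataFrame({
    "country": ["China", "China", "Peru"],
    "state": ["Hubei", "Beijing", None],
    "NewCases": [3, 2, 7],
})

# One group per distinct country; NewCases values in each group are summed.
per_country = toy.groupby("country")["NewCases"].sum()

# reset_index turns the result back into a regular two-column frame.
ranked = per_country.reset_index(name="TotalNewCases").sort_values(
    by="TotalNewCases", ascending=False
)
print(ranked)
```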

In [33]:
# Aggregating the confirmed new cases against each country
country_confirmed_newcases = confirmed_df.groupby("country")['NewCases'].sum().reset_index(name ='TotalNewCases')
country_confirmed_newcases = country_confirmed_newcases.sort_values(by='TotalNewCases', ascending=False)
country_confirmed_newcases.head(5)

Unnamed: 0,country,TotalNewCases
79,India,152734
23,Brazil,43520
6,Argentina,21346
37,Colombia,20218
143,Russia,9558


In [34]:
# Visualizing the new confirmed cases against each country using plotly
def bubble_chart(n):
    fig = px.scatter(country_confirmed_newcases.head(n), x="country", y="TotalNewCases", size="TotalNewCases", color="country",
               hover_name="country", size_max=60)
    fig.update_layout(
    title=str(n) +" countries with highest number of confirmed new cases. <br> (Last updated on "+(confirmed_df.columns)[-1]+")",
    xaxis_title="Countries",
    yaxis_title="Confirmed New Cases",
    width = 900
    )
    fig.show();
    

interact(bubble_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

### Latest Count of New Death Cases

In [35]:
# Aggregating the new death cases against each country
country_death_newcases = death_df.groupby("country")['NewCases'].sum().reset_index(name ='TotalNewCases')
country_death_newcases = country_death_newcases.sort_values(by='TotalNewCases', ascending=False)
country_death_newcases.head(5)

Unnamed: 0,country,TotalNewCases
79,India,3128
23,Brazil,874
37,Colombia,535
143,Russia,349
6,Argentina,348


In [36]:
# Visualizing the new death cases against each country using plotly
def bubble_chart(n):
    fig = px.scatter(country_death_newcases.head(n), x="country", y="TotalNewCases", size="TotalNewCases", color="country",
               hover_name="country", size_max=60)
    fig.update_layout(
    title=str(n) +" countries with highest number of new death cases. <br> (Last updated on "+(death_df.columns)[-1]+")",
    xaxis_title="Countries",
    yaxis_title="New Death Cases",
    width = 900
    )
    fig.show();

    
interact(bubble_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

### Latest Count of New Recovered Cases

In [37]:
# Aggregating the recovered new cases against each country
country_recovered_newcases = recovered_df.groupby("country")['NewCases'].sum().reset_index(name ='TotalNewCases')
country_recovered_newcases = country_recovered_newcases.sort_values(by='TotalNewCases', ascending=False)
country_recovered_newcases.head(5)

Unnamed: 0,country,TotalNewCases
79,India,238022
23,Brazil,44283
6,Argentina,30601
81,Iran,16495
37,Colombia,12412


In [38]:
# Visualizing the recovered new cases against each country using plotly
def bubble_chart(n):
    fig = px.scatter(country_recovered_newcases.head(n), x="country", y="TotalNewCases", size="TotalNewCases", color="country",
               hover_name="country", size_max=60)
    fig.update_layout(
    title=str(n) +" countries with highest number of new recovered cases. <br> (Last updated on "+(recovered_df.columns)[-1]+")",
    xaxis_title="Countries",
    yaxis_title="New Recovered Cases",
    width = 900
    )
    fig.show();

    
interact(bubble_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

# Data Visualization through Line Charts

### Trend of Confirmed Cases

In [39]:
confirmed_df.head()

Unnamed: 0,state,country,lat,long,NewCases,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,650,0,0,0,0,0,...,65080,65486,65728,66275,66903,67743,68366,69130,70111,70761
1,,Albania,41.1533,20.1683,12,0,0,0,0,0,...,132153,132176,132209,132215,132229,132244,132264,132285,132297,132309
2,,Algeria,28.0339,1.6596,269,0,0,0,0,0,...,126434,126651,126860,127107,127361,127646,127926,128198,128456,128725
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13569,13569,13569,13569,13664,13671,13682,13693,13693,13693
4,,Angola,-11.2027,17.8739,186,0,0,0,0,0,...,31909,32149,32441,32623,32933,33338,33607,33944,34180,34366


In [40]:
# Just to visualize the spread of data for a country
confirmed_df[confirmed_df['country'] == 'India'].iloc[:,5:].sum(axis=0) # axis=0 sums the values within each date column

1/22/20           0
1/23/20           0
1/24/20           0
1/25/20           0
1/26/20           0
             ...   
5/26/21    27369093
5/27/21    27555457
5/28/21    27729247
5/29/21    27894800
5/30/21    28047534
Length: 495, dtype: int64
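
The index of the series above holds date labels as plain strings such as `1/22/20`. As a hedged sketch, they can be parsed into real timestamps with `pd.to_datetime`, which lets a plotting library treat the x-axis as dates rather than categories. The sample `labels` index is an assumption standing in for the date column labels.

```python
import pandas as pd

# A few date labels in the month/day/two-digit-year form used by the columns.
labels = pd.Index(["1/22/20", "1/23/20", "5/30/21"])

# An explicit format avoids any day-first versus month-first ambiguity.
dates = pd.to_datetime(labels, format="%m/%d/%y")
print(dates)
```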

In [41]:
# Visualizing the trend of confirmed cases over time using Plotly
def confirmedCases_trend(name):
    x_data = confirmed_df.iloc[:, 5:].columns
    y_data = confirmed_df[confirmed_df['country'] == name].iloc[:,5:].sum(axis=0)
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=x_data, y=y_data,
                    mode='markers',
                    name='markers'))

    fig.update_layout(
        title=str(name) +"'s trend on confirmed cases. <br> (Last updated on "+(confirmed_df.columns)[-1]+")",
        xaxis_title="Date",
        yaxis_title="Confirmed Cases",
        width = 800
        )

    fig.show()
    
    

interact(confirmedCases_trend, name='India')
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(Text(value='India', description='name'), Output()), _dom_classes=('widget-interact',))

### Trend of Death Cases

In [42]:
# Visualizing the trend of death cases over time using Plotly
def deathCases_trend(name):
    x_data = death_df.iloc[:, 5:].columns
    y_data = death_df[death_df['country'] == name].iloc[:,5:].sum(axis=0)
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=x_data, y=y_data,
                    mode='markers',
                    name='markers'))

    fig.update_layout(
        title=str(name) +"'s trend on death cases. <br> (Last updated on "+(death_df.columns)[-1]+")",
        xaxis_title="Date",
        yaxis_title="Death Cases",
        width = 800
        )

    fig.show()
    

interact(deathCases_trend, name='India')
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(Text(value='India', description='name'), Output()), _dom_classes=('widget-interact',))

### Trend of Recovered Cases

In [43]:
# Visualizing the trend of recovered cases over time using Plotly
def recoveredCases_trend(name):
    x_data = recovered_df.iloc[:, 5:].columns
    y_data = recovered_df[recovered_df['country'] == name].iloc[:,5:].sum(axis=0)
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=x_data, y=y_data,
                    mode='markers',
                    name='markers'))

    fig.update_layout(
        title=str(name) +"'s trend on recovered cases. <br> (Last updated on "+(recovered_df.columns)[-1]+")",
        xaxis_title="Date",
        yaxis_title="Recovered Cases",
        width = 800
        )

    fig.show()
    
    
interact(recoveredCases_trend, name='India')
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(Text(value='India', description='name'), Output()), _dom_classes=('widget-interact',))

# Interactive Table Chart

### COVID-19 Confirmed/Death/Recovered/Active cases - Sorted by Confirmed Cases in Descending order

In [44]:
country_df.head()

Unnamed: 0,country,last_update,lat,long_,confirmed,deaths,recovered,active,incident_rate,people_tested,people_hospitalized,mortality_rate,uid,iso3
0,Afghanistan,2021-05-31 10:24:02,33.93911,67.709953,70761.0,2919.0,57450.0,10392.0,181.772452,,,4.125154,4,AFG
1,Albania,2021-05-31 10:24:02,41.1533,20.1683,132309.0,2450.0,129308.0,551.0,4597.574536,,,1.851726,8,ALB
2,Algeria,2021-05-31 10:24:02,28.0339,1.6596,128725.0,3465.0,89625.0,35635.0,293.5506,,,2.691785,12,DZA
3,Andorra,2021-05-31 10:24:02,42.5063,1.5218,13693.0,127.0,13416.0,150.0,17722.125154,,,0.927481,20,AND
4,Angola,2021-05-31 10:24:02,-11.2027,17.8739,34366.0,764.0,27766.0,5836.0,104.563134,,,2.223128,24,AGO


In [45]:
# Printing the top n countries sorted by Confirmed Cases in Descending order
def show_latest_cases(n):
    n = int(n)
    df1 = country_df[['country','last_update','confirmed','deaths','recovered','active','incident_rate','mortality_rate']]
    df1 = df1.sort_values(by ='confirmed', ascending=False)
    return df1.head(n)


interact(show_latest_cases, n='10')
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(Text(value='10', description='n'), Output()), _dom_classes=('widget-interact',))

### COVID-19 Confirmed/Death/Recovered/Active cases - Country specific

In [46]:
# Country specific search to see the count details
def country_specific(name):
    df1 = country_df[['country','last_update','confirmed','deaths','recovered','active','incident_rate','mortality_rate']]
    country_specific = df1.loc[df1['country'] == name]
    return country_specific


interact(country_specific, name='India')
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(Text(value='India', description='name'), Output()), _dom_classes=('widget-interact',))

# Data Visualization through Bar Charts

### Worst Hit Countries - Confirmed Cases

In [47]:
country_df.head()

Unnamed: 0,country,last_update,lat,long_,confirmed,deaths,recovered,active,incident_rate,people_tested,people_hospitalized,mortality_rate,uid,iso3
0,Afghanistan,2021-05-31 10:24:02,33.93911,67.709953,70761.0,2919.0,57450.0,10392.0,181.772452,,,4.125154,4,AFG
1,Albania,2021-05-31 10:24:02,41.1533,20.1683,132309.0,2450.0,129308.0,551.0,4597.574536,,,1.851726,8,ALB
2,Algeria,2021-05-31 10:24:02,28.0339,1.6596,128725.0,3465.0,89625.0,35635.0,293.5506,,,2.691785,12,DZA
3,Andorra,2021-05-31 10:24:02,42.5063,1.5218,13693.0,127.0,13416.0,150.0,17722.125154,,,0.927481,20,AND
4,Angola,2021-05-31 10:24:02,-11.2027,17.8739,34366.0,764.0,27766.0,5836.0,104.563134,,,2.223128,24,AGO


In [48]:
# Visualizing the top n countries with respect to confirmed cases using Plotly
def confirmedCases_bar_chart(n):
    df1 = country_df.sort_values(by ='confirmed', ascending=False)
    fig = px.bar(df1.head(n), x="country", y="confirmed")
    
    fig.update_layout(
    title=str(n) +" countries with highest number of confirmed cases. <br> (Last updated on "+ df1.last_update[1] +")",
    xaxis_title="Countries",
    yaxis_title="Confirmed Cases",
    width = 800
    )
    
    fig.show();

    
interact(confirmedCases_bar_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

### Worst Hit Countries - Death Cases

In [49]:
# Visualizing the top n countries with respect to death cases using Plotly
def deathCases_bar_chart(n):
    df1 = country_df.sort_values(by ='deaths', ascending=False)
    fig = px.bar(df1.head(n), x="country", y="deaths")
    
    fig.update_layout(
    title=str(n) +" countries with highest number of death cases. <br> (Last updated on "+ df1.last_update[1] +")",
    xaxis_title="Countries",
    yaxis_title="Death Cases",
    width = 800
    )
    
    fig.show();

    
interact(deathCases_bar_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

### Countries - Recovered Cases

In [50]:
# Visualizing the top n countries with respect to recovered cases using Plotly
def recoveredCases_bar_chart(n):
    df1 = country_df.sort_values(by ='recovered', ascending=False)
    fig = px.bar(df1.head(n), x="country", y="recovered")
    
    fig.update_layout(
    title=str(n) +" countries with highest number of recovered cases. <br> (Last updated on "+ df1.last_update[1] +")",
    xaxis_title="Countries",
    yaxis_title="Recovered Cases",
    width = 800
    )
    
    fig.show();

    
interact(recoveredCases_bar_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

### Countries - Active Cases

In [51]:
# Visualizing the top n countries with respect to active cases using Plotly
def activeCases_bar_chart(n):
    df1 = country_df.sort_values(by ='active', ascending=False)
    fig = px.bar(df1.head(n), x="country", y="active")
    
    fig.update_layout(
    title=str(n) +" countries with highest number of active cases. <br> (Last updated on "+ df1.last_update[1] +")",
    xaxis_title="Countries",
    yaxis_title="Active Cases",
    width = 800
    )
    
    fig.show();

    
interact(activeCases_bar_chart, n=10)
ipywLayout = widgets.Layout()
ipywLayout.display='none'

interactive(children=(IntSlider(value=10, description='n', max=30, min=-10), Output()), _dom_classes=('widget-…

# Data Visualization on Maps

### Global spread of COVID-19 using Folium

In [52]:
confirmed_df.head()

Unnamed: 0,state,country,lat,long,NewCases,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
0,,Afghanistan,33.93911,67.709953,650,0,0,0,0,0,...,65080,65486,65728,66275,66903,67743,68366,69130,70111,70761
1,,Albania,41.1533,20.1683,12,0,0,0,0,0,...,132153,132176,132209,132215,132229,132244,132264,132285,132297,132309
2,,Algeria,28.0339,1.6596,269,0,0,0,0,0,...,126434,126651,126860,127107,127361,127646,127926,128198,128456,128725
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,13569,13569,13569,13569,13664,13671,13682,13693,13693,13693
4,,Angola,-11.2027,17.8739,186,0,0,0,0,0,...,31909,32149,32441,32623,32933,33338,33607,33944,34180,34366


In [53]:
# Listing the rows where lat is null
confirmed_df[confirmed_df['lat'].isnull()]

Unnamed: 0,state,country,lat,long,NewCases,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
52,Repatriated Travellers,Canada,,,0,0,0,0,0,0,...,13,13,13,13,13,13,13,13,13,13
88,Unknown,China,,,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [54]:
# Listing the rows where long is null
confirmed_df[confirmed_df['long'].isnull()]

Unnamed: 0,state,country,lat,long,NewCases,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,...,5/21/21,5/22/21,5/23/21,5/24/21,5/25/21,5/26/21,5/27/21,5/28/21,5/29/21,5/30/21
52,Repatriated Travellers,Canada,,,0,0,0,0,0,0,...,13,13,13,13,13,13,13,13,13,13
88,Unknown,China,,,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [55]:
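# Removing the rows with null coordinates from confirmed_df
confirmed_df = confirmed_df[~confirmed_df['lat'].isnull()]
# Filtering death_df the same way so positional lookups in the map cell stay aligned
# (this assumes both frames list their rows in the same order)
death_df = death_df[~death_df['lat'].isnull()]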

In [56]:
# Using Folium to map the spread of COVID-19 cases around the world
world_map = folium.Map(location=[11,0], tiles="cartodbpositron", zoom_start=2, max_zoom = 6, min_zoom = 2)


for i in range(0,len(confirmed_df)):
    folium.Circle(
        location=[confirmed_df.iloc[i]['lat'], confirmed_df.iloc[i]['long']],
        fill=True,
        radius=(int((np.log(confirmed_df.iloc[i,-1]+1.00001)))+0.2)*5000,
        # Scale the radius by taking the log of the case count
        color='red',
        fill_color='indigo',
        # Render the tooltip content as HTML
        tooltip = "<div style='margin: 0; background-color: black; color: white;'>"+
                    "<h4 style='text-align:center;font-weight: bold'>"+confirmed_df.iloc[i]['country'] + "</h4>"
                    "<hr style='margin:10px;color: white;'>"+
                    "<ul style='color: white;;list-style-type:circle;align-item:left;padding-left:20px;padding-right:20px'>"+
                        "<li>Confirmed: "+str(confirmed_df.iloc[i,-1])+"</li>"+
                        "<li>Deaths:   "+str(death_df.iloc[i,-1])+"</li>"+
                        "<li>Death Rate: "+ str(np.round(death_df.iloc[i,-1]/(confirmed_df.iloc[i,-1]+1.00001)*100,2))+ "</li>"+
                    "</ul></div>",
        ).add_to(world_map)

world_map
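
The circle radius in the map cell grows with the logarithm of the case count, so countries with very large totals do not swamp the map. The sketch below isolates that formula in a standalone function. The name `circle_radius` and the `scale` parameter are assumptions for illustration, and `math.log` is used in place of `np.log`, which gives the same result for scalar inputs.

```python
import math

def circle_radius(cases, scale=5000):
    """Mirror the map cell's formula: (int(log(cases + 1.00001)) + 0.2) * scale."""
    # The small offset keeps the logarithm defined when cases == 0.
    return (int(math.log(cases + 1.00001)) + 0.2) * scale

for cases in (0, 1_000, 1_000_000):
    print(cases, circle_radius(cases))
```

With this scaling, a thousand-fold increase in cases only roughly doubles the radius, which keeps small and large outbreaks visible on the same map.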


### Global spread of COVID-19 using Plotly

In [57]:
fig = px.scatter_mapbox(confirmed_df, lat="lat", lon="long", color="country",
                  color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=0,
                  mapbox_style="carto-positron")

fig.update_layout(
    title="Global spread of COVID-19. (Last updated on "+ confirmed_df.columns[-1] +")"
    )
    
fig.show();