# **COVID-19 - EXPLORATORY DATA ANALYSIS AND VISUALIZATION**

This kernel was created alongside [another kernel](https://www.kaggle.com/nhntran/covid-19-vietnam-data-eda-and-visualization?scriptVersionId=32963257) that focuses on country-specific data for Vietnam.

*The written report for this analysis can be found [here](https://towardsdatascience.com/covid-19-what-do-we-know-about-the-situation-in-vietnam-82c195163d7e).*

*A nice visualization of all Vietnam COVID-19 patients can be found [here](https://medium.com/@tranhnnguyenvn/a-full-picture-of-vietnam-covid-19-patients-496f7ccad3ea).*

*Data used in this kernel: the regularly updated dataset from Johns Hopkins University CSSE (the `CSSEGISandData` GitHub repository)*


**REFERENCE SOURCES**

1. Kaggle:

https://www.kaggle.com/abhinand05/covid-19-digging-a-bit-deeper

https://www.kaggle.com/corochann/covid-19-eda-with-recent-update-on-april

https://www.kaggle.com/vikassingh1996/coronavirus-an-exploratory-study-w-detail-report/notebook

2. TowardsDataScience:

https://towardsdatascience.com/visualizing-the-coronavirus-pandemic-with-choropleth-maps-7f30fccaecf5

3. Other:
The data from https://github.com/CSSEGISandData/COVID-19

**A. ENVIRONMENT SETUP**

In [1]:
# Install wget to download the data
!pip install wget

Collecting wget
  Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9681 sha256=a8a295db09dd4f22c29fef70c1d074d9203ac5649dd5f519d3ccc62006c05d74
  Stored in directory: /Users/nhntran/Library/Caches/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c13e3c02
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


In [3]:
!pip install folium

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


In [4]:
!pip install altair

Collecting altair
  Downloading altair-4.1.0-py3-none-any.whl (727 kB)
Collecting toolz
  Downloading toolz-0.10.0.tar.gz (49 kB)
Building wheels for collected packages: toolz
  Building wheel for toolz (setup.py) ... done
  Created wheel for toolz: filename=toolz-0.10.0-py3-none-any.whl size=55575 sha256=db7259664b3a6ba07318b9e325ab49259d889b431eb8f5365d674fbf847d1328
  Stored in directory: /Users/nhntran/Library/Caches/pip/wheels/e2/83/7c/248063997a4f9ff6bf145822e620e8c37117a6b4c765584077
Successfully built toolz
Installing collected packages: toolz, altair
Successfully installed altair-4.1.0 toolz-0.10.0


In [5]:
# Import necessary packages
import os
import pandas as pd
import wget
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style("whitegrid")

%config InlineBackend.figure_format = 'retina' #high resolution for rendered images on notebook

# Map visualization
import folium
import altair as alt

# Plotly
from plotly import tools, subplots
import plotly.offline as py
py.init_notebook_mode(connected=True) # Required to use plotly offline in jupyter notebook
import plotly.graph_objs as go
import plotly.express as px
import plotly.figure_factory as ff
import plotly.io as pio
pio.templates.default = "plotly_white"

# Display Markdown-formatted output (bold, italic, etc.)
from IPython.display import Markdown, display
def bold(string):
    display(Markdown(string))

**B. DOWNLOADING THE DATA**

The data is downloaded from the Johns Hopkins University CSSE GitHub repository, which is updated daily.

In [6]:
# Remove previously downloaded CSV files (the CSSE repository publishes new data every day)
! rm *.csv

# Updated data from Johns Hopkins University:
# global confirmed cases and deaths:
urls = ['https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv', 
        'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv']


for url in urls:
    filename = wget.download(url)
# data will be in /kaggle/working folder

rm: *.csv: No such file or directory


In [8]:
# Read the dataset
cases_df = pd.read_csv('time_series_covid19_confirmed_global.csv')
death_df = pd.read_csv('time_series_covid19_deaths_global.csv')
cases_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/31/20,6/1/20,6/2/20,6/3/20,6/4/20,6/5/20,6/6/20,6/7/20,6/8/20,6/9/20
0,,Afghanistan,33.0,65.0,0,0,0,0,0,0,...,15205,15750,16509,17267,18054,18969,19551,20342,20917,21459
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,1137,1143,1164,1184,1197,1212,1232,1246,1263,1299
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,9394,9513,9626,9733,9831,9935,10050,10154,10265,10382
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,764,765,844,851,852,852,852,852,852,852
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,86,86,86,86,86,86,88,91,92,96


In [9]:
death_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/31/20,6/1/20,6/2/20,6/3/20,6/4/20,6/5/20,6/6/20,6/7/20,6/8/20,6/9/20
0,,Afghanistan,33.0,65.0,0,0,0,0,0,0,...,257,265,270,294,300,309,327,357,369,384
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,33,33,33,33,33,33,34,34,34,34
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,653,661,667,673,681,690,698,707,715,724
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,51,51,51,51,51,51,51,51,51,51
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,4,4,4,4,4,4,4,4,4,4


**C. DATA WRANGLING AND EXPLORATORY DATA ANALYSIS**

**1. General information about the data**

In [10]:
## General information about the data

# print(cases_df.shape)
# print("All columns:", cases_df.columns)
# print("Types:", cases_df.dtypes)

# cases_df.describe(include='all')

# ## Missing values ?
# cases_df.info()
# cases_df.isnull().sum().sort_values(ascending=False)

# => Only 'Province/State' has missing values

**2. Merging and cleaning up the data**

- Merge confirmed cases and deaths.
- Correct the values in 'Province/State'.
- Change country names for use with the package 'pycountry_convert'.
- Fix an incorrect value on a specific date for 'Hubei', China.
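
The merge step listed above reshapes each wide table (one column per date) into long format and then combines the two. The sketch below illustrates the idea on made-up toy data and joins on the identifying columns with `DataFrame.merge`, which does not depend on both tables having the same row order. The helper name `to_long` and the toy frames are assumptions for illustration only, not part of this kernel.

```python
import pandas as pd

ID_COLS = ['Province/State', 'Country/Region', 'Lat', 'Long']

def to_long(wide_df, value_name):
    """Reshape a wide table (one column per date) to long format."""
    return wide_df.melt(id_vars=ID_COLS, var_name='Date', value_name=value_name)

# Toy data (made up for illustration); note the rows are in different orders
cases_wide = pd.DataFrame({
    'Province/State': ['A', 'B'], 'Country/Region': ['X', 'X'],
    'Lat': [1.0, 2.0], 'Long': [3.0, 4.0],
    '1/22/20': [5, 7], '1/23/20': [6, 9]})
deaths_wide = pd.DataFrame({
    'Province/State': ['B', 'A'], 'Country/Region': ['X', 'X'],
    'Lat': [2.0, 1.0], 'Long': [4.0, 3.0],
    '1/22/20': [1, 0], '1/23/20': [2, 0]})

# Join on the identifying columns plus the date instead of on row position
merged = to_long(cases_wide, 'Confirmed Cases').merge(
    to_long(deaths_wide, 'Deaths'), on=ID_COLS + ['Date'], how='left')
print(merged)
```

Concatenating column-wise, as done in the cell below, gives the same result whenever both source files list their rows in the same order; a key-based join is simply a more defensive variant.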


In [11]:
## Combine Confirmed and Death Cases:
dates = cases_df.columns[4:]
cases_df_cleanup = cases_df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'],
                                value_vars = dates, var_name = 'Date', value_name = 'Confirmed Cases')
death_df_cleanup = death_df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'],
                                value_vars = dates, var_name = 'Date', value_name = 'Deaths')
data = pd.concat([cases_df_cleanup, death_df_cleanup['Deaths']], axis = 1, sort = False)
data.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
0,,Afghanistan,33.0,65.0,1/22/20,0,0
1,,Albania,41.1533,20.1683,1/22/20,0,0
2,,Algeria,28.0339,1.6596,1/22/20,0,0
3,,Andorra,42.5063,1.5218,1/22/20,0,0
4,,Angola,-11.2027,17.8739,1/22/20,0,0


In [12]:
data.columns

Index(['Province/State', 'Country/Region', 'Lat', 'Long', 'Date',
       'Confirmed Cases', 'Deaths'],
      dtype='object')

In [13]:
len(data['Province/State'].unique())

82

In [14]:
## Correct the values in 'Province/State'
## (drop the 'Recovered' rows and rows whose name contains a ",", such as
## 'Bonaire, Sint Eustatius and Saba', to avoid redundancy)

data = data[data['Province/State'].str.contains('Recovered') != True]
data = data[data['Province/State'].str.contains(',') != True]

In [15]:
len(data['Province/State'].unique())

81

In [16]:
len(data['Country/Region'].unique())

188

In [17]:
## Checking some country names
data[data['Country/Region'] == 'Taiwan*']
#data[data['Country/Region'] == "Cote d'Ivoire"]

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
207,,Taiwan*,23.7,121.0,1/22/20,1,0
473,,Taiwan*,23.7,121.0,1/23/20,1,0
739,,Taiwan*,23.7,121.0,1/24/20,3,0
1005,,Taiwan*,23.7,121.0,1/25/20,3,0
1271,,Taiwan*,23.7,121.0,1/26/20,4,0
...,...,...,...,...,...,...,...
36117,,Taiwan*,23.7,121.0,6/5/20,443,7
36383,,Taiwan*,23.7,121.0,6/6/20,443,7
36649,,Taiwan*,23.7,121.0,6/7/20,443,7
36915,,Taiwan*,23.7,121.0,6/8/20,443,7


In [18]:
## Change country names for use with the package 'pycountry_convert'
# Mapping from old name to new name:
country = {'US':'USA', 
           'Korea, South':'South Korea',
           'Taiwan*': 'Taiwan',
           'Congo (Kinshasa)': 'Democratic Republic of the Congo',
           "Cote d'Ivoire": "Côte d'Ivoire",
           'Reunion': 'Réunion',
           'Congo (Brazzaville)': 'Republic of the Congo',
           'Bahamas, The': 'Bahamas',
           'Gambia, The': 'Gambia'
          }
for old, new in country.items():
    data['Country/Region'] = data['Country/Region'].replace(old, new)

In [19]:
## Checking the result of changing some country names
#data[data['Country/Region'] == 'Taiwan']
data[data['Country/Region'] == "Bahamas"]

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
18,,Bahamas,25.0343,-77.3963,1/22/20,0,0
284,,Bahamas,25.0343,-77.3963,1/23/20,0,0
550,,Bahamas,25.0343,-77.3963,1/24/20,0,0
816,,Bahamas,25.0343,-77.3963,1/25/20,0,0
1082,,Bahamas,25.0343,-77.3963,1/26/20,0,0
...,...,...,...,...,...,...,...
35928,,Bahamas,25.0343,-77.3963,6/5/20,102,11
36194,,Bahamas,25.0343,-77.3963,6/6/20,103,11
36460,,Bahamas,25.0343,-77.3963,6/7/20,103,11
36726,,Bahamas,25.0343,-77.3963,6/8/20,103,11


In [20]:
len(data['Country/Region'].unique())

188

In [21]:
## Show all of the data from China
#data.loc[data['Country/Region'] == 'China']

## Show the specific data that is wrong (according to literature)
data[(data['Province/State'] == 'Hubei') & (data['Date'] == '2/12/20')] 

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
5648,Hubei,China,30.9756,112.2707,2/12/20,33366,1068


In [22]:
## Fix the incorrect value on a specific date for 'Hubei', China

## fixing function
def fixing_value(date, region, value_name, new_value):
    for key, val in new_value.items():
        data.loc[(data['Date'] == date) & (data[region] == key), value_name] = val
# fix data       
hubei_feb12 = {'Hubei':34874}
fixing_value('2/12/20', 'Province/State', 'Confirmed Cases', hubei_feb12)

#checking the fixing effect
data[(data['Province/State'] == 'Hubei') & (data['Date'] == '2/12/20')] 

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
5648,Hubei,China,30.9756,112.2707,2/12/20,34874,1068


In [23]:
### Convert date from string to datetime type

# Dates must be datetime rather than strings so that they sort chronologically;
# as strings, '2/10/20' would sort before '2/2/20'.
data['Date'] = pd.to_datetime(data['Date'])
data.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
0,,Afghanistan,33.0,65.0,2020-01-22,0,0
1,,Albania,41.1533,20.1683,2020-01-22,0,0
2,,Algeria,28.0339,1.6596,2020-01-22,0,0
3,,Andorra,42.5063,1.5218,2020-01-22,0,0
4,,Angola,-11.2027,17.8739,2020-01-22,0,0
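
The reason for converting the dates can be seen in a small standalone sketch. It uses only the standard library and a few hand-picked date strings in the same `month/day/two-digit-year` style as the source columns; the list itself is an illustrative assumption.

```python
from datetime import datetime

raw_dates = ['2/10/20', '1/22/20', '2/1/20', '2/2/20']

# A plain string sort compares character by character,
# so '2/10/20' ends up before '2/2/20'
string_sorted = sorted(raw_dates)

# Parsing each string as a date gives chronological order
date_sorted = sorted(raw_dates, key=lambda s: datetime.strptime(s, '%m/%d/%y'))

print(string_sorted)  # ['1/22/20', '2/1/20', '2/10/20', '2/2/20']
print(date_sorted)    # ['1/22/20', '2/1/20', '2/2/20', '2/10/20']
```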


**3. Saving the data for further analysis**

In [24]:
data.to_csv('global_covid19cases.csv', index = False)

**4. Extracting specific data for each country**

In [25]:
## Show all of the data from the US
#data.loc[data['Country/Region'] == 'USA']

## Extract the specific country data
# Note: 'US' was renamed to 'USA' in the cleanup step above, so filter on 'USA'
us_data = data[data['Country/Region'] == 'USA']
us_data


**D. WORLDWIDE DETAILED REPORT FOR THE MOST RECENT DAY (SINGLE DAY)**

**1. The globally confirmed cases and deaths**

In [26]:
## Extract the most recent date in the dataframe
date = data['Date'].max()

most_recent_data = data[data['Date'] == date]
print('Globally COVID-19 data on date {}:\n'.format(date))
print('Confirmed Cases:   {:,}'.format(most_recent_data['Confirmed Cases'].sum()))
print('Deaths Cases:      {:,}'.format(most_recent_data['Deaths'].sum()))

Globally COVID-19 data on date 2020-06-09 00:00:00:

Confirmed Cases:   7,236,047
Deaths Cases:      411,141


**2. Distribution of cases by country**

In [27]:
color_case = 'YlOrRd'
color_death = 'YlOrRd'
# color reference
#cmaps['Sequential'] = [
#            'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
#            'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
#            'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']

country_cases = most_recent_data.groupby('Country/Region')[['Confirmed Cases', 'Deaths']].sum().reset_index()
country_cases.sort_values('Confirmed Cases', ascending = False)\
            .style.background_gradient(cmap = color_case, subset = ['Confirmed Cases'])\
            .background_gradient(cmap = color_death, subset = ['Deaths'])

Unnamed: 0,Country/Region,Confirmed Cases,Deaths
174,USA,1973230,111694
23,Brazil,739503,38406
139,Russia,484630,6134
178,United Kingdom,290581,40968
78,India,276146,7750
158,Spain,241966,27136
84,Italy,235561,34043
132,Peru,203736,5738
61,France,191523,29299
65,Germany,186506,8736


**3. Bar chart of the top 10 countries**

In [28]:
## SINGLE VALUE GRAPH (** Using altair (alt) for this graph)
## list of parameters
bar_color = '#da635eff'
sort_value = 'Confirmed Cases' # 'Deaths'/'Recovered'
y_axis = 'Country/Region'# 'Province/State'
top_num = 10 # number of countries
threshold = 1000

## preparing data for graph (extracting top countries)
top10_country = country_cases.sort_values(sort_value, ascending = False).head(top_num)
# Can also choose not top_num but a cutoff threshold of sort_value
#top10_country = country_cases[country_cases[sort_value] > threshold].sort_values(sort_value, ascending = False)

## drawing graph
def drawing_single_value_bar_graph(data, bar_color, sort_value, y_axis, top_num):
    
    bars = alt.Chart(data)\
        .mark_bar(color = bar_color,cornerRadiusTopLeft = 3, cornerRadiusTopRight=3, size = 20, opacity = 0.7)\
        .encode(
                    x = '{}:Q'.format(sort_value),
                    y = alt.Y('{}:O'.format(y_axis), sort = '-x'))\
        .properties( 
            title = {
            "text":['Top {}: {}'.format(y_axis, sort_value)],
            "subtitle":['*Updated on {}'.format(date)],
            "fontSize":15,
            "fontWeight": 'bold',
            "font":'Courier New',
            }
        )
    # dx = 3 Nudges text to right so it doesn't appear on top of the bar
    text = bars.mark_text(align = 'left', baseline = 'middle',dx = 3).encode(text = '{}:Q'.format(sort_value))

    return (bars + text).properties( height = 400, width = 800)
    

fig = drawing_single_value_bar_graph(top10_country, bar_color, sort_value, y_axis, top_num)
fig

**4. Distribution of cases on a map**

In [29]:
## Using most_recent_data data
most_recent_data.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
36974,,Afghanistan,33.0,65.0,2020-06-09,21459,384
36975,,Albania,41.1533,20.1683,2020-06-09,1299,34
36976,,Algeria,28.0339,1.6596,2020-06-09,10382,724
36977,,Andorra,42.5063,1.5218,2020-06-09,852,51
36978,,Angola,-11.2027,17.8739,2020-06-09,96,4


In [30]:
##### *********  MAP - STYLE 1  *********
# Using folium
# Data: Using most_recent_data dataframe

## parameter for map
mapstyle = 'CartoDB positron'
line_color = '#da635eff'
fill_color = '#da635eff'
fill_opacity = 0.6
# other styles: 'OpenStreetMap', 'Stamen Terrain', 'Stamen Toner', 'Stamen Watercolor'

## create map
world_map = folium.Map(location = [10,0], zoom_start = 2, max_zoom = 8, min_zoom = 2, tiles = mapstyle)
## define detail of the map
for lat, long, case, name in zip(most_recent_data['Lat'], most_recent_data['Long'], most_recent_data['Confirmed Cases'],\
                                most_recent_data['Country/Region']):
    folium.CircleMarker([lat, long], radius = (int((np.log(case+1.00001)))+0.2),
                       popup = ("<h5 style='text-align:center;font-weight: bold'>" + str(name).capitalize()+ "</h5>" + '<br>'
                                '<strong>Confirmed Cases</strong>: ' + str(case) + '<br>'),\
                       color = line_color, weight= 1.5, \
                        fill_color = fill_color, fill_opacity = fill_opacity).add_to(world_map)
# opacity = fill_opacity #opacity of the line 
 
## Save map
world_map.save("./world_map.html")
world_map


In [31]:
##### *********  MAP - STYLE 2  *********
# Using choropleth
# Data: Using country_cases dataframe (log10 scale)

## list of parameters
color = '#da635eff' #'Reds'
map_value = 'Confirmed Cases' # 'Deaths'/'Recovered'

## function to draw the graph
def drawing_global_heatmap(country_cases, map_value, color):
    temp_df = country_cases[['Country/Region',map_value]]

    fig = px.choropleth(temp_df, locations="Country/Region",
                        color = np.log10(temp_df[map_value] + 1), # + 1 avoids log10(0) for countries with zero cases
                        hover_name = "Country/Region", # column to add to hover information
                        hover_data = [map_value],
                        projection = 'miller', # change to type of map display
                        color_continuous_scale = px.colors.sequential.Plasma,locationmode = "country names")
    
    fig.update_geos(fitbounds = "locations", visible = False)
#     fig.update_layout(height=500, margin={"r":0,"t":0,"l":0,"b":0})
    fig.update_layout(title_text = "{} Heat Map (Log Scale)".format(map_value), title_x = 0.5)
    fig.update_coloraxes(colorbar_title = "{}(Log Scale)".format(map_value),colorscale="Reds")
    return fig
# drawing graph
fig = drawing_global_heatmap(country_cases, map_value, color)
#fig.to_image("Global Heat Map {map_value}.png")
fig.show()

In [32]:
## drawing graph
fig = drawing_global_heatmap(country_cases,'Deaths', 'Reds')
# fig.to_image("Global Heat Map {map_value}.png")
fig.show()

**E. WORLDWIDE DETAILED REPORT FROM THE EARLY STAGE OF THE PANDEMIC TO DATE**

In [33]:
## Using dataframe 'data'
data.head()
# Note: 'data' contains province/state-level rows for some countries.
# When analyzing by country, use groupby('Country/Region')
# to sum the cases of all provinces/states of the same country,
# for example, Canada:
data[data['Country/Region'] == "Canada"]

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed Cases,Deaths
35,Alberta,Canada,53.9333,-116.5765,2020-01-22,0,0
36,British Columbia,Canada,49.2827,-123.1207,2020-01-22,0,0
37,Grand Princess,Canada,37.6489,-122.6655,2020-01-22,0,0
38,Manitoba,Canada,53.7609,-98.8139,2020-01-22,0,0
39,New Brunswick,Canada,46.5653,-66.4619,2020-01-22,0,0
...,...,...,...,...,...,...,...
37018,Quebec,Canada,52.9399,-73.5491,2020-06-09,53185,5029
37019,Saskatchewan,Canada,52.9399,-106.4509,2020-06-09,656,13
37205,Diamond Princess,Canada,0.0000,0.0000,2020-06-09,0,1
37218,Northwest Territories,Canada,64.8255,-124.8457,2020-06-09,5,0


**1. Static graph**

In [34]:
## Sum up all the cases in the world by date
world_cases_all_time = data.groupby('Date')[['Confirmed Cases', 'Deaths']].sum().sort_values('Date').reset_index()

# Adding 'New Confirmed Cases' column
world_cases_all_time['New Confirmed Cases'] = world_cases_all_time['Confirmed Cases'] - world_cases_all_time['Confirmed Cases'].shift(1)
# # Adding 'Mortality' column
# world_cases_all_time['Mortality'] = world_cases_all_time['Deaths']/world_cases_all_time['Confirmed Cases']

# Checking the new column
world_cases_all_time.head()
#world_cases_all_time.tail()

Unnamed: 0,Date,Confirmed Cases,Deaths,New Confirmed Cases
0,2020-01-22,555,17,
1,2020-01-23,654,18,99.0
2,2020-01-24,941,26,287.0
3,2020-01-25,1434,42,493.0
4,2020-01-26,2118,56,684.0
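
The 'New Confirmed Cases' column is the day-over-day difference of a cumulative series. As a small sketch, the snippet below checks on the first four worldwide totals shown in the table above that subtracting the shifted series matches pandas' built-in `Series.diff()`; the variable names are illustrative assumptions.

```python
import pandas as pd

# First four cumulative worldwide totals from the table above
cumulative = pd.Series([555, 654, 941, 1434], name='Confirmed Cases')

by_shift = cumulative - cumulative.shift(1)  # subtract the previous day's total
by_diff = cumulative.diff()                  # built-in equivalent

# The first entry is NaN because there is no previous day to subtract
print(by_diff.tolist())
```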


In [35]:
## For good x-axis display (only show month and day)
# world_cases_all_time['Date'] = pd.to_datetime(world_cases_all_time['Date'])
# world_cases_all_time['Date'] = world_cases_all_time['Date'].dt.strftime('%m/%d')
##### *********  GRAPH - STYLE 1 - LINE  *********

# Combine all the data together for drawing graph
world_cases_all_time_melt = world_cases_all_time.melt(id_vars = ['Date'], 
                                    value_vars = ['Confirmed Cases', 'Deaths', 'New Confirmed Cases'])
#world_cases_all_time_melt.head()

fig = px.line(world_cases_all_time_melt, x="Date", y="value", color='variable')
fig.update_layout(title = {'text': 'Worldwide Confirmed/Death Cases Over Time', 'x': 0.5},
                   xaxis_title = 'Date (2020)',
                   yaxis_title = 'Cases',
                 legend = {'title': None})
#fig.update_xaxes(dtick = 10) # changing the distance between ticks
fig.show()

In [36]:
##### *********  GRAPH - STYLE 2 - BAR  *********
# Using the dataframe world_cases_all_time
# 'Date', 'Confirmed Cases', 'Deaths', 'New Confirmed Cases'

fig = go.Figure(data=[
    go.Bar(name = 'Confirmed Cases', x = world_cases_all_time['Date'], y = world_cases_all_time['Confirmed Cases']),
    go.Bar(name = 'Deaths', x = world_cases_all_time['Date'], y = world_cases_all_time['Deaths'])
])

fig.update_layout(title = {'text': 'Worldwide Confirmed/Death Cases Over Time', 'x': 0.5},
                   xaxis_title = 'Date (2020)',
                   yaxis_title = 'Cases',
                 legend = {'title': None})
fig.show()

**2. Animated graph**

In [37]:
## Sum up all the cases for each country by date
countries_cases_all_time = data.groupby(['Date', 'Country/Region'])[['Confirmed Cases', 'Deaths']].sum().sort_values('Date').reset_index()

# checking the grouping
countries_cases_all_time.tail()

Unnamed: 0,Date,Country/Region,Confirmed Cases,Deaths
26315,2020-06-09,Georgia,818,13
26316,2020-06-09,Germany,186506,8736
26317,2020-06-09,Ghana,10201,48
26318,2020-06-09,Eswatini,371,3
26319,2020-06-09,Zimbabwe,314,4


In [38]:
world_cases_all_time

Unnamed: 0,Date,Confirmed Cases,Deaths,New Confirmed Cases
0,2020-01-22,555,17,
1,2020-01-23,654,18,99.0
2,2020-01-24,941,26,287.0
3,2020-01-25,1434,42,493.0
4,2020-01-26,2118,56,684.0
...,...,...,...,...
135,2020-06-05,6770163,396107,137185.0
136,2020-06-06,6896910,399984,126747.0
137,2020-06-07,7010577,402727,113667.0
138,2020-06-08,7118995,406543,108418.0


In [39]:

##### *********  MAP - STYLE 2 WITH ANIMATION  *********
# Using choropleth
# Data: Using countries_cases_all_time dataframe (log10 scale)

## list of parameters
color = '#da635eff' #'Reds'
map_value = 'Confirmed Cases' # or 'Deaths'

## prepare the data (copy to avoid modifying a slice of countries_cases_all_time)
temp_df = countries_cases_all_time[['Date', 'Country/Region', map_value]].copy()
temp_df['Date'] = temp_df['Date'].dt.strftime('%m/%d/%Y')
fig = px.choropleth(temp_df, locations = "Country/Region", locationmode = 'country names', 
                     color = np.log10(temp_df[map_value] + 1), 
                     hover_name = "Country/Region", projection="mercator",
                     animation_frame = "Date", width = 1000, height = 800,
                     color_continuous_scale = px.colors.sequential.Viridis,
                     title = 'The Spread of COVID-19 Cases Across The World')

#Showing the figure
fig.update_geos(fitbounds = "locations", visible = False)
fig.update(layout_coloraxis_showscale=True)
fig.update_coloraxes(colorbar_title = "{} (Log Scale)".format(map_value), colorscale = "Reds")
py.iplot(fig)
# fig.to_image("Global Heat Map {map_value}.png")

**F. COUNTRY COMPARISON**

In [40]:
## Sum up all the cases for each country by date
countries_cases_all_time = data.groupby(['Date', 'Country/Region'])[['Confirmed Cases', 'Deaths']].sum().sort_values('Date').reset_index()

# checking the grouping
countries_cases_all_time.tail()

Unnamed: 0,Date,Country/Region,Confirmed Cases,Deaths
26315,2020-06-09,Georgia,818,13
26316,2020-06-09,Germany,186506,8736
26317,2020-06-09,Ghana,10201,48
26318,2020-06-09,Eswatini,371,3
26319,2020-06-09,Zimbabwe,314,4


In [41]:
top10_country = country_cases.sort_values(sort_value, ascending = False).head(10)['Country/Region'].unique()
top10_country_cases = countries_cases_all_time[countries_cases_all_time['Country/Region'].isin(top10_country)]
top10_country_cases

Unnamed: 0,Date,Country/Region,Confirmed Cases,Deaths
13,2020-01-22,Peru,0,0
33,2020-01-22,Russia,0,0
54,2020-01-22,USA,1,0
68,2020-01-22,United Kingdom,0,0
86,2020-01-22,Spain,0,0
...,...,...,...,...
26227,2020-06-09,Brazil,739503,38406
26281,2020-06-09,India,276146,7750
26287,2020-06-09,Italy,235561,34043
26312,2020-06-09,France,191523,29299


In [42]:
## Graph top 10 countries, y = confirmed cases
fig = px.line(top10_country_cases,
              x='Date', y='Confirmed Cases', color='Country/Region',
              title='Confirmed Cases for Top 10 Countries')
fig.update_layout(legend = {'title': None})
#fig.update_xaxes(dtick = 10)
fig.show()

In [43]:
## Graph top 10 countries, y = confirmed cases on a log axis
fig = px.line(top10_country_cases,
              x='Date', y='Confirmed Cases', color='Country/Region',
              title='Confirmed Cases for Top 10 Countries (Log Scale)')
fig.update_layout(legend = {'title': None},
                 xaxis_title = 'Date', yaxis_title = 'Confirmed Cases (log scale)')
#fig.update_xaxes(dtick = 10)
fig.update_layout(yaxis_type="log")
fig.show()

In [44]:
target_countries = np.array(['USA', 'United Kingdom', 'Spain', 'China', 'Italy', 'Germany',
                             'Singapore', 'Japan','South Korea', 'Vietnam'])
target_country_cases = countries_cases_all_time[countries_cases_all_time['Country/Region'].isin(target_countries)]
fig = px.line(target_country_cases,
              x = 'Date', y = 'Confirmed Cases', color = 'Country/Region',
              title = 'Confirmed Cases for Target Countries')
fig.update_layout(legend = {'title': None})
#fig.update_xaxes(dtick = 10)
fig.show()

**1. COMPARING DIFFERENT COUNTRIES - DAYS SINCE THE 50TH CASE**

Using the list of target_countries

In [45]:
target_countries = np.array(['USA', 'United Kingdom', 'Spain', 'China', 'Italy', 'Germany',
                             'Singapore', 'Japan','South Korea', 'Vietnam'])
target_country_cases = countries_cases_all_time[countries_cases_all_time['Country/Region'].isin(target_countries)]

In [46]:
## Comparison since 50 cases in each country

threshold = 50 # 50 cases
country_frames = []
for country_name in target_countries:
    # Extract country data (copy to avoid modifying a slice of the original dataframe):
    country = target_country_cases[target_country_cases['Country/Region'] == country_name]
    country = country[country['Confirmed Cases'] >= threshold].copy()
    start_date = country['Date'].min()
    country['Date Since 50 Cases'] = (country['Date'] - start_date)/pd.Timedelta('1 days')
    country['Confirmed Cases (log10)'] = np.log10(country['Confirmed Cases'])
    country_frames.append(country)
# DataFrame.append was removed in pandas 2.0; concatenate the per-country frames instead
countries_since_50cases = pd.concat(country_frames)
countries_since_50cases

Unnamed: 0,Date,Country/Region,Confirmed Cases,Deaths,Date Since 50 Cases,Confirmed Cases (log10)
6258,2020-02-24,USA,51,0,0.0,1.707570
6445,2020-02-25,USA,51,0,1.0,1.707570
6633,2020-02-26,USA,57,0,2.0,1.755875
6822,2020-02-27,USA,58,0,3.0,1.763428
7008,2020-02-28,USA,60,0,4.0,1.778151
...,...,...,...,...,...,...
25441,2020-06-05,Vietnam,328,0,83.0,2.515874
25629,2020-06-06,Vietnam,329,0,84.0,2.517196
25817,2020-06-07,Vietnam,331,0,85.0,2.519828
26005,2020-06-08,Vietnam,332,0,86.0,2.521138


In [47]:
### HOW NUMBER OF COVID-19 CASES INCREASES SINCE 50 CASES IN EACH COUNTRY

## Draw graph for different target countries:
fig = px.line(countries_since_50cases,
              x = 'Date Since 50 Cases', y = 'Confirmed Cases (log10)', color = 'Country/Region',
             hover_name = 'Country/Region', hover_data = ['Date Since 50 Cases', 'Confirmed Cases'])
## Update layout
fig.update_layout(title = {'text': '<b>Number of Confirmed Cases by Country Since 50 Cases</b>',
                           'x': 0.5},
                   xaxis_title = '<b>Days Since 50 Cases</b>',
                   yaxis_title = '<b>Confirmed Cases</b>',
                     legend = {'title': None})

## Add reference lines
# Case doubles every day
x1 = np.arange(0, 15)
y1 = np.log10(2**(x1 + np.log2(50)))
fig.add_trace(go.Scatter(x = x1, y = y1, mode='lines',
                         name = 'Case doubles every day',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))
# Case doubles every 3 days
x3 = np.arange(0, 50)
y3 = np.log10(2**(x3/3 + np.log2(50)))
fig.add_trace(go.Scatter(x = x3, y = y3, mode='lines',
                         name = 'Case doubles every 3 days',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

# Case doubles every 6 days
x6 = np.arange(0, 80)
y6 = np.log10(2**(x6/6 + np.log2(50)))
fig.add_trace(go.Scatter(x = x6, y = y6, mode='lines',
                         name = 'Case doubles every 6 days',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

# Case doubles every 2 weeks
x14 = np.arange(0, 80)
y14 = np.log10(2**(x14/14 + np.log2(50)))
fig.add_trace(go.Scatter(x = x14, y = y14, mode='lines',
                         name = 'Case doubles every 2 weeks',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

## Add annotation for reference lines 
fig.update_layout(
    annotations = [
        dict(
            x = x1[-1],
            y = y1[-1],
            text = "Case doubles every day"
        ),
        dict(
            x = x3[-1],
            y = y3[-1],
            text = "Case doubles every 3 days"
        ),
        dict(
            x = x6[-1],
            y = y6[-1],
            text = "Case doubles every 6 days"
        ),
        dict(
            x = x14[-1],
            y = y14[-1],
            text = "Case doubles every 2 weeks"
        )
    ]
)

## Label the y-axis ticks with actual case counts (instead of log10 values) for readability.
fig.update_yaxes(
    ticktext=["100 ", "1K ", "10K ", "100K ", "1M "],
    tickvals=[2, 3, 4, 5, 6],
)

fig.show()
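
The dashed reference lines rely on the identity 2^(x/d + log2(N)) = N · 2^(x/d): a count that starts at N and doubles every d days. A small standard-library sketch of that formula follows; the function name `reference_line` is an assumption introduced here for illustration.

```python
import math

def reference_line(threshold, doubling_days, n_days):
    """log10 of case counts that start at `threshold` and double every `doubling_days` days."""
    return [math.log10(threshold * 2 ** (day / doubling_days)) for day in range(n_days)]

# Doubling every 3 days from 50 cases gives 50, 100, 200 on days 0, 3, 6
line = reference_line(threshold=50, doubling_days=3, n_days=7)
print([round(10 ** y) for y in line[::3]])  # [50, 100, 200]
```

On a log10 y-axis each such line is straight, with slope log10(2)/d, so a country's curve that is steeper than a given line is doubling faster than that reference rate.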

**2. COMPARING COUNTRIES SINCE 100 CONFIRMED CASES**

In [48]:
## Comparison since 100 cases in each country

threshold = 100 # 100 cases
country_frames = []

# Record information for the text label at the right end of every trace
x_pos = []
y_pos = []
label = []
for country_name in target_countries:
    # Extract country data (copy to avoid modifying a slice of the original dataframe):
    label.append(country_name)
    country = target_country_cases[target_country_cases['Country/Region'] == country_name]
    country = country[country['Confirmed Cases'] >= threshold].copy()
    start_date = country['Date'].min()
    country['Date Since 100 Cases'] = (country['Date'] - start_date)/pd.Timedelta('1 days')
    country['Confirmed Cases (log10)'] = np.log10(country['Confirmed Cases'])
    x_pos.append(country['Date Since 100 Cases'].iloc[-1] + 2)
    y_pos.append(country['Confirmed Cases (log10)'].iloc[-1])
    country_frames.append(country)
# DataFrame.append was removed in pandas 2.0; concatenate the per-country frames instead
countries_since_100cases = pd.concat(country_frames)
countries_since_100cases

Unnamed: 0,Date,Country/Region,Confirmed Cases,Deaths,Date Since 100 Cases,Confirmed Cases (log10)
7762,2020-03-03,USA,118,7,0.0,2.071882
7949,2020-03-04,USA,149,11,1.0,2.173186
8137,2020-03-05,USA,219,12,2.0,2.340444
8326,2020-03-06,USA,267,14,3.0,2.426511
8514,2020-03-07,USA,403,17,4.0,2.605305
...,...,...,...,...,...,...
25441,2020-06-05,Vietnam,328,0,75.0,2.515874
25629,2020-06-06,Vietnam,329,0,76.0,2.517196
25817,2020-06-07,Vietnam,331,0,77.0,2.519828
26005,2020-06-08,Vietnam,332,0,78.0,2.521138
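
The alignment step above can also be written without an explicit loop. The following is a minimal sketch on a small made-up frame: the helper name `align_since_threshold` and the toy values are assumptions for illustration, not part of the original kernel. It uses `groupby(...).transform('min')` to find each country's first date at or above the threshold.

```python
import numpy as np
import pandas as pd

def align_since_threshold(cases, threshold=100):
    """Keep rows at or above `threshold` and count days from each
    country's first such date (hypothetical helper, for illustration)."""
    out = cases[cases['Confirmed Cases'] >= threshold].copy()
    # First qualifying date per country, broadcast back to every row
    first_date = out.groupby('Country/Region')['Date'].transform('min')
    day_col = f'Date Since {threshold} Cases'
    out[day_col] = (out['Date'] - first_date) / pd.Timedelta('1 days')
    out['Confirmed Cases (log10)'] = np.log10(out['Confirmed Cases'])
    return out

# Toy data for illustration only (not real case counts)
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03',
                            '2020-03-02', '2020-03-03']),
    'Country/Region': ['A', 'A', 'A', 'B', 'B'],
    'Confirmed Cases': [80, 100, 1000, 150, 300],
})
aligned = align_since_threshold(toy, threshold=100)
```

Each country starts at day 0 on its own first date with at least 100 cases, so the curves share a common origin regardless of calendar date.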


In [49]:
### HOW NUMBER OF COVID-19 CASES INCREASES SINCE 100 CASES IN EACH COUNTRY


## Draw graph for different target countries:
fig = px.line(countries_since_100cases,
              x = 'Date Since 100 Cases', y = 'Confirmed Cases (log10)', color = 'Country/Region',
              color_discrete_sequence = px.colors.qualitative.Pastel1,
             hover_name = 'Country/Region', hover_data = ['Date Since 100 Cases', 'Confirmed Cases'])
## Update layout
fig.update_layout(title = {'text': '<b>Number of Confirmed Cases by Country Since 100 Cases</b>',
                           'x': 0.5},
                   xaxis_title = '<b>Days Since 100 Cases</b>',
                   yaxis_title = '<b>Confirmed Cases</b>',
                     legend = {'title': None})

## Add reference lines
# Case doubles every day
x1 = np.arange(0, 15)
y1 = np.log10(2**(x1 + np.log2(100)))
fig.add_trace(go.Scatter(x = x1, y = y1, mode='lines',
                         name = 'Case doubles every day',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))
# Case doubles every 3 days
x3 = np.arange(0, 50)
y3 = np.log10(2**(x3/3 + np.log2(100)))
fig.add_trace(go.Scatter(x = x3, y = y3, mode='lines',
                         name = 'Case doubles every 3 days',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

# Case doubles every 6 days
x6 = np.arange(0, 80)
y6 = np.log10(2**(x6/6 + np.log2(100)))
fig.add_trace(go.Scatter(x = x6, y = y6, mode='lines',
                         name = 'Case doubles every 6 days',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

# Case doubles every 2 weeks
x14 = np.arange(0, 80)
y14 = np.log10(2**(x14/14 + np.log2(100)))
fig.add_trace(go.Scatter(x = x14, y = y14, mode='lines',
                         name = 'Case doubles every 2 weeks',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

# Case doubles every 1 month
x30 = np.arange(0, 80)
y30 = np.log10(2**(x30/30 + np.log2(100)))
fig.add_trace(go.Scatter(x = x30, y = y30, mode='lines',
                         name = 'Case doubles every 1 month',
                         hoverinfo = "none", showlegend = False,
                         line = dict(dash ='dash', width = 2,
                                   color = ('rgb(200, 200, 200)'))))

## Add a text label to the right end of every trace. Most of the code below  
    # is adding specific offsets y position because some labels overlapped. 
# fig.add_trace(go.Scatter(
#     x = x_pos, y= y_pos, mode = 'text', text = label
# ))

## Add Vietnam trace
df_vietnam = countries_since_100cases[countries_since_100cases['Country/Region'] == 'Vietnam']

fig.add_trace(go.Scatter(x = df_vietnam['Date Since 100 Cases'], y = df_vietnam['Confirmed Cases (log10)'], mode='lines',
                         name = 'Vietnam',
                         hoverinfo = "none", showlegend = False,
                         line = dict(width = 4,
                                   color = ('red'))))
## Add annotation for reference lines 
fig.update_layout(
    annotations = [
        dict(
            x = x1[-1],
            y = y1[-1],
            text = "Case doubles every day"
        ),
        dict(
            x = x3[-1],
            y = y3[-1],
            text = "Case doubles every 3 days"
        ),
        dict(
            x = x6[-1],
            y = y6[-1],
            text = "Case doubles every 6 days"
        ),
        dict(
            x = x14[-1],
            y = y14[-1],
            text = "Case doubles every 2 weeks"
        ),
        dict(
            x = x30[-1],
            y = y30[-1],
            text = "Case doubles every 1 month"
        ),
        ## Reference line for some countries (manually adjust since overlapped)
        dict(
            x = df_vietnam['Date Since 100 Cases'].iloc[-1] + 8,
            y = df_vietnam['Confirmed Cases (log10)'].iloc[-1] + 0.02,
            xref="x",
            yref="y",
            text = "<b>Vietnam</b>",
            showarrow=False,align = 'right',
            font=dict(
            family="Courier New, monospace",
            size=16,
            color="red"
            ),
        )
    ]
)

## Set y-axis to real number of cases (instead of log(cases)) to be more clear.
fig.update_yaxes(
    ticktext=["100 ", "1K ", "10K ", "100K ", "1M "],
    tickvals=[2, 3, 4, 5, 6],
)

fig.show()

**MORE BEAUTIFUL GRAPH**

In [50]:
end_ =  countries_since_100cases['Date Since 100 Cases'].max()
end_
new_df = pd.DataFrame(np.arange(0,end_ + 1))
new_df.columns = ['Date Since 100 Cases']
new_df 

Unnamed: 0,Date Since 100 Cases
0,0.0
1,1.0
2,2.0
3,3.0
4,4.0
...,...
135,135.0
136,136.0
137,137.0
138,138.0


In [51]:
for country_name in target_countries:
    #Extract each country
    country = target_country_cases[target_country_cases['Country/Region'] == country_name]
    country = country[country['Confirmed Cases'] >= threshold].copy()
    start_date = country['Date'].min()
    country['Date Since 100 Cases'] = (country['Date'] - start_date)/pd.Timedelta('1 days')
    country['Confirmed Cases (log10)'] = np.log10(country['Confirmed Cases'])
    # New column for each country
    # NOTE: join() returns a new DataFrame and aligns on the index; the result is
    # not assigned here, so new_df is left unchanged (see the output below)
    new_df.join(country['Confirmed Cases (log10)'], lsuffix='_caller', rsuffix='_other')
    
new_df

Unnamed: 0,Date Since 100 Cases
0,0.0
1,1.0
2,2.0
3,3.0
4,4.0
...,...
135,135.0
136,136.0
137,137.0
138,138.0
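
`DataFrame.join` returns a new frame and aligns on the index, which is why `new_df` is unchanged in the output above. One way to get one column per country is to pivot the long table on the day counter. This is a minimal sketch, assuming a long-format frame with the same column names as `countries_since_100cases`; the toy values and the names `long_df` and `wide_df` are made up for illustration.

```python
import pandas as pd

# Toy long-format data (illustrative values only)
long_df = pd.DataFrame({
    'Country/Region': ['A', 'A', 'A', 'B', 'B'],
    'Date Since 100 Cases': [0.0, 1.0, 2.0, 0.0, 1.0],
    'Confirmed Cases (log10)': [2.0, 2.3, 2.6, 2.1, 2.2],
})

# One row per day since 100 cases, one column per country;
# days a country has not reached yet are left as NaN.
wide_df = long_df.pivot(index='Date Since 100 Cases',
                        columns='Country/Region',
                        values='Confirmed Cases (log10)')
```

A wide table like this can be handy when each country needs its own trace with custom styling.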


**3. MORE INTERACTIVE GRAPH WITH HOVER FUNCTION - COMPARING COUNTRIES SINCE 100 CONFIRMED CASES**

- Highlight the country of interest on the graph

In [52]:
countries_since_100cases.head()

Unnamed: 0,Date,Country/Region,Confirmed Cases,Deaths,Date Since 100 Cases,Confirmed Cases (log10)
7762,2020-03-03,USA,118,7,0.0,2.071882
7949,2020-03-04,USA,149,11,1.0,2.173186
8137,2020-03-05,USA,219,12,2.0,2.340444
8326,2020-03-06,USA,267,14,3.0,2.426511
8514,2020-03-07,USA,403,17,4.0,2.605305


In [53]:
### Need to separate each country as x and y
for country_name in target_countries:
    country = countries_since_100cases[countries_since_100cases['Country/Region'] == country_name]
    x = country['Date Since 100 Cases']
    y = country['Confirmed Cases (log10)']

In [54]:
country_name = 'USA'
country = countries_since_100cases[countries_since_100cases['Country/Region'] == country_name]
x_USA = country['Date Since 100 Cases']
y_USA = country['Confirmed Cases (log10)']

In [55]:
## NOT IN USE - JUST A REFERENCE
## function for a reference line
# def create_ref_line(threshold, num_days, x_limit, name):
#     x = np.arange(0, x_limit)
#     y = np.log10(2**(x/num_days + np.log2(threshold)))
#     ref_line = pd.DataFrame(columns = ['x', 'y','Name of Reference Line'])
#     ref_line['x'] = x
#     ref_line['y'] = y
#     ref_line['Name of Reference Line'] = name
#     return ref_line

# threshold = 100
# ## Add lines
# line1 = create_ref_line(threshold, 1, 15, 'Case doubles every day')
# line3 = create_ref_line(threshold, 3, 50, 'Case doubles every 3 days')
# line6 = create_ref_line(threshold, 6, 80, 'Case doubles every 6 days')
# line14 = create_ref_line(threshold, 14, 80, 'Case doubles every 2 weeks')
# line30 = create_ref_line(threshold, 30, 80, 'Case doubles every month')

In [56]:
### HOW NUMBER OF COVID-19 CASES INCREASES SINCE 100 CASES IN EACH COUNTRY
## Using FigureWidget

threshold = 100 # 100 cases

# Reference line:
def create_line(threshold, num_days, x_limit):
    return [0, np.log10(2**(0 + np.log2(threshold))), x_limit, 
            np.log10(2**(x_limit/num_days + np.log2(threshold)))]
line1 = create_line(threshold, 1, 15) # 'Case doubles every day'
line3 = create_line(threshold, 3, 50) # 'Case doubles every 3 days'
line6 = create_line(threshold, 6, 80) # 'Case doubles every 6 days'
line14 = create_line(threshold, 14, 80) # 'Case doubles every 2 weeks'
line30 = create_line(threshold, 30, 80) # 'Case doubles every month'
    
fig = go.FigureWidget()
fig.layout.hovermode = 'closest'
fig.layout.hoverdistance = -1 #ensures no "gaps" for selecting sparse data
default_linewidth = 2
highlighted_linewidth_delta = 2



## Update layout
fig.update_layout(title = {'text': '<b>Number of Confirmed Cases by Country Since 100 Cases</b>',
                           'x': 0.5},
                   xaxis_title = '<b>Days Since 100 Cases</b>',
                   yaxis_title = '<b>Confirmed Cases</b>',
                     legend = {'title': None})

## Add country data lines
annotation_list = [] # flat list of x_end, y_end, name per country for optional labels; lists can hold mixed types
for country_name in target_countries:
    country = countries_since_100cases[countries_since_100cases['Country/Region'] == country_name]
    x = country['Date Since 100 Cases']
    y = country['Confirmed Cases (log10)']
    annotation_list.extend([x.iloc[-1], y.iloc[-1], country_name])
    fig.add_trace(go.Scatter(x = x, y = y, mode='lines',
                         name = country_name,
                        showlegend = False,
                         opacity = 0.8,
                         line = dict(dash = 'solid', width = 2,
                                   color = 'rgb(200, 200, 200)')))
fig.add_shape(
        # Case doubles every day
            type="line",
            x0 = line1[0],
            y0 = line1[1],
            x1 = line1[2],
            y1 = line1[3],
            line = dict(
                color = 'rgb(200, 200, 200)',
                width = 2,
                dash = "dash",
            )
)
fig.add_shape(
        # Case doubles every 3 days
            type="line",
            x0 = line3[0],
            y0 = line3[1],
            x1 = line3[2],
            y1 = line3[3],
            line = dict(
                color = 'rgb(200, 200, 200)',
                width = 2,
                dash = "dash",
            )
)
fig.add_shape(
        # Case doubles every 6 days
            type="line",
            x0 = line6[0],
            y0 = line6[1],
            x1 = line6[2],
            y1 = line6[3],
            line = dict(
                color = 'rgb(200, 200, 200)',
                width = 2,
                dash = "dash",
            )
)
fig.add_shape(
        # Case doubles every 14 days
            type="line",
            x0 = line14[0],
            y0 = line14[1],
            x1 = line14[2],
            y1 = line14[3],
            line = dict(
                color = 'rgb(200, 200, 200)',
                width = 2,
                dash = "dash",
            )
)
fig.add_shape(
        # Case doubles every 1 month
            type="line",
            x0 = line30[0],
            y0 = line30[1],
            x1 = line30[2],
            y1 = line30[3],
            line = dict(
                color = 'rgb(200, 200, 200)',
                width = 2,
                dash = "dash",
            )
)
## Add annotation for reference lines 
fig.update_layout(
    annotations = [
        dict(
            x = line1[2],
            y = line1[3],
            # xanchor="right",
            text = "Case doubles every day"
        ),
        dict(
            x = line3[2],
            y = line3[3],
            text = "Case doubles every 3 days"
        ),
        dict(
            x = line6[2],
            y = line6[3],
            text = "Case doubles every 6 days"
        ),
        dict(
            x = line14[2],
            y = line14[3],
            text = "Case doubles every 2 weeks"
        ),
        dict(
            x = line30[2],
            y = line30[3],
            text = "Case doubles every 1 month"
        )
    ]
)

## Set y-axis to real number of cases (instead of log(cases)) to be more clear.
# fig.update_yaxes(
#     ticktext=["100 ", "1K ", "10K ", "100K ", "1M "],
#     tickvals=[2, 3, 4, 5, 6],
# )

# our custom event handler
def update_trace(trace, points, selector):
    # points.point_inds stores the points which were clicked on;
    # in all but one event it will be empty
    if len(points.point_inds) > 0:
        for i in range( len(fig.data) ):
            #fig.data[i]['line']['width'] = default_linewidth + highlighted_linewidth_delta * (i == points.trace_index)
            if i == points.trace_index:
                fig.data[i]['line']['color'] = 'red'
                fig.data[i]['line']['width'] = default_linewidth + highlighted_linewidth_delta
            else:
                fig.data[i]['line']['color'] = 'rgb(200, 200, 200)'
                fig.data[i]['line']['width'] = default_linewidth
                
# we need to add the on_click event to each trace separately       
for i in range( len(fig.data) ):
    fig.data[i].on_click(update_trace)

# let's show the figure 
fig

FigureWidget({
    'data': [{'line': {'color': 'rgb(200, 200, 200)', 'dash': 'solid', 'width': 2},
           …

In [57]:
### NOT IN USE, JUST A REFERENCE - ADD COUNTRY NAME TO EACH LINE ON GRAPH
### HOW NUMBER OF COVID-19 CASES INCREASES SINCE 100 CASES IN EACH COUNTRY
## Using FigureWidget

# threshold = 100 # 100 cases

# # Reference line:
# def create_line(threshold, num_days, x_limit):
#     return [0, np.log10(2**(0 + np.log2(threshold))), x_limit, 
#             np.log10(2**(x_limit/num_days + np.log2(threshold)))]
# line1 = create_line(100, 1, 15) # 'Case doubles every day'
# line3 = create_line(threshold, 3, 50) # 'Case doubles every 3 days'
# line6 = create_line(threshold, 6, 80) # 'Case doubles every 6 days'
# line14 = create_line(threshold, 14, 80) # 'Case doubles every 2 weeks'
# line30 = create_line(threshold, 30, 80) # 'Case doubles every month'
    
# fig = go.FigureWidget()
# fig.layout.hovermode = 'closest'
# fig.layout.hoverdistance = -1 #ensures no "gaps" for selecting sparse data
# default_linewidth = 2
# highlighted_linewidth_delta = 2



# ## Update layout
# fig.update_layout(title = {'text': '<b>Number of Confirmed Cases by Country Since 100 Cases</b>',
#                            'x': 0.5},
#                    xaxis_title = '<b>Days Since 100 Cases</b>',
#                    yaxis_title = '<b>Confirmed Cases</b>',
#                      legend = {'title': None})

# ## Add country data lines
# annotation_list = [] #add annotation to be use, lists can contain different variable types
# x_list = []
# y_list = []
# for country_name in target_countries:
#     country = countries_since_100cases[countries_since_100cases['Country/Region'] == country_name]
#     x = country['Date Since 100 Cases']
#     y = country['Confirmed Cases (log10)']
#     annotation_list.append(country_name)
#     x_list.append(x.iloc[-1])
#     y_list.append(y.iloc[-1])
    
#     #draw line
#     fig.add_trace(go.Scatter(x = x, y = y, mode='lines',
#                          name = country_name,
#                         showlegend = False,
#                          opacity = 0.8,
#                          line = dict(dash = 'solid', width = 2,
#                                    color = 'rgb(200, 200, 200)')))

# fig.add_shape(
#         # Case doubles every day
#             type="line",
#             x0 = line1[0],
#             y0 = line1[1],
#             x1 = line1[2],
#             y1 = line1[3],
#             line = dict(
#                 color = 'rgb(200, 200, 200)',
#                 width = 2,
#                 dash = "dash",
#             )
# )
# fig.add_shape(
#         # Case doubles every 3 days
#             type="line",
#             x0 = line3[0],
#             y0 = line3[1],
#             x1 = line3[2],
#             y1 = line3[3],
#             line = dict(
#                 color = 'rgb(200, 200, 200)',
#                 width = 2,
#                 dash = "dash",
#             )
# )
# fig.add_shape(
#         # Case doubles every 6 days
#             type="line",
#             x0 = line6[0],
#             y0 = line6[1],
#             x1 = line6[2],
#             y1 = line6[3],
#             line = dict(
#                 color = 'rgb(200, 200, 200)',
#                 width = 2,
#                 dash = "dash",
#             )
# )
# fig.add_shape(
#         # Case doubles every 14 days
#             type="line",
#             x0 = line14[0],
#             y0 = line14[1],
#             x1 = line14[2],
#             y1 = line14[3],
#             line = dict(
#                 color = 'rgb(200, 200, 200)',
#                 width = 2,
#                 dash = "dash",
#             )
# )
# fig.add_shape(
#         # Case doubles every 1 month
#             type="line",
#             x0 = line30[0],
#             y0 = line30[1],
#             x1 = line30[2],
#             y1 = line30[3],
#             line = dict(
#                 color = 'rgb(200, 200, 200)',
#                 width = 2,
#                 dash = "dash",
#             )
# )
# ## Add annotation for reference lines 
# fig.update_layout(
#     annotations = [
#         dict(
#             x = line1[2],
#             y = line1[3],
#             # xanchor="right",
#             text = "Case doubles every day"
#         ),
#         dict(
#             x = line3[2],
#             y = line3[3],
#             text = "Case doubles every 3 days"
#         ),
#         dict(
#             x = line6[2],
#             y = line6[3],
#             text = "Case doubles every 6 days"
#         ),
#         dict(
#             x = line14[2],
#             y = line14[3],
#             text = "Case doubles every 2 weeks"
#         ),
#         dict(
#             x = line30[2],
#             y = line30[3],
#             text = "Case doubles every 1 month"
#         )
#     ]
# )

# fig.add_trace(go.Scatter(
#     x=x_list,
#     y=y_list,
#     mode="text",
#     name="Country name",
#     text=annotation_list,
#     textposition="top right",
#     textfont=dict(
#         color="crimson")
# ))
# fig.update_layout(showlegend=False)

# # Set y-axis to real number of cases (instead of log(cases)) to be more clear.
# fig.update_yaxes(
#     ticktext=["100 ", "1K ", "10K ", "100K ", "1M "],
#     tickvals=[2, 3, 4, 5, 6],
# )

# # our custom event handler
# def update_trace(trace, points, selector):
#     # this list stores the points which were clicked on
#     # in all but one event it will be empty
#     if len(points.point_inds) > 0:
#         for i in range( len(fig.data) ):
#             #fig.data[i]['line']['width'] = default_linewidth + highlighted_linewidth_delta * (i == points.trace_index)
#             if i == points.trace_index:
#                 fig.data[i]['line']['color'] = 'red'
#                 fig.data[i]['line']['width'] = default_linewidth + highlighted_linewidth_delta
#             else:
#                 fig.data[i]['line']['color'] = 'rgb(200, 200, 200)'
#                 fig.data[i]['line']['width'] = default_linewidth
                
# # we need to add the on_click event to each trace separately       
# for i in range( len(fig.data) ):
#     fig.data[i].on_click(update_trace)

# # let's show the figure 
# fig

**--- NOTE ABOUT HOW TO CALCULATE THE REFERENCE LINE FOR GRAPH ---**

Set threshold = 50 (for 50 cases at day 0)

**1. Number of cases doubles every day:**

x = 0, 1, 2, ...

y = 2^(x + log2(threshold))
=> The result of y would be: 50, 100, 200, 400, 800, ...

**2. Number of cases doubles every 2 days:**

x = 0, 1, 2...

y = 2^(x/2 + log2(threshold))
=> The result of y would be: 50, 71, 100, 141, 200, 283, 400, ...

**3. Number of cases doubles every month (30 days):**

x = 0, 1, 2...

y = 2^(x/30 + log2(threshold))
=> The result of y would be: 50, 51.17, 52.36, 53.59, ...
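
The three cases above can be checked with a few lines of plain Python. This is a small sketch: the function name `reference_line` is hypothetical, and the formula is the one from the note, y = 2^(x/d + log2(threshold)), which is the same as threshold * 2^(x/d).

```python
import math

def reference_line(threshold, doubling_days, n_days):
    """Cases on days 0..n_days-1 if the count doubles every
    `doubling_days` days, starting from `threshold` on day 0."""
    return [2 ** (x / doubling_days + math.log2(threshold))
            for x in range(n_days)]

daily = reference_line(50, 1, 5)         # doubles every day
every_2_days = reference_line(50, 2, 7)  # doubles every 2 days
monthly = reference_line(50, 30, 4)      # doubles every 30 days

print([round(v) for v in daily])
print([round(v) for v in every_2_days])
print([round(v, 2) for v in monthly])
```

For plotting on the log10 axis used in the graphs, take `math.log10` of each value (or `np.log10` on an array, as the cells above do).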

**G. FURTHER READING**

My report on Medium about the Vietnam data I worked on, combined with the world data analysis (coming soon).

My other kernels about the same topic: https://www.kaggle.com/nhntran/covid-19-vietnam-data-eda-and-visualization?scriptVersionId=32963257


**If this kernel helps you, please upvote. Thank you!**