<h>
<font face = "Verdana" size ="5">This notebook visualizes the spread of COVID-19 across the world from multiple analytical perspectives.</font>
<h>
    <font face = "Verdana" size ="4">
    <br>
    <br>Data: <a href='https://github.com/CSSEGISandData/COVID-19'>https://github.com/CSSEGISandData/COVID-19</a>
    <br>Learn more from the <a href='https://www.who.int/emergencies/diseases/novel-coronavirus-2019'>WHO</a>
    <br>Learn more from the <a href='https://www.cdc.gov/coronavirus/2019-ncov'>CDC</a>
    <br>Learn more from <a href='https://github.com/therealcyberlord'>Xingyu Bian</a>
    <br>Map Visualizations from  <a href='https://towardsdatascience.com/coronavirus-data-visualizations-using-plotly-cfbdb8fcfc3d'>Terence Shin</a>
    <br>
    </font>
    
   <font face = "Verdana" size ="4">
   <br>Feel free to send me feedback. 
   <br>Author: Zhang Nan. 
   <br>Contact: adam21zhang@gmail.com
    <br> Last update: 3/28/2020 10:00 PM
    <br> Run the notebook yourself for the best view of the graphs. 
 </font>
 <font face = "Verdana" size ="1">
<center><img src='https://blog.covance.com/wp-content/uploads/2017/04/vaccine-graphic-covance-april-768x512.jpg'>
 Source: https://blog.covance.com/wp-content/uploads/2017/04/vaccine-graphic-covance-april-768x512.jpg </center> 
    
 <font face = "Verdana" size ="4"> Keep strong, world! Stay safe. </font>


In [None]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

In [None]:
import numpy as np 
import matplotlib.pyplot as plt 
import matplotlib.colors as mcolors
import pandas as pd 
import random
import math
import time
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error
import datetime
import operator 
plt.style.use('fivethirtyeight')
%matplotlib inline 

Import the data (re-run this cell daily to pull the latest figures)

In [None]:
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recoveries_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
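
Before reshaping, it can help to confirm that a loaded table has the expected wide layout: four identifier columns followed by one column per date. The sketch below is a minimal check under that assumption; `split_columns` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd

ID_COLUMNS = ['Province/State', 'Country/Region', 'Lat', 'Long']

def split_columns(df):
    """Hypothetical helper: return (id_columns, date_columns) or raise if an id column is missing."""
    missing = [c for c in ID_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing identifier columns: {missing}")
    date_columns = [c for c in df.columns if c not in ID_COLUMNS]
    return ID_COLUMNS, date_columns

# Toy frame in the same wide layout (values are made up for illustration)
toy_wide = pd.DataFrame({
    'Province/State': [None, 'Hubei'],
    'Country/Region': ['Singapore', 'China'],
    'Lat': [1.28, 30.97],
    'Long': [103.83, 112.27],
    '1/22/20': [0, 444],
    '1/23/20': [1, 444],
})
id_cols, date_cols = split_columns(toy_wide)
```

The same call could be applied to `confirmed_df`, `deaths_df` and `recoveries_df` after they are loaded.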

# Plot the overall world trend 
*     Plus details for the 10 most affected countries


In [None]:
# Merge the three tables into one so that the data is easier to handle: new table columns = ['Province/State','Country/Region','Lat','Long', 'Date', 'No.Confirmed', 'No.death', 'No.Recoveries']
# Unpivot the tables (one row per location and date)
basic_columns = ['Province/State','Country/Region', 'Lat','Long']
confirmed_df_pivote = confirmed_df.melt(id_vars=basic_columns, var_name='Date', value_name='No.Confirmed')
confirmed_df_pivote['Date'] = pd.to_datetime(confirmed_df_pivote['Date'])
deaths_df_pivote = deaths_df.melt(id_vars=basic_columns, var_name='Date', value_name='No.death')
deaths_df_pivote['Date'] = pd.to_datetime(deaths_df_pivote['Date'])
recoveries_df_pivote = recoveries_df.melt(id_vars=basic_columns, var_name='Date', value_name='No.Recoveries')
recoveries_df_pivote['Date'] = pd.to_datetime(recoveries_df_pivote['Date'])
# merge the separate tables into one 
result = pd.merge(confirmed_df_pivote, deaths_df_pivote,  how='left', on = ['Province/State','Country/Region', 'Lat','Long', 'Date'])
result = pd.merge(result, recoveries_df_pivote,  how='left', on = ['Province/State','Country/Region', 'Lat','Long', 'Date'])
result.head()
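
The unpivot-and-merge step can be tried on a small made-up example. The sketch below assumes the same four identifier columns and uses the `indicator=True` option of `pd.merge` to flag rows of the left table that found no partner, which can happen with a left join when the tables do not list exactly the same locations. The toy tables and the `to_long` helper are hypothetical.

```python
import pandas as pd

id_cols = ['Province/State', 'Country/Region', 'Lat', 'Long']

# Made-up wide tables: one row per location, one column per date
toy_confirmed = pd.DataFrame({
    'Province/State': ['A', 'B'],
    'Country/Region': ['X', 'X'],
    'Lat': [1.0, 2.0],
    'Long': [3.0, 4.0],
    '1/22/20': [1, 0],
    '1/23/20': [5, 2],
})
# The second table lacks province 'B', so those rows cannot be matched
toy_deaths = pd.DataFrame({
    'Province/State': ['A'],
    'Country/Region': ['X'],
    'Lat': [1.0],
    'Long': [3.0],
    '1/22/20': [0],
    '1/23/20': [1],
})

def to_long(df, value_name):
    """Unpivot a wide table to one row per (location, date)."""
    long_df = df.melt(id_vars=id_cols, var_name='Date', value_name=value_name)
    long_df['Date'] = pd.to_datetime(long_df['Date'], format='%m/%d/%y')
    return long_df

toy_result = pd.merge(to_long(toy_confirmed, 'No.Confirmed'),
                      to_long(toy_deaths, 'No.death'),
                      how='left', on=id_cols + ['Date'], indicator=True)

# Rows present only in the left table keep NaN in the 'No.death' column
unmatched = toy_result[toy_result['_merge'] == 'left_only']
```

Inspecting `unmatched` on the real data would show which locations are missing from the deaths or recoveries tables.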

In [None]:
# get overall trend data first

df_overall  = result.groupby('Date').agg({'No.Confirmed':'sum', 'No.death':'sum', 'No.Recoveries': 'sum'})\
                    .reset_index().sort_values(by='Date',ascending=True)

In [None]:
import plotly.graph_objects as go
# Create traces (go.Scatter with mode='lines' replaces the deprecated go.Line)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_overall['Date'], y=df_overall['No.Confirmed'],
                    mode='lines',
                    name='No.Confirmed'))
fig.add_trace(go.Scatter(x=df_overall['Date'], y=df_overall['No.death'],
                    mode='lines',
                    name='No.death'))
fig.add_trace(go.Scatter(x=df_overall['Date'], y=df_overall['No.Recoveries'],
                    mode='lines', name='No.Recoveries'))

fig.update_layout(title_text="<b>Global trend</b>",)

fig.update_yaxes(title_text="<b>Numbers</b>")

fig.show()

In [None]:
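# the top 10 countries by confirmed cases on the latest date
# (the counts are cumulative, so summing over all dates would overweight early outbreaks)
latest = result.loc[result['Date'] == result['Date'].max()]

top_countries  = list(latest.groupby('Country/Region')
                .agg({'No.Confirmed':'sum'})
                .sort_values(by='No.Confirmed',ascending=False)
                .reset_index()
                .head(10)['Country/Region'])

df_topCountries = result.loc[result['Country/Region'].isin(top_countries)]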
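
The counts in `result` are cumulative, so a ranking can be taken either from the sum over all dates or from the latest date alone. The sketch below contrasts the two on made-up numbers; the country names and values are illustrative only.

```python
import pandas as pd

# Made-up cumulative counts: 'Early' plateaus, 'Late' overtakes it at the end
toy = pd.DataFrame({
    'Country/Region': ['Early'] * 3 + ['Late'] * 3,
    'Date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03'] * 2),
    'No.Confirmed': [80, 80, 80, 10, 50, 100],
})

# Summing a cumulative column over all dates favours early outbreaks
by_sum = toy.groupby('Country/Region')['No.Confirmed'].sum().idxmax()

# Ranking on the latest date uses the current cumulative total instead
latest_toy = toy[toy['Date'] == toy['Date'].max()]
by_latest = latest_toy.groupby('Country/Region')['No.Confirmed'].sum().idxmax()
```

Here the two rankings disagree: the sum over dates puts 'Early' first, while the latest totals put 'Late' first.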

In [None]:
# plot confirmed cases for the top 10 countries  
fig = go.Figure()

for country in top_countries:
   
    df_inter =  df_topCountries.loc[df_topCountries['Country/Region'] == country]\
                .groupby(['Date','Country/Region'])\
                .agg({'No.Confirmed':'sum'})\
                .reset_index()\
                .sort_values(by='Date',ascending=True)
    
    fig.add_trace(go.Scatter(x=df_inter['Date'], y=df_inter['No.Confirmed'],
                    mode='lines',
                    name= country))

fig.update_layout(title_text="<b>Top 10 countries trend for confirmed cases</b>",)

fig.update_yaxes(title_text="<b>Numbers</b>")

fig.show()

In [None]:
# plot deaths for the top 10 countries  
fig = go.Figure()

for country in top_countries:
   
    df_inter =  df_topCountries.loc[df_topCountries['Country/Region'] == country]\
                .groupby(['Date','Country/Region'])\
                .agg({'No.death':'sum'})\
                .reset_index()\
                .sort_values(by='Date',ascending=True)
    
    fig.add_trace(go.Scatter(x=df_inter['Date'], y=df_inter['No.death'],
                    mode='lines',
                    name= country))

fig.update_layout(title_text="<b>Top 10 countries trend for deaths</b>",)

fig.update_yaxes(title_text="<b>Numbers</b>")

fig.show()

In [None]:
# plot recoveries for the top 10 countries  
fig = go.Figure()

for country in top_countries:
   
    df_inter =  df_topCountries.loc[df_topCountries['Country/Region'] == country]\
                .groupby(['Date','Country/Region'])\
                .agg({'No.Recoveries':'sum'})\
                .reset_index()\
                .sort_values(by='Date',ascending=True)
    
    fig.add_trace(go.Scatter(x=df_inter['Date'], y=df_inter['No.Recoveries'],
                    mode='lines',
                    name= country))

fig.update_layout(title_text="<b>Top 10 countries trend for recoveries</b>",)

fig.update_yaxes(title_text="<b>Numbers</b>")

fig.show()

# Plot the dynamic global spread

In [None]:
# format the dates as 'YYYY-MM-DD' strings for the animation frames
result['Date'] = result['Date'].dt.strftime('%Y-%m-%d')
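
One plausible benefit of the `'%Y-%m-%d'` format, assumed here rather than stated in the notebook, is that such strings sort in chronological order, so sorting by the string column keeps the animation frames in date order. The sketch below compares this with month/day/year strings on a few made-up dates.

```python
from datetime import date

days = [date(2020, 1, 22), date(2020, 2, 3), date(2020, 10, 1)]

# ISO-style strings: zero-padded, year first
iso = [d.strftime('%Y-%m-%d') for d in days]
# Month/day/year strings without zero padding
us_style = [f"{d.month}/{d.day}/{d.year % 100}" for d in days]

iso_sorted = sorted(iso)            # same order as the dates themselves
us_sorted = sorted(us_style)        # '10/1/20' lands before '2/3/20'
```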

In [None]:
result_map = result.groupby(['Date','Country/Region'])\
                .agg({'No.Confirmed':'sum', 'No.death':'sum', 'No.Recoveries': 'sum'})\
                .reset_index()\
                .sort_values(by='Date',ascending=True)

In [None]:
import plotly.express as px
fig = px.choropleth(result_map, 
                    locations="Country/Region", 
                    locationmode = "country names",
                    color="No.Confirmed", 
                    hover_name="Country/Region", 
                    animation_frame="Date"
                   )

fig.update_layout(
    title_text = 'Spread of Coronavirus',
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))
    
fig.show()

In [None]:
set(result['Country/Region'])

In [None]:
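def plot_country(result,country):

    # sum all provinces of the country so that each date has a single row
    df = result.loc[result['Country/Region'] == country]\
               .groupby('Date')\
               .agg({'No.Confirmed':'sum', 'No.death':'sum', 'No.Recoveries': 'sum'})\
               .reset_index()\
               .sort_values(by='Date',ascending=True)
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df['Date'], y=df['No.Confirmed'],
                        mode='lines',
                        name='No.Confirmed'))
    fig.add_trace(go.Scatter(x=df['Date'], y=df['No.death'],
                        mode='lines',
                        name='No.death'))
    fig.add_trace(go.Scatter(x=df['Date'], y=df['No.Recoveries'],
                        mode='lines', name='No.Recoveries'))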

    fig.update_layout(title_text=f"<b>{country} trend</b>",)

    fig.update_yaxes(title_text="<b>Numbers</b>")

    fig.show()
    
plot_country(result,'Singapore')

In [None]:
plot_country(result,'US')

In [None]:
plot_country(result,'Japan')
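
A country that is split into several province rows has more than one row per date, and a line plot needs a single value per date. The sketch below shows one way to collapse the provinces on made-up data; `country_totals` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Made-up data: country 'X' is reported as two provinces
toy = pd.DataFrame({
    'Province/State': ['P1', 'P2', 'P1', 'P2'],
    'Country/Region': ['X'] * 4,
    'Date': ['2020-03-01', '2020-03-01', '2020-03-02', '2020-03-02'],
    'No.Confirmed': [10, 5, 12, 9],
})

def country_totals(frame, country):
    """Sum all provinces of one country into a single row per date."""
    rows = frame.loc[frame['Country/Region'] == country]
    return (rows.groupby('Date', as_index=False)['No.Confirmed']
                .sum()
                .sort_values('Date'))

totals = country_totals(toy, 'X')
```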

# Conclusions
1. China is entering the later diffusion period of COVID-19: its cumulative numbers of confirmed cases and deaths are now increasing slowly, while other countries such as Italy and the US are in the diffusion 'take-off' period.
2. According to the analysis of China's diffusion pattern, China's 'take-off' period lasted 30 days (from Jan 31 to Mar 1). Assuming other countries take strict and effective measures as China did, Italy's COVID-19 spread would enter the later diffusion period in about 30 days.
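
The 30-day figure can be checked with plain date arithmetic. The sketch below counts the days between the two dates given above; `projected_end` is a hypothetical helper that shifts any assumed take-off start by the same duration, and it is an illustration of the assumption in point 2, not a forecast.

```python
from datetime import date, timedelta

start = date(2020, 1, 31)
end = date(2020, 3, 1)

# 2020 is a leap year, so February contributes 29 days
take_off_days = (end - start).days

def projected_end(take_off_start, duration_days=take_off_days):
    """Hypothetical helper: shift a take-off start date by the same duration."""
    return take_off_start + timedelta(days=duration_days)
```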