<h1>COVID19 World and Italy monitor<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Notebook-Description" data-toc-modified-id="Notebook-Description-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Notebook Description</a></span></li><li><span><a href="#Italy-Monitor" data-toc-modified-id="Italy-Monitor-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Italy Monitor</a></span><ul class="toc-item"><li><span><a href="#Geographical-analysis-of-traffic-for-Milan" data-toc-modified-id="Geographical-analysis-of-traffic-for-Milan-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Geographical analysis of traffic for Milan</a></span></li><li><span><a href="#Provinces" data-toc-modified-id="Provinces-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Provinces</a></span><ul class="toc-item"><li><span><a href="#Top-provinces-for-the-day" data-toc-modified-id="Top-provinces-for-the-day-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Top provinces for the day</a></span></li><li><span><a href="#Provinces-analysis-per-region" data-toc-modified-id="Provinces-analysis-per-region-2.2.2"><span class="toc-item-num">2.2.2&nbsp;&nbsp;</span>Provinces analysis per region</a></span></li></ul></li><li><span><a href="#Regional-analysis" data-toc-modified-id="Regional-analysis-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Regional analysis</a></span><ul class="toc-item"><li><span><a href="#Daily-cases" data-toc-modified-id="Daily-cases-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>Daily cases</a></span></li><li><span><a href="#Total-cases-decomposition" data-toc-modified-id="Total-cases-decomposition-2.3.2"><span class="toc-item-num">2.3.2&nbsp;&nbsp;</span>Total cases decomposition</a></span></li><li><span><a href="#Evolution-analysis" data-toc-modified-id="Evolution-analysis-2.3.3"><span class="toc-item-num">2.3.3&nbsp;&nbsp;</span>Evolution analysis</a></span></li><li><span><a href="#Model-fitting" data-toc-modified-id="Model-fitting-2.3.4"><span 
class="toc-item-num">2.3.4&nbsp;&nbsp;</span>Model fitting</a></span></li></ul></li></ul></li><li><span><a href="#World-Monitor" data-toc-modified-id="World-Monitor-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>World Monitor</a></span><ul class="toc-item"><li><span><a href="#Top-countries" data-toc-modified-id="Top-countries-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Top countries</a></span></li><li><span><a href="#Total-cases-decomposition" data-toc-modified-id="Total-cases-decomposition-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Total cases decomposition</a></span></li><li><span><a href="#Country-comparisons" data-toc-modified-id="Country-comparisons-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Country comparisons</a></span></li><li><span><a href="#Model-fitting" data-toc-modified-id="Model-fitting-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Model fitting</a></span></li></ul></li></ul></div>

# Notebook Description

<img src="image.png">

This repo contains analyses built on data about the global and Italian spread of COVID-19, which may be useful for monitoring how the coronavirus spreads around the world.

This monitor does not attempt to predict the future evolution of the virus; it only offers a visual tool to capture its dynamics.

Data are automatically downloaded from publicly available repos such as:

1) Johns Hopkins CSSE (https://github.com/CSSEGISandData/COVID-19) for world data

2) Protezione Civile Italiana (https://github.com/pcm-dpc/COVID-19) for national data

Just scroll down the notebook to run the various analyses on both worldwide and Italian data.

Code available at my repo:
https://github.com/mspadaccino/COVID-19

(Jupyter notebook and code by Maurizio Spadaccino, April 2020)

In [9]:
%load_ext autoreload
%autoreload 2
import sys
import os
import types
sys.path.append('../')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from src.models import func_log, func_gomp, func_exp, func_ext_log
import inspect
import plotly.express as px
from sklearn.linear_model import LinearRegression
from scipy.integrate import odeint
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
from src.data_downloader import DATA_REPOS, download_from_repo
from src.tools import add_extra_features
from matplotlib.ticker import ScalarFormatter
import matplotlib.ticker as ticker
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import plotly.graph_objects as go
from plotly.graph_objs import Layout
import ast
pd.set_option('display.max_columns', None)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [2]:
dest='../data'

In [22]:
print('updating datasets from repos...')
print('downloading Italian data')
download_from_repo(DATA_REPOS['italy']['url'], filenames=DATA_REPOS['italy']['streams'], dest=dest)
print('downloading world data')
download_from_repo(DATA_REPOS['world']['url'], filenames=DATA_REPOS['world']['streams'], dest=dest)

updating datasets from repos...
downloading Italian data
updated  /dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv
updated  /dati-regioni/dpc-covid19-ita-regioni.csv
updated  /dati-province/dpc-covid19-ita-province.csv
could not retrieve repo infos,  Error -5 while decompressing data: incomplete or truncated stream
downloading world data
updated  /csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv
updated  /csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
updated  /csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv
last commit  2020-04-08 01:55:40


In [23]:
# loading datasets
# drop(columns=...) replaces the positional axis argument, which recent pandas versions no longer accept
df_naz = pd.read_csv(os.path.join(dest,'dpc-covid19-ita-andamento-nazionale.csv')).drop(columns='stato')
reg = pd.read_csv(os.path.join(dest,'dpc-covid19-ita-regioni.csv'))
prov = pd.read_csv(os.path.join(dest,'dpc-covid19-ita-province.csv')).drop(columns='stato')
df_naz.index = pd.to_datetime(df_naz.index)
reg['data'] = pd.to_datetime(reg['data'])
prov['data'] = pd.to_datetime(prov['data'])
df_world_confirmed = pd.read_csv(os.path.join(dest,'time_series_covid19_confirmed_global.csv'))
df_world_deaths = pd.read_csv(os.path.join(dest,'time_series_covid19_deaths_global.csv'))
df_world_recovered = pd.read_csv(os.path.join(dest,'time_series_covid19_recovered_global.csv'))
populations = pd.read_csv(os.path.join(dest,'API_SP.POP.TOTL_DS2_en_csv_v2.csv'), skiprows=4, engine='python').set_index('Country Name')['2018']
ita_populations = pd.read_csv(os.path.join(dest,'popitaregions.csv'))
df_world_confirmed['pop'] = df_world_confirmed['Country/Region'].map(populations)
df_world_deaths['pop'] = df_world_deaths['Country/Region'].map(populations)
df_world_recovered['pop'] = df_world_recovered['Country/Region'].map(populations)
df_naz = add_extra_features(df_naz)
regions = reg.groupby('denominazione_regione')
df_reg = {}
df_reg['Italy'] = df_naz
for item in regions.groups:
    df_reg[item] = add_extra_features(regions.get_group(item)).replace((np.inf, np.nan), 0)

provinces = prov.groupby('sigla_provincia')
df_prov = pd.DataFrame()
for item in provinces.groups:
    df_prov = pd.concat((df_prov,add_extra_features(provinces.get_group(item)).replace((np.inf, np.nan), 0)), axis=0)

In [24]:
#fixing country and province different names from different datasets
pop_replace = [('US', 'United States'), ('Korea, South', 'Korea, Rep.'), 
               ('Venezuela','Venezuela, RB'), ('Bahamas','Bahamas, The'), 
               ('Iran','Iran, Islamic Rep.'), ('Russia','Russian Federation'), 
               ('Egypt','Egypt, Arab Rep.'), ('Syria','Syrian Arab Republic'),
               ('Slovakia','Slovak Republic'),('Czechia','Czech Republic'),
               ('Congo (Brazzaville)','Congo, Rep.'),
               ('Congo (Kinshasa)','Congo, Dem. Rep.'),('Kyrgyzstan','Kyrgyz Republic'),
               ('Laos','Lao PDR'),('Brunei','Brunei Darussalam'),
               ('Gambia', 'Gambia, The')]
for item in pop_replace:
    try:
        populations.loc[item[0]] = populations.loc[item[1]]
        del populations[item[1]]
    except Exception as e:
        print(e)
        
pops = ita_populations.loc[ita_populations['Regione']=='Trentino-Alto Adige', ['Popolazione','Superficie sqkm','ab/sqkm','Numero_comuni','Numero_province']].values/2
newdf = pd.DataFrame(index = ['P.A. Trento', 'P.A. Bolzano', 'Italy'], columns=ita_populations.set_index('Regione').columns)
newdf.loc['P.A. Trento']=pops[0]
newdf.loc['P.A. Bolzano']=pops[0]
newdf.loc['Italy']= [populations.loc['Italy'], 0., 0., 0., 0.]
newdf.reset_index(inplace=True)
newdf.rename(columns={'index': 'Regione'}, inplace=True)
ita_populations = pd.concat((ita_populations,newdf)).set_index('Regione')

In [25]:
# defining labels for use in next functions
orig_data_columns = ['ricoverati_con_sintomi','terapia_intensiva','totale_ospedalizzati','isolamento_domiciliare',
                    'totale_positivi',
                     #'variazione_totale_positivi',
                     'nuovi_positivi',
                     'dimessi_guariti','deceduti','totale_casi','tamponi']                    
extra_data_columns = ['growth_factor','deceduti_su_tot','deceduti_su_dimessi', 'totale_casi_su_tamponi',
                      'totale_ospedalizzati_su_tamponi','deceduti_su_tamponi']
delta_data_columns = ['daily_'+item for item in orig_data_columns]
percent_delta_data_columns = ['%daily_'+item for item in orig_data_columns]
data_columns = orig_data_columns + extra_data_columns + delta_data_columns + percent_delta_data_columns
prov_data_columns = ['totale_casi', 'daily_totale_casi', '%daily_totale_casi', 'growth_factor']
models = {'gompertz': func_gomp, 'logistic':func_log, 'extended logistic':func_ext_log, 
          'exponential':func_exp, 'log_linear': 'log', 'no_fit':'actual'}
countries_columns = df_world_confirmed['Country/Region'].unique()
countries_labels = ['confirmed', 'recovered', 'deaths', 'daily_confirmed', 'daily_recovered', 'daily_deaths',
                   '%daily_confirmed', '%daily_recovered', '%daily_deaths']

# Italy Monitor

## Geographical analysis of traffic for Milan

This section provides an analysis of traffic variations of main businesses with respect to past observed data, as downloaded from Google. This gets updated once in a while to provide a visual representation of the effects of lockdown in the Milan area.
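
The traffic variation described above can be sketched on a toy table as follows. This is only an illustration under assumptions: the two places and their scores are invented, and the layout of the `populartimes` column (one dict per weekday holding 24 hourly scores) is inferred from how the cell below reads it. The helper `average_popularity` is hypothetical and not part of the repo.

```python
import numpy as np
import pandas as pd

# Hypothetical records mimicking the assumed layout of 'populartimes':
# one dict per weekday, each with 24 hourly popularity scores.
places = pd.DataFrame({
    "name": ["Cafe A", "Gym B"],
    "current_popularity": [12, 30],
    "populartimes": [
        [{"name": d, "data": [40] * 24} for d in ("Monday", "Tuesday")],
        [{"name": d, "data": [25] * 24} for d in ("Monday", "Tuesday")],
    ],
})


def average_popularity(populartimes):
    """Mean of the daily means of the hourly popularity scores."""
    daily_means = [np.mean(day["data"]) for day in populartimes]
    return float(np.mean(daily_means)) if daily_means else np.nan


places["avg_popularity"] = places["populartimes"].apply(average_popularity)

# Relative change of live traffic versus the historical average, in percent.
places["traffic_%_increase"] = (
    (places["current_popularity"] / places["avg_popularity"] - 1) * 100
).round(2)

print(places[["name", "avg_popularity", "traffic_%_increase"]])
```

A negative value means the place is currently less busy than its historical average, which is the expected pattern during a lockdown.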

In [26]:
pd.options.mode.chained_assignment = None
date = '04-08-2020'
data = pd.read_csv('../data/geodata/places_popularity_{}.csv'.format(date))
MILAN_CENTER = (45.4654219, 9.1859243)
data['populartimes'] = data['populartimes'].fillna('[]').apply(ast.literal_eval)
# take the first listed type of each place (None when the list is empty)
data['business'] = data['types'].fillna('[]').apply(ast.literal_eval).apply(lambda x: x[0] if x else None)
data['Monday'] = None
data['Tuesday'] = None
data['Wednesday'] = None
data['Thursday'] = None
data['Friday'] = None
data['Saturday'] = None
data['Sunday'] = None
for index, row in enumerate(data.iterrows()):
    for day in row[1]['populartimes']:
        # assign positionally on the frame itself instead of through chained indexing
        data.iloc[index, data.columns.get_loc(day['name'])] = np.mean(day['data'])
data['avg_popularity'] = data[['Monday','Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']].mean(1)
data['traffic_%_increase'] = data['current_popularity'] / data['avg_popularity'] - 1
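types_to_check = data.types.dropna().unique()
filtered = pd.DataFrame()
for mytype in types_to_check:
    # regex=False matches the type string literally: it contains brackets, which are regex metacharacters
    filtered = pd.concat((filtered,data[data.types.str.contains(mytype, regex=False, na=False)][['name','types','coordinates.lat','coordinates.lng','traffic_%_increase','current_popularity']].dropna()), axis=0)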
filtered['types'] = filtered['types'].fillna('['']').apply(lambda x: ast.literal_eval(x)).apply(lambda x: x[0])
filtered['traffic_%_increase'] = np.round(filtered['traffic_%_increase']*100,2)
import plotly.express as px
fig = px.scatter_mapbox(filtered, 
                        lat='coordinates.lat', 
                        lon='coordinates.lng', 
                        color='traffic_%_increase', 
                        size = filtered['current_popularity'],
                        labels = 'traffic_%_increase',
                        text = 'types',
                        hover_name='name',
                        zoom = 11,
                        height=800,
                        mapbox_style="open-street-map",                        
                        title=str('current traffic % difference for Milan area for date {}'.format(date))
                            
               )
fig.update_layout(barnorm='percent')
fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'))
fig.show()
pd.options.mode.chained_assignment = 'warn'

## Provinces

### Top provinces for the day

In this section you can view, for any selected date, how cases are distributed geographically across the Italian provinces.

See which provinces recorded the highest values of the selected metric for the day:

In [30]:
@interact
def get_top_provinces(label= prov_data_columns, 
                      top_prov=widgets.IntSlider(min=1,max=50,step=1,value=20), date=widgets.DatePicker(
                      description='Pick a Date',value=pd.to_datetime(df_prov.index.max())),
                      show_map=True, show_grid=False):
    try:
        df_prov.index = pd.to_datetime(df_prov.index)

        if len(label) == 0:
            label = prov_data_columns[:1]
        label = list([label])

        tempdf = df_prov.loc[str(date)][['sigla_provincia','denominazione_provincia', 'lat', 'long']+ label].sort_values(by=label, 
             ascending=False)[:top_prov].set_index('sigla_provincia')

        if show_map:
            fig = px.scatter_mapbox(tempdf, 
#                         lat='lat', lon='long', z=label[0], radius=10, 
                        lat='lat', lon='long', color=label[0], size=label[0],
                        labels = label[0],
                        hover_name='denominazione_provincia',
                        zoom=4.5,  height=800,
                        mapbox_style="open-street-map",                        
                        title='top {} provinces on day {} for {}'.format(top_prov, date.strftime("%m/%d/%Y"), label[0]),
               )
            fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'))
            fig.show()
        else:
            fig = px.bar(tempdf[label].reset_index(), x=label[0], y='sigla_provincia', orientation='h')
            fig.update_layout(showlegend=True,title='top {} provinces on day {}'.format(top_prov, date.strftime("%m/%d/%Y")),
                             paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
            fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='gray')
            fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='gray')

            fig.show()
    except Exception as e:
        print(e)

interactive(children=(Dropdown(description='label', options=('totale_casi', 'daily_totale_casi', '%daily_total…

### Provinces analysis per region

In this monitor, you can analyse virus statistics for the provinces of the selected region on a given date:

In [31]:
@interact
def get_prov_data(label=prov_data_columns, region = list(df_prov.denominazione_regione.unique()),date=widgets.DatePicker(
                      description='Pick a Date',value=pd.to_datetime(df_prov.index.max())), show_grid=False):
    try:
        df_prov.index = pd.to_datetime(df_prov.index)
        if isinstance(df_prov.groupby('denominazione_regione').get_group(region).loc[date
            ], pd.Series):
            temp = pd.DataFrame(df_prov.groupby('denominazione_regione').get_group(region).loc[date]).T
        else:    
            temp = df_prov.groupby('denominazione_regione').get_group(region).loc[date
            ].set_index('denominazione_provincia')[label].sort_values()#.plot(kind='barh', 
        fig = px.bar(temp.reset_index(), x=label, y='denominazione_provincia', orientation='h')
        fig.update_layout(showlegend=True,title='{} on day {}'.format(label, date.strftime("%m/%d/%Y")),
                             paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')

        fig.show()
    except Exception as e:
        print(e)

interactive(children=(Dropdown(description='label', options=('totale_casi', 'daily_totale_casi', '%daily_total…

## Regional analysis

### Daily cases

In this monitor, you can compare data on a given date for one or more regions, in both absolute and relative terms.

Please note: multiple items can be selected in both the region and data fields.
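
As a minimal sketch of the per-million normalization behind the `cases_per_mln_people` flag, the snippet below rescales counts by population. The region names and figures are made up for illustration and do not come from the datasets.

```python
import pandas as pd

# Hypothetical figures, for illustration only (not taken from the datasets).
cases = pd.Series({"Region A": 5000, "Region B": 1200}, name="totale_casi")
population = pd.Series({"Region A": 10_000_000, "Region B": 600_000})

# Cases per million inhabitants: multiply by 1e6 and divide by the population.
cases_per_million = cases * 1e6 / population

print(cases_per_million)
```

Here the smaller region ends up with the higher relative incidence, which is why the normalized view can rank regions differently from the absolute one.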

In [32]:
@interact
def get_values_for_day(regions = widgets.SelectMultiple(description="regions",options=list(df_reg.keys())),
                       labels = widgets.SelectMultiple(description="data",options=data_columns),
                       date=widgets.DatePicker(description='Pick a Date',value=pd.to_datetime(df_prov.index.max())),
                       cases_per_mln_people=False, show_grid=False):
    try:
        if len(regions) == 0:
            regions = ['Italy']
        regions = list(regions)    
        if len(labels) == 0:
            labels = ['nuovi_positivi', 'deceduti','dimessi_guariti']
        labels = list(labels)
        mult = 1.
        fig = go.Figure()
        for region in regions:    
            if cases_per_mln_people: 
                mult = 1e06/ita_populations.loc[region, 'Popolazione']
            df_reg[region].index = pd.to_datetime(df_reg[region].index)
            fig.add_traces(go.Bar(y=labels, x=df_reg[region][labels].loc[date]*mult, name=region, orientation='h'))
            fig.update_layout(showlegend=True,title='day ' + str(date.strftime("%m/%d/%Y")),
                             paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.show()
    except Exception as e:
        print(e)

interactive(children=(SelectMultiple(description='regions', options=('Italy', 'Abruzzo', 'Basilicata', 'Calabr…

### Total cases decomposition

In [33]:
@interact
def get_regional_pie(region = df_reg.keys(),
                     case = ['totale casi', 'attualmente positivi'],
                     date=widgets.DatePicker(description='Pick a Date',value=pd.to_datetime(df_reg['Italy'].index.max())),):
    try:
        if len(region) == 0:
            region = 'Italy'
        if case == 'attualmente positivi':
            labels = ['ricoverati_con_sintomi','terapia_intensiva', 'isolamento_domiciliare']
        else:
            labels = ['totale_positivi','dimessi_guariti', 'deceduti']
    #     df = df_reg[region][labels]
    #     df = pd.concat((df[['ricoverati_con_sintomi','terapia_intensiva', 'isolamento_domiciliare']].sum(1), df['totale_positivi']),1)
    #     df = pd.concat((df[['totale_positivi','dimessi_guariti', 'deceduti']].sum(1), df['totale_casi']),1)   
        df_reg[region].index = pd.to_datetime(df_reg[region].index)
        fig = px.pie(df_reg[region].loc[str(date)], labels=labels, names=labels, 
                     values = df_reg[region][labels].loc[str(date)].values, 
                     title = '{} decomposition on date {} for {}'.format(case, date.strftime("%m/%d/%Y"), region)
                    )
        fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        fig.update_traces(textposition='inside', textinfo='percent+label')
        fig.show()
    except Exception as e:
        print(e)

interactive(children=(Dropdown(description='region', options=('Italy', 'Abruzzo', 'Basilicata', 'Calabria', 'C…

### Evolution analysis

In this section, you can visualize the evolution of the selected metrics and compare it across regions (multiple selection allowed). When the relative_dates flag is selected, each time series is re-indexed on the x-axis so that it starts from its first non-zero value, which lets you compare regions from the first appearance of cases.
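
The alignment on first appearance can be illustrated with a simplified sketch. The two series are invented, and the helper `to_relative_days` is hypothetical; the notebook cell below applies a similar filter inline and additionally drops the last observation.

```python
import pandas as pd

dates = pd.date_range("2020-02-24", periods=6, freq="D")

# Hypothetical cumulative counts for two regions whose outbreaks start on different days.
series = {
    "Region A": pd.Series([3, 10, 25, 60, 110, 200], index=dates),
    "Region B": pd.Series([0, 0, 0, 4, 9, 21], index=dates),
}


def to_relative_days(s):
    """Keep non-zero observations and re-index them by position (0 = first non-zero value)."""
    return s[s != 0].reset_index(drop=True)


# Columns now share a common x-axis: number of days since the first recorded case.
aligned = pd.DataFrame({name: to_relative_days(s) for name, s in series.items()})

print(aligned)
```

Shorter series are padded with NaN at the end, so plotting the columns together compares the regions at the same stage of their outbreak rather than on the same calendar day.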

In [34]:
@interact
def plt_region(regions = widgets.SelectMultiple(description="regions",options=list(df_reg.keys())), 
               labels = widgets.SelectMultiple(description="fields",options=data_columns),
               log=False, relative_dates=False, cases_per_mln_people=False, plot_bars=True, show_grid=False,
               aggregate=False):    
    try:
        if len(labels) == 0:
            labels = data_columns[:1] 
        labels = list(labels)
        if len(regions) == 0:
            regions = ['Italy']
        regions = list(regions)  
        fig = go.Figure()
        mult = 1.
        for item in labels:
            if aggregate:
                if cases_per_mln_people: 
                    mult = 1e06/ita_populations.loc[regions, 'Popolazione'].sum()                
                temp = df_reg[regions[0]][item].copy()                
                for region in regions[1:]:
                    temp = temp.add(df_reg[region][item])
                temp = pd.DataFrame(temp)
                if relative_dates: temp = temp.loc[~(temp[item]==0)].reset_index(drop=True).iloc[:-1] 
                # log scaling is applied to the y axis after the loop, so the trace is the same in both cases
                trace_type = go.Bar if plot_bars else go.Scatter
                fig.add_traces(trace_type(x=temp.index, y=temp[item]*mult, name=item+'_'+'-'.join(regions)))
                fig.update_layout(legend_orientation="h")

            else:
                for region in regions:
                    if cases_per_mln_people: 
                        mult = 1e06/ita_populations.loc[region, 'Popolazione']
                    df_reg[region].index = pd.to_datetime(df_reg[region].index)
                    temp = df_reg[region]
                    
                    if relative_dates: temp = temp.loc[~(temp[item]==0)].reset_index(drop=True).iloc[:-1] 
                    # log scaling is applied to the y axis after the loop, so the trace is the same in both cases
                    trace_type = go.Bar if plot_bars else go.Scatter
                    fig.add_traces(trace_type(x=temp.index, y=temp[item]*mult, name=item+'_'+region))
                    
        fig.update_layout(showlegend=True,paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        if log: fig.update_layout(yaxis_type="log")
        fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')

        fig.show()
    except Exception as e:
        print(e)   

interactive(children=(SelectMultiple(description='regions', options=('Italy', 'Abruzzo', 'Basilicata', 'Calabr…

### Model fitting

In this section, you can fit various models to the data to obtain a visual forecast according to the selected model.

Various options are available:

1) Generalized logistic

2) Extended logistic

3) Gompertz growth model

4) Exponential model

5) Logarithmic-linear regression model

6) No fit (just actual data)
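
To make the fitting step concrete, here is a minimal sketch of a bounded least-squares fit with `scipy.optimize.curve_fit` on synthetic data. The three-parameter `logistic` function is an assumed textbook form, not necessarily the one implemented in `src.models`, and the data are generated rather than real.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, a, b, c):
    """Three-parameter logistic curve (assumed form): plateau a, growth rate b, midpoint c."""
    return a / (1.0 + np.exp(-b * (x - c)))


# Synthetic observations generated from known parameters plus small noise.
rng = np.random.default_rng(0)
x = np.arange(40)
true_params = (1000.0, 0.3, 20.0)
y = logistic(x, *true_params) + rng.normal(0, 5, size=x.size)

# Non-negative bounds on every parameter; 'trf' is the bounded solver.
params, cov = curve_fit(
    logistic, x, y,
    p0=(y.max(), 0.1, x.mean()),
    bounds=(0.0, np.inf),
    method="trf",
)
stderr = np.sqrt(np.diag(cov))  # one standard error per parameter

# Extrapolate 20 days beyond the observed window.
x_future = np.arange(60)
forecast = logistic(x_future, *params)

# In-sample root mean squared error of the fit.
rmse = np.sqrt(np.mean((y - logistic(x, *params)) ** 2))
print(params, stderr, rmse)
```

The standard errors can be added to and subtracted from the fitted parameters to draw a rough band around the forecast, which is the role the `stdev` slider appears to play in the cell below.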

For cases where there is no reason to expect a saturation ("plateau"), such as daily new infections, it is possible to fit the data as a derivative by setting the fit_differential flag to True: the series is cumulated before fitting and the fitted curve is differenced afterwards.
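
The differential fit can be sketched as follows, again on synthetic data. The `gompertz` parameterisation is an assumption made for this example and may differ from `func_gomp` in `src.models`.

```python
import numpy as np
from scipy.optimize import curve_fit


def gompertz(x, a, b, c):
    """Gompertz curve with plateau a (assumed parameterisation)."""
    return a * np.exp(-b * np.exp(-c * x))


# Synthetic daily counts obtained by differencing a known cumulative curve.
x = np.arange(50)
true_params = (2000.0, 8.0, 0.12)
daily = np.diff(gompertz(x, *true_params), prepend=0.0)

# Step 1: cumulate the daily series so that it saturates like the model.
cumulative = np.cumsum(daily)

# Step 2: fit the saturating model to the cumulated series.
params, _ = curve_fit(
    gompertz, x, cumulative,
    p0=(cumulative[-1], 5.0, 0.1),
    bounds=(0.0, np.inf),
)

# Step 3: difference the fitted curve to return to daily values
# (one point shorter than the input, covering days 1 onwards).
daily_fit = np.diff(gompertz(x, *params))
print(params)
```

Because differencing shortens the series by one point, the fitted daily curve has to be aligned with the observed data starting from the second day.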


In [35]:
@interact
def get_model(region=list(df_reg.keys()), 
              start_fit=widgets.DatePicker(value=pd.to_datetime(df_naz.index[0])), 
              end_fit=widgets.DatePicker(value=pd.to_datetime(df_naz.index[-1])), fwd_look=50, 
              func=models, label = data_columns, 
              stdev=widgets.IntSlider(min=0, max=3, value=0),
              fit_differential=False, plot_bars=False, show_grid=False):
    try:
        if region=='Italy':
            df = df_naz
        else:
            df = df_reg[region]
        df.index = pd.to_datetime(df.index)
        start_fit = pd.Timestamp(start_fit)
        end_fit = pd.Timestamp(end_fit)
        y_fit = df[label].loc[start_fit:end_fit].dropna()
        x_fit = range(len(y_fit.index))
        if isinstance(func, types.FunctionType):
            x_pred2 = range(len(df.index)+fwd_look)
            x_pred1 = range(len(df.index))
            sig = inspect.signature(func)
            n_params = len(sig.parameters.items()) -1
            if fit_differential:
                y_fit = y_fit.cumsum()
            params, params_cov = curve_fit(func, x_fit, y_fit, 
                                bounds=([0. for item in range(n_params)], 
                                        [np.inf for item in range(n_params)]), 
                                           method='trf', maxfev=10000)
            stderr = np.sqrt(np.diag(params_cov))
            params_up = params + stderr * stdev
            params_down = params - stderr * stdev
            y_pred1 = func(x_pred1, *params)
            y_pred2 = func(x_pred2, *params)
            y_pred_up = func(x_pred2, *params_up)
            y_pred_down = func(x_pred2, *params_down)
            if fit_differential:
                y_pred1 = np.diff(y_pred1)
                y_pred2 = np.diff(y_pred2)
                y_pred_up = np.diff(y_pred_up)
                y_pred_down = np.diff(y_pred_down)
            errors = (y_pred_up - y_pred_down)
            rmse = np.sqrt(np.mean((y_fit - func(x_fit, *params)) ** 2))
        elif func=='log':
            x_pred2 = range(len(df.index)+fwd_look)
            x_pred1 = range(len(df.index))
            model = LinearRegression()
            model.fit(np.array(x_fit).reshape(-1,1), np.log1p(y_fit))
            r2 = model.score(np.array(x_fit).reshape(-1, 1), np.log1p(y_fit))
            params = model.coef_
            y_pred1 = model.predict(np.array(x_pred1).reshape(-1,1))
            y_pred2 = model.predict(np.array(x_pred2).reshape(-1,1))
            errors = 0.
        fig = go.Figure()
        if isinstance(func, types.FunctionType):
            if plot_bars:
                fig.add_traces(go.Bar(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df[label].values, 
                                      name=label))
            else:
                fig.add_traces(go.Scatter(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df[label].values, 
                                  name=label, mode='markers'))

            fig.add_traces(go.Scatter(x=pd.date_range(start=start_fit, end=end_fit), 
                                  y=y_pred1[:(end_fit-start_fit).days], name='model rmse: '+str(int(rmse))))
            # slice the error band like the forecast so that both arrays cover the same dates
            fig.add_traces(go.Scatter(x=pd.date_range(start=end_fit, end=df.index.max()+pd.Timedelta(str(fwd_look)+'d')), 
                                  y=y_pred2[(end_fit-start_fit).days:],error_y=dict(array=errors[(end_fit-start_fit).days:],color='green',
                                        thickness=.2,width=0.5), name='forecast'))   
        elif func=='log':
            fig.add_traces(go.Scatter(x=pd.date_range(start=df.index.min(), 
                                end=df.index.max()), y=np.log1p(df[label].values), 
                                  name=label, mode='markers'))
            fig.add_traces(go.Scatter(x=pd.date_range(start=start_fit, end=end_fit), 
                                  y=y_pred1[:(end_fit-start_fit).days], 
                                      name='model r2: '+str(np.round(r2,2))))
            fig.add_traces(go.Scatter(x=pd.date_range(start=end_fit, end=df.index.max()+pd.Timedelta(str(fwd_look)+'d')), 
                                  y=y_pred2[(end_fit-start_fit).days:]
                                      , name='log forecast'))   

        else:
            if plot_bars:
                fig.add_traces(go.Bar(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df[label].values, 
                                  name=label))                
            else:
                fig.add_traces(go.Scatter(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df[label].values, 
                                  name=label))


        fig.update_layout(showlegend=True,paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.show()
    except Exception as e:
        print(e)

interactive(children=(Dropdown(description='region', options=('Italy', 'Abruzzo', 'Basilicata', 'Calabria', 'C…

# World Monitor

## Top countries

Analyse the top countries on a given date for the selected metric.
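
As a rough sketch of how such a ranking can be derived from a wide time-series table (one column per date), the example below aggregates by country, then computes daily and percentage changes. The tiny table is invented and only mimics the column layout of the CSSE files.

```python
import pandas as pd

# Hypothetical excerpt in the wide layout of the CSSE time series (one column per date).
wide = pd.DataFrame({
    "Province/State": [None, "X", "Y"],
    "Country/Region": ["A", "B", "B"],
    "Lat": [0.0, 1.0, 2.0],
    "Long": [0.0, 1.0, 2.0],
    "1/22/20": [1, 0, 2],
    "1/23/20": [3, 1, 2],
    "1/24/20": [6, 1, 7],
})

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
date_cols = [c for c in wide.columns if c not in id_cols]

# Sum provinces into countries, then put dates on the index in chronological order.
confirmed = wide.groupby("Country/Region")[date_cols].sum().T
confirmed.index = pd.to_datetime(confirmed.index, format="%m/%d/%y")
confirmed = confirmed.sort_index()

daily = confirmed.diff()                # new cases per day
pct_daily = daily / confirmed.shift()   # relative daily change

# Rank countries by cumulative cases on the last available date.
top = confirmed.iloc[-1].sort_values(ascending=False).head(2)
print(top)
```

Sorting the date index before calling `diff()` matters: date labels such as `3/10/20` and `3/2/20` do not sort chronologically as plain strings.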

In [38]:
@interact
def get_top_countries(labels=countries_labels, 
                      top_prov=widgets.IntSlider(min=1,max=80,step=1,value=10), date=widgets.DatePicker(
                      description='Pick a Date',value=pd.to_datetime(pd.to_datetime([item for item in df_world_confirmed.columns if '/' in item][-1]))),
                      show_grid=False, show_map=True):
    try:
            # sort=False keeps the date columns in chronological order; the default sorts them as strings, which breaks diff()
            datecols = df_world_confirmed.columns.difference(['Province/State','Country/Region','Lat','Long', 'pop'], sort=False)
            df_geo = df_world_confirmed[df_world_confirmed['Province/State'].isna()].groupby('Country/Region').first()
            df = {}
            df['confirmed'] = df_world_confirmed.groupby('Country/Region')[datecols].sum().T
            df['recovered'] = df_world_recovered.groupby('Country/Region')[datecols].sum().T
            df['deaths'] = df_world_deaths.groupby('Country/Region')[datecols].sum().T
            df['daily_deaths'] = df['deaths'].diff()
            df['daily_confirmed'] = df['confirmed'].diff()
            df['daily_recovered'] = df['recovered'].diff()
            df['%daily_deaths'] = df['deaths'].diff()/df['deaths'].shift()
            df['%daily_confirmed'] = df['confirmed'].diff()/df['confirmed'].shift()
            df['%daily_recovered'] = df['recovered'].diff()/df['recovered'].shift()
            for item in df.keys():
                df[item].index = pd.to_datetime(df[item].index)     
            mult = 1.
            fig = go.Figure()        
            tempdf = df[labels].loc[date.strftime("%Y-%m-%d")].T
            tempdf['lat'] = tempdf.index.map(df_geo['Lat'])
            tempdf['long'] = tempdf.index.map(df_geo['Long'])
            tempdf.columns = [labels, 'lat', 'long']
            tempdf = tempdf.sort_values(by=labels, ascending=False)[:top_prov]        
            if show_map:
                fig = px.scatter_mapbox(tempdf.reset_index(), 
                            lat='lat', lon='long', color=labels, size=labels, 
                            labels = labels,
                            hover_name='Country/Region',
                            zoom=0,  height=800,
                            mapbox_style="open-street-map",                        
                            title='top {} countries on day {} for {}'.format(top_prov, date.strftime("%m/%d/%Y"), labels),
                   )
                fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'))
                fig.show()
            else:
                fig = px.bar(tempdf.reset_index(), x=labels, y='Country/Region', orientation='h')
                fig.update_layout(showlegend=True,title='top {} countries on day {}'.format(top_prov, date.strftime("%m/%d/%Y")),
                                 paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
                fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='gray')
                fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='gray')

                fig.show()
    except Exception as e:
        print(e)


## Total cases decomposition
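The decomposition splits the cumulative confirmed cases of a country into recovered, deaths, and active cases, where active cases are the remainder. The sketch below illustrates that arithmetic on a few made-up numbers; the series and the date are hypothetical and do not come from the notebook's dataframes.

```python
import pandas as pd

# Hypothetical cumulative counts for one country (not real data).
dates = pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"])
confirmed = pd.Series([100, 150, 210], index=dates)
recovered = pd.Series([10, 20, 35], index=dates)
deaths = pd.Series([2, 5, 9], index=dates)

# Active cases are whatever is neither recovered nor deceased.
decomposition = pd.DataFrame({
    "recovered": recovered,
    "deaths": deaths,
    "active": confirmed - recovered - deaths,
})

# Shares for a single day: the three parts add up to the confirmed total.
day = pd.Timestamp("2020-03-03")
shares = decomposition.loc[day] / confirmed.loc[day]
print(shares.round(3))
```

These shares are the quantities that a pie chart of the decomposition displays as percentages.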

In [39]:
@interact
def get_world_pie(region = countries_columns,
                  date=widgets.DatePicker(description='Pick a Date',
                        value=pd.to_datetime([item for item in df_world_confirmed.columns if '/' in item][-1])),):
    try:
        if len(region) == 0:
            region = 'Italy'
        df = {}
        # numeric_only=True skips text columns such as 'Province/State'; drop() needs the columns= keyword in pandas >= 2.0
        df['confirmed'] = df_world_confirmed.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T
        df['recovered'] = df_world_recovered.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T
        df['deaths'] = df_world_deaths.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T
        df['active'] = df['confirmed'] - df['recovered'] - df['deaths']        
        labels = ['recovered', 'deaths', 'active']
        df = pd.concat((df['recovered'][region], df['deaths'][region], df['active'][region]), axis=1)
        df.columns = labels
        df.index = pd.to_datetime(df.index)
        fig = px.pie(df.loc[date.strftime("%Y-%m-%d")], labels=labels, names=labels, 
                     values = df[labels].loc[date.strftime("%Y-%m-%d")].values, title = 'total cases decomposition on date {} for {}'.format(date.strftime("%m/%d/%Y"), region))
        fig.update_layout(paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        fig.update_traces(textposition='inside', textinfo='percent+label')
        fig.show()
    except Exception as e:
        print(e)


## Country comparisons

In this section we compare the evolution of the epidemic across different countries.
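Two transformations make countries of different size and outbreak timing comparable: scaling counts to cases per million inhabitants, and re-indexing each series by the number of days since its first non-zero value. The sketch below shows both on made-up data; the countries `A` and `B`, their populations, and the helper names `per_million` and `since_first_case` are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative confirmed cases and populations (not real data).
dates = pd.date_range("2020-02-20", periods=5)
cases = pd.DataFrame(
    {"A": [0, 0, 10, 30, 90], "B": [5, 20, 60, 120, 200]},
    index=dates,
)
populations = pd.Series({"A": 1_000_000, "B": 10_000_000})


def per_million(series, population):
    """Scale raw counts to cases per million inhabitants."""
    return series * 1e6 / population


def since_first_case(series):
    """Drop zero-valued days and replace the dates with a 0, 1, 2, ... day counter."""
    return series.loc[series != 0].reset_index(drop=True)


# Each column now starts on that country's first day with cases.
aligned = pd.DataFrame({
    name: since_first_case(per_million(cases[name], populations[name]))
    for name in cases.columns
})
print(aligned)
```

Because the series have different lengths after alignment, the shorter one is padded with `NaN`, which plotting libraries typically skip.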

In [40]:
@interact
def world_comparison(regions = widgets.SelectMultiple(description="regions",options=countries_columns), 
                     labels = widgets.SelectMultiple(description="data",options=countries_labels),
                     log=False, relative_dates=False, cases_per_mln_people=False, plot_bars=False, show_grid=False,
                     aggregate=False):    
    try:
        if len(labels) == 0:
            labels = countries_labels[:1]
        labels = list(labels)
        if len(regions) == 0:
            regions = ['Italy', 'France', 'Spain', 'Germany', 'United Kingdom', 'US']
        regions = list(regions)
        df = {}
        # numeric_only=True skips text columns such as 'Province/State'; drop() needs the columns= keyword in pandas >= 2.0
        df['confirmed'] = df_world_confirmed.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T
        df['recovered'] = df_world_recovered.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T
        df['deaths'] = df_world_deaths.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T
        df['daily_deaths'] = df['deaths'].diff()
        df['daily_confirmed'] = df['confirmed'].diff()
        df['daily_recovered'] = df['recovered'].diff()
        df['%daily_deaths'] = df['deaths'].diff()/df['deaths'].shift()
        df['%daily_confirmed'] = df['confirmed'].diff()/df['confirmed'].shift()
        df['%daily_recovered'] = df['recovered'].diff()/df['recovered'].shift()
        
        mult = 1.
        fig = go.Figure()
        for item in labels:
            if aggregate:
                if cases_per_mln_people: 
                    mult = 1e06/populations.loc[regions].sum()
                temp = df[item][regions[0]].copy()        
                for region in regions[1:]:                    
                    temp = temp.add(df[item][region])
                temp.index = pd.to_datetime(temp.index)
                if relative_dates: temp = temp.loc[~(temp==0)].reset_index(drop=True).iloc[:-1] 
                # The log flag only switches the y-axis type (set after the loop), so the trace is built the same way.
                trace_type = go.Bar if plot_bars else go.Scatter
                fig.add_traces(trace_type(x=temp.index, y=temp*mult, name=item+'_'+'-'.join(regions)))
                fig.update_layout(legend_orientation="h")
            else:
                for region in regions:            
                    temp = df[item][region]
                    temp.index = pd.to_datetime(temp.index)
                    if cases_per_mln_people: 
                        mult = 1e06/populations.loc[region]
                    if relative_dates: temp = temp.loc[~(temp==0)].reset_index(drop=True).iloc[:-1] 
                    # The log flag only switches the y-axis type (set after the loop), so the trace is built the same way.
                    trace_type = go.Bar if plot_bars else go.Scatter
                    fig.add_traces(trace_type(x=temp.index, y=temp*mult, name=item+'_'+region))
        fig.update_layout(showlegend=True,paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),plot_bgcolor='rgba(0,0,0,0)')
        if log: fig.update_layout(yaxis_type="log")
        fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.show()
    except Exception as e:
        print(e)


## Model fitting

The model fitting applied to the Italian data can also be performed for any country in the world dataset.
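As a rough illustration of the fitting step, the sketch below fits a curve with `scipy.optimize.curve_fit` under non-negative bounds, extrapolates it forward, and builds a band by shifting the parameters by a multiple of their standard errors. The `logistic` function, the synthetic data, and the starting guess `p0` are assumptions made for this example. Shifting all parameters at once gives only a crude band, not a proper confidence interval.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, a, b, c):
    """Logistic curve with plateau a, growth rate b and midpoint c (assumed model)."""
    x = np.asarray(x, dtype=float)
    return a / (1.0 + np.exp(-b * (x - c)))


# Synthetic observations: known parameters plus reproducible noise (not real data).
x_fit = np.arange(40)
true_params = (1000.0, 0.3, 20.0)
rng = np.random.default_rng(0)
y_fit = logistic(x_fit, *true_params) + rng.normal(0.0, 5.0, size=x_fit.size)

# Fit with all parameters constrained to be non-negative; p0 is an added assumption.
params, params_cov = curve_fit(
    logistic, x_fit, y_fit,
    p0=(y_fit.max(), 0.1, x_fit.mean()),
    bounds=(0.0, np.inf), method="trf",
)
stderr = np.sqrt(np.diag(params_cov))  # one standard error per parameter

# Extrapolate fwd_look days beyond the fitted window.
fwd_look = 10
x_pred = np.arange(x_fit.size + fwd_look)
y_pred = logistic(x_pred, *params)

# Crude band: evaluate the model with every parameter shifted by +/- stdev standard errors.
stdev = 2
y_up = logistic(x_pred, *(params + stdev * stderr))
y_down = logistic(x_pred, *(params - stdev * stderr))

# In-sample root mean squared error of the fit.
rmse = np.sqrt(np.mean((y_fit - logistic(x_fit, *params)) ** 2))
print(params.round(2), round(float(rmse), 2))
```

With `stdev` set to 0 the band collapses onto the forecast.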

In [41]:
@interact
def get_model_world(region=countries_columns, 
              start_fit=widgets.DatePicker(value=pd.to_datetime(df_naz.index[0])), 
              end_fit=widgets.DatePicker(value=pd.to_datetime(df_naz.index[-1])), fwd_look=50, 
              func=models, label = countries_labels, stdev=widgets.IntSlider(min=0, max=3, value=0),
              fit_differential=False, plot_bars=False, show_grid=False):
    if 'confirmed' in label:
        df = df_world_confirmed.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T[region]
        if label == 'daily_confirmed':
            df = df.diff()
        if label == '%daily_confirmed':
            df = df.dropna().diff()/df.dropna().shift()            
    elif 'recovered' in label:
        df = df_world_recovered.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T[region]
        if label == 'daily_recovered':
            df = df.diff()
        if label == '%daily_recovered':
            df = df.dropna().diff()/df.dropna().shift()  
    elif 'deaths' in label:
        df = df_world_deaths.groupby('Country/Region').sum(numeric_only=True).drop(columns=['Lat', 'Long', 'pop']).T[region]
        if label == 'daily_deaths':
            df = df.diff()
        if label == '%daily_deaths':
            df = df.dropna().diff()/df.dropna().shift()  
    try:
        df.index = pd.to_datetime(df.index)
        start_fit = pd.Timestamp(start_fit)
        end_fit = pd.Timestamp(end_fit)
        y_fit = df.loc[start_fit:end_fit].dropna()
        x_fit = range(len(y_fit.index))
        if isinstance(func, types.FunctionType):
            x_pred2 = range(len(df.index)+fwd_look)
            x_pred1 = range(len(df.index))
            sig = inspect.signature(func)
            n_params = len(sig.parameters.items()) -1
            if fit_differential:
                y_fit = y_fit.cumsum()
            params, params_cov = curve_fit(func, x_fit, y_fit, 
                                bounds=([0. for item in range(n_params)], 
                                        [np.inf for item in range(n_params)]), 
                                           method='trf',  maxfev=10000)
            stderr = np.sqrt(np.diag(params_cov))
            params_up = params + stderr * stdev
            params_down = params - stderr * stdev
            y_pred1 = func(x_pred1, *params)
            y_pred2 = func(x_pred2, *params)
            y_pred_up = func(x_pred2, *params_up)
            y_pred_down = func(x_pred2, *params_down)
            if fit_differential:
                y_pred1 = np.diff(y_pred1)
                y_pred2 = np.diff(y_pred2)
                y_pred_up = np.diff(y_pred_up)
                y_pred_down = np.diff(y_pred_down)
            errors = (y_pred_up - y_pred_down)
            rmse = np.sqrt(np.mean((y_fit - func(x_fit, *params)) ** 2))
        elif func=='log':
            x_pred2 = range(len(df.index)+fwd_look)
            x_pred1 = range(len(df.index))
            model = LinearRegression()
            model.fit(np.array(x_fit).reshape(-1,1), np.log1p(y_fit))
            r2 = model.score(np.array(x_fit).reshape(-1, 1), np.log1p(y_fit))
            params = model.coef_
            y_pred1 = model.predict(np.array(x_pred1).reshape(-1,1))
            y_pred2 = model.predict(np.array(x_pred2).reshape(-1,1))
            errors = 0.
        fig = go.Figure()
        if isinstance(func, types.FunctionType):
            if plot_bars:
                fig.add_traces(go.Bar(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df.values, 
                                  name=label))
            else:
                fig.add_traces(go.Scatter(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df.values, 
                                  name=label, mode='markers'))
            fig.add_traces(go.Scatter(x=pd.date_range(start=start_fit, end=end_fit), 
                                  y=y_pred1[:(end_fit-start_fit).days], name='model rmse: '+str(int(rmse))))
            fig.add_traces(go.Scatter(x=pd.date_range(start=end_fit, end=df.index.max()+pd.Timedelta(str(fwd_look)+'d')), 
                                  y=y_pred2[(end_fit-start_fit).days:],error_y=dict(array=errors,color='green',
                                        thickness=.2,width=0.5), name='forecast'))   
        elif func=='log':
            if plot_bars:
                fig.add_traces(go.Bar(x=pd.date_range(start=df.index.min(), 
                                end=df.index.max()), y=np.log1p(df.values), 
                                  name=label))
            else:
                fig.add_traces(go.Scatter(x=pd.date_range(start=df.index.min(), 
                                end=df.index.max()), y=np.log1p(df.values), 
                                  name=label, mode='markers'))
            fig.add_traces(go.Scatter(x=pd.date_range(start=start_fit, end=end_fit), 
                                  y=y_pred1[:(end_fit-start_fit).days], 
                                      name='model r2: '+str(np.round(r2,2))))
            fig.add_traces(go.Scatter(x=pd.date_range(start=end_fit, end=df.index.max()+pd.Timedelta(str(fwd_look)+'d')), 
                                  y=y_pred2[(end_fit-start_fit).days:]
                                      , name='log forecast'))   

        else:
            if plot_bars:
                fig.add_traces(go.Bar(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df.values, 
                                  name=label))
            else:
                fig.add_traces(go.Scatter(x=pd.date_range(start=df.index.min(), end=df.index.max()), y=df.values, 
                                  name=label))

        fig.update_layout(showlegend=True, title=region,paper_bgcolor='rgba(0,0,0,0)',font = dict(color = 'lightgray'),
                          plot_bgcolor='rgba(0,0,0,0)')
        fig.update_xaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.update_yaxes(showgrid=show_grid, gridwidth=1, gridcolor='lightgray')
        fig.show()
    except Exception as e:
        print(e)
