A simple log function
# COVID-19 W2: A few charts and a simple baseline


# Summary

**Disclaimer:** We still have too little data to predict or understand what will happen in the next few weeks or months.

At this point I see more value in collecting data and monitoring the outbreak than in trying to predict the future.

[Please don't kill yourself because I published a notebook](https://www.reddit.com/r/datascience/comments/fsfdn2/the_best_thing_you_can_do_to_fight_covid19_is/)



### Challenges
 * Outbreak patterns vary widely among countries
 * Most countries have only two weeks of data
 * Only a handful of countries have managed to successfully slow down the outbreak
 * Almost every country has introduced several serious regulations in recent weeks
 * Increasing testing capacity could have a serious impact on the number of confirmed cases



### Assumptions
 * As we are still in the early period, we expect exponential growth over the next few weeks
 * Thanks to panic, awareness, regulations and social distancing, the exponential increase will slow down

As the process is not stationary at all, I decided to use a simple heuristic approach. Maybe I will import sklearn next week.
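
A minimal sketch of the heuristic these two assumptions lead to, assuming a fixed daily increment in log space that shrinks by a constant factor each day. The names `project_log_cases`, `daily_delta` and `decay` are illustrative and do not appear in the notebook cells; the values 0.11 and 0.93 are the global defaults used later in the prediction cell, and the starting count is hypothetical.

```python
import numpy as np


def project_log_cases(last_log_cases, daily_delta, decay, n_days):
    """Project log(cases + 1) forward with a geometrically decaying daily increment."""
    # The increment for projected day i is daily_delta * decay**i (i = 0 for the first day).
    increments = daily_delta * decay ** np.arange(n_days)
    return last_log_cases + np.cumsum(increments)


# Hypothetical location with 1,000 confirmed cases on the last observed day.
start = np.log(1000 + 1)
projected = project_log_cases(start, daily_delta=0.11, decay=0.93, n_days=7)
projected_cases = np.exp(projected) - 1  # back to the original scale
```

The projected curve keeps rising, but each day adds less in log space than the day before, which is the "exponential growth that slows down" behaviour described above.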
  
### TIL
 * Namibia's country code is NA, which pandas parses as a missing value by default. Now I remember hearing it in a joke before, but I had to investigate a bug to learn it again :)
 * I hadn't used Plotly recently; I quite enjoyed the "new" Plotly Express interface


In [1]:
%matplotlib inline
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.max_columns', 99)
pd.set_option('display.max_rows', 99)
import os
import numpy as np
from matplotlib import pyplot as plt
from tqdm import tqdm
import datetime as dt

In [2]:
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [16, 10]
plt.rcParams['font.size'] = 14
import seaborn as sns
sns.set_palette(sns.color_palette('tab20', 20))

import plotly.express as px
import plotly.graph_objects as go

In [3]:
COMP = '../input/covid19-global-forecasting-week-2'
DATEFORMAT = '%Y-%m-%d'


def get_comp_data(COMP):
    train = pd.read_csv(f'{COMP}/train.csv')
    test = pd.read_csv(f'{COMP}/test.csv')
    submission = pd.read_csv(f'{COMP}/submission.csv')
    print(train.shape, test.shape, submission.shape)
    train['Country_Region'] = train['Country_Region'].str.replace(',', '')
    test['Country_Region'] = test['Country_Region'].str.replace(',', '')

    train['Location'] = train['Country_Region'] + '-' + train['Province_State'].fillna('')

    test['Location'] = test['Country_Region'] + '-' + test['Province_State'].fillna('')

    train['LogConfirmed'] = to_log(train.ConfirmedCases)
    train['LogFatalities'] = to_log(train.Fatalities)
    train = train.drop(columns=['Province_State'])
    test = test.drop(columns=['Province_State'])

    country_codes = pd.read_csv('../input/covid19-metadata/country_codes.csv', keep_default_na=False)
    train = train.merge(country_codes, on='Country_Region', how='left')
    test = test.merge(country_codes, on='Country_Region', how='left')

    train['DateTime'] = pd.to_datetime(train['Date'])
    test['DateTime'] = pd.to_datetime(test['Date'])
    
    return train, test, submission


def process_each_location(df):
    dfs = []
    for loc, df in tqdm(df.groupby('Location')):
        df = df.sort_values(by='Date')
        df['Fatalities'] = df['Fatalities'].cummax()
        df['ConfirmedCases'] = df['ConfirmedCases'].cummax()
        df['LogFatalities'] = df['LogFatalities'].cummax()
        df['LogConfirmed'] = df['LogConfirmed'].cummax()
        df['LogConfirmedNextDay'] = df['LogConfirmed'].shift(-1)
        df['ConfirmedNextDay'] = df['ConfirmedCases'].shift(-1)
        df['DateNextDay'] = df['Date'].shift(-1)
        df['LogFatalitiesNextDay'] = df['LogFatalities'].shift(-1)
        df['FatalitiesNextDay'] = df['Fatalities'].shift(-1)
        df['LogConfirmedDelta'] = df['LogConfirmedNextDay'] - df['LogConfirmed']
        df['ConfirmedDelta'] = df['ConfirmedNextDay'] - df['ConfirmedCases']
        df['LogFatalitiesDelta'] = df['LogFatalitiesNextDay'] - df['LogFatalities']
        df['FatalitiesDelta'] = df['FatalitiesNextDay'] - df['Fatalities']
        dfs.append(df)
    return pd.concat(dfs)


def add_days(d, k):
    return dt.datetime.strptime(d, DATEFORMAT) + dt.timedelta(days=k)


def to_log(x):
    return np.log(x + 1)


def to_exp(x):
    return np.exp(x) - 1
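
The `to_log` / `to_exp` pair above is a shifted log transform and its inverse. As a small illustration (not part of the original notebook), NumPy's `np.log1p` and `np.expm1` compute the same quantities; the sample counts below are chosen only for the demonstration.

```python
import numpy as np

counts = np.array([0.0, 1.0, 7.0, 105792.0])

logged = np.log1p(counts)    # same value as np.log(counts + 1)
restored = np.expm1(logged)  # same value as np.exp(logged) - 1

# Zero maps to zero, so locations without any cases stay finite in log space.
```

Adding 1 before taking the log is what keeps locations with zero confirmed cases or fatalities from producing `-inf`.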


In [4]:
start = dt.datetime.now()
train, test, submission = get_comp_data(COMP)
train.shape, test.shape, submission.shape
train.head(2)
test.head(2)

(20580, 6) (12642, 4) (12642, 3)


((20580, 13), (12642, 9), (12642, 3))

Unnamed: 0,Id,Country_Region,Date,ConfirmedCases,Fatalities,Location,LogConfirmed,LogFatalities,country_iso_code_2,country_iso_code_3,continent,geo_region,DateTime
0,1,Afghanistan,2020-01-22,0.0,0.0,Afghanistan-,0.0,0.0,AF,AFG,Asia,Southern Asia,2020-01-22
1,2,Afghanistan,2020-01-23,0.0,0.0,Afghanistan-,0.0,0.0,AF,AFG,Asia,Southern Asia,2020-01-23


Unnamed: 0,ForecastId,Country_Region,Date,Location,country_iso_code_2,country_iso_code_3,continent,geo_region,DateTime
0,1,Afghanistan,2020-03-19,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-19
1,2,Afghanistan,2020-03-20,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-20


In [5]:
train.describe()
train.nunique()
train.dtypes
train.count()

TRAIN_START = train.Date.min()
TEST_START = test.Date.min()
TRAIN_END = train.Date.max()
TEST_END = test.Date.max()
TRAIN_START, TRAIN_END, TEST_START, TEST_END

Unnamed: 0,Id,ConfirmedCases,Fatalities,LogConfirmed,LogFatalities
count,20580.0,20580.0,20580.0,20580.0,20580.0
mean,14685.5,514.939116,21.080952,1.81902,0.384179
std,8487.230117,4541.261768,287.758197,2.526578,1.038621
min,1.0,0.0,0.0,0.0,0.0
25%,7335.75,0.0,0.0,0.0,0.0
50%,14685.5,0.0,0.0,0.0,0.0
75%,22035.25,35.0,0.0,3.583519,0.0
max,29370.0,105792.0,12428.0,11.56924,9.427788


Id                    20580
Country_Region          173
Date                     70
ConfirmedCases         1536
Fatalities              328
Location                294
LogConfirmed           1536
LogFatalities           328
country_iso_code_2      173
country_iso_code_3      173
continent                 6
geo_region               21
DateTime                 70
dtype: int64

Id                             int64
Country_Region                object
Date                          object
ConfirmedCases               float64
Fatalities                   float64
Location                      object
LogConfirmed                 float64
LogFatalities                float64
country_iso_code_2            object
country_iso_code_3            object
continent                     object
geo_region                    object
DateTime              datetime64[ns]
dtype: object

Id                    20580
Country_Region        20580
Date                  20580
ConfirmedCases        20580
Fatalities            20580
Location              20580
LogConfirmed          20580
LogFatalities         20580
country_iso_code_2    20580
country_iso_code_3    20580
continent             20580
geo_region            20580
DateTime              20580
dtype: int64

('2020-01-22', '2020-03-31', '2020-03-19', '2020-04-30')

# Worldwide

In [6]:
train = train.sort_values(by='Date')
countries_latest_state = train[train['Date'] == TRAIN_END].groupby([
    'Country_Region', 'continent', 'geo_region', 'country_iso_code_3']).sum()[[
    'ConfirmedCases', 'Fatalities']].reset_index()
countries_latest_state['Log10Confirmed'] = np.log10(countries_latest_state.ConfirmedCases + 1)
countries_latest_state['Log10Fatalities'] = np.log10(countries_latest_state.Fatalities + 1)
countries_latest_state = countries_latest_state.sort_values(by='Fatalities', ascending=False)
countries_latest_state.to_csv('countries_latest_state.csv', index=False)

countries_latest_state.shape
countries_latest_state.head()

(173, 8)

Unnamed: 0,Country_Region,continent,geo_region,country_iso_code_3,ConfirmedCases,Fatalities,Log10Confirmed,Log10Fatalities
81,Italy,Europe,Southern Europe,ITA,105792.0,12428.0,5.024457,4.094436
147,Spain,Europe,Southern Europe,ESP,95923.0,8464.0,4.981927,3.927627
162,US,Americas,Northern America,USA,188018.0,3870.0,5.274202,3.587823
58,France,Europe,Western Europe,FRA,52827.0,3532.0,4.722864,3.548144
33,China,Asia,Eastern Asia,CHN,82279.0,3309.0,4.915294,3.519828


In [7]:
fig = go.Figure(data=go.Choropleth(
    locations = countries_latest_state['country_iso_code_3'],
    z = countries_latest_state['Log10Confirmed'],
    text = countries_latest_state['Country_Region'],
    colorscale = 'viridis_r',
    autocolorscale=False,
    reversescale=False,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_tickprefix = '10^',
    colorbar_title = 'Confirmed cases <br>(log10 scale)',
))

_ = fig.update_layout(
    title_text=f'COVID-19 Global Cases [Updated: {TRAIN_END}]',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    )
)

fig.show()

In [8]:
fig = go.Figure(data=go.Choropleth(
    locations = countries_latest_state['country_iso_code_3'],
    z = countries_latest_state['Log10Fatalities'],
    text = countries_latest_state['Country_Region'],
    colorscale = 'viridis_r',
    autocolorscale=False,
    reversescale=False,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_tickprefix = '10^',
    colorbar_title = 'Deaths <br>(log10 scale)',
))

_ = fig.update_layout(
    title_text=f'COVID-19 Global Deaths [Updated: {TRAIN_END}]',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    )
)

fig.show()

In [9]:
countries_latest_state['DeathConfirmedRatio'] = (countries_latest_state.Fatalities + 1) / (countries_latest_state.ConfirmedCases + 1)
countries_latest_state['DeathConfirmedRatio'] = countries_latest_state['DeathConfirmedRatio'].clip(0, 0.1) 
fig = px.scatter(countries_latest_state,
                 x='ConfirmedCases',
                 y='Fatalities',
                 color='DeathConfirmedRatio',
                 size='Log10Fatalities',
                 size_max=20,
                 hover_name='Country_Region',
                 color_continuous_scale='viridis_r'
)
_ = fig.update_layout(
    title_text=f'COVID-19 Deaths vs Confirmed Cases by Country [Updated: {TRAIN_END}]',
    xaxis_type="log",
    yaxis_type="log"
)
fig.show()

In [10]:
# The source dataset is not necessarily cumulative, so we will force it to be
latest_loc = train[train['Date'] == TRAIN_END][['Location', 'ConfirmedCases', 'Fatalities']]
max_loc = train.groupby(['Location'])[['ConfirmedCases', 'Fatalities']].max().reset_index()
check = pd.merge(latest_loc, max_loc, on='Location')
np.mean(check.ConfirmedCases_x == check.ConfirmedCases_y)
np.mean(check.Fatalities_x == check.Fatalities_y)
check[check.Fatalities_x != check.Fatalities_y]
check[check.ConfirmedCases_x != check.ConfirmedCases_y]

0.9931972789115646

0.9863945578231292

Unnamed: 0,Location,ConfirmedCases_x,Fatalities_x,ConfirmedCases_y,Fatalities_y
49,Iceland-,1135.0,2.0,1135.0,5.0
150,Kazakhstan-,343.0,2.0,343.0,3.0
157,US-Hawaii,204.0,0.0,204.0,1.0
263,Slovakia-,363.0,0.0,363.0,1.0


Unnamed: 0,Location,ConfirmedCases_x,Fatalities_x,ConfirmedCases_y,Fatalities_y
43,Guyana-,12.0,2.0,20.0,2.0
115,China-Guizhou,146.0,2.0,147.0,2.0
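
The mismatches above are locations whose latest reported value is lower than an earlier one. A minimal sketch of how a running maximum repairs such a series, using a made-up sequence rather than actual rows from the dataset:

```python
import pandas as pd

# Hypothetical cumulative counts with a downward revision at the end.
reported = pd.Series([4, 8, 20, 19, 12])

# The running maximum never decreases, so the series becomes cumulative again.
forced = reported.cummax()
```

This is the same `cummax` call that `process_each_location` applies per location to both the raw and the log columns.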


In [11]:
train_clean = process_each_location(train)

train_clean.shape
train_clean.tail()

100%|██████████| 294/294 [00:03<00:00, 76.47it/s]


(20580, 22)

Unnamed: 0,Id,Country_Region,Date,ConfirmedCases,Fatalities,Location,LogConfirmed,LogFatalities,country_iso_code_2,country_iso_code_3,continent,geo_region,DateTime,LogConfirmedNextDay,ConfirmedNextDay,DateNextDay,LogFatalitiesNextDay,FatalitiesNextDay,LogConfirmedDelta,ConfirmedDelta,LogFatalitiesDelta,FatalitiesDelta
20575,29366,Zimbabwe,2020-03-27,5.0,1.0,Zimbabwe-,1.791759,0.693147,ZW,ZWE,Africa,Eastern Africa,2020-03-27,2.079442,7.0,2020-03-28,0.693147,1.0,0.287682,2.0,0.0,0.0
20576,29367,Zimbabwe,2020-03-28,7.0,1.0,Zimbabwe-,2.079442,0.693147,ZW,ZWE,Africa,Eastern Africa,2020-03-28,2.079442,7.0,2020-03-29,0.693147,1.0,0.0,0.0,0.0,0.0
20577,29368,Zimbabwe,2020-03-29,7.0,1.0,Zimbabwe-,2.079442,0.693147,ZW,ZWE,Africa,Eastern Africa,2020-03-29,2.079442,7.0,2020-03-30,0.693147,1.0,0.0,0.0,0.0,0.0
20578,29369,Zimbabwe,2020-03-30,7.0,1.0,Zimbabwe-,2.079442,0.693147,ZW,ZWE,Africa,Eastern Africa,2020-03-30,2.197225,8.0,2020-03-31,0.693147,1.0,0.117783,1.0,0.0,0.0
20579,29370,Zimbabwe,2020-03-31,8.0,1.0,Zimbabwe-,2.197225,0.693147,ZW,ZWE,Africa,Eastern Africa,2020-03-31,,,,,,,,,


# Continents

In [12]:
regional_progress = train_clean.groupby(['DateTime', 'continent']).sum()[['ConfirmedCases', 'Fatalities']].reset_index()
regional_progress['Log10Confirmed'] = np.log10(regional_progress.ConfirmedCases + 1)
regional_progress['Log10Fatalities'] = np.log10(regional_progress.Fatalities + 1)
regional_progress = regional_progress[regional_progress.continent != '#N/A']

In [13]:
fig = px.area(regional_progress, x="DateTime", y="ConfirmedCases", color="continent")
_ = fig.update_layout(
    title_text=f'COVID-19 Cumulative Confirmed Cases by Continent [Updated: {TRAIN_END}]'
)
fig.show()
fig2 = px.line(regional_progress, x='DateTime', y='ConfirmedCases', color='continent')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases by Continent [Updated: {TRAIN_END}]'
)
fig2.show()



In [14]:
fig = px.area(regional_progress, x="DateTime", y="Fatalities", color="continent")
_ = fig.update_layout(
    title_text=f'COVID-19 Cumulative Confirmed Deaths by Continent [Updated: {TRAIN_END}]'
)
fig.show()
fig2 = px.line(regional_progress, x='DateTime', y='Fatalities', color='continent')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Deaths by Continent [Updated: {TRAIN_END}]'
)
fig2.show()

In [15]:
china = train_clean[train_clean.Location.str.startswith('China')]
top10_locations = china.groupby('Location')[['ConfirmedCases']].max().sort_values(
    by='ConfirmedCases', ascending=False).reset_index().Location.values[:10]
fig2 = px.line(china[china.Location.isin(top10_locations)], x='DateTime', y='ConfirmedCases', color='Location')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases in China [Updated: {TRAIN_END}]'
)
fig2.show()

In [16]:
europe = train_clean[train_clean.continent == 'Europe']
top10_locations = europe.groupby('Location')[['ConfirmedCases']].max().sort_values(
    by='ConfirmedCases', ascending=False).reset_index().Location.values[:10]
fig2 = px.line(europe[europe.Location.isin(top10_locations)], x='DateTime', y='ConfirmedCases', color='Location')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases in Europe [Updated: {TRAIN_END}]'
)
fig2.show()

In [17]:
us = train_clean[train_clean.Country_Region == 'US']
top10_locations = us.groupby('Location')[['ConfirmedCases']].max().sort_values(
    by='ConfirmedCases', ascending=False).reset_index().Location.values[:10]
fig2 = px.line(us[us.Location.isin(top10_locations)], x='DateTime', y='ConfirmedCases', color='Location')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases in the USA [Updated: {TRAIN_END}]'
)
fig2.show()

In [18]:
africa = train_clean[train_clean.continent == 'Africa']
top10_locations = africa.groupby('Location')[['ConfirmedCases']].max().sort_values(
    by='ConfirmedCases', ascending=False).reset_index().Location.values[:10]
fig2 = px.line(africa[africa.Location.isin(top10_locations)], x='DateTime', y='ConfirmedCases', color='Location')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases in Africa [Updated: {TRAIN_END}]'
)
fig2.show()

# Countries

In [19]:
country_progress = train_clean.groupby(['Date', 'DateTime', 'Country_Region']).sum()[[
    'ConfirmedCases', 'Fatalities', 'ConfirmedDelta', 'FatalitiesDelta']].reset_index()
top10_countries = country_progress.groupby('Country_Region')[['Fatalities']].max().sort_values(
    by='Fatalities', ascending=False).reset_index().Country_Region.values[:10]

fig2 = px.line(country_progress[country_progress.Country_Region.isin(top10_countries)],
               x='DateTime', y='ConfirmedCases', color='Country_Region')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases by Country [Updated: {TRAIN_END}]'
)
fig2.show()
fig3 = px.line(country_progress[country_progress.Country_Region.isin(top10_countries)],
               x='DateTime', y='Fatalities', color='Country_Region')
_ = fig3.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Deaths by Country [Updated: {TRAIN_END}]'
)
fig3.show()

# Outbreak during March

In [20]:
countries_0301 = country_progress[country_progress.Date == '2020-03-01'][[
    'Country_Region', 'ConfirmedCases', 'Fatalities']]
countries_0331 = country_progress[country_progress.Date == '2020-03-31'][[
    'Country_Region', 'ConfirmedCases', 'Fatalities']]
countries_in_march = pd.merge(countries_0301, countries_0331, on='Country_Region', suffixes=['_0301', '_0331'])
countries_in_march['IncreaseInMarch'] = countries_in_march.ConfirmedCases_0331 / (countries_in_march.ConfirmedCases_0301 + 1)
countries_in_march = countries_in_march[countries_in_march.ConfirmedCases_0331 > 200].sort_values(
    by='IncreaseInMarch', ascending=False)
countries_in_march.tail(15)

Unnamed: 0,Country_Region,ConfirmedCases_0301,Fatalities_0301,ConfirmedCases_0331,Fatalities_0331,IncreaseInMarch
81,Italy,1694.0,34.0,105792.0,12428.0,62.414159
77,Iran,978.0,54.0,44605.0,2898.0,45.561798
92,Lebanon,10.0,0.0,470.0,12.0,42.727273
156,Thailand,42.0,1.0,1651.0,10.0,38.395349
78,Iraq,19.0,0.0,694.0,50.0,34.7
165,United Arab Emirates,21.0,0.0,664.0,6.0,30.181818
170,Vietnam,16.0,0.0,212.0,0.0,12.470588
12,Bahrain,47.0,0.0,567.0,4.0,11.8125
142,Singapore,106.0,0.0,926.0,3.0,8.654206
154,Taiwan*,40.0,1.0,322.0,5.0,7.853659


In [21]:
# Commas are stripped from Country_Region in get_comp_data, so South Korea is 'Korea South' here
selected_countries = [
    'Italy', 'Vietnam', 'Bahrain', 'Singapore', 'Taiwan*', 'Japan', 'Kuwait', 'Korea South', 'China']
fig2 = px.line(country_progress[country_progress.Country_Region.isin(selected_countries)],
               x='DateTime', y='ConfirmedCases', color='Country_Region')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases by Country [Updated: {TRAIN_END}]'
)
fig2.show()
fig3 = px.line(country_progress[country_progress.Country_Region.isin(selected_countries)],
               x='DateTime', y='Fatalities', color='Country_Region')
_ = fig3.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Deaths by Country [Updated: {TRAIN_END}]'
)
fig3.show()

In [22]:
train_clean['Geo#Country#Contintent'] = train_clean.Location + '#' + train_clean.Country_Region + '#' + train_clean.continent
latest = train_clean[train_clean.Date == '2020-03-31'][[
    'Geo#Country#Contintent', 'ConfirmedCases', 'Fatalities', 'LogConfirmed', 'LogFatalities']]
daily_confirmed_deltas = train_clean[train_clean.Date >= '2020-03-17'].pivot(
    index='Geo#Country#Contintent', columns='Date', values='LogConfirmedDelta').round(3).reset_index()
daily_confirmed_deltas = latest.merge(daily_confirmed_deltas, on='Geo#Country#Contintent')
daily_confirmed_deltas.shape
daily_confirmed_deltas.head()
daily_confirmed_deltas.to_csv('daily_confirmed_deltas.csv', index=False)

(294, 20)

Unnamed: 0,Geo#Country#Contintent,ConfirmedCases,Fatalities,LogConfirmed,LogFatalities,2020-03-17,2020-03-18,2020-03-19,2020-03-20,2020-03-21,2020-03-22,2020-03-23,2020-03-24,2020-03-25,2020-03-26,2020-03-27,2020-03-28,2020-03-29,2020-03-30,2020-03-31
0,Afghanistan-#Afghanistan#Asia,174.0,4.0,5.164786,1.609438,0.0,0.0,0.083,0.0,0.495,0.0,0.604,0.125,0.111,0.156,0.0,0.086,0.346,0.023,
1,Albania-#Albania#Europe,243.0,15.0,5.497168,2.772589,0.069,0.08,0.088,0.081,0.156,0.154,0.166,0.17,0.174,0.066,0.057,0.073,0.05,0.086,
2,Algeria-#Algeria#Africa,716.0,44.0,6.575076,3.806662,0.207,0.16,0.034,0.431,0.367,0.134,0.137,0.134,0.194,0.108,0.104,0.118,0.133,0.203,
3,Andorra-#Andorra#Europe,376.0,12.0,5.932245,2.564949,0.0,0.3,0.342,0.158,0.248,0.162,0.208,0.136,0.174,0.175,0.142,0.081,0.102,0.016,
4,Angola-#Angola#Africa,7.0,2.0,2.079442,1.098612,0.0,0.0,0.693,0.405,0.0,0.288,0.0,0.0,0.223,0.0,0.182,0.288,0.0,0.0,


In [23]:
deltas = train_clean[np.logical_and(
        train_clean.LogConfirmed > 2,
        ~train_clean.Location.str.startswith('China')
)].dropna().sort_values(by='LogConfirmedDelta', ascending=False)

deltas['start'] = deltas['LogConfirmed'].round(0)
confirmed_deltas = pd.concat([
    deltas.groupby('start')[['LogConfirmedDelta']].mean(),
    deltas.groupby('start')[['LogConfirmedDelta']].std(),
    deltas.groupby('start')[['LogConfirmedDelta']].count()
], axis=1)

deltas.mean()

confirmed_deltas.columns = ['avg', 'std', 'cnt']
confirmed_deltas
confirmed_deltas.to_csv('confirmed_deltas.csv')

Id                      15994.845504
ConfirmedCases           1220.050704
Fatalities                 54.007367
LogConfirmed                4.700472
LogFatalities               1.070036
LogConfirmedNextDay         4.877704
ConfirmedNextDay         1387.250921
LogFatalitiesNextDay        1.175230
FatalitiesNextDay          62.401300
LogConfirmedDelta           0.177232
ConfirmedDelta            167.200217
LogFatalitiesDelta          0.105193
FatalitiesDelta             8.393933
start                       4.688624
dtype: float64

Unnamed: 0_level_0,avg,std,cnt
start,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2.0,0.190193,0.264444,514
3.0,0.18555,0.217392,990
4.0,0.193425,0.179476,883
5.0,0.178451,0.146976,795
6.0,0.170952,0.120477,611
7.0,0.146204,0.103453,432
8.0,0.165915,0.114865,190
9.0,0.126862,0.087752,113
10.0,0.128586,0.05771,52
11.0,0.106281,0.039876,34
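
One way to read the `avg` column is as a doubling time: a constant daily increase of `d` in natural-log space doubles the count every `ln(2) / d` days. The sketch below is an approximation added for illustration; the helper name `doubling_time` is hypothetical, and the formula ignores the `+ 1` offset in `to_log`, which matters for the smallest buckets.

```python
import numpy as np


def doubling_time(log_delta):
    """Days needed to double at a constant daily increase in natural-log space."""
    return np.log(2) / log_delta


fast = doubling_time(0.190193)  # average delta of bucket 2.0 in the table above
slow = doubling_time(0.106281)  # average delta of bucket 11.0 in the table above
```

With these inputs the doubling time goes from roughly 3.6 days in the smallest bucket to roughly 6.5 days in the largest.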


In [24]:
fig = px.box(deltas,  x="start", y="LogConfirmedDelta", range_y=[0, 0.35])
fig.show()

In [25]:
fig = px.box(deltas[deltas.Date >= '2020-03-01'],  x="DateTime", y="LogConfirmedDelta", range_y=[0, 0.6])
fig.show()

In [26]:
deltas = train_clean[np.logical_and(
        train_clean.LogConfirmed > 0,
        ~train_clean.Location.str.startswith('China')
)].dropna().sort_values(by='LogConfirmedDelta', ascending=False)
deltas = deltas[deltas['Date'] >= '2020-03-12']

confirmed_deltas = pd.concat([
    deltas.groupby('Location')[['LogConfirmedDelta']].mean(),
    deltas.groupby('Location')[['LogConfirmedDelta']].std(),
    deltas.groupby('Location')[['LogConfirmedDelta']].count(),
    deltas.groupby('Location')[['LogConfirmed']].max()
], axis=1)
confirmed_deltas.columns = ['avg', 'std', 'cnt', 'max']

confirmed_deltas.sort_values(by='avg').head(10)
confirmed_deltas.sort_values(by='avg').tail(10)
confirmed_deltas.to_csv('confirmed_deltas.csv')

Unnamed: 0_level_0,avg,std,cnt,max
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Saint Vincent and the Grenadines-,0.0,0.0,17,0.693147
Timor-Leste-,0.0,0.0,9,0.693147
Papua New Guinea-,0.0,0.0,11,0.693147
Diamond Princess-,0.000445,0.001939,19,6.569481
Korea South-,0.011474,0.004171,19,9.175956
France-Saint Barthelemy,0.029453,0.097662,19,1.94591
Maldives-,0.039327,0.07182,19,2.890372
Central African Republic-,0.043322,0.173287,16,1.386294
Liberia-,0.04621,0.123962,15,1.386294
Bhutan-,0.048226,0.118502,19,1.609438


Unnamed: 0_level_0,avg,std,cnt,max
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Uganda-,0.311352,0.489762,10,3.526361
US-West Virginia,0.314329,0.267553,14,4.983607
Canada-Quebec,0.317442,0.243443,19,8.140607
US-Mississippi,0.323716,0.273964,19,6.742881
US-Connecticut,0.3293,0.255133,19,7.852439
US-New Jersey,0.33868,0.258537,19,9.719384
US-Missouri,0.343191,0.175712,19,6.958448
Mali-,0.378114,0.313383,6,3.258097
US-Michigan,0.4126,0.419342,19,8.779404
Turkey-,0.464193,0.318235,19,9.289891


# Create prediction

In [27]:
DECAY = 0.93
DECAY ** 7, DECAY ** 14, DECAY ** 21, DECAY ** 28

confirmed_deltas = train.groupby(['Location', 'Country_Region', 'continent'])[[
    'Id']].count().reset_index()

GLOBAL_DELTA = 0.11
confirmed_deltas['DELTA'] = GLOBAL_DELTA

confirmed_deltas.loc[confirmed_deltas.continent=='Africa', 'DELTA'] = 0.14
confirmed_deltas.loc[confirmed_deltas.continent=='Oceania', 'DELTA'] = 0.06
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Korea South', 'DELTA'] = 0.011
confirmed_deltas.loc[confirmed_deltas.Country_Region=='US', 'DELTA'] = 0.15
confirmed_deltas.loc[confirmed_deltas.Country_Region=='China', 'DELTA'] = 0.01
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Japan', 'DELTA'] = 0.05
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Singapore', 'DELTA'] = 0.05
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Taiwan*', 'DELTA'] = 0.05
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Switzerland', 'DELTA'] = 0.05
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Norway', 'DELTA'] = 0.05
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Iceland', 'DELTA'] = 0.05
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Austria', 'DELTA'] = 0.06
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Italy', 'DELTA'] = 0.04
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Spain', 'DELTA'] = 0.08
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Portugal', 'DELTA'] = 0.12
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Israel', 'DELTA'] = 0.12
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Iran', 'DELTA'] = 0.08
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Germany', 'DELTA'] = 0.07
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Malaysia', 'DELTA'] = 0.06
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Russia', 'DELTA'] = 0.18
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Ukraine', 'DELTA'] = 0.18
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Brazil', 'DELTA'] = 0.12
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Turkey', 'DELTA'] = 0.18
confirmed_deltas.loc[confirmed_deltas.Country_Region=='Philippines', 'DELTA'] = 0.18
confirmed_deltas.loc[confirmed_deltas.Location=='France-', 'DELTA'] = 0.1
confirmed_deltas.loc[confirmed_deltas.Location=='United Kingdom-', 'DELTA'] = 0.12
confirmed_deltas.loc[confirmed_deltas.Location=='Diamond Princess-', 'DELTA'] = 0.00
confirmed_deltas.loc[confirmed_deltas.Location=='China-Hong Kong', 'DELTA'] = 0.08
confirmed_deltas.loc[confirmed_deltas.Location=='San Marino-', 'DELTA'] = 0.03


confirmed_deltas.shape, confirmed_deltas.DELTA.mean()

confirmed_deltas[confirmed_deltas.DELTA != GLOBAL_DELTA].shape, confirmed_deltas[confirmed_deltas.DELTA != GLOBAL_DELTA].DELTA.mean()
confirmed_deltas[confirmed_deltas.DELTA != GLOBAL_DELTA]
confirmed_deltas.describe()

(0.6017008706075703,
 0.36204393768990795,
 0.21784215250621053,
 0.13107581281801395)

((294, 5), 0.10711224489795881)

((168, 5), 0.1049464285714286)

Unnamed: 0,Location,Country_Region,continent,Id,DELTA
2,Algeria-,Algeria,Africa,70,0.14
4,Angola-,Angola,Africa,70,0.14
8,Australia-Australian Capital Territory,Australia,Oceania,70,0.06
9,Australia-New South Wales,Australia,Oceania,70,0.06
10,Australia-Northern Territory,Australia,Oceania,70,0.06
...,...,...,...,...,...
278,Uganda-,Uganda,Africa,70,0.14
279,Ukraine-,Ukraine,Europe,70,0.18
281,United Kingdom-,United Kingdom,Europe,70,0.12
292,Zambia-,Zambia,Africa,70,0.14


Unnamed: 0,Id,DELTA
count,294.0,294.0
mean,70.0,0.107112
std,0.0,0.043603
min,70.0,0.0
25%,70.0,0.11
50%,70.0,0.11
75%,70.0,0.14
max,70.0,0.18


In [28]:
daily_log_confirmed = train_clean.pivot(index='Location', columns='Date', values='LogConfirmed').reset_index()
daily_log_confirmed = daily_log_confirmed.sort_values(TRAIN_END, ascending=False)
daily_log_confirmed.to_csv('daily_log_confirmed.csv', index=False)

for i, d in tqdm(enumerate(pd.date_range(add_days(TRAIN_END, 1), add_days(TEST_END, 1)))):
    new_day = str(d).split(' ')[0]
    last_day = dt.datetime.strptime(new_day, DATEFORMAT) - dt.timedelta(days=1)
    last_day = last_day.strftime(DATEFORMAT)
    for loc in confirmed_deltas.Location.values:
        confirmed_delta = confirmed_deltas.loc[confirmed_deltas.Location == loc, 'DELTA'].values[0]
        daily_log_confirmed.loc[daily_log_confirmed.Location == loc, new_day] = daily_log_confirmed.loc[daily_log_confirmed.Location == loc, last_day] + \
            confirmed_delta * DECAY ** i

31it [00:42,  1.38s/it]


In [29]:
daily_log_confirmed.head()

Date,Location,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15,2020-02-16,2020-02-17,2020-02-18,2020-02-19,2020-02-20,2020-02-21,2020-02-22,2020-02-23,2020-02-24,2020-02-25,2020-02-26,2020-02-27,2020-02-28,2020-02-29,2020-03-01,2020-03-02,2020-03-03,2020-03-04,2020-03-05,2020-03-06,2020-03-07,2020-03-08,2020-03-09,...,2020-03-14,2020-03-15,2020-03-16,2020-03-17,2020-03-18,2020-03-19,2020-03-20,2020-03-21,2020-03-22,2020-03-23,2020-03-24,2020-03-25,2020-03-26,2020-03-27,2020-03-28,2020-03-29,2020-03-30,2020-03-31,2020-04-01,2020-04-02,2020-04-03,2020-04-04,2020-04-05,2020-04-06,2020-04-07,2020-04-08,2020-04-09,2020-04-10,2020-04-11,2020-04-12,2020-04-13,2020-04-14,2020-04-15,2020-04-16,2020-04-17,2020-04-18,2020-04-19,2020-04-20,2020-04-21,2020-04-22,2020-04-23,2020-04-24,2020-04-25,2020-04-26,2020-04-27,2020-04-28,2020-04-29,2020-04-30,2020-05-01
140,Italy-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,1.386294,3.044522,4.143135,5.049856,5.438079,5.777652,6.118097,6.486161,6.790097,7.029088,7.435438,7.619233,7.825245,8.035926,8.258163,8.441823,8.679992,8.905987,9.12402,...,9.959773,10.1165,10.239281,10.357965,10.483298,10.622205,10.758371,10.888912,10.987646,11.065513,11.144424,11.217036,11.29713,11.367888,11.434672,11.489554,11.530176,11.56924,11.60924,11.64644,11.681036,11.71321,11.743132,11.77096,11.796839,11.820907,11.84329,11.864107,11.883466,11.90147,11.918214,11.933786,11.948268,11.961736,11.974261,11.98591,11.996743,12.006817,12.016187,12.024901,12.033004,12.040541,12.04755,12.054068,12.06013,12.065768,12.071011,12.075887,12.080421
209,Spain-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.693147,0.693147,0.693147,0.693147,0.693147,0.693147,0.693147,0.693147,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.098612,1.94591,2.639057,2.772589,3.496508,3.828641,4.442651,4.795791,5.111988,5.407172,5.560682,5.993961,6.216606,6.51323,6.979145,...,8.762802,8.961751,9.204624,9.371523,9.540435,9.796125,9.923829,10.14152,10.267054,10.46701,10.593781,10.810051,10.964519,11.093159,11.201442,11.291168,11.384603,11.471311,11.551311,11.625711,11.694903,11.759252,11.819096,11.874751,11.92651,11.974647,12.019413,12.061046,12.099765,12.135773,12.169261,12.200404,12.229368,12.256304,12.281354,12.304651,12.326317,12.346467,12.365206,12.382634,12.398841,12.413914,12.427932,12.440968,12.453092,12.464368,12.474854,12.484606,12.493675
257,US-New York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,6.265301,6.597146,6.875232,7.442493,7.822445,8.587838,9.025335,9.368284,9.667829,9.946786,10.153546,10.336633,10.542126,10.711681,10.866872,10.996233,11.10742,11.236302,11.386302,11.525802,11.655537,11.776191,11.888398,11.992752,12.0898,12.180055,12.263993,12.342054,12.414652,12.482167,12.544957,12.603351,12.657657,12.708162,12.755132,12.798814,12.839438,12.877219,12.912355,12.945031,12.97542,13.003682,13.029965,13.054409,13.077141,13.098282,13.117944,13.136229,13.153234
121,Germany-,0.0,0.0,0.0,0.0,0.0,0.693147,1.609438,1.609438,1.609438,1.791759,2.197225,2.397895,2.564949,2.564949,2.564949,2.564949,2.639057,2.639057,2.70805,2.70805,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.833213,2.890372,3.332205,3.850148,3.89182,4.382027,4.875197,5.075174,5.283204,5.572154,6.180017,6.508769,6.684612,6.947937,7.070724,...,8.430763,8.664923,8.891924,9.133243,9.419628,9.63698,9.895909,10.008478,10.121578,10.277015,10.403869,10.527392,10.690558,10.837068,10.962943,11.036437,11.110745,11.181765,11.251765,11.316865,11.377408,11.433713,11.486077,11.534775,11.580064,11.622183,11.661354,11.697783,11.731662,11.763169,11.792471,11.819721,11.845064,11.868633,11.890553,11.910937,11.929895,11.947526,11.963923,11.979172,11.993353,12.006542,12.018808,12.030215,12.040823,12.050689,12.059865,12.068398,12.076333
62,China-Hubei,6.098074,6.098074,6.309918,6.635947,6.96508,7.261225,8.17611,8.17611,8.497806,8.666819,8.875427,9.321703,9.512147,9.721906,9.886647,10.003921,10.124789,10.207326,10.29661,10.364986,10.415323,10.415323,10.78326,10.904248,10.937561,10.971348,11.001933,11.029764,11.035406,11.042009,11.045526,11.067966,11.067966,11.071128,11.07886,11.085031,11.091285,11.096121,11.102518,11.111074,11.113999,11.115696,11.117406,11.119394,11.12126,11.122354,11.12296,11.123491,...,11.124185,11.124244,11.124303,11.124317,11.124332,11.124332,11.124332,11.124332,11.124332,11.124332,11.124347,11.124347,11.124347,11.124347,11.124347,11.124347,11.124347,11.124347,11.134347,11.143647,11.152296,11.16034,11.16782,11.174777,11.181247,11.187264,11.19286,11.198064,11.202904,11.207405,11.211591,11.215484,11.219104,11.222471,11.225602,11.228514,11.231223,11.233741,11.236084,11.238262,11.240288,11.242172,11.243925,11.245554,11.24707,11.248479,11.24979,11.251009,11.252142


In [30]:
confirmed_prediction = pd.melt(daily_log_confirmed[:25], id_vars='Location')
confirmed_prediction['ConfirmedCases'] = to_exp(confirmed_prediction['value'])
fig2 = px.line(confirmed_prediction,
               x='Date', y='ConfirmedCases', color='Location')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Confirmed Cases Prediction [Updated: {TRAIN_END}]'
)
fig2.show()

# Fatalities

In [31]:
train_clean['Geo#Country#Contintent'] = train_clean.Location + '#' + train_clean.Country_Region + '#' + train_clean.continent
latest = train_clean[train_clean.Date == TRAIN_END][[
    'Geo#Country#Contintent', 'ConfirmedCases', 'Fatalities', 'LogConfirmed', 'LogFatalities']]
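# Keyword arguments keep this call valid on pandas versions where pivot() no longer accepts positional arguments
daily_death_deltas = train_clean[train_clean.Date >= '2020-03-17'].pivot(
    index='Geo#Country#Contintent', columns='Date', values='LogFatalitiesDelta').round(3).reset_index()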
daily_death_deltas = latest.merge(daily_death_deltas, on='Geo#Country#Contintent')
daily_death_deltas.shape
daily_death_deltas.head()
daily_death_deltas.to_csv('daily_death_deltas.csv', index=False)

(294, 20)

Unnamed: 0,Geo#Country#Contintent,ConfirmedCases,Fatalities,LogConfirmed,LogFatalities,2020-03-17,2020-03-18,2020-03-19,2020-03-20,2020-03-21,2020-03-22,2020-03-23,2020-03-24,2020-03-25,2020-03-26,2020-03-27,2020-03-28,2020-03-29,2020-03-30,2020-03-31
0,Afghanistan-#Afghanistan#Asia,174.0,4.0,5.164786,1.609438,0.0,0.0,0.0,0.0,0.693,0.0,0.0,0.405,0.511,0.0,0.0,0.0,0.0,0.0,
1,Albania-#Albania#Europe,243.0,15.0,5.497168,2.772589,0.405,0.0,0.0,0.0,0.0,0.511,0.182,0.0,0.154,0.251,0.201,0.0,0.087,0.288,
2,Algeria-#Algeria#Africa,716.0,44.0,6.575076,3.806662,0.47,0.223,0.182,0.288,0.118,0.0,0.105,0.095,0.167,0.038,0.105,0.065,0.118,0.223,
3,Andorra-#Andorra#Europe,376.0,12.0,5.932245,2.564949,0.0,0.0,0.0,0.0,0.693,0.0,0.0,0.0,0.693,0.0,0.0,0.56,0.251,0.368,
4,Angola-#Angola#Africa,7.0,2.0,2.079442,1.098612,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.099,0.0,0.0,


In [32]:
death_deltas = train.groupby(['Location', 'Country_Region', 'continent'])[[
    'Id']].count().reset_index()

GLOBAL_DELTA = 0.11
death_deltas['DELTA'] = GLOBAL_DELTA

death_deltas.loc[death_deltas.Country_Region=='China', 'DELTA'] = 0.005
death_deltas.loc[death_deltas.continent=='Oceania', 'DELTA'] = 0.08
death_deltas.loc[death_deltas.Country_Region=='Korea South', 'DELTA'] = 0.04
death_deltas.loc[death_deltas.Country_Region=='Japan', 'DELTA'] = 0.04
death_deltas.loc[death_deltas.Country_Region=='Singapore', 'DELTA'] = 0.05
death_deltas.loc[death_deltas.Country_Region=='Taiwan*', 'DELTA'] = 0.06



death_deltas.loc[death_deltas.Country_Region=='US', 'DELTA'] = 0.17

death_deltas.loc[death_deltas.Country_Region=='Switzerland', 'DELTA'] = 0.15
death_deltas.loc[death_deltas.Country_Region=='Norway', 'DELTA'] = 0.15
death_deltas.loc[death_deltas.Country_Region=='Iceland', 'DELTA'] = 0.01
death_deltas.loc[death_deltas.Country_Region=='Austria', 'DELTA'] = 0.14
death_deltas.loc[death_deltas.Country_Region=='Italy', 'DELTA'] = 0.07
death_deltas.loc[death_deltas.Country_Region=='Spain', 'DELTA'] = 0.1
death_deltas.loc[death_deltas.Country_Region=='Portugal', 'DELTA'] = 0.13
death_deltas.loc[death_deltas.Country_Region=='Israel', 'DELTA'] = 0.16
death_deltas.loc[death_deltas.Country_Region=='Iran', 'DELTA'] = 0.06
death_deltas.loc[death_deltas.Country_Region=='Germany', 'DELTA'] = 0.14
death_deltas.loc[death_deltas.Country_Region=='Malaysia', 'DELTA'] = 0.14
death_deltas.loc[death_deltas.Country_Region=='Russia', 'DELTA'] = 0.2
death_deltas.loc[death_deltas.Country_Region=='Ukraine', 'DELTA'] = 0.2
death_deltas.loc[death_deltas.Country_Region=='Brazil', 'DELTA'] = 0.2
death_deltas.loc[death_deltas.Country_Region=='Turkey', 'DELTA'] = 0.22
death_deltas.loc[death_deltas.Country_Region=='Philippines', 'DELTA'] = 0.12
death_deltas.loc[death_deltas.Location=='France-', 'DELTA'] = 0.14
death_deltas.loc[death_deltas.Location=='United Kingdom-', 'DELTA'] = 0.14
death_deltas.loc[death_deltas.Location=='Diamond Princess-', 'DELTA'] = 0.00

death_deltas.loc[death_deltas.Location=='China-Hong Kong', 'DELTA'] = 0.01
death_deltas.loc[death_deltas.Location=='San Marino-', 'DELTA'] = 0.05


death_deltas.shape
death_deltas.DELTA.mean()

death_deltas[death_deltas.DELTA != GLOBAL_DELTA].shape
death_deltas[death_deltas.DELTA != GLOBAL_DELTA].DELTA.mean()
death_deltas[death_deltas.DELTA != GLOBAL_DELTA]
death_deltas.describe()

(294, 5)

0.10836734693877562

(122, 5)

0.10606557377049175

Unnamed: 0,Location,Country_Region,continent,Id,DELTA
8,Australia-Australian Capital Territory,Australia,Oceania,70,0.08
9,Australia-New South Wales,Australia,Oceania,70,0.08
10,Australia-Northern Territory,Australia,Oceania,70,0.08
11,Australia-Queensland,Australia,Oceania,70,0.08
12,Australia-South Australia,Australia,Oceania,70,0.08
...,...,...,...,...,...
275,US-West Virginia,US,Americas,70,0.17
276,US-Wisconsin,US,Americas,70,0.17
277,US-Wyoming,US,Americas,70,0.17
279,Ukraine-,Ukraine,Europe,70,0.20


Unnamed: 0,Id,DELTA
count,294.0,294.0
mean,70.0,0.108367
std,0.0,0.047433
min,70.0,0.0
25%,70.0,0.11
50%,70.0,0.11
75%,70.0,0.11
max,70.0,0.22
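
The long run of `.loc` assignments above could also be expressed as lookup tables. The following is only a sketch on a toy frame: `country_overrides`, `location_overrides` and `toy` are hypothetical names holding a small subset of the values, not the full list used in this notebook. Location-level overrides are applied last so that they win over country-level ones, mirroring the order of the assignments above.

```python
import pandas as pd

GLOBAL_DELTA = 0.11

# Hypothetical subset of the hand-tuned values from the cell above
country_overrides = {'China': 0.005, 'US': 0.17, 'Italy': 0.07}
location_overrides = {'China-Hong Kong': 0.01}

# Toy stand-in for death_deltas (assumption: same column names)
toy = pd.DataFrame({
    'Location': ['China-Hubei', 'China-Hong Kong', 'US-New York', 'Italy-', 'Peru-'],
    'Country_Region': ['China', 'China', 'US', 'Italy', 'Peru'],
})

# Country-level value where one exists, otherwise the global default
toy['DELTA'] = toy['Country_Region'].map(country_overrides).fillna(GLOBAL_DELTA)

# Location-level overrides take precedence over country-level ones
toy['DELTA'] = toy['Location'].map(location_overrides).fillna(toy['DELTA'])

print(toy)
```

Keeping the values in dictionaries makes it easier to review the hand-tuned deltas in one place, at the cost of handling continent-level rules (such as Oceania) separately.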


In [33]:
daily_log_deaths = train_clean.pivot(
    index='Location', columns='Date', values='LogFatalities').reset_index()
daily_log_deaths = daily_log_deaths.sort_values(TRAIN_END, ascending=False)
daily_log_deaths.to_csv('daily_log_deaths.csv', index=False)

# Extrapolate log fatalities one day at a time: each forecast day adds the
# location's DELTA, shrunk by DECAY ** i (i = 0 on the first forecast day).
for i, d in tqdm(enumerate(pd.date_range(add_days(TRAIN_END, 1), add_days(TEST_END, 1)))):
    new_day = str(d).split(' ')[0]
    last_day = dt.datetime.strptime(new_day, DATEFORMAT) - dt.timedelta(days=1)
    last_day = last_day.strftime(DATEFORMAT)
    for loc in death_deltas.Location:
        death_delta = death_deltas.loc[death_deltas.Location == loc, 'DELTA'].values[0]
        is_loc = daily_log_deaths.Location == loc
        daily_log_deaths.loc[is_loc, new_day] = \
            daily_log_deaths.loc[is_loc, last_day] + death_delta * DECAY ** i

31it [00:42,  1.38s/it]
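
The update rule in the loop above is a geometric series in log space, so it has a closed form: after `k` forecast days the log value has grown by `delta * (1 - DECAY**k) / (1 - DECAY)`, and it can never grow by more than `delta / (1 - DECAY)`. Below is a minimal sketch of that equivalence for a single location. The function name `extrapolate_log` and the starting value `5.0` are made up for illustration; `0.11` and `0.93` are the global delta and the decay printed elsewhere in this notebook.

```python
import numpy as np

def extrapolate_log(last_log_value, delta, decay, n_days):
    """Forecast n_days in log space; forecast day i (0-based) adds delta * decay**i."""
    steps = delta * decay ** np.arange(n_days)
    return last_log_value + np.cumsum(steps)

# Illustrative inputs: a hypothetical last observed log value of 5.0
forecast = extrapolate_log(last_log_value=5.0, delta=0.11, decay=0.93, n_days=31)

# Closed form of the same geometric series
k = np.arange(1, 32)
closed_form = 5.0 + 0.11 * (1 - 0.93 ** k) / (1 - 0.93)

# Upper bound on the log value as the horizon goes to infinity
upper_bound = 5.0 + 0.11 / (1 - 0.93)

print(np.allclose(forecast, closed_form))
print(forecast[-1] < upper_bound)
```

Under these assumptions the heuristic always saturates: the predicted count plateaus at a fixed multiple of the last observed value, which is a design choice rather than something learned from the data.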


In [34]:
fatalities_prediction = pd.melt(daily_log_deaths[:25], id_vars='Location')
fatalities_prediction['Fatalities'] = to_exp(fatalities_prediction['value'])
fig2 = px.line(fatalities_prediction,
               x='Date', y='Fatalities', color='Location')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Fatalities Prediction [Updated: {TRAIN_END}]'
)
fig2.show()

# Create submission file

In [35]:
submission.head(2)

Unnamed: 0,ForecastId,ConfirmedCases,Fatalities
0,1,1,1
1,2,1,1


In [36]:
confirmed = []
fatalities = []
# Look up the forecast for every test row; the ForecastId itself is not needed here
for _, d, loc in tqdm(test[['ForecastId', 'Date', 'Location']].values):
    c = to_exp(daily_log_confirmed.loc[daily_log_confirmed.Location == loc, d].values[0])
    f = to_exp(daily_log_deaths.loc[daily_log_deaths.Location == loc, d].values[0])
    confirmed.append(c)
    fatalities.append(f)

100%|██████████| 12642/12642 [00:20<00:00, 623.99it/s]
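
The row-by-row lookup above could also be done with one reshape and one join. This is a sketch on toy data, not the code used for the submission. `wide` and `test_toy` are hypothetical stand-ins for `daily_log_confirmed` and `test`, and `np.expm1` is used on the assumption that `to_exp` (defined earlier in the notebook) inverts a `log1p` transform.

```python
import numpy as np
import pandas as pd

# Toy wide table: one row per location, one column per date, values in log1p space
wide = pd.DataFrame({
    'Location': ['A-', 'B-'],
    '2020-04-01': [np.log1p(10.0), np.log1p(0.0)],
    '2020-04-02': [np.log1p(12.0), np.log1p(1.0)],
})

# Toy test frame with the same key columns as the real one
test_toy = pd.DataFrame({
    'ForecastId': [1, 2, 3],
    'Date': ['2020-04-01', '2020-04-02', '2020-04-02'],
    'Location': ['A-', 'A-', 'B-'],
})

# Wide -> long, then a left join keeps the row order of the test frame
long = wide.melt(id_vars='Location', var_name='Date', value_name='LogValue')
merged = test_toy.merge(long, on=['Location', 'Date'], how='left')

# Assumption: to_exp is the inverse of log1p
merged['Value'] = np.expm1(merged['LogValue'])
print(merged)
```

A left join also surfaces missing location/date pairs as `NaN` instead of raising an `IndexError` on `.values[0]`.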


In [37]:
my_submission = test.copy()
my_submission['ConfirmedCases'] = confirmed
my_submission['Fatalities'] = fatalities
my_submission.shape
my_submission.head()




(12642, 11)

Unnamed: 0,ForecastId,Country_Region,Date,Location,country_iso_code_2,country_iso_code_3,continent,geo_region,DateTime,ConfirmedCases,Fatalities
0,1,Afghanistan,2020-03-19,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-19,22.0,0.0
1,2,Afghanistan,2020-03-20,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-20,24.0,0.0
2,3,Afghanistan,2020-03-21,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-21,24.0,0.0
3,4,Afghanistan,2020-03-22,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-22,40.0,1.0
4,5,Afghanistan,2020-03-23,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-23,40.0,1.0


In [38]:
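# Sum only the numeric columns; string and datetime columns have no meaningful sum
my_submission.groupby('Date')[['ForecastId', 'ConfirmedCases', 'Fatalities']].sum().tail()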

Unnamed: 0_level_0,ForecastId,ConfirmedCases,Fatalities
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2020-04-26,1863519,2974951.0,155813.550612
2020-04-27,1863813,3027175.0,158664.503057
2020-04-28,1864107,3076681.0,161368.963727
2020-04-29,1864401,3123548.0,163930.980791
2020-04-30,1864695,3167862.0,166354.985626


# Sanity check

In [39]:
total = my_submission.groupby('Date')[['ConfirmedCases', 'Fatalities']].sum().reset_index()

fig2 = px.line(pd.melt(total, id_vars=['Date']), x='Date', y='value', color='variable')
_ = fig2.update_layout(
    yaxis_type="log",
    title_text=f'COVID-19 Cumulative Prediction Total [Updated: {TRAIN_END}]'
)
fig2.show()
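
Besides the plot of totals, one more check that may be worth running is whether the cumulative predictions ever decrease within a location. The sketch below uses a toy frame, and `non_monotonic_locations` is a hypothetical helper, not part of this notebook. It assumes `Location`, `Date` and the value column are present, as they are in `my_submission`.

```python
import pandas as pd

def non_monotonic_locations(df, value_col):
    """Return the locations whose cumulative series decreases on any day."""
    ordered = df.sort_values(['Location', 'Date'])
    diffs = ordered.groupby('Location')[value_col].diff()
    return sorted(ordered.loc[diffs < 0, 'Location'].unique())

# Toy data: location 'B-' has a drop in ConfirmedCases on the last day
toy = pd.DataFrame({
    'Location': ['A-', 'A-', 'A-', 'B-', 'B-', 'B-'],
    'Date': ['2020-04-01', '2020-04-02', '2020-04-03'] * 2,
    'ConfirmedCases': [10.0, 12.0, 12.0, 5.0, 7.0, 6.0],
    'Fatalities': [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
})

print(non_monotonic_locations(toy, 'ConfirmedCases'))
print(non_monotonic_locations(toy, 'Fatalities'))
```

An empty list for both columns would indicate that every predicted series is non-decreasing.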

In [40]:
my_submission[[
    'ForecastId', 'ConfirmedCases', 'Fatalities'
]].to_csv('submission.csv', index=False)
print(DECAY)
my_submission.head()
my_submission.tail()
my_submission.shape

0.93


Unnamed: 0,ForecastId,Country_Region,Date,Location,country_iso_code_2,country_iso_code_3,continent,geo_region,DateTime,ConfirmedCases,Fatalities
0,1,Afghanistan,2020-03-19,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-19,22.0,0.0
1,2,Afghanistan,2020-03-20,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-20,24.0,0.0
2,3,Afghanistan,2020-03-21,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-21,24.0,0.0
3,4,Afghanistan,2020-03-22,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-22,40.0,1.0
4,5,Afghanistan,2020-03-23,Afghanistan-,AF,AFG,Asia,Southern Asia,2020-03-23,40.0,1.0


Unnamed: 0,ForecastId,Country_Region,Date,Location,country_iso_code_2,country_iso_code_3,continent,geo_region,DateTime,ConfirmedCases,Fatalities
12637,12638,Zimbabwe,2020-04-26,Zimbabwe-,ZW,ZWE,Africa,Eastern Africa,2020-04-26,48.113015,6.586917
12638,12639,Zimbabwe,2020-04-27,Zimbabwe-,ZW,ZWE,Africa,Eastern Africa,2020-04-27,49.166181,6.714455
12639,12640,Zimbabwe,2020-04-28,Zimbabwe-,ZW,ZWE,Africa,Eastern Africa,2020-04-28,50.165882,6.834989
12640,12641,Zimbabwe,2020-04-29,Zimbabwe-,ZW,ZWE,Africa,Eastern Africa,2020-04-29,51.113476,6.948775
12641,12642,Zimbabwe,2020-04-30,Zimbabwe-,ZW,ZWE,Africa,Eastern Africa,2020-04-30,52.01048,7.056078


(12642, 11)

In [41]:
end = dt.datetime.now()
# total_seconds() covers the whole elapsed time; .seconds alone drops any full days
print('Finished', end, int((end - start).total_seconds()), 's')

Finished 2020-04-01 20:42:33.590690 127 s
