# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training dataset is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model with a positive weight constraint, which enforces the assumption that higher NPI levels are negatively correlated with future cases.
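For reference, scikit-learn documents `Lasso` as minimizing the objective below over weights $w$ and intercept $b$, where $n$ is the number of samples, $X$ the feature matrix and $\alpha$ the `alpha` parameter; `positive=True` adds the constraint $w \ge 0$. Because the NPI features are negated before fitting, a weight $w_j \ge 0$ on a negated NPI feature $-x_j$ contributes $-w_j x_j \le 0$ to the prediction (NPI levels are non-negative), so in this setup a higher NPI level can only lower the predicted cases.

$$
\min_{w \ge 0,\; b}\; \frac{1}{2n}\,\lVert y - Xw - b\mathbf{1} \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
$$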

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7f65184ae0d0>)
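The cell above re-downloads the full CSV on every run, which keeps the data current. If you would rather reuse an existing local copy, a minimal sketch follows; `download_if_missing` is a hypothetical helper, not part of this example, and the trade-off is that a cached file can go stale.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Download `url` to `path` unless `path` already exists.

    Returns True if a download happened, False if the local copy was reused.
    """
    if os.path.exists(path):
        return False
    # Create the parent directory (e.g. 'data') if it does not exist yet.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return True
```

With the names defined above it would be called as `download_if_missing(DATA_URL, DATA_FILE)`.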

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')  # pandas >= 1.3; replaces error_bad_lines=False, which was removed in pandas 2.0

In [5]:
df.columns

Index(['CountryName', 'CountryCode', 'RegionName', 'RegionCode',
       'Jurisdiction', 'Date', 'C1_School closing', 'C1_Flag',
       'C2_Workplace closing', 'C2_Flag', 'C3_Cancel public events', 'C3_Flag',
       'C4_Restrictions on gatherings', 'C4_Flag', 'C5_Close public transport',
       'C5_Flag', 'C6_Stay at home requirements', 'C6_Flag',
       'C7_Restrictions on internal movement', 'C7_Flag',
       'C8_International travel controls', 'E1_Income support', 'E1_Flag',
       'E2_Debt/contract relief', 'E3_Fiscal measures',
       'E4_International support', 'H1_Public information campaigns',
       'H1_Flag', 'H2_Testing policy', 'H3_Contact tracing',
       'H4_Emergency investment in healthcare', 'H5_Investment in vaccines',
       'H6_Facial Coverings', 'H6_Flag', 'H7_Vaccination policy', 'H7_Flag',
       'M1_Wildcard', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex',
       'StringencyIndexForDisplay', 'StringencyLegacyIndex',
       ...

In [6]:
# For testing, restrict training data to dates on or before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [7]:
# Add a GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [8]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
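Grouping by `GeoID` before taking differences matters because the frame stacks every region one after another. A small made-up example shows the difference; it assumes, as the displayed frame suggests, that rows are already in date order within each region.

```python
import pandas as pd

# Made-up cumulative case counts for two regions, in date order within each region.
toy = pd.DataFrame({
    "GeoID": ["A__nan"] * 3 + ["B__nan"] * 3,
    "ConfirmedCases": [1, 4, 10, 100, 130, 135],
})

# Grouped diff: each region's first day has no previous day, so it is NaN, then 0.
toy["NewCases"] = toy.groupby("GeoID").ConfirmedCases.diff().fillna(0)

# Ungrouped diff for comparison: B's first day wrongly becomes 100 - 10 = 90.
naive = toy.ConfirmedCases.diff().fillna(0)

print(toy.NewCases.tolist())  # [0.0, 3.0, 6.0, 0.0, 30.0, 5.0]
print(naive.tolist())         # [0.0, 3.0, 6.0, 90.0, 30.0, 5.0]
```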

In [9]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [10]:
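# Fill missing case values by linear interpolation within each region,
# then set any remaining NaNs (e.g. gaps before a region's first report) to 0.
# transform keeps the original index, so the result aligns with df directly.
df['NewCases'] = (df.groupby('GeoID').NewCases
                    .transform(lambda group: group.interpolate())
                    .fillna(0))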

In [11]:
# Fill any missing NPIs by assuming they are the same as the previous day within each region;
# NPIs missing before a region's first recorded value are set to 0
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
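The two fill steps can be checked on a small made-up frame. This sketch uses `transform` for the per-region interpolation (it returns a result aligned to the original index) and shows that forward-filling happens per region, so one region's last NPI value never leaks into the next region's first rows. The column name is borrowed from `npi_cols`; the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps; "A" and "B" stand in for two GeoIDs.
toy = pd.DataFrame({
    "GeoID": ["A"] * 4 + ["B"] * 3,
    "NewCases": [np.nan, 10.0, np.nan, 30.0, 5.0, np.nan, 15.0],
    "C1_School closing": [np.nan, 2.0, np.nan, 3.0, np.nan, 1.0, np.nan],
})
toy_npi_cols = ["C1_School closing"]

# Cases: linear interpolation inside each region; a leading gap has nothing
# to interpolate from, so it stays NaN and is then set to 0.
toy["NewCases"] = (toy.groupby("GeoID").NewCases
                   .transform(lambda s: s.interpolate())
                   .fillna(0))

# NPIs: carry the last known value forward within each region only, so B's
# first row does not inherit A's last value (3.0); leading gaps become 0.
toy[toy_npi_cols] = toy.groupby("GeoID")[toy_npi_cols].ffill().fillna(0)

print(toy.NewCases.tolist())              # [0.0, 10.0, 20.0, 30.0, 5.0, 10.0, 15.0]
print(toy["C1_School closing"].tolist())  # [0.0, 2.0, 2.0, 3.0, 0.0, 1.0, 1.0]
```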

In [12]:
df

|       | CountryName | RegionName | GeoID | Date | NewCases | C1_School closing | C2_Workplace closing | C3_Cancel public events | C4_Restrictions on gatherings | C5_Close public transport | C6_Stay at home requirements | C7_Restrictions on internal movement | C8_International travel controls | H1_Public information campaigns | H2_Testing policy | H3_Contact tracing | H6_Facial Coverings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba |  | Aruba__nan | 2020-01-01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Aruba |  | Aruba__nan | 2020-01-02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Aruba |  | Aruba__nan | 2020-01-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Aruba |  | Aruba__nan | 2020-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Aruba |  | Aruba__nan | 2020-01-05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 91024 | Zimbabwe |  | Zimbabwe__nan | 2020-07-27 | 192.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 91025 | Zimbabwe |  | Zimbabwe__nan | 2020-07-28 | 113.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 91026 | Zimbabwe |  | Zimbabwe__nan | 2020-07-29 | 62.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 91027 | Zimbabwe |  | Zimbabwe__nan | 2020-07-30 | 213.0 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 |


In [13]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
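The per-day loop above can also be written with NumPy sliding windows, which avoids a Python loop per day. This is a sketch assuming NumPy >= 1.20 (for `sliding_window_view`); `make_samples` is a hypothetical helper that reproduces the loop's indexing exactly, including the target on day `d + 1`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_samples(case_data, npi_data, nb_lookback_days):
    """Vectorized equivalent of the per-day loop for a single region (a sketch).

    case_data: array of shape (T, 1); npi_data: array of shape (T, K).
    Returns X with one row per day d in range(nb_lookback_days, T - 1), holding
    the cases for days d - nb_lookback_days .. d - 1 followed by the negated NPIs
    for the same days, and y holding the cases on day d + 1.
    """
    T, K = npi_data.shape
    n = T - nb_lookback_days - 1  # number of iterations in the original loop
    if n <= 0:
        return np.empty((0, nb_lookback_days * (1 + K))), np.empty(0)
    # sliding_window_view puts the window axis last: (T - w + 1, features, w).
    # Swap to (windows, w, features) so flattening is day-major, like X.flatten().
    case_win = np.swapaxes(sliding_window_view(case_data, nb_lookback_days, axis=0), 1, 2)[:n]
    npi_win = np.swapaxes(sliding_window_view(npi_data, nb_lookback_days, axis=0), 1, 2)[:n]
    X = np.concatenate([case_win.reshape(n, -1), -npi_win.reshape(n, -1)], axis=1)
    y = case_data[nb_lookback_days + 1:, 0]
    return X, y
```

Inside the per-region loop, `make_samples(all_case_data, all_npi_data, nb_lookback_days)` would replace the inner `for d` loop, and the per-region results could be stacked with `np.vstack` and `np.concatenate`.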

In [14]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))
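The helper computes the same quantity as scikit-learn's `mean_absolute_error`, which could be used instead. Note that the library function takes `(y_true, y_pred)`; MAE is symmetric, so the argument order does not change the result. The arrays below are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error


def mae(pred, true):
    return np.mean(np.abs(pred - true))


pred = np.array([10.0, 0.0, 5.0])
true = np.array([12.0, 1.0, 5.0])
print(mae(pred, true))                  # 1.0
print(mean_absolute_error(true, pred))  # 1.0
```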

In [15]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=301)
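`train_test_split` shuffles samples, and consecutive samples from the same region share all but one of their lookback days, so nearly identical windows can end up in both sets and the test MAE may look better than performance on genuinely unseen dates would be. One alternative is to split by each sample's target date. The sketch assumes you also record that date while building samples (for example, appending `gdf.Date.iloc[d + 1]` to a hypothetical `sample_dates` list inside the loop); `chronological_split` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def chronological_split(X, y, target_dates, cutoff):
    """Put samples whose target date is before `cutoff` in train, the rest in test."""
    dates = pd.to_datetime(pd.Series(target_dates))
    is_train = (dates < pd.Timestamp(cutoff)).to_numpy()
    return X[is_train], X[~is_train], y[is_train], y[~is_train]
```

Usage would look like `chronological_split(X_samples, y_samples, sample_dates, "2020-07-01")`, where the cutoff date is only an illustrative choice.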

In [16]:
# Create and train the Lasso model.
# positive=True constrains every weight to be non-negative. Past cases can then
# only raise the prediction, and because the NPI features were negated above,
# higher NPI levels can only lower it.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')
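The reason for negating the NPI features can be seen on a small synthetic problem (made-up data, not OxCGRT) in which one feature raises the target and a stand-in "NPI" lowers it. With the negated column the positive constraint is satisfiable; with the raw column the constraint forces the NPI weight to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500
cases = rng.uniform(0, 100, n)  # stand-in for a past-cases feature
npi = rng.integers(0, 4, n)     # stand-in for an ordinal NPI level (0-3)
# Made-up ground truth: cases push the target up, the NPI pushes it down.
y = 2.0 * cases - 3.0 * npi + rng.normal(0, 1, n)

# Negated NPI column, as in the training loop: the true weight on -npi is +3,
# which the positive constraint allows.
X_negated = np.column_stack([cases, -npi])
model = Lasso(alpha=0.1, positive=True, max_iter=10000).fit(X_negated, y)
print(model.coef_)      # roughly [2, 3], both non-negative

# Raw NPI column: the best weight would be -3, so the constraint pins it at 0
# and the model cannot express the NPI effect at all.
X_raw = np.column_stack([cases, npi])
model_raw = Lasso(alpha=0.1, positive=True, max_iter=10000).fit(X_raw, y)
print(model_raw.coef_)  # NPI weight is 0.0
```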

In [17]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 143.93261241527352
Test MAE: 146.99429654615457


In [18]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
# NPI features span the same days as the case features (Day -nb_lookback_days to Day -1)
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -7 NewCases 0.04388208438852578
Day -6 NewCases 0.3884012325249454
Day -5 NewCases 0.267901524283928
Day -4 NewCases 0.060639025062931166
Day -3 NewCases 0.02462600901480678
Day -2 NewCases 0.08915248411120327
Day -1 NewCases 0.1992961636420298
Day -26 C6_Stay at home requirements 5.654036241170593
Day -22 C2_Workplace closing 3.6831741362788373
Day -21 C2_Workplace closing 10.147264260411363
Intercept 27.203348427146693
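Because each sample is laid out as `nb_lookback_days` case values followed by `nb_lookback_days` rows of `len(npi_cols)` negated NPI values (flattened day by day), the coefficient vector can also be reshaped rather than named entry by entry. `split_coefficients` is a hypothetical helper; it checks that the vector length matches that layout.

```python
import numpy as np


def split_coefficients(coef, nb_lookback_days, nb_npis):
    """Split a flat coefficient vector laid out like X_sample.

    Returns (case_coefs, npi_coefs) with shapes (nb_lookback_days,) and
    (nb_lookback_days, nb_npis); row 0 corresponds to Day -nb_lookback_days.
    """
    coef = np.asarray(coef)
    expected = nb_lookback_days * (1 + nb_npis)
    if coef.shape != (expected,):
        raise ValueError(f"expected {expected} coefficients, got shape {coef.shape}")
    case_coefs = coef[:nb_lookback_days]
    npi_coefs = coef[nb_lookback_days:].reshape(nb_lookback_days, nb_npis)
    return case_coefs, npi_coefs
```

For this notebook, `split_coefficients(model.coef_, nb_lookback_days, len(npi_cols))` followed by `pd.DataFrame(npi_coefs, index=range(-nb_lookback_days, 0), columns=npi_cols)` would place each NPI weight under its day and policy name.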


In [19]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
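To check that the saved file can be used later (for example by the prediction code), it can be loaded back with `pickle.load`. The sketch round-trips a small stand-in model through a temporary file; with the real notebook you would open `models/model.pkl` in `'rb'` mode instead. Pickled scikit-learn models are generally only reliable with the scikit-learn version that saved them, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Small made-up model so the example runs on its own.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = Lasso(alpha=0.01, positive=True).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same predictions as the original.
same = np.allclose(restored.predict(X), model.predict(X))
```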

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
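The `predict` module itself is not shown here. As a rough illustration of what rolling out day by day means, the sketch below repeatedly predicts one day, appends that prediction to the case history, and slides the lookback window forward. `rollout` is hypothetical and is not the `predict_df` implementation; it assumes a fitted model whose input layout matches training (recent cases followed by negated NPIs for the same days).

```python
import numpy as np


def rollout(model, past_cases, past_npis, future_npis, nb_lookback_days):
    """Hypothetical day-by-day rollout (a sketch, not the predict module's code).

    past_cases: daily new cases, at least nb_lookback_days values.
    past_npis: NPI levels for the same days, shape (>= nb_lookback_days, K).
    future_npis: planned NPI levels for each day to predict, shape (H, K).
    """
    cases = list(np.asarray(past_cases, dtype=float).ravel()[-nb_lookback_days:])
    npis = list(np.asarray(past_npis, dtype=float)[-nb_lookback_days:])
    preds = []
    for day_npis in np.asarray(future_npis, dtype=float):
        # Same layout as training: cases first, then negated NPIs, day-major.
        x = np.concatenate([np.asarray(cases[-nb_lookback_days:]),
                            -np.asarray(npis[-nb_lookback_days:]).ravel()])
        pred = max(float(model.predict(x[np.newaxis, :])[0]), 0.0)  # no negative cases
        preds.append(pred)
        # Feed the prediction and this day's NPIs back in for the next day.
        cases.append(pred)
        npis.append(day_npis)
    return np.array(preds)
```

Each step only uses values already in the history, so errors can compound over a long horizon; this is inherent to autoregressive rollouts.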

In [20]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [21]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../../validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 66.77601228408992
2020-08-02: 74.9569598505673
2020-08-03: 78.35858631827678
2020-08-04: 81.84713614514999
2020-08-05: 88.56976948804756
2020-08-06: 102.44459930011737
2020-08-07: 133.43901576564656
2020-08-08: 148.24921513557192
2020-08-09: 157.32817665034844
2020-08-10: 165.3678010780756
2020-08-11: 176.50491862582044
2020-08-12: 194.55036613421123
2020-08-13: 216.50298052297444
2020-08-14: 232.79329348599475
2020-08-15: 245.44676167247675
2020-08-16: 257.5603893892693
2020-08-17: 272.3478961242516
2020-08-18: 291.0531154491563
2020-08-19: 310.84743514976395
2020-08-20: 328.2390872703902
2020-08-21: 343.70195933870855
2020-08-22: 359.17768485798115
2020-08-23: 376.55530011503197
2020-08-24: 396.05065847709227
2020-08-25: 415.972237313178
2020-08-26: 434.8130475168458
2020-08-27: 447.13877619944174
2020-08-28: 474.43980797572385
2020-08-29: 498.98627191832594
2020-08-30: 521.429873214367
2020-08-31: 543.1514552196612

Predicting for Afghanistan__
...

2020-08-06: 2633.9456284958915
2020-08-07: 3805.5756990902555
2020-08-08: 4259.960259491119
2020-08-09: 4366.77457315297
2020-08-10: 4164.254339566802
2020-08-11: 3795.9711904509077
2020-08-12: 3736.6787297777864
2020-08-13: 4237.736937093347
2020-08-14: 4567.472682119614
2020-08-15: 4661.2365406307945
2020-08-16: 4545.428214984863
2020-08-17: 4401.397917884595
2020-08-18: 4479.716476505505
2020-08-19: 4765.6647153304475
2020-08-20: 4994.242729955988
2020-08-21: 5078.347526953055
2020-08-22: 5057.974298360148
2020-08-23: 5048.022110045073
2020-08-24: 5160.85891069595
2020-08-25: 5362.793739987219
2020-08-26: 5535.477553120077
2020-08-27: 5632.963899122409
2020-08-28: 5672.714407587701
2020-08-29: 5731.295014599495
2020-08-30: 5856.873717375756
2020-08-31: 6023.659558420579

Predicting for Burundi__nan
2020-08-01: 54.7125573579724
2020-08-02: 64.70848881823463
2020-08-03: 72.36664600031088
2020-08-04: 80.25636104883957
2020-08-05: 86.86628891766786
2020-08-06: 97.96244777519355
...

2020-08-16: 1148.3717775443317
2020-08-17: 1152.788669684983
2020-08-18: 1195.8518256836676
2020-08-19: 1264.928093442359
2020-08-20: 1308.6451752245905
2020-08-21: 1330.1067668727628
2020-08-22: 1343.1396371537535
2020-08-23: 1366.0454871860322
2020-08-24: 1410.3771652746975
2020-08-25: 1463.3076510794851
2020-08-26: 1504.923725457257
2020-08-27: 1522.8547930915622
2020-08-28: 1546.2705332450653
2020-08-29: 1578.1153845590404
2020-08-30: 1621.9184341519112
2020-08-31: 1668.803862304389

Predicting for Belarus__nan
2020-08-01: 1629.0712089507929
2020-08-02: 1843.9000966059534
2020-08-03: 1958.8226563961177
2020-08-04: 1868.9822779643578
2020-08-05: 1462.9358092361515
2020-08-06: 1179.3662723499353
2020-08-07: 1709.3329769190716
2020-08-08: 1959.9400163630417
2020-08-09: 2055.5222670693543
2020-08-10: 1954.1377274905788
2020-08-11: 1743.1416451241496
2020-08-12: 1723.6245926587164
2020-08-13: 1964.757518774081
2020-08-14: 2145.929298415059
2020-08-15: 2211.2188651282404
2020-08-16: ...

2020-08-27: 4284.87282643789
2020-08-28: 4325.148426507885
2020-08-29: 4384.256049421119
2020-08-30: 4490.63213122354
2020-08-31: 4621.76546142342

Predicting for Brazil__Ceara
2020-08-01: 941.9678384859367
2020-08-02: 620.2380235084158
2020-08-03: 627.8514199248447
2020-08-04: 1554.4530308274873
2020-08-05: 2232.338031710934
2020-08-06: 1136.6991415136777
2020-08-07: 1086.2765720504094
2020-08-08: 969.8453351024793
2020-08-09: 1193.3476729245813
2020-08-10: 1701.66424110686
2020-08-11: 1827.4000107275485
2020-08-12: 1486.9112741465894
2020-08-13: 1357.4783914230431
2020-08-14: 1347.6896226834144
2020-08-15: 1551.3172329471115
2020-08-16: 1808.1157554493766
2020-08-17: 1849.2833478776504
2020-08-18: 1723.3991074153716
2020-08-19: 1652.8170841332424
2020-08-20: 1689.1836631179924
2020-08-21: 1836.9915351526881
2020-08-22: 1990.172188357298
2020-08-23: 2027.7104747789888
2020-08-24: 1988.6971928768583
2020-08-25: 1973.8082345929436
2020-08-26: 2028.201426997751
2020-08-27: ...

2020-08-02: 723.8125116077754
2020-08-03: 745.8320983246235
2020-08-04: 703.033489253487
2020-08-05: 571.2702112850296
2020-08-06: 495.2911468297844
2020-08-07: 717.1425522746642
2020-08-08: 806.2901630835092
2020-08-09: 833.5939998502168
2020-08-10: 796.8829488799205
2020-08-11: 734.2389660135639
2020-08-12: 748.7015080207857
2020-08-13: 853.4666222580107
2020-08-14: 923.5415690277059
2020-08-15: 948.0866722319311
2020-08-16: 932.8398271379672
2020-08-17: 918.0005464418534
2020-08-18: 949.4726193935593
2020-08-19: 1015.6335772797612
2020-08-20: 1068.7252076116768
2020-08-21: 1093.603501666684
2020-08-22: 1108.15961233141
2020-08-23: 1124.2798542076946
2020-08-24: 1161.919582613962
2020-08-25: 1214.026636544597
2020-08-26: 1259.2357616553174
2020-08-27: 1285.0337716615206
2020-08-28: 1308.8353072000866
2020-08-29: 1337.135474774254
2020-08-30: 1377.5605608120143
2020-08-31: 1424.29240585521

Predicting for Brazil__Pernambuco
2020-08-01: 1121.674911571087
2020-08-02: 1194.1065632461757
...

2020-08-03: 3701.459425172886
2020-08-04: 4402.796763552884
2020-08-05: 4231.3274826401375
2020-08-06: 3092.330687112658
2020-08-07: 4140.617970751953
2020-08-08: 4180.954020824565
2020-08-09: 4370.405572004834
2020-08-10: 4591.63070051172
2020-08-11: 4376.208817673882
2020-08-12: 4191.03577946703
2020-08-13: 4519.8427655288815
2020-08-14: 4689.395956913092
2020-08-15: 4869.497966301339
2020-08-16: 4953.9019760932315
2020-08-17: 4887.323422218369
2020-08-18: 4903.009428170214
2020-08-19: 5078.206507859205
2020-08-20: 5246.532680179942
2020-08-21: 5392.051873312424
2020-08-22: 5474.322216815628
2020-08-23: 5504.190870690556
2020-08-24: 5581.375299281246
2020-08-25: 5724.1007911670185
2020-08-26: 5877.201876342423
2020-08-27: 6004.443269354086
2020-08-28: 6097.987926882373
2020-08-29: 6176.288950114609
2020-08-30: 6282.176733297134
2020-08-31: 6420.117703249102

Predicting for Brazil__Sergipe
2020-08-01: 336.5556431065236
2020-08-02: 893.837709904129
2020-08-03: 1484.876516500577
...

2020-08-18: 280.4888490662819
2020-08-19: 299.73014945839293
2020-08-20: 317.053296425006
2020-08-21: 332.6278059020657
2020-08-22: 348.1018811815047
2020-08-23: 365.2481599041174
2020-08-24: 384.33657909682074
2020-08-25: 403.9070879735403
2020-08-26: 422.61488738454096
2020-08-27: 434.8986713143329
2020-08-28: 451.9411818296385
2020-08-29: 470.5327407252279
2020-08-30: 490.6036216786605
2020-08-31: 510.87641798072923

Predicting for Canada__nan
2020-08-01: 5645.313358895991
2020-08-02: 6236.473727271319
2020-08-03: 6969.712189844282
2020-08-04: 7066.3053914277625
2020-08-05: 5313.67155242047
2020-08-06: 4074.3091722318213
2020-08-07: 5798.122002182294
2020-08-08: 6667.6472266717465
2020-08-09: 7194.433445308387
2020-08-10: 6944.3968290170815
2020-08-11: 6058.959553018432
2020-08-12: 5829.436702042909
2020-08-13: 6578.633500726986
2020-08-14: 7225.0122531317975
2020-08-15: 7537.05733471187
2020-08-16: 7350.196630472338
2020-08-17: 6985.757316743612
2020-08-18: 7016.05623890911
...

2020-08-29: 10937.949071434983
2020-08-30: 11201.952818248943
2020-08-31: 11532.087981538212

Predicting for Comoros__nan
2020-08-01: 52.78667748335026
2020-08-02: 63.27506886155866
2020-08-03: 69.76292231954103
2020-08-04: 73.45429838008556
2020-08-05: 78.34810330761404
2020-08-06: 94.24540317780234
2020-08-07: 121.60267443737979
2020-08-08: 136.94471315606867
2020-08-09: 147.09860194441913
2020-08-10: 155.15720440721196
2020-08-11: 166.02688482119743
2020-08-12: 183.81032677785885
2020-08-13: 204.5710907687446
2020-08-14: 220.9300151751112
2020-08-15: 233.91415719887516
2020-08-16: 246.03745986936383
2020-08-17: 260.61256035608164
2020-08-18: 278.8558166682578
2020-08-19: 298.10338995961774
2020-08-20: 315.4031843442903
2020-08-21: 330.90878617849444
2020-08-22: 336.1574612652104
2020-08-23: 347.5762487887672
2020-08-24: 364.6325287637227
2020-08-25: 383.0302662881834
2020-08-26: 400.53480769379246
2020-08-27: 403.6556533662259
2020-08-28: 413.10365890863136
2020-08-29: ...

2020-08-31: 459.5145317223559

Predicting for Denmark__nan
2020-08-01: 1659.8692583659538
2020-08-02: 1919.4605472469664
2020-08-03: 2031.9356321258554
2020-08-04: 2116.530903031302
2020-08-05: 1728.0773585550737
2020-08-06: 1290.8957023358828
2020-08-07: 1797.9327867060515
2020-08-08: 2059.367562156688
2020-08-09: 2200.09271239157
2020-08-10: 2171.147537000864
2020-08-11: 1950.8239501873006
2020-08-12: 1872.6403814442453
2020-08-13: 2093.0250938411173
2020-08-14: 2284.287679713393
2020-08-15: 2385.143535193958
2020-08-16: 2358.8894218275445
2020-08-17: 2272.932951527393
2020-08-18: 2286.5497389638404
2020-08-19: 2420.475920581047
2020-08-20: 2555.6490786154754
2020-08-21: 2630.1834553111667
2020-08-22: 2642.5606995562975
2020-08-23: 2635.915738196887
2020-08-24: 2683.1229938482893
2020-08-25: 2785.5910201510924
2020-08-26: 2889.154308261006
2020-08-27: 2952.231705747703
2020-08-28: 2985.719559426192
2020-08-29: 3017.39010239251
2020-08-30: 3080.0158687924604
2020-08-31: ...

2020-08-03: 448.29152113533416
2020-08-04: 404.20575262076136
2020-08-05: 359.18742617340797
2020-08-06: 319.9928895767911
2020-08-07: 460.89249801827503
2020-08-08: 521.6289646659884
2020-08-09: 529.5885266006075
2020-08-10: 508.0381764610788
2020-08-11: 484.57257574438324
2020-08-12: 502.4022322127445
2020-08-13: 573.0925943227192
2020-08-14: 618.7912799337322
2020-08-15: 633.2005909446877
2020-08-16: 628.6610865854962
2020-08-17: 629.1698271579517
2020-08-18: 656.8258735154132
2020-08-19: 703.626323431899
2020-08-20: 739.8680785571199
2020-08-21: 758.36101286164
2020-08-22: 767.1126240991573
2020-08-23: 781.6433906469384
2020-08-24: 811.2744953792833
2020-08-25: 848.9124044058652
2020-08-26: 881.0280347782118
2020-08-27: 892.2044369017708
2020-08-28: 908.1221715864406
2020-08-29: 929.3301496711155
2020-08-30: 959.4282657471445
2020-08-31: 992.9098763735652

Predicting for Fiji__nan
2020-08-01: 53.11866177122797
2020-08-02: 63.015968983759414
2020-08-03: 69.63636082239427
2020-08-04: ...

2020-08-31: 1360.491236089706

Predicting for United Kingdom__Wales
2020-08-01: 1299.6441195642035
2020-08-02: 1416.9365919539134
2020-08-03: 1293.4056757676387
2020-08-04: 811.9548452815258
2020-08-05: 477.00552606470944
2020-08-06: 686.0519107218458
2020-08-07: 1214.4045179653583
2020-08-08: 1370.3905092242871
2020-08-09: 1261.6049109916266
2020-08-10: 997.3681590134588
2020-08-11: 875.6704318588156
2020-08-12: 1042.6811301951575
2020-08-13: 1308.1873974992648
2020-08-14: 1411.5977456468593
2020-08-15: 1346.4138513971213
2020-08-16: 1219.3013408611478
2020-08-17: 1192.6832384119664
2020-08-18: 1311.3677850236963
2020-08-19: 1463.7206571815411
2020-08-20: 1530.6543564110318
2020-08-21: 1504.0519752619637
2020-08-22: 1466.4514598430126
2020-08-23: 1487.0354718897431
2020-08-24: 1576.93402400454
2020-08-25: 1676.4602919404272
2020-08-26: 1728.0928924416833
2020-08-27: 1727.5956436143006
2020-08-28: 1729.7450474890309
2020-08-29: 1767.8645910094494
2020-08-30: 1841.255216472998
...

2020-08-03: 101.74841861246125
2020-08-04: 98.2216018737422
2020-08-05: 91.8333091379832
2020-08-06: 113.86641440706434
2020-08-07: 155.6460935046085
2020-08-08: 171.40123263235088
2020-08-09: 178.89643787523818
2020-08-10: 181.230376610044
2020-08-11: 188.55192066274438
2020-08-12: 210.82929936493053
2020-08-13: 237.8487700793728
2020-08-14: 255.5022305355561
2020-08-15: 266.64970567886064
2020-08-16: 275.6583275389356
2020-08-17: 289.43496359605933
2020-08-18: 310.54114513087507
2020-08-19: 333.0749283997481
2020-08-20: 351.3617531583433
2020-08-21: 365.8881399833642
2020-08-22: 369.77993624145427
2020-08-23: 381.3847615337743
2020-08-24: 400.3501934697776
2020-08-25: 420.71863630486706
2020-08-26: 438.97367528066167
2020-08-27: 453.0856286654481
2020-08-28: 464.43518464889655
2020-08-29: 479.39887128825956
2020-08-30: 498.1795845410718
2020-08-31: 518.0257115950496

Predicting for Hong Kong__nan
2020-08-01: 143.44702339186972
2020-08-02: 157.71670835394306
2020-08-03: ...

2020-08-08: 11940.860792184572
2020-08-09: 12320.480433887684
2020-08-10: 11671.212210845484
2020-08-11: 10418.22753203949
2020-08-12: 10313.307742423012
2020-08-13: 11756.484473168273
2020-08-14: 12688.207380732501
2020-08-15: 12950.914667144874
2020-08-16: 12544.319500356825
2020-08-17: 12053.904889201984
2020-08-18: 12273.78106116656
2020-08-19: 13085.334864426302
2020-08-20: 13725.536053014113
2020-08-21: 13935.148452248435
2020-08-22: 13799.687855332346
2020-08-23: 13710.62442870936
2020-08-24: 14006.077135275133
2020-08-25: 14562.762334604939
2020-08-26: 15030.063511092643
2020-08-27: 15242.262753039364
2020-08-28: 15290.564145855578
2020-08-29: 15402.988720851947
2020-08-30: 15723.245724728227
2020-08-31: 16165.524668433154

Predicting for Iraq__nan
2020-08-01: 1706.6193010347342
2020-08-02: 1823.4667530975587
2020-08-03: 1918.6813214419442
2020-08-04: 1986.7465738064182
2020-08-05: 1671.962274703001
2020-08-06: 1270.8712057002447
2020-08-07: 1771.3163715993903
2020-08-08: ...

2020-08-11: 610.9271919275873
2020-08-12: 664.2253048137452
2020-08-13: 761.4231221521184
2020-08-14: 785.429160776014
2020-08-15: 774.4658454054664
2020-08-16: 765.7635945596554
2020-08-17: 781.6513728222077
2020-08-18: 831.872468073064
2020-08-19: 888.9405068060083
2020-08-20: 915.3069688654443
2020-08-21: 922.3135490916914
2020-08-22: 941.053851720666
2020-08-23: 972.449668994253
2020-08-24: 1017.6405988102102
2020-08-25: 1061.765082207251
2020-08-26: 1091.1194578587704
2020-08-27: 1112.819098961377
2020-08-28: 1139.5849208335192
2020-08-29: 1175.3756267801257
2020-08-30: 1217.960156059574
2020-08-31: 1258.598058083255

Predicting for Kyrgyz Republic__nan
2020-08-01: 413.0542056750752
2020-08-02: 483.3505084865683
2020-08-03: 462.17896348184297
2020-08-04: 303.9978528337415
2020-08-05: 305.6604297341451
2020-08-06: 304.6555672002778
2020-08-07: 465.74236787195247
2020-08-08: 527.9627185549538
2020-08-09: 507.28508893479034
2020-08-10: 450.69582364850896
2020-08-11: 442.3099494541587
...

2020-08-28: 1295.2403131319247
2020-08-29: 1311.0272071081865
2020-08-30: 1340.5020550288361
2020-08-31: 1381.3660201775795

Predicting for Lesotho__nan
2020-08-01: 54.415511721968386
2020-08-02: 63.40607245692691
2020-08-03: 69.82972434264796
2020-08-04: 73.25150122532878
2020-08-05: 78.02723796239162
2020-08-06: 94.56545016380582
2020-08-07: 122.30464664501986
2020-08-08: 137.233202416471
2020-08-09: 147.18446886366632
2020-08-10: 155.07493543220525
2020-08-11: 166.0200328466407
2020-08-12: 184.11952081725354
2020-08-13: 204.99925992590738
2020-08-14: 221.20361308149987
2020-08-15: 234.0380256176577
2020-08-16: 246.08581061933137
2020-08-17: 260.7425031420448
2020-08-18: 279.1401632884815
2020-08-19: 298.4335126219523
2020-08-20: 315.6586979038713
2020-08-21: 331.077092164933
2020-08-22: 356.59271126321426
2020-08-23: 379.4798765651132
2020-08-24: 400.697751380329
2020-08-25: 421.44568606337657
2020-08-26: 441.2506848036908
2020-08-27: 456.8838793345853
2020-08-28: 480.28114559025164
...

2020-08-17: 250.29092218895943
2020-08-18: 270.697795884893
2020-08-19: 291.16003175343224
2020-08-20: 307.9935744222694
2020-08-21: 321.88963847694197
2020-08-22: 346.1536105555813
2020-08-23: 369.1680977773871
2020-08-24: 391.334686061497
2020-08-25: 412.5971548708207
2020-08-26: 432.00785934796676
2020-08-27: 452.3496853179872
2020-08-28: 476.2015768930121
2020-08-29: 500.4781069212936
2020-08-30: 524.4364674770338
2020-08-31: 547.6277176789414

Predicting for Mexico__nan
2020-08-01: 9951.716220400098
2020-08-02: 9739.271929299563
2020-08-03: 8667.916996926071
2020-08-04: 8654.220405161524
2020-08-05: 7948.7083085214545
2020-08-06: 6361.116500457834
2020-08-07: 9241.892652258042
2020-08-08: 9723.448846012561
2020-08-09: 9565.264469892934
2020-08-10: 9310.002034775693
2020-08-11: 8731.63187688167
2020-08-12: 8743.09565711511
2020-08-13: 9856.1787754749
2020-08-14: 10320.385827242611
2020-08-15: 10368.652628895465
2020-08-16: 10186.731225008198
2020-08-17: 10000.935968847336
...

2020-08-10: 1447.9507348339982
2020-08-11: 1260.7245500321317
2020-08-12: 1230.4344722781561
2020-08-13: 1397.429576058438
2020-08-14: 1552.722791996496
2020-08-15: 1629.4327831942521
2020-08-16: 1591.0786788886433
2020-08-17: 1520.5231850329626
2020-08-18: 1539.1059363336367
2020-08-19: 1645.3615220397983
2020-08-20: 1752.3261886360074
2020-08-21: 1805.6297105200122
2020-08-22: 1809.2473074449788
2020-08-23: 1803.37239270852
2020-08-24: 1842.9104233490052
2020-08-25: 1924.329046042198
2020-08-26: 2004.6430903734965
2020-08-27: 2043.5835319302803
2020-08-28: 2065.077222608609
2020-08-29: 2088.216652326979
2020-08-30: 2137.4845702800712
2020-08-31: 2207.1313886903954

Predicting for Namibia__nan
2020-08-01: 194.70845753857353
2020-08-02: 201.63586752494848
2020-08-03: 208.5725968114806
2020-08-04: 198.22650940411162
2020-08-05: 164.08589628211303
2020-08-06: 175.7086249790388
2020-08-07: 249.16130732879043
2020-08-08: 276.4614044568699
2020-08-09: 286.89315691194895
2020-08-10: ...

2020-08-28: 806.8930238890825
2020-08-29: 832.7943458350567
2020-08-30: 862.2146878814381
2020-08-31: 897.8737677855568

Predicting for Pakistan__nan
2020-08-01: 2858.456851701124
2020-08-02: 3308.3175788617536
2020-08-03: 3538.229331346667
2020-08-04: 3203.6967882541894
2020-08-05: 2538.5056335470226
2020-08-06: 2027.427004172426
2020-08-07: 2972.694896065651
2020-08-08: 3440.607715134289
2020-08-09: 3584.6286117826576
2020-08-10: 3349.2843105292013
2020-08-11: 2974.104070064681
2020-08-12: 2935.8161605928362
2020-08-13: 3367.7455487662437
2020-08-14: 3688.7053743697697
2020-08-15: 3780.9073583466834
2020-08-16: 3650.6127434549817
2020-08-17: 3500.6560434163994
2020-08-18: 3565.2671968719897
2020-08-19: 3819.2247215918223
2020-08-20: 4032.320159129932
2020-08-21: 4104.917717698727
2020-08-22: 4071.9690998372707
2020-08-23: 4049.5538804855933
2020-08-24: 4143.409096887775
2020-08-25: 4322.267841413459
2020-08-26: 4477.0908719781655
2020-08-27: 4553.565320000916
2020-08-28: ...

2020-08-24: 1299.6313739946388
2020-08-25: 1354.6846878784752
2020-08-26: 1403.9057648338357
2020-08-27: 1439.0032409667274
2020-08-28: 1465.833780934412
2020-08-29: 1495.9420159730269
2020-08-30: 1538.597208096489
2020-08-31: 1588.8333854075302

Predicting for Palestine__nan
2020-08-01: 1588.5202269303368
2020-08-02: 1664.8921595263175
2020-08-03: 1884.3816635964533
2020-08-04: 1730.5564038901964
2020-08-05: 1302.7656353633106
2020-08-06: 1098.4753782431628
2020-08-07: 1607.302706518412
2020-08-08: 1828.8065681034225
2020-08-09: 1934.7343213502888
2020-08-10: 1811.0152975612775
2020-08-11: 1604.4775362280654
2020-08-12: 1606.5225290439298
2020-08-13: 1839.8990849652323
2020-08-14: 2010.7430348430914
2020-08-15: 2070.841795864637
2020-08-16: 2005.1854786407014
2020-08-17: 1930.716598794995
2020-08-18: 1976.1145214635037
2020-08-19: 2117.0533799739787
2020-08-20: 2236.071685098984
2020-08-21: 2282.208869455735
2020-08-22: 2275.571460434594
2020-08-23: 2273.8796417525627
2020-08-24: ...

2020-08-24: 450.1209045140656
2020-08-25: 472.8353630901429
2020-08-26: 493.32964331016126
2020-08-27: 497.0520785477481
2020-08-28: 505.96716030957407
2020-08-29: 519.3969173557884
2020-08-30: 538.3722514359673
2020-08-31: 558.9709482110019

Predicting for Singapore__nan
2020-08-01: 59.07788463078229
2020-08-02: 71.72004440058473
2020-08-03: 78.62519055844177
2020-08-04: 84.3438755246444
2020-08-05: 86.7407868510347
2020-08-06: 99.7873299575921
2020-08-07: 128.96688833039246
2020-08-08: 145.703786275123
2020-08-09: 156.8762226148213
2020-08-10: 165.2709807595287
2020-08-11: 174.7987571914743
2020-08-12: 191.72578477035668
2020-08-13: 213.22264281590816
2020-08-14: 230.53385612854146
2020-08-15: 244.21781954830726
2020-08-16: 256.34744893429314
2020-08-17: 270.3184075622671
2020-08-18: 288.3224872165624
2020-08-19: 308.0145632986728
2020-08-20: 325.9567829791325
2020-08-21: 341.9028165499832
2020-08-22: 357.3115653547381
2020-08-23: 374.22198329498474
2020-08-24: 393.32799753682235
...

2020-08-17: 281.28485421064585
2020-08-18: 298.158449733861
2020-08-19: 317.5215407288304
2020-08-20: 336.09091928156084
2020-08-21: 353.1386285340378
2020-08-22: 379.29437432667135
2020-08-23: 401.6609515226835
2020-08-24: 422.3264206625469
2020-08-25: 443.352735170287
2020-08-26: 464.15160495712564
2020-08-27: 486.4798620370075
2020-08-28: 511.45405007711025
2020-08-29: 535.5804233724314
2020-08-30: 559.0672743809625
2020-08-31: 582.5135573656228

Predicting for Seychelles__nan
2020-08-01: 52.73026053870302
2020-08-02: 62.894680023774825
2020-08-03: 69.57756146326574
2020-08-04: 73.1141532142525
2020-08-05: 77.86259854019443
2020-08-06: 94.03169177749264
2020-08-07: 121.37336293927953
2020-08-08: 136.64749959348393
2020-08-09: 146.80440928876862
2020-08-10: 154.78315504455094
2020-08-11: 165.644135263762
2020-08-12: 183.50968766556736
2020-08-13: 204.27193363773836
2020-08-14: 220.6071685587409
2020-08-15: 233.5650158688198
2020-08-16: 245.65276704472805
2020-08-17: 260.2330585726245
...

2020-08-28: 487.1585109894444
2020-08-29: 509.97295909313965
2020-08-30: 533.1026954218192
2020-08-31: 555.6113698882339

Predicting for Tunisia__nan
2020-08-01: 1531.0865830698683
2020-08-02: 1214.358375605594
2020-08-03: 1026.68494768263
2020-08-04: 873.2739398852773
2020-08-05: 888.4671339014095
2020-08-06: 865.3379880085189
2020-08-07: 1307.7762546171696
2020-08-08: 1278.8535838567523
2020-08-09: 1184.995271506924
2020-08-10: 1109.453855991461
2020-08-11: 1105.1205422209453
2020-08-12: 1203.6702326368707
2020-08-13: 1378.4529073953947
2020-08-14: 1400.4216310776615
2020-08-15: 1364.5848090626323
2020-08-16: 1335.061412437946
2020-08-17: 1358.5258571306365
2020-08-18: 1445.9310022435225
2020-08-19: 1540.6380991925305
2020-08-20: 1572.6945799289979
2020-08-21: 1570.2375920452298
2020-08-22: 1563.3376374377663
2020-08-23: 1595.8266941178083
2020-08-24: 1663.9199687801581
2020-08-25: 1729.2761227955823
2020-08-26: 1764.7021980807351
2020-08-27: 1768.531940009716
2020-08-28: ...

2020-08-09: 879.1866774517179
2020-08-10: 814.1807105017263
2020-08-11: 756.0766533257226
2020-08-12: 791.7990191098686
2020-08-13: 919.3378175832788
2020-08-14: 989.838603015422
2020-08-15: 995.5485707537528
2020-08-16: 967.0144057843235
2020-08-17: 955.4564868512664
2020-08-18: 1000.5175811462568
2020-08-19: 1078.1022977299542
2020-08-20: 1130.0760025688915
2020-08-21: 1145.4269803780805
2020-08-22: 1143.8344953580147
2020-08-23: 1157.2009507587613
2020-08-24: 1211.179455237872
2020-08-25: 1273.7388517540946
2020-08-26: 1318.9553934076564
2020-08-27: 1337.8463940560805
2020-08-28: 1354.092260707772
2020-08-29: 1383.503812835918
2020-08-30: 1432.3325019435986
2020-08-31: 1485.012004446094

Predicting for United States__Alabama
2020-08-01: 3199.079281564478
2020-08-02: 3039.9161201195075
2020-08-03: 2879.9127310488548
2020-08-04: 3178.427116756412
2020-08-05: 3037.5501445618966
2020-08-06: 2248.0360718704596
2020-08-07: 3081.0035384461326
2020-08-08: 3226.9539489801973
2020-08-09: 3313

2020-08-13: 9067.852795920991
2020-08-14: 9733.969437699123
2020-08-15: 9825.739308254564
2020-08-16: 9471.825536793925
2020-08-17: 9140.758402497895
2020-08-18: 9386.047382724142
2020-08-19: 10042.665027504574
2020-08-20: 10500.991123598307
2020-08-21: 10606.897829454554
2020-08-22: 10487.929878798024
2020-08-23: 10450.044373774035
2020-08-24: 10718.939650983248
2020-08-25: 11161.225388856043
2020-08-26: 11500.397591698027
2020-08-27: 11641.123564380341
2020-08-28: 11674.895559101058
2020-08-29: 11781.446607940345
2020-08-30: 12050.991056714263
2020-08-31: 12398.023438305789

Predicting for United States__Georgia
2020-08-01: 4879.769194471736
2020-08-02: 4255.990492988523
2020-08-03: 4099.795905581723
2020-08-04: 5005.060294452811
2020-08-05: 4214.08116063636
2020-08-06: 3249.4743413631686
2020-08-07: 4483.002429691956
2020-08-08: 4608.27170105547
2020-08-09: 4825.973689586734
2020-08-10: 4985.260128883183
2020-08-11: 4588.386781711135
2020-08-12: 4457.549330037372
2020-08-13: 4883.55

2020-08-17: 2357.500585785314
2020-08-18: 2376.377882742323
2020-08-19: 2521.0165252560346
2020-08-20: 2667.244399653988
2020-08-21: 2739.2906358328055
2020-08-22: 2740.7733801293125
2020-08-23: 2734.465265236512
2020-08-24: 2786.5086939076855
2020-08-25: 2896.9045656727913
2020-08-26: 3005.9238370730563
2020-08-27: 3067.534953907102
2020-08-28: 3097.4551096330097
2020-08-29: 3129.8473737755176
2020-08-30: 3196.6106831392826
2020-08-31: 3291.6451532368073

Predicting for United States__Massachusetts
2020-08-01: 4704.592847196339
2020-08-02: 4958.446704243892
2020-08-03: 4495.97132877967
2020-08-04: 3786.017246195084
2020-08-05: 3324.957593054808
2020-08-06: 2905.1687454539497
2020-08-07: 4449.269977442885
2020-08-08: 4846.3198613676
2020-08-09: 4666.1307583902435
2020-08-10: 4258.627201284833
2020-08-11: 3942.0670722093546
2020-08-12: 4092.675587193622
2020-08-13: 4761.187367894008
2020-08-14: 5049.040202881177
2020-08-15: 4988.7875262706275
2020-08-16: 4777.0488265557715
2020-08-17: 4

2020-08-17: 5606.921604611416
2020-08-18: 5748.82776768662
2020-08-19: 6185.955767693951
2020-08-20: 6514.538356780702
2020-08-21: 6588.970176643548
2020-08-22: 6498.644026448689
2020-08-23: 6458.717684429095
2020-08-24: 6626.157683102108
2020-08-25: 6922.293368973347
2020-08-26: 7156.52361913118
2020-08-27: 7244.784075803944
2020-08-28: 7258.189400670986
2020-08-29: 7317.841495654666
2020-08-30: 7489.919096550346
2020-08-31: 7720.33150499256

Predicting for United States__North Dakota
2020-08-01: 609.2182716010454
2020-08-02: 576.0356020633069
2020-08-03: 527.7912622143177
2020-08-04: 385.03085177297703
2020-08-05: 243.1002255008792
2020-08-06: 346.2553917994527
2020-08-07: 575.4506837288573
2020-08-08: 619.0937741457114
2020-08-09: 583.718752179627
2020-08-10: 496.8701172931677
2020-08-11: 457.62545238233895
2020-08-12: 539.0758486030551
2020-08-13: 652.7644466568726
2020-08-14: 693.9823721790306
2020-08-15: 676.8671734893378
2020-08-16: 639.071477477295
2020-08-17: 640.6889864874565

2020-08-12: 2960.019868545841
2020-08-13: 3504.868968037762
2020-08-14: 3697.881262406334
2020-08-15: 3599.6016033715223
2020-08-16: 3404.381537664827
2020-08-17: 3344.3616074190286
2020-08-18: 3544.243702226334
2020-08-19: 3840.8221350268177
2020-08-20: 3976.979115110473
2020-08-21: 3949.8362190409925
2020-08-22: 3879.7738142783546
2020-08-23: 3906.39825881648
2020-08-24: 4067.500642924899
2020-08-25: 4259.049423532047
2020-08-26: 4366.620571756543
2020-08-27: 4381.726584189202
2020-08-28: 4387.54288115916
2020-08-29: 4454.739221723245
2020-08-30: 4590.601653255048
2020-08-31: 4735.014246844425

Predicting for United States__Oregon
2020-08-01: 1641.7834727995148
2020-08-02: 1620.9711078092898
2020-08-03: 1512.2777140056376
2020-08-04: 1442.1403981885044
2020-08-05: 1173.3392691538543
2020-08-06: 1046.9094575913414
2020-08-07: 1564.7415720598208
2020-08-08: 1680.6416434465616
2020-08-09: 1668.5729704233663
2020-08-10: 1577.5627224830469
2020-08-11: 1451.2525671417648
2020-08-12: 1502.5

2020-08-17: 4036.116680616441
2020-08-18: 4044.668893970621
2020-08-19: 4298.1213588549
2020-08-20: 4557.78877168436
2020-08-21: 4684.04337075734
2020-08-22: 4658.618128764217
2020-08-23: 4608.227916849475
2020-08-24: 4677.341404807707
2020-08-25: 4862.034713205116
2020-08-26: 5048.023313638382
2020-08-27: 5148.169063370713
2020-08-28: 5175.613811511104
2020-08-29: 5203.619741740178
2020-08-30: 5299.504496425561
2020-08-31: 5452.4539357661

Predicting for United States__Virgin Islands
2020-08-01: 70.17926252413955
2020-08-02: 76.8831424507554
2020-08-03: 80.96490608574086
2020-08-04: 91.57977973923666
2020-08-05: 96.70305915543213
2020-08-06: 106.59641019493569
2020-08-07: 137.22713342804715
2020-08-08: 151.76049564013067
2020-08-09: 162.66533154420915
2020-08-10: 173.1630251481725
2020-08-11: 183.5488189678993
2020-08-12: 199.9778221587879
2020-08-13: 221.32241511236992
2020-08-14: 237.8436655447128
2020-08-15: 251.75913010165164
2020-08-16: 264.86542425359426
2020-08-17: 279.3150951938564
2020-08-18: 297.26287709076144
2020-08-19: 316.73187443883694
2020-08-20: 334.44414431298406
2020-08-21: 350.66900409388313
2020-08-22: 356.474393965176
2020-08-23: 368.0071014847192
2020-08-24: 385.1174954533407
2020-08-25: 403.77703685940395
2020-08-26: 421.73775604574007
2020-08-27: 425.4277029720101
2020-08-28: 435.31783678492354
2020-08-29: 448.5096387265121
2020-08-30: 465.3511303458928
2020-08-31: 483.16089891698994

Predicting for Vietnam__nan
2020-08-01: 54.5219015316691
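
Each `Predicting for …` header above names a geography as the country and region joined by `__`. A country without a region shows `nan` in the region slot, for example `Tunisia__nan` next to `United States__Alabama`. The minimal pandas sketch below shows how such a key could be built from OxCGRT-style rows. The column name `GeoID` and the toy rows are assumptions for illustration, inferred from the log headers rather than taken from the notebook's code.

```python
import numpy as np
import pandas as pd

# Toy rows: one country without a region, one US state.
df = pd.DataFrame({
    "CountryName": ["Tunisia", "United States"],
    "RegionName": [np.nan, "Alabama"],
})

# Hypothetical key: render a missing region as the literal string "nan",
# matching the headers printed in the log above.
region = df["RegionName"].where(df["RegionName"].notna(), "nan")
df["GeoID"] = df["CountryName"] + "__" + region
print(df["GeoID"].tolist())  # → ['Tunisia__nan', 'United States__Alabama']
```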

In [22]:
# Check the predictions
preds_df.head()

,CountryName,RegionName,Date,PredictedDailyNewCases
213,Aruba,,2020-08-01,66.776012
214,Aruba,,2020-08-02,74.95696
215,Aruba,,2020-08-03,78.358586
216,Aruba,,2020-08-04,81.847136
217,Aruba,,2020-08-05,88.569769


# Validation
This is how the predictor will be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [23]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../../validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
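
The command passes four flags: `-s` (start date), `-e` (end date), `-ip` (intervention plan CSV) and `-o` (output CSV). The output above covers 2020-08-04, which suggests the end date is inclusive. The minimal `argparse` sketch below accepts the same flags. The `dest` names and help strings are assumptions and do not come from the actual `predict.py`.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags in the command above; the real
    # predict.py may name its destinations or describe its options differently.
    parser = argparse.ArgumentParser(description="Generate daily new case predictions.")
    parser.add_argument("-s", dest="start_date", required=True,
                        help="First date to predict, YYYY-MM-DD")
    parser.add_argument("-e", dest="end_date", required=True,
                        help="Last date to predict (inclusive), YYYY-MM-DD")
    parser.add_argument("-ip", dest="ip_file", required=True,
                        help="CSV file with the intervention plans")
    parser.add_argument("-o", dest="output_file", required=True,
                        help="CSV file the predictions are written to")
    return parser


# Parse the same arguments as the command above instead of reading sys.argv.
args = build_parser().parse_args([
    "-s", "2020-08-01", "-e", "2020-08-04",
    "-ip", "../../../validation/data/2020-09-30_historical_ip.csv",
    "-o", "predictions/2020-08-01_2020-08-04.csv",
])
print(args.start_date, args.end_date)  # → 2020-08-01 2020-08-04
```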


In [24]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,66.77601228408992
Aruba,,2020-08-02,74.9569598505673
Aruba,,2020-08-03,78.35858631827678
Aruba,,2020-08-04,81.84713614514999
Afghanistan,,2020-08-01,157.7121761352117
Afghanistan,,2020-08-02,255.39593436154007
Afghanistan,,2020-08-03,286.8870248655943
Afghanistan,,2020-08-04,281.5833176459894
Angola,,2020-08-01,128.93381519331052


# Test cases
Now that we can generate a prediction file, let's validate it on a few cases.

In [25]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

ModuleNotFoundError: No module named 'covid_xprize'
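
Because `covid_xprize` cannot be imported in this run (see the error above), `validate_submission` cannot run here. The hypothetical `check_predictions` below is a rough stand-in, not part of `covid_xprize` and not necessarily matching its rules. It checks a predictions DataFrame for the expected columns and for NaN values. It also confirms that every country/region in the intervention plan has exactly one row per date in the requested range.

```python
import numpy as np
import pandas as pd

PRED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def _geo_keys(df):
    # Treat a missing RegionName as "" so country-level rows from both files match.
    return df["CountryName"] + "__" + df["RegionName"].fillna("")


def check_predictions(preds_df, ip_df, start_date, end_date):
    """Hypothetical stand-in for validate_submission; returns a list of error strings."""
    missing = [c for c in PRED_COLUMNS if c not in preds_df.columns]
    if missing:
        return [f"Missing columns: {missing}"]

    errors = []
    if preds_df["PredictedDailyNewCases"].isna().any():
        errors.append("Found NaN predictions")

    preds = preds_df.assign(GeoID=_geo_keys(preds_df),
                            Date=pd.to_datetime(preds_df["Date"]))
    expected_dates = set(pd.date_range(start_date, end_date))

    # Every geography in the intervention plan needs one prediction per day.
    for geo in sorted(set(_geo_keys(ip_df))):
        dates = preds.loc[preds["GeoID"] == geo, "Date"]
        if dates.duplicated().any():
            errors.append(f"{geo}: duplicate dates")
        if set(dates) != expected_dates:
            errors.append(f"{geo}: dates do not match {start_date} to {end_date}")
    return errors


# Usage on the first four rows of preds_df shown earlier (Aruba, no region).
sample_preds = pd.DataFrame({
    "CountryName": ["Aruba"] * 4,
    "RegionName": [np.nan] * 4,
    "Date": ["2020-08-01", "2020-08-02", "2020-08-03", "2020-08-04"],
    "PredictedDailyNewCases": [66.776012, 74.95696, 78.358586, 81.847136],
})
sample_ip = pd.DataFrame({"CountryName": ["Aruba"], "RegionName": [np.nan]})
print(check_predictions(sample_preds, sample_ip, "2020-08-01", "2020-08-04"))  # → []
```

To check files, load them first, for example `check_predictions(pd.read_csv(output_file), pd.read_csv(ip_file), start_date, end_date)`.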

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [None]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../../validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

## 1 month in the future
- 2 countries only
- There's a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?
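
Concretely, the gap consists of the days after the last date with a known case count and before `start_date`. The small sketch below computes that gap. `last_known_date` is a hypothetical input, since the notebook does not state which date the case data ends on.

```python
import pandas as pd


def gap_dates(last_known_date, start_date):
    """Days strictly after the last known case count and strictly before start_date."""
    first_missing = pd.Timestamp(last_known_date) + pd.Timedelta(days=1)
    last_missing = pd.Timestamp(start_date) - pd.Timedelta(days=1)
    # date_range returns an empty index when last_missing < first_missing (no gap).
    return pd.date_range(first_missing, last_missing)


# Hypothetical: cases known up to 2020-12-28, predictions requested from 2021-01-01.
gap = gap_dates("2020-12-28", "2021-01-01")
print(len(gap))  # → 3 (2020-12-29, 2020-12-30, 2020-12-31)
```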

In [None]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../../validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 180 days (about 6 months) after the start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze the last known intervention plans for each country and region.  

Because the number of cases between today and the start date is not known yet, and the model relies on these cases as inputs, the model first has to predict them and then use its own predictions.  
This is the most demanding test. Generating the prediction file should take less than 1 hour.
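
One way to handle those unknown cases is an autoregressive rollout. The model predicts one day, appends the prediction to the case history, and uses it as input for the next day. The sketch below is a toy version under stated assumptions. It assumes a linear model over the last `len(coef)` days of cases only, whereas the model in this notebook also uses NPI data. `roll_out`, `coef` and `intercept` are hypothetical names.

```python
import numpy as np


def roll_out(history, coef, intercept, n_days):
    """Toy autoregressive rollout: each prediction is fed back as an input.

    history: known daily cases, at least len(coef) values, oldest first.
    coef: weights for the last len(coef) days, oldest first (hypothetical model).
    """
    window = list(history[-len(coef):])
    preds = []
    for _ in range(n_days):
        y = float(intercept + np.dot(coef, window))
        y = max(y, 0.0)  # this sketch clips at zero since case counts cannot be negative
        preds.append(y)
        # Slide the window: drop the oldest day, append the new prediction.
        window = window[1:] + [y]
    return np.array(preds)


# "Tomorrow = today + 5" toy model, rolled out for 3 days.
print(roll_out([10, 20, 30], coef=[0.0, 0.0, 1.0], intercept=5.0, n_days=3))  # → [35. 40. 45.]
```

To cover a gap, one would roll out over the gap days plus the requested range and keep only the dates from `start_date` onward. Errors compound along the rollout, which is part of why a longer gap makes this test harder.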

### Generate the scenario

In [None]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

In [None]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")
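
The `"Freeze"` scenario used above keeps each country's and region's last known intervention plan unchanged through the prediction window. The minimal pandas sketch below illustrates that idea. `freeze_scenario` and the toy data are hypothetical, and the sketch does not describe how `generate_scenario` is actually implemented.

```python
import numpy as np
import pandas as pd

GEO_COLS = ["CountryName", "RegionName"]


def freeze_scenario(npi_df, npi_columns, start_date, end_date):
    """Hypothetical 'Freeze' scenario: repeat each geo's last known NPI values daily."""
    # Last row per country/region; drop_duplicates treats NaN regions as equal.
    latest = (npi_df.sort_values("Date")
                    .drop_duplicates(subset=GEO_COLS, keep="last")[GEO_COLS + npi_columns])
    dates = pd.DataFrame({"Date": pd.date_range(start_date, end_date)})
    # Cross join: every geography gets every date with its frozen NPI values.
    frozen = latest.merge(dates, how="cross")
    return frozen[GEO_COLS + ["Date"] + npi_columns]


# Toy NPI history: Aruba's last known school-closing level is 3, Angola's is 2.
npi = pd.DataFrame({
    "CountryName": ["Aruba", "Aruba", "Angola"],
    "RegionName": [np.nan, np.nan, np.nan],
    "Date": pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-02"]),
    "C1_School closing": [1.0, 3.0, 2.0],
})
frozen = freeze_scenario(npi, ["C1_School closing"], "2021-01-01", "2021-01-03")
print(len(frozen))  # → 6 (2 geographies x 3 days)
```

This sketch takes the last row per geography as-is, so NaN NPI values in the most recent rows would be carried forward too. A real generator would need to handle those.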

### Check it

In [None]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")