# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint to enforce the assumption that stricter NPIs are negatively correlated with future cases. The NPI inputs are negated before training, so a positive weight on a negated NPI corresponds to a negative effect on cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.

## Training

In [2]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [3]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [4]:
import os
import urllib.request
if not os.path.exists('data'):
    os.mkdir('data')
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7ff7d0cc20a0>)
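
The cell above downloads the file again on every run. As a minimal sketch of an alternative, assuming a cached copy is acceptable, the helper below only downloads when the local file is missing. The name `fetch_if_missing` and its `retrieve` parameter are hypothetical and not part of the original notebook.

```python
import os
import urllib.request


def fetch_if_missing(url, path, retrieve=urllib.request.urlretrieve):
    """Download url to path unless path already exists; return path.

    `retrieve` is injectable so the helper can be exercised without network access.
    """
    directory = os.path.dirname(path)
    if directory:
        # exist_ok=True avoids a separate os.path.exists check on the directory
        os.makedirs(directory, exist_ok=True)
    if not os.path.exists(path):
        retrieve(url, path)
    return path
```

It could be called as `fetch_if_missing(DATA_URL, DATA_FILE)`. The trade-off is that a cached file is never refreshed, so this only suits runs where the latest data is not required.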

In [5]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE, 
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 on_bad_lines='skip')  # replaces error_bad_lines=False, which was removed in pandas 2.0

In [6]:
df

Unnamed: 0,CountryName,CountryCode,RegionName,RegionCode,Jurisdiction,Date,C1_School closing,C1_Flag,C2_Workplace closing,C2_Flag,...,StringencyIndex,StringencyIndexForDisplay,StringencyLegacyIndex,StringencyLegacyIndexForDisplay,GovernmentResponseIndex,GovernmentResponseIndexForDisplay,ContainmentHealthIndex,ContainmentHealthIndexForDisplay,EconomicSupportIndex,EconomicSupportIndexForDisplay
0,Aruba,ABW,,,NAT_TOTAL,2020-01-01,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
1,Aruba,ABW,,,NAT_TOTAL,2020-01-02,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
2,Aruba,ABW,,,NAT_TOTAL,2020-01-03,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
3,Aruba,ABW,,,NAT_TOTAL,2020-01-04,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
4,Aruba,ABW,,,NAT_TOTAL,2020-01-05,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88240,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-24,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0
88241,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-25,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0
88242,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-26,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0
88243,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-27,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0


In [7]:
# For testing, restrict training data to that up to a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
# Take an explicit copy so that the column assignments below do not raise SettingWithCopyWarning
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE].copy()

In [8]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [9]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
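
To make the `NewCases` computation concrete, here is a small sketch on made-up data (the two GeoIDs and their counts are invented for illustration). It applies the same `groupby(...).diff().fillna(0)` pattern to a cumulative `ConfirmedCases` column.

```python
import pandas as pd

# Toy cumulative counts for two hypothetical regions
toy = pd.DataFrame({
    'GeoID': ['A__nan'] * 3 + ['B__nan'] * 3,
    'ConfirmedCases': [0.0, 2.0, 5.0, 10.0, 10.0, 14.0],
})

# Difference within each region; the first row of each region has no
# predecessor, so diff() yields NaN there and fillna(0) turns it into 0
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
print(toy)
```

Note that the first day of each region is set to 0 even when its cumulative count is already positive (region `B__nan` here starts at 10), so any cases recorded before the first row are not counted as new cases.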


In [10]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [11]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88120,Zimbabwe,,Zimbabwe__nan,2020-07-27,78.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
88121,Zimbabwe,,Zimbabwe__nan,2020-07-28,192.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
88122,Zimbabwe,,Zimbabwe__nan,2020-07-29,113.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0
88123,Zimbabwe,,Zimbabwe__nan,2020-07-30,62.0,3.0,1.0,2.0,3.0,1.0,2.0,2.0,4.0,2.0,1.0,1.0,4.0


In [12]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [13]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
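
The two filling strategies above behave differently at the edges of each region's series. The sketch below shows this on invented data. It uses `transform` rather than `apply` plus `df.update`, which is a substitution made here for brevity and not what the notebook does.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; values are invented for illustration
toy = pd.DataFrame({
    'GeoID': ['A'] * 4 + ['B'] * 3,
    'NewCases': [1.0, np.nan, 3.0, np.nan, np.nan, 4.0, 6.0],
    'C1_School closing': [np.nan, 2.0, np.nan, 3.0, 1.0, np.nan, np.nan],
})

# Cases: linear interpolation within each region, remaining NaNs set to 0
toy['NewCases'] = (toy.groupby('GeoID').NewCases
                   .transform(lambda s: s.interpolate())
                   .fillna(0))

# NPIs: carry the previous day's value forward, remaining NaNs set to 0
toy['C1_School closing'] = (toy.groupby('GeoID')['C1_School closing']
                            .ffill()
                            .fillna(0))
print(toy)
```

With pandas' default forward-only interpolation, a trailing gap repeats the last known value, while a leading gap stays `NaN` and therefore becomes 0.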

In [15]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols                     # input columns: "NewCases" plus all NPIs
y_col = cases_col                                 # the output is only the number of new cases
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()

for g in geo_ids:
    
    gdf = df[df.GeoID == g]                   # DataFrame for this GeoID
    all_case_data = np.array(gdf[cases_col])  # new cases in chronological order, shape = (len(gdf), 1)
    all_npi_data = np.array(gdf[npi_cols])    # NPIs in chronological order, shape = (len(gdf), 12)

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)  # total number of days available for this GeoID
    
    for d in range(nb_lookback_days, nb_total_days - 1): # d runs from nb_lookback_days to nb_total_days - 2
        X_cases = all_case_data[d - nb_lookback_days: d] # windows over rows [0, 30), [1, 31), [2, 32), ...
        # X_cases has shape (30, 1)
        
        
        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d] # higher NPI means LOWER NewCases; same windows as above
        # X_npis has shape (30, 12)

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(), X_npis.flatten()]) # one sample has shape (30 + 30 * 12,) = (390,)
        # X_sample is a single vector holding all new-case and NPI values in a row,
        # as if the input space had 390 variables
        
        y_sample = all_case_data[d + 1] # the target is the new-case count at row d + 1 (the input window ends at row d - 1)
        
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
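
The shape of a single training sample is easier to see at a smaller scale. This sketch repeats the slicing from the loop above with a lookback of 3 days and 2 NPIs; the toy arrays are invented for illustration.

```python
import numpy as np

nb_lookback_days = 3   # the notebook uses 30
nb_npis = 2            # the notebook uses 12
nb_days = 6

# Toy data: cases are 0, 1, 2, ... and every NPI is set to 1
toy_cases = np.arange(nb_days, dtype=float).reshape(-1, 1)   # shape (6, 1)
toy_npis = np.ones((nb_days, nb_npis))                       # shape (6, 2)

d = nb_lookback_days                                         # first usable day
X_cases = toy_cases[d - nb_lookback_days:d]                  # rows 0..2
X_npis = -toy_npis[d - nb_lookback_days:d]                   # negated NPIs, rows 0..2
X_sample = np.concatenate([X_cases.flatten(), X_npis.flatten()])
y_sample = toy_cases[d + 1]                                  # row 4, as in the loop above

print(X_sample.shape)   # (nb_lookback_days * (1 + nb_npis),)
print(X_sample, y_sample)
```

The flattened vector lists the case values first, then the negated NPIs day by day. With the notebook's settings the same formula gives 30 * (1 + 12) = 390 features.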

In [19]:
# Helpful function to compute mae
def mae(pred, true):
    return np.mean(np.abs(pred - true))

In [23]:
# Split data into train and test sets
# The train and test sets are chosen at random.
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=42)

In [24]:
# Create and train Lasso model.
# Set positive=True to enforce assumption that cases are positively correlated
# with future cases and npis are negatively correlated.
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')
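
The effect of combining `positive=True` with negated NPI inputs can be illustrated on synthetic data. This is only a sketch: the data-generating coefficients (0.8 and -5.0) and all variable names below are invented, and the real training data is far noisier.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500
past_cases = rng.uniform(0, 100, size=n)
npi_level = rng.integers(0, 4, size=n).astype(float)

# Synthetic ground truth: more past cases -> more cases, stricter NPI -> fewer cases
y = 0.8 * past_cases - 5.0 * npi_level + rng.normal(0, 1, size=n)

# Negated NPI input: a non-negative weight can now express a negative effect
X_negated = np.column_stack([past_cases, -npi_level])
toy_model = Lasso(alpha=0.1, positive=True, max_iter=10000).fit(X_negated, y)
w_cases, w_neg_npi = toy_model.coef_
effective_npi_effect = -w_neg_npi   # effect of the NPI on cases in original units

# Raw NPI input: the constraint leaves no way to express the negative effect
X_raw = np.column_stack([past_cases, npi_level])
raw_model = Lasso(alpha=0.1, positive=True, max_iter=10000).fit(X_raw, y)

print('negated input:', w_cases, effective_npi_effect)
print('raw input:', raw_model.coef_)
```

With the raw input the NPI weight is pinned at zero, whereas with the negated input the model recovers a negative effect. The same sign convention applies when reading the learned coefficients of the notebook's model.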

In [25]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 259.84015150381117
Test MAE: 254.17028978654804


In [26]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# This falls under model explainability

# Give names to the features
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):  # NPI features cover the same days as the case features
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -6 NewCases 0.5247728165951634
Day -5 NewCases 0.1210331714268772
Day -3 NewCases 0.09711381862030334
Day -2 NewCases 0.07841775054352185
Day -1 NewCases 0.21972407921421103
Day -26 C4_Restrictions on gatherings 0.15560632624736404
Day -23 C6_Stay at home requirements 20.411046334500515
Day -22 C7_Restrictions on internal movement 2.4120275597048355
Day -17 C6_Stay at home requirements 5.475021870227211
Day -14 C6_Stay at home requirements 9.073259701124135
Day -10 C6_Stay at home requirements 1.2387309022515984
Intercept 44.05469448823396


In [27]:
# Save model to file
if not os.path.exists('models'):
    os.mkdir('models')
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
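
Reading the model back is the mirror image of the save step. The sketch below round-trips a stand-in object through a temporary file; the dictionary is a hypothetical placeholder for the fitted Lasso model, used so the example runs on its own.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted model
stand_in_model = {'coef': [0.5, 0.1], 'intercept': 44.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model.pkl')
    # Save, as in the cell above
    with open(model_path, 'wb') as model_file:
        pickle.dump(stand_in_model, model_file)
    # Load it back; note the 'rb' mode
    with open(model_path, 'rb') as model_file:
        restored = pickle.load(model_file)

print(restored)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn estimator is generally expected to be loaded with a compatible scikit-learn version.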

## Evaluation

Now that the predictor has been trained and saved, this section contains the functionality for evaluating it on sample evaluation data.
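
The `predict` module used below is not shown in this notebook. As a rough sketch of what a day-by-day rollout might look like for a linear model with this feature layout, the hypothetical function below feeds each prediction back in as the newest case value. The function name, its arguments, and the handling of NPIs are assumptions for illustration, not the contents of `predict.py`.

```python
import numpy as np


def rollout(coef, intercept, past_cases, past_npis, future_npis, nb_lookback_days=30):
    """Predict one day at a time, appending each prediction to the case history."""
    cases = np.asarray(past_cases, dtype=float)[-nb_lookback_days:]
    npis = np.asarray(past_npis, dtype=float)[-nb_lookback_days:]
    preds = []
    for day_npis in np.asarray(future_npis, dtype=float):
        # Same layout as in training: cases first, then negated NPIs day by day
        x = np.concatenate([cases, -npis.flatten()])
        pred = max(float(np.dot(x, coef)) + intercept, 0.0)  # don't predict negative cases
        preds.append(pred)
        # Slide both windows forward by one day
        cases = np.append(cases, pred)[-nb_lookback_days:]
        npis = np.vstack([npis, day_npis])[-nb_lookback_days:]
    return preds


# Toy usage: 2-day lookback, one NPI, hand-picked weights
toy_coef = np.array([0.0, 1.0, 0.0, 2.0])
toy_preds = rollout(toy_coef, 1.0, [5.0, 7.0], [[0.0], [0.0]],
                    [[1.0], [1.0], [1.0]], nb_lookback_days=2)
print(toy_preds)
```

Since every prediction becomes an input for the following days, errors can compound over the prediction horizon, which is worth keeping in mind when reading the verbose output below.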

In [16]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [17]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31", path_to_ips_file="../../covid_xprize/validation/data/2020-09-30_historical_ip.csv", verbose=True)


Predicting for Aruba__nan
2020-08-01: 133.51810815377561
2020-08-02: 158.51751227595548
2020-08-03: 179.30764030240323
2020-08-04: 202.2792297129373
2020-08-05: 204.52205696851195
2020-08-06: 225.03689663720843
2020-08-07: 296.1225300942008
2020-08-08: 329.2035953935809
2020-08-09: 357.72937025706744
2020-08-10: 385.82103208556987
2020-08-11: 399.86422226719947
2020-08-12: 427.29233100993315
2020-08-13: 478.4559852201127
2020-08-14: 514.0251192563574
2020-08-15: 537.8126159725089
2020-08-16: 567.2386834453145
2020-08-17: 589.7131108934607
2020-08-18: 614.3799077698756
2020-08-19: 655.5742088497011
2020-08-20: 690.2872717382089
2020-08-21: 719.5849807021218
2020-08-22: 750.9072005558282
2020-08-23: 775.8254666867851
2020-08-24: 784.1213352284149
2020-08-25: 816.7590735567652
2020-08-26: 848.7632883019012
2020-08-27: 877.859256292197
2020-08-28: 909.3846013070637
2020-08-29: 935.7816745594394
2020-08-30: 955.8056349566982
2020-08-31: 986.3378964398028

Predicting for Afghanistan__nan
...

2020-08-24: 5247.666874435325
2020-08-25: 5297.246959053889
2020-08-26: 5357.229301752524
2020-08-27: 5455.269162440245
2020-08-28: 5546.096433738701
2020-08-29: 5643.884083384859
2020-08-30: 5742.075524410449
2020-08-31: 5813.4175528621

Predicting for Burundi__nan
2020-08-01: 125.44606044719147
2020-08-02: 151.83250577591548
2020-08-03: 169.66309695316394
2020-08-04: 183.59379616619927
2020-08-05: 192.23038812093597
2020-08-06: 210.03610950155473
2020-08-07: 285.00289837919456
2020-08-08: 319.71490631273014
2020-08-09: 345.9929650990085
2020-08-10: 370.125002463733
2020-08-11: 385.06894386394106
2020-08-12: 411.214257118737
2020-08-13: 464.0162739250537
2020-08-14: 500.5161032825843
2020-08-15: 513.7799498078782
2020-08-16: 539.1569261081926
2020-08-17: 560.3242420757847
2020-08-18: 577.4144207577635
2020-08-19: 617.4206460940884
2020-08-20: 650.3662730386014
2020-08-21: 672.4340733172579
2020-08-22: 699.6306743919355
2020-08-23: 718.8888608443381
2020-08-24: 700.3845816255734
...

2020-08-13: 2251.6190945173535
2020-08-14: 2293.1716260153794
2020-08-15: 2315.132576919909
2020-08-16: 2357.019482691742
2020-08-17: 2402.698492635403
2020-08-18: 2445.935946880911
2020-08-19: 2504.050387206253
2020-08-20: 2549.109818609777
2020-08-21: 2584.360833766545
2020-08-22: 2628.793290043104
2020-08-23: 2670.0765831468843
2020-08-24: 2674.956693671583
2020-08-25: 2719.531884362851
2020-08-26: 2761.630499513812
2020-08-27: 2798.2597479468027
2020-08-28: 2842.25181463573
2020-08-29: 2881.133676777949
2020-08-30: 2904.639949276971
2020-08-31: 2945.6132863825974

Predicting for Belize__nan
2020-08-01: 199.61887392022737
2020-08-02: 225.22716745868598
2020-08-03: 249.65675749874353
2020-08-04: 270.95990594373416
2020-08-05: 279.7217811295276
2020-08-06: 290.64063255617395
2020-08-07: 365.86605266832265
2020-08-08: 400.4973468485093
2020-08-09: 430.4644280513837
2020-08-10: 459.309844121057
2020-08-11: 476.0418111605606
2020-08-12: 499.72512512718754
2020-08-13: 552.7100821599212
...

2020-08-24: 7947.476513865948
2020-08-25: 8070.609684642011
2020-08-26: 8205.591283723785
2020-08-27: 8260.55201117233
2020-08-28: 8345.423817881026
2020-08-29: 8456.888936078702
2020-08-30: 8547.955601550644
2020-08-31: 8665.902401709976

Predicting for Switzerland__nan
2020-08-01: 3060.5638667715857
2020-08-02: 7159.810107651998
2020-08-03: 5155.5600925819035
2020-08-04: 5205.443271794007
2020-08-05: 5229.629257227523
2020-08-06: 4804.021437477777
2020-08-07: 4565.592354219822
2020-08-08: 6390.769589094763
2020-08-09: 5686.037097258264
2020-08-10: 5680.266576730932
2020-08-11: 5759.687070299462
2020-08-12: 5456.040648050168
2020-08-13: 5490.775684257189
2020-08-14: 6354.816755954178
2020-08-15: 6129.233566175817
2020-08-16: 6157.38121864377
2020-08-17: 6234.7129639068335
2020-08-18: 6065.913300640985
2020-08-19: 6160.427291519178
2020-08-20: 6601.589600303872
2020-08-21: 6574.569250833503
2020-08-22: 6636.536629120101
2020-08-23: 6706.20361974216
2020-08-24: 6605.782188861873
...

2020-08-09: 407.2292383805601
2020-08-10: 438.79713584695827
2020-08-11: 426.0363425197605
2020-08-12: 449.5593279771632
2020-08-13: 507.4428583643393
2020-08-14: 545.1182133554242
2020-08-15: 581.4680869996369
2020-08-16: 613.0522319572324
2020-08-17: 622.6518461156238
2020-08-18: 644.6429738528832
2020-08-19: 688.2306621938652
2020-08-20: 724.6352194799572
2020-08-21: 761.0860229764445
2020-08-22: 793.9192450180727
2020-08-23: 810.4024874617479
2020-08-24: 816.5436927284322
2020-08-25: 849.6539947673837
2020-08-26: 882.527333927146
2020-08-27: 916.2011355409281
2020-08-28: 948.6184032876313
2020-08-29: 970.9675839789211
2020-08-30: 988.9207002529104
2020-08-31: 1019.1203202848535

Predicting for Costa Rica__nan
2020-08-01: 991.7892213453046
2020-08-02: 2182.373208746918
2020-08-03: 1440.8820675539298
2020-08-04: 1551.6417897350407
2020-08-05: 1592.3922726915512
2020-08-06: 1562.2419006960758
2020-08-07: 1525.1651025280553
2020-08-08: 2053.6526359092923
2020-08-09: 1788.2297814884118


2020-08-23: 1862.0238454750206
2020-08-24: 1913.1275639945284
2020-08-25: 1979.2412711925665
2020-08-26: 2000.874876083661
2020-08-27: 2015.4202312386233
2020-08-28: 2070.011941096191
2020-08-29: 2137.6218690256405
2020-08-30: 2192.990706216255
2020-08-31: 2253.0730610082687

Predicting for Algeria__nan
2020-08-01: 998.957771367674
2020-08-02: 1111.1325197980552
2020-08-03: 1162.8468792336785
2020-08-04: 1230.6053905937972
2020-08-05: 1160.6079870194599
2020-08-06: 707.0920421194402
2020-08-07: 1146.337666464125
2020-08-08: 1265.6145030483103
2020-08-09: 1317.5638023016395
2020-08-10: 1408.074372463647
2020-08-11: 1351.9956926830864
2020-08-12: 1166.9869377162254
2020-08-13: 1375.6689398306662
2020-08-14: 1470.448245392684
2020-08-15: 1527.887368991251
2020-08-16: 1608.9165199500087
2020-08-17: 1588.6084395942698
2020-08-18: 1524.2484008118915
2020-08-19: 1637.3655688021754
2020-08-20: 1711.8906121115867
2020-08-21: 1770.8354091522338
2020-08-22: 1840.680271200887
2020-08-23: 1849.4397

2020-08-12: 14305.783286876349
2020-08-13: 14211.752000433582
2020-08-14: 13232.791576009895
2020-08-15: 13851.268421515702
2020-08-16: 14858.626489718283
2020-08-17: 14921.690141874999
2020-08-18: 15106.531527835803
2020-08-19: 15082.087581726995
2020-08-20: 14658.460250752996
2020-08-21: 15027.896517058209
2020-08-22: 15609.743737780762
2020-08-23: 15778.47387774854
2020-08-24: 15950.271582900703
2020-08-25: 15993.656041140417
2020-08-26: 15855.452604363632
2020-08-27: 16109.464948706205
2020-08-28: 16484.412753757177
2020-08-29: 16682.633640030992
2020-08-30: 16875.663938116337
2020-08-31: 16976.073775859444

Predicting for Gabon__nan
2020-08-01: 130.22996771382608
2020-08-02: 161.73854896970101
2020-08-03: 172.02369274786014
2020-08-04: 196.939511126199
2020-08-05: 196.3893951654091
2020-08-06: 222.25065308025302
2020-08-07: 293.0183453536293
2020-08-08: 328.3219596893982
2020-08-09: 352.55294697850525
2020-08-10: 380.52659111172414
2020-08-11: 393.60433588513985
2020-08-12: 423.16

2020-08-20: 680.6949324037101
2020-08-21: 709.1245210722304
2020-08-22: 742.1997217536941
2020-08-23: 782.940673717447
2020-08-24: 769.5001846627387
2020-08-25: 801.2916340121612
2020-08-26: 846.1215051943236
2020-08-27: 876.2373421699156
2020-08-28: 902.672076563261
2020-08-29: 996.1817785076973
2020-08-30: 1018.5203446433836
2020-08-31: 1049.9628320324896

Predicting for Greece__nan
2020-08-01: 1878.136619603678
2020-08-02: 1871.5163603141573
2020-08-03: 2251.2538427200143
2020-08-04: 2318.5807544089134
2020-08-05: 2288.2363266436078
2020-08-06: 2291.8598317818955
2020-08-07: 2242.0336580025128
2020-08-08: 2270.909631407248
2020-08-09: 2481.12371188278
2020-08-10: 2556.397036781396
2020-08-11: 2575.501072741263
2020-08-12: 2601.8870435138174
2020-08-13: 2593.840381278263
2020-08-14: 2636.59293291889
2020-08-15: 2758.270083824323
2020-08-16: 2829.3902173820447
2020-08-17: 2871.9293917365126
2020-08-18: 2906.0675915882116
2020-08-19: 2924.762957536204
2020-08-20: 2972.841310512545
...

2020-08-24: 7583.758591306589
2020-08-25: 7592.037184847955
2020-08-26: 7619.223741581274
2020-08-27: 7708.331936340583
2020-08-28: 7872.104617040666
2020-08-29: 8019.762731570694
2020-08-30: 8109.647955272481
2020-08-31: 8164.516366916492

Predicting for Indonesia__nan
2020-08-01: 5150.940399363371
2020-08-02: 5026.466524800851
2020-08-03: 5065.727127129732
2020-08-04: 5628.408475430177
2020-08-05: 5529.510102749557
2020-08-06: 5951.838066704657
2020-08-07: 5721.15449476598
2020-08-08: 5633.412699095852
2020-08-09: 5725.763913277577
2020-08-10: 6000.082514267629
2020-08-11: 6058.294465198918
2020-08-12: 6295.270930065653
2020-08-13: 6246.869207126305
2020-08-14: 6225.603580362562
2020-08-15: 6321.814202335336
2020-08-16: 6487.586435823932
2020-08-17: 6588.720084065677
2020-08-18: 6751.785041252537
2020-08-19: 6783.670031912659
2020-08-20: 6813.769647881882
2020-08-21: 6909.272126437327
2020-08-22: 7034.946426042556
2020-08-23: 7143.3687047872045
2020-08-24: 7275.752594818505
...

2020-08-25: 7116.070085879488
2020-08-26: 7203.879386900866
2020-08-27: 7287.088909243079
2020-08-28: 7389.428107342417
2020-08-29: 7465.723116850422
2020-08-30: 7540.481231302825
2020-08-31: 7646.367353433679

Predicting for Japan__nan
2020-08-01: 1795.4052894870374
2020-08-02: 2693.5373376789566
2020-08-03: 2101.8141974901387
2020-08-04: 2215.6468620499522
2020-08-05: 2517.1177544906345
2020-08-06: 2622.3174573001384
2020-08-07: 2378.671871593255
2020-08-08: 2762.36073957949
2020-08-09: 2541.033938381294
2020-08-10: 2595.0540417072607
2020-08-11: 2796.5267525264057
2020-08-12: 2849.2541489635655
2020-08-13: 2800.4652631608574
2020-08-14: 2988.0073215465823
2020-08-15: 2911.8281476495195
2020-08-16: 2957.791463075608
2020-08-17: 3092.2390021572132
2020-08-18: 3134.2764621020337
2020-08-19: 3155.6155800204037
2020-08-20: 3265.854262223705
2020-08-21: 3261.4184652341532
2020-08-22: 3311.553717075136
2020-08-23: 3406.157806593002
2020-08-24: 3434.677130580599
2020-08-25: 3477.77170476422


Predicting for Liberia__nan
2020-08-01: 127.8077271961215
2020-08-02: 151.52032265074203
2020-08-03: 171.18679352864274
2020-08-04: 196.24387368368804
2020-08-05: 195.05755496589632
2020-08-06: 221.00424664792885
2020-08-07: 290.0646136103255
2020-08-08: 321.9823332529289
2020-08-09: 350.2839329436698
2020-08-10: 378.71779821577303
2020-08-11: 392.80226506926016
2020-08-12: 422.84987146146193
2020-08-13: 473.4219536705546
2020-08-14: 508.43290393292773
2020-08-15: 541.302790539694
2020-08-16: 572.8078313477758
2020-08-17: 596.7357795081459
2020-08-18: 629.5450670559255
2020-08-19: 672.4663486074265
2020-08-20: 709.1448942203763
2020-08-21: 744.81845741252
2020-08-22: 779.1303612823128
2020-08-23: 809.5566958471268
2020-08-24: 844.8094735217736
2020-08-25: 885.2367248693802
2020-08-26: 923.4044146879232
2020-08-27: 961.2579159347126
2020-08-28: 998.1828609786676
2020-08-29: 1033.2048258244631
2020-08-30: 1070.864401861792
2020-08-31: 1111.3060627502362

Predicting for Libya__nan
...

2020-08-10: 1865.064489414513
2020-08-11: 1918.8454685624436
2020-08-12: 1887.7248826241698
2020-08-13: 1908.590170936619
2020-08-14: 1874.2762752184317
2020-08-15: 1957.0896561951774
2020-08-16: 2074.4435345034153
2020-08-17: 2127.8468656570108
2020-08-18: 2132.069924975524
2020-08-19: 2155.3787016427727
2020-08-20: 2158.0337317929457
2020-08-21: 2218.5169683566514
2020-08-22: 2302.3260974124923
2020-08-23: 2349.453472245455
2020-08-24: 2336.46956195354
2020-08-25: 2358.0044906036946
2020-08-26: 2375.008548855041
2020-08-27: 2421.2118028697396
2020-08-28: 2484.4732578160115
2020-08-29: 2526.8074884199636
2020-08-30: 2541.34997121838
2020-08-31: 2567.367621717217

Predicting for Madagascar__nan
2020-08-01: 138.00964351704272
2020-08-02: 152.06572804705635
2020-08-03: 165.97659407569057
2020-08-04: 183.53803632356454
2020-08-05: 189.85260691678937
2020-08-06: 210.67188108611364
2020-08-07: 291.57197190586135
2020-08-08: 320.6534298829942
2020-08-09: 344.83473021277837
2020-08-10: 370.26

2020-08-01: 1336.1141928275756
2020-08-02: 1846.5774182567152
2020-08-03: 2005.5586024418208
2020-08-04: 1459.166449538126
2020-08-05: 1403.8436400281391
2020-08-06: 1483.0791341102029
2020-08-07: 1624.0548991096457
2020-08-08: 1942.9907506822938
2020-08-09: 2049.1159294722374
2020-08-10: 1817.7074694513194
2020-08-11: 1784.2373896540303
2020-08-12: 1827.6862749977845
2020-08-13: 1924.7175006089187
2020-08-14: 2126.407886005138
2020-08-15: 2192.08956760979
2020-08-16: 2106.2727345402427
2020-08-17: 2099.8488114823663
2020-08-18: 2121.681103971939
2020-08-19: 2192.971010931647
2020-08-20: 2323.514601517002
2020-08-21: 2383.990060774983
2020-08-22: 2368.6262962091405
2020-08-23: 2377.1177195263435
2020-08-24: 2362.9150451037017
2020-08-25: 2417.0033851813428
2020-08-26: 2504.4240530124703
2020-08-27: 2556.2154343806637
2020-08-28: 2572.668563222412
2020-08-29: 2591.5719037141657
2020-08-30: 2601.138604844106
2020-08-31: 2645.285714293553

Predicting for Namibia__nan
2020-08-01: 183.66611

2020-08-01: 533.3870091100966
2020-08-02: 410.2425774359942
2020-08-03: 370.7332063084533
2020-08-04: 287.17024794970087
2020-08-05: 253.75219185699854
2020-08-06: 300.57723643865467
2020-08-07: 565.1325814726672
2020-08-08: 554.28352594235
2020-08-09: 546.3455873113553
2020-08-10: 521.5463899844888
2020-08-11: 503.7905678134953
2020-08-12: 553.7660882939033
2020-08-13: 698.4645550908648
2020-08-14: 725.7988813609392
2020-08-15: 749.9112527227418
2020-08-16: 756.2420502565969
2020-08-17: 758.9093805671952
2020-08-18: 811.5476794749217
2020-08-19: 903.1797373754899
2020-08-20: 944.9630267119805
2020-08-21: 979.8610610963301
2020-08-22: 1003.3493560785244
2020-08-23: 1023.075382897794
2020-08-24: 1091.7653599003893
2020-08-25: 1163.8292876533937
2020-08-26: 1213.1162053053604
2020-08-27: 1257.5795739379923
2020-08-28: 1292.926141392334
2020-08-29: 1327.631239926631
2020-08-30: 1387.1153499968918
2020-08-31: 1450.122119505146

Predicting for Pakistan__nan
2020-08-01: 3159.7801201418374
...

2020-08-02: 858.1666614377541
2020-08-03: 1016.7545501494224
2020-08-04: 1091.892302675058
2020-08-05: 971.2874775370294
2020-08-06: 1106.6749761603955
2020-08-07: 1105.7428259377984
2020-08-08: 1119.2119724125525
2020-08-09: 1227.5631415998287
2020-08-10: 1277.1692635316022
2020-08-11: 1250.9698627478826
2020-08-12: 1330.5604777082144
2020-08-13: 1351.9524405636082
2020-08-14: 1380.532090605724
2020-08-15: 1459.0823205428358
2020-08-16: 1503.5212523162895
2020-08-17: 1518.1051381703276
2020-08-18: 1576.7788112046069
2020-08-19: 1609.8151061519557
2020-08-20: 1647.596336836498
2020-08-21: 1710.7860495927996
2020-08-22: 1755.9268282723174
2020-08-23: 1789.224306601606
2020-08-24: 1841.0058332006333
2020-08-25: 1881.2877199148766
2020-08-26: 1924.9074706659717
2020-08-27: 1981.4586784345815
2020-08-28: 2028.9355840368721
2020-08-29: 2071.7789963446967
2020-08-30: 2122.456643599579
2020-08-31: 2167.9803665165587

Predicting for Palestine__nan
2020-08-01: 1387.347665378654
2020-08-02: 1590

2020-08-11: 407.1634791017587
2020-08-12: 432.05708614431035
2020-08-13: 483.9346502993825
2020-08-14: 519.4345424023537
2020-08-15: 531.5139057761137
2020-08-16: 559.0811937743235
2020-08-17: 582.0472674298503
2020-08-18: 598.8206552921779
2020-08-19: 638.5048849232138
2020-08-20: 670.8615022204241
2020-08-21: 692.3873837631545
2020-08-22: 720.7545785849873
2020-08-23: 741.0758725870033
2020-08-24: 722.6391358542108
2020-08-25: 747.6779665943642
2020-08-26: 773.2925305666627
2020-08-27: 793.6676566402091
2020-08-28: 819.930698796536
2020-08-29: 838.2192271978352
2020-08-30: 839.6312830764051
2020-08-31: 860.1661049685347

Predicting for Singapore__nan
2020-08-01: 130.59490765878556
2020-08-02: 156.03823330191943
2020-08-03: 176.94960040129234
2020-08-04: 189.81908722082207
2020-08-05: 195.58696543791268
2020-08-06: 214.69169643065106
2020-08-07: 290.10462755352796
2020-08-08: 324.615889853739
2020-08-09: 352.4992428021385
2020-08-10: 376.1074815662604
2020-08-11: 390.93325135059865
...

2020-08-28: 2653.8161016504964
2020-08-29: 2716.0639684586945
2020-08-30: 2743.080851427793
2020-08-31: 2759.420024652336

Predicting for Slovenia__nan
2020-08-01: 1422.5124172478222
2020-08-02: 1135.677266372927
2020-08-03: 1590.9808221359353
2020-08-04: 2080.5312554921206
2020-08-05: 1934.9832682575948
2020-08-06: 1876.8933757183427
2020-08-07: 1771.87619428866
2020-08-08: 1634.6949345550775
2020-08-09: 1888.8591370368522
2020-08-10: 2163.035692369878
2020-08-11: 2143.9997796467937
2020-08-12: 2142.805737186008
2020-08-13: 2095.963311099404
2020-08-14: 2042.501921295163
2020-08-15: 2175.3823339748683
2020-08-16: 2337.414417795024
2020-08-17: 2368.1110632745667
2020-08-18: 2383.2204391094224
2020-08-19: 2373.630815658242
2020-08-20: 2363.7175098623316
2020-08-21: 2451.5979357689635
2020-08-22: 2557.944062698453
2020-08-23: 2600.3529795159316
2020-08-24: 2592.4913020144877
2020-08-25: 2598.184979957454
2020-08-26: 2608.372228140883
2020-08-27: 2669.282287399242
2020-08-28: 2744.9579322

2020-08-17: 556.6746617705826
2020-08-18: 574.0094257057426
2020-08-19: 613.8042241353328
2020-08-20: 646.4712589928947
2020-08-21: 668.3418813463734
2020-08-22: 695.7117443224271
2020-08-23: 715.0012994806132
2020-08-24: 696.6011128523044
2020-08-25: 721.5658075088947
2020-08-26: 747.2713758476397
2020-08-27: 767.4132214246156
2020-08-28: 792.9766956920264
2020-08-29: 810.56505190883
2020-08-30: 811.7559573748033
2020-08-31: 832.0914535050813

Predicting for Trinidad and Tobago__nan
2020-08-01: 201.11517221240305
2020-08-02: 188.38559373265554
2020-08-03: 189.0969020128576
2020-08-04: 213.57548046550292
2020-08-05: 238.8890695163423
2020-08-06: 242.08137020995747
2020-08-07: 342.7477475419903
2020-08-08: 360.9811620883943
2020-08-09: 376.5275249186328
2020-08-10: 407.0588313633398
2020-08-11: 427.9499177714136
2020-08-12: 450.30335905981997
2020-08-13: 514.8520184804822
2020-08-14: 544.2666589003177
2020-08-15: 551.6694424519683
2020-08-16: 580.4217231092498
2020-08-17: 603.8449513379

2020-08-01: 175936.81584631425
2020-08-02: 177288.4238030577
2020-08-03: 185012.66774873895
2020-08-04: 182519.3004305578
2020-08-05: 152789.1026280044
2020-08-06: 196374.5076132336
2020-08-07: 186761.11210964492
2020-08-08: 186823.65086903737
2020-08-09: 194067.96950150316
2020-08-10: 189824.23982656896
2020-08-11: 179138.38820403247
2020-08-12: 198870.08499688818
2020-08-13: 196918.24929415097
2020-08-14: 197908.57691204455
2020-08-15: 203168.25376631477
2020-08-16: 200691.70398011137
2020-08-17: 197436.71826966258
2020-08-18: 207151.0475873746
2020-08-19: 207885.35596077517
2020-08-20: 209648.6653369645
2020-08-21: 213497.47605461802
2020-08-22: 212859.1514403742
2020-08-23: 212659.58100430522
2020-08-24: 218105.7278818663
2020-08-25: 219823.50148368947
2020-08-26: 221999.801842037
2020-08-27: 225084.23589092266
2020-08-28: 225740.31040629867
2020-08-29: 226892.12428196566
2020-08-30: 230562.09200502795
2020-08-31: 232687.35426403573

Predicting for United States__Alaska
...

2020-08-17: 729.5203469308835
2020-08-18: 735.9068680094724
2020-08-19: 786.2002010431311
2020-08-20: 821.9032321895236
2020-08-21: 853.8214058107062
2020-08-22: 896.6810371189711
2020-08-23: 919.2395661553794
2020-08-24: 919.6844576952358
2020-08-25: 956.4272877640228
2020-08-26: 989.3253425547879
2020-08-27: 1021.4155595257193
2020-08-28: 1059.8364806421578
2020-08-29: 1085.8817299669438
2020-08-30: 1102.4143523477733
2020-08-31: 1135.0839741362952

Predicting for United States__Delaware
2020-08-01: 491.1125185922974
2020-08-02: 564.8292646053619
2020-08-03: 528.7433896593127
2020-08-04: 707.9037222651825
2020-08-05: 683.7415083435286
2020-08-06: 438.2775723380577
2020-08-07: 666.4921038037671
2020-08-08: 729.3580735790842
2020-08-09: 739.9768279561413
2020-08-10: 860.4966595189142
2020-08-11: 850.2879391513098
2020-08-12: 757.3356867810342
2020-08-13: 875.1850850554085
2020-08-14: 927.0744947660763
2020-08-15: 949.7764554502019
2020-08-16: 1032.2884163954195
2020-08-17: 1040.6301316

2020-08-25: 3551.166577548121
2020-08-26: 3679.723835532931
2020-08-27: 3683.7918646269877
2020-08-28: 3778.253978355659
2020-08-29: 3886.4727900995704
2020-08-30: 3837.7913678212863
2020-08-31: 3907.155838863653

Predicting for United States__Louisiana
2020-08-01: 1411.498061302116
2020-08-02: 2765.2690045963377
2020-08-03: 1488.1188033760018
2020-08-04: 1403.4480246951287
2020-08-05: 3420.3260343831034
2020-08-06: 1298.6798981999589
2020-08-07: 1887.0064924064798
2020-08-08: 2601.612671054048
2020-08-09: 1918.2612645346157
2020-08-10: 2080.961176052711
2020-08-11: 2932.8959814501086
2020-08-12: 2024.3070834733257
2020-08-13: 2302.5043012693795
2020-08-14: 2667.4137826042293
2020-08-15: 2333.186829232873
2020-08-16: 2503.874156656103
2020-08-17: 2887.7095313749555
2020-08-18: 2504.3675807533805
2020-08-19: 2656.9702178109887
2020-08-20: 2848.757625407416
2020-08-21: 2710.9025006669763
2020-08-22: 2846.5006544979365
2020-08-23: 3036.7270038392003
2020-08-24: 2872.661477561764
2020-08-2

2020-08-10: 3323.4698323151574
2020-08-11: 4589.098396897276
2020-08-12: 3180.220846343904
2020-08-13: 3488.4304582283216
2020-08-14: 3948.428767029156
2020-08-15: 3742.6384109532137
2020-08-16: 3881.6789368988907
2020-08-17: 4434.410612458716
2020-08-18: 3839.2651533677113
2020-08-19: 3982.7591767349872
2020-08-20: 4237.783333123498
2020-08-21: 4156.109242451357
2020-08-22: 4311.960707381879
2020-08-23: 4580.180886478136
2020-08-24: 4328.045488371419
2020-08-25: 4414.981969422375
2020-08-26: 4564.304473372904
2020-08-27: 4555.4486094912845
2020-08-28: 4687.905182610622
2020-08-29: 4841.053814805171
2020-08-30: 4762.439571733516
2020-08-31: 4833.733942658484

Predicting for United States__North Dakota
2020-08-01: 773.5145254434788
2020-08-02: 1025.6460032628088
2020-08-03: 1089.614379695767
2020-08-04: 1118.2012048586903
2020-08-05: 967.0574499716893
2020-08-06: 621.3518464799686
2020-08-07: 972.7514921316838
2020-08-08: 1148.2288939731677
2020-08-09: 1217.797502180208
2020-08-10: 1277

2020-08-24: 1942.1512835592953
2020-08-25: 2013.4583899793283
2020-08-26: 2082.283120701408
2020-08-27: 2103.869677878095
2020-08-28: 2168.168203018515
2020-08-29: 2235.323560927891
2020-08-30: 2247.860821521334
2020-08-31: 2306.6373825822698

Predicting for United States__Pennsylvania
2020-08-01: 5830.427503377539
2020-08-02: 6413.9712428894745
2020-08-03: 6505.130313475134
2020-08-04: 7426.955376748356
2020-08-05: 7397.572675268688
2020-08-06: 3666.9848416004684
2020-08-07: 6064.780703920185
2020-08-08: 6613.497582656129
2020-08-09: 6719.211234283564
2020-08-10: 7498.51977814207
2020-08-11: 7263.147597850673
2020-08-12: 5615.310294263154
2020-08-13: 6635.176064034724
2020-08-14: 7007.934105872908
2020-08-15: 7150.510405490171
2020-08-16: 7690.583856272931
2020-08-17: 7533.67157066698
2020-08-18: 6808.6139808148
2020-08-19: 7269.75917670892
2020-08-20: 7511.857893043372
2020-08-21: 7670.988637116254
2020-08-22: 8034.146175361258
2020-08-23: 7977.419087678205
2020-08-24: 7663.798877621

2020-08-25: 878.3595487277153
2020-08-26: 911.9393178628122
2020-08-27: 939.6043033603843
2020-08-28: 973.7152568736949
2020-08-29: 1004.3768208420654
2020-08-30: 1022.7067254413435
2020-08-31: 1054.8932209395111

Predicting for United States__Washington
2020-08-01: 4171.316193944767
2020-08-02: 3628.7639904632997
2020-08-03: 2761.192866937978
2020-08-04: 1933.8165435254634
2020-08-05: 3351.6343785377603
2020-08-06: 1782.8396184295282
2020-08-07: 3592.296579541612
2020-08-08: 3614.8246596240424
2020-08-09: 3053.898793611572
2020-08-10: 2845.5581361371515
2020-08-11: 3310.899242921557
2020-08-12: 2738.0781002551125
2020-08-13: 3580.754219105605
2020-08-14: 3710.1136473034753
2020-08-15: 3420.3409088572866
2020-08-16: 3395.640628168669
2020-08-17: 3554.920647194553
2020-08-18: 3355.7562019378806
2020-08-19: 3779.9569220269327
2020-08-20: 3905.826495704089
2020-08-21: 3792.3519088836665
2020-08-22: 3824.8011003792803
2020-08-23: 3892.324514414434
2020-08-24: 3825.100901786743
2020-08-25: 

2020-08-30: 936.0402374722796
2020-08-31: 966.5285079774995

Predicting for Yemen__nan
2020-08-01: 130.26612228820377
2020-08-02: 157.53477960596615
2020-08-03: 172.52304031285374
2020-08-04: 191.4744753944079
2020-08-05: 200.79432325439237
2020-08-06: 219.16942577537029
2020-08-07: 291.66619698423665
2020-08-08: 326.0654303373911
2020-08-09: 351.2524643828342
2020-08-10: 377.59781974629516
2020-08-11: 393.3396180969726
2020-08-12: 419.7276912482721
2020-08-13: 471.52650393082047
2020-08-14: 507.6062390241766
2020-08-15: 520.4180318244672
2020-08-16: 546.8233694830503
2020-08-17: 568.5884708480803
2020-08-18: 585.8527113150399
2020-08-19: 625.4666277066528
2020-08-20: 658.1225901395773
2020-08-21: 680.0001242998385
2020-08-22: 707.706113816676
2020-08-23: 727.3679432842396
2020-08-24: 709.0176822791964
2020-08-25: 733.9589201360195
2020-08-26: 759.6944285362947
2020-08-27: 779.8901712937165
2020-08-28: 805.6870055341956
2020-08-29: 823.5351727399606
2020-08-30: 824.830059220488
2020-08

In [18]:
# Check the predictions
# preds_df has no GeoID column (filtering on it raises AttributeError),
# so select the country by its CountryName column instead
preds_df[preds_df.CountryName == 'Italy']

# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [52]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../covid_xprize/validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!


In [53]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,142.56629738919912
Aruba,,2020-08-02,161.10804722795518
Aruba,,2020-08-03,174.4263128199457
Aruba,,2020-08-04,197.16095044264176
Afghanistan,,2020-08-01,266.70666571128135
Afghanistan,,2020-08-02,361.78323642335
Afghanistan,,2020-08-03,373.63396107727465
Afghanistan,,2020-08-04,379.39879996232133
Angola,,2020-08-01,317.3563234888792
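
As a quick sanity check on the file format, the sketch below parses a few rows in the shape shown above and verifies that the expected columns are present. The helper name `check_prediction_format` and the inline sample are illustrative assumptions; they are not part of the competition API or of `predict.py`.

```python
import io
import pandas as pd

# Column names taken from the header of the predictions file shown above
EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]

def check_prediction_format(preds_df):
    """Return a list of problems found in a predictions DataFrame (hypothetical helper)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in preds_df.columns]
    if missing:
        return [f"Missing columns: {missing}"]
    problems = []
    if preds_df["PredictedDailyNewCases"].isna().any():
        problems.append("Some predictions are NaN")
    if (preds_df["PredictedDailyNewCases"] < 0).any():
        problems.append("Some predictions are negative")
    return problems

# A few rows copied from the output above, used here instead of reading the file
sample_csv = """CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,142.56629738919912
Aruba,,2020-08-02,161.10804722795518
Afghanistan,,2020-08-01,266.70666571128135
"""
sample_df = pd.read_csv(io.StringIO(sample_csv),
                        parse_dates=["Date"],
                        dtype={"RegionName": str})
print(check_prediction_format(sample_df) or "Format looks fine")
```

This only covers the file layout; the official checks are the ones performed by `validate_submission` in the next section.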


# Test cases
We can generate a prediction file. Let's validate a few cases...

In [19]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

def validate(start_date, end_date, ip_file, output_file):
    # First, delete any potential old file
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [22]:
# 4-day window, all countries and regions
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../covid_xprize/validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

# Same setup over a 30-day window
validate(start_date="2020-08-01",
         end_date="2020-08-30",
         ip_file="../../covid_xprize/validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/try_1_month.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Anguilla__nan  is missing
Cayman Islands__nan  is missing
Falkland Islands__nan  is missing
Gibraltar__nan  is missing
Montserrat__nan  is missing
Pitcairn Islands__nan  is missing
Turks and Caicos Islands__nan  is missing
British Virgin Islands__nan  is missing
Saved predictions to predictions/val_4_days.csv
Done!
Missing countries / regions: {'Anguilla', 'Cayman Islands', 'Falkland Islands', 'Pitcairn Islands', 'Turks and Caicos Islands', 'British Virgin Islands', 'Montserrat', 'Gibraltar'}
Generating predictions from 2020-08-01 to 2020-08-30...
Anguilla__nan  is missing
Cayman Islands__nan  is missing
Falkland Islands__nan  is missing
Gibraltar__nan  is missing
Montserrat__nan  is missing
Pitcairn Islands__nan  is missing
Turks and Caicos Islands__nan  is missing
British Virgin Islands__nan  is missing
Saved predictions to predictions/try_1_month.csv
Done!
Missing countries / regions: {'Anguilla', 'Cayman Islands', 'Falkland Is
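
The "is missing" lines above identify regions by a `CountryName__RegionName` key, with `nan` standing in for an absent region. The following is a minimal sketch of how such a comparison could be made between an intervention plan file and a predictions file. It is not the code used by `validate_submission`; the helper names and the toy DataFrames are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def geo_ids(df):
    """Build CountryName__RegionName keys; a missing region is written as 'nan'."""
    return df["CountryName"] + "__" + df["RegionName"].fillna("nan").astype(str)

def missing_geos(ip_df, preds_df):
    """Return the keys present in the intervention plans but absent from the predictions."""
    return sorted(set(geo_ids(ip_df)) - set(geo_ids(preds_df)))

# Toy data: Anguilla is requested in the plans but has no prediction
ip_df = pd.DataFrame({"CountryName": ["Italy", "Anguilla", "United States"],
                      "RegionName": [np.nan, np.nan, "Alaska"]})
preds_df = pd.DataFrame({"CountryName": ["Italy", "United States"],
                         "RegionName": [np.nan, "Alaska"]})

print(missing_geos(ip_df, preds_df))
```

Note the double underscore in the key: a filter on `Italy_nan` would not match a key built this way.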

## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, answering the question: what will happen if we apply these plans?

In [21]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../covid_xprize/validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Traceback (most recent call last):
  File "predict.py", line 207, in <module>
    predict(args.start_date, args.end_date, args.ip_file, args.output_file)
  File "predict.py", line 52, in predict
    preds_df = predict_df(start_date, end_date, path_to_ips_file, verbose=False)
  File "predict.py", line 175, in predict_df
    pred_df = pd.concat(geo_pred_dfs)
  File "/home/mattia/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/reshape/concat.py", line 274, in concat
    op = _Concatenator(
  File "/home/mattia/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate


FileNotFoundError: [Errno 2] No such file or directory: 'predictions/val_1_month_future.csv'
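
The traceback shows `pd.concat(geo_pred_dfs)` failing with "No objects to concatenate", which is what pandas raises when it is given an empty list. That suggests no per-region prediction frame was produced for this scenario, although the output does not show why. The sketch below reproduces the error and shows one possible guard. The function name `concat_predictions` is hypothetical, and a guard like this only avoids the crash; it does not address whatever left the list empty.

```python
import pandas as pd

PRED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]

def concat_predictions(geo_pred_dfs):
    """Concatenate per-region frames; return an empty, well-formed frame if there are none."""
    if not geo_pred_dfs:
        return pd.DataFrame(columns=PRED_COLUMNS)
    return pd.concat(geo_pred_dfs, ignore_index=True)

# pd.concat raises ValueError on an empty list, as in the traceback above
try:
    pd.concat([])
except ValueError as err:
    print(f"pd.concat([]) failed: {err}")

# The guarded version returns an empty frame with the expected columns instead
print(len(concat_predictions([])))
```

Because `predict.py` exits before writing its output, the later `FileNotFoundError` is a consequence of the same failure rather than a separate problem.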

## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 6 months after start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on it, so the model has to predict those cases first and then use them as inputs.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
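
To illustrate how a rollout can bridge such a gap, here is a minimal sketch in which each predicted day is appended to the history and reused as an input for the next day. It is deliberately simplified: the weights are made up, only case counts are used, and the NPI inputs that the actual predictor also relies on are left out. The names `roll_out`, `gap_days` and `horizon` are assumptions for this example.

```python
import numpy as np

def roll_out(known_cases, weights, n_days):
    """Predict n_days ahead, feeding each prediction back in as an input."""
    lookback = len(weights)
    history = list(known_cases[-lookback:])
    preds = []
    for _ in range(n_days):
        window = np.array(history[-lookback:])
        # Linear prediction from the last `lookback` days, clipped at zero cases
        next_value = max(float(window @ weights), 0.0)
        preds.append(next_value)
        history.append(next_value)
    return preds

known = [100.0, 110.0, 120.0]         # last known daily new cases
weights = np.array([0.2, 0.3, 0.5])   # hypothetical coefficients, oldest day first
gap_days = 7                          # days between the last known cases and start_date
horizon = 3                           # days actually requested

all_preds = roll_out(known, weights, gap_days + horizon)
requested = all_preds[gap_days:]      # drop the gap days, keep the requested window
print(requested)
```

The gap days are predicted only so that the requested window has inputs to work from; they are discarded before the result is returned.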

### Generate the scenario

In [59]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-04
End date: 2021-06-02


In [60]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
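
For intuition, a "freeze" scenario can be thought of as repeating each region's last known intervention values for every future date. The sketch below shows that idea on toy data. It is not the implementation of `generate_scenario`, whose internals are not shown here; the function `freeze_last_known` and the sample values are assumptions.

```python
import pandas as pd

def freeze_last_known(ips_df, start_date, end_date, npi_columns):
    """Repeat each region's last known NPI values for every day in [start_date, end_date]."""
    ips = ips_df.copy()
    # Use an empty string for national-level rows so that groupby keeps them
    ips["RegionName"] = ips["RegionName"].fillna("")
    future_dates = pd.date_range(start_date, end_date, freq="D")
    frames = []
    for (country, region), geo_df in ips.groupby(["CountryName", "RegionName"]):
        last_row = geo_df.sort_values("Date").iloc[-1]
        frame = pd.DataFrame({"CountryName": country,
                              "RegionName": region,
                              "Date": future_dates})
        for col in npi_columns:
            frame[col] = last_row[col]
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# Toy history with a single NPI column
ips_df = pd.DataFrame({
    "CountryName": ["Italy", "Italy", "United States", "United States"],
    "RegionName": [None, None, "Alaska", "Alaska"],
    "Date": pd.to_datetime(["2020-11-25", "2020-11-26", "2020-11-25", "2020-11-26"]),
    "C1_School closing": [1.0, 2.0, 0.0, 3.0],
})
toy_scenario = freeze_last_known(ips_df, "2020-12-04", "2020-12-06", ["C1_School closing"])
print(toy_scenario)
```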


### Check it

In [61]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-04 to 2021-06-02...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 6.41 s, sys: 859 ms, total: 7.27 s
Wall time: 2min 39s
