# Example Predictor: Linear Rollout Predictor

This example contains basic functionality for training and evaluating a linear predictor that rolls out predictions day-by-day.

First, a training data set is created from historical case and NPI (non-pharmaceutical intervention) data.

Second, a linear model is trained to predict future cases from prior case data along with prior and future NPI data.
The model is an off-the-shelf sklearn Lasso model that uses a positive weight constraint to enforce the assumption that increased NPIs are negatively correlated with future cases.

Third, a sample evaluation set is created, and the predictor is applied to this evaluation set to produce prediction results in the correct format.
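
The positive weight constraint in the second step can be illustrated on synthetic data. The sketch below is not the notebook's model: the variable names and the synthetic relationship are assumptions chosen only to show why negating an NPI column lets a non-negative weight express a negative effect.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200

# Hypothetical inputs: prior case counts and one NPI stringency level (0 to 3)
prior_cases = rng.uniform(0, 100, size=n)
npi_level = rng.integers(0, 4, size=n).astype(float)

# Synthetic target: more prior cases raise future cases, a stricter NPI lowers them
future_cases = 1.0 * prior_cases - 5.0 * npi_level + rng.normal(0, 1, size=n)

# Negate the NPI column so that a non-negative weight on it
# corresponds to a negative effect of the NPI itself
X = np.column_stack([prior_cases, -npi_level])

model = Lasso(alpha=0.1, positive=True, max_iter=10000)
model.fit(X, future_cases)
coef_cases, coef_neg_npi = model.coef_
print('weight on cases:', coef_cases)
print('weight on negated NPI:', coef_neg_npi)
```

Both learned weights are non-negative, yet the weight on the negated column still encodes that a higher NPI level goes with fewer future cases.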

## Training

In [1]:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

### Copy the data locally

In [2]:
# Main source for the training data
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
# Local file
DATA_FILE = 'data/OxCGRT_latest.csv'

In [3]:
import os
import urllib.request
# Create the local data directory if needed, then download the file
os.makedirs('data', exist_ok=True)
urllib.request.urlretrieve(DATA_URL, DATA_FILE)

('data/OxCGRT_latest.csv', <http.client.HTTPMessage at 0x7fa581b52d00>)

In [4]:
# Load historical data from local file
df = pd.read_csv(DATA_FILE,
                 parse_dates=['Date'],
                 encoding="ISO-8859-1",
                 dtype={"RegionName": str,
                        "RegionCode": str},
                 # Skip malformed rows (pandas >= 1.3; older versions
                 # used error_bad_lines=False instead)
                 on_bad_lines='skip')

In [9]:
df

Unnamed: 0,CountryName,CountryCode,RegionName,RegionCode,Jurisdiction,Date,C1_School closing,C1_Flag,C2_Workplace closing,C2_Flag,...,StringencyIndex,StringencyIndexForDisplay,StringencyLegacyIndex,StringencyLegacyIndexForDisplay,GovernmentResponseIndex,GovernmentResponseIndexForDisplay,ContainmentHealthIndex,ContainmentHealthIndexForDisplay,EconomicSupportIndex,EconomicSupportIndexForDisplay
0,Aruba,ABW,,,NAT_TOTAL,2020-01-01,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
1,Aruba,ABW,,,NAT_TOTAL,2020-01-02,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
2,Aruba,ABW,,,NAT_TOTAL,2020-01-03,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
3,Aruba,ABW,,,NAT_TOTAL,2020-01-04,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
4,Aruba,ABW,,,NAT_TOTAL,2020-01-05,0.0,,0.0,,...,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87975,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-23,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0
87976,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-24,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0
87977,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-25,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0
87978,Zimbabwe,ZWE,,,NAT_TOTAL,2020-11-26,,,,,...,,67.59,,73.81,,58.33,,63.89,,25.0


In [None]:
# For testing, restrict training data to that before a hypothetical predictor submission date
HYPOTHETICAL_SUBMISSION_DATE = np.datetime64("2020-07-31")
df = df[df.Date <= HYPOTHETICAL_SUBMISSION_DATE]

In [10]:
# Add GeoID column that combines CountryName and RegionName for easier manipulation of data
df['GeoID'] = df['CountryName'] + '__' + df['RegionName'].astype(str)

In [11]:
# Add new cases column
df['NewCases'] = df.groupby('GeoID').ConfirmedCases.diff().fillna(0)
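
To see what the grouped `diff` does, here is a minimal sketch on a toy frame. The two regions `A` and `B` and their counts are made up for illustration.

```python
import pandas as pd

# Hypothetical cumulative case counts for two regions
toy = pd.DataFrame({
    'GeoID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ConfirmedCases': [0, 3, 10, 5, 5, 9],
})

# Difference within each region; the first day of each region has no
# previous value, so its NaN is replaced by 0
toy['NewCases'] = toy.groupby('GeoID').ConfirmedCases.diff().fillna(0)
new_cases = toy['NewCases'].tolist()
print(new_cases)  # [0.0, 3.0, 7.0, 0.0, 0.0, 4.0]
```

Grouping by `GeoID` keeps the first row of region `B` from being differenced against the last row of region `A`.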

In [14]:
# Keep only columns of interest
id_cols = ['CountryName',
           'RegionName',
           'GeoID',
           'Date']
cases_col = ['NewCases']
npi_cols = ['C1_School closing',
            'C2_Workplace closing',
            'C3_Cancel public events',
            'C4_Restrictions on gatherings',
            'C5_Close public transport',
            'C6_Stay at home requirements',
            'C7_Restrictions on internal movement',
            'C8_International travel controls',
            'H1_Public information campaigns',
            'H2_Testing policy',
            'H3_Contact tracing',
            'H6_Facial Coverings']
df = df[id_cols + cases_col + npi_cols]

In [13]:
df

Unnamed: 0,CountryName,RegionName,GeoID,Date,NewCases,C1_School closing,C2_Workplace closing,C3_Cancel public events,C4_Restrictions on gatherings,C5_Close public transport,C6_Stay at home requirements,C7_Restrictions on internal movement,C8_International travel controls,H1_Public information campaigns,H2_Testing policy,H3_Contact tracing,H6_Facial Coverings
0,Aruba,,Aruba__nan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aruba,,Aruba__nan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aruba,,Aruba__nan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aruba,,Aruba__nan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aruba,,Aruba__nan,2020-01-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87975,Zimbabwe,,Zimbabwe__nan,2020-11-23,48.0,,,,,,,,,,,,
87976,Zimbabwe,,Zimbabwe__nan,2020-11-24,88.0,,,,,,,,,,,,
87977,Zimbabwe,,Zimbabwe__nan,2020-11-25,90.0,,,,,,,,,,,,
87978,Zimbabwe,,Zimbabwe__nan,2020-11-26,110.0,,,,,,,,,,,,


In [15]:
# Fill any missing case values by interpolation and setting NaNs to 0
df.update(df.groupby('GeoID').NewCases.apply(
    lambda group: group.interpolate()).fillna(0))

In [16]:
# Fill any missing NPIs by assuming they are the same as previous day
for npi_col in npi_cols:
    df.update(df.groupby('GeoID')[npi_col].ffill().fillna(0))
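
The two filling strategies behave differently, which a single-region toy series makes visible. This sketch uses plain Series instead of the grouped `update` pattern above, and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical single-region series with gaps
cases = pd.Series([np.nan, 10.0, np.nan, 30.0])
npi = pd.Series([np.nan, 2.0, np.nan, 3.0])

# Cases: linear interpolation between known values, leading gap set to 0
cases_filled = cases.interpolate().fillna(0)

# NPIs: carry the previous day's value forward, leading gap set to 0
npi_filled = npi.ffill().fillna(0)

print(cases_filled.tolist())  # [0.0, 10.0, 20.0, 30.0]
print(npi_filled.tolist())    # [0.0, 2.0, 2.0, 3.0]
```

Interpolation suits case counts, which change gradually, while forward filling suits NPIs, which stay at a level until a policy changes.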

In [18]:
# Set number of past days to use to make predictions
nb_lookback_days = 30

# Create training data across all countries for predicting one day ahead
X_cols = cases_col + npi_cols
y_col = cases_col
X_samples = []
y_samples = []
geo_ids = df.GeoID.unique()
for g in geo_ids:
    gdf = df[df.GeoID == g]
    all_case_data = np.array(gdf[cases_col])
    all_npi_data = np.array(gdf[npi_cols])

    # Create one sample for each day where we have enough data
    # Each sample consists of cases and npis for previous nb_lookback_days
    nb_total_days = len(gdf)
    for d in range(nb_lookback_days, nb_total_days - 1):
        X_cases = all_case_data[d-nb_lookback_days:d]

        # Take negative of npis to support positive
        # weight constraint in Lasso.
        X_npis = -all_npi_data[d - nb_lookback_days:d]

        # Flatten all input data so it fits Lasso input format.
        X_sample = np.concatenate([X_cases.flatten(),
                                   X_npis.flatten()])
        # Target: new cases at index d + 1; the input window ends at index d - 1
        y_sample = all_case_data[d + 1]
        X_samples.append(X_sample)
        y_samples.append(y_sample)

X_samples = np.array(X_samples)
y_samples = np.array(y_samples).flatten()
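
The layout of one flattened sample can be checked on toy arrays. The sketch below assumes a single hypothetical region with 40 days of data and 12 NPI columns, and builds only the first window.

```python
import numpy as np

nb_lookback_days = 30
nb_npis = 12
nb_total_days = 40

# Hypothetical data for one region: cases 0, 1, 2, ... and all NPIs set to 1
all_case_data = np.arange(nb_total_days, dtype=float).reshape(-1, 1)
all_npi_data = np.ones((nb_total_days, nb_npis))

# First window, built the same way as in the loop above
d = nb_lookback_days
X_cases = all_case_data[d - nb_lookback_days:d]
X_npis = -all_npi_data[d - nb_lookback_days:d]
X_sample = np.concatenate([X_cases.flatten(), X_npis.flatten()])

n_features = X_sample.shape[0]
print(n_features)  # 30 case values + 30 * 12 NPI values = 390
```

The first 30 entries are the case counts from oldest to newest; the remaining 360 are the negated NPIs, ordered day by day with all 12 NPIs of a day adjacent.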

In [19]:
# Helper function to compute the mean absolute error (MAE)
def mae(pred, true):
    return np.mean(np.abs(pred - true))
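
As a quick worked example of the metric, with made-up predictions and targets:

```python
import numpy as np

def mae(pred, true):
    return np.mean(np.abs(pred - true))

pred = np.array([10.0, 20.0, 30.0])
true = np.array([12.0, 20.0, 26.0])

# Absolute errors are 2, 0 and 4, so the mean is (2 + 0 + 4) / 3 = 2.0
error = mae(pred, true)
print(error)
```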

In [23]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_samples,
                                                    y_samples,
                                                    test_size=0.2,
                                                    random_state=42)

In [24]:
# Create and train Lasso model.
# Set positive=True to enforce the assumption that past cases are positively
# correlated with future cases and NPIs are negatively correlated
# (NPI inputs were negated above, so their weights must also be non-negative).
model = Lasso(alpha=0.1,
              precompute=True,
              max_iter=10000,
              positive=True,
              selection='random')
# Fit model
model.fit(X_train, y_train)

Lasso(alpha=0.1, max_iter=10000, positive=True, precompute=True,
      selection='random')

In [25]:
# Evaluate model
train_preds = model.predict(X_train)
train_preds = np.maximum(train_preds, 0) # Don't predict negative cases
print('Train MAE:', mae(train_preds, y_train))

test_preds = model.predict(X_test)
test_preds = np.maximum(test_preds, 0) # Don't predict negative cases
print('Test MAE:', mae(test_preds, y_test))

Train MAE: 259.84015150381117
Test MAE: 254.17028978654804


In [26]:
# Inspect the learned feature coefficients for the model
# to see what features it's paying attention to.

# Give names to the features, in the same order as the flattened inputs:
# first the case counts for each lookback day, then the NPIs for each day
x_col_names = []
for d in range(-nb_lookback_days, 0):
    x_col_names.append('Day ' + str(d) + ' ' + cases_col[0])
for d in range(-nb_lookback_days, 0):
    for col_name in npi_cols:
        x_col_names.append('Day ' + str(d) + ' ' + col_name)

# View non-zero coefficients
for (col, coeff) in zip(x_col_names, list(model.coef_)):
    if coeff != 0.:
        print(col, coeff)
print('Intercept', model.intercept_)

Day -6 NewCases 0.5247728165951634
Day -5 NewCases 0.1210331714268772
Day -3 NewCases 0.09711381862030334
Day -2 NewCases 0.07841775054352185
Day -1 NewCases 0.21972407921421103
Day -26 C4_Restrictions on gatherings 0.15560632624736404
Day -23 C6_Stay at home requirements 20.411046334500515
Day -22 C7_Restrictions on internal movement 2.4120275597048355
Day -17 C6_Stay at home requirements 5.475021870227211
Day -14 C6_Stay at home requirements 9.073259701124135
Day -10 C6_Stay at home requirements 1.2387309022515984
Intercept 44.05469448823396


In [27]:
# Save model to file
os.makedirs('models', exist_ok=True)
with open('models/model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
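
A saved model can be read back with `pickle.load`. The following is a minimal round-trip sketch on a tiny stand-in model written to a temporary directory, so it does not touch `models/model.pkl`; the toy data is made up.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Tiny stand-in model fitted on made-up data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 2.0, 4.0, 6.0])
model = Lasso(alpha=0.1, positive=True).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model.pkl')
    # Save the fitted model
    with open(model_path, 'wb') as model_file:
        pickle.dump(model, model_file)
    # Load it back from disk
    with open(model_path, 'rb') as model_file:
        restored = pickle.load(model_file)

# The restored model should reproduce the original predictions
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print(same_predictions)
```

Pickled scikit-learn models are generally only safe to load with the same library version that saved them, and only from trusted sources.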

## Evaluation

With the predictor trained and saved, this section evaluates it on sample evaluation data.
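
The `predict` module used below is not shown in this notebook. As a rough sketch of what a day-by-day rollout could look like, the hypothetical `rollout` function here predicts one day, appends the prediction to the case history, and slides the window forward. The function, its arguments, and the stand-in model are assumptions for illustration, not the actual `predict_df` implementation.

```python
import numpy as np

def rollout(model, past_cases, past_npis, future_npis, nb_lookback_days=30):
    """Predict one day at a time, feeding each prediction back in as input.

    past_cases: shape (n_past,), past_npis: shape (n_past, n_npis),
    future_npis: shape (n_future, n_npis). Hypothetical helper.
    """
    cases = list(past_cases[-nb_lookback_days:])
    npis = np.array(past_npis[-nb_lookback_days:], dtype=float)
    preds = []
    for day_npis in future_npis:
        # Build the input the same way as the training samples
        X_cases = np.array(cases[-nb_lookback_days:])
        X_npis = -npis[-nb_lookback_days:]
        X = np.concatenate([X_cases.flatten(), X_npis.flatten()]).reshape(1, -1)
        # Don't predict negative cases
        pred = max(float(model.predict(X)[0]), 0.0)
        preds.append(pred)
        # The prediction and that day's NPIs become part of the history
        cases.append(pred)
        npis = np.vstack([npis, day_npis])
    return np.array(preds)

class LastCasePlusOne:
    """Stand-in for a fitted model: predicts the most recent case count plus one."""
    def __init__(self, nb_lookback_days):
        self.nb_lookback_days = nb_lookback_days

    def predict(self, X):
        return X[:, self.nb_lookback_days - 1] + 1.0

preds = rollout(LastCasePlusOne(3),
                past_cases=np.array([1.0, 2.0, 3.0]),
                past_npis=np.zeros((3, 2)),
                future_npis=np.ones((3, 2)),
                nb_lookback_days=3)
print(preds)  # [4. 5. 6.]
```

Because each prediction becomes an input for the following days, errors can compound over a long horizon, which is worth keeping in mind when reading the rolled-out values below.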

In [49]:
# Reload the module to get the latest changes
import predict
from importlib import reload
reload(predict)
from predict import predict_df

In [50]:
%%time
preds_df = predict_df("2020-08-01", "2020-08-31",
                      path_to_ips_file="../../covid_xprize/validation/data/2020-09-30_historical_ip.csv",
                      verbose=True)


Ciao Predicting for Aruba__nan
2020-08-01: 142.56629738919912
2020-08-02: 161.10804722795518
2020-08-03: 174.4263128199457
2020-08-04: 197.16095044264176
2020-08-05: 210.12892853592734
2020-08-06: 224.3894928032965
2020-08-07: 300.9846869052062
2020-08-08: 331.53430485522495
2020-08-09: 355.37882124482644
2020-08-10: 383.95219509480125
2020-08-11: 402.3595895555415
2020-08-12: 427.71454483169816
2020-08-13: 481.39656783856697
2020-08-14: 515.8852783362389
2020-08-15: 537.0332388906891
2020-08-16: 566.820183596775
2020-08-17: 591.1012893517072
2020-08-18: 615.1538922787201
2020-08-19: 657.5807663728117
2020-08-20: 691.805496617225
2020-08-21: 719.6914373138824
2020-08-22: 751.1929101770512
2020-08-23: 776.866188792993
2020-08-24: 785.0317761001897
2020-08-25: 818.3052191929476
2020-08-26: 850.0850848784933
2020-08-27: 878.4497943579479
2020-08-28: 910.044055700633
2020-08-29: 936.7575821656437
2020-08-30: 956.7940372240646
2020-08-31: 987.6669992634872

Ciao Predicting for Afghanistan_


Ciao Predicting for Burundi__nan
2020-08-01: 127.89421563175962
2020-08-02: 153.5474992137036
2020-08-03: 168.03624523085261
2020-08-04: 187.84739595928207
2020-08-05: 191.58902334726457
2020-08-06: 212.4661534141367
2020-08-07: 287.3919228627161
2020-08-08: 321.07118462603665
2020-08-09: 346.37540398475244
2020-08-10: 372.70194421205713
2020-08-11: 385.7544085027397
2020-08-12: 413.1684604794124
2020-08-13: 466.16751848039837
2020-08-14: 501.96662161370307
2020-08-15: 514.969728485811
2020-08-16: 541.1762840294788
2020-08-17: 561.5983454168595
2020-08-18: 579.254154150491
2020-08-19: 619.4253751949907
2020-08-20: 651.9799563662277
2020-08-21: 673.9932810835197
2020-08-22: 701.5084111097739
2020-08-23: 720.4717088973526
2020-08-24: 702.2391207615078
2020-08-25: 727.4130896409574
2020-08-26: 753.1043582452374
2020-08-27: 773.3682172228093
2020-08-28: 799.0144470730545
2020-08-29: 816.4782399764488
2020-08-30: 817.7733948328228
2020-08-31: 838.2381564813963

Ciao Predicting for Belgium_

2020-08-23: 858.1778622049205
2020-08-24: 867.2704137552745
2020-08-25: 900.2576856180298
2020-08-26: 932.9635523415557
2020-08-27: 962.7849445945915
2020-08-28: 995.3495798026777
2020-08-29: 1025.6658826770063
2020-08-30: 1046.540897147188
2020-08-31: 1077.9367711744887

Ciao Predicting for Bermuda__nan
2020-08-01: 123.3465001844647
2020-08-02: 149.0007178810067
2020-08-03: 165.79991251749823
2020-08-04: 188.1321438854004
2020-08-05: 190.79259950178746
2020-08-06: 210.4963403438404
2020-08-07: 283.987477882798
2020-08-08: 317.43463632968565
2020-08-09: 343.97899910014416
2020-08-10: 371.6126420333276
2020-08-11: 384.31762850482335
2020-08-12: 411.08886553558085
2020-08-13: 463.26542289214683
2020-08-14: 498.8279468077635
2020-08-15: 512.4611430017189
2020-08-16: 539.3515917790652
2020-08-17: 559.6902072080978
2020-08-18: 577.0056189332686
2020-08-19: 616.7016584847884
2020-08-20: 649.0691452831531
2020-08-21: 671.3844684596437
2020-08-22: 699.2539254595947
2020-08-23: 720.627628065686

2020-08-01: 1894.6534036705912
2020-08-02: 2533.149921098141
2020-08-03: 6855.694783465215
2020-08-04: 4814.27319379324
2020-08-05: 5055.3871003233035
2020-08-06: 4849.839990717543
2020-08-07: 3352.19062253006
2020-08-08: 3888.654873502807
2020-08-09: 5890.39954086879
2020-08-10: 5184.757138749184
2020-08-11: 5337.955461123853
2020-08-12: 5221.548211765181
2020-08-13: 4418.4608681000755
2020-08-14: 4771.551881898764
2020-08-15: 5721.761954277928
2020-08-16: 5528.48383984678
2020-08-17: 5661.124513774663
2020-08-18: 5598.153345381976
2020-08-19: 5197.245788817032
2020-08-20: 5417.399434483228
2020-08-21: 5903.470205058065
2020-08-22: 5903.228768420358
2020-08-23: 6019.832821338494
2020-08-24: 5970.248032140009
2020-08-25: 5784.7339469738945
2020-08-26: 5925.768908265391
2020-08-27: 6192.442225291941
2020-08-28: 6258.066696462249
2020-08-29: 6362.283571250284
2020-08-30: 6367.752208092066
2020-08-31: 6303.216454823003

Ciao Predicting for Chile__nan
2020-08-01: 1642.0946510970884
...

2020-08-07: 1096.7898440769625
2020-08-08: 1308.1412321443406
2020-08-09: 1893.23568813582
2020-08-10: 1627.521119550831
2020-08-11: 1720.619722132666
2020-08-12: 1714.5466676665628
2020-08-13: 1524.1060397683361
2020-08-14: 1672.5538744935457
2020-08-15: 1955.4558641416925
2020-08-16: 1882.5910085607384
2020-08-17: 1951.3022938653748
2020-08-18: 1956.4479832714487
2020-08-19: 1873.919662731922
2020-08-20: 1975.004445535032
2020-08-21: 2130.8834480253367
2020-08-22: 2135.1248685252554
2020-08-23: 2194.7778345630145
2020-08-24: 2195.656242648344
2020-08-25: 2169.8649946343808
2020-08-26: 2241.97312273955
2020-08-27: 2338.1942444611686
2020-08-28: 2371.931982299379
2020-08-29: 2425.3036867969536
2020-08-30: 2446.3601416386623
2020-08-31: 2453.641370919157

Ciao Predicting for Cuba__nan
2020-08-01: 169.539637056759
2020-08-02: 199.39993669063097
2020-08-03: 209.5190058846132
2020-08-04: 246.3365839110475
2020-08-05: 257.4756231626314
2020-08-06: 264.2134199126757
2020-08-07: 337.012881709

2020-08-30: 2091.2138049978034
2020-08-31: 2142.501214842201

Ciao Predicting for Algeria__nan
2020-08-01: 1216.978053576839
2020-08-02: 1266.3571536137479
2020-08-03: 1265.319284266852
2020-08-04: 1335.8796680960186
2020-08-05: 1306.6840151402953
2020-08-06: 1353.1615136782666
2020-08-07: 1443.1716741191672
2020-08-08: 1489.545626071283
2020-08-09: 1519.3026030457781
2020-08-10: 1571.7132276955958
2020-08-11: 1580.370364707847
2020-08-12: 1624.5566261155777
2020-08-13: 1692.8817527170927
2020-08-14: 1740.1375137827667
2020-08-15: 1782.1288184181349
2020-08-16: 1830.2477954641918
2020-08-17: 1858.5937779744
2020-08-18: 1904.130746375743
2020-08-19: 1962.6068443832442
2020-08-20: 2011.66002401385
2020-08-21: 2059.305909300581
2020-08-22: 2107.982670029275
2020-08-23: 2147.564835979573
2020-08-24: 2195.6802976498298
2020-08-25: 2250.7072822433424
2020-08-26: 2301.9236251597226
2020-08-27: 2353.215231989354
2020-08-28: 2404.180379982252
2020-08-29: 2450.969848896021
2020-08-30: 2502.13809

2020-08-12: 15031.553878291434
2020-08-13: 15906.357679851159
2020-08-14: 15320.886763029132
2020-08-15: 14163.284048222169
2020-08-16: 14767.311403678432
2020-08-17: 15614.877094966709
2020-08-18: 15867.949875324663
2020-08-19: 16436.891510022648
2020-08-20: 16216.709875924622
2020-08-21: 15703.151565697543
2020-08-22: 16047.857085353273
2020-08-23: 16534.940030237038
2020-08-24: 16779.965738150866
2020-08-25: 17177.391227039087
2020-08-26: 17153.529098956173
2020-08-27: 16975.465971146557
2020-08-28: 17212.910786794982
2020-08-29: 17534.066605373457
2020-08-30: 17782.644268125383
2020-08-31: 18091.17623996458

Ciao Predicting for Gabon__nan
2020-08-01: 123.54529446509365
2020-08-02: 153.42081035556154
2020-08-03: 175.11076008776428
2020-08-04: 187.0303698189179
2020-08-05: 203.53760942933525
2020-08-06: 213.08905430204192
2020-08-07: 286.088896737014
2020-08-08: 322.78386581887924
2020-08-09: 350.3236590116057
2020-08-10: 374.5946558347166
2020-08-11: 394.23063685455804
2020-08-12: 4

2020-08-24: 834.1914877464469
2020-08-25: 870.7268080783329
2020-08-26: 929.9096517324472
2020-08-27: 968.4936215881567
2020-08-28: 1007.5131183189173
2020-08-29: 1042.8136929719453
2020-08-30: 1067.0818345988512
2020-08-31: 1105.3074838747025

Ciao Predicting for Gambia__nan
2020-08-01: 133.06005718084882
2020-08-02: 151.15371631600874
2020-08-03: 165.48518152986858
2020-08-04: 182.99890318772964
2020-08-05: 190.25284796470544
2020-08-06: 210.59553181030685
2020-08-07: 288.8264328218105
2020-08-08: 319.54497445010236
2020-08-09: 344.04532935568045
2020-08-10: 369.50352072870453
2020-08-11: 383.7931067544886
2020-08-12: 411.4523915609342
2020-08-13: 465.8941146019599
2020-08-14: 500.49857849689636
2020-08-15: 512.8491951771664
2020-08-16: 542.369044303228
2020-08-17: 564.0308224452849
2020-08-18: 582.4587838647083
2020-08-19: 623.83112825905
2020-08-20: 683.3444612052995
2020-08-21: 711.5090582345897
2020-08-22: 744.4952345083173
2020-08-23: 784.9301805149569
2020-08-24: 771.7536808025

2020-08-01: 5065.332972973032
2020-08-02: 4578.817776316815
2020-08-03: 4668.363512780632
2020-08-04: 4765.524288477867
2020-08-05: 6090.911345354978
2020-08-06: 6255.071805977817
2020-08-07: 5648.90329510728
2020-08-08: 5412.828023907434
2020-08-09: 5388.115325601708
2020-08-10: 5516.7086529643
2020-08-11: 6234.256847319104
2020-08-12: 6412.383968586171
2020-08-13: 6173.6057951171915
2020-08-14: 6077.915712927255
2020-08-15: 6048.986701670241
2020-08-16: 6166.267256295605
2020-08-17: 6578.584318295054
2020-08-18: 6734.668945277943
2020-08-19: 6675.801066489228
2020-08-20: 6661.430941812861
2020-08-21: 6667.828845674861
2020-08-22: 6773.840550484241
2020-08-23: 7029.0922220678
2020-08-24: 7148.4846418015695
2020-08-25: 7172.397970350956
2020-08-26: 7205.036561765242
2020-08-27: 7241.71070198328
2020-08-28: 7341.1766334932745
2020-08-29: 7517.476837375372
2020-08-30: 7633.123900122071
2020-08-31: 7698.518343345082

Ciao Predicting for Indonesia__nan
2020-08-01: 5193.709107626841
...

2020-08-14: 572.5843756992641
2020-08-15: 605.8631855793996
2020-08-16: 639.5007137701275
2020-08-17: 663.14927212064
2020-08-18: 694.9372588578732
2020-08-19: 740.8945346939884
2020-08-20: 777.2366887480475
2020-08-21: 813.4479255643357
2020-08-22: 849.2316831804433
2020-08-23: 879.7206868881724
2020-08-24: 914.9863745950279
2020-08-25: 957.1168070332353
2020-08-26: 995.554372362194
2020-08-27: 1034.0622847583757
2020-08-28: 1072.0975590207095
2020-08-29: 1107.4754656280547
2020-08-30: 1145.5767887694014
2020-08-31: 1187.1776996231115

Ciao Predicting for Jordan__nan
2020-08-01: 4705.15879431744
2020-08-02: 5403.033485825868
2020-08-03: 5332.406829165358
2020-08-04: 5188.829359862857
2020-08-05: 5446.866262491185
2020-08-06: 5436.640450222398
2020-08-07: 5370.4262474044135
2020-08-08: 5737.811955673384
2020-08-09: 5757.909419624746
2020-08-10: 5740.590034832532
2020-08-11: 5906.973134693928
2020-08-12: 5930.74475911365
2020-08-13: 5957.051837924268
2020-08-14: 6176.080844792716
...

2020-08-01: 1772.5941248011236
2020-08-02: 1567.9296052716722
2020-08-03: 1475.8669174330182
2020-08-04: 1562.562457227052
2020-08-05: 1816.6050720262097
2020-08-06: 1976.8486226839555
2020-08-07: 1970.2841242480436
2020-08-08: 1887.5337176210692
2020-08-09: 1846.5795907626398
2020-08-10: 1906.6974246446337
2020-08-11: 2061.368465495232
2020-08-12: 2179.387448535676
2020-08-13: 2209.825873942551
2020-08-14: 2192.4074356995243
2020-08-15: 2188.21301194982
2020-08-16: 2239.150004648532
2020-08-17: 2343.772969698571
2020-08-18: 2435.965395540556
2020-08-19: 2483.2383242752653
2020-08-20: 2501.366796010108
2020-08-21: 2519.4967374236153
2020-08-22: 2568.8859690994173
2020-08-23: 2644.1577815031633
2020-08-24: 2720.432142930716
2020-08-25: 2756.745654843591
2020-08-26: 2789.7234582758924
2020-08-27: 2822.8718362797376
2020-08-28: 2860.34639073628
2020-08-29: 2923.1148032339274
2020-08-30: 2987.486221698634
2020-08-31: 3036.9555719159666

Ciao Predicting for Liberia__nan
2020-08-01: 123.9150

2020-08-30: 811.7559573748033
2020-08-31: 832.0914535050813

Ciao Predicting for Morocco__nan
2020-08-01: 4286.030813408814
2020-08-02: 2669.3487284327807
2020-08-03: 5379.774273302402
2020-08-04: 4630.555430597067
2020-08-05: 4938.807103681353
2020-08-06: 4803.737423020537
2020-08-07: 4586.491736112674
2020-08-08: 4037.7618524790214
2020-08-09: 5218.716948236213
2020-08-10: 5058.212493138204
2020-08-11: 5205.20111232958
2020-08-12: 5242.4238807612055
2020-08-13: 5066.122786528322
2020-08-14: 4899.554956774211
2020-08-15: 5437.3834147552025
2020-08-16: 5458.936254216684
2020-08-17: 5571.312013281311
2020-08-18: 5637.169546703228
2020-08-19: 5568.013630297377
2020-08-20: 5546.580863732157
2020-08-21: 5827.690557240792
2020-08-22: 5916.922003862212
2020-08-23: 6023.433563058006
2020-08-24: 6066.501673190871
2020-08-25: 6054.097556688388
2020-08-26: 6087.869386615232
2020-08-27: 6256.8183671817715
2020-08-28: 6395.923988229841
2020-08-29: 6504.124193982652
2020-08-30: 6576.313679008661
...

2020-08-05: 191.31800721946763
2020-08-06: 210.74229205506447
2020-08-07: 283.8820326085622
2020-08-08: 318.2177162848212
2020-08-09: 345.4779940079251
2020-08-10: 371.35757288954574
2020-08-11: 385.9993982712893
2020-08-12: 412.9389993186108
2020-08-13: 465.0572264366448
2020-08-14: 501.3611536888113
2020-08-15: 524.4056828669088
2020-08-16: 552.7304392510902
2020-08-17: 575.2310007108281
2020-08-18: 599.6042045330722
2020-08-19: 641.2189612612691
2020-08-20: 676.2995992348804
2020-08-21: 705.1593481980801
2020-08-22: 735.8802191191811
2020-08-23: 758.2338595524068
2020-08-24: 765.7733491020114
2020-08-25: 798.2505137565679
2020-08-26: 830.0509581900076
2020-08-27: 859.180312564824
2020-08-28: 890.055443885346
2020-08-29: 914.8550859054055
2020-08-30: 933.4415190000085
2020-08-31: 963.3605817553935

Ciao Predicting for Malaysia__nan
2020-08-01: 1294.6748113957424
2020-08-02: 1376.9114197947772
2020-08-03: 1870.1017524288231
2020-08-04: 2031.9582818303415
2020-08-05: 1470.7737946520342

2020-08-23: 723.7772573669387
2020-08-24: 706.0685683775771
2020-08-25: 731.5048954451117
2020-08-26: 775.3348300726674
2020-08-27: 799.9356402211984
2020-08-28: 829.3660316070213
2020-08-29: 861.1616346167998
2020-08-30: 866.6304386512327
2020-08-31: 891.8366729287421

Ciao Predicting for Oman__nan
2020-08-01: 230.66298435369876
2020-08-02: 584.0692209560151
2020-08-03: 439.4667521330508
2020-08-04: 403.5294094550295
2020-08-05: 301.590000657519
2020-08-06: 290.2483069483177
2020-08-07: 440.0919336347198
2020-08-08: 630.1833619565507
2020-08-09: 602.366935555571
2020-08-10: 594.5164353893863
2020-08-11: 555.441673440356
2020-08-12: 555.7232662019704
2020-08-13: 653.599822383859
2020-08-14: 767.7211450399399
2020-08-15: 794.0247378330309
2020-08-16: 809.4095026301568
2020-08-17: 805.4640410955221
2020-08-18: 825.827113280259
2020-08-19: 896.6614712803448
2020-08-20: 976.5105280271439
2020-08-21: 1017.2529725628749
2020-08-22: 1046.9416217143987
2020-08-23: 1064.8084467225933
...


Ciao Predicting for Paraguay__nan
2020-08-01: 913.2929236297305
2020-08-02: 865.3445581183647
2020-08-03: 857.4435952842423
2020-08-04: 1015.9431046696163
2020-08-05: 1091.5355283526653
2020-08-06: 970.3844886303875
2020-08-07: 1103.2230957735017
2020-08-08: 1104.133335914042
2020-08-09: 1118.022290021273
2020-08-10: 1226.3712958011963
2020-08-11: 1276.3612732945235
2020-08-12: 1249.6916753080359
2020-08-13: 1328.0942690012312
2020-08-14: 1350.243247879324
2020-08-15: 1379.0704497584156
2020-08-16: 1457.6643869653512
2020-08-17: 1502.3503789588967
2020-08-18: 1516.6254813301464
2020-08-19: 1574.7231084680939
2020-08-20: 1608.0598345309245
2020-08-21: 1645.9671163033665
2020-08-22: 1709.2049815035293
2020-08-23: 1754.48767853342
2020-08-24: 1787.6005952106846
2020-08-25: 1839.0914432593547
2020-08-26: 1879.4816846657168
2020-08-27: 1923.3121082605887
2020-08-28: 1979.7767135483864
2020-08-29: 2027.3137705391332
2020-08-30: 2070.0520329432184
2020-08-31: 2120.5634571817145

...

2020-08-06: 235.50810632854362
2020-08-07: 304.2036615975869
2020-08-08: 337.75071646019086
2020-08-09: 362.17214046108745
2020-08-10: 386.0448040776809
2020-08-11: 403.7865060881087
2020-08-12: 433.83751356212156
2020-08-13: 484.2599398638819
2020-08-14: 519.9788427598276
2020-08-15: 532.2580913897905
2020-08-16: 557.3289077542402
2020-08-17: 580.2168115825036
2020-08-18: 599.3270111836916
2020-08-19: 638.5390136079332
2020-08-20: 671.106651023703
2020-08-21: 692.6715429534171
2020-08-22: 719.6984556394164
2020-08-23: 739.9906197949194
2020-08-24: 722.6153091924664
2020-08-25: 747.5326450312878
2020-08-26: 773.3163785599702
2020-08-27: 793.6804800395072
2020-08-28: 819.2356975801672
2020-08-29: 837.4974452664844
2020-08-30: 839.389342789357
2020-08-31: 859.915475716526

Ciao Predicting for Singapore__nan
2020-08-01: 129.21354172773215
2020-08-02: 158.10733782006167
2020-08-03: 171.90234096592727
2020-08-04: 194.75275491570733
2020-08-05: 197.6467337434879
2020-08-06: 215.3985873272734

2020-08-03: 1014.0183106695806
2020-08-04: 1393.414228739383
2020-08-05: 1831.0855066187578
2020-08-06: 1985.1780521476514
2020-08-07: 1998.1137490880255
2020-08-08: 1756.8685294634438
2020-08-09: 1558.0224011430216
2020-08-10: 1748.7388535009559
2020-08-11: 1997.4733767071543
2020-08-12: 2130.2004166927395
2020-08-13: 2174.979776190227
2020-08-14: 2068.716684077751
2020-08-15: 1962.3567051807893
2020-08-16: 2065.1905767612334
2020-08-17: 2215.719020470808
2020-08-18: 2310.650008115969
2020-08-19: 2363.9369554312307
2020-08-20: 2329.0709796450587
2020-08-21: 2291.439322385674
2020-08-22: 2357.794884052576
2020-08-23: 2454.1088471053886
2020-08-24: 2492.264841226074
2020-08-25: 2538.3890000155407
2020-08-26: 2538.017717161495
2020-08-27: 2533.386111419516
2020-08-28: 2583.2973935307077
2020-08-29: 2649.025937881594
2020-08-30: 2692.538001039525
2020-08-31: 2736.259776023871

Ciao Predicting for Slovenia__nan
2020-08-01: 1829.1450857468985
2020-08-02: 1472.6413861408068
2020-08-03: 1164.


Ciao Predicting for Turkmenistan__nan
2020-08-01: 121.74168620259263
2020-08-02: 148.49126610544272
2020-08-03: 163.91550209834605
2020-08-04: 181.2250200658436
2020-08-05: 188.83562570416444
2020-08-06: 208.09792131483783
2020-08-07: 281.73242549973406
2020-08-08: 316.06559658521303
2020-08-09: 341.44355387248766
2020-08-10: 366.86766033352234
2020-08-11: 381.62601026786524
2020-08-12: 408.34757338675877
2020-08-13: 460.642131646561
2020-08-14: 496.74987944554414
2020-08-15: 509.6277787000548
2020-08-16: 535.4955057157796
2020-08-17: 556.6746617705826
2020-08-18: 574.0094257057426
2020-08-19: 613.8042241353328
2020-08-20: 646.4712589928947
2020-08-21: 668.3418813463734
2020-08-22: 695.7117443224271
2020-08-23: 719.8253546000228
2020-08-24: 702.4851290409049
2020-08-25: 728.0610142183882
2020-08-26: 754.4524780081513
2020-08-27: 774.8958963843593
2020-08-28: 801.2226455714028
2020-08-29: 821.7287936962897
2020-08-30: 824.2801625738249
2020-08-31: 845.6212687674308

...

2020-08-30: 213954.6095977409
2020-08-31: 216496.01485433543

Ciao Predicting for United States__Alaska
2020-08-01: 580.2314674424108
2020-08-02: 637.3151151656756
2020-08-03: 699.194127884493
2020-08-04: 806.1998414620042
2020-08-05: 708.9525437072894
2020-08-06: 478.86467551322545
2020-08-07: 742.4733660267259
2020-08-08: 810.3528730874561
2020-08-09: 869.0181889768152
2020-08-10: 957.2149171472836
2020-08-11: 907.6665686323072
2020-08-12: 820.5545424252373
2020-08-13: 952.6439307213504
2020-08-14: 1012.745960811816
2020-08-15: 1060.2376282950045
2020-08-16: 1128.4997137993075
2020-08-17: 1116.5143938553865
2020-08-18: 1088.6441391963076
2020-08-19: 1164.800962764164
2020-08-20: 1215.4729665794118
2020-08-21: 1263.056598703276
2020-08-22: 1319.252798556126
2020-08-23: 1330.590014590988
2020-08-24: 1316.2897852570723
2020-08-25: 1365.5921712036247
2020-08-26: 1408.7551891524508
2020-08-27: 1452.4887767957548
2020-08-28: 1501.133206102568
2020-08-29: 1523.6614342858284
2020-08-30: 1535

2020-08-14: 913.4863305749511
2020-08-15: 933.4319361808083
2020-08-16: 983.8249393688965
2020-08-17: 1007.2106861321641
2020-08-18: 984.9802705468364
2020-08-19: 1055.6763216001023
2020-08-20: 1105.3152166364114
2020-08-21: 1136.1731585164362
2020-08-22: 1182.9868546664884
2020-08-23: 1207.6829429223817
2020-08-24: 1196.2566286230876
2020-08-25: 1243.3361794457246
2020-08-26: 1284.9669733015285
2020-08-27: 1318.4002934659202
2020-08-28: 1361.138668188285
2020-08-29: 1388.7708820808573
2020-08-30: 1401.1425976402606
2020-08-31: 1439.9230797753748

Ciao Predicting for United States__Florida
2020-08-01: 5158.106582182154
2020-08-02: 5612.877388882242
2020-08-03: 7262.718407298613
2020-08-04: 7054.105558598443
2020-08-05: 2786.3125317587824
2020-08-06: 2616.741047944244
2020-08-07: 4986.4286629385315
2020-08-08: 5517.683847880247
2020-08-09: 6644.31388564571
2020-08-10: 6537.632005883599
2020-08-11: 4393.985899202404
2020-08-12: 4221.8451265139665
2020-08-13: 5313.4086074128645
...


Ciao Predicting for United States__Kentucky
2020-08-01: 1850.6591369269145
2020-08-02: 1962.604150722556
2020-08-03: 2465.1860615498363
2020-08-04: 2729.3050697075414
2020-08-05: 1105.3460586734673
2020-08-06: 1042.0335718902156
2020-08-07: 1911.149354749054
2020-08-08: 2059.0161189915502
2020-08-09: 2449.22008101297
2020-08-10: 2573.005385257036
2020-08-11: 1784.0516341058233
2020-08-12: 1730.2678270150295
2020-08-13: 2142.5886354044787
2020-08-14: 2277.1736100910084
2020-08-15: 2544.5325895838855
2020-08-16: 2623.3432824940787
2020-08-17: 2254.1645331076215
2020-08-18: 2221.3966743111973
2020-08-19: 2425.564225242639
2020-08-20: 2534.9886539282556
2020-08-21: 2721.7014167627917
2020-08-22: 2787.809849150143
2020-08-23: 2627.490684990689
2020-08-24: 2602.6854865119426
2020-08-25: 2711.4689276012336
2020-08-26: 2797.8782984097083
2020-08-27: 2929.124845049872
2020-08-28: 2990.5912468978236
2020-08-29: 2935.6471071944957
2020-08-30: 2941.2897779837117
2020-08-31: 3011.735242755694

...

2020-08-13: 1346.5269059712891
2020-08-14: 1407.4975667579733
2020-08-15: 1486.6041003983419
2020-08-16: 1580.7461946795447
2020-08-17: 1562.3569683880683
2020-08-18: 1501.3666249621215
2020-08-19: 1604.6315150252294
2020-08-20: 1662.322984424302
2020-08-21: 1730.0812593615021
2020-08-22: 1806.6993725884518
2020-08-23: 1815.0062590378732
2020-08-24: 1809.9123697631767
2020-08-25: 1878.0583932524644
2020-08-26: 1931.914891437396
2020-08-27: 1993.4286657612283
2020-08-28: 2058.9984610801403
2020-08-29: 2087.2024046124106
2020-08-30: 2110.089958231024
2020-08-31: 2165.97793135667

Ciao Predicting for United States__North Carolina
2020-08-01: 3192.388826023494
2020-08-02: 2467.8146562613606
2020-08-03: 3050.907389313005
2020-08-04: 3505.9881501043333
2020-08-05: 1370.9959037234291
2020-08-06: 1380.5844137925387
2020-08-07: 2847.045947609506
2020-08-08: 2653.0142522694223
2020-08-09: 3087.3796918342837
2020-08-10: 3290.427518478648
2020-08-11: 2229.7967193797344
2020-08-12: 2237.37831364183


Ciao Predicting for United States__Oklahoma
2020-08-01: 2700.488230153544
2020-08-02: 2906.0455950390897
2020-08-03: 2859.5103130744405
2020-08-04: 3198.6373909888357
2020-08-05: 1331.0128929040181
2020-08-06: 1269.5738383071516
2020-08-07: 2584.5752866441026
2020-08-08: 2789.5613144138842
2020-08-09: 2948.379729722182
2020-08-10: 3078.9754709409362
2020-08-11: 2151.2781959917415
2020-08-12: 2100.022503492166
2020-08-13: 2743.5822342013166
2020-08-14: 2917.669614089554
2020-08-15: 3091.486417898198
2020-08-16: 3162.079190398806
2020-08-17: 2715.092784391918
2020-08-18: 2684.8143071137374
2020-08-19: 3008.7582620204503
2020-08-20: 3146.5475460824714
2020-08-21: 3299.0440681858445
2020-08-22: 3357.760772026168
2020-08-23: 3155.3588805306113
2020-08-24: 3133.207858801738
2020-08-25: 3304.8451039126408
2020-08-26: 3411.930094073771
2020-08-27: 3533.7445732796737
2020-08-28: 3591.891447118713
2020-08-29: 3515.7235460921006
2020-08-30: 3524.526786507496
2020-08-31: 3629.166416419985

Ciao P

2020-08-01: 128.9268323326104
2020-08-02: 153.3387985726422
2020-08-03: 172.89088413533722
2020-08-04: 184.27503718931266
2020-08-05: 190.68037956659148
2020-08-06: 210.48370828704807
2020-08-07: 287.05478220371685
2020-08-08: 321.2314578699304
2020-08-09: 348.3068678671522
2020-08-10: 371.12150777352633
2020-08-11: 386.0961320864527
2020-08-12: 413.46477532793824
2020-08-13: 467.1871479928014
2020-08-14: 503.80369033851434
2020-08-15: 526.6163951497285
2020-08-16: 553.5023964618102
2020-08-17: 575.9255810581491
2020-08-18: 600.5657505381087
2020-08-19: 642.973024103992
2020-08-20: 678.3772110585112
2020-08-21: 707.1003327505147
2020-08-22: 737.1291349712099
2020-08-23: 764.1671798653423
2020-08-24: 772.9044241514536
2020-08-25: 806.3999468412129
2020-08-26: 839.1262479716295
2020-08-27: 869.7384805939415
2020-08-28: 901.3147256959654
2020-08-29: 929.077832929299
2020-08-30: 949.2662026532712
2020-08-31: 989.5574512598334

Ciao Predicting for United States__Vermont
2020-08-01: 181.8980

2020-08-23: 805.2033203264281
2020-08-24: 840.1327627954686
2020-08-25: 880.7146749531314
2020-08-26: 919.2090345375991
2020-08-27: 956.6708195219038
2020-08-28: 993.2825445317307
2020-08-29: 1028.5103988713768
2020-08-30: 1066.001652277381
2020-08-31: 1106.5127556947214

Ciao Predicting for Vanuatu__nan
2020-08-01: 121.74168620259263
2020-08-02: 148.49126610544272
2020-08-03: 163.91550209834605
2020-08-04: 181.2250200658436
2020-08-05: 188.83562570416444
2020-08-06: 208.09792131483783
2020-08-07: 281.73242549973406
2020-08-08: 316.06559658521303
2020-08-09: 341.44355387248766
2020-08-10: 366.86766033352234
2020-08-11: 382.8647411701168
2020-08-12: 409.8584832959018
2020-08-13: 462.309984328258
2020-08-14: 498.5938577870561
2020-08-15: 520.6224552330151
2020-08-16: 548.6797912026702
2020-08-17: 571.7577334651503
2020-08-18: 596.2069192003539
2020-08-19: 638.0301435870451
2020-08-20: 673.0851317850904
2020-08-21: 701.3974712361086
2020-08-22: 731.9458440643966
2020-08-23: 756.9403825844

In [64]:
# Check the predictions
preds_df[preds_df.CountryName == 'Italy']

Unnamed: 0,CountryName,RegionName,Date,PredictedDailyNewCases
24599,Italy,,2020-08-01,32449.797234
24600,Italy,,2020-08-02,29681.937952
24601,Italy,,2020-08-03,26847.319969
24602,Italy,,2020-08-04,26817.506633
24603,Italy,,2020-08-05,28078.536276
24604,Italy,,2020-08-06,30147.924848
24605,Italy,,2020-08-07,32173.443419
24606,Italy,,2020-08-08,31107.658707
24607,Italy,,2020-08-09,29742.144114
24608,Italy,,2020-08-10,29692.218269


# Validation
This is how the predictor is going to be called during the competition.  
!!! PLEASE DO NOT CHANGE THE API !!!

In [52]:
!python predict.py -s 2020-08-01 -e 2020-08-04 -ip ../../covid_xprize/validation/data/2020-09-30_historical_ip.csv -o predictions/2020-08-01_2020-08-04.csv

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/2020-08-01_2020-08-04.csv
Done!
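
For orientation, a command-line interface with these four flags could be parsed roughly as follows. This is a sketch, not the actual contents of `predict.py`: only the short flags `-s`, `-e`, `-ip` and `-o` come from the call above, while the function name, the long option names and the attribute names are assumptions made for illustration.

```python
import argparse


def parse_predict_args(argv=None):
    """Parse the four flags used in the call above (hypothetical re-implementation)."""
    parser = argparse.ArgumentParser(description="Sketch of a predictor CLI")
    # Short flags are taken from the notebook; long names and dest names are assumed.
    parser.add_argument("-s", "--start_date", dest="start_date", required=True,
                        help="First day to predict, formatted YYYY-MM-DD")
    parser.add_argument("-e", "--end_date", dest="end_date", required=True,
                        help="Last day to predict, formatted YYYY-MM-DD")
    parser.add_argument("-ip", "--interventions_plan", dest="ip_file", required=True,
                        help="Path to the intervention plan CSV file")
    parser.add_argument("-o", "--output_file", dest="output_file", required=True,
                        help="Path of the CSV file to write predictions to")
    return parser.parse_args(argv)


args = parse_predict_args(["-s", "2020-08-01", "-e", "2020-08-04",
                           "-ip", "historical_ip.csv",
                           "-o", "predictions/2020-08-01_2020-08-04.csv"])
print(args.start_date, args.end_date, args.ip_file, args.output_file)
```

Keeping all four arguments required mirrors the call shown above, where every flag is supplied.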


In [53]:
!head predictions/2020-08-01_2020-08-04.csv

CountryName,RegionName,Date,PredictedDailyNewCases
Aruba,,2020-08-01,142.56629738919912
Aruba,,2020-08-02,161.10804722795518
Aruba,,2020-08-03,174.4263128199457
Aruba,,2020-08-04,197.16095044264176
Afghanistan,,2020-08-01,266.70666571128135
Afghanistan,,2020-08-02,361.78323642335
Afghanistan,,2020-08-03,373.63396107727465
Afghanistan,,2020-08-04,379.39879996232133
Angola,,2020-08-01,317.3563234888792
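
The header row above shows the four columns of the prediction file. A lightweight sanity check of that layout might look like the sketch below. It is not a substitute for `validate_submission`, and `quick_check` is a hypothetical helper, not part of `covid_xprize`. The sample rows are copied from the output above.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["CountryName", "RegionName", "Date", "PredictedDailyNewCases"]


def quick_check(pred_df, start_date, end_date):
    """Return a list of problems found in a predictions frame (hypothetical helper)."""
    if list(pred_df.columns) != EXPECTED_COLUMNS:
        return [f"Unexpected columns: {list(pred_df.columns)}"]
    problems = []
    if pred_df["PredictedDailyNewCases"].isna().any():
        problems.append("Some predictions are missing")
    if (pred_df["PredictedDailyNewCases"] < 0).any():
        problems.append("Some predictions are negative")
    expected_days = list(pd.date_range(start_date, end_date))
    # Identify each country/region pair; national rows have an empty RegionName
    geo_id = pred_df["CountryName"] + "__" + pred_df["RegionName"].fillna("")
    for name, group in pred_df.groupby(geo_id):
        # Every pair should cover each requested day exactly once
        if sorted(group["Date"]) != expected_days:
            problems.append(f"{name}: dates do not match the requested range")
    return problems


sample = io.StringIO(
    "CountryName,RegionName,Date,PredictedDailyNewCases\n"
    "Aruba,,2020-08-01,142.56629738919912\n"
    "Aruba,,2020-08-02,161.10804722795518\n"
    "Aruba,,2020-08-03,174.4263128199457\n"
    "Aruba,,2020-08-04,197.16095044264176\n"
    "Afghanistan,,2020-08-01,266.70666571128135\n"
    "Afghanistan,,2020-08-02,361.78323642335\n"
    "Afghanistan,,2020-08-03,373.63396107727465\n"
    "Afghanistan,,2020-08-04,379.39879996232133\n"
)
sample_df = pd.read_csv(sample, parse_dates=["Date"], dtype={"RegionName": str})
problems = quick_check(sample_df, "2020-08-01", "2020-08-04")
print(problems or "No problems found")
```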


# Test cases
We can generate a prediction file. Let's validate a few cases...

In [54]:
import os
from covid_xprize.validation.predictor_validation import validate_submission

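def validate(start_date, end_date, ip_file, output_file):
    """Run predict.py for the given dates and IP file, then validate its output."""
    # First, delete any stale output file left over from a previous run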
    try:
        os.remove(output_file)
    except OSError:
        pass
    
    # Then generate the prediction, calling the official API
    !python predict.py -s {start_date} -e {end_date} -ip {ip_file} -o {output_file}
    
    # And validate it
    errors = validate_submission(start_date, end_date, ip_file, output_file)
    if errors:
        for error in errors:
            print(error)
    else:
        print("All good!")
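
The `!python ...` line in `validate` only works inside IPython or Jupyter. Outside a notebook, the same call could be made with `subprocess`, as in the sketch below. `build_predict_command` and `run_predictor` are hypothetical helper names; the flags are the ones shown above, and `predict.py` is assumed to sit in the current working directory.

```python
import subprocess
import sys


def build_predict_command(start_date, end_date, ip_file, output_file,
                          script="predict.py"):
    """Assemble the argument list for the predictor CLI (hypothetical helper)."""
    return [sys.executable, script,
            "-s", start_date, "-e", end_date,
            "-ip", ip_file, "-o", output_file]


def run_predictor(start_date, end_date, ip_file, output_file):
    """Run predict.py as a subprocess and raise if it exits with an error."""
    cmd = build_predict_command(start_date, end_date, ip_file, output_file)
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(cmd, check=True)


# Show the arguments that would be passed to the Python interpreter
command = build_predict_command("2020-08-01", "2020-08-04",
                                "historical_ip.csv", "predictions/val_4_days.csv")
print(command[1:])
```

Passing the arguments as a list avoids shell quoting issues when a path contains spaces.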

## 4 days, no gap
- All countries and regions
- Official number of cases is known up to start_date
- Intervention Plans are the official ones

In [56]:
validate(start_date="2020-08-01",
         end_date="2020-08-04",
         ip_file="../../covid_xprize/validation/data/2020-09-30_historical_ip.csv",
         output_file="predictions/val_4_days.csv")

Generating predictions from 2020-08-01 to 2020-08-04...
Saved predictions to predictions/val_4_days.csv
Done!
Missing countries / regions: {'British Virgin Islands', 'Turks and Caicos Islands', 'Montserrat', 'Pitcairn Islands', 'Anguilla', 'Cayman Islands', 'Gibraltar', 'Falkland Islands'}


## 1 month in the future
- 2 countries only
- There is a gap between the date of the last known number of cases and start_date
- For future dates, the Intervention Plans contain scenarios for which predictions are requested, to answer the question: what will happen if we apply these plans?
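
The size of the gap is simple date arithmetic. The sketch below counts the days a predictor has to bridge before the requested window starts. `rollout_horizon` is a hypothetical helper, and the last known date used in the example is an assumption chosen for illustration, not a value taken from the data.

```python
from datetime import date


def rollout_horizon(last_known_date, start_date, end_date):
    """Return (gap_days, requested_days) for a prediction request (hypothetical helper)."""
    # Days strictly between the last known cases and the first requested day
    gap_days = (start_date - last_known_date).days - 1
    # The requested window includes both start_date and end_date
    requested_days = (end_date - start_date).days + 1
    return gap_days, requested_days


# Assumed last known date, for illustration only
gap_days, requested_days = rollout_horizon(last_known_date=date(2020, 12, 29),
                                           start_date=date(2021, 1, 1),
                                           end_date=date(2021, 1, 31))
print(f"Gap: {gap_days} days, requested: {requested_days} days")
```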

In [58]:
%%time
validate(start_date="2021-01-01",
         end_date="2021-01-31",
         ip_file="../../covid_xprize/validation/data/future_ip.csv",
         output_file="predictions/val_1_month_future.csv")

Generating predictions from 2021-01-01 to 2021-01-31...
Saved predictions to predictions/val_1_month_future.csv
Done!
All good!
CPU times: user 67.5 ms, sys: 29 ms, total: 96.6 ms
Wall time: 2.11 s


## 180 days, from a future date, all countries and regions
- Prediction start date is 1 week from now (i.e. assuming the submission date is 1 week from now).  
- Prediction end date is 180 days (about 6 months) after the start date.  
- Prediction is requested for all available countries and regions.  
- Intervention plan scenario: freeze last known intervention plans for each country and region.  

The number of cases between today and the start date is not known yet, but the model relies on prior cases as input. It therefore has to predict the cases for this gap first and feed those predictions back in before it can predict the requested period.  
This is the most demanding test. It should take less than 1 hour to generate the prediction file.
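
The idea of predicting the unknown days and then reusing them as inputs can be illustrated with a toy rollout. This is a simplified sketch: it uses only past cases (the NPI inputs are left out), the weights and case numbers are invented for illustration rather than taken from the trained model, and `roll_out` is a hypothetical function.

```python
import numpy as np


def roll_out(past_cases, weights, n_days):
    """Predict n_days ahead, feeding each prediction back into the lookback window."""
    window = list(past_cases[-len(weights):])
    preds = []
    for _ in range(n_days):
        # Linear prediction from the current window, clipped at zero cases
        next_value = max(float(np.dot(weights, window)), 0.0)
        preds.append(next_value)
        # Slide the window: drop the oldest day, append the new prediction
        window = window[1:] + [next_value]
    return preds


# Toy inputs, not real data: 3-day lookback that averages the last two days
weights = np.array([0.0, 0.5, 0.5])
past_cases = [10.0, 12.0, 15.0]

gap_days = 2          # days between the last known cases and start_date
requested_days = 3    # days from start_date to end_date inclusive
all_preds = roll_out(past_cases, weights, gap_days + requested_days)
requested_preds = all_preds[gap_days:]  # only these would be reported
print(requested_preds)
```

The gap predictions are computed but discarded from the output, which is why a distant start date increases the running time without adding rows to the prediction file.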

### Generate the scenario

In [59]:
from datetime import datetime, timedelta

start_date = datetime.now() + timedelta(days=7)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date = start_date + timedelta(days=180)
end_date_str = end_date.strftime('%Y-%m-%d')
print(f"Start date: {start_date_str}")
print(f"End date: {end_date_str}")

Start date: 2020-12-04
End date: 2021-06-02


In [60]:
from covid_xprize.validation.scenario_generator import get_raw_data, generate_scenario, NPI_COLUMNS
DATA_FILE = 'data/OxCGRT_latest.csv'
latest_df = get_raw_data(DATA_FILE, latest=True)
scenario_df = generate_scenario(start_date_str, end_date_str, latest_df, countries=None, scenario="Freeze")
scenario_file = "predictions/180_days_future_scenario.csv"
scenario_df.to_csv(scenario_file, index=False)
print(f"Saved scenario to {scenario_file}")

Saved scenario to predictions/180_days_future_scenario.csv
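
To make the "freeze" idea concrete, the sketch below carries the last known NPI values of each country and region forward to an end date. It is an illustration of the concept only, not the implementation of `generate_scenario`. `freeze_last_known` is a hypothetical function, and the sample values are made up.

```python
import pandas as pd


def freeze_last_known(ip_df, npi_columns, end_date):
    """Extend each country/region to end_date by repeating its last known NPI values."""
    # Use an empty string for national rows so they are kept by groupby
    ip_df = ip_df.assign(RegionName=ip_df["RegionName"].fillna(""))
    frames = []
    for (country, region), group in ip_df.groupby(["CountryName", "RegionName"]):
        group = group.sort_values("Date")
        last_row = group.iloc[-1]
        # One new row per day after the last known date, up to end_date inclusive
        future = pd.DataFrame(
            {"Date": pd.date_range(last_row["Date"] + pd.Timedelta(days=1), end_date)})
        future["CountryName"] = country
        future["RegionName"] = region
        for col in npi_columns:
            future[col] = last_row[col]
        frames.append(pd.concat([group, future], ignore_index=True))
    return pd.concat(frames, ignore_index=True)


# Made-up sample values, for illustration only
sample_ip = pd.DataFrame({
    "CountryName": ["Italy", "Italy"],
    "RegionName": [None, None],
    "Date": pd.to_datetime(["2020-11-26", "2020-11-27"]),
    "C1_School closing": [2.0, 3.0],
})
frozen = freeze_last_known(sample_ip, ["C1_School closing"], "2020-11-30")
print(frozen)
```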


### Check it

In [61]:
%%time
validate(start_date=start_date_str,
         end_date=end_date_str,
         ip_file=scenario_file,
         output_file="predictions/val_6_month_future.csv")

Generating predictions from 2020-12-04 to 2021-06-02...
Saved predictions to predictions/val_6_month_future.csv
Done!
All good!
CPU times: user 6.41 s, sys: 859 ms, total: 7.27 s
Wall time: 2min 39s
