`task6-create-deploy-models` creates pickle files for use with numerai-compute. This notebook instead creates pickle files that work with Numerai pickle model uploads.

In [1]:
%load_ext autoreload
%autoreload 2

import pickle
import cloudpickle
import pandas as pd

# Need to bring all dependencies into local variables so
# we can cloudpickle it correctly
from deploy_model import *
import predict as predict_script
import numerapi


def unpickle(fl):
    with open(fl, "rb") as infile:
        return pickle.load(infile)

In [2]:
#!aws s3 cp --recursive s3://numerai-v1/experiments/train_on_all_data_2023-05-01_16h-10m/models/ ./

## 1. Download and build ensemble model

In [3]:
argn_mdls = {
    tgt: unpickle(fl=f"./model_{tgt}_v4_20.pkl.pkl")
    for tgt in ["cyrus", "tyler", "ben", "waldo", "victor", "nomi"]
}
argn_mdls["tyler"]

In [4]:
argentina_no_ntr = EnsembleNeutralModel(
    models=argn_mdls,
    neutralisation_cols=None,
    neutralisation_prop=None,
    ensembling_fn=argentina_ensemble,
)
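`EnsembleNeutralModel` and `argentina_ensemble` come from `deploy_model` and are not shown in this notebook. The log output further down mentions ensembling the `pred_*` columns and then "taking the rank percent", so the following is only a rough sketch of what such a step could look like. The function name `mean_then_rank` and the choice of a plain mean are assumptions made for illustration, not the actual implementation.

```python
import pandas as pd


def mean_then_rank(preds: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical ensembling step: average per-model columns, then rank into (0, 1]."""
    ensembled = preds.mean(axis=1)          # one value per row (id)
    ranked = ensembled.rank(pct=True)       # percentile rank across all rows
    return ranked.to_frame("prediction")


# Toy per-model predictions; the real frame would have one pred_* column per target.
toy_preds = pd.DataFrame(
    {"pred_cyrus": [0.1, 0.9, 0.5], "pred_tyler": [0.3, 0.7, 0.4]},
    index=pd.Index(["a", "b", "c"], name="id"),
)
ensemble_df = mean_then_rank(toy_preds)
```

Ranking after averaging keeps the output on a fixed scale regardless of how the individual models are calibrated, which may be why the log reports that step.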

## 2. Sanity-check that it works with both int8 and float32 features

### 2.1 Script using int8 data

In [5]:
# Loads int8 data
pred_arg_nontr_df = predict_script.predict(napi=numerapi.NumerAPI(), wrapped_model=argentina_no_ntr)
pred_arg_nontr_df

[2023-06-09 03:21:16,204] INFO - reading prediction data
[2023-06-09 03:21:17,620] INFO - target file already exists
[2023-06-09 03:21:17,623] INFO - download complete
[2023-06-09 03:21:17,627] INFO - Downloaded live data to v4.1/live_502.parquet...
[2023-06-09 03:21:18,101] INFO - generating predictions


Predicting for each model: 100%|█| 6/6 [00:21<00:00,  3.6

[2023-06-09 03:21:40,411] INFO - Ensembling predictions with argentina_ensemble(): ['pred_cyrus', 'pred_tyler', 'pred_ben', 'pred_waldo', 'pred_victor', 'pred_nomi']
[2023-06-09 03:21:40,416] INFO - Taking the rank percent

id,prediction
n000124edbee5931,0.840941
n0006fd05e5c5171,0.854737
n0008241720e02e0,0.102658
n002cecd72ff9453,0.095354
n0033b5ad4d2f9a0,0.730980
...,...
nffcf4bdcf971590,0.936093
nffd07e017f3def4,0.480219
nfff296ce15d1d13,0.683912
nfff505ecc1ec6ad,0.726922


### 2.2 Load float32 data and verify that the model produces identical predictions

In [6]:
napi = numerapi.NumerAPI()
napi.download_dataset("v4.1/live.parquet", "live.parquet")
argentina_no_ntr.predict(pd.read_parquet("live.parquet"))

[2023-06-09 03:21:41,244] INFO - target file already exists
[2023-06-09 03:21:41,246] INFO - download complete


Predicting for each model: 100%|█| 6/6 [00:23<00:00,  3.9

[2023-06-09 03:22:05,263] INFO - Ensembling predictions with argentina_ensemble(): ['pred_cyrus', 'pred_tyler', 'pred_ben', 'pred_waldo', 'pred_victor', 'pred_nomi']
[2023-06-09 03:22:05,267] INFO - Taking the rank percent

id,prediction
n000124edbee5931,0.840941
n0006fd05e5c5171,0.854737
n0008241720e02e0,0.102658
n002cecd72ff9453,0.095354
n0033b5ad4d2f9a0,0.730980
...,...
nffcf4bdcf971590,0.936093
nffd07e017f3def4,0.480219
nfff296ce15d1d13,0.683912
nfff505ecc1ec6ad,0.726922
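Comparing the two printed tables by eye only covers the handful of rows that are displayed. A programmatic comparison is more thorough. The sketch below assumes both results have been stored in DataFrames; the names `pred_int8_df` and `pred_float_df` and the toy values are placeholders, since the notebook does not assign the float32 result to a variable.

```python
import pandas as pd


def assert_same_predictions(left: pd.DataFrame, right: pd.DataFrame, atol: float = 1e-9) -> None:
    """Raise AssertionError unless both frames share ids and near-equal predictions."""
    pd.testing.assert_frame_equal(
        left.sort_index(),
        right.sort_index(),
        check_exact=False,  # allow tiny floating-point differences
        rtol=0,
        atol=atol,
    )


# Toy stand-ins for the int8-based and float32-based prediction frames.
ids = pd.Index(["a", "b"], name="id")
pred_int8_df = pd.DataFrame({"prediction": [0.25, 0.75]}, index=ids)
pred_float_df = pd.DataFrame({"prediction": [0.25, 0.75]}, index=ids)

assert_same_predictions(pred_int8_df, pred_float_df)  # passes silently
```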


## 3. Cloudpickle the predict function for pickle uploads

In [7]:
pkl_fl = "./models/pkl_upload_argentina_no_ntr.pkl"
with open(pkl_fl, "wb") as outfile:
    cloudpickle.dump(obj=argentina_no_ntr.predict, file=outfile)
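The cell above pickles the bound method `argentina_no_ntr.predict` rather than the model object, so the resulting file holds a single callable. A minimal sketch of why that works, using a hypothetical `ToyModel` class in place of the real ensemble wrapper: a bound method carries its instance with it, so the restored callable still has access to the model state.

```python
import pickle

import cloudpickle


class ToyModel:
    """Hypothetical stand-in for the ensemble wrapper; not the real class."""

    def __init__(self, scale: float):
        self.scale = scale

    def predict(self, values):
        return [v * self.scale for v in values]


model = ToyModel(scale=2.0)

# Serialise only the bound method; the instance it is bound to travels along.
payload = cloudpickle.dumps(model.predict)

# The payload can be read back with the standard pickle module.
restored_predict = pickle.loads(payload)
result = restored_predict([1.0, 2.5])  # [2.0, 5.0]
```

Note that when cloudpickle serialises a class by value (for example, one defined interactively), the loading environment may still need `cloudpickle` installed, even though `pickle.load` is the function doing the reading.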

### 3.1 Unpickle and verify predictions with float32 data for pickle uploads

In [8]:
# Delete all local variables and start afresh, so the unpickled
# function cannot rely on anything defined earlier in this notebook
from IPython import get_ipython
get_ipython().run_line_magic("reset", "-sf")


In [9]:
import pickle
import pandas as pd

def unpickle(fl):
    with open(fl, "rb") as infile:
        return pickle.load(infile)

In [10]:
pkl_fl = "./models/pkl_upload_argentina_no_ntr.pkl"
arg_predict_from_pkl = unpickle(pkl_fl)

In [11]:
arg_predict_from_pkl(pd.read_parquet("live.parquet"))

Predicting for each model: 100%|█| 6/6 [00:23<00:00,  3.9

[2023-06-09 03:22:37,626] INFO - Ensembling predictions with argentina_ensemble(): ['pred_cyrus', 'pred_tyler', 'pred_ben', 'pred_waldo', 'pred_victor', 'pred_nomi']
[2023-06-09 03:22:37,630] INFO - Taking the rank percent

id,prediction
n000124edbee5931,0.840941
n0006fd05e5c5171,0.854737
n0008241720e02e0,0.102658
n002cecd72ff9453,0.095354
n0033b5ad4d2f9a0,0.730980
...,...
nffcf4bdcf971590,0.936093
nffd07e017f3def4,0.480219
nfff296ce15d1d13,0.683912
nfff505ecc1ec6ad,0.726922
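Beyond matching the earlier tables, it can be useful to check the basic shape of what the unpickled function returns before uploading. The sketch below is one possible set of checks based only on the output shown here (an `id` index, a single `prediction` column, rank-percent values). The helper name `check_prediction_frame` is hypothetical, and these checks are assumptions rather than a statement of what the upload endpoint validates.

```python
import pandas as pd


def check_prediction_frame(df: pd.DataFrame) -> None:
    """Raise ValueError if the frame does not look like a rank-percent prediction table."""
    if list(df.columns) != ["prediction"]:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if not df.index.is_unique:
        raise ValueError("duplicate ids in index")
    if df["prediction"].isna().any():
        raise ValueError("missing predictions")
    if not df["prediction"].between(0, 1).all():  # bounds are inclusive
        raise ValueError("predictions outside [0, 1]")


# Toy frame shaped like the output above.
toy_df = pd.DataFrame(
    {"prediction": [0.25, 1.0]},
    index=pd.Index(["a", "b"], name="id"),
)
check_prediction_frame(toy_df)  # passes silently
```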
