<a href="https://colab.research.google.com/github/jamesjmcconnell/numerai-model/blob/main/Numerai_e2e_CatBoost.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An End-to-end guide to making your first Numer.ai Submission

The goal of this Colab notebook is to get you up and running with the tournament in the easiest way possible. Numerai already provides many helpful scripts alongside its data, and this notebook is inspired by [example-scripts](https://github.com/numerai/example-scripts).

Colab provides free access to GPU/TPU to everyone ⚡. The training cell in this notebook sets CatBoost's `task_type` to `GPU`, so enable a GPU before running it: go to `Runtime > Change runtime type > GPU > Save`.

---

To make your first submission, all you have to do is:

- Make sure you have signed up on [Numerai](https://numer.ai/signup)
- Create and set up your API keys (which is super easy)
- Click `Runtime > Run all`

Medium post: [An easy guide to “The hardest data science tournament on the planet”](https://towardsdatascience.com/a-guide-to-the-hardest-data-science-tournament-on-the-planet-748f46e83690)

## Loading required libraries 📔 and dataset 🗄️🔽

In [None]:
# installing required libraries
# numerapi, for facilitating data download and predictions uploading
# catboost, for modeling and making predictions
!pip install numerapi
!pip install catboost

In [None]:
import os
import gc
import csv
import glob
import time
from pathlib import Path

import numerapi

import scipy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from catboost import CatBoostRegressor

In [None]:
napi = numerapi.NumerAPI(verbosity="info")
# download current dataset
napi.download_current_dataset(unzip=True)

current_ds = napi.get_current_round()
# folder that the current round's dataset was unzipped into
latest_round = f"numerai_dataset_{current_ds}"

./numerai_dataset_240.zip: 381MB [00:04, 84.9MB/s]                           
2020-12-01 05:19:02,417 INFO numerapi.base_api: unzipping file...


## Helper functions for efficient loading and evaluation 📐

In [None]:
TOURNAMENT_NAME = "nomi"
TARGET_NAME = "target"
PREDICTION_NAME = "prediction"

BENCHMARK = 0
BAND = 0.2

#-----------------------------------------------------

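# Submissions are scored by Spearman (rank) correlation.
# It is computed here as the Pearson correlation between the targets
# and the percentile-ranked predictions.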
def score(df):
    # method="first" breaks ties based on order in array
    return np.corrcoef(
        df[TARGET_NAME],
        df[PREDICTION_NAME].rank(pct=True, method="first")
    )[0, 1]

def correlation(predictions, targets):
    ranked_preds = predictions.rank(pct=True, method="first")
    return np.corrcoef(ranked_preds, targets)[0, 1]

# The payout function
def payout(scores):
    return ((scores - BENCHMARK) / BAND).clip(lower=-1, upper=1)


# Read a CSV file into a pandas DataFrame, storing feature and target
# columns as float16 to reduce memory usage
def read_csv(file_path):
    with open(file_path, 'r') as f:
        column_names = next(csv.reader(f))
        dtypes = {x: np.float16 for x in column_names if
                  x.startswith(('feature', 'target'))}
    return pd.read_csv(file_path, dtype=dtypes)
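To see what the dtype mapping in `read_csv` buys, the sketch below applies the same idea to a tiny in-memory CSV. The column names `feature_a` and `feature_b` and the two sample rows are made up for illustration; only the float16 mapping mirrors the helper above.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical two-row CSV standing in for a Numerai data file
csv_text = (
    "id,era,feature_a,feature_b,target\n"
    "n1,era1,0.0,0.25,0.5\n"
    "n2,era1,0.75,1.0,0.25\n"
)

# Same idea as read_csv above: read the header, then map
# feature/target columns to float16
column_names = next(csv.reader(io.StringIO(csv_text)))
dtypes = {c: np.float16 for c in column_names
          if c.startswith(("feature", "target"))}

df16 = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
df64 = pd.read_csv(io.StringIO(csv_text))  # pandas default: float64

numeric_cols = list(dtypes)
mem16 = df16[numeric_cols].memory_usage(index=False).sum()
mem64 = df64[numeric_cols].memory_usage(index=False).sum()
print(f"float16: {mem16} bytes, float64: {mem64} bytes")
```

The values 0, 0.25, 0.5, 0.75 and 1 that appear in the dataset are exactly representable in float16, so nothing is lost by storing them this way.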

## Loading and exploring dataset into memory 🖥️

In [None]:
%%time
print("# Loading data...")
# The training data is used to teach your model to predict the targets.
training_data = read_csv(os.path.join(latest_round, "numerai_training_data.csv")).set_index("id")
# The tournament data is the data that Numerai uses to evaluate your model.
tournament_data = read_csv(os.path.join(latest_round, "numerai_tournament_data.csv")).set_index("id")

example_preds = read_csv(os.path.join(latest_round, "example_predictions.csv"))

validation_data = tournament_data[tournament_data.data_type == "validation"]

# Loading data...
CPU times: user 49 s, sys: 3.49 s, total: 52.5 s
Wall time: 52.5 s


In [None]:
feature_names = [f for f in training_data.columns if f.startswith("feature")]
print(f"Loaded {len(feature_names)} features")

cols = feature_names+[TARGET_NAME]

Loaded 310 features


Training data | Sample submission
- | - 
![alt](https://gblobscdn.gitbook.com/assets%2F-LmGruQ_-ZYj9XMQUd5x%2F-LrjUJcZGLBAGyzvX2tl%2F-LrlScdEXnDEVhYpSsIN%2FEx_data.png?alt=media&token=66e1ed15-abca-4fda-8485-cc72b7662bdb) | ![alt](https://gblobscdn.gitbook.com/assets%2F-LmGruQ_-ZYj9XMQUd5x%2F-LrjUJcZGLBAGyzvX2tl%2F-LrlT5EetbUvp5qr9MBy%2Fimage.png?alt=media&token=cab0eef4-759f-4412-8a8c-86b211e85917)

In [None]:
training_data.head()

id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,feature_intelligence8,feature_intelligence9,feature_intelligence10,feature_intelligence11,feature_intelligence12,feature_charisma1,feature_charisma2,feature_charisma3,feature_charisma4,feature_charisma5,feature_charisma6,feature_charisma7,feature_charisma8,feature_charisma9,feature_charisma10,feature_charisma11,feature_charisma12,feature_charisma13,feature_charisma14,feature_charisma15,feature_charisma16,feature_charisma17,feature_charisma18,feature_charisma19,feature_charisma20,feature_charisma21,feature_charisma22,feature_charisma23,feature_charisma24,feature_charisma25,feature_charisma26,...,feature_wisdom8,feature_wisdom9,feature_wisdom10,feature_wisdom11,feature_wisdom12,feature_wisdom13,feature_wisdom14,feature_wisdom15,feature_wisdom16,feature_wisdom17,feature_wisdom18,feature_wisdom19,feature_wisdom20,feature_wisdom21,feature_wisdom22,feature_wisdom23,feature_wisdom24,feature_wisdom25,feature_wisdom26,feature_wisdom27,feature_wisdom28,feature_wisdom29,feature_wisdom30,feature_wisdom31,feature_wisdom32,feature_wisdom33,feature_wisdom34,feature_wisdom35,feature_wisdom36,feature_wisdom37,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target
n000315175b67977,era1,train,0.0,0.5,0.25,0.0,0.5,0.25,0.25,0.25,0.75,0.75,0.25,0.25,1.0,0.75,0.5,1.0,0.5,0.0,0.5,0.5,0.0,0.0,0.0,1.0,0.25,0.0,0.5,0.25,0.75,0.5,1.0,0.75,0.75,0.5,0.5,0.75,0.5,0.25,...,0.75,0.75,0.75,0.5,1.0,1.0,0.5,0.75,0.5,0.25,0.25,0.75,0.5,1.0,0.5,0.75,0.75,0.25,0.5,1.0,0.75,0.5,0.5,1.0,0.25,0.5,0.5,0.5,0.75,1.0,1.0,1.0,0.75,0.5,0.75,0.5,1.0,0.5,0.75,0.5
n0014af834a96cdd,era1,train,0.0,0.0,0.0,0.25,0.5,0.0,0.0,0.25,0.5,0.5,0.0,0.5,0.0,0.5,0.5,0.5,0.5,0.5,0.25,0.25,0.5,0.0,1.0,0.5,0.5,0.5,0.75,0.5,0.5,0.75,0.25,0.5,0.75,0.5,0.25,0.75,0.5,0.5,...,0.25,0.25,0.25,1.0,1.0,0.5,0.5,0.5,0.0,0.25,1.0,0.5,1.0,1.0,0.5,0.5,0.5,1.0,0.25,0.75,1.0,0.25,0.25,1.0,0.5,0.5,0.5,0.75,0.75,0.75,1.0,1.0,0.0,0.0,0.75,0.25,0.0,0.25,1.0,0.25
n001c93979ac41d4,era1,train,0.25,0.5,0.25,0.25,1.0,0.75,0.75,0.25,0.0,0.25,0.5,1.0,0.5,0.75,0.5,0.5,1.0,0.5,0.5,0.5,0.25,0.0,0.25,0.75,0.75,0.75,0.5,0.75,0.5,0.25,0.5,0.75,0.25,0.5,0.5,0.75,0.5,0.5,...,0.25,1.0,1.0,1.0,0.5,1.0,1.0,1.0,0.5,1.0,0.0,1.0,1.0,0.5,1.0,0.75,1.0,0.0,0.5,0.75,0.0,1.0,0.5,0.5,0.75,1.0,0.75,1.0,0.25,0.5,0.25,0.5,0.0,0.0,0.5,1.0,0.0,0.25,0.75,0.25
n0034e4143f22a13,era1,train,1.0,0.0,0.0,0.5,0.5,0.25,0.25,0.75,0.25,0.5,0.5,0.5,0.75,0.5,1.0,0.5,0.5,0.0,1.0,0.0,0.75,0.0,0.5,0.5,0.5,0.5,0.0,0.5,0.5,0.75,0.75,0.5,0.25,0.5,0.5,0.5,0.5,0.5,...,1.0,1.0,0.75,0.75,1.0,0.75,0.75,0.75,1.0,0.75,1.0,0.75,1.0,0.75,1.0,0.0,0.5,0.75,1.0,0.75,1.0,0.75,1.0,1.0,0.0,0.5,0.75,0.75,1.0,0.75,1.0,1.0,0.75,0.75,1.0,1.0,0.75,1.0,1.0,0.25
n00679d1a636062f,era1,train,0.25,0.25,0.25,0.25,0.0,0.25,0.5,0.25,0.25,0.5,0.25,0.25,0.75,0.5,0.0,0.5,0.5,0.25,0.0,0.5,0.0,0.5,0.25,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.75,0.5,0.25,0.5,0.5,0.5,0.5,0.25,...,1.0,0.25,0.75,1.0,0.75,0.0,0.0,0.75,0.5,1.0,0.5,0.75,0.25,0.5,0.0,0.5,0.5,0.5,0.75,0.75,0.5,0.75,0.25,0.75,0.5,0.5,0.25,0.25,0.75,0.5,0.75,0.75,0.25,0.5,0.75,0.0,0.5,0.25,0.75,0.75


In [None]:
tournament_data.head()

id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,feature_intelligence8,feature_intelligence9,feature_intelligence10,feature_intelligence11,feature_intelligence12,feature_charisma1,feature_charisma2,feature_charisma3,feature_charisma4,feature_charisma5,feature_charisma6,feature_charisma7,feature_charisma8,feature_charisma9,feature_charisma10,feature_charisma11,feature_charisma12,feature_charisma13,feature_charisma14,feature_charisma15,feature_charisma16,feature_charisma17,feature_charisma18,feature_charisma19,feature_charisma20,feature_charisma21,feature_charisma22,feature_charisma23,feature_charisma24,feature_charisma25,feature_charisma26,...,feature_wisdom8,feature_wisdom9,feature_wisdom10,feature_wisdom11,feature_wisdom12,feature_wisdom13,feature_wisdom14,feature_wisdom15,feature_wisdom16,feature_wisdom17,feature_wisdom18,feature_wisdom19,feature_wisdom20,feature_wisdom21,feature_wisdom22,feature_wisdom23,feature_wisdom24,feature_wisdom25,feature_wisdom26,feature_wisdom27,feature_wisdom28,feature_wisdom29,feature_wisdom30,feature_wisdom31,feature_wisdom32,feature_wisdom33,feature_wisdom34,feature_wisdom35,feature_wisdom36,feature_wisdom37,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target
n0003aa52cab36c2,era121,validation,0.25,0.75,0.5,0.5,0.0,0.75,0.5,0.25,0.5,0.5,0.25,0.0,0.25,0.5,0.25,0.0,0.25,1.0,1.0,0.25,1.0,1.0,0.25,0.25,0.0,0.5,0.25,0.75,0.0,0.5,0.25,0.25,0.25,0.5,0.0,0.5,1.0,0.25,...,0.0,0.0,0.25,0.5,0.25,0.25,0.0,0.25,0.0,0.25,0.5,0.5,0.5,0.5,0.0,0.25,0.75,0.25,0.25,0.5,0.25,0.0,0.25,0.5,0.25,0.5,0.25,0.25,1.0,0.75,0.75,0.75,1.0,0.75,0.5,0.5,1.0,0.0,0.0,0.25
n000920ed083903f,era121,validation,0.75,0.5,0.75,1.0,0.5,0.0,0.0,0.75,0.25,0.0,0.75,0.5,0.0,0.25,0.5,0.0,1.0,0.25,0.25,1.0,1.0,0.25,0.75,0.0,0.0,0.75,1.0,1.0,0.0,0.25,0.0,0.0,0.25,0.25,0.25,0.0,1.0,0.25,...,0.5,0.5,0.25,1.0,0.5,0.25,0.0,0.25,0.5,0.25,1.0,0.25,0.0,0.5,0.75,0.75,0.5,1.0,1.0,0.25,0.5,0.25,0.5,0.5,0.5,0.5,0.25,0.25,0.75,0.5,0.5,0.5,0.75,1.0,0.75,0.5,0.5,0.5,0.5,0.5
n0038e640522c4a6,era121,validation,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.5,0.5,1.0,1.0,1.0,0.75,0.5,0.5,1.0,1.0,0.5,0.5,0.0,1.0,0.5,1.0,0.5,1.0,0.5,1.0,0.25,1.0,1.0,1.0,0.5,1.0,1.0,0.75,1.0,1.0,...,0.25,0.5,0.0,0.0,0.0,0.25,0.25,0.0,0.5,0.0,0.0,0.0,0.25,0.0,0.25,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.75,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.5,0.25,0.0,0.0,0.5,0.5,0.0,1.0
n004ac94a87dc54b,era121,validation,0.75,1.0,1.0,0.5,0.0,0.0,0.0,0.5,0.75,1.0,0.75,0.0,0.5,0.0,0.5,0.75,0.5,0.75,0.25,0.75,0.25,0.75,0.25,0.75,1.0,0.5,0.5,0.75,0.5,1.0,0.5,0.25,0.75,0.25,0.75,0.25,0.75,0.75,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.25,0.5
n0052fe97ea0c05f,era121,validation,0.25,0.5,0.5,0.25,1.0,0.5,0.5,0.25,0.25,0.5,0.5,1.0,1.0,1.0,1.0,0.75,0.5,0.5,0.5,0.75,0.0,0.0,0.0,0.25,0.0,0.0,0.75,0.25,1.0,0.25,1.0,0.75,0.0,1.0,0.75,0.75,0.75,0.25,...,0.0,0.5,0.5,0.0,0.75,0.5,0.75,0.25,0.25,0.25,0.0,0.25,0.5,0.25,1.0,1.0,1.0,0.0,0.25,0.0,0.0,0.25,0.25,0.75,1.0,1.0,0.75,0.75,0.5,0.5,0.5,0.75,0.0,0.0,0.75,1.0,0.0,0.25,1.0,0.75


## Training our model 🤖⚙️

This is where most of the tweaking will happen. You can add more models to your pipeline by swapping in a different model and adapting the data pipeline to suit that architecture.
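One simple way to combine several models, sketched here as an assumption rather than something this notebook does, is to average the percentile ranks of their predictions. The helper `rank_average` and the two prediction Series are hypothetical; any models whose predictions share the same `id` index could be plugged in.

```python
import pandas as pd


def rank_average(prediction_sets):
    """Average the percentile ranks of several prediction Series that share one index."""
    ranked = [p.rank(pct=True, method="first") for p in prediction_sets]
    return pd.concat(ranked, axis=1).mean(axis=1)


# Made-up predictions from two hypothetical models on the same four rows
ids = ["n1", "n2", "n3", "n4"]
model_a = pd.Series([0.10, 0.40, 0.35, 0.80], index=ids)
model_b = pd.Series([0.52, 0.48, 0.51, 0.55], index=ids)

blended = rank_average([model_a, model_b])
print(blended)
```

Ranking first puts both models on the same scale, so a model with a wider raw prediction range does not dominate the blend.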

In [None]:
%%time
MODEL_FILE = "example_model.cbm"

params = {
    'task_type': 'GPU'
    }

model = CatBoostRegressor(**params)

if os.path.isfile(MODEL_FILE):
    print("Loading pre-trained model...")
    model.load_model(MODEL_FILE)
else:
    print("Training model...")
    model.fit(
        training_data[feature_names].astype(np.float32),
        training_data[TARGET_NAME].astype(np.float32),
        eval_set=(
            validation_data[feature_names].astype(np.float32),
            validation_data[TARGET_NAME].astype(np.float32),
        ),
    )
    model.save_model(MODEL_FILE)

Training model...
Learning rate set to 0.097143
0:	learn: 0.2232607	test: 0.2234885	best: 0.2234885 (0)	total: 16.5ms	remaining: 16.5s
1:	learn: 0.2232551	test: 0.2234878	best: 0.2234878 (1)	total: 31.5ms	remaining: 15.7s
2:	learn: 0.2232473	test: 0.2234865	best: 0.2234865 (2)	total: 47ms	remaining: 15.6s
3:	learn: 0.2232424	test: 0.2234840	best: 0.2234840 (3)	total: 61.9ms	remaining: 15.4s
4:	learn: 0.2232372	test: 0.2234819	best: 0.2234819 (4)	total: 77.2ms	remaining: 15.4s
5:	learn: 0.2232316	test: 0.2234803	best: 0.2234803 (5)	total: 92.7ms	remaining: 15.3s
6:	learn: 0.2232272	test: 0.2234795	best: 0.2234795 (6)	total: 109ms	remaining: 15.4s
7:	learn: 0.2232180	test: 0.2234775	best: 0.2234775 (7)	total: 124ms	remaining: 15.4s
8:	learn: 0.2232117	test: 0.2234769	best: 0.2234769 (8)	total: 139ms	remaining: 15.3s
9:	learn: 0.2232044	test: 0.2234745	best: 0.2234745 (9)	total: 154ms	remaining: 15.3s
10:	learn: 0.2231983	test: 0.2234734	best: 0.2234734 (10)	total: 170ms	remaining: 15.3s


## Predictions. Evaluation. ➡️

In [None]:
%%time
print("Generating predictions on training data...")
training_preds = model.predict(training_data[feature_names].astype(np.float32))
training_data[PREDICTION_NAME] = training_preds
gc.collect()

print("Generating predictions on tournament data...")
tournament_preds = model.predict(tournament_data[feature_names].astype(np.float32))
tournament_data[PREDICTION_NAME] = tournament_preds

Generating predictions on training data...
Generating predictions on tournament data...
CPU times: user 6.44 s, sys: 1.9 s, total: 8.34 s
Wall time: 7.03 s


In [None]:
# Check the per-era correlations on the training set (in sample)
train_correlations = training_data.groupby("era").apply(score)
print(f"On training the correlation has mean {train_correlations.mean()} and std {train_correlations.std()}")
print(f"On training the average per-era payout is {payout(train_correlations).mean()}")

# Check the per-era correlations on the validation set (out of sample)
validation_data = tournament_data[tournament_data.data_type == "validation"]
validation_correlations = validation_data.groupby("era").apply(score)
print(f"On validation the correlation has mean {validation_correlations.mean()} and "
        f"std {validation_correlations.std()}")
print(f"On validation the average per-era payout is {payout(validation_correlations).mean()}")

2020-12-01 05:24:47,301 INFO numexpr.utils: NumExpr defaulting to 2 threads.


On training the correlation has mean 0.07446827116043177 and std 0.027392572863037482
On training the average per-era payout is 0.37234135580215866
On validation the correlation has mean 0.018998788903111463 and std 0.02525694300065223
On validation the average per-era payout is 0.0949939445155573


In [None]:
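# Feature exposure: how strongly the predictions correlate with each feature.
# Max feature exposure takes the largest absolute correlation within each era
# and averages it across eras.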
feature_exposures = validation_data[feature_names].apply(
    lambda d: correlation(validation_data[PREDICTION_NAME], d), axis=0
)
max_per_era = validation_data.groupby("era").apply(
    lambda d: d[feature_names].corrwith(d[PREDICTION_NAME]).abs().max()
)
max_feature_exposure = max_per_era.mean()
print(f"Max Feature Exposure: {max_feature_exposure}")

Max Feature Exposure: 0.3320094601501898


In [None]:
tournament_data[PREDICTION_NAME].to_csv(f"{TOURNAMENT_NAME}_{current_ds}_submission.csv")
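Because `tournament_data` is indexed by `id` and the column is named `prediction`, the file written above has two columns, `id` and `prediction`. A small sketch with two made-up prediction values, writing to an in-memory buffer instead of a file, shows the layout:

```python
import io

import pandas as pd

# Two illustrative rows; the prediction values are made up
preds = pd.Series(
    [0.48, 0.52],
    index=pd.Index(["n0003aa52cab36c2", "n000920ed083903f"], name="id"),
    name="prediction",
)

buffer = io.StringIO()
preds.to_csv(buffer)  # the index becomes the first column
csv_text = buffer.getvalue()
print(csv_text)
```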

## Uploading predictions using your API keys 🚀

To create a key that can only upload predictions, go to:

`Settings -> Create API key -> select "Upload Predictions" -> Save`


In [None]:
# Add your keys between the quotes
public_id = "YourKeys"
secret_key = "YourKeys"
# ID of the model (NameOfYourAI) that the predictions are submitted for
model_id = "YourKeys"
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)
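If you would rather not paste keys into a shared notebook, one option is to read them from environment variables. The sketch below is an assumption, not part of the original workflow: the helper `load_numerai_keys` and the variable names it looks up are chosen for this example, and a plain dict stands in for `os.environ` so the snippet runs anywhere.

```python
import os


def load_numerai_keys(env=os.environ):
    """Return (public_id, secret_key, model_id) from an environment-like mapping."""
    # Variable names are assumptions chosen for this sketch
    names = ("NUMERAI_PUBLIC_ID", "NUMERAI_SECRET_KEY", "NUMERAI_MODEL_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in names)


# Demonstration with placeholder values instead of real keys
demo_env = {
    "NUMERAI_PUBLIC_ID": "public-placeholder",
    "NUMERAI_SECRET_KEY": "secret-placeholder",
    "NUMERAI_MODEL_ID": "model-placeholder",
}
public_id, secret_key, model_id = load_numerai_keys(demo_env)
print(public_id, model_id)
```

The returned values can then be passed to `numerapi.NumerAPI(public_id=..., secret_key=...)` and `upload_predictions(..., model_id=...)` exactly as in the cells here.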

In [None]:
submission_id = napi.upload_predictions(f"{TOURNAMENT_NAME}_{current_ds}_submission.csv", model_id=model_id)

And it's done. Congratulations 🎉! Your predictions for the latest round are submitted.


Check some information about your latest predictions on [Numerai Tournament](https://numer.ai/tournament). It will show metrics like these:

![Submission](https://cdn-images-1.medium.com/max/600/1*3pb7M7utM21d3RXnhjx5KA.png)

Note: This screenshot is from one of my other submissions.


## Let's check out how well the `example_predictions` perform 💭
You can compare your models with `example_predictions` and try to beat them on some metrics. At the least, aim for a positive correlation in your initial submissions.

In [None]:
#@title
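# Use whichever prediction column the example file provides; its name depends on the dataset version
example_col = [c for c in example_preds.columns if c.startswith("prediction")][0]
tournament_data[PREDICTION_NAME] = example_preds[example_col].values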

In [None]:
#@title
# Check the per-era correlations on the validation set (out of sample)
validation_data = tournament_data[tournament_data.data_type == "validation"]
validation_correlations = validation_data.groupby("era").apply(score)
print(f"On validation the correlation has mean {validation_correlations.mean()} and "
        f"std {validation_correlations.std()}")
print(f"On validation the average per-era payout is {payout(validation_correlations).mean()}")

## Some useful tips from my experience for using colab efficiently ✨
- You can do simple data exploration without any accelerators (GPU/TPU).
- Use GPU/TPU only when everything is ready for execution.
- You can mount your Google Drive to save any work done here.
- Make sure to terminate the session once your work is complete and you no longer need it.


Created by Suraj Parmar

- Numerai: [SurajP](https://numer.ai/surajp)

- Twitter: [@parmarsuraj99](https://twitter.com/parmarsuraj99)


Thanks to [@NJ](https://twitter.com/tasha_jade) and [@MikeP](https://twitter.com/EasyMikeP) for the feedback
