<a href="https://colab.research.google.com/github/numerai/example-scripts/blob/master/making-your-first-submission-on-numerai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making your first submission on Numerai

## Introduction
This tutorial walks you through creating your first submission on Numerai.

## Overview

1. Using this notebook
2. Download the datasets
3. Train your first model
4. Generate your first predictions
5. Make your first submission


---



## 1. Using this notebook 

This is an interactive notebook. To execute the code in a cell, select it and press `shift+enter`. Running cells requires you to log in with your Google account.

To make changes, first save your own copy via `File -> Save a copy in Drive`.

Let's start off by installing and importing our dependencies.

In [1]:
# install dependencies (the PyPI package for sklearn is named scikit-learn)
!pip install numpy pandas scikit-learn numerapi

Collecting numerapi
  Downloading numerapi-2.9.1-py3-none-any.whl (26 kB)
Installing collected packages: numerapi
Successfully installed numerapi-2.9.1


In [2]:
# import dependencies
import numpy as np
import pandas as pd
import numerapi
import sklearn.linear_model

## 2. Download the datasets

### Datasets 
*   `training_data` is used to train your model
*   `tournament_data` is used to evaluate your model

### Column descriptions
*   id: a randomized id that corresponds to a stock 
*   era: a period of time
*   data_type: either `train`, `validation`, `test`, or `live` 
*   feature_*: abstract financial features of the stock 
*   target: abstract measure of stock performance

Check out [this forum post](https://forum.numer.ai/t/super-massive-data-release-deep-dive/4053) for further details.
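To make the column descriptions above concrete, here is a minimal sketch on a tiny made-up frame. The column names `feature_a` and `feature_b`, the ids, and all values are hypothetical stand-ins; only the `id`, `era`, `data_type`, `feature_*`, and `target` naming follows the description above.

```python
import pandas as pd

# Tiny stand-in for the real dataset; names and values are made up.
toy = pd.DataFrame(
    {
        "era": ["1", "1", "2", "575"],
        "data_type": ["train", "train", "validation", "live"],
        "feature_a": [0, 4, 2, 1],
        "feature_b": [3, 1, 2, 0],
        "target": [0.25, 0.75, 0.5, None],
    },
    index=pd.Index(["n01", "n02", "n03", "n04"], name="id"),
)

# Group the columns by their name prefix.
feature_cols = [c for c in toy.columns if c.startswith("feature")]
target_cols = [c for c in toy.columns if c.startswith("target")]

# Count how many rows belong to each data_type.
rows_per_type = toy["data_type"].value_counts().to_dict()

print(feature_cols)
print(target_cols)
print(rows_per_type)
```

The same prefix-based selection should carry over to the real frames once they are loaded, since it relies only on the column naming.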



In [3]:
napi = numerapi.NumerAPI()

In [4]:
# download the latest training dataset (takes around 30s)
training_data_filename = "numerai_training_data_int8.parquet"

napi.download_dataset(training_data_filename)
training_data = pd.read_parquet(training_data_filename)
training_data.head()

2021-09-27 10:58:11,660 INFO numerapi.utils: starting download
numerai_training_data_int8.parquet: 1.01GB [00:25, 39.5MB/s]                            


id,era,data_type,feature_dichasial_hammier_spawner,feature_rheumy_epistemic_prancer,feature_pert_performative_hormuz,feature_hillier_unpitied_theobromine,feature_perigean_bewitching_thruster,feature_renegade_undomestic_milord,feature_koranic_rude_corf,feature_demisable_expiring_millepede,feature_unscheduled_malignant_shingling,feature_clawed_unwept_adaptability,feature_rubblier_chlorotic_stogy,feature_untumbled_histologic_inion,feature_piffling_inflamed_jupiter,feature_abstersive_emotional_misinterpreter,feature_unluckiest_mulley_benzyl,feature_escutcheoned_timocratic_kotwal,feature_integrated_extroversive_ambivalence,feature_vedic_mitral_swiz,feature_reclaimed_fallibilist_turpentine,feature_gone_honduran_worshipper,feature_insociable_exultant_tatum,feature_outdated_tapered_speciation,feature_leggiest_slaggiest_inez,feature_chaldean_vixenly_propylite,feature_hysteric_mechanized_recklinghausen,feature_glare_factional_assessment,feature_highland_eocene_berean,feature_seemlier_reorient_monandry,feature_expressed_abhominable_pruning,feature_castrated_presented_quizzer,feature_restricted_aggregately_workmanship,feature_scorbutic_intellectualism_mongoloid,feature_telephonic_shakable_bollock,feature_subglobular_unsalable_patzer,feature_syrian_coital_counterproof,feature_supergene_legible_antarthritic,feature_hypothetic_distressing_endemic,feature_torturesome_estimable_preferrer,...,feature_oscillating_elaborated_mandatory,feature_contradictory_museful_somatotropin,feature_direst_interrupted_paloma,feature_congenerical_anodal_chelation,feature_pronominal_rampant_megaspore,feature_dropsical_suctorial_mnemosyne,feature_corrugated_dotiest_committeewoman,feature_architectonic_godlier_southland,feature_fishiest_simulatory_roadholding,feature_unpruned_pedagoguish_inkblot,feature_forworn_hask_haet,feature_drawable_exhortative_dispersant,feature_metabolic_minded_armorist,feature_investigatory_inerasable_circumvallation,feature_centroclinal_incentive_lancelet,feature_unemotional_quietistic_chirper,feature_behaviorist_microbiological_farina,feature_lofty_acceptable_challenge,feature_coactive_prefatorial_lucy,target,target_nomi_20,target_nomi_60,target_jerome_20,target_jerome_60,target_janet_20,target_janet_60,target_ben_20,target_ben_60,target_alan_20,target_alan_60,target_paul_20,target_paul_60,target_george_20,target_george_60,target_william_20,target_william_60,target_arthur_20,target_arthur_60,target_thomas_20,target_thomas_60
n003bba8a98662e4,1,train,4,2,4,4,0,0,4,4,3,0,2,4,0,4,0,4,0,4,1,2,1,1,1,0,0,4,4,0,0,3,4,1,0,1,1,3,4,0,...,4,4,3,2,3,3,0,0,0,0,4,3,4,3,2,4,4,1,0,0.25,0.25,0.0,0.25,0.25,0.25,0.25,0.25,0.0,0.5,0.25,0.25,0.25,0.25,0.0,0.166667,0.0,0.166667,0.0,0.166667,0.0
n003bee128c2fcfc,1,train,2,4,1,3,0,3,2,3,2,2,2,1,4,0,3,0,3,1,1,1,1,3,1,2,3,2,3,3,3,4,3,3,3,3,3,4,3,1,...,3,1,0,0,2,2,1,1,2,2,0,1,0,1,0,1,2,4,3,0.75,0.75,0.75,1.0,0.75,1.0,0.75,1.0,1.0,0.75,0.5,1.0,1.0,1.0,1.0,0.833333,0.666667,0.833333,0.666667,0.833333,0.666667
n0048ac83aff7194,1,train,2,1,3,0,3,0,3,3,4,2,0,0,3,2,2,2,1,3,2,0,0,0,1,1,4,2,3,4,4,1,0,4,1,0,1,1,0,3,...,1,3,1,2,3,2,2,2,2,2,2,3,4,2,1,2,3,3,4,0.5,0.5,0.25,0.5,0.25,0.25,0.25,0.5,0.25,0.25,0.5,0.5,0.25,0.25,0.25,0.5,0.333333,0.5,0.333333,0.5,0.333333
n00691bec80d3e02,1,train,4,2,2,3,0,4,1,4,1,3,1,4,0,2,1,2,0,2,0,1,3,4,0,0,0,4,2,0,0,4,3,4,1,1,1,2,0,0,...,0,3,0,3,1,4,0,0,0,0,1,2,2,2,1,0,1,0,2,0.75,0.75,0.5,0.5,0.5,0.5,0.75,0.75,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.666667,0.5,0.5,0.5,0.666667,0.5
n00b8720a2fdc4f2,1,train,4,3,4,4,0,0,4,2,3,0,1,3,2,0,3,0,3,1,2,1,0,0,1,1,1,4,4,3,1,4,4,0,0,0,0,1,3,1,...,4,4,0,3,1,0,2,1,2,1,4,0,1,4,1,4,4,2,1,0.75,0.75,0.5,0.5,0.5,0.75,0.75,0.5,0.5,0.75,0.5,0.5,0.5,0.5,0.5,0.666667,0.5,0.5,0.5,0.666667,0.5


In [5]:
# download the latest tournament dataset (takes around 30s)
tournament_data_filename = "numerai_tournament_data_int8.parquet"

napi.download_dataset(tournament_data_filename)
tournament_data = pd.read_parquet(tournament_data_filename)
tournament_data.head()

2021-09-27 10:58:51,125 INFO numerapi.utils: starting download
numerai_tournament_data_int8.parquet: 582MB [00:13, 42.9MB/s]                           


id,era,data_type,feature_dichasial_hammier_spawner,feature_rheumy_epistemic_prancer,feature_pert_performative_hormuz,feature_hillier_unpitied_theobromine,feature_perigean_bewitching_thruster,feature_renegade_undomestic_milord,feature_koranic_rude_corf,feature_demisable_expiring_millepede,feature_unscheduled_malignant_shingling,feature_clawed_unwept_adaptability,feature_rubblier_chlorotic_stogy,feature_untumbled_histologic_inion,feature_piffling_inflamed_jupiter,feature_abstersive_emotional_misinterpreter,feature_unluckiest_mulley_benzyl,feature_escutcheoned_timocratic_kotwal,feature_integrated_extroversive_ambivalence,feature_vedic_mitral_swiz,feature_reclaimed_fallibilist_turpentine,feature_gone_honduran_worshipper,feature_insociable_exultant_tatum,feature_outdated_tapered_speciation,feature_leggiest_slaggiest_inez,feature_chaldean_vixenly_propylite,feature_hysteric_mechanized_recklinghausen,feature_glare_factional_assessment,feature_highland_eocene_berean,feature_seemlier_reorient_monandry,feature_expressed_abhominable_pruning,feature_castrated_presented_quizzer,feature_restricted_aggregately_workmanship,feature_scorbutic_intellectualism_mongoloid,feature_telephonic_shakable_bollock,feature_subglobular_unsalable_patzer,feature_syrian_coital_counterproof,feature_supergene_legible_antarthritic,feature_hypothetic_distressing_endemic,feature_torturesome_estimable_preferrer,...,feature_oscillating_elaborated_mandatory,feature_contradictory_museful_somatotropin,feature_direst_interrupted_paloma,feature_congenerical_anodal_chelation,feature_pronominal_rampant_megaspore,feature_dropsical_suctorial_mnemosyne,feature_corrugated_dotiest_committeewoman,feature_architectonic_godlier_southland,feature_fishiest_simulatory_roadholding,feature_unpruned_pedagoguish_inkblot,feature_forworn_hask_haet,feature_drawable_exhortative_dispersant,feature_metabolic_minded_armorist,feature_investigatory_inerasable_circumvallation,feature_centroclinal_incentive_lancelet,feature_unemotional_quietistic_chirper,feature_behaviorist_microbiological_farina,feature_lofty_acceptable_challenge,feature_coactive_prefatorial_lucy,target,target_nomi_20,target_nomi_60,target_jerome_20,target_jerome_60,target_janet_20,target_janet_60,target_ben_20,target_ben_60,target_alan_20,target_alan_60,target_paul_20,target_paul_60,target_george_20,target_george_60,target_william_20,target_william_60,target_arthur_20,target_arthur_60,target_thomas_20,target_thomas_60
n000101811a8a843,575,test,2,0,4,0,3,0,4,1,0,1,0,1,0,0,1,0,0,4,4,4,1,0,4,4,0,1,2,4,4,0,1,0,2,0,0,0,1,4,...,2,4,4,2,0,0,4,4,4,4,4,2,2,2,0,1,0,1,3,,,,,,,,,,,,,,,,,,,,,
n001e1318d5072ac,575,test,1,4,2,2,1,3,3,0,3,2,4,2,4,4,3,4,3,2,1,1,4,1,0,1,2,1,0,1,1,3,1,4,3,4,4,4,4,2,...,2,1,3,3,3,0,2,2,0,0,0,3,3,4,2,3,4,1,4,,,,,,,,,,,,,,,,,,,,,
n002a9c5ab785cbb,575,test,1,2,2,3,1,1,3,0,1,1,2,3,4,4,2,4,2,3,1,2,2,1,2,2,3,1,1,1,1,3,3,4,1,2,3,4,2,1,...,3,2,3,0,0,0,4,4,0,0,0,0,3,1,1,1,0,3,1,,,,,,,,,,,,,,,,,,,,,
n002ccf6d0e8c5ad,575,test,2,4,2,4,2,4,3,2,2,1,3,1,4,4,4,4,4,0,2,2,0,0,0,0,4,2,2,1,1,3,4,1,0,3,2,3,3,0,...,3,1,0,0,1,1,2,3,0,0,0,4,4,1,1,0,0,1,1,,,,,,,,,,,,,,,,,,,,,
n0051ab821295c29,575,test,2,0,0,1,0,4,2,1,3,4,1,2,1,3,2,2,2,0,2,4,2,2,1,3,1,1,2,2,2,2,2,2,2,1,1,2,2,2,...,2,0,0,4,4,2,0,1,2,4,4,3,4,2,1,4,3,0,2,,,,,,,,,,,,,,,,,,,,,


## 3. Train your first model
Let's create a basic model using sklearn's linear regression.

In [6]:
# find only the feature columns
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]

# only use non-overlapping eras:
# eras are weekly (5 days), but the target is four weeks out (20 days),
# so keep every 4th era to avoid training on overlapping target windows
training_data["era_int"] = training_data.era.astype(int)
max_era = training_data.era_int.max()
training_data_subsample = training_data[training_data.era_int.isin(np.arange(1, (max_era + 1), 4))]

# select those columns out of the subsampled training dataset
training_features = training_data_subsample[feature_cols]
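To see what the era subsampling in the cell above does, the sketch below applies the same `np.arange(1, max_era + 1, 4)` selection to a toy frame. The frame, with 10 eras and 2 rows per era, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with 10 weekly eras and 2 rows per era (made-up data).
toy = pd.DataFrame({"era": np.repeat(np.arange(1, 11), 2).astype(str)})
toy["era_int"] = toy["era"].astype(int)

# Keep every 4th era, starting from era 1.
max_era = toy["era_int"].max()
kept_eras = np.arange(1, max_era + 1, 4)
subsample = toy[toy["era_int"].isin(kept_eras)]

print(kept_eras.tolist())
print(len(toy), len(subsample))
```

Because consecutive kept eras are four weeks apart, their 20-day target windows do not overlap, at the cost of training on roughly a quarter of the rows.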

In [7]:
# create a model and fit the training data
model = sklearn.linear_model.LinearRegression()
model.fit(training_features, training_data_subsample.target)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
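As a rough illustration of what a linear regression fit computes, the sketch below solves an ordinary least squares problem with plain NumPy on synthetic data. It is not a description of sklearn's internals; the data, the coefficients, and the intercept are all made up, and the data is noise-free so the fit can be checked exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 rows, 3 features with values 0..4 (made up).
X = rng.integers(0, 5, size=(200, 3)).astype(float)
true_coef = np.array([0.02, -0.01, 0.03])
true_intercept = 0.4
y = X @ true_coef + true_intercept  # noise-free target

# Append a column of ones so the intercept is estimated alongside the coefficients.
X_design = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = solution[:3], solution[3]

print(np.round(coef, 4), round(float(intercept), 4))
```

With real, noisy targets the recovered coefficients are only estimates, which is one reason to check the model on held-out eras before trusting it.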

## 4. Generate your first predictions
Now that we have a trained model, we can use it to make predictions on the tournament data.



In [8]:
# select the feature columns from the tournament data
# (this keeps every tournament row, not only those with data_type `live`)
live_features = tournament_data[feature_cols]

In [9]:
# predict the target on the live features
predictions = model.predict(live_features)
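Before submitting, it can help to sanity-check predictions on any rows where a target is available (for example rows with `data_type` equal to `validation`, if your download includes them). The sketch below scores a toy frame era by era using the correlation between ranked predictions and the target. The helper name `era_score` and all values are hypothetical, and this metric is an assumption chosen for illustration rather than a statement of how Numerai scores submissions.

```python
import numpy as np
import pandas as pd

# Toy frame: 2 eras with 4 rows each (all values are made up).
toy = pd.DataFrame(
    {
        "era": ["1"] * 4 + ["2"] * 4,
        "target": [0.0, 0.25, 0.75, 1.0, 0.0, 0.25, 0.75, 1.0],
        "prediction": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
    }
)


def era_score(group: pd.DataFrame) -> float:
    """Pearson correlation between rank-transformed predictions and the target."""
    ranked = group["prediction"].rank(pct=True, method="first")
    return float(np.corrcoef(group["target"], ranked)[0, 1])


# One score per era, then the average across eras.
per_era = pd.Series({era: era_score(g) for era, g in toy.groupby("era")})
print(per_era)
print("mean:", per_era.mean())
```

Scoring per era rather than over all rows at once shows whether a model works consistently over time; in the toy data the predictions are right in one era and reversed in the other, so the average is close to zero.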

In [10]:
# predictions must have an `id` column and a `prediction` column
predictions_df = tournament_data.index.to_frame()
predictions_df["prediction"] = predictions
predictions_df.head()

Unnamed: 0_level_0,id,prediction
id,Unnamed: 1_level_1,Unnamed: 2_level_1
n000101811a8a843,n000101811a8a843,0.493218
n001e1318d5072ac,n001e1318d5072ac,0.518794
n002a9c5ab785cbb,n002a9c5ab785cbb,0.482782
n002ccf6d0e8c5ad,n002ccf6d0e8c5ad,0.505387
n0051ab821295c29,n0051ab821295c29,0.50012
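A quick check of the predictions frame before uploading can catch formatting mistakes early. The sketch below is a hypothetical helper, `check_submission`, that verifies the `id` and `prediction` columns are present and flags duplicates and missing values. The `[0, 1]` range check is an assumption, not something stated in this notebook, so adjust it to whatever the tournament rules require.

```python
import numpy as np
import pandas as pd


def check_submission(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    missing = {"id", "prediction"} - set(df.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    if df["prediction"].isna().any():
        problems.append("missing predictions")
    # Assumption: predictions are expected to lie in [0, 1].
    if not df["prediction"].dropna().between(0, 1).all():
        problems.append("predictions outside [0, 1]")
    return problems


# Made-up frames: one well-formed, one with several problems.
good = pd.DataFrame({"id": ["n01", "n02"], "prediction": [0.49, 0.52]})
bad = pd.DataFrame({"id": ["n01", "n01"], "prediction": [np.nan, 1.3]})

print(check_submission(good))
print(check_submission(bad))
```

A plain linear regression is not constrained to any output range, so a range check like this is worth running on real predictions.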


## 5. Make your first submission
To enter the tournament, we must submit the predictions back to Numerai. We will use the `numerapi` library to do this.

In [11]:
# Get your API keys and model_id from https://numer.ai/notebook
public_key = "REPLACEME"
secret_key = "REPLACEME"
model_id = "REPLACEME"
napi = numerapi.NumerAPI(public_id=public_key, secret_key=secret_key)
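If you keep a copy of this notebook, you may prefer not to leave the keys in a cell. One option is to read them from environment variables, as in this minimal sketch. The helper `load_credentials` and the variable names `MY_NUMERAI_PUBLIC_KEY`, `MY_NUMERAI_SECRET_KEY`, and `MY_NUMERAI_MODEL_ID` are hypothetical; they are not names defined by `numerapi`.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
ENV_NAMES = {
    "public_key": "MY_NUMERAI_PUBLIC_KEY",
    "secret_key": "MY_NUMERAI_SECRET_KEY",
    "model_id": "MY_NUMERAI_MODEL_ID",
}


def load_credentials(env=None) -> dict:
    """Read the three values from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return {key: env[var] for key, var in ENV_NAMES.items()}


# Demonstration with a plain dict instead of the real environment.
fake_env = {
    "MY_NUMERAI_PUBLIC_KEY": "pub",
    "MY_NUMERAI_SECRET_KEY": "sec",
    "MY_NUMERAI_MODEL_ID": "model",
}
creds = load_credentials(fake_env)
print(sorted(creds))
```

The returned values can then be passed to the same call used above, for example `numerapi.NumerAPI(public_id=creds["public_key"], secret_key=creds["secret_key"])`.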

In [12]:
# Upload your predictions
predictions_df.to_csv("predictions.csv", index=False)
submission_id = napi.upload_predictions("predictions.csv", model_id=model_id, version=2)

2021-09-27 11:00:47,334 INFO numerapi.base_api: uploading predictions...


# Done 🚀
Good job! You just made your first submission on Numerai!

Head back over to https://numer.ai/notebook to continue.