# Jane Street Market Prediction 
## *My final submission ... and how I got there*
***

**A quick heads up:** *My final submission is just a simple, basic ensemble of relatively shallow neural nets and probably nothing anyone hasn't seen before. It ain't gonna make you [Jim Simons](https://www.investopedia.com/articles/investing/030516/jim-simons-success-story-net-worth-education-top-quotes.asp). But it's made me a lucky ducky (thus far). So enjoy and please don't steal too much of my luck! :-)*

***

First of all, this notebook is a clone of my final submission, which I kept super-lightweight. All of the underlying work is spread across quite a few notebooks and my GitHub repo for the competition - I've inserted relevant links throughout. Anyway, here's a recap of my journey in this competition:

## 1. Prototyping

Most of the groundwork is in my GitHub repo [here](https://github.com/robert-manolache/kaggle-jsmp). I like working with scalable, reusable functions, so I built a `jsmp` package to make my life easier. Note that it includes some preliminary functions from a Python package I'm trying to develop as a wrapper for the Kaggle API, aptly called [lazykaggler](https://github.com/robert-manolache/lazy-kaggler) - hopefully I'll get the time to finish it someday and share it with the Kaggle community!

### 1.1 LightGBM experiment

I've outlined this experiment in this [notebook](https://github.com/robert-manolache/kaggle-jsmp/blob/main/lightGBM_experiment.ipynb) in my GitHub repo. In summary:
* I tried to predict classes for various `resp` ranges using a LightGBM multi-classifier
* Though it looked reasonably promising, it was somewhat tricky to come up with a good way to translate the various class probabilities into a 0/1 action (I even briefly considered a reinforcement learning approach, but it got messy and complicated very quickly!)
* That's when I started thinking more about using a neural network to directly model the 0/1 action from the baseline features

### 1.2 Neural Network experiment

I wanted to avoid building a binary return classifier, but still wanted the model to somehow predict a 0/1 action decision. The solution was to add a custom objective function that `tensorflow` could handle: 

$$ L = - \sum_{i} a_{i}r_{i}w_{i}^{h} $$

* $a_{i}$ is the output of a single final sigmoid activation node in the neural network
* $r_{i}$ is the observed return `resp` for the trade $i$
* $w_{i}^{h}$ is the `weight` of trade $i$, raised to an exponent $h \in (0,1)$ to avoid over-emphasising the large-weight trades in training

This is effectively a profit maximisation function and, since it takes into account the magnitude of `resp`, it behaves more like a regression-type optimisation than a classification (that's my intuition, anyway; I might be wrong). I did a bit of prototyping to get ballpark figures for all the hyperparameters before moving on.
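
A minimal NumPy sketch of that objective, for illustration only: the function name `utility_loss`, the toy inputs and the value `h=0.5` are assumptions, and the real training code uses a `tensorflow` objective rather than this.

```python
import numpy as np

def utility_loss(a, resp, weight, h=0.5):
    """Negative weighted profit: L = -sum_i a_i * r_i * w_i**h.

    a      : sigmoid outputs in (0, 1), one per trade
    resp   : observed returns
    weight : trade weights
    h      : exponent in (0, 1) that dampens large weights (0.5 is an assumed value)
    """
    a = np.asarray(a, dtype=float)
    resp = np.asarray(resp, dtype=float)
    weight = np.asarray(weight, dtype=float)
    return -np.sum(a * resp * weight ** h)

# toy batch of three trades (made-up numbers)
a = np.array([0.9, 0.1, 0.5])
resp = np.array([0.02, -0.01, 0.0])
weight = np.array([4.0, 1.0, 9.0])

loss = utility_loss(a, resp, weight, h=0.5)
print(loss)
```

The gradient of $L$ with respect to $a_{i}$ is $-r_{i}w_{i}^{h}$, so each trade pulls its output towards 1 or 0 in proportion to the size of its return, which fits the regression-like intuition above.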

PS: *Initially, I wanted to build this on top of the LightGBM return class predictions, but it turned out better to let the neural network work its magic on the original features. Plus, having more overhead in the final submission for a LightGBM model seemed quite unnecessary.*

## 2. Cross-Validation

I was quite happy with the neural network approach described above, so it was time to scale it up and get some CV notebooks working around the clock! I ran these on Kaggle, because I really needed the extra computational power - thank you, Kaggle! :-)

Here's a [sample](https://www.kaggle.com/slashie/jsmp-08-cv-final) of one of my CV notebooks. It's not very well commented (apologies!), so I'll give a quick high-level summary of the approach:

* I have a `__create_NN_model__()` function that takes a `NN_params` input dictionary to configure models with different hyperparameters, architecture etc.
* For models with different `NN_params` and other settings, I ran CV on the same folds each time and recorded performance on each fold, before training a full model
* The notebook outputs 3 files each time it runs:
  * A CSV with the model's K-fold performance for each fold using different action thresholds
  * A JSON file with the model weights
  * A JSON file with the model's `NN_params` and other configuration settings
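
As a rough sketch of the weights file, here is one way dense-layer weights could be written to JSON and read back. The `w0`/`b0`, `w1`/`b1`, ... key naming matches what the submission code reads; the helper `weights_to_json_dict` and the toy layer shapes are hypothetical, and with a Keras model the `(W, b)` pairs would presumably come from each dense layer's `get_weights()`.

```python
import json
import numpy as np

def weights_to_json_dict(layers):
    """Turn a list of (W, b) pairs, ordered input to output, into a JSON-ready dict."""
    out = {}
    for i, (W, b) in enumerate(layers):
        out["w" + str(i)] = np.asarray(W).tolist()  # kernel of layer i
        out["b" + str(i)] = np.asarray(b).tolist()  # bias of layer i
    return out

# toy 2-layer network: 130 features -> 8 hidden units -> 1 output
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(130, 8)), np.zeros(8)),
    (rng.normal(size=(8, 1)), np.zeros(1)),
]

# serialise, then restore the same way the submission notebook does
payload = json.dumps(weights_to_json_dict(layers))
restored = {k: np.array(v) for k, v in json.loads(payload).items()}
n_layers = len(restored) // 2
```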

## 3. Ensemble Selection

After multiple CV runs, I collected a lot of K-fold performance data to help guide my selection of models for the ensemble. Here's the [notebook](https://github.com/robert-manolache/kaggle-jsmp/blob/main/ensemble_selection.ipynb) on GitHub where I came up with my final ensemble - it's not particularly scientific, but I was sprinting for the finish line and didn't have time to be precise. In the end I chose:
* The top 6 models by average utility score across all folds
* The top 7 models by ratio of average utility to utility standard deviation
* The top model in each of the 5 folds
* I also gave my 'favourite' model a couple of extra ensemble votes because ... why not? :-)
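
The selection rules above can be sketched with `pandas` along these lines. Everything here is illustrative: the column names, the toy scores and the step that turns vote counts into normalised weights are assumptions, not what the linked notebook necessarily does.

```python
import pandas as pd

# hypothetical CV results: one row per (model, fold) with its utility score
cv = pd.DataFrame({
    "model":   ["A", "A", "B", "B", "C", "C"],
    "fold":    [0, 1, 0, 1, 0, 1],
    "utility": [10.0, 12.0, 20.0, 4.0, 9.0, 9.5],
})

n_mean, n_ratio = 1, 1  # toy sizes; the write-up uses 6 and 7

# per-model average utility and average-to-standard-deviation ratio
stats = cv.groupby("model")["utility"].agg(["mean", "std"])
stats["ratio"] = stats["mean"] / stats["std"]

top_mean = stats["mean"].nlargest(n_mean).index.tolist()
top_ratio = stats["ratio"].nlargest(n_ratio).index.tolist()

# best model within each fold
best_rows = cv.groupby("fold")["utility"].idxmax().to_numpy()
top_per_fold = cv.loc[best_rows, "model"].tolist()

# every appearance counts as one vote; normalise votes into weights (assumed scheme)
votes = pd.Series(top_mean + top_ratio + top_per_fold).value_counts()
weights = votes / votes.sum()
print(weights)
```

Extra votes for a favourite model would just mean appending its name to the list a couple more times before counting.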

## 4. Final Submission

And so here we are ... this notebook! After a nightmare run of failed submissions during the public LB stage, I went to the extreme to make this notebook error-proof and as lightweight as possible: 
* It only imports `json`, `numpy` and `pandas` in addition to the necessary `janestreet` package
* There's no training - I realised it was pointless to re-train since you'll never get any new data to update and re-optimize (as far as I understand, at least...)
* It uses the model weights that were saved for each model during the CV stage
  * These are sourced by the notebook from private datasets
* I just use `numpy` to manually perform the neural network matrix operations and derive predictions


In [None]:
import janestreet
import json
import numpy as np
import pandas as pd

## Load models and ensemble weights

In [None]:
# create feature names and load ensemble config
features = ["feature_" + str(f) for f in range(130)]
e_df = pd.read_csv("../input/jsmpensemble004/ensemble_weights.csv")


# iterate through each model in the ensemble
wb = {} # for weights/biases
nL = [] # number of layers
for m in e_df["model"]:
    
    # load model weights file
    with open("../input/jsmpensemble004/%s_model_weights.json"%m) as rf:
        params = json.load(rf)
    
    # convert weights/biases to numpy arrays
    for k,v in params.items():
        params[k] = np.array(v)
    
    # append model params and number of layers
    wb[m] = params
    nL.append(len(params) // 2)

# include number of layers in ensemble config df for easy iteration
e_df.loc[:, "n_layers"] = nL
e_df

## Make predictions

In [None]:
env = janestreet.make_env() # initialize the environment
iter_test = env.iter_test() # an iterator which loops over the test set

for (test_df, sample_prediction_df) in iter_test:
    
    # only consider trades that count towards score
    if (test_df['weight'].values[0] > 0):
        
        # wrap in try/except to avoid failures
        try:  
            
            # subset features and fill NAs with 0
            X = test_df[features].fillna(0).values
            
            # get weight and assign decision threshold based on formula,
            # capped at 0.75
            w = test_df['weight'].values[0]
            t = min(0.5 + ((w/300) ** 1.5), 0.75)
            y = 0
            
            # iterate through ensemble models
            for m, eW, nL in e_df.itertuples(index=False):
            # m:  model
            # eW: ensemble weight
            # nL: number of NN layers
                
                # get the parameter set
                params = wb[m]
                
                # iterate through each layer
                for i in range(nL):
                    
                    # initialize first layer input
                    if i == 0:
                        a = X.copy()
                    
                    # compute linear output 
                    a = a @ params["w"+str(i)] + params["b"+str(i)]
                    
                    # default to relu unless it's the last layer - use sigmoid there
                    if i < (nL-1):
                        a = np.maximum(a, 0)
                    else:
                        a = 1/(1+np.exp(-a))
                
                # increment action probability using model's assigned weight
                y += (a[0][0] * eW)
            
            # have some fail-safes before predicting 0/1 action
            if np.isnan(y):
                action = 0
            elif y > t:
                action = 1
            else:
                action = 0
                
        except Exception:  # fall back to no trade if anything goes wrong
            action = 0
            
    else:
        action = 0
    
    #to test it works
    #print("Choose action %d with dtype %s"%(action, type(action)))
    
    # submit action
    sample_prediction_df["action"] = action #make your 0/1 prediction here
    env.predict(sample_prediction_df)
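
For checking the `numpy` forward pass and the threshold formula offline, without the `janestreet` environment, a small sketch like the one below could be used. The helper names `decision_threshold` and `forward` and the two-feature toy network are hypothetical; the layer logic (ReLU on hidden layers, sigmoid on the last) and the `0.5 + (w/300)**1.5` threshold capped at 0.75 mirror the loop above.

```python
import numpy as np

def decision_threshold(w):
    """Action threshold for a trade of weight w, capped at 0.75."""
    return min(0.5 + (w / 300) ** 1.5, 0.75)

def forward(X, params, n_layers):
    """Dense forward pass: ReLU on hidden layers, sigmoid on the output layer."""
    a = X
    for i in range(n_layers):
        a = a @ params["w" + str(i)] + params["b" + str(i)]
        if i < n_layers - 1:
            a = np.maximum(a, 0)       # ReLU
        else:
            a = 1 / (1 + np.exp(-a))   # sigmoid
    return a

# toy network with hand-picked weights (2 features, 2 hidden units, 1 output)
params = {
    "w0": np.array([[1.0, -1.0], [0.0, 2.0]]),
    "b0": np.array([0.0, 0.0]),
    "w1": np.array([[1.0], [1.0]]),
    "b1": np.array([0.0]),
}
X = np.array([[1.0, -1.0]])

p = forward(X, params, n_layers=2)[0, 0]  # hidden = relu([1, -3]) = [1, 0], so p = sigmoid(1)
action = int(p > decision_threshold(75))  # threshold at w=75 is 0.5 + 0.25**1.5 = 0.625
print(p, action)
```

With this formula the 0.75 cap kicks in once `weight` goes above roughly 119, since `(w/300)**1.5` reaches 0.25 there.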