# Simple Predictor for Jane-Street Market competition

Here I show how to build a very simple neural network model in PyTorch and train it on a GPU.

Pre-processing of the data follows [this notebook](https://www.kaggle.com/andreasthomasen/preprocessing-and-feature-selection). The main difference is that here we only apply PCA and retain a lot of features. The reason is that we do not use RNNs and instead rely only on instantaneous feature values, so this model can be trained with quite a lot of features included.

If you read this notebook from start to finish, you will learn how to
* Load data into pandas
* Do feature reduction using PCA
* Define a neural network model in PyTorch
* Train the model, save its state dict with `torch.save`, and save the preprocessing objects with pickle

Thanks for reading. If you like it, feel free to copy it; there is nothing revolutionary in this notebook. It would also be helpful if you upvoted :)

UPDATE: With the training step included, this notebook took too long to run for submission. So instead it now saves the model at the end, and you can load the saved model later in a private submission.
Enjoy!

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

import torch
import torch.nn as nn
import torch.optim as optim

if torch.cuda.is_available():
    dev = torch.device("cuda")
else:
    dev = torch.device("cpu")

import pickle
    
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


# Load data and reduce dimensions

In [None]:
train = pd.read_csv('/kaggle/input/jane-street-market-prediction/train.csv')
batch_size = len(train)

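The tensors below will be used later. `wrtensor` holds the product `weight * resp` for each row and is used in the training loss. We store `feature_0` in a separate tensor, `itensor`, since it is the only integer-valued feature. The expression `(feature_0+1)//2` turns it into an index for the embedding layer (0 or 1, given that `feature_0` takes the values -1 and 1).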

In [None]:
wrtensor = torch.tensor(train.loc[:,['weight','resp']].to_numpy(),dtype=torch.float)
wrtensor = torch.mul(wrtensor[:,0],wrtensor[:,1]).to(dev)
itensor = torch.tensor(((train.loc[:,'feature_0']+1)//2).to_numpy(),dtype=torch.long,device=dev)
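
As a small worked example of the two tensors, the sketch below applies the same expressions to a hypothetical three-row frame (`toy` and its values are made up for illustration, and `feature_0` is assumed to take only the values -1 and 1).

```python
import pandas as pd
import torch

# Hypothetical rows; the values are made up for illustration.
toy = pd.DataFrame({
    "weight":    [0.0, 2.0, 1.5],
    "resp":      [0.01, -0.02, 0.04],
    "feature_0": [-1, 1, 1],
})

wr = torch.tensor(toy.loc[:, ["weight", "resp"]].to_numpy(), dtype=torch.float)
wr = torch.mul(wr[:, 0], wr[:, 1])   # per-row weight * resp
idx = torch.tensor(((toy.loc[:, "feature_0"] + 1) // 2).to_numpy(), dtype=torch.long)

print(idx.tolist())   # [0, 1, 1]
```

Rows with zero weight contribute nothing to the product, so they have no effect on a loss built from `wr`.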

We keep all other features in a separate frame, which becomes a tensor once preprocessing is done.

In [None]:
feature_names = ['feature_'+str(i) for i in range(1,130)]
train = train[feature_names]

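Let's remove outliers first. For each feature we look up its most frequent value. If that value occurs more than 100 times and lies more than one standard deviation from the mean, we replace it with NaN.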

In [None]:
maxindex = np.zeros((129,3))
for i in range(129):
    counts = train[feature_names[i]].value_counts()
    mean = train[feature_names[i]].mean()
    std = train[feature_names[i]].std()
    sigmas = np.abs(counts.index[0]-mean)/std
    maxindex[i] = [counts.index[0], counts.iloc[0], sigmas]
    
for i in range(129):
    if maxindex[i,1] > 100 and maxindex[i,2] > 1:
        # replace returns a new DataFrame, so assign the result back
        train = train.replace({feature_names[i]: maxindex[i,0]},np.nan)
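
To see what this step does on a small scale, here is a sketch on a hypothetical toy column (`toy`, `min_count`, and `min_sigmas` are illustrative names and values, not from the notebook). It flags the most frequent value when it is both common and far from the mean, then replaces it with NaN. Note that `DataFrame.replace` returns a new frame, so the result has to be assigned back for the change to stick.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: -10.0 acts as a repeated placeholder value.
toy = pd.DataFrame({"x": [-10.0] * 5 + [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]})

min_count = 3      # illustrative threshold (the notebook uses 100)
min_sigmas = 1.0   # illustrative threshold in standard deviations

counts = toy["x"].value_counts()
mode_value, mode_count = counts.index[0], counts.iloc[0]
sigmas = abs(mode_value - toy["x"].mean()) / toy["x"].std()

if mode_count > min_count and sigmas > min_sigmas:
    # replace returns a new DataFrame; assign it back to keep the change
    toy = toy.replace({"x": mode_value}, np.nan)

print(toy["x"].isna().sum())   # 5
```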

Now we need to deal with NaN. We impute those missing values with the mean of each column.

In [None]:
fill_val=train.mean()
train = train.fillna(fill_val)
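
The column means are stored in `fill_val` so that the same values can be reused on unseen data. A minimal sketch with hypothetical toy frames (`train_toy` and `test_toy` are illustrative names):

```python
import numpy as np
import pandas as pd

train_toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})
fill_val = train_toy.mean()            # column means, NaN ignored: a=2.0, b=5.0
train_toy = train_toy.fillna(fill_val)

# On new data, reuse the stored training means rather than recomputing them.
test_toy = pd.DataFrame({"a": [np.nan], "b": [7.0]})
test_toy = test_toy.fillna(fill_val)
print(test_toy.iloc[0].tolist())       # [2.0, 7.0]
```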

We standardize the features, then compute the principal components and reduce the feature space to 60 dimensions using sklearn.

In [None]:
pca_components = 60
sc = StandardScaler().fit(train.to_numpy())
train = sc.transform(train.to_numpy())
pca = PCA(n_components = pca_components).fit(train)
train=pca.transform(train)
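
One way to sanity-check a choice of component count is to look at the fraction of variance the retained components explain. The sketch below uses random synthetic data (not the competition data), so the number it prints says nothing about the real features; `X`, `latent`, and `n_keep` are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 observed features driven by 5 latent factors plus noise.
latent = rng.normal(size=(1000, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(1000, 20))

X_scaled = StandardScaler().fit_transform(X)
n_keep = 5
pca = PCA(n_components=n_keep).fit(X_scaled)

# Fraction of total variance retained by the first n_keep components.
retained = float(pca.explained_variance_ratio_.sum())
X_reduced = pca.transform(X_scaled)
print(X_reduced.shape, round(retained, 3))
```

On the real data, the same attribute on the fitted `pca` object would show how much variance the 60 retained components capture.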

Finally, we convert the reduced features to a tensor on the training device.

In [None]:
train = torch.tensor(train,dtype=torch.float,device=dev)

# Model

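We will make a very simple model at first using PyTorch. The idea is to have fully connected layers deal with all of the floating point features, while feature_0 is used in an embedding layer. The two outputs have the same width, so they are summed and passed through a final linear layer and a sigmoid.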

In [None]:
e_size = 64
fc_input = pca_components
h_dims = [512,512,256,128]
dropout_rate = 0.5
epochs = 200
minibatch_size = 100000

class MarketPredictor(nn.Module):
    def __init__(self):
        super(MarketPredictor, self).__init__()
        
        self.e = nn.Embedding(2,e_size)
        self.deep = nn.Sequential(
            nn.Linear(fc_input,h_dims[0]),
            nn.BatchNorm1d(h_dims[0]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[0],h_dims[1]),
            nn.BatchNorm1d(h_dims[1]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[1],h_dims[2]),
            nn.BatchNorm1d(h_dims[2]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[2],h_dims[3]),
            nn.BatchNorm1d(h_dims[3]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[3],e_size),
            nn.BatchNorm1d(e_size),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate)
            )
        self.reduce = nn.utils.weight_norm(nn.Linear(e_size,1))
        self.sig = nn.Sigmoid()
        
    def forward(self,xi,xf):
        e_out = self.e(xi)
        f_out = self.deep(xf)
        ef_out = self.reduce(e_out+f_out)
        sig_out = self.sig(ef_out)
        
        return sig_out
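
The sketch below checks the tensor shapes in this design on a reduced, hypothetical version of the network (`TinyPredictor`, with one hidden layer and no batch norm or dropout; it is not the model above). The embedding of the binary index and the output of the dense branch both have width `e_size`, so they can be summed before the final linear layer.

```python
import torch
import torch.nn as nn

class TinyPredictor(nn.Module):
    """Hypothetical cut-down variant used only to illustrate the shapes."""
    def __init__(self, n_features=6, e_size=4):
        super().__init__()
        self.e = nn.Embedding(2, e_size)   # one vector per value of the binary index
        self.deep = nn.Sequential(nn.Linear(n_features, e_size), nn.LeakyReLU())
        self.reduce = nn.Linear(e_size, 1)

    def forward(self, xi, xf):
        # (batch, e_size) + (batch, e_size) -> (batch, e_size) -> (batch, 1)
        return torch.sigmoid(self.reduce(self.e(xi) + self.deep(xf)))

xi = torch.tensor([0, 1, 1])   # integer index derived from feature_0
xf = torch.randn(3, 6)         # floating point features
out = TinyPredictor()(xi, xf)
print(out.shape)               # torch.Size([3, 1])
```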
        

Now we train it. Let's define the loss function first. In the competition we're told that the return on day $i$ is
\begin{equation}
p_i = \sum_j (\mathit{weight}_{ij}*\mathit{resp}_{ij}*\mathit{action}_{ij})
\end{equation}
The way we've made the network, it gives a sigmoidal output $s_{ij} \in [0,1]$, which we use in place of the binary action. Let's define the cost function
\begin{equation}
C = \sum_i c_i = -\sum_{i,j} (\mathit{weight}_{ij}*\mathit{resp}_{ij}*s_{ij}).
\end{equation}
Minimizing $C$ corresponds to maximizing the total return $\sum_i p_i$ with the binary action replaced by $s_{ij}$. The advantage is that $C$ has finite gradients with respect to the model parameters, and so should work better with SGD.

We will also train on shuffled minibatches, which may help against overfitting.

In [None]:
def loss(s,wr):
    return - torch.dot(s,wr)
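
As a worked example with made-up numbers (the three values of `wr` and `s` below are hypothetical), the loss and its gradient with respect to the outputs can be checked by hand. The gradient is simply `-wr`, so trades with positive `weight * resp` push the output up and trades with negative values push it down.

```python
import torch

def loss(s, wr):
    # Negative weighted return, with the sigmoid output s in place of the binary action
    return -torch.dot(s, wr)

wr = torch.tensor([2.0, -1.0, 4.0])                     # weight * resp for three hypothetical trades
s = torch.tensor([1.0, 0.0, 0.5], requires_grad=True)   # model outputs in [0, 1]

c = loss(s, wr)
c.backward()
print(c.item())   # -(2*1 + (-1)*0 + 4*0.5) = -4.0
print(s.grad)     # equals -wr
```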

Let's instantiate the model on the chosen device and create an Adam optimizer with default settings.

In [None]:
model = MarketPredictor().to(dev)
opt = optim.Adam(model.parameters())

In [None]:
minibatches = batch_size//minibatch_size

for i in range(epochs):
    permutation = torch.randperm(batch_size)
    print('Epoch is',i,'/',epochs)
    for j in range(minibatches):
        # Indices of the j-th minibatch in this epoch's shuffled order
        idx = permutation[j*minibatch_size:(j+1)*minibatch_size]
        opt.zero_grad()
        s = model(itensor[idx],train[idx])
        c = loss(s.squeeze(),wrtensor[idx])
        c.backward()
        opt.step()
    # Loss of the last minibatch in this epoch
    print('Loss is',c.item())
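
A small sketch of the minibatch indexing used in the loop, with toy sizes chosen for illustration. Because the number of minibatches is computed with integer division, the samples left over after the last full minibatch are skipped in that epoch; a fresh permutation each epoch means different samples are skipped each time.

```python
import torch

batch_size, minibatch_size = 10, 4             # toy sizes for illustration
minibatches = batch_size // minibatch_size     # 2 full minibatches
permutation = torch.randperm(batch_size)

seen = []
for j in range(minibatches):
    idx = permutation[j * minibatch_size:(j + 1) * minibatch_size]
    seen.extend(idx.tolist())

# 8 distinct indices are visited; the remaining 2 samples are skipped this epoch.
print(len(seen), len(set(seen)))   # 8 8
```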

# Saving the model
It's pretty easy to save a PyTorch model. We save the state dict of the model with `torch.save`, which uses pickle under the hood.

In [None]:
path = 'marketpredictor_state_dict_'+str(epochs)+'epochs.pt'
torch.save(model.state_dict(),path)

We will also need the standard scaler and PCA objects, as well as `maxindex` and `fill_val`, when we run things for submission later.

In [None]:
with open('feature_processing.pkl','wb') as f:
    pickle.dump([sc,pca,maxindex,fill_val],f)
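
For a later submission run, the saved objects have to be reloaded and applied in the same order as during training. The sketch below shows the round trip on synthetic data, with a stand-in `nn.Linear` model and in-memory buffers instead of files; all names other than the library calls are illustrative. Calling `eval()` before inference matters for the real network, because it contains dropout and batch normalization layers.

```python
import io
import pickle

import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                   # synthetic training features
sc = StandardScaler().fit(X)
pca = PCA(n_components=3).fit(sc.transform(X))
model = nn.Linear(3, 1)                         # stand-in for the real network

# Save: preprocessing objects via pickle, weights via torch.save.
prep_bytes = pickle.dumps([sc, pca])
buf = io.BytesIO()
torch.save(model.state_dict(), buf)

# Load and apply to new rows in the same order: scale, project, predict.
sc2, pca2 = pickle.loads(prep_bytes)
restored = nn.Linear(3, 1)
buf.seek(0)
restored.load_state_dict(torch.load(buf))
restored.eval()                                 # inference mode for dropout and batch norm

x_new = rng.normal(size=(5, 8))
feats = torch.tensor(pca2.transform(sc2.transform(x_new)), dtype=torch.float)
with torch.no_grad():
    pred = torch.sigmoid(restored(feats))
print(pred.shape)                               # torch.Size([5, 1])
```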