## Multilayer Perceptron & Autoencoder

This notebook presents a novel architecture of multilayer perceptron and autoencoder inspired by a [top-1 solution](https://www.kaggle.com/c/jane-street-market-prediction/discussion/224348) for Jane Street market competition. Both components of the system are training together on the same data. The latent-space representation produced by Autoencoder is concatenated with original set of attributes and fed into MLP.

### Backlog:
1. <s> Concatenate output of AE with the input (use latent representation for a new features) </s>
2. <s> Try Swish (SiLU) activation function </s>
3. Add Gaussian noise layer before encoder for data augmentation
4. Add target information to autoencoder (supervised learning) to force it to generate more relevant features, and to create a shortcut for backpropagation of gradient
5. Hyperparameter optimization

### 0. Prerequisites

In [1]:
import os 
import time
import torch
import optuna
import joblib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from scipy import stats
from tqdm import tqdm
from dotenv import load_dotenv
from torch import nn
from torch.optim import lr_scheduler
from collections import OrderedDict

from src.metrics import pearson_metric
from src.torch_models import MLPAE
from src.data import Dataset, load_data

In [2]:
load_dotenv()

True

In [3]:
DEVICE = "cuda:0"
EPOCHS = 30
EXPERIMENT = "MLP+AE-baseline"

### 1. Data preparation

In [4]:
trainloader, _ = load_data(use_feather=True, split_data=False)

Loading took 7.08 seconds


### 2. Building a Model.

In [5]:
model = MLPAE(input_dim=301, mlp_depth=3, activation=nn.SiLU).to(DEVICE)

In [6]:
def train(model, criterion, loader, optimizer, investment_id_dropout=0.01, device='cpu'):
    model.to(device)
    model.train()
    
    train_loss = 0.0
    accuracy = []
    for i, (x, y) in enumerate(loader):
        x[:, 0] *= (torch.rand(len(x)) > investment_id_dropout)
        optimizer.zero_grad()
        x, y = x.to(device), y.to(device)
        x_pred, y_pred = model(x)
        
        loss_ae = criterion(x, x_pred)
        loss_mlp = criterion(y, y_pred.view(-1))
        
        loss = loss_ae + loss_mlp        
        
        loss.backward()
        optimizer.step()
        
        loss_ae = loss_ae.item()
        loss_mlp = loss_mlp.item()
        train_loss += loss.item()
    
    losses = {
        'ae': loss_ae / len(loader),
        'mlp': loss_mlp / len(loader),
        'ov': train_loss / len(loader)
    }
        
    return losses

In [7]:
def test(model, criterion, loader, investment_id_dropout=0.01, device='cpu'):
    model.eval()
    test_loss = 0.0
    accuracy = []
    with torch.no_grad():
        for i, (x, y) in enumerate(loader):
            x[:, 0] *= (torch.rand(len(x)) > investment_id_dropout)
            optimizer.zero_grad()
            x, y = x.to(device), y.to(device)
            x_pred, y_pred = model(x)

            loss_ae = criterion(x, x_pred)
            loss_mlp = criterion(y, y_pred.view(-1))

            loss = loss_ae + loss_mlp  
            
            loss_ae = loss_ae.item()
            loss_mlp = loss_mlp.item()
            test_loss += loss.item()
    
    losses = {
        'ae': loss_ae / len(loader),
        'mlp': loss_mlp / len(loader),
        'ov': test_loss / len(loader)
    }
        
    return losses

### 3. Training the Model.

In [8]:
experiment_dir = os.path.join("weights", EXPERIMENT)
if not os.path.exists(experiment_dir):
    os.makedirs(experiment_dir)

In [9]:
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
criterion = nn.MSELoss()

In [10]:
losses = []
for epoch in range(EPOCHS):
    start_execution = time.time()
    train_losses = train(model, criterion, trainloader, optimizer, device=DEVICE)
    # test_losses = test(model, criterion, testloader, device=DEVICE)    
    scheduler.step()
    # Test AE: {test_losses['ae']:.5f} MLP: {test_losses['mlp']:.5f} OV: {test_losses['ov']:.5f} |     
    print(f"Epoch: {epoch+1:02d} ({time.time()-start_execution:.1f} s.) | Train AE: {train_losses['ae']:.5f} MLP: {train_losses['mlp']:.5f} OV: {train_losses['ov']:.5f} |")
    
    losses.append(train_losses['ov'])
    if train_losses['ov'] <= min(losses):
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': train_losses['ov'], 
        }, os.path.join(experiment_dir, f"{epoch}.pt"))

Epoch: 01 (31.6 s.) | Train AE: 0.04024 MLP: 0.00779 OV: 7924.75691 |
Epoch: 02 (29.8 s.) | Train AE: 0.00781 MLP: 0.00760 OV: 2.56608 |
Epoch: 03 (29.9 s.) | Train AE: 0.00783 MLP: 0.00769 OV: 2.02736 |
Epoch: 04 (30.5 s.) | Train AE: 0.00781 MLP: 0.00700 OV: 2.01057 |
Epoch: 05 (30.6 s.) | Train AE: 0.00789 MLP: 0.00701 OV: 2.01423 |
Epoch: 06 (29.9 s.) | Train AE: 0.00777 MLP: 0.00695 OV: 1.99301 |
Epoch: 07 (30.5 s.) | Train AE: 0.00774 MLP: 0.00698 OV: 2.00704 |
Epoch: 08 (30.5 s.) | Train AE: 0.00778 MLP: 0.00692 OV: 1.97015 |
Epoch: 09 (30.8 s.) | Train AE: 0.00776 MLP: 0.00677 OV: 1.97704 |
Epoch: 10 (30.8 s.) | Train AE: 0.00772 MLP: 0.00701 OV: 1.97547 |
Epoch: 11 (30.7 s.) | Train AE: 0.00774 MLP: 0.00704 OV: 1.96363 |
Epoch: 12 (30.7 s.) | Train AE: 0.00765 MLP: 0.00675 OV: 1.96397 |
Epoch: 13 (30.4 s.) | Train AE: 0.00763 MLP: 0.00674 OV: 1.95850 |
Epoch: 14 (30.2 s.) | Train AE: 0.00764 MLP: 0.00688 OV: 1.95734 |
Epoch: 15 (30.5 s.) | Train AE: 0.00767 MLP: 0.00674 OV: 1.