# SKORCH EXAMPLE ON TPS-05 - SIMPLE NN START WITH FEW LINES OF CODE

<div class="alert alert-warning">
The goal of skorch is to make it possible to use PyTorch with sklearn. This is achieved by providing a wrapper around PyTorch that has an sklearn interface. In that sense, skorch is the spiritual successor to nolearn, but instead of using Lasagne and Theano, it uses PyTorch. Skorch does not re-invent the wheel, instead getting as much out of your way as possible. If you are familiar with sklearn and PyTorch, you don’t have to learn any new concepts, and the syntax should be well known. (If you’re not familiar with those libraries, it is worth getting familiarized.)
</div>
<div><br></div>
<div align="center"><img src="https://skorch.readthedocs.io/en/stable/_static/logo.svg"></div>



The goal of this notebook is to show how to build a very simple neural network from a PyTorch model using skorch's scikit-learn wrapper. Contents:

<ul>
    <li>Install skorch</li>
    <li>Prepare data</li>
    <li>Define a simple PyTorch Sequential model</li>
    <li>Define skorch wrapper</li>
    <li>Create simple scikit-learn Pipeline</li>
    <li>Search for NN hyperparameters using GridSearchCV</li>
    <li>Callbacks implemented</li>
    <ul>
        <li>EarlyStopping</li>
        <li>Learning rate scheduler</li>
    </ul>
    <li>Gridsearch - searching for best Network Architecture</li>
    <ul>
        <li>Search for the best nonlinearity module using GridSearchCV</li>
        <li>Module configuration (e.g. batch normalization on/off)</li>
    </ul>
</ul>

<div class="alert alert-info">
    <strong>Important links:</strong>
<ul>
    <li><a href="https://skorch.readthedocs.io/en/stable/">Skorch documentation</a></li>
    <li><a href="https://github.com/skorch-dev/skorch">Skorch repo</a></li>
</ul>
</div>

In [None]:
!pip install skorch -q 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import GridSearchCV


import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F

import warnings
warnings.filterwarnings("ignore")

from skorch import NeuralNetClassifier
from skorch.callbacks import EpochScoring

device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.manual_seed(0)

## MINIMAL DATASET PREPARATION

In [None]:
train = pd.read_csv("../input/tabular-playground-series-may-2021/train.csv", index_col = 'id')
test = pd.read_csv("../input/tabular-playground-series-may-2021/test.csv", index_col = 'id').values.astype('float32')

X = train.drop('target', axis = 1).values.astype('float32')

lencoder = LabelEncoder()
y = lencoder.fit_transform(train['target']).astype('int64')
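
The two casts above matter: PyTorch's `nn.Linear` holds float32 weights by default, and its classification losses expect int64 class indices. Here is a minimal sketch of the same preparation on a toy frame. The column values are made up for illustration and are not the competition data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for train.csv; the values are hypothetical.
toy = pd.DataFrame({
    "feature_0": [0, 3, 1, 2],
    "feature_1": [5, 0, 2, 1],
    "target": ["Class_2", "Class_1", "Class_4", "Class_2"],
})

# Features as float32, the default dtype of PyTorch layers.
X_toy = toy.drop("target", axis=1).values.astype("float32")

# String labels become integer codes; classes_ keeps the sorted label names.
encoder = LabelEncoder()
y_toy = encoder.fit_transform(toy["target"]).astype("int64")

print(X_toy.dtype, y_toy.dtype)
print(list(encoder.classes_))
print(y_toy)
```

Because `LabelEncoder` sorts the labels, column `k` of `predict_proba` corresponds to `encoder.classes_[k]`.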

## LET'S DEFINE NN MODEL (WE USE PYTORCH WAY OF DEFINING MODEL)

In [None]:
num_features = 50

block_config = {
        'fc_block1':{
            'values':[num_features, 128]},
        'fc_block2':{
            'values':[num_features, 128, 64]},
        'fc_block3':{
            'values':[num_features, 256, 128, 64]},
        'fc_block4':{  
            'values':[num_features, 512, 256, 128, 64]} 
    }


def linear_block(in_features, out_features, p_drop, nonlinear, batch_norm, *args, **kwargs):
    
    layers = []
    layers.append(nn.Linear(in_features, out_features))
    if batch_norm:
        layers.append(nn.BatchNorm1d(out_features))
    layers.append(nonlinear)
    layers.append(nn.Dropout(p = p_drop))    
    
    return nn.Sequential(*layers)

class TPS05Classification(nn.Module):
    def __init__(self, num_class = 4, dropout = 0.3, nonlinear = nn.ReLU(), block = 1, batch_norm = True):
        super(TPS05Classification, self).__init__()
        
        self.non_linear = nonlinear
        
        self.lin_sizes = block_config['fc_block'+str(block)]['values']
        
        lin_blocks = [linear_block(in_f, out_f, dropout, self.non_linear, batch_norm) 
                      for in_f, out_f in zip(self.lin_sizes, self.lin_sizes[1:])]
        
        self.linear = nn.Sequential(*lin_blocks)
        
        self.out = nn.Sequential(
            nn.Linear(block_config['fc_block'+str(block)]['values'][-1], num_class))
    
    def forward(self, x):
        x = self.linear(x)
        return  F.softmax(self.out(x), dim = -1)
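
To see what `zip(self.lin_sizes, self.lin_sizes[1:])` produces, here is a small pure-Python sketch using the widths of `fc_block3`. The parameter count is only a rough illustration: it covers the `nn.Linear` layers of the hidden blocks and ignores the BatchNorm parameters and the output layer.

```python
# Same layer widths as block_config['fc_block3'] in the cell above.
sizes = [50, 256, 128, 64]

# Pair each width with the next one: (in_features, out_features).
pairs = list(zip(sizes, sizes[1:]))
print(pairs)

# Weights plus biases of the hidden nn.Linear layers only.
n_params = sum(n_in * n_out + n_out for n_in, n_out in pairs)
print(n_params)
```

Each pair becomes one `linear_block`, so a list of N widths yields N - 1 hidden blocks.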

## SKORCH SCIKIT-LEARN WRAPPER 

### CALLBACKS

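Instead of searching for the learning rate and max_epochs with GridSearchCV, I decided to use callbacks: a scheduler lowers the learning rate when the validation loss stops improving, and early stopping ends training once it stalls. Skorch supports a wide range of callbacks (you can even write a custom one).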

#### BUILT-IN

In [None]:
from skorch.callbacks import LRScheduler, EarlyStopping
from torch.optim.lr_scheduler import ReduceLROnPlateau

lr_scheduler = LRScheduler(policy = ReduceLROnPlateau, monitor = 'valid_loss', mode = 'min', patience = 3, factor = 0.1, verbose = True)
early_stopping = EarlyStopping(monitor='valid_loss', patience = 10, threshold = 0.0001, threshold_mode='rel', lower_is_better=True)
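
With the settings above, early stopping ends training once `valid_loss` has gone 10 epochs without a relative improvement of 0.0001. The sketch below simulates that rule in plain Python. It is a simplified illustration, not skorch's implementation; the function name `stopping_epoch` and the loss curve are made up for this example.

```python
def stopping_epoch(valid_losses, patience=10, threshold=1e-4):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified relative-threshold early stopping for positive losses;
    a hypothetical helper, not part of skorch.
    """
    bar = float("inf")  # a loss must fall below this to count as an improvement
    misses = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < bar:
            misses = 0
            bar = loss * (1 - threshold)  # next improvement must beat this
        else:
            misses += 1
        if misses == patience:
            return epoch
    return None


# Hypothetical curve: three improving epochs, then a plateau.
losses = [1.30, 1.20, 1.10] + [1.10] * 12
print(stopping_epoch(losses, patience=10))
```

The scheduler uses a shorter patience (3) than early stopping (10), so the learning rate gets a chance to drop before training is abandoned.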

#### CUSTOM

This is a very simple example showing how to hook into two stages, on_epoch_end and on_train_end, but skorch offers more hooks: https://skorch.readthedocs.io/en/stable/callbacks.html#

In [None]:
from skorch.callbacks import Callback


class TPS05CustomCallback(Callback):
    """Track the epoch with the lowest validation loss and report it when training ends."""

    def initialize(self):
        # Reset the tracked state whenever skorch initializes the callback.
        self.best_log_loss_ = float('inf')
        self.best_epoch_ = 0
        return self

    def on_epoch_end(self, net, **kwargs):
        valid_loss = net.history[-1, 'valid_loss']
        if valid_loss < self.best_log_loss_:
            self.best_log_loss_ = valid_loss
            self.best_epoch_ = len(net.history)

    def on_train_end(self, net, **kwargs):
        print(f">>>> Training end. The best log_loss: {self.best_log_loss_} on epoch: {self.best_epoch_} <<<< \n")

### SKORCH - SCIKIT PYTORCH WRAPPER

In [None]:
# No additional parameters - we will find them using GridSearchCV

net = NeuralNetClassifier(TPS05Classification, device = device, lr = 0.001, max_epochs = 50, callbacks = [lr_scheduler, early_stopping, TPS05CustomCallback])

## SCIKIT-LEARN PIPELINE

In [None]:
steps = [('scaler', StandardScaler()), ('net', net)]
pipeline = Pipeline(steps)

## LET'S DEFINE GRIDSEARCH PARAMETERS

In [None]:
grid_params = {
    # For the first two params we used Callbacks 
    #'net__max_epochs':[20, 40], 
    #'net__lr': [0.001, 0.0001], 
    'net__module__dropout': [0.2, 0.3],
    'net__optimizer': [optim.Adam, optim.RMSprop], 
    'net__module__block': [2, 3],
    'net__module__nonlinear': [nn.ReLU(), nn.Softmax(dim = 1)], # we can play with network architecture as well 
    'net__module__batch_norm': [True, False] # BatchNormalization test
} 

grid_net = GridSearchCV(pipeline, grid_params, refit = True, cv = 3, scoring = 'neg_log_loss', verbose = 1)
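
Before starting the search it helps to know how many trainings it will trigger. The sketch below counts them with scikit-learn's `ParameterGrid`. The strings are placeholders for the optimizer classes and activation modules, used here only so the example runs without PyTorch.

```python
from sklearn.model_selection import ParameterGrid

# String placeholders stand in for optim.Adam, nn.ReLU(), and so on.
toy_grid = {
    "net__module__dropout": [0.2, 0.3],
    "net__optimizer": ["Adam", "RMSprop"],
    "net__module__block": [2, 3],
    "net__module__nonlinear": ["ReLU", "Softmax"],
    "net__module__batch_norm": [True, False],
}

n_candidates = len(ParameterGrid(toy_grid))  # every combination of values
n_folds = 3                                  # cv = 3
n_fits = n_candidates * n_folds + 1          # +1 for the final refit on all data
print(n_candidates, n_fits)
```

Every extra two-valued parameter doubles the candidate count, which is one more reason to leave the learning rate and epoch count to the callbacks.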

## LET'S TRAIN THE NETWORK AND FIND THE BEST HYPERPARAMETERS

In [None]:
result = grid_net.fit(X,y)

## SHOW BEST PARAMETERS FOR CURRENT NN ARCHITECTURE

In [None]:
print(grid_net.best_params_)

## SHOW MODEL RANKING (TOP 5)

In [None]:
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  results['mean_test_score'][candidate],
                  results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")

report(grid_net.cv_results_,5)

## SHOW THE BEST ESTIMATOR CONFIGURATION

In [None]:
grid_net.best_estimator_

## PLOT BEST ESTIMATOR LEARNING CURVES

In [None]:
best_net = grid_net.best_estimator_.named_steps['net']  # the fitted skorch net inside the pipeline
epochs = list(range(len(best_net.history)))
train_loss = best_net.history[:, 'train_loss']
valid_loss = best_net.history[:, 'valid_loss']

In [None]:
plt.plot(epochs,train_loss,'g-');
plt.plot(epochs,valid_loss,'r-');
plt.title('Training Loss Curves');
plt.xlabel('Epochs');
plt.ylabel('Log Loss');
plt.legend(['Train','Validation']);

## PREDICT 

In [None]:
y_pred = grid_net.predict_proba(test)

## ... AND SUBMIT

In [None]:
sub = pd.read_csv("../input/tabular-playground-series-may-2021/sample_submission.csv")

predictions_df = pd.DataFrame(y_pred, columns = ["Class_1", "Class_2", "Class_3", "Class_4"])
predictions_df['id'] = sub['id']

In [None]:
predictions_df.head(5)

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import itertools

palette = itertools.cycle(sns.color_palette())

plt.figure(figsize=(16, 8))
for i in range(4):
    plt.subplot(2, 2, i+1)
    c = next(palette)
    sns.histplot(predictions_df, x = f'Class_{i+1}', color=c)
plt.suptitle("Class prediction distribution")

In [None]:
predictions_df.drop("id", axis=1).describe().T.style.bar(subset=['mean'], color='#205ff2')\
                            .background_gradient(subset=['std'], cmap='Reds')\
                            .background_gradient(subset=['50%'], cmap='coolwarm')

In [None]:
predictions_df.to_csv("skorch_nn_tutorial_submission.csv", index = False)

BONUS (for LB score lovers) - What if we blend the NN predictions with my blender database? :)

In [None]:
# GREAT kernel: https://www.kaggle.com/lazaro97/tps-may-stacking-blending-pseudolabelling 

blend_l1 = pd.read_csv("../input/tps-may-stacking-blending-pseudolabelling/sub_lb_0.09080355083449096_0.7.csv")


output = predictions_df.copy()
output["Class_1"] = (predictions_df.Class_1 * 0.3 + blend_l1.Class_1 * 0.7)
output["Class_2"] = (predictions_df.Class_2 * 0.3 + blend_l1.Class_2 * 0.7)
output["Class_3"] = (predictions_df.Class_3 * 0.3 + blend_l1.Class_3 * 0.7) 
output["Class_4"] = (predictions_df.Class_4 * 0.3 + blend_l1.Class_4 * 0.7) 
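
The four assignments above are a weighted average of two prediction tables. Here is a compact sketch of the same idea on toy data. The probabilities are made up, and it assumes both tables list their rows in the same `id` order.

```python
import numpy as np
import pandas as pd

class_cols = ["Class_1", "Class_2", "Class_3", "Class_4"]

# Two hypothetical prediction tables; each row sums to 1.
nn_pred = pd.DataFrame(
    [[0.10, 0.60, 0.20, 0.10], [0.25, 0.25, 0.25, 0.25]], columns=class_cols)
other_pred = pd.DataFrame(
    [[0.20, 0.50, 0.20, 0.10], [0.10, 0.70, 0.10, 0.10]], columns=class_cols)

# Weights that sum to 1 keep every blended row a valid probability distribution.
w_nn = 0.3
blended = w_nn * nn_pred[class_cols] + (1 - w_nn) * other_pred[class_cols]

print(blended.round(3))
print(np.allclose(blended.sum(axis=1), 1.0))
```

If the two files could be ordered differently, merging on `id` before blending would be the safer route.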

In [None]:
predictions_df = pd.DataFrame(output, columns = ["Class_1", "Class_2", "Class_3", "Class_4"])
predictions_df['id'] = sub['id']
predictions_df.to_csv("TPS-05-skorch_blended_submission.csv", index = False)