# PyTorch NN for tabular - step by step (day by day)
I am not a PyTorch Grandmaster; I just use it to solve problems. This notebook, PyTorch NN for tabular - step by step (day by day), is my contribution to TPS-05. It is meant to help people start building simple NN models in PyTorch. If you have any improvements to my code, please let me know and I will add them to the final notebook.

This notebook is intentionally updated every day, step by step ... we have time ... let's eat the elephant slice by slice ...

<div class="alert alert-info">
  <strong>People often ask a question - Keras/Tensorflow vs Pytorch?</strong>
 <div>The answer is simple: both. Many recent publications use both Keras and PyTorch. If you want to be flexible and understand how solutions work, you should know both. This is why I encourage you to start today ... and implement your first NN in PyTorch.</div>
</div>



Here you can find my fun (out-of-the-box) TPS-05 implementation in Keras: [CNN (2D Convolution) for solving TPS-05](https://www.kaggle.com/remekkinas/cnn-2d-convolution-for-solving-tps-05). As you can see, I use frameworks interchangeably, whichever is more convenient for me.

<div class="alert alert-success">
  <strong>Notebook scope and implementation schedule</strong>
    <ul>
        <li>Preparation - import modules, find device for torch, load TPS-05 data, create train and test dataset, create dataloader classes</li>
        <li>Define feed forward NN using Module, plot model</li>
        <li>Define feed forward NN using Sequential, criterion and optimization</li>
        <li>Build train and validation loop, metric functions</li>
        <li>Plot training metrics (integrate with Neptune.ai)</li>
        <li>Implementing callbacks</li>
        <li>Hyperparameter tuning - learning rate using Scheduler</li>
    </ul>
</div>

Great tutorials I recommend:
- [Deep Learning 60 min blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
- [Pytorch with examples](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html)
- [Deep Learning (with PyTorch)](https://github.com/Atcold/pytorch-Deep-Learning)
- [Deep learning for Pytorch](https://github.com/sgrvinod/Deep-Tutorials-for-PyTorch)
- [Awesome Pytorch](https://github.com/bharathgs/Awesome-pytorch-list#tutorials-books--examples)
- [Dive into Deep Learning](https://d2l.ai/index.html)
- [Deep Learning with PyTorch Step-by-Step](https://github.com/dvgodoy/PyTorchStepByStep)
- [Deep Learning with PyTorch](https://www.tomasbeuzen.com/deep-learning-with-pytorch/README.html)

Youtube:
- [PyTorch Tutorials - Complete Beginner Course](https://www.youtube.com/playlist?list=PLqnslRFeH2UrcDBWF5mfPGpqQDSta6VK4)
- [Pytorch tutorials](https://www.youtube.com/playlist?list=PLhhyoLH6IjfxeoooqP9rhU3HJIAVAJ3Vz)

Updated:
- New book came out - "Deep Learning with PyTorch Step-by-Step. A Beginner's Guide" by Daniel Voigt Godoy

![](https://d2sofvawe08yqg.cloudfront.net/pytorch/hero?1620637439)

**Day 1 (5.5.2021) - PREPARATION - import modules, find device for torch, load TPS-05 data, create train and test dataset, create dataloader classes**

# PREPARATION

In [None]:
!pip install torchviz -q

In [None]:
# We have to prepare for this journey ... importing modules is a great idea ... :)
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler  

# Pytorch modules
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F


from torchviz import make_dot  # only make_dot is used in this notebook
from tqdm.notebook import tqdm

In [None]:
train = pd.read_csv("../input/tabular-playground-series-may-2021/train.csv", index_col = 'id')
test = pd.read_csv("../input/tabular-playground-series-may-2021/test.csv", index_col = 'id')

TARGET = 'target'
RANDOM_STATE = 2021

In [None]:
# Duplicates in the dataset? They are noise ... so we drop them ...
# I found this thanks to @omarvivas: https://www.kaggle.com/c/tabular-playground-series-may-2021/discussion/236561

train = train[~train.drop('target', axis = 1).duplicated()]
train.shape

In [None]:
X = pd.DataFrame(train.drop("target", axis = 1))

lencoder = LabelEncoder()
y = pd.DataFrame(lencoder.fit_transform(train['target']), columns=['target'])

In [None]:
# We use stratify ... to ensure that we have the same class representation in each dataset
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, stratify=y, random_state= RANDOM_STATE)

sns.countplot(x = TARGET, data= y)
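
To see what `stratify` does, here is a small sketch on made-up labels (the `toy_*` names and the 80/20 class mix are assumptions for illustration, not TPS-05 data). It checks that the share of the rare class is the same in both parts of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels (made up): 80 samples of class 0, 20 samples of class 1
toy_y = np.array([0] * 80 + [1] * 20)
toy_X = np.arange(100).reshape(-1, 1)

_, _, toy_y_train, toy_y_valid = train_test_split(
    toy_X, toy_y, test_size=0.2, stratify=toy_y, random_state=2021
)

# With 0/1 labels the mean is simply the share of class 1
train_share = toy_y_train.mean()
valid_share = toy_y_valid.mean()
print(train_share, valid_share)
```

Without `stratify` the two shares would usually differ a little, which matters more when some classes are rare.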

In [None]:
# NNs usually train better on inputs in the 0-1 range ... so we scale our dataset (fit on train only, then reuse the same scaling on validation)
scaler = MinMaxScaler()

X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
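
A minimal sketch of why we call `fit_transform` on the training part and only `transform` on the validation part. The tiny arrays and `toy_*` names are made up for illustration. The scaler learns min and max from the training data, so validation values outside that range can land outside 0-1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

toy_train = np.array([[0.0], [5.0], [10.0]])
toy_valid = np.array([[2.5], [12.0]])

toy_scaler = MinMaxScaler()
toy_train_scaled = toy_scaler.fit_transform(toy_train)  # learns min=0 and max=10 from train
toy_valid_scaled = toy_scaler.transform(toy_valid)      # reuses the same min and max

print(toy_train_scaled.ravel())
print(toy_valid_scaled.ravel())
```

The value 12.0 is larger than anything seen in training, so its scaled value is above 1. That is expected and usually harmless.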

In [None]:
# Then we temporarily convert our dataset to NumPy arrays ... after that we will use torch data structures
X_train, y_train = np.array(X_train, dtype= np.float32), y_train['target'].values 
X_valid, y_valid = np.array(X_valid, dtype= np.float32), y_valid['target'].values

# PYTORCH FOR TABULAR (MULTICLASS) - STEP BY STEP

In [None]:
# Here we define all params for the rest of the notebook

BATCH_SIZE = 64
NUM_FEATURES = len(train.columns)-1
NUM_CLASSES = 4
NUM_EPOCHS = 100

## DEVICE (CPU/GPU)

In [None]:
# A torch tensor is like a NumPy array, but it can also live on a GPU ... so we detect the device (GPU if available) and later move the model and data to it to speed up training

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

# If the GPU is enabled this prints cuda:0 (recommended)

## DATASET CLASSES

In [None]:
# We define a very simple Dataset class for TPS-05 (three methods are required)

class TPS05Dataset(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

# As you can see we create torch tensors here - later, in the training loop, each batch is moved to the device
train_dataset = TPS05Dataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train).long())
valid_dataset = TPS05Dataset(torch.from_numpy(X_valid).float(), torch.from_numpy(y_valid).long())

## DATA LOADERS

In [None]:
# Data loaders 
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE)

valid_loader = DataLoader(dataset=valid_dataset, batch_size=1)
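
For simple cases like ours, PyTorch also ships a ready-made `TensorDataset` that does the same job as the custom class above. The sketch below uses random toy tensors (the `toy_*` names and sizes are assumptions) and shows how a `DataLoader` cuts 10 rows into batches of 4.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

toy_features = torch.randn(10, 3)          # 10 rows, 3 features
toy_labels = torch.randint(0, 4, (10,))    # 10 class labels in 0..3

toy_dataset = TensorDataset(toy_features, toy_labels)
# shuffle=True reshuffles the rows every epoch (a common choice for training data)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

batch_sizes = [len(xb) for xb, yb in toy_loader]
print(len(toy_dataset), batch_sizes)
```

The last batch is smaller because 10 is not divisible by 4. A larger `batch_size` for the validation loader would also make validation faster, since nothing is learned there.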

In [None]:
# You can test it if you want - take the first batch (size = BATCH_SIZE) and print its shape

dataiter = iter(train_loader)
train_features, train_labels = next(dataiter)  # iterator.next() no longer works in recent PyTorch versions
print('Batch #1')
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
print("First row from batch #1")
print(train_features[0])

# take the next batch
train_features, train_labels = next(dataiter)
print('\nBatch #2')
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
print("First row from batch #2")
print(train_features[0])

**Day 2 (6.5.2021) - DEFINE FEED FORWARD NN MODEL (using Module)**


## DEFINE FEED FORWARD NN MODEL


In [None]:
# Let's use the model from: https://www.kaggle.com/subinium/tps-may-deeplearning-pipeline-for-beginner as a benchmark model.

#model = Sequential([
#        Dense(512, input_dim=num_features, activation='relu'),
#        BatchNormalization(),
#        Dropout(0.3),
    
#        Dense(256, activation='relu'),
#        BatchNormalization(),
#        Dropout(0.2),
    
#        Dense(128, activation='relu'),
#        BatchNormalization(),
#        Dropout(0.2),
    
#        Dense(num_classes, activation='softmax')
#    ]) 


# There are many ways to create models in PyTorch, using:
# - Module,
# - Sequential, 
# - ModuleList,
# - ModuleDict

# This is a starter, so we use the first way: Module

class TPS05ClassificationModule(nn.Module):
    def __init__(self, num_feature, num_class):
        super(TPS05ClassificationModule, self).__init__()
        
        self.layer_1 = nn.Linear(num_feature, 512)
        self.layer_2 = nn.Linear(512, 256)
        self.layer_3 = nn.Linear(256, 128)
        self.layer_out = nn.Linear(128, num_class)
        
        torch.nn.init.xavier_normal_(self.layer_1.weight)
        torch.nn.init.xavier_normal_(self.layer_2.weight)
        torch.nn.init.xavier_normal_(self.layer_3.weight)
        torch.nn.init.xavier_normal_(self.layer_out.weight)
        
        self.dropout_1 = nn.Dropout(p=0.3)
        self.dropout_2 = nn.Dropout(p=0.2)
        
        self.batchnorm_1 = nn.BatchNorm1d(512)
        self.batchnorm_2 = nn.BatchNorm1d(256)
        self.batchnorm_3 = nn.BatchNorm1d(128)
        
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim = 1)
    
    def forward(self, x):
        x = self.layer_1(x)
        #x = self.batchnorm_1(x)
        x = F.relu(x)   # Second one using torch.nn.functional
        x = self.dropout_1(x)
        
        x = self.layer_2(x)
        #x = self.batchnorm_2(x)
        x = self.relu(x)   # ReLU as in the benchmark model (softmax belongs only at the output)
        x = self.dropout_2(x)
        
        x = self.layer_3(x)
        #x = self.batchnorm_3(x)
        x = self.relu(x)
        x = self.dropout_2(x)
        
        # No softmax here: nn.CrossEntropyLoss expects raw scores (logits)
        x = self.layer_out(x)
        return x
    
    
# Day 3 (7.5.2021) - using Sequential
# And ... what do you think ... is it better? :)

def linear_block(in_features, out_features, p_drop, *args, **kwargs):
    return nn.Sequential(
        nn.Linear(in_features, out_features),
        #nn.BatchNorm1d(out_features),
        nn.ReLU(),
        nn.Dropout(p = p_drop)
    )

class TPS05ClassificationSeq(nn.Module):
    def __init__(self, num_feature, num_class):
        super(TPS05ClassificationSeq, self).__init__()
        
        self.linear = nn.Sequential(
            linear_block(num_feature, 100, 0.3),
            linear_block(100, 250, 0.3),
            linear_block(250, 128, 0.3),
        )
        
        self.out = nn.Sequential(
            nn.Linear(128, num_class)
        )
    
    def forward(self, x):
        x = self.linear(x)
        return self.out(x)

# Day 4 (8.5.2021) - using Dynamic Sequential 
class TPS05ClassificationDynSeq(nn.Module):
    def __init__(self, num_feature, num_class):
        super(TPS05ClassificationDynSeq, self).__init__()
        
        self.lin_sizes = [num_feature, 64, 32, 128]
        self.p_drops = [0.3, 0.2, 0.2]  # dropout probability for each block
        
        lin_blocks = [linear_block(in_f, out_f, p_drop) 
                      for in_f, out_f, p_drop in zip(self.lin_sizes, self.lin_sizes[1:], self.p_drops)]
        
        self.linear = nn.Sequential(*lin_blocks)
        
        self.out = nn.Sequential(
            nn.Linear(128, num_class)
        )
    
    def forward(self, x):
        x = self.linear(x)
        return self.out(x)
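
The list of options above also mentions `ModuleList` and `ModuleDict`, which the models in this notebook do not use. Here is a minimal sketch of both. The class name `ToyModuleListNet`, the `hidden_sizes` argument and the `act` argument are hypothetical names chosen for this example, not part of the notebook's models.

```python
import torch
import torch.nn as nn

class ToyModuleListNet(nn.Module):
    def __init__(self, num_feature, num_class, hidden_sizes=(64, 32)):
        super().__init__()
        sizes = [num_feature, *hidden_sizes]
        # ModuleList registers every layer, so model.parameters() can find them
        self.hidden = nn.ModuleList(
            [nn.Linear(in_f, out_f) for in_f, out_f in zip(sizes, sizes[1:])]
        )
        # ModuleDict lets us pick a sub-module by name at call time
        self.activations = nn.ModuleDict({"relu": nn.ReLU(), "tanh": nn.Tanh()})
        self.out = nn.Linear(sizes[-1], num_class)

    def forward(self, x, act="relu"):
        for layer in self.hidden:
            x = self.activations[act](layer(x))
        return self.out(x)

toy_net = ToyModuleListNet(num_feature=50, num_class=4)
toy_out = toy_net(torch.randn(8, 50))
print(toy_out.shape)
```

A plain Python list of layers would not be registered with the module, so the optimizer would never see those weights. That is the main reason to reach for `ModuleList`.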

In [None]:
# Create model using Module 
modelMod = TPS05ClassificationModule(num_feature = NUM_FEATURES, num_class=NUM_CLASSES)
# Then push it to the device (CPU/GPU)
modelMod.to(device)

# model.eval() switches some layers (Dropout, BatchNorm etc.) to inference mode; the training loop calls model.train() again
modelMod.eval()

# Whenever you want you can print model 
print(modelMod)

In [None]:
# Create model using Sequential 
modelSeq = TPS05ClassificationSeq(num_feature = NUM_FEATURES, num_class=NUM_CLASSES)
# Then push it to the device (CPU/GPU)
modelSeq.to(device)

# model.eval() switches some layers (Dropout, BatchNorm etc.) to inference mode; the training loop calls model.train() again
modelSeq.eval()

In [None]:
# Create model using Dynamic Sequential 
modelDynSeq = TPS05ClassificationDynSeq(num_feature = NUM_FEATURES, num_class=NUM_CLASSES)
# Then push it to the device (CPU/GPU)
modelDynSeq.to(device)

# model.eval() switches some layers (Dropout, BatchNorm etc.) to inference mode; the training loop calls model.train() again
modelDynSeq.eval()
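
To make the effect of `train()` and `eval()` visible, here is a small sketch with a single Dropout layer (the `toy_*` names are made up). In training mode roughly half of the values are zeroed and the survivors are scaled up. In evaluation mode the layer passes the input through unchanged.

```python
import torch
import torch.nn as nn

toy_dropout = nn.Dropout(p=0.5)
toy_input = torch.ones(1000)

toy_dropout.train()
train_out = toy_dropout(toy_input)   # random zeros, the rest scaled by 1 / (1 - p) = 2

toy_dropout.eval()
eval_out = toy_dropout(toy_input)    # dropout is inactive, output equals input

print((train_out == 0).sum().item(), "values dropped in train mode")
print(torch.equal(eval_out, toy_input))
```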

## VISUALIZE MODELS

### CREATING BY MODULE

In [None]:
# Generate random array (torch)
x = torch.randn(1, NUM_FEATURES).to(device)

# Pass through model (named `out` so the label DataFrame `y` is not overwritten)
out = modelMod(x)

# Visualize
make_dot(out.mean(), params=dict(modelMod.named_parameters()))

### CREATING BY SEQUENTIAL

In [None]:
out = modelSeq(x)
make_dot(out.mean(), params=dict(modelSeq.named_parameters()))

### CREATING BY DYNAMIC SEQUENTIAL

In [None]:
out = modelDynSeq(x)
make_dot(out.mean(), params=dict(modelDynSeq.named_parameters()))

**Day 3 (7.5.2021) - DEFINE CRITERION AND OPTIMIZER**

## DEFINE CRITERION

In [None]:
# Loss function -> CrossEntropy 
# This criterion combines LogSoftmax and NLLLoss in one single class.
# It is useful when training a classification problem with C classes. 

criterion = nn.CrossEntropyLoss()
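
A quick check of the statement that `CrossEntropyLoss` combines LogSoftmax and NLLLoss. The logits and targets below are made up for the example. This is also why our models return raw scores from the last layer and do not apply softmax themselves.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
toy_logits = torch.randn(5, 4)                  # 5 samples, 4 classes, raw scores
toy_targets = torch.tensor([0, 3, 1, 2, 1])     # class indices

loss_ce = nn.CrossEntropyLoss()(toy_logits, toy_targets)
loss_manual = F.nll_loss(F.log_softmax(toy_logits, dim=1), toy_targets)

print(loss_ce.item(), loss_manual.item())
print(torch.allclose(loss_ce, loss_manual))
```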

## DEFINE OPTIMIZER

In [None]:
# Let's choose one model (as you remember, we created three models in different ways)
model = modelSeq

In [None]:
# When training starts, the network weights are randomly initialized.
# They are then updated after every batch so that the loss goes down (and, we hope, accuracy goes up).
# This is an optimization problem: the goal is to minimize the loss function and find good weights. 
# The method that performs these updates is called the optimizer.

# Define the learning rate -> later (on day 8) we will tune it during hyperparameter optimization
LEARNING_RATE = 0.001
optimizer = optim.Adam(model.parameters(), lr = LEARNING_RATE)

# 2021.05.17
# This is a part of NN optimization 
from torch.optim.lr_scheduler import ReduceLROnPlateau
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 3)
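
The zero_grad / backward / step cycle that the optimizer drives can be followed by hand on a one-parameter problem. Note that this sketch swaps Adam for plain SGD so the arithmetic is easy to check. The parameter `w`, the loss `w**2` and the learning rate 0.1 are assumptions made for illustration.

```python
import torch
import torch.optim as optim

w = torch.tensor([5.0], requires_grad=True)
toy_opt = optim.SGD([w], lr=0.1)

for step in range(3):
    toy_opt.zero_grad()        # clear old gradients
    loss = (w ** 2).sum()      # toy loss with its minimum at w = 0
    loss.backward()            # gradient is 2 * w
    toy_opt.step()             # w <- w - lr * gradient
    print(step, w.item())
```

Each step multiplies `w` by 0.8 (5.0, then 4.0, 3.2, 2.56), so the loss shrinks steadily. Adam follows the same cycle but adapts the step size per parameter.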

**Day 4 (8.5.2021) - DEFINE TRAIN AND VALIDATION LOOP, METRIC FUNCTIONS**

## DEFINE TRAIN AND VALIDATION LOOP, METRIC FUNCTIONS

In [None]:
accuracy_stat = {'train': [],"validation": []}
loss_stat = {'train': [], "validation": [] }

def acc_calc(y_pred, y_test):
    # The predicted class is the index of the largest score (softmax would not change which one is largest)
    y_pred_tags = torch.argmax(y_pred, dim = 1)    
    
    # Share of correct predictions in this batch
    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)
    
    acc = torch.round(acc * 100)
    
    return acc
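
Here is the same accuracy idea worked through on four made-up predictions (the `toy_*` tensors are assumptions for illustration): take the index of the largest score in each row and compare it with the true label.

```python
import torch

toy_logits = torch.tensor([[2.0, 0.1, 0.0, -1.0],
                           [0.0, 0.2, 3.0, 0.1],
                           [0.5, 0.4, 0.3, 0.2],
                           [0.0, 0.0, 0.0, 1.0]])
toy_labels = torch.tensor([0, 2, 1, 3])

toy_pred = torch.argmax(toy_logits, dim=1)   # predicted class per row
toy_acc = (toy_pred == toy_labels).float().mean().item() * 100
print(toy_pred.tolist(), toy_acc)
```

Three of the four rows are predicted correctly (the third row predicts class 0 instead of 1), so the accuracy is 75%.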

## DEFINE SIMPLE CALLBACK 

In [None]:
# Let's create a simple PyTorch callback 

class EarlyStoppingCallback:   
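    def __init__(self, min_delta = 0.1, patience = 5):
        
        self.min_delta = min_delta      # smallest decrease in loss that counts as an improvement
        self.patience = patience        # number of epochs without improvement we tolerate
        self.best_epoch_score = None    # best (lowest) validation loss seen so far
        
        self.attempt = 0
        self.stop_training = False
        
        
    def __call__(self, validation_loss):

        self.epoch_score = validation_loss

        # First call: nothing to compare with yet (None is a safer marker than 0, which is a valid loss)
        if self.best_epoch_score is None:
            self.best_epoch_score = self.epoch_score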
        elif self.epoch_score > self.best_epoch_score - self.min_delta:
            self.attempt += 1
            print(f'Message from callback (Early Stopping) counter: {self.attempt}/{self.patience}')
            if self.attempt >= self.patience:
                self.stop_training = True
        else:
            self.best_epoch_score = self.epoch_score
            self.attempt = 0
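
To follow the patience logic without training anything, here is a pure Python sketch that applies the same rule to a made-up list of validation losses. The function `epochs_until_stop` and the loss values are hypothetical and exist only for this example.

```python
def epochs_until_stop(val_losses, min_delta=0.001, patience=2):
    """Return the epoch at which early stopping triggers, or None if it never does."""
    best = None
    attempts = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best is None or loss <= best - min_delta:
            # Improvement: remember the new best loss and reset the counter
            best = loss
            attempts = 0
        else:
            # No sufficient improvement this epoch
            attempts += 1
            if attempts >= patience:
                return epoch
    return None

toy_losses = [1.20, 1.10, 1.05, 1.06, 1.05, 1.00]
stop_epoch = epochs_until_stop(toy_losses)
print(stop_epoch)
```

The loss improves for three epochs, then fails to beat 1.05 by at least `min_delta` twice in a row, so training stops at epoch 5 and never sees the better value at epoch 6. A larger `patience` trades training time for a lower chance of stopping too early.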

In [None]:

# This is training and validation loop
# for each epoch
def train_nn():
    for progress in tqdm(range(1, NUM_EPOCHS+1)):

        train_epoch_loss = 0
        train_epoch_acc = 0

        model.train()

        # We loop over training dataset using batches (we use DataLoader to load data with batches)
        for X_train_batch, y_train_batch in train_loader:
            X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)

            # Clear gradients
            optimizer.zero_grad()

            # Forward pass ->>>>
            y_train_pred = model(X_train_batch)

            # Compute loss and accuracy for this batch
            train_loss = criterion(y_train_pred, y_train_batch)
            train_acc = acc_calc(y_train_pred, y_train_batch)

            # backward <------    
            train_loss.backward()

            # Update the parameters (weights and biases)
            optimizer.step()

            train_epoch_loss += train_loss.item()
            train_epoch_acc += train_acc.item()


        #  Then we validate our model - concept is the same
        with torch.no_grad():

            val_epoch_loss = 0
            val_epoch_acc = 0

            model.eval()
            for X_val_batch, y_val_batch in valid_loader:
                X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)

                y_val_pred = model(X_val_batch)

                val_loss = criterion(y_val_pred, y_val_batch)
                val_acc = acc_calc(y_val_pred, y_val_batch)

                val_epoch_loss += val_loss.item()
                val_epoch_acc += val_acc.item()

        # end of validation loop
        early_stopping_callback(val_epoch_loss/len(valid_loader))
        if early_stopping_callback.stop_training:
            print(f'Training stopped -> Early Stopping Callback : validation_loss: {val_epoch_loss/len(valid_loader)}')
            break

        loss_stat['train'].append(train_epoch_loss/len(train_loader))
        loss_stat['validation'].append(val_epoch_loss/len(valid_loader))
        accuracy_stat['train'].append(train_epoch_acc/len(train_loader))
        accuracy_stat['validation'].append(val_epoch_acc/len(valid_loader))                           
        
        
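        # 2021.05.17 
        # This is a part of NN optimization
        # The scheduler was created with mode 'min', so it has to watch the validation loss (not accuracy)
        clr = optimizer.param_groups[0]['lr']        
        scheduler.step(val_epoch_loss/len(valid_loader))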

        print(f'Epoch { progress + 0:03}: Loss: [Train: {train_epoch_loss/len(train_loader):.5f} | Validation: {val_epoch_loss/len(valid_loader):.5f} ] Accuracy: [Train: {train_epoch_acc/len(train_loader):.3f} | Validation: {val_epoch_acc/len(valid_loader):.3f}] LR: {clr}')

In [None]:
# I created a function for training - it will be more flexible during NN tuning 
early_stopping_callback = EarlyStoppingCallback(0.001, 5)

train_nn()

**Day 5 (9.5.2021) - PLOT TRAINING METRICS (integrate with Neptune.ai)**

## PLOT TRAINING METRICS

In [None]:
# First define DataFrames with our data from training

df_train_va = pd.DataFrame.from_dict(accuracy_stat).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
df_train_vl = pd.DataFrame.from_dict(loss_stat).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

In [None]:
# Then plot two charts for Train/Val 
#   - Accuracy per epoch
#   - Loss

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,7))

sns.lineplot(data = df_train_va, x = "epochs", y="value", hue="variable",  ax=axes[0]).set_title('Train - Validation Accuracy/Epoch')
sns.lineplot(data = df_train_vl, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train - Validation Loss/Epoch')

## PREDICT AND SUBMIT
Time to predict and submit our first NN TPS-05 predictions (!)

In [None]:
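# First we have to forward pass our test data. To do this we have to transform it and send it to the device (CPU/GPU) 
# eval() keeps Dropout switched off and no_grad() skips gradient tracking, which we do not need for prediction
model.eval()
with torch.no_grad():
    tensor_preds = model(torch.from_numpy(scaler.transform(test)).float().to(device))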

# As we can see we got a Tensor (!). This is the raw output of our last layer. These are not probabilities ...
# In the next days I will show you how to do it in a much more elegant way :) 
# We will change the Network class in two ways (a change in the architecture definition and one more method for prediction)

# Let's look at the first prediction
tensor_preds[0]

In [None]:
# Now it is time to convert our last layer tensor output to probabilities, send it to the CPU and convert it to NumPy

nn_preds = torch.nn.functional.softmax(tensor_preds, dim=1).cpu().detach().numpy() 
nn_preds[0]
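
What softmax does to one row of raw scores can be seen in a tiny sketch (the four numbers are made up): every value becomes positive, the row sums to 1, and the order of the classes does not change.

```python
import torch
import torch.nn.functional as F

toy_logits = torch.tensor([[1.0, 2.0, 0.5, -1.0]])   # raw scores for 4 classes
toy_probs = F.softmax(toy_logits, dim=1)              # dim=1 normalizes across the classes

print(toy_probs)
print(toy_probs.sum(dim=1))
```

The competition metric is a log loss on probabilities, so submitting the raw scores instead would not make sense.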

In [None]:
sub = pd.read_csv("../input/tabular-playground-series-may-2021/sample_submission.csv")

predictions_df = pd.DataFrame(nn_preds, columns = ["Class_1", "Class_2", "Class_3", "Class_4"])
predictions_df['id'] = sub['id']

In [None]:
# Let's look at the first predictions
predictions_df.head(5)

In [None]:
# Let's look at the submission predictions and their distribution

predictions_df.drop("id", axis=1).describe().T.style.bar(subset=['mean'], color='#205ff2')\
                            .background_gradient(subset=['std'], cmap='Reds')\
                            .background_gradient(subset=['50%'], cmap='coolwarm')

In [None]:
# Submit

predictions_df.to_csv("pytorch_nn_tutorial_submission.csv", index = False)

### 08.05.2021 - The first submission scores 1.10542. It is OK, but we will make it better in the next days. I am sure this will be the best NN submission for TPS-05. 

# NN OPTIMIZATION

What can we do to optimize a NN? This is a very broad topic, enough for several notebooks. Before we start optimizing the network (here I will show only 2-3 selected elements), let's think about what we can do:

- Configure Nodes and Layers
- Optimize Gradient algorithm 
- Optimize Batch Size
- Optimize Loss Function
- Configure Speed of Learning
- Data preparation
- Vanishing Gradient / Gradient Clipping / Batch Normalization / Dropout
- Transfer Learning
- Regularization
- ...


In this tutorial I will show:
1. Configure Speed of Learning 
2. Configure Nodes and Layers

17.05.2021
## Configure Speed of Learning

To adjust the learning rate during training we can use one of the schedulers from `torch.optim.lr_scheduler` in PyTorch (https://pytorch.org/docs/stable/optim.html):

* LambdaLR()
* MultiplicativeLR()
* StepLR()
* MultiStepLR()
* ExponentialLR()
* CosineAnnealingLR()
* ReduceLROnPlateau()
* CyclicLR()
* OneCycleLR()

In our tutorial we use ReduceLROnPlateau as an example. So our notebook requires two changes:

**1. Define scheduler**

<code>from torch.optim.lr_scheduler import ReduceLROnPlateau
optimizer = optim.Adam(model.parameters(), lr = LEARNING_RATE)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 3) </code>

**2. Change training loop**

<code>clr = optimizer.param_groups[0]['lr']        
scheduler.step(val_epoch_loss/len(valid_loader))</code>


<div class="alert alert-info">
  <strong>All changes have been introduced in the notebook - just play with the parameters and see how the learning behaves</strong> 
</div>
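
To watch ReduceLROnPlateau react without training a real network, the sketch below feeds it a made-up sequence of validation losses that stops improving. The tiny `toy_model` and the loss values are assumptions for illustration. `factor=0.1` is written out explicitly and matches the PyTorch default.

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

toy_model = nn.Linear(2, 1)
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)
toy_scheduler = ReduceLROnPlateau(toy_optimizer, mode='min', factor=0.1, patience=3)

# Made-up validation losses: one improvement, then a plateau
fake_val_losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]

lr_history = []
for fake_loss in fake_val_losses:
    toy_scheduler.step(fake_loss)    # with mode='min' we pass the value we want to go down
    lr_history.append(toy_optimizer.param_groups[0]['lr'])

print(lr_history)
```

The learning rate stays at 0.001 while the scheduler waits through the plateau and is then multiplied by `factor` once the number of epochs without improvement exceeds `patience`. If you prefer to monitor accuracy instead of loss, create the scheduler with `mode='max'`.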

TO BE CONTINUED ... 