# Let's talk about layers, activations and losses

### There are different types of Layers in a Neural Network

The most important and widely used are:

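* Linear layers : apply an affine transformation (weights and a bias) to the input features
* Convolution layers : slide learnable filters over the input to pick up local patterns
* Pooling layers : downsample their input, for example by taking the max or the average over a window
* Dropout layers : randomly zero some elements during training to reduce overfitting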

We will be using Linear layers and Dropout layers to build a simple shallow neural network
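
As a quick illustration of the two layer types we will use, here is a minimal sketch. The feature counts (10 and 4) and the batch size are arbitrary values picked for the example, not values from this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(8, 10)           # a batch of 8 rows with 10 features each

linear = nn.Linear(10, 4)        # affine map: y = x @ W.T + b
dropout = nn.Dropout(p=0.5)      # zeroes each element with probability 0.5

h = linear(x)
print(h.shape)                   # torch.Size([8, 4])

dropout.train()                  # training mode: dropout is active
h_train = dropout(h)             # surviving values are scaled by 1 / (1 - p)

dropout.eval()                   # evaluation mode: dropout does nothing
h_eval = dropout(h)
print(torch.equal(h_eval, h))    # True
```

This is why the training and validation functions later call model.train() and model.eval(): dropout behaves differently in the two modes.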

### The different Activation functions

The activation function we use depends on the task we perform:

__Non-linear activation functions__:
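* nn.ReLU : max(0, x), zeroes out negative values
* nn.Sigmoid : squashes values into (0, 1)
* nn.Tanh : squashes values into (-1, 1)

__Other non-linear activation functions (they normalize across a dimension)__:
* nn.Softmax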
* nn.Softmin
* nn.LogSoftmax
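
To see how these modules behave, here is a small sketch on a made-up input row. The first three act element by element, while Softmax and LogSoftmax work across a whole dimension.

```python
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, 0.0, 2.0]])   # one row with three made-up values

relu_out = nn.ReLU()(x)                # negatives become 0
sigmoid_out = nn.Sigmoid()(x)          # each value squashed into (0, 1)
tanh_out = nn.Tanh()(x)                # each value squashed into (-1, 1)

probs = nn.Softmax(dim=1)(x)           # normalizes across dim 1
log_probs = nn.LogSoftmax(dim=1)(x)    # log of the softmax

print(relu_out)
print(sigmoid_out)
print(tanh_out)
print(probs.sum(dim=1))                # each row sums to 1
```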

### The different loss functions

The most widely used loss functions are:
* nn.MSELoss : Mean Squared Error Loss function
* nn.CrossEntropyLoss : This criterion computes the cross entropy loss between input and target.
* nn.BCELoss : Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities
* nn.BCEWithLogitsLoss : This loss combines a Sigmoid layer and the BCELoss in one single class.
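
A small sketch of the last two entries, on made-up logits and labels: nn.BCEWithLogitsLoss on raw outputs should give the same value as nn.BCELoss on sigmoid-ed outputs, up to floating point error.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.5], [-0.5], [0.2]])   # raw model outputs (made up)
targets = torch.tensor([[1.0], [0.0], [1.0]])   # binary labels as floats

loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_on_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_with_logits.item(), loss_on_probs.item())
```

Because the sigmoid is folded into the loss, a model trained with nn.BCEWithLogitsLoss outputs raw logits and needs no sigmoid layer at the end.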

For more information, see the [PyTorch nn documentation](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity).


# Let's start building a simple NN

We will build a simple NN from scratch so that we can experiment with it later :)

Let's get started

# Please DO UPVOTE if you like :)

# Importing dependencies

In [None]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import datatable as dt
from sklearn.model_selection import train_test_split
from torch.optim import AdamW, lr_scheduler
from tqdm.notebook import tqdm_notebook
from tqdm import tqdm
import math
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader
import time
import gc
from sklearn.metrics import roc_auc_score
import pickle
import os

# The sigmoid function for the outputs

In [None]:
def sigmoid(x):
    return 1 / (1 + math.exp(-x))
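
The scalar version above handles one value at a time, and math.exp overflows for very negative inputs (roughly below -709). As an alternative sketch, here is a vectorized NumPy variant. The name stable_sigmoid is made up for this example and is not used elsewhere in the notebook.

```python
import numpy as np

def stable_sigmoid(x):
    """Element-wise sigmoid for scalars or arrays that never calls exp on a positive number."""
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(-np.abs(x))                 # z is always in (0, 1], so no overflow
    # For x >= 0: 1 / (1 + exp(-x)); for x < 0: exp(x) / (1 + exp(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

out = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(out)   # approximately 0, 0.5 and 1
```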

# Loss Function : Binary Cross Entropy Loss

In [None]:
def loss_fn(outputs, targets):
    loss = nn.BCEWithLogitsLoss()(outputs, targets.view(-1,1))
    return loss

# AUC metric to measure model performance

In [None]:
def metrics(targets, outputs):
    auc = roc_auc_score(targets, outputs)
    return auc
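
AUC depends only on how the scores rank the samples, so it can be computed on raw logits just as well as on probabilities. A small sketch with made-up values:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

targets = np.array([0, 0, 1, 1])
logits = np.array([-2.0, 0.3, -0.1, 1.7])     # raw, un-squashed scores (made up)
probs = 1.0 / (1.0 + np.exp(-logits))         # the same scores after a sigmoid

auc_logits = roc_auc_score(targets, logits)
auc_probs = roc_auc_score(targets, probs)
print(auc_logits, auc_probs)                  # 0.75 0.75
```

Three of the four (positive, negative) pairs are ranked correctly, hence 0.75 in both cases.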

# Config

In [None]:
class Config:
    epochs = 10
    scheduler = 'CosineAnnealingLR'
    batch_size = 10240
    early_stopping_epochs = 2
    lr = 1e-5
    weight_decay = 0.01

# Train and Validation Dataset

In [None]:
class Dataset:
    def __init__(self, X, y):
        self.X = X
        self.y = y.values
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return {
            'X' : torch.tensor(self.X[idx], dtype=torch.float),
            'targets' : torch.tensor(self.y[idx], dtype=torch.float)
        }
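
To see what a DataLoader does with a dataset like this one, here is a self-contained sketch. ToyDataset is a hypothetical stand-in with the same interface as the class above, and the data is random.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

class ToyDataset:
    """Hypothetical stand-in: returns one row and its target as a dict of tensors."""
    def __init__(self, X, y):
        self.X, self.y = X, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return {
            'X': torch.tensor(self.X[idx], dtype=torch.float),
            'targets': torch.tensor(self.y[idx], dtype=torch.float),
        }

X = np.random.rand(10, 3)               # 10 rows, 3 features
y = np.random.randint(0, 2, size=10)    # random binary targets

loader = DataLoader(ToyDataset(X, y), batch_size=4)
shapes = [(tuple(b['X'].shape), tuple(b['targets'].shape)) for b in loader]
print(shapes)   # [((4, 3), (4,)), ((4, 3), (4,)), ((2, 3), (2,))]
```

The DataLoader stacks the per-row dicts into one dict per batch, and the last batch is smaller when the dataset size is not a multiple of the batch size.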

# Prediction Dataset

To learn more about Datasets and DataLoaders see this notebook : []()

In [None]:
class PredDataset:
    def __init__(self, X):
        self.X = X
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return {
            'X' : torch.tensor(self.X[idx], dtype=torch.float)
        }

# Create your Model

### The model creation is simple...
Just follow these 3 steps:
1. Create the **Model** class and inherit from **nn.Module**
2. Create the **\_\_init\_\_** method : define the layers you want to use
3. Create the **forward** function for the forward propagation of the NN and return the output

### Another interesting thing you can see:

I have used **nn.LazyLinear** instead of **nn.Linear** as the input layer.

The reason: I am too lazy :)

### So what does a LazyLinear layer do?

Easy. Yeah, it makes our life so easy. **No need** to wander around calculating the number of input features for the model. O.o

A LazyLinear layer only takes the number of **out_features** as an argument; the number of input features is inferred from the first batch passed through it. :)

Done. Now if you want to change the number of input features, excluding some columns... No need to worry :D
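
A minimal sketch of this shape inference. The 100 input features and the batch of 32 rows are arbitrary values for illustration.

```python
import torch
import torch.nn as nn

layer = nn.LazyLinear(128)          # only out_features is given
x = torch.randn(32, 100)            # 32 rows with 100 features each

out = layer(x)                      # the first forward pass infers in_features

print(layer.in_features)            # 100
print(layer.weight.shape)           # torch.Size([128, 100])
print(out.shape)                    # torch.Size([32, 128])
```

Until that first forward pass, the weights have no shape at all, so anything that inspects parameter shapes has to wait until a batch has gone through the layer.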

In [None]:
class TPSModel(nn.Module):
    def __init__(self, args):
        super(TPSModel, self).__init__()
        self.args = args
        self.linear = nn.Linear(128, 128)
        self.lazylinear = nn.LazyLinear(128)
        self.silu = nn.SiLU()
        self.dropout = nn.Dropout(0.5)
        self.output = nn.Linear(128, 1)
    
    # The forward Function
    def forward(self, x):
        x = self.lazylinear(x)
        x = self.silu(x)
        x = self.linear(x)
        x = self.dropout(x)
        x = self.output(x)
        
        return x

# Creating the training Epoch

In this step we give the model some inputs, get some outputs and compute the loss for each batch :)

We then backpropagate this loss through the NN and step the optimizer and the scheduler accordingly. The function returns the running average loss for the epoch :)

Easy right?
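
Stripped of the progress bar and the metrics, a single training step looks roughly like this. It is a CPU-only sketch with a tiny stand-in model and random data, not the model from this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(5, 1)                                # tiny stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
criterion = nn.BCEWithLogitsLoss()

X = torch.randn(16, 5)                                 # random inputs
targets = torch.randint(0, 2, (16, 1)).float()         # random binary targets

before = model.weight.detach().clone()

optimizer.zero_grad()                   # 1. clear gradients from the previous step
loss = criterion(model(X), targets)     # 2. forward pass and loss
loss.backward()                         # 3. backpropagate
optimizer.step()                        # 4. update the weights

print(loss.item())
print(torch.equal(before, model.weight.detach()))   # False: the weights have moved
```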

In [None]:
def train_epoch(args, dataloader, model, optimizer, scheduler, epoch):
    
    model.train()
    
    epoch_loss = 0.0
    running_loss = 0.0
    dataset_size=0
    running_auc=0
    batch_size = args.batch_size
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:
        optimizer.zero_grad()
        
        X = data['X'].cuda()
        batch_size = X.size(0)  # actual size of this batch (the last one may be smaller)
        targets = data['targets'].cuda()
        outputs = model(X)
        
        loss = loss_fn(outputs.view(-1,1), targets)
        loss.backward()
        
        optimizer.step()
        if scheduler is not None:
            scheduler.step()
            
        running_loss += (loss.item() * batch_size)
        dataset_size += batch_size
        epoch_loss = running_loss / dataset_size
        auc = metrics(targets.cpu().detach().numpy(), outputs.cpu().detach().numpy())
        running_auc += auc * batch_size
        epoch_auc = running_auc / dataset_size
        bar.set_postfix(Epoch=epoch, Stage='Training', Train_Loss=epoch_loss,
                        AUC=epoch_auc)
    gc.collect()
    return epoch_loss

# Validation Epoch

We calculate the validation loss and AUC, without any backpropagation, and return both :)

In [None]:
def validation(args, dataloader, model, epoch):
    
    model.eval()
    
    epoch_loss = 0.0
    running_loss = 0.0
    dataset_size=0
    batch_size = args.batch_size
    running_auc = 0
    counter=0
    with torch.no_grad():
        bar = tqdm(enumerate(dataloader), total=len(dataloader))
        for step, data in bar:

            X = data['X'].cuda()
            batch_size = X.size(0)  # actual size of this batch (the last one may be smaller)
            targets = data['targets'].cuda()
            outputs = model(X)

            loss = loss_fn(outputs.view(-1,1), targets)

            running_loss += (loss.item() * batch_size)
            dataset_size += batch_size
            epoch_loss = running_loss / dataset_size
            auc = metrics(targets.cpu().detach().numpy(), outputs.cpu().detach().numpy())
            counter+=1
            
            running_auc += auc * batch_size
            epoch_auc = running_auc / dataset_size
            
            bar.set_postfix(Epoch=epoch, AUC=epoch_auc, Valid_Loss=epoch_loss, Stage='Validation')
    gc.collect()
    return epoch_loss, epoch_auc

# Prediction Loop :)
Predicts the outputs for the test set and returns them :D

In [None]:
def predict(args, dataloader, model):
    print('-'*20,'Predicting for Submission','-'*20)
    model.eval()
    all_outputs=[]

    with torch.no_grad():
        bar = tqdm(enumerate(dataloader), total=len(dataloader))
        for step, data in bar:

            X = data['X'].cuda()
            outputs = model(X)
            outputs = outputs.cpu().detach().numpy()
            all_outputs.append(outputs)
            bar.set_postfix(Stage='Prediction')
    gc.collect()
    return np.vstack(all_outputs)

# Optimizer

In [None]:
def get_optimizer(args, params):
    opt = AdamW(params, lr=args.lr, weight_decay=args.weight_decay)
    return opt

# Scheduler

In [None]:
def get_scheduler(args, optimizer):
    if args.scheduler == 'CosineAnnealingLR':
        scheduler = lr_scheduler.CosineAnnealingLR(optimizer,T_max=500, 
                                                   eta_min=1e-6)
    else:
        scheduler = None
    return scheduler
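
To get a feel for this schedule, the sketch below records the learning rate over 1000 scheduler steps with the same lr, T_max and eta_min as above, using a dummy parameter instead of a real model. Since train_epoch calls scheduler.step() once per batch, T_max here counts batches, not epochs.

```python
import torch
from torch.optim import AdamW, lr_scheduler

param = torch.nn.Parameter(torch.zeros(1))     # dummy parameter, no real model needed
optimizer = AdamW([param], lr=1e-5)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=500, eta_min=1e-6)

lrs = []
for step in range(1001):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()       # the optimizer steps before the scheduler
    scheduler.step()

# Start, quarter period, bottom of the cosine, and back at the top
print(lrs[0], lrs[250], lrs[500], lrs[1000])
```

The learning rate falls from 1e-5 to 1e-6 over the first 500 steps and then climbs back up over the next 500, so with more than 500 batches in total the schedule is periodic rather than a single decay.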

# Reading the DF

In [None]:
pred_df = dt.fread('../input/tabular-playground-series-nov-2021/test.csv').to_pandas()

In [None]:
pred_df.head()

# Processing the test DF

In [None]:
xpred = pred_df.drop(['id'], axis=1)

# Main Training Loop

For each fold, this function trains the model. Each training epoch backpropagates the loss through the NN and returns it, then the validation loss and AUC are calculated. The model is saved whenever the validation AUC improves, and training stops early once it has failed to improve for a few epochs in a row.

Finally, the prediction is done for the test set.

In [None]:
def run(data, fold):
    
    if not os.path.isdir('standard_scaler'):
        os.mkdir('standard_scaler')
    if not os.path.isdir('models'):
        os.mkdir('models')
    
    print('-'*50)
    print(f'Fold : {fold}')
    print('-'*50)
    
    args = Config()
    start = time.time()
    model = TPSModel(args)
    model = model.cuda()
    
    optimizer = get_optimizer(args, model.parameters())
    scheduler = get_scheduler(args, optimizer)
    
    train = data[data['kfold']!=fold]
    valid = data[data['kfold']==fold]
    
    sc = StandardScaler()
    
    # Standardize the inputs so that no feature dominates the others just because of its scale
    xtrain = train.drop(['id', 'target', 'kfold'], axis=1)
    ytrain = train['target']
    xtrain = sc.fit_transform(xtrain)
    
    xtest = valid.drop(['id', 'target', 'kfold'], axis=1)
    ytest = valid['target']
    xtest = sc.transform(xtest)
    
    xpred_sc = sc.transform(xpred) 
    pred_dataset = PredDataset(xpred_sc)
    pred_loader = DataLoader(pred_dataset, batch_size = 2*args.batch_size)
    
    with open(f'standard_scaler/sc_fold_{fold}.pickle', 'wb') as handle:
        pickle.dump(sc, handle, protocol=pickle.HIGHEST_PROTOCOL)
    
    # Creating the datasets
    train_dataset = Dataset(xtrain, ytrain)
    valid_dataset = Dataset(xtest, ytest)
    
    # Creating the DataLoaders
    train_loader = DataLoader(train_dataset, batch_size=args.batch_size)
    valid_loader = DataLoader(valid_dataset, batch_size=2*args.batch_size)
    
    best_val_loss = np.inf
    patience_counter = 0
    best_auc = 0
    
    # Iterating through epochs
    for epoch in range(args.epochs):
        
        # Training
        train_loss = train_epoch(args, train_loader, model, optimizer, scheduler, epoch)
        
        # Validation
        valid_loss, val_auc = validation(args, valid_loader, model, epoch)
        
        if val_auc >= best_auc:
            patience_counter = 0
            print(f"Validation AUC improved from : ({best_auc} ---> {val_auc})")
            best_auc = val_auc

            PATH = f"models/model_fold_{fold}.bin"
            torch.save(model.state_dict(), PATH)
            print(f"----------Model Saved----------")
        
        
        # Early Stopping to prevent overfitting
        else:
            patience_counter += 1
            print(f'Early stopping counter {patience_counter} of {args.early_stopping_epochs}')
            if patience_counter == args.early_stopping_epochs:
                print('*************** Early Stopping ***************')
                break
    
    
    end = time.time()
    time_elapsed = end-start
    print('Training complete in {:.0f}h {:.0f}m {:.0f}s'.format(
        time_elapsed // 3600, (time_elapsed % 3600) // 60, (time_elapsed % 3600) % 60))
    print("Best AUC: {:.4f}".format(best_auc))
    
    
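    # Prediction: reload the best checkpoint saved above, so that we do not
    # predict with the weights of the last (possibly worse) epoch
    model.load_state_dict(torch.load(f"models/model_fold_{fold}.bin"))
    preds = predict(args, pred_loader, model)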
    
    del model, train_loader, valid_loader
    gc.collect()
    
    return preds
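
The early stopping logic inside run can be tried out on its own. The sketch below replays it on a made-up list of validation AUCs; the helper name epochs_run is hypothetical and exists only for this illustration.

```python
def epochs_run(val_aucs, patience=2):
    """Return (best_auc, number_of_epochs_run) under patience-based early stopping."""
    best_auc, counter = 0.0, 0
    for epoch, auc in enumerate(val_aucs, start=1):
        if auc >= best_auc:
            # Improvement: reset the counter (this is where run saves the model)
            best_auc, counter = auc, 0
        else:
            counter += 1
            if counter == patience:
                return best_auc, epoch      # stop early
    return best_auc, len(val_aucs)

result = epochs_run([0.70, 0.72, 0.71, 0.73, 0.72, 0.71, 0.74])
print(result)   # (0.73, 6)
```

Training stops after epoch 6 because epochs 5 and 6 both failed to beat 0.73, so the 0.74 of epoch 7 is never reached. That is the trade-off of a small patience value.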

In [None]:
df = dt.fread('../input/fold-is-power/5fold.csv').to_pandas()

In [None]:
df.head()

### Converting the targets from True-False to 1-0

In [None]:
# Take the single dummy column and cast it to int so the targets are really 1/0
df['target'] = pd.get_dummies(df['target'], drop_first=True).iloc[:, 0].astype(int)

# Run Training, Validation and Prediction

In [None]:
all_preds = 0
for fold in range(5):
    model_preds = run(df, fold=fold)
    all_preds = all_preds + model_preds/5

In [None]:
final_preds = [sigmoid(x) for x in np.hstack(all_preds)]
pred_df['target'] = final_preds
pred_df[['id', 'target']].to_csv('submission.csv', index=False)
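
Note that the folds are averaged in logit space and the sigmoid is applied once at the end. A small sketch of the same arithmetic, with made-up logits from 3 hypothetical folds and a vectorized sigmoid:

```python
import numpy as np

# Made-up logits from 3 folds for 4 test rows, each of shape (4, 1)
fold_preds = [
    np.array([[0.2], [-1.0], [2.0], [0.0]]),
    np.array([[0.4], [-0.8], [1.6], [0.2]]),
    np.array([[0.0], [-1.2], [2.4], [-0.2]]),
]

# Running mean over the folds, as in the training loop above
avg_logits = 0
for p in fold_preds:
    avg_logits = avg_logits + p / len(fold_preds)

# Vectorized sigmoid on the flattened average
probs = 1.0 / (1.0 + np.exp(-avg_logits.ravel()))

print(avg_logits.ravel())   # approximately 0.2, -1.0, 2.0, 0.0
print(probs)
```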
