# Mechanisms of Action Prediction using Autoencoders and Multi-Layer Perceptron

In this notebook, we will predict the mechanisms of action (MoA) of different drugs using the `lish-moa` dataset.

We shall use PCA to compress the features and extract more predictive signal from them, and label encode the categorical variables.

We will first build a denoising autoencoder that encodes the information in our features into a compact feature vector. On top of these encodings, we will build a multi-label classifier, which will be an MLP model. So, let's get started.



In [None]:
import sys
sys.path.append('../input/')

In [None]:
# `!export` only affects a throwaway subshell; the %env magic sets the variable for this kernel
%env CUDA_LAUNCH_BLOCKING=1

In [None]:
# import required libraries
import os
import numpy as np
import pandas as pd
import random

# For building multi-layer perceptron
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm as wnrm
import torch.nn.functional as F

# For preprocessing & manipulating data
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

# For plotting
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use("fivethirtyeight")

import warnings
warnings.filterwarnings("ignore")
print("Imported all necessary libraries.")

In [None]:
# Read the data 
base_path = "/kaggle/input/lish-moa/"
read_data = lambda x: pd.read_csv(f"{base_path}/{x}")

train_features = read_data("train_features.csv")
test_features  = read_data("test_features.csv")
train_targets  = read_data("train_targets_scored.csv")

In [None]:
train_features.head(4)

In [None]:
train_targets.head(4)

We do not need the id values for analysis, so let's store them and drop them from the main dataframes.

PS: Verify that the rows are in the same order in `train_features` and `train_targets` before dropping the ids.

In [None]:
train_ids = train_features.sig_id.copy()
train_target_ids = train_targets.sig_id.copy()
test_ids = test_features.sig_id.copy()

assert list(train_ids) == list(train_target_ids)

train_features = train_features.drop(columns = ["sig_id"])
train_targets = train_targets.drop(columns = ["sig_id"])
test_features = test_features.drop(columns = ["sig_id"])

We have three categorical variables cp_type, cp_time and cp_dose. Let's look at each of them in more detail.

In [None]:
sns.countplot(x = train_features.cp_type);

There are two groups of samples, the treatment group and the control group (`ctl_vehicle`). Nothing substantial should happen in the control group, so let's look at the targets of its entries.

In [None]:
control_group = list(train_features[train_features.cp_type == "ctl_vehicle"].index)
control_group_targets = train_targets.iloc[control_group, :]
np.any(control_group_targets.values)

As presumed, none of the target entries of the control group evaluates to 1, which means this group is not informative for our analysis. We can drop these rows and then the `cp_type` column itself.

In the test set, wherever we have `cp_type` category as `ctl_vehicle` we can manually override the predictions for all MoAs to be zeros. 

In [None]:
train_features = train_features[train_features.cp_type != "ctl_vehicle"]
train_targets = train_targets.iloc[list(train_features.index), :]

train_features.reset_index(inplace = True, drop = True)
train_targets.reset_index(inplace = True, drop = True)

In [None]:
train_features = train_features.drop(columns = ["cp_type"])
test_cp_types = test_features.cp_type.copy()
test_features = test_features.drop(columns = ["cp_type"])

In [None]:
sns.countplot(x = train_features.cp_dose);

Looks like roughly equal proportions of samples received doses `D1` and `D2`. Let's encode them as a binary column.

In [None]:
train_features["cp_dose"] = np.where(train_features.cp_dose == "D1", 1, 0)
test_features["cp_dose"]  = np.where(test_features.cp_dose == "D1", 1, 0)

In [None]:
sns.countplot(x = train_features.cp_time);

Similar to `cp_dose`, the samples look evenly split across the 24, 48 and 72 hour treatment durations. Let's label encode these as well.

In [None]:
cp_time_enc = LabelEncoder()
train_features["cp_time"] = cp_time_enc.fit_transform(train_features.cp_time)
test_features["cp_time"] = cp_time_enc.transform(test_features.cp_time)

Plot a random subset of the gene expression (`g-`) columns and have a look at their distributions in the train set.

In [None]:
gene_cols = [x for x in list(train_features.columns) if "g-" in x]
random.seed(10)
gene_cols = random.sample(gene_cols, 10)

fig, axes_ = plt.subplots(2, 5, figsize = (15, 6), sharey=True)
for idx, col in enumerate(gene_cols):
    
    r, c = idx // 5, idx % 5
    sns.distplot(train_features[col], ax = axes_[r][c])
    axes_[r][c].set_title(col, fontsize = 24)

fig.tight_layout();

Looks like most of these features are roughly normally distributed around zero, with values spread between about -10 and 10. Let's look at the cell viability (`c-`) columns.

In [None]:
# Pick a random subset of the cell viability (`c-`) columns
cell_cols = [x for x in list(train_features.columns) if "c-" in x]
random.seed(10)
cell_cols = random.sample(cell_cols, 10)

fig, axes_ = plt.subplots(2, 5, figsize = (15, 6), sharey=True)
for idx, col in enumerate(cell_cols):
    
    r, c = idx // 5, idx % 5
    sns.distplot(train_features[col], ax = axes_[r][c])
    axes_[r][c].set_title(col, fontsize = 24)

fig.tight_layout();

These features seem to have a bimodal distribution with 0 and -10 being the two potential modes for most of the features with 0 being the primary mode.

Since most of the numerical features are already constrained within comparatively close ranges, we will try working with them without worrying about any covariate shifts that might occur (as the effect will be minimal).

Let's split the data into train and validation sets. This will help us check the performance of our model on data it hasn't encountered during training.

In [None]:
trX, vlX, trY, vlY = train_test_split(train_features, train_targets, random_state = 2, test_size = .2)

In [None]:
# Reset the indices after creating the split
g = lambda x: x.reset_index(drop = True)
trX, vlX, trY, vlY = map(g, [trX, vlX, trY, vlY])

Collect the names of the feature columns that PCA and scaling will operate on.

In [None]:
num_cols = list(train_features.columns)

Use PCA for compressing and extracting relevant information from all the continuous features.

In [None]:
n_comps = 650
pca_object = PCA(n_components = n_comps)
print("Fitting PCA object")
pca_object.fit(trX[num_cols]);

In [None]:
fig, ax = plt.subplots(2, 1, figsize = (20, 10))
print("First twenty components and the percent of variance explained by them.")
print(np.cumsum(pca_object.explained_variance_ratio_)[:20])
sns.lineplot(y = pca_object.explained_variance_ratio_, x = range(n_comps), ax = ax[0])
ax[0].set_title("% of explained variance using PCA on train features", fontsize = 25)

sns.lineplot(y = np.cumsum(pca_object.explained_variance_ratio_), x = range(n_comps), ax = ax[1])
ax[1].set_title("% of explained variance cumulative using PCA on train features", fontsize = 25)
fig.tight_layout();

As expected, the first two components account for most of the explained variance, and the others contribute only marginally. But since we have a GPU at our disposal, we choose to keep all the components, on the assumption that the extra ones will not hurt predictive performance.

UMAP is another compression technique we could use to extract signal from the data, with the same number of components as the PCA object. The cell below is left commented out; it also needs the `umap` package to be imported before it can run.
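If keeping every component feels arbitrary, one alternative is to pick the smallest number of components whose cumulative explained variance crosses a threshold. The sketch below is only an illustration on synthetic data: it uses a plain NumPy SVD rather than the notebook's `pca_object`, and the helper name `n_components_for` and the 0.95 threshold are assumptions, not part of the original analysis.

```python
import numpy as np

def n_components_for(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                    # PCA operates on centred data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    ratio = s ** 2 / np.sum(s ** 2)            # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    # Index of the first component at which the threshold is reached, turned into a count
    count = int(np.searchsorted(cumulative, threshold) + 1)
    return min(count, len(s))

rng = np.random.default_rng(0)
# Synthetic data: three strong latent directions plus weak isotropic noise
latent = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 20)) * 5.0
X_demo = latent + rng.normal(scale=0.1, size=(500, 20))

k = n_components_for(X_demo, threshold=0.95)
print(f"Components needed for 95% of the variance: {k}")
```

scikit-learn's `PCA` also accepts a float between 0 and 1 as `n_components`, which selects the number of components by explained variance in the same spirit.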

In [None]:
# umap_object = umap.UMAP(n_components = n_comps, n_neighbors = 50)
# print("Fitting UMAP object")
# umap_object.fit(trX[num_cols]);

In [None]:
print("Extracting pca components from train and validation datasets.")

g = lambda x: pd.DataFrame(pca_object.transform(x[num_cols]), columns = [f"pca_{i}" for i in range(n_comps)])
# h = lambda x: pd.DataFrame(umap_object.transform(x[num_cols]), columns = [f"umap_{i}" for i in range(n_comps)])

trX_pca = g(trX)
vlX_pca = g(vlX)

In [None]:
print("Concatenating pca components to train and validation datasets.")

trX = pd.concat([trX, trX_pca], axis = 1)
vlX = pd.concat([vlX, vlX_pca], axis = 1)

num_cols = num_cols + [f"pca_{i}" for i in range(n_comps)]

In [None]:
SS = StandardScaler()
trX = pd.DataFrame(SS.fit_transform(trX[num_cols].values), columns = num_cols)
vlX = pd.DataFrame(SS.transform(vlX[num_cols].values), columns = num_cols)

A denoising autoencoder learns to reconstruct the clean input from a corrupted copy, so we add a small amount of Gaussian noise to the inputs.

In [None]:
noise_ = np.random.normal(0, 0.02, trX.shape)
trX_noised = trX.values + noise_

vl_noise = np.random.normal(0, 0.02, vlX.shape)
vlX_noised = vlX.values + vl_noise
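The cell above draws the noise once, so the autoencoder sees the same corrupted copy of each row in every epoch. A common variant, sketched here as an assumption rather than as what this notebook does, draws fresh noise every time a batch is requested. `noisy_batch` is a hypothetical helper and the data is synthetic.

```python
import numpy as np

def noisy_batch(x_clean, std=0.02, rng=None):
    """Return (noisy_input, clean_target) for a denoising autoencoder."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=std, size=x_clean.shape)
    return (x_clean + noise).astype(np.float32), x_clean.astype(np.float32)

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(4, 6))

# Two calls on the same rows give the same target but different corruptions
noisy_a, clean_a = noisy_batch(x_demo, rng=rng)
noisy_b, clean_b = noisy_batch(x_demo, rng=rng)
print(np.abs(noisy_a - noisy_b).max())
```

Calling such a helper inside a dataset's `__getitem__` would give the model a new corruption of each row on every pass, at the cost of run-to-run variation unless the generator is seeded.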

# Denoising Auto Encoder for feature compression

In [None]:
class moaAutoEncoderDataset(torch.utils.data.Dataset):
    """
    A dataset class to load the data for feeding it to pytorch models during training
    """
    def __init__(self, X, X_N):
        self.X = X.copy().astype(np.float32) # Output
        self.X_noise = X_N.copy().astype(np.float32) # Noise Input
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X_noise[idx, :], self.X[idx, :]

In [None]:
# Create train and valid datasets
train_ds = moaAutoEncoderDataset(trX.values, trX_noised)
valid_ds = moaAutoEncoderDataset(vlX.values, vlX_noised)

batch_size = 128
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size=batch_size, shuffle=False)

Create your model configuration for AutoEncoder here

In [None]:
encoding_dim = 450
class moaAutoEncoder(nn.Module):
    """
    Pytorch denoising autoencoder: input_dim -> 800 -> encoding_dim -> 800 -> input_dim
    """
    def __init__(self, input_dim):
        """
        Given the number of input features, initializes the encoder and decoder layers
        """
        super(moaAutoEncoder, self).__init__()
        
        # Encoder: expand to 800 units, then compress to encoding_dim; the decoder mirrors it
                
        self.fc1 = nn.Linear(in_features = input_dim, out_features = 800)
        self.act1 = nn.ELU()
#         self.bn1 = nn.BatchNorm1d(num_features = 800)
#         self.do1 = nn.Dropout(p = .3)
        
        self.fc_encoder = nn.Linear(in_features = 800, out_features = encoding_dim)
        self.act2 = nn.ELU()
#         self.bn_encoder = nn.BatchNorm1d(num_features = encoding_dim)
#         self.do_encoder = nn.Dropout(p = .15)
        
        self.fc2 = nn.Linear(in_features = encoding_dim, out_features = 800)
        self.act3 = nn.ELU()
#         self.bn2 = nn.BatchNorm1d(num_features = 800)
#         self.do2 = nn.Dropout(p = .3)
        
        self.fc_decoder = nn.Linear(in_features = 800, out_features = input_dim)
        self.fc_decoder_act = nn.ELU()
        
        
    def forward(self, ip):
        """
        Implement forward pass through the network
        """
                
        # Pass the input through the first expander layers
        x = self.act1(self.fc1(ip))
        
        # Pass through the bottleneck
        x = self.act2(self.fc_encoder(x))
        
        # Pass through the end expander layers
        x = self.act3(self.fc2(x))
        
        # Pass through the output layer
        op = self.fc_decoder_act(self.fc_decoder(x))
        
        return op
    
    def encode(self, ip):
        """
        Return the bottleneck encoding (before the activation) without tracking gradients
        """
        with torch.no_grad():
            
            # Pass the input through the first expander layers
            x = self.act1(self.fc1(ip))

            # Pass through the bottleneck
            encoding = self.fc_encoder(x)
            
        return encoding

In [None]:
input_dim = trX.shape[1]
autoencoderModel = moaAutoEncoder(input_dim)
autoencoderModel

In [None]:
# Check if gpu is available for training and utilize if available
device = "cuda" if torch.cuda.is_available() else "cpu" 
autoencoderModel = autoencoderModel.to(device);

In [None]:
# Training Hyperparams for AutoEncoder
n_epochs = 750
learning_rate = 8e-4

# Use the mean squared error between the reconstruction and the clean input
AE_loss_func = nn.MSELoss()

# Use Adadelta as the optimizer with the learning rate specified above and a small weight decay for regularization
optim = torch.optim.Adadelta(autoencoderModel.parameters(), lr = learning_rate, weight_decay = 5e-5)

# Optional learning rate scheduler (disabled): multiply the learning rate by 0.3 if the validation
# loss has plateaued for more than 5 epochs, with a lower bound of 1e-5 for the learning rate
# If an update is made then print it to the output (verbose = True)
# scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience = 5, verbose = True, min_lr=1e-5, factor = 0.3)

In [None]:
# Training Loop for AutoEncoder
for epoch in range(1, n_epochs + 1):
    
    train_loss = 0
    valid_loss = 0
    # len() of a dataloader is the number of batches, so the losses below are per-batch averages
    n_train_samples = len(train_dl)
    n_valid_samples = len(valid_dl)
    
    # Train the model on training data
    for x_noise, x_real in train_dl:       
        # Move the batches to gpu if available
        x_noise, x_real = x_noise.to(device), x_real.to(device)
        
        # Zero out the accumulated gradients
        optim.zero_grad()
        
        # Get the inputs and outputs from the forward pass
        ops = autoencoderModel(x_noise)
        
        # Compute the loss
        loss = AE_loss_func(ops, x_real)
        
        # Backpropagate the losses
        loss.backward()
        optim.step()
        
        # Accumulate the losses
        train_loss += loss.item()
    
    # Evaluate the model performance on validation data
    for x_noise, x_real in valid_dl:
        # Move the batches to gpu if available
        x_noise, x_real = x_noise.to(device), x_real.to(device)
        
        with torch.no_grad():
            # Get the inputs and outputs from the forward pass
            ops = autoencoderModel(x_noise)

            # Compute the loss
            loss = AE_loss_func(ops, x_real)
        
        # Accumulate the loss
        valid_loss += loss.item()
    
#     scheduler.step(valid_loss / n_valid_samples)
    
    train_loss = round(train_loss / n_train_samples, 6)
    valid_loss = round(valid_loss / n_valid_samples, 6)
    
    print(f"Epoch {str(epoch):<3}/{str(n_epochs):<3} | Train Loss: {str(train_loss):<8}| Validation Loss: {str(valid_loss):<8}")
torch.save(autoencoderModel.state_dict(), f"autoencoder_weights.pth")

# Encode train and test datasets using the Autoencoder to latent dimension

In [None]:
# Create a train and test dataloader and extract the encodings from the autoencoder
pca_compression_cols = list(train_features.columns)

# Get the train features in a dataframe with the pca components
arr = pca_object.transform(train_features[pca_compression_cols])
temp_df = pd.DataFrame(arr, columns = [f"pca_{i}" for i in range(n_comps)])
train_features_df = pd.concat([train_features, temp_df], axis = 1)

# Get the test features in a dataframe with the pca components
arr = pca_object.transform(test_features[pca_compression_cols])
temp_df = pd.DataFrame(arr, columns = [f"pca_{i}" for i in range(n_comps)])
test_features_df = pd.concat([test_features, temp_df], axis = 1)

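# Create train and test dataloaders. The autoencoder was trained on standardized inputs,
# so apply the same fitted scaler before encoding.
train_scaled = SS.transform(train_features_df[num_cols].values)
test_scaled = SS.transform(test_features_df[num_cols].values)
train_features_ds = moaAutoEncoderDataset(train_scaled, train_scaled)
test_features_ds = moaAutoEncoderDataset(test_scaled, test_scaled)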

train_features_dl = torch.utils.data.DataLoader(train_features_ds, batch_size = batch_size, shuffle = False)
test_features_dl = torch.utils.data.DataLoader(test_features_ds, batch_size = batch_size, shuffle = False)

In [None]:
train_encodings = []
for x, _ in train_features_dl:
    x = x.to(device)
    embeddings = autoencoderModel.encode(x)
    train_encodings.append(embeddings.to("cpu"))

In [None]:
test_encodings = []
for x, _ in test_features_dl:
    x = x.to(device)
    embeddings = autoencoderModel.encode(x)
    test_encodings.append(embeddings.to("cpu"))

In [None]:
new_train_feats = pd.DataFrame(torch.cat(train_encodings).numpy(), columns = [f"AE_ft{i}" for i in range(encoding_dim)])
new_test_feats = pd.DataFrame(torch.cat(test_encodings).numpy(), columns = [f"AE_ft{i}" for i in range(encoding_dim)])

In [None]:
new_train_feats = pd.concat([train_features_df, new_train_feats], axis = 1)
new_test_feats = pd.concat([test_features_df, new_test_feats], axis = 1)

# Building a Multi-Layer Perceptron for Classification

## Create an indexer for Cross Validation

In [None]:
new_train_df = pd.concat([new_train_feats, train_targets], axis = 1)
target_cols = list(train_targets.columns)

In [None]:
def multifold_indexer(train_df, target_cols, n_splits = 8, random_state = 10):
    folds = train_df.copy()

    # shuffle = True is needed for random_state to take effect
    mlskf = MultilabelStratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    
    folds['kfold'] = 0
    
    for f, (t_idx, v_idx) in enumerate(mlskf.split(X = train_df, y=train_df[target_cols])):
        folds.iloc[v_idx,-1] = int(f)

    folds['kfold'] = folds['kfold'].astype(int)
    
    return folds

## Create a dataset class for loading data

In [None]:
class moaMLPDataset(torch.utils.data.Dataset):
    """
    A dataset class to load the data for feeding it to pytorch models during training
    """
    def __init__(self, X, target_cols = None, dtype = "train"):
        X = X.copy()
        self.dtype = dtype
        
        if dtype == "train":
            self.X = X.drop(columns = target_cols).values.astype(np.float32) 
            self.y = X[target_cols].values.astype(np.float32)
        else:
            self.X = X.values.astype(np.float32)
        
    def __len__(self):
        return self.X.shape[0]
    
    def __getitem__(self, idx):
        
        if self.dtype == "train":
            op = (self.X[idx, :], self.y[idx, :])
        else:
            op = (self.X[idx, :], -1)
        
        return op

## Define a Multi-Layer Perceptron Model

In [None]:
class moaModel(nn.Module):
    """
    Pytorch multi-layer perceptron model
    """
    def __init__(self, ip_dim, fc_dims, op_dim):
        """
        Given the input size, the hidden layer sizes and the number of targets, initializes the layers of the model
        """
        super(moaModel, self).__init__()
                        
        # Define the architecture with hidden layers
        fc_dims = [ip_dim] + fc_dims
        middle_layers = []
        middle_layers.append(nn.BatchNorm1d(num_features = ip_dim))
        for ip, op in zip(fc_dims[:-1], fc_dims[1:]):
            middle_layers.append(wnrm(nn.Linear(in_features = ip, out_features = op)))
            middle_layers.append(nn.LeakyReLU())
            middle_layers.append(nn.BatchNorm1d(num_features = op))
            middle_layers.append(nn.Dropout(p = 0.25))
        
        self.hidden = nn.Sequential(*middle_layers)
        
        # Define the output layer
        self.op = wnrm(nn.Linear(in_features = fc_dims[-1], out_features = op_dim))
    
    def forward(self, X):
        """
        Implement forward pass through the network
        """
        op = self.op(self.hidden(X))
        return op
    
    def predict(self, X):
        """
        Forward pass only when doing prediction
        """
        with torch.no_grad():
            predictions = self.forward(X)
        return predictions

## Define the Training Loss Function

In [None]:
class smoothLoss(nn.Module):
    def __init__(self, reduction = "mean", smoothing = 0):
        super(smoothLoss, self).__init__()
        self.reduction = reduction
        assert 0 <= smoothing < 1
        self.smoothing = smoothing
    
    def forward(self, ip, target):
        # Smooth the target labels
        with torch.no_grad():
            target = target * (1 - self.smoothing) + 0.5 * self.smoothing
        
        # Keep the element-wise losses so that the requested reduction is applied below
        loss = F.binary_cross_entropy_with_logits(ip, target, reduction = "none")
        
        if self.reduction == "mean":
            loss = loss.mean()
        else:
            loss = loss.sum()
        
        return loss
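To make the effect of the smoothing concrete, here is a small worked example in NumPy. It mirrors the formula `target * (1 - smoothing) + 0.5 * smoothing` used in the loss above; the helper names `smooth_targets` and `bce_with_logits` are hypothetical and exist only for this illustration.

```python
import numpy as np

def smooth_targets(targets, smoothing):
    """Pull hard 0/1 labels towards 0.5 by a fraction `smoothing`."""
    return targets * (1.0 - smoothing) + 0.5 * smoothing

def bce_with_logits(logits, targets):
    """Numerically stable mean binary cross-entropy computed on raw logits."""
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()

hard = np.array([0.0, 1.0])
soft = smooth_targets(hard, smoothing=0.001)   # 0 becomes 0.0005, 1 becomes 0.9995

# Very confident (and correct) logits are almost free under hard labels,
# but pay a small penalty once the labels are smoothed
confident_logits = np.array([-20.0, 20.0])
loss_hard = bce_with_logits(confident_logits, hard)
loss_soft = bce_with_logits(confident_logits, soft)
print(soft, loss_hard, loss_soft)
```

The smoothed loss is larger for extreme logits, which is the mechanism that discourages the network from pushing its outputs towards exactly 0 or 1.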

## Define the training loop

In [None]:
def train_validate(model, train_dl, valid_dl, train_loss_func, valid_loss_func, optimizer, sched, device):
    
    model.to(device)
    
    train_batches = len(train_dl)
    valid_batches = len(valid_dl)

    train_ls = 0
    valid_ls = 0


    # Train for one epoch (dropout active, batch-norm statistics updated)
    model.train()
    for X, y in train_dl:
        # Clear the accumulated gradients
        optimizer.zero_grad()

        # Perform the forward pass operation
        X, targs = X.to(device), y.to(device)
        op = model(X)

        # Backpropagate the errors through the network
        tr_loss = train_loss_func(op, targs)
        tr_loss.backward()
        optimizer.step()
        train_ls += tr_loss.item()

    # Check the performance on validation data (dropout off, running batch-norm statistics)
    model.eval()
    valid_preds = []
    for X, y in valid_dl:
        X, targs = X.to(device), y.to(device)
        op = model.predict(X)
        valid_preds.append(torch.sigmoid(op).cpu())
        vls = valid_loss_func(op, targs)
        valid_ls += vls.item()

    train_ls = round(train_ls / train_batches, 6)
    valid_ls = round(valid_ls / valid_batches, 6)
    
    valid_preds = torch.cat(valid_preds).numpy()
    
    # Check if validation loss is reducing, if not, reduce the learning rate
    sched.step(valid_ls)

#         print(f"Epoch {str(epoch):<3}/{str(n_epochs):<3} | Train Loss: {str(train_ls):<8}| Validation Loss: {str(valid_ls):<8}")
    return (train_ls, valid_ls, valid_preds)

## Define a function for doing inference on test dataset

In [None]:
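def predict_test(model, test_dl, device):
    model.to(device)
    # Switch off dropout and use running batch-norm statistics for inference,
    # then restore whatever mode the model was in
    was_training = model.training
    model.eval()
    predictions = []
    for x, _ in test_dl:
        x = x.to(device)
        with torch.no_grad():
            predictions.append(torch.sigmoid(model(x)).cpu())
    model.train(was_training)
    predictions = torch.cat(predictions).numpy()
    return predictions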

## Create a model instance

In [None]:
ip_dim = new_train_feats.shape[1]
fc_dims = [600, 512, 256]
op_dim = train_targets.shape[1]
myClassifierModel = moaModel(ip_dim, fc_dims, op_dim)

print(myClassifierModel)

## Define training hyperparameters

In [None]:
# Seed everything for reproducibility
def seed_everything(seed=42):
    random.seed(seed)
    #os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True

In [None]:
N_FOLDS = 7
RANDOM_STATE = 42
N_EPOCHS_PER_FOLD = 20
BATCH_SIZE = 128
WEIGHT_DECAY = 1e-5
LEARNING_RATE = 1.5e-3
MIN_LR = 8e-4

# Seed everything
seed_everything(RANDOM_STATE)

# Create a dataset with the cross-validation indices
df_with_folds = multifold_indexer(new_train_df, target_cols ,N_FOLDS, RANDOM_STATE)

# Check if a gpu is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# For training use the smooth loss func defined above
train_loss_func = smoothLoss(smoothing = 0.001)
# train_loss_func = nn.BCEWithLogitsLoss()

# Use the BCEWithLogitsLoss function for validation. It will apply sigmoid activation to output
# and use Cross Entropy loss using log sum exp trick
valid_loss_func = nn.BCEWithLogitsLoss()

# Use AdamW as the optimizer with the initial learning rate specified above and a small weight decay for regularization
optimizer = torch.optim.AdamW(myClassifierModel.parameters(), lr = LEARNING_RATE, weight_decay = WEIGHT_DECAY)

# Use a learning rate scheduler to multiply the learning rate by 0.3 when the validation
# loss has plateaued for more than 10 epochs (patience). MIN_LR is the lower bound for the learning rate
# If an update is made then print it to the output (verbose = True)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience = 10, verbose = True, min_lr = MIN_LR, factor = 0.3)
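With the values above, the scheduler can lower the learning rate at most once: 1.5e-3 × 0.3 = 4.5e-4 is already below `MIN_LR`, so the first reduction is clamped to 8e-4 and later reductions change nothing. The sketch below reproduces only this clamping arithmetic, `max(lr * factor, min_lr)`, and not the plateau detection itself; `next_lr` is a hypothetical helper.

```python
def next_lr(lr, factor=0.3, min_lr=8e-4):
    """Learning rate after one plateau-triggered reduction, clamped at min_lr."""
    return max(lr * factor, min_lr)

lr = 1.5e-3
schedule = [lr]
for _ in range(3):           # pretend that three plateaus are detected
    lr = next_lr(lr)
    schedule.append(lr)

print(schedule)              # [0.0015, 0.0008, 0.0008, 0.0008]
```

If more than one step down is wanted, either `MIN_LR` has to be lowered or `factor` raised so that the reduced rates stay above the floor.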

## Train the MLP

In [None]:
loss_history = {}
best_loss = np.inf

# Create a test dataloader for inferring the predictions after each fold
test_ds = moaMLPDataset(new_test_feats, dtype = "test")
test_dl = torch.utils.data.DataLoader(test_ds, batch_size = BATCH_SIZE, shuffle = False)

# Create an out of folds predictions and an inference predictions array
oof_predictions = np.zeros((new_train_feats.shape[0], len(target_cols)))
inference_predictions = np.zeros((new_test_feats.shape[0], len(target_cols)))
random_states = [10, 42, 73, 1729, 31415]

for random_state in random_states:
    df_with_folds = multifold_indexer(new_train_df, target_cols ,N_FOLDS, random_state)
    for fold in np.unique(df_with_folds.kfold):

        # Using the fold indices, split the data: train on every fold except the
        # current one and validate on the held-out fold
        tr = df_with_folds[df_with_folds.kfold != fold].copy().reset_index(drop = True)
        vl = df_with_folds[df_with_folds.kfold == fold].copy()
        vl_indices = list(vl.index)                   
        vl = vl.reset_index(drop = True)

        tr = tr.drop(columns = ["kfold"])
        vl = vl.drop(columns = ["kfold"])

        # Create train and validation dataloaders
        train_ds = moaMLPDataset(tr, target_cols)
        valid_ds = moaMLPDataset(vl, target_cols)

        train_dl = torch.utils.data.DataLoader(train_ds, batch_size = BATCH_SIZE, shuffle = True)
        valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size = BATCH_SIZE, shuffle = False)

        # Keep the training history in memory
        if f"FOLD_{fold}" not in loss_history:
            loss_history[f"FOLD_{fold}"] = {}
            loss_history[f"FOLD_{fold}"]["train_loss"] = []
            loss_history[f"FOLD_{fold}"]["valid_loss"] = []

        # Do the actual training
        for epoch in range(1, N_EPOCHS_PER_FOLD + 1):
            train_ls, valid_ls, valid_preds = train_validate(myClassifierModel, train_dl, valid_dl, train_loss_func, valid_loss_func, optimizer, sched, device)
            print(f"Fold: {str(fold):<3}| Epoch: {str(epoch):<3}| Train Loss: {str(round(train_ls, 5)):<6}| Valid Loss: {str(round(valid_ls, 5)):<6}")
            loss_history[f"FOLD_{fold}"]["train_loss"].append(train_ls)
            loss_history[f"FOLD_{fold}"]["valid_loss"].append(valid_ls)

            # Save the parameters of the best model until now
            if valid_ls < best_loss:
                best_loss = valid_ls
                torch.save(myClassifierModel.state_dict(), f"best_model.pth")
            
            # Add the validation predictions to the out of folds dataframe
            oof_predictions[vl_indices, :] += valid_preds
        
        # Do inference and append it to the inference predictions dataset
        print("Adding inference predictions")
        inference_predictions += predict_test(myClassifierModel, test_dl, device)

    # Save the parameters of the final model
    torch.save(myClassifierModel.state_dict(), f"final_model.pth")

In [None]:
print("Averaging all the inference predictions across all folds and random states breaks.")
inference_predictions /= (len(random_states) * N_FOLDS)

In [None]:
print("Averaging out of folds predictions across all epochs and random states breaks")
oof_predictions /= (len(random_states) * N_EPOCHS_PER_FOLD)
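Dividing by a fixed constant is only correct if every row receives the same number of validation predictions. A more defensive approach, shown here as an assumption and not as what the cells above do, keeps a per-row counter next to the accumulator. The arrays and the `passes` list are synthetic.

```python
import numpy as np

n_rows, n_targets = 6, 2
oof_sum = np.zeros((n_rows, n_targets))
oof_count = np.zeros(n_rows)

# Hypothetical validation passes: (row indices, predictions for those rows)
passes = [
    (np.array([0, 1, 2]), np.full((3, n_targets), 0.2)),
    (np.array([3, 4, 5]), np.full((3, n_targets), 0.4)),
    (np.array([0, 1, 2]), np.full((3, n_targets), 0.6)),
]
for idx, preds in passes:
    oof_sum[idx] += preds
    oof_count[idx] += 1

# Average each row by the number of times it was actually predicted
oof_mean = oof_sum / np.maximum(oof_count, 1)[:, None]
print(oof_mean)
```

Rows 0 to 2 were predicted twice and rows 3 to 5 once, yet every row ends up with a properly averaged probability.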

## Look at the training curves for different folds

In [None]:
def plot_losses(ax, record, fold):
    df = pd.DataFrame(record)
    sns.lineplot(x = df.index, y = df.train_loss, ax = ax)
    sns.lineplot(x = df.index, y = df.valid_loss, ax = ax)
    ax.set_xlabel("Epochs", fontsize = 12)
    ax.set_ylabel("Loss", fontsize = 12)
    ax.set_title(f"Training Curve for {fold}", fontsize = 15)
    ax.legend(["Train Loss", "Valid Loss"])

In [None]:
plt.style.use("fivethirtyeight")
# N_FOLDS = 7 loss curves need at least 7 axes, so use a 2 x 4 grid
fig, ax = plt.subplots(2, 4, figsize = (20,9), sharex = True)
ax = np.ravel(ax)

# Plot the loss curves for different training folds
for fold, axis in zip(loss_history, ax):
    rec = loss_history[fold]
    plot_losses(axis, rec, fold)

## Evaluate the performance on the whole dataset

In [None]:
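# Check the log loss on the entire dataframe.
# Assumes every row is held out exactly once per seed, so oof_predictions (already
# averaged over seeds and epochs above) holds probabilities and needs no further division.
score = 0
for i in range(len(target_cols)):
    true = train_targets.values[:, i]
    # Clip the probabilities away from 0 and 1 to keep the log finite
    preds = np.clip(oof_predictions[:, i], 1e-6, 1 - 1e-6)
    
    # Pass the labels explicitly so that a column with a single observed class does not raise an error
    score_ = log_loss(true, preds, labels = [0, 1])
    if not np.isnan(score_):
        score += score_ 
    else:
        print(i)
    
score /= len(target_cols)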

print(f"Loss on the entire dataframe: {round(score, 5)}")
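Written out, the quantity computed above is the mean column-wise log loss. The symbols are introduced here for illustration only: $N$ is the number of rows, $M$ the number of target columns, $y_{i,m}$ the true label and $\hat{y}_{i,m}$ the predicted probability, clipped away from 0 and 1.

```latex
\text{score} = -\frac{1}{M}\sum_{m=1}^{M}\frac{1}{N}\sum_{i=1}^{N}
\left[\, y_{i,m}\,\log \hat{y}_{i,m} + \left(1 - y_{i,m}\right)\log\left(1 - \hat{y}_{i,m}\right) \right]
```

Because most targets are zero for most rows, a prediction that is confidently wrong on a rare positive contributes a large term, which is why the clipping and the label smoothing matter for this score.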

## Infer on the test dataset

In [None]:
predictions = pd.DataFrame(inference_predictions, columns = target_cols)
predictions["sig_id"] = test_ids

cols = ["sig_id"] + list(predictions.columns)[:-1]
predictions = predictions[cols]

### Manually change predictions of control group to zeros

In [None]:
# Get the row indices of all the control group records
control_ids = list(test_cp_types[test_cp_types == "ctl_vehicle"].index)

# Set every MoA prediction of the control group records to zero (sig_id is left untouched)
predictions.loc[control_ids, target_cols] = 0

# Save the predictions
predictions.to_csv("/kaggle/working/submission.csv", index = False)
predictions.head()