<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:plum; border:0; color:black' role="tab" aria-controls="home"><center>Mechanisms of Action (MoA) Prediction</center></h2>
<center><img src="https://www.ledgerinsights.com/wp-content/uploads/2019/09/drugs-pharmaceuticals.jpg" width="1500"></center>

<a id="11"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:lightgray; border:0; color:black' role="tab" aria-controls="home"><center>References</center></h2>

* [MoA | Pytorch | 0.01859 | RankGauss | PCA | NN](https://www.kaggle.com/kushal1506/moa-pytorch-0-01859-rankgauss-pca-nn)
* [Pytorch CV|0.0145| LB| 0.01839 |](https://www.kaggle.com/riadalmadani/pytorch-cv-0-0145-lb-0-01839)



<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:lightgray; border:0; color:black' role="tab" aria-controls="home"><center>Table of Contents</center></h2>

    
    
- [Problem Statement](#1)
- [Import Libraries](#2)
- [Reading Data](#3)
- [Exploratory Data Analysis (EDA)](#4)
- [PCA](#5)
- [MultilabelStratifiedKFold CV](#6)
- [Pytorch Dataset Classes](#7)
- [Smoothing](#8)
- [Model](#9)
- [Prediction & Submission](#10)
- [References](#11)


<a id="1"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:lightgray; border:0; color:black' role="tab" aria-controls="home"><center>Problem Statement</center></h2>
<b><i>In this competition, we have access to a unique dataset that combines gene expression and cell viability data. The data is based on a new technology that simultaneously measures (within the same samples) human cells' responses to drugs in a pool of 100 different cell types, which solves the problem of identifying ex ante which cell types are better suited for a given drug. We also have access to MoA annotations for more than 5,000 drugs in this dataset. Our task is to use the training dataset to develop an algorithm that automatically labels each case in the test set with one or more MoA classes. Because drugs can have multiple MoA annotations, the task is formally a multi-label classification problem.</i></b>
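
To make the multi-label setting concrete, here is a minimal NumPy sketch. It assumes predictions are scored with the mean column-wise log loss (the problem statement above does not name the metric, so treat that as an assumption); the function name `mean_columnwise_log_loss` and the toy arrays are hypothetical.

```python
import numpy as np

def mean_columnwise_log_loss(y_true, y_pred, eps=1e-15):
    """Binary log loss per target column, averaged over all columns."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    per_entry = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_entry.mean(axis=0).mean()

# Three samples, three MoA labels. Each row is a multi-hot vector:
# the first sample has two active labels, the second has none.
y_true = np.array([[1, 0, 1],
                   [0, 0, 0],
                   [0, 1, 0]], dtype=float)

# An uninformative model that predicts 0.5 for every label.
y_pred = np.full_like(y_true, 0.5)

score = mean_columnwise_log_loss(y_true, y_pred)
print(score)
```

Each label gets its own independent probability, which is why the model later in the notebook ends in per-label logits passed through a sigmoid instead of a softmax over classes.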


<a id="2"></a>
# Import Libraries

In [None]:
import os
import sys
sys.path.append('../input/iterative-stratification/iterative-stratification-master')
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

In [None]:
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import RobustScaler,StandardScaler,QuantileTransformer,PowerTransformer
from sklearn.decomposition import PCA
from sklearn.metrics import log_loss
from torch.nn.modules.loss import _WeightedLoss
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from scipy.stats import skew,boxcox,boxcox_normmax,kurtosis
#sns.set_context("paper", font_scale = 1, rc={"grid.linewidth": 3})
pd.set_option('display.max_rows', 100, 'display.max_columns', 900)
from matplotlib.pyplot import cm
from sklearn.feature_selection import VarianceThreshold
import sys
from torch.utils.data import Dataset, TensorDataset

In [None]:
data_dir = '../input/lish-moa'
print(os.listdir(data_dir))


<a id="3"></a>
# Reading Data

In [None]:
train_features = pd.read_csv('../input/lish-moa/train_features.csv')
test_features = pd.read_csv('../input/lish-moa/test_features.csv')
train_targets_scored = pd.read_csv('../input/lish-moa/train_targets_scored.csv')
train_targets_nonscored = pd.read_csv('../input/lish-moa/train_targets_nonscored.csv')
train_drug= pd.read_csv('../input/lish-moa/train_drug.csv')
sample_submission = pd.read_csv('../input/lish-moa/sample_submission.csv')

In [None]:
#train features
train_features.head(3)

In [None]:
# test features
test_features.head(3)

In [None]:
# contains an anonymous drug_id for the training set only.
train_drug.head(2)

In [None]:
# binary MoA targets that are scored(used for scoring and prediction)
train_targets_scored.head(3)

In [None]:
# Additional (optional) binary MoA responses for the training data. These are not predicted nor scored.
train_targets_nonscored.head(3)

In [None]:
genes=[col for col in train_features.columns if col.startswith('g-')]
cells=[col for col in train_features.columns if col.startswith('c-')]
print(genes)
print('    <----------------------------------------------------------------------------------------------------------------->')
print(cells)
print('\n')
print(len(genes))
print('\n')
print(len(cells))

In [None]:
# features shape
print(train_features.shape)
print(test_features.shape)
print(train_targets_scored.shape)
print(train_targets_nonscored.shape)
print(train_drug.shape)

In [None]:
# Checking null/nan values if any
print(train_features.isna().sum().sum())
print(test_features.isna().sum().sum())
print(train_drug.isna().sum().sum())
print(train_targets_scored.isna().sum().sum())
print(train_targets_nonscored.isna().sum().sum())

In [None]:
train_features2=train_features.copy()
test_features2=test_features.copy()

### **Features**

* `sig_id` is the unique sample id
* Features with `g-` prefix are gene expression features and there are 772 of them (from `g-0` to `g-771`)
* Features with `c-` prefix are cell viability features and there are 100 of them (from `c-0` to `c-99`)
* `cp_type` is a binary categorical feature which indicates the samples are treated with a compound or with a control perturbation (`trt_cp` or `ctl_vehicle`)
* `cp_time` is a categorical feature which indicates the treatment duration (`24`, `48` or `72` hours)
* `cp_dose` is a binary categorical feature which indicates whether the dose is low or high (`D1` or `D2`)

In [None]:
g_features = [feature for feature in train_features.columns if feature.startswith('g-')]
c_features = [feature for feature in train_features.columns if feature.startswith('c-')]
other_features = [feature for feature in train_features.columns if feature not in g_features and 
                                                             feature not in c_features and 
                                                             feature not in train_targets_scored and
                                                             feature not in train_targets_nonscored]

print(f'Number of g- Features: {len(g_features)}')
print(f'Number of c- Features: {len(c_features)}')
print(f'Number of Other Features: {len(other_features)} ({other_features})')

### Correlation Heatmap

In [None]:
qt = QuantileTransformer(n_quantiles=100,random_state=42,output_distribution='normal')
data = pd.concat([pd.DataFrame(train_features[g_features+c_features]), pd.DataFrame(test_features[g_features+c_features])])
data2 = qt.fit_transform(data[g_features+c_features])
train_features[g_features+c_features] = pd.DataFrame(data2[:train_features.shape[0]])
test_features[g_features+c_features] = pd.DataFrame(data2[-test_features.shape[0]:])

In [None]:
df = pd.concat([train_features, test_features])
columns = g_features + c_features
correlation = list(set([columns[np.random.randint(0, len(columns)-1)] for i in range(200)]))[:40]
data = df[correlation]
f = plt.figure(figsize=(18, 12))
sns.heatmap(data.corr(),cmap='cividis_r')
plt.xticks(range(data.shape[1]), data.columns, fontsize=14, rotation=50)
plt.yticks(range(data.shape[1]), data.columns, fontsize=14)
plt.show()

<a id="4"></a>
# Checking Normal distribution (skewness) in training and testing set for Genes and Cells

`Visualising Genes Training set before applying transformation (Not Normal/Gaussian)`

In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(train_features.loc[:,g_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(train_features.loc[:,g_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(train_features.loc[:,g_features[i]].std()))])
    
    plt.xlabel(g_features[i])
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Visualising Cells Training set before applying transformation (Not Normal/Gaussian)`


In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(train_features.loc[:,c_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(train_features.loc[:,c_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(train_features.loc[:,c_features[i]].std()))])
    
    plt.xlabel(c_features[i])  # label the axis with the cell feature that is actually plotted
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Visualising Genes Testing set before applying transformation (Not Normal/Gaussian)`

In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(test_features.loc[:,g_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(test_features.loc[:,g_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(test_features.loc[:,g_features[i]].std()))])
    
    plt.xlabel(g_features[i])
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Visualising Cells Testing set before applying transformation (Not Normal/Gaussian)`

In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(test_features.loc[:,c_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(test_features.loc[:,c_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(test_features.loc[:,c_features[i]].std()))])
    
    plt.xlabel(c_features[i])  # label the axis with the cell feature that is actually plotted
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Applying a quantile transformer to make the distributions closer to Gaussian/Normal. I also tried a power transformer, but the quantile transformer did a good job.`
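
The idea behind a quantile (rank-Gauss) transform can be sketched in a few lines. This is a simplified illustration, not scikit-learn's `QuantileTransformer` (which interpolates between a fixed number of quantile landmarks); the helper name `rank_gauss` and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def rank_gauss(values):
    """Map values to normal scores: rank -> uniform quantile -> inverse normal CDF."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = np.argsort(np.argsort(values))   # rank of each value, 0..n-1
    quantiles = (ranks + 0.5) / n            # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf           # standard normal quantile function
    return np.array([inv_cdf(q) for q in quantiles])

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=2.0, size=1000)   # strongly right-skewed sample
gaussianized = rank_gauss(skewed)

print(f"before: mean={skewed.mean():.2f}, std={skewed.std():.2f}")
print(f"after:  mean={gaussianized.mean():.2f}, std={gaussianized.std():.2f}")
```

Only the ordering of the values survives the transform, which is why it is robust to outliers and heavy tails.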

In [None]:
for col in (genes + cells):

    #transformer = PowerTransformer(method = 'yeo-johnson')
    transformer = QuantileTransformer(n_quantiles=100,random_state=42, output_distribution='normal')
    vec_len = len(train_features[col].values)
    vec_len_test = len(test_features[col].values)
    raw_vec = train_features[col].values.reshape(vec_len, 1)
    transformer.fit(raw_vec)

    train_features[col] = transformer.transform(raw_vec).reshape(1, vec_len)[0]
    test_features[col] = transformer.transform(test_features[col].values.reshape(vec_len_test, 1)).reshape(1, vec_len_test)[0]

`Visualising Genes Training set after transformation (looking Normal/Gaussian, Mean=0, Std=1)`

In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(train_features.loc[:,g_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(train_features.loc[:,g_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(train_features.loc[:,g_features[i]].std()))])
    
    plt.xlabel(g_features[i])
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Visualising Cells Training set after transformation (looking Normal/Gaussian, Mean=0, Std=1)`

In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(train_features.loc[:,c_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(train_features.loc[:,c_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(train_features.loc[:,c_features[i]].std()))])
    
    plt.xlabel(c_features[i])  # label the axis with the cell feature that is actually plotted
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Visualising Genes Testing set after transformation (looking Normal/Gaussian, Mean=0, Std=1)`

In [None]:
color=cm.rainbow(np.linspace(0,3,20))
color_ind=0
n_row = 6
n_col = 5
n_sub = 1 
plt.rcParams["legend.loc"] = 'upper right'
fig = plt.figure(figsize=(14,14))
plt.subplots_adjust(left=-0.3, right=1.3,bottom=-0.3,top=1.3)
for i in (np.arange(0,10,1)):
    plt.subplot(n_row, n_col, n_sub)
    sns.kdeplot(test_features.loc[:,g_features[i]],color=color[color_ind],shade=True,
                 label=['mean:'+str('{:.2f}'.format(test_features.loc[:,g_features[i]].mean()))
                        +'  ''std: '+str('{:.2f}'.format(test_features.loc[:,g_features[i]].std()))])
    
    plt.xlabel(g_features[i])
    plt.legend()                    
    n_sub+=1
    color_ind+=1
plt.show()

`Applying a seed function so that PyTorch, the GPU and NumPy generate the same random numbers on every run.`

In [None]:
def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    
seed_everything(seed=42)

<a id="5"></a>
# PCA

In [None]:
#Creating more features with Genes
n_comp = 650 
data = pd.concat([pd.DataFrame(train_features[genes]), pd.DataFrame(test_features[genes])])
data2 = (PCA(n_components=n_comp, random_state=42).fit_transform(data[genes]))
train2 = data2[:train_features.shape[0]] 
test2 = data2[-test_features.shape[0]:]
train_gpca = pd.DataFrame(train2, columns=[f'pca_G-{i}' for i in range(n_comp)])
test_gpca = pd.DataFrame(test2, columns=[f'pca_G-{i}' for i in range(n_comp)])
train_features = pd.concat((train_features, train_gpca), axis=1)
test_features = pd.concat((test_features, test_gpca), axis=1)

In [None]:
#Creating more features with Cells
n_comp = 50
data = pd.concat([pd.DataFrame(train_features[cells]), pd.DataFrame(test_features[cells])])
data2 = (PCA(n_components=n_comp, random_state=42).fit_transform(data[cells]))
train2 = data2[:train_features.shape[0]]
test2 = data2[-test_features.shape[0]:]
train_cpca = pd.DataFrame(train2, columns=[f'pca_C-{i}' for i in range(n_comp)])
test_cpca = pd.DataFrame(test2, columns=[f'pca_C-{i}' for i in range(n_comp)])
train_features = pd.concat((train_features, train_cpca), axis=1)
test_features = pd.concat((test_features, test_cpca), axis=1)

In [None]:
print(train_features.shape)
print(test_features.shape)

In [None]:
# Feature selection using a variance threshold

var_thresh = VarianceThreshold(0.845)
data = pd.concat([train_features, test_features])  # DataFrame.append was removed in pandas 2.0
data_transformed = var_thresh.fit_transform(data.iloc[:, 4:])

train_features_transformed = data_transformed[ : train_features.shape[0]]
test_features_transformed = data_transformed[-test_features.shape[0] : ]


train_features = pd.DataFrame(train_features[['sig_id','cp_type','cp_time','cp_dose']].values.reshape(-1, 4),\
                              columns=['sig_id','cp_type','cp_time','cp_dose'])

train_features = pd.concat([train_features, pd.DataFrame(train_features_transformed)], axis=1)


test_features = pd.DataFrame(test_features[['sig_id','cp_type','cp_time','cp_dose']].values.reshape(-1, 4),\
                             columns=['sig_id','cp_type','cp_time','cp_dose'])

test_features = pd.concat([test_features, pd.DataFrame(test_features_transformed)], axis=1)

train_features.shape



In [None]:
from sklearn.cluster import KMeans
def feat_cluster(train, test, n_clusters_g = 22, n_clusters_c = 4, SEED = 42):
    
    features_g = g_features
    features_c = c_features
    
    def create_cluster(train, test, features, kind = 'g', n_clusters = n_clusters_g):
        train_ = train[features].copy()
        test_ = test[features].copy()
        data = pd.concat([train_, test_], axis = 0)
        kmeans = KMeans(n_clusters = n_clusters, random_state = SEED).fit(data)
        train[f'clusters_{kind}'] = kmeans.labels_[:train.shape[0]]
        test[f'clusters_{kind}'] = kmeans.labels_[train.shape[0]:]
        train = pd.get_dummies(train, columns = [f'clusters_{kind}'])
        test = pd.get_dummies(test, columns = [f'clusters_{kind}'])
        return train, test
    
    train, test = create_cluster(train, test, features_g, kind = 'g', n_clusters = n_clusters_g)
    train, test = create_cluster(train, test, features_c, kind = 'c', n_clusters = n_clusters_c)
    return train, test

train_features2 ,test_features2=feat_cluster(train_features2,test_features2)

In [None]:
train_pca=pd.concat((train_gpca,train_cpca),axis=1)
test_pca=pd.concat((test_gpca,test_cpca),axis=1)

In [None]:
def feat_cluster_pca(train, test,n_clusters=5,SEED = 42):
        data=pd.concat([train,test],axis=0)
        kmeans = KMeans(n_clusters = n_clusters, random_state = SEED).fit(data)
        train[f'clusters_pca'] = kmeans.labels_[:train.shape[0]]
        test[f'clusters_pca'] = kmeans.labels_[train.shape[0]:]
        train = pd.get_dummies(train, columns = [f'clusters_pca'])
        test = pd.get_dummies(test, columns = [f'clusters_pca'])
        return train, test
train_cluster_pca ,test_cluster_pca = feat_cluster_pca(train_pca,test_pca)

In [None]:
train_cluster_pca = train_cluster_pca.iloc[:,700:]
test_cluster_pca = test_cluster_pca.iloc[:,700:]

In [None]:
train_features_cluster=train_features2.iloc[:,876:]
test_features_cluster=test_features2.iloc[:,876:]

In [None]:
gsquarecols=['g-574','g-211','g-216','g-0','g-255','g-577','g-153','g-389','g-60','g-370','g-248','g-167',\
             'g-203','g-177','g-301','g-332','g-517','g-6','g-744','g-224','g-162','g-3','g-736','g-486',\
             'g-283','g-22','g-359','g-361','g-440','g-335','g-106','g-307','g-745','g-146','g-416','g-298',\
             'g-666','g-91','g-17','g-549','g-145','g-157','g-768','g-568','g-396']

In [None]:
def feat_stats(train, test):
    
    features_g = g_features
    features_c = c_features
    
    for df in train, test:
        df['g_sum'] = df[features_g].sum(axis = 1)
        df['g_mean'] = df[features_g].mean(axis = 1)
        df['g_std'] = df[features_g].std(axis = 1)
        df['g_kurt'] = df[features_g].kurtosis(axis = 1)
        df['g_skew'] = df[features_g].skew(axis = 1)
        df['c_sum'] = df[features_c].sum(axis = 1)
        df['c_mean'] = df[features_c].mean(axis = 1)
        df['c_std'] = df[features_c].std(axis = 1)
        df['c_kurt'] = df[features_c].kurtosis(axis = 1)
        df['c_skew'] = df[features_c].skew(axis = 1)
        df['gc_sum'] = df[features_g + features_c].sum(axis = 1)
        df['gc_mean'] = df[features_g + features_c].mean(axis = 1)
        df['gc_std'] = df[features_g + features_c].std(axis = 1)
        df['gc_kurt'] = df[features_g + features_c].kurtosis(axis = 1)
        df['gc_skew'] = df[features_g + features_c].skew(axis = 1)
        
        df['c52_c42'] = df['c-52'] * df['c-42']
        df['c13_c73'] = df['c-13'] * df['c-73']
        df['c23_c13'] = df['c-23'] * df['c-13']  # column name now matches the features multiplied
        df['c33_c6'] = df['c-33'] * df['c-6']
        df['c11_c55'] = df['c-11'] * df['c-55']
        df['c38_c63'] = df['c-38'] * df['c-63']
        df['c38_c94'] = df['c-38'] * df['c-94']
        df['c13_c94'] = df['c-13'] * df['c-94']
        df['c4_c52'] = df['c-4'] * df['c-52']
        df['c4_c42'] = df['c-4'] * df['c-42']
        df['c13_c38'] = df['c-13'] * df['c-38']
        df['c55_c2'] = df['c-55'] * df['c-2']
        df['c55_c4'] = df['c-55'] * df['c-4']
        df['c4_c13'] = df['c-4'] * df['c-13']
        df['c82_c42'] = df['c-82'] * df['c-42']
        df['c66_c42'] = df['c-66'] * df['c-42']
        df['c6_c38'] = df['c-6'] * df['c-38']
        df['c2_c13'] = df['c-2'] * df['c-13']
        df['c62_c42'] = df['c-62'] * df['c-42']
        df['c90_c55'] = df['c-90'] * df['c-55']
        
        
        for feature in features_c:
             df[f'{feature}_squared'] = df[feature] ** 2
        for feature in features_c:
             df[f'{feature}_cubed'] = df[feature] ** 3         
                
        for feature in gsquarecols:
            df[f'{feature}_squared'] = df[feature] ** 2
        for feature in gsquarecols:
            df[f'{feature}_cubed'] = df[feature] ** 3
        
    return train, test

train_features2,test_features2=feat_stats(train_features2,test_features2)

In [None]:
train_features_stats=train_features2.iloc[:,902:]
test_features_stats=test_features2.iloc[:,902:]

In [None]:
train_features = pd.concat((train_features, train_features_cluster,train_cluster_pca,train_features_stats), axis=1)
test_features = pd.concat((test_features, test_features_cluster,test_cluster_pca,test_features_stats), axis=1)

In [None]:
train = train_features.merge(train_targets_scored, on='sig_id')
train = train.merge(train_targets_nonscored, on='sig_id')
train = train.merge(train_drug, on='sig_id')
train = train[train['cp_type'] != 'ctl_vehicle'].reset_index(drop=True)
test = test_features[test_features['cp_type'] != 'ctl_vehicle'].reset_index(drop=True)
#train = train_features.merge(train_targets_scored, on='sig_id')
#train = train[train['cp_type']!='ctl_vehicle'].reset_index(drop=True)
#test = test_features[test_features['cp_type']!='ctl_vehicle'].reset_index(drop=True)
#target = train[train_targets_scored.columns]

In [None]:
train = train.drop('cp_type', axis=1)
test = test.drop('cp_type', axis=1)
#target_cols = target.drop('sig_id', axis=1).columns.values.tolist()
target_cols = [x for x in train_targets_scored.columns if x != 'sig_id']
aux_target_cols = [x for x in train_targets_nonscored.columns if x != 'sig_id']
all_target_cols = target_cols + aux_target_cols

num_targets = len(target_cols)
num_aux_targets = len(aux_target_cols)
num_all_targets = len(all_target_cols)
print('num_targets: {}'.format(num_targets))
print('num_aux_targets: {}'.format(num_aux_targets))
print('num_all_targets: {}'.format(num_all_targets))

In [None]:
# Final Dataframe shapes
print(train.shape)
print(test.shape)
print(sample_submission.shape)

<a id="7"></a>
# Pytorch Dataset Classes

In [None]:
class MoADataset:
    # __init__ takes whatever arguments are needed to build the samples: it may be
    # the name of a CSV file to load, two arrays (features and labels), or
    # anything else, depending on the task at hand.
    def __init__(self, features, targets): 
        self.features = features
        self.targets = targets
    # __len__ returns the size of the whole dataset so that sampling never
    # indexes past the last element.
    def __len__(self):
        return (self.features.shape[0])
    # There is no need to load the whole dataset in __init__. If the dataset is
    # big (tens of thousands of image files, for instance), loading it at once
    # would not be memory efficient; load items on demand in __getitem__ instead.
    # __getitem__ lets the dataset be indexed like a list (dataset[i]). Here it
    # returns a dict with the features ('x') and targets ('y') of one sample.
    def __getitem__(self, idx):
        dct = {
            'x' : torch.tensor(self.features[idx, :], dtype=torch.float),
            'y' : torch.tensor(self.targets[idx, :], dtype=torch.float)            
        }
        return dct
    
class TestDataset:
    def __init__(self, features):
        self.features = features
        
    def __len__(self):
        return (self.features.shape[0])
    
    def __getitem__(self, idx):
        dct = {
            'x' : torch.tensor(self.features[idx, :], dtype=torch.float)
        }
        return dct
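
As a usage sketch, a dataset written this way can be handed to a `DataLoader`, whose default collation stacks the per-sample dicts into one dict of batched tensors. `ToyDataset` and the random arrays below are hypothetical stand-ins for `MoADataset` and the real feature matrices.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

class ToyDataset:
    """Same protocol as MoADataset above: __len__ plus a dict-returning __getitem__."""
    def __init__(self, features, targets):
        self.features = features
        self.targets = targets

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        return {
            'x': torch.tensor(self.features[idx, :], dtype=torch.float),
            'y': torch.tensor(self.targets[idx, :], dtype=torch.float),
        }

features = np.random.rand(10, 4)                  # 10 samples, 4 features
targets = np.random.randint(0, 2, size=(10, 3))   # 3 binary labels per sample

loader = DataLoader(ToyDataset(features, targets), batch_size=4, shuffle=False)

# Each batch is a dict whose values have a leading batch dimension.
batch_shapes = [(batch['x'].shape, batch['y'].shape) for batch in loader]
print(batch_shapes)
```

With 10 samples and a batch size of 4, the last batch is smaller than the others unless `drop_last=True` is passed.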
    

In [None]:
def train_fn(model, optimizer, scheduler, loss_fn, dataloader, device):
    model.train() #  set the model to training mode
    final_loss = 0  # Initialise final loss to zero
    
    for data in dataloader:
        optimizer.zero_grad() #every time we use the gradients to update the parameters, we need to zero the gradients afterwards
        inputs, targets = data['x'].to(device), data['y'].to(device) #Sending data to GPU(cuda) if gpu is available otherwise CPU
        outputs = model(inputs) #output 
        loss = loss_fn(outputs, targets) #loss function
        loss.backward() #compute gradients(work its way BACKWARDS from the specified loss)
        optimizer.step()  #gradient optimisation
        scheduler.step() 
        
        final_loss += loss.item() #Final loss
        
    final_loss /= len(dataloader) #average loss
    
    return final_loss


def valid_fn(model, loss_fn, dataloader, device):
    model.eval() #  set the model to evaluation/validation mode
    final_loss = 0 # Initialise validation final loss to zero
    valid_preds = [] #Empty list for appending prediction
    
    for data in dataloader:
        inputs, targets = data['x'].to(device), data['y'].to(device) #Sending data to GPU(cuda) if gpu is available otherwise CPU
        with torch.no_grad():  # validation needs no gradients, so skip building the graph
            outputs = model(inputs) #output
            loss = loss_fn(outputs, targets) #loss calculation
        
        final_loss += loss.item() #final validation loss
        valid_preds.append(outputs.sigmoid().detach().cpu().numpy()) # get CPU tensor as numpy array # cannot get GPU tensor as numpy array directly
        
    final_loss /= len(dataloader)
    valid_preds = np.concatenate(valid_preds) #concatenating predictions under valid_preds
    
    return final_loss, valid_preds

def inference_fn(model, dataloader, device):
    model.eval()
    preds = []
    
    for data in dataloader:
        inputs = data['x'].to(device)

        with torch.no_grad():  # disable gradient tracking; inference needs no backward pass
            outputs = model(inputs)
        
        preds.append(outputs.sigmoid().detach().cpu().numpy()) 
        
    preds = np.concatenate(preds)
    
    return preds

<a id="8"></a>
# Smoothing

In [None]:
# Label Smoothing Regularization (LSR) is a trick to reduce overfitting: softened targets keep the model from becoming over-confident in the hard 0/1 labels.
# It is similar to the label_smoothing option of the cross-entropy losses in TensorFlow.
class SmoothBCEwLogits(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth(targets:torch.Tensor, n_labels:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = targets * (1.0 - smoothing) + 0.5 * smoothing
        return targets

    def forward(self, inputs, targets):
        targets = SmoothBCEwLogits._smooth(targets, inputs.size(-1),
            self.smoothing)
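        # reduction='none' keeps per-element losses so the reduction below actually applies
        loss = F.binary_cross_entropy_with_logits(inputs, targets, self.weight, reduction='none')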

        if  self.reduction == 'sum':
            loss = loss.sum()
        elif  self.reduction == 'mean':
            loss = loss.mean()

        return loss
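
The effect of the smoothing formula `targets * (1 - smoothing) + 0.5 * smoothing` can be checked by hand. The NumPy sketch below is an illustration only; the helper names `smooth_targets` and `bce_with_logits` are hypothetical and reimplement the arithmetic outside PyTorch.

```python
import numpy as np

def smooth_targets(targets, smoothing):
    """Pull hard 0/1 targets toward 0.5 by the given smoothing amount."""
    return targets * (1.0 - smoothing) + 0.5 * smoothing

def bce_with_logits(logits, targets):
    """Numerically stable element-wise binary cross-entropy on raw logits."""
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

targets = np.array([0.0, 1.0])
smoothed = smooth_targets(targets, smoothing=0.1)   # 0 -> 0.05, 1 -> 0.95

# Very confident (and correct) logits for both labels.
logits = np.array([-8.0, 8.0])
hard_loss = bce_with_logits(logits, targets).mean()
soft_loss = bce_with_logits(logits, smoothed).mean()

print(smoothed, hard_loss, soft_loss)
```

With hard targets the loss keeps shrinking as the logits grow, whereas with smoothed targets extreme logits are penalized, which discourages over-confident predictions.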

In [None]:
#class LabelSmoothingCrossEntropy(nn.Module):
#import torch
#from torch.nn.modules.loss import _WeightedLoss
#import torch.nn.functional as F

#class SmoothCrossEntropyLoss(_WeightedLoss):
#    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
#        super().__init__(weight=weight, reduction=reduction)
#        self.smoothing = smoothing
#        self.weight = weight
#        self.reduction = reduction
#
#    @staticmethod
#    def _smooth_one_hot(targets:torch.Tensor, n_classes:int, smoothing=0.0):
#        assert 0 <= smoothing < 1
#        with torch.no_grad():
#            targets = torch.empty(size=(targets.size(0), n_classes),
#                    device=targets.device) \
#                .fill_(smoothing /(n_classes-1)) \
#                .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
#        return targets
#
#    def forward(self, inputs, targets):
#        targets = SmoothCrossEntropyLoss._smooth_one_hot(targets, inputs.size(-1),
#            self.smoothing)
#        lsm = F.log_softmax(inputs, -1)
#
#        if self.weight is not None:
#            lsm = lsm * self.weight.unsqueeze(0)
#
#        loss = -(targets * lsm).sum(-1)
#
#        if  self.reduction == 'sum':
#            loss = loss.sum()
#        elif  self.reduction == 'mean':
#            loss = loss.mean()
#
#        return loss

<a id="9"></a>
# Model

In [None]:
class Model(nn.Module):
    def __init__(self, num_features, num_targets):
        super(Model, self).__init__()
        self.hidden_size = [1500, 1250, 1000, 750]
        self.dropout_value = [0.5, 0.35, 0.3, 0.25]

        self.batch_norm1 = nn.BatchNorm1d(num_features)
        self.dense1 = nn.Linear(num_features, self.hidden_size[0])
        
        self.batch_norm2 = nn.BatchNorm1d(self.hidden_size[0])
        self.dropout2 = nn.Dropout(self.dropout_value[0])
        self.dense2 = nn.Linear(self.hidden_size[0], self.hidden_size[1])

        self.batch_norm3 = nn.BatchNorm1d(self.hidden_size[1])
        self.dropout3 = nn.Dropout(self.dropout_value[1])
        self.dense3 = nn.Linear(self.hidden_size[1], self.hidden_size[2])

        self.batch_norm4 = nn.BatchNorm1d(self.hidden_size[2])
        self.dropout4 = nn.Dropout(self.dropout_value[2])
        self.dense4 = nn.Linear(self.hidden_size[2], self.hidden_size[3])

        self.batch_norm5 = nn.BatchNorm1d(self.hidden_size[3])
        self.dropout5 = nn.Dropout(self.dropout_value[3])
        self.dense5 = nn.utils.weight_norm(nn.Linear(self.hidden_size[3], num_targets))
    
    def forward(self, x):
        x = self.batch_norm1(x)
        x = F.leaky_relu(self.dense1(x))
        
        x = self.batch_norm2(x)
        x = self.dropout2(x)
        x = F.leaky_relu(self.dense2(x))

        x = self.batch_norm3(x)
        x = self.dropout3(x)
        x = F.leaky_relu(self.dense3(x))

        x = self.batch_norm4(x)
        x = self.dropout4(x)
        x = F.leaky_relu(self.dense4(x))

        x = self.batch_norm5(x)
        x = self.dropout5(x)
        x = self.dense5(x)
        return x
    
class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1):
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        pred = pred.log_softmax(dim=self.dim)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
            
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))    

* **Learning rate scheduling**: Instead of a fixed learning rate, we use a scheduler that updates the learning rate after every training batch. Of the many strategies available, we use the **"One Cycle Learning Rate Policy"**: start with a low learning rate, increase it batch by batch to a high learning rate over the first part of training (about 10% of all steps here, since `PCT_START = 0.1`), then gradually decrease it to a very low value for the remaining steps.

* **Weight decay**: We also use weight decay, a regularization technique that keeps the weights from growing too large by adding a penalty term to the loss function.
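
To make the shape of the one-cycle policy concrete, here is a minimal pure-Python sketch. It assumes cosine interpolation in both phases and a final learning rate of `initial_lr / 1e4`, which match the defaults of PyTorch's `OneCycleLR` (`anneal_strategy='cos'`, `final_div_factor=1e4`). The function name `one_cycle_lr` and the toy step count are hypothetical, and the phase boundaries are simplified, so the values only approximate what the real scheduler produces.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0,
                 final_div_factor=1e4, pct_start=0.3):
    """Approximate one-cycle learning rate at a given batch step (illustrative only)."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = max(1, int(pct_start * total_steps))

    def cosine(start, end, pct):
        # Cosine interpolation from start (pct=0) to end (pct=1)
        return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2

    if step < warmup_steps:
        # Phase 1: ramp up from initial_lr towards max_lr
        return cosine(initial_lr, max_lr, step / warmup_steps)

    # Phase 2: anneal from max_lr down to min_lr
    decay_steps = max(1, total_steps - warmup_steps - 1)
    pct = min(1.0, (step - warmup_steps) / decay_steps)
    return cosine(max_lr, min_lr, pct)

# Settings mirroring the 'ALL_TARGETS' values: max_lr=1e-2, div_factor=1e3, pct_start=0.1
total = 1000  # hypothetical number of batches over the whole run
schedule = [one_cycle_lr(s, total, 1e-2, div_factor=1e3, pct_start=0.1)
            for s in range(total)]
peak_step = max(range(total), key=lambda s: schedule[s])
print(f"start={schedule[0]:.2e}, peak={schedule[peak_step]:.2e} "
      f"at step {peak_step}, end={schedule[-1]:.2e}")
```

With these settings the learning rate peaks after roughly a tenth of the steps and spends the rest of training decaying.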


In [None]:
class FineTuneScheduler:
    def __init__(self, epochs):
        self.epochs = epochs
        self.epochs_per_step = 0
        self.frozen_layers = []

    def copy_without_top(self, model, num_features, num_targets, num_targets_new):
        self.frozen_layers = []

        model_new = Model(num_features, num_targets)
        model_new.load_state_dict(model.state_dict())

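        # Freeze every block except the top one (index 5), which is replaced below
        for name, param in model_new.named_parameters():
            layer_index = name.split('.')[0][-1]  # last character of the module name (a string)

            if layer_index == '5':
                continue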

            param.requires_grad = False

            # Save frozen layer names
            if layer_index not in self.frozen_layers:
                self.frozen_layers.append(layer_index)

        self.epochs_per_step = self.epochs // len(self.frozen_layers)

        # Replace the top layers with new ones sized for the new targets
        model_new.batch_norm5 = nn.BatchNorm1d(model_new.hidden_size[3])
        model_new.dropout5 = nn.Dropout(model_new.dropout_value[3])
        model_new.dense5 = nn.utils.weight_norm(nn.Linear(model_new.hidden_size[-1], num_targets_new))
        model_new.to(DEVICE)
        return model_new

    def step(self, epoch, model):
        if len(self.frozen_layers) == 0:
            return

        if epoch % self.epochs_per_step == 0:
            last_frozen_index = self.frozen_layers[-1]
            
            # Unfreeze parameters of the last frozen layer
            for name, param in model.named_parameters():
                layer_index = name.split('.')[0][-1]

                if layer_index == last_frozen_index:
                    param.requires_grad = True

            del self.frozen_layers[-1]  # Remove the last layer as unfrozen
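
The order in which `step` releases layers can be traced without a model. The sketch below is a simplified stand-in, assuming that blocks 1 to 4 are the frozen ones and that the replaced top block stays trainable throughout; the helper name `unfreeze_schedule` is hypothetical.

```python
def unfreeze_schedule(epochs, frozen_layers):
    """Return {epoch: layer} showing when each frozen layer is released."""
    remaining = list(frozen_layers)
    epochs_per_step = epochs // len(remaining)
    released = {}
    for epoch in range(epochs):
        # Same rule as FineTuneScheduler.step: release one layer every
        # epochs_per_step epochs, starting with the last frozen layer
        if remaining and epoch % epochs_per_step == 0:
            released[epoch] = remaining.pop()
    return released

schedule = unfreeze_schedule(24, ['1', '2', '3', '4'])
print(schedule)
```

Under these assumptions the layer closest to the output is released at epoch 0, and the layer closest to the input is released last.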

In [None]:
# One-hot encode the categorical columns
def process_data(data):
    data = pd.get_dummies(data, columns=['cp_time','cp_dose'])
    return data

In [None]:
feature_cols = [c for c in process_data(train).columns if c not in all_target_cols]
feature_cols = [c for c in feature_cols if c not in ['kfold', 'sig_id', 'drug_id']]
num_features = len(feature_cols)


In [None]:


# HyperParameters

DEVICE = ('cuda' if torch.cuda.is_available() else 'cpu')
EPOCHS = 24
BATCH_SIZE = 128

WEIGHT_DECAY = {'ALL_TARGETS': 1e-5, 'SCORED_ONLY': 3e-6}
MAX_LR = {'ALL_TARGETS': 1e-2, 'SCORED_ONLY': 3e-3}
DIV_FACTOR = {'ALL_TARGETS': 1e3, 'SCORED_ONLY': 1e2}
PCT_START = 0.1

In [None]:
# Show model architecture
model = Model(num_features, num_all_targets)
model

In [None]:
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

def make_cv_folds(train, SEEDS, NFOLDS, DRUG_THRESH):
    vc = train.drug_id.value_counts()
    vc1 = vc.loc[vc <= DRUG_THRESH].index.sort_values()
    vc2 = vc.loc[vc > DRUG_THRESH].index.sort_values()

    for seed_id in range(SEEDS):
        kfold_col = 'kfold_{}'.format(seed_id)
        
        # Stratify drugs seen DRUG_THRESH times or fewer: each drug goes to one fold as a whole
        dct1 = {}
        dct2 = {}

        skf = MultilabelStratifiedKFold(n_splits=NFOLDS, shuffle=True, random_state=seed_id)
        tmp = train.groupby('drug_id')[target_cols].mean().loc[vc1]

        for fold,(idxT, idxV) in enumerate(skf.split(tmp, tmp[target_cols])):
            dd = {k: fold for k in tmp.index[idxV].values}
            dct1.update(dd)

        # Stratify drugs seen more than DRUG_THRESH times: folds are assigned sample by sample
        skf = MultilabelStratifiedKFold(n_splits=NFOLDS, shuffle=True, random_state=seed_id)
        tmp = train.loc[train.drug_id.isin(vc2)].reset_index(drop=True)

        for fold,(idxT, idxV) in enumerate(skf.split(tmp, tmp[target_cols])):
            dd = {k: fold for k in tmp.sig_id[idxV].values}
            dct2.update(dd)

        # ASSIGN FOLDS
        train[kfold_col] = train.drug_id.map(dct1)
        train.loc[train[kfold_col].isna(), kfold_col] = train.loc[train[kfold_col].isna(), 'sig_id'].map(dct2)
        train[kfold_col] = train[kfold_col].astype('int8')
        
    return train

SEEDS = 7
NFOLDS = 7
DRUG_THRESH = 18

train = make_cv_folds(train, SEEDS, NFOLDS, DRUG_THRESH)
train.head()
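
The point of `make_cv_folds` is that a drug with at most `DRUG_THRESH` samples never appears in both the training and validation parts of a split, while more frequent drugs are spread across folds. A quick sanity check of that property might look like the sketch below; the helper `drugs_spanning_folds` and the toy data are hypothetical, not part of the original pipeline.

```python
from collections import defaultdict

def drugs_spanning_folds(drug_ids, folds):
    """Return {drug: set_of_folds} for every drug that appears in more than one fold."""
    seen = defaultdict(set)
    for drug, fold in zip(drug_ids, folds):
        seen[drug].add(fold)
    return {drug: f for drug, f in seen.items() if len(f) > 1}

# Toy data: 'a' and 'b' are rare and stay in one fold; 'c' is frequent and is spread out
drug_ids = ['a', 'a', 'b', 'c', 'c', 'c', 'c']
folds = [0, 0, 1, 0, 1, 2, 0]
spanning = drugs_spanning_folds(drug_ids, folds)
print(spanning)
```

On the real data, one would pass `train.drug_id` and one of the `kfold_*` columns, and expect only drugs with more than `DRUG_THRESH` samples in the result.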

In [None]:
def run_training(fold_id, seed_id):
    seed_everything(seed_id)
    
    train_ = process_data(train)
    test_ = process_data(test)
    
    kfold_col = f'kfold_{seed_id}'
    trn_idx = train_[train_[kfold_col] != fold_id].index
    val_idx = train_[train_[kfold_col] == fold_id].index
    
    train_df = train_[train_[kfold_col] != fold_id].reset_index(drop=True)
    valid_df = train_[train_[kfold_col] == fold_id].reset_index(drop=True)
    
    def train_model(model, tag_name, target_cols_now, fine_tune_scheduler=None):
        x_train, y_train  = train_df[feature_cols].values, train_df[target_cols_now].values
        x_valid, y_valid =  valid_df[feature_cols].values, valid_df[target_cols_now].values
        
        train_dataset = MoADataset(x_train, y_train)
        valid_dataset = MoADataset(x_valid, y_valid)

        trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
        validloader = torch.utils.data.DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=False)
        
        # The lr passed to Adam is only a placeholder: OneCycleLR overrides it from the first step
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=WEIGHT_DECAY[tag_name])
        scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer=optimizer,
                                                        steps_per_epoch=len(trainloader),
                                                        pct_start=PCT_START,
                                                        div_factor=DIV_FACTOR[tag_name],
                                                        max_lr=MAX_LR[tag_name],
                                                        epochs=EPOCHS)
        
        loss_fn = nn.BCEWithLogitsLoss()             # validation: plain BCE, comparable to the competition metric
        loss_tr = SmoothBCEwLogits(smoothing=0.001)  # training: BCE with label smoothing

        oof = np.zeros((len(train), len(target_cols_now)))
        best_loss = np.inf
        
        for epoch in range(EPOCHS):
            if fine_tune_scheduler is not None:
                fine_tune_scheduler.step(epoch, model)

            train_loss = train_fn(model, optimizer, scheduler, loss_tr, trainloader, DEVICE)
            valid_loss, valid_preds = valid_fn(model, loss_fn, validloader, DEVICE)
            print(f"SEED: {seed_id}, FOLD: {fold_id}, {tag_name}, EPOCH: {epoch}, train_loss: {train_loss:.6f}, valid_loss: {valid_loss:.6f}")

            if np.isnan(valid_loss):
                break
            
            if valid_loss < best_loss:
                best_loss = valid_loss
                oof[val_idx] = valid_preds
                torch.save(model.state_dict(), f"{tag_name}_FOLD{fold_id}_.pth")

        return oof

    fine_tune_scheduler = FineTuneScheduler(EPOCHS)

    pretrained_model = Model(num_features, num_all_targets)
    pretrained_model.to(DEVICE)

    # Train on scored + nonscored targets
    train_model(pretrained_model, 'ALL_TARGETS', all_target_cols)

    # Load the pretrained model with the best loss
    pretrained_model = Model(num_features, num_all_targets)
    pretrained_model.load_state_dict(torch.load(f"ALL_TARGETS_FOLD{fold_id}_.pth"))
    pretrained_model.to(DEVICE)

    # Copy model without the top layer
    final_model = fine_tune_scheduler.copy_without_top(pretrained_model, num_features, num_all_targets, num_targets)

    # Fine-tune the model on scored targets only
    oof = train_model(final_model, 'SCORED_ONLY', target_cols, fine_tune_scheduler)

    # Load the fine-tuned model with the best loss
    model = Model(num_features, num_targets)
    model.load_state_dict(torch.load(f"SCORED_ONLY_FOLD{fold_id}_.pth"))
    model.to(DEVICE)

    #--------------------- PREDICTION---------------------
    x_test = test_[feature_cols].values
    testdataset = TestDataset(x_test)
    testloader = torch.utils.data.DataLoader(testdataset, batch_size=BATCH_SIZE, shuffle=False)
    
    predictions = inference_fn(model, testloader, DEVICE)
    return oof, predictions

<a id="10"></a>
# Prediction & Submission

In [None]:
def run_k_fold(NFOLDS, seed_id):
    oof = np.zeros((len(train), len(target_cols)))
    predictions = np.zeros((len(test), len(target_cols)))
    
    for fold_id in range(NFOLDS):
        oof_, pred_ = run_training(fold_id, seed_id)
        predictions += pred_ / NFOLDS
        oof += oof_
        
    return oof, predictions

In [None]:
# Averaging on multiple SEEDS

SEED = [0, 1, 2, 3, 4, 5, 6]
oof = np.zeros((len(train), len(target_cols)))
predictions = np.zeros((len(test), len(target_cols)))

for seed in SEED:
    
    oof_, predictions_ = run_k_fold(NFOLDS, seed)
    oof += oof_ / len(SEED)
    predictions += predictions_ / len(SEED)

train[target_cols] = oof
test[target_cols] = predictions
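
In the loops above, out-of-fold (OOF) predictions are summed across folds, because each training row is validated by exactly one fold's model, while test predictions are divided by `NFOLDS`, because every fold's model predicts every test row. The toy sketch below illustrates the difference for a single target; all numbers are hypothetical.

```python
n_folds = 3
row_fold = [0, 1, 2, 0, 1, 2]               # validation fold of each training row
fold_val_pred = {0: 0.2, 1: 0.4, 2: 0.6}    # hypothetical validation prediction per fold model
fold_test_pred = {0: 0.1, 1: 0.2, 2: 0.3}   # hypothetical test prediction per fold model

oof = [0.0] * len(row_fold)
test_pred = 0.0
for fold in range(n_folds):
    # A fold's model fills only the rows it validated on; all other rows stay 0
    oof_fold = [fold_val_pred[fold] if f == fold else 0.0 for f in row_fold]
    oof = [a + b for a, b in zip(oof, oof_fold)]   # sum: no row is filled twice
    test_pred += fold_test_pred[fold] / n_folds    # mean over the fold models

print(oof)
print(test_pred)
```

Averaging over seeds then works the same way for both arrays, since every seed produces a full OOF matrix and a full test prediction.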


In [None]:
valid_results = train_targets_scored.drop(columns=target_cols).merge(train[['sig_id']+target_cols], on='sig_id', how='left').fillna(0)

y_true = train_targets_scored[target_cols].values
y_pred = valid_results[target_cols].values

score = 0

for i in range(len(target_cols)):
    score += log_loss(y_true[:, i], y_pred[:, i])

print("CV log_loss: ", score / y_pred.shape[1])
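
The score computed above is the binary log loss of each target column, averaged over all columns. As a minimal pure-Python sketch of that arithmetic (the function name `mean_columnwise_log_loss`, the clipping constant `eps`, and the toy inputs are assumptions for illustration, not taken from scikit-learn):

```python
import math

def mean_columnwise_log_loss(y_true, y_pred, eps=1e-15):
    """Average binary log loss over rows within each target column, then over columns."""
    n_rows, n_cols = len(y_true), len(y_true[0])
    total = 0.0
    for j in range(n_cols):
        col_loss = 0.0
        for i in range(n_rows):
            p = min(max(y_pred[i][j], eps), 1 - eps)  # clip to avoid log(0)
            y = y_true[i][j]
            col_loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        total += col_loss / n_rows
    return total / n_cols

# Two samples, two targets
y_true = [[1, 0], [0, 0]]
y_pred = [[0.9, 0.1], [0.2, 0.1]]
score = mean_columnwise_log_loss(y_true, y_pred)
print(score)
```

Confident wrong predictions are penalized heavily by this metric, which is one reason the training loss uses label smoothing.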

In [None]:
submission = sample_submission.drop(columns=target_cols).merge(test[['sig_id']+target_cols], on='sig_id', how='left').fillna(0)
submission.to_csv('submission.csv', index=False)


<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:black; border:0; color:tomato' role="tab" aria-controls="home"><center>If you found this notebook helpful, some upvotes would be very much appreciated. That will keep me motivated :)</center></h2>


<h1><center>THANK YOU :)</center></h1>