# Mechanisms of Action Prediction

<br>

### Contents

- <a href="#intro">1. Introduction</a>
   - <a href="#datadec">1.1. Data description</a>
- <a href="#data">2. Data overview and data preprocessing</a>
- <a href="#pca">3. Dimensionality reduction via PCA</a>
- <a href="#lr">4. A Logistic Regression Model (neural networks without hidden layers)</a>
   - <a href="#hyper-lr">4.1. Lambda optimization</a>
   - <a href="#train-original-lr">4.2. Predictions for the original data</a>
   - <a href="#train-red-lr">4.3. Predictions for the reduced data</a>
   - <a href="#train-pca-lr">4.4. Predictions for the PCA data</a>
   - <a href="#train-pca_wv-lr">4.5. Predictions for the reduced PCA data</a>
   - <a href="#comp-lr">4.6. Comparing the results</a>
- <a href="#dnn">5. Multi-label classification via Deep Neural Networks</a>
   - <a href="#hyper">5.1. Hyperparameter optimization</a>
   - <a href="#train-original">5.2. Predictions for the original data</a>
   - <a href="#train-red">5.3. Predictions for the reduced data</a>
   - <a href="#train-pca">5.4. Predictions for the PCA data</a>
   - <a href="#train-pca_wv">5.5. Predictions for the reduced PCA data</a>
   - <a href="#comp">5.6. Comparing the results</a>
   - <a href="#comp_models">5.7. Comparing the models</a>
- <a href="#submit">6. Submission</a>

<div id="intro"></div>

## 1. Introduction

The [Connectivity Map](https://clue.io/), a project within the Broad Institute of MIT and Harvard, the [Laboratory for Innovation Science at Harvard (LISH)](https://lish.harvard.edu/), and the NIH Common Fund's Library of Integrated Network-Based Cellular Signatures (LINCS) present this challenge with the goal of advancing drug development through improvements to MoA prediction algorithms.


<b>What is the Mechanism of Action (MoA) of a drug? And why is it important?</b>

In the past, scientists derived drugs from natural products or were inspired by traditional remedies. Very common drugs, such as paracetamol, known in the US as acetaminophen, were put into clinical use decades before the biological mechanisms driving their pharmacological activities were understood. Today, with the advent of more powerful technologies, drug discovery has changed from the serendipitous approaches of the past to a more targeted model based on an understanding of the underlying biological mechanism of a disease. In this new framework, scientists seek to identify a protein target associated with a disease and develop a molecule that can modulate that protein target. As a shorthand to describe the biological activity of a given molecule, scientists assign a label referred to as mechanism-of-action or MoA for short.

<br>

<div id="datadec"></div>

### 1.1. Data description

<code>train_features.csv</code>: Features for the training set. Features prefixed with g- are gene expression data, and features prefixed with c- are cell viability data. cp_type indicates whether a sample was treated with a compound (trt_cp) or with a control perturbation (ctl_vehicle); control perturbations have no MoAs. cp_time and cp_dose indicate treatment duration (24, 48, or 72 hours) and dose (high or low).

<code>cp_type (categorical)</code>: Samples treated with a compound or with a control perturbation. Categories include "trt_cp" and "ctl_vehicle", respectively. There is no MoA for "ctl_vehicle".

<code>cp_time (categorical)</code>: Treatment duration in hours. Categories include 24, 48, and 72 hours.

<code>cp_dose (categorical)</code>: Drug dose. Categories include "D1", "D2" for low and high dose, respectively.

<code>g-[0-771] (continuous)</code>: Gene expression data - a measure of activation in a given gene after the drug is applied.

<code>c-[0-99] (continuous)</code>: Cell viability - roughly, the number of cells still alive after the drug is applied.
    

<code>train_targets_scored.csv</code>: The binary MoA targets that are scored. There are 206 MoA targets.

<code>test_features.csv</code>: Features for the test data. We must predict the probability of each scored MoA for each row in the test data.
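
The column layout described above can be sketched without loading the real files. The column list below is a hypothetical stand-in built from the naming scheme (`sig_id`, the three `cp_*` columns, `g-0` to `g-771`, `c-0` to `c-99`); it only illustrates how the feature groups can be separated by prefix.

```python
# Hypothetical column list mimicking the layout of train_features.csv.
columns = (
    ["sig_id", "cp_type", "cp_time", "cp_dose"]
    + [f"g-{i}" for i in range(772)]
    + [f"c-{i}" for i in range(100)]
)

# Group the columns by their prefix.
gene_cols = [c for c in columns if c.startswith("g-")]
cell_cols = [c for c in columns if c.startswith("c-")]
meta_cols = [c for c in columns if not c.startswith(("g-", "c-"))]

print(len(gene_cols), len(cell_cols), len(meta_cols))  # 772 100 4
```

With a real DataFrame, the same lists could be used for selection by name (for example `train_features[gene_cols]`) instead of by position.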
    

<hr>

<div id="data"></div>

## 2. Data overview and data preprocessing

In [None]:
#Loading libraries / methods:

import warnings
warnings.filterwarnings("ignore")

import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
import keras
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, KFold, GridSearchCV
from sklearn.metrics import log_loss, classification_report, confusion_matrix
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Input
from keras.wrappers.scikit_learn import KerasClassifier
from keras import regularizers
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint
from tensorflow_addons.layers import WeightNormalization
from tensorflow_addons.optimizers import Lookahead
from tqdm.keras import TqdmCallback

sns.set()
%matplotlib inline

In [None]:
#Loading datasets (we'll ignore the non-scored dataset):

train_features = pd.read_csv('/kaggle/input/lish-moa/train_features.csv')
train_targets = pd.read_csv('/kaggle/input/lish-moa/train_targets_scored.csv')
test_features = pd.read_csv('/kaggle/input/lish-moa/test_features.csv')

In [None]:
#Visualization of the train data:
train_features.head()

In [None]:
#Visualization of the test data:
test_features.head()

In [None]:
#Visualization of the target data:
train_targets.head()

In [None]:
#Dimension of the datasets:

print('Number of rows in training set: ', train_features.shape[0])
print('Number of columns in training set: ', train_features.shape[1])
print('Number of rows in test set: ', test_features.shape[0])
print('Number of columns in test set: ', test_features.shape[1])
print('Number of rows in target set: ',train_targets.shape[0])
print('Number of columns in target set: ',train_targets.shape[1])

#The data is split roughly 85% / 15%: 23814 rows in the training
#set and 3982 rows in the test set.

In [None]:
#There are 772 gene expression features (g-0 to g-771):
train_features.iloc[:, 4:776].head()

In [None]:
#There are 100 cell viability features (c-0 to c-99):
train_features.iloc[:, 776:876].head()

In [None]:
#Gene expression:
train_gene = pd.DataFrame(train_features.iloc[:, 4:776])

#Cell Viability:
train_cell = pd.DataFrame(train_features.iloc[:, 776:876])

#Gene expression and Cell Viability:
train_gc = pd.DataFrame(train_features.iloc[:, 4:876])
train_gc.head(5)

In [None]:
#Data preprocessing (test_features)


#Converting cp_type and cp_dose as binary:
def Preprocess(data):
    data = data.copy()
    data.loc[:, 'cp_type'] = data.loc[:, 'cp_type'].map({'trt_cp': 0, 'ctl_vehicle': 1})
    data.loc[:, 'cp_dose'] = data.loc[:, 'cp_dose'].map({'D1': 0, 'D2': 1})
    del data['sig_id']
    return data

#train = Preprocess(train_features)
test = Preprocess(test_features)

#Delete the sig_id column:
#del train['sig_id']

#train_targets = train_targets.loc[train['cp_type']==0].reset_index(drop=True)
#train = train.loc[train['cp_type']==0].reset_index(drop=True)


In [None]:
#Loading csv files
x_train_orig = pd.read_csv('/kaggle/input/orig-data/orig_data/x_train_orig.csv')
y_train_orig = pd.read_csv('/kaggle/input/orig-data/orig_data/y_train_orig.csv')
x_test_orig = pd.read_csv('/kaggle/input/orig-data/orig_data/x_test_orig.csv')
y_test_orig = pd.read_csv('/kaggle/input/orig-data/orig_data/y_test_orig.csv')

x_train_orig = np.asmatrix(x_train_orig)
y_train_orig = np.asmatrix(y_train_orig)
x_test_orig = np.asmatrix(x_test_orig)
y_test_orig = np.asmatrix(y_test_orig)

print(x_train_orig.shape, y_train_orig.shape, x_test_orig.shape, y_test_orig.shape)

In [None]:
#Loading csv files
x_train_wv = pd.read_csv('/kaggle/input/reddata/red_data/x_train_wv.csv')
y_train_wv = pd.read_csv('/kaggle/input/reddata/red_data/y_train_wv.csv')
x_test_wv = pd.read_csv('/kaggle/input/reddata/red_data/x_test_wv.csv')
y_test_wv = pd.read_csv('/kaggle/input/reddata/red_data/y_test_wv.csv')

x_train_wv = np.asmatrix(x_train_wv)
y_train_wv = np.asmatrix(y_train_wv)
x_test_wv = np.asmatrix(x_test_wv)
y_test_wv = np.asmatrix(y_test_wv)

print(x_train_wv.shape, y_train_wv.shape, x_test_wv.shape, y_test_wv.shape)

In [None]:
#Loading csv files
x_train_pca = pd.read_csv('/kaggle/input/pca-data/pca_data/x_train_pca.csv')
y_train_pca = pd.read_csv('/kaggle/input/pca-data/pca_data/y_train_pca.csv')
x_test_pca = pd.read_csv('/kaggle/input/pca-data/pca_data/x_test_pca.csv')
y_test_pca = pd.read_csv('/kaggle/input/pca-data/pca_data/y_test_pca.csv')

x_train_pca = np.asmatrix(x_train_pca)
y_train_pca = np.asmatrix(y_train_pca)
x_test_pca = np.asmatrix(x_test_pca)
y_test_pca = np.asmatrix(y_test_pca)

print(x_train_pca.shape, y_train_pca.shape, x_test_pca.shape, y_test_pca.shape)

In [None]:
#Loading csv files
x_train_pca_wv = pd.read_csv('/kaggle/input/pca-wv-data/pca_wv_data/x_train_pca_wv.csv')
y_train_pca_wv = pd.read_csv('/kaggle/input/pca-wv-data/pca_wv_data/y_train_pca_wv.csv')
x_test_pca_wv = pd.read_csv('/kaggle/input/pca-wv-data/pca_wv_data/x_test_pca_wv.csv')
y_test_pca_wv = pd.read_csv('/kaggle/input/pca-wv-data/pca_wv_data/y_test_pca_wv.csv')

x_train_pca_wv = np.asmatrix(x_train_pca_wv)
y_train_pca_wv = np.asmatrix(y_train_pca_wv)
x_test_pca_wv = np.asmatrix(x_test_pca_wv)
y_test_pca_wv = np.asmatrix(y_test_pca_wv)

print(x_train_pca_wv.shape, y_train_pca_wv.shape, x_test_pca_wv.shape, y_test_pca_wv.shape)

<div id="pca"></div>

## 3. Dimensionality reduction via PCA

In [None]:
#Standardization

scaler=StandardScaler()
scaler.fit(train_gc)

scaled_data=scaler.transform(train_gc)

In [None]:
#PCA: Gene expression & Cell viability

#Alternative: n_components=0.95 keeps as many components as needed to explain at least 95% of the variance.

pca_gc = PCA(n_components=10) #the first ten PC
pca_gc.fit(scaled_data)

print(pca_gc.explained_variance_ratio_)

In [None]:
gc_pca = pca_gc.transform(scaled_data)
print(scaled_data.shape, gc_pca.shape)
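
The difference between a fixed number of components and a variance threshold such as 95% can be illustrated with a small NumPy sketch. It uses synthetic data (not the MoA features) and computes the explained variance ratios from the singular values of the standardized matrix, which is the quantity `explained_variance_ratio_` reports; all sizes and names here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples, 20 features driven by 3 latent factors.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(200, 20))

# Standardize each column (zero mean, unit variance), as StandardScaler does.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Squared singular values are proportional to the variance of each component.
_, s, _ = np.linalg.svd(Xs, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative ratio reaches 95%.
n_components_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_components_95, cumulative[:5].round(3))
```

Comparing `cumulative[9]` with 0.95 on the real data would show how much variance the first ten components used above actually retain.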

In [None]:
#Scree plot:
per_var = np.round(pca_gc.explained_variance_ratio_* 100, decimals=1)
labels = ['PC' + str(x) for x in range(1, len(per_var)+1)]
 
plt.bar(x=range(1,len(per_var)+1), height=per_var, tick_label=labels)
plt.ylabel('Percentage of Explained Variance')
plt.xlabel('Principal Component')
plt.title('Scree Plot')
plt.show()

In [None]:
#Plotting the samples projected onto PC1 and PC2:

plt.figure(figsize=(8,6))
plt.scatter(gc_pca[:,0],gc_pca[:,1])
plt.xlabel('First principal component')
plt.ylabel('Second principal component')

<div id="lr"></div>

## 4. A Logistic Regression Model (neural networks without hidden layers)

### 4.1. Lambda optimization

https://www.kaggle.com/heitorbaldo/moa-prediction-pca-dnn-r

### 4.2. Predictions for the original data

In [None]:
#Building the Logistic Regression model (neural network without hidden layers):

def LR_Model(inputsize):
    classifier = Sequential()
    classifier.add(Dense(206, activation="sigmoid", 
                              kernel_regularizer=regularizers.l1(0),
                              input_dim=inputsize))
    
    classifier.compile(optimizer='adam', loss = 'binary_crossentropy', 
                       metrics = ['accuracy'])
    
    return classifier
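
To make the "neural network without hidden layers" idea concrete, here is a minimal NumPy sketch of what a single `Dense` layer with sigmoid activation computes, together with the binary cross-entropy averaged over all samples and labels. The toy sizes (8 features, 3 labels) and the zero-initialized weights are assumptions for illustration, not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average over all samples and labels.
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # 5 samples, 8 features (toy sizes)
W = np.zeros((8, 3))                 # 3 labels instead of 206
b = np.zeros(3)
Y = rng.integers(0, 2, size=(5, 3))  # multi-label 0/1 targets

probs = sigmoid(X @ W + b)           # one dense layer with sigmoid activation
loss = binary_crossentropy(Y, probs)

# With zero weights every probability is 0.5, so the loss is ln(2).
print(round(loss, 4))  # 0.6931
```

Each output unit is an independent logistic regression, which is why one sigmoid layer can handle the multi-label targets.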

In [None]:
LR_Model_orig = LR_Model(875)

history_lr_orig = LR_Model_orig.fit(x_train_orig, 
                              y_train_orig,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (original data)

trg_loss_orig = history_lr_orig.history['loss']
val_loss_orig = history_lr_orig.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_orig, 'b',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_orig, 'r',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - LR model (original data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
LR_Model_orig.evaluate(x_test_orig, y_test_orig)
min(history_lr_orig.history["val_loss"])
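
Besides the minimum validation loss, it can be useful to know at which epoch it occurred. The helper below is a small sketch (the name `best_epoch` and the toy loss values are made up for illustration); in this notebook it could be called on `history_lr_orig.history`.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) of the minimum of history_dict[key]."""
    values = history_dict[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Hypothetical loss values, not results from this notebook.
toy_history = {"loss": [0.70, 0.40, 0.20, 0.10],
               "val_loss": [0.72, 0.45, 0.30, 0.35]}

epoch, value = best_epoch(toy_history)
print(epoch, value)  # 3 0.3
```

A minimum that occurs well before the last epoch is a hint that the model starts to overfit afterwards.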

### 4.3. Predictions for the reduced data

In [None]:
LR_Model_red = LR_Model(875)

history_lr_red = LR_Model_red.fit(x_train_wv, 
                              y_train_wv,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (reduced data)

trg_loss_red = history_lr_red.history['loss']
val_loss_red = history_lr_red.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_red, 'k',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_red, 'y',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - LR model (reduced data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
LR_Model_red.evaluate(x_test_wv, y_test_wv)
min(history_lr_red.history["val_loss"])

### 4.4. Predictions for the PCA data

In [None]:
LR_Model_pca = LR_Model(490)

history_lr_pca = LR_Model_pca.fit(x_train_pca, 
                              y_train_pca,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (PCA data)

trg_loss_pca = history_lr_pca.history['loss']
val_loss_pca = history_lr_pca.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_pca, 'g',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_pca, 'm',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - LR model (PCA data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
LR_Model_pca.evaluate(x_test_pca, y_test_pca)
min(history_lr_pca.history["val_loss"])

### 4.5. Predictions for the reduced PCA data

In [None]:
LR_Model_pca_wv = LR_Model(490)

history_lr_pca_wv = LR_Model_pca_wv.fit(x_train_pca_wv, 
                              y_train_pca_wv,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (reduced PCA data)

trg_loss_pca_wv = history_lr_pca_wv.history['loss']
val_loss_pca_wv = history_lr_pca_wv.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_pca_wv, 'c',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_pca_wv, 'darkgreen',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - LR model (reduced PCA data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
LR_Model_pca_wv.evaluate(x_test_pca_wv, y_test_pca_wv)
min(history_lr_pca_wv.history["val_loss"])

### 4.6. Comparing the results

In [None]:
#Validation Loss (original, reduced, PCA, and reduced PCA data)   

fig = plt.figure(figsize=(20, 8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, val_loss_orig, 'r',  linewidth=3, label='Validation Loss (original)')
plt.plot(epochs, val_loss_red, 'y',  linewidth=3, label='Validation Loss (reduced)')
plt.plot(epochs, val_loss_pca, 'm',  linewidth=3, label='Validation Loss (PCA)')
plt.plot(epochs, val_loss_pca_wv, 'darkgreen',  linewidth=3, label='Validation Loss (reduced PCA)')
plt.title("Validation Loss - LR model (original, reduced, PCA, and reduced PCA data)  ")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

<div id="dnn"></div>

## 5. Multi-label classification via Deep Neural Networks

<div id="hyper"></div>

### 5.1. Hyperparameter optimization

In [None]:
#Hyperparameter optimization (tuning)
#Neural network model: 

'''
def NeuralNet(optimizer, kernel_initializer, activation, neurons, drop_rate):
    classifier = Sequential()
    classifier.add(Dense(units = neurons, activation=activation,
                        kernel_initializer=kernel_initializer, input_dim=875))
    classifier.add(BatchNormalization())
    classifier.add(Dropout(drop_rate))
    classifier.add(Dense(units = neurons, activation=activation,
                        kernel_initializer=kernel_initializer))
    classifier.add(BatchNormalization())
    classifier.add(Dropout(drop_rate))
    classifier.add(Dense(units = neurons, activation=activation,
                        kernel_initializer=kernel_initializer))
    classifier.add(BatchNormalization())
    classifier.add(Dense(206, activation="sigmoid"))
    classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return classifier

classifier = KerasClassifier(build_fn = NeuralNet)
'''

#Grid Search method:

'''
parameters = {'batch_size': [1000, 2300],
              'epochs': [50, 100, 300],
              'optimizer': ['adam', 'adamw'],
              'kernel_initializer': ['random_uniform', 'normal'], 
              'activation': ['relu', 'leaky-relu', 'elu'],
              'neurons': [1048, 512, 256, 128],
              'drop_rate': [0.2, 0.3, 0.4]}

grid_search = GridSearchCV(estimator = classifier, 
                           param_grid = parameters, 
                           scoring = 'accuracy', cv = 5)

grid_search = grid_search.fit(x_train_orig, y_train_orig)

best_param = grid_search.best_params_
best_param
best_acc = grid_search.best_score_
best_acc
'''
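
Before running a grid search like the one above, it helps to estimate its cost. The sketch below copies the parameter grid from the previous cell and counts the combinations with `itertools.product`; with 5-fold cross-validation each combination is trained five times.

```python
from itertools import product

# Parameter grid copied from the grid search cell.
parameters = {'batch_size': [1000, 2300],
              'epochs': [50, 100, 300],
              'optimizer': ['adam', 'adamw'],
              'kernel_initializer': ['random_uniform', 'normal'],
              'activation': ['relu', 'leaky-relu', 'elu'],
              'neurons': [1048, 512, 256, 128],
              'drop_rate': [0.2, 0.3, 0.4]}

n_combinations = len(list(product(*parameters.values())))
cv_folds = 5
n_fits = n_combinations * cv_folds

print(n_combinations, n_fits)  # 864 4320
```

This is presumably why the search is kept disabled here and only its result is used in the next cell.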

In [None]:
#Hyperparameters were set by the Grid Search method (previous code):

batchsize_hyp = 2300
epochs_hyp = 300
hiddenlayers_hyp = 512
dropout_hyp = 0.3
opt_hyp = 'adam'

<div id="train-original"></div>

### 5.2. Predictions for the original data

In [None]:
#Building the network:

def DNN_Model(inputsize):
    classifier = Sequential()
    classifier.add(WeightNormalization(Dense(512, activation="relu", input_dim=inputsize)))
    classifier.add(BatchNormalization())
    classifier.add(Dropout(0.3))
    classifier.add(WeightNormalization(Dense(256, activation="relu"))) 
    classifier.add(BatchNormalization())
    classifier.add(WeightNormalization(Dense(128, activation="relu"))) 
    classifier.add(BatchNormalization())
    classifier.add(WeightNormalization(Dense(206, activation="sigmoid"))) 
    
    classifier.compile(optimizer='adam', loss = 'binary_crossentropy', 
                       metrics = ['accuracy'])
    
    return classifier

In [None]:
DNN_Model_orig = DNN_Model(875)

history_dnn_orig = DNN_Model_orig.fit(x_train_orig, 
                              y_train_orig,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#10-fold cross-validation:

'''
models = []
history = {}
verbosity = 0

kfold = KFold(n_splits=10, shuffle=True)
for j, (train_idx, val_idx) in enumerate(kfold.split(x_train_orig)):
    model = DNN_Model(875)
    #x_train_orig / y_train_orig are NumPy matrices, so rows are indexed directly:
    history[j] = model.fit(x_train_orig[train_idx], y_train_orig[train_idx], 
                validation_data=(x_train_orig[val_idx], y_train_orig[val_idx]), 
                           batch_size=2300, epochs=12, verbose=verbosity)
    #evaluate returns [loss, accuracy]:
    scores = model.evaluate(x_train_orig[val_idx], y_train_orig[val_idx], 
                            verbose=verbosity)
    print('Fold %d: %s %.6f' % (j, model.metrics_names[0], scores[0]))
    models.append(model)
'''
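
What `KFold(..., shuffle=True)` does can be sketched in plain Python: shuffle the row indices, cut them into nearly equal folds, and use each fold once for validation. The function name `kfold_indices` and the toy sample count are assumptions for illustration; this is a simplified sketch, not scikit-learn's implementation.

```python
import random

def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, val_idx) lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible.
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for k in range(n_splits):
        size = base + (1 if k < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    # Each fold serves as the validation set exactly once.
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx

splits = list(kfold_indices(n_samples=23, n_splits=10))
print(len(splits), [len(val) for _, val in splits])
# 10 [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

Every sample appears in exactly one validation fold, so averaging the fold losses gives an estimate that uses all of the training data.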

In [None]:
'''
plt.figure(figsize=(15,7))
for k,v in history.items():
    plt.plot(v.history["loss"], color='#d13812', label="Loss Fold "+str(k))
    plt.plot(v.history["val_loss"], color='#6dbc1e', label="ValLoss Fold "+str(k))

plt.xlabel('Epochs')
plt.ylabel('Error')
plt.title('Folds Error Compound')
plt.legend()
plt.show()
'''

In [None]:
#Training / Validation Loss (original data)

trg_loss_dnn_orig = history_dnn_orig.history['loss']
val_loss_dnn_orig = history_dnn_orig.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_dnn_orig, 'b',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_dnn_orig, 'r',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - DNN model (original data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
DNN_Model_orig.evaluate(x_test_orig, y_test_orig)
min(history_dnn_orig.history["val_loss"])

### 5.3. Predictions for the reduced data

In [None]:
DNN_Model_red = DNN_Model(875)

history_dnn_red = DNN_Model_red.fit(x_train_wv, 
                              y_train_wv,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (reduced data)

trg_loss_dnn_red = history_dnn_red.history['loss']
val_loss_dnn_red = history_dnn_red.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_dnn_red, 'k',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_dnn_red, 'y',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - DNN model (reduced data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
DNN_Model_red.evaluate(x_test_wv, y_test_wv)
min(history_dnn_red.history["val_loss"])

### 5.4. Predictions for the PCA data

In [None]:
DNN_Model_pca = DNN_Model(490)

history_dnn_pca = DNN_Model_pca.fit(x_train_pca, 
                              y_train_pca,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (PCA data)

trg_loss_dnn_pca = history_dnn_pca.history['loss']
val_loss_dnn_pca = history_dnn_pca.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_dnn_pca, 'g',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_dnn_pca, 'm',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - DNN model (PCA data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
DNN_Model_pca.evaluate(x_test_pca, y_test_pca)
min(history_dnn_pca.history["val_loss"])

### 5.5. Predictions for the reduced PCA data

In [None]:
DNN_Model_pca_wv = DNN_Model(490)

history_dnn_pca_wv = DNN_Model_pca_wv.fit(x_train_pca_wv, 
                              y_train_pca_wv,
                              validation_split=0.3,
                              epochs=300, 
                              batch_size=2300,
                              verbose=0,
                              callbacks=[TqdmCallback()])

In [None]:
#Training / Validation Loss (reduced PCA data)

trg_loss_dnn_pca_wv = history_dnn_pca_wv.history['loss']
val_loss_dnn_pca_wv = history_dnn_pca_wv.history['val_loss']
epochs = range(1, 301)

fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, trg_loss_dnn_pca_wv, 'c',  linewidth=3, label='Training Loss')
plt.plot(epochs, val_loss_dnn_pca_wv, 'darkgreen',  linewidth=3, label='Validation Loss')
plt.title("Training / Validation Loss - DNN model (reduced PCA data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

In [None]:
DNN_Model_pca_wv.evaluate(x_test_pca_wv, y_test_pca_wv)
min(history_dnn_pca_wv.history["val_loss"])

### 5.6. Comparing the results

In [None]:
#Validation Loss (original, reduced, PCA, and reduced PCA data)   

fig = plt.figure(figsize=(20, 8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, val_loss_dnn_orig, 'r',  linewidth=3, label='Validation Loss (original)')
plt.plot(epochs, val_loss_dnn_red, 'y',  linewidth=3, label='Validation Loss (reduced)')
plt.plot(epochs, val_loss_dnn_pca, 'm',  linewidth=3, label='Validation Loss (PCA)')
plt.plot(epochs, val_loss_dnn_pca_wv, 'darkgreen',  linewidth=3, label='Validation Loss (reduced PCA)')
plt.title("Validation Loss - DNN model (original, reduced, PCA, and reduced PCA data)  ")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

### 5.7. Comparing the models

In [None]:
#Validation Loss - DNN vs LR model (original and PCA data)   

fig = plt.figure(figsize=(20, 8))
ax = fig.add_subplot(1, 2, 1)
plt.plot(epochs, val_loss_orig, 'r',  linewidth=3, label='Validation Loss (LR - original)')
plt.plot(epochs, val_loss_pca, 'y',  linewidth=3, label='Validation Loss (LR - PCA)')
plt.plot(epochs, val_loss_dnn_orig, 'm',  linewidth=3, label='Validation Loss (DNN - original)')
plt.plot(epochs, val_loss_dnn_pca, 'darkgreen',  linewidth=3, label='Validation Loss (DNN - PCA)')
plt.title("Validation Loss - DNN vs LR model (original and PCA data)")
ax.set_ylabel("Loss")
ax.set_xlabel("Epochs")
plt.legend(loc='best')
    
plt.tight_layout()
plt.show()

<div id="submit"></div>

## 6. Submission

In [None]:
sample_submit = pd.read_csv('/kaggle/input/lish-moa/sample_submission.csv')
sample_submit.head()

In [None]:
submit_pred = DNN_Model_orig.predict(test)

In [None]:
sample_submit.iloc[:,1:] = submit_pred
sample_submit.head()
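
Since control perturbations (`ctl_vehicle`) have no MoA, one optional post-processing step is to set the predictions for those rows to zero. This is not part of the pipeline above; the sketch below uses toy arrays and assumes the encoding from `Preprocess` (0 = `trt_cp`, 1 = `ctl_vehicle`).

```python
import numpy as np

# Toy stand-ins: 4 test rows, 3 targets (the real data has 206 targets).
pred = np.array([[0.20, 0.10, 0.70],
                 [0.40, 0.30, 0.10],
                 [0.05, 0.90, 0.20],
                 [0.60, 0.60, 0.60]])
cp_type = np.array([0, 1, 0, 1])  # 1 marks control rows

adjusted = pred.copy()
adjusted[cp_type == 1] = 0.0      # control rows: no MoA is active
print(adjusted)
```

In this notebook the mask would come from the `cp_type` column of `test`, under the assumption that the rows of `test` and `submit_pred` are in the same order.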

In [None]:
sample_submit.to_csv("submission.csv", index=False, header=True) 