### Table of Contents <a id='toc'></a>

- [Introduction](#introduction)
- [Load dataset](#load_dataset)
    - [Morgan FP](#morgan-fp)
    - [Divide the data](#divide_the_data)
    - [One-hot encoding the Y label](#hot_encoding_the_y_label)
- [Define the model architecture](#define-the-model-architecture)


#### Introduction <a id='introduction'></a>

Train a Deep Neural Network (DNN) for predicting the hERG liability of a molecule. 

In this tutorial we will develop a Deep Neural Network model for predicting the hERG activity of chemical compounds. We will use Keras with the TensorFlow backend to train the DNN model.

Keras is an open-source software library that provides a Python interface for artificial neural networks. We will run the Python code in a Jupyter notebook.

##### Background

hERG (the human Ether-à-go-go-Related Gene) is a gene that codes for the protein Kv11.1, the alpha subunit of a potassium ion channel. This ion channel (sometimes simply denoted as 'hERG') is best known for its contribution to the electrical activity of the heart. When the channel's ability to conduct electrical current across the cell membrane is blocked, either by drugs or by rare mutations, the result can be long QT syndrome, which can lead to potentially life-threatening arrhythmia.

In [1]:
### imports     ##
###=============##
#! pip install pandas tensorflow 
#! pip install -U scikit-learn
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline



### Load dataset <a id='load_dataset'></a> 
[TOP](#toc)

Load the dataset and calculate ECFP/Morgan fingerprints of the compounds. Note that because the full dataset is large, the following code example uses a small subset of the data.


In [2]:
## data file
file_name = 'data/herg_MLSMR_automated_patch_clamp_small.csv'

## read as dataframe
df_data = pd.read_csv(file_name)

print(f'Read {file_name} file which has shape {df_data.shape}.\n'
      f'Name of the columns are {list(df_data.columns)}')

Read data/herg_MLSMR_automated_patch_clamp_small.csv file which has shape (30030, 5).
Name of the columns are ['PUBCHEM_SID', 'hERG inhibition (%) at 1uM', 'hERG inhibition (%) at 10uM', 'SMILES_unique_largest_fragment', 'SMILES_Original']


#### Morgan FP <a id='morgan-fp'></a>
[TOP](#toc)

In [3]:
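from utils.utils import get_morgan  # tutorial helper module (not shown here); appears to rely on RDKit

### Calculate Morgan fingerprint
df_morgan = get_morgan(df_data, smiles='SMILES_unique_largest_fragment')

Note: `get_morgan` appears to rely on RDKit. The recorded run of this cell failed with `ModuleNotFoundError: No module named 'rdkit'`; installing RDKit (for example with `pip install rdkit`) should resolve it.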

#### Divide the data <a id='divide_the_data'></a>
[TOP](#toc)

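Divide the dataset into training and test sets. This step assumes that the feature matrix `X` (built from the Morgan fingerprints) and the label vector `Y` have already been created; that step is not shown here.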

In [None]:
# train and test division
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.02, random_state=42)

m = X_train.shape[0]  # training set size

print('The shape of X_train is: ' + str(X_train.shape))
print('The shape of Y_train is: ' + str(Y_train.shape))
print('The shape of X_test is: ' + str(X_test.shape))
print('The shape of Y_test is: ' + str(Y_test.shape))
print('I have m = %d training examples!' % m)
print('The full dataset has %d examples.' % X.shape[0])
print("\n Y", Y)
print("\n X", X)

#### One-hot encoding the Y label <a id='hot_encoding_the_y_label'></a>
[TOP](#toc)



In [8]:
# encode class values as integers (fit on the training labels only)
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(Y_train)
n_classes = len(encoder.classes_)
# convert integers to dummy variables (i.e. one-hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y, num_classes=n_classes)

# encode the test labels with the same fitted encoder so the class indices match
encoded_Y_test = encoder.transform(Y_test)
# convert integers to dummy variables (i.e. one-hot encoded)
dummy_y_test = np_utils.to_categorical(encoded_Y_test, num_classes=n_classes)

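Fitting the encoder on the training labels and reusing it for the test labels guarantees that both sets use the same column order. The dependency-free sketch below mimics what `LabelEncoder` (classes in sorted order) followed by `to_categorical` produces; the helper names `fit_label_encoder` and `one_hot` and the labels `'A'`, `'B'`, `'C'` are hypothetical, not the dataset's actual classes.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer index in sorted order, as LabelEncoder does."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def one_hot(labels, mapping):
    """Turn labels into one-hot rows using a mapping fitted on the training labels."""
    n_classes = len(mapping)
    rows = []
    for label in labels:
        row = [0.0] * n_classes
        row[mapping[label]] = 1.0  # raises KeyError for a label never seen in training
        rows.append(row)
    return rows


# Hypothetical labels, for illustration only.
train_labels = ['C', 'A', 'B', 'A']
test_labels = ['B', 'C']

mapping = fit_label_encoder(train_labels)  # {'A': 0, 'B': 1, 'C': 2}
print(one_hot(test_labels, mapping))       # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Fitting a second encoder on the test labels alone could assign different indices whenever a class is missing from the test set, which is why the test labels reuse the training mapping.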

#### Define the model architecture <a id='define-the-model-architecture'></a>

[TOP](#toc)

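Fully connected (dense) layers with ReLU activations and a softmax output over three classes:

Input [1024] --> Hidden [2000] --> Hidden [4000] --> Hidden [1000] --> Hidden [500] --> Output [3]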

In [None]:
model = Sequential()
model.add(Dense(2000, input_dim=1024, activation='relu'))
model.add(Dense(4000, activation='relu'))
model.add(Dense(1000, activation='relu'))
model.add(Dense(500, activation='relu'))
model.add(Dense(3, activation='softmax'))
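To get a sense of the model's size, the trainable parameters can be counted by hand; for this stack of `Dense` layers, `model.summary()` should report the same total. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The layer widths below are copied from the `Sequential` model above.

```python
# Layer widths from the Sequential model: 1024 inputs -> 2000 -> 4000 -> 1000 -> 500 -> 3 outputs.
widths = [1024, 2000, 4000, 1000, 500, 3]

# Each Dense layer contributes n_in * n_out weights plus n_out biases.
per_layer = [n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:])]

for n_in, n_out, n_params in zip(widths[:-1], widths[1:], per_layer):
    print(f'Dense {n_in:>4} -> {n_out:>4}: {n_params:,} parameters')
print(f'Total: {sum(per_layer):,}')  # Total: 14,557,003
```

More than half of the roughly 14.6 million parameters sit in the 2000 → 4000 layer alone.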
