# **Task: Build a classification model on the given data using a deep learning approach**

The given dataset contains details about organic chemical compounds, each labelled as either ‘Musk’ or ‘Non-Musk’. The task is to classify these compounds accordingly. I used an Artificial Neural Network (ANN) built with Keras.<br>
<br>
Each molecule can take several conformations (shapes), and each row of the dataset is the feature vector of one conformation, so many feature vectors belong to the same molecule. This many-to-one relationship between feature vectors and molecules is called the "multiple instance problem". When learning a classifier for this data, the classifier should classify a molecule as "musk" if ANY of its conformations is classified as a musk. A molecule should be classified as "non-musk" if NONE of its conformations is classified as a musk.<br>
A simple neural network trained on the individual conformations works well for classifying such data.
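
The ANY/NONE rule above can be sketched as a small aggregation step applied after a conformation-level classifier has made its predictions. This is only an illustration: the helper `aggregate_by_molecule` and the toy molecule names are hypothetical and are not part of this notebook, which evaluates the model per conformation.

```python
from collections import defaultdict

def aggregate_by_molecule(molecule_names, conformation_preds):
    """Label a molecule 1 (musk) if ANY of its conformations is predicted 1."""
    molecule_label = defaultdict(int)  # default 0 = non-musk
    for name, pred in zip(molecule_names, conformation_preds):
        # max() keeps a 1 once any conformation of the molecule is predicted musk
        molecule_label[name] = max(molecule_label[name], int(pred))
    return dict(molecule_label)

# Hypothetical toy data: three molecules, six conformations
names = ["mol_A", "mol_A", "mol_B", "mol_B", "mol_B", "mol_C"]
preds = [0, 1, 0, 0, 0, 1]
molecule_labels = aggregate_by_molecule(names, preds)
print(molecule_labels)
```

Here `mol_A` becomes musk because one of its two conformations is predicted musk, while `mol_B` stays non-musk because none of its conformations is.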

Importing required libraries

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from sklearn.metrics import f1_score, precision_score, recall_score

In [None]:
dataset = pd.read_csv('../input/credcxodataset/musk_csv.csv') #reading the dataset using pandas
dataset.head() #Displaying the dataset

### **Data preprocessing:**<br>
1. Checking for null values <br>
2. Performing Feature Scaling
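
To make the scaling step concrete, here is a minimal NumPy sketch of what a robust scaler does with its default settings: each feature is centred on its median and divided by its interquartile range (IQR), so a few extreme values have little influence on the scale. The function `robust_scale` and the toy array are illustrative assumptions, not the library implementation.

```python
import numpy as np

def robust_scale(X):
    """Centre each column on its median and divide by its interquartile range."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    iqr[iqr == 0] = 1.0  # avoid division by zero for constant columns
    return (X - median) / iqr

# Hypothetical toy data: the first column contains an outlier (100.0)
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0],
                [100.0, 50.0]])
scaled = robust_scale(toy)
print(scaled)
```

The outlier in the first column stays large after scaling, but it does not distort the median or the IQR used to scale the other rows.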

In [None]:
dataset.isna().sum() #Checking for any null values

In [None]:
X = dataset.iloc[:, 3:-1].values  #Feature columns only: skips the first three columns and the final label column
y = dataset.iloc[:, -1].values  #Class label (last column)

In [None]:
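from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

#Split first so that the scaler never sees the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

scaler = RobustScaler()  #feature scaling based on the median and interquartile range
X_train = scaler.fit_transform(X_train)  #fit the scaler on the training data only
X_test = scaler.transform(X_test)  #apply the training statistics to the test data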

In [None]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

### Building the Deep Learning Model
An Artificial Neural Network (ANN) with two hidden layers is used.

In [None]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(40, input_shape=(166,), activation=tf.nn.relu))  #hidden layer 1, takes the 166 features
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))  #hidden layer 2
model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))  #output: a single sigmoid unit for the two classes
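
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand: a dense layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. The helper `dense_params` below is a hypothetical name used only for this sketch; the layer sizes are those of the model above.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# 166 input features -> 40 -> 10 -> 1, as in the model above
layer_sizes = [166, 40, 10, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

The total should match the "Total params" figure that `model.summary()` reports for a network of this shape.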

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, validation_split = 0.2, epochs = 40) #training the model

## Post Processing

### Plotting the graphs

In [None]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='lower right')
plt.show()

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()

In [None]:
val_score = model.evaluate(X_test, y_test, verbose=0) #Evaluating on the held-out test set; returns [loss, accuracy]

In [None]:
model.save("weights.h5") #saving the full model (architecture and weights) in HDF5 format

## Final performance measures 

In [None]:
#Threshold the sigmoid output at 0.5 to get class labels (predict_classes is not available in recent TensorFlow versions)
y_pred = (model.predict(X_test) > 0.5).astype("int32").ravel()
print("Test Accuracy:",val_score[1])
print("Test Loss:",val_score[0])
print("f1_score:",f1_score(y_test,y_pred))
print("recall:",recall_score(y_test,y_pred))
print("precision_score:",precision_score(y_test,y_pred))
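
For reference, the precision, recall and F1 values printed above all derive from counts of true positives, false positives and false negatives once the predicted probabilities are turned into class labels. The following is a minimal pure-Python sketch of that calculation, assuming a 0.5 decision threshold; `binary_metrics` and the toy labels and probabilities are made up for illustration.

```python
def binary_metrics(y_true, probs, threshold=0.5):
    """Precision, recall and F1 for the positive class at a given threshold."""
    y_pred = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical toy labels and predicted probabilities
y_true_toy = [1, 1, 1, 0, 0, 0]
probs_toy = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
precision, recall, f1 = binary_metrics(y_true_toy, probs_toy)
print(precision, recall, f1)
```

In this toy case two of the three positives are found and one negative is misclassified as positive, so precision and recall are both 2/3.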