# Assignment 4: Neural Networks

In this assignment, you will design an artificial neural network for a binary classification task. We will use Keras to implement the network. You can also use numpy, pandas and/or scikit-learn wherever you find them useful. You'll also need matplotlib for the plots at the end.

#### Import the required libraries here

In [51]:
import numpy as np
import matplotlib.pyplot as plt  # needed for the histograms at the end
import tensorflow as tf
from tensorflow import keras

Load the dataset from the file named 'data.csv'. It contains 7200 datapoints. The first 6 columns are the features (X), while the last column holds a binary label (Y) for each feature vector. After loading the dataset, divide it into a training set and a test set (strictly speaking, a cross-validation set). You can put 70% of the datapoints in the training set and 30% in the test set. An 80-20 split is also acceptable.

Normalize your training set using its per-feature mean and variance: subtract the mean and divide by the standard deviation (the square root of the variance). Apply the same transformation to the test set.

In [23]:
data = np.loadtxt("data.csv", delimiter=",")
x = data[:, :6]   # first 6 columns: features
y = data[:, 6]    # last column: binary label

# 70-30 split (0.7 * 7200 = 5040 training samples)
n_train = 5040
x_train, x_test = x[:n_train], x[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Normalize with statistics computed on the training set only,
# then apply the same transformation to the test set
x_mean = x_train.mean(axis=0)
x_std = x_train.std(axis=0)   # standard deviation = sqrt(variance)
x_train = (x_train - x_mean) / x_std
x_test = (x_test - x_mean) / x_std

print(np.where(y_test == 1))

(array([  33,   41,   44,   72,   76,   84,   93,  113,  114,  123,  131,
        135,  143,  144,  149,  171,  187,  193,  210,  217,  219,  223,
        235,  240,  267,  279,  280,  281,  310,  315,  336,  349,  353,
        366,  371,  392,  408,  417,  422,  431,  441,  467,  468,  478,
        489,  490,  492,  502,  517,  525,  544,  564,  567,  571,  573,
        578,  583,  607,  627,  645,  651,  653,  667,  679,  739,  741,
        764,  813,  816,  821,  834,  836,  868,  872,  885,  890,  900,
        909,  910,  916,  918,  929,  931,  938,  981,  984,  990,  998,
       1000, 1011, 1050, 1107, 1118, 1129, 1157, 1168, 1190, 1192, 1216,
       1229, 1243, 1249, 1251, 1253, 1254, 1255, 1259, 1262, 1265, 1287,
       1316, 1333, 1341, 1361, 1390, 1400, 1416, 1429, 1437, 1441, 1472,
       1497, 1509, 1533, 1542, 1551, 1586, 1587, 1605, 1614, 1623, 1628,
       1632, 1661, 1663, 1673, 1677, 1698, 1701, 1728, 1764, 1777, 1800,
       1836, 1842, 1850, 1869, 1871, 1906, 1932, 1

Find the number of positive and negative samples in the training set, the test set, and the whole dataset. You'll use these counts later when evaluating your neural network model.

In [3]:
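# Count the labels (y), not the normalized feature values (x)
train_pos = np.count_nonzero(y_train == 1)
train_neg = np.count_nonzero(y_train == 0)
test_pos = np.count_nonzero(y_test == 1)
test_neg = np.count_nonzero(y_test == 0)
total_pos = train_pos + test_pos
total_neg = train_neg + test_neg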

#### Define the architecture of the neural network

Initialise a sequential neural network model using keras.models.Sequential(), and add dense layers (a dense layer is a fully-connected layer) to it using keras.layers.Dense() (you can easily find out how to do this online).

Use the ReLU activation function in every layer except the last one, where you'll use the sigmoid activation function, since this is a binary classification task.

The choice of the number of layers and the number of units in a layer is totally up to you.
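For reference, the two activation functions named above can be written in a few lines of numpy. This is only an illustrative sketch (the helper names `relu` and `sigmoid` are chosen here); Keras provides its own implementations.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: keeps positive values, maps the rest to 0."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # the middle value, sigmoid(0), is exactly 0.5
```

Because the sigmoid output lies between 0 and 1, it can be read as the predicted probability of the positive class, which is why it is used in the last layer.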

In [16]:
model = keras.models.Sequential()
# Two small hidden layers with ReLU, then a single sigmoid output unit
model.add(keras.layers.Dense(3, activation="relu"))
model.add(keras.layers.Dense(3, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))

#### Training the neural network

1. Compile your model using model.compile(). Use the appropriate loss function (binary cross-entropy) and the Adam optimizer. Pass 'accuracy' as a metric so that the accuracy on your training set is reported during Adam optimization (an adaptive variant of mini-batch gradient descent).

Look up and learn a bit about what stochastic gradient descent and mini-batch gradient descent essentially are. You'll use mini-batches while training your model.

2. Train your model using model.fit() (this will take a while, perhaps a few minutes). Use an appropriate number of epochs and batch size (you have to decide which values work best). Don't forget that you have to train your model on the training set, not the test set.
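The mini-batch idea mentioned in step 1 can be sketched in plain numpy. The example below is an assumption-laden illustration, not what Keras does internally: it fits a single logistic unit with ordinary gradient steps (not Adam), and the function name `minibatch_gd`, the learning rate and the toy data are all made up for this sketch.

```python
import numpy as np

def minibatch_gd(X, y, batch_size=10, epochs=5, lr=0.1, seed=0):
    """Fit logistic-regression weights with plain mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)            # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]     # indices of one mini-batch
            Xb, yb = X[idx], y[idx]
            p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))   # sigmoid predictions
            grad_w = Xb.T @ (p - yb) / len(idx)       # gradient of the mean BCE loss
            grad_b = np.mean(p - yb)
            w -= lr * grad_w                          # one update per mini-batch
            b -= lr * grad_b
    return w, b

# Toy, made-up data: the label is 1 when the single feature is positive.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 1))
y_toy = (X_toy[:, 0] > 0).astype(float)
w, b = minibatch_gd(X_toy, y_toy)
```

Setting `batch_size=1` turns this into stochastic gradient descent, while `batch_size=len(X)` gives full-batch gradient descent; mini-batches sit in between.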

In [19]:
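model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Arguments named explicitly: batch_size=10, epochs=5, verbose=1
model.fit(x_train, y_train, batch_size=10, epochs=5, verbose=1)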

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fa85c3ac5e0>

#### Evaluation of the model on the test set

Find the accuracy of your trained model on the test set. Remember that the model was trained on normalized inputs, so the test set must be normalized in the same way before prediction.

In [53]:
# Arguments named explicitly: batch_size=10, verbose=1
prediction = model.predict(x_test, batch_size=10, verbose=1)
# Threshold the sigmoid outputs at 0.5 to get hard 0/1 labels
predicted_labels = (prediction.ravel() > 0.5).astype(int)
accuracy = np.mean(predicted_labels == y_test)
print(accuracy)

0.9467592592592593


Is accuracy really the best metric to evaluate our model on the test set here? What was the ratio of positive to negative samples in the dataset? What would the accuracy have been for a model that outputs '0' for every input? By the way, this dataset is for detecting the presence of thyroid disease in a patient.
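To make the "always output 0" question concrete, here is a minimal sketch of how such a baseline could be computed. The helper name `all_zero_baseline_accuracy` and the toy labels are hypothetical and are not taken from data.csv.

```python
import numpy as np

def all_zero_baseline_accuracy(labels):
    """Accuracy of a model that predicts 0 for every input."""
    labels = np.asarray(labels)
    return np.mean(labels == 0)

# Hypothetical labels, not taken from data.csv: 19 negatives and 1 positive.
toy_labels = np.array([0] * 19 + [1])
print(all_zero_baseline_accuracy(toy_labels))  # 0.95
```

On the real data you would pass `y_test` instead of the toy labels. If the result is close to your model's accuracy, accuracy alone says little about how well the positive class is detected.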

Now, read up on the metrics recall, precision and F1-score. Use these metrics to evaluate your model on the test set.
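As a starting point, the three metrics can be computed directly from the counts of true positives, false positives and false negatives. This is a sketch with a hypothetical helper name (`precision_recall_f1`) and made-up labels; scikit-learn's `precision_score`, `recall_score` and `f1_score` in `sklearn.metrics` do the same job.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary 0/1 labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)     # predicted 1, actually 1
    fp = np.sum(~y_true & y_pred)    # predicted 1, actually 0
    fn = np.sum(y_true & ~y_pred)    # predicted 0, actually 1
    # Convention used here: return 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Made-up example: 3 positives, of which 2 are found, plus 1 false alarm.
y_true_toy = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true_toy, y_pred_toy)
```

Note that a model which always outputs 0 gets a recall of 0 under this definition, however high its accuracy is.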

Now, tune the hyperparameters of your model (like number of layers, number of units in different layers, etc.) to try and do better and better on the test set.


#### Plotting histograms based on prediction values

Now, you will plot 2 histograms of the prediction values (0 to 1, or 0% to 100%) that your model outputs on the training data: one histogram for positive samples and another for negative samples. Plot both histograms in the same figure.

Here's one such plot I made with my nn model:
<img src="files/index.jpeg">

The blue histogram is for negative samples and the red one is for positive samples (add a legend to the plots, unlike me).

Use a log scale on the y-axis (number of occurrences in the given predicted-value range), as I have done.

Now, what can you infer from the histograms you got? What should an ideal pair of histograms look like?
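One possible way to produce such a figure with matplotlib is sketched below. The function name `plot_prediction_histograms`, the bin count and the synthetic predictions are assumptions made for illustration; they are not the code behind the plot shown above.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_prediction_histograms(probs, labels, bins=20):
    """Overlay histograms of predicted values for negative and positive samples."""
    probs = np.asarray(probs).ravel()      # flattens an (n, 1) Keras output
    labels = np.asarray(labels).ravel()
    edges = np.linspace(0.0, 1.0, bins + 1)  # same bin edges for both classes
    fig, ax = plt.subplots()
    ax.hist(probs[labels == 0], bins=edges, color="blue", alpha=0.6,
            label="negative samples")
    ax.hist(probs[labels == 1], bins=edges, color="red", alpha=0.6,
            label="positive samples")
    ax.set_yscale("log")                   # log scale on the y-axis
    ax.set_xlabel("predicted value")
    ax.set_ylabel("number of occurrences")
    ax.legend()
    return fig, ax

# Synthetic stand-in for model outputs: about 10% positives, noisy scores.
rng = np.random.default_rng(0)
toy_labels = (rng.random(1000) < 0.1).astype(int)
toy_probs = np.clip(0.8 * toy_labels + rng.normal(0.1, 0.1, size=1000), 0.0, 1.0)
fig, ax = plot_prediction_histograms(toy_probs, toy_labels)
```

With the trained model from this notebook, the call would look like `plot_prediction_histograms(model.predict(x_train), y_train)`.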