# Assignment1: Classification of Sonar dataset With one Hidden Layer

In this assignment, you will implement a neural network with one hidden layer from scratch using numpy operations to classify the UCI sonar dataset to Rock or Mine: https://www.kaggle.com/datasets/shrutimehta/nasa-asteroids-classification.

In [78]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

np.random.seed(10) 

## Load  and Prepare Dataset

Load the dataset into a dataframe and show the first few rows:

In [79]:
sonar_dataframe = pd.read_csv("data/sonar.all-data.csv", header=None)
sonar_dataframe.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


How many rows and columns does this data set have:

In [80]:
print(sonar_dataframe.shape[0], ' rows')
print(sonar_dataframe.shape[1], ' columns')

208  rows
61  columns


Check the columns of the dataframe using info() function:

In [81]:
sonar_dataframe.info

<bound method DataFrame.info of          0       1       2       3       4       5       6       7       8   \
0    0.0200  0.0371  0.0428  0.0207  0.0954  0.0986  0.1539  0.1601  0.3109   
1    0.0453  0.0523  0.0843  0.0689  0.1183  0.2583  0.2156  0.3481  0.3337   
2    0.0262  0.0582  0.1099  0.1083  0.0974  0.2280  0.2431  0.3771  0.5598   
3    0.0100  0.0171  0.0623  0.0205  0.0205  0.0368  0.1098  0.1276  0.0598   
4    0.0762  0.0666  0.0481  0.0394  0.0590  0.0649  0.1209  0.2467  0.3564   
..      ...     ...     ...     ...     ...     ...     ...     ...     ...   
203  0.0187  0.0346  0.0168  0.0177  0.0393  0.1630  0.2028  0.1694  0.2328   
204  0.0323  0.0101  0.0298  0.0564  0.0760  0.0958  0.0990  0.1018  0.1030   
205  0.0522  0.0437  0.0180  0.0292  0.0351  0.1171  0.1257  0.1178  0.1258   
206  0.0303  0.0353  0.0490  0.0608  0.0167  0.1354  0.1465  0.1123  0.1945   
207  0.0260  0.0363  0.0136  0.0272  0.0214  0.0338  0.0655  0.1400  0.1843   

         9   ...   

Convert the target column to 0 and 1:

In [82]:
sonar_dataframe[60]=sonar_dataframe[60].map({'R': 0,'M' :1 })

Convert the sonar_dataframe to numpy array using the values function:


In [83]:
sonar_np_array = np.array(sonar_dataframe.values)

Split the dataset into  80\% train and 20\% validation usig the train_test_split command:

In [84]:
train, test = train_test_split(sonar_np_array, test_size=0.2, train_size=0.8)

split the last column as the label:

In [85]:
X_train = train[:,0:60].astype(float)
Y_train = train[:,60]

In [86]:
X_test = test[:,0:60].astype(float)
Y_test = test[:,60]

## Train a logistic Regression Model
Use sklearn to train a logistic regression model:

In [87]:
logit = LogisticRegression()
logit.fit(X_train, Y_train)
logit.predict(X_test)

array([0., 0., 0., 1., 0., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0.,
       0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1.,
       0., 1., 1., 0., 1., 1., 0., 1.])

What is the accuracy of the model:

In [88]:
score = logit.score(X_test, Y_test)
print(score)

0.7142857142857143


## Building a Neural Network Model
In this section, you will create an NN model with one hidden layer and a sigmoid function for the output layer. Use a tanh functiomn for the hidden layer activation. Use average cross entropy for the loss function.
Fill in the missing code wherever you see \#CODE HERE comment

In [89]:
def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2) 
    
    W1 = np.random.randn(n_h,n_x) * 0.01
    b1 = np.zeros((n_h,1))
    W2 = np.random.randn(n_y,n_h) * 0.01
    b2 = np.zeros((n_y,1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [90]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

In [91]:
def forward_propagation(X, parameters):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    Z1 = np.dot(W1,X.T) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = np.tanh(Z2)
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

In [92]:
def cross_entropy(Y_hat, Y, parameters):
    m = Y.shape[1] # number of example
    logprobs = logprobs = np.multiply(Y ,np.log(Y_hat)) + np.multiply((1-Y), np.log(1-Y_hat))
    cost = (-1/m) * np.sum(logprobs)
    cost = float(np.squeeze(cost)) 
                                    
    return cost

In [93]:
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[0]
   
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    A1 = cache["A1"]
    A2 = cache["A2"]
    Z1 = cache["Z1"]
    Z2 = cache["Z2"]

    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis = 1, keepdims = True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = (1 / m) * np.dot(dZ1, X)
    db1 = (1 / m) * np.sum(dZ1, axis = 1, keepdims = True)
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [94]:

def update_parameters(parameters, grads, learning_rate):
    
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
 
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]


    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

Below is the main function which puts all the previous functions together to train the model

In [95]:
def train_nn_model(X, Y, num_of_hidden_units, learning_rate, num_iterations = 10000, print_cost=False):
    
    input_size=X_train.shape[1] 
    num_of_output_units=1
    parameters = initialize_parameters(input_size, num_of_hidden_units, num_of_output_units)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # gradient descent Loop
    for i in range(0, num_iterations):
        A2, cache = forward_propagation(X, parameters)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 10000 == 0:
            cost = cross_entropy(A2, Y, parameters)
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

Predict is the scoring function to get predictions for new instances using the trained model: 

In [96]:
def predict(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)    
    return predictions


Train the model using the train_nn_model defined above with 5 units in the hidden layer

In [97]:
parameters = train_nn_model(X_train, Y_train, 5, 0.01)


Use the predict function to generate the output for the X_test data. What is the accuracy of the model?

In [98]:
predictions=predict(parameters, X_test)

In [99]:
print ('Accuracy: %d' % float((np.dot(Y_test,predictions.T) + np.dot(1-Y_test,1-predictions.T))/float(Y_test.size)*100) + '%')


Accuracy: 73%


## Tunning the Size of Hidden Layer

Run the following code to see which size for the hodden layer gives you the best performance

In [100]:

plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20, 30,50]
for i, num_hidden_units in enumerate(hidden_layer_sizes):
    parameters = train_nn_model(X_train, np.expand_dims(Y_train,axis=0),num_hidden_units ,0.01, num_iterations = 100000)
    predictions = predict(parameters, X_test)
    accuracy = float((np.dot(Y_test,predictions.T) + np.dot(1-Y_test,1-predictions.T))/float(Y_test.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(num_hidden_units, accuracy))

Accuracy for 1 hidden units: 71.42857142857143 %
Accuracy for 2 hidden units: 69.04761904761905 %
Accuracy for 3 hidden units: 73.80952380952381 %
Accuracy for 4 hidden units: 76.19047619047619 %
Accuracy for 5 hidden units: 66.66666666666666 %
Accuracy for 10 hidden units: 64.28571428571429 %
Accuracy for 20 hidden units: 69.04761904761905 %
Accuracy for 30 hidden units: 71.42857142857143 %
Accuracy for 50 hidden units: 73.80952380952381 %


<Figure size 1600x3200 with 0 Axes>

Which one was the best model?

The best model was back propagation with 4 hidden layers based on the accuracy scores we got.Above 4 hidden layers we start to see the model becoming over fit.