## Neural Network Implementation in Python


In [None]:
import numpy as np
feature_set = np.array([[0,1,0],[0,0,1],[1,0,0],[1,1,0],[1,1,1]])
labels = np.array([[1,0,0,1,1]])
labels = labels.reshape(5,1)

Here we create our feature set, which contains five records with three features each. We also create a labels set that holds the corresponding label for each record in the feature set. The labels are the answers we are trying to predict with the neural network. They are reshaped into a 5x1 column so that there is one label per row, matching the rows of the feature set.

The next step is to define the hyperparameters for our neural network. Before doing so, we inspect the feature set and the reshaped labels:

In [None]:
feature_set

array([[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0],
       [1, 1, 0],
       [1, 1, 1]])

In [None]:
labels

array([[1],
       [0],
       [0],
       [1],
       [1]])

In [None]:
np.random.seed(0)
weights = np.random.rand(3,1)
bias = np.random.rand(1)
lr = 0.05

In the script above we used the `np.random.seed` function so that we get the same random values whenever the script is executed.

Next, we initialize our weights with random numbers drawn uniformly from the interval [0, 1), which is what `np.random.rand` produces. Since we have three features in the input, we have a vector of three weights. We then initialize the bias value with another random number. Finally, we set the learning rate to 0.05.
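As a quick check of the seeding behavior, the sketch below reseeds the generator and draws the weights twice. The names `first_draw` and `second_draw` are used only for this illustration. Note that `np.random.rand` samples from a uniform distribution on [0, 1).

```python
import numpy as np

# Seeding twice with the same value should reproduce the same draw.
np.random.seed(0)
first_draw = np.random.rand(3, 1)

np.random.seed(0)
second_draw = np.random.rand(3, 1)

same_values = np.array_equal(first_draw, second_draw)

# np.random.rand samples uniformly from the half-open interval [0, 1).
in_unit_interval = bool(np.all((first_draw >= 0) & (first_draw < 1)))

print(same_values, in_unit_interval)
```

If the seed were omitted, the two draws would almost certainly differ, and the results later in the notebook would change from run to run.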

Next, we need to define our activation function and its derivative (I'll explain in a moment why we need to find the derivative of the activation). Our activation function is the sigmoid function, which we covered earlier.

In [None]:
weights

array([[0.5488135 ],
       [0.71518937],
       [0.60276338]])

In [None]:
bias

array([0.54488318])

##### Activation Function



In [None]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

##### Derivative of the Sigmoid Function

In [None]:
def sigmoid_der(x):
    return sigmoid(x)*(1-sigmoid(x))
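One way to gain confidence in `sigmoid_der` is to compare it with a numerical slope. The sketch below does this with a central difference. The test points and the step size are arbitrary values chosen for the check, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_der(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Hypothetical test points and step size, chosen only for this check.
points = np.array([-2.0, 0.0, 0.5, 3.0])
step = 1e-5

# Central difference approximation of the slope of sigmoid at each point.
numeric = (sigmoid(points + step) - sigmoid(points - step)) / (2 * step)
analytic = sigmoid_der(points)

# Largest disagreement between the numerical and the analytic slope.
max_gap = float(np.max(np.abs(numeric - analytic)))
print(max_gap)
```

The gap should be tiny. Note that `sigmoid_der` expects the raw input `x`, not a value that has already been passed through `sigmoid`.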

In [None]:
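for epoch in range(50000):
    inputs = feature_set

    # Feedforward step 1: weighted sum of the inputs plus the bias
    XW = np.dot(feature_set, weights) + bias

    # Feedforward step 2: squash the weighted sum into (0, 1) with the sigmoid
    z = sigmoid(XW)

    # Backpropagation step 1: signed error between predictions and labels
    error = z - labels

    # Sum of the signed errors, printed once per epoch
    print(error.sum())

    # Backpropagation step 2: chain rule terms
    dcost_dpred = error
    # Note: sigmoid_der is applied to the activated output z here. The exact
    # derivative of the sigmoid at the weighted sum is sigmoid_der(XW),
    # which equals z * (1 - z).
    dpred_dz = sigmoid_der(z)

    z_delta = dcost_dpred * dpred_dz

    # Gradient for the weights, summed over all records via the transpose
    inputs = feature_set.T
    weights -= lr * np.dot(inputs, z_delta)

    # Update the bias with each record's delta in turn
    for num in z_delta:
        bias -= lr * num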

In the first step, we define the number of epochs. An epoch is one full pass of the training algorithm over our data, so the number of epochs is the number of times we train on the data set. We train for 50,000 epochs. I have tested this number and found that the error is pretty much minimized after 50,000 iterations, but you can try a different number. The ultimate goal is to minimize the error.
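To experiment with the number of epochs without printing on every iteration, a loop along the following lines could record the error at intervals. This is only a sketch under a few assumptions: `epochs`, `log_every`, and `history` are hypothetical names, the epoch count is deliberately smaller than 50,000, the mean squared error is used as the reported metric, and the sigmoid derivative is written exactly as `pred * (1 - pred)` rather than as the `sigmoid_der(z)` call in the training cell.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
labels = np.array([[1, 0, 0, 1, 1]]).reshape(5, 1)

np.random.seed(0)
weights = np.random.rand(3, 1)
bias = np.random.rand(1)
lr = 0.05

epochs = 5000      # hypothetical, smaller than the 50,000 used in the text
log_every = 1000   # hypothetical logging interval
history = []       # (epoch, mean squared error) pairs

for epoch in range(epochs):
    pred = sigmoid(np.dot(feature_set, weights) + bias)
    error = pred - labels

    # Record the error only every log_every epochs.
    if epoch % log_every == 0:
        history.append((epoch, float(np.mean(error ** 2))))

    # Exact sigmoid derivative written in terms of the activated output.
    delta = error * pred * (1 - pred)
    weights -= lr * np.dot(feature_set.T, delta)
    bias -= lr * delta.sum()

# Error after the last update.
final_pred = sigmoid(np.dot(feature_set, weights) + bias)
history.append((epochs, float(np.mean((final_pred - labels) ** 2))))

for logged_epoch, mse in history:
    print(logged_epoch, mse)
```

Comparing the first and last entries of `history` gives a rough sense of whether more epochs are still paying off.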

Next we store the values from `feature_set` in the `inputs` variable.

Then we find the dot product of the inputs and the weight vector and add the bias to it. This is Step 1 of the feedforward section.

Then we pass the result through the sigmoid activation function, as explained in Step 2 of the feedforward section. This completes the feedforward part of our algorithm.
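The two feedforward steps can be traced by hand for a single record. The sketch below uses the initial weights and bias printed earlier in the notebook and the first row of the feature set. The names `record`, `weighted_sum`, and `prediction` are introduced only for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Initial weights and bias as printed earlier in the notebook.
weights = np.array([[0.5488135], [0.71518937], [0.60276338]])
bias = np.array([0.54488318])

record = np.array([0, 1, 0])  # first row of feature_set

# Step 1: weighted sum plus bias. Only the second weight contributes here.
weighted_sum = np.dot(record, weights) + bias

# Step 2: the sigmoid squashes the weighted sum into (0, 1).
prediction = sigmoid(weighted_sum)

print(weighted_sum, prediction)
```

For this record the weighted sum is 0.71518937 + 0.54488318 = 1.26007255, and the sigmoid maps it to roughly 0.78, which is the untrained network's prediction for a record whose label is 1.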

Now it is time to start backpropagation. The variable z contains the predicted outputs. The first step of backpropagation is to find the error, which the code computes as the difference between the predictions and the labels (`z - labels`).

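We need to differentiate the cost function with respect to each weight, and we use the chain rule of differentiation for this purpose. Let's suppose "d_cost" is the derivative of our cost function with respect to a weight "w". The chain rule breaks this derivative into three factors:

$$\frac{\partial\,\text{cost}}{\partial w} = \frac{\partial\,\text{cost}}{\partial\,\text{pred}} \cdot \frac{\partial\,\text{pred}}{\partial z} \cdot \frac{\partial z}{\partial w}$$

The first factor is the derivative of the cost with respect to the prediction. In the code it is simply the error (`dcost_dpred = error`), which corresponds to a squared-error cost up to a constant factor.

Here "d_pred" is simply the sigmoid function, and we differentiate it with respect to the input dot product "z", which gives sigmoid(z)(1 - sigmoid(z)). Note that in the code the dot product is stored in `XW`, while the variable `z` holds the sigmoid output.

The dot product is a weighted sum of the inputs plus the bias. Therefore, its derivative with respect to any weight is simply the corresponding input. Hence, our final derivative of the cost function with respect to any weight is the product of the corresponding input, `dcost_dpred`, and `dpred_dz`.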
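The chain rule result can be checked numerically. The sketch below assumes the cost is half the sum of squared errors (an assumption made here so that its derivative with respect to the prediction is exactly the error) and compares the analytic gradient with a central-difference estimate. The `cost` helper and the step size are hypothetical and not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
labels = np.array([[1, 0, 0, 1, 1]], dtype=float).reshape(5, 1)

def cost(weights, bias):
    # Assumed cost: half the sum of squared errors.
    pred = sigmoid(np.dot(feature_set, weights) + bias)
    return 0.5 * np.sum((pred - labels) ** 2)

np.random.seed(0)
weights = np.random.rand(3, 1)
bias = np.random.rand(1)

# Chain rule: d_cost/d_pred * d_pred/dz * dz/dw, summed over the records.
pred = sigmoid(np.dot(feature_set, weights) + bias)
dcost_dpred = pred - labels
dpred_dz = pred * (1 - pred)
analytic_grad = np.dot(feature_set.T, dcost_dpred * dpred_dz)

# Central differences, nudging one weight at a time.
step = 1e-6
numeric_grad = np.zeros_like(weights)
for i in range(weights.shape[0]):
    shift = np.zeros_like(weights)
    shift[i, 0] = step
    numeric_grad[i, 0] = (cost(weights + shift, bias) - cost(weights - shift, bias)) / (2 * step)

max_gap = float(np.max(np.abs(analytic_grad - numeric_grad)))
print(max_gap)
```

A very small `max_gap` indicates that the product of the three chain rule factors matches the actual slope of the assumed cost.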

Here we have the z_delta variable, which contains the product of dcost_dpred and dpred_dz. Instead of looping through each record and multiplying the input with the corresponding z_delta, we take the transpose of the input feature matrix and multiply it with z_delta. Finally, we multiply the derivative by the learning rate lr, which controls the size of each update step, and subtract the result from the weights.
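To see why the transpose trick is equivalent to the per-record loop, the sketch below computes the weight gradient both ways. The `z_delta` values are made up for illustration and do not come from a real training step.

```python
import numpy as np

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])

# Hypothetical per-record deltas, one row per record, for illustration only.
z_delta = np.array([[0.1], [-0.2], [0.3], [-0.4], [0.5]])

# Explicit loop: scale each record by its delta and accumulate.
looped = np.zeros((3, 1))
for record, delta in zip(feature_set, z_delta):
    looped += record.reshape(3, 1) * delta

# Vectorized form used in the training loop.
vectorized = np.dot(feature_set.T, z_delta)

print(np.allclose(looped, vectorized))
```

Each entry of the result sums the deltas of the records in which that feature is 1, so both forms give 0.4, 0.2, and 0.3 for the three weights here.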

We then loop through each value in z_delta and update the bias in the same way.
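The bias loop subtracts `lr` times each delta in turn, which amounts to subtracting `lr` times the sum of the deltas. The sketch below compares the two forms using made-up delta values and a hypothetical starting bias of 0.5.

```python
import numpy as np

lr = 0.05
z_delta = np.array([[0.1], [-0.2], [0.3], [-0.4], [0.5]])  # hypothetical deltas

# Loop form, as in the training cell.
bias_loop = np.array([0.5])
for num in z_delta:
    bias_loop -= lr * num

# Equivalent single update using the sum of the deltas.
bias_sum = np.array([0.5])
bias_sum -= lr * z_delta.sum()

print(bias_loop, bias_sum)
```

Both forms end at 0.485 here (up to floating-point rounding), since the deltas sum to 0.3 and 0.5 - 0.05 * 0.3 = 0.485.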

You can see that the error is extremely small at the end of training. At this point our weights and bias have values that can be used to predict whether a person is diabetic or not, based on their smoking habits, obesity, and exercise habits.

You can now try to predict the value of a single instance. Suppose a patient comes in who smokes, is not obese, and doesn't exercise. Let's find out whether this patient is likely to be diabetic. The input feature vector looks like this: [1,0,0].

In [None]:
single_point = np.array([1,0,0])
result = sigmoid(np.dot(single_point, weights) + bias)
print(result)

[0.00281444]


In [None]:
single_point = np.array([0,1,0])
result = sigmoid(np.dot(single_point, weights) + bias)
print(result)

[0.99938023]


In [None]:
weights

array([[-0.51534317],
       [12.74036917],
       [-0.86631273]])

In [None]:
bias

array([-5.35483198])
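With the trained parameters in hand, the same computation can be applied to every record at once. The sketch below copies the weights and bias from the output cells above and wraps the prediction in a hypothetical `predict` helper. The 0.5 threshold is an assumption made for this example, not something the notebook specifies.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Trained parameters copied from the output cells above.
weights = np.array([[-0.51534317], [12.74036917], [-0.86631273]])
bias = np.array([-5.35483198])

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
labels = np.array([[1, 0, 0, 1, 1]]).reshape(5, 1)

def predict(features, threshold=0.5):
    # Hypothetical helper: probability from the network, then a hard 0/1 label.
    probabilities = sigmoid(np.dot(features, weights) + bias)
    return probabilities, (probabilities >= threshold).astype(int)

probabilities, predicted = predict(feature_set)
print(predicted.ravel(), labels.ravel())
```

The thresholded predictions match the labels for all five training records. The large weight on the second feature is consistent with the data, where the label equals the second feature in every record.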