# Implementing gradient descent

NumPy provides a function that calculates the dot product of two arrays, which conveniently computes h, the input to the output unit, for us. The dot product multiplies two arrays element-wise (the first element of array 1 by the first element of array 2, and so on) and then sums the products.

    # Input to the output layer
    output_in = np.dot(weights, inputs)
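
As a quick check of that description, here is a minimal sketch comparing a hand-written sum of element-wise products with `np.dot`. The values in `weights` and `inputs` are made up for illustration and do not come from the lesson's data.

```python
import numpy as np

# Hypothetical example values, not taken from the lesson's data set
weights = np.array([0.5, -1.0, 2.0])
inputs = np.array([1.0, 2.0, 3.0])

# Multiply matching elements, then add up the products ...
manual = sum(w * x for w, x in zip(weights, inputs))

# ... which is what np.dot computes for two 1-D arrays
output_in = np.dot(weights, inputs)

print(manual, output_in)
```

Both values should agree, since for 1-D arrays `np.dot` is the sum of the element-wise products.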

Here's the general algorithm for updating the weights with gradient descent:

- Set the weight step to zero: $\Delta w_i = 0$
- For each record in the training data:
  - Make a forward pass through the network, calculating the output $\hat{y} = f(\sum_i w_i x_i)$
  - Calculate the error term for the output unit, $\delta = (y - \hat{y}) \cdot f'(\sum_i w_i x_i)$
  - Update the weight step $\Delta w_i = \Delta w_i + \delta x_i$
- Update the weights $w_i = w_i + \eta \Delta w_i / m$, where $\eta$ is the learning rate and $m$ is the number of records. Here we're averaging the weight steps to help reduce any large variations in the training data.
- Repeat for $e$ epochs.
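
The steps above can be sketched on a tiny made-up data set. This is only an illustration under stated assumptions: the helper names `train_batch` and `mse`, the toy arrays `X` and `y`, and the `seed` argument are hypothetical and not part of the lesson's code.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def train_batch(features, targets, epochs=100, learnrate=0.5, seed=0):
    """Batch gradient descent for a single sigmoid output unit (sketch)."""
    rng = np.random.default_rng(seed)
    m, n_features = features.shape
    weights = rng.normal(scale=1 / n_features**0.5, size=n_features)

    for _ in range(epochs):
        del_w = np.zeros(n_features)            # set the weight step to zero
        for x, y in zip(features, targets):     # for each record
            output = sigmoid(np.dot(x, weights))            # forward pass
            delta = (y - output) * output * (1 - output)    # error term
            del_w += delta * x                              # update the weight step
        weights += learnrate * del_w / m        # averaged weight update
    return weights

def mse(features, targets, weights):
    return np.mean((sigmoid(features @ weights) - targets) ** 2)

# Made-up toy data: the target is 1 when the first feature exceeds the second
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w0 = train_batch(X, y, epochs=0)     # initial weights only
w = train_batch(X, y, epochs=200)    # same initial weights, then 200 epochs
print(mse(X, y, w0), mse(X, y, w))
```

Calling `train_batch` with `epochs=0` returns the untrained weights for the same seed, which makes it easy to compare the loss before and after training.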

You can also update the weights on each record instead of averaging the weight steps after going through all the records.
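
A minimal sketch of that per-record variant might look like the following. The function name `train_per_record`, the toy data, and the zero starting weights are assumptions made for this example.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def train_per_record(features, targets, weights, epochs=100, learnrate=0.5):
    """Update the weights after every record instead of once per epoch (sketch)."""
    weights = weights.copy()
    for _ in range(epochs):
        for x, y in zip(features, targets):
            output = sigmoid(np.dot(x, weights))
            delta = (y - output) * output * (1 - output)
            weights += learnrate * delta * x    # immediate update, no averaging
    return weights

def mse(features, targets, weights):
    return np.mean((sigmoid(features @ weights) - targets) ** 2)

# Made-up toy data: the target is 1 when the first feature exceeds the second
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w_start = np.zeros(2)
loss_before = mse(X, y, w_start)
loss_after = mse(X, y, train_per_record(X, y, w_start))
print(loss_before, loss_after)
```

Because each record changes the weights right away, the result can depend on the order of the records, which is not the case for the averaged update.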

Remember that we're using the sigmoid for the activation function, $f(h) = 1/(1+e^{-h})$

And the gradient of the sigmoid is $f'(h) = f(h)(1 - f(h))$

where $h$ is the input to the output unit,

$h = \sum_i w_i x_i$

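One way to convince yourself of the gradient formula is to compare it with a numerical derivative. The sketch below uses a central finite difference. The step size `eps` and the sample points in `h` are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def sigmoid_prime(h):
    # Evaluate the sigmoid once and reuse it: f'(h) = f(h) * (1 - f(h))
    s = sigmoid(h)
    return s * (1 - s)

h = np.linspace(-4, 4, 9)   # arbitrary sample points
eps = 1e-6                  # arbitrary small step

# Central difference approximation of the derivative
numeric = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)

print(np.max(np.abs(numeric - sigmoid_prime(h))))
```

The printed maximum difference should be very small, which suggests the closed-form gradient matches the slope of the sigmoid at these points.
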
import numpy as np
from data_prep import features, targets, features_test, targets_test


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# TODO: We haven't provided the sigmoid_prime function like we did in
#       the previous lesson to encourage you to come up with a more
#       efficient solution. If you need a hint, check out the comments
#       in solution.py from the previous lecture.

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))


# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
print(weights)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Note: We haven't included the h variable from the previous
        #       lesson. You can add it if you want, or you can calculate
        #       the h together with the output

        # TODO: Calculate the output
        h = np.dot(x, weights)  # input to the output unit
        output = sigmoid(h)

        # TODO: Calculate the error
        error = y - output

        # TODO: Calculate the error term
        error_term = error * sigmoid_prime(h)

        # TODO: Calculate the change in weights for this sample
        #       and add it to the total weight change
        del_w += error_term * x

    # TODO: Update weights using the learning rate and the average change in weights
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs // 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss is not None and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss


# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

[ 0.2027827  -0.05644616  0.26441774  0.62177434 -0.09559271 -0.09558601]
Train loss:  0.2627609385
Train loss:  0.209286194093
Train loss:  0.200842929081
Train loss:  0.198621564755
Train loss:  0.197798513967
Train loss:  0.197425779122
Train loss:  0.197235077462
Train loss:  0.197129456251
Train loss:  0.197067663413
Train loss:  0.197030058018
Prediction accuracy: 0.725
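
The per-record loop in the training code accumulates the weight step one record at a time. As a hedged aside, the same sum can also be written with array operations. The sketch below checks that both forms agree on random made-up data. The arrays `X`, `y`, and `weights` here are hypothetical and unrelated to `data_prep`.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))       # made-up features: 5 records, 3 features
y = rng.integers(0, 2, size=5)    # made-up binary targets
weights = rng.normal(size=3)

# Per-record accumulation of the weight step
del_w_loop = np.zeros(3)
for x_i, y_i in zip(X, y):
    out_i = sigmoid(np.dot(x_i, weights))
    del_w_loop += (y_i - out_i) * out_i * (1 - out_i) * x_i

# The same sum written with array operations
out = sigmoid(X @ weights)                  # outputs for all records at once
error_term = (y - out) * out * (1 - out)    # one error term per record
del_w_vec = error_term @ X                  # sum over records of delta * x

print(np.allclose(del_w_loop, del_w_vec))
```

The array form avoids the Python loop, which tends to matter more as the number of records grows.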
