## Gradient Descent - Implement

This exercise uses a network with a single output unit and no hidden layers.

Implement gradient descent and train the network on the admissions data. Your goal here is to train the network until you reach a minimum in the mean square error (MSE) on the training set. You need to implement:

- The network output: `output`.
- The output error: `error`.
- The error term: `error_term`.
- Update the weight step: `del_w +=`.
- Update the weights: `weights +=`.

Instead of the sum of the squared errors (SSE), use the mean of the squared errors (MSE): divide the SSE by the number of records in the data, $m$, to take the average.
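As a small illustration of the difference, the sketch below computes both quantities. The `y` and `y_hat` arrays are made-up values for illustration, not taken from the admissions data.

```python
import numpy as np

# Hypothetical targets and network outputs, for illustration only
y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.5, 0.5, 1.0, 0.0])

m = len(y)                      # number of records
sse = np.sum((y - y_hat) ** 2)  # sum of the squared errors
mse = sse / m                   # mean of the squared errors

print(sse, mse)  # prints 0.5 0.125
```

Because the MSE is the SSE divided by a constant, both have their minimum at the same weights; the division only rescales the error and its gradient.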

Here's the general algorithm for updating the weights with gradient descent:

- Set the **weight step** to zero: $\Delta w_i = 0$
- For each record in the training data:
    - Make a forward pass through the network, calculating the output $\hat y = f(\sum_i w_i x_i)$
    - Calculate the error term for the output unit, $\delta = (y - \hat y) \cdot f'(\sum_i w_i x_i)$
    - Update the weight step $\Delta w_i = \Delta w_i + \delta x_i$
- Update the weights $w_i = w_i + \eta \Delta w_i / m$, where $\eta$ is the learning rate and $m$ is the number of records. Here we're averaging the weight steps to help reduce any large variations in the training data.
- Repeat for $e$ epochs.
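When the activation $f$ is the sigmoid, as in the code for this exercise, its derivative can be written in terms of the sigmoid itself: $f'(h) = f(h)(1 - f(h))$. The sketch below checks that identity numerically against a central finite difference. The helper name `sigmoid_prime` and the test points are chosen here for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_prime(x):
    # Derivative of the sigmoid, written in terms of the sigmoid itself
    s = sigmoid(x)
    return s * (1 - s)


h = np.array([-2.0, 0.0, 0.5, 3.0])  # arbitrary test points
eps = 1e-6

# Central finite-difference approximation of the derivative
numeric = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)

max_diff = np.max(np.abs(numeric - sigmoid_prime(h)))
print(max_diff)  # expected to be very small
```

This is why the error term can be computed from the network output alone, without evaluating the sigmoid a second time.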

You can also update the weights on each record instead of averaging the weight steps after going through all the records.
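The per-record variant mentioned above can be sketched as follows. This is a minimal illustration on hypothetical toy data (the arrays `X` and `y`, the seed, and the epoch count are assumptions, not part of the exercise); the only change from the averaged version is that the weights move immediately after each record.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Hypothetical toy data: 4 records, 2 features
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

rng = np.random.default_rng(0)
weights = rng.normal(scale=1 / X.shape[1] ** 0.5, size=X.shape[1])
learnrate = 0.5


def mse(w):
    return np.mean((y - sigmoid(X @ w)) ** 2)


loss_before = mse(weights)
for epoch in range(200):
    for x_i, y_i in zip(X, y):
        output = sigmoid(np.dot(x_i, weights))
        error_term = (y_i - output) * output * (1 - output)
        # Update right after each record: no accumulation, no averaging
        weights += learnrate * error_term * x_i
loss_after = mse(weights)

print(loss_before, loss_after)
```

Per-record updates tend to be noisier than averaged ones, since each step follows the error of a single record.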

#### NumPy

First, you'll need to initialize the weights of the model.

We want these to be small so that the input to the sigmoid is in the linear region near 0 and not squashed at the high and low ends. It's also important to initialize them randomly so that they all have different starting values and diverge, breaking symmetry. So, we'll initialize the weights from a normal distribution centered at 0. A good value for the scale is $1/\sqrt{n}$, where $n$ is the number of input units. This keeps the input to the sigmoid low for increasing numbers of input units.

`weights = np.random.normal(scale=1/n_features**.5, size=n_features)`
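To see what the $1/\sqrt{n}$ scale buys, the sketch below compares the spread of the sigmoid input with and without it. It assumes standardized inputs (zero mean, unit variance), generated randomly here; the helper `sigmoid_input_std` is a name invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid_input_std(n_features, scale, n_records=1000):
    # Hypothetical standardized inputs (zero mean, unit variance per feature)
    x = rng.normal(size=(n_records, n_features))
    weights = rng.normal(scale=scale, size=n_features)
    h = x @ weights  # input to the sigmoid for every record
    return h.std()


for n in (16, 256, 1024):
    scaled = sigmoid_input_std(n, scale=1 / n ** 0.5)
    unscaled = sigmoid_input_std(n, scale=1.0)
    print(n, round(scaled, 2), round(unscaled, 2))
```

With the scaled initialization the spread of the sigmoid input should stay roughly constant as the number of inputs grows, while with a fixed scale of 1 it grows roughly like $\sqrt{n}$, pushing the sigmoid toward its flat regions.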

NumPy provides a function `np.dot()` that calculates the dot product of two arrays, which conveniently calculates $h$, the input to the output unit, for us. The dot product multiplies two arrays element-wise: the first element in array 1 is multiplied by the first element in array 2, and so on. Then the products are summed.

`output_in = np.dot(weights, inputs)`
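A quick check of that description, using made-up three-element arrays (the values are hypothetical):

```python
import numpy as np

# Hypothetical weights and inputs for a unit with three inputs
weights = np.array([0.5, -1.0, 2.0])
inputs = np.array([2.0, 1.0, 0.5])

elementwise = weights * inputs   # element-wise products: [1.0, -1.0, 1.0]
h_manual = elementwise.sum()     # sum of the products
h = np.dot(weights, inputs)      # same value in a single call

print(h_manual, h)  # prints 1.0 1.0
```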

In [None]:
import numpy as np
from data_prep import features, targets, features_test, targets_test


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Note: h is the linear combination of the inputs and weights,
        #       computed here as a separate step. You can also calculate
        #       it together with the output in a single expression.
        h = np.dot(x, weights)
        
        # DONE: Calculate the output
        output = sigmoid(h)

        # DONE: Calculate the error
        error = y - output
        
        # DONE: Calculate the error term
        # Note: For the sigmoid, f'(h) = output * (1 - output), so the
        #       derivative is computed directly from the output rather
        #       than by calling a separate sigmoid_prime function,
        #       which would evaluate sigmoid(h) a second time.
        error_term = error * output * (1 - output)

        # DONE: Calculate the change in weights for this sample
        #       and add it to the total weight change
        del_w += error_term * x

    # DONE: Update weights using the learning rate and the average change in weights
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss is not None and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss


# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
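The inner loop over records can also be expressed as matrix operations. The sketch below is one possible vectorized form, not part of the original exercise; it uses small random arrays as stand-ins for `features.values` and `targets`, and checks that the loop and the vectorized version produce the same accumulated weight step.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                    # stand-in for features.values
y = rng.integers(0, 2, size=8).astype(float)   # stand-in for targets
weights = rng.normal(scale=1 / 3 ** 0.5, size=3)

# Loop form, as in the exercise
del_w_loop = np.zeros(weights.shape)
for x_i, y_i in zip(X, y):
    output = sigmoid(np.dot(x_i, weights))
    del_w_loop += (y_i - output) * output * (1 - output) * x_i

# Vectorized form: all records at once
outputs = sigmoid(X @ weights)
error_terms = (y - outputs) * outputs * (1 - outputs)
del_w_vec = error_terms @ X  # sums error_term * x over all records

print(np.allclose(del_w_loop, del_w_vec))
```

The weight update itself is unchanged: `weights += learnrate * del_w_vec / n_records`, with `n_records` equal to the number of rows in `X`.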