# Basic Neural Nets

In [1]:
import numpy as np

## Data

We want to train a neural network on dummy data, shown below. Each training example's features are either 1 or 0, and you'll notice that the label is always exactly equal to the third feature (index 2).

In [2]:
X = np.array([[1, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0],
              [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0],
              [0, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0],
              [1, 1, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1]])
y = np.array([[0], [0], [0], [1], [1], [1], [0], [1], [1], [0], [1], [0], [1], [1], [0], [0]])

print(X.shape, y.shape)

(16, 4) (16, 1)
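A quick way to confirm which feature the label copies is to compare every feature column against the labels. The sketch below re-creates the same `X` and `y` so it runs on its own; `matches` is a name introduced here for illustration.

```python
import numpy as np

# The 16 examples and labels from the data cell above
X = np.array([[1, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0],
              [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0],
              [0, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0],
              [1, 1, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1]])
y = np.array([[0], [0], [0], [1], [1], [1], [0], [1], [1], [0], [1], [0], [1], [1], [0], [0]])

# For each feature column j, check whether it equals the label on every example
matches = [bool(np.all(X[:, j] == y[:, 0])) for j in range(X.shape[1])]
print(matches)  # → [False, False, True, False]
```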


Now let's split this up into training, validation, and test sets using a 50:25:25 split, i.e. 8, 4, and 4 examples.

In [3]:
X_train = X[:8, :]
X_val = X[8:12, :]
X_test = X[12:, :]

y_train = y[:8]
y_val = y[8:12]
y_test = y[12:]
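The cell above slices the rows in order, without shuffling. That works here because each split happens to contain both labels (4 of each in training, 2 of each in validation and test). A more general sketch, assuming you may also want an optional seeded shuffle, is the hypothetical helper `split_dataset` below; with `shuffle=False` it reproduces the sequential slicing. The arrays passed to it are placeholders, not the notebook's data.

```python
import numpy as np

def split_dataset(X, y, fractions=(0.5, 0.25, 0.25), shuffle=False, seed=None):
    """Hypothetical helper: split X, y into train/val/test by the given fractions."""
    n = X.shape[0]
    order = np.arange(n)
    if shuffle:
        # Seeded permutation of the row indices
        order = np.random.default_rng(seed).permutation(n)
    # Cumulative boundaries, e.g. (0.5, 0.25, 0.25) with n=16 gives 8 and 12
    n_train = int(round(fractions[0] * n))
    n_val = int(round((fractions[0] + fractions[1]) * n))
    train, val, test = order[:n_train], order[n_train:n_val], order[n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Placeholder data: 16 rows of 4 features, labels 0..15 so rows are easy to track
X = np.arange(16 * 4).reshape(16, 4)
y = np.arange(16).reshape(16, 1)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split_dataset(X, y)
print(X_tr.shape, X_va.shape, X_te.shape)  # → (8, 4) (4, 4) (4, 4)
```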

## Forward Pass

We want to create a simple 1-layer fully connected neural network, as you practiced in the previous notebook. A fully connected layer consists of a multiplicative weight matrix and an additive bias vector. Initialize the weight matrix to random values (the bias can start at zero). Make sure the shapes of the two are correct to transform each training example from 4 elements (one per feature) to 2 (one score per label, i.e. how strongly the network predicts each answer).

In [4]:
W = np.random.randn(4, 2)
b = np.zeros(2)
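To see why these shapes work, the sketch below pushes a made-up batch of 3 examples through the same computation: `(3, 4) @ (4, 2)` gives `(3, 2)`, and NumPy broadcasting adds the length-2 bias to every row. The batch of ones is an assumption for illustration only.

```python
import numpy as np

# Same shapes as the cell above: 4 input features -> 2 output scores
W = np.random.randn(4, 2)
b = np.zeros(2)

batch = np.ones((3, 4))   # a hypothetical batch of 3 examples
scores = batch @ W + b    # (3, 4) @ (4, 2) -> (3, 2); b is broadcast across the rows
print(scores.shape)  # → (3, 2)

# Each row matches the single-example computation x @ W + b
assert np.allclose(scores[0], batch[0] @ W + b)
```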

Now let's create a function that acts as the linear layer for the network. Don't worry about the `linear_cache` variable for now; we'll discuss it later.

In [5]:
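linear_cache = {}
def linear(input):
    # input: (batch_size, 4) -> output: (batch_size, 2), using the global W and b
    output = np.matmul(input, W) + b
    # Save the input: the backward pass needs it to compute dW
    linear_cache["input"] = input
    return output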

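The linear layer outputs a vector of 2 scores for each example: element 0 is the score for label 0 and element 1 is the score for label 1, and the network's prediction is whichever score is larger. Ideally the larger score lands on the correct label. The target can be written the same way as a one-hot vector: a 1 in the position of the correct label and 0 elsewhere (this is called one-hot encoding).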
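A small sketch of both ideas, using made-up labels and scores: `np.eye(2)` indexed by the labels builds the one-hot targets, and `np.argmax` over each row turns raw scores into predicted labels.

```python
import numpy as np

y = np.array([[0], [1], [1]])          # labels as a (batch_size, 1) column
one_hot = np.eye(2)[y.reshape(-1)]     # row i is the one-hot vector for y[i]
# one_hot rows: [1, 0], [0, 1], [0, 1]

scores = np.array([[2.0, -1.0], [0.3, 0.9], [-0.5, 0.1]])  # made-up layer outputs
pred = np.argmax(scores, axis=1)       # index of the larger score in each row
print(pred)  # → [0 1 1]
```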

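The raw scores from the linear layer are arbitrarily scaled, so we want to turn them into a probability distribution over the two labels. Let's implement a softmax function below; it also computes the cross-entropy loss, i.e. the negative log-probability the network assigns to the correct label, averaged over the batch: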

In [6]:
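softmax_cache = {}
def softmax_cross_entropy(input, y):
    # input: (batch_size, 2) scores; y: (batch_size, 1) integer labels
    batch_size = input.shape[0]
    indeces = np.arange(batch_size)
    labels = y.reshape(-1)  # flatten to (batch_size,) so it pairs with indeces

    # Subtract each row's max before exponentiating to avoid overflow
    exp = np.exp(input - np.max(input, axis=1, keepdims=True))
    norm = exp / np.sum(exp, axis=1, keepdims=True)
    softmax_cache["norm"], softmax_cache["y"], softmax_cache["indeces"] = norm, labels, indeces

    # Negative log-probability of the correct label, averaged over the batch
    losses = -np.log(norm[indeces, labels])
    return np.sum(losses) / batch_size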
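Why subtract the row max? Softmax is unchanged when a constant is added to every score in a row, but `np.exp` overflows for large scores. The sketch below, with made-up scores, shows that shifted softmax gives identical probabilities for rows `[1, 2]` and `[1000, 1001]`, and computes the mean cross-entropy for two assumed labels.

```python
import numpy as np

def softmax(scores):
    # Shift-invariant: subtracting the row max changes nothing mathematically,
    # but keeps np.exp from overflowing on large scores
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0], [1000.0, 1001.0]])
probs = softmax(scores)
print(probs.sum(axis=1))  # → [1. 1.]

# Mean cross-entropy for assumed labels [1, 0]
labels = np.array([1, 0])
loss = -np.mean(np.log(probs[np.arange(2), labels]))
```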

## Backward Pass

As discussed in lecture, implement the backward pass for the softmax and linear layers. Modify the code above to store whatever the backward pass needs in the caches:
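For reference, these are the gradients the two backward functions should compute for a batch of $B$ examples. The notation is introduced here: $X$ is the batch input, $Z$ the scores, $P$ the row-wise softmax probabilities, and $Y$ the one-hot labels. It also shows what each cache must hold: $P$ and the labels for the softmax step, and $X$ for the linear step.

```latex
Z = XW + \mathbf{1}b^\top, \qquad
P_{ij} = \frac{e^{Z_{ij}}}{\sum_k e^{Z_{ik}}}, \qquad
L = -\frac{1}{B}\sum_{i=1}^{B} \log P_{i,y_i}

\frac{\partial L}{\partial Z} = \frac{1}{B}\,(P - Y), \qquad
\frac{\partial L}{\partial W} = X^\top \frac{\partial L}{\partial Z}, \qquad
\frac{\partial L}{\partial b} = \sum_{i=1}^{B} \left(\frac{\partial L}{\partial Z}\right)_{i,:}
```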

In [7]:
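def softmax_cross_entropy_backward():
    norm, y, indeces = softmax_cache["norm"], softmax_cache["y"], softmax_cache["indeces"]
    # Gradient of the mean loss w.r.t. the scores: (softmax - one_hot(y)) / batch_size
    dloss = norm.copy()  # copy so the cached probabilities are not overwritten
    dloss[indeces, y] -= 1
    return dloss / norm.shape[0]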
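A standard way to check a hand-written gradient is to compare it with central finite differences. The sketch below is self-contained: `softmax_ce` and `analytic_grad` are hypothetical helpers mirroring the forward and backward logic above, and the random scores and labels are made up. The maximum difference should be tiny (well below `1e-6`).

```python
import numpy as np

def softmax_ce(z, labels):
    # Mean cross-entropy of row-wise softmax(z) against integer labels
    shifted = z - z.max(axis=1, keepdims=True)
    p = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(labels)), labels])), p

def analytic_grad(p, labels):
    # (P - Y) / B: gradient of the mean loss w.r.t. the scores
    g = p.copy()
    g[np.arange(len(labels)), labels] -= 1
    return g / len(labels)

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 2))
labels = np.array([0, 1, 1, 0])

loss, p = softmax_ce(z, labels)
grad = analytic_grad(p, labels)

# Central finite differences, perturbing one score at a time
eps = 1e-6
num_grad = np.zeros_like(z)
for idx in np.ndindex(z.shape):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[idx] += eps
    z_minus[idx] -= eps
    num_grad[idx] = (softmax_ce(z_plus, labels)[0] - softmax_ce(z_minus, labels)[0]) / (2 * eps)

print(np.max(np.abs(grad - num_grad)))
```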

In [8]:
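def linear_backward(dout):
    input = linear_cache["input"]
    dW = np.matmul(input.T, dout)  # (4, batch) @ (batch, 2) -> (4, 2), same shape as W
    db = np.sum(dout, axis=0)      # b is added to every row, so sum over the batch
    return dW, db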
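The same finite-difference idea can check `linear_backward`. In this sketch the toy loss `sum((X @ W + b) * G)` is an assumption chosen so that its gradient with respect to the layer output is exactly `G`, which plays the role of `dout`; all arrays are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 2))
b = rng.normal(size=2)
G = rng.normal(size=(5, 2))   # stands in for the upstream gradient dout

def loss(W, b):
    # Toy scalar loss whose gradient w.r.t. the layer output is exactly G
    return np.sum((X @ W + b) * G)

# Same formulas as linear_backward above
dW = X.T @ G
db = G.sum(axis=0)

eps = 1e-6
num_dW = np.zeros_like(W)
for idx in np.ndindex(W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += eps
    Wm[idx] -= eps
    num_dW[idx] = (loss(Wp, b) - loss(Wm, b)) / (2 * eps)

# Perturb each bias entry along a unit vector
num_db = np.array([(loss(W, b + eps * e) - loss(W, b - eps * e)) / (2 * eps) for e in np.eye(2)])

print(np.allclose(dW, num_dW), np.allclose(db, num_db))  # → True True
```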

## Training

Let's train our network! Below is a helper function that computes your network's accuracy (as a percentage) from its outputs and the true labels:

In [9]:
def eval_accuracy(output, target):
    pred = np.argmax(output, axis=1)          # predicted label = index of the largest score
    target = np.reshape(target, -1)           # flatten (N, 1) labels to (N,)
    correct = np.sum(pred == target)
    accuracy = correct / pred.shape[0] * 100  # percentage of correct predictions
    return accuracy
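A usage example with made-up scores: because `argmax` only compares values within a row, the helper works the same on raw linear-layer scores or on softmax probabilities. The copy below mirrors the helper so the block runs on its own.

```python
import numpy as np

def eval_accuracy(output, target):
    # Same logic as the helper above: argmax over scores vs. flattened labels
    pred = np.argmax(output, axis=1)
    target = target.reshape(-1)
    return np.sum(pred == target) / pred.shape[0] * 100

scores = np.array([[2.0, 1.0], [0.0, 3.0], [1.5, -1.0], [0.2, 0.1]])  # made-up outputs
labels = np.array([[0], [1], [1], [0]])
print(eval_accuracy(scores, labels))  # → 75.0 (3 of 4 correct; row 2 is wrong)
```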

Now create your training loop, which 1) samples a batch of training data, 2) passes the batch forward through the network and computes the loss, 3) backpropagates into the weight and bias, 4) updates the weights, and 5) periodically outputs the accuracy on the training and validation sets.

In [10]:
for i in range(4000):
    indeces = np.random.choice(X_train.shape[0], 4)
    batch = X_train[indeces, :]
    target = y_train[indeces]

    # Forward Pass
    linear_output = linear(batch)
    loss = softmax_cross_entropy(linear_output, target)

    # Backward Pass
    dloss = softmax_cross_entropy_backward()
    dW, db = linear_backward(dloss)

    # Weight updates
    W -= 1e-2 * dW
    b -= 1e-2 * db

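    # Evaluation: "training accuracy" is measured on the current mini-batch,
    # using the scores from this step's forward pass (before the weight update)
    if (i+1) % 100 == 0:
        accuracy = eval_accuracy(linear_output, target)
        print("Training Accuracy: %f" % accuracy)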

    if (i+1) % 500 == 0:
        accuracy = eval_accuracy(linear(X_val), y_val)
        print("Validation Accuracy: %f\n" % accuracy)

Training Accuracy: 100.000000
Training Accuracy: 75.000000
Training Accuracy: 50.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Validation Accuracy: 100.000000

Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Validation Accuracy: 100.000000

Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Validation Accuracy: 100.000000

Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Validation Accuracy: 100.000000

Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.000000
Validation Accuracy: 100.000000

Training Accuracy: 100.000000
Training Accuracy: 100.000000
Training Accuracy: 100.0000
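The loop above draws each batch with `np.random.choice`, which samples with replacement by default, so a batch can repeat an example and some examples may go unseen for a while. A common alternative is to shuffle once per epoch and walk through the permutation. `iterate_minibatches` below is a hypothetical helper sketching that approach; it yields index arrays you could use in place of `indeces`.

```python
import numpy as np

def iterate_minibatches(n, batch_size, rng):
    """Hypothetical helper: yield index batches covering 0..n-1 exactly once per epoch."""
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(8, 4, rng))   # one epoch over 8 training examples
print([len(idx) for idx in batches])  # → [4, 4]
```

Within a training loop this would become an outer loop over epochs and an inner `for indeces in iterate_minibatches(X_train.shape[0], 4, rng):`.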

Finally, check your accuracy on the test set to see how you did!

In [11]:
accuracy = eval_accuracy(linear(X_test), y_test)
print("Test Accuracy: %f" % accuracy)

Test Accuracy: 100.000000
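For comparison, here is a compact, self-contained sketch of the same pipeline using full-batch gradient descent: every step uses all 8 training examples, and `W` starts at zero instead of random values so a run is deterministic. The learning rate of 0.5 and the 2,000 steps are assumptions for this sketch, not values from the notebook, and the test accuracy it prints is not asserted here.

```python
import numpy as np

X = np.array([[1, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0],
              [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0],
              [0, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0],
              [1, 1, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0])  # flat labels

X_train, y_train = X[:8], y[:8]
X_test, y_test = X[12:], y[12:]

W = np.zeros((4, 2))   # zero init (assumption) keeps the run deterministic
b = np.zeros(2)
lr = 0.5               # assumed learning rate for full-batch updates

for step in range(2000):
    # Forward: scores, then numerically stable softmax
    z = X_train @ W + b
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Backward: (P - Y) / B, then the linear-layer gradients
    dz = p.copy()
    dz[np.arange(len(y_train)), y_train] -= 1
    dz /= len(y_train)
    W -= lr * (X_train.T @ dz)
    b -= lr * dz.sum(axis=0)

train_acc = np.mean(np.argmax(X_train @ W + b, axis=1) == y_train) * 100
test_acc = np.mean(np.argmax(X_test @ W + b, axis=1) == y_test) * 100
print(train_acc, test_acc)
```

Full-batch updates are cheap here because the training set has only 8 examples; on larger datasets the mini-batch loop above is the usual choice.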
