Coding a Neural Network from Scratch

Activation Function

In [7]:
import numpy as np

def sigmoid(x):
    # Our activation function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))
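
To get a feel for what the activation function does, here is a small sketch that evaluates sigmoid on a handful of inputs. The sample values in `values` are arbitrary and chosen only for illustration. Large negative inputs are pushed toward 0, large positive inputs toward 1, and an input of 0 gives exactly 0.5.

```python
import numpy as np

def sigmoid(x):
    # Same activation function as above: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

# Arbitrary sample inputs, chosen only for illustration
values = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
outputs = sigmoid(values)

for v, o in zip(values, outputs):
    print(f"sigmoid({v:+.0f}) = {o:.5f}")
```

Because `np.exp` works element-wise, the same function accepts either a single number or a whole array.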

Creating a Neuron

In [8]:
class Neuron:

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias
    
    def feed_forward(self, inputs):
        # Weight inputs, add bias, then use the activation function
        total = np.dot(self.weight, inputs) + self.bias
        return sigmoid(total)

Implementing a Neuron

In [9]:
weights = np.array([0,1]) # w1 = 0, w2 = 1
bias = 4 # b = 4

n = Neuron(weights, bias)

x = np.array([2,3]) # x1 = 2, x2 = 3

print(n.feed_forward(x)) # 0.9990889488055994

0.9990889488055994
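
The result above can be reproduced by hand. The sketch below spells out the same arithmetic step by step, without the `Neuron` class; the intermediate names `weighted_sum` and `total` are introduced here for illustration only.

```python
import numpy as np

weights = np.array([0, 1])  # w1 = 0, w2 = 1
bias = 4                    # b = 4
x = np.array([2, 3])        # x1 = 2, x2 = 3

# Step 1: weight the inputs: (0 * 2) + (1 * 3) = 3
weighted_sum = weights[0] * x[0] + weights[1] * x[1]

# Step 2: add the bias: 3 + 4 = 7
total = weighted_sum + bias

# Step 3: apply the activation function: sigmoid(7)
output = 1 / (1 + np.exp(-total))

print(total, output)
```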


Combining Neurons into a Neural Network

In [10]:
class Neural_Network:
    '''
    A Neural Network with:
        1. 2 Inputs
        2. A hidden layer with 2 neurons (h1, h2)
        3. An output layer with 1 neuron (o1)
    Each Neuron has the same weight and bias:
        w = [0,1]
        b = 0
    '''

    def __init__(self):
        weights = np.array([0,1])
        bias = 0

        # Neurons
        self.h1 = Neuron(weights, bias)
        self.h2 = Neuron(weights, bias)
        self.o1 = Neuron(weights, bias)
    
    def feed_forward(self, x):
        out_h1 = self.h1.feed_forward(x)
        out_h2 = self.h2.feed_forward(x)

        # The inputs for o1 are the outputs of h1 and h2
        out_o1 = self.o1.feed_forward(np.array([out_h1, out_h2]))

        return out_o1

In [11]:
network = Neural_Network()
x = np.array([2,3])
print(network.feed_forward(x)) # 0.7216325609518421

0.7216325609518421
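
To see where this number comes from, the forward pass can be unrolled by hand. This is a minimal sketch that repeats the computation without the `Neuron` and `Neural_Network` classes; the `hidden` array is a name introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

weights = np.array([0, 1])  # shared by h1, h2 and o1
bias = 0
x = np.array([2, 3])

# Both hidden neurons see the same input and share the same parameters,
# so both compute sigmoid(0 * 2 + 1 * 3 + 0) = sigmoid(3)
out_h1 = sigmoid(np.dot(weights, x) + bias)
out_h2 = sigmoid(np.dot(weights, x) + bias)

# The output neuron takes [out_h1, out_h2] as its input:
# sigmoid(0 * out_h1 + 1 * out_h2 + 0) = sigmoid(out_h2)
hidden = np.array([out_h1, out_h2])
out_o1 = sigmoid(np.dot(weights, hidden) + bias)

print(out_h1, out_h2, out_o1)
```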


Training a Neural Network

We have the following measurements:

Name      Weight (lb)   Height (in)   Gender
Alice     133           65            F
Bob       160           72            M
Charlie   152           70            M
Diana     120           60            F

Let's train our model to predict someone's gender based on their weight and height.

We will represent male with 0 and female with 1. We will also shift the data to make it easier to use, subtracting 135 from each weight and 66 from each height.

Name      Weight (lb, minus 135)   Height (in, minus 66)   Gender
Alice      -2                       -1                      1
Bob        25                        6                      0
Charlie    17                        4                      0
Diana     -15                       -6                      1
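
The shifted table can be produced from the raw measurements with a couple of array operations. This is a sketch only: the variable names (`raw`, `offsets`, `shifted`, `labels`) are introduced here for illustration, and the offsets 135 and 66 are taken from the table headers.

```python
import numpy as np

# Raw measurements: [weight in lb, height in in]
raw = np.array([
    [133, 65],  # Alice
    [160, 72],  # Bob
    [152, 70],  # Charlie
    [120, 60],  # Diana
])
gender = np.array(["F", "M", "M", "F"])

# Subtract 135 from every weight and 66 from every height.
# Broadcasting applies the offsets to each row.
offsets = np.array([135, 66])
shifted = raw - offsets

# Encode female as 1 and male as 0
labels = (gender == "F").astype(int)

print(shifted)
print(labels)
```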

Loss

Before we train the model, we need a way to quantify how well it is doing so that we can improve it.

We will use the Mean Squared Error (MSE) Loss:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_{\text{true}} - y_{\text{pred}}\right)^2$$

$n$ is the number of samples, which is 4 (Alice, Bob, Charlie, Diana).
$y$ represents the variable being predicted, which is Gender.
$y_{\text{true}}$ is the true value of the variable (the “correct answer”). For example, $y_{\text{true}}$ for Alice would be 1 (Female).
$y_{\text{pred}}$ is the predicted value of the variable. It’s whatever our network outputs.

$\left(y_{\text{true}} - y_{\text{pred}}\right)^2$ is known as the squared error. Our loss function is simply taking the average over all squared errors (hence the name mean squared error). The better our predictions are, the lower our loss will be!

Better predictions = Lower loss.
Training a network = trying to minimize its loss.

Example of loss calculation

In [1]:
import numpy as np

def mse_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # y_true and y_pred are numpy arrays of the same length
    return ((y_true - y_pred) ** 2).mean()

y_pred = np.array([0,0,0,0])
y_true = np.array([1,0,0,1])

print(mse_loss(y_true, y_pred)) # 0.5

0.5


We now have a clear goal: minimize the loss of the neural network. We know we can change the network’s weights and biases to influence its predictions, but how do we do so in a way that decreases loss?

We’ll use an optimization algorithm called stochastic gradient descent (SGD) that tells us how to change our weights and biases to minimize loss.

Our training process will look like this:

1. Choose one sample from our dataset. Operating on only one sample at a time is what makes this stochastic gradient descent.
2. Calculate the partial derivatives of the loss with respect to every weight and bias.
3. Use the update equation to update each weight and bias.
4. Go back to step 1.
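
The steps above refer to an update equation without stating it. A minimal sketch, assuming it is the standard gradient descent rule w ← w - learn_rate * (∂L/∂w), which is also the form the training code uses for each weight and bias. To keep the example small, it minimizes a hypothetical one-parameter loss L(w) = (w - 3)^2 instead of the network's loss; the function `toy_loss_gradient` and the starting value are illustrative assumptions.

```python
def toy_loss_gradient(w):
    # Derivative of the toy loss L(w) = (w - 3)^2 with respect to w
    return 2 * (w - 3.0)

learn_rate = 0.1  # step size; same value the training code uses
w = 0.0           # arbitrary starting point

for step in range(100):
    # Update equation: move w a small step against the gradient
    w -= learn_rate * toy_loss_gradient(w)

print(w)  # approaches 3, the minimizer of the toy loss
```

When the gradient is positive, the update decreases w; when it is negative, the update increases w. Either way the loss goes down, provided the learning rate is small enough.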

In [26]:
import numpy as np

def sigmoid(x):
    # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)

    return fx * (1 - fx)

def mse_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

class Neural_Network:
    '''
    A Neural Network with:
        1. 2 Inputs
        2. A hidden layer with 2 neurons (h1, h2)
        3. An output layer with 1 neuron (o1)
    '''

    def __init__(self):
        # Weights
        self.w1 = np.random.normal()
        self.w2 = np.random.normal()
        self.w3 = np.random.normal()
        self.w4 = np.random.normal()
        self.w5 = np.random.normal()
        self.w6 = np.random.normal()
        
        # Biases
        self.b1 = np.random.normal()
        self.b2 = np.random.normal()
        self.b3 = np.random.normal()
    
    def feed_forward(self, x):
        h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
        h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
        o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)

        return o1
    
    def train(self, data, all_y_trues):
        '''
        - data is an (n x 2) numpy array, where n is the number of samples in the dataset.
        - all_y_trues is a numpy array with n elements.
          Elements in all_y_trues correspond to those in data.
        '''
        learn_rate = 0.1
        epochs = 1000  # number of times to loop through the entire dataset

        for epoch in range(epochs):
            for x, y_true in zip(data, all_y_trues):
                # Feed Forward
                sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
                h1 = sigmoid(sum_h1)

                sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
                h2 = sigmoid(sum_h2)

                sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
                o1 = sigmoid(sum_o1)
                y_pred = o1

                # Calculate partial derivatives
                # Naming: d_L_d_w1 represents "partial L / partial w1"
                d_L_d_y_pred = -2 *(y_true - y_pred)

                # Neuron o1
                d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
                d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
                d_ypred_d_b3 = deriv_sigmoid(sum_o1)

                d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
                d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)

                # Neuron h1
                d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
                d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
                d_h1_d_b1 = deriv_sigmoid(sum_h1)

                # Neuron h2
                d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
                d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
                d_h2_d_b2 = deriv_sigmoid(sum_h2)

                # Update weights and biases

                # Neuron h1
                self.w1 -= learn_rate * d_L_d_y_pred * d_ypred_d_h1 * d_h1_d_w1
                self.w2 -= learn_rate * d_L_d_y_pred * d_ypred_d_h1 * d_h1_d_w2
                self.b1 -= learn_rate * d_L_d_y_pred * d_ypred_d_h1 * d_h1_d_b1

                # Neuron h2
                self.w3 -= learn_rate * d_L_d_y_pred * d_ypred_d_h2 * d_h2_d_w3
                self.w4 -= learn_rate * d_L_d_y_pred * d_ypred_d_h2 * d_h2_d_w4
                self.b2 -= learn_rate * d_L_d_y_pred * d_ypred_d_h2 * d_h2_d_b2

                # Neuron o1
                self.w5 -= learn_rate * d_L_d_y_pred * d_ypred_d_w5
                self.w6 -= learn_rate * d_L_d_y_pred * d_ypred_d_w6
                self.b3 -= learn_rate * d_L_d_y_pred * d_ypred_d_b3
            
            # Report the loss over the whole dataset every 10 epochs
            if epoch % 10 == 0:
                y_pred = np.apply_along_axis(self.feed_forward, 1, data)
                loss = mse_loss(all_y_trues, y_pred)
                print("Epoch: %d loss: %.3f" % (epoch, loss))
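
Each update in `train` multiplies partial derivatives together using the chain rule, for example ∂L/∂w1 = (∂L/∂y_pred) * (∂y_pred/∂h1) * (∂h1/∂w1). One way to gain confidence in such formulas is to compare them with a finite-difference estimate. The sketch below does this for w1 on a single sample; the parameter values, the helper `sample_loss`, and the step size `eps` are arbitrary choices made for illustration, not part of the network above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)
    return fx * (1 - fx)

def sample_loss(params, x, y_true):
    # Squared error of the 2-2-1 network on one sample
    w1, w2, w3, w4, w5, w6, b1, b2, b3 = params
    h1 = sigmoid(w1 * x[0] + w2 * x[1] + b1)
    h2 = sigmoid(w3 * x[0] + w4 * x[1] + b2)
    o1 = sigmoid(w5 * h1 + w6 * h2 + b3)
    return (y_true - o1) ** 2

# Arbitrary fixed parameters (w1..w6, b1..b3) and one sample (Alice)
params = np.array([0.5, -0.3, 0.8, 0.1, -0.6, 0.4, 0.2, -0.1, 0.05])
x = np.array([-2.0, -1.0])
y_true = 1.0

# Analytic gradient for w1, using the same chain-rule terms as train()
w1, w2, w3, w4, w5, w6, b1, b2, b3 = params
sum_h1 = w1 * x[0] + w2 * x[1] + b1
sum_h2 = w3 * x[0] + w4 * x[1] + b2
h1, h2 = sigmoid(sum_h1), sigmoid(sum_h2)
sum_o1 = w5 * h1 + w6 * h2 + b3
y_pred = sigmoid(sum_o1)

d_L_d_ypred = -2 * (y_true - y_pred)
d_ypred_d_h1 = w5 * deriv_sigmoid(sum_o1)
d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
analytic = d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1

# Numerical gradient for w1 via a central difference
eps = 1e-6
plus, minus = params.copy(), params.copy()
plus[0] += eps
minus[0] -= eps
numeric = (sample_loss(plus, x, y_true) - sample_loss(minus, x, y_true)) / (2 * eps)

print(analytic, numeric)
```

The two numbers should agree to several decimal places. The same check can be repeated for any other weight or bias by perturbing a different entry of `params`.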

In [27]:
# Define dataset
data = np.array([
    [-2,-1], # Alice
    [25, 6], # Bob
    [17, 4], # Charlie
    [-15,-6], # Diana
])

all_y_trues = np.array([
    1, # Alice
    0, # Bob
    0, # Charlie
    1, # Diana
])

# Train our Neural Network
network = Neural_Network()
network.train(data, all_y_trues)

Epoch: 0 loss: 0.174
Epoch: 10 loss: 0.111
Epoch: 20 loss: 0.087
Epoch: 30 loss: 0.071
Epoch: 40 loss: 0.060
Epoch: 50 loss: 0.051
Epoch: 60 loss: 0.044
Epoch: 70 loss: 0.039
Epoch: 80 loss: 0.034
Epoch: 90 loss: 0.031
Epoch: 100 loss: 0.028
Epoch: 110 loss: 0.025
Epoch: 120 loss: 0.023
Epoch: 130 loss: 0.021
Epoch: 140 loss: 0.020
Epoch: 150 loss: 0.018
Epoch: 160 loss: 0.017
Epoch: 170 loss: 0.016
Epoch: 180 loss: 0.015
Epoch: 190 loss: 0.014
Epoch: 200 loss: 0.014
Epoch: 210 loss: 0.013
Epoch: 220 loss: 0.012
Epoch: 230 loss: 0.012
Epoch: 240 loss: 0.011
Epoch: 250 loss: 0.011
Epoch: 260 loss: 0.010
Epoch: 270 loss: 0.010
Epoch: 280 loss: 0.009
Epoch: 290 loss: 0.009
Epoch: 300 loss: 0.009
Epoch: 310 loss: 0.008
Epoch: 320 loss: 0.008
Epoch: 330 loss: 0.008
Epoch: 340 loss: 0.008
Epoch: 350 loss: 0.007
Epoch: 360 loss: 0.007
Epoch: 370 loss: 0.007
Epoch: 380 loss: 0.007
Epoch: 390 loss: 0.006
Epoch: 400 loss: 0.006
Epoch: 410 loss: 0.006
Epoch: 420 loss: 0.006
Epoch: 430 loss: 0.006

In [28]:
# Make some predictions
emily = np.array([-7, -3]) # 128 pounds, 63 inches
frank = np.array([20, 2])  # 155 pounds, 68 inches
print("Emily: %.3f" % network.feed_forward(emily)) # 0.948 - F
print("Frank: %.3f" % network.feed_forward(frank)) # 0.039 - M

# Outputs after training for 1,000 vs. 10,000 epochs:
#          1,000    10,000
# Emily:   0.948     0.990
# Frank:   0.039     0.016

Emily: 0.948
Frank: 0.039
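
The network outputs a number between 0 and 1 rather than a label. One possible way to turn the outputs printed above into labels is to threshold them. The 0.5 cutoff and the `to_label` helper are assumptions made for this sketch; the values are copied from the output above rather than recomputed.

```python
def to_label(prediction, threshold=0.5):
    # Female was encoded as 1 and male as 0, so outputs at or above
    # the (assumed) threshold are read as "F" and the rest as "M"
    return "F" if prediction >= threshold else "M"

# Outputs copied from the run above
predictions = {"Emily": 0.948, "Frank": 0.039}

for name, p in predictions.items():
    print(name, to_label(p))
```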
