# Perceptron from scratch

In this assignment, we will reimplement a neural network from scratch.

In part A, we are going to build a simple Perceptron on a small dataset that contains only 3 features.

<img src='https://drive.google.com/uc?id=1aUtXFBMKUumwfZ-2jmR5SIvNYPaD-t2x' width="500" height="250">

Some of the code has already been defined for you. You only need to add your code in the specified sections (marked with **TODO**). Some assert statements have been added to verify that the expected outputs are correct. If one of them throws an error, your implementation is not behaving as expected.

Note: You are only allowed to use the Numpy and Pandas packages for the implementation of the Perceptron. You cannot use packages such as Sklearn or Tensorflow.

# 1. Import Required Packages

[1.1] We are going to use the numpy and random packages.

In [34]:
# Import necessary libraries
import numpy as np
import random

# 2. Define Dataset

[2.1] We are going to use a simple dataset containing 3 features and 7 observations. The target variable is a binary outcome (either 0 or 1)

In [35]:
# Define the input dataset with 3 features and corresponding binary labels
input_set = np.array([[0,1,0], [0,0,1], [1,0,0], [1,1,0], [1,1,1], [0,1,1], [0,1,0]])
labels = np.array([[1], [0], [0], [1], [1], [0], [1]])

# 3. Set Initial Parameters

[3.1] Let's set the seed in order to have reproducible outcomes

In [36]:
# Set random seed to ensure reproducible results
np.random.seed(42)

[3.2] **TODO**: Define a function that will create a Numpy array of a given shape with random values.


For example, `initialise_array(3,1)` will return an array of dimensions (3,1) that can look like this (values may be different):


`array([[0.37454012],
       [0.95071431],
       [0.73199394]])`

In [37]:
# Function to initialise weights/biases with small random values
def initialise_array(*shape):
    # Return an array with the given shape filled with small normally distributed random numbers
    return np.random.randn(*shape) * 0.01
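
Scaling standard-normal draws by 0.01 keeps every initial weight close to zero. As a result, the linear output starts near 0 and the sigmoid output starts near 0.5. The sketch below re-creates the seeded initialisation and checks this on the full dataset. `_sigmoid` is a hypothetical, unclipped stand-in for the sigmoid implemented in section 5.

```python
import numpy as np

def initialise_array(*shape):
    # Same initialiser as above: standard-normal draws scaled down by 0.01
    return np.random.randn(*shape) * 0.01

def _sigmoid(x):
    # Plain logistic function; adequate here because the inputs are tiny
    return 1 / (1 + np.exp(-x))

np.random.seed(42)
w0 = initialise_array(3, 1)   # shape (3, 1)
b0 = initialise_array(1)      # shape (1,)

X = np.array([[0,1,0], [0,0,1], [1,0,0], [1,1,0], [1,1,1], [0,1,1], [0,1,0]])
preds = _sigmoid(X @ w0 + b0)  # shape (7, 1): one prediction per observation
print(preds.ravel())
```

This is why all the "before training" predictions in section 10 sit close to 0.50.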

[3.3] **TODO**: Create a Numpy array of shape (3,1) called `init_weights` filled with random values using `initialise_array()` and print them.

In [38]:
# Initialise weights with shape (3, 1) since we have 3 input features
init_weights = initialise_array(3, 1)
print("Initial weights:\n", init_weights)

Initial weights:
 [[ 0.00496714]
 [-0.00138264]
 [ 0.00647689]]


[3.4] **TODO**: Create a Numpy array of shape (1,) called `init_bias` filled with a random value using `initialise_array()` and print it.

In [39]:
# Initialise bias as a single scalar value
init_bias = initialise_array(1)
print("Initial bias:\n", init_bias)

Initial bias:
 [0.0152303]


[3.5] Assert statements to check your created variables have the expected shapes

In [40]:
assert init_weights.shape == (3, 1)  # Validate that weights matrix matches input feature size
assert init_bias.shape == (1,)       # Confirm that the bias is a single scalar value in an array

# 4. Define Linear Function
In this section we are going to implement the linear function of a neuron:

<img src='https://drive.google.com/uc?id=1vhfpGffqletFDzMIvWkCMR2jrHE5MBy5' width="500" height="300">

[4.1] **TODO**: Define a function that will perform a dot product on the provided X and weights and add the bias to it

In [41]:
# Compute the linear transformation: weighted sum of inputs plus bias
def linear(X, weights, bias):
    # Perform matrix multiplication between input and weights, then add bias
    return np.dot(X, weights) + bias

[4.2] Assert statements to check your linear function is behaving as expected

In [42]:
# Test the linear function using known weights and biases to verify correctness
test_weights = [[0.37454012],[0.95071431],[0.73199394]]
test_bias = [0.59865848]
assert linear(X=input_set[0], weights=test_weights, bias=test_bias)[0] == 1.54937279
assert linear(X=input_set[1], weights=test_weights, bias=test_bias)[0] == 1.3306524199999998
assert linear(X=input_set[2], weights=test_weights, bias=test_bias)[0] == 0.9731985999999999
assert linear(X=input_set[3], weights=test_weights, bias=test_bias)[0] == 1.9239129099999999
assert linear(X=input_set[4], weights=test_weights, bias=test_bias)[0] == 2.65590685
assert linear(X=input_set[5], weights=test_weights, bias=test_bias)[0] == 2.28136673
assert linear(X=input_set[6], weights=test_weights, bias=test_bias)[0] == 1.54937279
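
For the first observation, `input_set[0] = [0, 1, 0]`, only the second weight contributes. The first assert can therefore be checked by hand:

$$
z = \mathbf{x}^\top \mathbf{w} + b
  = 0 \cdot 0.37454012 + 1 \cdot 0.95071431 + 0 \cdot 0.73199394 + 0.59865848
  = 1.54937279
$$

These asserts compare floats with `==`, so they rely on the computation reproducing exactly the same rounding. `np.isclose` is the usual tolerance-based alternative when that is not guaranteed.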

# 5. Activation Function

In the forward pass, an activation function is applied to the result of the linear function. We are going to implement the sigmoid function and its derivative:

<img src='https://drive.google.com/uc?id=1LK7yjCp4KBICYNvTXzILQUzQbkm7G9xC' width="200" height="100">
<img src='https://drive.google.com/uc?id=1f5jUyw0wgiVufNqveeJVZnQc6pOrDJXD' width="300" height="100">
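
Assuming the two images show the standard logistic sigmoid and its derivative, they can be written in text form as follows. The derivative comes from differentiating $(1 + e^{-x})^{-1}$ and regrouping:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}}
= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
= \sigma(x)\bigl(1 - \sigma(x)\bigr)
$$

At $x = 0$ this gives $\sigma(0) = 0.5$ and $\sigma'(0) = 0.25$, which are the values checked by the asserts in 5.2 and 5.4.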


[5.1] **TODO**: Define a function that will implement the sigmoid function

In [43]:
# Sigmoid activation function (handles large values to avoid overflow)
def sigmoid(x):
    # Handle scalar input explicitly to avoid overflow in extreme values
    if isinstance(x, (int, float)):
        if x >= 500:
            return 1.0
        elif x <= -500:
            return 0.0
        else:
            return 1 / (1 + np.exp(-x))
    else:
        # Clip array inputs to the range [-500, 500] for numerical stability
        x = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x))
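
Clipping at ±500 works. A common alternative avoids overflow without any cutoff: for negative inputs, rewrite $\sigma(x)$ as $e^{x}/(1 + e^{x})$, so that `np.exp` is only ever called on non-positive numbers. The following is a minimal sketch that accepts scalars or arrays. `stable_sigmoid` is a hypothetical name and is not part of the assignment.

```python
import numpy as np

def stable_sigmoid(x):
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in (0, 1], so it can never overflow
    e = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^-x);  x < 0: e^x / (1 + e^x)
    return np.where(x >= 0, 1 / (1 + e), e / (1 + e))

print(stable_sigmoid(np.array([-1e13, -1.0, 0.0, 1.0, 1e13])))
```

For huge inputs `np.exp(-np.abs(x))` underflows to 0.0, which NumPy ignores by default. The result therefore saturates to exactly 0.0 or 1.0, just like the clipped version.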

[5.2] Assert statements to check your sigmoid function is behaving as expected

In [44]:
# Validate sigmoid output for typical and extreme input values
assert sigmoid(0) == 0.5
assert sigmoid(1) == 0.7310585786300049
assert sigmoid(-1) == 0.2689414213699951
assert sigmoid(9999999999999) == 1.0
assert sigmoid(-9999999999999) == 0.0

[5.3] **TODO**: Define a function that will implement the derivative of the sigmoid function

In [45]:
# Derivative of the sigmoid function: used for backpropagation
def sigmoid_derivative(x):
    # Apply the sigmoid function
    s = sigmoid(x)

    # Return the derivative: sigmoid(x) * (1 - sigmoid(x))
    return s * (1 - s)
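
A quick way to confirm that $\sigma(x)(1 - \sigma(x))$ is the right derivative is to compare it against a central finite difference. The sketch below redefines a plain, unclipped sigmoid so it runs on its own. The step size `h` and the grid of test points are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Central difference (f(x + h) - f(x - h)) / (2h) approximates f'(x) with O(h^2) error
h = 1e-5
xs = np.linspace(-6, 6, 25)
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)
analytic = sigmoid_derivative(xs)
max_gap = np.max(np.abs(numeric - analytic))
print(f"largest gap between numeric and analytic derivative: {max_gap:.2e}")
```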

[5.4] Assert statements to check your sigmoid_derivative function is behaving as expected

In [46]:
# Test sigmoid_derivative output for various input values including edge cases
assert sigmoid_derivative(0) == 0.25
assert sigmoid_derivative(1) == 0.19661193324148185
assert sigmoid_derivative(-1) == 0.19661193324148185
assert sigmoid_derivative(9999999999999) == 0.0
assert sigmoid_derivative(-9999999999999) == 0.0

# 6. Forward Pass

Now we have everything we need to implement forward propagation.

[6.1] **TODO**: Define a function that will implement the forward pass (apply linear function on the input followed by the sigmoid activation function)

In [47]:
# Forward pass function: linear output passed through sigmoid
def forward(X, weights, bias):
    # Compute the linear transformation of inputs
    z = linear(X, weights, bias)

    # Apply the sigmoid activation function to the linear output
    a = sigmoid(z)

    # Return the final activated output
    return a
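
Because `linear` uses `np.dot`, the same `forward` also works on the whole dataset at once. A (7, 3) input times (3, 1) weights gives a (7, 1) result, and the (1,) bias is broadcast across all rows. This is how the training loop in section 9 calls it. The following self-contained sketch uses the test parameters from 4.2:

```python
import numpy as np

def forward(X, weights, bias):
    # Linear step followed by the logistic sigmoid
    z = np.dot(X, weights) + bias
    return 1 / (1 + np.exp(-z))

input_set = np.array([[0,1,0], [0,0,1], [1,0,0], [1,1,0], [1,1,1], [0,1,1], [0,1,0]])
test_weights = np.array([[0.37454012], [0.95071431], [0.73199394]])
test_bias = np.array([0.59865848])

# One call for all 7 observations versus one call per observation
batch_preds = forward(input_set, test_weights, test_bias)                        # shape (7, 1)
row_preds = np.array([forward(x, test_weights, test_bias) for x in input_set])   # shape (7, 1)
print(batch_preds.shape)
```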

[6.2] Assert statements to check your forward function is behaving as expected

In [48]:
# Confirm that forward pass produces expected outputs with test inputs
assert forward(X=input_set[0], weights=test_weights, bias=test_bias)[0] == 0.8248231247647452
assert forward(X=input_set[1], weights=test_weights, bias=test_bias)[0] == 0.7909485322272701
assert forward(X=input_set[2], weights=test_weights, bias=test_bias)[0] == 0.7257565873271445
assert forward(X=input_set[3], weights=test_weights, bias=test_bias)[0] == 0.8725741389540382
assert forward(X=input_set[4], weights=test_weights, bias=test_bias)[0] == 0.9343741240208852
assert forward(X=input_set[5], weights=test_weights, bias=test_bias)[0] == 0.9073220375080315
assert forward(X=input_set[6], weights=test_weights, bias=test_bias)[0] == 0.8248231247647452

# 7. Calculate Error

After the forward pass, the neural network calculates the error between its predictions (the output of the forward pass) and the actual targets.

[7.1] **TODO**: Define a function that will implement the error calculation (difference between predictions and actual targets)

In [49]:
# [Loss Calculation] Compute total prediction error across all samples
# Measures the sum of differences between predicted and actual labels
def calculate_error(actual, pred):
    # Subtract true labels from predictions and sum the result to get total error
    return np.sum(np.array(pred) - np.array(actual))

[7.2] Assert statements to check your calculate_error function is behaving as expected

In [50]:
# Validate error calculation by comparing predictions against actual labels
test_actual = np.array([0,0,0,1,1,1])
assert calculate_error(actual=test_actual, pred=[0,0,0,1,1,1]).sum() == 0
assert calculate_error(actual=test_actual, pred=[0,0,0,1,1,0]).sum() == -1
assert calculate_error(actual=test_actual, pred=[0,0,0,0,0,0]).sum() == -3
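
`calculate_error` above reduces the differences to a single scalar with `np.sum`. In the training loop, that scalar is then multiplied by every sample's derivative. Each observation therefore receives the same error value, and positive and negative errors can cancel each other out. If a per-observation error is wanted instead, a variant that returns the element-wise difference still passes the three asserts in 7.2, because they call `.sum()` themselves. `per_sample_error` and `summed_error` are hypothetical names used for illustration:

```python
import numpy as np

def per_sample_error(actual, pred):
    # Element-wise difference: one error term per observation, same shape as the inputs
    return np.asarray(pred) - np.asarray(actual)

def summed_error(actual, pred):
    # Behaviour of calculate_error above: all differences collapsed into one number
    return np.sum(np.asarray(pred) - np.asarray(actual))

actual = np.array([[1], [0]])
pred = np.array([[0.7], [0.3]])
print(per_sample_error(actual, pred).ravel())  # one error per row: about -0.3 and +0.3
print(summed_error(actual, pred))              # the two errors (almost exactly) cancel
```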

# 8. Calculate Gradients
Once the error has been calculated, a neural network uses this information to update its weights accordingly.

[8.1] Let's create a function that calculates the gradients using the sigmoid derivative function and the chain rule.
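
The chain rule can be written out for a single observation. Take the prediction $a = \sigma(z)$ with $z = \mathbf{x}^\top \mathbf{w} + b$, and assume a squared-error loss $L = \tfrac{1}{2}(a - y)^2$. The loss is an assumption here; the notebook itself only defines the difference $a - y$. The gradients are then:

$$
\frac{\partial L}{\partial \mathbf{w}}
= \frac{\partial L}{\partial a}\,\frac{\partial a}{\partial z}\,\frac{\partial z}{\partial \mathbf{w}}
= (a - y)\,\sigma'(z)\,\mathbf{x},
\qquad
\frac{\partial L}{\partial b} = (a - y)\,\sigma'(z),
\qquad
\sigma'(z) = a(1 - a)
$$

Summed over all observations, with the rows of $X$ stacked, the weight gradient becomes $X^\top\bigl[(\mathbf{a} - \mathbf{y}) \odot \mathbf{a} \odot (1 - \mathbf{a})\bigr]$. This has the same $X^\top \delta$ form as `np.dot(input.T, z_del)`.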

In [51]:
# [Backpropagation] Compute the gradient of the loss with respect to weights and bias
# Uses the chain rule: derivative of loss w.r.t. prediction, then w.r.t. weights
def calculate_gradients(pred, error, input):
    # Compute derivative of the prediction (sigmoid output)
    dpred = sigmoid_derivative(pred)

    # Multiply error with derivative to get delta for each sample
    z_del = error * dpred

    # Calculate gradient of weights using input transposed
    gradients = np.dot(input.T, z_del)

    # Return both gradients and delta to be used for updating weights and bias
    return gradients, z_del
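
The sketch below checks the chain-rule gradient numerically on the whole dataset, assuming the summed squared-error loss $L = \tfrac{1}{2}\sum_i (a_i - y_i)^2$. The delta here uses $a(1 - a)$, which is $\sigma'$ evaluated at the linear output $z$. By contrast, `calculate_gradients` above passes `pred`, which is already a sigmoid output, into `sigmoid_derivative`. It therefore computes $\sigma'(a) = \sigma(a)(1 - \sigma(a))$ rather than $a(1 - a)$. Because $\sigma'$ is always positive, each sample's delta still has the same sign as its error, but it is not the exact chain-rule term. All function names below are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(X, y, w, b):
    # Summed squared error: 0.5 * sum((a - y)^2)
    a = sigmoid(X @ w + b)
    return 0.5 * np.sum((a - y) ** 2)

def analytic_gradients(X, y, w, b):
    a = sigmoid(X @ w + b)          # predictions, shape (n, 1)
    delta = (a - y) * a * (1 - a)   # dL/dz per observation, shape (n, 1)
    return X.T @ delta, np.sum(delta, axis=0)  # dL/dw (3, 1), dL/db (1,)

def numeric_gradients(X, y, w, b, h=1e-6):
    # Central differences, perturbing one parameter at a time
    gw = np.zeros_like(w)
    for i in range(w.shape[0]):
        step = np.zeros_like(w)
        step[i, 0] = h
        gw[i, 0] = (loss(X, y, w + step, b) - loss(X, y, w - step, b)) / (2 * h)
    gb = np.array([(loss(X, y, w, b + h) - loss(X, y, w, b - h)) / (2 * h)])
    return gw, gb

X = np.array([[0,1,0], [0,0,1], [1,0,0], [1,1,0], [1,1,1], [0,1,1], [0,1,0]], dtype=float)
y = np.array([[1], [0], [0], [1], [1], [0], [1]], dtype=float)
w = np.array([[0.37454012], [0.95071431], [0.73199394]])
b = np.array([0.59865848])

gw, gb = analytic_gradients(X, y, w, b)
nw, nb = numeric_gradients(X, y, w, b)
print(np.max(np.abs(gw - nw)), np.max(np.abs(gb - nb)))
```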

# 9. Training

Now that we have built all the components of a neural network, we can finally train it on our dataset.

[9.1] Create 2 variables called `weights` and `bias` that will respectively take the value of `init_weights` and `init_bias`

In [52]:
# Initialize model parameters with copies of the initial weights and bias
weights = init_weights.copy()
bias = init_bias.copy()

[9.2] Create a variable called `lr` that will be used as the learning rate for updating the weights

In [53]:
# Define the learning rate used in gradient descent updates
lr = 0.5

[9.3] Create a variable called `epochs` with the value 10000. This is the number of times the neural network will process the entire dataset and update its weights.

In [54]:
# Set the number of training iterations (epochs) for full dataset passes
epochs = 10000  # Number of full passes over the dataset during training

[9.4] Create a for loop that will perform the training of our neural network.

In [55]:
# [Training Loop] Repeatedly update weights and bias over multiple epochs to minimize error
for epoch in range(epochs):
    inputs = input_set  # Use the full training dataset for each epoch

    # Perform forward pass to get predicted outputs for current weights and bias
    z = forward(X=inputs, weights=weights, bias=bias)

    # Calculate the difference between predictions and actual labels
    error = calculate_error(actual=labels, pred=z)

    # Compute gradients for weights and the intermediate delta for bias
    gradients, z_del = calculate_gradients(pred=z, error=error, input=inputs)

    # Update weights using gradient descent rule
    weights = weights - lr * gradients

    # Update bias by subtracting each training sample's scaled delta in turn
    for num in z_del:
        bias = bias - lr * num
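
The inner `for num in z_del` loop subtracts each observation's delta from the bias one after another. This is the same as subtracting the learning rate times the sum of the deltas. The sketch below shows a vectorised one-liner on made-up deltas; the values are illustrative only.

```python
import numpy as np

lr = 0.5
bias = np.array([0.0152303])
# Illustrative deltas, shape (7, 1), as calculate_gradients would return for 7 samples
z_del = np.array([[0.12], [-0.05], [0.03], [-0.20], [0.07], [0.10], [-0.01]])

# Loop form, as in the training cell
looped = bias.copy()
for num in z_del:
    looped = looped - lr * num

# Vectorised form: sum the deltas over the sample axis, keeping shape (1,)
vectorised = bias - lr * np.sum(z_del, axis=0)
print(looped, vectorised)
```

The two results can differ in the last few bits because the additions happen in a different order. This is why a tolerance-based comparison such as `np.allclose` is the safer check.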

[9.5] **TODO**: Print the final values of `weights` and `bias`

In [56]:
# Display final weights and bias after training
print("Final weights:\n", weights)
print("Final bias:\n", bias)

Final weights:
 [[0.09721747]
 [2.34714834]
 [0.10194134]]
Final bias:
 [-1.48227029]


# 10. Compare before and after training

Let's compare the predictions of our neural network before training (using `init_weights` and `init_bias`) and after training (using `weights` and `bias`).

[10.1] Create a function that displays a single observation from the dataset (selected by its index) together with the error, the actual target, and the prediction.

In [57]:
# [Final Comparison] Utility function to display the model's prediction, actual label,
# and error for a specific input index. Helps assess model performance before and after training.
def compare_pred(weights, bias, index, X, y):
    # Perform a forward pass to get the model's prediction for a specific input
    pred = forward(X=X[index], weights=weights, bias=bias)

    # Extract the actual label for the same input
    actual = y[index]

    # Calculate the prediction error compared to the true label
    error = calculate_error(actual, pred)

    # Display the input, prediction, actual label, and error
    print(f"{X[index]} - Error {error} - Actual: {actual} - Pred: {pred}")
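
Calling `compare_pred` once per index works for seven rows. A vectorised summary can instead turn all predictions into class labels and report accuracy in one call. `accuracy` is a hypothetical helper, and the 0.5 threshold is an assumption, since the notebook never thresholds its outputs.

```python
import numpy as np

def accuracy(weights, bias, X, y, threshold=0.5):
    # Forward pass for every row at once, then threshold the sigmoid outputs
    probs = 1 / (1 + np.exp(-(np.dot(X, weights) + bias)))
    predicted_labels = (probs >= threshold).astype(int)
    return np.mean(predicted_labels == y)

input_set = np.array([[0,1,0], [0,0,1], [1,0,0], [1,1,0], [1,1,1], [0,1,1], [0,1,0]])
labels = np.array([[1], [0], [0], [1], [1], [0], [1]])

# Final parameters printed in 9.5
trained_w = np.array([[0.09721747], [2.34714834], [0.10194134]])
trained_b = np.array([-1.48227029])
print(accuracy(trained_w, trained_b, input_set, labels))

# Hand-picked parameters that separate all seven rows (illustrative, not learned)
hand_w = np.array([[0.8], [2.0], [-1.5]])
hand_b = np.array([-1.0])
print(accuracy(hand_w, hand_b, input_set, labels))
```

With the trained parameters, only index 5 (`[0 1 1]`) lands on the wrong side of 0.5, matching the comparison in 10.7, so the accuracy is 6/7. The hand-picked parameters show that this small dataset is linearly separable, which means a single perceptron can in principle classify all seven rows correctly.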

[10.2] Compare the results on the first observation (index 0)

In [58]:
# Compare predictions before and after training for all data points
# This helps visualize improvement in prediction after the model is trained
# We compare the predicted value, actual label, and error at each index

In [59]:
# Evaluate model performance on the first input sample (index 0)
compare_pred(weights=init_weights, bias=init_bias, index=0, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=0, X=input_set, y=labels)

[0 1 0] - Error -0.4965381414315283 - Actual: [1] - Pred: [0.50346186]
[0 1 0] - Error -0.2963211896026101 - Actual: [1] - Pred: [0.70367881]


[10.3] Compare the results on the second observation (index 1)

In [60]:
# Evaluate model performance on the second input sample (index 1)
compare_pred(weights=init_weights, bias=init_bias, index=1, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=1, X=input_set, y=labels)

[0 0 1] - Error 0.5054265829032935 - Actual: [0] - Pred: [0.50542658]
[0 0 1] - Error 0.20095617436289462 - Actual: [0] - Pred: [0.20095617]


[10.4] Compare the results on the third observation (index 2)

In [61]:
# Evaluate model performance on the third input sample (index 2)
compare_pred(weights=init_weights, bias=init_bias, index=2, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=2, X=input_set, y=labels)

[1 0 0] - Error 0.5050491883789925 - Actual: [0] - Pred: [0.50504919]
[1 0 0] - Error 0.20019872037064 - Actual: [0] - Pred: [0.20019872]


[10.5] Compare the results on the fourth observation (index 3)

In [62]:
# Evaluate model performance on the fourth input sample (index 3)
compare_pred(weights=init_weights, bias=init_bias, index=3, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=3, X=input_set, y=labels)

[1 1 0] - Error -0.49529643948225954 - Actual: [1] - Pred: [0.50470356]
[1 1 0] - Error -0.2764588337939673 - Actual: [1] - Pred: [0.72354117]


[10.6] Compare the results on the fifth observation (index 4)

In [63]:
# Evaluate model performance on the fifth input sample (index 4)
compare_pred(weights=init_weights, bias=init_bias, index=4, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=4, X=input_set, y=labels)

[1 1 1] - Error -0.4936774164107015 - Actual: [1] - Pred: [0.50632258]
[1 1 1] - Error -0.25653876259383823 - Actual: [1] - Pred: [0.74346124]


[10.7] Compare the results on the sixth observation (index 5)

In [64]:
# Evaluate model performance on the sixth input sample (index 5)
compare_pred(weights=init_weights, bias=init_bias, index=5, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=5, X=input_set, y=labels)

[0 1 1] - Error 0.5050809603280083 - Actual: [0] - Pred: [0.50508096]
[0 1 1] - Error 0.7244850808594092 - Actual: [0] - Pred: [0.72448508]


[10.8] Compare the results on the seventh observation (index 6)

In [65]:
# Evaluate model performance on the seventh input sample (index 6)
compare_pred(weights=init_weights, bias=init_bias, index=6, X=input_set, y=labels)
compare_pred(weights=weights, bias=bias, index=6, X=input_set, y=labels)

[0 1 0] - Error -0.4965381414315283 - Actual: [1] - Pred: [0.50346186]
[0 1 0] - Error -0.2963211896026101 - Actual: [1] - Pred: [0.70367881]


Please submit this notebook to Canvas. Name it following this rule: *assignment1-partA-\<student_id\>.ipynb*