# 4-bit binary classification by hand

### Introduction 

The aim of this project is to build and train a simple multilayer perceptron (MLP) from scratch, using only Python's standard `math` and `random` modules, with every calculation written out by hand.

The dataset for this classification problem is the 4-bit binary representation of the numbers 0-15, each paired with a binary label.

Thus, each input in `X` (`x_train` in the code) will be an array of 4 bits, and the target variable `y` will be 1 if the number represented is odd and 0 if it is even.
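
As a quick illustration of this encoding, here is a minimal sketch; the helper name `encode` is hypothetical and not part of the notebook. It also shows that the label always equals the last bit, so the network essentially has to learn to read $x_4$.

```python
def encode(num):
    """Return the 4-bit representation of num (most significant bit first) and its parity label."""
    bits = [int(bit) for bit in format(num, '04b')]
    label = num % 2  # 1 for odd, 0 for even
    return bits, label

bits, label = encode(5)
print(bits, label)  # [0, 1, 0, 1] 1

# The label coincides with the least significant bit for every number in 0-15
assert all(encode(n)[0][-1] == encode(n)[1] for n in range(16))
```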

Below is the MLP architecture that will be used:

![image.png](attachment:image.png)

As seen above, the network uses a single hidden layer of 4 neurons and a single output neuron, since the task is binary classification. The hidden layer uses a ReLU activation function and the output layer a sigmoid.

### Forward Pass Overview

**Initialisation:**

Let $n = 4$ (number of features and neurons in the hidden layer)

Hidden Layer Weights and Biases:
$$
\begin{align*}
& w_{11}, w_{12}, w_{13}, w_{14} \sim \text{Uniform}(-0.25, 0.25), & b_1 = 0 \\
& w_{21}, w_{22}, w_{23}, w_{24} \sim \text{Uniform}(-0.25, 0.25), & b_2 = 0 \\
& w_{31}, w_{32}, w_{33}, w_{34} \sim \text{Uniform}(-0.25, 0.25), & b_3 = 0 \\
& w_{41}, w_{42}, w_{43}, w_{44} \sim \text{Uniform}(-0.25, 0.25), & b_4 = 0 \\
\end{align*}
$$

Output Layer Weights and Bias:
$$
\begin{align*}
& w_{o1}, w_{o2}, w_{o3}, w_{o4} \sim \text{Uniform}(-0.25, 0.25), & b_o = 0 \\
\end{align*}
$$

**Forward Pass for a Single Input Sample $x = [x_1, x_2, x_3, x_4]$**:

Hidden Layer Calculations:
$$
\begin{align*}
z_1 & = w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + w_{14}x_4 + b_1 \\
a_1 & = \max(0, z_1) \\
z_2 & = w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + w_{24}x_4 + b_2 \\
a_2 & = \max(0, z_2) \\
z_3 & = w_{31}x_1 + w_{32}x_2 + w_{33}x_3 + w_{34}x_4 + b_3 \\
a_3 & = \max(0, z_3) \\
z_4 & = w_{41}x_1 + w_{42}x_2 + w_{43}x_3 + w_{44}x_4 + b_4 \\
a_4 & = \max(0, z_4) \\
\end{align*}
$$

Output Layer Calculation:
$$
\begin{align*}
z_o & = w_{o1}a_1 + w_{o2}a_2 + w_{o3}a_3 + w_{o4}a_4 + b_o \\
\hat{y} & = \frac{1}{1 + e^{-z_o}} \\
\end{align*}
$$
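
To make the equations above concrete, the sketch below runs one forward pass for a single sample. The weights are hypothetical, hand-picked values inside the initialisation range, not values from the notebook, and the printed numbers are only approximate.

```python
import math

# Hypothetical hand-picked weights (not from the notebook), chosen within [-0.25, 0.25]
h_W = [
    [0.1, 0.2, -0.1, 0.2],
    [-0.2, 0.1, 0.1, -0.2],
    [0.05, -0.1, 0.2, 0.1],
    [0.2, 0.2, 0.2, 0.2],
]
h_B = [0, 0, 0, 0]
o_W = [0.25, -0.1, 0.1, 0.25]
o_B = 0

x = [0, 1, 0, 1]  # binary representation of 5

# Hidden layer: z_i = sum_j w_ij * x_j + b_i, then a_i = max(0, z_i)
z = [sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i for w_i, b_i in zip(h_W, h_B)]
a = [max(0, z_i) for z_i in z]

# Output layer: z_o = sum_i w_oi * a_i + b_o, then sigmoid
z_o = sum(w_oi * a_i for w_oi, a_i in zip(o_W, a)) + o_B
y_hat = 1 / (1 + math.exp(-z_o))

print([round(z_i, 3) for z_i in z])  # approximately [0.4, -0.1, 0.0, 0.4]
print(round(y_hat, 4))               # approximately 0.5498
```

Neurons 2 and 3 are switched off by the ReLU here, so only $a_1$ and $a_4$ contribute to $z_o$.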


### Data Generation

In [1]:
import math
import random
# Data
numbers = range(16)
x_train = [[int(bit) for bit in format(num, '04b')] for num in numbers]
y = [num % 2 for num in numbers]

### Forward Pass Implementation

In [2]:
# Initialising weights and biases for the hidden layer
def init_w_b(n_neurons, n_features):
    w = []
    b = []
    for _ in range(n_neurons):
        # randint(-250, 250) * .001 draws from [-0.25, 0.25] in steps of 0.001
        w.append([random.randint(-250, 250) * .001 for _ in range(n_features)])
        b.append(0)

    return w, b

# Initialising weights and biases for the output layer
def init_wo_bo(n_neurons_in_prev_h):
    w_o = [random.randint(-250, 250) * .001 for _ in range(n_neurons_in_prev_h)]
    b_o = 0
    return w_o, b_o

In [25]:
def forward(x_train, n_neurons, h_W, h_B, o_W, o_B):   
    y_hat = []
    for x in x_train:    
        # Input Layer -> Hidden Layer
        h1_output = []
        for i in range(n_neurons):
            w_i = h_W[i] # ith weights
            b_i = h_B[i] # ith bias
            z_i = 0
            for j in range(len(x)):
                z_i += x[j] * w_i[j]
            z_i += b_i
            a_i = max(0, z_i) # ReLU
            h1_output.append(a_i)

        # Hidden Layer -> Output Layer
        z_o = 0
        for i in range(len(h1_output)):
            z_o += h1_output[i] * o_W[i]
        z_o += o_B
        pred = 1 / (1 + math.exp(-z_o)) # Sigmoid  
        y_hat.append(pred)
    
    return y_hat

### Loss function and backprop overview

The loss function I've chosen is binary cross-entropy (BCE), also known as log loss:
$$
L(y_{\text{true}}, y_{\text{pred}}) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_{\text{true}}^{(i)} \cdot \log(y_{\text{pred}}^{(i)}) + (1 - y_{\text{true}}^{(i)}) \cdot \log(1 - y_{\text{pred}}^{(i)}) \right)
$$
Where:

- $N$ is the number of samples in the dataset.
- $y_{\text{true}}^{(i)}$ is the true label (0 or 1) for the $i^{th}$ sample.
- $y_{\text{pred}}^{(i)}$ is the predicted probability of the positive class (class 1) for the $i^{th}$ sample, i.e. the network output (`y_hat` in the code).
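
For the backpropagation half of this section, the chain-rule derivation for this architecture can be sketched as follows. This is a sketch rather than the author's own derivation: it reuses the symbols from the forward pass, writes $\hat{y}$ for $y_{\text{pred}}$ and $y$ for $y_{\text{true}}$, treats $L$ as the loss of a single sample (the full gradient averages over the $N$ samples), and assumes the common convention that the ReLU derivative is 0 at $z_i = 0$.

$$
\begin{align*}
\frac{\partial L}{\partial \hat{y}} & = -\left( \frac{y}{\hat{y}} - \frac{1 - y}{1 - \hat{y}} \right) \\
\frac{\partial \hat{y}}{\partial z_o} & = \hat{y}(1 - \hat{y}) \\
\frac{\partial L}{\partial z_o} & = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_o} = \hat{y} - y \\
\frac{\partial L}{\partial w_{oi}} & = (\hat{y} - y)\, a_i, \qquad \frac{\partial L}{\partial b_o} = \hat{y} - y \\
\frac{\partial L}{\partial z_i} & = (\hat{y} - y)\, w_{oi} \cdot \mathbb{1}[z_i > 0] \\
\frac{\partial L}{\partial w_{ij}} & = \frac{\partial L}{\partial z_i}\, x_j, \qquad \frac{\partial L}{\partial b_i} = \frac{\partial L}{\partial z_i}
\end{align*}
$$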


### Binary Cross Entropy Implementation

In [18]:
def L(y, y_hat, n_samples):
    s = 0
    for i in range(n_samples):
        s += (y[i] * math.log(y_hat[i]) + (1 - y[i]) * math.log(1 - y_hat[i]))
    loss = -1/n_samples * s
    return loss
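
One practical caveat: `math.log(0)` raises a `ValueError`, so the loss above fails if a prediction ever reaches exactly 0 or 1. A possible guard, sketched here under the assumption that clamping predictions to a small `eps` is acceptable (the name `bce_clamped` and the default `eps` are hypothetical, not from the notebook):

```python
import math

def bce_clamped(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy with predictions clamped to [eps, 1 - eps]."""
    total = 0.0
    for y_i, p_i in zip(y, y_hat):
        p_i = min(max(p_i, eps), 1 - eps)  # avoid log(0)
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return -total / len(y)

print(round(bce_clamped([1, 0], [0.9, 0.2]), 4))  # approximately 0.1643
print(bce_clamped([1, 0], [1.0, 0.0]))            # finite and close to 0, instead of a ValueError
```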

In [32]:
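def calculate_gradients(x_train, y, y_hat, h_W, h_B, o_W, o_B, n_neurons):
    n_samples = len(x_train)
    n_features = len(x_train[0])
    gradients_h_W = [[0.0] * n_features for _ in range(n_neurons)]
    gradients_h_B = [0.0] * n_neurons
    gradients_o_W = [0.0] * n_neurons
    gradients_o_B = 0.0

    for i in range(n_samples): # For each training sample
        x = x_train[i]
        y_i = y[i]         # separate names, so the lists y and y_hat are not overwritten
        y_hat_i = y_hat[i]

        # Recompute hidden pre-activations and activations (forward() only returns predictions)
        z_h = [sum(h_W[j][k] * x[k] for k in range(n_features)) + h_B[j] for j in range(n_neurons)]
        a_h = [max(0, z_j) for z_j in z_h]

        # dLoss w.r.t pred
        dL_dy_hat = - (y_i / y_hat_i - (1 - y_i) / (1 - y_hat_i))

        # Output layer gradients
        # dy_hat w.r.t z_o (output before sigmoid)
        dy_hat_dz_o = y_hat_i * (1 - y_hat_i)
        dL_dz_o = dL_dy_hat * dy_hat_dz_o / n_samples # 1/N comes from the mean in the loss

        # o_w o_b gradients
        for j in range(n_neurons):
            gradients_o_W[j] += dL_dz_o * a_h[j]
        gradients_o_B += dL_dz_o

        # Hidden layer gradients (ReLU derivative taken as 1 if z > 0, else 0)
        for j in range(n_neurons):
            dL_dz_j = dL_dz_o * o_W[j] * (1 if z_h[j] > 0 else 0)
            for k in range(n_features):
                gradients_h_W[j][k] += dL_dz_j * x[k]
            gradients_h_B[j] += dL_dz_j

    return gradients_h_W, gradients_h_B, gradients_o_W, gradients_o_B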

In [None]:
n_neurons = 4
weights, biases = init_w_b(n_neurons, 4)
w_o, b_o = init_wo_bo(n_neurons)
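
Before using backpropagated gradients to update these weights, it can help to sanity-check the chain rule numerically. The sketch below is not part of the notebook, and the helper names `sigmoid`, `bce_single` and `numerical_grad` are hypothetical. It compares the analytic output-layer derivative $\hat{y} - y$ against a central finite-difference estimate for a few arbitrary values.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def bce_single(y, z_o):
    """Binary cross-entropy for one sample, as a function of the pre-sigmoid output z_o."""
    p = sigmoid(z_o)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def numerical_grad(y, z_o, h=1e-6):
    """Central finite-difference estimate of dL/dz_o."""
    return (bce_single(y, z_o + h) - bce_single(y, z_o - h)) / (2 * h)

for y, z_o in [(1, 0.2), (0, 0.2), (1, -1.5), (0, 3.0)]:
    analytic = sigmoid(z_o) - y  # chain rule: dL/dy_hat * dy_hat/dz_o simplifies to y_hat - y
    numeric = numerical_grad(y, z_o)
    print(y, z_o, round(analytic, 6), round(numeric, 6))
    assert abs(analytic - numeric) < 1e-6
```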