# Neural Net
// TODO: Write an introduction here

### 1. Data Preprocessing
Every ML project starts with data, so let's import it. Typically, we would also clean and wrangle the data at this point, but since our data is already clean and well-structured, we only make sure that each column is read with the correct data type.

In [17]:
import csv
import math
import random

# Read CSV file without checking for missing values
data = []
with open("data/full_data.csv", newline="") as csvfile:
    reader = csv.reader(csvfile)
    headers = next(reader)  # Read the header row (reused later when saving the splits)
    for row in reader:
        # Convert relevant columns to numerical values
        income = float(row[1])
        age = float(row[2])
        loan = float(row[3])
        target = int(row[4])
        # Append the row to the data
        data.append([row[0], income, age, loan, target])
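
The cell above assumes that every row parses cleanly. If that assumption ever stops holding, a tolerant reader along the following lines could skip malformed rows instead of raising. This is only a sketch: `parse_row` is a hypothetical helper, the first column is assumed to be an identifier, and the sample values are made up.

```python
import csv
import io


def parse_row(row):
    """Return [id, income, age, loan, target], or None if the row is malformed."""
    try:
        return [row[0], float(row[1]), float(row[2]), float(row[3]), int(row[4])]
    except (IndexError, ValueError):
        return None


# A small in-memory sample standing in for the real CSV file (made-up values).
sample = io.StringIO(
    "id,income,age,loan,class\n"
    "1,50000.0,35.5,2000.0,0\n"
    "2,42000.0,,1500.0,1\n"  # missing age, so this row is skipped
    "3,61000.0,48.2,8000.0,1\n"
)
reader = csv.reader(sample)
sample_headers = next(reader)
parsed = [parse_row(row) for row in reader]
clean_rows = [row for row in parsed if row is not None]
print(len(clean_rows))  # 2
```

Returning `None` for bad rows keeps the decision about what to do with them (drop, log, or impute) outside the parsing function.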

### 2. Split Data into Test & Training Sets

In this step, we will split the data into two parts: training and test data. The training data will be used to teach the model, while the test data will help us check how well the model works on new, unseen data. This way, we can ensure that the model is not just memorizing the training data but is actually learning patterns that can be applied in real situations.

A model can "memorize" the data if trained for too long on a small dataset. This is called overfitting. When overfitting happens, the model performs very well on the training data but struggles with new, unseen data.

In [31]:
# Shuffle the dataset
random.seed(42)  # For reproducibility
random.shuffle(data)

# Define the split ratio (80% training, 20% testing)
split_ratio = 0.8
split_index = int(len(data) * split_ratio)


# Split the data into training and testing sets
train_data = data[:split_index]
test_data = data[split_index:]


# Save the training data to a CSV file
with open("data/train_data.csv", "w", newline="") as trainfile:
    writer = csv.writer(trainfile)
    writer.writerow(headers)  # Write the header
    writer.writerows(train_data)  # Write the training data


# Save the testing data to a CSV file
with open("data/test_data.csv", "w", newline="") as testfile:
    writer = csv.writer(testfile)
    writer.writerow(headers)  # Write the header
    writer.writerows(test_data)  # Write the testing data
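
The shuffle-and-slice logic above can also be wrapped in a small reusable function. The sketch below uses a hypothetical `train_test_split` helper. Unlike the cell above, it shuffles a copy with its own `random.Random` instance, so it neither reorders the original list nor changes the state of the global random generator.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    split_index = int(len(shuffled) * train_fraction)
    return shuffled[:split_index], shuffled[split_index:]


train_part, test_part = train_test_split(range(10))
print(len(train_part), len(test_part))  # 8 2
```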


### 3. Feature Extraction and Target Variable Selection

Now that we have divided our data into training and test sets, let's pick the input columns (`income`, `age`, and `loan`) and the column we want to predict, which is the last column (`class`).

The columns `income`, `age`, and `loan` are selected as features, which are stored in `X_train` and `X_test`. These features will be used as input to train and evaluate the model.

Additionally, we select the last column (`class`) as the target variable, which is stored in `y_train` and `y_test`. The target variable represents the outcome or category we want the model to predict.

In [19]:
# Extract features and target from training data
X_train = [[row[1], row[2], row[3]] for row in train_data]  # income, age, loan
y_train = [row[4] for row in train_data]  # class (target)

# Extract features and target from testing data
X_test = [[row[1], row[2], row[3]] for row in test_data]  # income, age, loan
y_test = [row[4] for row in test_data]  # class (target)


### 4. Min-Max Scaling

Input features can differ greatly in scale. For example, the `loan` column can range from hundreds to millions, while the `age` column will almost always stay under a hundred.

As a result, features with large values can overshadow the smaller ones. To prevent that, we normalize the data using Min-Max scaling. This technique rescales each feature so that its values range between 0 and 1, which puts all features on a comparable scale during learning.

### $x_{scaled} = \frac{x - x_{min}}{x_{max} - x_{min}}$


In [20]:
# Write min-max scaling function, instead of using third-party libraries
def min_max_scaling(X):
    min_vals = [min(col) for col in zip(*X)]
    max_vals = [max(col) for col in zip(*X)]

    X_scaled = []
    for row in X:
        scaled_row = [
            (row[i] - min_vals[i]) / (max_vals[i] - min_vals[i])
            for i in range(len(row))
        ]
        X_scaled.append(scaled_row)

    return X_scaled

# Normalize test and train data
X_train_scaled = min_max_scaling(X_train)
X_test_scaled = min_max_scaling(X_test)
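
Note that the cell above scales the training and test sets independently, so each set uses its own minimum and maximum. A common alternative is to compute the ranges on the training set only and reuse them for the test set, so that the same raw value always maps to the same scaled value. The sketch below illustrates that variant. `fit_min_max` and `apply_min_max` are hypothetical helpers, and mapping constant columns to 0.0 is an arbitrary choice made here to avoid division by zero.

```python
def fit_min_max(X):
    """Return per-column (min_vals, max_vals) computed from X."""
    columns = list(zip(*X))
    return [min(col) for col in columns], [max(col) for col in columns]


def apply_min_max(X, min_vals, max_vals):
    """Scale X with the given ranges; constant columns map to 0.0."""
    scaled = []
    for row in X:
        scaled.append([
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, lo, hi in zip(row, min_vals, max_vals)
        ])
    return scaled


# Made-up values: the second column is constant on purpose.
train_rows = [[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]]
test_rows = [[25.0, 1.0]]
mins, maxs = fit_min_max(train_rows)
print(apply_min_max(train_rows, mins, maxs))  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
print(apply_min_max(test_rows, mins, maxs))   # [[0.75, 0.0]]
```

With this approach, test values outside the training range fall outside [0, 1], which is expected.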

# 5. Initialize Weights and Biases
In this step, we initialize the weights and biases for our neural network. This is a crucial step as it sets the starting point for the training process.

1. **Function to Initialize Random Weights:**
   - We define a function `random_matrix(rows, cols)` that generates a matrix of random values with the specified number of rows and columns. This function does not use NumPy and relies on Python's built-in `random` module.

2. **Define Network Architecture:**
   - `input_size`: The number of input features, which is determined by the length of the first row in `X_train_scaled`. In this case, we have 3 input features: income, age, and loan.
   - `hidden_size`: The number of neurons in the hidden layer. We set this to 4.
   - `output_size`: The number of output neurons. We set this to 1, as we are predicting a single value.

3. **Initialize Weights and Biases:**
   - `W1`: A matrix of random weights connecting the input layer to the hidden layer. It has dimensions `input_size x hidden_size`.
   - `b1`: A list of random biases for the hidden layer. It has a length of `hidden_size`.
   - `W2`: A matrix of random weights connecting the hidden layer to the output layer. It has dimensions `hidden_size x output_size`.
   - `b2`: A list of random biases for the output layer. It has a length of `output_size`.

By initializing the weights and biases randomly, we ensure that the neural network starts with a diverse set of parameters, which helps in breaking symmetry and allows the network to learn effectively during training.

In [21]:
# Function to initialize random weights without using numpy
def random_matrix(rows, cols):
    return [[random.random() for _ in range(cols)] for _ in range(rows)]


# Initialize weights and biases
input_size = len(X_train_scaled[0])  # 3 input features: income, age, loan
hidden_size = 4
output_size = 1


# Randomly initialize weights and biases
W1 = random_matrix(input_size, hidden_size)
b1 = [random.random() for _ in range(hidden_size)]
W2 = random_matrix(hidden_size, output_size)
b2 = [random.random() for _ in range(output_size)]
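
Because `random.random()` draws from [0, 1), every starting weight above is positive. One widely used alternative, which is not what this notebook does, is Glorot (Xavier) uniform initialization: weights are drawn uniformly from `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The sketch below shows the idea. `glorot_matrix` is a hypothetical helper name.

```python
import math
import random


def glorot_matrix(rows, cols, rng=random):
    """Return a rows x cols matrix drawn uniformly from [-limit, limit]."""
    limit = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-limit, limit) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)             # local generator, for reproducibility
W1_alt = glorot_matrix(3, 4, rng)  # input_size=3, hidden_size=4
b1_alt = [0.0] * 4                 # biases are often simply started at zero
```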

# 6. Matrix Multiplication | Dot Product

Although in a real project we would use a third-party library to perform matrix multiplication, for learning purposes, we will implement it using just Python.

### Inputs

- **A**: A matrix with dimensions $m \times n$.
- **B**: A matrix with dimensions $n \times p$.

### Output

- A matrix with dimensions $m \times p$, where each element is the dot product of the corresponding row from **A** and column from **B**.

### Matrix Multiplication Equation

The element at position $(i, j)$ in the resulting matrix is calculated as:

$$
C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}
$$

Where:

- $A_{ik}$ is the element from the $i$-th row and $k$-th column of matrix **A**.
- $B_{kj}$ is the element from the $k$-th row and $j$-th column of matrix **B**.
- $C_{ij}$ is the element at the $i$-th row and $j$-th column of the resulting matrix.

### Additional Resources

For more information on matrix multiplication, you can refer to the [Wikipedia article on Matrix Multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication).


In [22]:
# Matrix multiplication function (for dot product)
def matrix_multiply(A, B):
    return [
        [sum(a * b for a, b in zip(A_row, B_col)) for B_col in zip(*B)] for A_row in A
    ]
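
As a quick sanity check, here is the function applied to a small $2 \times 3$ by $3 \times 2$ example. The function is repeated so that the snippet runs on its own.

```python
def matrix_multiply(A, B):
    # Same definition as in the cell above.
    return [
        [sum(a * b for a, b in zip(A_row, B_col)) for B_col in zip(*B)] for A_row in A
    ]


A = [[1, 2, 3],
     [4, 5, 6]]    # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]     # 3 x 2
C = matrix_multiply(A, B)
print(C)  # [[58, 64], [139, 154]]
```

Keep in mind that `zip` silently truncates to the shorter input, so this implementation does not raise an error when the inner dimensions of **A** and **B** do not match.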

# 7. Adding Bias to a Matrix

In this step, we define a function `add_bias(matrix, bias)` that adds a bias vector to each row of a given matrix.

### Where Bias is Coming From:
In the context of neural networks, biases are additional parameters that are added to the weighted sum of inputs before applying the activation function. They help the model to fit the data better by providing an additional degree of freedom. 

### Initialization of Biases:
Biases are typically initialized randomly or set to zero at the beginning of the training process. In our neural network, biases are initialized as follows:

- **`b1`**: A list of random biases for the hidden layer. It has a length equal to the number of neurons in the hidden layer.
- **`b2`**: A list of random biases for the output layer. It has a length equal to the number of neurons in the output layer.

These biases are then added to the respective layers during the forward pass of the neural network.

### Biases During Training:
- **Forward Pass**: During the forward pass, biases are added to the weighted sum of inputs but do not change.

- **Backward Pass**: During backpropagation, biases are updated along with weights to minimize the error between the predicted output and the actual output. This adjustment is done using optimization algorithms like gradient descent.

### Detailed Explanation:
1. **Error Calculation**: After the forward pass, the error (or loss) is calculated by comparing the predicted output of the neural network with the actual output (ground truth). Common loss functions include Mean Squared Error (MSE) for regression tasks and Cross-Entropy Loss for classification tasks.

2. **Gradient Calculation**: The gradients of the loss function with respect to each weight and bias are computed. These gradients indicate the direction and magnitude of change needed to reduce the error. This process involves applying the chain rule of calculus to propagate the error backward through the network layers.

3. **Parameter Update**: Using the calculated gradients, the weights and biases are updated to minimize the error. This is typically done using an optimization algorithm like gradient descent. In gradient descent, each parameter (weight or bias) is adjusted in the opposite direction of its gradient by a small step, known as the learning rate.

4. **Iterative Process**: The process of forward pass, error calculation, gradient calculation, and parameter update is repeated iteratively for many epochs (complete passes through the training dataset) until the model converges to a solution with minimal error.

By updating the biases (and weights) during backpropagation, the neural network learns to make more accurate predictions, effectively minimizing the error over time.
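
To make the parameter update step concrete, the sketch below runs gradient descent on a single parameter. It is a toy example, not part of the network: `gradient_descent` is a hypothetical helper, and the function being minimized, $f(w) = (w - 3)^2$, is chosen only because its minimum at $w = 3$ is easy to verify.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_final)  # very close to 3
```

Each step shrinks the distance to the minimum by a constant factor here. In the network, the same update rule is applied to every weight and bias at once.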

In [23]:
# Adding bias to a matrix
def add_bias(matrix, bias):
    return [
        [matrix[row][col] + bias[col] for col in range(len(bias))]
        for row in range(len(matrix))
    ]
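
A small usage example, with the function repeated so that the snippet is self-contained: each row of the matrix corresponds to one sample, and each column receives the bias of the matching neuron.

```python
def add_bias(matrix, bias):
    # Same definition as in the cell above.
    return [
        [matrix[row][col] + bias[col] for col in range(len(bias))]
        for row in range(len(matrix))
    ]


Z = add_bias([[1.0, 2.0], [3.0, 4.0]], [10.0, 20.0])
print(Z)  # [[11.0, 22.0], [13.0, 24.0]]
```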

# 8. Sigmoid Function & Forward Propagation
The `sigmoid` function is an activation function used in neural networks to introduce non-linearity. It maps any input $z$ (the weighted sum of inputs to a neuron, plus the bias) to a value between 0 and 1. The function's output can be interpreted as the probability of belonging to class 1. Values close to 0 indicate class 0, while values close to 1 indicate class 1, and 0.5 is typically used as the decision threshold.

The formula for the sigmoid function is:
### $\sigma(z) = \frac{1}{1 + e^{-z}}$

The `apply_sigmoid` function applies the sigmoid function to each element of a given matrix. It processes the matrix element-wise, returning a new matrix with the sigmoid function applied to each element.

The `forward` function performs the forward propagation through the neural network. Forward propagation involves calculating the activations of each layer in the network, starting from the input layer and moving through to the output layer.

In [24]:

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Element-wise application of the sigmoid function
def apply_sigmoid(matrix):
    return [[sigmoid(x) for x in row] for row in matrix]

# Forward propagation
def forward(X, W1, b1, W2, b2):
    Z1 = add_bias(matrix_multiply(X, W1), b1)  # Input to hidden layer
    A1 = apply_sigmoid(Z1)  # Activation in hidden layer
    Z2 = add_bias(matrix_multiply(A1, W2), b2)  # Input to output layer
    A2 = apply_sigmoid(Z2)  # Final output (prediction)
    return A1, A2
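
One practical caveat: `math.exp(-z)` raises `OverflowError` when `z` is a large negative number. With inputs scaled to [0, 1] and small weights this is unlikely to occur, but a numerically safer variant is easy to write. The sketch below uses a hypothetical `stable_sigmoid` that picks an algebraically equivalent form depending on the sign of `z`.

```python
import math


def stable_sigmoid(z):
    """Sigmoid that avoids overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # z < 0, so exp_z lies in [0, 1) and cannot overflow
    return exp_z / (1.0 + exp_z)


print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0, where 1 / (1 + math.exp(1000.0)) would raise
```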


# 9. Sigmoid Derivative

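The derivative of the sigmoid function is used when calculating the gradients that adjust the weights during training.

This derivative indicates how the output of the sigmoid function changes with respect to its input, which is essential for optimizing the model. Note that the `sigmoid_derivative` function below takes the activation $a = \sigma(z)$ as its argument, not $z$ itself, because the activations are already available from the forward pass.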

The formula for the sigmoid derivative function is:
### $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$

In [26]:
# Derivative of the sigmoid function
def sigmoid_derivative(a):
    return a * (1 - a)
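
The formula can be checked numerically with a central finite difference. The sketch below repeats both functions so that it runs on its own. The test point `z = 0.7` and the step `eps` are arbitrary choices.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def sigmoid_derivative(a):
    # Expects the activation a = sigmoid(z), not z itself.
    return a * (1 - a)


z = 0.7
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_derivative(sigmoid(z))
print(abs(numeric - analytic) < 1e-8)  # True
```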

# 9. Backpropagation

Backpropagation calculates the error and updates the model's weights and biases to minimize this error.

### Helper Functions
- **`compute_layer_gradients(dZ, A, m)`**: Calculates gradients for weights and biases for any layer.
- **`update_weights_biases(W, b, dW, db, learning_rate)`**: Applies gradient descent to update weights and biases.

### Main Function: `backprop(...)`
- **Inputs**: Input data (`X`), labels (`y`), weights (`W1`, `W2`), biases (`b1`, `b2`), activations (`A1`, `A2`), and learning rate.
- **Process**:
  1. Computes error at the output.
  2. Calculates and backpropagates gradients.
  3. Updates weights and biases for both layers.
- **Output**: Updated weights and biases for further training iterations.
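
The output error used in the code is simply `A2 - y`. Assuming the loss is binary cross-entropy (the loss that the training loop monitors) and the output activation is the sigmoid, this follows from the chain rule. For a single sample with label $y$, pre-activation $z$, and prediction $a = \sigma(z)$:

$$
L = -\left[\, y \log a + (1 - y) \log(1 - a) \,\right]
$$

$$
\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)}, \qquad \frac{\partial a}{\partial z} = a(1 - a)
$$

$$
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = a - y
$$

The factor $a(1 - a)$ cancels, which is why the sigmoid derivative appears in the hidden-layer error term but not in the output-layer one.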




In [1]:
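def compute_layer_gradients(dZ, A, m):
    """Compute the gradients for weights and biases in a layer."""
    n_in, n_out = len(A[0]), len(dZ[0])
    # dW[j][k] = (1/m) * sum_i A[i][j] * dZ[i][k], one entry per (input, neuron) pair
    dW = [[sum(A[i][j] * dZ[i][k] for i in range(m)) / m for k in range(n_out)] for j in range(n_in)]
    # db[k] = (1/m) * sum_i dZ[i][k], one entry per neuron in the layer
    db = [sum(dZ[i][k] for i in range(m)) / m for k in range(n_out)]
    return dW, db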

def update_weights_biases(W, b, dW, db, learning_rate):
    """Update weights and biases using gradient descent."""
    W_updated = [[W[j][i] - learning_rate * dW[j][i] for i in range(len(W[0]))] for j in range(len(W))]
    b_updated = [b[i] - learning_rate * db[i] for i in range(len(b))]
    return W_updated, b_updated

def backprop(X, y, W1, b1, W2, b2, A1, A2, learning_rate=0.1):
    m = len(y)  # Number of training examples

    # Step 1: Compute the error at the output layer
    dZ2 = [[A2[i][0] - y[i]] for i in range(m)]

    # Step 2: Calculate gradients at the output layer
    dW2, db2 = compute_layer_gradients(dZ2, A1, m)

    # Step 3: Propagate the error back to the hidden layer
    dA1 = [[sum(W2[h][o] * dZ2[i][o] for o in range(len(W2[0]))) for h in range(len(W2))] for i in range(m)]

    # Step 4: Compute the error term at the hidden layer
    dZ1 = [[dA1[i][h] * sigmoid_derivative(A1[i][h]) for h in range(len(A1[0]))] for i in range(m)]

    # Step 5: Calculate gradients at the hidden layer
    dW1, db1 = compute_layer_gradients(dZ1, X, m)

    # Step 6: Update weights and biases using gradient descent
    W1, b1 = update_weights_biases(W1, b1, dW1, db1, learning_rate)
    W2, b2 = update_weights_biases(W2, b2, dW2, db2, learning_rate)

    return W1, b1, W2, b2


# 10. Training Loop

The training loop iterates over a specified number of epochs to train the neural network. During each epoch, the forward and backward propagation steps are performed, and optionally, the loss is calculated for monitoring purposes.


In [28]:
# Training loop
for epoch in range(10000):  # Number of epochs
    A1, A2 = forward(X_train_scaled, W1, b1, W2, b2)
    W1, b1, W2, b2 = backprop(X_train_scaled, y_train, W1, b1, W2, b2, A1, A2)

    # Optional: Calculate loss for monitoring
    if epoch % 1000 == 0:
        loss = sum(
            -y_train[i] * math.log(A2[i][0]) - (1 - y_train[i]) * math.log(1 - A2[i][0])
            for i in range(len(y_train))
        ) / len(y_train)
        print(f"Epoch {epoch}, Loss: {loss}")


Epoch 0, Loss: 2.3317760965068706
Epoch 1000, Loss: 0.39258092513604625
Epoch 2000, Loss: 0.3791446938451353
Epoch 3000, Loss: 0.34456667813829983
Epoch 4000, Loss: 0.29808416957613454
Epoch 5000, Loss: 0.25433260579048994
Epoch 6000, Loss: 0.21775096391199814
Epoch 7000, Loss: 0.19001783867416813
Epoch 8000, Loss: 0.1700869721003493
Epoch 9000, Loss: 0.15602323353907965
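
The loss expression in the loop calls `math.log` directly on the predictions, which fails if a prediction ever reaches exactly 0 or 1. A common safeguard is to clip the probabilities slightly away from those values. The sketch below shows one way to do that. `binary_cross_entropy` is a hypothetical helper and the `eps` value is an arbitrary small constant.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep math.log away from 0
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total / len(y_true)


print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # log(2), about 0.6931
print(binary_cross_entropy([1, 0], [1.0, 0.0]))  # finite and very close to 0
```

Inside the training loop this would be called as `binary_cross_entropy(y_train, [row[0] for row in A2])`.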


### 11. Saving Weights and Biases

This code snippet defines a function to save the weights and biases of a neural network to CSV files after training. This allows for the persistence of the model's parameters, which can be reloaded later for inference or further training.


In [29]:
# Save the weights and biases after training
def save_weights_biases(W, b, W_file, b_file):
    with open(W_file, "w") as f_w, open(b_file, "w") as f_b:
        for row in W:
            f_w.write(",".join(map(str, row)) + "\n")
        f_b.write(",".join(map(str, b)) + "\n")


save_weights_biases(W1, b1, "model_weights/W1.csv", "model_weights/b1.csv")
save_weights_biases(W2, b2, "model_weights/W2.csv", "model_weights/b2.csv")
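
Two practical notes on persistence. First, `open(..., "w")` does not create missing directories, so `model_weights/` has to exist beforehand (for example via `os.makedirs("model_weights", exist_ok=True)`). Second, the notebook does not show the loading side. A minimal sketch could look like the following, where `load_matrix` is a hypothetical helper and the round trip uses made-up values in a temporary directory rather than the trained weights.

```python
import csv
import os
import tempfile


def load_matrix(path):
    """Read a CSV file of floats into a list of rows, skipping empty lines."""
    with open(path, newline="") as f:
        return [[float(value) for value in row] for row in csv.reader(f) if row]


# Round-trip check in a temporary directory (made-up values).
with tempfile.TemporaryDirectory() as tmp_dir:
    w_path = os.path.join(tmp_dir, "W.csv")
    with open(w_path, "w") as f:
        f.write("0.1,0.2\n0.3,0.4\n")
    W_loaded = load_matrix(w_path)

print(W_loaded)  # [[0.1, 0.2], [0.3, 0.4]]
```

Since the biases are written as a single row, a bias file would be read back with `load_matrix(b_file)[0]`.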

### 12. Testing the model

* Forward Propagation on Test Data: The `forward` function is called with the test data (`X_test_scaled`), weights (`W1`, `W2`), and biases (`b1`, `b2`). It returns the activations of the hidden layer (`A1_test`) and the output layer (`A2_test`).


* Generating Predictions: Predictions are generated by applying a threshold of 0.5 to the output activations (`A2_test`). If the activation is greater than 0.5, the prediction is 1; otherwise, it is 0.


* Calculating Accuracy: The accuracy is calculated by comparing the predictions with the true labels (`y_test`). The number of correct predictions is divided by the total number of test samples.


* Printing Accuracy: The resulting accuracy is printed as a fraction between 0 and 1.


In [30]:
# Testing the model
A1_test, A2_test = forward(X_test_scaled, W1, b1, W2, b2)
predictions = [1 if a > 0.5 else 0 for a in [row[0] for row in A2_test]]
accuracy = sum([1 for i in range(len(y_test)) if predictions[i] == y_test[i]]) / len(
    y_test
)
print(f"Test Accuracy: {accuracy}")

Test Accuracy: 0.5625
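
Accuracy alone does not show which kind of mistakes the model makes. A confusion-matrix breakdown, together with precision and recall, gives a fuller picture. The sketch below uses hypothetical helpers (`confusion_counts`, `precision_recall`) and made-up toy labels. It does not reproduce the result above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred):
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration only.
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(toy_true, toy_pred))  # (2, 1, 2, 1)
```

On the real data, the same functions would be called with `y_test` and `predictions`.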
