# Implementing a Simple Neural Network from Scratch using NumPy vs. PyTorch

(as part of the Deep Learning & NLP course)

**2025-0512 PP** use `conda` environment `Pytorch 2 (python 3.12)` (i.e. Apple Silicon, pytorch on GPU)

**TODO** apply the neural network to an astronomical project: habitable/non-habitable planets

<br>
© Thu Vu. All rights reserved.

<hr>

💡 In this project, we will build a simple neural network to predict whether a flower will thrive or not based on the amount of sun and water it needs.

Our neural network will have:
- Two inputs \(X_1\) and \(X_2\)
- A hidden layer with two neurons, \(h_1\) and \(h_2\)
- An output layer that provides a single predicted output \(y_{\text{pred}}\)

In addition, we will use a sigmoid activation function and Stochastic Gradient Descent (SGD) for optimization.

<img src="images/network.png" alt="Network Image" width="500"/>
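
Written out, the network in the diagram computes the three quantities below. The weight and bias names ($w_1 \dots w_6$, $b_1 \dots b_3$) follow the code in Step 3, and $\sigma$ is the sigmoid function; the symbol $\eta$ for the learning rate is notation assumed here for illustration.

$$
h_1 = \sigma(w_1 X_1 + w_2 X_2 + b_1), \qquad
h_2 = \sigma(w_3 X_1 + w_4 X_2 + b_2), \qquad
y_{\text{pred}} = \sigma(w_5 h_1 + w_6 h_2 + b_3)
$$

With SGD, every parameter $\theta$ (each weight and each bias) is nudged against its gradient after each training example:

$$
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
$$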

## Step 0: Install libraries

In [1]:
# Install the libraries we are going to use
#PP I'll use Jupyter kernel 'Pytorch2 (Python 3.12)': ! pip install numpy pandas torch


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Step 1: Loading the Data
We will use a small dataset called `flower_data_train.csv` that contains information about the amount of sun and water each flower received and whether it thrived or not.

In [1]:
import pandas as pd

# Load the dataset
data = pd.read_csv('data/flower_data_train.csv')
data.head()

Unnamed: 0,flower_id,sun,water,outcome
0,0,0.73,0.15,Did not thrive
1,1,0.6,0.5,Thrived
2,2,1.0,1.0,Did not thrive
3,3,0.8,0.55,Thrived
4,4,0.9,0.6,Thrived


## Step 2: Data Preprocessing
We need to preprocess the data by encoding the outcome as a binary value (1 for 'Thrived' and 0 for 'Did not thrive'). We will also extract the features (sun and water) and the labels.

In [2]:
# Encode the outcome
data['outcome'] = data['outcome'].apply(lambda x: 1 if x == 'Thrived' else 0)

# Extract features and labels
X = data[['sun', 'water']].values
y = data['outcome'].values
print(X[:5], y[:5])

[[0.73 0.15]
 [0.6  0.5 ]
 [1.   1.  ]
 [0.8  0.55]
 [0.9  0.6 ]] [0 1 0 1 1]
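
One thing to keep in mind: the `lambda` above sends every string other than `'Thrived'` to 0, so a misspelled or missing label would silently be counted as 'Did not thrive'. A slightly stricter variant is sketched here on a small made-up DataFrame (the `toy` frame and the `LABELS` dictionary are illustrative names, not part of the course dataset). It uses an explicit mapping so that unexpected labels show up as `NaN` instead:

```python
import pandas as pd

# Hypothetical toy data with one misspelled label
toy = pd.DataFrame({
    'sun': [0.73, 0.60, 0.80],
    'water': [0.15, 0.50, 0.55],
    'outcome': ['Did not thrive', 'Thrived', 'thrived?'],
})

# Explicit label-to-integer mapping (illustrative name)
LABELS = {'Thrived': 1, 'Did not thrive': 0}

encoded = toy['outcome'].map(LABELS)

# Labels missing from the mapping become NaN instead of silently turning into 0
unknown = toy.loc[encoded.isna(), 'outcome'].tolist()
print(encoded.tolist())
print('Unrecognized labels:', unknown)
```

If `unknown` is non-empty, the labels can be cleaned up before training rather than being treated as negatives.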


## Step 3: Implementing the Neural Network from Scratch (VERY naive version)
We will build a simple neural network with the following structure:
- **Input Layer**: 2 neurons (sun and water)
- **Hidden Layer**: 2 neurons
- **Output Layer**: 1 neuron (sigmoid activation for binary classification)

### Activation Function
We will use the sigmoid function:
$$
\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}
$$

In [3]:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    # Sigmoid function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_derivative(x):
    # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)  # evaluate the sigmoid once and reuse it
    return fx * (1 - fx)
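
A quick way to gain confidence in a hand-written derivative is to compare it with a finite-difference estimate. The sketch below does this for the sigmoid; the test points `xs` and the step size `eps` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    fx = sigmoid(x)
    return fx * (1 - fx)

# Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
xs = np.linspace(-5, 5, 11)   # test points, chosen arbitrarily
eps = 1e-5                    # step size, chosen arbitrarily
numeric = (sigmoid(xs + eps) - sigmoid(xs - eps)) / (2 * eps)
analytic = sigmoid_derivative(xs)

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)               # should be tiny
print(sigmoid_derivative(0.0))    # the derivative peaks at x = 0 with value 0.25
```

Because the derivative never exceeds 0.25, gradients shrink each time they pass through a sigmoid, which is one reason training with sigmoid activations can be slow.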

### Cross-Entropy Loss
The cross-entropy loss is given by the formula:
$$
L = -\frac{1}{N} \sum \left[ y_{\text{true}} \log(y_{\text{pred}}) + (1 - y_{\text{true}}) \log(1 - y_{\text{pred}}) \right]
$$

where:
- N is the number of samples
- y_true are the true labels (0 or 1)
- y_pred are the predicted probabilities (between 0 and 1)

In [4]:
def cross_entropy_loss(y_true, y_pred):
    """
    Computes the binary cross-entropy loss.

    Parameters:
    y_true (numpy array): True labels (0 or 1)
    y_pred (numpy array): Predicted probabilities (between 0 and 1)

    Returns:
    float: The binary cross-entropy loss
    """
    # Clip y_pred to avoid log(0) and ensure numerical stability
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss
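
To get a feel for the scale of this loss, here is a small worked example with made-up predictions for two samples (the variable names are illustrative). Confident correct predictions give a loss of about 0.105, predicting 0.5 for everything gives log(2) ≈ 0.693, and confident wrong predictions give about 2.303.

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred):
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0])

confident_right = cross_entropy_loss(y_true, np.array([0.9, 0.1]))  # -log(0.9)
undecided = cross_entropy_loss(y_true, np.array([0.5, 0.5]))        # log(2)
confident_wrong = cross_entropy_loss(y_true, np.array([0.1, 0.9]))  # -log(0.1)

print(confident_right, undecided, confident_wrong)
```

The value log(2) ≈ 0.693 is a handy reference point: a model that outputs 0.5 for every sample scores exactly that, so a training loss hovering around 0.69 suggests the network has not yet learned much.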

Now we will define our neural network and train it on our data.

In [5]:
class OurNeuralNetwork:
  '''
  Adapted from Victor Zhou's repo: https://github.com/vzhou842/neural-network-from-scratch
  
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (y_pred)

  *** DISCLAIMER ***:
  The code below is intended to be simple and educational. It's not optimal.
  Real neural net code looks nothing like this. DO NOT use this code.
  Instead, read/run it to understand how this specific network works.
  '''
  def __init__(self):
    # Set the seed for reproducibility
    np.random.seed(42)

    # Weights
    self.w1 = np.random.rand()
    self.w2 = np.random.rand()
    self.w3 = np.random.rand()
    self.w4 = np.random.rand()
    self.w5 = np.random.rand()
    self.w6 = np.random.rand()

    # Biases
    self.b1 = np.random.rand()
    self.b2 = np.random.rand()
    self.b3 = np.random.rand()

  def feedforward(self, x):
    # x is a numpy array with 2 elements.
    h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
    h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
    o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
    return o1

  def train(self, X, all_y_trues):
    '''
    - X is a (n x 2) numpy array, n = # of samples in the dataset.
    - all_y_trues is a numpy array with n elements.
      Elements in all_y_trues correspond to those in data.
    '''
    learn_rate = 0.1   # how fast we want to train (adjusting weights and biases)
    epochs = 1000      # number of times to loop through the entire dataset

    for epoch in range(epochs):
      for x, y_true in zip(X, all_y_trues):  # For each data pair of X and y like this: [0.73 0.15] 0

        # --- Do a feedforward (we'll need these values later)
        sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
        h1 = sigmoid(sum_h1)

        sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
        h2 = sigmoid(sum_h2)

        sum_y_pred = self.w5 * h1 + self.w6 * h2 + self.b3
        y_pred = sigmoid(sum_y_pred)

        # --- Calculate partial derivatives.
        # --- Naming: d_L_d_w1 represents "partial L / partial w1"

        # Partial derivative of loss with respect to y_pred (see the note cell below for how we got this formula)
        d_L_d_ypred = - (y_true / y_pred - (1 - y_true) / (1 - y_pred))

        # Neuron y_pred
        d_ypred_d_w5 = h1 * sigmoid_derivative(sum_y_pred)
        d_ypred_d_w6 = h2 * sigmoid_derivative(sum_y_pred)
        d_ypred_d_b3 = sigmoid_derivative(sum_y_pred)

        d_ypred_d_h1 = self.w5 * sigmoid_derivative(sum_y_pred)
        d_ypred_d_h2 = self.w6 * sigmoid_derivative(sum_y_pred)

        # Neuron h1
        d_h1_d_w1 = x[0] * sigmoid_derivative(sum_h1)
        d_h1_d_w2 = x[1] * sigmoid_derivative(sum_h1)
        d_h1_d_b1 = sigmoid_derivative(sum_h1)

        # Neuron h2
        d_h2_d_w3 = x[0] * sigmoid_derivative(sum_h2)
        d_h2_d_w4 = x[1] * sigmoid_derivative(sum_h2)
        d_h2_d_b2 = sigmoid_derivative(sum_h2)

        # --- Update weights and biases
        # Neuron h1
        self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
        self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
        self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

        # Neuron h2
        self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
        self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
        self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

        # Neuron y_pred
        self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
        self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
        self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3

      # --- Report the mean loss over the whole dataset every 10 epochs
      # The loss should trend downward over training (per-sample SGD updates do not guarantee a decrease at every step)
      if epoch % 10 == 0:
        y_preds = np.apply_along_axis(self.feedforward, 1, X)
        loss = cross_entropy_loss(all_y_trues, y_preds)
        print("Epoch %d loss: %.10f" % (epoch, loss))

# Define all_y_trues
all_y_trues = y

# Train our neural network!
network = OurNeuralNetwork()
network.train(X, all_y_trues)

Epoch 0 loss: 0.6953976154
Epoch 10 loss: 0.7025905718
Epoch 20 loss: 0.7016244024
Epoch 30 loss: 0.7007784134
Epoch 40 loss: 0.7000184221
Epoch 50 loss: 0.6993148162
Epoch 60 loss: 0.6986393653
Epoch 70 loss: 0.6979615931
Epoch 80 loss: 0.6972462427
Epoch 90 loss: 0.6964521230
Epoch 100 loss: 0.6955322222
Epoch 110 loss: 0.6944352553
Epoch 120 loss: 0.6931093115
Epoch 130 loss: 0.6915085455
Epoch 140 loss: 0.6896034676
Epoch 150 loss: 0.6873939615
Epoch 160 loss: 0.6849216920
Epoch 170 loss: 0.6822760909
Epoch 180 loss: 0.6795878530
Epoch 190 loss: 0.6770078078
Epoch 200 loss: 0.6746761329
Epoch 210 loss: 0.6726927937
Epoch 220 loss: 0.6711004629
Epoch 230 loss: 0.6698853000
Epoch 240 loss: 0.6689924897
Epoch 250 loss: 0.6683477396
Epoch 260 loss: 0.6678760130
Epoch 270 loss: 0.6675129713
Epoch 280 loss: 0.6672090326
Epoch 290 loss: 0.6669282331
Epoch 300 loss: 0.6666442385
Epoch 310 loss: 0.6663350538
Epoch 320 loss: 0.6659772061
Epoch 330 loss: 0.6655399417
Epoch 340 loss: 0.6649804
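
The scalar weights `w1` ... `w6` can also be packed into matrices, which is closer to how libraries organize the same computation. The following is a minimal sketch, assuming the weight layout described in the comments. The names `W_hidden`, `b_hidden`, `w_out`, `b_out`, `forward_batch`, `forward_scalar`, and the numeric values are illustrative; they are not the trained parameters from the run above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Illustrative parameter values (not the trained ones)
w1, w2, w3, w4, w5, w6 = 0.1, -0.2, 0.3, 0.4, -0.5, 0.6
b1, b2, b3 = 0.05, -0.05, 0.1

# Row i of W_hidden holds the incoming weights of hidden neuron h_(i+1)
W_hidden = np.array([[w1, w2],
                     [w3, w4]])
b_hidden = np.array([b1, b2])
w_out = np.array([w5, w6])
b_out = b3

def forward_batch(X):
    """Forward pass for all rows of X (shape n x 2) at once."""
    H = sigmoid(X @ W_hidden.T + b_hidden)   # shape (n, 2): columns are h1, h2
    return sigmoid(H @ w_out + b_out)        # shape (n,)

def forward_scalar(x):
    """Same computation written neuron by neuron, as in feedforward()."""
    h1 = sigmoid(w1 * x[0] + w2 * x[1] + b1)
    h2 = sigmoid(w3 * x[0] + w4 * x[1] + b2)
    return sigmoid(w5 * h1 + w6 * h2 + b3)

X_demo = np.array([[0.73, 0.15], [0.6, 0.5], [1.0, 1.0]])
batch_out = forward_batch(X_demo)
scalar_out = np.array([forward_scalar(x) for x in X_demo])
print(np.allclose(batch_out, scalar_out))   # both formulations should agree
```

The matrix form processes every sample in one call, so no `np.apply_along_axis` loop is needed to evaluate the loss on the full dataset.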

<div class="alert alert-block alert-info">

**📘 NOTE: Derivative of the cross-entropy loss (L) with respect to the prediction (y_pred)**

Given the binary cross-entropy loss:

$$
L = -\frac{1}{N} \sum \left[ y_{\text{true}} \log(y_{\text{pred}}) + (1 - y_{\text{true}}) \log(1 - y_{\text{pred}}) \right]
$$

For a single training example (drop the summation and $\frac{1}{N}$):

$$
L = - \left[ y_{\text{true}} \log(y_{\text{pred}}) + (1 - y_{\text{true}}) \log(1 - y_{\text{pred}}) \right]
$$


Take the derivative with respect to $y_{\text{pred}}$:

$$
\frac{\partial L}{\partial y_{\text{pred}}} = - \left( \frac{y_{\text{true}}}{y_{\text{pred}}} - \frac{1 - y_{\text{true}}}{1 - y_{\text{pred}}} \right)
$$

Hence, in our code above, we wrote:

`d_L_d_ypred = - (y_true / y_pred - (1 - y_true) / (1 - y_pred))`
</div>
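
A useful consequence of this formula, not used in the training code but easy to verify: multiplying it by the sigmoid derivative $y_{\text{pred}}(1 - y_{\text{pred}})$ cancels the fractions, so the gradient of the loss with respect to the output neuron's pre-activation (called `sum_y_pred` in the training loop) is simply $y_{\text{pred}} - y_{\text{true}}$. The sketch below checks this numerically; the inputs `z` and `y_true` and the helper names are arbitrary illustrations.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def chain_rule_gradient(z, y_true):
    """dL/dz computed as (dL/dy_pred) * (dy_pred/dz), as in the training loop."""
    y_pred = sigmoid(z)
    d_L_d_ypred = -(y_true / y_pred - (1 - y_true) / (1 - y_pred))
    d_ypred_d_z = y_pred * (1 - y_pred)
    return d_L_d_ypred * d_ypred_d_z

def simplified_gradient(z, y_true):
    """The same quantity after cancelling terms: y_pred - y_true."""
    return sigmoid(z) - y_true

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # arbitrary pre-activations
y_true = np.array([0, 1, 1, 0, 1])          # arbitrary labels

grad_chain = chain_rule_gradient(z, y_true)
grad_simple = simplified_gradient(z, y_true)
print(np.allclose(grad_chain, grad_simple))
```

The simplified form also avoids dividing by `y_pred` or `1 - y_pred`, which can become zero in floating point when the output neuron saturates.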

## Step 4: Implementing the Neural Network using PyTorch
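Now, let's implement the same network architecture using PyTorch, which computes the gradients and applies the parameter updates for us. Note that the training loop below passes the whole dataset through the model as a single batch per epoch and uses a learning rate of 0.01, so its loss values are not directly comparable to those from the per-sample updates (learning rate 0.1) in Step 3.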

In [6]:
import torch
import torch.nn as nn
import torch.optim as optim

# Convert data to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# Define the neural network class
class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)
        self.output = nn.Linear(2, 1)

    def forward(self, x):
        x = torch.sigmoid(self.hidden(x))
        x = torch.sigmoid(self.output(x))
        return x

# Initialize the model, loss function, and optimizer with learning rate = 0.01
model = SimpleNeuralNetwork()
criterion = nn.BCELoss()  # Binary cross-entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(1000):
    # Forward pass
    outputs = model(X_tensor)
    loss = criterion(outputs, y_tensor)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')

Epoch 0, Loss: 0.73137366771698
Epoch 10, Loss: 0.7282949090003967
Epoch 20, Loss: 0.7254539132118225
Epoch 30, Loss: 0.722833514213562
Epoch 40, Loss: 0.7204175591468811
Epoch 50, Loss: 0.7181909084320068
Epoch 60, Loss: 0.716139554977417
Epoch 70, Loss: 0.7142502665519714
Epoch 80, Loss: 0.7125106453895569
Epoch 90, Loss: 0.7109094262123108
Epoch 100, Loss: 0.7094359993934631
Epoch 110, Loss: 0.7080804705619812
Epoch 120, Loss: 0.7068336009979248
Epoch 130, Loss: 0.7056871056556702
Epoch 140, Loss: 0.7046329975128174
Epoch 150, Loss: 0.7036639451980591
Epoch 160, Loss: 0.7027732729911804
Epoch 170, Loss: 0.7019548416137695
Epoch 180, Loss: 0.7012028694152832
Epoch 190, Loss: 0.7005119323730469
Epoch 200, Loss: 0.6998772621154785
Epoch 210, Loss: 0.69929438829422
Epoch 220, Loss: 0.6987590193748474
Epoch 230, Loss: 0.6982674598693848
Epoch 240, Loss: 0.6978158950805664
Epoch 250, Loss: 0.6974014043807983
Epoch 260, Loss: 0.6970208287239075
Epoch 270, Loss: 0.6966714262962341
Epoch 280