# Basic Neural Networks

Today, we will implement a very simple neural network from scratch. We will use the sigmoid activation function and the mean squared error loss function. We will train the network using the backpropagation algorithm.

Let's first implement several common activation functions. The network below uses the sigmoid; the others are included for reference.

In [1]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# YOUR TURN: Implement the following activation functions, as seen in the lecture notes

In [2]:
def sigmoid(x):
    # Sigmoid activation function
    return 1 / (1 + np.exp(-x))


def tanh(x):
    # Tanh activation function
    return np.tanh(x)


def relu(x):
    # ReLU activation function
    return np.maximum(0, x)


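def softmax(x):
    # Softmax activation function (numerically stable)
    # Subtracting the row-wise max leaves the result unchanged but prevents overflow in np.exp.
    # Normalizes over the last axis, so each row (one sample) of a batch sums to 1.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)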
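The direct softmax formula overflows for large inputs: `np.exp(1000)` is `inf`, and `inf / inf` is `nan`. The minimal sketch below compares a naive and a max-shifted version on a small batch in which each row is one sample; the function names `softmax_naive` and `softmax_stable` are illustrative only.

```python
import numpy as np

def softmax_naive(x):
    # Direct formula: exp(x) / sum(exp(x)) over each row
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_stable(x):
    # Shift each row by its maximum first; softmax(x) == softmax(x - c) for any constant c
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

batch = np.array([[1.0, 2.0, 3.0],
                  [1000.0, 1001.0, 1002.0]])  # second row overflows np.exp

# Silence the overflow / invalid-value warnings the naive version triggers
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(batch)
stable = softmax_stable(batch)

print(naive[1])             # [nan nan nan]
print(stable[1])            # same probabilities as stable[0], since both rows shift to [-2, -1, 0]
print(stable.sum(axis=-1))  # each row sums to 1 (up to rounding)
```

Both rows differ only by a constant offset, so the stable version returns identical probabilities for them, while the naive version breaks down on the second row.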

Since we will use the sigmoid activation function, we will need its derivative. We will also need the mean squared error loss function.

In [3]:
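def sigmoid_derivative(s):
    # Derivative of the sigmoid, written in terms of its OUTPUT s = sigmoid(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)) = s * (1 - s).
    # The network passes stored activations (already sigmoid outputs), not pre-activations.
    return s * (1 - s)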


def MSE(y, y_pred):
    # Mean Squared Error
    return np.mean((y - y_pred) ** 2)
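Because `sigmoid_derivative` expects the sigmoid's output rather than its input, passing the wrong value is an easy mistake. A quick central finite-difference check, assuming only NumPy, confirms that s(1 - s) with s = σ(z) matches the numerical slope of σ at z, and that feeding z directly does not:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(s):
    # Expects the sigmoid OUTPUT s = sigmoid(z)
    return s * (1 - s)

z = np.linspace(-4.0, 4.0, 9)
h = 1e-5

# Central difference approximation of d sigmoid(z) / dz
numerical = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

correct = sigmoid_derivative(sigmoid(z))  # pass the activation
wrong = sigmoid_derivative(z)             # common mistake: pass the pre-activation

print(np.max(np.abs(numerical - correct)))  # very small
print(np.max(np.abs(numerical - wrong)))    # large: z * (1 - z) is not the slope of sigmoid
```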

Now let's define the neural network. We will make the layer sizes and the learning rate configurable parameters. We will also initialize the weights and biases randomly.
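As a sanity check on the initialization, here is a short sketch that builds the weight and bias lists the same way for layer sizes [30, 64, 32, 1] (the configuration used later in this notebook) and counts the trainable parameters. Each layer gets a weight matrix of shape (n_in, n_out) and a bias vector of length n_out.

```python
import numpy as np

layer_sizes = [30, 64, 32, 1]

# W[i] maps layer i to layer i + 1; rows of the input batch are samples
weights = [np.random.randn(layer_sizes[i], layer_sizes[i + 1]) for i in range(len(layer_sizes) - 1)]
biases = [np.random.randn(layer_sizes[i + 1]) for i in range(len(layer_sizes) - 1)]

for i, (W, b) in enumerate(zip(weights, biases)):
    print(f"layer {i}: W {W.shape}, b {b.shape}")

# 30*64 + 64 + 64*32 + 32 + 32*1 + 1
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)  # 4097
```

Because samples are stored as rows, a batch `X` of shape (n, 30) flows through the first layer as `X @ W[0] + b[0]`, giving shape (n, 64), with the bias broadcast across the batch.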

In [4]:
# Define the neural network class
class NeuralNetwork:
    def __init__(self, layer_sizes, lr=0.1):
        """
        Initialize the neural network with:
        - layer_sizes: list defining the number of neurons in each layer
        - lr: learning rate for weight updates
        """
        self.lr = lr  # Learning rate
        self.layer_sizes = layer_sizes  # Structure of the neural network

        # Initialize weights and biases with random values
        self.weights = [
            np.random.randn(layer_sizes[i], layer_sizes[i + 1])
            for i in range(len(layer_sizes) - 1)
        ]
        self.biases = [
            np.random.randn(layer_sizes[i + 1]) for i in range(len(layer_sizes) - 1)
        ]

    def forward(self, x):
        """
        Forward pass: propagates inputs through the network layer by layer
        - Stores activations for use in backpropagation
        - Uses sigmoid activation function
        """
        self.activations = [x]  # Store activations for backpropagation
        for W, b in zip(self.weights, self.biases):
            x = np.dot(x, W) + b  # Linear transformation
            x = sigmoid(x)  # Apply activation function
            self.activations.append(x)  # Store activation
        return x  # Final output

    def backward(self, X, y):
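        """
        Backward pass: computes gradients and updates weights using gradient descent
        - Must be called right after forward(), which stores the activations used here
          (X is unused; the input is read from self.activations[0])
        - Gradients are those of 0.5 * sum((output - y) ** 2), summed over the batch
        """
        # Error term (delta) at the output layer: dL/da * sigmoid'(z) = (a - y) * a * (1 - a)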
        loss_gradient = (self.activations[-1] - y) * sigmoid_derivative(
            self.activations[-1]
        )
        gradients = [loss_gradient]

        # Backpropagate the error through hidden layers
        for i in range(len(self.weights) - 1, 0, -1):
            loss_gradient = np.dot(
                gradients[-1], self.weights[i].T
            ) * sigmoid_derivative(self.activations[i])
            gradients.append(loss_gradient)

        gradients.reverse()  # Reverse to align with layer ordering

        # Update weights and biases using the computed gradients
        for i in range(len(self.weights)):
            self.weights[i] -= self.lr * np.dot(self.activations[i].T, gradients[i])
            self.biases[i] -= self.lr * np.sum(gradients[i], axis=0)

    def train(self, X, y, epochs=10000):
        """
        Train the neural network using forward and backward propagation
        - X: input features
        - y: target labels
        - epochs: number of training iterations
        """
        for epoch in range(epochs):
            y_pred = self.forward(X)  # Forward pass
            self.backward(X, y)  # Backward pass and weight updates

            # Print loss every 100 epochs for monitoring
            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {MSE(y, y_pred)}")
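The backward pass implements the delta rule for the objective L = ½ Σ (a − y)², summed over the batch, whose gradient with respect to the output is simply a − y. This differs from the printed `MSE` only by the positive factor 2/N (N = number of output entries), so both point in the same descent direction. A standard way to confirm the delta recursion is a numerical gradient check. The sketch below re-implements the same forward and delta computations as standalone functions (the names `forward`, `loss`, and `backprop_grads` are illustrative, not part of the class) and compares every gradient entry against a central finite difference on a tiny random network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(weights, biases, X):
    # List of activations, starting with the input (as in NeuralNetwork.forward)
    activations = [X]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations

def loss(weights, biases, X, y):
    # The objective whose gradient the delta recursion computes
    return 0.5 * np.sum((forward(weights, biases, X)[-1] - y) ** 2)

def backprop_grads(weights, biases, X, y):
    # Same delta recursion as NeuralNetwork.backward, but returns gradients instead of updating
    acts = forward(weights, biases, X)
    deltas = [(acts[-1] - y) * acts[-1] * (1 - acts[-1])]
    for i in range(len(weights) - 1, 0, -1):
        deltas.append((deltas[-1] @ weights[i].T) * acts[i] * (1 - acts[i]))
    deltas.reverse()
    dW = [acts[i].T @ deltas[i] for i in range(len(weights))]
    db = [np.sum(deltas[i], axis=0) for i in range(len(weights))]
    return dW, db

# Tiny network: 4 inputs -> 5 hidden -> 3 hidden -> 1 output, batch of 6 samples
sizes = [4, 5, 3, 1]
weights = [rng.standard_normal((sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
biases = [rng.standard_normal(sizes[i + 1]) for i in range(len(sizes) - 1)]
X = rng.standard_normal((6, 4))
y = rng.integers(0, 2, size=(6, 1)).astype(float)

dW, db = backprop_grads(weights, biases, X, y)

# Central finite differences for every weight and bias entry
h = 1e-6
max_err = 0.0
for params, grads in ((weights, dW), (biases, db)):
    for P, G in zip(params, grads):
        for idx in np.ndindex(P.shape):
            original = P[idx]
            P[idx] = original + h
            plus = loss(weights, biases, X, y)
            P[idx] = original - h
            minus = loss(weights, biases, X, y)
            P[idx] = original  # restore the parameter
            numerical = (plus - minus) / (2 * h)
            max_err = max(max_err, abs(numerical - G[idx]))

print(f"max |backprop - finite difference| = {max_err:.2e}")
```

A maximum discrepancy many orders of magnitude below the gradient values indicates the recursion is correct; a bug such as passing pre-activations to the sigmoid derivative shows up as a much larger discrepancy.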

Now let's initialize the network and train it on a real-world dataset.

In [5]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

We will use the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn. It is a classic and relatively easy binary classification task, which makes it a good candidate for our simple neural network. It contains 569 samples (212 malignant, 357 benign), each described by 30 real-valued features computed from digitized images of cell nuclei. In the original UCI file, the first two columns hold a sample ID and the diagnosis (M = malignant, B = benign), followed by the 30 features in columns 3-32. `load_breast_cancer()` drops the ID column and returns the features in `data.data` (shape 569 × 30) and the diagnosis in `data.target`, encoded as 0 = malignant and 1 = benign, so the network's output estimates the probability that a tumor is benign. Let's do a quick exploration of the dataset.

In [6]:
# Load the breast cancer dataset
data = load_breast_cancer()
print(data.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.


Now let's prepare the dataset and train the network.
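The next cell reshapes the labels into a column vector. This matters because the network outputs an array of shape (n, 1): if `y` stayed one-dimensional with shape (n,), NumPy broadcasting would silently turn expressions such as `self.activations[-1] - y` in `backward` and `y_pred_binary == y_test` into (n, n) matrices. A small illustration with made-up values:

```python
import numpy as np

# Hypothetical predictions for 4 samples, shaped like the network output (n, 1)
y_pred_binary = np.array([[1], [0], [1], [1]])

y_flat = np.array([1, 0, 0, 1])   # labels as a 1-D array: shape (4,)
y_col = y_flat.reshape(-1, 1)     # labels as a column vector: shape (4, 1)

wrong = y_pred_binary == y_flat   # broadcasts (4, 1) against (4,) -> (4, 4)
right = y_pred_binary == y_col    # element-wise comparison -> (4, 1)

print(wrong.shape, right.shape)   # (4, 4) (4, 1)
print(np.mean(right))             # 0.75: 3 of 4 predictions match
print(np.mean(wrong))             # 0.5: a meaningless "accuracy"
```

No error is raised in the broadcast case, which is why the shape mismatch is easy to miss.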

In [7]:
X, y = data.data, data.target
y = y.reshape(-1, 1)  # Column vector of shape (n, 1), matching the network's output

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)

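# Standardize the features (zero mean, unit variance); fit on the training set only so no test-set statistics leak into training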
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train the neural network
# 30 input features, two hidden layers (64 and 32 neurons), 1 output neuron
layer_sizes = [X_train.shape[1], 64, 32, 1]
learning_rate = 0.01
nn = NeuralNetwork(layer_sizes, lr=learning_rate)
nn.train(X_train, y_train, epochs=10000)

# Test the trained model on the test set
y_pred = nn.forward(X_test)

# Convert the predicted values to binary
y_pred_binary = (y_pred > 0.5).astype(int)

# Calculate the accuracy
accuracy = np.mean(y_pred_binary == y_test)
print(f"Test Accuracy: {accuracy}")

Epoch 0, Loss: 0.4032502278620832
Epoch 100, Loss: 0.011348672389084706
Epoch 200, Loss: 0.008183841185131893
Epoch 300, Loss: 0.006416120652152135
Epoch 400, Loss: 0.005115304928408777
Epoch 500, Loss: 0.004222833230006237
Epoch 600, Loss: 0.003692371236565765
Epoch 700, Loss: 0.003350637304201768
Epoch 800, Loss: 0.003117035339809622
Epoch 900, Loss: 0.002950445502803346
Epoch 1000, Loss: 0.0028276126989517977
Epoch 1100, Loss: 0.0027344730740917773
Epoch 1200, Loss: 0.002662110574139502
Epoch 1300, Loss: 0.0026046692935225344
Epoch 1400, Loss: 0.0025581922094030296
Epoch 1500, Loss: 0.0025199381079503316
Epoch 1600, Loss: 0.002487963467270327
Epoch 1700, Loss: 0.0024608589513584068
Epoch 1800, Loss: 0.0024375791661777903
Epoch 1900, Loss: 0.0024173297814073656
Epoch 2000, Loss: 0.002399490172620031
Epoch 2100, Loss: 0.00238355762221964
Epoch 2200, Loss: 0.002369103224616727
Epoch 2300, Loss: 0.002355730823266033
Epoch 2400, Loss: 0.0023430278901257614
Epoch 2500, Loss: 0.00233048741

We have achieved an accuracy of ~0.95 on the test set. Not bad for a simple neural network!
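To put an accuracy of ~0.95 in context, it helps to compare it against a trivial baseline that always predicts the most frequent class. On the full dataset that baseline already scores 357/569 ≈ 0.63 (the exact value on a particular test split will differ). A minimal sketch, assuming `y_true` holds 0/1 labels and `y_prob` holds the network's sigmoid outputs; the helper name `accuracy_report` is hypothetical:

```python
import numpy as np

def accuracy_report(y_true, y_prob, threshold=0.5):
    # Flatten both arrays so (n, 1) and (n,) inputs are compared element-wise
    y_true = np.asarray(y_true).ravel()
    y_hat = (np.asarray(y_prob).ravel() > threshold).astype(int)
    accuracy = np.mean(y_hat == y_true)
    # Majority-class baseline: always predict the more frequent label
    majority = np.bincount(y_true.astype(int), minlength=2).argmax()
    baseline = np.mean(y_true == majority)
    return accuracy, baseline

# Illustrative values only (not results from this notebook)
y_true = np.array([[1], [1], [1], [0], [0]])
y_prob = np.array([[0.9], [0.6], [0.4], [0.2], [0.7]])
acc, base = accuracy_report(y_true, y_prob)
print(acc, base)  # 0.6 0.6
```

In this toy example the "model" is no better than always predicting class 1. In the notebook, the same check would be `accuracy_report(y_test, y_pred)`.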