In [None]:
from pathlib import Path

ROOT_DIR = Path('..') / '..'
!pip install -r {ROOT_DIR / 'requirements.txt'}

# Neural Networks

Neural networks, often referred to as artificial neural networks (ANNs), are computational systems that attempt to mimic the way the human brain processes information. Inspired by biological neural structures, ANNs are central to deep learning, a subset of machine learning. Over the past decade, they've driven many AI advancements, setting benchmark performance in tasks like image recognition, natural language processing, and game-playing strategies.

## Structure

A neural network is usually built with several layers of interconnected nodes or "neurons". Every connection has a weight that's adjusted during training. The primary layers of a neural network include:

1. **Input Layer:** Accepts input features and sends them to the next layer.
2. **Hidden Layers:** Intermediate layers that transform their inputs through weighted connections and activation functions.
3. **Output Layer:** Yields the ultimate predictions or categorizations.

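Each neuron applies an activation function to the weighted sum of its inputs, which introduces non-linearity. This non-linearity is crucial: without it, any stack of layers would reduce to a single linear transformation, and the network could not capture complex patterns and relationships in the data.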
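To make the role of non-linearity concrete, here is a minimal sketch. The weight matrices `W1` and `W2` and the input `x` are made-up placeholders, not part of the notebook's model. It checks that two stacked linear layers give the same result as one combined linear layer, and that putting a ReLU between them breaks this equivalence.

```python
import torch

torch.manual_seed(0)  # reproducible random values

# Two hypothetical weight matrices standing in for two stacked layers
W1 = torch.randn(4, 3, dtype=torch.float64)
W2 = torch.randn(2, 4, dtype=torch.float64)
x = torch.randn(5, 3, dtype=torch.float64)  # batch of 5 inputs with 3 features

# Without an activation, two layers are equivalent to one linear layer
stacked = (x @ W1.T) @ W2.T
collapsed = x @ (W2 @ W1).T
print(torch.allclose(stacked, collapsed))  # True

# With ReLU in between, the result no longer matches the single linear map
with_relu = torch.relu(x @ W1.T) @ W2.T
print(torch.allclose(with_relu, collapsed))
```

Biases are left out to keep the sketch short; adding them does not change the conclusion.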

In this notebook, we use a straightforward feed-forward neural network to classify images from the MNIST dataset. The network has two hidden layers of 16 units each, both using the ReLU activation function. The output layer has 10 units, one per digit class, and applies the softmax activation function. The network has already been trained and its parameters saved, so we will load them and visualize a few sample predictions.
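As a rough sketch of what such a network computes, the snippet below runs a forward pass through layers of size 784, 16, 16 and 10. The `forward` helper, the random placeholder parameters and the `(out_features, in_features)` weight layout are assumptions for illustration; the notebook's own `FeedForwardNetwork` may organize things differently.

```python
import torch

torch.manual_seed(0)


def forward(x, params):
    """Forward pass: ReLU on hidden layers, softmax on the output layer."""
    *hidden, (U, c) = params
    h = x
    for W, b in hidden:
        h = torch.relu(h @ W.T + b)
    return torch.softmax(h @ U.T + c, dim=1)


# Randomly initialised placeholders, one (weights, biases) pair per layer
sizes = [784, 16, 16, 10]
params = [(torch.randn(n_out, n_in) * 0.1, torch.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

batch = torch.rand(3, 784)  # three fake flattened images
probs = forward(batch, params)
print(probs.shape)       # torch.Size([3, 10])
print(probs.sum(dim=1))  # each row sums to (approximately) 1
```

Because the output goes through softmax, each row can be read as a probability distribution over the 10 classes.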

## Feed-Forward Network

Feed-forward neural networks are a category of ANNs whose units form no cycles. Data therefore flows in a single direction: from the input nodes, through any hidden nodes, to the output nodes. There are no feedback loops that feed the model's outputs back into itself, which distinguishes feed-forward networks from recurrent neural networks (RNNs), which are designed for sequences and keep a form of memory. The fully connected network used here also treats its input as a flat vector of features, unlike convolutional networks, which exploit the spatial structure of images.

### MNIST Dataset Overview

The MNIST dataset comprises 70,000 grayscale images of handwritten digits. Specifically:

- **Image Dimensions**: Each image measures 28x28 pixels.
- **Pixel Representation**: Pixels range from 0 (black) to 255 (white).
- **Dataset Split**:
  - **Training Set**: 60,000 images
  - **Test Set**: 10,000 images

Due to its simplicity and established benchmarks, MNIST is a popular choice for introductory machine learning exercises, often likened to the "Hello, World!" of machine learning.
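Before an image reaches a fully connected network, its pixels are usually scaled and flattened. The sketch below uses a randomly generated stand-in image rather than a real MNIST digit. It scales the 0 to 255 intensities to the range 0 to 1, which is the range torchvision's `ToTensor` produces, and reshapes the 28x28 grid into a single 784-dimensional row.

```python
import torch

# A fake 28x28 grayscale image with integer intensities in [0, 255]
raw = torch.randint(0, 256, (28, 28), dtype=torch.uint8)

# Scale to floats in [0, 1]
scaled = raw.float() / 255.0

# Flatten to one 784-dimensional feature vector, keeping a batch dimension
flat = scaled.view(1, -1)
print(flat.shape)  # torch.Size([1, 784])
```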

### Loading and Visualizing the MNIST Dataset

Let's load the dataset and randomly visualize some of the handwritten digits:

In [None]:
import random

from matplotlib import pyplot
from matplotlib.axes import Axes
from numpy import ndarray
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from utils import DATA_PATH


def mnist_dataset() -> MNIST:
    """Load the MNIST test split (train=False), downloading it if needed"""
    return MNIST(str(DATA_PATH / "mnist"), train=False, transform=ToTensor(), download=True)


# Define the number of examples to visualize
N_EXAMPLES = 3


def setup_figure() -> ndarray:
    """Set up a row of N_EXAMPLES axes for visualization"""
    fig = pyplot.figure(figsize=(N_EXAMPLES * 3, 3))
    grid_spec = fig.add_gridspec(1, N_EXAMPLES, wspace=0.4)
    axs = grid_spec.subplots(sharey='row')
    return axs


def display_examples(*axs: Axes) -> None:
    """Display a few random examples from the MNIST dataset"""
    dataset = mnist_dataset()  # load once instead of once per image
    for i in range(N_EXAMPLES):
        # randrange excludes the upper bound, so idx is always a valid index
        idx = random.randrange(len(dataset))
        img, label = dataset[idx]
        view = img.view(28, 28).numpy()
        axs[i].set_title(f"Class: {label}")
        axs[i].imshow(view)


display_examples(*setup_figure())

### Using a Pre-Trained Model

#### Loading Model Parameters from Disk

In a neural network, the parameters are the weights on the connections between units and the biases of the units. They're essential because they are what the network learns: training adjusts them so that the model makes accurate predictions or classifications.
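To get a feel for how many parameters a small network has, here is a short sketch that counts them for a fully connected network. The helper name `count_parameters` is made up for this example. The layer sizes are the ones used in this notebook: 784 inputs, two hidden layers of 16 units, and 10 outputs.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per unit
    return total


print(count_parameters([784, 16, 16, 10]))  # 13002
```

Almost all of these parameters (12,560 of 13,002) sit in the first layer, because every one of the 784 pixels connects to every one of the 16 units.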

For our task, we've previously trained a model on the MNIST dataset and saved its parameters to disk. Let's load these parameters and initialize our neural network model for further predictions.

In [None]:
import torch
from numpy import loadtxt

# Load weights and biases for hidden layers
weights = [torch.from_numpy(loadtxt(DATA_PATH / f"W{i}.txt")).float() for i in (1, 2)]
biases = [torch.from_numpy(loadtxt(DATA_PATH / f"b{i}.txt")).float() for i in (1, 2)]

# Load weights and biases for the output layer
output_weights = torch.from_numpy(loadtxt(DATA_PATH / "U.txt")).float()
output_biases = torch.from_numpy(loadtxt(DATA_PATH / "c.txt")).float()

# Initialize the neural network model
MNIST_MODEL = FeedForwardNetwork(
    n_features=784,
    hidden_layer_sizes=[16, 16],
    activation_functions=[relu, relu],
    n_classes=10,
)

# Inject the loaded parameters into the model
MNIST_MODEL.load_parameters(weights, output_weights, biases, output_biases)

# Display the initialized model
print(MNIST_MODEL)

#### Making Predictions with the Model

After successfully loading the model and dataset, it's time to use our model to make predictions. We'll select a subset of images from the MNIST dataset at random and visualize their true and predicted class labels.
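The predicted label and its confidence can both be read off the model output with `torch.max`. The small sketch below uses made-up probabilities rather than real model output. Taking the maximum along `dim=1` returns, for each row, the largest value and the column index where it occurs.

```python
import torch

# Hypothetical class probabilities for two images over three classes
probs = torch.tensor([[0.1, 0.7, 0.2],
                      [0.6, 0.3, 0.1]])

# torch.max along dim=1 returns (largest value per row, its column index)
confidence, label = torch.max(probs, dim=1)
print(confidence)  # tensor([0.7000, 0.6000])
print(label)       # tensor([1, 0])
```

The largest value can be read as a confidence only when the model output is a probability distribution, for example after softmax.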

In [None]:
import random


def display_predictions(*axs: pyplot.Axes) -> None:
    """
    Visualize predictions for a subset of the MNIST dataset.

    This function randomly selects images from the MNIST dataset, makes predictions using
    the loaded model, and visualizes the images alongside their true and predicted class labels.

    :param axs: A list of Axes objects to display the images and predictions.
    """
    dataset = mnist_dataset()  # load once instead of once per image
    for i in range(N_EXAMPLES):
        # randrange excludes the upper bound, so idx is always a valid index
        idx = random.randrange(len(dataset))
        img, true_label = dataset[idx]
        img_view = img.view(28, 28).numpy()

        # Predicting the class label using the model
        pred_prob, pred_label = torch.max(MNIST_MODEL(img.view(1, 784)), dim=1)

        # Displaying the image along with true and predicted labels
        axs[i].imshow(img_view)
        axs[i].set_title(
            f"True Class: {true_label}\n"
            f"Predicted Class: {pred_label.item()}\n"
            f"Confidence: {pred_prob.item():.2f}"
        )


# Using the function to display predictions
display_predictions(*setup_figure())

#### Evaluating Model Performance

Next, we want to see how well our model does on the whole MNIST test set. We'll use a function that goes through all the images in batches, predicts a class for each image, and counts how many predictions are correct. At the end, it prints the accuracy: the percentage of predictions that were correct.
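The counting logic can be sketched on toy data. Everything here is a stand-in: the "inputs" are already class scores, `toy_model` simply returns them, and 50 of the 250 labels are corrupted on purpose so the expected accuracy is known in advance.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy inputs: a large score at the true class plus a little noise
true_class = torch.randint(0, 10, (250,))
scores = 5.0 * torch.nn.functional.one_hot(true_class, 10) + 0.1 * torch.randn(250, 10)

# Corrupt the first 50 labels so the "model" gets exactly those wrong
labels = true_class.clone()
labels[:50] = (labels[:50] + 1) % 10


def toy_model(x):
    """Stand-in model: the inputs already are the class scores."""
    return x


n_correct = 0
for x, y in DataLoader(TensorDataset(scores, labels), batch_size=100):
    predictions = toy_model(x).argmax(dim=1)
    n_correct += (predictions == y).sum().item()

accuracy = 100 * n_correct / len(labels)
print(f"Accuracy: {accuracy:.2f}%")  # Accuracy: 80.00%
```

The last batch holds only 50 samples. That is fine, because the correct predictions are summed over batches and divided by the dataset size only once at the end.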

In [None]:
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import VisionDataset
from tqdm import tqdm

from networks import NeuralNetwork


def evaluate_network(
        network: NeuralNetwork,
        dataset: VisionDataset,
        batch_size: int = 100,
        device: Device = Device.CPU,
):
    """
    Evaluates the performance of a neural network on a given vision dataset.

    This function iterates over the dataset using batches, computes predictions for each
    batch using the provided network, and tracks the number of correct predictions.
    At the end, it prints the accuracy of the network on the dataset.

    :param network: The neural network model to be evaluated.
    :param dataset: The dataset on which the network is evaluated.
    :param batch_size: The size of the batches in which the dataset is divided for
                       evaluation.
                       Default is 100.
    :param device: The device on which the computations are performed (CPU or GPU).

    __Note:__

    - The predicted class is the index of the largest output, so the result is the same
      whether the network's forward method returns raw scores (logits) or softmax
      probabilities.
    - The accuracy is computed as the percentage of correct predictions over the total
      number of samples in the dataset.
    """
    network.to(device)
    data_loader = DataLoader(dataset, batch_size=batch_size)
    n_correct = 0
    for x, y in tqdm(data_loader):
        view: torch.Tensor = x.view(-1, network.input_size).to(device)
        predictions: torch.Tensor = torch.max(network(view), dim=1)[1]
        n_correct += torch.sum(torch.eq(predictions, y.to(device))).item()

    print(f"Accuracy: {(n_correct / len(dataset) * 100):.2f}%")


evaluate_network(MNIST_MODEL, mnist_dataset(), device=Device.CPU)

### Training a Neural Network with Backpropagation

Backpropagation is like a teacher for neural networks. It helps the network learn from mistakes by working out how much each internal setting (a weight or a bias) contributed to the error. Here's how training with backpropagation works:

1. **Learning from Examples:** We show the network many examples and tell it the correct answers.
2. **Making Guesses:** The network tries to guess the answer for each example.
3. **Checking Mistakes:** After guessing, we check how far off its guess was from the correct answer.
4. **Learning from Mistakes:** Using the mistakes it made, the network fine-tunes its internal settings to guess better next time.
5. **Repeat:** We keep showing examples until the network gets good at guessing right.

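The magic of backpropagation is in step 4. It applies the chain rule of calculus to compute the gradient of the error with respect to every weight and bias, which tells us which settings to tweak and in which direction. The tweaking itself is done by gradient descent, which moves each setting a small step against its gradient.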
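The five steps can be sketched for the simplest possible "network": a single linear layer followed by softmax, trained on made-up data. This is an illustrative sketch with a hand-derived gradient, not the notebook's `FeedForwardNetwork`. The data, the learning rate and the number of steps are arbitrary choices.

```python
import torch

torch.manual_seed(0)

# Toy data: 200 samples, 5 features, 3 classes (labels from a hidden linear rule)
x = torch.randn(200, 5)
y = (x @ torch.randn(5, 3)).argmax(dim=1)
y_onehot = torch.nn.functional.one_hot(y, 3).float()

W = torch.zeros(5, 3)  # the "internal settings" to learn
learning_rate = 0.5
losses = []

for step in range(100):
    # Steps 1-2: show the examples and make guesses (forward pass)
    probs = torch.softmax(x @ W, dim=1)
    # Step 3: measure the mistake with the cross-entropy loss
    loss = -(y_onehot * torch.log(probs)).sum(dim=1).mean()
    losses.append(loss.item())
    # Step 4a: backpropagation, i.e. the chain rule gives dLoss/dW
    grad_logits = (probs - y_onehot) / x.size(0)
    grad_W = x.T @ grad_logits
    # Step 4b: gradient descent moves W a small step against the gradient
    W -= learning_rate * grad_W
    # Step 5: repeat

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

With all weights at zero the network guesses uniformly, so the first loss is ln(3), about 1.099. It then falls as the weights are updated.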

#### Checking if Gradients are Correct: Gradient Checking

When you implement backpropagation yourself, it's easy to make a mistake in one of the gradient formulas. You'd want to double-check that they are right. That's what gradient checking does for neural networks.

In simple terms, gradient checking compares two methods of finding gradients (slopes). One method uses the standard backpropagation technique. The other uses a quick-and-dirty method called "finite difference approximation": nudge a parameter up and down by a tiny amount, and divide the change in the loss by the size of the nudge. If both methods give similar answers, we can be pretty sure our backpropagation is set up correctly.
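The idea can be sketched on a model small enough to differentiate by hand. The example below is a stand-in, not the notebook's network. It uses a linear model with a mean squared error loss, computes the gradient analytically, and compares it with central finite differences taken one parameter at a time. It uses `float64` so that rounding error stays far below the step size `epsilon`.

```python
import torch

torch.manual_seed(0)


def loss_fn(w, x, y):
    """Mean squared error of a linear model."""
    return ((x @ w - y) ** 2).mean()


x = torch.randn(50, 4, dtype=torch.float64)
y = torch.randn(50, dtype=torch.float64)
w = torch.randn(4, dtype=torch.float64)

# Analytic gradient of the loss (what backpropagation would compute)
analytic = 2 * x.T @ (x @ w - y) / x.size(0)

# Central finite differences, nudging one parameter at a time
epsilon = 1e-6
numeric = torch.zeros_like(w)
for i in range(w.numel()):
    step = torch.zeros_like(w)
    step[i] = epsilon
    numeric[i] = (loss_fn(w + step, x, y) - loss_fn(w - step, x, y)) / (2 * epsilon)

max_diff = (analytic - numeric).abs().max().item()
print(f"largest difference: {max_diff:.2e}")
```

Nudging parameters one at a time checks every gradient entry separately, at the cost of two loss evaluations per parameter.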

Here's a simple way to do gradient checking:

In [None]:
def check_gradients(epsilon: float = 1e-6):
    # Disable tracking computations
    with torch.no_grad():
        # Set some basics and random data
        samples = 100
        input_size = 300
        output_classes = 10
        # Set up a basic neural network
        network = FeedForwardNetwork(input_size, [100, 200], [sigmoid, relu], output_classes)
        parameters = list(network.parameters())
        # Random input
        input_data = torch.randn(samples, input_size)
        # Make random target labels
        labels = torch.zeros(samples, output_classes)
        targets = torch.randint(0, output_classes, (samples,))
        labels[torch.arange(samples), targets] = 1
        for param in parameters:
            # Check the loss when we reduce the parameter a tiny bit
            param -= epsilon
            pred_minus = network(input_data)
            loss_minus = cross_entropy(pred_minus, labels)
            # Check the loss when we increase the parameter a tiny bit
            param += 2 * epsilon
            pred_plus = network(input_data)
            loss_plus = cross_entropy(pred_plus, labels)
            # Quick-and-dirty gradient calculation
            estimated_gradient = (loss_plus - loss_minus) / (2 * epsilon)
            # Bring parameter back to original
            param -= epsilon
            # Get the actual gradient using backpropagation
            pred = network(input_data)
            network.backward(input_data, labels, pred)
            # Every entry of param was shifted by the same epsilon, so the estimate
            # approximates the sum (not the mean) of the entries of param.grad
            difference = torch.abs(estimated_gradient - torch.sum(param.grad))
            print(f"Difference between estimated and real gradient: {difference}")


# Run our gradient check
check_gradients()

#### Training the Model

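Now that we've verified that our gradients are correct, we can train our model. We'll start with synthetic datasets (`RandomUniformDataset`, `BernoulliDataset`, and `RandomNormalDataset`), which generate random data and labels on the fly. They're useful for testing and debugging the training loop before moving on to real data.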
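For readers who don't have the project's `datasets` module at hand, a dataset of this kind might look roughly like the sketch below. `ToyRandomDataset` is a hypothetical stand-in, not the project's implementation. It only mirrors what the training code relies on: a constructor taking `(n_samples, n_features, n_classes)` and the attributes `data` and `targets`.

```python
import torch
from torch.utils.data import Dataset


class ToyRandomDataset(Dataset):
    """Hypothetical stand-in: uniform random features with random labels."""

    def __init__(self, n_samples, n_features, n_classes):
        self.data = torch.rand(n_samples, n_features)
        self.targets = torch.randint(0, n_classes, (n_samples,))

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]


dataset = ToyRandomDataset(2000, 300, 10)
features, label = dataset[0]
print(len(dataset), features.shape)  # 2000 torch.Size([300])
```

In this sketch the labels are independent of the features, so a network can only memorize them. Rising training accuracy on such data shows that the optimization works, not that the model generalizes.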

In [None]:
def plot_loss_and_accuracy(losses: list[float], accuracies: list[float]):
    """Plot the loss and accuracy of the model during training"""
    fig_loss = pyplot.figure(1)
    loss_ax = fig_loss.add_subplot(111)
    loss_ax.set_title("Loss")
    loss_ax.set_xlabel("epochs")
    loss_ax.set_ylabel("loss")
    loss_ax.plot(losses, c="r")

    fig_accuracy = pyplot.figure(2)
    accuracy_ax = fig_accuracy.add_subplot(111)
    accuracy_ax.set_title("Accuracy")
    accuracy_ax.set_xlabel("epochs")
    accuracy_ax.set_ylabel("acc")
    accuracy_ax.plot(accuracies, c="b")
    pyplot.show()

In [None]:
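from timeit import default_timer as timer  # timer() returns a time in seconds

import torch
from torch.utils.data import DataLoader
from torchvision.datasets import VisionDataset

from datasets import SizedDataset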


def convert_to_one_hot(tensor: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """
    Build a one-hot matrix with the same shape as `tensor` (n_samples x n_classes),
    with a 1 in each row at the column given by `labels`.
    """
    one_hot = torch.zeros_like(tensor)
    one_hot[torch.arange(tensor.size(0)), labels] = 1.0
    return one_hot


def train_one_batch(network: FeedForwardNetwork, optimizer: StochasticGradientDescent, x: torch.Tensor,
                    y: torch.Tensor) -> None:
    """
    Train the network on a single batch of data.
    """
    y_pred = network(x)
    y_onehot = convert_to_one_hot(y_pred, y)
    network.backward(x, y_onehot, y_pred)
    optimizer.step()


def report_progress(epoch: int, accuracy: float, loss: float, avg_time: float):
    """
    Report training progress.
    """
    print(f"\rEpoch:{epoch:03d} Accuracy:{accuracy:.2f}% Loss:{loss:.4f} Time/epoch:{avg_time:.3f}s", end='')


def calculate_average_time(previous_avg: float, current_time: float, epoch: int) -> float:
    """
    Calculate the average time taken per epoch.
    """
    return (previous_avg * (epoch - 1) + current_time) / epoch


def train_feed_forward_network(
        network: FeedForwardNetwork,
        dataset: SizedDataset | VisionDataset,
        optimizer: StochasticGradientDescent,
        epochs: int = 1,
        batch_size: int = 1,
        reports_every: int = 1,
        device=Device.CPU
) -> tuple[list[float], list[float]]:
    network.to(device)
    data_loader = DataLoader(dataset, batch_size, shuffle=True)
    dataset_size = len(dataset)
    average_time_per_epoch = 0
    losses, accuracies = [], []
    for epoch in range(1, epochs + 1):
        epoch_start_time = timer()
        for x, y in data_loader:
            x, y = x.view(x.size(0), -1).float().to(device), y.to(device)
            train_one_batch(network, optimizer, x, y)
        average_time_per_epoch = calculate_average_time(average_time_per_epoch, timer() - epoch_start_time, epoch)
        if epoch % reports_every == 0:
            # Evaluate through the data loader so inputs receive the same
            # transform (e.g. ToTensor scaling) that was used for training
            outputs, targets = [], []
            for x, y in data_loader:
                outputs.append(network(x.view(x.size(0), -1).float().to(device)))
                targets.append(y.to(device))
            predicted_output, true_labels = torch.cat(outputs), torch.cat(targets)
            onehot_labels = convert_to_one_hot(predicted_output, true_labels)
            loss = float(cross_entropy(predicted_output, onehot_labels))
            losses.append(loss)
            predicted_labels = torch.argmax(predicted_output, dim=1)
            accuracy = 100 * (predicted_labels == true_labels).sum().item() / dataset_size
            accuracies.append(accuracy)
            report_progress(epoch, accuracy, loss, average_time_per_epoch)
    return losses, accuracies


In [None]:
from datasets import RandomUniformDataset


def train_network_on_dataset(dataset_class):
    # Hyperparameters
    n_samples = 2000
    n_features = 300
    n_classes = 10
    hidden_layer_sizes = [300, 400]
    activation_functions = [celu, relu]
    activation_function_parameters = [float(n_classes), None]
    learning_rate = 1e-3
    epochs = 100
    batch_size = 32

    # Initialize network
    network = FeedForwardNetwork(n_features, hidden_layer_sizes, activation_functions,
                                 n_classes, activation_function_parameters)

    # Generate dataset based on the provided dataset class
    dataset = dataset_class(n_samples, n_features, n_classes)

    # Initialize optimizer
    optimizer = StochasticGradientDescent(network.parameters(), learning_rate=learning_rate)

    # Train network
    with torch.no_grad():
        losses, accuracies = train_feed_forward_network(network, dataset, optimizer,
                                                        epochs=epochs, batch_size=batch_size)

    # Plot results
    plot_loss_and_accuracy(losses, accuracies)


train_network_on_dataset(RandomUniformDataset)

In [None]:
from datasets import BernoulliDataset

train_network_on_dataset(BernoulliDataset)

In [None]:
from datasets import RandomNormalDataset

train_network_on_dataset(RandomNormalDataset)

#### Training the Model on MNIST

With the MNIST dataset and the training loop in place, we can now train a larger network on real images.

In [None]:
def train_network_on_mnist_dataset():
    # Hyperparameters
    n_features = 784
    n_classes = 10
    hidden_layer_sizes = [512, 1024, 128]
    activation_functions = [relu, relu, relu]
    learning_rate = 1e-5
    epochs = 30
    batch_size = 32
    # Initialize network
    network = FeedForwardNetwork(n_features, hidden_layer_sizes, activation_functions, n_classes)
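    # Load the MNIST dataset. Note that train=False selects the 10,000-image
    # test split; train=True would select the 60,000-image training split.
    dataset = MNIST(
        str(DATA_PATH / "mnist"), train=False, transform=ToTensor(), download=True
    )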
    # Initialize optimizer
    optimizer = StochasticGradientDescent(network.parameters(), learning_rate=learning_rate)
    # Train network
    with torch.no_grad():
        losses, accuracies = train_feed_forward_network(network, dataset, optimizer, epochs=epochs,
                                                        batch_size=batch_size)
    # Plot results
    plot_loss_and_accuracy(losses, accuracies)


train_network_on_mnist_dataset()