# Neural Networks and Deep Learning - Chapter 1

## Introduction
In this notebook, we’ll implement a simple neural network to classify handwritten digits from the MNIST dataset. This is based on **Chapter 1** of Michael Nielsen’s book *"Neural Networks and Deep Learning"*. We’ll break down the code into small, understandable steps and provide explanations for each part.

---

### Step 1: Import Libraries
First, let’s import the necessary libraries. We’ll use `numpy` for numerical computations, `random` for shuffling the data, and `time` for timing each training epoch.

In [1]:
import random
import time
import numpy as np

### Step 2: Define the Network Class

**Explanation**:
The `Network` class is the heart of our neural network. It’s like a factory that builds and trains the network. Here’s what it does:
- **Initialization**: Sets up the network with random weights and biases, drawn from a standard normal distribution by `np.random.randn`. Think of weights as "knobs" that the network adjusts to learn, and biases as "offsets" that help fine-tune the output.
- **Layers**: The network has multiple layers: an input layer, one or more hidden layers, and an output layer. For example, `[784, 30, 10]` means:
  - Input layer: 784 neurons (one for each pixel in a 28x28 image).
  - Hidden layer: 30 neurons (the "brain" of the network).
  - Output layer: 10 neurons (one for each digit class, 0 through 9).

In [2]:
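class Network:
    def __init__(self, sizes):
        """Initialize the network with the given layer sizes."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        # One bias column vector per non-input layer, shape (y, 1).
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # One weight matrix per pair of adjacent layers, shape (y, x):
        # rows index the receiving layer, columns the sending layer.
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]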
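
To make the layer sizes concrete, here is a small standalone sketch that builds the same lists as `__init__` for `[784, 30, 10]` and inspects their shapes. The variable `n_params` is introduced only for illustration.

```python
import numpy as np

sizes = [784, 30, 10]

# Same construction as in Network.__init__, outside the class.
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

bias_shapes = [b.shape for b in biases]      # one column vector per non-input layer
weight_shapes = [w.shape for w in weights]   # (neurons in next layer, neurons in previous layer)

# Total number of trainable parameters (illustrative name).
n_params = sum(b.size for b in biases) + sum(w.size for w in weights)

print(bias_shapes)
print(weight_shapes)
print(n_params)
```

Note that the input layer has no biases or incoming weights, which is why both lists have one entry fewer than `sizes`.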

### Step 3: Feedforward Method

**Explanation**:
The `feedforward` method is how the network makes predictions. It’s like a conveyor belt:
1. Takes an input (e.g., an image of a digit).
2. Passes it through each layer of the network.
3. Applies the sigmoid activation function to "squash" the output into a range between 0 and 1.
4. Returns the final output, which represents the network’s prediction.

In [3]:
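def feedforward(self, a):
    """Compute the output of the network for input `a`."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

# Attach the function to the class so it can be called as net.feedforward(a).
Network.feedforward = feedforward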
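
The loop is easiest to check by hand on a tiny network. The sketch below is standalone and uses a hypothetical 2-3-1 network whose weights and biases are all zero, chosen only so the result is predictable: every weighted input is 0, and the sigmoid of 0 is 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-3-1 network with all-zero parameters (for easy checking only).
sizes = [2, 3, 1]
biases = [np.zeros((y, 1)) for y in sizes[1:]]
weights = [np.zeros((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]

a = np.array([[0.3], [0.9]])           # input as a (2, 1) column vector
for b, w in zip(biases, weights):
    a = sigmoid(np.dot(w, a) + b)      # same update as in feedforward

print(a.shape, a[0, 0])
```

The input must be a column vector of shape `(n, 1)`; a flat array of shape `(n,)` would broadcast against the `(y, 1)` biases and give an output of the wrong shape.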

### Step 4: Stochastic Gradient Descent (SGD)

**Explanation**:
Training a neural network is like teaching a child to ride a bike. You show them examples, and they learn from their mistakes. Here’s how it works:
1. **Mini-Batches**: Instead of learning from all the data at once, the network learns from small chunks (mini-batches). Each mini-batch gives a quick estimate of the gradient, so the weights and biases are updated many times per pass over the data instead of once.
2. **Epochs**: The network goes through the entire dataset multiple times (epochs) to improve its performance.
3. **Learning Rate (`eta`)**: Controls how big the steps are when adjusting the weights and biases. Too big, and the network might overshoot; too small, and it might take forever to learn.
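
The effect of the learning rate can be seen on a toy problem that has nothing to do with MNIST. The sketch below minimizes `f(w) = w**2` (gradient `2*w`) with plain gradient descent. The function name `descend` and the three `eta` values are chosen here purely for illustration.

```python
def descend(eta, steps=20, w=1.0):
    """Run `steps` gradient descent updates on f(w) = w**2, starting from w."""
    for _ in range(steps):
        w -= eta * 2 * w   # gradient of w**2 is 2*w
    return w

w_good = descend(0.1)      # shrinks steadily toward the minimum at 0
w_large = descend(1.1)     # each step overshoots and the magnitude grows
w_small = descend(0.001)   # barely moves away from the starting point

print(w_good, w_large, w_small)
```

The best value for a real network depends on the cost function and architecture, so this only illustrates the trade-off, not a recommended setting.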

In [5]:
def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
    """Train the network using stochastic gradient descent."""
    # Copy the data into lists so it can be shuffled, sliced, and measured
    # even if the loader returned iterators (e.g. zip objects in Python 3).
    training_data = list(training_data)
    n = len(training_data)
    if test_data:
        test_data = list(test_data)
        n_test = len(test_data)
    for j in range(epochs):
        time1 = time.time()
        random.shuffle(training_data)
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        time2 = time.time()
        if test_data:
            print("Epoch {0}: {1} / {2}, took {3:.2f} seconds".format(
                j, self.evaluate(test_data), n_test, time2-time1))
        else:
            print("Epoch {0} complete in {1:.2f} seconds".format(j, time2-time1))

# Attach the function to the class so it can be called as net.SGD(...).
Network.SGD = SGD
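
The slicing that builds `mini_batches` can be tried on a stand-in dataset. In this sketch the "training data" is just the integers 0 to 9, and the seed is fixed only to make the shuffle repeatable.

```python
import random

training_data = list(range(10))   # stand-in for (x, y) training pairs
mini_batch_size = 4

random.seed(0)                    # fixed seed, for a repeatable shuffle only
random.shuffle(training_data)

# Same slicing as in SGD: step through the shuffled data in strides of mini_batch_size.
mini_batches = [
    training_data[k:k + mini_batch_size]
    for k in range(0, len(training_data), mini_batch_size)]

batch_sizes = [len(mb) for mb in mini_batches]
print(batch_sizes)
```

When the dataset size is not a multiple of `mini_batch_size`, the last mini-batch is simply smaller; every example still appears in exactly one mini-batch per epoch.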

### Step 5: Update Mini-Batch

**Explanation**:
The `update_mini_batch` method is where the magic happens! It’s like a coach correcting the network’s mistakes:
1. **Backpropagation**: Computes how much each weight and bias contributed to the error.
2. **Gradient Descent**: Adjusts the weights and biases to reduce the error.
3. **Learning Rate (`eta`)**: Determines how much to adjust the weights and biases. Think of it as the "step size" in the learning process.

In [6]:
def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases using backpropagation on a mini-batch."""
    # Accumulators for the summed gradients over the mini-batch.
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # Step against the average gradient: eta / len(mini_batch) scales the sum.
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]

# Attach the function to the class so it can be called as net.update_mini_batch(...).
Network.update_mini_batch = update_mini_batch
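
The update rule itself is short: subtract `eta / m` times the summed gradient, where `m` is the mini-batch size. The sketch below applies it to a single weight vector with made-up gradient values (all numbers are illustrative, not taken from a real network).

```python
import numpy as np

w = np.array([[1.0], [2.0]])            # current weights (illustrative)
eta = 0.5                               # learning rate (illustrative)

# Made-up per-example gradients, standing in for what backprop would return.
per_example_grads = [
    np.array([[0.2], [0.4]]),
    np.array([[0.6], [0.0]]),
]

# Sum the gradients over the mini-batch, as update_mini_batch does.
nabla_w = np.zeros(w.shape)
for g in per_example_grads:
    nabla_w = nabla_w + g

m = len(per_example_grads)
w_new = w - (eta / m) * nabla_w         # step against the average gradient

print(w_new.ravel())
```

Dividing by `m` means the step size does not grow with the mini-batch size, so `eta` keeps roughly the same meaning when `mini_batch_size` changes.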

### Step 6: Backpropagation

**Explanation**:
Backpropagation is the "brain" of the learning process. It’s like solving a mystery:
1. **Forward Pass**: Computes the output of the network for a given input.
2. **Error Calculation**: Compares the output to the true label to compute the error.
3. **Backward Pass**: Propagates the error backward through the network to compute gradients for each weight and bias.

In [7]:
def backprop(self, x, y):
    """Compute the gradients for the cost function using backpropagation."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # Feedforward
    activation = x
    activations = [x]  # List to store all the activations, layer by layer
    zs = []  # List to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Backward pass: start with the error in the output layer
    delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # l counts layers from the end: l = 2 is the second-to-last layer, and so on
    for l in range(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return nabla_b, nabla_w

# Attach the function to the class so it can be called as net.backprop(x, y).
Network.backprop = backprop
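
A common way to gain confidence in a backpropagation routine is a numerical gradient check: nudge one weight, measure how the cost changes, and compare that with the gradient backprop reports. The sketch below is standalone. It re-implements the same forward and backward passes as plain functions on a hypothetical 2-3-1 network, and it assumes the quadratic cost `0.5 * ||a - y||**2`, which is the cost whose derivative is `a - y`. The helper `cost` and the names `numeric` and `analytic` are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def cost(weights, biases, x, y):
    """Quadratic cost for one example (assumed cost, see lead-in)."""
    a = x
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(weights, biases, x, y):
    """Same algorithm as the backprop method, written as a plain function."""
    activation, activations, zs = x, [x], []
    for b, w in zip(biases, weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w

rng = np.random.default_rng(0)
sizes = [2, 3, 1]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((2, 1))
y = np.array([[1.0]])

_, nabla_w = backprop(weights, biases, x, y)
analytic = nabla_w[0][0, 0]

# Central finite difference on the same weight.
eps = 1e-6
weights[0][0, 0] += eps
cost_plus = cost(weights, biases, x, y)
weights[0][0, 0] -= 2 * eps
cost_minus = cost(weights, biases, x, y)
weights[0][0, 0] += eps          # restore the original value
numeric = (cost_plus - cost_minus) / (2 * eps)

print(abs(numeric - analytic))
```

If the two values disagree by much more than rounding error, the backward pass most likely has an indexing or transpose mistake.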

### Step 7: Evaluate the Network

**Explanation**:
The `evaluate` method checks how well the network is performing. It’s like a teacher grading a test:
1. **Predictions**: The network makes predictions for the test data.
2. **Accuracy**: Compares the predictions to the true labels and returns the number of correct answers. Divide this count by the number of test examples to get the accuracy.

In [None]:
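def evaluate(self, test_data):
    """Return the number of test inputs the network classifies correctly."""
    # The predicted digit is the index of the most active output neuron.
    test_results = [(np.argmax(self.feedforward(x)), y)
                    for (x, y) in test_data]
    return sum(int(predicted == actual) for (predicted, actual) in test_results)

# Attach the function to the class so it can be called as net.evaluate(test_data).
Network.evaluate = evaluate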
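
The counting logic can be followed on a few hand-written output vectors. In this standalone sketch the three `outputs` stand in for what `feedforward` might return on a 3-class problem, and the labels are plain integers (all values are made up for illustration).

```python
import numpy as np

# Made-up network outputs for three inputs (3 classes instead of 10, for brevity).
outputs = [
    np.array([[0.1], [0.8], [0.1]]),
    np.array([[0.7], [0.2], [0.1]]),
    np.array([[0.2], [0.3], [0.5]]),
]
labels = [1, 0, 1]

# The predicted class is the index of the largest output activation.
predictions = [int(np.argmax(o)) for o in outputs]
correct = sum(int(p == y) for p, y in zip(predictions, labels))
accuracy_percent = 100.0 * correct / len(labels)

print(predictions, correct, accuracy_percent)
```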

### Step 8: Cost Function Derivative

**Explanation**:
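The `cost_derivative` method returns the gradient of the cost with respect to the output activations. For the quadratic cost used in Chapter 1 of the book, this is simply the difference between the network’s output and the desired output, so it measures how far off the prediction is.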

In [8]:
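def cost_derivative(self, output_activations, y):
    """Compute the derivative of the cost with respect to the output activations."""
    return output_activations - y

# Attach the function to the class so it can be called as net.cost_derivative(...).
Network.cost_derivative = cost_derivative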
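
The notebook never writes the cost out explicitly. Assuming the per-example quadratic cost from Chapter 1 of the book, with $a^L$ denoting the output activations and $y$ the desired output, the expression returned above follows in one line:

```latex
C_x = \frac{1}{2}\,\lVert a^L - y \rVert^2
    = \frac{1}{2}\sum_j \left(a^L_j - y_j\right)^2,
\qquad
\frac{\partial C_x}{\partial a^L_j} = a^L_j - y_j .
```

A different cost function would need a different `cost_derivative`, while the rest of `backprop` would stay the same.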

### Step 9: Sigmoid Activation Function

**Explanation**:
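The `sigmoid` function is the activation function used in the network. It’s like a "squashing" function that maps any input to a value between 0 and 1. Because it is smooth, small changes in the weights and biases produce small changes in the output, which is what makes gradient-based learning possible.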

In [9]:
def sigmoid(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))
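
A few sample values show the squashing behaviour. This standalone sketch repeats the definition so it can run on its own; the input values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

mid = sigmoid(0.0)                               # exactly halfway between 0 and 1
high = sigmoid(10.0)                             # large positive input: close to 1
low = sigmoid(-10.0)                             # large negative input: close to 0
vec = sigmoid(np.array([[-2.0], [0.0], [2.0]]))  # applied elementwise to arrays

print(mid, high, low)
print(vec.ravel())
```

Because `np.exp` works elementwise, the same function handles scalars and whole activation vectors.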

### Step 10: Sigmoid Derivative

**Explanation**:
The `sigmoid_prime` function is the derivative of the sigmoid function. It’s used in backpropagation to compute how much to adjust the weights and biases.

In [10]:
def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z) * (1 - sigmoid(z))
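
The closed form can be checked against a finite-difference slope. This standalone sketch repeats both definitions; the test point `z0` and step `eps` are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

peak = sigmoid_prime(0.0)    # the derivative is largest at z = 0
tail = sigmoid_prime(10.0)   # and nearly zero where the sigmoid saturates

# Compare the formula with a central finite difference at an arbitrary point.
z0, eps = 0.7, 1e-6
numeric = (sigmoid(z0 + eps) - sigmoid(z0 - eps)) / (2 * eps)
analytic = sigmoid_prime(z0)

print(peak, tail, abs(numeric - analytic))
```

The tiny derivative in the saturated regions is why neurons with very large positive or negative weighted inputs learn slowly.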

### Step 11: Load the MNIST Dataset

**Explanation**:
The MNIST dataset is a collection of 70,000 images of handwritten digits (0 through 9). Each image is 28x28 pixels, and we’ll preprocess the data to make it ready for training.

In [None]:
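# Assumes mnist_loader.py lives in a `src` package importable from the notebook's
# working directory; otherwise this import fails with
# "ModuleNotFoundError: No module named 'src'".
from src import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()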
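
The loader's source is not shown in this notebook, so the following is only a sketch of the kind of preprocessing such a loader is assumed to perform: flattening each 28x28 image into a 784x1 column vector (the shape `feedforward` expects) and turning a digit label into a 10-dimensional target vector. The helper names `to_column` and `one_hot` are hypothetical and are not taken from `mnist_loader`.

```python
import numpy as np

def to_column(image):
    """Flatten a 28x28 image into a (784, 1) column vector (hypothetical helper)."""
    return np.reshape(image, (784, 1))

def one_hot(digit):
    """Return a (10, 1) vector with 1.0 at index `digit` (hypothetical helper)."""
    e = np.zeros((10, 1))
    e[digit] = 1.0
    return e

image = np.zeros((28, 28))   # stand-in for a real MNIST image
x = to_column(image)
y = one_hot(3)

print(x.shape, y.shape, int(np.argmax(y)))
```

Since `evaluate` compares `np.argmax(...)` with `y` directly, the evaluation data is expected to carry plain integer labels rather than vectors like this.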

### Step 12: Train the Network

**Explanation**:
Now it’s time to train the network! We’ll use the MNIST dataset to teach the network how to recognize handwritten digits. The network will learn by adjusting its weights and biases to minimize the error.

In [None]:
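net = Network([784, 20, 20, 10])
# 30 epochs, mini-batch size 10, learning rate 0.1.
# The SGD method defined above takes only `test_data` as an optional argument,
# so the validation set is passed there to report accuracy after each epoch.
net.SGD(training_data, 30, 10, 0.1, test_data=validation_data)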

### Step 13: Visualize Predictions

**Explanation**:
Let’s see how well the network is doing! We’ll visualize some test images along with the network’s predictions and the true labels.

In [None]:
import matplotlib.pyplot as plt

# Plot the first ten test images with their predictions.
# Assumes test_data holds (image, label) pairs with integer labels,
# which is the format `evaluate` expects.
samples = list(test_data)[:10]
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for ax, (x, y) in zip(axes.flat, samples):
    prediction = np.argmax(net.feedforward(x))
    ax.imshow(x.reshape(28, 28), cmap='gray')
    ax.set_title(f"Pred: {prediction}, True: {y}")
    ax.axis('off')
plt.show()

## Conclusion
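In this notebook, we implemented a simple feedforward neural network, trained with stochastic gradient descent and backpropagation, to classify handwritten digits from the MNIST dataset. During training, `SGD` prints the number of correctly classified evaluation images after each epoch, which shows how well even a basic network can learn the task.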

---

## Next Steps
1. Experiment with different network architectures (e.g., more layers, more neurons).
2. Try different activation functions (e.g., ReLU).
3. Add regularization techniques like dropout or L2 regularization.