# Lab 10: Stochastic, Batch, and Mini-Batch Gradient Descent

In this coding lab, we will explore the implementation of different kinds of Gradient Descent: ***Stochastic***, ***Batch***, and ***Mini-Batch***. We will use the ***same single-layer neural network*** we made in the last lab to classify observations in the Iris dataset.
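The three variants differ only in how many training samples feed each parameter update, and therefore in how many updates happen per epoch. The sketch below shows that arithmetic. It assumes a training set of 105 samples, which is 70% of Iris's 150 rows. The helper `updates_per_epoch` is hypothetical and is not part of the lab's code.

```python
import math

def updates_per_epoch(n_samples: int, gd_type: str, batch_size: int = 1) -> int:
    """Hypothetical helper: number of parameter updates in one pass over the data."""
    if gd_type == "batch":
        # One update computed from all samples at once
        return 1
    if gd_type == "stochastic":
        # One update per individual sample
        return n_samples
    if gd_type == "mini-batch":
        # One update per chunk of batch_size samples; the last chunk may be smaller
        return math.ceil(n_samples / batch_size)
    raise ValueError(f"unknown gd_type: {gd_type!r}")

for kind in ("batch", "stochastic", "mini-batch"):
    print(kind, updates_per_epoch(105, kind, batch_size=10))
```

With a fixed number of epochs, stochastic gradient descent therefore takes many more (noisier) steps than batch gradient descent, and mini-batch sits in between.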


***Note***: This lab is structured like our Decision Tree lab. There won't be many "<font color='red'>**TRY IT**</font> &#x1f9e0;s", but rather, we will walk through this live together. If you are following along from outside of our course, try to fill in wherever you see "XXXX". If you get stuck, refer to the answer key that's also posted in the `Introduction-to-ML` repository.

### Data Preparation

This section of code is responsible for preparing the Iris dataset for training a neural network. We will: **Load the Data**, perform **One-Hot Encoding**, perform **Feature Normalization**, and create the **Train-Test Split**. Together, these ensure that the data is in the right format and scale for training the neural network.

**Note**: The following code block is the same as that in `09_Neural_Networks.ipynb`.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# One-Hot Encoding the target labels
y = y.reshape(-1, 1)
encoder = OneHotEncoder(sparse_output=False)
y_encoded = encoder.fit_transform(y)

# Normalize the features
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y_encoded, test_size=0.3, random_state=42)
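For intuition, the one-hot encoding and standardization steps can be reproduced in a few lines of NumPy. The sketch below uses made-up toy arrays, not Iris. Each one-hot row is a row of an identity matrix. Standardization subtracts each column's mean and divides by its population standard deviation (`ddof=0`), which is the estimator `StandardScaler` uses.

```python
import numpy as np

# Toy integer labels for three classes (illustrative values only)
labels = np.array([0, 2, 1, 2])
n_classes = 3

# One-hot encoding: row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes)[labels]
print(one_hot)
print(one_hot.argmax(axis=1))  # argmax recovers the original integer labels

# Toy feature matrix: 4 samples, 2 features on very different scales
features = np.array([[1.0, 100.0],
                     [2.0, 200.0],
                     [3.0, 300.0],
                     [4.0, 400.0]])

# Standardize each column to zero mean and unit (population) standard deviation
mean = features.mean(axis=0)
std = features.std(axis=0)  # ddof=0 by default
standardized = (features - mean) / std
print(standardized.mean(axis=0))  # approximately 0 for each column
print(standardized.std(axis=0))   # approximately 1 for each column
```

After scaling, the two columns become identical, even though one was 100 times larger. This is why standardization keeps features on very different scales from dominating the gradient updates.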

### Define the Neural Network Structure
We'll use ***the same simple one-layer neural network*** as we did in `09_Neural_Networks.ipynb`, but we will add the ability to perform stochastic, batch, and mini-batch gradient descent.

A few changes should be noted:
- We have set some seeds (`np.random.seed(42)`) so that our work is reproducible.
- We have added new parameters to the `train()` function.
- We have added new code within the `train()` function to allow for the different kinds of gradient descent.

All other code is largely the same as that in `09_Neural_Networks.ipynb`.
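The new code inside `train()` comes down to choosing which slice of the shuffled data feeds each update. The sketch below shows that slicing as a standalone generator. The name `iterate_batches` is hypothetical and not part of the lab's class. Stochastic and batch gradient descent are the two extremes of mini-batch: batch sizes of 1 and of the whole dataset. The final mini-batch can be smaller than `batch_size` when the sample count is not a multiple of it.

```python
import numpy as np

def iterate_batches(X, y, gd_type, batch_size=10):
    """Hypothetical helper: yield (X_part, y_part) pairs, one per parameter update."""
    m = X.shape[0]
    if gd_type == "batch":
        # The whole dataset forms a single batch
        yield X, y
    elif gd_type == "stochastic":
        # One sample at a time; slicing with i:i + 1 keeps the 2-D shape (1, n_features)
        for i in range(m):
            yield X[i:i + 1], y[i:i + 1]
    elif gd_type == "mini-batch":
        # Consecutive chunks of batch_size rows; the last chunk may be shorter
        for start in range(0, m, batch_size):
            yield X[start:start + batch_size], y[start:start + batch_size]
    else:
        raise ValueError(f"unknown gd_type: {gd_type!r}")

# Toy data: 7 samples, 2 features, 3 one-hot classes
X_toy = np.arange(14, dtype=float).reshape(7, 2)
y_toy = np.eye(3)[[0, 1, 2, 0, 1, 2, 0]]

sizes = [xb.shape[0] for xb, _ in iterate_batches(X_toy, y_toy, "mini-batch", batch_size=3)]
print(sizes)  # [3, 3, 1]
```

Slicing with `i:i + 1` rather than indexing with `i` matters here. `forward()` and `compute_loss()` expect 2-D arrays, and a 1-D row `X[i]` would make `softmax`'s `sum(axis=1)` fail.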

In [None]:
class SimpleNeuralNetwork:
    def __init__(self, input_size, output_size, learning_rate=0.01):
        """
        Initialize the neural network with random weights and biases.
        """
        # Set the seed for reproducibility
        np.random.seed(42)  # This ensures all random processes (e.g., weight initialization) are the same each time

        self.learning_rate = learning_rate
        self.weights = np.random.randn(input_size, output_size) * 0.01  # Small random initialization
        self.bias = np.zeros(output_size)  # Initialize biases to zeros

    def softmax(self, z):
        """Compute softmax values for each class in z."""
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # Subtract each row's max for numerical stability (prevents overflow)
        return exp_z / exp_z.sum(axis=1, keepdims=True)

    def forward(self, X):
        """Forward pass: computes predicted class probabilities."""
        z = np.dot(X, self.weights) + self.bias
        return self.softmax(z)

    def compute_loss(self, y_pred, y_true):
        """Compute the cross-entropy loss."""
        m = y_true.shape[0]  # Number of samples
        # Get the index of the true class for each sample
        true_class_indices = np.argmax(y_true, axis=1)

        # Calculate log-likelihood using the true class indices
        log_likelihood = -np.log(y_pred[range(m), true_class_indices])
        loss = np.sum(log_likelihood) / m  # Average loss

        return loss

    def backward(self, X, y_true, y_pred, batch_size=1):
        """Backward pass: computes gradients."""
        m = y_true.shape[0]  # Number of samples
        true_class_indices = np.argmax(y_true, axis=1)  # True class indices for each sample

        # Gradient of cross-entropy w.r.t. the logits z is (y_pred - y_true);
        # work on a copy so the caller's y_pred is not modified in place
        grad_z = y_pred.copy()
        grad_z[range(m), true_class_indices] -= 1
        dw = np.dot(X.T, grad_z)  # Gradient for weights
        db = np.sum(grad_z, axis=0)  # Gradient for bias

        # For batch or mini-batch, normalize gradients by batch size
        dw /= batch_size  # Normalize by batch size
        db /= batch_size  # Normalize by batch size

        return dw, db

    def update_parameters(self, dw, db):
        """Update weights and biases using computed gradients."""
        self.weights -= self.learning_rate * dw  # Adjust weights
        self.bias -= self.learning_rate * db  # Adjust biases

    def train(self, X, y, epochs=1000, XXXX, XXXX):
        """
        Train the neural network using the provided data.
        """
        loss_history = []  # To store loss values
        m = X.shape[0]  # Number of samples

        # Shuffle the data once, with a fixed seed, before training (the same order is reused every epoch)
        np.random.seed(42)  # Ensure reproducibility in data shuffling
        indices = np.random.permutation(m)
        X_shuffled = X[indices]
        y_shuffled = y[indices]

        for epoch in range(epochs):
            total_loss = 0  # Variable to accumulate loss for this epoch

            if gd_type == "XXXX":
                # Forward pass for the entire dataset
                y_pred = self.forward(X_shuffled)
                loss = self.compute_loss(y_pred, y_shuffled)

                # Backward pass and update parameters
                dw, db = self.backward(X_shuffled, y_shuffled, y_pred, batch_size=m)
                self.update_parameters(dw, db)
                total_loss += loss

            elif gd_type == "stochastic":
                for XXXX:
                    # get one sample
                    X_i = XXXX
                    y_i = XXXX

                    # forward pass
                    y_pred = self.forward(X_i)
                    loss = self.compute_loss(y_pred, y_i)

                    # backward pass (batch size of 1 for SGD)
                    dw, db = self.backward(X_i, y_i, y_pred, batch_size=1)
                    self.update_parameters(dw, db)
                    total_loss += loss

            elif gd_type == "XXXX":
                for XXXX:
                    # get one batch
                    X_batch = XXXX
                    y_batch = XXXX

                    # forward pass
                    y_pred = self.forward(X_batch)
                    loss = self.compute_loss(y_pred, y_batch)

                    # backward pass (use actual batch size)
                    dw, db = self.backward(X_batch, y_batch, y_pred, batch_size=X_batch.shape[0])
                    self.update_parameters(dw, db)
                    total_loss += loss

            # Average loss for the epoch
            avg_loss = XXXX / XXXX
            loss_history.append(avg_loss)

            # Print loss every few epochs for monitoring
            if epoch % 10 == 0:
                print(f'Epoch {epoch}, Loss: {avg_loss:.4f}')

        # Plot the loss over epochs
        plt.plot(loss_history)
        plt.title('Loss over Epochs')
        plt.xlabel('Epochs')
        plt.ylabel('Loss')
        plt.grid()
        plt.show()
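`backward()` relies on a property of softmax followed by cross-entropy. The gradient of the average loss with respect to the logits is `(y_pred - y_true) / m`. It follows that `dw = X.T @ (y_pred - y_true) / m` and `db` is the column sum of `(y_pred - y_true)` divided by `m`. The sketch below checks this formula against a central finite-difference estimate on random toy data. All function and variable names in it are hypothetical and independent of the lab's class.

```python
import numpy as np

def softmax(z):
    # Row-wise max subtraction keeps np.exp from overflowing
    exp_z = np.exp(z - z.max(axis=1, keepdims=True))
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def avg_cross_entropy(W, b, X, Y):
    # Mean cross-entropy between softmax(X @ W + b) and one-hot targets Y
    P = softmax(X @ W + b)
    return -np.mean(np.log(P[np.arange(X.shape[0]), Y.argmax(axis=1)]))

def analytic_grads(W, b, X, Y):
    # Closed-form gradients: the logit gradient (P - Y) averaged over the m samples
    P = softmax(X @ W + b)
    m = X.shape[0]
    return X.T @ (P - Y) / m, (P - Y).sum(axis=0) / m

def numeric_grad(f, param, eps=1e-6):
    # Central finite differences; perturbs `param` in place, then restores it
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        f_plus = f()
        param[idx] = original - eps
        f_minus = f()
        param[idx] = original
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(5, 4))                # 5 toy samples, 4 features
Y_chk = np.eye(3)[rng.integers(0, 3, size=5)]  # random one-hot labels, 3 classes
W_chk = rng.normal(scale=0.1, size=(4, 3))
b_chk = np.zeros(3)

def loss_at_current_params():
    return avg_cross_entropy(W_chk, b_chk, X_chk, Y_chk)

dW, db = analytic_grads(W_chk, b_chk, X_chk, Y_chk)
dW_numeric = numeric_grad(loss_at_current_params, W_chk)
db_numeric = numeric_grad(loss_at_current_params, b_chk)

print("max |dW - numeric|:", np.max(np.abs(dW - dW_numeric)))
print("max |db - numeric|:", np.max(np.abs(db - db_numeric)))
```

Both printed differences should be tiny. This confirms that dividing by the number of samples in the batch is what turns the summed gradient into the gradient of the *average* loss.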

### Train the Neural Network

Specify the number of epochs, the batch size (used only by mini-batch gradient descent), and the type of gradient descent (`gd_type`).

In [None]:
# Create and train the neural network
input_size = X_train.shape[1]  # this is the number of features
output_size = y_train.shape[1]  # this is the number of classes
nn = SimpleNeuralNetwork(input_size, output_size, learning_rate=0.01)

# Train with mini-batch gradient descent, 10 samples per batch
nn.train(X=X_train, y=y_train, epochs=1000, batch_size=10, gd_type="mini-batch")
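To compare the three variants side by side without editing the class, here is a compact, self-contained sketch of the same softmax model trained three ways. Several details are assumptions made so the block runs on its own:

- It uses synthetic, well-separated data (three 2-D Gaussian blobs) rather than Iris.
- It uses NumPy's `default_rng` instead of `np.random.seed`.
- It records the full-dataset loss after each epoch, rather than averaging losses collected during the epoch, so the curves are directly comparable.

The helper `train_softmax` is hypothetical.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - z.max(axis=1, keepdims=True))  # row-wise shift for stability
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def full_loss(W, b, X, Y):
    # Average cross-entropy over the whole dataset
    P = softmax(X @ W + b)
    return -np.mean(np.log(P[np.arange(X.shape[0]), Y.argmax(axis=1)]))

def train_softmax(X, Y, gd_type, batch_size=10, epochs=50, lr=0.1, seed=42):
    """Hypothetical helper: softmax regression trained with the chosen gradient descent variant."""
    rng = np.random.default_rng(seed)
    m, n_features = X.shape
    W = rng.normal(scale=0.01, size=(n_features, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    # Samples per update implied by each variant
    step = {"batch": m, "stochastic": 1, "mini-batch": batch_size}[gd_type]
    history = []
    for _ in range(epochs):
        order = rng.permutation(m)  # reshuffle at the start of every epoch
        for start in range(0, m, step):
            idx = order[start:start + step]
            # Logit gradient, averaged over the actual (possibly smaller) batch
            G = (softmax(X[idx] @ W + b) - Y[idx]) / len(idx)
            W -= lr * X[idx].T @ G
            b -= lr * G.sum(axis=0)
        history.append(full_loss(W, b, X, Y))  # one full-dataset loss per epoch
    return history

# Synthetic stand-in for Iris: three 2-D Gaussian blobs, 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
blob_labels = np.repeat(np.arange(3), 30)
X_syn = centers[blob_labels] + rng.normal(scale=0.5, size=(90, 2))
Y_syn = np.eye(3)[blob_labels]

results = {kind: train_softmax(X_syn, Y_syn, kind) for kind in ("batch", "stochastic", "mini-batch")}
for kind, hist in results.items():
    print(f"{kind:>10}: loss after epoch 1 = {hist[0]:.3f}, after epoch {len(hist)} = {hist[-1]:.3f}")
```

This sketch reshuffles every epoch, a common choice for stochastic and mini-batch methods. The lab's `train()` instead shuffles once before its epoch loop.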

### Making Predictions
After training, we can use our neural network to make predictions. We call the `forward` method to run the data through the network. Then we take the argmax of each row of output probabilities (the index of the largest probability) to get the most likely class for each observation.

In [None]:
# Evaluate the model on the test set
y_pred = nn.forward(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)

accuracy = accuracy_score(y_test_classes, y_pred_classes)
print(f'Accuracy: {accuracy:.2f}')
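The prediction step is an argmax per row. Accuracy is the fraction of rows where the predicted index matches the true index. The sketch below uses made-up probabilities rather than real model output.

```python
import numpy as np

# Made-up predicted probabilities for 4 samples and 3 classes (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.4, 0.4, 0.2]])
# One-hot true labels for the same 4 samples
true_one_hot = np.eye(3)[[0, 2, 0, 0]]

pred_classes = probs.argmax(axis=1)         # index of the largest probability in each row
true_classes = true_one_hot.argmax(axis=1)  # recover integer labels from one-hot rows
accuracy = np.mean(pred_classes == true_classes)
print(pred_classes, accuracy)  # [0 2 1 0] 0.75
```

Note the tie in the last row. `np.argmax` returns the first maximal index, so ties resolve toward the lower class number.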

**How were your results? Not as good as you'd like?**

<font color='red'>**TRY IT**</font> &#x1f9e0;: Try changing the ***gradient descent type***, ***batch size***, or ***number of epochs***. Still not happy with your results? Change the ***learning rate*** and see what happens!