# <font color = 'orange'>**MultiClass Classification using Mini Batch Gradient Descent with PyTorch**</font>

In this project, I’m building a multiclass classification model using PyTorch and **Mini-Batch Gradient Descent**.

- **Data Generation**: I’m generating a synthetic dataset using `make_classification` with:
  - **1000 samples**, each with **5 features**.
  - **3 classes** for classification.
  - **4 informative features** and **1 redundant feature** (redundant means it is a linear combination of the informative features).
  - **random_state=0** to keep things reproducible.

- **Custom Dataset Class**: I’ll convert the data into PyTorch tensors and create a custom dataset class to handle mini-batch loading efficiently.

- **Mini-Batch Gradient Descent**: Here’s the five-step training process I’ll use:
  1. **Initialize weights** randomly.
  2. **Forward pass**: Get predictions for each mini-batch.
  3. **Compute loss** for the mini-batch.
  4. **Backpropagate** to calculate gradients.
  5. **Update weights** based on the gradients from each mini-batch.

- **Training Loop**: I’ll repeat this over multiple epochs. Each epoch will process the dataset in mini-batches, updating the weights after every batch.

- **Hyperparameters**: I’ll fine-tune the learning rate, mini-batch size, and number of epochs based on performance during training.

Lastly, I’ll write functions for steps 1 to 4 to streamline the training process using **Mini-Batch Gradient Descent**.
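As a rough sketch of the five steps above, here is the update written out by hand for a single mini-batch. The toy data, the names `W`, `b`, and `lr`, and the choice of three update steps are all assumptions made for illustration; the notebook itself relies on `nn.Linear` and `torch.optim.SGD` instead.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical mini-batch: 8 samples, 5 features, 3 classes
x = torch.randn(8, 5)
y = torch.randint(0, 3, (8,))

# Step 1: initialize weights randomly (biases start at zero)
W = (0.05 * torch.randn(3, 5)).requires_grad_()
b = torch.zeros(3, requires_grad=True)

lr = 0.1  # assumed learning rate for this sketch
losses = []
for step in range(3):
    logits = x @ W.T + b               # Step 2: forward pass
    loss = F.cross_entropy(logits, y)  # Step 3: compute the loss
    loss.backward()                    # Step 4: backpropagate
    with torch.no_grad():              # Step 5: update the weights
        W -= lr * W.grad
        b -= lr * b.grad
        W.grad.zero_()                 # clear gradients before the next step
        b.grad.zero_()
    losses.append(loss.item())

print(losses)
```

In real training each iteration would draw a different mini-batch; repeating the same batch here just makes it easy to see the loss fall from step to step.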


In [None]:
# importing the make_classification function from the sklearn.datasets module used to generate synthetic data
from sklearn.datasets import make_classification

# importing the Standard Scaler for standardizing features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler

# Importing the main torch library for constructing Neural Networks
import torch

# Import the torch.nn module which contains pre-defined layers, loss functions etc for Neural Networks
import torch.nn as nn

# Importing the torch.optim module from torch which contains various optimization algorithms like SGD, Adam, etc
import torch.optim as optim

# Importing the torch.nn.functional module, which contains functional forms of layers, loss functions, and other operations
import torch.nn.functional as F

# Import the DataLoader and Dataset classes from torch.utils.data; DataLoader helps us with batching, shuffling, and loading data in parallel.
# Dataset provides an abstract interface for easier data manipulation and handling.
from torch.utils.data import DataLoader, Dataset


In [None]:
# Generate a synthetic dataset for classification using make_classification function.
# Parameters:
# - n_samples=1000: The total number of samples in the generated dataset.
# - n_features=5: The total number of features for each sample.
# - n_classes=3: The number of classes for the classification task.
# - n_informative=4: The number of informative features, i.e., features that are actually useful for classification.
# - n_redundant=1: The number of redundant features, i.e., features that can be linearly derived from informative features.
# - random_state=0: The seed for the random number generator to ensure reproducibility.

X, y = make_classification(n_samples=1000, n_features=5, n_classes=3, n_informative=4, n_redundant=1, random_state=0)

In [None]:
# Initializing the preprocessor for feature scaling
preprocessor = StandardScaler()

X = preprocessor.fit_transform(X) # Fit the scaler on the full feature matrix and transform it (this notebook does not make a train/test split)
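To make the scaling step concrete, the sketch below checks that `StandardScaler` matches the formula `(x - mean) / std` computed per column. The `demo` array is hypothetical data, not the notebook's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demo = rng.normal(loc=10.0, scale=3.0, size=(100, 2))  # hypothetical features

scaled = StandardScaler().fit_transform(demo)

# Same computation by hand: subtract the column mean, divide by the
# column standard deviation (population std, ddof=0)
manual = (demo - demo.mean(axis=0)) / demo.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every column has mean close to 0 and standard deviation close to 1, which keeps the features on a comparable scale for gradient descent.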

In [None]:
print(X.shape, y.shape)

(1000, 5) (1000,)


In [None]:
X[0:5]

array([[-0.39443436, -0.78033571, -0.25005511,  0.09118536, -0.5690698 ],
       [ 0.64284479, -0.95837057,  0.83598996, -0.08438568,  0.50539358],
       [ 0.99102498,  0.8580679 ,  0.78786062, -0.9114329 ,  1.62615938],
       [-0.96923966,  0.86168226, -1.31837608, -1.22844863, -0.07591589],
       [ 0.96021518,  0.99206623,  1.0026402 , -0.25339161,  1.18831784]])

In [None]:
y[0:10]

array([2, 0, 1, 2, 1, 1, 0, 2, 0, 0])

### <font color = 'orange'>**Input and True Label Observations**</font>
- <font color = 'orange'>**Inputs:**</font> Each row of X is one sample, and its five columns hold the standardized feature values for that sample.
- <font color = 'orange'>**True Labels:**</font> y is a one-dimensional array of 1000 labels, each 0, 1, or 2 depending on the sample's class.

In [None]:
# Convert the NumPy arrays X and y to PyTorch tensors for multiclass classification.
# X becomes a float tensor; y becomes a long (int64) tensor because CrossEntropyLoss expects class indices.
# Note: requires_grad=True is not needed on input features (only model parameters need gradients);
# it is the reason samples printed later carry a grad_fn.

x_tensor = torch.tensor(X, dtype=torch.float, requires_grad=True)
y_tensor = torch.tensor(y, dtype=torch.long)  # y needs to be long for CrossEntropyLoss



## <font color='orange'>**Custom PyTorch Dataset Class**</font>

In this part of the project, I’m creating a custom `Dataset` class to handle my training data efficiently. The class is designed to work with PyTorch’s DataLoader, which helps manage mini-batch loading during training.

- **`__init__`**: This is the constructor that initializes the dataset with the input features (`X`) and the labels (`y`). This allows me to easily pass the numpy arrays or tensors into the dataset object.
  
- **`__len__`**: This method returns the total number of samples in the dataset, which is simply the number of labels. PyTorch uses this to know how many data points exist when batching.
  
- **`__getitem__`**: This method is used to retrieve a specific sample (feature-label pair) by its index. It’s how PyTorch’s DataLoader will access individual data points when building batches during training.

By building this custom class, I can make sure that my data is easily accessible and ready for PyTorch’s training loop.


In [None]:
# Defining a custom PyTorch Dataset class for handling my data
class MyDataset(Dataset):
    # Constructor: initializes the dataset with features and labels
    def __init__(self, X, y):
        self.features = X
        self.labels = y

    # Method to return the length of the dataset
    def __len__(self):
        return self.labels.shape[0] # size of the first dimension, i.e. the number of samples

    # Method to get a data point by the index
    def __getitem__(self, index):
        x = self.features[index]
        y = self.labels[index]
        return x, y

## <font color='orange'>**Creating a Dataset Instance**</font>

Here, I’m creating an instance of my custom `MyDataset` class by passing in the feature (`x_tensor`) and label (`y_tensor`) tensors. This prepares my data to be used by PyTorch’s DataLoader for training.


In [None]:
train_dataset = MyDataset(X=x_tensor, y = y_tensor)

In [None]:
# Accessing the first element (a feature-label pair) of train_dataset using indexing.
# The __getitem__ method of the MyDataset class is called to return the element.
train_dataset[0]

(tensor([-0.3944, -0.7803, -0.2501,  0.0912, -0.5691],
        grad_fn=<SelectBackward0>),
 tensor(2))

- <font color = 'orange'>**Above we can see the element with the five feature values and the class label of 2**</font>
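For comparison, PyTorch's built-in `TensorDataset` gives the same length and indexing behavior when the data is already held in tensors. This is only a sketch of that alternative; `features` and `labels` are hypothetical stand-ins for `x_tensor` and `y_tensor`.

```python
import torch
from torch.utils.data import TensorDataset

features = torch.randn(10, 5)        # stand-in for x_tensor
labels = torch.randint(0, 3, (10,))  # stand-in for y_tensor

dataset = TensorDataset(features, labels)

print(len(dataset))   # number of samples
x0, y0 = dataset[0]   # indexing returns a (features, label) tuple
print(x0.shape, y0)
```

Writing the class by hand, as I did above, is still worthwhile when samples need per-item logic such as loading from disk or applying transforms.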

## <font color='orange'>**Creating a DataLoader**</font>

Now, I’m creating a `DataLoader` from my dataset, with a batch size of 16 and `shuffle=True` to randomize the order of samples during training. This helps in efficiently loading data in mini-batches.


In [None]:
train_loader = DataLoader(train_dataset, batch_size = 16, shuffle = True)
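With 1000 samples and a batch size of 16, the loader yields 63 batches per epoch: 62 full batches and a final batch of 8. The sketch below confirms this on a placeholder dataset of zeros (an assumption, used so the block runs on its own).

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset with the same shape as the notebook's data
dataset = TensorDataset(torch.zeros(1000, 5), torch.zeros(1000, dtype=torch.long))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

batch_sizes = [xb.shape[0] for xb, yb in loader]

print(len(loader), math.ceil(1000 / 16))  # 63 63
print(batch_sizes[-1])                    # 8
```

Passing `drop_last=True` to `DataLoader` would discard the short final batch if equal-sized batches were required.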

## <font color='orange'>**Defining the Model**</font>

This model is a single linear layer that takes a 5-dimensional input and outputs 3 values (one raw score, or logit, per class), which matches the number of classes in my classification task. Setting `bias=True` means a learnable bias term is added to each output.


In [None]:
model = nn.Linear(in_features = 5 , out_features =3, bias = True)
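To show what the layer computes, this sketch compares its output with the affine transformation `x @ W.T + b` written out by hand. The random `batch` is a hypothetical input of 16 samples.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=5, out_features=3, bias=True)
batch = torch.randn(16, 5)  # hypothetical mini-batch of 16 samples

out = layer(batch)
manual = batch @ layer.weight.T + layer.bias  # weight has shape (3, 5)

print(out.shape)  # one row of 3 logits per sample
print(torch.allclose(out, manual, atol=1e-5))
```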


## <font color='orange'>**Using CrossEntropy Loss**</font>

I’m using PyTorch’s built-in `CrossEntropyLoss`, which applies log-softmax to the raw model outputs (logits) and then computes the negative log-likelihood in one step. This means the model should output raw logits, and I don’t need to apply softmax separately.


In [None]:
loss_function = nn.CrossEntropyLoss()
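The sketch below illustrates that equivalence on two hand-picked logit rows (hypothetical values): the built-in loss, the `log_softmax` plus `nll_loss` composition, and a fully manual calculation all agree.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 2 samples and 3 classes, with their true classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

builtin = nn.CrossEntropyLoss()(logits, targets)
composed = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# By hand: mean of -log(probability assigned to the true class)
probs = F.softmax(logits, dim=1)
by_hand = -torch.log(probs[torch.arange(2), targets]).mean()

print(builtin.item(), composed.item(), by_hand.item())
```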

## <font color='orange'>**Custom Weight Initialization**</font>

This function initializes the weights and biases for `nn.Linear` layers. The weights are initialized with a normal distribution (mean=0, std=0.05), and the bias terms are set to zero. It ensures that each linear layer starts with a specific, controlled initialization.


In [None]:
def init_weights(layer):
    # Checking whether the layer is an nn.Linear layer
    if isinstance(layer, nn.Linear):
        # initialize the weights with a normal distribution, centered at 0 with a standard dev of 0.05
        torch.nn.init.normal_(layer.weight, mean = 0, std = 0.05)
        # initializing the bias terms to zero
        torch.nn.init.zeros_(layer.bias)
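As an illustration of how this function behaves with `model.apply`, the sketch below runs it on a hypothetical two-layer network (not the model used in this notebook). `apply` visits every submodule, and only the `nn.Linear` layers are re-initialized.

```python
import torch
import torch.nn as nn

def init_weights(layer):
    # Same idea as above: normal weights (std 0.05) and zero biases
    if isinstance(layer, nn.Linear):
        nn.init.normal_(layer.weight, mean=0.0, std=0.05)
        nn.init.zeros_(layer.bias)

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 3))  # hypothetical model
net.apply(init_weights)  # the ReLU is visited too, but skipped by the isinstance check

first = net[0]
print(first.bias.abs().sum().item())  # biases are all zero
print(first.weight.std().item())      # sample std close to 0.05
```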


## <font color='orange'>**Training Function**</font>

This function trains the neural network over a specified number of epochs using the provided model, loss function, and optimizer.

1. **Loop through epochs**: The training process repeats for the number of epochs defined.
2. **Loop through batches**: For each batch in the dataset:
    - Move the input and target tensors to the appropriate device (CPU/GPU).
    - Perform the forward pass to get predictions from the model.
    - Compute the loss based on the model's predictions and actual targets.
3. **Backpropagation**:
    - Zero the gradients, compute new gradients, and update the model parameters using the optimizer.
4. **Track performance**:
    - Calculate running loss and correct predictions for each batch.
    - Compute average loss and accuracy over the entire dataset for the epoch.
5. **Display metrics**: Print out the loss and accuracy for each epoch to monitor training progress.


In [None]:
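# Function to train a neural network model.
# Arguments: epochs, loss_function, learning_rate, model, and optimizer.
# Note: train_loader and device are read from the enclosing (global) scope, and the optimizer already holds the learning rate.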

def train(epochs, loss_function, learning_rate, model, optimizer):

    # Loop for each training cycle
    for epoch in range(epochs):
        # initializing variables to store the training loss and correct prediction count for each epoch
        running_train_loss = 0
        running_train_correct = 0
        # looping through every batch in training dataset using the train_loader
        for x, y in train_loader:
            # moving the input and target tensors to the device
            x = x.to(device)

            targets = y.to(device)

            # Calculating the forward pass
            output = model(x)

            # Calculating the loss
            loss = loss_function(output, targets)

            # Zero out the gradients from the previous iteration
            optimizer.zero_grad()
            # Backward pass to compute the gradients
            loss.backward()

            # updating the parameters
            optimizer.step()

            # Accumulating the loss for the batch

            running_train_loss += loss.item()

            # Evaluating performance without tracking gradients
            with torch.no_grad():
                y_pred = torch.argmax(output, dim = 1) # predicted class = index of the largest logit (softmax would not change the argmax)
                correct = (y_pred == targets).sum().item() # number of correct predictions in this batch
                running_train_correct += correct # accumulates the number of correct predictions

        # Epoch-level metrics are computed after the batch loop; inside it they would be partial running totals printed once per batch
        train_loss = running_train_loss/len(train_loader) # average of the per-batch losses for this epoch
        train_accuracy = running_train_correct / len(train_loader.dataset) # accuracy across the whole dataset; in this example we divide by 1000

        # Displaying the training loss and accuracy for the current epoch
        print(f'Epoch : {epoch + 1} / {epochs}')
        print(f'Train Loss: {train_loss:.4f} | Train Accuracy: {train_accuracy * 100:.4f}%')
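One detail about the epoch loss: dividing the summed batch losses by `len(train_loader)` averages the per-batch means, so a short final batch counts as much as a full one. The numbers below are hypothetical and only show how this differs from a sample-weighted average.

```python
# Hypothetical per-batch mean losses and the matching batch sizes
batch_losses = [0.9, 0.7, 0.5]
batch_sizes = [16, 16, 8]

# What the training function computes: a plain mean of the batch means
mean_of_batch_means = sum(batch_losses) / len(batch_losses)

# Alternative: weight each batch mean by its number of samples
sample_weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(round(mean_of_batch_means, 4))  # 0.7
print(round(sample_weighted, 4))      # 0.74
```

With 63 batches and only the last one short, the gap is small in this notebook, so the simpler average is a reasonable choice.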



## <font color='orange'>**Setting Up and Training the Model**</font>

1. **Random Seed**: Fixes the random seed for reproducibility.
2. **Epochs**: Sets the number of training epochs to 5.
3. **Device Configuration**: Detects if a GPU is available and uses it; otherwise defaults to CPU.
4. **Learning Rate**: Sets the learning rate to 1.
5. **Optimizer**: Configures the optimizer as SGD using the model’s parameters and the defined learning rate.
6. **Model to Device**: Moves the model to the appropriate device (CPU or GPU).
7. **Custom Weight Initialization**: Applies a custom weight initialization function across all layers of the model.
8. **Training Process**: Starts the training process with the specified epochs, loss function, model, and optimizer.
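For intuition about what the SGD optimizer does with the learning rate, here is a minimal sketch of one update on a toy parameter. The tensor `w`, the quadratic loss, and `lr=0.5` are assumptions chosen so the arithmetic is easy to follow. Plain SGD (no momentum) applies `w = w - lr * grad`.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # toy parameter
optimizer = torch.optim.SGD([w], lr=0.5)

loss = (w ** 2).sum()  # gradient of this loss is 2 * w
loss.backward()

expected = w.detach() - 0.5 * w.grad  # manual update: w - lr * grad
optimizer.step()

print(w.grad)      # tensor([ 2., -4.])
print(w.detach())  # matches `expected`
```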


In [None]:
# Fixing a random seed for reproducibility across different runs
torch.manual_seed(100)

# Defining the number of epochs
epochs = 5

# Use the GPU if one is available; otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

# defining the learning rate
learning_rate = 1

# Configuring the optimizer for SGD using the model parameters and lr = 1
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)

# moving the model to the appropriate device (GPU or CPU)
model.to(device)

# Apply custom weight initialization; this can affect the model's learning trajectory
# The `apply` function recursively applies a function to each submodule in a PyTorch model.
# In the given context, it's used to apply the `init_weights` function to initialize the weights of all layers in the model.
# The benefit is that it provides a convenient way to systematically apply custom weight initialization across complex models,
# potentially improving model convergence and performance.
model.apply(init_weights)

# Kicking off the training process using the specified settings
train(epochs, loss_function, learning_rate, model, optimizer)

cpu
Epoch : 1 / 5
Train Loss: 0.0171 | Train Accuracy: 0.8000%
Epoch : 1 / 5
Train Loss: 0.0340 | Train Accuracy: 1.4000%
Epoch : 1 / 5
Train Loss: 0.0516 | Train Accuracy: 2.3000%
Epoch : 1 / 5
Train Loss: 0.0647 | Train Accuracy: 3.4000%
Epoch : 1 / 5
Train Loss: 0.0784 | Train Accuracy: 4.3000%
Epoch : 1 / 5
Train Loss: 0.0915 | Train Accuracy: 5.3000%
Epoch : 1 / 5
Train Loss: 0.1047 | Train Accuracy: 6.3000%
Epoch : 1 / 5
Train Loss: 0.1179 | Train Accuracy: 7.2000%
Epoch : 1 / 5
Train Loss: 0.1336 | Train Accuracy: 7.9000%
Epoch : 1 / 5
Train Loss: 0.1438 | Train Accuracy: 9.2000%
Epoch : 1 / 5
Train Loss: 0.1515 | Train Accuracy: 10.6000%
Epoch : 1 / 5
Train Loss: 0.1642 | Train Accuracy: 11.7000%
Epoch : 1 / 5
Train Loss: 0.1767 | Train Accuracy: 12.7000%
Epoch : 1 / 5
Train Loss: 0.1896 | Train Accuracy: 13.9000%
Epoch : 1 / 5
Train Loss: 0.2069 | Train Accuracy: 14.6000%
Epoch : 1 / 5
Train Loss: 0.2194 | Train Accuracy: 15.8000%
Epoch : 1 / 5
Train Loss: 0.2293 | Train Accur

### <font color='orange'>**Final model parameters**</font>


In [None]:
# Output the learned parameters (weights and biases) of the model after training
for name, param in model.named_parameters():
  # Print the name and the values of each parameter
  print(name, param.data)


weight tensor([[ 0.4345, -0.9407, -0.5891, -0.4591,  0.7364],
        [ 0.0557,  1.0332,  0.1920,  0.4732,  0.0554],
        [-0.6367, -0.0987,  0.2159,  0.0453, -0.8418]])
bias tensor([-0.2508, -0.0254,  0.2762])
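As a sketch of how these parameters turn into a prediction, the block below copies the printed weights and bias, then classifies the first sample shown earlier (feature values rounded as printed, true label 2). The variable names are assumptions; in practice I would simply call `model(x)`.

```python
import torch

# Learned parameters, copied from the printed output above
W = torch.tensor([[ 0.4345, -0.9407, -0.5891, -0.4591,  0.7364],
                  [ 0.0557,  1.0332,  0.1920,  0.4732,  0.0554],
                  [-0.6367, -0.0987,  0.2159,  0.0453, -0.8418]])
b = torch.tensor([-0.2508, -0.0254,  0.2762])

# First sample of the dataset (rounded values from train_dataset[0])
x0 = torch.tensor([-0.3944, -0.7803, -0.2501, 0.0912, -0.5691])

logits = W @ x0 + b                   # one raw score per class
probs = torch.softmax(logits, dim=0)  # class probabilities
pred = int(torch.argmax(logits))      # predicted class index

print(logits)
print(probs)
print(pred)  # 2, which matches the true label of this sample
```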
