In [None]:
# @title
from IPython.display import display, HTML

display(HTML("""
<script>
const firstCell = document.querySelector('.cell.code_cell');
if (firstCell) {
  firstCell.querySelector('.input').style.pointerEvents = 'none';
  firstCell.querySelector('.input').style.opacity = '0.5';
}
</script>
"""))

html = """
<div style="display:flex; flex-direction:column; align-items:center; text-align:center; gap:12px; padding:8px;">
  <h1 style="margin:0;">👋 Welcome to <span style="color:#1E88E5;">Algopath Coding Academy</span>!</h1>

  <img src="https://raw.githubusercontent.com/sshariqali/mnist_pretrained_model/main/algopath_logo.jpg"
       alt="Algopath Coding Academy Logo"
       width="400"
       style="border-radius:15px; box-shadow:0 4px 12px rgba(0,0,0,0.2); max-width:100%; height:auto;" />

  <p style="font-size:16px; margin:0;">
    <em>Empowering young minds to think creatively, code intelligently, and build the future with AI.</em>
  </p>
</div>
"""

display(HTML(html))

## **1. Problem Statement**

**Objective**

The goal is to develop a neural network model that can accurately classify images of clothing items into their respective categories. Given a grayscale image of a clothing item, our model should predict which category it belongs to among 10 different classes.

**Dataset: Fashion MNIST**

Fashion MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

<div align="center">
  <img src="https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png" width="600"/>
</div>

**The 10 Classes:**

| Label | Description |
|-------|-------------|
| 0     | T-shirt/top |
| 1     | Trouser     |
| 2     | Pullover    |
| 3     | Dress       |
| 4     | Coat        |
| 5     | Sandal      |
| 6     | Shirt       |
| 7     | Sneaker     |
| 8     | Bag         |
| 9     | Ankle boot  |

**Dataset Properties:**
- **Training images:** 60,000
- **Test images:** 10,000
- **Image size:** 28x28 pixels
- **Color:** Grayscale (1 channel)
- **Pixel values:** 0-255 (0 = black, 255 = white)
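
The raw pixel range of 0-255 is often rescaled to 0-1 before training. This notebook feeds the raw values to the network, so the snippet below is only a sketch of that optional step, using a made-up 2x2 array (`raw`) rather than a real image.

```python
import numpy as np

# Hypothetical 2x2 "image" with raw 8-bit pixel intensities
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to float and scale into the range [0, 1]
scaled = raw.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())
```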

**Methodology**

To address this problem, we will build a multi-layer neural network in PyTorch to perform image classification, a supervised machine learning task.

**Tools**
- **NumPy:** A library for scientific computing, mainly involving linear algebra operations.
- **pandas:** A library for working with tabular data; used here to read the dataset from CSV files.
- **Matplotlib:** A library for plotting and visualizing data.
- **PyTorch:** A deep learning library for building and training neural networks, with GPU support.
- **torchvision:** PyTorch's computer vision library for datasets and transformations.

---

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}")

In [None]:
# Check if CUDA is available

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

if device == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("GPU not available, using CPU")

In [None]:
# Hyperparameters

batch_size = 64
num_epochs = 10
learning_rate = 0.001

## **2. Loading and Exploring the Dataset**

In [None]:
# Load the Fashion-MNIST Train dataset

train_data = pd.read_csv('fashion-mnist_train.csv')
train_data

In [None]:
# Loading Train Labels

labels_train = torch.tensor(train_data['label'].to_list(), dtype = torch.long)
labels_train.shape

In [None]:
# Loading Train Images

images_train = train_data.drop(columns = ['label']).values
images_train = torch.tensor(images_train, dtype = torch.float32)
images_train = images_train.reshape(-1, 28, 28)
images_train.shape

**Custom Dataset Class**

In PyTorch, creating a custom `Dataset` class is a best practice for organizing data loading logic. It provides several benefits:

1.  **Organization:** It keeps data loading, preprocessing, and augmentation logic in one place, making the code cleaner and more maintainable.
2.  **Memory Efficiency:** Instead of loading the entire dataset into memory at once, you can load data lazily (on-demand) within the class. Fashion MNIST is small enough that the class below simply loads everything up front.
3.  **Standard Interface:** It allows your data to work seamlessly with PyTorch's `DataLoader`.

The two essential methods provide specific functionality:

*   **`__len__(self)`**: Returns the total number of items in the dataset. This allows the `DataLoader` to know how many samples are available and how many batches it can create.
*   **`__getitem__(self, idx)`**: Allows the dataset to be indexed like a list (e.g., `dataset[0]`). It retrieves a single sample and its corresponding label at the given index, which is crucial for the `DataLoader` to fetch mini-batches during training.

In [None]:
# Defining a custom Dataset class

class FashionDataset(Dataset):

    def __init__(self, csv_file):
        data = pd.read_csv(csv_file)
        self.labels = torch.tensor(data.iloc[:, 0].to_numpy(), dtype = torch.long)
        self.images = torch.tensor(data.iloc[:, 1:].to_numpy().reshape(-1, 28, 28), dtype = torch.float32)

    # Implementing the __len__ method
    def __len__(self):
        return len(self.labels)

    # Implementing the __getitem__ method
    def __getitem__(self, idx):
        image = self.images[idx]
        label = self.labels[idx]

        return image, label

In [None]:
# Using the FashionDataset class to load Train and Test datasets

train_dataset = FashionDataset('fashion-mnist_train.csv')
test_dataset = FashionDataset('fashion-mnist_test.csv')

print(f"CSV Training set size: {len(train_dataset)}")
print(f"CSV Test set size: {len(test_dataset)}")

In [None]:
# Check Train Images and Labels shapes
print("Train Images shape:", train_dataset.images.shape)
print("Train Labels shape:", train_dataset.labels.shape)

In [None]:
# Check Test Images and Labels shapes
print("Test Images shape:", test_dataset.images.shape)
print("Test Labels shape:", test_dataset.labels.shape)

In [None]:
# Accessing the first image and label using __getitem__ (indexing)
first_image, first_label = train_dataset[0]

# Visualizing the first image
plt.imshow(first_image, cmap = 'gray')
plt.title(f"Label: {first_label.item()}")
plt.show()

**Data Loaders**

A DataLoader wraps a dataset and provides:
- **Batching:** Groups multiple samples together for efficient training
- **Shuffling:** Randomizes the order of samples to improve learning
- **Parallel loading:** Can load data in background worker processes while the model trains (when `num_workers` is set above its default of 0)
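
To make batching concrete, here is a minimal sketch on toy data. The names `toy_images`, `toy_labels`, `toy_dataset`, and `toy_loader` are placeholders for illustration; `TensorDataset` is used only to avoid needing the CSV files.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real data: 10 random 28x28 "images" with labels 0-9
toy_images = torch.rand(10, 28, 28)
toy_labels = torch.arange(10)
toy_dataset = TensorDataset(toy_images, toy_labels)

# batch_size=4 over 10 samples gives batches of 4, 4 and 2
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

batch_sizes = []
for images, labels in toy_loader:
    batch_sizes.append(images.shape[0])
    print(images.shape, labels.shape)
```

Shuffling changes which samples land in each batch, not how many batches there are; the last batch is simply smaller when the dataset size is not a multiple of the batch size.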

In [None]:
# Create data loaders

train_loader = DataLoader(
    train_dataset,
    batch_size = batch_size,
    shuffle = True,           # Shuffle the training data
)

test_loader = DataLoader(
    test_dataset,
    batch_size = batch_size,
    shuffle = False,          # Don't shuffle test data
)

print(f"Number of training batches: {len(train_loader)}")
print(f"Number of test batches: {len(test_loader)}")

## **3. Checking Class Distribution**

In [None]:
labels = train_dataset.labels.numpy()  # All training labels as a NumPy array
unique, counts = np.unique(labels, return_counts = True)
print("Unique labels:", unique)
print("Counts:", counts)

**Balanced Dataset:** Each class has exactly 6,000 samples in the training set, making this a perfectly balanced dataset. This is ideal for training as the model won't be biased toward any particular class.
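
The same kind of balance check can be sketched directly in PyTorch with `torch.bincount`. The labels below are hypothetical (a tiny 3-class example), not the Fashion MNIST labels.

```python
import torch

# Hypothetical labels for a tiny 3-class dataset
toy_labels = torch.tensor([0, 2, 1, 0, 1, 2, 2, 0, 1])

# counts[i] is the number of samples whose label is i
counts = torch.bincount(toy_labels, minlength=3)
print(counts.tolist())  # [3, 3, 3]

# The dataset is balanced when every class has the same count
is_balanced = bool((counts == counts[0]).all())
print(is_balanced)
```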

## **4. Understanding Classification vs Regression**

In our previous notebook, we built a model to predict exam scores - a **regression** task. Now we're building a model to classify clothing items - a **classification** task. What's the difference?

**Regression vs Classification:**

| Aspect | Regression | Classification |
|--------|-----------|---------------|
| **Output Type** | Continuous numerical value | Discrete category/class |
| **Examples** | Predicting exam scores (0-100), house prices, temperature | Identifying clothing type, spam detection, disease diagnosis |
| **Previous Task** | Exam Score: 67.5, 89.2, 54.8, etc. | - |
| **Current Task** | - | Clothing Type: T-shirt, Trouser, Dress, etc. |
| **Output Range** | Any real number (e.g., -∞ to +∞) | Fixed set of categories (e.g., 0-9 for our 10 classes) |
| **Loss Function** | Mean Squared Error (MSE) | ? |
| **Activation Function** | Identity (linear) | ? |
| **Evaluation Metrics** | MSE | ? |

## **5. Implementing the Neural Network Model**

Now let's implement our multi-layer neural network using PyTorch. We'll create a class that inherits from `nn.Module`, just like we did with the Perceptron, but this time with multiple layers.

**Key Components:**
- `nn.Linear`: Holds the weights $W$ and biases $b$ and computes $y = xW^T + b$
- `nn.ReLU`: ReLU activation function; computes $\max(0, y)$ element-wise on the output $y$
- `forward()`: This is the "logic" hub. Each hidden layer's `nn.Linear` output is passed immediately through `nn.ReLU`; the final layer's output is returned without an activation.
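
As a small illustration of these components, the sketch below applies ReLU to a few hand-picked values and runs one toy linear layer. The sizes (4 inputs, 3 outputs) and the names `layer`, `x`, and `out` are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

# ReLU replaces negative values with 0 and leaves the rest unchanged
y = torch.tensor([-2.0, -0.5, 0.0, 1.5])
activated = nn.ReLU()(y)
print(activated.tolist())  # [0.0, 0.0, 0.0, 1.5]

# A toy linear layer: 4 input features -> 3 output features
layer = nn.Linear(4, 3)
x = torch.rand(2, 4)                      # batch of 2 samples

# nn.Linear computes x W^T + b; check it against the explicit formula
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))

out = torch.relu(layer(x))                # linear step followed by ReLU
print(out.shape)
```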

In [None]:
class FashionMNISTNet(nn.Module):
    
    def __init__(self):
        """
        Initialize the neural network architecture
        """
        super(FashionMNISTNet, self).__init__()
        
        # Input layer to Hidden layer 1
        # Input: 784 pixels (28x28), Output: 128 neurons
        self.fc1 = nn.Linear(28 * 28, 128)
        
        # Hidden layer 1 to Hidden layer 2
        # Input: 128 neurons, Output: 64 neurons
        self.fc2 = nn.Linear(128, 64)
        
        # Hidden layer 2 to Output layer
        # Input: 64 neurons, Output: 10 classes
        self.fc3 = nn.Linear(64, 10)
        
        # ReLU activation function
        self.relu = nn.ReLU()
        
    def forward(self, x):
        """
        Forward pass: define how data flows through the network
        
        Args:
            x: Input images (batch_size, 28, 28)
        
        Returns:
            Output logits (batch_size, 10)
        """
        # Flatten the image from (batch_size, 28, 28) to (batch_size, 784)
        x = x.reshape(-1, 28 * 28)
        
        # Layer 1: Linear transformation + ReLU activation
        x = self.fc1(x)      # (batch_size, 784) -> (batch_size, 128)
        x = self.relu(x)     # Apply ReLU activation
        
        # Layer 2: Linear transformation + ReLU activation
        x = self.fc2(x)      # (batch_size, 128) -> (batch_size, 64)
        x = self.relu(x)     # Apply ReLU activation
        
        # Output layer: Linear transformation (no activation here)
        # Softmax will be applied automatically by the loss function
        x = self.fc3(x)      # (batch_size, 64) -> (batch_size, 10)
        
        return x

In [None]:
# Create an instance of our model
model = FashionMNISTNet().to(device)  # Move model to GPU if available
model

In [None]:
# Count total parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

**Understanding the Parameter Count:**

Let's break down where all these parameters come from:

1. **Layer 1 (fc1):** 784 → 128
   - Weights: 784 × 128 = 100,352
   - Biases: 128
   - Total: 100,480 parameters

2. **Layer 2 (fc2):** 128 → 64
   - Weights: 128 × 64 = 8,192
   - Biases: 64
   - Total: 8,256 parameters

3. **Layer 3 (fc3):** 64 → 10
   - Weights: 64 × 10 = 640
   - Biases: 10
   - Total: 650 parameters

**Grand Total:** 100,480 + 8,256 + 650 = **109,386 parameters**

Each of these parameters will be learned during training to minimize our loss function!
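
The arithmetic above can be double-checked in a few lines of plain Python. The list `layer_sizes` is just the sequence of layer widths from this network.

```python
# Layer sizes taken from the network above: 784 -> 128 -> 64 -> 10
layer_sizes = [784, 128, 64, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out   # one weight per input-output pair
    biases = n_out           # one bias per output neuron
    print(f"{n_in} -> {n_out}: {weights + biases:,} parameters")
    total += weights + biases

print(f"Total: {total:,}")   # Total: 109,386
```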

In [None]:
# Define the loss function
criterion = nn.CrossEntropyLoss()
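
`nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, which is why the model's last layer has no activation. The sketch below checks this on hypothetical logits for 2 samples and 3 classes; the tensors are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw scores (logits) for 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# What nn.CrossEntropyLoss computes directly from logits
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# The same value by hand: log-softmax, pick the true class, negate, average
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(2), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```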

In [None]:
# Define the optimizer

optimizer = optim.Adam(model.parameters(), lr = learning_rate)

Now comes the exciting part - training our neural network! The training process is similar to what we did with the perceptron, but with some important differences:

**Training Loop Components:**

1. **Epochs:** Complete passes through the entire training dataset
2. **Batches:** Process multiple images at once (faster and more stable than one at a time)
3. **Forward Pass:** Feed data through the network to get predictions
4. **Loss Calculation:** Measure how wrong the predictions are
5. **Backward Pass:** Calculate gradients (how to adjust each parameter)
6. **Parameter Update:** Use optimizer to adjust weights and biases

**Why Train in Batches?**

Instead of using all 60,000 images at once (too memory-intensive) or one image at a time (too slow and unstable), we use **mini-batches** of 64 images:

- **Computational Efficiency:** GPUs are optimized for parallel processing
- **Memory Management:** Fits in GPU/CPU memory
- **Better Gradients:** Averaging over a batch gives more stable gradient estimates
- **Faster Convergence:** Updates happen more frequently than full-batch training
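
With the numbers used in this notebook, the batch count per epoch works out as follows (plain Python, no PyTorch needed):

```python
import math

num_samples = 60_000   # training images
batch_size = 64        # mini-batch size used in this notebook

# Number of batches (and therefore parameter updates) per epoch
batches_per_epoch = math.ceil(num_samples / batch_size)

# The final batch holds whatever is left over
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch, last_batch_size)  # 938 32
```

So each epoch performs 938 parameter updates, compared with a single update per epoch for full-batch training.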

In [None]:
train_losses = []
train_accuracies = []

for epoch in range(num_epochs):

    model.train()  # Set model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
    
    # Iterate through batches
    for batch_idx, (images, labels) in enumerate(train_loader):
        # Move data to device (GPU if available)
        images, labels = images.to(device), labels.to(device)
        
        # 1. Forward pass: compute predictions
        outputs = model(images)
        
        # 2. Calculate loss
        loss = criterion(outputs, labels)
        running_loss += loss.item()
        
        # 3. Backward pass: compute gradients
        optimizer.zero_grad()  # Clear gradients left over from the previous batch
        loss.backward()        # Compute new gradients
        
        # 4. Update parameters
        optimizer.step()
        
        # Calculate accuracy
        _, predicted = torch.max(outputs, 1)  # Get class with the highest score (logit)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    # Calculate average loss and accuracy for this epoch
    epoch_loss = running_loss / len(train_loader)
    epoch_accuracy = 100 * correct / total
    
    train_losses.append(epoch_loss)
    train_accuracies.append(epoch_accuracy)
    
    # Print progress
    print(f"Epoch [{epoch+1}/{num_epochs}] | "
            f"Loss: {epoch_loss:.4f} | "
            f"Accuracy: {epoch_accuracy:.2f}%")

print("="*70)
print("\nTraining complete!")

## **6. Evaluating on the Test Set**

Training accuracy tells us how well the model performs on data it has seen. But the real test is: **Can it generalize to new, unseen data?**

This is why we have a separate **test set** - 10,000 images the model has never seen during training.

**Key Concepts:**

1. **Generalization:** The ability to perform well on new data
2. **Overfitting:** When training accuracy is high but test accuracy is low (model memorized training data)
3. **Underfitting:** When both training and test accuracy are low (model is too simple)
4. **Good Fit:** When both training and test accuracy are high and similar

<div align="center">
  <img src="https://miro.medium.com/v2/resize:fit:1400/1*_7OPgojau8hkiPUiHoGK_w.png" width="600"/>
</div>

**What We're Measuring:**
- **Accuracy:** Percentage of correct predictions
- **Per-Class Performance:** How well the model performs on each clothing type
- **Confusion Matrix:** Where the model makes mistakes
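
Per-class accuracy and a confusion matrix can both be sketched with NumPy alone once true labels and predictions are available. The arrays below are hypothetical 3-class values chosen for illustration, not results from this model.

```python
import numpy as np

# Hypothetical true labels and predictions for a 3-class problem
true_labels = np.array([0, 0, 1, 1, 2, 2, 2])
predictions = np.array([0, 1, 1, 1, 2, 0, 2])
num_classes = 3

# Rows are true classes, columns are predicted classes
confusion = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(confusion, (true_labels, predictions), 1)
print(confusion)

# Diagonal = correct predictions; row sums = samples per true class
per_class_accuracy = confusion.diagonal() / confusion.sum(axis=1)
print(per_class_accuracy)
```

Off-diagonal entries show which classes get mistaken for which: here one class-0 sample was predicted as class 1, and one class-2 sample as class 0.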

In [None]:
model.eval()  # Set model to evaluation mode
    
correct = 0
total = 0

all_predictions = []
all_labels = []

# Don't compute gradients during evaluation (saves memory and computation)
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device) # If GPU is available
        
        # Forward pass
        outputs = model(images)
        
        # Get predictions
        _, predicted = torch.max(outputs, 1)
        
        # Overall accuracy
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        
        # Store for confusion matrix
        all_predictions.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Calculate overall accuracy
overall_accuracy = 100 * correct / total
print(f"Test Accuracy: {overall_accuracy:.2f}%")

## **7. Conclusion and Key Takeaways**

Congratulations! You've successfully built, trained, and evaluated a multi-layer neural network for image classification. Let's summarize what we've learned:

**🎯 What We Accomplished:**

1. ✅ Loaded and explored the Fashion MNIST dataset (70,000 images)
2. ✅ Built a 3-layer neural network with 109,386 parameters
3. ✅ Trained the model using Cross-Entropy Loss and Adam optimizer
4. ✅ Achieved ~85-90% accuracy on unseen test data
5. ✅ Analyzed performance using confusion matrices and visualizations

**🔑 Key Concepts Learned:**

1. **Classification vs Regression:**
   - Classification predicts discrete categories
   - Requires different loss functions (Cross-Entropy) and activations (Softmax)

2. **Multi-Layer Neural Networks:**
   - Stack layers to learn hierarchical features
   - Use activation functions (ReLU) for non-linearity
   - More layers = more complex patterns can be learned

3. **Training Process:**
   - Forward pass → Loss calculation → Backward pass → Parameter update
   - Mini-batch training for efficiency
   - Monitoring loss and accuracy to track learning

4. **Evaluation:**
   - Test set measures generalization ability
   - Confusion matrix reveals where errors occur
   - Per-class accuracy shows strengths and weaknesses

**💡 Important Insights:**

- **Simple items** (Trousers, Bags, Sneakers) are easier to classify
- **Similar items** (T-shirt vs Shirt, Pullover vs Coat) get confused
- **Architecture matters:** More layers and neurons can improve performance, but they also increase training time and the risk of overfitting
- **Hyperparameters** (learning rate, batch size, epochs) significantly impact results

---

**🌟 You've now mastered the fundamentals of neural networks! Keep exploring and building more complex models!**

## **8. Challenge Exercises (Homework)**

Ready to test your understanding? Try these challenges:

**Challenge 1: Modify the Architecture**
- Change the network to have 4 layers instead of 3
- Try different neuron counts (e.g., 256 → 128 → 64 → 10)
- Compare the results with the original architecture

**Challenge 2: Experiment with Hyperparameters**
- Train for 20 epochs instead of 10
- Try different learning rates (0.0001, 0.01, 0.1)
- Change the batch size (32, 128, 256)

**Challenge 3: Analyze Specific Classes**
- Focus on the two classes with lowest accuracy
- Visualize 20 misclassified examples from these classes
- Can you spot patterns in why the model fails?

**Challenge 4: Save and Load the Model**
```python
# Save the model
torch.save(model.state_dict(), 'fashion_mnist_model.pth')

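# Load the model (map_location lets a GPU-saved file load on a CPU-only machine)
loaded_model = FashionMNISTNet().to(device)
loaded_model.load_state_dict(torch.load('fashion_mnist_model.pth', map_location = device))
loaded_model.eval()  # Switch to evaluation mode before making predictions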
```

**Challenge 5: Try Different Optimizers**
- Replace Adam with SGD with momentum
- Compare training curves and final accuracy
- Which optimizer works better for this task?
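
As a starting point for this challenge, the optimizer swap might look like the sketch below. `toy_model` is a stand-in for the notebook's `FashionMNISTNet` instance, and `lr=0.01` with `momentum=0.9` are assumed starting values, not tuned results.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model; in the notebook this would be the FashionMNISTNet instance
toy_model = nn.Linear(784, 10)

# lr=0.01 and momentum=0.9 are assumed starting values, not tuned results
sgd_optimizer = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)
print(sgd_optimizer)
```

The rest of the training loop stays the same, since both optimizers expose `zero_grad()` and `step()`.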

---

Good luck, and happy coding! 🚀