## 1. Import Required Libraries

We import PyTorch for building and training the neural network, torchvision for dataset loading, matplotlib for visualization, NumPy for seeding, and `os` for checking the saved model file.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt
import numpy as np
import os

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check for GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
print(f'PyTorch version: {torch.__version__}')

## 2. Dataset Preparation

### Data Loading

Fashion-MNIST consists of:
- **Training set:** 60,000 images
- **Test set:** 10,000 images
- **Image size:** 28×28 pixels (grayscale)

### Data Preprocessing

We apply two transformations:
1. **ToTensor()**: Converts PIL images to PyTorch tensors and scales pixel values to the range [0, 1]
2. **Normalize()**: Normalizes pixel values using dataset-specific statistics
   - Mean: 0.2913
   - Std: 0.3552
   - These values were computed from the training set

**Why normalize?** Normalization centers the data around zero and scales it to roughly unit variance, which:
- Typically speeds up training convergence
- Keeps activations in a range that reduces the risk of vanishing or exploding gradients
- Improves numerical stability during training
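
The effect of `Normalize()` can be checked numerically. The sketch below uses random stand-in data rather than Fashion-MNIST itself; the helper name `pixel_stats` and the fake batch are illustrative assumptions. On the real dataset, the same statistics would be computed over the training images after scaling them to [0, 1].

```python
import torch

def pixel_stats(images):
    """Return (mean, std) over every pixel of a float tensor scaled to [0, 1]."""
    return images.mean().item(), images.std().item()

torch.manual_seed(0)
# Stand-in for a stack of ToTensor() outputs: shape (N, 1, 28, 28), values in [0, 1)
fake_images = torch.rand(1000, 1, 28, 28)

mean, std = pixel_stats(fake_images)
# Same arithmetic that transforms.Normalize((mean,), (std,)) applies per channel
normalized = (fake_images - mean) / std

print(f'before: mean={mean:.4f}, std={std:.4f}')
print(f'after:  mean={normalized.mean().item():.4f}, std={normalized.std().item():.4f}')
```

After normalization the mean should be close to 0 and the standard deviation close to 1, which is the property the bullets above rely on.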

In [None]:
# Data transformation pipeline
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.2913,), (0.3552,))  # Fashion-MNIST specific statistics
])

# Load training dataset
train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

# Load test dataset
test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

print(f'Training samples: {len(train_dataset):,}')
print(f'Test samples: {len(test_dataset):,}')
print(f'Total samples: {len(train_dataset) + len(test_dataset):,}')
print(f'Image shape: {train_dataset[0][0].shape}')

### Create Data Loaders

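**DataLoaders** handle batching and shuffling:
- **Batch size = 64**: Process 64 images per gradient step, balancing throughput against memory use
- **Shuffle = True** (training): Randomizes the sample order each epoch so that gradient updates do not depend on the order of the dataset
- **Shuffle = False** (testing): Keeps evaluation order fixed; order does not affect the metrics, so shuffling is unnecessary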
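
As a quick sanity check on the batch counts printed by the next cell, the arithmetic can be sketched in plain Python. The helper `batch_layout` is a hypothetical name, and the sketch assumes the `DataLoader` default of `drop_last=False`, so the final, smaller batch is kept.

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch) when drop_last=False."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_layout(60_000, 64))  # (938, 32)
print(batch_layout(10_000, 64))  # (157, 16)
```

Because the last batch in each loader is smaller than 64, per-batch averages are not exactly equal to per-sample averages.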

In [None]:
batch_size = 64

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=2
)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=2
)

print(f'Batch size: {batch_size}')
print(f'Training batches: {len(train_loader)}')
print(f'Test batches: {len(test_loader)}')

### Visualize Sample Data

Let's examine a few samples to understand what we're working with.

In [None]:
# Class names for Fashion-MNIST
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Get a batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# Display first 10 images
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
fig.suptitle('Sample Training Images', fontsize=14, fontweight='bold')

for idx, ax in enumerate(axes.flat):
    # Denormalize for display
    img = images[idx].squeeze() * 0.3552 + 0.2913
    ax.imshow(img, cmap='gray')
    ax.set_title(f'{classes[labels[idx]]}')
    ax.axis('off')

plt.tight_layout()
plt.show()

## 3. Model Design

### Architecture: FashionNet

We implement a fully connected (dense) neural network with the following architecture:

```
Input (784) → FC1 (128) → ReLU → Dropout(0.2) →
              FC2 (64)  → ReLU → Dropout(0.2) →
              FC3 (10)  → Output
```

**Layer Breakdown:**

1. **Input Layer (784 neurons)**
   - Flattened 28×28 image = 784 pixels

2. **Hidden Layer 1 (128 neurons)**
   - Fully connected (Linear) layer
   - ReLU activation: f(x) = max(0, x)
   - Dropout (20%): Randomly zeros neurons during training to prevent overfitting

3. **Hidden Layer 2 (64 neurons)**
   - Fully connected layer
   - ReLU activation
   - Dropout (20%)

4. **Output Layer (10 neurons)**
   - One neuron per class
   - No activation: the layer outputs raw logits, and CrossEntropyLoss applies log-softmax internally

**Design Choices:**
- **ReLU activation:** Cheap to compute and less prone to vanishing gradients than sigmoid or tanh
- **Dropout:** Regularization technique to reduce overfitting
- **Layer sizes:** Progressively smaller (128 → 64 → 10) to compress the 784 input pixels toward the 10 class scores
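
The parameter count implied by this architecture can be worked out by hand. The following is a small sketch in plain Python (the helper `linear_params` is a hypothetical name); it counts one weight per connection plus one bias per output neuron, and assumes 4-byte float32 parameters for the size estimate.

```python
def linear_params(in_features, out_features):
    """Parameters in a fully connected layer: weights plus one bias per output."""
    return in_features * out_features + out_features

layer_sizes = [784, 128, 64, 10]
per_layer = [linear_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
print(f'{total * 4 / 1024:.1f} KB as float32')  # 427.3 KB as float32
```

Almost all of the parameters sit in the first layer, which is typical when a fully connected network consumes flattened images.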

In [None]:
class FashionNet(nn.Module):
    """
    Fully connected neural network for Fashion-MNIST classification.
    
    Args:
        dropout (float): Dropout probability (default: 0.2)
    """
    
    def __init__(self, dropout=0.2):
        super(FashionNet, self).__init__()
        
        # Define layers
        self.fc1 = nn.Linear(784, 128)  # Input to hidden 1
        self.fc2 = nn.Linear(128, 64)   # Hidden 1 to hidden 2
        self.fc3 = nn.Linear(64, 10)    # Hidden 2 to output
        
        # Activation and regularization
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, x):
        """
        Forward pass through the network.
        
        Args:
            x: Input tensor of shape (batch_size, 1, 28, 28)
        
        Returns:
            Output tensor of shape (batch_size, 10)
        """
        # Flatten image: (batch, 1, 28, 28) -> (batch, 784)
        x = x.view(-1, 784)
        
        # Layer 1: Linear -> ReLU -> Dropout
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        # Layer 2: Linear -> ReLU -> Dropout
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        
        # Output layer (no activation)
        x = self.fc3(x)
        
        return x

# Create model and move to device (CPU or GPU)
model = FashionNet(dropout=0.2).to(device)

# Print model architecture
print(model)
print(f'\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}')

# Calculate model size
param_size = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_size = sum(b.numel() * b.element_size() for b in model.buffers())
size_mb = (param_size + buffer_size) / 1024 / 1024
print(f'Model size: {size_mb:.2f} MB')

## 4. Training Setup

### Loss Function: CrossEntropyLoss

- Combines **log-softmax** and **negative log-likelihood** loss in a single step, so the model should output raw logits
- Standard choice for multi-class classification
- Measures how different predictions are from true labels
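
To illustrate the first bullet, the sketch below checks on made-up logits and labels that `nn.CrossEntropyLoss` gives the same value as applying `log_softmax` followed by `nll_loss`. The tensors are random placeholders, not model outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # made-up class indices

combined = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(combined.item(), manual.item())
print(torch.allclose(combined, manual))
```

This is why the network's output layer has no softmax of its own: adding one would apply the normalization twice.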

### Optimizer: Adam

- **Adaptive learning rate** optimizer
- Combines benefits of AdaGrad and RMSprop
- Learning rate: 0.001 (standard default)
- Automatically adjusts step sizes per parameter
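
The per-parameter step-size adjustment can be seen in a single scalar update. This is a simplified sketch of the Adam update rule in plain Python, not PyTorch's implementation; the function name `adam_step` is hypothetical, and the defaults mirror those of `optim.Adam` (`betas=(0.9, 0.999)`, `eps=1e-8`).

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Two parameters with very different gradient magnitudes
p_big, _, _ = adam_step(1.0, 10.0, 0.0, 0.0, t=1)
p_small, _, _ = adam_step(1.0, 0.1, 0.0, 0.0, t=1)

print(1.0 - p_big, 1.0 - p_small)
```

On the first step both parameters move by roughly the learning rate, even though their gradients differ by a factor of 100. Plain SGD would instead scale each step with the raw gradient.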

### Training Configuration

- **Epochs: 10** - Complete passes through the training set
- **Batch size: 64** - Number of samples per gradient update
- **Learning rate: 0.001** - Step size for weight updates

In [None]:
# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training configuration
num_epochs = 10

print('Loss function: CrossEntropyLoss')
print('Optimizer: Adam')
print(f"Learning rate: {optimizer.param_groups[0]['lr']}")  # read from the optimizer instead of hard-coding
print(f'Number of epochs: {num_epochs}')
print(f'Batch size: {batch_size}')

## 5. Training and Evaluation Functions

### Training Function (train_epoch)

Performs one complete pass through the training data:

1. **model.train()**: Puts the model in training mode, which activates dropout (and would make batch normalization layers, if present, use per-batch statistics)
2. **Forward pass**: Compute predictions
3. **Calculate loss**: Compare predictions to true labels
4. **Backward pass**: Compute gradients
5. **Update weights**: Apply optimizer step
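
The five steps above can be sketched on a single synthetic batch. Everything here is a stand-in: `toy_model` is a single linear layer rather than FashionNet, and the images and labels are random, so the loop only shows the mechanics of an update, not meaningful learning.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
toy_model = nn.Linear(784, 10)              # placeholder for FashionNet
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

images = torch.randn(64, 784)               # one synthetic, already-flattened batch
labels = torch.randint(0, 10, (64,))        # random class indices

toy_model.train()                           # 1. training mode
losses = []
for step in range(20):
    optimizer.zero_grad()                   # clear gradients from the previous step
    outputs = toy_model(images)             # 2. forward pass
    loss = criterion(outputs, labels)       # 3. calculate loss
    loss.backward()                         # 4. backward pass
    optimizer.step()                        # 5. update weights
    losses.append(loss.item())

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```

Repeating the update on the same batch drives its loss down, which is a handy smoke test that the gradient flow and optimizer are wired correctly.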

### Evaluation Function (evaluate)

Tests model performance on unseen data:

1. **model.eval()**: Disables dropout, uses all neurons
2. **torch.no_grad()**: Disables gradient computation (saves memory)
3. **Forward pass only**: No weight updates
4. **Calculate metrics**: Loss and accuracy
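
The difference between the two modes is easiest to see on a dropout layer by itself. The sketch below is a minimal illustration on a tensor of ones, using the same dropout probability (0.2) as the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.2)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # some entries zeroed, survivors scaled by 1 / (1 - 0.2) = 1.25

drop.eval()
eval_out = drop(x)    # dropout is a no-op in evaluation mode

print(train_out.unique())                       # only 0 and 1.25 occur
print((train_out == 0).float().mean().item())   # fraction dropped, near 0.2
print(torch.equal(eval_out, x))                 # True
```

The rescaling during training keeps the expected activation unchanged, so no extra correction is needed when dropout is switched off for evaluation.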

In [None]:
def train_epoch(model, loader, criterion, optimizer, device):
    """
    Train model for one epoch.
    
    Args:
        model: Neural network model
        loader: Training data loader
        criterion: Loss function
        optimizer: Optimization algorithm
        device: CPU or GPU
    
    Returns:
        avg_loss: Average training loss
        accuracy: Training accuracy (%)
    """
    model.train()  # Set model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
    
    for batch_idx, (images, labels) in enumerate(loader):
        # Move data to device
        images, labels = images.to(device), labels.to(device)
        
        # Zero gradients from previous step
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward pass
        loss.backward()
        
        # Update weights
        optimizer.step()
        
        # Calculate statistics (weight each batch-mean loss by its batch size,
        # since the final batch may be smaller than the others)
        running_loss += loss.item() * labels.size(0)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    avg_loss = running_loss / total
    accuracy = 100 * correct / total
    
    return avg_loss, accuracy


def evaluate(model, loader, criterion, device):
    """
    Evaluate model on test/validation set.
    
    Args:
        model: Neural network model
        loader: Test data loader
        criterion: Loss function
        device: CPU or GPU
    
    Returns:
        avg_loss: Average test loss
        accuracy: Test accuracy (%)
    """
    model.eval()  # Set model to evaluation mode
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():  # Disable gradient computation
        for images, labels in loader:
            # Move data to device
            images, labels = images.to(device), labels.to(device)
            
            # Forward pass only
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # Calculate statistics (weight each batch-mean loss by its batch size,
            # since the final batch may be smaller than the others)
            running_loss += loss.item() * labels.size(0)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    avg_loss = running_loss / total
    accuracy = 100 * correct / total
    
    return avg_loss, accuracy

print("✅ Training and evaluation functions defined")

## 6. Model Training

Now we train the model for the specified number of epochs.

**Training Loop:**
1. Train on entire training set (60,000 images)
2. Evaluate on test set (10,000 images)
3. Record metrics (loss and accuracy)
4. Print progress
5. Repeat for each epoch

**What to expect:**
- Training loss should decrease steadily
- Training accuracy should increase
- Test accuracy should reach ~87-89%
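- A small gap between training and test accuracy suggests good generalization (training accuracy is measured with dropout active, so it can even fall slightly below test accuracy)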

In [None]:
# Storage for metrics
train_losses = []
train_accs = []
test_losses = []
test_accs = []

print("=" * 70)
print("TRAINING FASHION-MNIST CLASSIFIER")
print("=" * 70)
print()

# Training loop
for epoch in range(num_epochs):
    # Train for one epoch
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    
    # Evaluate on test set
    test_loss, test_acc = evaluate(model, test_loader, criterion, device)
    
    # Store metrics
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    test_losses.append(test_loss)
    test_accs.append(test_acc)
    
    # Print progress
    print(f'Epoch [{epoch+1:2d}/{num_epochs}] | '
          f'Train Loss: {train_loss:.4f}, Acc: {train_acc:5.2f}% | '
          f'Test Loss: {test_loss:.4f}, Acc: {test_acc:5.2f}%')

print()
print("=" * 70)
print(f"✅ Training Complete!")
print(f"Final Test Accuracy: {test_accs[-1]:.2f}%")
print(f"Overfitting Gap: {train_accs[-1] - test_accs[-1]:.2f}%")
print("=" * 70)

## 7. Save Trained Model

We save the trained model weights to a file for:
- Submission requirements
- Future use without retraining
- Deployment or sharing

**Saved file:** `fashion_model.pth` (contains only the model weights: 109,386 float32 parameters, roughly 430 KB)

In [None]:
# Save the trained model
model_path = 'fashion_model.pth'
torch.save(model.state_dict(), model_path)

print(f"✅ Model saved to: {model_path}")
print(f"Final test accuracy: {test_accs[-1]:.2f}%")

# Verify file was created
if os.path.exists(model_path):
    file_size = os.path.getsize(model_path) / 1024  # KB
    print(f"Model file size: {file_size:.2f} KB")
else:
    print("⚠️  Warning: Model file not found!")

### Verify Model Can Be Loaded

Let's confirm the saved model can be loaded correctly and produces the same results.
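
The same save-and-reload round trip can be sketched in isolation. The example uses a tiny `nn.Linear` as a stand-in for FashionNet and a temporary directory with a hypothetical file name, so it leaves no files behind. Passing `map_location` is a precaution that allows a checkpoint saved on a GPU to load on a CPU-only machine.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
original = nn.Linear(4, 2)   # tiny stand-in for FashionNet
inputs = torch.randn(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tiny_model.pth')   # hypothetical file name
    torch.save(original.state_dict(), path)

    restored = nn.Linear(4, 2)   # same architecture, different random weights
    restored.load_state_dict(torch.load(path, map_location='cpu'))

original.eval()
restored.eval()
with torch.no_grad():
    same = torch.equal(original(inputs), restored(inputs))

print(same)  # True
```

Comparing outputs on a fixed input is a stricter check than comparing accuracy, since two different sets of weights could happen to reach the same accuracy.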

In [None]:
# Create a new model instance
loaded_model = FashionNet(dropout=0.2).to(device)

# Load the saved weights
loaded_model.load_state_dict(torch.load(model_path, map_location=device))
loaded_model.eval()

# Test the loaded model
test_loss, test_acc = evaluate(loaded_model, test_loader, criterion, device)

print(f"✅ Model loaded successfully")
print(f"Loaded model test accuracy: {test_acc:.2f}%")
print(f"Original model test accuracy: {test_accs[-1]:.2f}%")

if abs(test_acc - test_accs[-1]) < 0.01:
    print("✅ Verification passed: Model loads correctly!")
else:
    print("⚠️  Warning: Accuracy mismatch after loading")

## 8. Results Visualization

Visualizing training progress helps us understand:
- **Loss curves**: Should decrease over time
- **Accuracy curves**: Should increase over time
- **Train/test gap**: Small gap = good generalization, large gap = overfitting

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

epochs_range = range(1, num_epochs + 1)

# Loss plot
ax1.plot(epochs_range, train_losses, 'b-o', label='Train Loss', linewidth=2, markersize=6)
ax1.plot(epochs_range, test_losses, 'r-s', label='Test Loss', linewidth=2, markersize=6)
ax1.set_xlabel('Epoch', fontsize=12)
ax1.set_ylabel('Loss', fontsize=12)
ax1.set_title('Training and Test Loss', fontsize=14, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

# Accuracy plot
ax2.plot(epochs_range, train_accs, 'b-o', label='Train Accuracy', linewidth=2, markersize=6)
ax2.plot(epochs_range, test_accs, 'r-s', label='Test Accuracy', linewidth=2, markersize=6)
ax2.set_xlabel('Epoch', fontsize=12)
ax2.set_ylabel('Accuracy (%)', fontsize=12)
ax2.set_title('Training and Test Accuracy', fontsize=14, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print final metrics
print("\nFinal Metrics:")
print(f"  Training Loss: {train_losses[-1]:.4f}")
print(f"  Test Loss: {test_losses[-1]:.4f}")
print(f"  Training Accuracy: {train_accs[-1]:.2f}%")
print(f"  Test Accuracy: {test_accs[-1]:.2f}%")
print(f"  Overfitting Gap: {train_accs[-1] - test_accs[-1]:.2f}%")

## 9. Model Evaluation

### Sample Predictions

Let's visualize some predictions to see how well the model performs on individual images.

In [None]:
# Get a batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)
images, labels = images.to(device), labels.to(device)

# Make predictions
model.eval()
with torch.no_grad():
    outputs = model(images)
    _, predicted = torch.max(outputs, 1)

# Display predictions
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
fig.suptitle('Model Predictions on Test Images', fontsize=16, fontweight='bold')

for idx, ax in enumerate(axes.flat):
    # Move to CPU for display
    img = images[idx].cpu().squeeze() * 0.3552 + 0.2913  # Denormalize
    true_label = labels[idx].cpu().item()
    pred_label = predicted[idx].cpu().item()
    
    # Display image
    ax.imshow(img, cmap='gray')
    
    # Color code: green = correct, red = wrong
    color = 'green' if true_label == pred_label else 'red'
    ax.set_title(f'True: {classes[true_label]}\nPred: {classes[pred_label]}',
                 fontsize=10, color=color, fontweight='bold')
    ax.axis('off')

plt.tight_layout()
plt.show()

# Calculate batch accuracy
correct = (predicted == labels).sum().item()
batch_acc = 100 * correct / len(labels)
print(f"Batch accuracy: {batch_acc:.2f}% ({correct}/{len(labels)} correct)")

### Per-Class Accuracy

Let's analyze which clothing types the model handles best and worst.
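
One possible vectorized way to tally per-class results, shown here on a handful of made-up predictions, is `torch.bincount`. The helper name `per_class_counts` is hypothetical, and the sketch assumes integer class labels starting at 0.

```python
import torch

def per_class_counts(predicted, labels, num_classes):
    """Return (correct, total) counts per class as integer tensors."""
    total = torch.bincount(labels, minlength=num_classes)
    # Keep only the labels of correctly classified samples, then count them
    correct = torch.bincount(labels[predicted == labels], minlength=num_classes)
    return correct, total

labels = torch.tensor([0, 0, 1, 1, 2])
predicted = torch.tensor([0, 1, 1, 1, 0])

correct, total = per_class_counts(predicted, labels, num_classes=3)
print(correct.tolist())  # [1, 2, 0]
print(total.tolist())    # [2, 2, 1]
```

Accumulating these two tensors across batches would give the same tallies as a per-sample Python loop, without calling `.item()` on every element.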

In [None]:
# Calculate per-class accuracy
class_correct = [0] * 10
class_total = [0] * 10

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        
        for i in range(len(labels)):
            label = labels[i].item()
            class_correct[label] += (predicted[i] == labels[i]).item()
            class_total[label] += 1

# Print per-class results
print("\nPer-Class Accuracy:")
print("=" * 50)
print(f"{'Class':<15} {'Accuracy':<10} {'Correct/Total'}")
print("=" * 50)

for i in range(10):
    accuracy = 100 * class_correct[i] / class_total[i]
    print(f"{classes[i]:<15} {accuracy:>6.2f}%     {class_correct[i]:>4}/{class_total[i]:<4}")

print("=" * 50)
print(f"{'Overall':<15} {test_accs[-1]:>6.2f}%")
print("=" * 50)

## 10. Summary and Conclusions

### Project Summary

**Dataset:** Fashion-MNIST (70,000 grayscale images, 10 classes)  
**Model:** 3-layer fully connected neural network  
**Architecture:** 784 → 128 → 64 → 10  
**Training:** 10 epochs, Adam optimizer, batch size 64  
**Final Test Accuracy:** ~88%

### Key Achievements

✅ Successfully implemented and trained a neural network classifier  
✅ Achieved ~88% test accuracy on Fashion-MNIST, within the 87-89% range expected for this architecture  
✅ Minimal overfitting (small train/test gap)  
✅ Model saved and verified for submission  

### Technical Highlights

- **Custom normalization** using dataset-specific statistics (mean=0.2913, std=0.3552)
- **Dropout regularization** (20%) to prevent overfitting
- **ReLU activation** for non-linearity
- **Adam optimizer** for efficient training
- **CrossEntropyLoss** for multi-class classification

### Potential Improvements

1. **Convolutional Neural Networks (CNNs)**: Better for image data
2. **Data Augmentation**: Small rotations or horizontal flips to increase the effective variety of the training data
3. **Learning Rate Scheduling**: Decay learning rate over time
4. **Batch Normalization**: Stabilize training
5. **Ensemble Methods**: Combine multiple models
6. **Hyperparameter Tuning**: Optimize layer sizes, dropout, learning rate
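
As an illustration of item 3, a learning rate schedule could be attached to the existing optimizer setup roughly as follows. This is a sketch only: the placeholder model, the choice of `StepLR`, and the values `step_size=5` and `gamma=0.1` are assumptions, not settings used in this notebook.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # placeholder model
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Assumed schedule: multiply the learning rate by 0.1 every 5 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

lrs = []
for epoch in range(10):
    lrs.append(optimizer.param_groups[0]['lr'])
    # ... the per-epoch training pass would run here ...
    optimizer.step()     # no gradients in this toy loop, so parameters stay unchanged
    scheduler.step()     # advance the schedule once per epoch

print(lrs)
```

With these assumed settings the first five epochs run at 0.001 and the last five at 0.0001.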

---

**Contact:** originalmartin97@gmail.com  
**Date:** November 2025