<style>
    /* Main container style */
    .note-box {
        background-color: #1e1e2e;       /* Dark Blue-Grey Background */
        color: #cdd6f4;                  /* Soft White Text */
        border-left: 6px solid #89b4fa;  /* Blue Accent Border */
        border-radius: 8px;
        padding: 20px;
        margin: 20px 0;
        font-family: system-ui, -apple-system, sans-serif;
        line-height: 1.6;
        box-shadow: 0 4px 6px rgba(0, 0, 0, 0.2);
        
        /* FIXES: */
        box-sizing: border-box;          /* Includes padding in width calculation */
        max-width: 100%;                 /* Prevents box from exceeding screen width */
        overflow-wrap: break-word;       /* Forces long words to wrap */
        word-wrap: break-word;           /* Legacy support for wrapping */
    }
    
    /* Header style */
    .note-box h2 {
        color: #89b4fa;                  /* Blue Header */
        margin-top: 0;
        margin-bottom: 15px;
        font-size: 1.6rem;
        font-weight: 600;
        border-bottom: 1px solid #45475a;
        padding-bottom: 10px;
    }

    /* Important keywords */
    .note-box strong {
        color: #f9e2af;                  /* Soft Gold/Yellow for emphasis */
        font-weight: 600;
    }

    /* Inline code snippets */
    .note-box .code-inline {
        background-color: #313244;       /* Slightly lighter background */
        color: #f38ba8;                  /* Soft Red/Pink for code terms */
        padding: 2px 6px;
        border-radius: 4px;
        font-family: 'Menlo', 'Consolas', monospace;
        font-size: 0.9em;
        border: 1px solid #45475a;
        
        /* FIXES for code: */
        white-space: pre-wrap;           /* Allows code to wrap on multiple lines */
        word-break: break-word;          /* Breaks long variables if necessary */
    }

    /* Lists */
    .note-box ul {
        padding-left: 20px;
        margin: 10px 0;
    }
    .note-box li {
        margin-bottom: 8px;
    }
</style>
<div class="note-box">
  <h2>1. Setup and Device Configuration</h2>
  <p>
    Before defining our models, we configure the environment to use hardware acceleration. This is crucial for training deep networks efficiently.
  </p>
  <ul>
    <li><span class="code-inline">torch.backends.mps.is_available()</span>: Checks specifically for <strong>Apple Silicon (M1/M2/M3/M4)</strong> GPUs. Using <span class="code-inline">"mps"</span> (Metal Performance Shaders) significantly speeds up training on Macs compared to CPU.</li>
    <li><span class="code-inline">torchsummary</span>: A helper library to visualize the model architecture, output shapes, and parameter counts layer-by-layer.</li>
  </ul>
</div>

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [3]:
# Check for 'torchsummary' to print model details
try:
    from torchsummary import summary
except ImportError:
    print("torchsummary not found. You can install it via: pip install torchsummary")
    summary = None

# --- Device Configuration for Mac M4 ---
# This checks for Apple's Metal Performance Shaders (MPS) first.
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Success: Using Apple MPS (Metal Performance Shaders) acceleration.")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA.")
else:
    device = torch.device("cpu")
    print("Using CPU.")



Success: Using Apple MPS (Metal Performance Shaders) acceleration.


<div class="note-box">
  <h2>2. LeNet-5 (1998)</h2>
  <p>
    Proposed by Yann LeCun, this is one of the earliest successful Convolutional Neural Networks, originally designed for <strong>handwritten digit recognition (MNIST)</strong>.
  </p>
  <p><strong>Key Architecture Features:</strong></p>
  <ul>
    <li><strong>Input:</strong> Grayscale images (1 channel); the 28x28 MNIST digits are padded to 32x32.</li>
    <li><strong>Convolution:</strong> Uses 5x5 filters with stride 1.</li>
    <li><strong>Pooling:</strong> Originally used Average Pooling (subsampling), which we implement here using <span class="code-inline">nn.AvgPool2d</span>.</li>
    <li><strong>Structure:</strong> A simple pattern of <em>Conv → Pool → Conv → Pool → Fully Connected</em>.</li>
  </ul>
  <p>
    This network is small enough to run quickly on a CPU, but it laid the foundation for modern Deep Learning.
  </p>
</div>
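
<div class="note-box">
  <p>
    The layer sizes above can be checked with the standard output-size formula for convolution and pooling layers. The sketch below is only an illustration: <span class="code-inline">out_size</span> is a hypothetical helper, not part of PyTorch, and the channel count of 16 is taken from the second convolution of the implementation in this notebook.
  </p>
</div>

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32
trace = [size]
# (kernel, stride) for Conv -> Pool -> Conv -> Pool as described above
for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:
    size = out_size(size, kernel, stride)
    trace.append(size)

print(trace)              # [32, 28, 14, 10, 5]
print(16 * size * size)   # 400 features enter the first fully connected layer
```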

<div class="note-box">
  <h2>🔬 Deep Dive: Why LeNet Worked?</h2>
  <p>
    LeNet-5 (1998) was revolutionary not just because it worked, but because it established the standard <strong>CNN paradigm</strong> that we still use today. Before this, most computer vision relied on "hand-crafted features" (algorithms written by humans to detect edges or corners).
  </p>
  <ul>
    <li><strong>Local Receptive Fields:</strong> Unlike a fully connected network, where every neuron connects to every pixel, each LeNet neuron connects only to a small 5x5 region of its input. This lets the early layers pick out elementary visual features (like edges and corners), which deeper layers combine into larger patterns.</li>
    <li><strong>Shared Weights (Parameter Sharing):</strong> If a feature detector (e.g., a vertical edge filter) is useful in the top-left corner of an image, it is likely useful in the bottom-right too. By sharing one set of weights across the entire image, LeNet detects a feature regardless of where it appears and drastically reduces the number of trainable parameters.</li>
    <li><strong>Sub-sampling (Pooling) &amp; Invariance:</strong> The pooling layers make the network robust to small shifts and distortions. If the digit "7" shifts one pixel to the left, the output of the pooling layer changes only slightly, making the model approximately "translation invariant."</li>
  </ul>
</div>
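
<div class="note-box">
  <p>
    To put a rough number on parameter sharing, the sketch below compares LeNet's first convolution with a hypothetical fully connected layer that maps the same 32x32 input to the same 6x28x28 output. Both helper functions are illustrative and not part of any library.
  </p>
</div>

```python
def conv_params(in_ch, out_ch, kernel):
    # One kernel x kernel filter per (input, output) channel pair, plus one bias per output channel
    return in_ch * out_ch * kernel * kernel + out_ch


def dense_params(in_features, out_features):
    # One weight per (input, output) pair, plus one bias per output
    return in_features * out_features + out_features


shared = conv_params(1, 6, 5)                    # weights shared across all positions
dense = dense_params(1 * 32 * 32, 6 * 28 * 28)   # hypothetical layer without sharing

print(f"conv1 with shared weights: {shared:,}")   # 156
print(f"dense layer, same shapes:  {dense:,}")    # 4,821,600
```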

In [4]:


# ==========================================
# 1. LeNet-5 Architecture
# ==========================================
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        # Input: 1 channel (grayscale), Output: 6 channels, Kernel: 5x5
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1)
        
        # Fully Connected Layers
        # 16 channels * 5 * 5 spatial dimension = 400
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
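        # Note: ReLU is a modern substitution; the original LeNet-5 used tanh-style activations.
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # Flatten to (batch, 16 * 5 * 5), keeping the batch dimension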
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x



<div class="note-box">
  <h2>3. AlexNet (2012)</h2>
  <p>
    Designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, this model won the ILSVRC 2012 competition by a large margin, sparking the modern "Deep Learning Boom."
  </p>
  <p><strong>Key Improvements over LeNet:</strong></p>
  <ul>
    <li><strong>Depth:</strong> Significantly deeper (8 layers).</li>
    <li><strong>ReLU Activation:</strong> Replaced Sigmoid/Tanh with <span class="code-inline">nn.ReLU</span>, mitigating the vanishing gradient problem and speeding up convergence.</li>
    <li><strong>Dropout:</strong> Implemented <span class="code-inline">nn.Dropout</span> in the fully connected layers to prevent overfitting.</li>
    <li><strong>Large Kernels:</strong> Starts with aggressive 11x11 convolutions to capture large features in high-resolution inputs.</li>
  </ul>
</div>
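
<div class="note-box">
  <p>
    A rough way to see why ReLU helps: during backpropagation the gradient is multiplied by the activation's derivative at every layer. The sketch below deliberately ignores the weight matrices and assumes the same hypothetical pre-activation value of 1.0 at each of 8 layers, so it only illustrates the trend.
  </p>
</div>

```python
import math


def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # never larger than 0.25


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # exactly 1 for any positive input


depth = 8              # AlexNet has 8 weight layers
pre_activation = 1.0   # assumed value, identical at every layer for simplicity

sigmoid_chain = sigmoid_grad(pre_activation) ** depth
relu_chain = relu_grad(pre_activation) ** depth

print(f"sigmoid derivative product over {depth} layers: {sigmoid_chain:.2e}")
print(f"ReLU derivative product over {depth} layers:    {relu_chain:.2e}")
```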

<div class="note-box">
  <h2>🔬 Deep Dive: The AlexNet Breakthrough</h2>
  <p>
    AlexNet (2012) didn't just add more layers; it introduced critical engineering solutions that made training deep networks possible.
  </p>
  <ul>
    <li><strong>The ReLU Revolution:</strong> Previous networks used <em>Sigmoid</em> or <em>Tanh</em> activations. These saturate, so they suffer from the <strong>Vanishing Gradient</strong> problem (gradients shrink toward zero as they backpropagate through many layers). <strong>ReLU (Rectified Linear Unit)</strong>, defined as $f(x) = \max(0, x)$, has a gradient of exactly 1 for positive inputs, which keeps gradients healthy. In the paper's CIFAR-10 experiment, a four-layer ReLU network reached 25% training error about six times faster than the same network with tanh units.</li>
    <li><strong>Dropout as Regularization:</strong> With roughly 60 million parameters, AlexNet was prone to "memorizing" the training data (overfitting). <strong>Dropout</strong> randomly zeroes each neuron in the first two fully connected layers with probability 0.5 during training. This forces the network to learn robust features that don't rely on the presence of specific other neurons.</li>
    <li><strong>Data Augmentation:</strong> The authors artificially expanded their dataset by mirroring images and extracting random crops. This taught the network that a "cat" is still a "cat" even if reflected or shifted.</li>
  </ul>
</div>
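
<div class="note-box">
  <p>
    The dropout rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the PyTorch implementation: the <span class="code-inline">dropout</span> function and its <span class="code-inline">rng</span> argument are hypothetical. It follows the "inverted" convention also used by <span class="code-inline">nn.Dropout</span>, where survivors are scaled by 1/(1-p) during training; the original paper instead multiplied outputs by 0.5 at test time.
  </p>
</div>

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)   # at inference time every neuron is kept unchanged
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)        # seeded generator so the run is repeatable
activations = [1.0] * 10

print(dropout(activations, p=0.5, rng=rng))    # a mix of 0.0 and 2.0
print(dropout(activations, training=False))    # unchanged: ten times 1.0
```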

In [5]:
# ==========================================
# 2. AlexNet Architecture
# ==========================================
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # Conv1
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv2
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv3
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Conv4
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Conv5
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x



<div class="note-box">
  <h2>4. VGG-16 (2014)</h2>
  <p>
    Proposed by the Visual Geometry Group (VGG) at Oxford, this architecture explored the relationship between network depth and performance.
  </p>
  <p><strong>The "Small Filter" Philosophy:</strong></p>
  <ul>
    <li>Instead of large filters (like AlexNet's 11x11), VGG uses exclusively <strong>3x3 convolutions</strong>.</li>
    <li><strong>Why?</strong> Stacking two 3x3 layers has the same "receptive field" as one 5x5 layer but with fewer parameters and more non-linearity (more ReLU layers).</li>
    <li><strong>Structure:</strong> It follows a very uniform pattern of Convolutional Blocks followed by Max Pooling.</li>
  </ul>
  <p>
    <em>Note: VGG-16 is very parameter-heavy (approx. 138 million parameters). About 103 million of them sit in the first dense layer, which maps 512 x 7 x 7 = 25,088 features to <span class="code-inline">4096 nodes</span>.</em>
  </p>
</div>
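
<div class="note-box">
  <p>
    The parameter figure in the note can be reproduced by hand. The sketch below assumes the channel plan of the VGG-16 implementation in this notebook (13 convolutions with 3x3 kernels, three dense layers, 1000 classes); the variable names are illustrative only.
  </p>
</div>

```python
# Channel plan of the 13 conv layers ("M" marks a 2x2 max-pool, which has no parameters)
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

conv_total, in_ch = 0, 3
for v in cfg:
    if v == "M":
        continue
    conv_total += 3 * 3 * in_ch * v + v   # 3x3 weights plus one bias per output channel
    in_ch = v

# (in_features, out_features) of the three dense layers
fc_shapes = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]
fc_counts = [i * o + o for i, o in fc_shapes]

total = conv_total + sum(fc_counts)
print(f"conv layers:    {conv_total:,}")                  # 14,714,688
print(f"first FC layer: {fc_counts[0]:,}")                # 102,764,544
print(f"total:          {total:,}")                       # 138,357,544
print(f"first FC share: {fc_counts[0] / total:.0%}")      # 74%
```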

<div class="note-box">
  <h2>🔬 Deep Dive: The "Small Filter" Philosophy of VGG</h2>
  <p>
    VGG challenged the idea that you need different filter sizes (11x11, 5x5, etc.) to capture features. It showed that you can build very deep, accurate networks using <strong>only 3x3 convolutions</strong>.
  </p>
  <p><strong>Why replace one large filter with multiple small ones?</strong></p>
  <ul>
    <li><strong>Effective Receptive Field:</strong> A stack of two 3x3 convolutions has the same "view" of the input image (receptive field) as a single 5x5 convolution. A stack of three 3x3s equals a 7x7.</li>
    <li><strong>More Non-Linearity:</strong> By using a stack of three 3x3 layers, we get three ReLU activation functions instead of just one (if we had used a single 7x7 layer). This allows the network to learn much more complex functions.</li>
    <li><strong>Parameter Efficiency:</strong> 
      <br>• One 7x7 filter has $7 \times 7 = 49$ weights per input-output channel pair.
      <br>• Three stacked 3x3 filters have $3 \times (3 \times 3) = 27$ weights per channel pair.
      <br><strong>Result:</strong> We get the same spatial reach with about 45% fewer weights!
    </li>
  </ul>
</div>
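
<div class="note-box">
  <p>
    Both claims, the matching receptive field and the smaller weight count, can be checked numerically. In the sketch below the helper functions and the channel width of 256 are hypothetical choices for illustration; biases are ignored and every layer is assumed to keep the same number of channels.
  </p>
</div>

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1   # each layer widens the view by (kernel - 1) pixels
    return rf


def stacked_weights(num_layers, kernel, channels):
    """Weight count (no biases) when every layer maps `channels` to `channels`."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # hypothetical channel width
for n, big in [(2, 5), (3, 7)]:
    assert stacked_receptive_field(n) == big
    small_w = stacked_weights(n, 3, C)
    big_w = stacked_weights(1, big, C)
    print(f"{n} x 3x3 vs 1 x {big}x{big}: {small_w:,} vs {big_w:,} weights ({small_w / big_w:.0%})")
```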

In [8]:
# ==========================================
# 3. VGG-16 Architecture
# ==========================================
class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 4
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 5
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x



<div class="note-box">
  <h2>5. Testing and Summary Generation</h2>
  <p>
    In this section, we instantiate the models and print their summaries.
  </p>
  <p><strong>A Note on "torchsummary" and MPS:</strong></p>
  <ul>
    <li>The <span class="code-inline">torchsummary</span> library generates dummy input data to calculate output shapes. It has no MPS option: the dummy input is created on the <strong>CPU</strong> unless CUDA is available.</li>
    <li>If we move our model to the MPS device before calling the summary, we will get a <span class="code-inline">RuntimeError</span> because the model (GPU) and the dummy input (CPU) are on different devices.</li>
    <li><strong>The Fix:</strong> We instantiate the model on the CPU first, run the summary, and <em>then</em> move the model to the device (<span class="code-inline">.to(device)</span>) for actual training.</li>
  </ul>
</div>
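
<div class="note-box">
  <p>
    If <span class="code-inline">torchsummary</span> is not installed, the parameter counts it would report for LeNet-5 can still be cross-checked by hand. The sketch below uses only the layer shapes from the <span class="code-inline">LeNet5</span> class in this notebook; the dictionary is just an illustrative way to lay out the arithmetic.
  </p>
</div>

```python
# weights + biases per layer, using the shapes defined in LeNet5
layers = {
    "conv1": 1 * 6 * 5 * 5 + 6,
    "conv2": 6 * 16 * 5 * 5 + 16,
    "fc1": 16 * 5 * 5 * 120 + 120,
    "fc2": 120 * 84 + 84,
    "fc3": 84 * 10 + 10,
}

for name, count in layers.items():
    print(f"{name:>5}: {count:>7,}")
print(f"total: {sum(layers.values()):,}")   # 61,706
```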

In [9]:
# ==========================================
# Execution / Testing
# ==========================================
if __name__ == "__main__":
    if summary:
        print("\n" + "="*30)
        print("Testing LeNet-5 (Input: 1x32x32)")
        # 1. Instantiate on CPU first
        lenet = LeNet5()
        # 2. Run summary with device="cpu" so the model and the dummy input share a device
        #    (torchsummary defaults to CUDA when it is available, which would mismatch)
        summary(lenet, (1, 32, 32), device="cpu")
        # 3. Move to the selected device (MPS, CUDA, or CPU) for actual training/usage
        lenet = lenet.to(device)

        print("\n" + "="*30)
        print("Testing AlexNet (Input: 3x227x227)")
        alexnet = AlexNet()
        summary(alexnet, (3, 227, 227), device="cpu")
        alexnet = alexnet.to(device)

        print("\n" + "="*30)
        print("Testing VGG-16 (Input: 3x224x224)")
        vgg = VGG16()
        summary(vgg, (3, 224, 224), device="cpu")
        vgg = vgg.to(device)
        
    else:
        print("Models defined successfully. Install 'torchsummary' to see output shapes.")


Testing LeNet-5 (Input: 1x32x32)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 6, 28, 28]             156
         AvgPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         AvgPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
Total params: 61,706
Trainable params: 61,706
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.24
Estimated Total Size (MB): 0.30
----------------------------------------------------------------

Testing AlexNet (Input: 3x227x227)
--------------------------------------

<div class="note-box">
  <h2>📊 Comparison: Evolution of Classic Architectures</h2>
  <table style="width:100%; text-align: left; border-collapse: collapse; color: #cdd6f4;">
    <thead>
      <tr style="border-bottom: 2px solid #89b4fa;">
        <th style="padding: 10px;">Feature</th>
        <th style="padding: 10px;">LeNet-5 (1998)</th>
        <th style="padding: 10px;">AlexNet (2012)</th>
        <th style="padding: 10px;">VGG-16 (2014)</th>
      </tr>
    </thead>
    <tbody>
      <tr style="border-bottom: 1px solid #45475a;">
        <td style="padding: 10px; color: #f9e2af;"><strong>Primary Input</strong></td>
        <td style="padding: 10px;">32x32 Grayscale</td>
        <td style="padding: 10px;">227x227 RGB</td>
        <td style="padding: 10px;">224x224 RGB</td>
      </tr>
      <tr style="border-bottom: 1px solid #45475a;">
        <td style="padding: 10px; color: #f9e2af;"><strong>Depth (Layers)</strong></td>
        <td style="padding: 10px;">5 (2 Conv, 3 FC)</td>
        <td style="padding: 10px;">8 (5 Conv, 3 FC)</td>
        <td style="padding: 10px;">16 (13 Conv, 3 FC)</td>
      </tr>
      <tr style="border-bottom: 1px solid #45475a;">
        <td style="padding: 10px; color: #f9e2af;"><strong>Filter Sizes</strong></td>
        <td style="padding: 10px;">5x5</td>
        <td style="padding: 10px;">11x11, 5x5, 3x3</td>
        <td style="padding: 10px;">Fixed 3x3</td>
      </tr>
      <tr style="border-bottom: 1px solid #45475a;">
        <td style="padding: 10px; color: #f9e2af;"><strong>Activation</strong></td>
        <td style="padding: 10px;">Sigmoid / Tanh (our code uses ReLU)</td>
        <td style="padding: 10px;">ReLU</td>
        <td style="padding: 10px;">ReLU</td>
      </tr>
      <tr>
        <td style="padding: 10px; color: #f9e2af;"><strong>Main Contribution</strong></td>
        <td style="padding: 10px;">Established the CNN paradigm, trained end-to-end with backprop</td>
        <td style="padding: 10px;">Proved Deep Learning works on large data (GPU + ReLU)</td>
        <td style="padding: 10px;">Showed depth matters; Standardized architecture blocks</td>
      </tr>
    </tbody>
  </table>
</div>