### **State University of Campinas - UNICAMP** </br>
**Course**: MC886A </br>
**Professor**: Marcelo da Silva Reis </br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Deep Learning with PyTorch**
##### Notebook: 01 CNN
---

### **Table of Contents**

1. [**Objectives**](#objectives) </br>
2. [**Prerequisites**](#prerequisites) </br>
3. [**Basic Concept**](#basic-concept) </br>
  3.1. [**Deep Neural Network (DNN)**](#1-deep-neural-network-dnn) </br>
  3.2. [**Convolutional Neural Network (CNN)**](#2-convolutional-neural-network-cnn) </br>
  3.3. [Multi-Layer Perceptron (MLP)](#3-multi--layer-perceptron-mlp) </br>
4. [**Implementation of A CNN**](#implementation-of-a-cnn) </br>
5. [**Extending CNN with Transfer Learning**](#extending-cnn-with-transfer-learning) </br>
6. [**REFERENCES**](#references)

---

#### **Objectives**
- Understand where Convolutional Neural Networks (CNNs) come from and how they are built.
- Progress from the general concept of Deep Neural Networks to CNNs.
- Understand the basic concepts of Transfer Learning and practice fine-tuning.

---



#### **Prerequisites**
- Install PyTorch and some extra packages.
- Have Python and a Jupyter Notebook ready (great for interactive demos).

Installing PyTorch (the full setup is covered in `00-setup.ipynb` from hands-on 00):

- `pip install torch torchvision`

- `pip install nbformat`

- `pip install torchmetrics`

- To plot pretty graphs, you can use Plotly
`pip install plotly`

---

### **Basic Concept**

Let's start with Deep Neural Networks to establish the general picture. </br> </br>


#### 1. **Deep Neural Network (DNN)**

**Definition:**
A Deep Neural Network extends the MLP by incorporating multiple hidden layers, enabling the network to learn increasingly complex feature representations at each successive layer. This hierarchical representation learning is what gives deep learning its power to solve complex problems.

**Formula:**
For a DNN with $L$ hidden layers:

First hidden layer ($l=1$): For each neuron $h_j^{(1)}$:
$h_j^{(1)} = \text{activation}^{(1)}(\sum_{i=0}^{n} w_{ij}^{(1)}x_i)$

Hidden layers ($l=2$ to $L$): For each neuron $h_j^{(l)}$:
$h_j^{(l)} = \text{activation}^{(l)}(\sum_{i=0}^{m_{l-1}} w_{ij}^{(l)}h_i^{(l-1)})$

Output layer: For each output neuron $y_j$:
$y_j = \text{activation}^{(L+1)}(\sum_{i=0}^{m_L} w_{ij}^{(L+1)}h_i^{(L)})$

Where:
- $h_j^{(l)}$ is the output of the $j$-th neuron in the $l$-th hidden layer
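- $n$ is the number of input features and $m_l$ is the number of neurons in the $l$-th layer
- index $i=0$ denotes the constant bias input ($x_0 = 1$, $h_0^{(l)} = 1$), so $w_{0j}^{(l)}$ plays the role of the bias of neuron $j$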
- $w_{ij}^{(l)}$ is the weight from neuron $i$ in layer $l-1$ to neuron $j$ in layer $l$
- $\text{activation}^{(l)}$ is the activation function for layer $l$ (which may vary by layer)
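
To make the layer formula concrete, here is a minimal pure-Python sketch of a forward pass through one hidden layer and one output neuron. The helper `dense_layer` and the weight values in `W1` and `W2` are hypothetical, chosen by hand for illustration only; the bias is handled as the constant input $x_0 = 1$, as in the formulas above.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, activation):
    """Compute h_j = activation(sum_i w[i][j] * x_i), with x_0 = 1 as the bias input."""
    extended = [1.0] + list(inputs)  # prepend the constant bias input x_0 = 1
    n_out = len(weights[0])
    return [
        activation(sum(extended[i] * weights[i][j] for i in range(len(extended))))
        for j in range(n_out)
    ]

# Hypothetical weights (illustration only).
# Row i holds the weights leaving input i; row 0 is the bias row.
W1 = [[0.0, 0.5],    # bias -> h1, h2
      [1.0, -1.0],   # x1   -> h1, h2
      [1.0, 1.0]]    # x2   -> h1, h2
W2 = [[0.0],         # bias -> y
      [1.0],         # h1   -> y
      [-1.0]]        # h2   -> y

x = [1.0, 2.0]
h = dense_layer(x, W1, relu)      # hidden layer with ReLU
y = dense_layer(h, W2, sigmoid)   # output layer with sigmoid
print(h, y)
```

Stacking more calls to `dense_layer` between the input and the output gives a deeper network; in practice `nn.Linear` does the same computation with tensors.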

In [None]:
import plotly.graph_objects as go

# Shared configuration
node_size = 40
font_size = 12
layer_colors = {'input': '#636EFA', 'bias': '#00CC96',
               'hidden': '#FFA15A', 'output': '#EF553B',
               'activation': '#AB63FA'}

def create_network(nodes, edges, title):
    fig = go.Figure()
    legend_groups = set()

    # Create edges with weights
    for i, ((src, dest), weight) in enumerate(edges.items()):
        x0, y0 = nodes[src]['x'], nodes[src]['y']
        x1, y1 = nodes[dest]['x'], nodes[dest]['y']
        fig.add_trace(go.Scatter(
            x=[x0, x1, None], y=[y0, y1, None],
            line=dict(width=1, color='gray'),
            mode='lines',
            hoverinfo='text',
            text=f'Weight: {weight}',
            showlegend=i == 0,
            legendgroup='weights',
            name='Weights'
        ))

    # Create nodes with legend groups
    for node in nodes:
        # Determine node type from color
        node_type = [k for k, v in layer_colors.items() if v == node['color']][0]

        fig.add_trace(go.Scatter(
            x=[node['x']], y=[node['y']],
            mode='markers+text',
            marker=dict(size=node_size, color=node['color']),
            text=node.get('label', ''),
            textposition="top center",
            hoverinfo='text',
            hovertext=node.get('formula', ''),
            showlegend=node_type not in legend_groups,
            legendgroup=node_type,
            name=f'{node_type.capitalize()} Node'
        ))
        if node_type not in legend_groups:
            legend_groups.add(node_type)

    # Add activation functions as separate traces
    activation_added = False
    for node in nodes:
        if 'activation' in node:
            fig.add_trace(go.Scatter(
                x=[node['x'] + 0.25],  # Offset from node
                y=[node['y'] + 0.1],   # Vertical adjustment
                mode='text',
                text=node['activation'],
                textfont=dict(color=layer_colors['activation'], size=font_size),
                showlegend=not activation_added,
                legendgroup='activation',
                name='Activation Function'
            ))
            activation_added = True

    fig.update_layout(
        title=title,
        template='plotly_white',
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        margin=dict(l=20, r=150),
        legend=dict(
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            itemsizing='constant'
        )
    )
    return fig

In [None]:
# --------------------------------------------------------------------------
# 1. Deep Neural Network
# --------------------------------------------------------------------------
dnn_nodes = [
    {'x': 0, 'y': 1, 'label': '1', 'color': layer_colors['bias']},
    {'x': 0, 'y': 0, 'label': 'x₁', 'color': layer_colors['input']},
    {'x': 0, 'y': -1, 'label': 'x₂', 'color': layer_colors['input']},
    {'x': 1, 'y': 1.5, 'label': 'h₁₁', 'color': layer_colors['hidden'],
     'activation': 'ReLU(Σ)', 'formula': 'h₁₁ = ReLU(Σw¹ᵢxᵢ + b₁)'},
    {'x': 1, 'y': 0.5, 'label': 'h₁₂', 'color': layer_colors['hidden'],
     'activation': 'ReLU(Σ)', 'formula': 'h₁₂ = ReLU(Σw¹ⱼxⱼ + b₂)'},
    {'x': 1, 'y': -0.5, 'label': 'h₁₃', 'color': layer_colors['hidden'],
     'activation': 'ReLU(Σ)', 'formula': 'h₁₃ = ReLU(Σw¹ₖxₖ + b₃)'},
    {'x': 2, 'y': 1, 'label': 'h₂₁', 'color': layer_colors['hidden'],
     'activation': 'tanh(Σ)', 'formula': 'h₂₁ = tanh(Σw²ᵢh₁ᵢ + b₄)'},
    {'x': 2, 'y': -1, 'label': 'h₂₂', 'color': layer_colors['hidden'],
     'activation': 'tanh(Σ)', 'formula': 'h₂₂ = tanh(Σw²ⱼh₁ⱼ + b₅)'},
    {'x': 3, 'y': 0, 'label': 'y', 'color': layer_colors['output'],
     'activation': 'σ(Σ)', 'formula': 'y = sigmoid(Σw³ₖh₂ₖ + b₆)'}
]

dnn_edges = {
    (0,3): 'w¹₀₁', (0,4): 'w¹₀₂', (0,5): 'w¹₀₃',
    (1,3): 'w¹₁₁', (1,4): 'w¹₁₂', (1,5): 'w¹₁₃',
    (2,3): 'w¹₂₁', (2,4): 'w¹₂₂', (2,5): 'w¹₂₃',
    (3,6): 'w²₁₁', (3,7): 'w²₁₂',
    (4,6): 'w²₂₁', (4,7): 'w²₂₂',
    (5,6): 'w²₃₁', (5,7): 'w²₃₂',
    (6,8): 'w³₁',
    (7,8): 'w³₂'
}

fig4 = create_network(dnn_nodes, dnn_edges, "Deep Neural Network (DNN)")
fig4.show()

#### 2. **Convolutional Neural Network (CNN)**

**Definition:** A Convolutional Neural Network is a specialized type of neural network designed to process structured grid-like data, such as images or time-series data. CNNs are particularly effective for tasks like image classification, object detection, and facial recognition due to their ability to learn spatial hierarchies of features.

**Formula** (simplified for a single convolutional layer):
For a feature map $z$ at position $(i,j)$:

$z(i,j) = \text{activation}\left( \sum_m \sum_n w(m,n)\, x(i+m, j+n) + b \right)$

Where:
- $x$ is the input (e.g., image pixels)
- $w$ is the convolutional kernel
- $b$ is the bias
- $\text{activation}$ is typically ReLU

Followed by pooling (e.g., max pooling):

$p(i,j) = \max_{m,n \in \text{window}} z(i \cdot s + m, j \cdot s + n)$

Where $s$ is the stride.
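
The two formulas above can be sketched directly in pure Python. This is a minimal illustration for a single-channel input and a single filter (real convolutional layers also sum over input channels); the function names `conv2d_valid` and `max_pool2d`, the toy input, and the kernel values are assumptions made for this example.

```python
def conv2d_valid(x, w, b=0.0):
    """z(i,j) = ReLU(sum_m sum_n w[m][n] * x[i+m][j+n] + b), stride 1, no padding."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            max(0.0, sum(w[m][n] * x[i + m][j + n]
                         for m in range(kh) for n in range(kw)) + b)
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(z, window=2, stride=2):
    """p(i,j) = max over the window of z[i*stride + m][j*stride + n]."""
    out_h = (len(z) - window) // stride + 1
    out_w = (len(z[0]) - window) // stride + 1
    return [
        [
            max(z[i * stride + m][j * stride + n]
                for m in range(window) for n in range(window))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 5x5 input and a hypothetical 2x2 kernel (values chosen for illustration).
x = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [1, 2, 3, 4, 5]]
w = [[1, 0],
     [0, -1]]

z = conv2d_valid(x, w)   # 4x4 feature map (5 - 2 + 1 = 4)
p = max_pool2d(z)        # 2x2 map after 2x2 max pooling with stride 2
print(z)
print(p)
```

In PyTorch, `nn.Conv2d` followed by `torch.relu` and `nn.MaxPool2d` perform the same operations on batched, multi-channel tensors.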

**CNN is a type of Deep Neural Network (DNN)**. A DNN is defined as a neural network with multiple hidden layers, enabling complex feature learning. CNNs typically consist of multiple layers (convolutional, pooling, and fully connected), often numbering in the dozens or hundreds in modern architectures (e.g., VGG, ResNet).

A basic CNN consists of:

- An input layer (e.g., a 2D image).
- A convolutional layer with multiple filters producing feature maps.
- A pooling layer reducing spatial dimensions.
- A fully connected layer for output (e.g., classification).

#### **The plot below visualizes this architecture:**

##### **1. Input Layer (28×28×1)**
- **28×28**: Spatial dimensions of the input image (height × width)
- **×1**: Number of channels (grayscale image)
- **Example**: MNIST handwritten digits dataset uses 28x28 pixel monochrome images

---

##### **2. First Convolutional Layer (26×26×32)**
- **26×26**: Reduced spatial dimensions after convolution
  - Original 28x28 → 26x26 due to 3×3 filter (28 - 3 + 1 = 26)
- **×32**: Number of filters/feature maps
  - Each filter learns different spatial patterns

---

##### **3. First Pooling Layer (13×13×32)**
- **13×13**: Reduced spatial dimensions after max-pooling
  - 2×2 pooling with stride 2 (26/2 = 13)
- **×32**: Number of channels preserved
  - Pooling operates per channel independently

---

##### **4. Second Convolutional Layer (11×11×64)**
- **11×11**: Spatial dimensions after convolution
  - 13 - 3 + 1 = 11 (using 3×3 filters)
- **×64**: Increased number of filters
  - Deeper layers typically have more filters to capture complex patterns

---

##### **5. Second Pooling Layer (5×5×64)**
- **5×5**: Spatial dimensions after max-pooling
  - 11/2 = 5.5 → floor to 5 (common in pooling operations)
- **×64**: Number of channels preserved

---

##### **6. Flatten Layer**
- Converts 3D tensor (5×5×64) to 1D vector
- **5×5×64 = 1600 units**
  - 5 (height) × 5 (width) × 64 (channels) = 1600 elements

---

##### **7. Fully Connected Layers**
| Layer         | Units | Purpose                                                                 |
|---------------|-------|-------------------------------------------------------------------------|
| **Dense**     | 128   | Learns high-level patterns from flattened features                      |
| **Output**    | 10    | Final classification (e.g., 10 digits in MNIST) with softmax activation |

---

##### **Dimension Reduction Flow**
```text
Input → Conv → Pool → Conv → Pool → Flatten → Dense → Output
28×28×1 → 26×26×32 → 13×13×32 → 11×11×64 → 5×5×64 → 1600 → 128 → 10
```

##### **Key Operations**
1. **Convolution** (Conv2D):
   - Filter size: 3×3
   - Stride: 1 (no padding → reduces dimensions)
   - Activation: ReLU (σ = max(0,x))

2. **Max-Pooling**:
   - Window size: 2×2
   - Stride: 2 (halves spatial dimensions)

3. **Flatten**:
   - Prepares 3D features for dense layers

4. **Fully Connected** (Dense):
   - 128 units: Feature compression/abstraction
   - 10 units: Final classification (softmax activation)
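
The dimension arithmetic above follows one rule per spatial axis: `out = floor((in + 2*padding - kernel) / stride) + 1`. The sketch below applies it layer by layer; the helper names `conv_out` and `trace_shapes` and the layer-list format are assumptions made for this example, not part of any library.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_shapes(size, channels, layers):
    """Return the (height, width, channels) shape after the input and after each layer."""
    shapes = [(size, size, channels)]
    for kind, params in layers:
        if kind == "conv":
            size = conv_out(size, params["kernel"],
                            params.get("stride", 1), params.get("padding", 0))
            channels = params["filters"]       # one output channel per filter
        elif kind == "pool":
            size = conv_out(size, params["window"], params["stride"])
        shapes.append((size, size, channels))  # pooling keeps the channel count
    return shapes

def make_layers(padding):
    """Conv(3x3, 32) -> Pool(2x2) -> Conv(3x3, 64) -> Pool(2x2) with the given conv padding."""
    return [
        ("conv", {"kernel": 3, "filters": 32, "padding": padding}),
        ("pool", {"window": 2, "stride": 2}),
        ("conv", {"kernel": 3, "filters": 64, "padding": padding}),
        ("pool", {"window": 2, "stride": 2}),
    ]

# No padding, as in the architecture described above.
shapes = trace_shapes(28, 1, make_layers(padding=0))
h, w, c = shapes[-1]
flat = h * w * c

# Variant with padding=1, where each convolution preserves the spatial size.
padded_shapes = trace_shapes(28, 1, make_layers(padding=1))
ph, pw, pc = padded_shapes[-1]
padded_flat = ph * pw * pc

print(shapes, flat)
print(padded_shapes, padded_flat)
```

The flattened size matters because it must match the input size of the first fully connected layer: 1600 without padding, 3136 with `padding=1`.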

---

In [None]:
# --------------------------------------------------------------------------
# 2. Convolutional Neural Network
# --------------------------------------------------------------------------

import plotly.graph_objects as go
import numpy as np

layer_colors = {
    'input': '#636EFA',
    'conv': '#00CC96',
    'pool': '#FFA15A',
    'flatten': '#EF553B',
    'fc': '#AB63FA',
    'output': '#19D3F3',
    'activation': '#FF6692'
}

layer_descriptions = {
    'input': "Input Layer\n28×28×1 grayscale image",
    'conv': "Convolutional Layer\nApplies filters to extract spatial features",
    'pool': "Pooling Layer\nReduces spatial dimensions through subsampling",
    'flatten': "Flatten Layer\nConverts 3D features to 1D vector",
    'fc': "Fully Connected Layer\nLearns global patterns in features",
    'output': "Output Layer\nProduces classification probabilities"
}

def create_cnn_visualization():
    fig = go.Figure()

    # Define layer positions
    x_positions = [0, 2, 4, 6, 8, 10]
    layer_heights = [3, 2.4, 1.8, 1.2, 0.6, 0.3]
    layer_depths = [3, 2.4, 1.8, 1.2, 0.6, 0.3]

    legend_groups = set()

    # Create layers
    create_3d_layer(fig, x_positions[0], layer_heights[0], layer_depths[0],
                   'input', 'Input\n28×28×1', legend_groups,
                   layer_descriptions['input'])

    create_3d_layer(fig, x_positions[1], layer_heights[1], layer_depths[1],
                   'conv', 'Conv\n26×26×32', legend_groups,
                   layer_descriptions['conv'], 'ReLU')

    create_3d_layer(fig, x_positions[2], layer_heights[2], layer_depths[2],
                   'pool', 'Pool\n13×13×32', legend_groups,
                   layer_descriptions['pool'])

    create_3d_layer(fig, x_positions[3], layer_heights[3], layer_depths[3],
                   'conv', 'Conv\n11×11×64', legend_groups,
                   layer_descriptions['conv'], 'ReLU')

    create_3d_layer(fig, x_positions[4], layer_heights[4], layer_depths[4],
                   'pool', 'Pool\n5×5×64', legend_groups,
                   layer_descriptions['pool'])

    add_flatten_layer(fig, x_positions[5], legend_groups,
                     layer_descriptions['flatten'])

    add_fc_layers(fig, x_positions[5] + 2, legend_groups)

    add_layer_connections(fig, x_positions)

    # Configure layout
    fig.update_layout(
        title=dict(
            text="Convolutional Neural Network (CNN) Architecture",
            x=0.05,
            font=dict(size=24)
        ),
        template='plotly_white',
        margin=dict(l=20, r=300, t=100, b=20),
        legend=dict(
            title="Layer Types",
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            font=dict(size=12)
            ),
        width=1400,
        height=600,
        scene=dict(
            xaxis=dict(showgrid=False, zeroline=False, visible=False),
            yaxis=dict(showgrid=False, zeroline=False, visible=False),
            zaxis=dict(showgrid=False, zeroline=False, visible=False),
            aspectmode='manual',
            aspectratio=dict(x=2, y=1, z=0.5)
        ),
        annotations=[
            dict(
                x=1.05,
                y=0.9,
                xref='paper',
                yref='paper',
                text="<b>CNN Components:</b><br>"
                     "- Convolution: Feature extraction<br>"
                     "- Pooling: Dimensionality reduction<br>"
                     "- Flatten: 3D→1D conversion<br>"
                     "- Dense: Classification",
                showarrow=False,
                align='left',
                font=dict(size=14))
        ]
    )
    return fig

def create_3d_layer(fig, x_pos, height, depth, layer_type, label_text,
                   legend_groups, description, activation=None):
    # Create 3D box structure
    x = [x_pos, x_pos+1.2, x_pos+1.2, x_pos] * 2
    y = [-height/2, -height/2, height/2, height/2] * 2
    z = [0]*4 + [depth]*4

    # Add all edges with proper legend grouping
    edges = [(0,1), (1,2), (2,3), (3,0), (4,5), (5,6), (6,7), (7,4), (0,4), (1,5), (2,6), (3,7)]

    for i, (start, end) in enumerate(edges):
        show_legend = (layer_type not in legend_groups) and (i == 0)
        fig.add_trace(go.Scatter3d(
            x=[x[start], x[end]],
            y=[y[start], y[end]],
            z=[z[start], z[end]],
            mode='lines',
            line=dict(color=layer_colors[layer_type], width=2),
            showlegend=show_legend,
            name=layer_descriptions[layer_type].split('\n')[0],
            legendgroup=layer_type,
            hoverinfo='text',
            hovertext=description
        ))
    if layer_type not in legend_groups:
        legend_groups.add(layer_type)

    # Add activation function annotation
    if activation:
        fig.add_trace(go.Scatter3d(
            x=[x_pos+0.6],
            y=[height/2 + 0.3],
            z=[depth/2],
            mode='text',
            text=f'σ = {activation}',
            textfont=dict(color=layer_colors['activation'], size=14),
            showlegend='activation' not in legend_groups,
            name='Activation Function',
            legendgroup='activation',
            hoverinfo='text',
            hovertext=f"Non-linear activation function<br>({activation})"
        ))
        if 'activation' not in legend_groups:
            legend_groups.add('activation')

    # Add layer label
    fig.add_trace(go.Scatter3d(
        x=[x_pos+0.6],
        y=[-height/2 - 0.5],
        z=[0],
        mode='text',
        text=label_text,
        textfont=dict(color='black', size=14),
        hoverinfo='none',
        showlegend=False
    ))

def add_flatten_layer(fig, x_pos, legend_groups, description):
    y_values = np.linspace(-1, 1, 20)
    fig.add_trace(go.Scatter3d(
        x=[x_pos]*20,
        y=y_values,
        z=[0]*20,
        mode='lines',
        line=dict(color=layer_colors['flatten'], width=6),
        showlegend='flatten' not in legend_groups,
        name=layer_descriptions['flatten'].split('\n')[0],
        legendgroup='flatten',
        hoverinfo='text',
        hovertext=description
    ))
    legend_groups.add('flatten')

def add_fc_layers(fig, x_pos, legend_groups):
    fc_layers = [
        {'units': 128, 'label': 'Dense\n128', 'activation': 'ReLU'},
        {'units': 10, 'label': 'Output\n10', 'activation': 'Softmax'}
    ]

    for i, layer in enumerate(fc_layers):
        layer_type = 'output' if i == len(fc_layers)-1 else 'fc'
        x = x_pos + i*2

        # Add nodes
        y_values = np.linspace(-0.8, 0.8, layer['units'])
        fig.add_trace(go.Scatter3d(
            x=[x]*len(y_values),
            y=y_values,
            z=[0]*len(y_values),
            mode='markers',
            marker=dict(size=6, color=layer_colors[layer_type]),
            showlegend=layer_type not in legend_groups,
            name=layer_descriptions[layer_type].split('\n')[0],
            legendgroup=layer_type,
            hoverinfo='text',
            hovertext=layer_descriptions[layer_type]
        ))
        if layer_type not in legend_groups:
            legend_groups.add(layer_type)

        # Add activation annotation
        fig.add_trace(go.Scatter3d(
            x=[x + 0.3],
            y=[0.9],
            z=[0],
            mode='text',
            text=f'σ = {layer["activation"]}',
            textfont=dict(color=layer_colors['activation'], size=14),
            showlegend=False
        ))

        # Add layer label
        fig.add_trace(go.Scatter3d(
            x=[x],
            y=[-1.2],
            z=[0],
            mode='text',
            text=layer['label'],
            textfont=dict(color='black', size=14),
            showlegend=False
        ))

def add_layer_connections(fig, x_positions):
    operations = [
        ("Conv2D", "3×3 kernel, 32 filters", "Stride 1, ReLU"),
        ("MaxPool", "2×2 window", "Stride 2"),
        ("Conv2D", "3×3 kernel, 64 filters", "Stride 1, ReLU"),
        ("MaxPool", "2×2 window", "Stride 2"),
        ("Flatten", "3D→1D conversion", "5×5×64 → 1600"),
        ("Dense", "Fully connected", "1600 → 128 → 10")
    ]

    for i in range(len(x_positions)-1):
        x_start = x_positions[i] + 1.2
        x_end = x_positions[i+1]

        # Add connection line
        fig.add_trace(go.Scatter3d(
            x=np.linspace(x_start, x_end, 30),
            y=np.zeros(30),
            z=np.zeros(30),
            mode='lines',
            line=dict(color='gray', width=1),
            hoverinfo='text',
            hovertext=f"Operation: {operations[i][0]}<br>{operations[i][1]}<br>{operations[i][2]}",
            showlegend=False
        ))

        # Add operation label
        fig.add_trace(go.Scatter3d(
            x=[(x_start + x_end)/2],
            y=[0.5],
            z=[0],
            mode='text',
            text=operations[i][0],
            textfont=dict(size=12, color='black'),
            showlegend=False
        ))

# Generate and display the visualization
fig = create_cnn_visualization()
fig.show()

#### **Implementation of a CNN**

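CNNs excel at image tasks because they extract spatial features. Let's train one on MNIST too.

Note that the model below uses `padding=1` in both convolutions, so each convolution preserves the spatial size (28 → 28 → 14 → 14 → 7) and the flattened vector has 64 × 7 × 7 = 3136 elements, unlike the unpadded architecture visualized above (5 × 5 × 64 = 1600).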

In [None]:
# Install PyTorch (if not installed) together with Torchvision (vision/image operations) and Torchmetrics (metrics)
!pip install torch torchvision torchmetrics

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from tqdm import tqdm
import plotly.graph_objects as go

# Define the CNN model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Data loading
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_set, val_set = random_split(train_dataset, [55000, 5000])  # 55k train, 5k val
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Training setup
model = CNN()
# model = model.to(device)  # Move model to GPU if available
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
accuracy = Accuracy(task="multiclass", num_classes=10)

# Lists to store losses
train_losses = []
val_losses = []

num_epochs = 5

# Training loop
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    for images, labels in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
        # images, labels = images.to(device), labels.to(device)  # Move data to GPU
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    train_losses.append(train_loss / len(train_loader))

    # Validation
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            # images, labels = images.to(device), labels.to(device)  # Move data to GPU
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            accuracy.update(outputs, labels)
    val_losses.append(val_loss / len(val_loader))
    val_acc = accuracy.compute()
    print(f'Epoch {epoch+1}, Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Val Accuracy: {val_acc:.2f}')
    accuracy.reset()

# Test set evaluation
model.eval()
test_loss = 0.0
test_acc = 0.0
with torch.no_grad():
    for images, labels in test_loader:
        # images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        accuracy.update(outputs, labels)
    test_loss /= len(test_loader)
    test_acc = accuracy.compute()
    print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.2f}')

# Plot losses using Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=list(range(1, len(train_losses)+1)),
    y=train_losses,
    mode='lines+markers',
    name='Train Loss',
    line=dict(color='blue', width=2)
))

fig.add_trace(go.Scatter(
    x=list(range(1, len(val_losses)+1)),
    y=val_losses,
    mode='lines+markers',
    name='Validation Loss',
    line=dict(color='red', width=2)
))

fig.update_layout(
    title='CNN Training Progress',
    xaxis_title='Epoch',
    yaxis_title='Loss',
    template='plotly_white',
    legend=dict(x=0.8, y=0.9, bgcolor='rgba(255,255,255,0.5)'),
    margin=dict(l=40, r=20, t=40, b=20),
    hovermode='x unified'
)

fig.show()

In [None]:
import plotly.subplots as sp
import numpy as np

# Collect predictions from the test set
model.eval()
images_list = []
true_labels = []
pred_labels = []
correctness = []

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        images_list.append(images.cpu().numpy())
        true_labels.append(labels.cpu().numpy())
        pred_labels.append(predicted.cpu().numpy())
        correctness.append((predicted == labels).cpu().numpy())

# Concatenate all batches
images_list = np.concatenate(images_list, axis=0)
true_labels = np.concatenate(true_labels, axis=0)
pred_labels = np.concatenate(pred_labels, axis=0)
correctness = np.concatenate(correctness, axis=0)

# Select a subset to display (e.g., 25 images)
num_display = 25
indices = np.random.choice(len(images_list), num_display, replace=False)
selected_images = images_list[indices]
selected_true = true_labels[indices]
selected_pred = pred_labels[indices]
selected_correct = correctness[indices]

# Create a subplot grid
rows, cols = 5, 5
fig = sp.make_subplots(rows=rows, cols=cols, subplot_titles=[f"True: {t}, Pred: {p}" for t, p in zip(selected_true, selected_pred)])

for i in range(num_display):
    row = i // cols + 1
    col = i % cols + 1
    img = selected_images[i].squeeze()  # Remove channel dimension
    img = (img * 0.5 + 0.5)  # Denormalize to [0, 1]
    img = np.flipud(img)  # Flip vertically to correct orientation

    # Add image to subplot
    fig.add_trace(
        go.Heatmap(z=img, colorscale='gray', showscale=False),
        row=row, col=col
    )

    # Update title color based on correctness
    title_color = 'green' if selected_correct[i] else 'red'
    fig.layout.annotations[i].update(font=dict(color=title_color))

# Update layout
fig.update_layout(
    title_text="CNN MNIST Classification Results (Green: Correct, Red: Incorrect)",
    height=800,
    width=800,
    showlegend=False
)

# Remove axes for cleaner visualization
for i in range(1, num_display + 1):
    fig.update_xaxes(showticklabels=False, showgrid=False, zeroline=False, row=(i-1)//cols+1, col=(i-1)%cols+1)
    fig.update_yaxes(showticklabels=False, showgrid=False, zeroline=False, row=(i-1)//cols+1, col=(i-1)%cols+1)

fig.show()

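**Exercise**: Adjust the kernel size (e.g., from 3 to 5) and observe the impact. Keep in mind that with `kernel_size=5` and `padding=1` the feature maps shrink (28 → 26 → 13 → 11 → 5), so the input size of `fc1` and the `view` call must change to `64 * 5 * 5`; alternatively, use `padding=2` to keep the 7 × 7 maps.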

### **Extending CNN with Transfer Learning**

### **Now, let's work with Transfer Learning**

</br>

##### **Transfer Learning with ResNet50**


> Transfer learning reuses pre-trained models such as ResNet50 for new tasks. We'll fine-tune it on CIFAR-10.

*Fine-tuning a model* involves taking a pre-trained model with learned weights and further training it on a new, often smaller dataset specific to a target task. This process adapts the weights to better fit the new data [1], starting from a more informed initialization than random weights.
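
As a minimal sketch of the mechanics, the snippet below freezes a backbone, swaps in a new classification head, and runs one training step. The tiny `backbone` is a hypothetical stand-in for a pre-trained network (no real weights are loaded), and the random batch is dummy data. Freezing the backbone and training only the head is one common variant; the other usual choice is to keep all layers trainable and use a small learning rate.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained feature extractor (illustration only).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
toy_model = nn.Sequential(backbone, nn.Linear(8, 1000))  # original 1000-class head

# 1) Freeze the backbone so its weights are not updated.
for param in backbone.parameters():
    param.requires_grad = False

# 2) Replace the head with a new layer sized for the target task (10 classes).
toy_model[1] = nn.Linear(8, 10)

# 3) Optimize only the parameters that still require gradients.
trainable = [p for p in toy_model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# One training step on a random dummy batch to show the mechanics.
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
frozen_before = backbone[0].weight.clone()

loss = nn.CrossEntropyLoss()(toy_model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The frozen convolution weights are unchanged after the step.
print(torch.equal(frozen_before, backbone[0].weight))
```

With a torchvision ResNet the same pattern applies, with `model.fc` as the head to replace.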

In [None]:
import plotly.subplots as sp
import numpy as np

def plot_prediction():
    # CIFAR-10 class names
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

    # Collect predictions from the test set
    model.eval()
    images_list = []
    true_labels = []
    pred_labels = []
    correctness = []

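    # Run inference on whichever device the model lives on (CPU or GPU)
    device = next(model.parameters()).device

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            images_list.append(images.cpu().numpy())
            true_labels.append(labels.cpu().numpy())
            pred_labels.append(predicted.cpu().numpy())
            correctness.append((predicted == labels).cpu().numpy())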

    # Concatenate all batches
    images_list = np.concatenate(images_list, axis=0)
    true_labels = np.concatenate(true_labels, axis=0)
    pred_labels = np.concatenate(pred_labels, axis=0)
    correctness = np.concatenate(correctness, axis=0)

    # Select a subset to display (e.g., 25 images)
    num_display = 25
    indices = np.random.choice(len(images_list), num_display, replace=False)
    selected_images = images_list[indices]
    selected_true = true_labels[indices]
    selected_pred = pred_labels[indices]
    selected_correct = correctness[indices]

    # Create a subplot grid
    rows, cols = 5, 5
    fig = sp.make_subplots(
        rows=rows, cols=cols,
        subplot_titles=[f"True: {class_names[t]}, Pred: {class_names[p]}" for t, p in zip(selected_true, selected_pred)]
    )

    for i in range(num_display):
        row = i // cols + 1
        col = i % cols + 1
        img = selected_images[i]  # Shape: (3, 64, 64)
        img = (img * 0.5 + 0.5)  # Denormalize to [0, 1]
        img = np.transpose(img, (1, 2, 0))  # Change to (64, 64, 3) for Plotly
        img = np.clip(img, 0, 1) * 255  # Scale to [0, 255] and clip
        img = img.astype(np.uint8)

        # Add image to subplot
        fig.add_trace(
            go.Image(z=img, hoverinfo='none'),
            row=row, col=col
        )

        # Update title color based on correctness
        title_color = 'green' if selected_correct[i] else 'red'
        fig.layout.annotations[i].update(font=dict(color=title_color))

    # Update layout
    fig.update_layout(
        title_text="CIFAR-10 Classification Results (Green: Correct, Red: Incorrect)",
        height=800,
        width=1200,
        showlegend=False,
        margin=dict(l=10, r=20, t=60, b=20)
    )

    # Remove axes for cleaner visualization
    for i in range(1, num_display + 1):
        fig.update_xaxes(showticklabels=False, showgrid=False, zeroline=False, row=(i-1)//cols+1, col=(i-1)%cols+1)
        fig.update_yaxes(showticklabels=False, showgrid=False, zeroline=False, row=(i-1)//cols+1, col=(i-1)%cols+1)

    fig.show()

Before that, let's train a model without any pretrained weights and then compare it with another implementation that uses the ImageNet weights.

In [None]:
# Without ImageNet weights ------------------------------------

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.models import resnet50
from torch.utils.data import DataLoader, Subset, random_split
from torchmetrics import Accuracy
from tqdm import tqdm
import plotly.graph_objects as go
import numpy as np

# Set random seed for reproducibility
torch.manual_seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

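# Load ResNet50 without pretrained weights
model = resnet50(weights=None)

# Replace the final layer for the 10 CIFAR-10 classes, then move the whole model
# to the device so the new layer ends up on the same device as the rest
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)
model = model.to(device)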

# Data loading for CIFAR-10
transform = transforms.Compose([
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load full training and test datasets
train_full = datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
test_full = datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)

# Select 6,000 random indices for training and validation
indices = torch.randperm(len(train_full))[:6000]
train_subset = Subset(train_full, indices)

# Split the subset into training (5,000) and validation (1,000) sets
train_set, val_set = random_split(train_subset, [5000, 1000])

# Select 1,000 random indices for the test set
test_indices = torch.randperm(len(test_full))[:1000]
test_set = Subset(test_full, test_indices)

# Create data loaders
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=2)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=2)

# Training setup
criterion = nn.CrossEntropyLoss()
# Train the entire model since we're starting from scratch (no pretrained weights)
optimizer = optim.Adam(model.parameters(), lr=0.001)
accuracy = Accuracy(task="multiclass", num_classes=10).to(device)

# Lists to store metrics
train_losses = []
val_losses = []
val_accuracies = []

num_epochs = 2

# Training loop
for epoch in range(num_epochs):
    # Training phase
    model.train()
    train_loss = 0.0

    for images, labels in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    train_losses.append(train_loss / len(train_loader))

    # Validation phase
    model.eval()
    val_loss = 0.0

    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            accuracy.update(outputs, labels)

    val_losses.append(val_loss / len(val_loader))
    val_acc = accuracy.compute()
    val_accuracies.append(val_acc.item())

    print(f'Epoch {epoch+1}, Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Val Accuracy: {val_acc:.4f}')
    accuracy.reset()

# Test set evaluation (final performance metric)
print("\n===== Final Model Evaluation on Test Set =====")
model.eval()
test_loss = 0.0
accuracy.reset()

with torch.no_grad():
    for images, labels in tqdm(test_loader, desc='Evaluating on test set'):
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        accuracy.update(outputs, labels)

    test_loss /= len(test_loader)
    test_acc = accuracy.compute()
    print(f'Final Test Loss: {test_loss:.4f}')
    print(f'Final Test Accuracy: {test_acc:.4f}')
    print("===========================================================")
    print(f"Model performance: {test_acc*100:.2f}% accuracy on unseen test data")

# Create two separate plots for better clarity

# Plot 1: Training and Validation Loss
loss_fig = go.Figure()

loss_fig.add_trace(go.Scatter(
    x=list(range(1, len(train_losses)+1)),
    y=train_losses,
    mode='lines+markers',
    name='Training Loss',
    line=dict(color='blue', width=2),
    marker=dict(size=8)
))

loss_fig.add_trace(go.Scatter(
    x=list(range(1, len(val_losses)+1)),
    y=val_losses,
    mode='lines+markers',
    name='Validation Loss',
    line=dict(color='red', width=2),
    marker=dict(size=8)
))

loss_fig.update_layout(
    title='Training and Validation Loss (ResNet50 without ImageNet Weights)',
    xaxis_title='Epoch',
    yaxis_title='Loss',
    template='plotly_white',
    legend=dict(x=0.01, y=0.99),
    margin=dict(l=40, r=40, t=60, b=40),
    hovermode='x unified',
    xaxis=dict(tickmode='linear', dtick=1)
)

# Plot 2: Validation Accuracy (during training)
acc_fig = go.Figure()

acc_fig.add_trace(go.Scatter(
    x=list(range(1, len(val_accuracies)+1)),
    y=val_accuracies,
    mode='lines+markers',
    name='Validation Accuracy',
    line=dict(color='green', width=2),
    marker=dict(size=8)
))

acc_fig.update_layout(
    title='Validation Accuracy During Training (ResNet50 without ImageNet Weights)',
    xaxis_title='Epoch',
    yaxis_title='Accuracy',
    template='plotly_white',
    yaxis=dict(range=[0, 1]),
    legend=dict(x=0.01, y=0.01),
    margin=dict(l=40, r=40, t=60, b=40),
    hovermode='x unified',
    xaxis=dict(tickmode='linear', dtick=1)
)

# Display both plots
loss_fig.show()
acc_fig.show()

# After test evaluation, add test accuracy to a final comparison plot
final_fig = go.Figure()

final_fig.add_trace(go.Bar(
    x=['Training', 'Validation', 'Test'],
    y=[train_losses[-1], val_losses[-1], test_loss],
    name='Final Loss',
    marker_color=['blue', 'red', 'purple']
))

# Add another trace for accuracy (only validation and test have accuracy)
final_fig.add_trace(go.Bar(
    x=['Training', 'Validation', 'Test'],
    y=[None, val_accuracies[-1], test_acc.item()],
    name='Final Accuracy',
    marker_color=['lightblue', 'lightgreen', 'green'],
    yaxis='y2'
))

# final_fig.update_layout(
#     title='Final Model Performance (ResNet50 without ImageNet Weights)',
#     template='plotly_white',
#     yaxis=dict(
#         title='Loss',
#         side='left'
#     ),
#     yaxis2=dict(
#         title='Accuracy',
#         overlaying='y',
#         side='right',
#         range=[0, 1]
#     ),
#     barmode='group',
#     legend=dict(x=0.01, y=0.99),
#     margin=dict(l=40, r=40, t=60, b=40)
# )

final_fig.show()

# Save the model
# torch.save(model.state_dict(), 'resnet50_cifar10_no_pretrained.pth')
# print("Model saved to resnet50_cifar10_no_pretrained.pth")

In [None]:
# Plot results
plot_prediction()

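Now, let's train a model initialized with the weights from ImageNet. Note that this cell uses ResNet18 rather than ResNet50 and optimizes only the final `fc` layer, so the comparison with the previous run is indicative rather than like-for-like.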
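Passing only the head's parameters to the optimizer keeps the backbone weights fixed, but gradients are still computed for them. A common variant is to also set `requires_grad = False` on the backbone. The sketch below illustrates that idea on a tiny hypothetical stand-in network (`backbone`, `head`), so it runs without downloading any weights; it is not the notebook's ResNet18 setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained feature extractor
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# New task-specific head (10 classes, as in CIFAR-10)
head = nn.Linear(8, 10)
model = nn.Sequential(backbone, head)

# Freeze the backbone: no gradients are computed or stored for it
for param in backbone.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that remain trainable
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=0.001)

n_trainable = sum(p.numel() for p in trainable)
n_frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable: {n_trainable}, frozen: {n_frozen}")

# One optimization step on random data: only the head should change
conv_before = backbone[0].weight.clone()
head_before = head.weight.clone()

images = torch.randn(4, 3, 16, 16)
labels = torch.randint(0, 10, (4,))
loss = nn.CrossEntropyLoss()(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(conv_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
```

Keep in mind that in `train()` mode the batch-normalization layers still update their running statistics even when their parameters are frozen; calling `backbone.eval()` is one way to hold those fixed as well.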

In [None]:
# With ImageNet weights ------------------------------------

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.models import resnet18
from torch.utils.data import DataLoader, Subset, random_split
from torchmetrics import Accuracy
from tqdm import tqdm
import plotly.graph_objects as go
import numpy as np

# Set random seed for reproducibility
torch.manual_seed(42)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

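# Load ResNet18 pre-trained on ImageNet
model = resnet18(weights='DEFAULT')

# Replace the 1000-class ImageNet head with a 10-class layer for CIFAR-10,
# then move the whole model (including the new head) to the device
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)
model = model.to(device)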

# Data transformation
transform = transforms.Compose([
    transforms.Resize(64),  # Resize images to 64x64
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load full training and test datasets
train_full = datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
test_full = datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)

# Select 6,000 random indices for training and validation
indices = torch.randperm(len(train_full))[:6000]
train_subset = Subset(train_full, indices)

# Split the subset into training (5,000) and validation (1,000) sets
train_set, val_set = random_split(train_subset, [5000, 1000])

# Select 1,000 random indices for the test set
test_indices = torch.randperm(len(test_full))[:1000]
test_set = Subset(test_full, test_indices)

# Create data loaders
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=2)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=2)

# Training setup
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # Optimize only fc layer
accuracy = Accuracy(task="multiclass", num_classes=10).to(device)

# Lists to store losses and accuracies
train_losses = []
val_losses = []
val_accuracies = []

num_epochs = 2

# Training loop
for epoch in range(num_epochs):
    # Training phase
    model.train()
    train_loss = 0.0

    for images, labels in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    train_losses.append(train_loss / len(train_loader))

    # Validation phase
    model.eval()
    val_loss = 0.0

    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            accuracy.update(outputs, labels)

    val_losses.append(val_loss / len(val_loader))
    val_acc = accuracy.compute()
    val_accuracies.append(val_acc.item())

    print(f'Epoch {epoch+1}, Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Val Accuracy: {val_acc:.4f}')
    accuracy.reset()

# Test set evaluation (final performance metric)
print("\n===== Final Model Evaluation on Test Set =====")
model.eval()
test_loss = 0.0
accuracy.reset()

with torch.no_grad():
    for images, labels in tqdm(test_loader, desc='Evaluating on test set'):
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        accuracy.update(outputs, labels)

    test_loss /= len(test_loader)
    test_acc = accuracy.compute()
    print(f'Final Test Loss: {test_loss:.4f}')
    print(f'Final Test Accuracy: {test_acc:.4f}')
    print("===========================================================")
    print(f"Model performance: {test_acc*100:.2f}% accuracy on unseen test data")

# Create two separate plots for better clarity

# Plot 1: Training and Validation Loss
loss_fig = go.Figure()

loss_fig.add_trace(go.Scatter(
    x=list(range(1, len(train_losses)+1)),
    y=train_losses,
    mode='lines+markers',
    name='Training Loss',
    line=dict(color='blue', width=2),
    marker=dict(size=8)
))

loss_fig.add_trace(go.Scatter(
    x=list(range(1, len(val_losses)+1)),
    y=val_losses,
    mode='lines+markers',
    name='Validation Loss',
    line=dict(color='red', width=2),
    marker=dict(size=8)
))

loss_fig.update_layout(
    title='Training and Validation Loss',
    xaxis_title='Epoch',
    yaxis_title='Loss',
    template='plotly_white',
    legend=dict(x=0.01, y=0.99),
    margin=dict(l=40, r=40, t=60, b=40),
    hovermode='x unified',
    xaxis=dict(tickmode='linear', dtick=1)
)

# Plot 2: Validation Accuracy (during training)
acc_fig = go.Figure()

acc_fig.add_trace(go.Scatter(
    x=list(range(1, len(val_accuracies)+1)),
    y=val_accuracies,
    mode='lines+markers',
    name='Validation Accuracy',
    line=dict(color='green', width=2),
    marker=dict(size=8)
))

acc_fig.update_layout(
    title='Validation Accuracy During Training',
    xaxis_title='Epoch',
    yaxis_title='Accuracy',
    template='plotly_white',
    yaxis=dict(range=[0, 1]),
    legend=dict(x=0.01, y=0.01),
    margin=dict(l=40, r=40, t=60, b=40),
    hovermode='x unified',
    xaxis=dict(tickmode='linear', dtick=1)
)

# Display both plots
loss_fig.show()
acc_fig.show()

# After test evaluation, add test accuracy to a final comparison plot
final_fig = go.Figure()

final_fig.add_trace(go.Bar(
    x=['Training', 'Validation', 'Test'],
    y=[train_losses[-1], val_losses[-1], test_loss],
    name='Final Loss',
    marker_color=['blue', 'red', 'purple']
))

# Add another trace for accuracy (only validation and test have accuracy)
final_fig.add_trace(go.Bar(
    x=['Training', 'Validation', 'Test'],
    y=[None, val_accuracies[-1], test_acc.item()],
    name='Final Accuracy',
    marker_color=['lightblue', 'lightgreen', 'green'],
    yaxis='y2'
))

# final_fig.update_layout(
#     title='Final Model Performance',
#     template='plotly_white',
#     yaxis=dict(
#         title='Loss',
#         side='left'
#     ),
#     yaxis2=dict(
#         title='Accuracy',
#         overlaying='y',
#         side='right',
#         range=[0, 1]
#     ),
#     barmode='group',
#     legend=dict(x=0.01, y=0.99),
#     margin=dict(l=40, r=40, t=60, b=40)
# )

final_fig.show()

# Save the model
# torch.save(model.state_dict(), 'resnet18_cifar10.pth')
# print("Model saved to resnet18_cifar10.pth")

In [None]:
# Plot results
plot_prediction()

### **REFERENCES**

**This hands-on was based on or inspired by the following reference materials:**

- Deep Learning with PyTorch by Manning Publications [1]
- PyTorch Official Documentation [2]
- PyTorch Tutorials [3]
- Learn PyTorch for Deep Learning: Zero to Mastery [4]


[1] Stevens, E., Antiga, L., & Viehmann, T. (2020). Deep Learning with PyTorch: Build, train, and tune neural networks using Python tools. Manning.

[2] PyTorch (2025). PyTorch documentation. The Linux Foundation. https://docs.pytorch.org/docs/stable/index.html

[3] PyTorch (2024). Welcome to PyTorch Tutorials. The Linux Foundation. https://docs.pytorch.org/tutorials/

[4] Bourke, D. (2023). Learn PyTorch for Deep Learning: Zero to Mastery. https://www.learnpytorch.io/

[5] Project Gutenberg (2025). Alice's Adventures in Wonderland, by Lewis Carroll. https://www.gutenberg.org/files/11/11-0.txt