### **State University of Campinas - UNICAMP** </br>
**Course**: MC886A </br>
**Professor**: Marcelo da Silva Reis </br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Deep Learning with PyTorch**
##### Notebook: 00 Perceptron and MLP
---

### **Table of Contents**

1. [**Objectives**](#objectives) </br>
2. [**Prerequisites**](#prerequisites) </br>
3. [**Basic Concept**](#basic-concept) </br>
  3.1. [Perceptron](#1-perceptron) </br>
  3.2. [Single-Layer Perceptron (SLP)](#2-single--layer-perceptron-slp) </br>
  3.3. [Multi-Layer Perceptron (MLP)](#3-multi--layer-perceptron-mlp) </br>

4. [**REFERENCES**](#references)

---

#### **Objectives**
- Understand where the perceptron comes from and how it is built.
- Advance from Single-Layer Perceptrons to Multi-Layer Perceptrons.

---



#### **Prerequisites**
- Install PyTorch and some extra packages.
- Have Python and a Jupyter Notebook ready (great for interactive demos).

Installing PyTorch (for the complete setup, see `00-setup.ipynb` from hands-on 00):

- `pip install torch torchvision`

- `pip install nbformat`

- `pip install torchmetrics`

- To plot interactive graphs, install Plotly:
`pip install plotly`

- The training loop displays progress bars with tqdm:
`pip install tqdm`

---

### **Basic concept**:

### 1. Perceptron

**Definition:**
The Perceptron is the simplest form of neural network, consisting of a single artificial neuron. It takes multiple inputs, applies weights to them, sums them together with a bias term, and passes the result through a step activation function to produce a binary output.

**Formula:**
$y = \text{step}(w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n)$

Where:
- $y$ is the output (0 or 1)
- $x_i$ are the inputs
- $w_i$ are the weights
- $w_0$ is the bias term
- $\text{step}(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$
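To make the formula concrete, here is a minimal pure-Python sketch of the perceptron's forward pass. The weights are hand-picked for illustration (an assumption, not learned values): with $w_0 = -1.5$ and $w_1 = w_2 = 1$, the weighted sum is non-negative only when both binary inputs are 1, so the perceptron computes logical AND.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0 (same definition as above)."""
    return 1 if z >= 0 else 0


def perceptron(x, w):
    """Forward pass y = step(w0 + w1*x1 + ... + wn*xn).

    x: the inputs [x1, ..., xn].
    w: the weights [w0, w1, ..., wn], where w[0] is the bias term w0.
    """
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return step(z)


# Hypothetical, hand-picked weights that implement logical AND.
and_weights = [-1.5, 1.0, 1.0]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], and_weights))
```

Only the input (1, 1) gives a non-negative sum ($-1.5 + 1 + 1 = 0.5$); every other input leaves $z < 0$.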

In [None]:
import plotly.graph_objects as go

node_size = 40
font_size = 12
layer_colors = {'input': '#636EFA', 'bias': '#00CC96',
               'hidden': '#FFA15A', 'output': '#EF553B',
               'activation': '#AB63FA'}

def create_network(nodes, edges, title):
    fig = go.Figure()
    legend_groups = set()

    # Create edges with weights
    for i, ((src, dest), weight) in enumerate(edges.items()):
        x0, y0 = nodes[src]['x'], nodes[src]['y']
        x1, y1 = nodes[dest]['x'], nodes[dest]['y']
        fig.add_trace(go.Scatter(
            x=[x0, x1, None], y=[y0, y1, None],
            line=dict(width=1, color='gray'),
            mode='lines',
            hoverinfo='text',
            text=f'Weight: {weight}',
            showlegend=i == 0,
            legendgroup='weights',
            name='Weights'
        ))

    # Create nodes with legend groups
    for node in nodes:
        # Determine node type from color
        node_type = [k for k, v in layer_colors.items() if v == node['color']][0]

        fig.add_trace(go.Scatter(
            x=[node['x']], y=[node['y']],
            mode='markers+text',
            marker=dict(size=node_size, color=node['color']),
            text=node.get('label', ''),
            textposition="top center",
            hoverinfo='text',
            hovertext=node.get('formula', ''),
            showlegend=node_type not in legend_groups,
            legendgroup=node_type,
            name=f'{node_type.capitalize()} Node'
        ))
        if node_type not in legend_groups:
            legend_groups.add(node_type)

    # Add activation functions as separate traces
    activation_added = False
    for node in nodes:
        if 'activation' in node:
            fig.add_trace(go.Scatter(
                x=[node['x'] + 0.25],  # Offset from node
                y=[node['y'] + 0.1],   # Vertical adjustment
                mode='text',
                text=node['activation'],
                textfont=dict(color=layer_colors['activation'], size=font_size),
                showlegend=not activation_added,
                legendgroup='activation',
                name='Activation Function'
            ))
            activation_added = True

    fig.update_layout(
        title=title,
        template='plotly_white',
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        margin=dict(l=20, r=150),
        legend=dict(
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            itemsizing='constant'
        )
    )
    return fig

In [None]:
# --------------------------------------------------------------------------
# 1. Perceptron
# --------------------------------------------------------------------------
perceptron_nodes = [
    {'x': 0, 'y': 1, 'label': '1', 'color': layer_colors['bias'],
     'formula': 'Bias (always 1)'},
    {'x': 0, 'y': 0, 'label': 'x₁', 'color': layer_colors['input']},
    {'x': 0, 'y': -1, 'label': 'x₂', 'color': layer_colors['input']},
    {'x': 1, 'y': 0, 'label': 'y', 'color': layer_colors['output'],
     'activation': 'step(Σ)',
     'formula': 'Output: y = step(w₀ + w₁x₁ + w₂x₂)'}
]

perceptron_edges = {
    (0,3): 'w₀',
    (1,3): 'w₁',
    (2,3): 'w₂'
}

fig1 = create_network(perceptron_nodes, perceptron_edges, "Perceptron")
fig1.show()

### 2. Single-Layer Perceptron (SLP)

**Definition:**
A Single-Layer Perceptron extends the basic perceptron by including multiple output neurons, allowing it to classify inputs into more than two categories. It still has a single layer of computation (the output layer) with direct connections from all inputs to all outputs.

**Formula:**
For each output neuron $j$:
$y_j = \text{step}(\sum_{i=0}^{n} w_{ij}x_i)$

Where:
- $y_j$ is the output of the $j$-th output neuron
- $x_i$ are the inputs (with $x_0 = 1$ for the bias)
- $w_{ij}$ is the weight from input $i$ to output neuron $j$
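Each output neuron behaves like an independent perceptron that shares the same inputs. The pure-Python sketch below stores $w_{ij}$ in a nested list `W[i][j]` (row $i$ = input, column $j$ = output) and prepends $x_0 = 1$ for the bias. The weights are hypothetical and hand-picked: $y_1$ computes AND and $y_2$ computes OR.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def slp(x, W):
    """Single-layer perceptron: y_j = step(sum_i W[i][j] * x_i), with x_0 = 1.

    x: the raw inputs [x1, ..., xn]; a constant 1 is prepended as x_0.
    W: (n + 1) x m nested list, W[i][j] = weight from input i to output j.
    """
    x_aug = [1.0] + list(x)  # x_0 = 1 carries the bias
    num_outputs = len(W[0])
    return [
        step(sum(W[i][j] * x_aug[i] for i in range(len(x_aug))))
        for j in range(num_outputs)
    ]


# Hypothetical weights: column 0 -> y1 (AND), column 1 -> y2 (OR).
W = [
    [-1.5, -0.5],  # w_01, w_02 (bias weights)
    [1.0, 1.0],    # w_11, w_12
    [1.0, 1.0],    # w_21, w_22
]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", slp(x, W))
```

Each output still draws a single linear decision boundary, so no choice of `W` lets this network compute XOR.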

In [None]:
# --------------------------------------------------------------------------
# 2. Single-Layer Perceptron
# --------------------------------------------------------------------------
slp_nodes = [
    {'x': 0, 'y': 1, 'label': '1', 'color': layer_colors['bias']},
    {'x': 0, 'y': 0, 'label': 'x₁', 'color': layer_colors['input']},
    {'x': 0, 'y': -1, 'label': 'x₂', 'color': layer_colors['input']},
    {'x': 1, 'y': 0.5, 'label': 'y₁', 'color': layer_colors['output'],
     'activation': 'step(Σ)', 'formula': 'y₁ = step(Σᵢ wᵢ₁xᵢ)'},
    {'x': 1, 'y': -0.5, 'label': 'y₂', 'color': layer_colors['output'],
     'activation': 'step(Σ)', 'formula': 'y₂ = step(Σᵢ wᵢ₂xᵢ)'}
]

slp_edges = {
    (0,3): 'w₀₁', (0,4): 'w₀₂',
    (1,3): 'w₁₁', (1,4): 'w₁₂',
    (2,3): 'w₂₁', (2,4): 'w₂₂'
}

fig2 = create_network(slp_nodes, slp_edges, "Single-Layer Perceptron (SLP)")
fig2.show()

### 3. Multi-Layer Perceptron (MLP)

**Definition:**
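A Multi-Layer Perceptron introduces one or more hidden layers between the input and output layers. This allows the network to learn non-linear relationships and solve problems that are not linearly separable. Hidden neurons typically use non-linear activation functions such as ReLU, sigmoid, or tanh; without a non-linearity, the stacked layers would collapse into a single linear (affine) transformation and be no more expressive than a single layer.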

**Formula:**
For an MLP with one hidden layer:

Hidden layer: For each hidden neuron $h_k$:
$h_k = \text{activation}_h(\sum_{i=0}^{n} w_{ik}^{(1)}x_i)$

Output layer: For each output neuron $y_j$:
$y_j = \text{activation}_o(\sum_{k=0}^{m} w_{kj}^{(2)}h_k)$

Where:
- $x_i$ are the inputs, with $x_0 = 1$ for the hidden-layer bias
- $h_k$ is the output of the $k$-th hidden neuron, with $h_0 = 1$ for the output-layer bias
- $y_j$ is the output of the $j$-th output neuron
- $w_{ik}^{(1)}$ is the weight from input $i$ to hidden neuron $k$
- $w_{kj}^{(2)}$ is the weight from hidden neuron $k$ to output neuron $j$
- $\text{activation}_h$ and $\text{activation}_o$ are activation functions (e.g., ReLU, sigmoid, tanh)
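The classic problem that is not linearly separable is XOR: no single perceptron can output 1 for (0, 1) and (1, 0) while outputting 0 for (0, 0) and (1, 1). The pure-Python sketch below shows that one hidden layer suffices. The weights are hand-picked for illustration (an assumption; a trained network would learn its own), $\text{activation}_h$ is ReLU, and a step function is used as $\text{activation}_o$ to keep the output binary.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def mlp_xor(x1, x2):
    """One-hidden-layer MLP with hypothetical, hand-picked weights that computes XOR."""
    # Hidden layer: h_k = ReLU(sum_i w^(1)_ik * x_i), with x_0 = 1
    h1 = relu(0.0 + 1.0 * x1 + 1.0 * x2)   # w^(1)_01 = 0,  w^(1)_11 = 1, w^(1)_21 = 1
    h2 = relu(-1.0 + 1.0 * x1 + 1.0 * x2)  # w^(1)_02 = -1, w^(1)_12 = 1, w^(1)_22 = 1
    # Output layer: y = step(sum_k w^(2)_k1 * h_k), with h_0 = 1
    return step(-0.5 + 1.0 * h1 - 2.0 * h2)  # w^(2)_01 = -0.5, w^(2)_11 = 1, w^(2)_21 = -2


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mlp_xor(x1, x2))
```

The hidden neuron $h_2$ only becomes active when both inputs are 1, and its large negative output weight cancels $h_1$ in exactly that case.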

In [None]:
# --------------------------------------------------------------------------
# 3. Multi-Layer Perceptron
# --------------------------------------------------------------------------
mlp_nodes = [
    {'x': 0, 'y': 1, 'label': '1', 'color': layer_colors['bias']},
    {'x': 0, 'y': 0, 'label': 'x₁', 'color': layer_colors['input']},
    {'x': 0, 'y': -1, 'label': 'x₂', 'color': layer_colors['input']},
    {'x': 1, 'y': 1, 'label': 'h₁', 'color': layer_colors['hidden'],
     'activation': 'ReLU(Σ)', 'formula': 'h₁ = ReLU(Σᵢ wᵢ₁xᵢ)'},
    {'x': 1, 'y': -1, 'label': 'h₂', 'color': layer_colors['hidden'],
     'activation': 'ReLU(Σ)', 'formula': 'h₂ = ReLU(Σᵢ wᵢ₂xᵢ)'},
    {'x': 2, 'y': 0, 'label': 'y', 'color': layer_colors['output'],
     'activation': 'σ(Σ)', 'formula': 'y = sigmoid(Σwₖhₖ + b₃)'}
]

mlp_edges = {
    (0,3): 'w₀₁', (0,4): 'w₀₂',
    (1,3): 'w₁₁', (1,4): 'w₁₂',
    (2,3): 'w₂₁', (2,4): 'w₂₂',
    (3,5): 'w₃₁',
    (4,5): 'w₄₁'
}

fig3 = create_network(mlp_nodes, mlp_edges, "Multi-Layer Perceptron (MLP)")
fig3.show()

#### **Implementation of an MLP**

An MLP is a fully connected feedforward neural network, commonly used for tasks such as classification. Here, we train one to classify the handwritten digits of the MNIST dataset (28×28 grayscale images, 10 classes).

In [None]:
# Install PyTorch (if not already installed) together with Torchvision (datasets and image transforms),
# TorchMetrics (evaluation metrics), tqdm (progress bars), and Plotly (plots)
!pip install torch torchvision torchmetrics tqdm plotly

In [None]:
import plotly.graph_objects as go
import numpy as np

layer_colors = {
    'input': '#636EFA',
    'flatten': '#EF553B',
    'fc': '#AB63FA',
    'output': '#19D3F3',
    'activation': '#FF6692'
}

layer_descriptions = {
    'input': "Input Layer\n28×28×1 grayscale image",
    'flatten': "Flatten Layer\nConverts 2D image to 1D vector (784 features)",
    'fc': "Fully Connected Layer\nLearns global patterns with ReLU activation",
    'output': "Output Layer\n10 units (one per class)"
}

def create_mlp_visualization():
    fig = go.Figure()
    x_positions = [0, 2, 4, 6, 8]
    legend_groups = set()

    # Input layer
    create_3d_layer(fig, x_positions[0], 3, 3, 'input', 'Input\n28×28×1',
                   legend_groups, layer_descriptions['input'])

    # Flatten layer
    add_flatten_layer(fig, x_positions[1], legend_groups,
                     layer_descriptions['flatten'])

    # Fully Connected layers
    fc_layers = [
        {'units': 128, 'label': '128', 'activation': 'ReLU', 'type': 'fc'},
        {'units': 64, 'label': '64', 'activation': 'ReLU', 'type': 'fc'},
        {'units': 10, 'label': '10', 'activation': None, 'type': 'output'}
    ]

    for i, layer in enumerate(fc_layers):
        x = x_positions[2 + i]
        add_fc_layer(fig, x, layer['units'], layer['label'],
                    layer['type'], legend_groups, layer['activation'],
                    layer_descriptions[layer['type']])

    add_mlp_connections(fig, x_positions)

    fig.update_layout(
        title=dict(
            text="Multilayer Perceptron (MLP) Architecture",
            x=0.05,
            font=dict(size=24)
        ),
        template='plotly_white',
        margin=dict(l=20, r=300, t=100, b=20),
        legend=dict(
            title="Layer Types",
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            font=dict(size=12)),
        width=1400,
        height=600,
        scene=dict(
            xaxis=dict(showgrid=False, zeroline=False, visible=False),
            yaxis=dict(showgrid=False, zeroline=False, visible=False),
            zaxis=dict(showgrid=False, zeroline=False, visible=False),
            aspectmode='manual',
            aspectratio=dict(x=2, y=1, z=0.5)),
        annotations=[dict(
            x=1.05,
            y=0.9,
            xref='paper',
            yref='paper',
            text="<b>MLP Components:</b><br>"
                 "- Flatten: 2D→1D conversion<br>"
                 "- Fully Connected: Dense layers<br>"
                 "- ReLU: Non-linear activation",
            showarrow=False,
            align='left',
            font=dict(size=14))])
    return fig

def create_3d_layer(fig, x_pos, height, depth, layer_type, label_text,
                   legend_groups, description):
    # 3D box coordinates
    x = [x_pos, x_pos+1.2, x_pos+1.2, x_pos] * 2
    y = [-height/2, -height/2, height/2, height/2] * 2
    z = [0]*4 + [depth]*4

    edges = [(0,1), (1,2), (2,3), (3,0), (4,5), (5,6), (6,7), (7,4), (0,4), (1,5), (2,6), (3,7)]

    for i, (start, end) in enumerate(edges):
        show_legend = (layer_type not in legend_groups) and (i == 0)
        fig.add_trace(go.Scatter3d(
            x=[x[start], x[end]],
            y=[y[start], y[end]],
            z=[z[start], z[end]],
            mode='lines',
            line=dict(color=layer_colors[layer_type], width=2),
            showlegend=show_legend,
            name=layer_descriptions[layer_type].split('\n')[0],
            legendgroup=layer_type,
            hoverinfo='text',
            hovertext=description))
    if layer_type not in legend_groups:
        legend_groups.add(layer_type)

    # Layer label
    fig.add_trace(go.Scatter3d(
        x=[x_pos+0.6],
        y=[-height/2 - 0.5],
        z=[0],
        mode='text',
        text=label_text,
        textfont=dict(color='black', size=14),
        hoverinfo='none',
        showlegend=False))

def add_flatten_layer(fig, x_pos, legend_groups, description):
    y_values = np.linspace(-1, 1, 20)
    fig.add_trace(go.Scatter3d(
        x=[x_pos]*20,
        y=y_values,
        z=[0]*20,
        mode='lines',
        line=dict(color=layer_colors['flatten'], width=6),
        showlegend='flatten' not in legend_groups,
        name=layer_descriptions['flatten'].split('\n')[0],
        legendgroup='flatten',
        hoverinfo='text',
        hovertext=description))
    legend_groups.add('flatten')

def add_fc_layer(fig, x_pos, units, label, layer_type, legend_groups, activation, description):
    # Add nodes
    num_nodes = min(units, 20)  # Limit nodes for visualization
    y_values = np.linspace(-0.8, 0.8, num_nodes)

    fig.add_trace(go.Scatter3d(
        x=[x_pos]*num_nodes,
        y=y_values,
        z=[0]*num_nodes,
        mode='markers',
        marker=dict(size=6, color=layer_colors[layer_type]),
        showlegend=layer_type not in legend_groups,
        name=description.split('\n')[0],
        legendgroup=layer_type,
        hoverinfo='text',
        hovertext=f"{description}\nUnits: {units}"))

    if layer_type not in legend_groups:
        legend_groups.add(layer_type)

    # Activation annotation
    if activation:
        fig.add_trace(go.Scatter3d(
            x=[x_pos + 0.3],
            y=[0.9],
            z=[0],
            mode='text',
            text=f'σ = {activation}',
            textfont=dict(color=layer_colors['activation'], size=14),
            showlegend=False))

    # Layer label
    fig.add_trace(go.Scatter3d(
        x=[x_pos],
        y=[-1.2],
        z=[0],
        mode='text',
        text=label,
        textfont=dict(color='black', size=14),
        showlegend=False))

def add_mlp_connections(fig, x_positions):
    operations = [
        ("Flatten", "28×28→784", ""),
        ("Fully Connected", "784→128", "ReLU"),
        ("Fully Connected", "128→64", "ReLU"),
        ("Fully Connected", "64→10", "")
    ]

    for i in range(len(x_positions)-1):
        x_start = x_positions[i] + 1.2
        x_end = x_positions[i+1]

        fig.add_trace(go.Scatter3d(
            x=np.linspace(x_start, x_end, 30),
            y=np.zeros(30),
            z=np.zeros(30),
            mode='lines',
            line=dict(color='gray', width=1),
            hoverinfo='text',
            hovertext=f"Operation: {operations[i][0]}<br>{operations[i][1]}<br>{operations[i][2]}",
            showlegend=False))

        fig.add_trace(go.Scatter3d(
            x=[(x_start + x_end)/2],
            y=[0.5],
            z=[0],
            mode='text',
            text=operations[i][0],
            textfont=dict(size=12, color='black'),
            showlegend=False))

# Generate and display the visualization
mlp_fig = create_mlp_visualization()
mlp_fig.show()
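Each fully connected layer in the diagram maps $n_\text{in}$ features to $n_\text{out}$ features with an $n_\text{out} \times n_\text{in}$ weight matrix plus $n_\text{out}$ biases, so the model size follows directly from the layer widths. The short sketch below counts the trainable parameters layer by layer; `count_linear_params` is a hypothetical helper, not part of PyTorch. For the `MLP` model defined in the next cell, `sum(p.numel() for p in model.parameters())` should report the same total.

```python
# Layer widths of the MLP in this notebook: 28*28 inputs -> 128 -> 64 -> 10 outputs.
layer_sizes = [28 * 28, 128, 64, 10]


def count_linear_params(sizes):
    """Return (per-layer counts, total) for a stack of fully connected layers.

    A layer mapping n_in -> n_out has n_in * n_out weights and n_out biases.
    """
    per_layer = [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    return per_layer, sum(per_layer)


per_layer, total = count_linear_params(layer_sizes)
print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels connects to each of the 128 hidden units.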

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from tqdm import tqdm
import plotly.graph_objects as go

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define the MLP model
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Data loading
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)

train_set, val_set = random_split(train_dataset, [55000, 5000])  # 55k train, 5k val

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)

test_dataset = datasets.MNIST(root='./data', train=False, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Training setup
model = MLP()

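# Move the model to the selected device (GPU if available, otherwise CPU)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# The metric's internal state must live on the same device as the model outputs
accuracy = Accuracy(task="multiclass", num_classes=10).to(device)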

# Lists to store losses
train_losses = []
val_losses = []

num_epochs = 5

# Training loop
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    for images, labels in tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}'):
        images, labels = images.to(device), labels.to(device)  # Move data to the model's device
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    train_losses.append(train_loss / len(train_loader))

    # Validation
    model.eval()
    val_loss = 0
    val_acc = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)  # Move data to the model's device
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            accuracy.update(outputs, labels)

    val_losses.append(val_loss / len(val_loader))
    val_acc = accuracy.compute()
    print(f'Epoch {epoch+1}, Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Val Accuracy: {val_acc:.2f}')
    accuracy.reset()

# Test set evaluation
model.eval()
test_loss = 0.0
test_acc = 0.0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        accuracy.update(outputs, labels)
    test_loss /= len(test_loader)
    test_acc = accuracy.compute()
    print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.2f}')

# Plot losses
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=list(range(1, len(train_losses)+1)),
    y=train_losses,
    mode='lines+markers',
    name='Train Loss',
    line=dict(color='blue')
))

fig.add_trace(go.Scatter(
    x=list(range(1, len(val_losses)+1)),
    y=val_losses,
    mode='lines+markers',
    name='Val Loss',
    line=dict(color='red')
))

fig.update_layout(
    title='MLP Training Progress',
    xaxis_title='Epoch',
    yaxis_title='Loss',
    template='plotly_white',
    legend=dict(x=0.8, y=0.9),
    margin=dict(l=40, r=20, t=40, b=20)
)

fig.show()
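`nn.CrossEntropyLoss` receives the raw outputs of `fc3` (logits) and applies log-softmax internally, which is why `forward` does not end with a softmax. As a minimal pure-Python sketch of what it computes for a single sample (the batch averaging done by the default reduction is omitted, and the logits are made-up numbers):

```python
import math


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target]).

    Uses the log-sum-exp trick (subtracting the largest logit before
    exponentiating) so that large logits do not overflow math.exp.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


example_logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-class example
example_loss = cross_entropy_from_logits(example_logits, target=0)
print(round(example_loss, 4))
```

The loss shrinks toward 0 as the logit of the correct class grows relative to the others, and it equals $\log K$ when all $K$ logits are equal.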

In [None]:
import plotly.subplots as sp
import numpy as np

# Collect predictions from the test set
model.eval()
images_list = []
true_labels = []
pred_labels = []
correctness = []

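with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images.to(device))  # Run inference on the same device as the model
        _, predicted = torch.max(outputs, 1)  # Index of the largest logit = predicted class
        predicted = predicted.cpu()  # Bring predictions back to the CPU for NumPy
        images_list.append(images.numpy())
        true_labels.append(labels.numpy())
        pred_labels.append(predicted.numpy())
        correctness.append((predicted == labels).numpy())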

# Concatenate all batches
images_list = np.concatenate(images_list, axis=0)
true_labels = np.concatenate(true_labels, axis=0)
pred_labels = np.concatenate(pred_labels, axis=0)
correctness = np.concatenate(correctness, axis=0)

# Select a subset to display (e.g., 25 images)
num_display = 25
indices = np.random.choice(len(images_list), num_display, replace=False)
selected_images = images_list[indices]
selected_true = true_labels[indices]
selected_pred = pred_labels[indices]
selected_correct = correctness[indices]

# Create a subplot grid
rows, cols = 5, 5
fig = sp.make_subplots(rows=rows, cols=cols, subplot_titles=[f"True: {t}, Pred: {p}" for t, p in zip(selected_true, selected_pred)])

for i in range(num_display):
    row = i // cols + 1
    col = i % cols + 1
    img = selected_images[i].squeeze()  # Remove channel dimension
    img = (img * 0.5 + 0.5)  # Denormalize to [0, 1]
    img = np.flipud(img)  # Flip vertically to correct orientation

    # Add image to subplot
    fig.add_trace(
        go.Heatmap(z=img, colorscale='gray', showscale=False),
        row=row, col=col
    )

    # Update title color based on correctness
    title_color = 'green' if selected_correct[i] else 'red'
    fig.layout.annotations[i].update(font=dict(color=title_color))

# Update layout
fig.update_layout(
    title_text="MNIST Classification Results (Green: Correct, Red: Incorrect)",
    height=800,
    width=800,
    showlegend=False
)

# Remove axes for cleaner visualization
for i in range(1, num_display + 1):
    fig.update_xaxes(showticklabels=False, showgrid=False, zeroline=False, row=(i-1)//cols+1, col=(i-1)%cols+1)
    fig.update_yaxes(showticklabels=False, showgrid=False, zeroline=False, row=(i-1)//cols+1, col=(i-1)%cols+1)

fig.show()
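`torch.max(outputs, 1)` returns, for each row of logits, the largest value and its index; the index is the predicted class. Accuracy, in its plain form, is then the fraction of predictions that match the true labels. A minimal pure-Python sketch of both steps, with made-up logits and labels (`argmax` and `plain_accuracy` are hypothetical helper names, chosen so they do not overwrite the notebook's `accuracy` metric):

```python
def argmax(scores):
    """Index of the largest score (in this sketch, the first one wins on ties)."""
    return max(range(len(scores)), key=lambda k: scores[k])


def plain_accuracy(batch_logits, labels):
    """Fraction of samples whose predicted class (argmax of logits) equals the label."""
    preds = [argmax(row) for row in batch_logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Made-up logits for 4 samples and 3 classes, with their true labels.
example_logits = [
    [0.1, 2.3, -1.0],  # predicted class 1
    [1.5, 0.2, 0.3],   # predicted class 0
    [0.0, 0.1, 0.9],   # predicted class 2
    [2.0, 2.5, 0.0],   # predicted class 1
]
example_labels = [1, 0, 1, 1]

print(plain_accuracy(example_logits, example_labels))
```

Three of the four predictions match their labels, so this batch scores 0.75.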

**Exercise**: Change the number of neurons in the hidden layers (e.g., 128 to 256) and observe the impact on the loss curves and accuracy.

### **REFERENCES**

**This hands-on was based on or inspired by the following reference materials:**

- PyTorch Official Documentation [1]
- PyTorch Tutorials [2]
- Learn PyTorch for Deep Learning: Zero to Mastery [3]


[1] PyTorch (2025). PyTorch documentation. The Linux Foundation. https://docs.pytorch.org/docs/stable/index.html

[2] PyTorch (2024). Welcome to PyTorch Tutorials. The Linux Foundation. https://docs.pytorch.org/tutorials/

[3] Learn Pytorch (2023). Learn PyTorch for Deep Learning: Zero to Mastery. By Daniel Bourke. https://www.learnpytorch.io/