# Lab1 ‚Äî PyTorch Foundations for Computer Vision

**Course**: Deep Learning for Image Analysis

**Class**: M2 IASD App  

**Professor**: Mehyar MLAWEH

---

## Objectives
By the end of this lab, you should be able to:

- Understand how **neurons and layers** are implemented in PyTorch
- Manipulate **tensors** and reason about shapes
- Use **autograd** to compute gradients
- Implement a **training loop** yourself
- Connect theory (neurons, loss, backprop) to actual code

‚ö†Ô∏è This notebook is **intentionally incomplete**.  
Whenever you see **`# TODO`**, you are expected to write code.


**Deadline:** üóìÔ∏è **Saturday, February 7th (23:59)**

## ü§ñ A small (honest) note before you start

Let‚Äôs be real for a second.

 I know you **can use LLMs (ChatGPT, Copilot, Claude, etc.)** to help you with this lab.  
And yes, **I use them too**, so don‚Äôt worry üòÑ

üëâ **You are allowed to use AI tools.**  
But here‚Äôs the deal:

- Don‚Äôt just **copy‚Äìpaste** code you don‚Äôt understand  
- Take time to **read, question, and modify** what the model gives you  
- If you can solve a block **by yourself, without AI**, that‚Äôs excellent

Remember:

> AI can write code for you, but **only you can understand it** ‚Äî and understanding is what matters for exams, projects, and real work.

Use these tools **as assistants, not as replacements for thinking**.

---

## üìö Useful documentation (highly recommended)

You will often find answers faster (and more reliably) by checking the official documentation:

- **PyTorch main documentation**  
  https://pytorch.org/docs/stable/index.html

- **PyTorch tensors**  
  https://pytorch.org/docs/stable/tensors.html

- **Neural network modules (`torch.nn`)**  
  https://pytorch.org/docs/stable/nn.html

- **Loss functions** (`BCEWithLogitsLoss`, CrossEntropy, etc.)  
  https://pytorch.org/docs/stable/nn.html#loss-functions

- **Optimizers** (`SGD`, `Adam`, ‚Ä¶)  
  https://pytorch.org/docs/stable/optim.html

If you learn how to **navigate the documentation**, you are already thinking like a real AI engineer üëå

---

## PART I

## 0) Colab setup ‚Äî GPU check

**Instructions**
1. In Colab: `Runtime ‚Üí Change runtime type to GPU T4`
2. Select **GPU**
3. Save and restart runtime

Then run the cell below.


In [1]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# TODO: set the device correctly (cuda if available, else cpu)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


# print("Using device:", device)


PyTorch version: 2.9.0+cu126
CUDA available: True


## 1) Imports and reproducibility


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim

# TODO: fix the random seed for reproducibility
torch.manual_seed(42)



<torch._C.Generator at 0x7a874f7311d0>

## 2) PyTorch tensors and shapes

Tensors are multi-dimensional arrays that support:
- GPU acceleration
- automatic differentiation

Understanding **shapes** is critical in deep learning.


In [3]:
# Examples
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.randn(4, 5)

print("a shape:", a.shape)
print("b shape:", b.shape)

print(a)
print(b)

a shape: torch.Size([3])
b shape: torch.Size([4, 5])
tensor([1., 2., 3.])
tensor([[ 1.9269,  1.4873,  0.9007, -2.1055, -0.7581],
        [ 1.0783,  0.8008,  1.6806,  0.3559, -0.6866],
        [-0.4934,  0.2415, -0.2316,  0.0418, -0.2516],
        [ 0.8599, -0.3097, -0.3957,  0.8034, -0.6216]])


### üîç Question (answer inside the markdown)
- How many dimensions does tensor `b` have?
- What does each dimension represent conceptually?

* there are 2 dimensions in tensor `b`
* in a mathematical understanding it depresents a matrix so the first corresponds to the number of rows and the second to the columns.


### ‚úÖTensor operations

Complete the following:

1. Create a tensor `x` of shape `(8, 3)` with random values  
2. Compute:
   - the **mean of each column**
   - the **L2 norm of each row**
3. Normalize `x` **row-wise** using the L2 norm

In [4]:
# TODO: create x
x = torch.randn(8, 3)

# TODO: column mean
col_mean = torch.mean(x, dim = 0)


# TODO: row-wise L2 norm
row_norm = torch.norm(x, dim = 1)



# print(x.shape, col_mean.shape, row_norm.shape, x_normalized.shape)


In [5]:
# TODO: normalized tensor
x_normalized = x / row_norm.view(-1, 1)

## 3) Artificial neuron ‚Äî from math to code

A neuron computes:

$$
z = \sum_i w_i x_i + b
$$

Then applies an activation function:

$$
y = g(z)
$$

This section connects directly to the theory seen in class.


In [6]:
x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

z = torch.sum(x * w) + b
z


tensor(-0.8000)

### Activation functions

1. Implement **ReLU**
2. Implement **Sigmoid**
3. Apply both to `z` and compare the outputs

Which activation preserves negative values?


In [18]:
# TODO
def relu(z):
  return torch.maximum(z,torch.zeros_like(z))

def sigmoid(z):
    return torch.sigmoid(z)

y_relu = relu(z)
y_sigmoid = sigmoid(z)
y_relu, y_sigmoid


(tensor(0., grad_fn=<MaximumBackward0>),
 tensor(0.3318, grad_fn=<SigmoidBackward0>))

## 4) Autograd and gradients

PyTorch uses **automatic differentiation** to compute gradients
using the **chain rule** (backpropagation).


In [8]:
x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True)
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2

loss.backward()

print("loss:", loss.item())
print("grad w:", w.grad)
print("grad b:", b.grad)


loss: 2.890000104904175
grad w: tensor([-3.4000, -6.8000,  3.4000])
grad b: tensor(-3.4000)


### üîç Conceptual question

- If `b.grad > 0`, should `b` increase or decrease after a gradient descent step?
Explain **why** in one sentence.


## 5) Toy classification dataset

We create a **linearly separable** dataset.

Label rule:
- class = 1 if `x‚ÇÅ + x‚ÇÇ + x‚ÇÉ > 0`
- else class = 0

This mimics a very simple classification problem.


In [9]:
# TODO: generate a dataset of size N=500 with 3 features
X = torch.randn(500, 3)
y = torch.randn(500, 1)

In [10]:
for i in range(500):
  if X[i, 0] + X[i, 1] + X[i, 2] > 0 :
    y[i] = 1
  else :
    y[i] = 0

In [11]:
# TODO: split into train (80%) and validation (20%)

X_train, X_val = X[:400], X[400:]
y_train, y_val = y[:400], y[400:]


In [13]:
X_train.shape, X_val.shape, y_train.shape, y_val.shape

(torch.Size([400, 3]),
 torch.Size([100, 3]),
 torch.Size([400, 1]),
 torch.Size([100, 1]))

## 6) Model definition

We define a small **MLP** (fully-connected network):

`3 ‚Üí 16 ‚Üí 8 ‚Üí 1`

Activation: ReLU  
Output: raw logits (no sigmoid)


In [14]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 16),       # TODO: Linear 3 ‚Üí 16
            nn.ReLU(),              # TODO: ReLU
            nn.Linear(16, 8),       # TODO: Linear 16 ‚Üí 8
            nn.ReLU(),              # TODO: ReLU
            nn.Linear(8, 1),       # TODO: Linear 8 ‚Üí 1
        )

    def forward(self, x):
        return self.net(x)

# TODO: create model and move it to the GPU
model = MLP().to(device)




###  parameters

1. Compute **by hand** the total number of parameters
2. Verify your answer using PyTorch


In [20]:
# TODO: count parameters with PyTorch
total_params_calculated = 3*16 + 16 + 16*8 + 8 + 8*1 + 1

# Programmatic verification:
total_params_pytorch = sum(p.numel() for p in model.parameters())

print(f"Calculated by hand: {total_params_calculated}")
print(f"Counted by PyTorch: {total_params_pytorch}")

Calculated by hand: 209
Counted by PyTorch: 209


## 7) Training loop

You must complete the full training loop:
- forward pass
- loss computation
- backward pass
- optimizer step

Loss: `BCEWithLogitsLoss`
Optimizer: `SGD`


In [19]:
# TODO: move data to device
X_train_d = X_train.to(device)
y_train_d = y_train.to(device)
X_val_d = X_val.to(device)
y_val_d = y_val.to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    model.train()
    optimizer.zero_grad()

    # TODO: forward
    logits = model(X_train_d)

    # TODO: loss
    loss = criterion(logits, y_train_d)

    # TODO: backward
    loss.backward()

    # TODO: update
    optimizer.step()

    if epoch % 5 == 0:
        print("Epoch", epoch, "| loss =", float(loss))


Epoch 0 | loss = 0.6936067938804626
Epoch 5 | loss = 0.6922205686569214
Epoch 10 | loss = 0.6908179521560669
Epoch 15 | loss = 0.6892109513282776


## 8) Evaluation

1. Apply `sigmoid` to the logits
2. Convert probabilities to predictions
3. Compute **accuracy** on the validation set


In [21]:
# TODO: evaluation
model.eval() # Set model to evaluation mode (disables dropout, etc.)

with torch.no_grad(): # Disable gradient calculation for efficiency
    # 1. Forward pass on validation data
    logits_val = model(X_val_d)

    # 2. Apply sigmoid to get probabilities
    probs_val = torch.sigmoid(logits_val)

    # 3. Convert to predictions (0 or 1) using 0.5 threshold
    preds_val = (probs_val > 0.5).float()

    # 4. Compute accuracy
    # Check where prediction equals target, convert boolean to float, take mean
    accuracy = (preds_val == y_val_d).float().mean()

print(f"Validation Accuracy: {accuracy.item() * 100:.2f}%")

Validation Accuracy: 60.00%


## 9) Reflection questions (answer inside the markdown)

1. Why do we **not** apply sigmoid inside the model?
2. What would happen if we removed all ReLU activations?
3. How does this toy problem relate to image classification?

Write short answers (2‚Äì3 lines each).

1 - We use BCEWithLogitsLoss, which combines the Sigmoid layer and the BCELoss in one single class. This is more numerically stable (prevents overflow/underflow issues) than applying them separately.

2 - The model would collapse into a single linear transformation (essentially Logistic Regression), regardless of how many layers you add. Without non-linearities, a deep neural network cannot learn complex, non-linear decision boundaries.

3 - The fundamental pipeline is identical: Input -> Layers (Weights) -> Non-Linearity -> Loss -> Backprop. In image classification, the input is just a larger tensor (pixels) and we typically use Convolutional layers instead of just Linear layers, but the training logic remains the same.

## 10) Bridge to Computer Vision

So far:
- inputs = vectors of size 3
- layers = fully-connected

Next session:
- inputs = images `(B, C, H, W)`
- layers = convolutions
- same training logic

üëâ **Architecture changes, learning principles stay the same.**


## Part II ‚Äî Training on MNIST

Check the next notebook

okay
