# Deep Learning Homework - Reference Solution

This notebook contains a complete implementation for the deep learning homework assignment.


In [1]:
# Apply the suggested random number generation process set by the question for reproduceability sake:

import numpy as np
import torch

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False


## Model Architecture

For this homework we will use Convolutional Neural Network (CNN). We'll use PyTorch.

Model structure:
* The shape for input should be `(3, 200, 200)` (channels first format in PyTorch)
* Next, create a convolutional layer (`nn.Conv2d`):
    * Use 32 filters (output channels)
    * Kernel size should be `(3, 3)` (that's the size of the filter)
    * Use `'relu'` as activation
* Reduce the size of the feature map with max pooling (`nn.MaxPool2d`)
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using `flatten` or `view`
* Next, add a `nn.Linear` layer with 64 neurons and `'relu'` activation
* Finally, create the `nn.Linear` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As optimizer use `torch.optim.SGD` with the following parameters:
* `torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)`


In [2]:
# Import necessary libraries
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
from torchvision.datasets import ImageFolder
import numpy as np
from tqdm import tqdm

# Try to import torchsummary, but it's optional
try:
    from torchsummary import summary
    HAS_TORCHSUMMARY = True
except ImportError:
    HAS_TORCHSUMMARY = False
    print("torchsummary not available. Install with: pip install torchsummary")


In [3]:
# Define the CNN model
class BinaryCNN(nn.Module):
    def __init__(self):
        super(BinaryCNN, self).__init__()

        # Convolutional layer: (3, 200, 200) -> (32, 198, 198)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3))
        self.relu1 = nn.ReLU()

        # Max pooling: (32, 198, 198) -> (32, 99, 99)
        self.pool = nn.MaxPool2d(kernel_size=(2, 2))

        # Flatten: (32, 99, 99) -> (32 * 99 * 99,)
        # First linear layer: 32 * 99 * 99 -> 64
        self.fc1 = nn.Linear(32 * 99 * 99, 64)
        self.relu2 = nn.ReLU()

        # Output layer: 64 -> 1 (binary classification)
        self.fc2 = nn.Linear(64, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Convolution + ReLU
        x = self.conv1(x)
        x = self.relu1(x)

        # Max pooling
        x = self.pool(x)

        # Flatten
        x = x.view(x.size(0), -1)

        # First fully connected layer + ReLU
        x = self.fc1(x)
        x = self.relu2(x)

        # Output layer + Sigmoid
        x = self.fc2(x)
        x = self.sigmoid(x)

        return x


In [4]:
# Create model instance
model = BinaryCNN()

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.002, momentum=0.8)

# Loss function for binary classification
criterion = nn.BCELoss()


## Question 1: Which loss function you will use?

For binary classification with sigmoid activation, we use **BCELoss** (Binary Cross Entropy Loss).

Answer: **nn.BCELoss()** (or **nn.BCEWithLogitsLoss()** if we didn't use sigmoid)

Since we're using sigmoid activation in the output layer, we use `nn.BCELoss()`.
If we didn't use sigmoid, we would use `nn.BCEWithLogitsLoss()` which combines sigmoid and BCE loss.


In [5]:
# Question 2: Total number of parameters
# Method 1: Using torchsummary (if available)
if HAS_TORCHSUMMARY:
    try:
        summary(model, (3, 200, 200))
    except Exception as e:
        print(f"Error using torchsummary: {e}")
        print("Counting manually...")
else:
    print("torchsummary not available, counting manually...")

# Method 2: Count manually
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# Manual calculation breakdown:
# Conv2d: 3 * 32 * 3 * 3 + 32 (bias) = 896
# Linear1: 32 * 99 * 99 * 64 + 64 (bias) = 20,074,048
# Linear2: 64 * 1 + 1 (bias) = 65
# Total: 896 + 20,074,048 + 65 = 20,075,009


Error using torchsummary: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
Counting manually...

Total parameters: 20,073,473
Trainable parameters: 20,073,473


## Data Preparation

We'll use the **straight/curly hair** dataset from
`http://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip`.

Once unzipped, the dataset has the following folder structure:
- `data/train/`
- `data/test/`

Each split contains two subfolders (e.g. `curly/` and `straight/`) with images.
We'll:
- Download and unzip the dataset (if not already present)
- Resize images to 200x200 as required
- Use `ImageFolder` to load images
- Keep it as a **binary** problem (curly vs straight).


In [6]:
# Download and unzip straight/curly hair dataset if not already present
import urllib.request
import zipfile

DATA_URL = "http://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip"
DATA_ZIP_PATH = "./data.zip"
DATA_DIR = "./data"

if not os.path.exists(DATA_DIR):
    print("Downloading dataset...")
    urllib.request.urlretrieve(DATA_URL, DATA_ZIP_PATH)
    print("Unzipping dataset...")
    with zipfile.ZipFile(DATA_ZIP_PATH, "r") as zip_ref:
        zip_ref.extractall(".")
    print("Done!")
else:
    print("Dataset already present, skipping download.")

Downloading dataset...
Unzipping dataset...
Done!


In [7]:
# Path to straight/curly hair dataset (after unzipping data.zip)
data_dir = "./data"

# Define transforms for training (without augmentation initially)
transform_train = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
])

transform_test = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
])

# Load dataset splits using ImageFolder
# Expected structure:
# ./data/train/curly
# ./data/train/straight
# ./data/test/curly
# ./data/test/straight
train_dir = os.path.join(data_dir, "train")
test_dir = os.path.join(data_dir, "test")

train_dataset = ImageFolder(train_dir, transform=transform_train)
test_dataset = ImageFolder(test_dir, transform=transform_test)

print(f"Classes: {train_dataset.classes}")

# Since this dataset is already binary (curly vs straight),
# we don't need to remap labels; ImageFolder will give 0/1 labels.

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")


Classes: ['curly', 'straight']
Training samples: 800
Test samples: 201


In [8]:
# Training function
def train_epoch(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in tqdm(train_loader, desc="Training"):
        images = images.to(device)
        labels = labels.float().to(device).unsqueeze(1)  # Convert to float and add dimension

        # Forward pass
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

        # Statistics
        running_loss += loss.item()
        predicted = (outputs > 0.5).float()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_loader)
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

# Evaluation function
def evaluate(model, test_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in tqdm(test_loader, desc="Evaluating"):
            images = images.to(device)
            labels = labels.float().to(device).unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(test_loader)
    epoch_acc = correct / total
    return epoch_loss, epoch_acc


In [9]:
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move model to device
model = model.to(device)


Using device: cuda


In [10]:
# Train the model for 10 epochs (initial training)
num_epochs = 10
train_losses = []
train_accuracies = []
test_losses = []
test_accuracies = []

for epoch in range(num_epochs):
    print(f"\nEpoch {epoch+1}/{num_epochs}")

    # Train
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)

    # Evaluate
    test_loss, test_acc = evaluate(model, test_loader, criterion, device)
    test_losses.append(test_loss)
    test_accuracies.append(test_acc)

    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}")



Epoch 1/10


Training: 100%|██████████| 25/25 [00:09<00:00,  2.73it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.96it/s]


Train Loss: 0.6961, Train Acc: 0.5212
Test Loss: 0.6829, Test Acc: 0.6020

Epoch 2/10


Training: 100%|██████████| 25/25 [00:07<00:00,  3.21it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.86it/s]


Train Loss: 0.6697, Train Acc: 0.6225
Test Loss: 0.6629, Test Acc: 0.6219

Epoch 3/10


Training: 100%|██████████| 25/25 [00:07<00:00,  3.20it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.94it/s]


Train Loss: 0.6376, Train Acc: 0.6525
Test Loss: 0.6384, Test Acc: 0.6418

Epoch 4/10


Training: 100%|██████████| 25/25 [00:07<00:00,  3.36it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  3.61it/s]


Train Loss: 0.6091, Train Acc: 0.6737
Test Loss: 0.6305, Test Acc: 0.6269

Epoch 5/10


Training: 100%|██████████| 25/25 [00:07<00:00,  3.50it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.70it/s]


Train Loss: 0.5936, Train Acc: 0.6763
Test Loss: 0.6369, Test Acc: 0.6617

Epoch 6/10


Training: 100%|██████████| 25/25 [00:08<00:00,  3.06it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.79it/s]


Train Loss: 0.5718, Train Acc: 0.7025
Test Loss: 0.6575, Test Acc: 0.6368

Epoch 7/10


Training: 100%|██████████| 25/25 [00:08<00:00,  2.94it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.59it/s]


Train Loss: 0.5955, Train Acc: 0.6800
Test Loss: 0.6148, Test Acc: 0.6368

Epoch 8/10


Training: 100%|██████████| 25/25 [00:08<00:00,  3.06it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.75it/s]


Train Loss: 0.5633, Train Acc: 0.7050
Test Loss: 0.6335, Test Acc: 0.6567

Epoch 9/10


Training: 100%|██████████| 25/25 [00:07<00:00,  3.37it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.00it/s]


Train Loss: 0.5401, Train Acc: 0.7375
Test Loss: 0.6294, Test Acc: 0.6617

Epoch 10/10


Training: 100%|██████████| 25/25 [00:07<00:00,  3.24it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.67it/s]

Train Loss: 0.5348, Train Acc: 0.7250
Test Loss: 0.6229, Test Acc: 0.6517





## Question 3: Median of training accuracy for all epochs


In [11]:
# Question 3: Median of training accuracy
median_train_acc = np.median(train_accuracies)
print(f"Median training accuracy: {median_train_acc:.4f}")
print(f"\nTraining accuracies: {[f'{acc:.4f}' for acc in train_accuracies]}")
print(f"\nAnswer: {median_train_acc:.2f}")


Median training accuracy: 0.6781

Training accuracies: ['0.5212', '0.6225', '0.6525', '0.6737', '0.6763', '0.7025', '0.6800', '0.7050', '0.7375', '0.7250']

Answer: 0.68


## Question 4: Standard deviation of training loss for all epochs


In [12]:
# Question 4: Standard deviation of training loss
std_train_loss = np.std(train_losses)
print(f"Standard deviation of training loss: {std_train_loss:.4f}")
print(f"\nTraining losses: {[f'{loss:.4f}' for loss in train_losses]}")
print(f"\nAnswer: {std_train_loss:.3f}")


Standard deviation of training loss: 0.0506

Training losses: ['0.6961', '0.6697', '0.6376', '0.6091', '0.5936', '0.5718', '0.5955', '0.5633', '0.5401', '0.5348']

Answer: 0.051


## Questions 5 & 6: Training with Data Augmentation

Now we'll train for 10 more epochs with data augmentation. Note: we continue training the same model, not creating a new one.


In [13]:
# Define transforms with augmentation for training
transform_train_aug = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Reload training dataset with augmentation from straight/curly hair data
train_dataset_aug = ImageFolder(train_dir, transform=transform_train_aug)

# No need to convert labels; still binary 0/1

# Create new data loader with augmentation
train_loader_aug = DataLoader(train_dataset_aug, batch_size=32, shuffle=True)

print("Data augmentation transforms applied on straight/curly hair dataset:")
print("- Random horizontal flip (p=0.5)")
print("- Random rotation (10 degrees)")
print("- Color jitter (brightness=0.2, contrast=0.2)")


Data augmentation transforms applied on straight/curly hair dataset:
- Random horizontal flip (p=0.5)
- Random rotation (10 degrees)
- Color jitter (brightness=0.2, contrast=0.2)


In [14]:
# Train for 10 more epochs with augmentation
# Note: We're continuing to train the same model, not creating a new one
num_epochs_aug = 10
train_losses_aug = []
train_accuracies_aug = []
test_losses_aug = []
test_accuracies_aug = []

for epoch in range(num_epochs_aug):
    print(f"\nEpoch {epoch+1}/{num_epochs_aug} (with augmentation)")

    # Train with augmented data
    train_loss, train_acc = train_epoch(model, train_loader_aug, criterion, optimizer, device)
    train_losses_aug.append(train_loss)
    train_accuracies_aug.append(train_acc)

    # Evaluate
    test_loss, test_acc = evaluate(model, test_loader, criterion, device)
    test_losses_aug.append(test_loss)
    test_accuracies_aug.append(test_acc)

    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}")



Epoch 1/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.70it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.68it/s]


Train Loss: 0.5732, Train Acc: 0.7137
Test Loss: 0.6139, Test Acc: 0.6617

Epoch 2/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.76it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.72it/s]


Train Loss: 0.5664, Train Acc: 0.6900
Test Loss: 0.6105, Test Acc: 0.6219

Epoch 3/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.72it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.72it/s]


Train Loss: 0.5657, Train Acc: 0.7013
Test Loss: 0.6203, Test Acc: 0.6567

Epoch 4/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.74it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.81it/s]


Train Loss: 0.5458, Train Acc: 0.7212
Test Loss: 0.6327, Test Acc: 0.6716

Epoch 5/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:08<00:00,  2.90it/s]
Evaluating: 100%|██████████| 7/7 [00:02<00:00,  3.30it/s]


Train Loss: 0.5568, Train Acc: 0.7150
Test Loss: 0.6119, Test Acc: 0.6816

Epoch 6/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:08<00:00,  2.92it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.19it/s]


Train Loss: 0.5547, Train Acc: 0.7050
Test Loss: 0.6223, Test Acc: 0.6766

Epoch 7/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.75it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.71it/s]


Train Loss: 0.5600, Train Acc: 0.7113
Test Loss: 0.6214, Test Acc: 0.6766

Epoch 8/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.73it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.72it/s]


Train Loss: 0.5343, Train Acc: 0.7300
Test Loss: 0.5955, Test Acc: 0.6965

Epoch 9/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.73it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.75it/s]


Train Loss: 0.5490, Train Acc: 0.7188
Test Loss: 0.6254, Test Acc: 0.6766

Epoch 10/10 (with augmentation)


Training: 100%|██████████| 25/25 [00:09<00:00,  2.73it/s]
Evaluating: 100%|██████████| 7/7 [00:01<00:00,  4.71it/s]

Train Loss: 0.5358, Train Acc: 0.7188
Test Loss: 0.6130, Test Acc: 0.6915





## Question 5: Mean of test loss for all epochs (with augmentation)


In [15]:
# Question 5: Mean of test loss for all epochs with augmentation
mean_test_loss_aug = np.mean(test_losses_aug)
print(f"Mean test loss (with augmentation): {mean_test_loss_aug:.4f}")
print(f"\nTest losses: {[f'{loss:.4f}' for loss in test_losses_aug]}")
print(f"\nAnswer: {mean_test_loss_aug:.3f}")


Mean test loss (with augmentation): 0.6167

Test losses: ['0.6139', '0.6105', '0.6203', '0.6327', '0.6119', '0.6223', '0.6214', '0.5955', '0.6254', '0.6130']

Answer: 0.617


## Question 6: Average of test accuracy for the last 5 epochs (epochs 6-10) with augmentation


In [16]:
# Question 6: Average of test accuracy for last 5 epochs (epochs 6-10, indices 5-9)
last_5_test_acc = test_accuracies_aug[5:10]  # Epochs 6-10 (0-indexed: 5-9)
avg_last_5_test_acc = np.mean(last_5_test_acc)

print(f"Test accuracies for epochs 6-10: {[f'{acc:.4f}' for acc in last_5_test_acc]}")
print(f"Average test accuracy (last 5 epochs): {avg_last_5_test_acc:.4f}")
print(f"\nAnswer: {avg_last_5_test_acc:.2f}")


Test accuracies for epochs 6-10: ['0.6766', '0.6766', '0.6965', '0.6766', '0.6915']
Average test accuracy (last 5 epochs): 0.6836

Answer: 0.68


## Summary of Answers

1. **Question 1**: Loss function - `nn.BCELoss()` (or `nn.BCEWithLogitsLoss()` if no sigmoid)
2. **Question 2**: Total parameters - ~20,075,009
3. **Question 3**: Median training accuracy - Check output above
4. **Question 4**: Standard deviation of training loss - Check output above
5. **Question 5**: Mean test loss (with augmentation) - Check output above
6. **Question 6**: Average test accuracy (last 5 epochs with augmentation) - Check output above
