## Homework

> **Note**: it's very likely that in this homework your answers won't match 
> the options exactly. That's okay and expected. Select the option that's
> closest to your solution.
> If it's exactly in between two options, select the higher value.

### Dataset

In this homework, we'll build a model for classifying various hair types. 
For this, we will use the Hair Type dataset that was obtained from 
[Kaggle](https://www.kaggle.com/datasets/kavyasreeb/hair-type-dataset) 
and slightly rebuilt.

You can download the target dataset for this homework from 
[here](https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip):

```bash
wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
unzip data.zip
```


In the lectures we saw how to use a pre-trained neural network. In this homework, we'll train a much smaller model from scratch using PyTorch.

You can use Google Colab or your own computer.

In [1]:
# only run once
#! wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
#! unzip data.zip

### Data Preparation

The dataset contains around 1000 images of hair, split into separate folders 
for the training and test sets. 

Let's look at two examples.

| Curly | Straight |
|---|---|
|![curly](../../../data/train/curly/besthairstylesforindianmen15_1378796017.jpg "Title") | ![straight](../../../data/train/straight/db4dbc136f8f0ebd32b6854bbd9834d2.jpg) |

 Ok, these are pictures of people with curly or straight hair.

### Reproducibility

Reproducibility in deep learning is a multifaceted challenge that requires attention 
to both software and hardware details. In some cases, we can't guarantee exactly the same results across repeated runs of the same experiment.

Therefore, in this homework we suggest setting the random seeds as follows:

```python
import numpy as np
import torch

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

Also, use PyTorch version 2.8.0 (that's the one in Colab).

In [2]:
# Set the seeds. No CUDA on my machine, so the torch.cuda seeding calls are skipped.

import numpy as np
import torch

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

### Model

For this homework we will use a Convolutional Neural Network (CNN), built with PyTorch.

You need to develop a model with the following structure:

* The shape for input should be `(3, 200, 200)` (channels first format in PyTorch)
* Next, create a convolutional layer (`nn.Conv2d`):
    * Use 32 filters (output channels)
    * Kernel size should be `(3, 3)` (that's the size of the filter)
    * Use `'relu'` as activation 
* Reduce the size of the feature map with max pooling (`nn.MaxPool2d`)
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using `flatten` or `view`
* Next, add a `nn.Linear` layer with 64 neurons and `'relu'` activation
* Finally, create the `nn.Linear` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As the optimizer, use `torch.optim.SGD` with the following parameters:

* `torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)`

In [3]:
import torch.nn as nn


model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3),  # Input channels: 3, Output channels: 32
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 2)),  # Reduces the spatial dimensions by half
    nn.Flatten(),  # Flattens the output to a vector
    nn.Linear(32 * 99 * 99, 64),  # Input features: 32 * 99 * 99 (after flattening)
    nn.ReLU(),
    nn.Linear(64, 1),  # Output layer with 1 neuron for binary classification
    # No final activation layer here, even though the text asks for one: the provided
    # training code applies torch.sigmoid itself, and BCEWithLogitsLoss expects raw logits.
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)
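
The `32 * 99 * 99` input size of the first `nn.Linear` layer follows from the spatial sizes after the convolution and the pooling. As a rough check, the standard output-size formulas can be evaluated in plain Python. The helper names `conv2d_out` and `maxpool2d_out` are made up for this sketch, and it assumes the PyTorch defaults used above (stride 1 and no padding for the convolution, stride equal to the kernel size for the pooling):

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size, kernel, stride=None):
    """Output side length of max pooling; stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


side_after_conv = conv2d_out(200, kernel=3)               # 200 - 3 + 1 = 198
side_after_pool = maxpool2d_out(side_after_conv, kernel=2)  # 198 / 2 = 99
flat_features = 32 * side_after_pool * side_after_pool    # 32 channels * 99 * 99

print(side_after_conv, side_after_pool, flat_features)
```

If the kernel size, padding, or input resolution changes, the `in_features` of the first linear layer has to change with it.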

### Question 1

Which loss function will you use?

* `nn.MSELoss()`
* **`nn.BCEWithLogitsLoss()`**
* `nn.CrossEntropyLoss()`
* `nn.CosineEmbeddingLoss()`

(Multiple answers can be correct, so pick any)

In [4]:
# Binary classification where the model outputs raw logits (no sigmoid layer),
# so we use B(inary)C(ross)E(ntropy)WithLogitsLoss

criterion = nn.BCEWithLogitsLoss()
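
To see why a model without a final sigmoid pairs with a "with logits" loss, here is a small plain-Python sketch (no PyTorch needed). It compares binary cross-entropy computed on `sigmoid(logit)` with the numerically stable formula that works on the logit directly. The function names are illustrative only, and this shows the underlying math rather than PyTorch's actual implementation:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_on_probability(p, y):
    """Binary cross-entropy for a predicted probability p and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_on_logit(x, y):
    """The same loss computed from the raw logit x in a numerically stable form."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


examples = [(2.0, 1.0), (-1.5, 1.0), (0.3, 0.0)]  # (logit, label) pairs
for logit, label in examples:
    via_sigmoid = bce_on_probability(sigmoid(logit), label)
    via_logit = bce_on_logit(logit, label)
    print(f"logit={logit:+.1f} label={label:.0f} "
          f"sigmoid+BCE={via_sigmoid:.6f} BCE-with-logits={via_logit:.6f}")
```

Both columns agree, which is why the sigmoid can be left out of the model and applied only when turning outputs into predictions.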

### Question 2

What's the total number of parameters of the model? You can use `torchsummary` or count manually. 

In PyTorch, you can find the total number of parameters using:

```python
# Option 1: Using torchsummary (install with: pip install torchsummary)
from torchsummary import summary
summary(model, input_size=(3, 200, 200))

# Option 2: Manual counting
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")
```

* 896 
* 11214912
* 15896912
* **20073473**

In [5]:
# Option 2: Manual counting
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")

Total parameters: 20073473


| Layer Type | Parameters/Details | Output Shape | Weights + Biases |
|-|-|-|-|
| Input | Shape: (3, 200, 200) | (3, 200, 200) | 0 |
| Conv2d | Filters: 32, Kernel: (3, 3), Activation: ReLU | (32, 198, 198) | 3 * 3 * 3 * 32 + 32 = 896 |
| MaxPool2d | Pooling size: (2,2) | (32, 99, 99) | 0 |
| Flatten | - | (32 * 99 * 99,) | 0 |
| Linear | Neurons: 64, Activation: ReLU | (64, ) | 32 * 99 * 99 * 64 + 64 = 20,072,512 |
| Linear (Output) | Neurons: 1 | (1, ) | 64 * 1 + 1 = 65 |

**Total**  =  896 + 20,072,512 + 65 = 20,073,473
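
The same count can be reproduced without PyTorch. The sketch below uses two hypothetical helpers, `conv2d_params` and `linear_params`, that encode the usual "weights plus biases" formulas for these layer types:

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """One kernel_size x kernel_size filter per (input, output) channel pair, plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """One weight per (input, output) pair, plus one bias per output."""
    return in_features * out_features + out_features


layer_params = {
    "conv2d": conv2d_params(3, 32, 3),
    "linear_hidden": linear_params(32 * 99 * 99, 64),
    "linear_output": linear_params(64, 1),
}
total = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```

ReLU, max pooling, and flatten have no trainable parameters, so they do not appear in the sum.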

### Generators and Training

For the next two questions, use the following transformation for both train and test sets:

```python
train_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ) # ImageNet normalization
])
```

* We don't need to do any additional pre-processing for the images.
* Use `batch_size=20`
* Use `shuffle=True` for training and `shuffle=False` for test. 
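
As a side note on what `transforms.Normalize` does: after `ToTensor()` has scaled 8-bit pixel values to the range [0, 1], each channel is shifted by its mean and divided by its standard deviation. A minimal plain-Python sketch of that per-channel arithmetic for a single RGB pixel (the function `normalize_pixel` is made up for this illustration):

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
average = normalize_pixel(IMAGENET_MEAN)

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
print(average)  # a pixel equal to the channel means maps to zeros
```

The normalized values are therefore roughly centered around zero instead of lying in [0, 1].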

In [6]:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

train_transforms = transforms.Compose(
    [
        transforms.Resize((200, 200)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        ),  # ImageNet normalization
    ]
)

batch_size = 20
# Define the datasets
train_dataset = datasets.ImageFolder(root="data/train", transform=train_transforms)
validation_dataset = datasets.ImageFolder(root="data/test", transform=train_transforms)

# Define the data loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
validation_loader = DataLoader(validation_dataset, batch_size=batch_size, shuffle=False)

# Define the device
device = torch.device("cpu")


Now fit the model.

You can use this code:

```python
num_epochs = 10
history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.float().unsqueeze(1) # Ensure labels are float and have shape (batch_size, 1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        # For binary classification with BCEWithLogitsLoss, apply sigmoid to outputs before thresholding for accuracy
        predicted = (torch.sigmoid(outputs) > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_dataset)
    epoch_acc = correct_train / total_train
    history['loss'].append(epoch_loss)
    history['acc'].append(epoch_acc)

    model.eval()
    val_running_loss = 0.0
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in validation_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * images.size(0)
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_epoch_loss = val_running_loss / len(validation_dataset)
    val_epoch_acc = correct_val / total_val
    history['val_loss'].append(val_epoch_loss)
    history['val_acc'].append(val_epoch_acc)

    print(f"Epoch {epoch+1}/{num_epochs}, "
          f"Loss: {epoch_loss:.4f}, Acc: {epoch_acc:.4f}, "
          f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}")
```

In [7]:
def train(train_loader, validation_loader):
    num_epochs = 10
    history = {"acc": [], "loss": [], "val_acc": [], "val_loss": []}

    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        correct_train = 0
        total_train = 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(
                1
            )  # Ensure labels are float and have shape (batch_size, 1)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * images.size(0)
            # For binary classification with BCEWithLogitsLoss, apply sigmoid to outputs before thresholding for accuracy
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_train += labels.size(0)
            correct_train += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = correct_train / total_train
        history["loss"].append(epoch_loss)
        history["acc"].append(epoch_acc)

        model.eval()
        val_running_loss = 0.0
        correct_val = 0
        total_val = 0
        with torch.no_grad():
            for images, labels in validation_loader:
                images, labels = images.to(device), labels.to(device)
                labels = labels.float().unsqueeze(1)

                outputs = model(images)
                loss = criterion(outputs, labels)

                val_running_loss += loss.item() * images.size(0)
                predicted = (torch.sigmoid(outputs) > 0.5).float()
                total_val += labels.size(0)
                correct_val += (predicted == labels).sum().item()

        val_epoch_loss = val_running_loss / len(validation_loader.dataset)
        val_epoch_acc = correct_val / total_val
        history["val_loss"].append(val_epoch_loss)
        history["val_acc"].append(val_epoch_acc)

        print(
            f"Epoch {epoch+1}/{num_epochs}, "
            f"Loss: {epoch_loss:.4f}, Acc: {epoch_acc:.4f}, "
            f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}"
        )

    return history


history = train(train_loader, validation_loader)

Epoch 1/10, Loss: 0.6462, Acc: 0.6362, Val Loss: 0.6032, Val Acc: 0.6517
Epoch 2/10, Loss: 0.5475, Acc: 0.7100, Val Loss: 0.7251, Val Acc: 0.6318
Epoch 3/10, Loss: 0.5533, Acc: 0.7250, Val Loss: 0.5991, Val Acc: 0.6716
Epoch 4/10, Loss: 0.4802, Acc: 0.7712, Val Loss: 0.6033, Val Acc: 0.6567
Epoch 5/10, Loss: 0.4334, Acc: 0.8025, Val Loss: 0.6196, Val Acc: 0.6766
Epoch 6/10, Loss: 0.3740, Acc: 0.8325, Val Loss: 0.7371, Val Acc: 0.6766
Epoch 7/10, Loss: 0.2721, Acc: 0.8838, Val Loss: 0.9223, Val Acc: 0.6418
Epoch 8/10, Loss: 0.2478, Acc: 0.9000, Val Loss: 0.7294, Val Acc: 0.7214
Epoch 9/10, Loss: 0.2075, Acc: 0.9200, Val Loss: 0.7523, Val Acc: 0.7015
Epoch 10/10, Loss: 0.1494, Acc: 0.9450, Val Loss: 0.7894, Val Acc: 0.7015


### Question 3

What is the median of training accuracy for all the epochs for this model?

* 0.05
* 0.12
* 0.40
* **0.84**

In [8]:
np.median(history["acc"])

np.float64(0.8175)

### Question 4

What is the standard deviation of training loss for all the epochs for this model?

* 0.007
* 0.078
* **0.171**
* 1.710

In [9]:
np.sqrt(np.var(history["loss"]))

np.float64(0.15896665288565942)
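
The two statistics for Questions 3 and 4 can be cross-checked from the printed training log alone. The sketch below copies the per-epoch values from the log above (rounded to four decimals, so the results only approximately match the notebook) and uses the standard library. `statistics.pstdev` is the population standard deviation, which corresponds to NumPy's default `np.var` with `ddof=0`:

```python
import statistics

# Training accuracy and loss per epoch, copied from the log of the first run
train_acc = [0.6362, 0.7100, 0.7250, 0.7712, 0.8025,
             0.8325, 0.8838, 0.9000, 0.9200, 0.9450]
train_loss = [0.6462, 0.5475, 0.5533, 0.4802, 0.4334,
              0.3740, 0.2721, 0.2478, 0.2075, 0.1494]

median_acc = statistics.median(train_acc)  # mean of the 5th and 6th sorted values
std_loss = statistics.pstdev(train_loss)   # population standard deviation

print(f"median training accuracy: {median_acc:.4f}")
print(f"std of training loss:     {std_loss:.4f}")
```

With ten epochs the median is the average of the two middle values, which is why it does not coincide with any single epoch's accuracy.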

### Data Augmentation

For the next two questions, we'll generate more data using data augmentations. 

Add the following augmentations to your training data generator:

```python
transforms.RandomRotation(50),
transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1)),
transforms.RandomHorizontalFlip(),
```

In [None]:
train_augmentations_transforms = transforms.Compose(
    [
        transforms.RandomRotation(50),
        transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1)),
        transforms.RandomHorizontalFlip(),
        transforms.Resize((200, 200)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        ),  # ImageNet normalization
    ]
)

batch_size = 20
# Define the datasets
train_dataset = datasets.ImageFolder(
    root="data/train", transform=train_augmentations_transforms
)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

history_2 = train(train_loader, validation_loader)

Epoch 1/10, Loss: 0.6794, Acc: 0.6700, Val Loss: 0.6157, Val Acc: 0.6866
Epoch 2/10, Loss: 0.5589, Acc: 0.6913, Val Loss: 0.5749, Val Acc: 0.6965
Epoch 3/10, Loss: 0.5626, Acc: 0.7100, Val Loss: 0.5839, Val Acc: 0.7065
Epoch 4/10, Loss: 0.5565, Acc: 0.7250, Val Loss: 0.5488, Val Acc: 0.7264
Epoch 5/10, Loss: 0.5167, Acc: 0.7400, Val Loss: 0.5238, Val Acc: 0.7313
Epoch 6/10, Loss: 0.5052, Acc: 0.7400, Val Loss: 0.5356, Val Acc: 0.7413
Epoch 7/10, Loss: 0.4838, Acc: 0.7688, Val Loss: 0.4948, Val Acc: 0.7811
Epoch 8/10, Loss: 0.4820, Acc: 0.7738, Val Loss: 0.4905, Val Acc: 0.7612
Epoch 9/10, Loss: 0.4722, Acc: 0.7625, Val Loss: 0.5517, Val Acc: 0.7413
Epoch 10/10, Loss: 0.4612, Acc: 0.7712, Val Loss: 0.5071, Val Acc: 0.7711


### Question 5 

Let's train our model for 10 more epochs using the same code as previously.

> **Note:** Make sure you don't re-create the model.
> We want to continue training the model we already started training.

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.008
* 0.08
* **0.88**
* 8.88

In [13]:
np.mean(history_2["val_loss"])

np.float64(0.542674267773901)


### Question 6

What's the average of test accuracy for the last 5 epochs (from 6 to 10)
for the model trained with augmentations?

* 0.08
* 0.28
* **0.68**
* 0.98

In [12]:
np.average(history_2["val_acc"][-5:])

np.float64(0.7592039800995024)

## Submit the results

* Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2025/homework/hw08
* If your answer doesn't match options exactly, select the closest one. If the answer is exactly in between two options, select the higher value.
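
The selection rule can be written down as a small helper. This is only a sketch of the rule as stated: `closest_option` is a hypothetical function, and the computed values are the ones reported in this notebook, rounded:

```python
def closest_option(value, options):
    """Return the option nearest to value; an exact tie goes to the higher option."""
    return min(options, key=lambda option: (abs(option - value), -option))


# (computed value, answer options) for Questions 3 to 6
questions = {
    "Q3": (0.8175, [0.05, 0.12, 0.40, 0.84]),
    "Q4": (0.1590, [0.007, 0.078, 0.171, 1.710]),
    "Q5": (0.5427, [0.008, 0.08, 0.88, 8.88]),
    "Q6": (0.7592, [0.08, 0.28, 0.68, 0.98]),
}
picked = {name: closest_option(value, options)
          for name, (value, options) in questions.items()}

for name, option in picked.items():
    print(name, option)

# A value exactly halfway between two options resolves to the higher one
print(closest_option(0.5, [0.25, 0.75]))
```

Sorting by the pair `(distance, -option)` is what implements the tie-break: among equally distant options, the larger one has the smaller second key.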