In [55]:
# import warnings
# warnings.filterwarnings('ignore')

# !wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
# !unzip data.zip

### Dataset

In this homework, we'll build a model for classifying various hair types.
For this, we will use the Hair Type dataset that was obtained from
[Kaggle](https://www.kaggle.com/datasets/kavyasreeb/hair-type-dataset)
and slightly rebuilt.

You can download the target dataset for this homework from
[here](https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip):

```bash
wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
unzip data.zip
```
The dataset is split into train and test sets. Use the train set to train the model and the test set for validation.

In the lectures we saw how to use a pre-trained neural network. In the homework, we'll train a much smaller model from scratch.

We will use PyTorch for that.

You can use Google Colab or your own computer for that.

### Data Preparation

The dataset contains around 1000 images of hair, stored in separate folders
for the training and test sets.

### Reproducibility

Reproducibility in deep learning is a multifaceted challenge that requires attention
to both software and hardware details. In some cases, we can't guarantee exactly the same results across runs of the same experiment.

Therefore, in this homework we suggest setting the random seeds as follows:

In [56]:
import numpy as np
import torch

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

In [57]:
from PIL import Image
from torchvision import transforms

### Model

For this homework we will use a Convolutional Neural Network (CNN), built with PyTorch.

You need to develop a model with the following structure:

* The shape for input should be `(3, 200, 200)` (channels first format in PyTorch)
* Next, create a convolutional layer (`nn.Conv2d`):
    * Use 32 filters (output channels)
    * Kernel size should be `(3, 3)` (that's the size of the filter), padding = 0, stride = 1
    * Use `'relu'` as activation
* Reduce the size of the feature map with max pooling (`nn.MaxPool2d`)
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using `flatten` or `view`
* Next, add a `nn.Linear` layer with 64 neurons and `'relu'` activation
* Finally, create the `nn.Linear` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As the optimizer, use `torch.optim.SGD` with the following parameters:

* `torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)`
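
The input size of the first `nn.Linear` layer follows from the layer list above. Here is a minimal sketch of that arithmetic, assuming the standard output-size formula for convolution and pooling without dilation. The helper names `conv2d_out` and `maxpool2d_out` are made up for illustration and are not part of PyTorch:

```python
def conv2d_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial size after a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size: int, kernel: int) -> int:
    """Spatial size after max pooling whose stride equals the kernel size."""
    return (size - kernel) // kernel + 1


side = conv2d_out(200, kernel=3)      # 200x200 input, 3x3 kernel, padding 0, stride 1
side = maxpool2d_out(side, kernel=2)  # 2x2 pooling halves each spatial dimension
flat_features = 32 * side * side      # 32 channels flattened into one vector

print(side, flat_features)  # 99 313632
```

The value `flat_features` is what the first fully connected layer needs as its number of input features.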

In [58]:
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=(3, 3), padding=0, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2))
        )
        # Size of the flattened layer, worked out by hand for a fixed 200x200 input
        # Input shape: (batch_size, 3, 200, 200)
        # After Conv2d: (batch_size, 32, 198, 198) -> ((200 - 3)/1 + 1 = 198)
        # After MaxPool2d: (batch_size, 32, 99, 99) -> (198 / 2 = 99)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 99 * 99, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
            # No nn.Sigmoid() here: BCEWithLogitsLoss expects raw logits
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Instantiate the model
model = SimpleCNN()

# Define the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)

criterion = nn.BCEWithLogitsLoss() # Correct loss function for binary classification with raw logits output

print("CNN model and SGD optimizer defined successfully!")

CNN model and SGD optimizer defined successfully!


### Question 1

Which loss function will you use?

* `nn.MSELoss()`
* `nn.BCEWithLogitsLoss()`
* `nn.CrossEntropyLoss()`
* `nn.CosineEmbeddingLoss()`

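(Multiple answers can be correct, so pick any)

#### Answer for question 1: `nn.BCEWithLogitsLoss()` and `nn.CrossEntropyLoss()`

This notebook uses `nn.BCEWithLogitsLoss()`, which fits the single-logit output defined above. `nn.CrossEntropyLoss()` would also work, but only with an output layer of two logits and integer class labels.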
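
To see why the model can output raw logits, here is a small numeric sketch in plain Python. It compares binary cross-entropy computed on `sigmoid(logit)` with a numerically stable form that works on the logit directly. The stable formula is a common textbook form, used here as an assumption about how such a loss can be computed; it is not a copy of PyTorch's implementation. All function names are illustrative:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce_on_probability(p: float, y: float) -> float:
    """Binary cross-entropy on a probability p in (0, 1)."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bce_on_logit(x: float, y: float) -> float:
    """Same loss computed from the raw logit x in a numerically stable way."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


# Hypothetical (logit, target) pairs
results = []
for logit, target in [(-2.0, 0.0), (0.3, 1.0), (4.0, 1.0), (1.5, 0.0)]:
    via_sigmoid = bce_on_probability(sigmoid(logit), target)
    via_logit = bce_on_logit(logit, target)
    results.append((via_sigmoid, via_logit))
    print(f"logit={logit:+.1f} target={target:.0f} "
          f"via_sigmoid={via_sigmoid:.6f} via_logit={via_logit:.6f}")
```

Both columns agree up to floating-point error. Thresholding `sigmoid(logit) > 0.5` is also the same decision as `logit > 0`, so the sigmoid is only needed when a probability is wanted.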


### Question 2

What's the total number of parameters of the model? You can use `torchsummary` or count manually.

In PyTorch, you can find the total number of parameters using:

```python
# Option 1: Using torchsummary (install with: pip install torchsummary)
from torchsummary import summary
summary(model, input_size=(3, 200, 200))

# Option 2: Manual counting
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")
```

* 896
* 11214912
* 15896912
* 20073473

In [59]:
from torchsummary import summary
import torch

# Check if CUDA is available and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the chosen device
model.to(device)

# Create a dummy input tensor and move it to the same device
dummy_input = torch.randn(1, 3, 200, 200).to(device)

# Now call summary with the model and input size
summary(model, input_size=(3, 200, 200))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 198, 198]             896
              ReLU-2         [-1, 32, 198, 198]               0
         MaxPool2d-3           [-1, 32, 99, 99]               0
           Flatten-4               [-1, 313632]               0
            Linear-5                   [-1, 64]      20,072,512
              ReLU-6                   [-1, 64]               0
            Linear-7                    [-1, 1]              65
Total params: 20,073,473
Trainable params: 20,073,473
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.46
Forward/backward pass size (MB): 23.93
Params size (MB): 76.57
Estimated Total Size (MB): 100.96
----------------------------------------------------------------


#### Answer for question 2: `20,073,473` params
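
The total reported by `torchsummary` can be cross-checked by hand. This is a short sketch of the manual count, assuming every layer has a bias term (the default for `nn.Conv2d` and `nn.Linear`); the variable names are illustrative:

```python
# Conv2d(3, 32, kernel_size=3): one 3x3x3 kernel per output channel, plus one bias each
conv_params = 3 * 3 * 3 * 32 + 32

# Linear(32 * 99 * 99, 64): weight matrix plus 64 biases
fc1_params = 32 * 99 * 99 * 64 + 64

# Linear(64, 1): 64 weights plus 1 bias
fc2_params = 64 * 1 + 1

total_params = conv_params + fc1_params + fc2_params
print(f"{conv_params:,} + {fc1_params:,} + {fc2_params:,} = {total_params:,}")
# 896 + 20,072,512 + 65 = 20,073,473
```

Almost all parameters sit in the first fully connected layer, because its input is the flattened `32 x 99 x 99` feature map.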

### Generators and Training

For the next two questions, use the following transformation for both train and test sets:

In [60]:
train_test_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ) # ImageNet normalization
])

In [61]:
import os
from torch.utils.data import Dataset

class HairStyleDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        self.data_dir = data_dir
        self.transform = transform
        self.image_paths = []
        self.labels = []
        self.classes = sorted(os.listdir(data_dir))
        self.class_to_idx = {cls: i for i, cls in enumerate(self.classes)}

        for label_name in self.classes:
            label_dir = os.path.join(data_dir, label_name)
            for img_name in os.listdir(label_dir):
                self.image_paths.append(os.path.join(label_dir, img_name))
                self.labels.append(self.class_to_idx[label_name])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label

In [62]:
from torch.utils.data import DataLoader

train_dataset = HairStyleDataset(
    data_dir='./data/train',
    transform=train_test_transforms
)

val_dataset = HairStyleDataset(
    data_dir='./data/test',
    transform=train_test_transforms
)

train_loader = DataLoader(train_dataset, batch_size=20, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=20, shuffle=False)

In [63]:
num_epochs = 10
history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.float().unsqueeze(1) # Ensure labels are float and have shape (batch_size, 1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        # For binary classification with BCEWithLogitsLoss, apply sigmoid to outputs before thresholding for accuracy
        predicted = (torch.sigmoid(outputs) > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_dataset)
    epoch_acc = correct_train / total_train
    history['loss'].append(epoch_loss)
    history['acc'].append(epoch_acc)

    model.eval()
    val_running_loss = 0.0
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * images.size(0)
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_epoch_loss = val_running_loss / len(val_dataset)
    val_epoch_acc = correct_val / total_val
    history['val_loss'].append(val_epoch_loss)
    history['val_acc'].append(val_epoch_acc)

    print(f"Epoch {epoch+1}/{num_epochs}, "
          f"Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc:.4f}, "
          f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}")

Epoch 1/10, Train Loss: 0.6362, Train Acc: 0.6479, Val Loss: 0.6907, Val Acc: 0.5871
Epoch 2/10, Train Loss: 0.5315, Train Acc: 0.7104, Val Loss: 0.6926, Val Acc: 0.6418
Epoch 3/10, Train Loss: 0.5417, Train Acc: 0.6954, Val Loss: 0.5767, Val Acc: 0.6567
Epoch 4/10, Train Loss: 0.4578, Train Acc: 0.7665, Val Loss: 0.6013, Val Acc: 0.6418
Epoch 5/10, Train Loss: 0.3836, Train Acc: 0.8065, Val Loss: 0.6440, Val Acc: 0.6915
Epoch 6/10, Train Loss: 0.3850, Train Acc: 0.8327, Val Loss: 0.6128, Val Acc: 0.6965
Epoch 7/10, Train Loss: 0.3126, Train Acc: 0.8589, Val Loss: 0.6721, Val Acc: 0.7512
Epoch 8/10, Train Loss: 0.2497, Train Acc: 0.8901, Val Loss: 0.6699, Val Acc: 0.7114
Epoch 9/10, Train Loss: 0.1494, Train Acc: 0.9488, Val Loss: 0.7413, Val Acc: 0.7313
Epoch 10/10, Train Loss: 0.1137, Train Acc: 0.9650, Val Loss: 1.0437, Val Acc: 0.6965
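
The training loop multiplies each batch loss by `images.size(0)` before summing, then divides by the dataset size. This gives a per-sample average even when the last batch is smaller than 20, which happens whenever the dataset size is not a multiple of the batch size. Here is a small sketch with made-up numbers (the losses and batch sizes are hypothetical, not taken from the run above):

```python
# Hypothetical mean loss of each batch and the number of images in it
batch_losses = [0.70, 0.60, 0.20]
batch_sizes = [20, 20, 1]

# What the training loop computes: sum of per-sample losses / number of samples
weighted_loss = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

# Averaging the batch means directly would over-weight the tiny last batch
naive_loss = sum(batch_losses) / len(batch_losses)

print(f"weighted: {weighted_loss:.4f}, naive: {naive_loss:.4f}")
```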


In [64]:
print(f'The median of training accuracy for all the epochs is: {np.median(history["acc"])}')
print(f'The standard deviation of training loss for all the epochs is: {np.std(history["loss"])}')

The median of training accuracy for all the epochs is: 0.8196004993757803
The standard deviation of training loss for all the epochs is: 0.16319399969325427


### Question 3

What is the median of training accuracy for all the epochs for this model?

* 0.05
* 0.12
* 0.40
* 0.84

#### Answer for question 3: `0.84` (closest to the computed median of 0.8196)

### Question 4

What is the standard deviation of training loss for all the epochs for this model?

* 0.007
* 0.078
* 0.171
* 1.710

#### Answer for question 4: `0.171` (closest to the computed standard deviation of 0.1632)
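
As a cross-check for questions 3 and 4, the two statistics can be recomputed from the per-epoch values printed in the first training log, using only the standard library. The values below are copied from that log and rounded to four decimals, so the results match the notebook output only approximately. `statistics.pstdev` is used because `np.std` computes the population standard deviation by default. The helper `closest_option` is made up for illustration:

```python
import statistics

# Per-epoch values copied from the first training log (rounded to 4 decimals)
train_acc = [0.6479, 0.7104, 0.6954, 0.7665, 0.8065,
             0.8327, 0.8589, 0.8901, 0.9488, 0.9650]
train_loss = [0.6362, 0.5315, 0.5417, 0.4578, 0.3836,
              0.3850, 0.3126, 0.2497, 0.1494, 0.1137]

median_acc = statistics.median(train_acc)
std_loss = statistics.pstdev(train_loss)  # population std, like np.std


def closest_option(value, options):
    """Return the answer option nearest to the computed value."""
    return min(options, key=lambda option: abs(option - value))


q3 = closest_option(median_acc, [0.05, 0.12, 0.40, 0.84])
q4 = closest_option(std_loss, [0.007, 0.078, 0.171, 1.710])
print(f"median acc = {median_acc:.4f} -> {q3}")
print(f"std loss   = {std_loss:.4f} -> {q4}")
```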

### Data Augmentation

For the next two questions, we'll generate more data using data augmentations.

Add the following augmentations to your training data generator:

```python
transforms.RandomRotation(50),
transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1)),
transforms.RandomHorizontalFlip(),
```

In [65]:
train_transforms_question_5_6 = transforms.Compose([
    transforms.RandomRotation(50),
    transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ) # ImageNet normalization
])
val_transforms_question_5_6 = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ) # ImageNet normalization
])

In [66]:
train_dataset = HairStyleDataset(
    data_dir='./data/train',
    transform=train_transforms_question_5_6
)

val_dataset = HairStyleDataset(
    data_dir='./data/test',
    transform=val_transforms_question_5_6
)

train_loader = DataLoader(train_dataset, batch_size=20, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=20, shuffle=False)

#### The training code below is the same as above, but it uses the `train_loader` and `val_loader` defined for questions 5 and 6.

In [67]:
num_epochs = 10
history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.float().unsqueeze(1) # Ensure labels are float and have shape (batch_size, 1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        # For binary classification with BCEWithLogitsLoss, apply sigmoid to outputs before thresholding for accuracy
        predicted = (torch.sigmoid(outputs) > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_dataset)
    epoch_acc = correct_train / total_train
    history['loss'].append(epoch_loss)
    history['acc'].append(epoch_acc)

    model.eval()
    val_running_loss = 0.0
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * images.size(0)
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_epoch_loss = val_running_loss / len(val_dataset)
    val_epoch_acc = correct_val / total_val
    history['val_loss'].append(val_epoch_loss)
    history['val_acc'].append(val_epoch_acc)

    print(f"Epoch {epoch+1}/{num_epochs}, "
          f"Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc:.4f}, "
          f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}")

Epoch 1/10, Train Loss: 0.6083, Train Acc: 0.6916, Val Loss: 0.9093, Val Acc: 0.6020
Epoch 2/10, Train Loss: 0.6201, Train Acc: 0.6792, Val Loss: 0.6027, Val Acc: 0.6766
Epoch 3/10, Train Loss: 0.5575, Train Acc: 0.7116, Val Loss: 0.5642, Val Acc: 0.7313
Epoch 4/10, Train Loss: 0.5489, Train Acc: 0.7179, Val Loss: 0.5944, Val Acc: 0.6866
Epoch 5/10, Train Loss: 0.5024, Train Acc: 0.7591, Val Loss: 0.6184, Val Acc: 0.6915
Epoch 6/10, Train Loss: 0.5098, Train Acc: 0.7428, Val Loss: 0.6406, Val Acc: 0.6716
Epoch 7/10, Train Loss: 0.5104, Train Acc: 0.7466, Val Loss: 0.5310, Val Acc: 0.7264
Epoch 8/10, Train Loss: 0.4803, Train Acc: 0.7715, Val Loss: 0.6278, Val Acc: 0.7264
Epoch 9/10, Train Loss: 0.4768, Train Acc: 0.7653, Val Loss: 0.5762, Val Acc: 0.7015
Epoch 10/10, Train Loss: 0.4714, Train Acc: 0.7778, Val Loss: 0.9102, Val Acc: 0.6517


In [68]:
print(f'The mean of test loss for all the epochs for the model trained with augmentations: {np.mean(history["val_loss"])}')
print(f'The average of test accuracy for the last 5 epochs (from 6 to 10) for the model trained with augmentations: {np.mean(history["val_acc"][5:10])}')

The mean of test loss for all the epochs for the model trained with augmentations: 0.6575058176188003
The average of test accuracy for the last 5 epochs (from 6 to 10) for the model trained with augmentations: 0.6955223880597015


### Question 5

Let's train our model for 10 more epochs using the same code as previously.

> **Note:** make sure you don't re-create the model.
> We want to continue training the model we already started training.

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.008
* 0.08
* 0.88
* 8.88

#### Answer for question 5: `0.88` (closest option to the computed mean of 0.6575)

### Question 6

What's the average of test accuracy for the last 5 epochs (from 6 to 10)
for the model trained with augmentations?

* 0.08
* 0.28
* 0.68
* 0.98

#### Answer for question 6: `0.68` (closest option to the computed average of 0.6955)
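
The slice `history["val_acc"][5:10]` selects epochs 6 to 10 because Python lists are zero-indexed and the end index is exclusive. Here is a short sketch of that indexing, using the validation values copied from the augmented training log (rounded to four decimals, so the means are approximate):

```python
import statistics

# Per-epoch validation values copied from the augmented training log
val_loss = [0.9093, 0.6027, 0.5642, 0.5944, 0.6184,
            0.6406, 0.5310, 0.6278, 0.5762, 0.9102]
val_acc = [0.6020, 0.6766, 0.7313, 0.6866, 0.6915,
           0.6716, 0.7264, 0.7264, 0.7015, 0.6517]

last_five = val_acc[5:10]          # indices 5..9 are epochs 6..10
assert last_five == val_acc[-5:]   # equivalent for a 10-epoch history

mean_val_loss = statistics.fmean(val_loss)
mean_last_five_acc = statistics.fmean(last_five)
print(f"mean val loss = {mean_val_loss:.4f}")
print(f"mean val acc, epochs 6-10 = {mean_last_five_acc:.4f}")
```

Writing `val_acc[-5:]` avoids hard-coding the indices if the number of epochs changes.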