# Week 8 Homework: Hair Type Classification with CNN

This homework focuses on building a Convolutional Neural Network (CNN) from scratch using PyTorch to classify hair types.


## Prerequisites

Tools:
- PyTorch 2.8.0
- torchvision
- PIL (Pillow)
- NumPy
- torchsummary (for model summary)


In [2]:
# Install required packages (if needed)
# !pip install torch torchvision pillow numpy torchsummary


## Dataset

In this homework, we'll build a model for classifying various hair types using the Hair Type dataset from [Kaggle](https://www.kaggle.com/datasets/kavyasreeb/hair-type-dataset).

The dataset contains around 1,000 hair images, split into separate folders for the training and test sets.


In [3]:
# Dataset download URL
DATASET_URL = 'https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip'
DATASET_FILENAME = 'data.zip'

# Download the dataset
import os
import urllib.request
import zipfile

if not os.path.exists(DATASET_FILENAME):
    print(f"Downloading dataset from {DATASET_URL}...")
    urllib.request.urlretrieve(DATASET_URL, DATASET_FILENAME)
    print("Download complete!")
else:
    print("Dataset already downloaded.")

# Unzip the dataset
if not os.path.exists('data'):
    print("Extracting dataset...")
    with zipfile.ZipFile(DATASET_FILENAME, 'r') as zip_ref:
        zip_ref.extractall('.')
    print("Extraction complete!")
else:
    print("Dataset already extracted.")


Dataset already downloaded.
Dataset already extracted.


## Reproducibility

Reproducibility in deep learning requires seeding every random number generator involved (NumPy, PyTorch, and CUDA) so that results stay as consistent as possible across runs.


In [4]:
import numpy as np
import torch

# Reproducibility constants
RANDOM_SEED = 42

def set_random_seeds(seed=RANDOM_SEED):
    """
    Sets random seeds for reproducibility across NumPy and PyTorch.
    
    Args:
        seed (int): Random seed value. Default is 42.
    """
    np.random.seed(seed)
    torch.manual_seed(seed)
    
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_random_seeds(RANDOM_SEED)
print(f"Random seeds set to {RANDOM_SEED} for reproducibility")


Random seeds set to 42 for reproducibility


In [5]:
import torch
print(torch.__version__)

2.9.0+cu126


## Model Architecture

We'll build a Convolutional Neural Network (CNN) with the following structure:

1. Input: `(3, 200, 200)` - RGB images (channels first format)
2. Convolutional layer: 32 filters, kernel size (3, 3), ReLU activation
3. Max pooling: Pool size (2, 2)
4. Flatten: Convert multi-dimensional tensor to 1D vector using `torch.flatten`
5. Dense layer: 64 neurons with ReLU activation
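6. Output layer: 1 neuron for binary classification. The model returns a raw logit; the sigmoid is applied inside `BCEWithLogitsLoss` during training and via `torch.sigmoid` when computing predictions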


In [6]:
import torch.nn as nn

# Model architecture constants
INPUT_CHANNELS = 3
INPUT_HEIGHT = 200
INPUT_WIDTH = 200
CONV_OUTPUT_CHANNELS = 32
CONV_KERNEL_SIZE = 3
CONV_PADDING = 0  # No padding
CONV_STRIDE = 1  # Default stride
POOL_SIZE = 2
POOL_STRIDE = 2  # Default stride for pooling
DENSE_NEURONS = 64
OUTPUT_NEURONS = 1

class HairTypeClassifier(nn.Module):
    """
    CNN model for binary hair type classification.
    
    Architecture:
    - Conv2d: 32 filters, 3x3 kernel, ReLU
    - MaxPool2d: 2x2 pooling
    - Linear: 64 neurons, ReLU
    - Linear: 1 neuron (logit output for binary classification)
    
    Args:
        input_channels (int): Number of input channels. Default is 3 (RGB).
        input_height (int): Input image height. Default is 200.
        input_width (int): Input image width. Default is 200.
        conv_output_channels (int): Number of output channels from conv layer. Default is 32.
        conv_kernel_size (int): Size of convolutional kernel. Default is 3.
        conv_padding (int): Padding for convolutional layer. Default is 0.
        conv_stride (int): Stride for convolutional layer. Default is 1.
        pool_size (int): Size of max pooling. Default is 2.
        pool_stride (int): Stride for max pooling. Default is 2.
        dense_neurons (int): Number of neurons in dense layer. Default is 64.
        output_neurons (int): Number of output neurons. Default is 1.
    
    Returns:
        torch.Tensor: Output logits of shape (batch_size, 1) for binary classification.
    """
    def __init__(self, input_channels=INPUT_CHANNELS, input_height=INPUT_HEIGHT, 
                 input_width=INPUT_WIDTH, conv_output_channels=CONV_OUTPUT_CHANNELS,
                 conv_kernel_size=CONV_KERNEL_SIZE, conv_padding=CONV_PADDING,
                 conv_stride=CONV_STRIDE, pool_size=POOL_SIZE, pool_stride=POOL_STRIDE,
                 dense_neurons=DENSE_NEURONS, output_neurons=OUTPUT_NEURONS):
        super().__init__()
        
        # Convolutional layer (no padding)
        self.conv = nn.Conv2d(
            in_channels=input_channels,
            out_channels=conv_output_channels,
            kernel_size=conv_kernel_size,
            padding=conv_padding,
            stride=conv_stride
        )
        self.relu1 = nn.ReLU()
        
        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size=pool_size, stride=pool_stride)
        
        # Calculate flattened size after conv and pooling
        # Formula: output_size = (input_size + 2*padding - kernel_size) // stride + 1
        # After conv (3x3, padding=0, stride=1): (200, 200) -> (198, 198)
        # After pool (2x2, stride=2): (198, 198) -> (99, 99)
        conv_output_height = (input_height + 2 * conv_padding - conv_kernel_size) // conv_stride + 1
        conv_output_width = (input_width + 2 * conv_padding - conv_kernel_size) // conv_stride + 1
        pooled_height = (conv_output_height - pool_size) // pool_stride + 1
        pooled_width = (conv_output_width - pool_size) // pool_stride + 1
        flattened_size = conv_output_channels * pooled_height * pooled_width
        
        # Dense layers
        self.fc1 = nn.Linear(flattened_size, dense_neurons)
        self.relu2 = nn.ReLU()
        self.fc2 = nn.Linear(dense_neurons, output_neurons)
    
    def forward(self, x):
        """
        Forward pass through the network.
        
        Args:
            x: Input tensor of shape (batch_size, 3, 200, 200).
        
        Returns:
            torch.Tensor: Output logits of shape (batch_size, 1).
        """
        # Convolutional block
        x = self.conv(x)
        x = self.relu1(x)
        x = self.pool(x)
        
        # Flatten (start_dim=1 to keep batch dimension)
        x = torch.flatten(x, start_dim=1)
        
        # Dense layers
        x = self.fc1(x)
        x = self.relu2(x)
        x = self.fc2(x)
        
        return x


NOTE:

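* **Convolution Output (Height/Width):** $$\text{Output} = \left\lfloor \frac{\text{Input} - \text{Kernel} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1$$
	*(Note: If $\text{Padding} = 1$, Kernel = 3, and Stride = 1, the output size equals the input size.)*

* **Max Pooling Output:** $$\text{Output} = \left\lfloor \frac{\text{Input} - \text{Pool Size}}{\text{Stride}} \right\rfloor + 1$$
	*(Note: When Stride equals Pool Size, as here, this reduces to $\lfloor \text{Input} / \text{Pool Size} \rfloor$.)*

* **Flattened Size:** $$\text{Flat} = \text{Channels} \times \text{Height} \times \text{Width}$$

* **Dense Layer:** The output size equals the number of neurons. The parameter count is $$\text{Parameters} = \text{Input Vector Size} \times \text{Number of Neurons} + \text{Number of Neurons}$$ (weights plus one bias per neuron).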

* **Sigmoid Activation:** $$\sigma(x) = \frac{1}{1 + e^{-x}}$$
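
The formulas above can be checked with a few lines of plain Python. This is a minimal sketch using the layer sizes from this notebook (200x200 RGB input, 32 filters of 3x3, 2x2 pooling, 64 dense neurons); the helper name `conv_output_size` is made up for illustration and is not part of PyTorch.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# Layer sizes taken from the architecture described in this notebook
height = 200
conv_h = conv_output_size(height, kernel=3)             # after the 3x3 convolution
pool_h = conv_output_size(conv_h, kernel=2, stride=2)   # after the 2x2 max pooling
flat = 32 * pool_h * pool_h                             # channels * height * width

# Parameter counts: weights plus biases for each layer
conv_params = (3 * 3 * 3 + 1) * 32   # (in_channels * 3 * 3 weights + 1 bias) per filter
fc1_params = flat * 64 + 64
fc2_params = 64 * 1 + 1
total = conv_params + fc1_params + fc2_params

print(conv_h, pool_h, flat)
print(conv_params, fc1_params, fc2_params, total)
```

The resulting flattened size and total parameter count should agree with the `in_features` value and the `Total params` line reported for the model in the cells that follow.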


In [7]:
# Create model instance
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model = HairTypeClassifier()
model.to(device)
print("Model created successfully!")


Using device: cuda
Model created successfully!


In [8]:
# Display model architecture
print("Model Architecture:")
print(model)


Model Architecture:
HairTypeClassifier(
  (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  (relu1): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=313632, out_features=64, bias=True)
  (relu2): ReLU()
  (fc2): Linear(in_features=64, out_features=1, bias=True)
)


In [9]:
# Using torchsummary for detailed model summary
try:
    from torchsummary import summary
    summary(model, input_size=(INPUT_CHANNELS, INPUT_HEIGHT, INPUT_WIDTH))
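except ImportError:
    print("torchsummary not installed. Install with: pip install torchsummary")
    print("Using manual parameter count instead.")
    # Fall back to counting parameters directly from the model
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Total params: {total_params:,}")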


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 198, 198]             896
              ReLU-2         [-1, 32, 198, 198]               0
         MaxPool2d-3           [-1, 32, 99, 99]               0
            Linear-4                   [-1, 64]      20,072,512
              ReLU-5                   [-1, 64]               0
            Linear-6                    [-1, 1]              65
Total params: 20,073,473
Trainable params: 20,073,473
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.46
Forward/backward pass size (MB): 21.54
Params size (MB): 76.57
Estimated Total Size (MB): 98.57
----------------------------------------------------------------


In [10]:
import torch.optim as optim

# Optimizer hyperparameters
LEARNING_RATE = 0.002
MOMENTUM = 0.8

optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)
print(f"Optimizer: SGD with lr={LEARNING_RATE}, momentum={MOMENTUM}")


Optimizer: SGD with lr=0.002, momentum=0.8


In [11]:
# Loss function for binary classification
# Use BCEWithLogitsLoss to combine a stable sigmoid + BCE in one op
criterion = nn.BCEWithLogitsLoss()
print("Loss function: BCEWithLogitsLoss (logits + sigmoid internally)")


Loss function: BCEWithLogitsLoss (logits + sigmoid internally)
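
To illustrate why the loss is computed from logits rather than from an explicit sigmoid output, the sketch below compares a naive binary cross-entropy with a commonly used numerically stable rewrite, `max(x, 0) - x*y + log(1 + exp(-|x|))`. It is written in plain Python for clarity and is not PyTorch's actual implementation; the function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(logit, target):
    """Binary cross-entropy computed from an explicit sigmoid probability."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def bce_with_logits(logit, target):
    """Stable form of the same loss, computed directly from the logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits both versions agree
cases = [(0.0, 1.0), (2.0, 1.0), (-1.5, 0.0), (3.0, 0.0)]
for logit, target in cases:
    print(logit, target, bce_naive(logit, target), bce_with_logits(logit, target))

# For an extreme logit the stable form still returns a finite value,
# whereas the naive form would need log(1 - p) with p rounded to 1.0
large = bce_with_logits(100.0, 0.0)
print(large)
```

This is also why the model's `forward` returns raw logits and `torch.sigmoid` is applied only when thresholding predictions.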


In [12]:
import os
print("Contents of current directory:")
print(os.listdir('.'))

Contents of current directory:
['.config', 'data.zip', 'data', 'sample_data']


In [13]:
print("\nContents of data folder:")
if os.path.exists('data'):
    print(os.listdir('data'))


Contents of data folder:
['test', 'train']


In [14]:
import os
from torch.utils.data import Dataset
from PIL import Image
from torchvision import transforms

class HairTypeDataset(Dataset):
    """
    Custom PyTorch Dataset for loading hair type images from directory structure.
    
    Args:
        data_dir (str): Path to root directory containing class subdirectories.
        transform (callable, optional): Transform pipeline to apply to images.
    
    Returns:
        tuple: (transformed_image, label) when indexed.
    
    Raises:
        FileNotFoundError: If data_dir does not exist.
    """
    def __init__(self, data_dir, transform=None):
        self.data_dir = data_dir
        self.transform = transform
        self.image_paths = []
        self.labels = []
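        # Keep only subdirectories so stray files (e.g. .DS_Store) are not treated as classes
        self.classes = sorted(
            entry for entry in os.listdir(data_dir)
            if os.path.isdir(os.path.join(data_dir, entry))
        )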
        self.class_to_idx = {cls: idx for idx, cls in enumerate(self.classes)}

        for class_name in self.classes:
            class_directory = os.path.join(data_dir, class_name)
            for image_filename in os.listdir(class_directory):
                image_path = os.path.join(class_directory, image_filename)
                self.image_paths.append(image_path)
                self.labels.append(self.class_to_idx[class_name])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        image = Image.open(image_path).convert('RGB')
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label


In [15]:
# Image preprocessing constants
IMAGE_SIZE = 200
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def create_transforms(image_size=IMAGE_SIZE):
    """
    Creates transform pipeline for image preprocessing.
    
    Args:
        image_size (int): Target size for resizing. Default is 200.
    
    Returns:
        transforms.Compose: Transform pipeline with resize, ToTensor, and normalization.
    """
    return transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)
    ])

# Create transforms for train and test
train_transforms = create_transforms(IMAGE_SIZE)
test_transforms = create_transforms(IMAGE_SIZE)
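
As a rough illustration of what the last two steps of this pipeline do to a single pixel, the sketch below mimics `ToTensor` (which scales 8-bit values to the range [0, 1]) followed by `Normalize` (which subtracts the per-channel mean and divides by the per-channel standard deviation). The helper `normalize_pixel` is hypothetical and works on one RGB triple rather than a full image tensor.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

black = normalize_pixel([0, 0, 0])
white = normalize_pixel([255, 255, 255])
print(black)
print(white)
```

After normalization the inputs are no longer confined to [0, 1]: black pixels map to negative values and white pixels to values above 2.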


In [16]:
from torch.utils.data import DataLoader

# Dataset paths
TRAIN_DATA_DIR = './data/train'
TEST_DATA_DIR = './data/test'

# DataLoader hyperparameters
BATCH_SIZE = 20
SHUFFLE_TRAIN = True
SHUFFLE_TEST = False

# Create datasets
train_dataset = HairTypeDataset(
    data_dir=TRAIN_DATA_DIR,
    transform=train_transforms
)

test_dataset = HairTypeDataset(
    data_dir=TEST_DATA_DIR,
    transform=test_transforms
)

# Create data loaders
train_loader = DataLoader(
    train_dataset, 
    batch_size=BATCH_SIZE, 
    shuffle=SHUFFLE_TRAIN
)

test_loader = DataLoader(
    test_dataset, 
    batch_size=BATCH_SIZE, 
    shuffle=SHUFFLE_TEST
)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Number of classes: {len(train_dataset.classes)}")
print(f"Classes: {train_dataset.classes}")


Training samples: 801
Test samples: 201
Number of classes: 2
Classes: ['curly', 'straight']


In [17]:
# Training constants
SIGMOID_THRESHOLD = 0.5
LABEL_UNSQUEEZE_DIM = 1

def train_one_epoch(model, train_loader, optimizer, criterion, device):
    """
    Trains the model for one epoch.
    
    Args:
        model: PyTorch model to train.
        train_loader: DataLoader for training data.
        optimizer: Optimizer for updating model parameters.
        criterion: Loss function.
        device: Device to run training on (cuda/cpu).
    
    Returns:
        tuple: (average_loss, accuracy) for the epoch.
    """
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        # Ensure labels are float and have shape (batch_size, 1)
        labels = labels.float().unsqueeze(LABEL_UNSQUEEZE_DIM)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        # Apply sigmoid and threshold for binary classification
        predicted = (torch.sigmoid(outputs) > SIGMOID_THRESHOLD).float()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    average_loss = running_loss / len(train_loader.dataset)
    accuracy = correct / total
    return average_loss, accuracy

def evaluate_model(model, data_loader, criterion, device):
    """
    Evaluates the model on data.
    
    Args:
        model: PyTorch model to evaluate.
        data_loader: DataLoader for evaluation data.
        criterion: Loss function.
        device: Device to run evaluation on (cuda/cpu).
    
    Returns:
        tuple: (average_loss, accuracy) for the dataset.
    """
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            # Convert labels to float and add dimension for binary classification
            labels = labels.float().unsqueeze(LABEL_UNSQUEEZE_DIM)

            outputs = model(images)
            loss = criterion(outputs, labels)

            running_loss += loss.item() * images.size(0)
            # Apply sigmoid and threshold for binary classification
            predicted = (torch.sigmoid(outputs) > SIGMOID_THRESHOLD).float()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    average_loss = running_loss / len(data_loader.dataset)
    accuracy = correct / total
    return average_loss, accuracy


def train_and_evaluate(model, optimizer, train_loader, val_loader, criterion, num_epochs, device, history=None):
    """
    Trains and evaluates a model for multiple epochs.
    
    Args:
        model: PyTorch model to train.
        optimizer: Optimizer for updating model parameters.
        train_loader: DataLoader for training data.
        val_loader: DataLoader for validation/test data.
        criterion: Loss function.
        num_epochs (int): Number of training epochs.
        device: Device to run training on (cuda/cpu).
        history (dict, optional): Dictionary to store training history. If None, creates new one.
    
    Returns:
        dict: Training history with 'acc', 'loss', 'val_acc', 'val_loss' keys.
    """
    if history is None:
        history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

    for epoch in range(num_epochs):
        train_loss, train_acc = train_one_epoch(
            model, train_loader, optimizer, criterion, device)
        val_loss, val_acc = evaluate_model(
            model, val_loader, criterion, device)

        # Store history
        history['loss'].append(train_loss)
        history['acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        print(f"Epoch {epoch+1}/{num_epochs}, "
              f"Loss: {train_loss:.4f}, Acc: {train_acc:.4f}, "
              f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

    return history

In [18]:
# Training hyperparameters
NUM_EPOCHS = 10

# Train and evaluate the model
history = train_and_evaluate(
    model, optimizer, train_loader, test_loader, 
    criterion, NUM_EPOCHS, device
)

print("\nTraining completed!")


Epoch 1/10, Loss: 0.6230, Acc: 0.6554, Val Loss: 0.7524, Val Acc: 0.6070
Epoch 2/10, Loss: 0.5648, Acc: 0.6879, Val Loss: 0.6136, Val Acc: 0.6567
Epoch 3/10, Loss: 0.5274, Acc: 0.7366, Val Loss: 0.6602, Val Acc: 0.6418
Epoch 4/10, Loss: 0.4862, Acc: 0.7478, Val Loss: 0.6801, Val Acc: 0.6169
Epoch 5/10, Loss: 0.4317, Acc: 0.8027, Val Loss: 0.6286, Val Acc: 0.6567
Epoch 6/10, Loss: 0.4073, Acc: 0.7990, Val Loss: 0.9441, Val Acc: 0.5871
Epoch 7/10, Loss: 0.5136, Acc: 0.7528, Val Loss: 0.6664, Val Acc: 0.6318
Epoch 8/10, Loss: 0.3495, Acc: 0.8702, Val Loss: 0.6588, Val Acc: 0.6866
Epoch 9/10, Loss: 0.2745, Acc: 0.8814, Val Loss: 2.2306, Val Acc: 0.5323
Epoch 10/10, Loss: 0.7116, Acc: 0.6891, Val Loss: 0.6192, Val Acc: 0.6716

Training completed!


In [19]:
import numpy as np

# Question 3: Median of training accuracy
train_acc_median = np.median(history['acc'])
train_acc_median

np.float64(0.7503121098626717)

In [20]:
# Question 4: Standard deviation of training loss
train_loss_std = np.std(history['loss'])
train_loss_std

np.float64(0.12278403513656203)
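
One detail worth keeping in mind for Question 4: `np.std` computes the population standard deviation by default (`ddof=0`), dividing by n rather than n - 1. The sketch below reproduces both statistics with Python's `statistics` module, using the per-epoch values copied from the training log above (rounded to four decimals, so the results match the NumPy outputs only approximately).

```python
import statistics

# Per-epoch training metrics copied from the log above (rounded to 4 decimals)
train_acc = [0.6554, 0.6879, 0.7366, 0.7478, 0.8027,
             0.7990, 0.7528, 0.8702, 0.8814, 0.6891]
train_loss = [0.6230, 0.5648, 0.5274, 0.4862, 0.4317,
              0.4073, 0.5136, 0.3495, 0.2745, 0.7116]

median_acc = statistics.median(train_acc)
population_std = statistics.pstdev(train_loss)  # divides by n, like np.std with ddof=0
sample_std = statistics.stdev(train_loss)       # divides by n - 1, like np.std(..., ddof=1)

print(median_acc)
print(population_std)
print(sample_std)
```

With only 10 epochs the two standard deviations differ noticeably, so the choice of `ddof` can matter when matching an answer option.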

## Data Augmentation

For the next two questions, we will train the same model for 10 additional epochs using data augmentation **only on the training data**. We will use the following augmentations, as specified in the homework:

- `transforms.RandomRotation(50)`
- `transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1))`
- `transforms.RandomHorizontalFlip()`

The test set will continue to use only resizing, tensor conversion, and normalization (no augmentation).


In [21]:
# Data augmentation hyperparameters
MAX_ROTATION_DEGREES = 50
RANDOM_CROP_SCALE = (0.9, 1.0)
RANDOM_CROP_RATIO = (0.9, 1.1)


def create_augmented_transforms(image_size=IMAGE_SIZE):
    """Create transform pipeline for image preprocessing with data augmentation.

    Args:
        image_size (int): Target size for resized/cropped images. Default is IMAGE_SIZE.

    Returns:
        transforms.Compose: Transform pipeline with rotation, random crop,
            horizontal flip, tensor conversion, and normalization.
    """
    return transforms.Compose([
        transforms.RandomRotation(MAX_ROTATION_DEGREES),
        transforms.RandomResizedCrop(
            image_size,
            scale=RANDOM_CROP_SCALE,
            ratio=RANDOM_CROP_RATIO,
        ),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])


# Replace training transforms with augmented version (test transforms stay the same)
train_transforms_augmented = create_augmented_transforms(IMAGE_SIZE)

train_dataset = HairTypeDataset(
    data_dir=TRAIN_DATA_DIR,
    transform=train_transforms_augmented,
)

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=SHUFFLE_TRAIN,
)

print("Training dataset with augmentation created.")
print(f"Training samples: {len(train_dataset)}")


Training dataset with augmentation created.
Training samples: 801


In [22]:
# Continue training with data augmentation (10 more epochs)
NUM_EPOCHS_AUGMENTED = 10

aug_history = train_and_evaluate(
    model,
    optimizer,
    train_loader,
    test_loader,
    criterion,
    NUM_EPOCHS_AUGMENTED,
    device,
)

print("\nAugmented training completed!")


Epoch 1/10, Loss: 0.6007, Acc: 0.6779, Val Loss: 0.6151, Val Acc: 0.6915
Epoch 2/10, Loss: 0.5683, Acc: 0.6754, Val Loss: 0.6007, Val Acc: 0.6915
Epoch 3/10, Loss: 0.5387, Acc: 0.7004, Val Loss: 0.7636, Val Acc: 0.6119
Epoch 4/10, Loss: 0.5489, Acc: 0.7079, Val Loss: 0.6943, Val Acc: 0.6716
Epoch 5/10, Loss: 0.5360, Acc: 0.7241, Val Loss: 0.5984, Val Acc: 0.7164
Epoch 6/10, Loss: 0.5356, Acc: 0.7191, Val Loss: 0.6415, Val Acc: 0.6716
Epoch 7/10, Loss: 0.5069, Acc: 0.7391, Val Loss: 0.7381, Val Acc: 0.6169
Epoch 8/10, Loss: 0.5385, Acc: 0.7191, Val Loss: 0.7462, Val Acc: 0.6318
Epoch 9/10, Loss: 0.4950, Acc: 0.7566, Val Loss: 0.6101, Val Acc: 0.6866
Epoch 10/10, Loss: 0.4933, Acc: 0.7516, Val Loss: 0.5665, Val Acc: 0.7164

Augmented training completed!


In [23]:
# Question 5: Mean of test loss (validation loss) for augmented training epochs
augmented_test_loss_mean = np.mean(aug_history['val_loss'])
augmented_test_loss_mean


np.float64(0.6574376635364632)

In [24]:
# Question 6: Average test accuracy for the last 5 augmented epochs (6 to 10)
LAST_EPOCHS_START_INDEX = 5  # zero-based index corresponding to epoch 6

test_accuracy_last_five_mean = np.mean(aug_history['val_acc'][LAST_EPOCHS_START_INDEX:])
test_accuracy_last_five_mean


np.float64(0.6646766169154229)