# ResNet18: Age-Gender Estimation Model

## Why Age and Gender Prediction Matters
Ever wondered how apps can guess your age or tailor content to you? That’s where age and gender prediction comes in! It’s used in:

- **Personalization:** Ads and content that actually match your demographic.  
- **Security:** Monitoring or authentication systems that need to know who’s who.  
- **Healthcare:** Studying age-related trends or spotting conditions early.  
- **Being bored:** If you’re someone like me who gets bored, why not dive into a fun project like this?

By teaching AI to estimate age and gender from images, we can make smarter, more human-aware systems.

## Dataset
For this project, we are using the **UTKFace** dataset. It’s a massive collection of faces from ages 0 to 116, along with gender labels. The dataset also includes ethnicity and timestamps, but we’ll stick to age and gender for simplicity.

## Image Preprocessing
Before feeding images into our model, we do some basic prep:

- **Resize:** Make sure all images are the same size.  
- **Normalize:** Scale pixel values so the model learns faster and better.  
- **Data Augmentation (optional):** Flip, rotate, or crop images to make the model more robust.

These steps ensure our model sees clean, consistent images and can actually learn meaningful features from them.

## Model
We’re using **ResNet18**, a type of CNN (Convolutional Neural Network) introduced by Microsoft in 2015. The “18” refers to the number of layers with trainable weights.

Explaining how Residual networks are helping us in **very deep networks** requires advanced mathematical knowledge . Here is a short and simple answer: 
Deep networks can run into **vanishing gradients**—gradients shrink as they move backward through layers, making learning slow or impossible. Residual connections in ResNet fix this, letting gradients **flow smoothly** and helping the network actually learn.
Think of it like this: the network wants to pass information backward through many layers, but without shortcuts, it gets tired and loses signal. Residual connections give it a “fast lane” to keep the learning alive.

## How It Works ?
1. The model takes a facial image as input.  
2. Convolutional layers extract features like eyes, nose, and mouth patterns.  
3. Residual connections make sure these features are propagated effectively.  
4. Fully connected layers finally predict **age** (as a number) and **gender** (as male/female).  

This simple pipeline allows us to go from raw images to meaningful predictions, all in one go!


In [9]:
# ---------- Imports ----------

# Standard libraries
import os
import re
import random
import math
from typing import Tuple

# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms, models
from torch.nn import L1Loss

# Image handling
from PIL import Image
import cv2

# Kaggle dataset helper
import kagglehub


In [None]:
# ---------- Downloading UTKFace Dataset ----------

# Download the dataset from Kaggle
path = kagglehub.dataset_download("jangedoo/utkface-new") # requires kagglehub.login() if you are not on Colab.

# Copy the folder with cropped images to current directory
# Note: using Python instead of shell for compatibility
import shutil
shutil.copytree(os.path.join(path, "crop_part1"), "crop_part1")

# crop_part1 contains cropped face images for better age/gender prediction


### Reproducibility & Device Setup
- Setting **seeds** ensures that the results are consistent every time you run the notebook.  
- Using **CUDA** if available, otherwise falling back to CPU, so training works on any machine.

It's better you have CUDA if your grandchild is not the one who wants to see the end of training process.

In [2]:
# ---------- Repro & Device ----------

# Reproducibility
random.seed(42)
torch.manual_seed(42)

# Device selection: GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)


Device: cuda


### Preparing the UTKFace Dataset 

UTKFace filenames look like this: `age_gender_race_date.jpg`  
(e.g., `25_0_3_20170116174525125.jpg`). From this, we can sneakily grab **age** and **gender** for our model.

Here’s what's going on:

1. **Custom Dataset Class**
    - `UTKFaceAgeGender` is our little helper that knows how to read images and labels.
    - It grabs images from the folder, checks filenames for the right format, and extracts **age** and **gender**.
    - When you ask for an item (`__getitem__`), it gives you:
        - The processed image  
        - `gender` as a float tensor (binary, 0 or 1)  
        - `age` as a float tensor (number, regression style)

2. **Transforms / Image Magic**
    - Training images get a little workout:  
        - Resize to `224x224` (ResNet loves this size)  
        - Random flips and rotations (It's a widely used technique in computer vision for better accuracy)  
        - Color tweaks (brightness & contrast)  
        - Normalization so the network doesn’t freak out
    - Validation images stay clean and neat—no tricks here.

3. **Train/Validation Split**
    - 80% training, 20% validation.  
    - Thanks to our earlier **seed magic**, the split is always the same.  
    - `DataLoader` batches images, shuffles training data, and uses multiple workers so our CPU/GPU doesn’t get bored.



In [3]:
# ---------- Extracting Images & Labels ----------

_fname_re = re.compile(r'^(\d+)_([01])_([0-4])_') 


class UTKFaceAgeGender(Dataset):
    def __init__(self, folder_path: str, transform: transforms.Compose):
        self.folder_path = folder_path
        files = [f for f in os.listdir(folder_path) if f.lower().endswith('.jpg')] # Taking only jpg files 
        self.image_files = [f for f in files if _fname_re.match(f)] # Keep only correctly UTKFace formatted files
        self.transform = transform

    def __len__(self):
        return len(self.image_files) # Just in case if you need count of images

    def _parse_labels(self, fname: str) -> Tuple[float, float]:
        # Extract age & gender
        # Regex guarantees format; still guard:
        parts = fname.split('_')
        age = float(parts[0])
        gender = float(parts[1])  # 0 or 1
        return age, gender

    def __getitem__(self, idx: int):
        fname = self.image_files[idx]
        path = os.path.join(self.folder_path, fname)
        img = Image.open(path).convert('RGB')

        age, gender = self._parse_labels(fname)
        img = self.transform(img)

        # Return as shape (1,) float tensors
        age_t = torch.tensor([age], dtype=torch.float32)
        gender_t = torch.tensor([gender], dtype=torch.float32)   # for BCEWithLogits

        return img, gender_t, age_t
    
# ---------- Transforms ----------

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
val_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# --------- Creating train and validation data ----------

data_dir = "crop_part1"  # <- set to your folder containing UTKFace images
full_train = UTKFaceAgeGender(data_dir, transform=train_tf)
n_total = len(full_train)
n_val = int(0.2 * n_total)
n_train = n_total - n_val
train_ds, val_ds = random_split(full_train, [n_train, n_val])
# ensure val has deterministic transforms
val_ds.dataset.transform = val_tf

train_loader = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=2, pin_memory=True)
val_loader   = DataLoader(val_ds, batch_size=64, shuffle=False, num_workers=2, pin_memory=True)

### Creating the Age & Gender Model

So what’s `self.backbone` anyway?  

- ResNet18 was originally trained on **ImageNet**, a huge dataset with **1000 classes** like cats, dogs, cars… you name it.  
- But we don’t care about cats or dogs here—we only want **age** and **gender**.  
- `self.backbone` is basically the ResNet18 network **without its final classification layer**, so we can repurpose it (call “personalize” it) for our task.  


In [4]:
# ------- Creating Model -------

class AgeGenderNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Load pre-trained ResNet18 (ImageNet weights)
        base = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        
        # Keep all layers except the final fully-connected layer which I just talked about .
        self.backbone = nn.Sequential(*list(base.children())[:-1])
        
        # Add custom heads for our two tasks
        self.gender_head = nn.Linear(512, 1)  # Binary classification
        self.age_head = nn.Linear(512, 1)     # Regression

    def forward(self, x):
        x = self.backbone(x)
        x = x.view(x.size(0), -1)  # Flatten feature map
        return self.gender_head(x), self.age_head(x)

# -------- Defining Model & Moving It To GPU/CPU --------

model = AgeGenderNet().to(device)

# Freeze backbone so we don't retrain the whole ResNet
for p in model.backbone.parameters():
    p.requires_grad = False  # Only train the heads (This will be changed later , I'll explain down below)

# -------- Loss Functions & Optimizer --------

loss_gender = nn.BCEWithLogitsLoss()  # Binary classification for gender.
loss_age = nn.SmoothL1Loss()          # Regression for age (SmoothL1 is less sensitive to outliers than MSE, but smoother than MAE)

opt = optim.Adam([
    {"params": model.gender_head.parameters()},
    {"params": model.age_head.parameters()}
], lr=1e-3)  # Train both heads using Adam optimizer (lr is learning-rate and 1e-3  if you don't know , it's 0.001 which is the default rate for Adam)


### Training the Model (No Harry Potter Here)

So what’s happening in this loop? Don’t worry, no magic—just some good old **training tricks** to make sure our model doesn’t overfit or freak out.  

1. **Patience & Early Stopping**
    - We keep track of `total_loss` (age loss + gender loss) each epoch.  
    - If the validation loss doesn’t improve for `patience` epochs, we stop training early.  
    - This prevents wasting time and overfitting our model to the training set.

2. **Freezing & Unfreezing the Backbone**
    - At first, we **freeze the ResNet18 backbone** because our new heads (age & gender) aren’t trained yet.  
    - Pretrained backbone features (from ImageNet) give a good starting point for learning.  
    - After a few epochs (5 here), we **unfreeze the backbone** for fine-tuning, letting the whole network adjust and improve performance.

3. **Saving the Best Model**
    - We save the model whenever validation loss improves so we don’t lose the best version.  

This is an example AI used:
- First, they copy the teacher’s work (frozen backbone) to get the basics.  
- Later, they start experimenting on their own (unfrozen backbone) to get even better.


In [8]:
# ------ Training Loop (Fixed Shapes & Fun Notes) ------


mae_fn = L1Loss()
best_val_loss = float("inf")  # Just making sure the first epoch be saved .
patience, no_improve = 5, 0

for epoch in range(1, 31):
    # ---- Train ----
    model.train()
    total_train_loss = 0
    total_train_mae = 0
    for imgs, genders, ages in train_loader:
        imgs, genders, ages = imgs.to(device), genders.to(device), ages.to(device)
        genders = genders.view(-1, 1)
        ages = ages.view(-1, 1)

        opt.zero_grad()

        g_out, a_out = model(imgs)

        loss = loss_gender(g_out, genders) + loss_age(a_out, ages)
        loss.backward()
        opt.step()

        total_train_loss += loss.item() * imgs.size(0)
        total_train_mae += mae_fn(a_out, ages).item() * imgs.size(0)

    # ---- Validate ----
    model.eval()
    total_val_loss = 0
    total_val_mae = 0
    with torch.no_grad():
        for imgs, genders, ages in val_loader:
            imgs, genders, ages = imgs.to(device), genders.to(device), ages.to(device)
            genders = genders.view(-1, 1)
            ages = ages.view(-1, 1)

            g_out, a_out = model(imgs)
            loss = loss_gender(g_out, genders) + loss_age(a_out, ages)

            total_val_loss += loss.item() * imgs.size(0)
            total_val_mae += mae_fn(a_out, ages).item() * imgs.size(0)

    # Compute averages
    train_loss = total_train_loss / len(train_loader.dataset)
    val_loss = total_val_loss / len(val_loader.dataset)
    train_mae = total_train_mae / len(train_loader.dataset)
    val_mae = total_val_mae / len(val_loader.dataset)

    print(f"Epoch {epoch:02d} | "
          f"Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | "
          f"Age Train MAE: {train_mae:.4f} | Age Val MAE: {val_mae:.4f}")

    # Save best model
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_age_gender.pth")
        print("Best model saved")
        no_improve = 0
    else:
        no_improve += 1

    # Unfreeze backbone after a few epochs
    if epoch == 5:
        for p in model.backbone.parameters():
            p.requires_grad = True
        opt = optim.Adam(model.parameters(), lr=1e-4)
        print("Backbone unfrozen for fine-tuning")

    # Early stopping
    if no_improve >= patience:
        print("Early stopping activated")
        break



Epoch 01 | Train Loss: 13.3787 | Val Loss: 12.8048 | Age Train MAE: 13.4364 | Age Val MAE: 12.8543
Best model saved
Epoch 02 | Train Loss: 13.1423 | Val Loss: 12.5692 | Age Train MAE: 13.1980 | Age Val MAE: 12.6157
Best model saved
Epoch 03 | Train Loss: 12.8745 | Val Loss: 12.4837 | Age Train MAE: 12.9327 | Age Val MAE: 12.5314
Best model saved
Epoch 04 | Train Loss: 12.6879 | Val Loss: 12.4356 | Age Train MAE: 12.7552 | Age Val MAE: 12.4790
Best model saved
Epoch 05 | Train Loss: 12.6001 | Val Loss: 12.2701 | Age Train MAE: 12.6652 | Age Val MAE: 12.3193
Best model saved
Backbone unfrozen for fine-tuning
Epoch 06 | Train Loss: 8.2748 | Val Loss: 7.1728 | Age Train MAE: 8.3650 | Age Val MAE: 7.3020
Best model saved
Epoch 07 | Train Loss: 5.6151 | Val Loss: 5.6156 | Age Train MAE: 5.7871 | Age Val MAE: 5.7659
Best model saved
Epoch 08 | Train Loss: 4.4562 | Val Loss: 5.7351 | Age Train MAE: 4.6661 | Age Val MAE: 5.8910
Epoch 09 | Train Loss: 3.8225 | Val Loss: 6.0925 | Age Train MAE: 4

### Training Highlights

- **Total epochs:** 27  
- **Best validation loss:** 4.42  
- **Early stopping activated** at epoch 27  

### Key Observation: Epoch 5 → Epoch 6

- **Epoch 5:** Backbone frozen → small, gradual improvements  
- **Epoch 6:** Backbone unfrozen → major performance jump  
  - Train Loss dropped sharply  
  - Val Loss dropped sharply  
  - Age MAE improved by ~4 years  
- This marks the turning point where fine-tuning the pretrained backbone dramatically improves the model.


### Time For Testing The Model
Now that we trained our Model with a good MAE for age (about 4.5 years) , we are going to test it on test_image.jpg , it will be saved in test_image_output.jpg

#### X Important X
In tests, I used **haar** algorithm for detecting face before estimating . Since haar is not the best option for face detection , it might not perform very well . I suggest using other methods for detecting face such as **Dlib Face Detector** or **YOLO**


In [35]:
model.load_state_dict(torch.load("best_age_gender.pth", map_location=device))
model.eval()

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
frame = cv2.imread("test_image.jpg")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

for (x, y, w, h) in faces:
    face_img = frame[y:y+h, x:x+w]
    img_pil = Image.fromarray(cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB))
    img_t = val_tf(img_pil).unsqueeze(0).to(device)

    with torch.no_grad():
        g_logit, a_pred = model(img_t)
        gender_prob = torch.sigmoid(g_logit).item()
        gender_label = "Female" if gender_prob > 0.5 else "Male"
        age_val = float(a_pred.item())
        age_val = max(0.0, min(116.0, age_val))  # clamp

    # Scaling factors relative to face size
    thickness = max(1, w // 100)  # rectangle & text thickness
    font_scale = w / 200          # font size

    # Draw rectangle
    cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), thickness)

    # Prepare text
    text = f"{gender_label} ({(gender_prob*100) if gender_prob > 0.5 else (100 - gender_prob*100):.0f}%), Age: {age_val:.0f}"
    text_y = max(0, y - 10)  # keep text inside image
    cv2.putText(frame, text, (x, text_y),
                cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 255, 0), thickness)

cv2.imwrite("test_image_output.jpg", frame)

True