Age and Gender Prediction

In [None]:
# import libraries
import pandas as pd
import numpy as np  
import matplotlib.pyplot as plt
import seaborn as sns
import os
from pathlib import Path
from PIL import Image
import warnings
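from dotenv import load_dotenv
load_dotenv()  # load KAGGLE_USERNAME / KAGGLE_KEY from .env first: importing kaggle authenticates immediately
from kaggle.api.kaggle_api_extended import KaggleApi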
import cv2
from tqdm.notebook import tqdm

# import tensorflow and keras
import tensorflow as tf
import keras
from keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img, array_to_img
from sklearn.model_selection import train_test_split
from keras.initializers import random_uniform, glorot_uniform, constant, identity
from keras.layers import Dropout, Input, Add, Dense, Activation, Rescaling,\
    BatchNormalization, Flatten, Conv2D, MaxPooling2D, GlobalMaxPooling2D
from keras.models import Model, Sequential, load_model
print(f"TensorFlow {tf.__version__} | Keras {keras.__version__}")



### 1. Load Dataset

In [None]:
from dotenv import load_dotenv

# Import your custom data loading module
# (Make sure data_loader.py is in the same folder as this notebook)
from data_loader import get_unified_dataset


# --- 1. Load environment variables from the .env file ---
load_dotenv()

# --- 2. Download datasets (only if missing) ---
dataset_folder = './datasets'
datasets = [
    'jangedoo/utkface-new',
    'ttungl/adience-benchmark-gender-and-age-classification',
    'aiolapo/fgnet-dataset',
    'moritzm00/biometrically-filtered-famous-figure-dataset'
]

# Authenticate with Kaggle (the API needs both KAGGLE_USERNAME and KAGGLE_KEY)
if not (os.getenv("KAGGLE_USERNAME") and os.getenv("KAGGLE_KEY")):
    print("⚠️ Error: KAGGLE_USERNAME or KAGGLE_KEY not found. Please check your .env file.")
else:
    api = KaggleApi()
    api.authenticate()

    os.makedirs(dataset_folder, exist_ok=True)

    # Simple check: if the folder is empty or barely populated, download everything
    # (adjust this logic if you want to check each dataset individually)
    if len(os.listdir(dataset_folder)) < 10:  # fewer than 10 entries: assume datasets are missing
        print("📂 Downloading datasets from Kaggle... this may take a while.")
        for dataset in datasets:
            print(f"   --> Downloading {dataset}...")
            api.dataset_download_files(dataset, path=dataset_folder, unzip=True)
        print("✅ Download complete!")
    else:
        print("✅ Datasets already downloaded.")

# --- 3. Load & unify data ---
print("🔄 Processing and loading data...")
df = get_unified_dataset(dataset_folder)

# --- 4. Verify Data ---
print(f"\nTotal Images Loaded: {len(df)}")
print("Source Breakdown:")
print(df['source'].value_counts())

df.to_csv('datasets/df_first.csv', index=False)

# Show sample
df.head()

In [None]:
print("Files in ./datasets:", os.listdir('./datasets'))

### 2. Data Exploration

In [None]:
df = pd.read_csv('datasets/df_first.csv')

In [None]:
df.info()

In [None]:
# split into 2 dfs for age and gender
df_age = df[['image_path', 'age', 'source']]
df_age = df_age[df_age['age'] > 0]
df_age = df_age[df_age['age'] <= 100]

df_gender = df[['image_path', 'gender', 'source']]
df_gender = df_gender[df_gender['gender'].isin([0, 1])]

*Convert gender to categories (0 = Male, 1 = Female)*

Remember to comment out the labelling step after running it once so the column won't be converted again.

In [None]:
df['gender'] = df['gender'].astype('category')

df['gender'].value_counts()

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
sns.histplot(data=df_age, x="age", ax=ax)
plt.show()

The gender distribution is fairly balanced. The age distribution, however, is highly imbalanced and heavily skewed towards younger individuals: the histogram shows a pronounced peak in the 20-30 range, which holds the majority of the dataset. This imbalance is a critical factor that needs to be addressed before training the CNN model.

**Impact on the CNN Model:**

The model is highly likely to become biased towards the heavily represented age groups (e.g., 20s and 30s). This will lead to excellent prediction accuracy for these ages but poor generalization and significant performance degradation on underrepresented age groups, particularly for older individuals. The model will struggle to accurately predict the age of individuals in these categories, often defaulting to a more common age from the training set.

Hence, it is crucial to implement data balancing techniques such as oversampling the minority classes, undersampling the majority classes, or using weighted loss functions during model training.
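Of the options above, a weighted loss is the one that leaves the data itself untouched. A minimal sketch of inverse-frequency sample weights, assuming a pandas Series of integer ages; `inverse_frequency_weights` and the `max_weight` cap are illustrative choices, not part of this notebook:

```python
import numpy as np
import pandas as pd

def inverse_frequency_weights(ages: pd.Series, max_weight: float = 10.0) -> np.ndarray:
    """Per-sample weights proportional to 1 / count(age), normalised to mean 1, then capped."""
    counts = ages.map(ages.value_counts())        # how many samples share each sample's age
    weights = 1.0 / counts.to_numpy(dtype=float)  # rare ages get larger weights
    weights = weights / weights.mean()            # keep the average weight at 1
    return np.minimum(weights, max_weight)        # cap extreme weights for very rare ages

weights = inverse_frequency_weights(pd.Series([25, 25, 25, 80]))
print(weights)  # the single 80-year-old sample weighs 3x each 25-year-old sample
```

Keras `Model.fit` accepts such weights through its `sample_weight` argument for NumPy inputs, or as the third element of `(x, y, sample_weight)` tuples in a `tf.data` pipeline.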

In [None]:
sources = sorted(df['source'].dropna().unique())
fig, axes = plt.subplots(len(sources), 2, figsize=(12, 4 * len(sources)))
axes = np.atleast_2d(axes)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    warnings.simplefilter("ignore", category=UserWarning)
    for i, source in enumerate(sources):
        subset = df[df['source'] == source]

        sns.histplot(subset['age'], bins=40, kde=True, ax=axes[i, 0], color='royalblue')
        axes[i, 0].set_title(f"ages for {source.upper()} (Count: {len(subset)})", fontsize=12, fontweight='bold')
        axes[i, 0].set_xlabel("Age")
        axes[i, 0].set_ylabel("Number of Images")
        axes[i, 0].grid(axis='y', alpha=0.3)

        sns.countplot(x='gender', data=subset, ax=axes[i, 1], palette='pastel', order=[0, 1])
        axes[i, 1].set_title(f"gender for {source.upper()}", fontsize=12, fontweight='bold')
        axes[i, 1].set_xticklabels(['Male (0)', 'Female (1)'])
        axes[i, 1].set_xlabel("")
        axes[i, 1].set_ylabel("Count")
        for container in axes[i, 1].containers:
            axes[i, 1].bar_label(container)

plt.tight_layout()
plt.show()

In [None]:
# 1. Sample up to 10,000 images to check (much faster than checking all)
# To check every image, loop over df instead of sample_df
sample_df = df.sample(n=min(10000, len(df)), random_state=42)

widths = []
heights = []
ratios = []

print(f"Measuring {len(sample_df)} random images...")

for img_path in sample_df['image_path']:
    try:
        with Image.open(img_path) as img:
            w, h = img.size
            widths.append(w)
            heights.append(h)
            ratios.append(w / h)
    except Exception:
        # Skip missing or unreadable files
        continue

# --- Visualizing the Results ---
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Plot 1: Width vs Height
axes[0].scatter(widths, heights, alpha=0.5, color='purple')
axes[0].set_title("Image Dimensions (Width vs Height)")
axes[0].set_xlabel("Width (px)")
axes[0].set_ylabel("Height (px)")
axes[0].axvline(224, color='red', linestyle='--', label='Target Size (224)')
axes[0].axhline(224, color='red', linestyle='--')
axes[0].legend()

# Plot 2: Aspect Ratio Distribution
# Ratio = 1.0 is Square. >1 is Wide, <1 is Tall.
sns.histplot(ratios, bins=30, ax=axes[1], color='orange', kde=True)
axes[1].set_title("Aspect Ratio Distribution (Width / Height)")
axes[1].set_xlabel("Aspect Ratio (1.0 = Square)")
axes[1].axvline(1.0, color='black', linestyle='--', label='Perfect Square')
axes[1].legend()

plt.show()

# Summary Stats
print(f"Average Size: {int(np.mean(widths))}x{int(np.mean(heights))}")
print(f"Smallest Width: {np.min(widths)}")
print(f"Percentage of non-square images: {np.mean(np.array(ratios) != 1.0) * 100:.1f}%")

### 3. Data Preprocessing: Balancing Dataset
   - **Upsample** underrepresented age groups (rare ages with few samples).
   - **Downsample** overrepresented age groups (common ages with too many samples).
   - This helps prevent the model from overfitting to frequent ages and improves performance on rare ones.
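The two resampling steps above can be sketched together in pandas. This is an illustrative variant, assuming a DataFrame with an `age` column: here rare ages are upsampled by plain duplication (sampling with replacement), whereas the notebook creates augmented copies instead. `resample_per_age`, `min_count` and `max_count` are hypothetical names.

```python
import pandas as pd

def resample_per_age(df: pd.DataFrame, min_count: int, max_count: int, seed: int = 42) -> pd.DataFrame:
    """Clip every age group's size into [min_count, max_count]."""
    parts = []
    for _, group in df.groupby("age"):
        n = len(group)
        if n > max_count:
            # Downsample: keep a random subset without replacement
            parts.append(group.sample(n=max_count, random_state=seed))
        elif n < min_count:
            # Upsample: draw with replacement, so rows are duplicated
            parts.append(group.sample(n=min_count, replace=True, random_state=seed))
        else:
            parts.append(group)
    return pd.concat(parts, ignore_index=True)

toy = pd.DataFrame({"age": [20] * 50 + [90] * 2,
                    "image_path": [f"img_{i}.jpg" for i in range(52)]})
print(resample_per_age(toy, min_count=5, max_count=10)["age"].value_counts())
```

Duplicated rows add no new visual information, which is why augmenting the rare-age images is usually preferred over plain duplication.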

In [None]:
age_counts = df_age['age'].value_counts().sort_values()
for age in range(1, 101):  # df_age keeps ages 1-100
    if age not in age_counts.index:
        print(f"Missing age: {age}")
print(age_counts)

Ages 94, 97 and 98 have no images, so I will need to find images for them and then upsample about 5 more for each. After that, I downsample every age to at most 2,000 images to balance the age distribution.

In [None]:
# Ages with no images at all (e.g. 94, 97, 98) cannot be created by the augmentation loop below,
# because it only augments ages that already have at least one source image.
# If suitable images are found, they can be registered manually with the commented-out block below;
# the augmentation step will then pick them up like any other rare age.
# new_rows = pd.DataFrame({
#     'age': [94, 97, 98],
#     'image_path': [Path('UTKFace_augmented/age_94_lady.jpg'), Path('UTKFace_augmented/age_97_lady.jpg'), Path('UTKFace_augmented/age_97_lady.jpg')],
#     'source': ['augmented', 'augmented', 'augmented']
# })

# df_age = pd.concat([df_age, new_rows], ignore_index=True, sort=False)
# df_age.tail()

In [None]:
# Image Augmentation Strategy
datagen = ImageDataGenerator(
    rotation_range=20,  # Randomly rotate images by up to ±20 degrees
    width_shift_range=0.1,  # Randomly translate images horizontally (fraction of total width)
    height_shift_range=0.1,  # Randomly translate images vertically (fraction of total height)
    brightness_range=[0.8, 1.2],  # Randomly scale brightness between 80% and 120%
    shear_range=0.1,  # Shear angle in counter-clockwise direction in degrees
    zoom_range=0.1,  # Randomly zoom in or out by up to 10%
    horizontal_flip=True,  # Randomly flip images horizontally
    fill_mode='nearest'  # Fill pixels exposed by the transforms with the nearest edge pixel
)

In [None]:
# Create a folder for augmented images
augmented_path = Path("datasets/UTKFace_augmented")
augmented_path.mkdir(parents=True, exist_ok=True)

In [None]:
# Look for rare ages that have images to augment
age_counts = df_age['age'].value_counts().sort_values()
print("Ages with fewer than 10 images:")
rare_ages = age_counts[age_counts < 10].index.tolist()
print(rare_ages)

# value_counts() only lists ages that occur, so every rare age here has source images;
# ages with zero images (e.g. 94, 97, 98) never appear in this list
target_ages = [age for age in rare_ages if len(df_age[df_age['age'] == age]) > 0]
print(f"\nTarget ages for augmentation (with existing images): {target_ages}")

subset = df_age[df_age['age'].isin(target_ages)]
print(f"Found {len(subset)} images to augment")
print(subset.head())

In [None]:
new_rows = []  # collect augmented rows, then concatenate once at the end

for index, row in subset.iterrows():
    original_img_path = Path(f"{row.image_path}")

    # Extract just the filename without extension
    original_filename = original_img_path.stem  # e.g., "image" from "image.jpg"

    try:
        # Load as RGB so grayscale/RGBA files still give an (H, W, 3) array
        with Image.open(original_img_path) as img:
            img_array = np.array(img.convert("RGB"))
        img_array = np.expand_dims(img_array, axis=0)  # datagen.flow expects a batch: (1, H, W, 3)

        i = 0
        for batch in datagen.flow(img_array, batch_size=1):
            # scale=False keeps the 0-255 values (scale=True would min-max stretch and undo brightness changes)
            augmented_image = array_to_img(batch[0], scale=False)

            # Create filename with _augmented suffix
            augmented_filename = f"{original_filename}_{row['age']}_augmented_{i}.jpg"
            augmented_image_path = augmented_path / augmented_filename
            augmented_image.save(augmented_image_path)
            print(f"Saved augmented image: {augmented_image_path}")

            new_rows.append({
                'image_path': augmented_image_path,
                'age': row['age'],
                'source': 'augmented'
            })

            i += 1
            if i >= 2:  # 2 augmented copies per source image (flow() loops forever otherwise)
                break
    except Exception as e:
        print(f"Error processing {original_img_path}: {e}")
        continue

# Add all augmented rows to df_age in one step
# (re-running this cell appends them again, so run it only once)
if new_rows:
    df_age = pd.concat([df_age, pd.DataFrame(new_rows)], ignore_index=True)

In [None]:
df_age.tail(10)

#### Downsampling

In [None]:
# --- 1. Settings ---
MAX_IMAGES_PER_AGE = 2000

# --- 2. The Logic ---
processed_chunks = []
unique_ages = df_age['age'].unique()

print(f"🔄 Processing {len(unique_ages)} unique age groups...")

for age in unique_ages:
    # Get all rows for this specific age
    age_group = df_age[df_age['age'] == age]
    
    count = len(age_group)
    
    # Case A: Small group? Keep everything.
    if count <= MAX_IMAGES_PER_AGE:
        processed_chunks.append(age_group)
        
    # Case B: Too big? Downsample while keeping source ratios.
    else:
        # Calculate fraction to keep (e.g., 2000 / 4000 = 0.5)
        frac = MAX_IMAGES_PER_AGE / count
        
        # Sample the same fraction from each 'source' within this age,
        # so if we keep 50% we keep 50% of UTK, 50% of Adience, etc.
        sampled = age_group.groupby('source', group_keys=False).sample(
            frac=frac, random_state=42
        )
        processed_chunks.append(sampled)

# --- 3. Combine Back Together ---
# This creates a brand new clean dataframe
df_meow = pd.concat(processed_chunks, ignore_index=True)

print("✅ Downsampling complete!")
print(f"Original size: {len(df_age)}")
print(f"New size:      {len(df_meow)}")

# --- 4. Verify ---
print("\nTop 5 ages by count (Should be ~2000):")
print(df_meow['age'].value_counts().head())

In [None]:
sns.histplot(data = df_meow['age'], bins = 100, color='teal')

##### *Data is definitely more balanced than before.*

In [None]:
# 1. Check current type
is_path_type = df_meow['image_path'].apply(lambda x: isinstance(x, Path)).all()
print(f"Current Status: Are all image_paths 'Path' objects? {is_path_type}")

# 2. FIX IT: If they are strings, convert them back to Path objects
if not is_path_type:
    print("🔄 Converting strings to Path objects...")
    df_meow['image_path'] = df_meow['image_path'].apply(Path)
    
    # Re-check
    is_path_type_now = df_meow['image_path'].apply(lambda x: isinstance(x, Path)).all()
    print(f"✅ Fixed Status: Are all image_paths 'Path' objects? {is_path_type_now}")
else:
    print("✅ No changes needed.")

In [None]:
df_meow.info()

### 4. Split into response and predictor

##### Data Preparation (df_meow only)
I detect and crop faces with **YOLO-Face**, resize to **224x224**, normalize to **[0,1]**, create `df_meow["age_group"]`, and prepare train/validation/test data for both outputs.
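The age-group rule used below in `map_age_to_group` (0-12, 13-25, 26-42, 43-60, 61+) can also be applied to a whole column at once with `np.digitize`. A minimal sketch for non-negative ages; `AGE_GROUP_EDGES` and `map_ages_to_groups` are hypothetical names:

```python
import numpy as np

# Lower edges of groups 1-4; ages below 13 fall into group 0 (Child)
AGE_GROUP_EDGES = np.array([13, 26, 43, 61])  # hypothetical constant

def map_ages_to_groups(ages):
    """Vectorised age-group mapping: 0=Child, 1=Youth, 2=Adult, 3=Middle Age, 4=Senior."""
    ages = np.asarray(ages, dtype=int)  # truncates like int(age) in the per-row version
    return np.digitize(ages, AGE_GROUP_EDGES).astype("int32")

print(map_ages_to_groups([5, 18, 30, 50, 75]))
```

In the notebook this would read `df_meow["age_group"] = map_ages_to_groups(df_meow["age"])`.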

In [None]:
from ultralytics import YOLO
from huggingface_hub import hf_hub_download

# --- Constants ---
IMAGE_SIZE = (224, 224)
NUM_AGE_GROUPS = 5
BATCH_SIZE = 32
VAL_SPLIT = 0.15
RANDOM_STATE = 42
AUTOTUNE = tf.data.AUTOTUNE

# --- Setup Save Directory ---
# I save the cropped faces in this folder so YOLO doesn't have to be re-run later.
save_dir = Path("./processed_faces")
save_dir.mkdir(parents=True, exist_ok=True)

# --- Helper Functions ---

def map_age_to_group(age):
    # I map each raw age value to one of the 5 age-group classes.
    age = int(age)
    if 0 <= age <= 12:
        return 0  # Child
    if 13 <= age <= 25:
        return 1  # Youth
    if 26 <= age <= 42:
        return 2  # Adult
    if 43 <= age <= 60:
        return 3  # Middle Age
    return 4      # Senior (61+)

# --- Load YOLO Model ---
# I download YOLO-Face weights once from Hugging Face and load the detector.
yolo_face_weights = hf_hub_download(
    repo_id="Bingsu/adetailer",
    filename="face_yolov8n.pt"
)
yolo_face_model = YOLO(yolo_face_weights)

def crop_face_with_yolo(image_path: Path, target_size=IMAGE_SIZE, conf_thres: float = 0.3, margin_ratio: float = 0.15):
    # I read the image in BGR format because OpenCV loads BGR by default.
    image_bgr = cv2.imread(str(image_path))
    if image_bgr is None:
        return None

    # I run YOLO-Face inference and get the first result object.
    result = yolo_face_model.predict(
        source=image_bgr,
        conf=conf_thres,
        verbose=False
    )[0]

    # I safely handle the case where no face is detected.
    if result.boxes is None or len(result.boxes) == 0:
        return None

    # Logic: It assumes the largest face belongs to the main subject 
    # and ignores the smaller background faces.
    boxes_xyxy = result.boxes.xyxy.cpu().numpy()
    areas = (boxes_xyxy[:, 2] - boxes_xyxy[:, 0]) * (boxes_xyxy[:, 3] - boxes_xyxy[:, 1])
    x1, y1, x2, y2 = boxes_xyxy[np.argmax(areas)].astype(int)

    # Why: YOLO bounding boxes are usually very "tight" around the face.
    # Action: expand the box by margin_ratio (15% by default) of its width/height on each side, clamped to the image.
    h, w = image_bgr.shape[:2]
    bw, bh = (x2 - x1), (y2 - y1)
    mx, my = int(bw * margin_ratio), int(bh * margin_ratio)

    x1 = max(0, x1 - mx)
    y1 = max(0, y1 - my)
    x2 = min(w, x2 + mx)
    y2 = min(h, y2 + my)

    # Cropping the face region from the original BGR image.
    face_bgr = image_bgr[y1:y2, x1:x2]
    if face_bgr.size == 0:
        return None

    # I convert to RGB and resize to 224x224, keeping uint8 pixels (0-255) so the crop can be
    # saved with cv2.imwrite; the tf.data pipeline normalizes to [0, 1] when loading.
    face_rgb = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2RGB)
    face_rgb = cv2.resize(face_rgb, target_size, interpolation=cv2.INTER_AREA)
    return face_rgb
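The box handling in `crop_face_with_yolo` (pick the largest detection, pad it by `margin_ratio`, clamp to the image) can be checked in isolation with plain NumPy. A minimal sketch with made-up boxes; `pick_largest_box` and `expand_box` are hypothetical helpers that mirror the function's arithmetic:

```python
import numpy as np

def pick_largest_box(boxes_xyxy):
    """Return the (x1, y1, x2, y2) box with the largest area, truncated to ints."""
    boxes = np.asarray(boxes_xyxy, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return tuple(int(v) for v in boxes[np.argmax(areas)])

def expand_box(x1, y1, x2, y2, img_w, img_h, margin_ratio=0.15):
    """Pad the box by margin_ratio of its width/height on each side, clamped to the image."""
    mx, my = int((x2 - x1) * margin_ratio), int((y2 - y1) * margin_ratio)
    return max(0, x1 - mx), max(0, y1 - my), min(img_w, x2 + mx), min(img_h, y2 + my)

boxes = [[10, 10, 30, 30], [50, 40, 150, 160]]  # a small background face and the main face
box = pick_largest_box(boxes)                    # (50, 40, 150, 160)
print(expand_box(*box, img_w=200, img_h=200, margin_ratio=0.25))
```

Clamping matters for faces near the border: a box touching the top edge simply stops at row 0 instead of producing a negative slice index.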

In [None]:
from tqdm import tqdm

# --- Preparation ---
df_meow = df_meow.dropna(subset=["image_path", "age"]).copy()

# Age stays an integer here; it is cast to float32 only for the model targets below
df_meow["age_group"] = df_meow["age"].apply(map_age_to_group).astype("int32")

processed_metadata = []
skipped_count = 0

print(f"🚀 Starting Processing on {len(df_meow)} images...")

for row in tqdm(df_meow.itertuples(index=False), total=len(df_meow)):
    image_path = Path(str(row.image_path))
    
    if not image_path.exists():
        skipped_count += 1
        continue

    # Crop
    face = crop_face_with_yolo(image_path) # Returns uint8 (0-255)
    if face is None:
        skipped_count += 1
        continue

    # Save to disk
    # Convert RGB -> BGR for OpenCV
    face_bgr = cv2.cvtColor(face, cv2.COLOR_RGB2BGR)
    
    save_name = f"{int(row.age)}_{image_path.name}"
    save_path = save_dir / save_name
    cv2.imwrite(str(save_path), face_bgr)

    # Store Metadata
    processed_metadata.append({
        "path": str(save_path), # <--- This is the new cropped image path
        "age": row.age,          # <--- Kept as int/original
        "age_group": row.age_group
    })

if not processed_metadata:
    raise ValueError("No images processed!")

# Create clean DataFrame
df_processed = pd.DataFrame(processed_metadata)  # the new df with paths, age & age groups

# --- FINAL CONVERSION ---
# This is where we satisfy the model's need for floats
X = df_processed["path"].values
y_age = df_processed["age"].values.astype("float32") # <--- Convert ONLY here
y_age_group = df_processed["age_group"].values.astype("int32")

df_processed.to_csv("processed_metadata.csv", index=False)

print(f"✅ Processed {len(X)} images. Saved to disk.")
print(f"⚠️ Skipped {skipped_count}.")

>_Tip: the `processed_metadata.csv` saved above lets you skip the YOLO cropping step after a kernel restart or rerun._
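A minimal reload sketch, assuming `processed_metadata.csv` was written by the cell above with its `path`, `age` and `age_group` columns. The imports and constants from earlier cells (`IMAGE_SIZE`, `BATCH_SIZE`, `RANDOM_STATE`, `AUTOTUNE`) still need to be re-run before the split; `load_processed_metadata` is a hypothetical helper name.

```python
import pandas as pd

def load_processed_metadata(csv_path="processed_metadata.csv"):
    """Rebuild X, y_age and y_age_group from the saved metadata CSV."""
    df_processed = pd.read_csv(csv_path)
    X = df_processed["path"].values                                 # cropped-face paths
    y_age = df_processed["age"].values.astype("float32")            # regression target
    y_age_group = df_processed["age_group"].values.astype("int32")  # classification target
    return df_processed, X, y_age, y_age_group

# Usage after a restart:
# df_processed, X, y_age, y_age_group = load_processed_metadata()
```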

### Train-test split

In [None]:
# --- 1. Split Data (Train / Val / Test) ---

# I verify stratification requirements quickly to ensure fair age distribution.
stratify_vals = y_age_group if (pd.Series(y_age_group).value_counts().min() >= 2) else None

# First Split: Separate the Test Set (10% of total)
# This data is locked away and only used for the final exam.
X_temp, X_test, y_age_temp, y_age_test, y_group_temp, y_group_test = train_test_split(
    X, y_age, y_age_group,
    test_size=0.10, 
    random_state=RANDOM_STATE,
    stratify=stratify_vals
)

# Recalculate stratify values for the remaining 90%
stratify_vals_temp = y_group_temp if (pd.Series(y_group_temp).value_counts().min() >= 2) else None

# Second Split: Separate Train and Validation
# I use 0.1111 (approx 1/9) because 1/9th of the remaining 90% equals 10% of the total.
# Final Result: Train (80%), Val (10%), Test (10%)
X_train, X_val, y_age_train, y_age_val, y_group_train, y_group_val = train_test_split(
    X_temp, y_age_temp, y_group_temp,
    test_size=0.1111, 
    random_state=RANDOM_STATE,
    stratify=stratify_vals_temp
)

print("✅ Data Split Complete:")
print(f"Train: {len(X_train)} samples (80%)")
print(f"Val:   {len(X_val)} samples (10%)")
print(f"Test:  {len(X_test)} samples (10%)")

# --- 2. Dataset Pipeline ---

def load_and_process_image(path, targets):
    """Loads image from disk, decodes, resizes, and normalizes."""
    # I read the raw file from the disk
    img = tf.io.read_file(path)
    # I decode the compressed string into a tensor
    img = tf.io.decode_image(img, channels=3, expand_animations=False)
    # I resize it because the model expects a fixed 224x224 input
    img = tf.image.resize(img, IMAGE_SIZE)
    # I normalize pixels from 0-255 to 0-1 for faster convergence
    img = tf.cast(img, tf.float32) / 255.0
    return img, targets

def make_dataset(paths, age_targets, group_targets, training=False):
    # I create the base dataset from paths (Strings) and labels
    ds = tf.data.Dataset.from_tensor_slices((
        paths,
        { "age_group_output": group_targets, "age_output": age_targets }
    ))

    # ⚠️ CRITICAL STEP: Map paths -> Actual Images
    # num_parallel_calls=AUTOTUNE allows my CPU to load multiple images while the GPU trains
    ds = ds.map(load_and_process_image, num_parallel_calls=AUTOTUNE)

    # I only shuffle the training data. 
    # Validation and Test order should stay fixed for consistent evaluation.
    if training:
        ds = ds.shuffle(buffer_size=1000, seed=RANDOM_STATE)

    # I batch the images and prefetch the next batch to avoid bottlenecks
    ds = ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)
    return ds

# --- 3. Create Datasets ---
# I create optimized pipelines for all three sets.
# Note: training=True only for the training set!
train_ds = make_dataset(X_train, y_age_train, y_group_train, training=True)
val_ds   = make_dataset(X_val,   y_age_val,   y_group_val,   training=False)
test_ds  = make_dataset(X_test,  y_age_test,  y_group_test,  training=False)

#### 5. Build & Compile the Model - Architecture: EfficientNetB0 Backbone

Instead of building a custom CNN from scratch, I use **Transfer Learning** with a pre-trained EfficientNetB0 backbone.

##### **EfficientNetB0 (Pre-trained) → GlobalAveragePooling → Dense(Shared) → Split Heads**

1. **EfficientNetB0 Backbone (The "Eye"):**
   - **Why use it?** Unlike my custom CNN, which learns from random weights, this model has already been trained on **ImageNet** (the ~1.28-million-image ILSVRC-2012 classification set). It already knows how to detect edges, textures, and shapes.
   - **Benefit:** Strong feature extraction without needing millions of labelled face images.

2. **Freezing the Backbone (Stage 1 Strategy):**
   - I set `backbone.trainable = False`.
   - **Why?** The new layers at the end start with random weights. If I trained everything at once, the large, noisy gradients from these untrained layers would propagate back and degrade the pre-trained ImageNet features. Keeping the backbone frozen first lets the new layers learn to work with it.

3. **GlobalAveragePooling2D:**
   - Replaces `Flatten`. Instead of keeping every spatial location, it averages each feature map over its height and width, giving one value per channel.
   - **Benefit:** Drastically reduces the number of parameters, preventing overfitting and making the model smaller/faster.

4. **Shared Dense Layer (256 units + ReLU):**
   - Combines the high-level features extracted by EfficientNet into a vector representing the face.
   - **Dropout (Tunable):** Randomly turns off neurons to force the model to learn robust features, preventing it from relying on just one specific visual cue.

5. **Two-Head Output Strategy (Multi-Task Learning):**
   - **Head 1 (Classification):** `Softmax` output for Age Group. Acts as an "anchor" to guide the model towards the general correct range.
   - **Head 2 (Regression):** `Linear` output for Exact Age. Fine-tunes the prediction to get the specific number.
   - **Why?** Learning both tasks simultaneously can improve the shared representation (the shared Dense layer in Stage 1, plus the unfrozen backbone layers during fine-tuning), which often generalizes better than learning a single task.

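6. **Compile the model:**
   Compiling prepares the model for training by specifying:
   - **How to learn (optimizer):** Adam, a robust default that adapts the step size for each parameter.
   - **What to measure (metrics):** accuracy for the age-group head and MAE (in years) for the exact-age head.
   - **What to minimize (loss):** sparse categorical cross-entropy for the age-group head plus MAE for the exact-age head, weighted 0.5 and 1.0 respectively.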
In [None]:
from tensorflow.keras.applications import EfficientNetB0

def build_tunable_model(hp):
    # I define the hyperparameter search space.
    # 1. Dropout: Controls how much "memory" we delete to prevent overfitting.
    dropout_rate = hp.Float("dropout_rate", min_value=0.25, max_value=0.35, step=0.05)
    
    # 2. Learning Rate: Controls how fast the model learns. 
    # Too fast = unstable; Too slow = takes forever.
    lr = hp.Choice("learning_rate", values=[1e-3, 5e-4, 1e-4])

    # --- Input & Backbone ---
    # I define the input shape explicitly.
    inputs = Input(shape=(224, 224, 3), name="image_input")
    
    # EfficientNet expects [0, 255] pixels, but we normalized to [0, 1].
    # I rescale the inputs back to [0, 255] internally.
    x = Rescaling(255.0)(inputs)

    # I load the pre-trained EfficientNetB0.
    # pooling="avg" automatically applies GlobalAveragePooling2D.
    backbone = EfficientNetB0(
        include_top=False, 
        weights="imagenet", 
        input_shape=(224, 224, 3), 
        pooling="avg"
    )
    
    # I freeze the backbone so we don't destroy the pre-trained ImageNet weights 
    # while our new custom heads are initialized randomly.
    backbone.trainable = False
    
    # I extract the features using the frozen backbone.
    features = backbone(x, training=False)

    # --- Shared Layers ---
    # I combine features into a shared dense layer.
    shared = Dense(256, activation="relu", name="shared_dense")(features)
    shared = Dropout(dropout_rate, name="shared_dropout")(shared)

    # --- Output Heads ---
    # Head 1: Age Group (Helper Task)
    age_group_output = Dense(NUM_AGE_GROUPS, activation="softmax", name="age_group_output")(shared)
    
    # Head 2: Exact Age (Main Task)
    age_output = Dense(1, activation="linear", name="age_output")(shared)

    # --- Compile ---
    model = Model(inputs=inputs, outputs=[age_group_output, age_output])
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss={
            "age_group_output": "sparse_categorical_crossentropy",
            "age_output": "mean_absolute_error"
        },
        loss_weights={
            "age_group_output": 0.5, # Auxiliary weight (Guide)
            "age_output": 1.0        # Primary weight (Goal)
        },
        metrics={
            "age_group_output": "accuracy", 
            "age_output": "mae"
        }
    )
    return model

print("✅ Model Architecture defined successfully.")
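With `loss_weights`, the `loss` that Keras reports is 0.5 × (age-group cross-entropy) + 1.0 × (age MAE), since this model adds no regularization losses. A small NumPy sketch with made-up predictions; `combined_loss` is a hypothetical helper, and it ignores the small epsilon clipping Keras applies inside cross-entropy:

```python
import numpy as np

def combined_loss(group_probs, group_labels, age_true, age_pred, w_group=0.5, w_age=1.0):
    """Weighted sum of the two head losses, mirroring loss_weights in compile()."""
    group_probs = np.asarray(group_probs, dtype=float)
    # Sparse categorical cross-entropy: mean of -log(probability of the true class)
    p_true = group_probs[np.arange(len(group_labels)), group_labels]
    ce = -np.mean(np.log(p_true))
    # Mean absolute error of the exact-age head (in years)
    mae = np.mean(np.abs(np.asarray(age_true, dtype=float) - np.asarray(age_pred, dtype=float)))
    return w_group * ce + w_age * mae, ce, mae

probs = [[0.7, 0.1, 0.1, 0.05, 0.05],   # sample 1: true group 0
         [0.2, 0.6, 0.1, 0.05, 0.05]]   # sample 2: true group 1
total, ce, mae = combined_loss(probs, [0, 1], age_true=[10, 30], age_pred=[12, 27])
print(f"CE={ce:.4f}  MAE={mae:.2f}  total={total:.4f}")
```

Because MAE is measured in years while cross-entropy is usually below 1-2, the age head dominates the total even before weighting; the 0.5 weight keeps the age-group head as a guide rather than the main objective.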

#### 6. Hyperparameter Search (Tuning)

I use **Keras Tuner (Hyperband)** to automatically find the best configuration.

**What I am searching for:**
1.  **Dropout Rate:** (0.25, 0.30, 0.35) - To find the right amount of regularization.
2.  **Initial Learning Rate:** (0.001, 0.0005, 0.0001) - To find the best speed to start training.

**Callbacks used during search:**
- **EarlyStopping:** Defined to stop non-improving trials, but left out of the final search because `patience=2` can be too aggressive and cut promising trials short.
- **ReduceLROnPlateau:** If a trial stalls, it lowers the learning rate to squeeze out better performance, so every configuration gets a fair chance to reach its potential.
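With `patience=1`, every epoch without improvement in `val_age_output_mae` multiplies the learning rate by `factor=0.2`, never going below `min_lr=1e-6`. A small sketch of the resulting sequence, assuming every step is a plateau (`plateau_lr_sequence` is a hypothetical helper):

```python
def plateau_lr_sequence(initial_lr, factor=0.2, min_lr=1e-6, reductions=5):
    """Learning rates after successive plateaus: lr -> max(lr * factor, min_lr)."""
    lrs = [initial_lr]
    for _ in range(reductions):
        lrs.append(max(lrs[-1] * factor, min_lr))
    return lrs

for lr in (1e-3, 5e-4, 1e-4):  # the tuner's learning-rate choices
    print(lr, "->", [f"{x:.1e}" for x in plateau_lr_sequence(lr)])
```

A trial starting at 1e-4 reaches the 1e-6 floor after three consecutive plateaus, so within a 15-epoch budget the schedule can bottom out quickly.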

In [None]:
import keras_tuner as kt

# I initialize the Hyperband tuner.
tuner = kt.Hyperband(
    build_tunable_model,
    objective=kt.Objective("val_age_output_mae", direction="min"),
    max_epochs=15,
    factor=3,
    directory="efficientnet_tuning",
    project_name="age_tuning_final"
)

# --- Define Callbacks ---
# 1. Early stopping for trials that stop improving.
# Defined but not passed to tuner.search(): it can be too aggressive for tuning and may stop promising trials early.
stop_early = tf.keras.callbacks.EarlyStopping(
    monitor='val_age_output_mae', 
    patience=2, 
    restore_best_weights=True
)

# 2. Reduce Learning Rate if the model hits a plateau.
# This helps each trial converge better, making the comparison fairer.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_age_output_mae', 
    factor=0.2,    # Reduce LR to 20% of its value
    patience=1,    # Wait just 1 epoch (aggressive for tuning)
    mode = 'min',
    min_lr=1e-6
)

print("🚀 Starting Hyperparameter Search...")

# I run the search with only the reduce_lr callback (stop_early is intentionally omitted).
tuner.search(
    train_ds,
    validation_data=val_ds,
    epochs=15,
    callbacks=[reduce_lr]
)

# --- Retrieve Results ---
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"\n🏆 Best Hyperparameters Found:")
print(f"   - Dropout Rate: {best_hps.get('dropout_rate')}")
print(f"   - Initial Learning Rate: {best_hps.get('learning_rate')}")

In [None]:
best_trial = tuner.oracle.get_best_trials(num_trials=1)[0]
print(best_trial.metrics.get_history("val_age_output_mae"))

In [None]:
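# Re-running search() continues the existing project in efficientnet_tuning/age_tuning_final
# (overwrite defaults to False). Hyperband sets each trial's epoch budget itself (up to max_epochs=15).
tuner.search(
    train_ds,
    validation_data=val_ds,
    epochs=12,
    callbacks=[reduce_lr]
)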

In [None]:
# Show the top 20 trials
# This prints each trial's hyperparameters and validation score, best first.
print("\n📊 TOP 20 TRIALS SUMMARY:")
tuner.results_summary(num_trials=20)

In [None]:
# --- 1. Get the Best Trial ---
best_trial = tuner.oracle.get_best_trials(num_trials=1)[0]

# --- 2. Extract Validation Metrics ---
val_mae = best_trial.score  # This is the objective (val_age_output_mae)

# Safe retrieval (returns None if a metric name differs or was not recorded)
def safe_get(metric_name):
    try:
        return best_trial.metrics.get_last_value(metric_name)
    except Exception:
        return None

val_loss = safe_get("val_loss")
val_acc = safe_get("val_age_group_output_accuracy")
train_loss = safe_get("loss")
train_mae = safe_get("age_output_mae")
train_acc = safe_get("age_group_output_accuracy")

# --- 3. Print Report ---
print(f"🏆 BEST TRIAL ID: {best_trial.trial_id}")
print("-" * 40)

print(f"📉 Validation MAE (Objective): {val_mae:.4f}")

if val_loss is not None:
    print(f"📉 Validation Loss:            {val_loss:.4f}")

if val_acc is not None:
    print(f"✅ Validation Accuracy:        {val_acc:.4f}")

print("-" * 40)

if train_loss is not None:
    print(f"📉 Training Loss:              {train_loss:.4f}")

if train_mae is not None:
    print(f"📉 Training MAE:               {train_mae:.4f}")

if train_acc is not None:
    print(f"✅ Training Accuracy:          {train_acc:.4f}")

print("-" * 40)
print("Hyperparameters:")
print(best_trial.hyperparameters.values)


#### 7. Fine-Tuning (Stage 2: Unfreezing)

Now that the tuner has found the best hyperparameters (dropout rate and learning rate) and trained the new output heads, I unfreeze the **EfficientNet backbone**.

**Strategy:**
1.  **Unfreeze Top Layers:** I unfreeze the top ~40 layers of the backbone.
2.  **Low Learning Rate (1e-5):** I use a very small learning rate to gently adapt the pre-trained weights to face features without "forgetting" the general shapes learned from ImageNet.
3.  **Early Stopping:** I set a higher epoch limit (30) but rely on Early Stopping to halt training once the model stops improving.

This step transforms the model from a generic image classifier into a specialized **Face Age Predictor**.
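"Unfreezing the top 40 layers" means counting from the end of `backbone.layers`. A small sketch with stand-in objects shows which layers end up trainable. `FakeLayer` and `unfreeze_top` are hypothetical; real Keras layers expose the same `trainable` flag. One pitfall with a `layers[:-N]` slice: if `N` is 0, `layers[:-0]` equals `layers[:0]`, an empty list, so nothing gets frozen. Computing an explicit cutoff index avoids that.

```python
from dataclasses import dataclass

@dataclass
class FakeLayer:
    """Hypothetical stand-in for a Keras layer: only a name and a trainable flag."""
    name: str
    trainable: bool = True

def unfreeze_top(layers, n_top):
    """Make only the last n_top layers trainable and return their names."""
    cutoff = max(len(layers) - n_top, 0)
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff  # bottom layers frozen, top n_top layers trainable
    return [layer.name for layer in layers if layer.trainable]

layers = [FakeLayer(f"block_{i}") for i in range(6)]
print(unfreeze_top(layers, 2))  # the two layers closest to the output
```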

In [None]:
# --- Configuration ---
FINE_TUNE_EPOCHS = 30    # Upper limit with a safety buffer (EarlyStopping will usually cut it short)
FINE_TUNE_LR = 1e-5      # Ultra-low LR to prevent destroying pre-trained weights
UNFREEZE_TOP_LAYERS = 40 # Only let the top ~2 blocks of EfficientNet learn

# 1. Retrieve the Best Model from the Tuner (It's already trained on Stage 1)
# Note: The tuner returns a COMPILED model, but we must recompile it later anyway.
best_model = tuner.get_best_models(num_models=1)[0]

# 2. Locate the Backbone Layer
# I find the layer named "efficientnetb0" dynamically because its name might vary slightly.
backbone = next(layer for layer in best_model.layers if "efficientnet" in layer.name.lower())

# 3. Unfreeze the Backbone (Master Switch ON)
backbone.trainable = True

# 4. Freeze the Bottom Layers (Keep low-level features static)
# I lock the first ~200 layers so we only refine the high-level features at the top.
for layer in backbone.layers[:-UNFREEZE_TOP_LAYERS]:
    layer.trainable = False

print(f"🔓 Unfrozen top {UNFREEZE_TOP_LAYERS} layers. Ready for Fine-Tuning.")

# 5. Recompile (CRITICAL STEP)
# You MUST recompile to apply the "trainable=True" changes and set the new Low LR.
best_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=FINE_TUNE_LR),
    loss={
        "age_group_output": "sparse_categorical_crossentropy",
        "age_output": "mean_absolute_error"
    },
    loss_weights={
        "age_group_output": 0.5, # Helper task (Guide)
        "age_output": 1.0        # Main task (Goal)
    },
    metrics={"age_group_output": "accuracy", "age_output": "mae"}
)

# 6. Define Callbacks for Safety
callbacks_fine_tune = [
    # Stop if validation MAE doesn't improve for 4 epochs
    # This acts as the real "Epoch Limit"
    tf.keras.callbacks.EarlyStopping(
        monitor="val_age_output_mae", 
        mode="min", 
        patience=4, 
        restore_best_weights=True,
        verbose=1
    ),
    # If validation MAE plateaus for 2 epochs, halve the LR (never below 1e-7)
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_age_output_mae", 
        mode="min", 
        factor=0.5, 
        patience=2, 
        min_lr=1e-7, 
        verbose=1
    )
]
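The interaction between the two callbacks can be traced with a simplified simulation. This is a sketch, not Keras's implementation. `simulate_callbacks` is a hypothetical helper that treats any strict decrease as an improvement (Keras's `ReduceLROnPlateau` defaults to `min_delta=1e-4`) and ignores cooldown. The validation MAE sequence is invented.

```python
import math


def simulate_callbacks(val_mae, lr=1e-5, factor=0.5, lr_patience=2,
                       min_lr=1e-7, stop_patience=4):
    """Simplified model of EarlyStopping + ReduceLROnPlateau (mode="min").

    Assumptions: strict improvement (min_delta=0) for both callbacks and no
    cooldown. Returns (last_epoch_run, learning_rate_used_each_epoch).
    """
    best_stop = best_lr = math.inf
    wait_stop = wait_lr = 0
    lrs_used = []
    for epoch, mae in enumerate(val_mae):
        lrs_used.append(lr)  # LR in effect during this epoch
        # EarlyStopping counter
        if mae < best_stop:
            best_stop, wait_stop = mae, 0
        else:
            wait_stop += 1
        # ReduceLROnPlateau counter: multiply LR by `factor` after `lr_patience` flat epochs
        if mae < best_lr:
            best_lr, wait_lr = mae, 0
        else:
            wait_lr += 1
            if wait_lr >= lr_patience:
                lr = max(lr * factor, min_lr)
                wait_lr = 0
        if wait_stop >= stop_patience:
            return epoch, lrs_used
    return len(val_mae) - 1, lrs_used


# Hypothetical validation MAE curve: improves twice, then stalls
stopped_at, lrs = simulate_callbacks([9.0, 8.5, 8.6, 8.7, 8.8, 8.9, 8.4])
print(stopped_at, lrs)  # stops at epoch index 5; the LR was halved once while training

# Number of halvings needed to go from 1e-5 down to the 1e-7 floor
print(math.ceil(math.log2(1e-5 / 1e-7)))  # 7
```

Under these assumptions, one four-epoch plateau triggers at most two halvings, and the second one lands in the same epoch that stops training. Going from `1e-5` to the `1e-7` floor takes seven halvings, so it would need several separate plateaus.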

🟢 Age Prediction Metrics (regression)
| Metric             | Meaning                                                   |
| ------------------ | --------------------------------------------------------- |
| `age_output_loss`  | Mean Squared Error (how off the predictions are, squared); becomes Mean Absolute Error when the age head is compiled with `mean_absolute_error`, as in the fine-tuning cell above |
| `age_output_mae`   | Mean Absolute Error (average error in years)              |
| `val_age_output_loss`, `val_age_output_mae` | Same as above, but on the validation set (unseen data) |

🟢 Gender Prediction Metrics (classification)
| Metric                       | Meaning                                                      |
| ---------------------------- | ------------------------------------------------------------ |
| `gender_output_accuracy`     | % of genders predicted correctly on training data            |
| `val_gender_output_accuracy` | Same for validation (unseen) data                            |
| `gender_output_loss`         | How wrong the model was on predicting gender (cross-entropy) |
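A toy example shows the difference between the loss and MAE rows. The helpers below are hypothetical plain-Python versions of mean absolute error, mean squared error, and accuracy. The ages and labels are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same unit as the target (years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference, in years squared; large misses dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true, labels_pred):
    """Fraction of class labels predicted exactly."""
    return sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)


# Toy ages (years): errors of 2, 3, 5 and 10 years
ages_true = [20, 30, 40, 60]
ages_pred = [22, 27, 45, 50]
print(mean_absolute_error(ages_true, ages_pred))  # 5.0
print(mean_squared_error(ages_true, ages_pred))   # 34.5

# Toy gender labels (0/1)
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```

The single 10-year miss accounts for half of the MAE but 100/138 ≈ 72% of the MSE. This is why an MSE loss reacts much more strongly to outliers than MAE does.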


✅ What we can conclude:

1) Model is learning age and gender well
- MAE dropped from 7.8901 → 5.7301 on training
- Validation MAE dropped from 11.1618 → 8.4502

- Gender training accuracy went from 86.46% → 94.58%
- Validation accuracy stayed strong: ~86%–88%

2) **No severe overfitting**: training and validation accuracy/MAE improve together, although a gap remains (MAE 5.73 vs 8.45 years; gender accuracy 94.58% vs ~86–88%)


In [None]:
print("🚀 Starting Stage 2 Training (Fine-Tuning)...")

# 7. Train
history_fine = best_model.fit(
    train_ds,                  # Dataset handles inputs automatically
    validation_data=val_ds,    # Dataset handles validation
    epochs=FINE_TUNE_EPOCHS,
    callbacks=callbacks_fine_tune,
    verbose=1
)

### 8. Save the Final Model

In [None]:
best_model.save("age_prediction_model_final.h5")
print("Final Model saved successfully.")

#### 9. Model Evaluation & Performance Visualization

Now that training is complete, I evaluate the final model on the **Test Set**, which contains data it has never seen before.

I also plot the training curves for the fine-tuning stage to check for:
1. **Convergence:** Did the MAE actually go down?
2. **Overfitting:** Is there a huge gap between Training and Validation performance?

In [None]:
# --- 1. Evaluate on Test Set ---
# We use the test_ds we created earlier, which is optimized for CPU/GPU efficiency.
print("📊 Evaluating on Test Set...")

# return_dict=True returns {metric_name: value}, so the lookups below do not
# depend on the order of best_model.metrics_names
metrics_map = best_model.evaluate(test_ds, verbose=1, return_dict=True)

print("\n📈 Final Test Results:")
print(f"   - Age Mean Absolute Error (MAE): {metrics_map['age_output_mae']:.2f} years")
print(f"   - Age Group Accuracy: {metrics_map['age_group_output_accuracy']*100:.2f}%")

- Total combined loss: 98.73
- Age prediction loss (MSE): 97.55
- Gender prediction loss (cross-entropy): 0.8378
- Age prediction MAE (Mean Absolute Error): 7.55
- Gender prediction accuracy: 0.8784
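The age loss above is in squared years, so it is hard to compare directly with the MAE. Assuming the listed loss is the plain MSE averaged over the test set, its square root (RMSE) converts it back to years. RMSE is always at least as large as MAE. The further the ratio is above 1, the more uneven the individual errors are.

```python
import math

# Values copied from the test output listed above
age_mse = 97.55
age_mae = 7.55

age_rmse = math.sqrt(age_mse)  # back to years
print(f"RMSE = {age_rmse:.2f} years, MAE = {age_mae:.2f} years")
print(f"RMSE / MAE = {age_rmse / age_mae:.2f}")
```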

#### 10. Plot the Training History

#### Training and Validation Curves

In [None]:
import matplotlib.pyplot as plt

# We use 'history_fine' from our Stage 2 training
h = history_fine.history

plt.figure(figsize=(14, 5))

# --- Plot 1: Age MAE (Regression) ---
plt.subplot(1, 2, 1)
plt.plot(h['age_output_mae'], label='Train MAE', color='blue', linestyle='--')
plt.plot(h['val_age_output_mae'], label='Val MAE', color='blue', linewidth=2)
plt.title('Age Prediction: Mean Absolute Error')
plt.xlabel('Epoch')
plt.ylabel('MAE (Years)')
plt.legend()
plt.grid(True, alpha=0.3)

# --- Plot 2: Age Group Accuracy (Classification) ---
plt.subplot(1, 2, 2)
plt.plot(h['age_group_output_accuracy'], label='Train Acc', color='green', linestyle='--')
plt.plot(h['val_age_group_output_accuracy'], label='Val Acc', color='green', linewidth=2)
plt.title('Age Group: Classification Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
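Besides reading the curves by eye, a `History.history` dict like `h` above can be summarized numerically. The summary reports which epoch had the best validation value and how large the train/validation gap was at that point. `summarize_history` is a hypothetical helper and the toy numbers are invented. With the real data it would be called as `summarize_history(h, 'age_output_mae')`.

```python
def summarize_history(history, metric, mode="min"):
    """Summarize a Keras-style History.history dict for one metric.

    Returns the best validation epoch (1-based), the best validation value and
    the absolute train/validation gap at that epoch.
    """
    train = history[metric]
    val = history[f"val_{metric}"]
    pick = min if mode == "min" else max
    best_idx = pick(range(len(val)), key=val.__getitem__)
    gap = abs(val[best_idx] - train[best_idx])
    return best_idx + 1, val[best_idx], gap


# Hypothetical numbers, shaped like history_fine.history
toy_history = {
    "age_output_mae": [8.0, 7.0, 6.5, 6.2, 6.0],
    "val_age_output_mae": [10.0, 9.0, 8.0, 8.5, 8.8],
}
epoch, best_val, gap = summarize_history(toy_history, "age_output_mae")
print(f"best epoch {epoch}: val MAE {best_val:.2f}, gap {gap:.2f} years")
```

For accuracy curves, pass `mode="max"` so the best epoch is the one with the highest validation value.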

#### 1. Age Prediction (Mean Absolute Error)

🔵 Train MAE steadily decreased from ~7.8 to ~5.7 years, showing that my model is consistently learning from the training data.

🟠 Validation MAE is more unstable, with spikes (up to ~18) around epoch 6, before settling between 7–9 years.

My takeaway:

- The training curve looks healthy, but the validation curve shows signs of overfitting and instability.

- My model may be memorizing training examples too well while struggling to generalize to unseen data.

- To improve, I could try regularization (e.g., L2 weight decay) and more data augmentation.
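One way to add augmentation is with Keras preprocessing layers such as `RandomFlip` and `RandomRotation`. The NumPy-only sketch below shows the idea without TensorFlow: it randomly mirrors a face and shifts its brightness. `augment_face` is a hypothetical helper. It assumes images are float arrays in [0, 1] with shape (H, W, 3).

```python
import numpy as np


def augment_face(image, rng, max_brightness_delta=0.1):
    """Randomly flip a face left-right and jitter its brightness.

    `image` is assumed to be a float array of shape (H, W, 3) scaled to [0, 1].
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip keeps age/gender labels valid
    delta = rng.uniform(-max_brightness_delta, max_brightness_delta)
    return np.clip(out + delta, 0.0, 1.0)  # stay inside the [0, 1] pixel range


rng = np.random.default_rng(0)
face = rng.random((200, 200, 3))  # stand-in for one 200x200 RGB face
augmented = augment_face(face, rng)
print(augmented.shape, augmented.min() >= 0.0, augmented.max() <= 1.0)
```

Horizontal flips and mild brightness changes do not alter a person's apparent age or gender, so the labels can be reused unchanged.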

#### 2. Gender Prediction (Accuracy)

🔵 Train Accuracy improved smoothly, reaching ~94.5%.

🟠 Validation Accuracy fluctuated between 83–89%, without the same steady upward trend.

My takeaway:

The gap between training and validation accuracy suggests overfitting. My model performs very well on the training set but struggles to maintain stable accuracy on validation.

Possible improvements: Add dropout in convolutional layers (not just at the dense layer). Tune batch size or try early stopping with patience to prevent over-training.
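Keras's `Dropout` layer applies inverted dropout. During training it sets a `rate` fraction of inputs to 0 and scales the rest by `1 / (1 - rate)`, so the expected activation is unchanged. At inference it does nothing. The NumPy sketch below reproduces that behavior on a fake convolutional feature map, showing what adding dropout after a conv block would do. `inverted_dropout` is a hypothetical helper. For conv layers, Keras also offers `SpatialDropout2D`, which drops entire feature maps instead of single values.

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Zero a `rate` fraction of activations and rescale the rest by 1 / (1 - rate).

    At inference (training=False) the input passes through unchanged.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(42)
feature_map = np.ones((8, 50, 50, 32))  # stand-in for a batch of conv activations
dropped = inverted_dropout(feature_map, rate=0.25, rng=rng)
print(f"zeroed fraction: {np.mean(dropped == 0):.3f}")
print(f"mean activation: {dropped.mean():.3f}")  # stays close to 1.0, as before dropout
```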

#### ✅ Overall Reflection

Age prediction has promising training results but unstable validation MAE, pointing to generalization issues.

Gender prediction shows high training accuracy but fluctuating validation performance, another sign of overfitting.

I'm happy with the direction my model is going (it clearly learns useful patterns), but I recognize I need to work on improving generalization so that it performs more reliably on new data.

#### 11. Build a New CNN Model - Improve Generalization

##### To solve my overfitting issues:
1) Use a joint objective in Keras Tuner that targets validation metrics (validation age MAE and validation gender accuracy) instead of training metrics, since training metrics usually improve anyway.
2) Add L2 regularization to the convolutional layers and let the tuner pick the regularization strength together with the learning rate
3) Add EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint callbacks
4) Slightly increase the validation split
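Item 2 adds an L2 penalty to each convolution kernel. Keras documents `l2(l2)` as adding `l2 * sum(square(x))` to the loss. The sketch below computes that term by hand for a toy kernel. `l2_penalty` is a hypothetical helper and the weights are made up.

```python
def l2_penalty(weights, l2_factor):
    """Penalty added to the loss by an L2 kernel regularizer: factor * sum(w**2)."""
    return l2_factor * sum(w * w for w in weights)


kernel = [0.5, -1.0, 2.0]  # toy kernel weights; sum of squares = 5.25
for l2_factor in (1e-4, 5e-4):  # the two values the tuner searches below
    print(f"l2={l2_factor:g}: penalty={l2_penalty(kernel, l2_factor):.6f}")
```

The penalty's gradient with respect to each weight is `2 * l2_factor * w`, which pulls weights toward zero. With plain SGD this matches classic weight decay. With Adam it is not the same as decoupled weight decay (AdamW).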

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization, Activation
from tensorflow.keras.optimizers import Adam
import keras_tuner as kt
from tensorflow.keras.regularizers import l2

# add l2 regularization to conv layers(2)
def build_regularized_model(hp):
    input_layer = Input(shape=(200, 200, 3))
    
    l2_val = hp.Choice("l2_reg", [1e-4, 5e-4])
    x = Conv2D(32, (3,3), padding='same', kernel_regularizer=l2(l2_val))(input_layer)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(2,2)(x)
    
    x = Conv2D(64, (3,3), padding='same', kernel_regularizer=l2(l2_val))(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(2,2)(x)

    x = Conv2D(128, (3,3), padding='same', kernel_regularizer=l2(l2_val))(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(2,2)(x)
    
    x = Flatten()(x)
    x = Dropout(0.5)(x)
    age_output = Dense(1, name='age_output')(x)
    gender_output = Dense(2, activation='softmax', name='gender_output')(x)
    
    model = Model(inputs=input_layer, outputs=[age_output, gender_output])
    lr = hp.Choice('learning_rate', [0.0001, 0.0005, 0.001])
    
    model.compile(
        optimizer=Adam(learning_rate=lr),
        loss={
            'age_output': 'mse',
            'gender_output': 'categorical_crossentropy'
        },
        metrics={
            'age_output': 'mae',
            'gender_output': 'accuracy'
        }
    )
    return model

# Create a tuner with joint objectives (1)
tuner = kt.RandomSearch(
    build_regularized_model,
    objective= [
        kt.Objective('val_age_output_mae', direction='min'),
        kt.Objective('val_gender_output_accuracy', direction='max')
    ],
    max_trials=10,
    overwrite=True,
    directory='kt_search',
    project_name='age_gender_cnn'
)

tuner.search(X_train, {'age_output': y_age_train, 'gender_output': y_gender_train}, epochs=4, validation_split=0.2)

best_model = tuner.get_best_models(num_models=1)[0]
print('Best learning rate:', tuner.get_best_hyperparameters(1)[0].get('learning_rate'))
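For a list of objectives, the KerasTuner documentation states that the tuner minimizes the sum of the objectives to minimize, minus the sum of the objectives to maximize. The sketch below applies that rule, as an assumption, to two hypothetical trials. `multi_objective_score` is an illustrative helper, not a KerasTuner function, and the metric values are invented.

```python
def multi_objective_score(trial_metrics, objectives):
    """Combine several objectives into one number to minimize.

    Assumed to mirror KerasTuner's documented rule for a list of objectives:
    add the metrics to minimize, subtract the metrics to maximize.
    """
    score = 0.0
    for name, direction in objectives.items():
        score += trial_metrics[name] if direction == "min" else -trial_metrics[name]
    return score


objectives = {"val_age_output_mae": "min", "val_gender_output_accuracy": "max"}
# Two hypothetical trials
trials = {
    "A": {"val_age_output_mae": 7.2, "val_gender_output_accuracy": 0.86},
    "B": {"val_age_output_mae": 7.3, "val_gender_output_accuracy": 0.90},
}
scores = {name: multi_objective_score(m, objectives) for name, m in trials.items()}
print(scores, "best:", min(scores, key=scores.get))
```

MAE is measured in years while accuracy is a fraction between 0 and 1, so the MAE term dominates. Trial A wins even though its validation gender accuracy is 4 points lower, because 0.1 years of MAE outweighs 0.04 of accuracy. Rescaling one of the metrics would change that balance.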

#### 12. Building, Compiling & Training the 2nd Model

In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

# Add EarlyStopping (3)
# Stop training if validation loss doesn't improve for 3 consecutive epochs
early_stop = EarlyStopping(
    monitor='val_loss',      # or 'val_age_output_mae', etc.
    patience=3,              # number of epochs with no improvement after which training will be stopped
    restore_best_weights=True  # restore model weights from the epoch with the best value of the monitored quantity
)

# Add ReduceLROnPlateau (3)
# Reduce learning rate when a metric has stopped improving
lr_scheduler = ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6, verbose=1
)

# Save the best model during training (3)
checkpoint = ModelCheckpoint(
    "best_age_gender.keras",
    monitor="val_loss",
    save_best_only=True,
    verbose=1 # only log when a model is saved
)

history = best_model.fit(
    X_train, {'age_output': y_age_train, 'gender_output': y_gender_train},
    validation_split=0.2,
    epochs=16,
    batch_size=32,
    callbacks=[early_stop, lr_scheduler, checkpoint]
)

#### 🧠 Training Improvements with Callbacks and Regularization

1. Training Dynamics
- Before, the training loss decreased steadily, but the **validation loss was unstable** and often spiked.  
- With **EarlyStopping**, **ReduceLROnPlateau**, and **ModelCheckpoint**, the training process became **more controlled**:
  - EarlyStopping prevented overtraining once validation stopped improving.  
  - ReduceLROnPlateau automatically reduced the learning rate whenever the validation loss stopped improving.
  - ModelCheckpoint saved the best-performing model (based on validation loss), so I always kept the strongest version.

2. Age Prediction (MAE)
- Training MAE improved from **~8.5 → 5.8** by the end.  
- Validation MAE dropped and became more stable, hovering around **7.0–7.2** compared to ~9+ before.  
- This shows the model is learning age features better and generalizing more consistently on unseen data.

3. Gender Prediction (Accuracy)
- Training gender accuracy improved to **94%+**, higher than previous runs.  
- Validation gender accuracy consistently reached **~89–90%**, compared to ~86–87% earlier.  
- The validation curve is smoother, showing that regularization + callbacks reduced overfitting.

4. Loss Stability
- Validation loss used to fluctuate heavily (sometimes >150).  
- Now, it **dropped steadily to ~88–90**, with much smaller fluctuations.  
- This suggests that learning rate scheduling helped the optimizer settle into better minima.


#### 13. Evaluate on Test Set and Save Model

In [None]:
best_model.evaluate(X_test, {'age_output': y_age_test, 'gender_output': y_gender_test})

In [None]:
# Make sure the target folder exists before saving
os.makedirs("models", exist_ok=True)
best_model.save("models/age_gender_model_2.keras")

#### 14. Plot New Training vs. Validation Curves

In [None]:
# Plot Age MAE
plt.plot(history.history['age_output_mae'], label='Age MAE (Train)')
plt.plot(history.history['val_age_output_mae'], label='Age MAE (Val)')
plt.title('Age Prediction: Mean Absolute Error')
plt.xlabel('Epoch')
plt.ylabel('MAE')
plt.legend()
plt.grid(True)
plt.show()

# Plot Gender Accuracy
plt.plot(history.history['gender_output_accuracy'], label='Gender Acc (Train)')
plt.plot(history.history['val_gender_output_accuracy'], label='Gender Acc (Val)')
plt.title('Gender Prediction: Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()

#### 📊 Training Results (After Regularization + Callbacks)

1. Age Prediction (MAE)
- **Training MAE** decreased steadily from ~8.5 → 5.8, showing the model is learning age features effectively.  
- **Validation MAE** fluctuated at the start but stabilized around ~7.0–7.2, much lower than earlier runs (~9+).  

2. Gender Prediction (Accuracy)
- **Training accuracy** increased smoothly from ~85% → 94.5%.  
- **Validation accuracy** climbed from ~74% → ~90%, staying close to the training curve.   

3. Key Takeaways
- Both **age MAE** and **gender accuracy** improved compared to prior models. The smaller gap between the training and validation curves, together with smoother validation curves, indicates reduced overfitting and better generalization. With more stable validation performance, this model is more robust for deployment.


#### 15. Age Prediction Evaluation - Predicted vs Actual Scatter Plot

In [None]:
# Get age predictions (output 0 from the model)
y_pred_age = best_model.predict(X_test)[0].flatten()

# Plot predicted vs actual ages
plt.figure(figsize=(8, 6))
plt.scatter(y_age_test, y_pred_age, alpha=0.5)
plt.plot([0, 100], [0, 100], 'r--')  # ideal line
plt.xlabel('Actual Age')
plt.ylabel('Predicted Age')
plt.title('Predicted vs Actual Age')
plt.grid(True)
plt.show()


#### Insights:

- ✅ The model **successfully captures the age distribution trend** with relatively tight clustering along the ideal red line.
- ✅ A small mean absolute error (MAE) and this scatter pattern confirm that the model performs **reasonably well**.
- ⚠️ There are some outliers:
  - A few predictions below 0 (can be fixed by clamping or using a `ReLU` in the output layer)
  - More scatter and noise at age extremes (e.g. 60+), likely due to fewer samples in those ranges.

> 🎯 Overall, this visual confirms that my CNN is making **informed predictions** and not random guesses.
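The clamping fix and a check on the scatter at the age extremes can both be sketched in a few lines of NumPy. The sketch clamps negative age predictions to 0; a `ReLU` on the output layer would instead enforce this inside the model. It then breaks the MAE down by age bin to see whether the extremes really are worse. `clip_ages`, `mae_by_age_bin` and the bin edges are hypothetical choices, and the toy values are invented. In the notebook the inputs would be `y_age_test` and `y_pred_age`.

```python
import numpy as np


def clip_ages(pred, low=0.0, high=None):
    """Clamp regression outputs to a plausible age range (no upper bound by default)."""
    return np.clip(pred, low, high)


def mae_by_age_bin(y_true, y_pred, edges=(0, 20, 40, 60, 120)):
    """MAE and sample count for each [edges[i], edges[i+1]) age bin that has samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    report = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_true >= lo) & (y_true < hi)
        if mask.any():
            bin_mae = float(np.mean(np.abs(y_true[mask] - y_pred[mask])))
            report[f"{lo}-{hi}"] = (bin_mae, int(mask.sum()))
    return report


# Toy values
actual = [5, 25, 35, 70, 80]
predicted = clip_ages(np.array([-3.0, 27.0, 30.0, 60.0, 95.0]))
print(predicted)
print(mae_by_age_bin(actual, predicted))
```

The per-bin sample counts help separate "the model is worse for older faces" from "there are too few older faces to judge".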


#### 16. Gender Prediction Evaluation – Classification Report

In [None]:
from sklearn.metrics import classification_report
import numpy as np

# Get gender predictions (output 1 from the model)
y_pred_gender = best_model.predict(X_test)[1]

# Convert one-hot encoded gender back to label
y_pred_gender_labels = np.argmax(y_pred_gender, axis=1)
y_true_gender_labels = np.argmax(y_gender_test, axis=1)

# Generate classification report
print(classification_report(y_true_gender_labels, y_pred_gender_labels, target_names=['Male', 'Female']))

| Class   | Precision | Recall | F1-score | Support |
|---------|-----------|--------|----------|---------|
| Male    | 0.89      | 0.92   | 0.90     | 2320    |
| Female  | 0.91      | 0.88   | 0.89     | 2163    |

- **Precision:** out of all the times the model predicted a class, how often was it correct?
- **Recall:** out of all the true samples of a class, how many did the model correctly identify?
- **F1-score:** the harmonic mean of precision and recall.
- **Support:** the number of true test samples in that class.

#### Insights:

- ✅ The model achieves a strong **90% overall accuracy** in gender classification.
- ✅ It's especially strong at identifying **Male** samples (Recall: 92%), meaning it rarely misses when a sample is male.
- ✅ It is more precise when predicting **Female** (Precision: 91%), meaning when it says "Female", it's usually correct.

> 🎯 This confirms that my gender prediction performs well and makes reliable predictions for both classes with balanced precision and recall.
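The report's numbers can be checked for internal consistency by hand. F1 is the harmonic mean of precision and recall. Each class's correct predictions equal recall × support, so overall accuracy is the support-weighted sum of the recalls. The values below are copied from the table. Because they are rounded to two decimals, the recomputed figures are approximate.

```python
# Per-class values from the classification report above
report = {
    "Male": {"precision": 0.89, "recall": 0.92, "support": 2320},
    "Female": {"precision": 0.91, "recall": 0.88, "support": 2163},
}

# F1 is the harmonic mean of precision and recall
for name, r in report.items():
    f1 = 2 * r["precision"] * r["recall"] / (r["precision"] + r["recall"])
    print(f"{name}: F1 = {f1:.2f}")

# Correct predictions per class = recall * support, so
# overall accuracy = sum(recall * support) / total support
correct = sum(r["recall"] * r["support"] for r in report.values())
total = sum(r["support"] for r in report.values())
print(f"overall accuracy = {correct / total:.3f}")
```

The recomputed F1 scores (0.90 and 0.89) and overall accuracy (about 0.901) match the report, supporting the 90% figure above.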
