### **Project Objective**

The objective of this project is to design, train, and evaluate a **multi-task computer vision system** that can **simultaneously predict Age, Gender, and Race** from a single face image.

Rather than treating these as independent problems, the project formulates them as a **joint inference task**. A shared convolutional backbone is trained to learn a common facial representation, while task-specific heads specialize in predicting each attribute. This approach reflects real-world deployment scenarios, where multiple attributes must be inferred from the same input under strict fairness and generalization constraints.

The project aims to:

* Learn a **shared facial representation** that supports all three tasks without sacrificing individual task performance.
* Measure and reduce demographic bias by reporting **balanced evaluation metrics across race, gender, and age groups**.
* Ensure statistically sound performance measurement using **identity-disjoint train/validation/test splits**.
* Validate generalization to unseen individuals via a strict hold-out test set.
* Produce a **deployment-ready model**, with inference exported to ONNX and exposed through a REST API.

This project is not limited to model training; it emphasizes **end-to-end system design**, spanning dataset selection, data engineering, multi-task learning, fairness evaluation, and production-oriented deployment.

## **1. Dataset Selection & Evaluation**

### **1.1 Formulation of Evaluation Criteria**

Given the abundance of publicly available face datasets, this project begins with a structured audit built around one question:
**Which dataset best supports a global, equitable facial analysis system?**

To answer this, I defined the following evaluation criteria:

* **Demographic Parity:** Balanced representation across 7 race groups (White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, Latino) to reduce bias.
* **Label Granularity:** High-fidelity, concurrent labels for **Age**, **Gender**, and **Race**.
* **Environmental Diversity:** Images captured “in-the-wild” with real-world variation in lighting, pose, and background.
* **Scale:** Sufficient volume (more than 100,000 images) to support generalization.
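* **Generalization:** Enough labeled samples in every group to compute **balanced accuracy**, so that performance can be checked for consistency across race and gender rather than dominated by the largest group.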
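
The criteria above do not define balanced accuracy precisely. A minimal sketch, assuming the standard definition (the unweighted mean of per-class recall) and a hypothetical helper named `balanced_accuracy`, shows why it is more informative than plain accuracy on imbalanced data:

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy race labels: 8 "White" and 2 "Black" samples.
# The (deliberately bad) model always predicts "White".
y_true = ["White"] * 8 + ["Black"] * 2
y_pred = ["White"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the model for ignoring the minority class, while balanced accuracy weights every class equally regardless of its size.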

## **2. Model Development Pipeline**

### **2.1 Data Engineering & Stratified Splits**

#### **2.1.1 Data Partitioning Strategy: Ensuring Unbiased Generalization**

Although the source dataset provides predefined training and validation splits, this project adopts a more rigorous **three-way split** (Train / Validation / Test) to ensure statistically sound evaluation.

A custom data pipeline is used to re-partition the combined image pool.

#### **2.1.2 Rationale for the Three-Way Split**

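Many public benchmarks do not ship a held-out test set that fits a custom pipeline: **FairFace** provides only training and validation splits, and **UTKFace** has no official split (**CelebA** does ship train/validation/test partitions, but carries no race labels). A deployment-ready system, however, requires a strict separation of concerns: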

* **Training Set:** Used to optimize model parameters.
* **Validation Set:** Used for hyperparameter tuning and architectural decisions.
* **Test Set:** A final, untouched “black-box” set used only once to measure real-world generalization.

#### **2.1.3 Methodology: The Re-Splitting Process**

To preserve statistical rigor, the following workflow is applied:

* **Data Consolidation:** Original training and validation splits are merged into a unified pool.
* **Stratified Re-Splitting:** The pool is re-split into Train / Val / Test so that each split has approximately the same joint distribution across:

  * 7 race groups
  * 2 genders
  * 9 age bins
* **Identity Disjointness:** No individual appears in more than one split, preventing identity leakage.
* **Independence Verification:** The test split is confirmed to be excluded from training and model selection, so the final evaluation remains unbiased.
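
The workflow above could be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the record fields (`identity`, `race`, `gender`, `age_bin`), the function name `split_by_identity`, and the 80/10/10 ratio are all hypothetical, and each identity is assumed to carry a single stratum label.

```python
import random
from collections import defaultdict


def split_by_identity(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Stratified, identity-disjoint split. The 80/10/10 default is an assumption."""
    # Group images by person so one identity can never straddle two splits.
    by_identity = defaultdict(list)
    for rec in records:
        by_identity[rec["identity"]].append(rec)

    # Bucket identities by (race, gender, age_bin), read from their first image.
    strata = defaultdict(list)
    for identity, recs in by_identity.items():
        first = recs[0]
        strata[(first["race"], first["gender"], first["age_bin"])].append(identity)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        n_train = round(ratios[0] * len(ids))
        n_val = round(ratios[1] * len(ids))
        # Very small strata may leave val or test empty for that stratum.
        parts = {
            "train": ids[:n_train],
            "val": ids[n_train:n_train + n_val],
            "test": ids[n_train + n_val:],
        }
        for name, part in parts.items():
            for identity in part:
                splits[name].extend(by_identity[identity])
    return splits


# Toy pool: 4 strata x 10 identities x 2 images per identity.
records = []
identity = 0
for race in ("East Asian", "Indian"):
    for gender in ("Female", "Male"):
        for _ in range(10):
            for shot in range(2):
                records.append({"identity": identity, "race": race,
                                "gender": gender, "age_bin": 3,
                                "image": f"{identity}_{shot}.jpg"})
            identity += 1

splits = split_by_identity(records)
print({name: len(recs) for name, recs in splits.items()})
```

Splitting at the identity level first, and only then expanding back to images, is what keeps stratification and identity disjointness compatible.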

## **2.2 Multi-Task Model Architecture**

The model is formulated as a **multi-task learning system**, where a single shared representation supports multiple prediction objectives.

### **2.2.2 Shared Backbone: The “Master” Feature Finder**

The core of the architecture is a **ResNet-34 CNN** used as a **shared feature extractor** for all three tasks.

Instead of training separate networks for age, gender, and race, the model first learns a strong, general-purpose **face representation**, which is then reused by task-specific heads.

* **Input:** Preprocessed face images.
* **Output:** A compact embedding capturing facial structure, texture, and shape.
* **Benefit:** Since all tasks backpropagate through the same backbone, the learned representation generalizes across labels instead of overfitting to a single task.

#### **2.2.2.1 The Input–Output Handshake**

Before implementing task-specific logic, a strict contract is defined:

* **Input:** A batch of face images resized and normalized to $(B, 3, 224, 224)$.
* **Output:** A 512-dimensional embedding per image.
* **Shape Guarantee:** A batch of size $B$ produces an output tensor of shape $(B, 512)$.

This contract ensures that downstream heads can be attached cleanly and independently.
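
To illustrate how heads might attach to any backbone that honors this contract, here is a hedged PyTorch sketch. The class name `MultiTaskFaceModel` and the single linear layer per head are assumptions; the class counts (9 age bins, 2 genders, 7 race groups) follow the stratification bins described earlier. A tiny stand-in backbone is used so the example runs quickly.

```python
import torch
from torch import nn


class MultiTaskFaceModel(nn.Module):
    """Shared backbone plus one linear head per task (head design is assumed)."""

    def __init__(self, backbone, embed_dim=512,
                 num_age_bins=9, num_genders=2, num_races=7):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleDict({
            "age": nn.Linear(embed_dim, num_age_bins),
            "gender": nn.Linear(embed_dim, num_genders),
            "race": nn.Linear(embed_dim, num_races),
        })

    def forward(self, images):
        embedding = self.backbone(images)  # contract: (B, 512)
        return {task: head(embedding) for task, head in self.heads.items()}


# Stand-in backbone for illustration only: (B, 3, 224, 224) -> (B, 512).
standin_backbone = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(3, 512),
)

model = MultiTaskFaceModel(standin_backbone)
logits = model(torch.randn(4, 3, 224, 224))
for task, out in logits.items():
    print(task, tuple(out.shape))
```

Because the heads depend only on the embedding shape, the stand-in can be swapped for the real backbone without touching the head code.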

#### **2.2.2.2 Surgery: Turning a Classifier into a Feature Finder**

A standard ImageNet-pretrained ResNet-34 ends in a layer that classifies 1,000 ImageNet categories. These final class predictions are not useful for facial attribute learning.

* **The Operation:** The final classification layer is removed.
* **Stopping Point:** The network is truncated after the **Global Average Pooling** stage.
* **Cleanup:** The pooled tensor $(B, 512, 1, 1)$ is flattened into a clean $(B, 512)$ embedding.

This transforms ResNet-34 from an object classifier into a reusable feature extractor.

#### **2.2.2.3 Training Strategy: Let the Backbone Learn**

For the baseline model, the backbone is **fine-tuned**, not frozen.

* **Full Gradient Flow:** All backbone parameters remain trainable.
* **Shared Learning Signal:** Errors from age, gender, and race predictions all update the same representation.
* **Result:** The backbone learns facial features that are broadly useful across tasks.
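
The source does not specify the loss, so the sketch below assumes an equally weighted sum of per-task cross-entropy losses. The helper `multitask_loss`, the small linear stand-in for the backbone, and the SGD settings are illustrative only. It shows a single optimization step in which all three tasks update the shared parameters.

```python
import torch
from torch import nn
import torch.nn.functional as F

TASKS = ("age", "gender", "race")


def multitask_loss(logits, targets, weights=None):
    """Weighted sum of per-task cross-entropy (equal weights assumed by default)."""
    weights = weights or {task: 1.0 for task in TASKS}
    return sum(weights[task] * F.cross_entropy(logits[task], targets[task])
               for task in TASKS)


torch.manual_seed(0)
shared = nn.Linear(16, 512)  # stand-in for the shared backbone
heads = nn.ModuleDict({
    "age": nn.Linear(512, 9),
    "gender": nn.Linear(512, 2),
    "race": nn.Linear(512, 7),
})

# The optimizer covers backbone and heads: nothing is frozen.
params = list(shared.parameters()) + list(heads.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

x = torch.randn(8, 16)
targets = {
    "age": torch.randint(0, 9, (8,)),
    "gender": torch.randint(0, 2, (8,)),
    "race": torch.randint(0, 7, (8,)),
}

before = shared.weight.detach().clone()
embedding = shared(x)
logits = {task: head(embedding) for task, head in heads.items()}
loss = multitask_loss(logits, targets)

optimizer.zero_grad()
loss.backward()   # gradients from all three heads reach `shared`
optimizer.step()

print(float(loss), bool(torch.equal(before, shared.weight)))
```

Unequal task weights can be passed through `weights` if one task dominates the gradient signal.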

#### **2.2.2.4 Trust-but-Verify Sanity Checks**

Before large-scale training, two sanity checks validate correctness:

1. **Shape Check:** A dummy input must produce exactly a 512-dimensional embedding.
2. **Gradient Check:** A backward pass must propagate gradients into backbone parameters.

These checks ensure the backbone is actively learning and not acting as a frozen observer.

## **2.3 Final Evaluation on the Test Set**

This section represents the final, unbiased evaluation of the trained system.

The **Test Set** is a strict hold-out: it is never used for training, hyperparameter tuning, or model selection.

### **2.3.1 Unseen Data Benchmark**

The final model is evaluated on the held-out test split to measure generalization to previously unseen faces.

### **2.3.2 Multi-Task Performance Breakdown**

Performance is reported independently for:

* **Age Classification**
* **Gender Classification**
* **Race Classification**

This breakdown shows whether the shared backbone serves all three tasks well, or whether one task improves at the expense of another.

### **2.3.3 Fairness & Slice-Based Metrics**

Accuracy is evaluated across demographic slices (e.g., per-race accuracy) to detect residual bias.
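
As an illustration of slice-based evaluation, the sketch below computes gender accuracy separately for each race group and reports the largest gap between slices. The helper names and the toy labels are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(slices, y_true, y_pred):
    """Accuracy computed separately for each demographic slice."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in zip(slices, y_true, y_pred):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(per_slice):
    """Difference between the best and worst slice."""
    return max(per_slice.values()) - min(per_slice.values())


# Toy data: gender predictions, sliced by race.
race = ["White"] * 4 + ["Black"] * 4
gender_true = ["F", "M", "F", "M", "F", "M", "F", "M"]
gender_pred = ["F", "M", "F", "M", "F", "F", "M", "M"]

per_race = accuracy_by_slice(race, gender_true, gender_pred)
print(per_race)                    # {'White': 1.0, 'Black': 0.5}
print(max_accuracy_gap(per_race))  # 0.5
```

Here the overall accuracy of 0.75 hides the fact that one slice is served far worse than the other, which is the kind of residual bias this analysis is meant to surface.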

### **2.3.4 Error Analysis via Confusion Matrices**

Confusion matrices are used to visualize systematic errors, such as confusion between neighboring age bins.
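
A minimal sketch of this analysis, using four toy age bins rather than the full set, builds the matrix and measures how many errors land in a neighboring bin. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def adjacent_error_share(matrix):
    """Fraction of all errors that fall exactly one bin away from the truth."""
    errors = adjacent = 0
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p:
                errors += count
                if abs(t - p) == 1:
                    adjacent += count
    return adjacent / errors if errors else 0.0


# Toy example with 4 ordered age bins (indices 0..3).
age_true = [0, 0, 1, 1, 2, 2, 3, 3]
age_pred = [0, 1, 1, 2, 2, 2, 3, 1]

matrix = confusion_matrix(age_true, age_pred, num_classes=4)
for row in matrix:
    print(row)
print(adjacent_error_share(matrix))  # 2 of the 3 errors are off by one bin
```

A high adjacent-error share suggests the model has learned the ordering of age bins and mostly struggles at bin boundaries.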

### **2.3.5 Identity Leakage Verification**

Final checks confirm that no subject identities overlap between training/validation and test sets.
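
Assuming each image carries an identity ID, the leakage check reduces to pairwise set intersections. The helper `find_identity_overlap` and the toy IDs below are hypothetical.

```python
from itertools import combinations


def find_identity_overlap(split_ids):
    """Return {(split_a, split_b): shared identities} for every leaking pair."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(split_ids.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy example: identity 3 leaks from train into test.
split_ids = {
    "train": [1, 2, 3],
    "val": [4, 5],
    "test": [6, 3],
}
overlaps = find_identity_overlap(split_ids)
print(overlaps)  # {('train', 'test'): {3}}
```

An empty result means the splits are identity-disjoint; any non-empty entry names the exact individuals to investigate.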

### **2.3.6 Comparison to Commercial Baselines**

Results are benchmarked against industry-standard commercial APIs to contextualize performance.

## **2.4 Deployment & Inference (ONNX + REST API)**

To demonstrate real-world applicability, the trained model is prepared for deployment:

* **ONNX Export:** The PyTorch model is exported to ONNX for framework-agnostic inference.
* **Inference Engine:** ONNX Runtime is used for efficient CPU/GPU inference.
* **REST API:** A FastAPI-based service exposes a `/predict` endpoint for age, gender, and race inference.
* **Parity Checks:** Outputs from ONNX inference are verified against PyTorch to ensure numerical consistency.

This step bridges the gap between research and production-ready systems.
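
The parity check can be sketched independently of the export itself. The example below assumes that per-task outputs from the PyTorch forward pass and from the ONNX Runtime session have already been converted to NumPy arrays; the function name `check_parity` and the tolerances are assumptions, and simulated arrays stand in for real model outputs.

```python
import numpy as np


def check_parity(reference, candidate, rtol=1e-3, atol=1e-5):
    """Compare per-task outputs; return the max absolute difference per task.

    Raises AssertionError if any task exceeds the (assumed) tolerances.
    """
    report = {}
    for task, ref in reference.items():
        ref = np.asarray(ref, dtype=np.float64)
        cand = np.asarray(candidate[task], dtype=np.float64)
        np.testing.assert_allclose(cand, ref, rtol=rtol, atol=atol,
                                   err_msg=f"parity failure on task '{task}'")
        report[task] = float(np.max(np.abs(cand - ref)))
    return report


# Simulated outputs: the "ONNX" logits differ from "PyTorch" by tiny noise.
rng = np.random.default_rng(0)
pytorch_out = {
    "age": rng.normal(size=(2, 9)),
    "gender": rng.normal(size=(2, 2)),
    "race": rng.normal(size=(2, 7)),
}
onnx_out = {task: arr + 1e-7 for task, arr in pytorch_out.items()}

report = check_parity(pytorch_out, onnx_out)
print(report)
```

Small differences are expected because the two runtimes may order floating-point operations differently, so the comparison uses tolerances rather than exact equality.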
