# Face Detection Using Shallow Neural Network - Assignment Report

**Course:** Computer Vision  
**Assignment:** 2  
**Student Name:** Muhammad Mahad  
**Student ID:** 500330  
**Date:** November 19, 2024  

---

## Executive Summary

This report presents a from-scratch implementation of a shallow (one-hidden-layer) neural network for face detection. The model uses only NumPy. Pillow handles image loading and augmentation, and Matplotlib is used only to plot training curves. The pipeline combines data augmentation, z-score feature normalization, mini-batch gradient descent, and a grid search over hidden-layer size and learning rate.

---

## 1. Dataset Details

### 1.1 Dataset Gathering and Cleaning

The dataset creation process involved two main approaches:

#### Primary Approach: Real Image Collection
- **Face Images:** Personal photographs captured under various conditions
  - Different lighting conditions (natural, artificial, mixed)
  - Multiple angles (frontal, 15°, 30°, 45° profiles)
  - Various expressions (neutral, smiling, serious)
  - Different distances from camera

- **Non-Face Images:** Diverse collection of images without faces
  - Objects (furniture, electronics, books)
  - Scenery (landscapes, buildings, interiors)
  - Abstract patterns and textures
  - Random crops from larger images

#### Data Cleaning Process:
1. **Format Standardization:** All images converted to RGB format
2. **Size Normalization:** Resized to 64×64 pixels using LANCZOS resampling
3. **Quality Control:** Removed corrupted or low-quality images
4. **Manual Verification:** Ensured correct labeling of all samples

### 1.2 Dataset Size and Composition

**Original Dataset:**
- Face images: ~50 original images
- Non-face images: ~50 original images

**After Augmentation:**
- Total samples: 600
- Face samples: 300 (50%)
- Non-face samples: 300 (50%)

**Data Split:**
- Training set: 360 samples (60%)
- Validation set: 120 samples (20%)
- Testing set: 120 samples (20%)
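Each split size comes from truncating a fraction of a shuffled index array. The test set takes whatever remains. The sketch below shows that arithmetic for a 60/20/20 split and mirrors `split_dataset`. The helper name `split_indices` and the fixed seed are illustrative assumptions:

```python
import numpy as np

def split_indices(n_samples, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Shuffle indices and cut them into train/val/test blocks; test gets the remainder."""
    rng = np.random.default_rng(seed)  # hypothetical fixed seed for reproducibility
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * train_ratio)
    n_val = int(n_samples * val_ratio)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(600)
print(len(train_idx), len(val_idx), len(test_idx))  # 360 120 120
```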

### 1.3 Feature Details and Scaling

**Feature Extraction:**
- Input images: 64×64×3 (RGB)
- Flattened feature vector: 12,288 dimensions
- Pixel values scaled to the [0, 1] range (divided by 255)

**Feature Normalization:**
- Method: Z-score normalization
- Mean and standard deviation calculated from training set
- Same parameters applied to validation and test sets
- Formula: `x_normalized = (x - mean) / std`
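The sketch below shows the fit-on-train, apply-everywhere pattern. It includes the zero-standard-deviation guard that `normalize_features` applies to constant pixels. The helper names `zscore_fit` and `zscore_apply` and the random toy arrays are assumptions for illustration:

```python
import numpy as np

def zscore_fit(X_train):
    """Per-feature mean/std from the training split only; constant features get std = 1."""
    mean = X_train.mean(axis=0, keepdims=True)
    std = X_train.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant pixels
    return mean, std

def zscore_apply(X, mean, std):
    """Standardize any split with statistics computed on the training split."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.random((360, 12))  # toy stand-in for flattened training images
X_test = rng.random((120, 12))
X_train[:, 0] = 0.5              # a constant feature whose std is exactly 0

mean, std = zscore_fit(X_train)
X_train_n = zscore_apply(X_train, mean, std)
X_test_n = zscore_apply(X_test, mean, std)  # reuse training statistics, never refit
```

Only the training split is guaranteed to have zero mean and unit variance per feature. The validation and test splits are shifted by the training statistics, which keeps their data out of the preprocessing step.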

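**Data Augmentation Techniques** (each transform is applied separately to the original face image and produces one new sample per setting):
1. **Horizontal Flipping:** One mirrored copy
2. **Brightness Adjustment:** Factors: [0.7, 0.85, 1.15, 1.3]
3. **Contrast Adjustment:** Factors: [0.8, 1.2]
4. **Rotation:** Angles: [-10°, +10°], with exposed corners filled in gray (128, 128, 128)
5. **Gaussian Blur:** Radius: 0.5 pixels

Non-face images receive lighter augmentation: a horizontal flip and a single brightness factor of 1.2.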
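The transforms are applied independently rather than chained. Each source image therefore contributes the original plus one sample per transform setting. The tally below uses the settings from `augment_image` (faces) and the reduced pipeline in `create_dataset` (non-faces). The dictionary layout is only an illustration:

```python
# Settings copied from augment_image(); every entry yields one extra sample.
face_transforms = {
    "horizontal_flip": [None],
    "brightness": [0.7, 0.85, 1.15, 1.3],
    "contrast": [0.8, 1.2],
    "rotation_deg": [-10, 10],
    "gaussian_blur_radius": [0.5],
}
# Non-face images only get a flip and a single brightness factor of 1.2.
non_face_transforms = {"horizontal_flip": [None], "brightness": [1.2]}

def samples_per_image(transforms):
    """Original image plus one sample per transform setting."""
    return 1 + sum(len(values) for values in transforms.values())

print(samples_per_image(face_transforms))      # 11
print(samples_per_image(non_face_transforms))  # 3
```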

---

## 2. Code and Methodology

### 2.1 Mathematical Model Details

#### 2.1.1 Model Architecture

**Network Structure:**
```
Input Layer:  12,288 neurons (64×64×3)
     ↓
Hidden Layer: 128 neurons (ReLU activation)
     ↓
Output Layer: 1 neuron (Sigmoid activation)
```

**Total Parameters:**
- W1: 12,288 × 128 = 1,572,864 parameters
- b1: 128 parameters
- W2: 128 × 1 = 128 parameters
- b2: 1 parameter
- **Total: 1,573,121 parameters**
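You can reproduce the parameter count directly from the layer sizes. The helper below uses `count_parameters`, which is a hypothetical name:

```python
def count_parameters(n_input, n_hidden, n_output=1):
    """Weights plus biases for a fully connected network with one hidden layer."""
    w1 = n_input * n_hidden   # input -> hidden weights
    b1 = n_hidden             # hidden biases
    w2 = n_hidden * n_output  # hidden -> output weights
    b2 = n_output             # output bias
    return w1 + b1 + w2 + b2

n_input = 64 * 64 * 3  # flattened RGB image
print(n_input)                         # 12288
print(count_parameters(n_input, 128))  # 1573121
```

W₁ accounts for almost all of the parameters. The model size therefore scales roughly in proportion to the hidden sizes tried in the grid search (32, 64, 128). For example, 32 hidden units give 393,281 parameters.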

#### 2.1.2 Hypothesis Function

The model computes:
```
z₁ = X·W₁ + b₁
a₁ = ReLU(z₁) = max(0, z₁)
z₂ = a₁·W₂ + b₂
ŷ = σ(z₂) = 1/(1 + e^(-z₂))
```

Where:
- X: Input features (batch_size × 12,288)
- W₁, b₁: First layer weights and bias
- W₂, b₂: Second layer weights and bias
- σ: Sigmoid function
- ŷ: Predicted probability of face presence
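The NumPy sketch below implements the hypothesis function, mainly to make the matrix shapes explicit. The batch of 4 random inputs and the seed are arbitrary assumptions. The weights use the same He scaling as the report's model:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    # Clip like the report's model to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def forward(X, W1, b1, W2, b2):
    """z1 = X W1 + b1, a1 = ReLU(z1), z2 = a1 W2 + b2, y_hat = sigmoid(z2)."""
    z1 = X @ W1 + b1   # (batch, hidden)
    a1 = relu(z1)      # (batch, hidden)
    z2 = a1 @ W2 + b2  # (batch, 1)
    return sigmoid(z2)

rng = np.random.default_rng(0)
batch, n_in, n_hidden = 4, 12288, 128
X = rng.standard_normal((batch, n_in))
W1 = rng.standard_normal((n_in, n_hidden)) * np.sqrt(2.0 / n_in)
b1 = np.zeros((1, n_hidden))
W2 = rng.standard_normal((n_hidden, 1)) * np.sqrt(2.0 / n_hidden)
b2 = np.zeros((1, 1))

y_hat = forward(X, W1, b1, W2, b2)
print(y_hat.shape)  # (4, 1)
```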

#### 2.1.3 Objective Function

**Binary Cross-Entropy Loss:**
```
L(y, ŷ) = -1/m Σ[y⁽ⁱ⁾·log(ŷ⁽ⁱ⁾) + (1-y⁽ⁱ⁾)·log(1-ŷ⁽ⁱ⁾)]
```

Where:
- m: Number of samples over which the loss is averaged (the mini-batch size during training)
- y⁽ⁱ⁾: True label for sample i
- ŷ⁽ⁱ⁾: Predicted probability for sample i
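The worked example below computes the loss for two samples. As in `compute_loss`, it clips predictions to [ε, 1 − ε] with ε = 10⁻⁷. The labels and predictions are made up for illustration:

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped so log(0) never occurs."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    m = y.shape[0]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

y = np.array([[1], [0]])
y_hat = np.array([[0.9], [0.2]])
# -(log 0.9 + log 0.8) / 2 = (0.10536 + 0.22314) / 2
print(round(bce_loss(y, y_hat), 4))  # 0.1643
```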

#### 2.1.4 Parameter Optimization

**Backpropagation Algorithm:**

1. **Output Layer Gradients:**
   ```
   δ₂ = ŷ - y
   ∂L/∂W₂ = 1/m · a₁ᵀ·δ₂
   ∂L/∂b₂ = 1/m · Σ(δ₂)
   ```

2. **Hidden Layer Gradients:**
   ```
   δ₁ = (δ₂·W₂ᵀ) ⊙ ReLU'(z₁)
   ∂L/∂W₁ = 1/m · Xᵀ·δ₁
   ∂L/∂b₁ = 1/m · Σ(δ₁)
   ```

3. **Gradient Descent Update:**
   ```
   W₁ := W₁ - α·∂L/∂W₁
   b₁ := b₁ - α·∂L/∂b₁
   W₂ := W₂ - α·∂L/∂W₂
   b₂ := b₂ - α·∂L/∂b₂
   ```

Where α is the learning rate.
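A finite-difference gradient check is a quick way to confirm the backpropagation equations above. In particular, it verifies the simplification δ₂ = ŷ − y, which holds because the sigmoid's derivative cancels the denominator of the cross-entropy derivative. The sketch uses a tiny network with 5 inputs and 3 hidden units so the check is cheap. The sizes, seed, step `h` and tolerance are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_in, n_h = 8, 5, 3
X = rng.standard_normal((m, n_in))
y = rng.integers(0, 2, size=(m, 1)).astype(float)
params = {
    "W1": rng.standard_normal((n_in, n_h)),
    "b1": rng.standard_normal((1, n_h)),
    "W2": rng.standard_normal((n_h, 1)),
    "b2": rng.standard_normal((1, 1)),
}

def loss_and_grads(p):
    """Forward pass, BCE loss, and the analytic gradients from the equations above."""
    z1 = X @ p["W1"] + p["b1"]
    a1 = np.maximum(0.0, z1)
    y_hat = 1.0 / (1.0 + np.exp(-(a1 @ p["W2"] + p["b2"])))
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    d2 = y_hat - y                    # delta_2 = y_hat - y
    d1 = (d2 @ p["W2"].T) * (z1 > 0)  # delta_1 = (delta_2 W2^T) * ReLU'(z1)
    grads = {
        "W1": X.T @ d1 / m, "b1": d1.sum(axis=0, keepdims=True) / m,
        "W2": a1.T @ d2 / m, "b2": d2.sum(axis=0, keepdims=True) / m,
    }
    return loss, grads

def numerical_grad(p, name, h=1e-6):
    """Central differences, perturbing one entry of p[name] at a time."""
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + h
        plus, _ = loss_and_grads(p)
        p[name][idx] = old - h
        minus, _ = loss_and_grads(p)
        p[name][idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * h)
    return grad

_, analytic = loss_and_grads(params)
max_err = max(np.max(np.abs(analytic[k] - numerical_grad(params, k))) for k in params)
print(max_err < 1e-6)  # True
```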

### 2.2 Implementation Details

#### 2.2.1 Weight Initialization
- **He Initialization:** Used for ReLU activation
- W₁ ~ N(0, 2/input_size), i.e. standard deviation √(2/input_size)
- W₂ ~ N(0, 2/hidden_size), i.e. standard deviation √(2/hidden_size)
- Biases initialized to zero
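A quick empirical check shows why this scaling suits ReLU layers. With unit-variance inputs, He-initialized pre-activations have variance of about 2. ReLU keeps half of that, so the mean squared activation stays near 1. The sketch assumes standard-normal inputs, which roughly matches z-scored features. `he_init` is a hypothetical helper name:

```python
import numpy as np

def he_init(n_in, n_out, rng):
    """He (Kaiming) normal initialization: std = sqrt(2 / fan_in)."""
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)

rng = np.random.default_rng(0)
W1 = he_init(12288, 128, rng)
print(W1.std(), np.sqrt(2.0 / 12288))  # both about 0.0128

X = rng.standard_normal((256, 12288))  # assumed unit-variance inputs
a1 = np.maximum(0.0, X @ W1)
print(np.mean(a1 ** 2))  # close to 1: the second moment survives the ReLU layer
```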

#### 2.2.2 Training Strategy
- **Mini-batch Gradient Descent:** Batch size = 32
- **Epochs:** 1000
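- **Hidden Size and Learning Rate:** Selected by grid search over hidden sizes {32, 64, 128} and learning rates {0.001, 0.01, 0.1}. Each configuration is trained for 200 epochs, and the one with the highest validation accuracy is kept.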
- **Shuffling:** Data shuffled each epoch
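With 360 training samples and a batch size of 32, one epoch is 11 full batches plus a final batch of 8. The batch count must therefore be rounded up for every sample to be seen each epoch. The sketch below shows per-epoch shuffling that keeps the last partial batch. The generator name `minibatches` is an illustrative assumption:

```python
import numpy as np

def minibatches(n_samples, batch_size, rng):
    """Yield index arrays covering a fresh shuffle of all samples; the last batch may be smaller."""
    order = rng.permutation(n_samples)
    n_batches = int(np.ceil(n_samples / batch_size))  # round up, not down
    for b in range(n_batches):
        yield order[b * batch_size:(b + 1) * batch_size]

rng = np.random.default_rng(0)
sizes = [len(idx) for idx in minibatches(360, 32, rng)]
print(len(sizes), sizes[-1])  # 12 8
```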

#### 2.2.3 Regularization Techniques
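1. **Data Augmentation:** Increases the diversity of training samples, which reduces overfitting
2. **Validation Monitoring:** Validation loss and accuracy are recorded every epoch. The `train` method does not stop early, so these curves are the basis for judging overfitting.
3. **Numerical Safeguards:** Sigmoid inputs are clipped to [-500, 500] and predictions to [10⁻⁷, 1 − 10⁻⁷] inside the loss, which prevents overflow and log(0). The gradients themselves are not clipped.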
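The sketch below implements early stopping on validation loss and gradient clipping as standalone helpers rather than as part of `ShallowNeuralNetwork`. The first helper is an early-stopping tracker with a patience counter. The second clips the global gradient norm. The names `EarlyStopping`, `clip_by_global_norm`, `patience`, `min_delta` and `max_norm` are illustrative assumptions, not taken from the report's code:

```python
import numpy as np

class EarlyStopping:
    """Signal a stop once validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=20, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = np.inf
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients together so their combined L2 norm is at most max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    scale = min(1.0, max_norm / (total + 1e-12))
    return {name: g * scale for name, g in grads.items()}

# Synthetic validation-loss curve that stops improving after epoch 2
stopper = EarlyStopping(patience=3)
curve = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]
stop_epoch = next(e for e, loss in enumerate(curve) if stopper.step(e, loss))
print(stop_epoch, stopper.best_epoch)  # 5 2

grads = {"dW1": np.full((2, 2), 3.0), "db1": np.full((1, 2), 4.0)}  # global norm sqrt(68)
clipped = clip_by_global_norm(grads, max_norm=5.0)
```

To use these helpers in a training loop, call `clip_by_global_norm` on the output of `backward_propagation` before `update_parameters`. Call `stopper.step` once per epoch after computing the validation loss. Rescaling all gradients by one shared factor preserves their direction, unlike clipping each element independently.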

In [None]:
# ==========================================
# CORE IMPORTS
# ==========================================
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageEnhance, ImageFilter
import os
import glob
import random
import json
import time
import sys

In [None]:
"""
Shallow Neural Network Implementation for Face Detection
Assignment 2 - Computer Vision
Author: Muhammad Mahad
"""

class ShallowNeuralNetwork:
    """
    A shallow neural network with one hidden layer for binary classification.
    Implemented from scratch using only NumPy.
    """
    
    def __init__(self, input_size, hidden_size, output_size=1, learning_rate=0.01):
        """
        Initialize the neural network.
        
        Args:
            input_size: Number of input features
            hidden_size: Number of neurons in hidden layer
            output_size: Number of output neurons (1 for binary classification)
            learning_rate: Learning rate for gradient descent
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.learning_rate = learning_rate
        
        # Initialize weights and biases using He initialization
        # Weights from input to hidden layer
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2.0 / input_size)
        self.b1 = np.zeros((1, hidden_size))
        
        # Weights from hidden to output layer
        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2.0 / hidden_size)
        self.b2 = np.zeros((1, output_size))
        
        # Store training history
        self.train_losses = []
        self.val_losses = []
        self.train_accuracies = []
        self.val_accuracies = []
        
    def relu(self, z):
        """
        ReLU activation function.
        """
        return np.maximum(0, z)
    
    def relu_derivative(self, z):
        """
        Derivative of ReLU activation function.
        """
        return (z > 0).astype(float)
    
    def sigmoid(self, z):
        """
        Sigmoid activation function for output layer.
        """
        # Clip to prevent overflow
        z = np.clip(z, -500, 500)
        return 1.0 / (1.0 + np.exp(-z))
    
    def sigmoid_derivative(self, a):
        """
        Derivative of sigmoid function.
        """
        return a * (1 - a)
    
    def forward_propagation(self, X):
        """
        Perform forward propagation through the network.
        """
        # Input to hidden layer
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.relu(self.z1)
        
        # Hidden to output layer
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        
        return {
            'z1': self.z1,
            'a1': self.a1,
            'z2': self.z2,
            'a2': self.a2
        }
    
    def compute_loss(self, y_true, y_pred):
        """
        Compute binary cross-entropy loss.
        """
        m = y_true.shape[0]
        # Add small epsilon to prevent log(0)
        epsilon = 1e-7
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        
        loss = -1/m * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        return loss
    
    def backward_propagation(self, X, y, cache):
        """
        Perform backward propagation to compute gradients.
        """
        m = X.shape[0]
        
        # Output layer gradients
        dz2 = cache['a2'] - y
        dW2 = 1/m * np.dot(cache['a1'].T, dz2)
        db2 = 1/m * np.sum(dz2, axis=0, keepdims=True)
        
        # Hidden layer gradients
        da1 = np.dot(dz2, self.W2.T)
        dz1 = da1 * self.relu_derivative(cache['z1'])
        dW1 = 1/m * np.dot(X.T, dz1)
        db1 = 1/m * np.sum(dz1, axis=0, keepdims=True)
        
        return {
            'dW1': dW1, 'db1': db1,
            'dW2': dW2, 'db2': db2
        }
    
    def update_parameters(self, gradients):
        """
        Update network parameters using gradient descent.
        """
        self.W1 -= self.learning_rate * gradients['dW1']
        self.b1 -= self.learning_rate * gradients['db1']
        self.W2 -= self.learning_rate * gradients['dW2']
        self.b2 -= self.learning_rate * gradients['db2']
    
    def predict_proba(self, X):
        """
        Predict probabilities for input samples.
        """
        cache = self.forward_propagation(X)
        return cache['a2']
    
    def predict(self, X):
        """
        Predict binary labels for input samples.
        """
        proba = self.predict_proba(X)
        return (proba >= 0.5).astype(int)
    
    def compute_accuracy(self, X, y):
        """
        Compute classification accuracy.
        """
        predictions = self.predict(X)
        accuracy = np.mean(predictions == y)
        return accuracy
    
    def train(self, X_train, y_train, X_val=None, y_val=None, 
              epochs=1000, batch_size=32, verbose=True):
        """
        Train the neural network using mini-batch gradient descent.
        """
        n_samples = X_train.shape[0]
        # Round up so the final, smaller batch is used instead of silently dropped
        n_batches = max(1, int(np.ceil(n_samples / batch_size)))
        
        print(f"Starting training with {epochs} epochs and batch size {batch_size}")
        print(f"Number of batches per epoch: {n_batches}")
        print("-" * 50)
        
        for epoch in range(epochs):
            # Shuffle training data at the start of each epoch
            indices = np.arange(n_samples)
            np.random.shuffle(indices)
            X_train_shuffled = X_train[indices]
            y_train_shuffled = y_train[indices]
            
            epoch_loss = 0
            
            # Mini-batch gradient descent
            for batch in range(n_batches):
                start_idx = batch * batch_size
                end_idx = min((batch + 1) * batch_size, n_samples)
                
                X_batch = X_train_shuffled[start_idx:end_idx]
                y_batch = y_train_shuffled[start_idx:end_idx]
                
                # Forward propagation
                cache = self.forward_propagation(X_batch)
                
                # Compute loss
                batch_loss = self.compute_loss(y_batch, cache['a2'])
                epoch_loss += batch_loss
                
                # Backward propagation
                gradients = self.backward_propagation(X_batch, y_batch, cache)
                
                # Update parameters
                self.update_parameters(gradients)
            
            # Average epoch loss
            epoch_loss /= n_batches
            self.train_losses.append(epoch_loss)
            
            # Compute training accuracy
            train_accuracy = self.compute_accuracy(X_train, y_train)
            self.train_accuracies.append(train_accuracy)
            
            # Validation metrics
            if X_val is not None and y_val is not None:
                val_predictions = self.predict_proba(X_val)
                val_loss = self.compute_loss(y_val, val_predictions)
                val_accuracy = self.compute_accuracy(X_val, y_val)
                self.val_losses.append(val_loss)
                self.val_accuracies.append(val_accuracy)
            
            # Print progress
            if verbose and (epoch + 1) % 50 == 0:
                print(f"Epoch [{epoch+1}/{epochs}]")
                print(f"  Train Loss: {epoch_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")
                if X_val is not None and y_val is not None:
                    print(f"  Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}")
                print()
        
        print("Training completed!")
    
    def save_model(self, filepath):
        """
        Save model parameters to a file.
        """
        model_params = {
            'input_size': self.input_size,
            'hidden_size': self.hidden_size,
            'output_size': self.output_size,
            'learning_rate': self.learning_rate,
            'W1': self.W1.tolist(),
            'b1': self.b1.tolist(),
            'W2': self.W2.tolist(),
            'b2': self.b2.tolist(),
            'train_losses': self.train_losses,
            'val_losses': self.val_losses,
            'train_accuracies': self.train_accuracies,
            'val_accuracies': self.val_accuracies
        }
        
        with open(filepath, 'w') as f:
            json.dump(model_params, f)
        
        print(f"Model saved to {filepath}")
    
    def load_model(self, filepath):
        """
        Load model parameters from a file.
        """
        with open(filepath, 'r') as f:
            model_params = json.load(f)
        
        self.input_size = model_params['input_size']
        self.hidden_size = model_params['hidden_size']
        self.output_size = model_params['output_size']
        self.learning_rate = model_params['learning_rate']
        self.W1 = np.array(model_params['W1'])
        self.b1 = np.array(model_params['b1'])
        self.W2 = np.array(model_params['W2'])
        self.b2 = np.array(model_params['b2'])
        self.train_losses = model_params['train_losses']
        self.val_losses = model_params['val_losses']
        self.train_accuracies = model_params['train_accuracies']
        self.val_accuracies = model_params['val_accuracies']
        
        print(f"Model loaded from {filepath}")

In [None]:
"""
Face Detection Dataset Preparation Module
Assignment 2 - Computer Vision
Author: Muhammad Mahad
"""

class FaceDatasetCreator:
    """
    Creates and preprocesses a dataset for face detection.
    Handles image loading, augmentation, and train/val/test splitting.
    """
    
    def __init__(self, face_dir, non_face_dir, img_size=(64, 64)):
        """
        Initialize the dataset creator.
        
        Args:
            face_dir: Directory containing your face images
            non_face_dir: Directory containing non-face images
            img_size: Target size for all images (width, height)
        """
        self.face_dir = face_dir
        self.non_face_dir = non_face_dir
        self.img_size = img_size
        self.data = []
        self.labels = []
        
    def load_and_preprocess_image(self, img_path):
        """
        Load an image and preprocess it.
        
        Args:
            img_path: Path to the image file
            
        Returns:
            Preprocessed image as numpy array
        """
        try:
            # Load image using Pillow
            img = Image.open(img_path)
            
            # Convert to RGB if necessary
            if img.mode != 'RGB':
                img = img.convert('RGB')
            
            # Resize to target size
            img = img.resize(self.img_size, Image.LANCZOS)
            
            # Convert to numpy array and normalize to [0, 1]
            img_array = np.array(img, dtype=np.float32) / 255.0
            
            return img_array
        
        except Exception as e:
            print(f"Error loading image {img_path}: {e}")
            return None
    
    def augment_image(self, img_array):
        """
        Apply data augmentation to increase dataset diversity.
        
        Args:
            img_array: Input image as numpy array
            
        Returns:
            List of augmented images
        """
        augmented = []
        
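        # Convert back to PIL Image for augmentation (round before casting so
        # float round-off cannot truncate a pixel value to the integer below)
        img = Image.fromarray(np.clip(np.round(img_array * 255), 0, 255).astype(np.uint8))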
        
        # Original image
        augmented.append(img_array)
        
        # Horizontal flip
        flipped = img.transpose(Image.FLIP_LEFT_RIGHT)
        augmented.append(np.array(flipped, dtype=np.float32) / 255.0)
        
        # Brightness variations
        brightness_factors = [0.7, 0.85, 1.15, 1.3]
        for factor in brightness_factors:
            enhancer = ImageEnhance.Brightness(img)
            bright_img = enhancer.enhance(factor)
            augmented.append(np.array(bright_img, dtype=np.float32) / 255.0)
        
        # Contrast variations
        contrast_factors = [0.8, 1.2]
        for factor in contrast_factors:
            enhancer = ImageEnhance.Contrast(img)
            contrast_img = enhancer.enhance(factor)
            augmented.append(np.array(contrast_img, dtype=np.float32) / 255.0)
        
        # Slight rotation
        rotations = [-10, 10]
        for angle in rotations:
            rotated = img.rotate(angle, fillcolor=(128, 128, 128))
            augmented.append(np.array(rotated, dtype=np.float32) / 255.0)
        
        # Slight blur
        blurred = img.filter(ImageFilter.GaussianBlur(radius=0.5))
        augmented.append(np.array(blurred, dtype=np.float32) / 255.0)
        
        return augmented
    
    def create_dataset(self, augment=True):
        """
        Create the complete dataset from directories.
        
        Args:
            augment: Whether to apply data augmentation
            
        Returns:
            X (features), y (labels) as numpy arrays
        """
        print("Loading face images...")
        face_images = []
        face_paths = glob.glob(os.path.join(self.face_dir, '*'))
        
        for path in face_paths:
            img = self.load_and_preprocess_image(path)
            if img is not None:
                if augment:
                    augmented_imgs = self.augment_image(img)
                    face_images.extend(augmented_imgs)
                else:
                    face_images.append(img)
        
        print(f"Loaded {len(face_images)} face images (with augmentation)")
        
        print("Loading non-face images...")
        non_face_images = []
        non_face_paths = glob.glob(os.path.join(self.non_face_dir, '*'))
        
        for path in non_face_paths:
            img = self.load_and_preprocess_image(path)
            if img is not None:
                # Apply less augmentation to non-face images
                non_face_images.append(img)
                if augment:
                    # Only add horizontal flip and one brightness variation
                    img_pil = Image.fromarray(np.clip(np.round(img * 255), 0, 255).astype(np.uint8))
                    flipped = img_pil.transpose(Image.FLIP_LEFT_RIGHT)
                    non_face_images.append(np.array(flipped, dtype=np.float32) / 255.0)
                    
                    enhancer = ImageEnhance.Brightness(img_pil)
                    bright = enhancer.enhance(1.2)
                    non_face_images.append(np.array(bright, dtype=np.float32) / 255.0)
        
        print(f"Loaded {len(non_face_images)} non-face images (with augmentation)")
        
        # Combine and create labels
        X = face_images + non_face_images
        y = [1] * len(face_images) + [0] * len(non_face_images)
        
        # Convert to numpy arrays
        X = np.array(X)
        y = np.array(y).reshape(-1, 1)
        
        # Flatten images for neural network input
        # Shape: (n_samples, height * width * channels)
        X = X.reshape(X.shape[0], -1)
        
        print(f"Dataset shape: X={X.shape}, y={y.shape}")
        
        return X, y
    
    def split_dataset(self, X, y, train_ratio=0.6, val_ratio=0.2):
        """
        Split dataset into training, validation, and test sets.
        
        Args:
            X: Features array
            y: Labels array
            train_ratio: Proportion for training set
            val_ratio: Proportion for validation set
            
        Returns:
            X_train, X_val, X_test, y_train, y_val, y_test
        """
        # Shuffle the data
        n_samples = X.shape[0]
        indices = np.arange(n_samples)
        np.random.shuffle(indices)
        
        X = X[indices]
        y = y[indices]
        
        # Calculate split points
        n_train = int(n_samples * train_ratio)
        n_val = int(n_samples * val_ratio)
        
        # Split the data
        X_train = X[:n_train]
        y_train = y[:n_train]
        
        X_val = X[n_train:n_train + n_val]
        y_val = y[n_train:n_train + n_val]
        
        X_test = X[n_train + n_val:]
        y_test = y[n_train + n_val:]
        
        print(f"Split sizes: Train={X_train.shape[0]}, Val={X_val.shape[0]}, Test={X_test.shape[0]}")
        
        return X_train, X_val, X_test, y_train, y_val, y_test
    
    def normalize_features(self, X_train, X_val, X_test):
        """
        Normalize features using training set statistics.
        
        Args:
            X_train, X_val, X_test: Feature arrays
            
        Returns:
            Normalized arrays and normalization parameters
        """
        # Calculate mean and std from training set
        mean = np.mean(X_train, axis=0, keepdims=True)
        std = np.std(X_train, axis=0, keepdims=True)
        
        # Avoid division by zero
        std[std == 0] = 1.0
        
        # Normalize all sets using training statistics
        X_train_norm = (X_train - mean) / std
        X_val_norm = (X_val - mean) / std
        X_test_norm = (X_test - mean) / std
        
        return X_train_norm, X_val_norm, X_test_norm, mean, std

def create_sample_dataset():
    """
    Create a sample dataset for testing when actual images are not available.
    This generates synthetic data for demonstration purposes.
    """
    print("Creating synthetic dataset for demonstration...")
    
    np.random.seed(42)
    
    # Generate synthetic face features (slightly different distribution)
    n_faces = 300
    face_features = np.random.randn(n_faces, 64 * 64 * 3) * 0.3 + 0.6
    face_features = np.clip(face_features, 0, 1)
    
    # Generate synthetic non-face features
    n_non_faces = 300
    non_face_features = np.random.randn(n_non_faces, 64 * 64 * 3) * 0.4 + 0.4
    non_face_features = np.clip(non_face_features, 0, 1)
    
    # Add some distinguishing patterns
    face_features[:, 1000:1500] += 0.2  # Add pattern for faces
    non_face_features[:, 2000:2500] += 0.2  # Different pattern for non-faces
    
    # Combine
    X = np.vstack([face_features, non_face_features])
    y = np.array([1] * n_faces + [0] * n_non_faces).reshape(-1, 1)
    
    # Shuffle
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)
    X = X[indices]
    y = y[indices]
    
    return X, y

In [None]:
"""
Main Training Script for Face Detection Neural Network
Assignment 2 - Computer Vision
Author: Muhammad Mahad
"""

class FaceDetectionTrainer:
    """
    Main trainer class that manages the complete training pipeline.
    """
    
    def __init__(self, face_dir=None, non_face_dir=None, use_synthetic=True):
        """
        Initialize the trainer.
        
        Args:
            face_dir: Directory containing face images
            non_face_dir: Directory containing non-face images
            use_synthetic: Use synthetic data if directories not available
        """
        self.face_dir = face_dir
        self.non_face_dir = non_face_dir
        self.use_synthetic = use_synthetic
        
        # Training results storage
        self.results = {
            'dataset_info': {},
            'hyperparameters': {},
            'training_metrics': {},
            'evaluation_metrics': {}
        }
    
    def prepare_dataset(self):
        """
        Prepare the complete dataset for training.
        
        Returns:
            Training, validation, and test sets
        """
        print("=" * 60)
        print("DATASET PREPARATION")
        print("=" * 60)
        
        if self.use_synthetic:
            # Use synthetic data for demonstration
            print("Using synthetic dataset for demonstration...")
            X, y = create_sample_dataset()
        else:
            # Use real image data
            creator = FaceDatasetCreator(
                self.face_dir, 
                self.non_face_dir,
                img_size=(64, 64)
            )
            X, y = creator.create_dataset(augment=True)
            
        # Store dataset info
        self.results['dataset_info']['total_samples'] = X.shape[0]
        self.results['dataset_info']['feature_dim'] = X.shape[1]
        self.results['dataset_info']['positive_samples'] = int(np.sum(y))
        self.results['dataset_info']['negative_samples'] = int(X.shape[0] - np.sum(y))
        
        # Split the dataset (60% train, 20% val, 20% test)
        creator = FaceDatasetCreator(".", ".")  # Dummy paths for splitting
        X_train, X_val, X_test, y_train, y_val, y_test = creator.split_dataset(
            X, y, train_ratio=0.6, val_ratio=0.2
        )
        
        # Normalize features
        X_train, X_val, X_test, mean, std = creator.normalize_features(
            X_train, X_val, X_test
        )
        
        # Save normalization parameters
        np.save('normalization_mean.npy', mean)
        np.save('normalization_std.npy', std)
        
        print(f"\nDataset Statistics:")
        print(f"  Total samples: {self.results['dataset_info']['total_samples']}")
        print(f"  Feature dimension: {self.results['dataset_info']['feature_dim']}")
        print(f"  Positive samples (faces): {self.results['dataset_info']['positive_samples']}")
        print(f"  Negative samples (non-faces): {self.results['dataset_info']['negative_samples']}")
        print(f"  Class balance: {self.results['dataset_info']['positive_samples']/X.shape[0]:.2%} positive")
        
        return X_train, X_val, X_test, y_train, y_val, y_test
    
    def hyperparameter_search(self, X_train, y_train, X_val, y_val):
        """
        Perform hyperparameter search to find optimal model configuration.
        
        Args:
            Training and validation data
            
        Returns:
            Best hyperparameters
        """
        print("\n" + "=" * 60)
        print("HYPERPARAMETER SEARCH")
        print("=" * 60)
        
        # Define hyperparameter grid
        hidden_sizes = [32, 64, 128]
        learning_rates = [0.001, 0.01, 0.1]
        
        # Start below any achievable accuracy so best_params is always populated
        best_val_accuracy = -1.0
        best_params = {}
        
        print("Testing different hyperparameter combinations...")
        
        for hidden_size in hidden_sizes:
            for lr in learning_rates:
                print(f"\nTesting: hidden_size={hidden_size}, lr={lr}")
                
                # Create model with current hyperparameters
                model = ShallowNeuralNetwork(
                    input_size=X_train.shape[1],
                    hidden_size=hidden_size,
                    output_size=1,
                    learning_rate=lr
                )
                
                # Train for fewer epochs during search
                model.train(
                    X_train, y_train,
                    X_val, y_val,
                    epochs=200,
                    batch_size=32,
                    verbose=False
                )
                
                # Evaluate on validation set
                val_accuracy = model.compute_accuracy(X_val, y_val)
                print(f"  Validation accuracy: {val_accuracy:.4f}")
                
                # Update best parameters
                if val_accuracy > best_val_accuracy:
                    best_val_accuracy = val_accuracy
                    best_params = {
                        'hidden_size': hidden_size,
                        'learning_rate': lr
                    }
        
        print(f"\nBest hyperparameters found:")
        print(f"  Hidden size: {best_params['hidden_size']}")
        print(f"  Learning rate: {best_params['learning_rate']}")
        print(f"  Validation accuracy: {best_val_accuracy:.4f}")
        
        self.results['hyperparameters'] = best_params
        self.results['hyperparameters']['best_val_accuracy'] = float(best_val_accuracy)
        
        return best_params
    
    def train_final_model(self, X_train, y_train, X_val, y_val, hyperparams):
        """
        Train the final model with best hyperparameters.
        
        Args:
            Training data and hyperparameters
            
        Returns:
            Trained model
        """
        print("\n" + "=" * 60)
        print("TRAINING FINAL MODEL")
        print("=" * 60)
        
        # Create model with best hyperparameters
        model = ShallowNeuralNetwork(
            input_size=X_train.shape[1],
            hidden_size=hyperparams['hidden_size'],
            output_size=1,
            learning_rate=hyperparams['learning_rate']
        )
        
        print(f"Model Architecture:")
        print(f"  Input layer: {X_train.shape[1]} neurons")
        print(f"  Hidden layer: {hyperparams['hidden_size']} neurons (ReLU activation)")
        print(f"  Output layer: 1 neuron (Sigmoid activation)")
        print(f"  Total parameters: {X_train.shape[1] * hyperparams['hidden_size'] + hyperparams['hidden_size'] + hyperparams['hidden_size'] + 1}")
        
        # Train the model for more epochs
        start_time = time.time()
        model.train(
            X_train, y_train,
            X_val, y_val,
            epochs=1000,
            batch_size=32,
            verbose=True
        )
        training_time = time.time() - start_time
        
        print(f"\nTraining completed in {training_time:.2f} seconds")
        
        # Store training metrics
        self.results['training_metrics']['training_time'] = training_time
        self.results['training_metrics']['epochs'] = 1000
        self.results['training_metrics']['batch_size'] = 32
        
        return model
    
    def evaluate_model(self, model, X_train, y_train, X_val, y_val, X_test, y_test):
        """
        Evaluate the model on all datasets and compute metrics.
        
        Args:
            model: Trained model
            Dataset splits
            
        Returns:
            Evaluation metrics
        """
        print("\n" + "=" * 60)
        print("MODEL EVALUATION")
        print("=" * 60)
        
        # Compute predictions and metrics for each set
        datasets = {
            'Training': (X_train, y_train),
            'Validation': (X_val, y_val),
            'Test': (X_test, y_test)
        }
        
        metrics = {}
        
        for name, (X, y) in datasets.items():
            # Get predictions
            y_pred_proba = model.predict_proba(X)
            y_pred = model.predict(X)
            
            # Compute metrics
            accuracy = np.mean(y_pred == y)
            loss = model.compute_loss(y, y_pred_proba)
            
            # Compute confusion matrix elements
            tp = np.sum((y == 1) & (y_pred == 1))
            tn = np.sum((y == 0) & (y_pred == 0))
            fp = np.sum((y == 0) & (y_pred == 1))
            fn = np.sum((y == 1) & (y_pred == 0))
            
            # Calculate additional metrics
            precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0
            f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
            
            # Store metrics
            metrics[name] = {
                'accuracy': float(accuracy),
                'loss': float(loss),
                'precision': float(precision),
                'recall': float(recall),
                'f1_score': float(f1_score),
                'confusion_matrix': {
                    'tp': int(tp),
                    'tn': int(tn),
                    'fp': int(fp),
                    'fn': int(fn)
                }
            }
            
            print(f"\n{name} Set Performance:")
            print(f"  Accuracy: {accuracy:.4f}")
            print(f"  Loss: {loss:.4f}")
            print(f"  Precision: {precision:.4f}")
            print(f"  Recall: {recall:.4f}")
            print(f"  F1 Score: {f1_score:.4f}")
            print(f"  Confusion Matrix:")
            print(f"    True Positives: {tp}")
            print(f"    True Negatives: {tn}")
            print(f"    False Positives: {fp}")
            print(f"    False Negatives: {fn}")
        
        self.results['evaluation_metrics'] = metrics
        
        return metrics
    
    def plot_training_curves(self, model):
        """
        Plot training and validation curves.
        
        Args:
            model: Trained model with history
        """
        print("\n" + "=" * 60)
        print("GENERATING PLOTS")
        print("=" * 60)
        
        # Create figure with subplots
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Plot loss curves
        axes[0].plot(model.train_losses, label='Training Loss', color='blue')
        if model.val_losses:
            axes[0].plot(model.val_losses, label='Validation Loss', color='red')
        axes[0].set_xlabel('Epoch')
        axes[0].set_ylabel('Loss')
        axes[0].set_title('Training and Validation Loss')
        axes[0].legend()
        axes[0].grid(True)
        
        # Plot accuracy curves
        axes[1].plot(model.train_accuracies, label='Training Accuracy', color='blue')
        if model.val_accuracies:
            axes[1].plot(model.val_accuracies, label='Validation Accuracy', color='red')
        axes[1].set_xlabel('Epoch')
        axes[1].set_ylabel('Accuracy')
        axes[1].set_title('Training and Validation Accuracy')
        axes[1].legend()
        axes[1].grid(True)
        
        plt.tight_layout()
        # plt.savefig('training_curves.png', dpi=100) # Optional save
        plt.show()
        print("Training curves displayed")
        
        return fig
    
    def save_results(self):
        """
        Save all training results to a JSON file.
        """
        with open('training_results.json', 'w') as f:
            json.dump(self.results, f, indent=2)
        print("\nTraining results saved to 'training_results.json'")
    
    def run_complete_training(self):
        """
        Run the complete training pipeline.
        
        Returns:
            Trained model and results
        """
        print("\n" + "=" * 80)
        print(" " * 20 + "FACE DETECTION NEURAL NETWORK TRAINING")
        print("=" * 80)
        
        # Step 1: Prepare dataset
        X_train, X_val, X_test, y_train, y_val, y_test = self.prepare_dataset()
        
        # Step 2: Hyperparameter search
        best_params = self.hyperparameter_search(X_train, y_train, X_val, y_val)
        
        # Step 3: Train final model
        model = self.train_final_model(X_train, y_train, X_val, y_val, best_params)
        
        # Step 4: Evaluate model
        metrics = self.evaluate_model(model, X_train, y_train, X_val, y_val, X_test, y_test)
        
        # Step 5: Generate plots
        self.plot_training_curves(model)
        
        # Step 6: Save model and results
        model.save_model('trained_model.json')
        self.save_results()
        
        print("\n" + "=" * 80)
        print(" " * 25 + "TRAINING PIPELINE COMPLETED")
        print("=" * 80)
        
        return model, self.results
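
The metric formulas in `evaluate_model` can be checked in isolation. The sketch below uses a hypothetical helper, `binary_metrics`, that computes the same confusion-matrix counts, precision, recall, and F1 score from 0/1 labels. It flattens both inputs first, assuming labels may arrive either as `(n, 1)` column vectors or as `(n,)` arrays; without flattening, comparing those two shapes broadcasts to an `(n, n)` matrix and silently corrupts accuracy and the counts.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Hypothetical helper: confusion-matrix counts plus accuracy/precision/recall/F1."""
    # Flatten so (n, 1) and (n,) inputs are compared element by element
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    # Same zero-division guards as evaluate_model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / y_true.size

    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Column-vector labels vs. flat predictions: a common shape mismatch
y_true = np.array([1, 1, 1, 0, 0, 0]).reshape(-1, 1)
y_pred = np.array([1, 1, 0, 1, 0, 0])

pitfall_shape = (y_true == y_pred).shape  # (6, 6): broadcasting, not element-wise
m = binary_metrics(y_true, y_pred)
```

With three positives and three negatives, one false positive and one false negative give 2/3 for precision, recall, F1, and accuracy alike.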

"""
Standalone Prediction Module for Face Detection
Assignment 2 - Computer Vision
Author: Muhammad Mahad
"""

import json
import os

import numpy as np

class FacePredictor:
    """
    Face detection predictor using trained neural network.
    """
    
    def __init__(self, model_path='trained_model.json'):
        """
        Initialize the predictor by loading the trained model.
        """
        self.model_path = model_path
        self.model = None
        self.mean = None
        self.std = None
        
        # Load model and normalization parameters
        self.load_model()
        self.load_normalization_params()
        
    def load_model(self):
        """
        Load the trained model from file.
        """
        if not os.path.exists(self.model_path):
            raise FileNotFoundError(f"Model file not found: {self.model_path}")
        
        with open(self.model_path, 'r') as f:
            model_params = json.load(f)
        
        # Reconstruct the model
        self.input_size = model_params['input_size']
        self.hidden_size = model_params['hidden_size']
        self.output_size = model_params['output_size']
        self.W1 = np.array(model_params['W1'])
        self.b1 = np.array(model_params['b1'])
        self.W2 = np.array(model_params['W2'])
        self.b2 = np.array(model_params['b2'])
        
        print(f"Model loaded successfully from {self.model_path}")
    
    def load_normalization_params(self):
        """
        Load normalization parameters for feature preprocessing.
        """
        if os.path.exists('normalization_mean.npy'):
            self.mean = np.load('normalization_mean.npy')
            self.std = np.load('normalization_std.npy')
            print("Normalization parameters loaded")
        else:
            print("Warning: Normalization parameters not found. "
                  "Features will not be normalized (mean=0, std=1).")
            self.mean = 0.0
            self.std = 1.0
    
    def relu(self, z):
        """ReLU activation function."""
        return np.maximum(0, z)
    
    def sigmoid(self, z):
        """Sigmoid activation function."""
        z = np.clip(z, -500, 500)
        return 1.0 / (1.0 + np.exp(-z))
    
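    def forward_pass(self, features):
        """
        Perform a forward pass through the network.
        
        Args:
            features: Raw feature vector (1D) or batch of vectors (2D)
        
        Returns:
            Sigmoid output probabilities, one row per sample
        """
        # Ensure a 2D float array of shape (n_samples, n_features)
        features = np.asarray(features, dtype=float)
        if features.ndim == 1:
            features = features.reshape(1, -1)
        
        # Apply the same normalization used during training
        features = (features - self.mean) / self.std
        
        # Hidden layer
        z1 = np.dot(features, self.W1) + self.b1
        a1 = self.relu(z1)
        
        # Output layer
        z2 = np.dot(a1, self.W2) + self.b2
        a2 = self.sigmoid(z2)
        
        return a2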

def prediction(features):
    """
    Required prediction function for the assignment.
    
    Args:
        features: Input features as a numpy array, either a single
                  sample (1D array) or a batch of samples (2D array)
    
    Returns:
        Number of input samples classified as faces (probability >= 0.5)
    """
    # Ensure features is a 2D float array of shape (n_samples, n_features)
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features.reshape(1, -1)
    
    # Load the trained model parameters
    if not os.path.exists('trained_model.json'):
        raise FileNotFoundError("Trained model not found. Please train the model first.")
    
    with open('trained_model.json', 'r') as f:
        model_params = json.load(f)
    
    # Extract model parameters
    W1 = np.array(model_params['W1'])
    b1 = np.array(model_params['b1'])
    W2 = np.array(model_params['W2'])
    b2 = np.array(model_params['b2'])
    
    # Load normalization parameters if available
    if os.path.exists('normalization_mean.npy'):
        mean = np.load('normalization_mean.npy')
        std = np.load('normalization_std.npy')
        features = (features - mean) / std
    
    # Forward pass
    # Hidden layer with ReLU
    z1 = np.dot(features, W1) + b1
    a1 = np.maximum(0, z1)  # ReLU
    
    # Output layer with Sigmoid
    z2 = np.dot(a1, W2) + b2
    z2 = np.clip(z2, -500, 500)  # Prevent overflow
    a2 = 1.0 / (1.0 + np.exp(-z2))  # Sigmoid
    
    # Convert probabilities to binary predictions
    predictions = (a2 >= 0.5).astype(int)
    
    # Return the number of samples classified as faces as a plain Python int
    return int(np.sum(predictions))
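
`prediction()` reloads `trained_model.json` on every call and returns only a count. The following is a minimal sketch of the same ReLU/sigmoid forward pass, written as a hypothetical helper `count_faces` that also returns the per-sample probabilities. It assumes the JSON file stores `W1`, `b1`, `W2`, and `b2` as nested lists (JSON cannot hold NumPy arrays directly) with biases shaped as row vectors, and it uses a hand-set 2-2-1 network rather than the trained weights, so every output can be verified by hand.

```python
import json
import os
import tempfile

import numpy as np


def sigmoid(z):
    # Same clipping as the notebook to avoid overflow in np.exp
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))


def count_faces(features, params, threshold=0.5):
    """Hypothetical helper mirroring prediction(): ReLU hidden layer, sigmoid output."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features.reshape(1, -1)
    W1, b1 = np.array(params["W1"]), np.array(params["b1"])
    W2, b2 = np.array(params["W2"]), np.array(params["b2"])
    a1 = np.maximum(0, features @ W1 + b1)  # ReLU
    probs = sigmoid(a1 @ W2 + b2)           # shape (n_samples, 1)
    return int(np.sum(probs >= threshold)), probs


# Hand-set network (assumption, not trained weights): z2 = relu(x1) + relu(x2) - 1
params = {"input_size": 2, "hidden_size": 2, "output_size": 1,
          "W1": np.eye(2).tolist(), "b1": [[0.0, 0.0]],
          "W2": [[1.0], [1.0]], "b2": [[-1.0]]}

# Round-trip through JSON the way trained_model.json would be stored
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "trained_model.json")
    with open(path, "w") as f:
        json.dump(params, f)
    with open(path) as f:
        loaded = json.load(f)

X = np.array([[3.0, 0.0], [0.0, 0.0], [-5.0, -5.0], [0.5, 0.6]])
count, probs = count_faces(X, loaded)
single_count, _ = count_faces([3.0, 0.0], loaded)
```

Here the output pre-activations are 2, -1, -1, and 0.1, so exactly two samples cross the 0.5 threshold. Loading the parameters once, as `FacePredictor` does, avoids re-reading the file for every batch.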

# ==========================================
# EXECUTION BLOCK
# ==========================================

def main():
    """
    Main function to run the training.
    """
    # Check if we have real image directories
    face_dir = "./face_images"
    non_face_dir = "./non_face_images"
    
    if os.path.exists(face_dir) and os.path.exists(non_face_dir):
        print("Found image directories. Using real images for training.")
        trainer = FaceDetectionTrainer(face_dir, non_face_dir, use_synthetic=False)
    else:
        print("Image directories not found. Using synthetic data for demonstration.")
        print("To use real images, create directories:")
        print("  - ./face_images (containing your face images)")
        print("  - ./non_face_images (containing non-face images)")
        trainer = FaceDetectionTrainer(use_synthetic=True)
    
    # Run the complete training pipeline
    model, results = trainer.run_complete_training()
    
    print("\n" + "=" * 80)
    print("Final Summary:")
    print(f"  Test Accuracy: {results['evaluation_metrics']['Test']['accuracy']:.2%}")
    print(f"  Test F1 Score: {results['evaluation_metrics']['Test']['f1_score']:.4f}")
    print("=" * 80)
    
    # Test the prediction function
    print("\nTesting Prediction Function with Random Features:")
    test_features = np.random.randn(5, 12288)  # 5 random samples
    detected_count = prediction(test_features)
    print(f"Random input prediction count: {detected_count}")

if __name__ == "__main__":
    main()
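
The smoke test in `main()` passes random features through `prediction()`, which normalizes only if `normalization_mean.npy` and `normalization_std.npy` exist and otherwise feeds raw values to the network. Those files are written by training code not shown in this section, so the sketch below is an assumption about how they might be produced: per-feature statistics computed on the training split only (to avoid leaking validation or test data), a small `eps` added to the standard deviation so constant pixels do not divide by zero, and a temporary directory instead of the working directory. `fit_and_save_normalization` and `apply_normalization` are hypothetical names.

```python
import os
import tempfile

import numpy as np


def fit_and_save_normalization(X_train, out_dir, eps=1e-8):
    """Hypothetical helper: per-feature z-score statistics from the training split."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps  # eps (assumed) guards constant features against /0
    np.save(os.path.join(out_dir, "normalization_mean.npy"), mean)
    np.save(os.path.join(out_dir, "normalization_std.npy"), std)
    return mean, std


def apply_normalization(X, out_dir):
    """Hypothetical helper: the same (x - mean) / std step that prediction() applies."""
    mean = np.load(os.path.join(out_dir, "normalization_mean.npy"))
    std = np.load(os.path.join(out_dir, "normalization_std.npy"))
    return (X - mean) / std


# Small stand-in for flattened pixel features (the real vectors have 12,288 entries)
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 255, size=(20, 12))
X_train[:, 0] = 7.0  # a constant feature, e.g. a pixel that never changes

with tempfile.TemporaryDirectory() as d:
    mean, std = fit_and_save_normalization(X_train, d)
    Z = apply_normalization(X_train, d)
```

After normalization, each non-constant feature has roughly zero mean and unit variance on the training split, while the constant feature maps to 0 instead of producing `nan` or `inf`.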