# **Training a Facial Recognition Model Using the LFW Dataset**

#### **Description**

This notebook trains a facial recognition model on the Labeled Faces in the Wild (LFW) dataset. It walks through downloading and preparing the dataset, extracting face encodings, training a Support Vector Classifier (SVC), and evaluating the model's performance. The notebook covers:

1. **Downloading and Preparing the LFW Dataset**: Using `fetch_lfw_people` from the `sklearn.datasets` module to download and resize the dataset.

In [None]:
# This notebook trains a facial recognition model using the LFW dataset.
# It includes steps for downloading the dataset, extracting face encodings, and training a model.

import numpy as np
import pandas as pd
import os
import cv2
import dlib
import face_recognition
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
import joblib
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people

# PART 1: Download and Prepare the LFW Dataset
print("PART 1: Download and Prepare the LFW Dataset")

# Download the LFW dataset
lfw_people = fetch_lfw_people(min_faces_per_person=20, resize=0.5)
images = lfw_people.images
labels = lfw_people.target
target_names = lfw_people.target_names
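
`min_faces_per_person=20` only sets a lower bound, so the remaining classes can still be heavily imbalanced. A minimal sketch for checking the number of images per person is shown below. `class_counts` is a hypothetical helper, not part of the original notebook, and the toy arrays stand in for `lfw_people.target` and `lfw_people.target_names`.

```python
import numpy as np

def class_counts(labels, target_names):
    """Return (name, count) pairs sorted from most to fewest images."""
    counts = np.bincount(labels, minlength=len(target_names))
    order = np.argsort(-counts, kind="stable")
    return [(str(target_names[i]), int(counts[i])) for i in order]

# Toy stand-ins for lfw_people.target and lfw_people.target_names (illustrative only)
demo_labels = np.array([0, 1, 1, 2, 1, 0])
demo_names = np.array(["Person A", "Person B", "Person C"])

demo_counts = class_counts(demo_labels, demo_names)
for name, count in demo_counts:
    print(f"{name}: {count}")
```

In the notebook itself, the call would presumably be `class_counts(labels, target_names)`.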

2. **Extracting Face Encodings**: Converting the grayscale LFW images to RGB and extracting face encodings using the `face_recognition` library. Images in which no face is detected are skipped.

In [None]:
# PART 2: Extract Face Encodings
print("\nPART 2: Extract Face Encodings")

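def extract_face_encodings_from_lfw(images):
    """
    Extract face encodings from LFW images.
    
    Parameters:
    - images (numpy array): Array of grayscale images, as returned by fetch_lfw_people.
    
    Returns:
    - face_encodings (numpy array): One encoding per image in which a face was detected.
    - kept_indices (numpy array): Indices of the images that produced an encoding.
    """
    face_encodings = []
    kept_indices = []
    for i, image in enumerate(images):
        # dlib expects 8-bit images, but fetch_lfw_people returns float32 arrays
        # whose values may be scaled to [0, 1] depending on the scikit-learn version
        if image.max() <= 1.0:
            image = image * 255
        gray_image = image.astype(np.uint8)
        rgb_image = cv2.cvtColor(gray_image, cv2.COLOR_GRAY2RGB)
        face_encoding = face_recognition.face_encodings(rgb_image)
        if face_encoding:
            face_encodings.append(face_encoding[0])
            kept_indices.append(i)
    return np.array(face_encodings), np.array(kept_indices, dtype=int)

# Extract face encodings from LFW images
face_encodings, kept_indices = extract_face_encodings_from_lfw(images)

# Keep only the labels of images where a face was found,
# so that encodings and labels stay aligned
labels = labels[kept_indices]
print(f"Encoded {len(face_encodings)} of {len(images)} images")

# Save the face encodings and labels (create the output directory if needed)
os.makedirs('../data/processed', exist_ok=True)
np.save('../data/processed/face_encodings.npy', face_encodings)
np.save('../data/processed/labels.npy', labels)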
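
Because images without a detected face are skipped, it can help to confirm that the encodings and labels still line up before relying on the saved files. The sketch below is one possible sanity check. `check_extraction` is a hypothetical helper, and the default `encoding_size=128` reflects the 128-dimensional vectors that `face_recognition` produces.

```python
import numpy as np

def check_extraction(n_images, face_encodings, labels, encoding_size=128):
    """Validate encodings against labels and report how many images were dropped."""
    face_encodings = np.asarray(face_encodings)
    labels = np.asarray(labels)
    if len(face_encodings) != len(labels):
        raise ValueError(
            f"{len(face_encodings)} encodings but {len(labels)} labels; "
            "filter the labels to the images where a face was found"
        )
    if len(face_encodings) and face_encodings.shape[1] != encoding_size:
        raise ValueError(f"expected {encoding_size}-dimensional encodings")
    return {"kept": len(face_encodings), "dropped": n_images - len(face_encodings)}

# Toy data: 5 images, faces found in 4 of them (illustrative only)
demo_encodings = np.zeros((4, 128))
demo_labels = np.array([0, 1, 1, 2])
demo_report = check_extraction(5, demo_encodings, demo_labels)
print(demo_report)
```

A large `dropped` count may suggest that the images are too small for the face detector, in which case a larger `resize` value in `fetch_lfw_people` could be worth trying.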

3. **Loading and Preparing Data**: Loading the extracted face encodings and labels, and splitting them into training and testing sets.

In [None]:
# PART 3: Load and Prepare Data
print("\nPART 3: Load and Prepare Data")

# Load face encodings and labels
face_encodings = np.load('../data/processed/face_encodings.npy')
labels = np.load('../data/processed/labels.npy')

# Check data shapes and types
print(f"Face encodings shape: {face_encodings.shape}")
print(f"Labels shape: {labels.shape}")

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(face_encodings, labels, test_size=0.2, random_state=42)
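
The split above is not stratified, so a person with few usable images may end up only in the training set or only in the test set. `train_test_split` accepts a `stratify` argument, but it needs at least two samples per class. The sketch below shows one way to drop classes that are too small first. `drop_rare_classes` is a hypothetical helper, and whether to stratify at all is a judgment call.

```python
import numpy as np

def drop_rare_classes(X, y, min_count=2):
    """Keep only samples whose class appears at least min_count times."""
    classes, counts = np.unique(y, return_counts=True)
    frequent = classes[counts >= min_count]
    mask = np.isin(y, frequent)
    return X[mask], y[mask]

# Toy data: class 1 has a single sample and is removed (illustrative only)
demo_X = np.arange(12, dtype=float).reshape(6, 2)
demo_y = np.array([0, 0, 1, 2, 2, 2])
filtered_X, filtered_y = drop_rare_classes(demo_X, demo_y)
print(filtered_y)
```

The filtered arrays could then be passed to `train_test_split(filtered_X, filtered_y, test_size=0.2, random_state=42, stratify=filtered_y)`.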


4. **Training the Model**: Initializing and training an SVC model on the face encodings and labels, and saving the trained model.

In [None]:
# PART 4: Train the Model
print("\nPART 4: Train the Model")

# Initialize and train the model
model = SVC(kernel='linear', probability=True)
model.fit(X_train, y_train)

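# Save the trained model (create the output directory if it does not exist)
model_file = '../data/models/face_recognition_model.pkl'
os.makedirs(os.path.dirname(model_file), exist_ok=True)
joblib.dump(model, model_file)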
print(f"Model saved to {model_file}")
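
The model is trained with `probability=True`, which makes `model.predict_proba` available. The sketch below shows one way those probabilities could be turned into a name with a confidence score, falling back to "unknown" for low-confidence predictions. `label_predictions` and the `threshold` value are illustrative assumptions, and the toy arrays stand in for `model.predict_proba(X)`, `model.classes_`, and `target_names`.

```python
import numpy as np

def label_predictions(probabilities, classes, target_names, threshold=0.5):
    """Map each probability row to a name, or 'unknown' below the threshold."""
    probabilities = np.asarray(probabilities)
    best = probabilities.argmax(axis=1)
    results = []
    for row, col in zip(probabilities, best):
        confidence = float(row[col])
        # Columns of predict_proba follow the order of model.classes_
        name = str(target_names[classes[col]]) if confidence >= threshold else "unknown"
        results.append((name, confidence))
    return results

# Toy stand-ins for model.predict_proba(X), model.classes_ and target_names
demo_proba = np.array([[0.9, 0.1, 0.0], [0.4, 0.35, 0.25]])
demo_classes = np.array([0, 1, 2])
demo_names = np.array(["Person A", "Person B", "Person C"])
demo_results = label_predictions(demo_proba, demo_classes, demo_names)
print(demo_results)
```

A suitable threshold depends on the data and would need to be tuned on held-out examples.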


5. **Evaluating the Model**: Predicting labels for the test set, evaluating the model's performance using a confusion matrix and classification report, and visualizing the confusion matrix using Seaborn.

In [None]:

# PART 5: Evaluate the Model
print("\nPART 5: Evaluate the Model")

# Predict on the test set
y_pred = model.predict(X_test)

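# Evaluate the model
# Pass the full label set so the matrix and report line up with target_names
# even when some people do not appear in the test set
all_labels = np.arange(len(target_names))
conf_matrix = confusion_matrix(y_test, y_pred, labels=all_labels)
class_report = classification_report(y_test, y_pred, labels=all_labels, target_names=target_names, zero_division=0)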

print("Confusion Matrix:")
print(conf_matrix)

print("\nClassification Report:")
print(class_report)

# Visualize confusion matrix
plt.figure(figsize=(10, 7))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=target_names, yticklabels=target_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
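
Raw counts are hard to compare when some people have many more test images than others. A minimal sketch of row normalization is shown below: each row (true label) is divided by its total, so the diagonal becomes per-class recall. `normalize_rows` is a hypothetical helper, and the toy matrix stands in for `conf_matrix`.

```python
import numpy as np

def normalize_rows(conf_matrix):
    """Divide each row by its total; the diagonal is then per-class recall."""
    conf_matrix = np.asarray(conf_matrix, dtype=float)
    row_totals = conf_matrix.sum(axis=1, keepdims=True)
    # Rows with no true samples stay at zero instead of dividing by zero
    return np.divide(
        conf_matrix,
        row_totals,
        out=np.zeros_like(conf_matrix),
        where=row_totals > 0,
    )

# Toy confusion matrix; the third class has no test samples (illustrative only)
demo_matrix = np.array([[8, 2, 0], [1, 3, 0], [0, 0, 0]])
demo_normalized = normalize_rows(demo_matrix)
print(np.round(demo_normalized, 2))
```

The normalized matrix could be passed to `sns.heatmap` with `fmt='.2f'` in place of `fmt='d'`.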