# Facial Recognition Pipeline

This project implements a facial recognition system that identifies faces by converting them into **128-dimensional vector embeddings**. Faces belonging to the same person should produce embeddings that are close together, while embeddings from different people should be farther apart. A simple classifier (like **KNN** or **SVM**) then maps these embeddings to identities.

![Facial Recognition Pipeline](./examples/fast-five-1200-1200-675-675-crop-000000.jpg)

---

First, we need to install the required libraries. You can do this by running the following command in your terminal:

```bash
pip install face_recognition opencv-python scikit-learn imutils icrawler dlib

## Dataset

For this facial recognition project, we'll create a custom dataset of **Fast & Furious** actors. The dataset will include images of 10 main characters from the franchise:

- **Vin Diesel** (Dominic Toretto)
- **Paul Walker** (Brian O'Conner) 
- **Dwayne Johnson** (Luke Hobbs)
- **Michelle Rodriguez** (Letty Ortiz)
- **Tyrese Gibson** (Roman Pearce)
- **Ludacris** (Tej Parker)
- **Jordana Brewster** (Mia Toretto)
- **Gal Gadot** (Gisele Yashar)
- **Sung Kang** (Han Seoul-Oh)
- **Jason Statham** (Deckard Shaw)

### Dataset Structure
```
dataset/
├── Vin_Diesel/
│   ├── 000001.jpg
│   ├── 000002.jpg
│   └── ... (30 images)
├── Paul_Walker/
│   ├── 000001.jpg
│   └── ... (30 images)
└── ... (other actors)
```

### Data Collection
We'll automatically download **30 images per actor** from Google Images using the `icrawler` library. This provides us with a diverse set of facial images for training our recognition model. The images will be stored in separate folders for each actor, making it easy to extract labels during the encoding process.


In [None]:
from icrawler.builtin import GoogleImageCrawler
import os

actors = [
    "Tyrese Gibson",
]

base_dir = "dataset"
os.makedirs(base_dir, exist_ok=True)

def download_images(actor_name, max_images=30):
    """Download images of the actor from Google."""
    actor_dir = os.path.join(base_dir, actor_name.replace(" ", "_"))
    os.makedirs(actor_dir, exist_ok=True)
    
    google_crawler = GoogleImageCrawler(storage={"root_dir": actor_dir})
    google_crawler.crawl(keyword=actor_name + " photoshoot", max_num=max_images)
    print(f"✅ Downloaded images for {actor_name}")

# Download images for all actors
for actor in actors:
    download_images(actor, max_images=30)

print("🎯 Dataset ready in 'dataset/'")

## 🚀 Overview of the Pipeline

The system is divided into 4 main stages:

### 1. Detecting Faces (HOG-based Detection)

The first step is locating faces in images using **Histogram of Oriented Gradients (HOG)**. It’s an efficient algorithm that works well for frontal faces.

#### How it works:

- **1.1 Grayscale conversion:**  
  Each image is first converted to grayscale to reduce computational complexity.

- **1.2 Compute pixel gradients:**  
  For every pixel, we calculate how dark it is compared to its immediate neighbors. This gives us a gradient vector showing the direction of intensity change.

- **1.3 Aggregate gradient directions:**  
  Since computing this for every pixel is too detailed, we divide the image into small cells (typically 16×16 pixels). For each cell, we count how many gradients point in each major direction (e.g., 0°, 45°, 90°, etc.) and summarize the cell with a dominant direction.

- **1.4 Compare with face templates:**  
  The final HOG representation is compared with a known HOG pattern for a face to determine whether a face is present and where it is.
```python
import face_recognition

image = face_recognition.load_image_file("example.jpg")
face_locations = face_recognition.face_locations(image, model="hog")

```
This gives us bounding boxes for the detected faces.

![HOG Detection Example](./assets/face_detection.jpg)

---

### 2. Aligning Faces (Facial Landmark Projection)

Once we detect a face, we want to align it so that facial features (like eyes and mouth) are in consistent positions across all images. This is important for the encoder to generate meaningful embeddings.

We do this by finding **68 facial landmarks** for each face and warping the image accordingly.

#### Steps:

* A pre-trained model predicts the coordinates of 68 key facial features.
* We use these points to **rotate, scale, and warp** the face so it is centered and normalized in the frame.

#### Landmark Detection Details:

Landmark detection models are typically based on **regression trees** or **deep CNNs**. One well-known implementation is Dlib’s shape predictor, which was trained on labeled facial images to output consistent landmark coordinates. More advanced methods like **OpenFace** or **MediaPipe Face Mesh** use neural networks for more robust landmark estimation.

```python
import dlib
import cv2
import face_recognition

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image_bgr = cv2.imread("./examples/fast-five-1200-1200-675-675-crop-000000.jpg")
gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
face_locations = face_recognition.face_locations(image_rgb, model="hog")

for (top, right, bottom, left) in face_locations:
    face_rect = dlib.rectangle(left, top, right, bottom)
    
    landmarks = predictor(gray, face_rect)
```
![Facial Landmark Example](./assets/landmarks_result.jpg)
---

### 3. Encoding Faces (128-Dimensional Embedding)

After alignment, each face is passed through a **deep convolutional neural network** to extract a 128-dimensional embedding vector.

This vector is trained to capture the identity of the person in such a way that:

* Embeddings of the **same person** have a small Euclidean distance.
* Embeddings of **different people** are far apart.

We use a pre-trained encoder (like the one provided by `face_recognition`, based on dlib and similar to OpenFace) to handle this step.

```python
face_encodings = face_recognition.face_encodings(image, known_face_locations=face_locations)
```

Each face encoding is just a NumPy array of 128 float values.

---

### 4. Classifying Faces (KNN / SVM)

At this point, face recognition becomes a classification problem:

* For each person in your dataset, compute and store their embeddings.
* When a new image comes in, generate its embedding and **compare it** to the stored ones.

We can use:

* **K-Nearest Neighbors (KNN):**
  A simple and effective method. Given a new embedding, find the k closest embeddings in the dataset and vote.

* **Support Vector Machine (SVM):**
  Useful when you want a trained decision boundary between classes.

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(known_embeddings, known_labels)

prediction = knn.predict([new_embedding])
```
![Classification Example](./assets/classification.jpg)

---

Let's implement the pipeline using out dataset of Fast & Furious actors.

Import the necessary libraries:

In [2]:
from imutils import paths
import cv2
import face_recognition
import argparse
import pickle
import os

Define our dataset path

In [3]:
data_path = "./dataset"
encoding_path = "./encodings"

In [4]:
imagePaths = list(paths.list_images(data_path))
knownEncodings = []
knownNames = []

Encode the faces using the `face_recognition` library:


In [None]:
for (i, imagePath) in enumerate(imagePaths):
	# extract the person name from the image path
	print("[INFO] processing image {}/{}".format(i + 1,
		len(imagePaths)))
	name = imagePath.split(os.path.sep)[-2]
	print(name)
	# load the input image and convert it from BGR (OpenCV ordering)
	# to dlib ordering (RGB)
	image = cv2.imread(imagePath)
	rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
	boxes = face_recognition.face_locations(rgb,
		model="hog")
	# compute the facial embedding for the face
	encodings = face_recognition.face_encodings(rgb, boxes)
	# loop over the encodings
	for encoding in encodings:
		# add each encoding + name to our set of known names and
		# encodings
		knownEncodings.append(encoding)
		knownNames.append(name)

Save our embeddings and labels for later use:

In [7]:
data = {"encodings": knownEncodings, "names": knownNames}
with open(encoding_path+"/encoding.pkl", "wb") as f:
    f.write(pickle.dumps(data))
print("[INFO] Face encodings saved successfully!")

[INFO] Face encodings saved successfully!


Now that we've encoded all the faces in our dataset, let's see how the recognition process works.

#### 1. **Load Saved Encodings**
```python
data = pickle.loads(open(encoding_path+"/encoding.pkl", "rb").read())
```
We load our previously saved face encodings and corresponding names from the pickle file.

#### 2. **Process Each Test Image**
For every image in our `./examples/` folder:
- **Load the image** using OpenCV
- **Convert color space** from BGR to RGB (required by face_recognition library)
- **Detect faces** using HOG-based detection
- **Generate encodings** for detected faces

#### 3. **Face Matching Algorithm**
For each detected face encoding:

**Step 3.1: Compare with Known Faces**
```python
matches = face_recognition.compare_faces(data["encodings"], encoding)
```
This compares the current face encoding with all encodings in our dataset, returning a boolean list. The function calculates the Euclidean distance between the new face encoding and each of the known encodings in your list.

**Step 3.2: Find Best Match Using Voting**
```python
if True in matches:
    matchedIdxs = [i for (i, b) in enumerate(matches) if b]
    counts = {}
    for i in matchedIdxs:
        name = data["names"][i]
        counts[name] = counts.get(name, 0) + 1
    name = max(counts, key=counts.get)
```

This implements a **voting mechanism**:
- Collect all matching face indices
- Count votes for each person name
- Select the name with the most votes (handles multiple encodings per person)


This testing phase validates that our face recognition pipeline can successfully identify the Fast & Furious actors from new images not used during the encoding phase.

In [8]:
data = pickle.loads(open(encoding_path+"/encoding.pkl", "rb").read())
for imagePath in list(paths.list_images("./examples")):
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    boxes = face_recognition.face_locations(rgb, model="hog")
    encodings = face_recognition.face_encodings(rgb, boxes)
    
    names = []
    for encoding in encodings:
        matches = face_recognition.compare_faces(data["encodings"], encoding)
        name = "Unknown"
        
        if True in matches:
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1
            name = max(counts, key=counts.get)
        
        names.append(name)
    print(f"Names detected in {imagePath}: {names}")
    for ((top, right, bottom, left), name) in zip(boxes, names):
        cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
    cv2.imshow("Image", image)
    cv2.imwrite(f"./output/{os.path.basename(imagePath)}", image)
    cv2.waitKey(0)
        

Names detected in ./examples/fast-five-1200-1200-675-675-crop-000000.jpg: ['Jordana_Brewster', 'Vin_Diesel', 'Paul_Walker', 'Gal_Gadot', 'Dwayne_Johnson']


## 🎬 Real-Time Video Face Recognition

Now let's apply our trained face recognition system to process video files. This demonstrates how the pipeline performs on moving images with multiple faces appearing and disappearing throughout the video.

### Video Processing Features:

- **Frame-by-frame analysis**: Process each video frame to detect and identify faces
- **Optimized performance**: Skip frames to balance accuracy with processing speed
- **Real-time display**: Show recognition results as the video plays
- **Output generation**: Save processed video with face annotations
- **Smooth tracking**: Use previous frame results for skipped frames to maintain consistency

### Video Demo:

<div align="center">
  <img src="./assets/fast5.gif" alt="Video Demo" />
</div>

In [12]:
data = pickle.loads(open(encoding_path+"/encoding.pkl", "rb").read())

video_path = ".video/fast5-lq.mp4" # Path to your video file
output_path = "./output/video/output_fast5.mp4" # Path to save the output video
vs = cv2.VideoCapture(video_path)
print(f"[INFO] processing video: {video_path}...")

# Get video properties for output video
fps = int(vs.get(cv2.CAP_PROP_FPS))
width = int(vs.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(vs.get(cv2.CAP_PROP_FRAME_HEIGHT))

print(f"[INFO] Video properties: FPS={fps}, Width={width}, Height={height}")



# Frame skipping configuration
frame_skip = 3  # Process every 6th frame (adjust this value as needed)
frame_count = 0
output_fps = 20

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_path, fourcc, output_fps, (width, height))
# Variables to store last detection results
last_boxes = []
last_names = []

while True:
    # Grab the next frame from the video stream
    (grabbed, frame) = vs.read()

    # If the frame was not grabbed, then we have reached the end of the stream
    if not grabbed:
        break

    frame_count += 1

    # Only process every nth frame for face detection
    if frame_count % frame_skip == 0:
        # --- YOUR LOGIC (applied to 'frame' instead of 'image') ---
        # Convert the input frame from BGR to RGB
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Detect face locations and compute encodings
        boxes = face_recognition.face_locations(rgb, model="hog")
        encodings = face_recognition.face_encodings(rgb, boxes)
        names = []

        # Loop over the facial embeddings
        for encoding in encodings:
            matches = face_recognition.compare_faces(data["encodings"], encoding)
            name = "Unknown"

            if True in matches:
                matchedIdxs = [i for (i, b) in enumerate(matches) if b]
                counts = {}
                for i in matchedIdxs:
                    name = data["names"][i]
                    counts[name] = counts.get(name, 0) + 1
                name = max(counts, key=counts.get)
            
            names.append(name)
        
        # Store the results for use in skipped frames
        last_boxes = boxes
        last_names = names
    else:
        # Use the last detection results for skipped frames
        boxes = last_boxes
        names = last_names

    # Loop over the recognized faces to draw boxes and names
    for ((top, right, bottom, left), name) in zip(boxes, names):
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

    # Write the frame to the output video
    out.write(frame)

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # If the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# --- NEW: Clean up ---
print("[INFO] cleaning up...")
vs.release()
out.release()  # Release the video writer
cv2.destroyAllWindows()
print(f"[INFO] Output video saved to: {output_path}")

[INFO] processing video: .video/fast5-lq.mp4...
[INFO] Video properties: FPS=0, Width=0, Height=0
[INFO] cleaning up...
[INFO] Output video saved to: ./output/video/output_fast5.mp4
