# Preprocessing of face images
To improve the performance of the model, this notebook preprocesses the face images before training.
The preprocessing consists of the following steps:

## Install the necessary packages

In [1]:
!pip install kaggle opencv-python tqdm

Collecting kaggle
  Downloading kaggle-1.6.17.tar.gz (82 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82.7/82.7 kB 1.8 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting opencv-python
  Using cached opencv_python-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting tqdm
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting certifi>=2023.7.22 (from kaggle)
  Downloading certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting requests (from kaggle)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting python-slugify (from kaggle)
  Downloading python_slugify-8.0.4-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting urllib3 (from kaggle)
  Using cached urllib3-2.2.3-py3-none-any.whl.meta

## Use kaggle to download the dataset and unzip it

In [2]:
!kaggle datasets download -d yousefmohamed20/sentiment-images-classifier
!unzip sentiment-images-classifier.zip -d emotions_dataset

Dataset URL: https://www.kaggle.com/datasets/yousefmohamed20/sentiment-images-classifier
License(s): apache-2.0
Downloading sentiment-images-classifier.zip to /home/rtorrero/Workspace/convomo
100%|████████████████████████████████████████| 114M/114M [00:05<00:00, 28.7MB/s]
100%|████████████████████████████████████████| 114M/114M [00:05<00:00, 23.5MB/s]
Archive:  sentiment-images-classifier.zip
  inflating: emotions_dataset/6 Emotions for image classification/anger/-win-holding-his-fists-shout-wow-mature-hispanic-man-happy-his-win-122652456.jpg  
  inflating: emotions_dataset/6 Emotions for image classification/anger/GBP-scam.jpg  
  inflating: emotions_dataset/6 Emotions for image classification/anger/I-hate-my-job.jpeg  
  inflating: emotions_dataset/6 Emotions for image classification/anger/Learn-How-to-Protect-Yourself-from-Aggressive-Drivers-on-the-Road-1024x683.jpg  
  inflating: emotions_dataset/6 Emotions for image classification/anger/OIP.0846U2L7OhuSwKhrPv2QyAHaE8.jpg  
  infla

## Face Detection and Cropping Preprocessing

This preprocessing step performs face detection and cropping on the emotions dataset to focus on facial expressions. The process:

1. Uses OpenCV's Haar Cascade classifier for face detection
2. Takes images from the original '6 Emotions for image classification' dataset
3. For each emotion class:
   - Detects faces in every image
   - Crops the detected faces
   - Saves the cropped faces in a new directory structure
   - Maintains the emotion labels by keeping the class folder organization
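The cropping step relies on one detail that is easy to get backwards: OpenCV reports each face as (x, y, w, h), where x is a column offset and y is a row offset, while the image array is indexed rows first. The sketch below illustrates this with plain Python lists standing in for the NumPy array that `cv2.imread` returns; `crop_box` is a hypothetical helper, not part of OpenCV.

```python
def crop_box(image_rows, x, y, w, h):
    """Crop an (x, y, w, h) box from an image stored as a list of rows.

    Hypothetical helper: mirrors the slicing img[y:y+h, x:x+w] used on
    NumPy arrays, where rows (y) are indexed before columns (x).
    """
    return [row[x:x + w] for row in image_rows[y:y + h]]


# Toy 4x5 "image" in which every pixel stores its own (row, col) position
image = [[(r, c) for c in range(5)] for r in range(4)]

face = crop_box(image, x=1, y=2, w=3, h=2)
print(len(face), len(face[0]))  # 2 3  (h rows, w columns)
print(face[0][0])               # (2, 1)  (top-left pixel is row y, column x)
```

The crop has h rows and w columns, and its top-left pixel comes from row y and column x of the source image.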

The cropped dataset is stored in 'emotions_dataset/emotions_dataset_cropped_faces', which preserves the original class structure but contains only the detected facial regions. This helps the model focus on relevant facial features during training. Images in which no face is detected produce no output file, and images with several detected faces produce one crop per face.
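As a minimal sketch of how the class structure carries over to the output, the function below builds the path for one cropped face. It follows the `_face{i}.jpg` naming used in this notebook; `cropped_face_path` itself is a hypothetical helper introduced only for illustration.

```python
import os


def cropped_face_path(output_dir, class_name, img_name, face_index):
    """Return the output path for one cropped face (hypothetical helper).

    The class folder is kept, the original extension is dropped, and the
    face index is appended so several faces from one image do not collide.
    """
    stem = os.path.splitext(img_name)[0]
    return os.path.join(output_dir, class_name, f"{stem}_face{face_index}.jpg")


print(cropped_face_path(
    'emotions_dataset/emotions_dataset_cropped_faces',
    'anger',
    'I-hate-my-job.jpeg',
    0,
))
```

Because the class name stays in the path, the emotion label of each crop can still be read from its parent folder.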

Required libraries:
- OpenCV (cv2) for image processing and face detection
- tqdm for progress tracking
- os for file system operations

In [3]:
import cv2
import os
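from tqdm import tqdm

# Load the frontal-face Haar cascade that ships with OpenCV
haar_cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(haar_cascade_path)
if face_cascade.empty():
    # A missing or unreadable cascade file would otherwise fail silently
    raise IOError(f"Could not load Haar cascade from {haar_cascade_path}")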

# Input and output directories
input_dir = 'emotions_dataset/6 Emotions for image classification'  # Original dataset
output_dir = 'emotions_dataset/emotions_dataset_cropped_faces'  # Directory to save cropped faces

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Loop through the dataset
for class_name in os.listdir(input_dir):
    class_path = os.path.join(input_dir, class_name)
    if not os.path.isdir(class_path):
        continue

    # Create class folder in the output directory
    output_class_path = os.path.join(output_dir, class_name)
    os.makedirs(output_class_path, exist_ok=True)

    # Loop through images in the class folder
    for img_name in tqdm(os.listdir(class_path), desc=f"Processing {class_name}"):
        img_path = os.path.join(class_path, img_name)

        # Read the image
        img = cv2.imread(img_path)
        if img is None:
            continue  # Skip if image is invalid

        # Convert to grayscale for face detection
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Detect faces
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

        # Process each detected face
        for i, (x, y, w, h) in enumerate(faces):
            # Crop the face
            face = img[y:y+h, x:x+w]

            # Save the face
            output_img_name = f"{os.path.splitext(img_name)[0]}_face{i}.jpg"
            output_img_path = os.path.join(output_class_path, output_img_name)
            cv2.imwrite(output_img_path, face)

Processing happy: 100%|██████████| 230/230 [00:04<00:00, 48.95it/s]
Processing sad: 100%|██████████| 224/224 [00:03<00:00, 68.14it/s]
Processing disgust: 100%|██████████| 201/201 [00:04<00:00, 46.60it/s]
Processing anger: 100%|██████████| 214/214 [00:03<00:00, 59.75it/s]
Processing pain: 100%|██████████| 168/168 [00:05<00:00, 31.91it/s]
Processing fear: 100%|██████████| 163/163 [00:03<00:00, 50.88it/s]
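The progress bars above count source images, not saved crops. One possible way to check how many cropped faces each class ended up with is to count the files in the output folders. This is a sketch only: `count_images_per_class` and the `IMAGE_EXTS` tuple are hypothetical names, and no counts are shown because they depend on the detector's results.

```python
import os

# Assumed set of image extensions; adjust if the dataset uses others
IMAGE_EXTS = ('.jpg', '.jpeg', '.png')


def count_images_per_class(root_dir):
    """Return {class_name: number of image files} for each subfolder of root_dir."""
    counts = {}
    for class_name in sorted(os.listdir(root_dir)):
        class_path = os.path.join(root_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # ignore stray files next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_path)
            if name.lower().endswith(IMAGE_EXTS)
        )
    return counts


cropped_dir = 'emotions_dataset/emotions_dataset_cropped_faces'
if os.path.isdir(cropped_dir):
    for class_name, n in count_images_per_class(cropped_dir).items():
        print(f"{class_name}: {n} cropped faces")
```

Comparing these counts with the number of source images per class gives a rough idea of how many images yielded no detection or more than one.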
