# Preprocessing of face images
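This notebook preprocesses the face images so that the model trains on the facial region of each image, which is intended to improve its performance. The preprocessing consists of the following steps:

1. Install the necessary packages.
2. Download the dataset from Kaggle and unzip it.
3. Detect and crop the faces in every image.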

## Install the necessary packages

```
!pip install kaggle opencv-python tqdm
```

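## Download and unzip the dataset with the Kaggle CLI

The Kaggle CLI needs API credentials, typically a `kaggle.json` token placed in `~/.kaggle/` (or the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables).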

```
!kaggle datasets download -d yousefmohamed20/sentiment-images-classifier
!unzip sentiment-images-classifier.zip -d emotions_dataset
```
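Before cropping, it can help to confirm that the archive extracted to the layout the cropping code expects, `emotions_dataset/6 Emotions for image classification/<class>/`, and to see how many images each class holds. The sketch below is a minimal check under two assumptions: that layout, and the set of file extensions treated as images (adjust it to the files actually present). `count_images_per_class` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path

# Extensions treated as images: an assumption, adjust to the actual dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images_per_class(root):
    """Return {class_name: image_count} for each sub-directory of `root`."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    # Path assumed to match the input_dir used by the cropping cell
    root = Path("emotions_dataset/6 Emotions for image classification")
    if root.is_dir():
        for name, n in count_images_per_class(root).items():
            print(f"{name}: {n} images")
    else:
        print(f"{root} not found; check the folder created by unzip")
```

A class with far fewer images than the others is worth noticing here, since cropping can only reduce the counts further.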

## Face Detection and Cropping Preprocessing

This step detects and crops the faces in the emotions dataset so that the images focus on facial expressions. The process:

1. Uses OpenCV's Haar cascade classifier for face detection
2. Takes images from the original '6 Emotions for image classification' dataset
3. For each emotion class:
   - Detects faces in every readable image
   - Crops each detected face
   - Saves each face as a separate JPEG named `<original name>_face<i>.jpg` in a new directory structure, so an image with several detections yields several crops and an image with no detection yields none
   - Maintains the emotion labels by keeping the class folder organization

The cropped dataset will be stored in `emotions_dataset/emotions_dataset_cropped_faces`, preserving the original class structure but containing only the detected facial regions. This helps the model focus on relevant facial features during training.
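The `detectMultiScale` call in the cropping code decides which face sizes are searched. The cascade classifies fixed-size windows: `scaleFactor=1.1` grows the effective window by 10% per step, `minSize=(30, 30)` discards windows smaller than 30×30 pixels, and `minNeighbors=5` keeps a detection only when enough overlapping candidate windows agree, so larger values trade missed faces for fewer false positives. OpenCV implements the size sweep by repeatedly shrinking the image and scanning it with the cascade's base window, which covers the same face sizes. The sketch below approximates that sweep for a square image. The 24×24 base window is an assumption about `haarcascade_frontalface_default.xml`, and `candidate_window_sizes` is a hypothetical helper, not an OpenCV function.

```python
def candidate_window_sizes(image_size, base_window=24, scale_factor=1.1, min_size=30):
    """Approximate the square face sizes a Haar cascade scans.

    The window starts at the cascade's base size (assumed 24 px here) and
    grows by `scale_factor` each step. Sizes below `min_size` are skipped,
    and the sweep stops once the window no longer fits inside the image.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    while True:
        window = round(base_window * factor)
        if window > image_size:
            break  # window no longer fits: stop the sweep
        if window >= min_size:
            sizes.append(window)
        factor *= scale_factor
    return sizes


if __name__ == "__main__":
    print(candidate_window_sizes(48))
```

For a 48×48 image this gives `[32, 35, 39, 43, 47]`, only five sizes. A smaller `scaleFactor` such as 1.05 searches face sizes more finely at the cost of more detection time.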

Required libraries:
- OpenCV (cv2) for image processing and face detection
- tqdm for progress tracking
- os for file system operations

```python
import os

import cv2
from tqdm import tqdm

# Load OpenCV's bundled frontal-face Haar cascade
haar_cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(haar_cascade_path)
if face_cascade.empty():
    # A failed load would otherwise silently detect no faces at all
    raise IOError(f"Could not load Haar cascade from {haar_cascade_path}")

# Input and output directories
input_dir = 'emotions_dataset/6 Emotions for image classification'  # Original dataset
output_dir = 'emotions_dataset/emotions_dataset_cropped_faces'  # Directory to save cropped faces

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Loop through the class folders (one per emotion)
for class_name in sorted(os.listdir(input_dir)):
    class_path = os.path.join(input_dir, class_name)
    if not os.path.isdir(class_path):
        continue

    # Create the matching class folder in the output directory
    output_class_path = os.path.join(output_dir, class_name)
    os.makedirs(output_class_path, exist_ok=True)

    # Loop through images in the class folder
    for img_name in tqdm(sorted(os.listdir(class_path)), desc=f"Processing {class_name}"):
        img_path = os.path.join(class_path, img_name)

        # Read the image as 3-channel BGR; imread returns None for unreadable files
        img = cv2.imread(img_path)
        if img is None:
            continue  # Skip if image is invalid

        # Haar cascades operate on single-channel images
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Detect faces as (x, y, w, h) boxes; the result is empty if none are found
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

        # Save every detected face as its own file
        for i, (x, y, w, h) in enumerate(faces):
            # Crop the face from the color image (rows are y, columns are x)
            face = img[y:y+h, x:x+w]

            output_img_name = f"{os.path.splitext(img_name)[0]}_face{i}.jpg"
            output_img_path = os.path.join(output_class_path, output_img_name)
            cv2.imwrite(output_img_path, face)
```
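Because the loop writes one file per detected face and nothing for images without a detection, a cropped class folder can hold a different number of images than the original. The sketch below is one way to check this per class after the run. It relies only on the `<stem>_face<i>.jpg` naming used above. `audit_crops` is a hypothetical helper, and the paths in the `__main__` block are the ones hard-coded in the cropping cell.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the naming scheme of the cropping cell: "<original stem>_face<i>.jpg"
CROP_NAME = re.compile(r"^(?P<stem>.+)_face(?P<index>\d+)\.jpg$")


def audit_crops(input_class_dir, output_class_dir):
    """Compare one class folder before and after cropping.

    Returns (no_face, multi_face): a sorted list of source stems that
    produced no crop, and a {stem: crop_count} dict for sources that
    produced more than one crop.
    """
    crops_per_stem = Counter()
    for f in Path(output_class_dir).iterdir():
        match = CROP_NAME.match(f.name)
        if match:
            crops_per_stem[match.group("stem")] += 1
    source_stems = {f.stem for f in Path(input_class_dir).iterdir() if f.is_file()}
    no_face = sorted(source_stems - set(crops_per_stem))
    multi_face = {stem: n for stem, n in crops_per_stem.items() if n > 1}
    return no_face, multi_face


if __name__ == "__main__":
    in_root = Path("emotions_dataset/6 Emotions for image classification")
    out_root = Path("emotions_dataset/emotions_dataset_cropped_faces")
    if in_root.is_dir() and out_root.is_dir():
        for class_dir in sorted(p for p in in_root.iterdir() if p.is_dir()):
            out_dir = out_root / class_dir.name
            if not out_dir.is_dir():
                continue
            no_face, multi = audit_crops(class_dir, out_dir)
            print(f"{class_dir.name}: {len(no_face)} without a face, "
                  f"{len(multi)} with several faces")
```

Images listed under "several faces" can include false positives that carry the class label into training, so they may be worth inspecting by hand. Two source files that share a stem but differ in extension (for example `a.jpg` and `a.png`) write to the same crop names, so the later one overwrites the earlier and the audit cannot tell them apart.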