
# prompt: Don't Generate Code Understand Context and Generate Documentation
# Company X owns a movie application and repository that streams movies to millions of users on a subscription basis. The company wants to automate the display of cast and crew information for each scene, so that when a user pauses the movie and clicks the cast information button, the app shows details of the actors in the scene. The company's in-house computer vision and multimedia experts need to detect faces in screenshots of movie scenes.
# Objective
# Part A: To build a face detection system
# Part B: To create an image dataset to be used by the AI team to build an image classifier
# Part C: To build a face recognition system

## Project: Automated Cast & Crew Information for Movie Application

**Objective:** Automate the display of cast and crew information in movie scenes within Company X's streaming application. This involves detecting faces, identifying individuals, and providing details upon user request.

**Project Breakdown:**

**Part A: Face Detection System**

* **Goal:** Develop a system capable of accurately detecting faces within movie screenshots.
* **Deliverables:**
    * A robust face detection model.
    * API endpoints for processing movie screenshots and returning bounding boxes around detected faces.
* **Considerations:**
    * **Performance:** The system needs to be efficient enough to provide real-time results without impacting the user experience.
    * **Accuracy:** High accuracy in face detection is crucial to avoid misidentifications and provide a reliable experience.
    * **Variations in Lighting and Quality:** The model should be robust to varying image quality and lighting conditions commonly encountered in movies.
    * **Occlusion and Pose:**  The model should perform well even when faces are partially obscured or at different angles.
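
The accuracy consideration above can be made measurable with intersection over union (IoU), a common score for comparing a predicted bounding box with a ground-truth box. The following is a minimal sketch, assuming boxes in `(x, y, w, h)` pixel format; the function name `box_iou` is illustrative and not part of the project code.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the overlapping rectangle (if there is one)
    left, top = max(ax, bx), max(ay, by)
    right, bottom = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, right - left) * max(0, bottom - top)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A detection is often counted as correct when its IoU with a ground-truth box exceeds a chosen cut-off such as 0.5; the right cut-off for this application is a design decision.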

**Part B: Image Dataset Creation**

* **Goal:** Build a labeled dataset of images for training a face recognition system.
* **Deliverables:**
    * A comprehensive dataset containing images of actors/crew members from the movie library.
    * Corresponding labels for each image, identifying the individuals present.
* **Methodology:**
    * **Data Source:** Extract frames from movies in the application's repository.
    * **Annotation:** Use the face detection system from Part A to locate faces and then manually verify and label the detected faces with the correct individual's name.
    * **Data Augmentation:** Employ techniques like rotation, scaling, and cropping to increase dataset size and improve model robustness.
    * **Dataset Splitting:** Divide the dataset into training, validation, and test sets.
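
The dataset splitting step can be sketched with two calls to scikit-learn's `train_test_split`. The 70/15/15 ratio, the toy arrays, and the use of `stratify` to keep every identity represented in each split are assumptions made for illustration, not choices fixed by the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins: 100 samples, 4 identities with 25 images each (illustrative only)
X = np.arange(100).reshape(100, 1)
y = np.repeat(["a", "b", "c", "d"], 25)

# First carve out 30% as a holdout, then split that holdout evenly
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=42)

print(len(X_train), len(X_val), len(X_test))
```

For movie frames it may also be worth splitting by movie or scene rather than by frame, since near-identical neighbouring frames in different splits would inflate test scores.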

**Part C: Face Recognition System**

* **Goal:** Develop a system that recognizes faces in movie scenes and links them to the corresponding cast/crew member information.
* **Deliverables:**
    * A high-accuracy face recognition model.
    * API endpoints to process detected faces and return the identity of the person in the image.
    * Integration with the movie application to display information when a user interacts with the "cast information" button.
* **Methodology:**
    * **Model Selection:** Select or train a deep learning model suitable for face recognition using the dataset from Part B.
    * **Training and Evaluation:** Train the selected model and rigorously evaluate its performance using standard metrics (e.g., accuracy, precision, recall).  Iteratively improve the model's performance by adjusting parameters and/or acquiring additional data.
    * **Integration:** Integrate the face recognition model into the existing movie application's backend to process face images in real-time.
* **Considerations:**
    * **Scalability:** The system must handle a large number of simultaneous requests.
    * **Privacy:** Adhere to data privacy regulations.
    * **Maintenance:** Implement procedures for model updates and retraining.

**Overall Project Success Metrics:**

* Accuracy of face detection and recognition.
* User satisfaction (e.g., measured through feedback surveys or engagement metrics).
* System performance (e.g., response time, resource utilization).

This detailed plan outlines the necessary steps, deliverables, and considerations for implementing the automated cast and crew information system for Company X's movie application.


**Data Dictionary**

**Part A**

**label:** Identifies the object in the image

**notes:** Additional comments (currently empty)

**points:** Co-ordinates of the mask (top-left and bottom-right)

**imageWidth:** Width of the image in pixels

**imageHeight:** Height of the image in pixels
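
Given these fields, a pixel mask can be rasterised from the `points` of each annotation. The exact layout of `points` is not shown in this notebook, so the sketch below assumes each annotation is a dict whose `points` entry holds two `{'x': ..., 'y': ...}` corners normalised to the range [0, 1]. The helper name `annotations_to_mask` is hypothetical.

```python
import numpy as np

def annotations_to_mask(annotations, height, width):
    """Build a binary mask (1 inside every face box) from a list of annotation dicts.

    Assumption: each annotation has a 'points' list holding the top-left and
    bottom-right corners as {'x': ..., 'y': ...} with values normalised to [0, 1].
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for ann in annotations:
        p1, p2 = ann["points"]
        # Scale normalised corners to pixel indices and order them
        x1, x2 = sorted((int(p1["x"] * width), int(p2["x"] * width)))
        y1, y2 = sorted((int(p1["y"] * height), int(p2["y"] * height)))
        mask[y1:y2, x1:x2] = 1
    return mask
```

If the coordinates turn out to be absolute pixels instead, the multiplication by `width` and `height` should be dropped.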


**Part B**

The dataset comprises facial images that may contain either multiple individuals or a single individual per image.

**Part C**

This dataset consists of 10,770 images collected from Pinterest, featuring 100 individuals.

In [8]:
# prompt: Import the data "Images.npy" - Split the images and their mask into two objects - Resize the images and masks to the same shape and visualize the original and masked images - Split the data into train and test

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import cv2

# Load the data
data = np.load('/content/Images.npy',allow_pickle=True)

# Split images and masks
images = []
masks = []
for item in data:
    images.append(item[0])
    masks.append(item[1])

images = np.array(images)
masks = np.array(masks)

# Resize images and masks
image_height = 256
image_width = 256
resized_images = []
resized_masks = []

for img, mask in zip(images, masks):
    resized_img = cv2.resize(img, (image_width, image_height))
    resized_mask = cv2.resize(mask, (image_width, image_height))

    resized_images.append(resized_img)
    resized_masks.append(resized_mask)

resized_images = np.array(resized_images)
resized_masks = np.array(resized_masks)


# Visualize a few examples
for i in range(3):
    plt.figure(figsize=(8, 4))
    plt.subplot(1, 2, 1)
    plt.imshow(resized_images[i])
    plt.title(f"Original Image {i+1}")

    plt.subplot(1, 2, 2)
    plt.imshow(resized_masks[i])
    plt.title(f"Mask {i+1}")
    plt.show()

# Split the data
X_train, X_test, y_train, y_test = train_test_split(resized_images, resized_masks, test_size=0.2, random_state=42)

print("Training data shape:", X_train.shape)
print("Test data shape:", X_test.shape)

UnpicklingError: pickle data was truncated
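
The error above usually indicates that the file on disk is incomplete, for example after an interrupted upload. A minimal sketch of a more defensive load follows; the wrapper name `load_object_array` is hypothetical, and the set of caught exceptions is an assumption about how a truncated file tends to fail.

```python
import os
import pickle
import numpy as np

def load_object_array(path):
    """Load a pickled .npy file, returning None if the file cannot be fully read."""
    try:
        return np.load(path, allow_pickle=True)
    except (pickle.UnpicklingError, EOFError, ValueError) as exc:
        size = os.path.getsize(path)
        print(f"Could not load {path} ({size} bytes on disk): {exc}")
        return None

# Example call with the path used in this notebook:
# data = load_object_array('/content/Images.npy')
```

Comparing the reported size with the size of the original file is a quick way to confirm that a re-upload is needed.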

In [None]:
# prompt: Design a face mask detection model - Evaluate and share insights on performance of the model - Predict and visualize the masks for the test images

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

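# Define the model
def create_model():
    # padding='same' keeps the spatial size through each convolution, so the
    # output is (image_height, image_width, 1) and lines up with the masks.
    # With the default 'valid' padding the output would shrink to 234x234.
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(image_height, image_width, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')  # Per-pixel face probability
    ])
    return model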

model = create_model()

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
epochs = 10  # Adjust as needed
batch_size = 32
history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.1)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")


# Plot training history
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Loss over Epochs')

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.legend()
plt.title('Accuracy over Epochs')

plt.show()


# Make predictions
predictions = model.predict(X_test)

# Visualize predictions
for i in range(5):  # Visualize predictions for the first 5 test images
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 3, 1)
    plt.imshow(X_test[i])
    plt.title(f"Test Image {i+1}")

    plt.subplot(1, 3, 2)
    plt.imshow(y_test[i], cmap='gray')  # Ground Truth Mask
    plt.title(f"Ground Truth Mask {i+1}")

    plt.subplot(1, 3, 3)
    plt.imshow(predictions[i, :, :, 0], cmap='gray') # Predicted Mask
    plt.title(f"Predicted Mask {i+1}")

    plt.show()
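
Whether the decoder output matches the 256x256 masks depends on the convolution padding. The short sketch below traces one spatial dimension through the same layer sequence as the model (conv, pool, conv, pool, conv, upsample, conv, upsample, conv) for both padding modes. It is plain arithmetic and does not call Keras; the helper name `trace_size` is illustrative.

```python
def trace_size(size, padding):
    """Follow one spatial dimension through the conv/pool/upsample stack."""
    # C = 3x3 convolution, P = 2x2 max pooling, U = 2x upsampling
    for layer in "CPCPCUCUC":
        if layer == "C" and padding == "valid":
            size -= 2          # a 3x3 'valid' convolution trims one pixel per side
        elif layer == "P":
            size //= 2         # 2x2 max pooling halves the size (rounding down)
        elif layer == "U":
            size *= 2          # 2x upsampling doubles the size
    return size

print(trace_size(256, "valid"))  # 234
print(trace_size(256, "same"))   # 256
```

With `'valid'` padding the prediction is 234x234 and cannot be compared pixel by pixel with a 256x256 mask, which is why `'same'` padding (or an explicit resize) is needed for the loss to be computed.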

In [None]:
# prompt: Import images from folder ‘training_images.zip’ - Detect faces, extract metadata for the faces in all the images, and write and save it into a DataFrame

import zipfile
import pandas as pd
import cv2
import os

!unzip training_images.zip

# Load the Haar cascade once instead of on every call
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def extract_metadata(image_path):
    """
    Detects faces in an image and returns a list of dictionaries,
    one per detected face, holding the bounding box in pixels.
    """
    try:
        img = cv2.imread(image_path)
        if img is None:  # cv2.imread returns None for unreadable files
            print(f"Error processing {image_path}: could not read image")
            return []
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

        metadata = []
        for (x, y, w, h) in faces:
            metadata.append({
                'image_path': image_path,
                'x': int(x),
                'y': int(y),
                'w': int(w),
                'h': int(h),
                # Add other metadata as needed (e.g., facial landmarks, expressions)
            })

        return metadata
    except Exception as e:
        print(f"Error processing {image_path}: {e}")
        return []

# Initialize an empty list to store the metadata for all images
all_metadata = []

# Specify the directory containing the images
image_dir = "training_images" # Assuming the images are in a folder named 'training_images'


# Iterate through all files in the directory
for filename in os.listdir(image_dir):
    if filename.lower().endswith(('.jpg', '.jpeg', '.png')): # Consider only image files (case-insensitive)
        image_path = os.path.join(image_dir, filename)
        metadata = extract_metadata(image_path)
        all_metadata.extend(metadata)


# Create a DataFrame from the collected metadata
df = pd.DataFrame(all_metadata)

# Save the DataFrame to a CSV file
df.to_csv('face_metadata.csv', index=False)

print("Metadata extracted and saved to face_metadata.csv")
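
The bounding boxes saved above can be turned into face crops for the Part B dataset. The following is a minimal sketch, assuming images are NumPy arrays in height x width x channels layout, as returned by `cv2.imread`; the helper `crop_face` and its `margin` parameter are hypothetical additions.

```python
import numpy as np

def crop_face(image, x, y, w, h, margin=0.0):
    """Return the face region of `image`, clipped to the image bounds.

    `margin` widens the box by a fraction of its width and height on each side.
    """
    height, width = image.shape[:2]
    pad_x, pad_y = int(w * margin), int(h * margin)
    x1, y1 = max(0, x - pad_x), max(0, y - pad_y)
    x2, y2 = min(width, x + w + pad_x), min(height, y + h + pad_y)
    return image[y1:y2, x1:x2].copy()
```

A small margin keeps some context around the face (hairline, chin), which some recognition models expect; whether it helps here would need to be checked empirically.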

In [None]:
# prompt: Import the data ‘PINS.zip’ - Read the images and extract labels from the filenames for all the folders

import zipfile
import os

# Assuming 'PINS.zip' is in the current working directory
with zipfile.ZipFile('PINS.zip', 'r') as zip_ref:
    zip_ref.extractall('pins_extracted')

image_data = []
labels = []

for root, dirs, files in os.walk('pins_extracted'):
    for file in files:
        if file.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_path = os.path.join(root, file)
            label = os.path.basename(root)  # Extract folder name as label
            image_data.append(image_path)
            labels.append(label)

# Now you have image_data (list of image paths) and labels
# You can proceed to read the images using libraries like OpenCV or PIL.

import cv2
images = []
valid_labels = []
for image_path, label in zip(image_data, labels):
    img = cv2.imread(image_path)
    if img is not None:  # Check if image loaded successfully
        images.append(img)
        valid_labels.append(label)
    else:
        print(f"Error: Could not read image at {image_path}")

# Keep labels aligned with the images that actually loaded
labels = valid_labels

# Now 'images' contains the image data as numpy arrays, and 'labels' are the corresponding folder names
print(len(images), len(labels))

In [None]:
# prompt: Generate embedding vectors for each image in the dataset - Choose a distance metric and use it along with a threshold to display similar and dissimilar images

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Assuming 'images' is your list of images (NumPy arrays)
# and you have a function to generate embeddings called 'generate_embedding'

def generate_embedding(image):
    """
    Replace this with your actual embedding generation logic.
    This example uses random vectors for demonstration purposes.
    """
    return np.random.rand(128)  # Example: 128-dimensional embedding


embeddings = [generate_embedding(img) for img in images]
embeddings = np.array(embeddings)

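# Choose a comparison metric (cosine similarity in this case).
# Note: this is a similarity score, not a distance; higher values mean more alike.
distance_metric = cosine_similarity

# Set a lower bound on similarity for two images to count as similar
threshold = 0.8  # Adjust as needed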

# Find similar and dissimilar images
for i in range(len(embeddings)):
    similarities = distance_metric(embeddings[i].reshape(1, -1), embeddings)
    similar_indices = np.where(similarities >= threshold)[1]
    dissimilar_indices = np.where(similarities < threshold)[1]

    print(f"Image {i}:")
    print("Similar Images:", similar_indices)
    print("Dissimilar Images:", dissimilar_indices)
    print("---")
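
The thresholding idea can also be expressed as a true distance. The sketch below computes pairwise cosine distance (1 minus cosine similarity) with NumPy and lists the pairs that fall under a threshold. The toy vectors and the threshold of 0.2 are illustrative assumptions, and `cosine_distance_matrix` is a hypothetical helper.

```python
import numpy as np

def cosine_distance_matrix(embeddings):
    """Pairwise cosine distance (1 - cosine similarity) between row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return 1.0 - unit @ unit.T

# Toy 2-D embeddings: rows 0 and 1 point almost the same way, row 2 is nearly orthogonal
toy = np.array([[1.0, 0.0], [2.0, 0.1], [0.0, 3.0]])
dist = cosine_distance_matrix(toy)

threshold = 0.2  # illustrative; tune on labelled validation pairs
similar_pairs = [(i, j)
                 for i in range(len(toy))
                 for j in range(i + 1, len(toy))
                 if dist[i, j] < threshold]
print(similar_pairs)
```

Listing each unordered pair once avoids printing every image against every other image, which grows quadratically with the dataset size.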

In [None]:
# prompt: Apply PCA on the embedding vectors - Build and train a SVM classifier on top of it. - Use the trained SVM model to predict the labels of the test images.

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Split data into training and testing sets first
X_train_emb, X_test_emb, y_train, y_test = train_test_split(embeddings, labels, test_size=0.2, random_state=42)

# Apply PCA: fit on the training split only so the test set does not leak into the projection
pca = PCA(n_components=50)  # Choose the number of components
X_train_pca = pca.fit_transform(X_train_emb)
X_test_pca = pca.transform(X_test_emb)

# Build and train an SVM classifier
svm_classifier = SVC(kernel='linear')  # You can experiment with different kernels
svm_classifier.fit(X_train_pca, y_train)

# Predict labels for the test set
y_pred = svm_classifier.predict(X_test_pca)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of SVM classifier: {accuracy}")
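
One way to keep scaling and PCA tied to the training data is a scikit-learn pipeline, which fits every preprocessing step on the training split only. The sketch below runs on synthetic, well-separated vectors standing in for real embeddings; the data, the added `StandardScaler` step, and the component count are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for real embeddings: 3 identities, 40 vectors each, 128-D
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 128)) * 5
X = np.vstack([c + rng.normal(size=(40, 128)) for c in centers])
y = np.repeat(["person_a", "person_b", "person_c"], 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The pipeline fits the scaler and PCA on the training split only
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
clf.fit(X_tr, y_tr)
held_out_accuracy = clf.score(X_te, y_te)
print("held-out accuracy:", held_out_accuracy)
```

The same pipeline object can be passed to `cross_val_score` or `GridSearchCV` to choose the number of PCA components without leaking test data.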

# prompt: generate documentation Write down insights from the analysis conducted - Provide actionable business recommendations

# Project Documentation: Automated Cast & Crew Information for Movie Application

## Executive Summary

This project aims to automate the display of cast and crew information within movie scenes on Company X's streaming platform.  By integrating face detection and recognition capabilities, users can easily identify actors and crew members by clicking a "cast information" button within the app.  The project comprises three key stages: building a robust face detection system, creating a labeled image dataset, and developing a highly accurate face recognition model.


## Insights from Analysis

**Part A (Face Detection):** The mask-prediction model is a small convolutional encoder-decoder, but the cell that loads `Images.npy` failed with `UnpicklingError: pickle data was truncated`, so no detection metrics are available from this run. The Haar Cascade classifier used to extract face metadata provides baseline functionality but has limitations in handling variations in lighting, occlusion, and pose. Deep-learning-based detectors (e.g., YOLO, SSD, or Faster R-CNN) generally offer better accuracy and robustness under these conditions. Performance optimization remains crucial for real-time processing of movie frames within the application.

**Part B (Image Dataset Creation):** The provided training images serve as a preliminary dataset. To enhance the model's accuracy, a more extensive and diverse dataset is required, covering varied lighting conditions, poses, facial expressions, and ethnicities. No data augmentation is applied in this notebook; basic techniques such as rotation, scaling, and cropping are a natural first step, and more sophisticated approaches such as style transfer or GANs could later be considered to generate realistic synthetic data.

**Part C (Face Recognition):** The embedding function in this notebook is a placeholder that returns random 128-dimensional vectors, so the cosine-similarity comparison and the PCA + SVM classifier built on top of it do not yet produce meaningful results. Replacing the placeholder with a pre-trained model such as FaceNet, ArcFace, or VGGFace, which have been trained on large face datasets, is the recommended next step. The PCA and SVM stages should then be re-evaluated on real embeddings.


## Actionable Business Recommendations

1. **Invest in Advanced Face Detection Model:** Replace the Haar Cascade classifier with a deep-learning-based face detector to improve accuracy and robustness, and benchmark its latency to confirm it still meets real-time requirements.

2. **Expand Dataset for Robustness:**  Significantly increase the image dataset size with a wider variety of images, including those with challenging conditions (poor lighting, different angles, partial occlusions).  Prioritize diverse demographics to prevent bias.

3. **Employ Pre-trained Face Recognition Models:** Instead of training a model from scratch, utilize well-established pre-trained face recognition models (like FaceNet or ArcFace). Fine-tuning these models on the expanded dataset will lead to quicker convergence and better results.

4. **Implement Model Monitoring and Retraining:** Continuous monitoring of the model's performance is crucial. Implement automated retraining mechanisms to address concept drift and maintain accuracy over time. A defined metric (e.g., average precision) should be measured on a regular basis. Retraining the models should be triggered when the metrics fall below an acceptable threshold.

5. **Performance Optimization:** Profile the system to identify and address performance bottlenecks. Optimize for speed and efficiency, ensuring real-time processing without hindering user experience.

6. **Evaluate Alternative Face Recognition Approaches:** Consider alternative approaches to face recognition, like Siamese networks or triplet loss-based architectures, known for their effectiveness in face similarity comparisons.
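
The retraining trigger described in recommendation 4 can be sketched as a simple rolling check. The function name `should_retrain`, the window of three measurements, and the example threshold are illustrative assumptions rather than agreed project settings.

```python
def should_retrain(metric_history, threshold, window=3):
    """Return True when the mean of the last `window` metric values is below `threshold`."""
    if len(metric_history) < window:
        return False  # not enough measurements yet to judge a trend
    recent = metric_history[-window:]
    return sum(recent) / window < threshold

# Example: weekly average-precision measurements against an acceptable floor of 0.85
print(should_retrain([0.91, 0.90, 0.89], threshold=0.85))
print(should_retrain([0.90, 0.84, 0.82, 0.80], threshold=0.85))
```

Averaging over a window rather than reacting to a single measurement reduces retraining triggered by one noisy evaluation.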


## Future Enhancements

* **Facial Landmark Detection:** Incorporate facial landmark detection to further refine recognition and enable more detailed analysis.
* **Emotion Recognition:** Extend capabilities to identify emotions in detected faces.
* **Scene Contextualization:** Integrate scene context to improve the accuracy of identification (e.g., if a scene takes place in a particular location).
* **API Documentation:** Document the request and response formats of the face detection and face recognition API endpoints listed in the deliverables.

