# <h1><center>Smart Attendance System using OpenCV - Image capture and model training</center></h1>
<center>By -  Swayansu Mishra  </center>

## Introduction

This project develops a Smart Attendance System using OpenCV and a Convolutional Neural Network (CNN). The system captures images of people, trains a CNN model to recognize their faces, and then uses the trained model to mark attendance in real time by recognizing each person's face in the webcam feed.

OpenCV is an open-source computer vision library that provides tools for detecting and recognizing objects, including faces. In this project, we will use the Haar Cascade classifier to detect faces in the video frames. Haar Cascade classifiers are machine-learning-based object detectors built on the Haar-like features proposed by Paul Viola and Michael Jones in their paper, "Rapid Object Detection using a Boosted Cascade of Simple Features." Each feature compares the summed pixel intensities of adjacent rectangular regions, and a cascade of boosted classifiers trained on these features decides whether a window of the image contains the object.

`haarcascade_frontalface_default.xml` is a pre-trained Haar Cascade classifier provided by OpenCV for detecting frontal faces in images. We will use this classifier to detect faces in the video frames captured by the webcam.

The system consists of several key steps:

1. Capturing images of individuals for training purposes
2. Preprocessing the images and creating labels
3. Splitting the dataset into training and testing sets
4. Defining, compiling, and training a Convolutional Neural Network (CNN) model
5. Marking attendance for recognized individuals in a CSV file
6. Implementing a real-time face recognition system using the trained model

Throughout this notebook, we provide detailed explanations, comments, and code sections to guide you through the development process.

This notebook covers steps 1 to 4; the remaining steps are covered in the next notebook.

## Import Libraries

First, we need to import the necessary libraries for the project:

In [None]:
import os
import cv2
import numpy as np
import pandas as pd
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Dropout, Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
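# keras.utils.np_utils was removed in recent Keras releases; tensorflow.keras.utils also provides to_categorical
from tensorflow.keras import utils as np_utils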
from sklearn.utils import shuffle
import pickle

## Capture Images for Training

We will create functions to capture images of multiple people using a webcam. These images will be used to train our CNN model.

### Function to capture images for a single person

This function captures images of a single person using a webcam. The user can press 'c' to capture an image and 'q' to stop capturing.

In [None]:
def capture_images_for_person(person_name, output_dir):
    # Open the video stream. This URL points to an IP camera on the local network;
    # pass 0 instead to use the computer's default webcam
    cap = cv2.VideoCapture('http://172.16.1.169:4747/video')
    cv2.namedWindow('Face Capture')
    
    # Count the images already saved for this person in the output directory.
    # Matching on "<name>_" avoids counting files of people whose names share a prefix
    count = 0
    for file in os.listdir(output_dir):
        if file.startswith(f"{person_name}_"):
            count += 1

    # Loop to capture multiple images of a person
    while True:
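        # Read a frame from the video stream
        ret, frame = cap.read()
        # Stop if no frame was returned (e.g. the camera is unreachable)
        if not ret:
            print("Could not read a frame from the camera.")
            break
        # Display the captured frame
        cv2.imshow('Face Capture', frame)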

        # Wait for the user to press 'c' or 'q'
        key = cv2.waitKey(1) & 0xFF

        # If the user presses 'c', save the current frame as an image
        if key == ord('c'):
            count += 1
            file_name = f"{person_name}_{count}.jpg"
            file_path = os.path.join(output_dir, file_name)
            cv2.imwrite(file_path, frame)
            print(f"{file_name} saved.")

        # If the user presses 'q', stop capturing images
        if key == ord('q'):
            break

    # Release the webcam and close the windows
    cap.release()
    cv2.destroyAllWindows()
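Counting existing files works as long as no image is ever deleted; if, say, `name_2.jpg` is removed, the next capture would reuse an index that is already taken and overwrite a file. A minimal alternative sketch, assuming the same `<name>_<index>.jpg` naming scheme, is to continue from the highest index found. The helper name `next_image_index` is hypothetical and not part of the original notebook.

```python
import os
import re

def next_image_index(person_name, output_dir):
    """Return one more than the highest index used in '<person_name>_<index>.jpg' files."""
    # re.escape guards against names containing regex metacharacters
    pattern = re.compile(rf"^{re.escape(person_name)}_(\d+)\.jpg$")
    indices = []
    for file_name in os.listdir(output_dir):
        match = pattern.match(file_name)
        if match:
            indices.append(int(match.group(1)))
    # With no matching files, numbering starts at 1
    return max(indices, default=0) + 1
```

Note that the training step later takes the label from the text before the first underscore in the file name, so person names should not contain underscores themselves.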

### Function to capture images for multiple people

This function captures images for multiple people by asking the user for the number of people and their names. It then calls `capture_images_for_person()` for each person.

In [None]:
def capture_images_for_multiple_people():
    # Input the number of people to capture images for
    num_people = int(input("Enter the number of people: "))
    output_dir = 'training_images'
    os.makedirs(output_dir, exist_ok=True)

    # Loop to capture images for each person
    for i in range(num_people):
        person_name = input(f"Enter the name of person {i + 1}: ")
        print(f"Capturing images for {person_name}. Press 'c' to capture and 'q' to move on to the next person.")
        capture_images_for_person(person_name, output_dir)

Now we call the function to capture images from the camera:

In [None]:
capture_images_for_multiple_people()

## Train the CNN Model

We will now create a function to train a CNN model using the captured images. The model will be used to recognize faces in real-time.

### Preprocess images and labels

In this step, we load the images from the `training_images` folder and derive each label from the file name (the part before the first underscore). Each image is read as grayscale, resized to 96x96 pixels, and scaled to the [0, 1] range; the labels are integer-encoded and then one-hot encoded.

In [None]:
def preprocess_images_and_labels():
    images = []
    labels = []

    # Loop through the image files and load them into a list
    for filename in os.listdir('training_images'):
        img = cv2.imread(os.path.join('training_images', filename), cv2.IMREAD_GRAYSCALE)
        if img is None:  # Skip files that OpenCV cannot read as images
            continue
        img = cv2.resize(img, (96, 96))  # Resize to match the input size of the CNN
        images.append(img)
        labels.append(filename.split('_')[0])  # Label is the name before the first underscore

    # Normalize the images
    images = np.array(images).reshape(-1, 96, 96, 1).astype('float32') / 255.0

    # Encode labels
    encoder = LabelEncoder()
    encoded_labels = encoder.fit_transform(labels)
    one_hot_labels = np_utils.to_categorical(encoded_labels)

    return images, one_hot_labels, encoder

# Calling the function to preprocess the images and labels
images, one_hot_labels, encoder = preprocess_images_and_labels()

# Save the encoder
with open('label_encoder.pkl', 'wb') as f:
    pickle.dump(encoder, f)
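To illustrate what the label encoding step produces, here is a small NumPy-only sketch that mimics it on a handful of made-up file names. It uses `np.unique` and `np.eye` as stand-ins for `LabelEncoder` and `to_categorical`; the file names are hypothetical.

```python
import numpy as np

# Hypothetical file names following the '<name>_<index>.jpg' scheme
file_names = ["asha_1.jpg", "ravi_1.jpg", "asha_2.jpg", "meera_1.jpg"]
labels = [name.split("_")[0] for name in file_names]

# Sorted unique class names plus an integer index per sample (like LabelEncoder)
classes, encoded = np.unique(labels, return_inverse=True)
print(classes)  # ['asha' 'meera' 'ravi']
print(encoded)  # [0 2 0 1]

# One row per sample with a single 1 in the column of its class (like to_categorical)
one_hot = np.eye(len(classes), dtype="float32")[encoded]
print(one_hot.shape)  # (4, 3)

# Decoding: the index of the largest entry maps back to the class name
decoded = classes[one_hot.argmax(axis=1)]
print(list(decoded) == labels)  # True
```

The same decoding idea applies to the model's softmax output later on, which is why the fitted encoder is saved alongside the model.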

### Define, compile, and train the CNN model

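We define a CNN with five convolutional blocks, each consisting of a convolution, batch normalization, and max pooling, followed by global average pooling. On top of it we add four fully connected layers with dropout and a softmax output layer with one unit per person. The model is compiled with the Adam optimizer and the categorical cross-entropy loss. Finally, we train it on augmented versions of the training images, holding out 30% of the data for validation.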
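To see how a 96x96 input moves through the five convolution and pooling blocks defined in the code below, the following sketch traces the spatial size after each block. It assumes the Keras defaults used there: 3x3 convolutions with `'valid'` padding and stride 1, and 2x2 max pooling that floors odd sizes. The helper name is hypothetical.

```python
def conv_stack_output_size(size, num_blocks, kernel=3, pool=2):
    """Return the spatial size after each conv + max-pool block."""
    sizes = []
    for _ in range(num_blocks):
        size = size - kernel + 1  # 'valid' convolution shrinks each side by kernel - 1
        size = size // pool       # max pooling with stride == pool size floors the result
        sizes.append(size)
    return sizes

print(conv_stack_output_size(96, 5))  # [47, 22, 10, 4, 1]
```

After the fifth block the feature map is 1x1 with 512 channels, so global average pooling simply yields a 512-dimensional vector. Inputs much smaller than 96x96 would shrink to zero before the last block.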

In [None]:
def create_custom_model(input_shape):
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        BatchNormalization(),
        MaxPooling2D(pool_size=(2, 2)),
        
        Conv2D(64, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=(2, 2)),
        
        Conv2D(128, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=(2, 2)),
        
        Conv2D(256, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=(2, 2)),
        
        Conv2D(512, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=(2, 2)),
        
        GlobalAveragePooling2D(),
    ])
    return model

def train_model(images, one_hot_labels, encoder):
    # Shuffle the dataset
    images, one_hot_labels = shuffle(images, one_hot_labels, random_state=42)
    
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(images, one_hot_labels, test_size=0.3, random_state=42)

    # Repeat the single grayscale channel 3 times to match the (96, 96, 3) model input
    X_train_3_channels = np.repeat(X_train, 3, axis=-1)
    X_test_3_channels = np.repeat(X_test, 3, axis=-1)

    # Create the custom CNN model
    base_model = create_custom_model(input_shape=(96, 96, 3))

    # Add custom top layers for face recognition
    x = base_model.output
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.3)(x)
    predictions = Dense(len(encoder.classes_), activation='softmax')(x)

    # Create the final model
    model = Model(inputs=base_model.input, outputs=predictions)

    # Compile the model
    # Recent Keras versions take 'learning_rate'; the old 'lr' argument is deprecated
    model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
    
    # Define the data augmentation generator
    datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest'
    )

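    # Fit the generator on the training images. This only computes statistics for
    # featurewise_center, featurewise_std_normalization, or zca_whitening, none of
    # which are enabled here, so the call is optional
    datagen.fit(X_train_3_channels)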

    # Train the model with data augmentation
    model.fit(datagen.flow(X_train_3_channels, y_train, batch_size=32),
              validation_data=(X_test_3_channels, y_test),
              steps_per_epoch=int(np.ceil(len(X_train_3_channels) / 32)),  # must be an integer
              epochs=130)
    
    # Save the model
    model.save('face_recognition_model.h5')

    return model

model = train_model(images, one_hot_labels, encoder)
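Two small details of the training function are easy to check in isolation: how the grayscale images are expanded to three channels, and how many batches make up one epoch. The sketch below uses random stand-in data and hypothetical helper names; it does not train anything.

```python
import math
import numpy as np

def to_three_channels(gray_batch):
    """Copy the single grayscale channel three times along the last axis."""
    return np.repeat(gray_batch, 3, axis=-1)

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

# Random stand-in for 70 preprocessed training images of shape (96, 96, 1)
gray_batch = np.random.rand(70, 96, 96, 1).astype("float32")
rgb_like = to_three_channels(gray_batch)

print(rgb_like.shape)                                   # (70, 96, 96, 3)
print(np.array_equal(rgb_like[..., 0], rgb_like[..., 2]))  # True
print(steps_per_epoch(len(rgb_like), 32))               # 3
```

The three channels carry identical information, so they add no signal. A single-channel input shape of `(96, 96, 1)` would be an equally valid design for a custom CNN like this one.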

Next, we need to define a function to mark attendance, detect faces using the Haar Cascade classifier, and use our saved model to recognize each face and mark its attendance.
All of this will be covered in the next notebook.