# Face Recognition Attendance System

Welcome! This notebook walks you through the full pipeline used in this repository:

1. **Prepare the dataset** (images organized by person in `../data/`)
2. **Encode faces** and **train a classifier**
3. **Save trained artifacts** to `../models/`
4. **Run recognition** (image or webcam) and **log attendance** to `../outputs/attendance.csv`

This notebook mirrors the logic of the Python scripts in `src/` but adds explanations and runnable cells.

## Project Structure (expected)
```
face-recognition-attendance-system/
│── data/                          # raw images, organized by person
│   ├── person1/
│   ├── person2/
│   └── ...
│
│── models/                        # trained models, encodings
│   ├── encodings.pkl
│   └── classifier.pkl
│
│── notebook/
│   └── face_recognition_demo.ipynb
│
│── src/
│   ├── train_model.py
│   ├── test_model.py
│   └── utils.py (optional)
│
│── outputs/
│   └── attendance.csv
│
│── requirements.txt
│── README.md
│── .gitignore
```

**Note:** This notebook assumes it lives inside `notebook/`. All paths are resolved relative to the project root (`..`).
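
If the notebook is launched from a different working directory, a bare `..` can point at the wrong folder. The sketch below shows one way to guard against that by walking upward until a marker file is found. It assumes the project root can be recognized by `requirements.txt`, as in the layout above; the helper name `find_project_root` is illustrative and not part of the repository's scripts.

```python
from pathlib import Path


def find_project_root(start, marker="requirements.txt"):
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper: using requirements.txt as the marker is an
    assumption based on the project layout shown above.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")
```

A possible use is `BASE_DIR = find_project_root(Path.cwd())`, which gives the same result whether the notebook runs from `notebook/` or from the project root.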

## 1) Setup & Imports
Install dependencies (if needed) and import libraries. If you're running inside a fresh environment, uncomment the pip command to install from `requirements.txt`.

In [7]:
# Optional: install dependencies from the repo root
# !pip install -r ../requirements.txt

import os
import csv
import pickle
from datetime import datetime

import cv2
import numpy as np
import face_recognition
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Resolve key paths relative to the project root (the parent of notebook/).
# Assumes the notebook is run with notebook/ as the working directory.
BASE_DIR = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
DATA_DIR = os.path.join(BASE_DIR, "data")
MODELS_DIR = os.path.join(BASE_DIR, "models")
OUTPUTS_DIR = os.path.join(BASE_DIR, "outputs")

os.makedirs(MODELS_DIR, exist_ok=True)
os.makedirs(OUTPUTS_DIR, exist_ok=True)

ATTENDANCE_FILE = os.path.join(OUTPUTS_DIR, "attendance.csv")

print("BASE_DIR:", BASE_DIR)
print("DATA_DIR:", DATA_DIR)
print("MODELS_DIR:", MODELS_DIR)
print("OUTPUTS_DIR:", OUTPUTS_DIR)

BASE_DIR: c:\Users\hp\Downloads
DATA_DIR: c:\Users\hp\Downloads\data
MODELS_DIR: c:\Users\hp\Downloads\models
OUTPUTS_DIR: c:\Users\hp\Downloads\outputs


## 2) Dataset Preparation
Images should be placed in `../data/<person_name>/image.jpg`. We will:

- Load each image
- Detect the face and compute a 128-D embedding using `face_recognition`
- Build arrays: `encodings` (features) and `labels` (person names)

If an image doesn't contain a detectable face, it will be skipped with a warning.
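
Before encoding, it can help to check that the folder layout is what you expect. The following is a minimal sketch, using only the standard library, that counts image files per person folder. The function name `count_images_per_person` and the set of accepted extensions are assumptions for illustration; the notebook itself does not filter by extension.

```python
from pathlib import Path

# Extensions treated as images here; an assumption, adjust to your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_person(dataset_path):
    """Return {person_name: number_of_image_files} for each sub-folder."""
    counts = {}
    for person_dir in sorted(Path(dataset_path).iterdir()):
        if not person_dir.is_dir():
            continue  # ignore stray files next to the person folders
        counts[person_dir.name] = sum(
            1
            for f in person_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Folders that report zero or one image are worth fixing before training, since they contribute little or nothing to the classifier.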

In [2]:
def get_face_encodings(image_path):
    """Return the 128-D face encoding for the first face found in the image, or None."""
    try:
        image = face_recognition.load_image_file(image_path)
        encs = face_recognition.face_encodings(image)
        if encs:
            return encs[0]
    except Exception as e:
        print(f"Error processing image {image_path}: {e}")
    return None

def prepare_dataset(dataset_path):
    labels = []
    encodings = []
    if not os.path.isdir(dataset_path):
        raise FileNotFoundError(f"Dataset path not found: {dataset_path}")

    for person_name in os.listdir(dataset_path):
        person_folder = os.path.join(dataset_path, person_name)
        if os.path.isdir(person_folder):
            for image_name in os.listdir(person_folder):
                image_path = os.path.join(person_folder, image_name)
                print(f"Processing {image_path}...")
                encoding = get_face_encodings(image_path)
                if encoding is not None:
                    encodings.append(encoding)
                    labels.append(person_name)
                else:
                    print(f"Warning: No encoding found for image: {image_path}")
    print(f"Found {len(np.unique(labels))} unique classes.")
    return np.array(encodings), np.array(labels)

encodings, labels = prepare_dataset(DATA_DIR)
print(f"Number of encodings: {len(encodings)}")
print(f"Encodings shape: {encodings.shape if encodings.size else 'Empty'}")
print(f"Number of labels: {len(labels)}")
print(f"Unique classes: {np.unique(labels)}")

FileNotFoundError: Dataset path not found: c:\Users\hp\data

## 3) Train the Classifier (SVM)
We split the data into train/test sets and train a simple SVM (`svm.SVC`) on the embeddings. 

**Why this works:** `face_recognition` already gives a strong embedding where same-person faces are close together in the 128-D space. A linear or RBF SVM can then separate classes effectively.

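Training needs at least two classes (people). The stratified split also needs at least two encoded images per person and a test set at least as large as the number of classes, so with `test_size=0.2` a very small dataset raises a `ValueError`. If that happens, add more images.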
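
To see in advance whether a dataset is large enough for the split, the conditions can be checked by hand. This is a sketch under the assumption that it mirrors the checks scikit-learn applies to a stratified shuffle split; the function name `check_stratified_split` is hypothetical and scikit-learn itself remains the authority on what it accepts.

```python
import math
from collections import Counter


def check_stratified_split(labels, test_size=0.2):
    """Return a list of problems likely to make a stratified split fail.

    Assumed conditions: at least 2 classes, at least 2 samples per class,
    and train and test sets each at least as large as the number of classes.
    An empty list means no problem was found.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test

    problems = []
    if n_classes < 2:
        problems.append("need at least 2 classes")
    for name, count in counts.items():
        if count < 2:
            problems.append(f"class '{name}' has only {count} sample(s)")
    if n_test < n_classes:
        problems.append(f"test set size {n_test} < number of classes {n_classes}")
    if n_train < n_classes:
        problems.append(f"train set size {n_train} < number of classes {n_classes}")
    return problems
```

Calling `check_stratified_split(labels)` before `train_test_split` turns an opaque error into a readable list of what to fix.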

In [None]:
if encodings.size == 0 or len(labels) == 0:
    raise RuntimeError("No encodings or labels found. Check your dataset in ../data.")

if len(np.unique(labels)) < 2:
    raise RuntimeError("Less than 2 unique classes found. Add more people to ../data.")

X_train, X_test, y_train, y_test = train_test_split(
    encodings, labels, test_size=0.2, random_state=42, stratify=labels
)

print("Training set classes:", np.unique(y_train, return_counts=True))
print("Testing set classes:", np.unique(y_test, return_counts=True))

clf = svm.SVC(gamma='scale')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

## 4) Save Artifacts to `../models/`
We persist two files so you can reuse them without retraining:

- `encodings.pkl`: a pickled `(encodings, labels)` tuple of NumPy arrays (for later retraining/expansion)
- `classifier.pkl`: trained SVM classifier

These mirror what your scripts in `src/` generate.
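
Reusing the artifacts in a later session is a matter of loading them back with `pickle`. The sketch below shows the round trip with stand-in data in a temporary directory; the helper names `save_artifact` and `load_artifact` are illustrative, and the tiny 3-D lists stand in for real 128-D embeddings. Because unpickling can execute arbitrary code, only load files you created yourself.

```python
import os
import pickle
import tempfile


def save_artifact(obj, path):
    """Pickle `obj` to `path`, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_artifact(path):
    """Load a pickled object. Only use on files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in data: two fake 3-D "encodings" instead of real 128-D embeddings.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "models", "encodings.pkl")
save_artifact(([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], ["alice", "bob"]), demo_path)

demo_encodings, demo_labels = load_artifact(demo_path)
print(demo_labels)
```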

In [None]:
enc_path = os.path.join(MODELS_DIR, 'encodings.pkl')
clf_path = os.path.join(MODELS_DIR, 'classifier.pkl')

with open(enc_path, 'wb') as f:
    pickle.dump((encodings, labels), f)

with open(clf_path, 'wb') as f:
    pickle.dump(clf, f)

print(f"Saved encodings → {enc_path}")
print(f"Saved classifier → {clf_path}")

## 5) (Optional) Test on a Single Image
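Use the trained classifier to predict the person in a static image. This is handy for quick validation before trying the webcam. The stored encodings are loaded in the example as well, but `predict_image` currently relies on the classifier alone.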

👉 Place a test image at a known path (e.g., `../data/person1/some_image.jpg`) and update the `TEST_IMAGE_PATH` below.

In [None]:
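def predict_image(image_path, known_encodings, known_labels, clf):
    """Predict a name for every face found in the image using the trained SVM.

    `known_encodings` and `known_labels` are currently unused; only `clf`
    drives the prediction.
    """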
    image_bgr = cv2.imread(image_path)
    if image_bgr is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    boxes = face_recognition.face_locations(image_rgb)
    encs = face_recognition.face_encodings(image_rgb, boxes)

    results = []
    for enc, (top, right, bottom, left) in zip(encs, boxes):
        # Option A: classifier prediction (trained SVM)
        pred_name = clf.predict([enc])[0]
        results.append((pred_name, (top, right, bottom, left)))
    return image_bgr, results

# Example usage (uncomment and set a valid path):
# TEST_IMAGE_PATH = os.path.join(DATA_DIR, 'person1', 'your_test_image.jpg')
# with open(os.path.join(MODELS_DIR, 'classifier.pkl'), 'rb') as f:
#     loaded_clf = pickle.load(f)
# with open(os.path.join(MODELS_DIR, 'encodings.pkl'), 'rb') as f:
#     loaded_encs, loaded_labels = pickle.load(f)
# img, preds = predict_image(TEST_IMAGE_PATH, loaded_encs, loaded_labels, loaded_clf)
# for name, (t,r,b,l) in preds:
#     cv2.rectangle(img, (l,t), (r,b), (0,255,0), 2)
#     cv2.putText(img, name, (l, t-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0,255,0), 2)
# cv2.imshow('Prediction', img)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

## 6) Attendance Logging
When a face is recognized, we log it to `../outputs/attendance.csv` **once per person per day**. The logic is:

- If the CSV doesn't exist, create it with a header
- If the person is already logged today, skip
- Otherwise, append a new row with `Name, Date, Time`

In [None]:
def mark_attendance(name, filename=ATTENDANCE_FILE):
    now = datetime.now()
    dt_string = now.strftime('%Y-%m-%d')   # Date
    tm_string = now.strftime('%H:%M:%S')   # Time

    file_exists = os.path.isfile(filename)

    # Ensure header exists
    if not file_exists:
        with open(filename, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Date", "Time"])

    # Check if this name already has attendance for today
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header
        for row in reader:
            if len(row) >= 2 and row[0] == name and row[1] == dt_string:
                print(f"{name} already marked for {dt_string}")
                return

    # Append new record
    with open(filename, 'a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([name, dt_string, tm_string])
        print(f"Attendance marked for {name} at {dt_string} {tm_string}")

## 7) Real-Time Recognition (Webcam)
This cell starts your webcam and performs real-time face recognition. Each person is passed to `mark_attendance` the first time they are recognized in the current session, and `mark_attendance` skips anyone already recorded for today.

**Controls:** Press `q` to quit.

**Note:** `face_recognition` expects RGB images, while OpenCV captures in BGR. We convert BGR → RGB before encoding.

In [None]:
def recognize_faces(frame, clf, distance_threshold=0.6):
    """Locate faces in a BGR frame and predict a name for each with the SVM.

    Note: `distance_threshold` is currently unused. The SVM always returns one
    of the trained classes, so "Unknown" only appears if prediction raises.
    """
    # Convert BGR → RGB for face_recognition
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    face_locations = face_recognition.face_locations(rgb)
    face_encodings = face_recognition.face_encodings(rgb, face_locations)
    names = []

    for face_encoding in face_encodings:
        # Predict with the trained classifier directly
        try:
            names.append(clf.predict([face_encoding])[0])
        except Exception as e:
            print(f"Prediction failed: {e}")
            names.append("Unknown")
    return face_locations, names

def run_webcam_recognition():
    # Load classifier
    with open(os.path.join(MODELS_DIR, 'classifier.pkl'), 'rb') as f:
        clf = pickle.load(f)

    recognized_names = set()  # in-session dedupe to avoid spamming console/CSV

    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise RuntimeError("Could not open webcam. Is a camera available?")

    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                print("Failed to capture image from camera.")
                break

            face_locations, names = recognize_faces(frame, clf)
            for (top, right, bottom, left), name in zip(face_locations, names):
                if name != "Unknown" and name not in recognized_names:
                    recognized_names.add(name)
                    mark_attendance(name)
                cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
                cv2.putText(frame, name, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

            cv2.imshow('Face Recognition - Press q to quit', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        # Release the camera and close windows even if an error occurs
        cap.release()
        cv2.destroyAllWindows()

# To run the webcam demo, uncomment:
# run_webcam_recognition()

## 8) Notes, Troubleshooting & Next Steps
- **Lighting & camera angle:** Bad lighting or extreme angles reduce accuracy.
- **Image quality:** Use clear, frontal images for each person (several per person recommended).
- **Thresholds:** You can adjust SVM parameters or switch to distance-based matching with a threshold (e.g., 0.6) against saved encodings.
- **Liveness/anti-spoofing:** This simple system can be fooled by phone photos. To mitigate, consider adding liveness detection (eye blink, depth/IR camera, texture analysis, challenge-response).
- **Per-day CSV:** Currently logs to a single `attendance.csv`. You can change to daily files like `attendance_YYYY-MM-DD.csv` if needed.
- **Performance:** For many users, consider caching encodings, using GPU-accelerated libraries, or batching frames.
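
The distance-based matching mentioned under **Thresholds** can be sketched as follows. Unlike the SVM, which always returns one of the trained classes, this approach can answer "Unknown" when no stored encoding is close enough. The function name `match_by_distance` is hypothetical, the sketch uses plain Python sequences and `math.dist` so it runs without extra dependencies, and 0.6 is only the example threshold from the note above. With NumPy arrays, `face_recognition.face_distance` should give the same Euclidean distances.

```python
import math


def match_by_distance(face_encoding, known_encodings, known_labels, threshold=0.6):
    """Return (label, distance) for the nearest known encoding.

    The label is "Unknown" when the nearest stored encoding is farther away
    than `threshold`, or when there are no stored encodings at all.
    """
    best_label, best_distance = "Unknown", math.inf
    for known, label in zip(known_encodings, known_labels):
        distance = math.dist(face_encoding, known)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance > threshold:
        return "Unknown", best_distance
    return best_label, best_distance
```

A lower threshold makes matching stricter (fewer false accepts, more "Unknown" results); a higher one does the opposite, so the value is worth tuning on your own images.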

If you encounter issues, verify:
1) Paths (`DATA_DIR`, `MODELS_DIR`, `OUTPUTS_DIR`) are correct
2) Camera is available and accessible
3) You have at least 2 classes and enough images per class