# Biometric Authentication System

This notebook implements a biometric authentication system that combines **voice (speaker) recognition** and **facial recognition** to provide passwordless authentication.

## Overview

Traditional authentication based on email addresses and passwords is increasingly vulnerable to security breaches. This biometric system aims to offer a more secure alternative by using:

- **Voice Biometrics**: Analyzes unique vocal characteristics and speech patterns
- **Facial Recognition**: Identifies individuals based on facial features and geometry

## Key Features

- **Multi-modal Authentication**: Combines voice and visual biometrics for enhanced security
- **Real-time Processing**: Live capture and analysis of voice and facial data
- **Machine Learning Integration**: Uses TensorFlow/Keras models for pattern recognition
- **No Password Required**: Eliminates the need for traditional credentials

## Technology Stack

- **Computer Vision**: OpenCV for camera capture and image handling, with the `face_recognition` library for face detection and encoding
- **Audio Processing**: PyAudio for real-time voice capture from the microphone
- **Machine Learning**: TensorFlow/Keras for deep learning models
- **Feature Extraction**: MFCC (Mel-frequency cepstral coefficients) for voice analysis
- **Biometric Matching**: Gaussian Mixture Models for voice pattern recognition

This system provides a robust, user-friendly authentication solution that leverages the uniqueness of human biometric traits.

In [2]:
import tensorflow as tf
import numpy as np
import os
import glob
import pickle
import cv2
import time
from numpy import genfromtxt

from keras import backend as K
from keras.models import load_model

K.set_image_data_format('channels_first')
np.set_printoptions(threshold=np.inf)


import pyaudio
from IPython.display import Audio, display, clear_output
import wave
from scipy.io.wavfile import read
# GaussianMixture replaces the older GMM class, which was removed in scikit-learn 0.20
from sklearn.mixture import GaussianMixture
import warnings
warnings.filterwarnings("ignore")

from sklearn import preprocessing
import python_speech_features as mfcc
import face_recognition

In [6]:
CHANNELS = 1

# Audio Processing

## Voice Biometric Feature Extraction

This section focuses on extracting and processing acoustic features from voice recordings for biometric authentication. The audio processing pipeline converts raw voice data into machine-readable features that capture unique vocal characteristics.

### MFCC Feature Extraction

The system uses **Mel-Frequency Cepstral Coefficients (MFCC)** as the primary feature extraction technique:

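- **12 cepstral coefficients** are extracted from each audio frame
- **Frame size**: 25 ms windows advanced in 10 ms steps, so consecutive frames overlap by 15 ms
- **Energy component**: with `appendEnergy=True`, the zeroth coefficient is replaced by the log of the total frame energy to capture voice intensity patterns
- **FFT size**: 2048 points, enough to cover a full 25 ms window at the 44.1 kHz rate used for recording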
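
As a rough check on these settings, the sketch below converts the window and step durations into sample counts and finds the smallest power-of-two FFT size that covers one window. The 44.1 kHz rate is the recording rate used later in this notebook. The helper names `frame_layout` and `next_power_of_two` are illustrative assumptions and are not part of `python_speech_features`, whose internal rounding may differ slightly.

```python
import math

def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()

def frame_layout(rate, winlen=0.025, winstep=0.01):
    """Return (window samples, step samples, overlap samples, minimum FFT size)."""
    window = math.floor(winlen * rate + 0.5)   # round half up
    step = math.floor(winstep * rate + 0.5)
    return window, step, window - step, next_power_of_two(window)

window, step, overlap, nfft = frame_layout(44100)
print(window, step, overlap, nfft)   # 1103 441 662 2048
```

At 44.1 kHz a 25 ms window spans more than 1024 samples, so 2048 is the next power of two that holds a whole frame.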

### Delta Features

To capture temporal dynamics in speech patterns, the system computes **delta features**:

- **First-order derivatives** of the MFCC coefficients, estimated from neighbouring frames
- Represent the rate of change of the spectral characteristics over time
- Stacked alongside the original 12 MFCC features, giving 24 features per frame
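
For comparison, a common way to approximate first-order deltas is a central difference between the following and preceding frames. The sketch below is a simplified, vectorized illustration using NumPy. `central_delta` is a hypothetical helper, and it does not reproduce the exact scaling or edge handling of the `calculate_delta` function defined in this notebook.

```python
import numpy as np

def central_delta(features):
    """Central-difference deltas for a (frames x coefficients) matrix."""
    features = np.asarray(features, dtype=float)
    # Repeat the first and last frame so edge rows also have two neighbours
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Toy stand-in for MFCC frames: 5 frames, 2 coefficients
mfcc_like = np.arange(10, dtype=float).reshape(5, 2)
combined = np.hstack((mfcc_like, central_delta(mfcc_like)))
print(combined.shape)   # (5, 4): original features plus their deltas
```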

### Feature Normalization

- **Z-score normalization** is applied to each coefficient across the frames of a recording to ensure consistent feature scaling
- Reduces offsets caused by differing recording conditions
- Improves model convergence and matching accuracy
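
The normalization step can be sketched with plain NumPy as follows. This is an illustration in the spirit of `sklearn.preprocessing.scale` with its default settings (zero mean and unit variance per column). The helper name `zscore_columns` and the `eps` guard for constant columns are assumptions made for this example.

```python
import numpy as np

def zscore_columns(features, eps=1e-12):
    """Scale each column of a (frames x coefficients) matrix to zero mean, unit variance."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Leave constant columns at zero instead of dividing by zero
    std = np.where(std < eps, 1.0, std)
    return (features - mean) / std

rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=3.0, size=(200, 12))   # toy stand-in for MFCC frames
scaled = zscore_columns(raw)
print(np.allclose(scaled.mean(axis=0), 0.0), np.allclose(scaled.std(axis=0), 1.0))
```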

### Voice Pattern Modeling

The extracted features will be used to train **Gaussian Mixture Models (GMM)** that learn individual voice patterns, enabling accurate speaker identification and verification for secure authentication.
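
A minimal sketch of this idea, assuming one `GaussianMixture` per enrolled speaker and identification by highest average log-likelihood, is shown below. The speaker names, the synthetic feature matrices, and the hyperparameters (`n_components=4`, diagonal covariances) are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-speaker feature matrices (frames x 24 features)
train = {
    "alice": rng.normal(loc=0.0, scale=1.0, size=(300, 24)),
    "bob": rng.normal(loc=3.0, scale=1.0, size=(300, 24)),
}

# Fit one model per enrolled speaker
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          max_iter=200, random_state=0).fit(feats)
    for name, feats in train.items()
}

def identify_speaker(features, models):
    """Return the enrolled name whose model gives the highest mean log-likelihood."""
    scores = {name: gmm.score(features) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores

# A new recording drawn from the same distribution as "bob"
test_sample = rng.normal(loc=3.0, scale=1.0, size=(100, 24))
best, scores = identify_speaker(test_sample, models)
print(best)
```

Note that this closed-set comparison always returns one of the enrolled names; rejecting an unknown speaker would additionally require a threshold on the score.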

In [3]:
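def calculate_delta(array):
    """Approximate first-order deltas for a (frames x coefficients) feature matrix.

    Each row's delta is a scaled difference between neighbouring frames;
    interior rows use the next frame minus the previous frame.
    """
    rows, cols = array.shape
    deltas = np.zeros((rows, cols))
    N = 2
    for i in range(rows):
        # Collect up to two neighbouring frame indices, nearest first
        index = []
        j = 1
        while len(index) < 2 and j < N+1:
            if i-j >= 0:
                index.append(i-j)
            if i+j < rows:
                index.append(i+j)
            j+=1
        if len(index) == 2:
            # Two neighbours available: difference scaled by 1 / (2N)
            deltas[i] = (array[index[1]] - array[index[0]]) / (2*N)
        elif len(index) == 1:
            # Only one neighbour (very short input): difference to the current frame
            deltas[i] = (array[index[0]] - array[i]) / N
    return deltas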


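# Convert audio samples to normalized MFCC features plus their deltas
def extract_features(audio, rate):
    """Return a (frames x 24) matrix: 12 scaled MFCCs and 12 delta features per frame."""
    mfcc_feat = mfcc.mfcc(audio, rate, winlen=0.025, winstep=0.01, numcep=12,
                          appendEnergy=True, nfft=2048)
    # Standardize each coefficient to zero mean and unit variance
    mfcc_feat = preprocessing.scale(mfcc_feat)
    delta = calculate_delta(mfcc_feat)

    # Combine the MFCC features with their deltas
    combined = np.hstack((mfcc_feat, delta))
    return combined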

# User Registration

## Adding New Users to the System

This section covers the process of registering new users in the biometric authentication system. The registration process involves capturing and storing biometric data for both voice and facial recognition.

In [5]:
import cv2
import os

# Create a directory to store captured images
dir_name = 'saved_images'
if not os.path.exists(dir_name):
    os.makedirs(dir_name)

# Get name of the person to capture images
name = input("Enter your name: ")

# Create a directory with the name of the person
person_dir = os.path.join(dir_name, name)
if not os.path.exists(person_dir):
    os.makedirs(person_dir)

# Initialize the webcam
cap = cv2.VideoCapture(0)

# Set the width and height of the capture window
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Define the codec and create a VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, 20.0, (640, 480))

# Capture frames continuously
while True:
    # Capture a frame
    ret, frame = cap.read()

    if not ret or frame is None:
        print("Failed to capture frame from camera. Exiting...")
        break

    # Draw the instructions on a copy so saved images stay free of overlay text
    preview = frame.copy()
    cv2.putText(preview, "Press 'q' to exit", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(preview, "Press 's' to save image", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    # Display the annotated preview
    cv2.imshow('frame', preview)

    key = cv2.waitKey(1) & 0xFF
    # Exit the camera if 'q' is pressed
    if key == ord('q'):
        break
    # Save the image if 's' is pressed
    if key == ord('s'):
        # Number the file after the images already stored for this person
        filename = os.path.join(person_dir, f"{name}_{len(os.listdir(person_dir)) + 1}.jpg")

        # Save the clean (unannotated) frame
        cv2.imwrite(filename, frame)
        print("Image saved successfully in database!")

# Release the webcam and the video writer, then close all windows
cap.release()
out.release()
cv2.destroyAllWindows()




Image saved successfully in database!


In [3]:
# Initialize empty lists to store the face encodings and names
known_face_encodings = []
known_face_names = []
path = "saved_images"

# Each sub-folder of `path` holds the images of one registered person
for person in sorted(os.listdir(path)):
    person_path = os.path.join(path, person)

    # Skip stray files such as .DS_Store that are not person folders
    if not os.path.isdir(person_path):
        continue

    # Use the first image (in sorted order) in which a face is detected
    for image_file in sorted(os.listdir(person_path)):
        # Load the image file
        image = face_recognition.load_image_file(os.path.join(person_path, image_file))

        # Extract the face encodings; the list is empty if no face is found
        encodings = face_recognition.face_encodings(image)
        if encodings:
            # The folder name is the person's name
            known_face_encodings.append(encodings[0])
            known_face_names.append(person)
            break
    else:
        print(f"No usable face image found for {person}; skipping.")

# Print the lists of known face encodings and their names
print("Known face encodings:", known_face_encodings)
print("Known face names:", known_face_names)


Known face encodings: [array([-0.15630172, -0.03808632,  0.01052278, -0.0074402 , -0.06061751,
        0.02011276,  0.0702951 , -0.09100263,  0.16502056, -0.10435909,
        0.19622302, -0.06010798, -0.17718334, -0.11288489, -0.03114074,
        0.12462305, -0.11458257, -0.16932786, -0.0450679 , -0.12066258,
        0.03476317, -0.01583534, -0.00969218,  0.10023776, -0.16187677,
       -0.38878292, -0.05838905, -0.17021155,  0.03976464, -0.07268727,
       -0.01359243,  0.1020198 , -0.21776104, -0.00594426, -0.03758461,
        0.05819738,  0.07864693,  0.01889201,  0.12699898,  0.05223377,
       -0.17250392, -0.0886682 ,  0.02712412,  0.26270956,  0.16132922,
        0.04599509,  0.01933765, -0.00728455,  0.01621879, -0.26146719,
        0.0645002 ,  0.07115059,  0.13612103, -0.01617012,  0.0645465 ,
       -0.16215117, -0.05950534,  0.02541297, -0.14464839,  0.08522309,
       -0.02649183, -0.01096546, -0.04645266, -0.03601773,  0.28627971,
        0.11091712, -0.11283754, -0.04810

# Speech Recognition

In [None]:
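identity = None

def recognize_voice():
    """Record a short voice sample from the microphone and save it as a WAV file.

    Note: this cell only records audio; `identity` is not updated here.
    """
    global identity
    FORMAT = pyaudio.paInt16
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 3
    FILENAME = "./voice.wav"

    audio = pyaudio.PyAudio()

    # Check if CHANNELS is a valid value for your microphone
    try:
        stream = audio.open(format=FORMAT, channels=CHANNELS,
                            rate=RATE, input=True,
                            frames_per_buffer=CHUNK)
    except OSError as e:
        print(f"Error opening audio stream: {e}")
        print("Please check your microphone and CHANNELS setting.")
        audio.terminate()
        return

    print("recording... say hello authentication for authentication")
    frames = []

    # Read the stream chunk by chunk for RECORD_SECONDS seconds
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("finished recording")

    # Stop recording and free the audio device
    sample_width = audio.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Save the recording so it can be read back for feature extraction
    with wave.open(FILENAME, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))

recognize_voice()
print(identity)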


recording... say hello authentication for authentication
finished recording
None



# Face and Voice Authentication
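
The decision made in the authentication cell below can be sketched in isolation. `best_face_match` mimics, with plain NumPy, how `face_recognition` compares encodings (Euclidean distance against a tolerance, 0.6 being that library's default), and `authentication_status` mirrors the branches that combine the face name with the voice identity. Both helper names, the toy 3-dimensional encodings, and the choice to check for an unknown face first are assumptions made for illustration.

```python
import numpy as np

def best_face_match(known_encodings, known_names, encoding, tolerance=0.6):
    """Return the closest enrolled name, or "Unknown" if nothing is within tolerance."""
    if len(known_encodings) == 0:
        return "Unknown"
    distances = np.linalg.norm(np.asarray(known_encodings) - encoding, axis=1)
    best = int(np.argmin(distances))
    return known_names[best] if distances[best] <= tolerance else "Unknown"

def authentication_status(face_name, voice_identity):
    """Combine the face result and the voice result into a single status message."""
    if face_name == "Unknown":
        return "Face not registered"
    if voice_identity in (None, "unknown"):
        return "Voice not recognized"
    if face_name == voice_identity:
        return "Authentication successful"
    return "Voice identity does not match face"

# Toy encodings (real face encodings have 128 dimensions)
known = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])]
names = ["alice", "bob"]

face = best_face_match(known, names, np.array([0.1, 0.0, 0.0]))
print(face, "->", authentication_status(face, "alice"))
```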

In [5]:
face_locations = []
face_names = []
process_this_frame = True

print("Keep your face in front of the camera")
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Give the camera a moment to warm up
time.sleep(1.0)
font = cv2.FONT_HERSHEY_DUPLEX

while True:
    ret, frame = cap.read()
    if not ret or frame is None:
        print("Failed to capture frame from camera. Exiting...")
        break

    # Only process every other frame to save time
    if process_this_frame:
        # Work on a 1/4-size frame, converted from BGR (OpenCV) to RGB (face_recognition)
        small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
        rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

        # Find all the faces in the current frame of video
        face_locations = face_recognition.face_locations(rgb_small_frame)
        face_names = []

        # Only attempt recognition when exactly one face is visible
        if len(face_locations) == 1:
            face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

            for face_encoding in face_encodings:
                facename = "Unknown"
                if known_face_encodings:
                    # See if the face is a match for the known face(s)
                    matches = face_recognition.compare_faces(known_face_encodings, face_encoding)

                    # Use the known face with the smallest distance to the new face
                    face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
                    best_match_index = np.argmin(face_distances)

                    if matches[best_match_index]:
                        facename = known_face_names[best_match_index]
                face_names.append(facename)

                # If the minimum distance is below the threshold
                # and both face and voice match, then unlock the door

    process_this_frame = not process_this_frame

    # Tell the user when recognition cannot run on this frame
    if len(face_locations) == 0:
        cv2.putText(frame, "No face found in the frame. Try again...", (10, 30), font, 0.7, (0, 0, 255), 1)
    elif len(face_locations) > 1:
        cv2.putText(frame, "More than one face found. Try again...", (10, 30), font, 0.7, (0, 0, 255), 1)

    # Display the results
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        # Scale back up face locations since the frame we detected in was scaled to 1/4 size
        top *= 4
        right *= 4
        bottom *= 4
        left *= 4

        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

        # Draw a label with a name below the face
        cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 0), cv2.FILLED)

        # An unregistered face is checked first so it can never authenticate
        if name == "Unknown":
            cv2.putText(frame, name + " Face not registered! Unsuccessful", (left + 6, bottom - 6), font, 0.5, (0, 0, 255), 1)
        elif identity in (None, "unknown"):
            cv2.putText(frame, name + " Voice not recognized! Try again...", (left + 6, bottom - 6), font, 0.5, (0, 0, 255), 1)
        elif name == identity:
            cv2.putText(frame, name + " Authentication successful! Welcome", (left + 6, bottom - 6), font, 0.5, (0, 255, 0), 1)
        else:
            cv2.putText(frame, name + " Voice identity does not match face! Try again...", (left + 6, bottom - 6), font, 0.5, (0, 0, 255), 1)

    # Display the resulting image on every frame so the window stays responsive
    cv2.imshow('Video', frame)

    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release handle to the webcam
cap.release()
cv2.destroyAllWindows()


Keep your face in front of the camera


