<a href="https://colab.research.google.com/github/mikesplore/Face-Match/blob/main/identity_verification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Setup and Installation**

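The notebook depends on the following packages:

*   face_recognition: face detection and recognition
*   opencv-python: image processing
*   easyocr: text extraction from ID cards
*   dlib: required by face_recognition for face landmark detection
*   Pillow: image handling

numpy and matplotlib are installed as well, for array handling and plotting.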

In [2]:
# Install required packages
!pip install face_recognition opencv-python easyocr dlib numpy Pillow matplotlib
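
After the install finishes, it can help to confirm that every package is importable before running the remaining cells. The following is a minimal sketch, not part of the original notebook: `missing_modules` is a hypothetical helper, and the list uses import names rather than pip names (for example `cv2` for opencv-python and `PIL` for Pillow).

```python
from importlib.util import find_spec

# Import names for the packages installed above (cv2 = opencv-python, PIL = Pillow).
REQUIRED_MODULES = ["face_recognition", "cv2", "easyocr", "dlib", "numpy", "PIL", "matplotlib"]

def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported missing, re-running the install cell (or restarting the runtime) is usually the first thing to try.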



**Import Libraries**

Import all necessary libraries for:

*   Image processing (OpenCV, PIL)
*   Face recognition (face_recognition)
*   Text extraction (easyocr)
*   Data handling (numpy, json)
*   File operations (os, google.colab)

In [3]:
# Import libraries
import cv2
import numpy as np
import face_recognition
import easyocr
import json
import os
import re
from datetime import datetime
from PIL import Image, ImageEnhance
import matplotlib.pyplot as plt
from google.colab import files
import pandas as pd

**Upload Images Function**

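Since Colab can't access your webcam directly, the notebook works from image files that you upload. Upload them to the Colab working directory under these names before running the cell:

*   selfie.jpg: selfie image (simulating a live camera capture)
*   id_front.jpg: ID front image
*   id_back.jpg: ID back image (optional)

The uploaded files are saved to Colab's temporary storage. The function below does not upload anything itself; it only checks that the files exist.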

In [4]:
def upload_id_images():
    """Reads pre-uploaded selfie and ID images for verification"""
    print("📱 LOOKING FOR YOUR IMAGES (Please ensure they are named selfie.jpg, id_front.jpg, and optionally id_back.jpg)")
    print("-" * 40)

    selfie_filename = 'selfie.jpg'
    id_front_filename = 'id_front.jpg'
    id_back_filename = 'id_back.jpg'

    if not os.path.exists(selfie_filename):
        print(f"❌ Error: '{selfie_filename}' not found. Please upload and rename your selfie image.")
        return None, None, None
    print(f"✓ Found: {selfie_filename}")

    if not os.path.exists(id_front_filename):
        print(f"❌ Error: '{id_front_filename}' not found. Please upload and rename your ID front image.")
        return None, None, None
    print(f"✓ Found: {id_front_filename}")

    if os.path.exists(id_back_filename):
        print(f"✓ Found: {id_back_filename}")
    else:
        id_back_filename = None
        print("✓ Skipped ID back image (file not found)")

    return selfie_filename, id_front_filename, id_back_filename

# Test upload function
# selfie_file, id_front_file, id_back_file = upload_id_images()

**Face Detection Function**

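Face detection process:

1.  Correct the image orientation from its EXIF tag, if one is present
2.  Load the image using face_recognition.load_image_file()
3.  Find faces, trying the HOG (Histogram of Oriented Gradients) detector first, then the CNN detector, then a contrast-enhanced retry
4.  Compute the face encoding (128-dimensional vector = face "fingerprint")
5.  Return the encoding, location, and cropped face image

Encodings of the same person tend to lie close together, so two faces can be compared by the distance between their encodings.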

In [5]:
def fix_image_orientation(image_path):
    """
    Checks image orientation and rotates if necessary. Returns path to fixed image.
    If no rotation needed, returns original path.
    """
    try:
        img = Image.open(image_path)
        # Read the EXIF orientation tag (0x112) through Pillow's public API
        exif = img.getexif()
        orientation = exif.get(0x112)

        if orientation == 3: # Rotated 180
            img = img.rotate(180, expand=True)
        elif orientation == 6: # Rotated 90 CW
            img = img.rotate(270, expand=True)
        elif orientation == 8: # Rotated 90 CCW
            img = img.rotate(90, expand=True)

        if orientation in [3, 6, 8]:
            # Insert "_fixed" before the extension so the original file is never overwritten,
            # whatever the extension is
            root, ext = os.path.splitext(image_path)
            fixed_path = f"{root}_fixed{ext}"
            img.save(fixed_path)
            print(f"✓ Fixed orientation for {image_path}, saved to {fixed_path}")
            return fixed_path
        else:
            return image_path

    except (AttributeError, KeyError, IndexError, TypeError) as e:
        # No EXIF data, or other error, assume correct orientation
        # print(f"Warning: Could not get EXIF orientation for {image_path}: {e}")
        return image_path

def detect_face_in_image(image_path):
    """
    Detect and extract face from an image using multiple robust methods.
    Returns: (encoding, location, face_image) or (None, None, None)
    """
    print(f"🔄 Attempting robust face detection for {image_path}...")

    # Step 1: Fix image orientation first
    processed_image_path = fix_image_orientation(image_path)

    # Load image (potentially fixed orientation)
    image = face_recognition.load_image_file(processed_image_path)

    best_face_location = None
    best_face_encoding = None

    # Define multiple detection strategies
    detection_strategies = [
        {"model": "hog", "upsample": 2, "name": "HOG (upsample 2x)"},
        {"model": "cnn", "upsample": 1, "name": "CNN (no upsample)"},
        {"model": "cnn", "upsample": 2, "name": "CNN (upsample 2x)"},
        {"model": "hog", "upsample": 1, "name": "HOG (no upsample)"} # Less aggressive for speed after others failed
    ]

    # Try each strategy
    for strategy in detection_strategies:
        model_name = strategy["model"]
        upsample_factor = strategy["upsample"]
        strategy_name = strategy["name"]

        try:
            face_locations = face_recognition.face_locations(
                image,
                number_of_times_to_upsample=upsample_factor,
                model=model_name
            )

            if len(face_locations) > 0:
                print(f"   ✓ Face detected using {strategy_name}.")
                # For simplicity, take the first detected face
                best_face_location = face_locations[0]
                best_face_encoding = face_recognition.face_encodings(image, [best_face_location])[0]
                break # Found a face, no need to try further strategies
        except Exception as e:
            print(f"   ✗ {strategy_name} failed: {e}")

    # If still no face, try image enhancement (contrast)
    if best_face_encoding is None:
        print("   Trying contrast enhancement...")
        pil_image = Image.fromarray(image)
        enhancer = ImageEnhance.Contrast(pil_image)
        enhanced_image_pil = enhancer.enhance(1.5) # Increase contrast
        enhanced_image_np = np.array(enhanced_image_pil)

        try:
            face_locations = face_recognition.face_locations(enhanced_image_np, model="hog", number_of_times_to_upsample=2)
            if len(face_locations) > 0:
                print("   ✓ Face detected after contrast enhancement.")
                best_face_location = face_locations[0]
                best_face_encoding = face_recognition.face_encodings(enhanced_image_np, [best_face_location])[0]
        except Exception as e:
            print(f"   ✗ Contrast enhancement detection failed: {e}")

    if best_face_encoding is None:
        print(f"⚠️ No face found in {image_path} after all attempts.")
        return None, None, None
    else:
        # face_recognition returns boxes as (top, right, bottom, left)
        top, right, bottom, left = best_face_location
        face_image = image[top:bottom, left:right]
        print(f"✓ Final face detected at position: {best_face_location} in {image_path}")
        return best_face_encoding, best_face_location, face_image
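
The fallback logic in `detect_face_in_image` follows a general pattern: try detectors in order and stop at the first one that returns a result. The sketch below isolates that pattern with stand-in detector functions so it runs without dlib. `first_successful`, `fake_hog`, and `fake_cnn` are hypothetical names used only for illustration; they are not part of face_recognition.

```python
def first_successful(strategies, image):
    """Try each (name, detector) pair in order; return (name, faces) for the
    first detector that finds at least one face, or (None, []) if none do."""
    for name, detector in strategies:
        try:
            faces = detector(image)
        except Exception as exc:  # a failing strategy should not abort the chain
            print(f"{name} failed: {exc}")
            continue
        if faces:
            return name, faces
    return None, []

# Stand-in detectors (assumptions for the demo): each returns a list of
# (top, right, bottom, left) boxes, the layout face_recognition uses.
def fake_hog(image):
    return []  # pretend HOG found nothing

def fake_cnn(image):
    return [(10, 60, 70, 5)]  # pretend CNN found one face

name, faces = first_successful([("HOG", fake_hog), ("CNN", fake_cnn)], image=None)
print(name, faces)
```

Ordering the list from cheapest to most expensive keeps the common case fast, since the slower detectors only run when the earlier ones find nothing.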

**Face Comparison Function**

Face matching algorithm:

1.  Calculate the Euclidean distance between the two face encodings
2.  Smaller distance = more similar faces
3.  Convert the distance to a percentage score (0-100%)
4.  Use a distance threshold of 0.6, the default tolerance in the face_recognition library
5.  Return the match status and similarity score

In [6]:
def compare_two_faces(face_encoding1, face_encoding2):
    """
    Compare two face encodings and calculate similarity

    Returns:
    - match: True if faces match
    - distance: Euclidean distance between encodings
    - similarity_score: Percentage similarity (0-100%)
    """
    if face_encoding1 is None or face_encoding2 is None:
        return False, None, 0

    # Calculate Euclidean distance
    # Lower distance = more similar
    distance = np.linalg.norm(face_encoding1 - face_encoding2)

    # Convert distance to similarity percentage
    # Using threshold of 0.6 (common in face recognition)
    threshold = 0.6
    similarity_score = max(0, 100 * (1 - distance/threshold))

    # Determine if it's a match
    match = distance < threshold

    return match, distance, similarity_score
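
Because the score is a linear rescaling of the distance, any score cutoff corresponds to a distance cutoff. The worked sketch below uses the same formula as `compare_two_faces` (the helper names are illustrative). It suggests that the 85% cutoff applied later in the verification step is equivalent to requiring a distance below about 0.09, which is far stricter than the 0.6 match threshold.

```python
THRESHOLD = 0.6  # same distance threshold as compare_two_faces

def score_from_distance(distance, threshold=THRESHOLD):
    """Similarity percentage: 100 at distance 0, 0 at or beyond the threshold."""
    return max(0.0, 100.0 * (1.0 - distance / threshold))

def distance_for_score(score, threshold=THRESHOLD):
    """Invert the formula: the distance that yields the given score (0-100)."""
    return threshold * (1.0 - score / 100.0)

for d in (0.0, 0.09, 0.3, 0.6, 0.8):
    print(f"distance {d:.2f} -> score {score_from_distance(d):.1f}%")

print(f"85% cutoff -> distance below {distance_for_score(85):.2f}")
```

In practice this means `match` can be True while the final decision is still FAILED, so the two values are worth reading together when tuning either threshold.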

**Visualize Face Comparison**

Create a visual comparison display showing:

*   Selfie face
*   ID face
*   Similarity score and match result

This helps users understand why verification passed or failed.

In [7]:
def display_face_comparison(selfie_face, id_face, similarity_score, match_status):
    """Display side-by-side comparison of faces"""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Display selfie face
    axes[0].imshow(selfie_face)
    axes[0].set_title("📸 Your Selfie", fontsize=14, fontweight='bold')
    axes[0].axis('off')

    # Display ID face
    axes[1].imshow(id_face)
    axes[1].set_title("🆔 ID Photo", fontsize=14, fontweight='bold')
    axes[1].axis('off')

    # Display result
    axes[2].text(0.5, 0.7, "FACE MATCH RESULT",
                 ha='center', va='center', fontsize=16, fontweight='bold')

    axes[2].text(0.5, 0.5, f"Similarity: {similarity_score:.1f}%",
                 ha='center', va='center', fontsize=14)

    if match_status:
        axes[2].text(0.5, 0.3, "✅ MATCH CONFIRMED",
                     ha='center', va='center', fontsize=14, color='green', fontweight='bold')
    else:
        axes[2].text(0.5, 0.3, "❌ NO MATCH",
                     ha='center', va='center', fontsize=14, color='red', fontweight='bold')

    # Show threshold info
    axes[2].text(0.5, 0.1, "Required: >85% similarity",
                 ha='center', va='center', fontsize=10)

    axes[2].axis('off')
    plt.tight_layout()
    plt.show()

**OCR Setup for ID Text Extraction**

OCR (Optical Character Recognition) setup:

*   Initialize the EasyOCR reader for English text
*   The first run downloads the model (~100MB)
*   The reader detects text in images and converts it to machine-readable text
*   We'll extract: Name, ID Number, Date of Birth

In [8]:
# Initialize OCR reader (this downloads model on first run)
print("🔄 Loading OCR engine... (first time may take a minute)")
ocr_reader = easyocr.Reader(['en'])
print("✅ OCR engine ready!")



🔄 Loading OCR engine... (first time may take a minute)
✅ OCR engine ready!


**Extract Text from ID**

Text extraction process:

1.  Use EasyOCR to read all text in the ID image
2.  Apply regex patterns to find specific information:
    *   Name: capitalized words (2-3 parts)
    *   ID Number: 6-10 digits or alphanumeric
    *   Date of Birth: DD/MM/YYYY or DD-MM-YYYY, optionally after a "DOB" or "Birth" label
3.  Return a structured dictionary of the extracted data

In [9]:
def extract_id_information(image_path, reader):
    """Extract name, ID number, and DOB from ID image"""

    # Read text from image using OCR
    print(f"🔍 Reading text from {image_path}...")
    ocr_results = reader.readtext(image_path)

    # Combine all detected text
    all_text = ' '.join([result[1] for result in ocr_results])
    print(f"📝 Raw text found: {all_text[:100]}...")

    # Initialize results dictionary
    extracted_data = {
        'full_name': None,
        'id_number': None,
        'date_of_birth': None,
        'raw_text': all_text
    }

    # PATTERN 1: Find Name (capitalized words, 2-3 parts)
    name_pattern = r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,2})\b'
    name_match = re.search(name_pattern, all_text)
    if name_match:
        extracted_data['full_name'] = name_match.group(1)
        print(f"✓ Found name: {extracted_data['full_name']}")

    # PATTERN 2: Find ID Number (common formats, tried in order)
    id_patterns = [
        r'\b\d{8}\b',  # 8 digits (Kenyan ID)
        r'\b\d{6,10}\b',  # 6-10 digits
        r'[A-Z0-9]{6,12}',  # Alphanumeric
        r'ID[:\s]\s*([A-Z0-9]+)',  # "ID: ABC123"
    ]

    for pattern in id_patterns:
        id_match = re.search(pattern, all_text)
        if id_match:
            # Prefer the captured group when the pattern has one ("ID: ABC123" -> "ABC123")
            extracted_data['id_number'] = id_match.group(1) if id_match.lastindex else id_match.group()
            print(f"✓ Found ID: {extracted_data['id_number']}")
            break

    # PATTERN 3: Find Date of Birth
    dob_patterns = [
        r'\b\d{2}/\d{2}/\d{4}\b',  # DD/MM/YYYY
        r'\b\d{2}-\d{2}-\d{4}\b',  # DD-MM-YYYY
        r'DOB[:\s]\s*(\d{2}[\/\-]\d{2}[\/\-]\d{4})',  # "DOB: 01/01/1990"
        r'Birth[:\s]\s*(\d{2}[\/\-]\d{2}[\/\-]\d{4})',  # "Birth: 01/01/1990"
    ]

    for pattern in dob_patterns:
        dob_match = re.search(pattern, all_text)
        if dob_match:
            # Prefer the captured group so the "DOB:" / "Birth:" label is not stored
            extracted_data['date_of_birth'] = dob_match.group(1) if dob_match.lastindex else dob_match.group()
            print(f"✓ Found DOB: {extracted_data['date_of_birth']}")
            break

    return extracted_data
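
A regex match only confirms that the text looks like a date. One way to check that it is a real calendar date is to parse it with `datetime.strptime`. This is a sketch that assumes day-first dates, as in the patterns above; `parse_dob` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime

def parse_dob(text):
    """Parse a DD/MM/YYYY or DD-MM-YYYY string; return an ISO date string or None."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # wrong separator or impossible date, try the next format
    return None

print(parse_dob("01/02/1990"))  # 1990-02-01
print(parse_dob("31-02-1990"))  # None (February has no 31st)
```

Storing the ISO form also makes the date sortable and unambiguous if it is later written to the JSON or CSV output.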

**Display Extracted Information**

Show extracted data in a user-friendly format:

*   Display each extracted field
*   Show a confidence level based on how many fields were found
*   Format for easy reading

In [10]:
def show_extracted_info(info_dict):
    """Display extracted information in readable format"""
    print("\n" + "=" * 50)
    print("📄 EXTRACTED ID INFORMATION")
    print("=" * 50)

    print(f"\n👤 Name: {info_dict['full_name'] or 'Not found'}")
    print(f"🆔 ID Number: {info_dict['id_number'] or 'Not found'}")
    print(f"🎂 Date of Birth: {info_dict['date_of_birth'] or 'Not found'}")

    # Calculate extraction confidence
    fields_found = sum(1 for key in ['full_name', 'id_number', 'date_of_birth']
                      if info_dict[key])

    print(f"\n📊 Extraction Confidence: {fields_found}/3 fields found")

    if fields_found == 3:
        print("✅ All required information extracted successfully!")
    elif fields_found >= 2:
        print("⚠️ Most information extracted (check missing fields)")
    else:
        print("❌ Poor extraction - check image quality")

    print("=" * 50)

**Complete Verification Function**

Main verification pipeline that combines:

1.  Face detection from both images
2.  Face similarity calculation
3.  ID text extraction
4.  Final decision based on >85% similarity threshold

Returns a comprehensive verification result.

In [11]:
def verify_identity(selfie_path, id_front_path, id_back_path=None):
    """
    Complete identity verification process

    Returns dictionary with:
    - verification_status: PASSED/FAILED
    - similarity_score: Face match percentage
    - extracted_info: ID data
    - timestamp: When verification happened
    """

    print("\n" + "=" * 60)
    print("🔐 STARTING IDENTITY VERIFICATION")
    print("=" * 60)

    # STEP 1: Detect faces
    print("\n1️⃣ DETECTING FACES...")
    # detect_face_in_image handles multiple detection strategies internally
    selfie_encoding, _, selfie_face = detect_face_in_image(selfie_path)
    id_encoding, _, id_face = detect_face_in_image(id_front_path)

    if selfie_encoding is None or id_encoding is None:
        print("❌ Face detection failed for one or both images. Please retry with clearer images.")
        return None

    # STEP 2: Compare faces
    print("\n2️⃣ COMPARING FACES...")
    match, distance, similarity_score = compare_two_faces(selfie_encoding, id_encoding)

    print(f"   Face Distance: {distance:.4f}")
    print(f"   Similarity Score: {similarity_score:.1f}%")
    print(f"   Match Found: {'Yes' if match else 'No'}")

    # STEP 3: Extract ID information
    print("\n3️⃣ EXTRACTING ID TEXT...")
    id_info = extract_id_information(id_front_path, ocr_reader)

    # Extract from back if provided
    if id_back_path and os.path.exists(id_back_path):
        print("   Extracting from ID back...")
        back_info = extract_id_information(id_back_path, ocr_reader)

        # Merge info (use front as priority)
        for key in ['full_name', 'id_number', 'date_of_birth']:
            if not id_info[key] and back_info[key]:
                id_info[key] = back_info[key]

    # STEP 4: Display results
    print("\n4️⃣ DISPLAYING RESULTS...")
    display_face_comparison(selfie_face, id_face, similarity_score, match)
    show_extracted_info(id_info)

    # STEP 5: Make final decision
    print("\n5️⃣ FINAL VERIFICATION DECISION")
    print("-" * 40)

    verification_passed = similarity_score > 85

    if verification_passed:
        print("✅ VERIFICATION PASSED!")
        # The key always exists but may hold None, so fall back with "or"
        print(f"   You are verified as: {id_info.get('full_name') or 'Unknown'}")
        status = "PASSED"
    else:
        print("❌ VERIFICATION FAILED")
        print(f"   Similarity score ({similarity_score:.1f}%) below 85% threshold")
        status = "FAILED"

    # Compile results
    verification_result = {
        'verification_status': status,
        'face_similarity_score': float(similarity_score),
        'face_match': bool(match),
        'extracted_info': id_info,
        'verification_timestamp': datetime.now().isoformat(),
        'required_threshold': 85.0
    }

    print("\n" + "=" * 60)
    print("✅ VERIFICATION COMPLETE")
    print("=" * 60)

    return verification_result

**Save Results Function**

Save verification results for:

*   Future reference
*   Integration with Phase 2
*   Audit trail

Saves as both JSON (detailed) and CSV (summary).

In [12]:
def save_verification_results(results, output_dir='verification_results'):
    """Save verification results to files"""

    # Create directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Generate unique ID
    user_id = f"user_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    # Save as JSON (full details)
    json_filename = f"{output_dir}/{user_id}_full_results.json"
    with open(json_filename, 'w') as f:
        json.dump(results, f, indent=2)

    # Save as CSV (summary)
    csv_data = {
        'user_id': user_id,
        'timestamp': results['verification_timestamp'],
        'status': results['verification_status'],
        'similarity_score': results['face_similarity_score'],
        'name': results['extracted_info'].get('full_name', ''),
        'id_number': results['extracted_info'].get('id_number', ''),
        'dob': results['extracted_info'].get('date_of_birth', '')
    }

    csv_filename = f"{output_dir}/{user_id}_summary.csv"
    df = pd.DataFrame([csv_data])
    df.to_csv(csv_filename, index=False)

    print("\n💾 Results saved:")
    print(f"   📄 Full details: {json_filename}")
    print(f"   📊 Summary: {csv_filename}")

    # Also save data for next phase
    next_phase_data = {
        'user_id': user_id,
        'verified_name': csv_data['name'],
        'verified_id_number': csv_data['id_number'],
        'verification_score': csv_data['similarity_score'],
        'verification_time': csv_data['timestamp']
    }

    with open('next_phase_data.json', 'w') as f:
        json.dump(next_phase_data, f, indent=2)

    print("   🔗 Next phase data: next_phase_data.json")

    return json_filename, csv_filename
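
For the handoff to the next phase, a consumer would read `next_phase_data.json` back and check that the expected fields are present. The sketch below is an assumption about how that could look, not part of the original notebook: `load_next_phase_data` is a hypothetical helper, and the key names are the ones written by `save_verification_results`.

```python
import json

# Keys written to next_phase_data.json by save_verification_results.
EXPECTED_KEYS = {"user_id", "verified_name", "verified_id_number",
                 "verification_score", "verification_time"}

def load_next_phase_data(path="next_phase_data.json"):
    """Load the handoff file and raise ValueError if any expected key is missing."""
    with open(path) as f:
        data = json.load(f)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Missing keys in {path}: {sorted(missing)}")
    return data
```

Failing early on a missing key keeps a partially written or outdated file from flowing silently into the next phase.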

**Main Execution Function**

Complete workflow that users will run:

1.  Upload images
2.  Run verification
3.  Save results
4.  Prepare for next phase

This is the function users should call.

In [None]:
def run_complete_verification():
    """Main function to run the entire verification process"""

    print("👋 WELCOME TO IDENTITY VERIFICATION SYSTEM")
    print("=" * 50)

    # Step 1: Upload images
    print("\n📤 STEP 1: Using Pre-Uploaded Images")
    selfie_path, id_front_path, id_back_path = upload_id_images()

    if selfie_path is None or id_front_path is None:
        print("❌ Image files missing. Please ensure 'selfie.jpg' and 'id_front.jpg' are uploaded and correctly named.")
        return None

    # Step 2: Run verification
    print("\n🔍 STEP 2: Verifying Identity...")
    results = verify_identity(selfie_path, id_front_path, id_back_path)

    if results is None:
        print("❌ Verification failed. Please try again.")
        return None

    # Step 3: Save results
    print("\n💾 STEP 3: Saving Results...")
    json_file, csv_file = save_verification_results(results)

    # Step 4: Next steps
    print("\n🚀 STEP 4: Next Steps")
    print("-" * 30)

    if results['verification_status'] == 'PASSED':
        print("✅ Ready for Phase 2: Financial Data Collection")
        # The key always exists but may hold None, so fall back with "or"
        print(f"   User: {results['extracted_info'].get('full_name') or 'Unknown'}")
        print(f"   Score: {results['face_similarity_score']:.1f}%")
        print("\n📋 Next phase will collect:")
        print("   • M-Pesa transaction history")
        print("   • Airtime usage data")
        print("   • Financial behavior analysis")
    else:
        print("❌ Cannot proceed to Phase 2")
        print("   Please retry verification with:")
        print("   1. Better lighting")
        print("   2. Clearer ID photo")
        print("   3. Straight-facing selfie")

    return results

# ============================================
# 🚀 RUN THE COMPLETE VERIFICATION
# ============================================
print("Ready to run identity verification!")
verification_results = run_complete_verification()

Ready to run identity verification!
👋 WELCOME TO IDENTITY VERIFICATION SYSTEM

📤 STEP 1: Using Pre-Uploaded Images
📱 LOOKING FOR YOUR IMAGES (Please ensure they are named selfie.jpg, id_front.jpg, and optionally id_back.jpg)
----------------------------------------
✓ Found: selfie.jpg
✓ Found: id_front.jpg
✓ Found: id_back.jpg

🔍 STEP 2: Verifying Identity...

🔐 STARTING IDENTITY VERIFICATION

1️⃣ DETECTING FACES...
🔄 Attempting robust face detection for selfie.jpg...
✓ Fixed orientation for selfie.jpg, saved to selfie_fixed.jpg
   ✓ Face detected using HOG (upsample 2x).
✓ Final face detected at position: (640, 1412, 1634, 419) in selfie.jpg
🔄 Attempting robust face detection for id_front.jpg...


### Reviewing the problematic Selfie Image

Here is the selfie image that caused the 'No face found' error:

In [None]:
from PIL import Image
import matplotlib.pyplot as plt

# Path to the selfie image that failed detection
selfie_fail_path = 'selfie.jpg'

if os.path.exists(selfie_fail_path):
    img = Image.open(selfie_fail_path)
    plt.imshow(img)
    plt.title('Selfie Image (Failed Face Detection)')
    plt.axis('off')
    plt.show()
else:
    print(f"Error: The file {selfie_fail_path} was not found.")

### Why Face Detection Might Fail

Face detection models, while powerful, can be sensitive to various factors. If the system reported 'No face found' even when you believe your face was clearly visible, here are some common reasons:

*   **Poor Lighting:** Insufficient or harsh lighting can obscure facial features.
*   **Angles and Pose:** Extreme angles, looking significantly away from the camera, or having part of your face covered can hinder detection.
*   **Obstructions:** Hair, glasses (especially reflective ones), hats, masks, or hands covering parts of the face can make detection difficult.
*   **Image Resolution and Quality:** Low-resolution images, blurry photos, or highly compressed images may lack the detail needed for accurate detection.
*   **Distance from Camera:** Being too far or too close to the camera can affect how clearly facial features are captured.
*   **Background Clutter:** A busy or complex background might confuse the detection algorithm, making it harder to isolate the face.

To improve detection, please ensure your selfie:

1.  **Has good, even lighting.**
2.  **Shows your face clearly, looking straight at the camera.**
3.  **Is free from obstructions (hair, accessories, etc.).**
4.  **Is sharp and of reasonable quality.**

Please retry the verification process with an image that meets these guidelines.