## System Design

### Overview

The system identifies personnel from face images. It targets four goals: high identification accuracy, adaptability to varying lighting conditions, scalability, and dynamic access control. The architecture is modular, so each component can be maintained and updated independently.

### Meeting Requirements

The Visual Identification System is designed to meet the following requirements:

1.  **High Identification Rate and Low False Positive Rate**:
    -   The system uses deep learning models to produce face embeddings, aiming for a high identification rate.
    -   Cosine similarity and Euclidean distance measure how close two embeddings are, which lets the system separate likely matches from non-matches and limit false positives.
2.  **Adaptability to Different Brightness Conditions**:
    -   The preprocessing module includes normalization and transformation steps to handle different lighting conditions, so the model receives consistent input.
3.  **Dynamic Access Control**:
    -   The system provides endpoints to dynamically add or remove identities from the gallery, keeping access control up to date.
    -   The KD-Tree index is updated whenever the gallery changes, so searches reflect the current set of identities.
4.  **Scalability**:
    -   The indexing module uses a KD-Tree for nearest-neighbor search so that lookups avoid scanning every entry as the number of personnel grows toward millions. KD-Tree search is most effective on low-dimensional vectors, so search time should be measured at the chosen embedding size.

### Detailed Components and Processes

#### 1. Preprocessing Module

**Function**: Prepares raw images for ingestion by the deep learning model.

-   **Steps**:
    1.  **Image Resizing**: Ensures all images are resized to a consistent dimension (e.g., 224x224 or 64x64 pixels) to match the input requirements of the model.
    2.  **Normalization**: Adjusts pixel values to a standard range (typically [0, 1] or [-1, 1]) to improve model performance and stability.
    3.  **Data Augmentation**: (Optional) Applies transformations like rotations, flips, and brightness adjustments to make the model more robust to variations.

- **Diagram**:

	`[Raw Image] ---> [Resize] ---> [Normalize] ---> [Preprocessed Image]` 
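
The resize and normalize steps above can be sketched without any imaging library. This is a minimal illustration, not the system's actual preprocessing code: it assumes a grayscale image stored as rows of 0-255 integers, uses nearest-neighbor sampling for the resize, and the names `resize_nearest`, `normalize`, and `preprocess` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed layout: rows of grayscale pixels in 0..255


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize by nearest-neighbor sampling (an assumed interpolation choice)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image: Image, signed: bool = False) -> List[List[float]]:
    """Scale 0..255 pixels to [0, 1], or to [-1, 1] when signed=True."""
    if signed:
        return [[p / 127.5 - 1.0 for p in row] for row in image]
    return [[p / 255.0 for p in row] for row in image]


def preprocess(
    image: Image, size: Tuple[int, int] = (224, 224), signed: bool = False
) -> List[List[float]]:
    """Resize, then normalize, mirroring the diagram above."""
    return normalize(resize_nearest(image, size[0], size[1]), signed=signed)
```

Calling `preprocess(img, size=(64, 64), signed=True)` would produce a 64x64 grid of values in [-1, 1]; a real pipeline would more likely use an image library with a smoother interpolation method.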

#### 2. Embedding Module

**Function**: Extracts feature embeddings from preprocessed images using a deep learning model.

-   **Model**: Typically a convolutional neural network (CNN) like ResNet.
-   **Process**:
    1.  The preprocessed image is fed into the CNN.
    2.  The network processes the image through several layers, extracting hierarchical features.
    3.  The output of the network's final feature layer is a feature vector (embedding) representing the image in a high-dimensional space.

- **Diagram**:

	`[Preprocessed Image] ---> [CNN (e.g., ResNet)] ---> [Embedding]` 

#### 3. Indexing Module

**Function**: Indexes embeddings for efficient similarity search.

-   **Data Structure**: KD-Tree
-   **Process**:
    1.  **Insertion**: New embeddings are inserted into the KD-Tree.
    2.  **Organization**: The KD-Tree organizes embeddings in a way that allows efficient nearest neighbor search.
    3.  **Updates**: Handles dynamic updates, allowing addition and removal of embeddings.

- **Diagram**:

	`[Embedding] ---> [KD-Tree] ---> [Indexed Embeddings]` 
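
To make the insertion, organization, and update steps concrete, here is a small pure-Python KD-Tree sketch. It is an illustration under stated assumptions rather than the system's index: the class and method names are hypothetical, insertion does not rebalance the tree, and removal is done by marking nodes as deleted (a tombstone), which the source does not specify.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    point: Tuple[float, ...]
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    deleted: bool = False  # tombstone flag (assumed removal strategy)


class KDTree:
    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.root: Optional[Node] = None

    def insert(self, point: Sequence[float], label: str) -> None:
        """Walk down the tree, cycling the split axis, and attach a new leaf."""
        node = Node(tuple(point), label)
        if self.root is None:
            self.root = node
            return
        cur, depth = self.root, 0
        while True:
            axis = depth % self.dim
            side = "left" if node.point[axis] < cur.point[axis] else "right"
            child = getattr(cur, side)
            if child is None:
                setattr(cur, side, node)
                return
            cur, depth = child, depth + 1

    def remove(self, label: str) -> int:
        """Mark every live entry with this label as deleted; return the count."""
        count, stack = 0, [self.root]
        while stack:
            n = stack.pop()
            if n is None:
                continue
            if n.label == label and not n.deleted:
                n.deleted = True
                count += 1
            stack.extend((n.left, n.right))
        return count

    def nearest(self, query: Sequence[float]) -> Tuple[Optional[str], float]:
        """Return (label, Euclidean distance) of the closest live entry."""
        best_label, best_sq = None, math.inf

        def visit(node: Optional[Node], depth: int) -> None:
            nonlocal best_label, best_sq
            if node is None:
                return
            if not node.deleted:
                sq = sum((a - b) ** 2 for a, b in zip(query, node.point))
                if sq < best_sq:
                    best_label, best_sq = node.label, sq
            axis = depth % self.dim
            diff = query[axis] - node.point[axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            visit(near, depth + 1)
            if diff * diff < best_sq:  # far side can still hold a closer point
                visit(far, depth + 1)

        visit(self.root, 0)
        return best_label, math.sqrt(best_sq)
```

Because tombstoned nodes stay in the tree and inserts never rebalance it, a long-running deployment of a design like this would need periodic rebuilds to keep searches fast.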

#### 4. Retrieval Module

**Function**: Matches probe images with identities in the indexed gallery.

-   **Similarity Metrics**: Uses metrics like Euclidean distance or cosine similarity to find the closest matches.
-   **Process**:
    1.  The probe image is preprocessed and passed through the embedding module.
    2.  The resulting embedding is compared against the indexed embeddings using the KD-Tree.
    3.  The k-nearest neighbors are identified and returned as potential matches.

- **Diagram**:

	`[Probe Image] ---> [Embedding] ---> [KD-Tree Search] ---> [k-Nearest Neighbors]` 
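
The two similarity metrics named above are closely related: for unit-length embeddings, squared Euclidean distance equals `2 - 2 * cosine_similarity`, so ranking by one is equivalent to ranking by the other. The sketch below shows both metrics and a brute-force k-nearest-neighbor lookup; the brute-force scan stands in for the KD-Tree search purely for illustration, and all function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(
    query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Brute-force stand-in for the KD-Tree search: k closest (name, distance)."""
    scored = ((name, euclidean_distance(query, emb)) for name, emb in gallery.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

One consequence of this relationship is that normalizing embeddings to unit length before indexing lets a Euclidean index return the same ordering that cosine similarity would.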

#### 5. Interface Module

**Function**: Provides an API for interaction with the system.

-   **Endpoints**:
    1.  **Authenticate**: Takes an image, processes it, and returns the k-nearest neighbors.
    2.  **Add Identity**: Adds a new image to the gallery and updates the KD-Tree.
    3.  **Remove Identity**: Removes an image from the gallery and updates the KD-Tree.
    4.  **Access Logs**: Retrieves access logs for a specified period.
    5.  **Change Model**: Updates the model architecture and image size, reinitializing the pipeline.

- **Diagram**:

	`[API Request] ---> [Interface Module] ---> [Internal Process] ---> [API Response]` 

### Interaction Flow

1.  **Authentication Process**:
    
    -   User sends an image to the `/authenticate` endpoint.
    -   The image is preprocessed and embedded.
    -   The KD-Tree is searched for the gallery embeddings nearest to the probe embedding.
    -   The system returns the closest matches.
2.  **Adding an Identity**:
    
    -   User sends an image to the `/add_identity` endpoint.
    -   The image is preprocessed and embedded.
    -   The embedding is added to the KD-Tree.
    -   The system updates the gallery and returns a confirmation.
3.  **Removing an Identity**:
    
    -   User sends a request to the `/remove_identity` endpoint with identity details.
    -   The corresponding embedding is removed from the KD-Tree.
    -   The system updates the gallery and returns a confirmation.
4.  **Changing the Model**:
    
    -   User sends new model parameters to the `/change_model` endpoint.
    -   The system reinitializes the pipeline with the new model.
    -   Embeddings are recomputed, and the KD-Tree is rebuilt.
5.  **Access Logs**:
    
    -   User sends a request to the `/access_logs` endpoint with a time range.
    -   The system retrieves and returns the relevant logs.
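
The five flows above can be sketched as methods on a single in-memory service object. This is a simplified illustration, not the system's API layer: `IdentityService` and its method signatures are hypothetical, the embedding function is injected by the caller, a linear scan replaces the KD-Tree, and `change_model` assumes the caller supplies the source images because this sketch stores only embeddings.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[object], Sequence[float]]


class IdentityService:
    def __init__(self, embed: Embedder) -> None:
        self.embed = embed
        self.gallery: Dict[str, List[float]] = {}
        self.access_log: List[Tuple[float, str]] = []  # (timestamp, best match)

    def add_identity(self, name: str, image: object) -> Dict[str, str]:
        self.gallery[name] = list(self.embed(image))
        return {"status": "added", "identity": name}

    def remove_identity(self, name: str) -> Dict[str, str]:
        removed = self.gallery.pop(name, None) is not None
        return {"status": "removed" if removed else "not_found", "identity": name}

    def authenticate(self, image: object, k: int = 3) -> List[str]:
        """Embed the probe and return the names of the k closest identities."""
        query = list(self.embed(image))
        ranked = sorted(self.gallery.items(), key=lambda kv: math.dist(query, kv[1]))
        matches = [name for name, _ in ranked[:k]]
        self.access_log.append((time.time(), matches[0] if matches else ""))
        return matches

    def access_logs(self, start: float, end: float) -> List[Tuple[float, str]]:
        return [entry for entry in self.access_log if start <= entry[0] <= end]

    def change_model(self, embed: Embedder, images: Dict[str, object]) -> None:
        """Swap the embedder and recompute every gallery embedding."""
        self.embed = embed
        self.gallery = {name: list(embed(img)) for name, img in images.items()}
```

In a deployed system each method would sit behind the corresponding endpoint (`/authenticate`, `/add_identity`, and so on), with the web framework handling request parsing and responses.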

**Complete System Diagram**:

![System Architecture Diagram](system_diagram.png)


## Data, Data Pipelines, and Model

### Data

The data comprises images of personnel stored in directories named after each individual. Each directory corresponds to one individual and holds that person's images, each identified by an image ID.

-   **Storage Structure**:
    -   Gallery: `storage/gallery/<First_Name>_<Last_Name>/<Image_ID>.jpg`
    -   Embeddings: `storage/embedding/<Model_Name>/<First_Name>_<Last_Name>/<Image_ID>.npy`
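
A pair of small helpers can keep these path conventions in one place. The function names and the example values (`resnet18`, `0001`) below are hypothetical; only the directory layout comes from the structure listed above.

```python
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("storage")


def gallery_path(first_name: str, last_name: str, image_id: str) -> PurePosixPath:
    """Path of a gallery image: storage/gallery/<First>_<Last>/<Image_ID>.jpg"""
    return STORAGE_ROOT / "gallery" / f"{first_name}_{last_name}" / f"{image_id}.jpg"


def embedding_path(
    model_name: str, first_name: str, last_name: str, image_id: str
) -> PurePosixPath:
    """Path of a stored embedding, kept in a separate directory per model."""
    return (
        STORAGE_ROOT
        / "embedding"
        / model_name
        / f"{first_name}_{last_name}"
        / f"{image_id}.npy"
    )
```

Keeping embeddings under a per-model directory means a model change can write a fresh set of embeddings without overwriting the previous ones.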

### Data Pipelines

1.  **Preprocessing Pipeline**:
    
    -   Input: Raw image
    -   Output: Preprocessed image
    -   Steps: Resize, normalize, (optional) data augmentation.
2.  **Embedding Pipeline**:
    
    -   Input: Preprocessed image
    -   Output: Embedding vector
    -   Steps: Feed image into CNN, extract embedding from final layer.
3.  **Indexing Pipeline**:
    
    -   Input: Embedding vector
    -   Output: Indexed embeddings in KD-Tree
    -   Steps: Insert embedding into KD-Tree, organize for efficient search.

### Model

-   **ResNet Architecture**:
    -   Variants: ResNet-18, ResNet-34
    -   Function: Extracts feature embeddings from images.
    -   Characteristics: Balances accuracy and computational efficiency.

## Metrics Definition

### Offline Metrics

1.  **Precision at k (P@k)**:
    
    -   Measures the proportion of relevant instances among the top k retrieved results.
    -   Purpose: Evaluates the model's accuracy in retrieving relevant identities.
2.  **Recall at k (R@k)**:
    
    -   Measures the proportion of all relevant instances that appear among the top k retrieved results.
    -   Purpose: Ensures the model retrieves a sufficient number of relevant identities.
3.  **Mean Reciprocal Rank (MRR)**:
    
    -   Measures the average, over all queries, of the reciprocal of the rank at which the first relevant result is retrieved (1 for rank 1, 1/2 for rank 2, and so on).
    -   Purpose: Evaluates the ranking quality of the retrieval system.
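
The three offline metrics can be computed directly from ranked result lists. The sketch below assumes each query yields an ordered list of retrieved identity labels and a set of relevant labels; the function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for item in retrieved[:k] if item in relevant) / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of 1/rank of the first relevant result (0 if none is found)."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, item in enumerate(retrieved, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```

For example, if the correct identity appears second in one query and first in another, the reciprocal ranks are 1/2 and 1, giving an MRR of 0.75.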

### Online Metrics

1.  **Authentication Success Rate**:
    
    -   Measures the proportion of successful authentications out of total authentication attempts.
    -   Purpose: Monitors the real-time performance of the system.
2.  **Average Retrieval Time**:
    
    -   Measures the mean time taken to retrieve the top k results for a probe image.
    -   Purpose: Ensures the system meets performance requirements for real-time use.
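
Both online metrics can be derived from per-request records. The record shape below (`AuthEvent` with `success` and `retrieval_seconds` fields) is an assumption made for illustration; the source does not define a log schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class AuthEvent:
    success: bool             # whether the authentication attempt succeeded
    retrieval_seconds: float  # time spent retrieving the top-k results


def authentication_success_rate(events: Sequence[AuthEvent]) -> float:
    """Successful attempts divided by total attempts."""
    return sum(1 for e in events if e.success) / len(events)


def average_retrieval_time(events: Sequence[AuthEvent]) -> float:
    """Mean retrieval time in seconds across all attempts."""
    return mean([e.retrieval_seconds for e in events])
```

Both functions expect at least one event; a production version would need to decide what to report for an empty window.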

## Post-deployment Policies

### Monitoring and Maintenance Plan

1.  **Regular Performance Monitoring**:
    
    -   Monitor authentication success rates and retrieval times.
    -   Set up alerts for significant drops in performance metrics.
2.  **Periodic Model Updates**:
    
    -   Regularly update the model with new training data to maintain accuracy.
    -   Recompute embeddings for all identities after model updates.
3.  **System Health Checks**:
    
    -   Perform routine checks on the KD-Tree structure and database integrity.
    -   Ensure the system's dependencies are up to date and secure.
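
One way to turn the monitoring items above into an alert is a rolling-window check on the authentication success rate. This is a sketch only: the class name, window size, and threshold are hypothetical values, and a real deployment would more likely rely on an existing monitoring stack.

```python
from collections import deque
from typing import Deque


class SuccessRateMonitor:
    """Flags when the success rate over the last `window` attempts drops too low."""

    def __init__(self, window: int = 100, threshold: float = 0.9) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, success: bool) -> bool:
        """Record one attempt; return True when an alert should fire."""
        self.outcomes.append(success)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        rate = sum(self.outcomes) / len(self.outcomes)
        # Wait for a full window so a single early failure does not trigger an alert.
        return window_full and rate < self.threshold
```

The same pattern would apply to retrieval time by tracking durations instead of booleans and comparing the window mean against a latency budget.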

### Fault Mitigation Strategies

1.  **Redundancy and Failover**:
    
    -   Implement redundant servers and failover mechanisms to ensure high availability.
    -   Back up the KD-Tree and embeddings regularly to prevent data loss.
2.  **Graceful Degradation**:
    
    -   Ensure the system can handle partial failures without complete downtime.
    -   Provide fallback mechanisms that switch to a simpler model or retrieval method when the primary one fails.
3.  **Real-time Error Reporting**:
    
    -   Implement real-time error logging and reporting.
    -   Send immediate notifications to the maintenance team for critical errors.
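
Graceful degradation and error reporting can be combined in a small wrapper that logs a failure and then falls back to a simpler retrieval function. The wrapper name `with_fallback` and the logger name are hypothetical; the sketch assumes both functions accept the same arguments.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("visual_id")  # assumed logger name


def with_fallback(
    primary: Callable[..., Any], fallback: Callable[..., Any]
) -> Callable[..., Any]:
    """Return a function that tries `primary` and uses `fallback` on any error."""

    def run(*args: Any, **kwargs: Any) -> Any:
        try:
            return primary(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback, which an alerting
            # handler could forward to the maintenance team.
            logger.exception("Primary retrieval failed; using fallback")
            return fallback(*args, **kwargs)

    return run
```

For instance, a KD-Tree search could be wrapped with a linear scan as the fallback, trading speed for availability while the index is being repaired.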