Skip to content

nawedarshad/TrueVision-TrueSight

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TrueVision-TrueSight

Overview

The Deepfake Detection System is a sophisticated project aimed at identifying and mitigating the risks associated with deepfake technology. As deepfakes become increasingly sophisticated, it is essential to develop robust methods for detection and analysis. This project leverages advanced machine learning techniques, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and ensemble methods, to deliver a reliable deepfake detection solution.

The system focuses on analyzing both visual and audio components of media files, employing innovative techniques such as multimodal detection and blockchain verification to enhance accuracy and transparency.

System Architecture

The architecture of the deepfake detection system consists of several interrelated components designed to effectively extract, analyze, and classify media content. The system can be divided into the following layers:

Flowchart

System Architecture

1. Feature Extraction Layer

This layer is responsible for extracting relevant features from both video and audio inputs:

  • Visual Features: The system employs various facial action detectors, including the fusion of face fake detectors, to analyze frames and extract critical visual features. Models such as FABNet and Vision Transformers play a crucial role in this stage.

    Let ( I ) denote an input image. The visual features can be extracted using a convolutional operation:

    [ F(I) = W * I + b ]

    where ( W ) is the filter (kernel) and ( b ) is the bias term. The resulting features are then passed through non-linear activation functions, typically ReLU:

    [ F'(I) = \max(0, F(I)) ]

  • Audio Features: The audio input is processed using Fast Fourier Transform (FFT) for frequency analysis and HuBERT for advanced audio feature extraction. The FFT of a discrete signal ( x[n] ) is defined as:

    [ X(k) = \sum_{n=0}^{N-1} x[n] e^{-j(2\pi/N)kn} ]

    This allows the system to capture discrepancies between audio and visual components.

2. Attention Layer

The attention layer utilizes Perceiver IO to identify complex interactions between different modalities (audio and visual). The attention mechanism can be mathematically represented as follows:

[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right)V ]

where ( Q ), ( K ), and ( V ) represent the query, key, and value matrices, respectively, and ( d_k ) is the dimensionality of the key vectors. This layer is crucial for understanding the relationship between what is seen and heard in the media, facilitating more accurate detection.

3. Behaviour & Appearance Layer

In this layer, the system employs an ensemble of models, including Random Forest, XGBoost, and Support Vector Machines (SVM), to classify the media as real or fake. The decision boundary for SVM can be expressed as:

[ f(x) = w^T x + b ]

where ( w ) represents the weights, ( b ) the bias, and ( x ) is the input feature vector. This ensemble approach leverages the strengths of various algorithms, enhancing the overall performance of the detection system.

4. Temporal Modelling Layer

Temporal analysis is critical for understanding how media evolves over time. The system uses techniques such as Long Short-Term Memory (LSTM) networks and WaveNet for temporal sequence analysis. An LSTM cell can be defined by the following equations:

[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad \text{(Forget Gate)} ]

[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad \text{(Input Gate)} ]

[ \tilde{C}t = \tanh(W_C \cdot [h{t-1}, x_t] + b_C) \quad \text{(Cell State)} ]

[ C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \quad \text{(Cell State Update)} ]

[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad \text{(Output Gate)} ]

[ h_t = o_t \cdot \tanh(C_t) \quad \text{(Hidden State)} ]

This allows the detection of inconsistencies that may arise in videos due to manipulation.

5. Additional Detection Layer

To further enhance the system's capabilities, this layer incorporates various techniques, including:

  • Frequency Analysis: Analyzing the frequency patterns to detect alterations. For instance, the power spectral density ( P(f) ) can be computed to understand the signal's distribution over frequency.

  • Biometric Verification: Ensuring the identity of the individuals in the video by comparing biometric data (like facial features) against a database.

  • Hybrid Model: Combining different detection approaches for improved accuracy through techniques like stacking or blending classifiers.

  • Audio-Visual Inconsistencies: Identifying discrepancies between audio and video content, quantified using measures such as cross-correlation.

6. Triplet Loss Layer

Utilized for improving the embedding quality of the data, the triplet loss function helps in optimizing the model to distinguish between real and fake instances effectively. The triplet loss is defined as:

[ L(a, p, n) = \max\left(0, |f(a) - f(p)|^2 - |f(a) - f(n)|^2 + \alpha\right) ]

where ( a ) is the anchor, ( p ) is the positive sample, ( n ) is the negative sample, and ( \alpha ) is the margin. This loss function encourages the model to pull together embeddings of similar instances while pushing apart embeddings of dissimilar instances.

Installation

To set up the Deepfake Detection System on your local machine, follow these steps:

  1. Clone the Repository:
    git clone https://github.com/yourusername/deepfake-detection-system.git
    cd deepfake-detection-system
    
    

Installation

To set up the Deepfake Detection System on your local machine, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/nawedarshad/TrueVision-TrueSight.git
    cd TrueVision-TrueSight
  2. Install Required Packages: Ensure you have Python installed, then install the necessary dependencies:

    pip install -r requirements.txt
  3. Prepare Your Dataset: Download your sample videos and audio files and place them in the data/sample_videos/ and data/sample_audio/ directories, respectively.

Usage

To run the Deepfake Detection System, execute the following command in your terminal:

python src/main.py

The results will be saved in the results/ directory, where you can review the detections.

Future Directions

As deepfake technology continues to evolve, this project aims to stay at the forefront of detection capabilities. Future enhancements may include:

  • Integrating New Detection Models: Exploring the latest advancements in deep learning for enhanced detection accuracy.
  • Real-Time Processing: Implementing real-time detection capabilities to identify deepfakes as they are created.
  • User Interface: Developing a user-friendly interface to facilitate easier interaction with the system.
  • Community Contributions: Encouraging collaboration and contributions from researchers and developers in the field.

Disclaimer

This project is intended for research purposes only. The development of deepfake detection technologies aims to promote ethical AI use and combat misinformation. Users are encouraged to apply these methods responsibly.

For more information, please refer to the individual components in the src directory and the accompanying documentation.

About

No description, website, or topics provided.

Resources

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages