# Introduction to the Integrated Deepfake Detection System

## Overview

Welcome to the Integrated Deepfake Detection System! This project is a comprehensive effort to tackle the growing problem of deepfake videos using advanced machine learning techniques. As deepfake technology continues to evolve, it poses significant challenges to privacy, security, and authenticity in media. Our system is designed to detect deepfakes by analyzing multiple aspects of video content, specifically focusing on spatial, temporal, and micro-expression features. By integrating these different feature types, we aim to create a robust and reliable detection mechanism capable of identifying manipulated video content.

## Objectives

The primary objective of this project is to develop a deepfake detection system that can accurately distinguish between genuine and manipulated videos. This involves:

1. **Spatial Feature Extraction**: Analyzing individual frames of a video to capture static facial features. This is accomplished using pre-trained Convolutional Neural Networks (CNNs) such as ResNet50 and VGG16, which are fine-tuned for this task.
   
2. **Temporal Feature Extraction**: Understanding how facial features change over time by analyzing sequences of frames. Bidirectional Long Short-Term Memory (BiLSTM) networks are employed to capture temporal dependencies and detect inconsistencies in the flow of facial expressions.

3. **Micro-Expression Analysis**: Focusing on subtle facial movements that are difficult to replicate in deepfake videos. This module uses specialized CNN architectures to extract and analyze micro-expressions, providing an additional layer of detection.

4. **Feature Fusion**: Combining spatial, temporal, and micro-expression features using attention mechanisms to form a comprehensive feature set that enhances detection accuracy. This fusion approach leverages the strengths of each feature type to make a final decision about the authenticity of the video.

## Why This Approach?

Deepfake detection is challenging because the algorithms used to create fakes are increasingly sophisticated. Detection methods that rely on a single type of feature often miss manipulations that leave no trace in that feature. By integrating spatial, temporal, and micro-expression features, our approach analyzes each video along several dimensions, which is intended to improve the likelihood of detecting deepfakes and to address the limitations of single-feature methods.

### Key Challenges Addressed:

- **Variability in Deepfake Techniques**: Different deepfake algorithms have varying strengths and weaknesses. By analyzing multiple aspects of the video, our system can detect a wide range of manipulations.
- **Subtle Manipulations**: Some deepfakes are so well-crafted that the manipulations are not immediately noticeable to the human eye. Micro-expression analysis helps in detecting these subtle manipulations.
- **Generalization**: The model should not overfit to specific datasets or types of deepfakes. Fusing different feature types is intended to help it generalize across diverse datasets.

## Dataset

For this project, we use the **FaceForensics++** dataset, which is a benchmark dataset commonly used in deepfake detection research. It contains a collection of both original and manipulated video sequences, providing a diverse set of examples for training and testing the system. The dataset is divided into two main categories:

- **Original Sequences**: Videos that have not been altered, serving as ground truth for authenticity.
- **Manipulated Sequences**: Videos that have been altered using various deepfake techniques, providing examples of fake content.

## Notebook Structure

This Jupyter Notebook is structured to guide you through the different stages of the deepfake detection process:

1. **Data Preprocessing**: Preparing the video frames for feature extraction, including face detection, alignment, and normalization.
   
2. **Model Architecture**: Detailed implementation of the spatial, temporal, and micro-expression feature extraction models. This section includes building and training the CNN (ResNet50, VGG16), BiLSTM, and micro-expression analysis models.

3. **Feature Fusion and Classification**: Combining the extracted features and using attention mechanisms to improve the detection accuracy. The final output layer provides the classification result, indicating whether the video is genuine or a deepfake.

4. **Evaluation and Results**: Testing the trained model on a set of test videos and evaluating its performance using metrics such as accuracy, precision, recall, and F1-score.

5. **Conclusion and Future Work**: Summarizing the findings and discussing potential improvements and future directions for research in deepfake detection.
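The metrics named in step 4 can be computed directly from binary predictions. The following is a minimal sketch in plain Python, assuming label 1 means deepfake and 0 means genuine (the convention used in the data loading cell). The function name `binary_metrics` and the toy labels are illustrative and not part of the notebook.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = deepfake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, made up purely for illustration.
example = binary_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
print(example)
```

Precision answers "how many flagged videos were really fake", while recall answers "how many fake videos were flagged". With imbalanced classes, both are more informative than accuracy alone.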

## Getting Started

To run this notebook, ensure that you have all the necessary dependencies installed. The required libraries are listed in the `requirements.txt` file. Use the following command to install them:

```bash
pip install -r requirements.txt
```

Additionally, make sure you have the FaceForensics++ dataset downloaded and placed in the appropriate directory as specified in the notebook.

## Conclusion

The Integrated Deepfake Detection System is a step towards addressing the challenges posed by deepfake technology. By combining multiple feature extraction methods with attention mechanisms, it aims to provide a robust way of identifying manipulated video content. We hope that this project will contribute to the ongoing efforts to maintain the integrity and authenticity of digital media.

Let's get started and dive into the world of deepfake detection!

## **Data Loading**

In [1]:
import pandas as pd
import numpy as np
import os

In [2]:
original_video_directory = '../Datasets/original_sequences'
manipulated_video_directory = '../Datasets/manipulated_sequences'

In [3]:
video_paths = []
labels = []

# Label 0 = original (genuine) video
for video in os.listdir(original_video_directory):
    if video.endswith('.mp4'):
        video_paths.append(os.path.join(original_video_directory, video))
        labels.append(0)
# Label 1 = manipulated (deepfake) video
for video in os.listdir(manipulated_video_directory):
    if video.endswith('.mp4'):
        video_paths.append(os.path.join(manipulated_video_directory, video))
        labels.append(1)

deepfake_data = pd.DataFrame({
    'video_path': video_paths,
    'label': labels
})

# Shuffle the rows; a fixed seed keeps the order reproducible across runs
deepfake_data = deepfake_data.sample(frac=1, random_state=42).reset_index(drop=True)

In [4]:
deepfake_data['label'].value_counts()

label
1    56
0    36
Name: count, dtype: int64
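The counts above show the manipulated class (56) outnumbering the original class (36). One common way to compensate during training is inverse-frequency class weighting. The sketch below uses the `n_samples / (n_classes * count)` heuristic. Whether this notebook's training should use class weights is an assumption, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Label counts taken from the value_counts() output above: 56 fake, 36 original.
labels = [1] * 56 + [0] * 36
class_weights = balanced_class_weights(labels)
print(class_weights)
```

A dictionary of this form can be passed as the `class_weight` argument of Keras `Model.fit`, so that errors on the smaller class contribute more to the loss.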

## **Preprocessing Phase**
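This section is empty in the notebook, while later cells read extracted frames from `../frames`. As a hedged sketch of one way such frames could be produced, the code below picks evenly spaced frame indices and writes them out with OpenCV. The helper names, the `num_frames` default, and the output file naming are assumptions. Face detection and alignment are omitted.

```python
import os


def evenly_spaced_indices(total_frames, num_frames):
    """Pick up to num_frames indices spread evenly across a video."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


def extract_frames(video_path, output_dir, num_frames=16):
    """Save evenly spaced frames of one video as JPEG files (hypothetical helper)."""
    import cv2  # imported lazily so the index helper works without OpenCV

    os.makedirs(output_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    saved = []
    for idx in evenly_spaced_indices(total, num_frames):
        capture.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = capture.read()
        if not ok:
            continue  # skip frames the decoder cannot return
        out_path = os.path.join(output_dir, f"frame_{idx:05d}.jpg")
        cv2.imwrite(out_path, frame)
        saved.append(out_path)
    capture.release()
    return saved


print(evenly_spaced_indices(10, 4))  # [0, 2, 5, 7]
```

Zero-padded file names keep the frames in temporal order when the directory listing is sorted, which matters for the temporal feature extractor later on.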

## **Model Building**

## **1. Facial Feature Extractor Module**

### **1. Spatial Feature Extractor**

In [5]:
import numpy as np
import cv2
import os
import tensorflow as tf
from keras.layers import TimeDistributed, Input, Flatten
# preprocess_input is not imported under the alias resnet_preprocess:
# the custom function of the same name defined below would shadow it
from keras.applications.resnet50 import ResNet50
from keras.models import Model
import matplotlib.pyplot as plt

In [6]:
base_model = ResNet50(include_top=False, weights='imagenet', pooling='avg', input_shape=(224,224,3))

In [7]:
# Frame-level feature extractor: one 2048-dimensional vector per 224x224 frame
model = Model(inputs=base_model.input, outputs=base_model.output)

In [8]:
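def resnet_preprocess(image_path, target_size=(224, 224)):
    """Load one frame and return it as a (1, H, W, 3) float tensor scaled to [-1, 1]."""
    img = tf.io.read_file(image_path)
    # expand_animations=False keeps the result 3D even for GIF input
    img = tf.image.decode_image(img, channels=3, expand_animations=False)
    img = tf.image.resize(img, target_size)

    # Scale pixels from [0, 255] to [-1, 1]. Note that this is not the
    # preprocessing the ImageNet ResNet50 weights were trained with.
    img = img / 255.0
    img = (img - 0.5) * 2.0

    # Add a batch dimension so frames can be concatenated later
    img_arr = tf.expand_dims(img, axis=0)

    return img_arr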
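
The function above scales pixels to [-1, 1]. For comparison, `keras.applications.resnet50.preprocess_input` applies a different convention for the ImageNet weights: RGB is reordered to BGR and the per-channel ImageNet means are subtracted, with no rescaling. The NumPy sketch below reproduces both conventions on a dummy image so the difference in value ranges is visible. The function names are illustrative.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def scale_to_unit_range(image_rgb):
    """Mirror the notebook's scaling: [0, 255] -> [-1, 1]."""
    return (image_rgb.astype(np.float32) / 255.0 - 0.5) * 2.0


def caffe_style_preprocess(image_rgb):
    """RGB -> BGR, then subtract the ImageNet mean of each channel (no scaling)."""
    image_bgr = image_rgb[..., ::-1].astype(np.float32)
    return image_bgr - IMAGENET_BGR_MEAN


dummy = np.full((224, 224, 3), 255, dtype=np.uint8)  # all-white test image
unit_scaled = scale_to_unit_range(dummy)
caffe_scaled = caffe_style_preprocess(dummy)
print(unit_scaled.max())    # 1.0
print(caffe_scaled[0, 0])   # roughly [151.06 138.22 131.32]
```

If the frozen ImageNet backbone is used as is, the second convention is the one its weights expect. Whether switching changes detection quality on this dataset would need to be checked empirically.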


In [9]:
preprocessable_files = []
# Sort so frames keep a stable order; os.listdir() returns entries in arbitrary order
for filename in sorted(os.listdir('../frames')):
    filepath = os.path.join('../frames', filename)
    preprocessable_files.append(filepath)

In [10]:
preprocessed_images = []
for item in preprocessable_files:
    preprocessed_img = resnet_preprocess(item)
    preprocessed_images.append(preprocessed_img)
preprocessed_images_batch = tf.concat(preprocessed_images, axis=0)

In [11]:
len(preprocessed_images_batch)

146

In [12]:
preprocessed_images = np.array(preprocessed_images_batch)

In [13]:
features = model.predict(preprocessed_images)

5/5 ━━━━━━━━━━━━━━━━━━━━ 8s 1s/step


In [14]:
preprocessed_images.shape

(146, 224, 224, 3)

In [15]:
preprocessed_images_expanded = np.expand_dims(preprocessed_images, axis=0)

In [16]:
preprocessed_images_expanded.shape

(1, 146, 224, 224, 3)

In [17]:
features.shape

(146, 2048)

In [18]:
print(features[0])

[0.         0.         0.00772802 ... 4.02363    0.         0.        ]
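
Each frame is now represented by a 2048-dimensional vector. Recurrent models are often fed fixed-length clips rather than a whole video at once. The sketch below splits a `(num_frames, feature_dim)` array into overlapping windows with NumPy. The window length and stride are hypothetical choices, and random data stands in for the real features.

```python
import numpy as np


def make_windows(frame_features, window, stride):
    """Split (num_frames, dim) features into (num_windows, window, dim) clips."""
    num_frames = frame_features.shape[0]
    starts = range(0, num_frames - window + 1, stride)
    clips = [frame_features[s:s + window] for s in starts]
    if not clips:
        return np.empty((0, window, frame_features.shape[1]), dtype=frame_features.dtype)
    return np.stack(clips)


rng = np.random.default_rng(0)
fake_features = rng.normal(size=(146, 2048)).astype(np.float32)  # stand-in data
clips = make_windows(fake_features, window=16, stride=8)
print(clips.shape)  # (17, 16, 2048)
```

With 146 frames, a window of 16 and a stride of 8, the start indices run from 0 to 128, giving 17 clips. Trailing frames that do not fill a window are dropped.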


### **2. Temporal Feature Extractor**

In [19]:
from tensorflow import keras
from keras.layers import Bidirectional,LSTM,Input
from keras.models import Model

In [20]:
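def build_temporal_feature_extractor(sequence_length, feature_shape):
    """Stack two BiLSTM layers over a (sequence_length, feature_shape) input.

    feature_shape is the per-frame feature dimension (2048 for the ResNet50 features).
    The output keeps one vector per time step: (sequence_length, 128).
    """
    input_seq = Input((sequence_length, feature_shape))

    lstm_1 = LSTM(128, return_sequences=True, dropout=0.2)
    lstm_2 = LSTM(64, return_sequences=True, dropout=0.2)

    # Bidirectional concatenates forward and backward states: 256 features, then 128
    lstm_out = Bidirectional(lstm_1)(input_seq)
    lstm_out = Bidirectional(lstm_2)(lstm_out)

    model = Model(inputs=input_seq, outputs=lstm_out)
    return model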

In [21]:
temporal_feature_extractor = build_temporal_feature_extractor(sequence_length=features.shape[0], feature_shape=features.shape[1])

In [22]:
temporal_feature_extractor.summary()

In [23]:
features.shape

(146, 2048)

In [24]:
features_temp = np.expand_dims(features, axis=0)

In [25]:
features_temp.shape

(1, 146, 2048)

In [26]:
temp_features_extracted = temporal_feature_extractor.predict(features_temp)

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 570ms/step


In [27]:
temp_features_extracted

array([[[-0.01078969,  0.05039846,  0.03381352, ..., -0.18898293,
         -0.5162102 , -0.03835326],
        [-0.02348706,  0.08325385,  0.05160118, ..., -0.2102538 ,
         -0.5105164 , -0.07875411],
        [-0.03358026,  0.11335701,  0.0600911 , ..., -0.21939592,
         -0.5087893 , -0.10199063],
        ...,
        [-0.08211824,  0.31410012,  0.10042705, ..., -0.16421047,
         -0.13652982, -0.13261798],
        [-0.08654797,  0.30833328,  0.09354513, ..., -0.11656106,
         -0.08131052, -0.11769918],
        [-0.09374568,  0.29856735,  0.08686332, ..., -0.06158441,
         -0.03160968, -0.08218902]]], dtype=float32)

## **2. Micro Expression Inconsistency Detection Module**

### **1. Micro Expression Feature Extraction**

In [28]:
from keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Dropout, Dense, Flatten

In [29]:
input_shape = (64,64,3)
inputs = Input(input_shape)

# Layer 1
x = Conv2D(32,(3,3),padding='same')(inputs)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)

# Layer 2
x = Conv2D(64,(3,3),padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)

# Layer 3
x = Conv2D(128,(3,3),padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)

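# Fully connected head
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
# Dropout can be added here if the model overfits

# Output: 128-dimensional micro-expression feature vector
output = Dense(128, activation='relu')(x)

micro_exp_feature_extractor = Model(inputs=inputs, outputs=output)
# MSE assumes a regression-style target; 'accuracy' is only meaningful for classification targets
micro_exp_feature_extractor.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])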
micro_exp_feature_extractor.summary()
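
The layer sizes above determine the shape that reaches `Flatten` and the parameter count of each layer. As a worked check in plain Python (no Keras needed), the sketch below applies the standard parameter formulas for `Conv2D`, `BatchNormalization`, and `Dense` to this architecture. The helper names are illustrative.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights (kernel * kernel * in_channels * filters) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


side, channels = 64, 3            # input is 64 x 64 x 3
total = 0
for filters in (32, 64, 128):
    total += conv2d_params(channels, filters)
    total += 4 * filters          # BatchNormalization: gamma, beta, moving mean, moving variance
    channels = filters
    side //= 2                    # padding='same' keeps the size; each 2x2 max-pool halves it

flatten_units = side * side * channels
total += dense_params(flatten_units, 256) + dense_params(256, 128)

print(side, flatten_units, total)  # 8 8192 2224448
```

Almost all parameters sit in the first dense layer (8192 × 256), so that layer is the most likely place for overfitting on a small dataset.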

In [30]:
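# TODO: Train the model. fit() needs training inputs and targets, which are not
# defined yet in this notebook, so the bare call below fails with the error shown.
micro_exp_feature_extractor.fit()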

AttributeError: 'NoneType' object has no attribute 'shape'

### **2. Temporal Inconsistency Detection in Micro Expression**

In [None]:
from keras.layers import Attention, Concatenate

In [None]:
def build_temporal_inconsistency_attention(seq_length, feature_dims):
    """BiLSTM encoder followed by self-attention over micro-expression features."""
    temp_feat_shape = (seq_length, feature_dims)
    temp_inputs = Input(shape=temp_feat_shape)

    x_mic_exp = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(temp_inputs)
    x_mic_exp = Bidirectional(LSTM(64, return_sequences=True, dropout=0.2))(x_mic_exp)

    # Self-attention: the sequence serves as both query and value
    attention_output = Attention()([x_mic_exp, x_mic_exp])

    x_mic_exp = Dense(256, activation='relu')(attention_output)
    # Dropout layers can be added here if the model overfits
    x_mic_exp = Dense(128, activation='relu')(x_mic_exp)

    mic_exp_temp_model = Model(inputs=temp_inputs, outputs=x_mic_exp)
    mic_exp_temp_model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

    return mic_exp_temp_model
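
`Attention()([x, x])` applies dot-product self-attention. With the layer's default settings, each time step is scored against every time step, the scores are normalized with a softmax, and the output is the weighted sum of the value vectors. The NumPy sketch below traces that computation for a single sequence of random stand-in data. The function name is illustrative.

```python
import numpy as np


def dot_product_self_attention(x):
    """x: (time_steps, dim). Returns the attended sequence and the weight matrix."""
    scores = x @ x.T                                       # (T, T) similarity of every step pair
    scores = scores - scores.max(axis=-1, keepdims=True)   # subtract the row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the key axis
    return weights @ x, weights


rng = np.random.default_rng(0)
sequence = rng.normal(size=(5, 8))                         # 5 time steps, 8 features
attended, weights = dot_product_self_attention(sequence)
print(attended.shape, weights.sum(axis=-1))
```

Each row of `weights` sums to 1, so every output step is a convex combination of the input steps.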
    

In [None]:
mic_exp_temp_model_built = build_temporal_inconsistency_attention(seq_length=146, feature_dims=128)

In [None]:
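# TODO: Train the model. fit() requires training sequences and targets, which are
# not defined yet, so the call stays commented out until they are available.
# mic_exp_temp_model_built.fit(...)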

## **3. Feature Fusion Layer**

### **1. Spatial Attention Mechanism**

In [None]:
from keras.layers import Multiply, GlobalAvgPool2D

In [34]:
def build_spatial_attention_mechanism(feature_maps):
    """
    :param feature_maps: 4D tensor of shape (batch, height, width, channels)
    :return: spatial context vectors of shape (batch, channels)
    """
    # A 1x1 convolution scores every spatial location with a single value
    attention_map = Conv2D(1, (1, 1), strides=(1, 1), padding="same")(feature_maps)

    # Sigmoid gates each location independently (a softmax over the single channel would yield all ones)
    attention_map = Activation('sigmoid')(attention_map)

    # Element-wise multiplication of feature_maps and attention_map
    weighted_feature_map = Multiply()([feature_maps, attention_map])

    # Convert the weighted feature map into a context vector
    spatial_context_vectors = GlobalAvgPool2D()(weighted_feature_map)

    return spatial_context_vectors
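
A 1×1 convolution with a single filter is a per-pixel dot product over channels, so the mechanism above can be traced by hand. The NumPy sketch below follows the same three steps (score, sigmoid gate, global average pooling) for one feature map. The kernel is a random stand-in for the learned weights, and the function name is illustrative.

```python
import numpy as np


def spatial_attention_pool(feature_map, kernel, bias=0.0):
    """feature_map: (H, W, C); kernel: (C,). Returns a (C,) context vector."""
    scores = feature_map @ kernel + bias          # 1x1 conv with one filter -> (H, W)
    gate = 1.0 / (1.0 + np.exp(-scores))          # sigmoid attention map in (0, 1)
    weighted = feature_map * gate[..., None]      # broadcast the gate over channels
    return weighted.mean(axis=(0, 1))             # global average pooling


rng = np.random.default_rng(0)
fmap = rng.normal(size=(7, 7, 16))
context = spatial_attention_pool(fmap, kernel=rng.normal(size=16))
print(context.shape)  # (16,)
```

With an all-zero kernel every gate equals 0.5, and the result reduces to half of the plain global average. The learned kernel is what makes some locations count more than others.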
    

### **2. Temporal Attention Mechanism**

In [35]:
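from keras import ops
from keras.layers import Softmax

def build_temporal_attention_mechanism(feature_maps):
    """
    :param feature_maps: 3D tensor of shape (batch, time_steps, features)
    :return: context vector of shape (batch, features)
    """
    # One score per time step
    temporal_attention_scores = Dense(1, activation='tanh')(feature_maps)

    # Normalize across time (axis=1); a softmax over the last axis of size 1 would give all ones
    temporal_attention_weights = Softmax(axis=1)(temporal_attention_scores)

    weighted_temporal_features = Multiply()([feature_maps, temporal_attention_weights])

    # keras.ops (Keras 3) accepts symbolic tensors; tf.reduce_sum may reject them
    context_vector = ops.sum(weighted_temporal_features, axis=1)

    return context_vector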
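
For attention weights to rank time steps against each other, the softmax has to run over the time axis. A softmax over a trailing axis of length 1 returns 1.0 for every step, which turns the weighted sum into a plain sum. The NumPy sketch below shows temporal attention pooling with the softmax taken over time, plus that degenerate case. The projection vectors are random stand-ins for the learned `Dense(1)` kernel, and the function names are illustrative.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def temporal_attention_pool(features, w):
    """features: (T, D); w: (D,). Returns a (D,) context vector and (T, 1) weights."""
    scores = np.tanh(features @ w)[:, None]       # (T, 1), like Dense(1, activation='tanh')
    weights = softmax(scores, axis=0)             # normalize across time steps
    return (features * weights).sum(axis=0), weights


rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
context, weights = temporal_attention_pool(feats, rng.normal(size=4))
print(weights.sum())                              # weights over time sum to 1

degenerate = softmax(np.tanh(feats @ rng.normal(size=4))[:, None], axis=-1)
print(degenerate.ravel())                         # all ones: no step is favored
```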

### **3. Micro Expression Attention Mechanism**

#### **1. Spatial Micro Expression Attention Mechanism**

In [33]:
def build_spatial_micro_expression_attention_mechanism(micro_exp_spatial_feature_maps):
    """
    :param micro_exp_spatial_feature_maps: 4D tensor of shape (batch, height, width, channels)
    :return: micro-expression spatial context vector of shape (batch, channels)
    """
    # A 1x1 convolution scores every spatial location with a single value
    attention_map = Conv2D(1, (1, 1), strides=(1, 1), padding="same")(micro_exp_spatial_feature_maps)

    attention_map = Activation('sigmoid')(attention_map)

    weighted_micro_exp_feature_map = Multiply()([micro_exp_spatial_feature_maps, attention_map])

    micro_exp_spatial_context_vector = GlobalAvgPool2D()(weighted_micro_exp_feature_map)

    return micro_exp_spatial_context_vector

#### **2. Temporal Micro Expression Attention Mechanism**

In [36]:
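from keras import ops
from keras.layers import Softmax

def build_temporal_micro_expression_attention_mechanism(micro_exp_feature_vectors):
    """
    :param micro_exp_feature_vectors: 3D tensor of shape (batch, time_steps, features)
    :return: micro-expression context vector of shape (batch, features)
    """
    # One score per time step
    attention_scores = Dense(1, activation='tanh')(micro_exp_feature_vectors)

    # Normalize across time (axis=1); a softmax over the last axis of size 1 would give all ones
    attention_weights = Softmax(axis=1)(attention_scores)

    weighted_micro_exp_temporal_features = Multiply()([attention_weights, micro_exp_feature_vectors])

    # keras.ops (Keras 3) accepts symbolic tensors; tf.reduce_sum may reject them
    micro_exp_context_vector = ops.sum(weighted_micro_exp_temporal_features, axis=1)

    return micro_exp_context_vector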

### **4. Concatenation Layer**

In [37]:
from keras.layers import Concatenate

In [39]:
def build_feature_fusion_layer():
    """Build the four attention branches and concatenate their context vectors.

    NOTE: `features` is passed to every branch as a placeholder. The spatial
    branches need 4D feature maps and the temporal branches need 3D sequences,
    so each branch must be given its own tensor before this function can run.
    """
    spatial_context_vectors = build_spatial_attention_mechanism(feature_maps=features)

    temporal_context_vector = build_temporal_attention_mechanism(feature_maps=features)

    micro_exp_spatial_context_vector = build_spatial_micro_expression_attention_mechanism(micro_exp_spatial_feature_maps=features)

    micro_exp_temporal_context_vector = build_temporal_micro_expression_attention_mechanism(micro_exp_feature_vectors=features)

    concatenated_feature_vector = Concatenate()([
        spatial_context_vectors,
        temporal_context_vector,
        micro_exp_spatial_context_vector,
        micro_exp_temporal_context_vector
    ])

    return concatenated_feature_vector
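
Each attention branch reduces its input to one context vector per sample, and `Concatenate` joins these along the last axis, so the fused width is the sum of the branch widths. The following is a shape-only NumPy sketch. The branch widths are made up for illustration and are not taken from the notebook.

```python
import numpy as np

batch = 2
# Hypothetical context-vector widths for the four branches.
branch_widths = {
    "spatial": 2048,
    "temporal": 128,
    "micro_exp_spatial": 128,
    "micro_exp_temporal": 128,
}

context_vectors = [np.zeros((batch, width), dtype=np.float32) for width in branch_widths.values()]
fused = np.concatenate(context_vectors, axis=-1)

print(fused.shape)  # (2, 2432)
```

Because the widest branch dominates the fused vector, it can be worth projecting each branch to a similar width before concatenation. That is a design option, not something this notebook does.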

## **4. Face Swap Detection Model**

In [40]:
def build_face_swap_detection_model():
    concatenated_feature_vector = build_feature_fusion_layer()
    
    dense_units = [256,128,64]
    
    x_face_swap = concatenated_feature_vector
    for unit in dense_units:
        x_face_swap = Dense(unit, activation='relu')(x_face_swap)
        x_face_swap = Dropout(0.5)(x_face_swap)
    
    op_face_swap = Dense(1,activation='sigmoid')(x_face_swap)
    
    face_swap_detector_model = Model(inputs=concatenated_feature_vector, outputs=op_face_swap)
    
    return face_swap_detector_model    