# 🎙️ Deepfake Audio Detection Pipeline

A deep learning pipeline for detecting AI-generated/synthetic audio using multiple model architectures.

## 📋 Overview

| Feature      | Description                                      |
| ------------ | ------------------------------------------------ |
| **Datasets** | ASVspoof2019, Fake-or-Real, SceneFake            |
| **Task**     | Binary classification (Real vs Fake audio)       |
| **Features** | Raw waveform, Mel-spectrogram, LFCC, MFCC, CQT   |
| **Models**   | EfficientNet-B2, SEResNet, LCNN, RawNet3, AASIST |
| **Metrics**  | EER (Equal Error Rate), Accuracy, t-DCF          |


## 🔧 Step 1: Setup Environment

Clone the repository and navigate to the project directory.


In [5]:
%cd /content
!git clone --branch add_fake_or_real_dataset https://github.com/gkibria121/ai-pipeline.git
%cd ai-pipeline
!git pull

/content
fatal: destination path 'ai-pipeline' already exists and is not an empty directory.
/content/ai-pipeline
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0 (from 0)
Unpacking objects: 100% (3/3), 1.10 KiB | 1.10 MiB/s, done.
From https://github.com/gkibria121/ai-pipeline
   37f96c9..7c5477e  add_fake_or_real_dataset_windows -> origin/add_fake_or_real_dataset_windows
Already up to date.


## 📦 Step 2: Install Dependencies

Install all required Python packages.


In [2]:
!pip install -r requirements.txt

Collecting torchcontrib (from -r requirements.txt (line 2))
  Downloading torchcontrib-0.0.2.tar.gz (11 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: torchcontrib
  Building wheel for torchcontrib (setup.py) ... done
  Created wheel for torchcontrib: filename=torchcontrib-0.0.2-py3-none-any.whl size=7516 sha256=002c62c43b493f9b1d8fa02c693eecaaa6c2c7c6ad4be123f878f965deab9cab
  Stored in directory: /root/.cache/pip/wheels/e3/d1/1f/63f00ffea223db446943147a04ff035eb40d00cec3e87d63e5
Successfully built torchcontrib
Installing collected packages: torchcontrib
Successfully installed torchcontrib-0.0.2


## 📖 Available Options

### Models (`--config`)

| Config File                            | Model                       | Parameters | Input Type   |
| -------------------------------------- | --------------------------- | ---------- | ------------ |
| `config/LCNN.conf`                     | LCNN                        | ~0.5M      | Spectrogram  |
| `config/LCNN_Large.conf`               | LCNN Large                  | ~1M        | Spectrogram  |
| `config/SEResNet.conf`                 | SEResNet                    | ~12M       | Spectrogram  |
| `config/EfficientNetB2.conf`           | EfficientNet-B2             | ~9M        | Spectrogram  |
| `config/EfficientNetB2_Attention.conf` | EfficientNet-B2 + Attention | ~9M        | Spectrogram  |
| `config/RawNet3.conf`                  | RawNet3                     | ~2M        | Raw waveform |
| `config/AASIST.conf`                   | AASIST                      | ~0.3M      | Raw waveform |
| `config/AASIST-L.conf`                 | AASIST-L                    | ~0.6M      | Raw waveform |

### Datasets (`--dataset`)

| Flag | Dataset      | Description                                     |
| ---- | ------------ | ----------------------------------------------- |
| `1`  | ASVspoof2019 | Standard benchmark for audio spoofing detection |
| `2`  | Fake-or-Real | Binary classification for fake vs real audio    |
| `3`  | SceneFake    | Scene-aware fake audio detection                |

### Feature Types (`--feature_type`)

| Flag | Feature         | Description                                      |
| ---- | --------------- | ------------------------------------------------ |
| `0`  | Raw waveform    | Direct waveform processing (for RawNet3, AASIST) |
| `1`  | Mel-spectrogram | 128 mel bins                                     |
| `2`  | LFCC            | Linear Frequency Cepstral Coefficients           |
| `3`  | MFCC            | Mel-Frequency Cepstral Coefficients              |
| `4`  | CQT             | Constant-Q Transform                             |
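The Models and Feature Types tables together imply a pairing rule: raw-waveform models take `--feature_type 0`, and spectrogram models take one of `1` to `4`. The sketch below encodes that rule as a pre-flight check. It assumes the "Input Type" column is a hard constraint and that LFCC, MFCC, and CQT all count as spectrogram-style input; `check_feature_type` is a hypothetical helper, not part of the pipeline.

```python
# Input type per config file, copied from the Models table above.
INPUT_TYPE = {
    "config/LCNN.conf": "spectrogram",
    "config/LCNN_Large.conf": "spectrogram",
    "config/SEResNet.conf": "spectrogram",
    "config/EfficientNetB2.conf": "spectrogram",
    "config/EfficientNetB2_Attention.conf": "spectrogram",
    "config/RawNet3.conf": "raw waveform",
    "config/AASIST.conf": "raw waveform",
    "config/AASIST-L.conf": "raw waveform",
}

# Flag values from the Feature Types table above.
FEATURE_NAMES = {0: "Raw waveform", 1: "Mel-spectrogram", 2: "LFCC", 3: "MFCC", 4: "CQT"}


def check_feature_type(config, feature_type):
    """Return the feature name, or raise ValueError on a model/feature mismatch."""
    if feature_type not in FEATURE_NAMES:
        raise ValueError(f"unknown --feature_type {feature_type}")
    needed = INPUT_TYPE[config]
    # Assumption: every non-zero feature type is a 2-D, spectrogram-style input.
    given = "raw waveform" if feature_type == 0 else "spectrogram"
    if needed != given:
        raise ValueError(
            f"{config} expects {needed} input, "
            f"but --feature_type {feature_type} is {FEATURE_NAMES[feature_type]}"
        )
    return FEATURE_NAMES[feature_type]


print(check_feature_type("config/AASIST.conf", 0))
print(check_feature_type("config/LCNN.conf", 4))
```

Calling the check before launching a long run catches combinations such as RawNet3 with `--feature_type 1` early, if the pipeline does in fact reject them.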

### Command Line Arguments

```bash
python main.py \
    --config <config_file>      # Model configuration file
    --dataset <1|2|3>           # Dataset to use
    --feature_type <0-4>        # Audio feature representation
    --epochs <num>              # Number of training epochs
    --batch_size <num>          # Batch size (overrides config)
    --random_noise              # Enable data augmentation
    --weight_avg                # Enable Stochastic Weight Averaging (SWA)
    --eval_best                 # Evaluate on test set when best model is found
    --eval                      # Evaluation mode only
    --eval_model_weights <path> # Path to model weights for evaluation
    --data_subset <0.0-1.0>     # Use subset of data (for quick testing)
```

### Training Flags

| Flag             | Description                                                  |
| ---------------- | ------------------------------------------------------------ |
| `--random_noise` | Enable data augmentation (RIR, MUSAN, pitch shift, etc.)     |
| `--weight_avg`   | Enable Stochastic Weight Averaging for better generalization |
| `--eval_best`    | Evaluate on test set each time a new best model is found     |

### Augmentation Types (`--random_noise`)

When enabled, the pipeline applies these augmentations at random:

| Augmentation      | Description                                      |
| ----------------- | ------------------------------------------------ |
| RIR Simulation    | Room Impulse Response - simulates room acoustics |
| MUSAN-style Noise | Babble, music, and ambient noise                 |
| Gaussian Noise    | Additive white Gaussian noise (SNR: 10-25 dB)    |
| Reverberation     | Echo/reverb effects                              |
| Pitch Shift       | ±4 semitones                                     |
| Time Stretch      | 0.85x - 1.15x speed                              |
| Gain              | ±6 dB volume adjustment                          |
| Filters           | Low-pass and high-pass filtering                 |
| SpecAugment       | Frequency and time masking for spectrograms      |
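To make the "Gaussian Noise (SNR: 10-25 dB)" row concrete, here is a standard-library sketch of adding white noise at a chosen signal-to-noise ratio. It is not the pipeline's implementation: the 16 kHz sample rate, the uniform draw of the SNR, and the function names are all assumptions made for illustration.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng):
    """Return `samples` plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(x * x for x in samples) / len(samples)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(samples, noise)]


def measured_snr_db(clean, noisy):
    """Measured SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)
sample_rate = 16000  # assumed; the pipeline's sample rate is not stated here
# One second of a 440 Hz sine wave stands in for a speech clip.
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
target_snr = rng.uniform(10.0, 25.0)  # SNR range from the table above
noisy = add_gaussian_noise(clean, target_snr, rng)
print(f"target {target_snr:.2f} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

Rescaling the noise to its measured power, rather than relying on its nominal variance, makes the resulting SNR match the target up to floating-point error.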


## 📥 Step 3: Download Fake-or-Real Dataset

Download the Fake-or-Real dataset (2-second audio clips). This contains:

- **Training**: 13,956 samples (6,978 real + 6,978 fake)
- **Validation**: 2,826 samples (1,413 real + 1,413 fake)
- **Testing**: 1,088 samples (544 real + 544 fake)


In [3]:
!python download_dataset.py --dataset 2


DATASET DOWNLOADER

Downloading Fake-or-Real Dataset
Using Colab cache for faster access to the 'the-fake-or-real-dataset' dataset.
✓ Data source download complete.
✓ Symlink created: ./fake_or_real → /kaggle/input/the-fake-or-real-dataset
✓ Fake-or-Real dataset ready!

Download Complete!


---

## 🚀 Step 4: Train Models

### LCNN


In [None]:
!python -u main.py --config config/LCNN.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

### LCNN Large


In [None]:
!python -u main.py --config config/LCNN_Large.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

### SEResNet


In [6]:
!python -u main.py --config config/SEResNet.conf --feature_type 1 --dataset 2 --epochs 15 --random_noise --weight_avg --eval_best --data_subset 0.01 


### EfficientNet-B2 with Attention


In [None]:
!python -u main.py --config config/EfficientNetB2_Attention.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

### RawNet3


In [None]:
!python -u main.py --config config/RawNet3.conf --feature_type 0 --dataset 2 --epochs 15 --random_noise --weight_avg --eval_best 

### AASIST


In [None]:
!python -u main.py --config config/AASIST.conf --feature_type 0 --dataset 2 --epochs 30 --random_noise --weight_avg --eval_best 

---

## 📊 Output Structure

After training, results are saved in `exp_result/`:

```
exp_result/
└── <dataset>_<track>_<model>_<flags>_ep<epochs>_bs<batch>_feat<feature>/
    ├── config.conf              # Copy of training config
    ├── weights/                 # Model checkpoints
    │   ├── best.pth            # Best model (lowest dev EER)
    │   └── swa.pth             # SWA averaged model
    ├── metrics/                 # Training metrics
    │   ├── epoch_metrics.json  # Per-epoch metrics
    │   └── final_summary.json  # Final results
    ├── metric_log.txt          # Training log
    ├── evaluation_results.txt  # Final evaluation
    └── events.out.*            # TensorBoard logs
```
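Given the layout above, a small helper can list which runs have a checkpoint to evaluate. This sketch relies only on the `weights/best.pth` and `weights/swa.pth` paths shown in the tree. `find_checkpoints` is a hypothetical name, and the demo builds a throwaway directory with placeholder run folders so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def find_checkpoints(root="exp_result", name="best.pth"):
    """Map each run folder name to its `weights/<name>` file, if present."""
    return {path.parent.parent.name: path
            for path in sorted(Path(root).glob(f"*/weights/{name}"))}


# Demo on a temporary directory that mimics the layout above.
# "run_a" and "run_b" are placeholder folder names, not real pipeline output.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"run_a": ["best.pth", "swa.pth"], "run_b": ["swa.pth"]}
    for run, files in layout.items():
        weights = Path(tmp) / run / "weights"
        weights.mkdir(parents=True)
        for filename in files:
            (weights / filename).touch()
    found_best = sorted(find_checkpoints(tmp))
    found_swa = sorted(find_checkpoints(tmp, name="swa.pth"))

print("runs with best.pth:", found_best)
print("runs with swa.pth:", found_swa)
```

On a real run directory, call `find_checkpoints()` with its defaults and pass the returned path to `--eval_model_weights`.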

### Metrics

| Metric       | Description                                                                |
| ------------ | -------------------------------------------------------------------------- |
| **EER**      | Equal Error Rate: error rate where false acceptance equals false rejection |
| **Accuracy** | Fraction of clips classified correctly                                     |
| **t-DCF**    | Tandem Detection Cost Function (ASVspoof only)                             |
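As an illustration of how EER is defined, the sketch below sweeps a decision threshold over the observed scores and reports the point where the false acceptance and false rejection rates are closest. It assumes higher scores mean "more likely real", and it is a plain reference version rather than the pipeline's own metric code, which may interpolate between thresholds.

```python
def compute_eer(real_scores, fake_scores):
    """Return (eer, threshold), assuming higher scores mean "more likely real"."""
    best = None
    for threshold in sorted(set(real_scores) | set(fake_scores)):
        # False rejection: real audio scored below the threshold.
        frr = sum(s < threshold for s in real_scores) / len(real_scores)
        # False acceptance: fake audio scored at or above the threshold.
        far = sum(s >= threshold for s in fake_scores) / len(fake_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


# Toy scores: one real clip (0.4) scores below one fake clip (0.6).
real = [0.9, 0.8, 0.7, 0.4]
fake = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(real, fake)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

The loop is quadratic in the number of scores, which is fine for a sanity check; a sorted cumulative count is the better choice for full test sets.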


---

## 🔍 Step 5: Evaluate Model

Evaluate a trained model on the test set:


In [None]:
# Replace path with your trained model
#!python main.py --config config/LCNN.conf --dataset 2 --feature_type 1 --eval --eval_model_weights ./exp_result/<your_model_folder>/weights/best.pth

---

## 💡 Common Issues

| Issue           | Solution                                                          |
| --------------- | ----------------------------------------------------------------- |
| Out of memory   | Lower the batch size, e.g. `--batch_size 16` or `--batch_size 8`  |
| Slow training   | Mixed precision is enabled by default (`use_amp: true` in config) |
| Need quick test | Use `--data_subset 0.1` to train on 10% of data                   |
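For a sense of scale, the sketch below estimates how many Fake-or-Real clips a given `--data_subset` fraction leaves, using the split sizes from Step 3. It assumes the fraction is applied to each split and rounded down. The pipeline's actual rounding, and whether the flag also trims the validation and test splits, are not stated here.

```python
import math

# Split sizes from the Fake-or-Real description in Step 3.
SPLIT_SIZES = {"train": 13956, "val": 2826, "test": 1088}


def subset_sizes(fraction, sizes=SPLIT_SIZES):
    """Clips kept per split, assuming the fraction is floored per split."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return {split: math.floor(count * fraction) for split, count in sizes.items()}


for fraction in (0.1, 0.01):
    print(fraction, subset_sizes(fraction))
```

Under these assumptions, `--data_subset 0.01` trains on only about 139 clips, so such a run is a smoke test rather than a meaningful benchmark.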


---

## 🧪 Step 6: Feature & Model Experiments

Run experiments with different feature types and model configurations to find the best combination.

### Feature Type Reference

| Feature             | Best For            | Models                       |
| ------------------- | ------------------- | ---------------------------- |
| `0` Raw             | End-to-end learning | RawNet3, AASIST              |
| `1` Mel-spectrogram | General purpose     | LCNN, EfficientNet, SEResNet |
| `2` LFCC            | Speech features     | LCNN, SEResNet               |
| `3` MFCC            | Traditional speech  | All spectrogram models       |
| `4` CQT             | Harmonic analysis   | LCNN, EfficientNet           |


### Experiment 1: LCNN with CQT Features

CQT (Constant-Q Transform) spaces its frequency bins geometrically, which gives fine resolution of low-frequency harmonics and can help expose synthesis artifacts.


In [None]:
!python -u main.py --config config/LCNN.conf --feature_type 4 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

### Experiment 2: LCNN with LFCC Features

LFCC (Linear Frequency Cepstral Coefficients) is widely used for spoofing detection.


In [None]:
!python -u main.py --config config/LCNN.conf --feature_type 2 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

### Experiment 3: EfficientNet-B2 with CQT Features

EfficientNet-B2 (run here with the attention config, `EfficientNetB2_Attention.conf`) paired with CQT can capture fine-grained frequency patterns.


In [None]:
!python -u main.py --config config/EfficientNetB2_Attention.conf --feature_type 4 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

### Experiment 4: SEResNet with MFCC Features

SEResNet with MFCC for traditional speech feature analysis.


In [None]:
!python -u main.py --config config/SEResNet.conf --feature_type 3 --dataset 2 --epochs 15 --random_noise --weight_avg --eval_best 

### Experiment 5: AASIST-L (Large) with Raw Waveform

AASIST-L is a larger version with more capacity for complex patterns.


In [None]:
!python -u main.py --config config/AASIST-L.conf --feature_type 0 --dataset 2 --epochs 30 --random_noise --weight_avg --eval_best 

### Experiment 6: SimpleCNN Baseline with Mel-spectrogram

A lightweight baseline model for quick comparison.


In [None]:
!python -u main.py --config config/SimpleCNN.conf --feature_type 1 --dataset 2 --epochs 25 --random_noise --weight_avg --eval_best 

### Experiment 7: LCNN Large with LFCC Features

Larger LCNN model with LFCC for better spoofing artifact detection.


In [None]:
!python -u main.py --config config/LCNN_Large.conf --feature_type 2 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best 

---

## 📈 Step 7: Compare All Results

After running experiments, visualize and compare all model results.


In [None]:
# Compare all trained models
!python visualize_results.py --path "exp_result/*/metrics" --compare --show-summary --output ./comparison_plots

### Display Comparison Plots


In [None]:
from IPython.display import Image, display
from pathlib import Path

# Display comparison plot if it exists
comparison_plot = Path("./comparison_plots/model_comparison.png")
if comparison_plot.exists():
    display(Image(filename=str(comparison_plot)))
else:
    print("Run the comparison command above first to generate plots.")

---

## 📊 Experiment Summary Table

After completing all experiments, fill in the results:

| Experiment | Model                       | Feature      | Epochs | Best EER (%) | Best Accuracy (%) |
| ---------- | --------------------------- | ------------ | ------ | ------------ | ----------------- |
| Baseline 1 | LCNN                        | Mel-spec (1) | 20     | -            | -                 |
| Baseline 2 | EfficientNet-B2 + Attention | Mel-spec (1) | 20     | -            | -                 |
| Baseline 3 | RawNet3                     | Raw (0)      | 15     | -            | -                 |
| Baseline 4 | AASIST                      | Raw (0)      | 30     | -            | -                 |
| Exp 1      | LCNN                        | CQT (4)      | 20     | -            | -                 |
| Exp 2      | LCNN                        | LFCC (2)     | 20     | -            | -                 |
| Exp 3      | EfficientNet-B2 + Attention | CQT (4)      | 20     | -            | -                 |
| Exp 4      | SEResNet                    | MFCC (3)     | 15     | -            | -                 |
| Exp 5      | AASIST-L                    | Raw (0)      | 30     | -            | -                 |
| Exp 6      | SimpleCNN                   | Mel-spec (1) | 25     | -            | -                 |
| Exp 7      | LCNN Large                  | LFCC (2)     | 20     | -            | -                 |

**Notes:**

- Lower EER is better (0% = perfect)
- Higher Accuracy is better (100% = perfect)
- CQT and LFCC features often work well for spoofing detection
- Raw waveform models (RawNet3, AASIST) learn features automatically
