# 🎙️ Deepfake Audio Detection Pipeline

A deep learning pipeline for detecting AI-generated/synthetic audio using multiple model architectures.

## 📋 Overview

| Feature      | Description                                      |
| ------------ | ------------------------------------------------ |
| **Datasets** | ASVspoof2019, Fake-or-Real, SceneFake            |
| **Task**     | Binary classification (Real vs Fake audio)       |
| **Features** | Raw waveform, Mel-spectrogram, LFCC, MFCC, CQT   |
| **Models**   | EfficientNet-B2, SEResNet, LCNN, RawNet3, AASIST |
| **Metrics**  | EER (Equal Error Rate), Accuracy, t-DCF          |


## 🔧 Step 1: Set Up Environment

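Clone the `add_fake_or_real_dataset` branch of the repository into `/content`, change into the project directory, and pull the latest changes (useful when re-running the cell).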


%cd /content
!git clone --branch add_fake_or_real_dataset https://github.com/gkibria121/ai-pipeline.git
%cd ai-pipeline
!git pull

## 📦 Step 2: Install Dependencies

Install all required Python packages.


!pip install -r requirements.txt

## 📖 Available Options

### Models (`--config`)

| Config File                            | Model                       | Parameters | Input Type   |
| -------------------------------------- | --------------------------- | ---------- | ------------ |
| `config/LCNN.conf`                     | LCNN                        | ~0.5M      | Spectrogram  |
| `config/LCNN_Large.conf`               | LCNN Large                  | ~1M        | Spectrogram  |
| `config/SEResNet.conf`                 | SEResNet                    | ~12M       | Spectrogram  |
| `config/EfficientNetB2.conf`           | EfficientNet-B2             | ~9M        | Spectrogram  |
| `config/EfficientNetB2_Attention.conf` | EfficientNet-B2 + Attention | ~9M        | Spectrogram  |
| `config/RawNet3.conf`                  | RawNet3                     | ~2M        | Raw waveform |
| `config/AASIST.conf`                   | AASIST                      | ~0.3M      | Raw waveform |
| `config/AASIST-L.conf`                 | AASIST-L (lightweight)      | ~85K       | Raw waveform |

### Datasets (`--dataset`)

| Flag | Dataset      | Description                                     |
| ---- | ------------ | ----------------------------------------------- |
| `1`  | ASVspoof2019 | Standard benchmark for audio spoofing detection |
| `2`  | Fake-or-Real | Binary classification for fake vs real audio    |
| `3`  | SceneFake    | Scene-aware fake audio detection                |

### Feature Types (`--feature_type`)

| Flag | Feature         | Description                                      |
| ---- | --------------- | ------------------------------------------------ |
| `0`  | Raw waveform    | Direct waveform processing (for RawNet3, AASIST) |
| `1`  | Mel-spectrogram | 128 mel bins                                     |
| `2`  | LFCC            | Linear Frequency Cepstral Coefficients           |
| `3`  | MFCC            | Mel-Frequency Cepstral Coefficients              |
| `4`  | CQT             | Constant-Q Transform                             |
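For a sense of what "128 mel bins" means in practice, the sketch below computes the center frequencies of 128 mel filters. It assumes the HTK mel formula m = 2595 · log10(1 + f / 700), a 16 kHz sample rate and a 0 Hz to Nyquist frequency range; the pipeline's own feature extractor may use a different mel variant (for example Slaney's) or range.

```python
import math
from typing import List, Optional


def hz_to_mel(f_hz: float) -> float:
    """HTK mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int = 128, sample_rate: int = 16000,
                     f_min: float = 0.0, f_max: Optional[float] = None) -> List[float]:
    """Center frequencies (Hz) of n_mels triangular filters equally spaced in mel.

    n_mels filters need n_mels + 2 equally spaced mel points (the two outer points
    are band edges); the centers are the n_mels interior points.
    """
    if f_max is None:
        f_max = sample_rate / 2  # Nyquist frequency (assumption: full band is used)
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(n_mels)]


centers = mel_band_centers()
print(f"{len(centers)} bands, {centers[0]:.1f} Hz to {centers[-1]:.1f} Hz")
print(f"spacing: {centers[1] - centers[0]:.1f} Hz at the bottom, "
      f"{centers[-1] - centers[-2]:.1f} Hz at the top")
```

Under these assumptions, equal steps on the mel axis land about 14 Hz apart at the low end but well over 100 Hz apart near 8 kHz, which is why mel features give low frequencies finer resolution.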

### Command Line Arguments

```bash
# Usage: python main.py [OPTIONS]
#
#   --config <config_file>       Model configuration file
#   --dataset <1|2|3>            Dataset to use
#   --feature_type <0-4>         Audio feature representation
#   --epochs <num>               Number of training epochs
#   --batch_size <num>           Batch size (overrides the config value)
#   --random_noise               Enable data augmentation
#   --weight_avg                 Enable Stochastic Weight Averaging (SWA)
#   --eval_best                  Evaluate on the test set whenever a new best model is found
#   --eval                       Evaluation only, no training
#   --eval_model_weights <path>  Path to the model weights to evaluate
#   --data_subset <0.0-1.0>      Fraction of the data to use (for quick tests)
```
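Some flag combinations are easy to get wrong, in particular pairing a raw-waveform model with a spectrogram feature. The following is a hypothetical pre-flight check (`check_args` is not part of the repository) built only from the tables above. It assumes the spectrogram models accept any of feature types 1 to 4, and that `--data_subset` must be greater than 0.

```python
from typing import List, Optional

# Input type per config file, copied from the Models table.
RAW_WAVEFORM_CONFIGS = {"config/RawNet3.conf", "config/AASIST.conf", "config/AASIST-L.conf"}
SPECTROGRAM_CONFIGS = {
    "config/LCNN.conf",
    "config/LCNN_Large.conf",
    "config/SEResNet.conf",
    "config/EfficientNetB2.conf",
    "config/EfficientNetB2_Attention.conf",
}


def check_args(config: str, dataset: int, feature_type: int,
               data_subset: Optional[float] = None) -> List[str]:
    """Return human-readable problems with a planned main.py call (empty list = looks fine)."""
    problems = []
    if dataset not in (1, 2, 3):
        problems.append(f"--dataset must be 1, 2 or 3, got {dataset}")
    if feature_type not in range(5):
        problems.append(f"--feature_type must be 0-4, got {feature_type}")
    if config in RAW_WAVEFORM_CONFIGS and feature_type != 0:
        problems.append(f"{config} takes raw waveform input; use --feature_type 0")
    elif config in SPECTROGRAM_CONFIGS and feature_type == 0:
        problems.append(f"{config} takes spectrogram input; use --feature_type 1-4")
    elif config not in RAW_WAVEFORM_CONFIGS | SPECTROGRAM_CONFIGS:
        problems.append(f"unknown config file: {config}")
    # Assumption: 0.0 would select no data, so require 0.0 < data_subset <= 1.0.
    if data_subset is not None and not 0.0 < data_subset <= 1.0:
        problems.append(f"--data_subset must be in (0.0, 1.0], got {data_subset}")
    return problems


print(check_args("config/AASIST.conf", dataset=2, feature_type=1))
# -> ['config/AASIST.conf takes raw waveform input; use --feature_type 0']
```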

### Training Flags

| Flag             | Description                                                  |
| ---------------- | ------------------------------------------------------------ |
| `--random_noise` | Enable data augmentation (RIR, MUSAN, pitch shift, etc.)     |
| `--weight_avg`   | Enable Stochastic Weight Averaging for better generalization |
| `--eval_best`    | Evaluate on the test set each time a new best model is found |
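Stochastic Weight Averaging keeps an equal-weight average of the weights visited during the later part of training; the `swa.pth` checkpoint in the output tree is that averaged model. The sketch below shows only the averaging step on plain Python lists. In PyTorch this is commonly done with `torch.optim.swa_utils.AveragedModel`, followed by `torch.optim.swa_utils.update_bn` to recompute BatchNorm statistics; which epochs this repository averages, and whether it uses those utilities, is not stated here.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


class RunningWeightAverage:
    """Equal-weight running mean of weight snapshots (the averaging step of SWA).

    After n snapshots avg == (w_1 + ... + w_n) / n, kept incrementally via
    avg <- avg + (w - avg) / n so old snapshots never need to be stored.
    Assumes every snapshot has the same parameter names and sizes.
    """

    def __init__(self) -> None:
        self.n_averaged = 0
        self.avg: Weights = {}

    def update(self, weights: Weights) -> None:
        self.n_averaged += 1
        if self.n_averaged == 1:
            # First snapshot: copy it so later in-place updates don't alias the input.
            self.avg = {name: list(values) for name, values in weights.items()}
            return
        for name, values in weights.items():
            current = self.avg[name]
            for i, w in enumerate(values):
                current[i] += (w - current[i]) / self.n_averaged


swa = RunningWeightAverage()
for snapshot in ({"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 9.0]}):
    swa.update(snapshot)
print(swa.avg)  # {'w': [3.0, 5.0]}
```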

### Augmentation Types (`--random_noise`)

When `--random_noise` is enabled, the following augmentations are applied at random:

| Augmentation      | Description                                      |
| ----------------- | ------------------------------------------------ |
| RIR Simulation    | Room Impulse Response - simulates room acoustics |
| MUSAN-style Noise | Babble, music, and ambient noise                 |
| Gaussian Noise    | Additive white Gaussian noise (SNR: 10-25 dB)    |
| Reverberation     | Echo/reverb effects                              |
| Pitch Shift       | ±4 semitones                                     |
| Time Stretch      | 0.85x - 1.15x speed                              |
| Gain              | ±6 dB volume adjustment                          |
| Filters           | Low-pass and high-pass filtering                 |
| SpecAugment       | Frequency and time masking for spectrograms      |
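Two of the rows above reduce to simple decibel arithmetic. For Gaussian noise, a target SNR fixes the noise power as P_noise = P_signal / 10^(SNR/10); for gain, g dB scales the amplitude by 10^(g/20). The sketch below applies both, drawing the SNR from 10 to 25 dB and the gain from -6 to +6 dB as in the table. Uniform sampling and the plain-list representation are assumptions for illustration; the pipeline's own augmentation code may sample differently and operate on arrays or tensors.

```python
import math
import random
from typing import List


def add_noise_at_snr(signal: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Noise power required by the SNR definition; rescale the drawn noise to hit it exactly.
    target_p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    scale = math.sqrt(target_p_noise / p_noise)
    return [x + scale * n for x, n in zip(signal, noise)]


def apply_gain(signal: List[float], gain_db: float) -> List[float]:
    """Scale amplitude by gain_db decibels (amplitude factor 10**(gain_db / 20))."""
    factor = 10.0 ** (gain_db / 20.0)
    return [factor * x for x in signal]


def noise_then_gain(signal: List[float], rng: random.Random) -> List[float]:
    """Random SNR in [10, 25] dB, then random gain in [-6, +6] dB (ranges from the table)."""
    noisy = add_noise_at_snr(signal, rng.uniform(10.0, 25.0), rng)
    return apply_gain(noisy, rng.uniform(-6.0, 6.0))


rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s, 440 Hz at 16 kHz
augmented = noise_then_gain(tone, rng)
```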


## 📥 Step 3: Download Fake-or-Real Dataset

Download the Fake-or-Real dataset (2-second audio clips). Each split is balanced between real and fake audio:

- **Training**: 13,956 samples (6,978 real + 6,978 fake)
- **Validation**: 2,826 samples (1,413 real + 1,413 fake)
- **Testing**: 1,088 samples (544 real + 544 fake)
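After the download finishes, it is worth confirming that the split sizes match the numbers above. The sketch assumes a hypothetical layout `<root>/<split>/<label>/*.wav`, with splits named `training`, `validation` and `testing` and labels `real` and `fake`; adjust the path and names to wherever `download_dataset.py` actually places the files.

```python
from pathlib import Path
from typing import Dict, List

# Expected counts from the list above: split -> {label: number of clips}.
EXPECTED = {
    "training": {"real": 6978, "fake": 6978},
    "validation": {"real": 1413, "fake": 1413},
    "testing": {"real": 544, "fake": 544},
}


def count_clips(root: str) -> Dict[str, Dict[str, int]]:
    """Count .wav files in every <root>/<split>/<label>/ directory."""
    counts: Dict[str, Dict[str, int]] = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return counts
    for split_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            label_dir.name: sum(1 for _ in label_dir.glob("*.wav"))
            for label_dir in sorted(split_dir.iterdir())
            if label_dir.is_dir()
        }
    return counts


def mismatches(counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Describe every split/label whose file count differs from EXPECTED."""
    problems = []
    for split, labels in EXPECTED.items():
        for label, expected in labels.items():
            found = counts.get(split, {}).get(label, 0)
            if found != expected:
                problems.append(f"{split}/{label}: expected {expected}, found {found}")
    return problems


problems = mismatches(count_clips("data/for-2seconds"))  # hypothetical dataset path
print("\n".join(problems) if problems else "all split counts match")
```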


!python download_dataset.py --dataset 2

---

## 🚀 Step 4: Train Models

### LCNN


!python main.py --config config/LCNN.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best

### LCNN Large


!python main.py --config config/LCNN_Large.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best

### SEResNet


!python main.py --config config/SEResNet.conf --feature_type 1 --dataset 2 --epochs 15 --random_noise --weight_avg --eval_best

### EfficientNet-B2 with Attention


!python main.py --config config/EfficientNetB2_Attention.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise --weight_avg --eval_best

### RawNet3


!python main.py --config config/RawNet3.conf --feature_type 0 --dataset 2 --epochs 15 --random_noise --weight_avg --eval_best

### AASIST


!python main.py --config config/AASIST.conf --feature_type 0 --dataset 2 --epochs 30 --random_noise --weight_avg --eval_best
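The six training cells above differ only in config file, feature type and epoch count, so they can also be driven from a single Python cell. This is a convenience sketch, not part of the repository: it rebuilds exactly the commands shown above and, once `DRY_RUN` is switched off, runs them one after another with `subprocess.run`, stopping at the first failing run.

```python
import shlex
import subprocess
from typing import List

# (config, feature_type, epochs) exactly as in the training cells above.
RUNS = [
    ("config/LCNN.conf", 1, 20),
    ("config/LCNN_Large.conf", 1, 20),
    ("config/SEResNet.conf", 1, 15),
    ("config/EfficientNetB2_Attention.conf", 1, 20),
    ("config/RawNet3.conf", 0, 15),
    ("config/AASIST.conf", 0, 30),
]


def train_command(config: str, feature_type: int, epochs: int, dataset: int = 2) -> List[str]:
    """Build the same main.py invocation used in the cells above."""
    return [
        "python", "main.py",
        "--config", config,
        "--feature_type", str(feature_type),
        "--dataset", str(dataset),
        "--epochs", str(epochs),
        "--random_noise", "--weight_avg", "--eval_best",
    ]


DRY_RUN = True  # set to False to actually launch the runs one after another

for cfg, feat, ep in RUNS:
    cmd = train_command(cfg, feat, ep)
    print(shlex.join(cmd))
    if not DRY_RUN:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failing run
```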

---

## 📊 Output Structure

After training, results are saved in `exp_result/`:

```
exp_result/
└── <dataset>_<track>_<model>_<flags>_ep<epochs>_bs<batch>_feat<feature>/
    ├── config.conf              # Copy of training config
    ├── weights/                 # Model checkpoints
    │   ├── best.pth             # Best model (lowest dev EER)
    │   └── swa.pth              # SWA averaged model
    ├── metrics/                 # Training metrics
    │   ├── epoch_metrics.json   # Per-epoch metrics
    │   └── final_summary.json   # Final results
    ├── metric_log.txt           # Training log
    ├── evaluation_results.txt   # Final evaluation
    └── events.out.*             # TensorBoard logs
```
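To compare runs, the folders under `exp_result/` can be scanned programmatically. The sketch relies only on the file names in the tree above; since the keys inside `final_summary.json` are not documented here, it loads the file as-is instead of assuming particular metric names.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def list_runs(root: str = "exp_result") -> List[Dict[str, Any]]:
    """One record per experiment folder: its name, which checkpoints exist, and its summary."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    runs = []
    for run_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        summary_file = run_dir / "metrics" / "final_summary.json"
        runs.append({
            "name": run_dir.name,
            "has_best": (run_dir / "weights" / "best.pth").is_file(),
            "has_swa": (run_dir / "weights" / "swa.pth").is_file(),
            # Loaded verbatim: the summary's keys are not documented in this notebook.
            "summary": json.loads(summary_file.read_text()) if summary_file.is_file() else None,
        })
    return runs


for run in list_runs():
    print(run["name"], "best" if run["has_best"] else "-", "swa" if run["has_swa"] else "-")
    print("   ", run["summary"])
```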

### Metrics

| Metric       | Description                                    |
| ------------ | ---------------------------------------------- |
| **EER**      | Equal Error Rate (lower is better)             |
| **Accuracy** | Fraction of clips classified correctly         |
| **t-DCF**    | Tandem Detection Cost Function (ASVspoof only) |
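EER is the operating point where the false-acceptance rate (fake clips accepted as real) equals the false-rejection rate (real clips rejected). The sketch below finds it by sweeping every observed score as a decision threshold and reporting the mean of the two rates where they are closest. It assumes higher scores mean "more likely real"; the pipeline's own evaluation code (and the official ASVspoof scripts) may handle ties and threshold placement slightly differently. The sweep is O(n²), which is fine for a test set of about a thousand clips.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float], spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold), assuming higher scores mean "more likely real".

    For each candidate threshold t (every observed score):
      FRR(t) = fraction of bona fide scores below t   (real clips rejected)
      FAR(t) = fraction of spoof scores at or above t (fake clips accepted)
    The EER is taken where |FAR - FRR| is smallest and reported as their mean.
    """
    best = None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


eer, threshold = compute_eer([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7])
print(eer, threshold)  # 0.25 0.6
```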


---

## 🔍 Step 5: Evaluate Model

Evaluate a trained model on the test set:


# Replace <your_model_folder> with your experiment folder under exp_result/, then uncomment the line below
#!python main.py --config config/LCNN.conf --dataset 2 --feature_type 1 --eval --eval_model_weights ./exp_result/<your_model_folder>/weights/best.pth
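Rather than pasting the folder name by hand, the most recent `best.pth` can be located and the command above rebuilt around it. This sketch assumes the newest run (by file modification time) is the one to evaluate. The config file and feature type passed in must still match that run's training settings, since the helper cannot infer them.

```python
import shlex
from pathlib import Path
from typing import Optional, Union


def latest_best_checkpoint(root: str = "./exp_result") -> Optional[Path]:
    """weights/best.pth of the most recently written run under root, or None if there is none."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    candidates = list(root_path.glob("*/weights/best.pth"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def eval_command(config: str, feature_type: int, weights: Union[str, Path], dataset: int = 2) -> str:
    """Rebuild the evaluation command from the cell above with a concrete weights path."""
    return shlex.join([
        "python", "main.py",
        "--config", config,
        "--dataset", str(dataset),
        "--feature_type", str(feature_type),
        "--eval",
        "--eval_model_weights", str(weights),
    ])


ckpt = latest_best_checkpoint()
if ckpt is None:
    print("no best.pth found under ./exp_result yet")
else:
    # Must match the settings the checkpoint was trained with (LCNN + mel-spectrogram here).
    cmd = eval_command("config/LCNN.conf", 1, ckpt)
    print(cmd)
```

In a following notebook cell, `!{cmd}` runs the printed command through IPython's variable expansion.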

---

## 💡 Common Issues

| Issue           | Solution                                                          |
| --------------- | ----------------------------------------------------------------- |
| Out of memory   | Lower the batch size, e.g. `--batch_size 16` or `--batch_size 8`  |
| Slow training   | Keep mixed precision on (`use_amp: true`, the default)            |
| Quick test run  | Add `--data_subset 0.1` to train on 10% of the data               |
