# 🎙️ Deepfake Audio Detection Training Pipeline

A comprehensive deep learning pipeline for detecting AI-generated/synthetic audio (deepfakes) using multiple state-of-the-art model architectures.

## 📋 Overview

| Feature      | Description                                      |
| ------------ | ------------------------------------------------ |
| **Datasets** | ASVspoof2019, Fake-or-Real, SceneFake            |
| **Task**     | Binary classification (Real vs Fake audio)       |
| **Features** | Raw waveform, Mel-spectrogram, LFCC, MFCC, CQT   |
| **Models**   | EfficientNet-B2, SEResNet, LCNN, RawNet3, AASIST |
| **Metrics**  | EER (Equal Error Rate), Accuracy, t-DCF          |

## 🏗️ Available Models

| Model                           | Parameters | Input Type      | Best For                         |
| ------------------------------- | ---------- | --------------- | -------------------------------- |
| **EfficientNet-B2 + Attention** | ~9M        | Mel-spectrogram | Maximum accuracy with attention  |
| **EfficientNet-B2**             | ~9M        | Mel-spectrogram | Transfer learning from ImageNet  |
| **SEResNet**                    | ~12M       | Mel-spectrogram | Channel attention modeling       |
| **LCNN**                        | ~0.5M      | Mel-spectrogram | Lightweight & efficient          |
| **RawNet3**                     | ~2M        | Raw waveform    | End-to-end learning              |
| **AASIST**                      | ~0.3M      | Raw waveform    | State-of-the-art graph attention |

## 🎯 Key Features

✨ **6 Model Architectures** - From lightweight LCNN to state-of-the-art AASIST  
✨ **Multiple Feature Types** - Raw, Mel-spectrogram, LFCC, MFCC, CQT  
✨ **Data Augmentation** - Random noise, pitch shift, reverberation  
✨ **Auto Visualization** - Feature analysis and training metrics  
✨ **Mixed Precision Training** - Faster training with AMP


## 🔧 Step 1: Set Up Environment

Clone the repository and navigate to the project directory.


In [None]:
%cd /content
!git clone --branch add_fake_or_real_dataset https://github.com/gkibria121/ai-pipeline.git
%cd ai-pipeline
!git pull

## 📦 Step 2: Install Dependencies

Install all required Python packages.


In [None]:
!pip install -r requirements.txt

## 📖 Training Configuration Guide

### Dataset Selection (`--dataset`)

| Flag | Dataset      | Description                                     |
| ---- | ------------ | ----------------------------------------------- |
| `1`  | ASVspoof2019 | Standard benchmark for audio spoofing detection |
| `2`  | Fake-or-Real | Binary classification for fake vs real audio    |
| `3`  | SceneFake    | Scene-aware fake audio detection                |

### Feature Type Options (`--feature_type`)

| Flag | Feature         | Best With                    | Description                            |
| ---- | --------------- | ---------------------------- | -------------------------------------- |
| `0`  | Raw waveform    | RawNet3, AASIST              | Direct waveform processing             |
| `1`  | Mel-spectrogram | EfficientNet, SEResNet, LCNN | 128 mel bins, best for CNN models      |
| `2`  | LFCC            | All models                   | Linear Frequency Cepstral Coefficients |
| `3`  | MFCC            | All models                   | Mel-Frequency Cepstral Coefficients    |
| `4`  | CQT             | All models                   | Constant-Q Transform                   |
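To see how the filterbank-based features differ, the sketch below computes band edges for a 128-band mel filterbank (as used by `feature_type 1`) and for a linearly spaced filterbank of the same size, which is the kind LFCC is built on. It assumes 16 kHz audio (0 to 8 kHz) and the HTK mel formula; the pipeline itself may use librosa or torchaudio with different settings. MFCC and LFCC then take the log of the filterbank energies and apply a DCT, which is not shown here.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """n_bands + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Triangular filter k spans edges[k] .. edges[k + 2] and peaks at edges[k + 1].
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


def linear_band_edges(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Same layout, but equally spaced in Hz (an LFCC-style filterbank)."""
    step = (f_max - f_min) / (n_bands + 1)
    return [f_min + i * step for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Assumed settings: 128 bands over 0-8 kHz, i.e. 16 kHz audio.
    mel = mel_band_edges(128, 0.0, 8000.0)
    lin = linear_band_edges(128, 0.0, 8000.0)
    # Width of the lowest and highest band in each filterbank.
    print(f"mel:    low {mel[1] - mel[0]:.1f} Hz, high {mel[-1] - mel[-2]:.1f} Hz")
    print(f"linear: low {lin[1] - lin[0]:.1f} Hz, high {lin[-1] - lin[-2]:.1f} Hz")
```

The mel filters come out narrower than the linear ones at low frequencies and wider at high frequencies, so mel features spend more resolution on the low-frequency range while a linear filterbank keeps uniform resolution across the spectrum.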

### Command Line Arguments

```text
python main.py [options]

  --config <config_file>        Model configuration file
  --dataset <1|2|3>             Dataset to use
  --feature_type <0-4>          Audio feature representation
  --epochs <num>                Number of training epochs
  --batch_size <num>            Batch size (optional, overrides config)
  --random_noise                Enable random-noise data augmentation
  --eval                        Run evaluation only (no training)
  --eval_model_weights <path>   Path to model weights for evaluation
  --data_subset <0.0-1.0>       Fraction of the data to use (for quick testing)
```

### Quick Examples

```bash
# Train EfficientNet-B2 on Fake-or-Real with augmentation
python main.py --config config/EfficientNetB2.conf --dataset 2 --feature_type 1 --epochs 20 --random_noise

# Train AASIST on ASVspoof2019
python main.py --config config/AASIST.conf --dataset 1 --feature_type 0 --epochs 50

# Quick test with 10% of data
python main.py --config config/LCNN.conf --dataset 2 --feature_type 1 --epochs 5 --data_subset 0.1
```
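When launching many runs from the notebook, it can help to build the command in Python and validate the flags against the tables above before a long job starts. This is a hypothetical helper (`build_train_command` is not part of the repository); it uses only the flags documented in this guide, and the `(0, 1]` range check for `--data_subset` is an assumption.

```python
import shlex
from typing import List, Optional

# Valid values taken from the dataset and feature-type tables above.
VALID_DATASETS = {1, 2, 3}
VALID_FEATURE_TYPES = {0, 1, 2, 3, 4}


def build_train_command(
    config: str,
    dataset: int,
    feature_type: int,
    epochs: int,
    random_noise: bool = False,
    batch_size: Optional[int] = None,
    data_subset: Optional[float] = None,
) -> List[str]:
    """Return an argv list for main.py after checking flags against the documented ranges."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"--dataset must be one of {sorted(VALID_DATASETS)}")
    if feature_type not in VALID_FEATURE_TYPES:
        raise ValueError(f"--feature_type must be one of {sorted(VALID_FEATURE_TYPES)}")
    if epochs < 1:
        raise ValueError("--epochs must be at least 1")

    argv = ["python", "main.py", "--config", config,
            "--dataset", str(dataset),
            "--feature_type", str(feature_type),
            "--epochs", str(epochs)]
    if batch_size is not None:
        argv += ["--batch_size", str(batch_size)]
    if data_subset is not None:
        # Assumption: an empty subset (0.0) makes no sense, so require (0, 1].
        if not 0.0 < data_subset <= 1.0:
            raise ValueError("--data_subset must be in (0, 1]")
        argv += ["--data_subset", str(data_subset)]
    if random_noise:
        argv.append("--random_noise")
    return argv


if __name__ == "__main__":
    print(shlex.join(build_train_command("config/LCNN.conf", 2, 1, 5, data_subset=0.1)))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root, or joined with `shlex.join` and pasted into a `!` cell.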


## 📥 Step 3: Download Fake-or-Real Dataset

Download the Fake-or-Real dataset (2-second audio clips). This contains:

- **Training**: 13,956 samples (6,978 real + 6,978 fake)
- **Validation**: 2,826 samples (1,413 real + 1,413 fake)
- **Testing**: 1,088 samples (544 real + 544 fake)


In [None]:
!python download_dataset.py --dataset 2
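
Once the download finishes, a quick count confirms that the splits match the numbers listed above. This sketch assumes the clips end up in `<root>/<split>/{real,fake}/*.wav` with splits named `training`, `validation`, and `testing`; adjust the root path and folder names to wherever `download_dataset.py` actually places the data.

```python
from pathlib import Path

# Expected (real, fake) counts per split, from the list above.
EXPECTED = {
    "training": (6978, 6978),
    "validation": (1413, 1413),
    "testing": (544, 544),
}


def count_clips(root: Path, ext: str = ".wav") -> dict[str, tuple[int, int]]:
    """Count (real, fake) files per split.

    ASSUMED layout: root/<split>/real/*<ext> and root/<split>/fake/*<ext>.
    Missing folders simply count as 0.
    """
    counts = {}
    for split in EXPECTED:
        real = sum(1 for _ in (root / split / "real").glob(f"*{ext}"))
        fake = sum(1 for _ in (root / split / "fake").glob(f"*{ext}"))
        counts[split] = (real, fake)
    return counts


def report(root: Path) -> bool:
    """Print counts next to the expected values; return True if every split matches."""
    all_ok = True
    for split, got in count_clips(root).items():
        want = EXPECTED[split]
        all_ok = all_ok and got == want
        status = "ok" if got == want else "MISMATCH"
        print(f"{split:<10} real={got[0]:>5} fake={got[1]:>5} expected={want} {status}")
    return all_ok
```

Call `report(Path(...))` with the dataset root; a `MISMATCH` line usually means an incomplete download or a different folder layout than the one assumed here.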

---

## 🚀 Model Training - Recommended Configurations

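Train each model with its recommended feature type. Every model has two cells: the first trains with `--random_noise` augmentation and the second without it, so the effect of augmentation can be compared.

Models are grouped by input type: spectrogram-based CNNs first (sections 1 to 4), then raw-waveform models (sections 5 and 6).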

### 1. 🎪 EfficientNet-B2 with Attention (Heaviest - ~9M params + Attention)

**Best for**: Maximum performance with spatial attention

- Attention-weighted pooling
- Better temporal modeling
- **Recommended**: `feature_type 1` (Mel-Spectrogram)
- **Training time**: Longest (~25 epochs recommended)
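Attention-weighted pooling replaces a plain average over time with a learned weighted average, so frames that carry more evidence of manipulation can dominate the clip-level embedding. The sketch below assumes the backbone's feature map has already been reduced to T frame vectors and uses a single linear scoring vector `w`; the attention head in this repository may use a different scorer (for example a small MLP) or several heads.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: list[list[float]], w: list[float], b: float = 0.0):
    """Collapse T frame vectors (each D-dimensional) into one D-dimensional vector.

    Each frame x_t gets a scalar score w . x_t + b; softmax over time turns the
    scores into weights alpha_t, and the output is sum_t alpha_t * x_t.
    Returns (pooled_vector, weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in frames]
    alpha = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * x[d] for a, x in zip(alpha, frames)) for d in range(dim)]
    return pooled, alpha
```

With `w` set to zeros every frame gets the same weight and the result is the ordinary mean; larger weights let a single frame take over the pooled vector.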


In [None]:
!python main.py --config config/EfficientNetB2_Attention.conf --feature_type 1 --dataset 2 --epochs 25 --random_noise

In [None]:
!python main.py --config config/EfficientNetB2_Attention.conf --feature_type 1 --dataset 2 --epochs 25

### 2. ⚡ EfficientNet-B2 Standard (Heavy - ~9M params)

**Best for**: Pre-trained ImageNet knowledge transfer

- Compound scaling for efficiency
- State-of-the-art CNN architecture
- **Recommended**: `feature_type 1` (Mel-Spectrogram)
- **Training time**: Long (~20 epochs recommended)
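Compound scaling grows depth, width, and input resolution together from the B0 baseline using one coefficient φ: depth is multiplied by α^φ, width by β^φ, and resolution by γ^φ, with α·β²·γ² ≈ 2 so FLOPs roughly double per unit of φ. The constants below are the ones reported in the EfficientNet paper. The released B1 to B7 models use rounded per-model coefficients (for B2: width 1.1, depth 1.2, 260 px input) rather than these exact powers, and for spectrogram input the "resolution" is set by the feature extraction rather than by this rule.

```python
# Base coefficients from the EfficientNet paper's grid search at phi = 1,
# chosen subject to alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi: float) -> dict[str, float]:
    """Multipliers applied to the B0 baseline for compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Conv FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return {"depth": depth, "width": width, "resolution": resolution, "flops": flops}


for phi in range(4):
    s = compound_scale(phi)
    print(f"phi={phi}: depth x{s['depth']:.2f}, width x{s['width']:.2f}, "
          f"resolution x{s['resolution']:.2f}, FLOPs x{s['flops']:.2f}")
```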


In [None]:
!python main.py --config config/EfficientNetB2.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise

In [None]:
!python main.py --config config/EfficientNetB2.conf --feature_type 1 --dataset 2 --epochs 20

### 3. 🔥 SEResNet (Medium - ~12M params)

**Best for**: Time-frequency representation learning

- SE blocks for channel attention
- Attentive statistics pooling
- **Recommended**: `feature_type 1` (Mel-Spectrogram)
- **Training time**: Medium (~15 epochs recommended)
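A Squeeze-and-Excitation block learns one gate per channel: it averages each channel over all time-frequency positions (squeeze), passes that vector through a two-layer bottleneck ending in a sigmoid (excitation), and rescales each channel by its gate. This pure-Python sketch omits biases and batching; the reduction ratio r is implied by the shape of `w1` and is not taken from the repository's config.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def se_block(x: list[list[float]],
             w1: list[list[float]],
             w2: list[list[float]]) -> list[list[float]]:
    """Squeeze-and-Excitation on a feature map x of shape [C][N].

    N is the number of flattened time-frequency positions.
    w1: [C // r][C] reduction weights; w2: [C][C // r] expansion weights.
    """
    # Squeeze: global average over positions gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel gates in (0, 1).
    h = [max(0.0, sum(wij * zj for wij, zj in zip(row, z))) for row in w1]
    s = [sigmoid(sum(wij * hj for wij, hj in zip(row, h))) for row in w2]
    # Scale: multiply every position of channel c by its gate s[c].
    return [[s_c * v for v in ch] for s_c, ch in zip(s, x)]
```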


In [None]:
!python main.py --config config/SEResNet.conf --feature_type 1 --dataset 2 --epochs 15 --random_noise

In [None]:
!python main.py --config config/SEResNet.conf --feature_type 1 --dataset 2 --epochs 15

### 4. 🔲 LCNN (Light CNN with MFM - Lightweight ~0.5M params)

**Best for**: Efficient and robust deepfake detection

- Max-Feature-Map (MFM) activation for noise suppression
- Residual blocks for better gradient flow
- Attentive statistics pooling
- **Recommended**: `feature_type 1` (Mel-Spectrogram)
- **Training time**: Fast (~20 epochs recommended)
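Max-Feature-Map is the activation that gives LCNN its character: the channels are split into two halves and combined with an element-wise max, which halves the channel count and keeps only the stronger response of each competing pair. A minimal sketch on a `[channels][positions]` nested list, pairing channel c with channel c + C the way common PyTorch implementations do with `torch.split` and `torch.max`; how the repository arranges MFM around its convolutions is not shown.

```python
def max_feature_map(x: list[list[float]]) -> list[list[float]]:
    """MFM over channels: input shape [2C][N], output shape [C][N].

    Output channel c is the element-wise max of input channels c and c + C.
    """
    if len(x) % 2:
        raise ValueError("MFM needs an even number of channels")
    half = len(x) // 2
    return [[max(a, b) for a, b in zip(x[c], x[c + half])] for c in range(half)]
```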


In [None]:
!python main.py --config config/LCNN.conf --feature_type 1 --dataset 2 --epochs 20 --random_noise

In [None]:
!python main.py --config config/LCNN.conf --feature_type 1 --dataset 2 --epochs 20

### 5. 🎯 RawNet3 (End-to-end - ~2M params)

**Best for**: Fast end-to-end learning from raw audio

- Learnable SincConv filters for adaptive frequency response
- Res2Net blocks for multi-scale feature extraction
- Processes raw waveform directly (no preprocessing needed)
- **Recommended**: `feature_type 0` (Raw waveform)
- **Training time**: Fast (~15 epochs recommended)
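A SincConv layer constrains each first-layer filter to a band-pass shape whose only learnable parameters are its two cut-off frequencies. The sketch below builds one such kernel as the difference of two windowed sinc low-pass filters (as in SincNet) for fixed cut-offs; the kernel size, sample rate, and Hamming window are illustrative choices, not values read from `config/RawNet3.conf`.

```python
import math


def sinc_bandpass(f_low_hz: float, f_high_hz: float,
                  sample_rate: int = 16000, kernel_size: int = 129) -> list[float]:
    """Hamming-windowed sinc band-pass kernel of the kind SincConv learns.

    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with f1, f2 the
    cut-offs in cycles per sample. A SincConv layer learns only f1 and f2.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the filter is centred")
    f1, f2 = f_low_hz / sample_rate, f_high_hz / sample_rate
    half = kernel_size // 2

    def lowpass(fc: float, n: int) -> float:
        # 2*fc*sinc(2*pi*fc*n) simplifies to sin(2*pi*fc*n) / (pi*n); its n = 0 limit is 2*fc.
        if n == 0:
            return 2.0 * fc
        return math.sin(2.0 * math.pi * fc * n) / (math.pi * n)

    kernel = []
    for i in range(kernel_size):
        n = i - half
        hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (kernel_size - 1))
        kernel.append((lowpass(f2, n) - lowpass(f1, n)) * hamming)
    return kernel
```

During training, a SincConv layer keeps one (low, high) pair per filter as parameters and rebuilds the kernels from them on every forward pass, so the filterbank adapts while staying interpretable as a set of frequency bands.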


In [None]:
!python main.py --config config/RawNet3.conf --feature_type 0 --dataset 2 --epochs 15 --random_noise

In [None]:
!python main.py --config config/RawNet3.conf --feature_type 0 --dataset 2 --epochs 15

### 6. 🧠 AASIST (Graph Attention Network - ~0.3M params)

**Best for**: State-of-the-art detection accuracy

- Graph Attention Layers for spectro-temporal modeling
- Heterogeneous stacking for multi-scale features
- Strong, widely used baseline on ASVspoof benchmarks
- **Recommended**: `feature_type 0` (Raw waveform)
- **Training time**: Medium-Long (~25-50 epochs recommended)
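The core idea of graph attention is that each node (in AASIST, a spectral or temporal feature embedding) updates itself as an attention-weighted sum of the nodes it is connected to. The sketch below shows one simplified step on a fully connected graph with dot-product scores; it omits AASIST's learned projections, its heterogeneous spectral-temporal graph, and graph pooling, so it illustrates the mechanism rather than the published layer.

```python
import math


def graph_attention(nodes: list[list[float]], temperature: float = 1.0) -> list[list[float]]:
    """One simplified graph-attention step on a fully connected graph.

    Node i scores every node j with a dot product, a softmax turns the scores
    into weights, and node i's output is the weighted sum of all node vectors.
    Real layers (GAT, AASIST) add learned projections and nonlinearities.
    """
    out = []
    for xi in nodes:
        scores = [sum(a * b for a, b in zip(xi, xj)) / temperature for xj in nodes]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * xj[d] for w, xj in zip(weights, nodes))
                    for d in range(len(xi))])
    return out
```

Nodes that are similar to each other end up attending to each other, so the update mixes information within related groups of sub-band or time-segment features.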


In [None]:
!python main.py --config config/AASIST.conf --feature_type 0 --dataset 2 --epochs 25 --random_noise

In [None]:
!python main.py --config config/AASIST.conf --feature_type 0 --dataset 2 --epochs 25

---

## 📊 Training Results & Outputs

After training, results are automatically saved in organized folders:

```
results/
└── FakeOrReal_audio_LCNN_ep20_bs32_feat1/
    ├── config.conf             # Copy of training config
    ├── weights/                # Model checkpoints
    │   ├── best.pth            # Best model (lowest dev EER)
    │   └── swa.pth             # SWA averaged model
    ├── metrics/                # Training metrics
    │   ├── epoch_metrics.json  # Per-epoch metrics
    │   └── final_summary.json  # Final results
    ├── metric_log.txt          # Training log
    ├── evaluation_results.txt  # Final evaluation
    └── events.out.*            # TensorBoard logs
```

### Key Metrics

| Metric       | Description                    | Good Value            |
| ------------ | ------------------------------ | --------------------- |
| **EER**      | Equal Error Rate               | < 5%                  |
| **Accuracy** | Classification accuracy        | > 95%                 |
| **t-DCF**    | Tandem Detection Cost Function | < 0.1 (ASVspoof only) |
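EER is the operating point where the false acceptance rate (fake clips accepted as real) equals the false rejection rate (real clips rejected). A minimal sketch that sweeps every observed score as a threshold, assuming higher scores mean "more likely real" (the usual ASVspoof convention; check which class the model's output score refers to). It is O(N²) and meant for illustration; for large evaluation sets a sorted sweep or an ROC utility is faster.

```python
def compute_eer(bona_fide: list[float], spoof: list[float]) -> tuple[float, float]:
    """Equal Error Rate from detection scores (higher score = more likely real).

    For each candidate threshold t:
      FRR(t) = fraction of real clips scored below t (rejected),
      FAR(t) = fraction of fake clips scored at or above t (accepted).
    Returns (EER, threshold) at the t where |FRR - FAR| is smallest,
    using the mean of the two rates there as the EER estimate.
    """
    thresholds = sorted(set(bona_fide) | set(spoof))
    best_gap, best_eer, best_t = float("inf"), 1.0, thresholds[0]
    for t in thresholds:
        frr = sum(s < t for s in bona_fide) / len(bona_fide)
        far = sum(s >= t for s in spoof) / len(spoof)
        if abs(frr - far) < best_gap:
            best_gap, best_eer, best_t = abs(frr - far), (frr + far) / 2, t
    return best_eer, best_t


if __name__ == "__main__":
    eer, t = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
    print(f"EER = {eer:.1%} at threshold {t}")
```

In the example, one real clip (0.4) falls below the 0.6 threshold and one fake clip (0.7) lies above it, so both error rates are 25%.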


## 🔍 Model Evaluation

After training, evaluate your model on the test set:


In [None]:
# Evaluate a trained model (replace path with your trained model)
# !python main.py --config config/LCNN.conf --dataset 2 --feature_type 1 --eval --eval_model_weights ./results/FakeOrReal_audio_LCNN_ep20_bs32_feat1/weights/best.pth
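
Instead of typing the run folder by hand, a small helper can find the newest `best.pth` for a model. This is a hypothetical sketch that assumes run folders follow the naming shown in the results tree (for example `FakeOrReal_audio_LCNN_ep20_bs32_feat1`); the model token used in folder names for other configs may differ.

```python
from pathlib import Path
from typing import Optional


def latest_best_weights(results_root: Path, model: str) -> Optional[Path]:
    """Return the most recently modified weights/best.pth for `model`, or None.

    ASSUMES run folders are named like the example in the results tree,
    e.g. FakeOrReal_audio_LCNN_ep20_bs32_feat1.
    """
    pattern = f"*_audio_{model}_ep*/weights/best.pth"
    candidates = list(results_root.glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    weights = latest_best_weights(Path("results"), "LCNN")
    if weights is None:
        print("No LCNN run found under results/")
    else:
        print("python main.py --config config/LCNN.conf --dataset 2 --feature_type 1 "
              f"--eval --eval_model_weights {weights}")
```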

## 💡 Tips & Best Practices

### Choosing the Right Model

| Use Case              | Recommended Model                   | Why                          |
| --------------------- | ----------------------------------- | ---------------------------- |
| **Quick prototyping** | LCNN                                | Fast training, good accuracy |
| **Production (edge)** | LCNN, RawNet3                       | Lightweight, efficient       |
| **Maximum accuracy**  | AASIST, EfficientNet-B2 + Attention | State-of-the-art performance |
| **Transfer learning** | EfficientNet-B2                     | Pretrained ImageNet weights  |

### Training Tips

1. **Start with augmentation** (`--random_noise`) - Usually improves generalization
2. **Use Mel-spectrogram** (`--feature_type 1`) for CNN-based models
3. **Use Raw waveform** (`--feature_type 0`) for RawNet3 and AASIST
4. **Monitor validation EER** - Lower is better (0% = perfect)
5. **Use `--data_subset 0.1`** for quick experiments before full training
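The exact noise scheme behind `--random_noise` is not documented here. One common form of noise augmentation adds white Gaussian noise at a chosen signal-to-noise ratio; the sketch below implements that as an assumption, scaling the noise so that 10·log10(P_signal / P_noise) equals the requested SNR in dB.

```python
import math
import random


def add_noise_at_snr(signal: list[float], snr_db: float, rng: random.Random) -> list[float]:
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) == snr_db."""
    p_signal = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_noise = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that its power becomes p_signal / 10**(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]
```

In a training loop the SNR would typically be drawn at random per clip (for example `rng.uniform(10, 30)`), but that range is illustrative, not the pipeline's setting.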

### Common Issues

| Issue         | Solution                                             |
| ------------- | ---------------------------------------------------- |
| Out of memory | Reduce `batch_size` in config or via `--batch_size`  |
| Slow training | Enable `use_amp: true` in config for mixed precision |
| Overfitting   | Add `--random_noise` or increase `dropout` in config |
| Poor results  | Try different `feature_type` or more epochs          |
