# Audio Deepfake Detection — Report & Presentation

Summary of the project: datasets, methodology, results, and conclusions. Suitable for reports and presentations.

---
## 1. Introduction & Objectives

- **Goal**: Detect synthetic (deepfake) speech vs human (real) speech from audio.
- **Approach**: Handcrafted acoustic features (MFCC, spectral, pitch, etc.) + classical ML (SVM, XGBoost, Random Forest, Decision Tree, Logistic Regression).
- **Evaluation**: In-dataset (FoR test) and zero-shot (In-the-Wild, ElevenLabs) to assess generalisation.

---
## 2. Datasets

In [1]:
import os
import sys
import json
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd

sys.path.insert(0, str(Path.cwd().parent))

from configs.config import FEATURES_DIR, ITW_DATASET_PATH, ELEVEN_LABS_FEATURES_PATH, MODELS_PATH, FINAL_MODELS_PATH

FEATURE_SET = "mean_20_128_256_128"
paths = {
    "FoR train": os.path.join(FEATURES_DIR, f"training_features_{FEATURE_SET}.parquet"),
    "FoR val": os.path.join(FEATURES_DIR, f"validation_features_{FEATURE_SET}.parquet"),
    "FoR test": os.path.join(FEATURES_DIR, f"testing_features_{FEATURE_SET}.parquet"),
    "ITW": os.path.join(ITW_DATASET_PATH, "normalized_features", f"itw_features_{FEATURE_SET}_trimmed_loudness_normalized.parquet"),
    "ElevenLabs": os.path.join(ELEVEN_LABS_FEATURES_PATH, f"eleven_labs_features_{FEATURE_SET}.parquet"),
}

rows = []
for name, p in paths.items():
    if os.path.isfile(p):
        df = pd.read_parquet(p)
        n_real = (df["label"] == "real").sum() if "label" in df.columns else "—"
        n_fake = (df["label"] == "fake").sum() if "label" in df.columns else "—"
        rows.append({"Dataset": name, "Samples": len(df), "Real": n_real, "Fake": n_fake})
    else:
        rows.append({"Dataset": name, "Samples": "—", "Real": "—", "Fake": "—"})

dataset_table = pd.DataFrame(rows)
display(dataset_table)

| Dataset | Samples | Real | Fake |
|---|---|---|---|
| FoR train | 53868 | 26941 | 26927 |
| FoR val | 10798 | 5400 | 5398 |
| FoR test | 4634 | 2264 | 2370 |
| ITW | 31779 | 19963 | 11816 |
| ElevenLabs | 136 | 74 | 62 |


- **FoR (Fake-or-Real)**: train/validation/test splits; used for training and in-dataset evaluation.
- **In-the-Wild**: external benchmark; zero-shot evaluation only.
- **ElevenLabs**: external benchmark; zero-shot evaluation only.
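
Because the external sets are not class-balanced, a majority-class baseline helps put accuracy figures in context. The sketch below recomputes the real/fake proportions from the counts in the dataset table; the `counts` dictionary and the `majority_baseline` helper are illustrative names, not part of the project code.

```python
# Sample counts copied from the dataset table: (real, fake).
counts = {
    "FoR train": (26941, 26927),
    "FoR val": (5400, 5398),
    "FoR test": (2264, 2370),
    "ITW": (19963, 11816),
    "ElevenLabs": (74, 62),
}


def majority_baseline(n_real: int, n_fake: int) -> float:
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_real, n_fake) / (n_real + n_fake)


for name, (n_real, n_fake) in counts.items():
    total = n_real + n_fake
    print(f"{name:<11} real={n_real / total:.3f} "
          f"fake={n_fake / total:.3f} "
          f"majority baseline={majority_baseline(n_real, n_fake):.3f}")
```

The FoR splits are close to 50/50, whereas ITW is roughly 63% real, so always predicting "real" would already score about 0.63 accuracy there.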

---
## 3. Methodology

- **Features**: MFCC (20) + deltas, spectral (centroid, bandwidth, flatness, rolloff), RMSE, ZCR, pitch (YIN), mel spectrogram; aggregated per file (mean).
- **Models**: Linear SVM, RBF SVM, Logistic Regression, Random Forest, XGBoost, Decision Tree (see notebooks in this folder).
- **Training**: FoR train (and validation where used); StandardScaler + classifier; class_weight="balanced" for SVMs.
- **Evaluation**: Accuracy, precision, recall, F1 (macro), ROC AUC; confusion matrices; zero-shot on ITW and ElevenLabs.
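
The training and evaluation steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the data is synthetic, `LinearSVC` stands in for the linear SVM, the 0 = real / 1 = fake encoding is assumed, and the hyperparameters are placeholders.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the per-file feature table (assumption):
# 400 files x 10 features, with the two classes shifted apart.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 10)),
               rng.normal(1.0, 1.0, (200, 10))])
y = np.array([0] * 200 + [1] * 200)  # 0 = real, 1 = fake (assumed encoding)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The scaler is fitted on the training split only, inside the pipeline,
# so no statistics leak from the evaluation data.
model = make_pipeline(
    StandardScaler(),
    LinearSVC(class_weight="balanced", max_iter=10000),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
scores = model.decision_function(X_test)  # signed margins, usable for ROC AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
    "roc_auc": roc_auc_score(y_test, scores),
}
print(metrics)
```

Wrapping the scaler and classifier in one pipeline means the same fitted transformation is reused unchanged when the model is later applied to an external dataset.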

---
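## 4. Results — FoR validation set

Summary of saved experiments (when available). For each experiment, the entry with the highest `selection_score` is read from `<FINAL_MODELS_PATH>/<model>/<exp>/val_results.json`, so the table below reports validation metrics.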

In [4]:
def collect_val_results(root_dir):
    results = []
    if not os.path.isdir(root_dir):
        return results
    for model_name in os.listdir(root_dir):
        model_dir = os.path.join(root_dir, model_name)
        if not os.path.isdir(model_dir):
            continue
        for exp_name in sorted(os.listdir(model_dir)):
            exp_dir = os.path.join(model_dir, exp_name)
            val_path = os.path.join(exp_dir, "val_results.json")
            if not os.path.isfile(val_path):
                continue
            with open(val_path) as f:
                val_entries = json.load(f)
            if not isinstance(val_entries, list) or not val_entries:
                continue
            best = max(val_entries, key=lambda e: e.get("selection_score", 0))
            row = {"Model": model_name, "Experiment": exp_name}
            for k in ["val_accuracy", "val_precision", "val_recall", "val_f1_macro", "val_roc_auc"]:
                v = best.get(k)
                if v is not None:
                    row[k] = round(v[0] if isinstance(v, (list, tuple)) else v, 4)
            results.append(row)
    return results

val_results = collect_val_results(FINAL_MODELS_PATH)
if val_results:
    val_df = pd.DataFrame(val_results)
    display(val_df)
else:
    print("No saved experiments found under FINAL_MODELS_PATH.")

| Model | Experiment | val_accuracy | val_precision | val_recall | val_f1_macro | val_roc_auc |
|---|---|---|---|---|---|---|
| Dtree | exp_20260207_210558 | 0.9545 | 0.9565 | 0.9538 | 0.9543 | 0.9588 |
| linear_svm | exp_20260207_193304 | 0.7052 | 0.7741 | 0.6993 | 0.6815 | 0.8904 |
| logistic_reg | exp_20260207_192945 | 0.7279 | 0.7875 | 0.7226 | 0.7099 | 0.9022 |
| poly_svm | exp_20260207_201742 | 0.7615 | 0.8002 | 0.7656 | 0.7554 | 0.8479 |
| rbf_svm | exp_20260207_200752 | 0.7611 | 0.7614 | 0.7605 | 0.7607 | 0.8323 |
| sigmoid_svm | exp_20260207_204305 | 0.7268 | 0.7656 | 0.7223 | 0.7137 | 0.835 |


---
## Results — ITW test set

In [2]:
def collect_experiment_metrics(root_dir):
    results = []
    if not os.path.isdir(root_dir):
        return results
    for model_name in os.listdir(root_dir):
        model_dir = os.path.join(root_dir, model_name)
        if not os.path.isdir(model_dir):
            continue
        for exp_name in sorted(os.listdir(model_dir)):
            exp_dir = os.path.join(model_dir, exp_name)
            metrics_path = os.path.join(exp_dir, "metrics.json")
            if not os.path.isfile(metrics_path):
                continue
            with open(metrics_path) as f:
                metrics = json.load(f)
            row = {"Model": model_name, "Experiment": exp_name}
            for k in ["accuracy", "precision", "recall", "f1", "roc_auc"]:
                if k in metrics:
                    row[k] = round(metrics[k], 4)
            results.append(row)
    return results

all_results = collect_experiment_metrics(FINAL_MODELS_PATH)
if all_results:
    metrics_df = pd.DataFrame(all_results)
    display(metrics_df)
else:
    print("No saved experiments found under FINAL_MODELS_PATH. Run experimental-pipeline-e2e.ipynb or the model notebooks and save experiments to populate this table.")

| Model | Experiment | accuracy | precision | recall | f1 | roc_auc |
|---|---|---|---|---|---|---|
| Dtree | exp_20260207_210558 | 0.5905 | 0.5495 | 0.5451 | 0.5446 | 0.5216 |
| linear_svm | exp_20260207_193304 | 0.72 | 0.7075 | 0.7181 | 0.7097 | 0.7811 |
| logistic_reg | exp_20260207_192945 | 0.7163 | 0.706 | 0.7179 | 0.7073 | 0.7775 |
| poly_svm | exp_20260207_201742 | 0.6742 | 0.6497 | 0.614 | 0.6144 | 0.6907 |
| rbf_svm | exp_20260207_200752 | 0.7059 | 0.6968 | 0.7086 | 0.6974 | 0.7631 |
| sigmoid_svm | exp_20260207_204305 | 0.6601 | 0.6768 | 0.6864 | 0.6585 | 0.7038 |
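
To compare the two results tables directly, the accuracy lost between FoR validation and ITW can be computed per model. The sketch below uses only the accuracy values reported in the tables above; the dictionary names are illustrative.

```python
# Accuracy values copied from the two results tables above.
val_accuracy = {
    "Dtree": 0.9545, "linear_svm": 0.7052, "logistic_reg": 0.7279,
    "poly_svm": 0.7615, "rbf_svm": 0.7611, "sigmoid_svm": 0.7268,
}
itw_accuracy = {
    "Dtree": 0.5905, "linear_svm": 0.7200, "logistic_reg": 0.7163,
    "poly_svm": 0.6742, "rbf_svm": 0.7059, "sigmoid_svm": 0.6601,
}

# Positive gap = accuracy lost when moving from FoR validation to ITW.
gaps = {m: round(val_accuracy[m] - itw_accuracy[m], 4) for m in val_accuracy}

# Print models from largest to smallest accuracy drop.
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<13} val={val_accuracy[model]:.4f} "
          f"itw={itw_accuracy[model]:.4f} gap={gap:+.4f}")
```

On these numbers the decision tree loses about 0.36 accuracy, while the linear SVM is marginally higher on ITW than on FoR validation.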


---
## 5. Zero-shot evaluation

Models trained on FoR are evaluated on **In-the-Wild** and **ElevenLabs** without retraining. See `model_zero_shot_evaluation.ipynb` for detailed per-model results and threshold analysis.
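
As a rough illustration of what a threshold analysis involves, the sketch below sweeps a decision threshold over classifier scores and reports balanced accuracy at each value. The labels, scores, candidate thresholds, and helper names are all assumptions made for the example; the referenced notebook may use a different procedure and metric.

```python
def balanced_accuracy(labels, preds):
    """Mean of recall on the positive (fake=1) and negative (real=0) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def sweep_thresholds(labels, scores, thresholds):
    """Return (threshold, balanced accuracy) pairs for each candidate threshold."""
    results = []
    for t in thresholds:
        # Predict "fake" when the score reaches the threshold.
        preds = [1 if s >= t else 0 for s in scores]
        results.append((t, balanced_accuracy(labels, preds)))
    return results


# Toy data (assumption): higher score means "more likely fake".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.55, 0.70, 0.90]

results = sweep_thresholds(labels, scores, [0.2, 0.4, 0.5, 0.6, 0.8])
best_threshold, best_score = max(results, key=lambda r: r[1])
for t, score in results:
    print(f"threshold={t:.1f} balanced_accuracy={score:.3f}")
```

Choosing the threshold on the target data would mean the evaluation is no longer strictly zero-shot, so a sweep like this is best read as a diagnostic of how much of the gap is due to score calibration.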

---
## 6. Conclusions & reproducibility

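- **Summary**: On handcrafted acoustic features, the decision tree reaches 0.95 accuracy on FoR validation but falls to 0.59 on ITW, where 19963 of 31779 samples (about 63%) are real. The linear models (linear SVM, logistic regression) score lower on FoR validation (0.71 to 0.73) yet hold at about 0.72 on ITW. Zero-shot performance on ITW and ElevenLabs reflects domain shift.
- **Reproducibility**: Use `experimental-pipeline-e2e.ipynb` for a single end-to-end run (feature paths → train → evaluate → zero-shot → save). Ensure the parquet feature files and (optionally) the raw WAV files are available, as described in the README's Data Setup section.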