Detecting AI-generated fake voices using CNN-LSTM and spectrogram analysis.
With the rise of AI voice-cloning tools (ElevenLabs, VALL-E, etc.), it has become easy to generate fake audio that sounds convincingly like a real person. VoxGuard is a deep learning system that classifies a voice recording as genuine or AI-generated.
It works by:
- Extracting MFCC features from the audio (a compact representation of the sound spectrum)
- Passing them through a CNN to detect local patterns in the sound
- Passing through an LSTM to analyse how those patterns change over time
- Outputting a probability that the voice is fake
Audio File (.wav / .flac)
│
▼
MFCC Extraction (librosa)
→ shape: (time_steps × 40 coefficients)
│
▼
┌─────────────────┐
│ CNN Block 1 │ Conv2D(32) → BN → MaxPool → Dropout
│ CNN Block 2 │ Conv2D(64) → BN → MaxPool → Dropout
└─────────────────┘
│
▼
LSTM Layer (64 units)
→ Reads temporal patterns across the MFCC sequence
│
▼
Dense(32) → Dense(1, sigmoid)
│
▼
Output: probability of being FAKE
> 0.5 → FAKE | < 0.5 → REAL
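The diagram above can be sketched in Keras roughly as follows. This is an assumed reconstruction, not the repo's `model.py`: the kernel sizes, dropout rates, fixed `time_steps`, and the `Reshape` that bridges the CNN output to the LSTM are all my choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(time_steps=300, n_mfcc=40):
    """CNN-LSTM binary classifier over an MFCC 'image' of shape (time, n_mfcc, 1)."""
    inp = layers.Input(shape=(time_steps, n_mfcc, 1))

    # CNN Block 1: Conv2D(32) -> BN -> MaxPool -> Dropout
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Dropout(0.3)(x)

    # CNN Block 2: Conv2D(64) -> BN -> MaxPool -> Dropout
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Dropout(0.3)(x)

    # Collapse the frequency and channel axes so the LSTM
    # reads one feature vector per (downsampled) time step
    t, f, c = x.shape[1], x.shape[2], x.shape[3]
    x = layers.Reshape((t, f * c))(x)

    x = layers.LSTM(64)(x)
    x = layers.Dense(32, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```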
VoxGuard/
├── config.py # All settings — change hyperparameters here
├── extract_features.py # Step 1: Extract MFCC features from audio files
├── model.py # CNN-LSTM model definition
├── train.py # Step 2: Train the model
├── evaluate.py # Step 3: Print metrics + confusion matrix
├── predict.py # Step 4: Check any audio file
├── requirements.txt
├── data/
│ ├── real/ # Put genuine voice files here (.wav / .flac)
│ ├── fake/ # Put AI-spoofed voice files here
│ └── README.md # Dataset download instructions
└── results/
├── training_curves.png
└── confusion_matrix.png
```bash
git clone https://github.com/ramlasyaa/VoxGuard.git
cd VoxGuard

python3 -m venv venv
source venv/bin/activate     # macOS/Linux
# venv\Scripts\activate      # Windows

pip install -r requirements.txt
```

See data/README.md for instructions.
Recommended: ASVspoof 2019 LA partition.
data/
├── real/ ← copy genuine .wav files here
└── fake/ ← copy spoofed .wav files here
```bash
python extract_features.py
python train.py
python evaluate.py
python predict.py path/to/voice.wav
# or an entire folder
python predict.py path/to/audio_folder/
```

Results on the ASVspoof 2019 LA evaluation set:
| Metric | Score |
|---|---|
| Accuracy | ~91% |
| Precision | ~89% |
| Recall | ~93% |
| F1-Score | ~91% |
| ROC-AUC | ~0.96 |
Results may vary depending on dataset size and split.
| Component | Tool / Library |
|---|---|
| Language | Python 3.9+ |
| Deep Learning | TensorFlow / Keras |
| Audio Processing | Librosa |
| Features | MFCC (40 coefficients) |
| ML Metrics | Scikit-learn |
| Visualization | Matplotlib, Seaborn |
- MFCC — Mel-Frequency Cepstral Coefficients. Compact audio features that capture how the human ear perceives sound.
- CNN — Detects local spatial patterns in the MFCC "image".
- LSTM — Captures how those patterns evolve over time (temporal context).
- Binary Cross-Entropy — Loss function for real/fake binary classification.
- EarlyStopping — Prevents overfitting by stopping training when validation loss plateaus.
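The EarlyStopping behavior described above looks like this in Keras. A minimal sketch — the patience value of 5 and the `validation_split` in the commented `fit` call are assumptions, not the repo's settings:

```python
import tensorflow as tf

# Stop training when validation loss hasn't improved for 5 epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Typical use during training (X_train / y_train are placeholders):
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=50, callbacks=[early_stop])
```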
This project is based on our paper:
"VoxGuard: Fake Audio (Deepfake Voice) Detection Using Spectrogram Analysis"
Ram Lasya et al. — CONIT 2026 (IEEE)
Built by Ram Lasya · Amrita Vishwa Vidyapeetham