# Neural Voice Conversion - Training

This notebook trains the voice conversion system on Google Colab.

**Steps:**
1. Mount Google Drive
2. Install dependencies
3. Download VCTK dataset
4. Train speaker encoder
5. Train voice conversion model
6. Test voice conversion

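**Expected Time:** ~20-26 hours total on a free Colab GPU (~8 hours for the speaker encoder plus ~12-18 hours for the voice conversion model, both measured on a T4)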

## Setup

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Clone repository (or upload your code to Drive)
!git clone https://github.com/jahuytee/Neural-Voice-Conversion-via-Learned-Speaker-Embeddings-and-Time-Frequency-Speech-Representations.git
%cd Neural-Voice-Conversion-via-Learned-Speaker-Embeddings-and-Time-Frequency-Speech-Representations

In [None]:
# Install dependencies
!pip install -q torch torchaudio
!pip install -q numpy scipy librosa
!pip install -q pyworld
!pip install -q tqdm matplotlib tabulate
!pip install -q soundfile

print("✅ Dependencies installed!")

In [None]:
# Check GPU
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## Download VCTK Dataset

In [None]:
# Download VCTK corpus (~11 GB, takes ~15-20 minutes)
!python scripts/download_vctk.py --output /content/drive/MyDrive/vctk_data

# Or, if the download script fails, fetch and unpack the archive manually:
# !wget https://datashare.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip
# !unzip VCTK-Corpus-0.92.zip -d /content/drive/MyDrive/vctk_data
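
Before starting a multi-hour training run, it may be worth confirming that the dataset landed where the training cells expect it. The sketch below counts speaker folders and `.flac` files under the `wav48_silence_trimmed` path used later in this notebook. The helper name `summarize_dataset` is hypothetical, and the one-folder-per-speaker layout is an assumption based on the `p225/*.flac` path used in the test section.

```python
from pathlib import Path


def summarize_dataset(root):
    """Return (speaker_folder_count, flac_file_count) for a VCTK-style tree.

    Assumes one sub-folder per speaker with .flac files directly inside it.
    Returns (0, 0) if the root folder does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return 0, 0
    speakers = [d for d in root.iterdir() if d.is_dir()]
    n_files = sum(1 for d in speakers for _ in d.glob("*.flac"))
    return len(speakers), n_files


data_root = "/content/drive/MyDrive/vctk_data/vctk/wav48_silence_trimmed"
n_speakers, n_files = summarize_dataset(data_root)
print(f"{n_speakers} speaker folders, {n_files} .flac files under {data_root}")
```

If both counts come back as zero, the download or unzip step most likely wrote to a different folder than the one the training commands point at.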

## Train Speaker Encoder

**Time:** ~8 hours on T4 GPU  
**Note:** Checkpoints are saved to Drive so that training can resume if the Colab session times out.

In [None]:
# Train speaker encoder
!python scripts/train_speaker_encoder.py \
  --data /content/drive/MyDrive/vctk_data/vctk/wav48_silence_trimmed \
  --output /content/drive/MyDrive/checkpoints/speaker_encoder \
  --epochs 100 \
  --batch-size 16 \
  --lr 2e-4 \
  --device cuda

# To resume from checkpoint:
# Add: --resume /content/drive/MyDrive/checkpoints/speaker_encoder/speaker_encoder_epoch_50.pt
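
After a session timeout, the most recent checkpoint has to be located before passing it to `--resume`. The following is a minimal sketch, assuming checkpoints follow the `<prefix>_epoch_<N>.pt` naming seen in the paths above; `latest_checkpoint` is a hypothetical helper, not part of the repository. Epochs are compared numerically, because a plain string sort would rank `epoch_50` above `epoch_100`.

```python
import re
from pathlib import Path


def latest_checkpoint(ckpt_dir, prefix):
    """Return the Path of the highest-epoch checkpoint, or None if none exist.

    Matches files named '<prefix>_epoch_<N>.pt' (an assumed naming scheme).
    """
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(prefix)}_epoch_(\d+)\.pt$")
    best_epoch, best_path = -1, None
    for path in ckpt_dir.iterdir():
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


ckpt = latest_checkpoint(
    "/content/drive/MyDrive/checkpoints/speaker_encoder", "speaker_encoder"
)
print(f"Resume from: {ckpt}" if ckpt else "No checkpoint found; start from scratch.")
```

The same helper should work for the voice conversion stage by pointing it at the `voice_conversion` folder with the prefix `voice_conversion`.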

## Train Voice Conversion Model

**Time:** ~12-18 hours on T4 GPU  
**Requires:** Trained speaker encoder from previous step

In [None]:
# Train voice conversion model
!python scripts/train_vc.py \
  --data /content/drive/MyDrive/vctk_data/vctk/wav48_silence_trimmed \
  --speaker-encoder /content/drive/MyDrive/checkpoints/speaker_encoder/speaker_encoder_epoch_100.pt \
  --output /content/drive/MyDrive/checkpoints/voice_conversion \
  --epochs 200 \
  --batch-size 8 \
  --lr 1e-4 \
  --device cuda

# To resume:
# Add: --resume /content/drive/MyDrive/checkpoints/voice_conversion/voice_conversion_epoch_100.pt

## Test Voice Conversion

In [None]:
# Extract a speaker embedding from VCTK
!python scripts/extract_speaker.py \
  --audio "/content/drive/MyDrive/vctk_data/vctk/wav48_silence_trimmed/p225/*.flac" \
  --output /content/drive/MyDrive/embeddings/p225.pt \
  --speaker-encoder /content/drive/MyDrive/checkpoints/speaker_encoder/speaker_encoder_epoch_100.pt \
  --vc-model /content/drive/MyDrive/checkpoints/voice_conversion/voice_conversion_epoch_200.pt

In [None]:
# Test conversion (use your own audio or another VCTK speaker)
!python convert.py \
  --input /content/drive/MyDrive/my_test_audio.wav \
  --target /content/drive/MyDrive/embeddings/p225.pt \
  --output /content/drive/MyDrive/converted_output.wav \
  --speaker-encoder /content/drive/MyDrive/checkpoints/speaker_encoder/speaker_encoder_epoch_100.pt \
  --vc-model /content/drive/MyDrive/checkpoints/voice_conversion/voice_conversion_epoch_200.pt

In [None]:
# Play converted audio
from IPython.display import Audio
Audio('/content/drive/MyDrive/converted_output.wav')
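
Listening is the real test, but a quick look at the file's header can catch an empty or truncated output first. The sketch below uses Python's standard `wave` module and assumes `convert.py` writes an uncompressed PCM WAV file; if the output uses a float or compressed encoding, `wave` will raise an error and a library such as `soundfile` would be needed instead. `wav_info` is a hypothetical helper name.

```python
import os
import wave


def wav_info(path):
    """Return sample rate, channel count, and duration (seconds) of a PCM WAV."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
        }


output_path = "/content/drive/MyDrive/converted_output.wav"
if os.path.exists(output_path):
    print(wav_info(output_path))
else:
    print(f"No file at {output_path}; run the conversion cell first.")
```

A duration close to that of the input clip suggests the conversion ran over the whole utterance.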

## Download Trained Models

After training completes, the checkpoints remain in your Google Drive. Download them from there to use the models locally.

In [None]:
# All checkpoints are saved in:
# /content/drive/MyDrive/checkpoints/

# You can access them anytime from your Google Drive!
!ls -lh /content/drive/MyDrive/checkpoints/speaker_encoder/
!ls -lh /content/drive/MyDrive/checkpoints/voice_conversion/
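
To pull everything down in one file rather than browsing Drive, the checkpoint folder can be zipped first. This is a sketch using the standard library's `shutil.make_archive`; the helper name `archive_checkpoints` and the `/content/checkpoints` output location are assumptions. The Colab download call is left commented out because it only works inside a Colab session.

```python
import shutil
from pathlib import Path


def archive_checkpoints(ckpt_root, archive_base):
    """Zip everything under ckpt_root and return the path of the .zip file.

    archive_base is the output path without the '.zip' extension. It should
    live outside ckpt_root so the archive does not try to include itself.
    """
    ckpt_root = Path(ckpt_root)
    if not ckpt_root.is_dir():
        raise FileNotFoundError(f"Checkpoint folder not found: {ckpt_root}")
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(ckpt_root))


# Example usage inside Colab (uncomment to run):
# zip_path = archive_checkpoints(
#     "/content/drive/MyDrive/checkpoints", "/content/checkpoints"
# )
# from google.colab import files
# files.download(zip_path)
```

With a checkpoint saved for many epochs the archive can be large, so it may be more practical to copy only the final epoch of each model into a separate folder and archive that.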