# 🇬🇷 Greek TTS Training on Google Colab

Train the Dia TTS model for the Greek language on a free or paid Colab GPU.

**Requirements:**
- Google Colab Pro ($10/month) or Pro+ ($50/month) recommended for A100
- HuggingFace account (for Common Voice dataset)

**Estimated Time:**
- T4 (free): ~20-30 hours for 50 epochs
- A100 (Pro+): ~6-10 hours for 50 epochs

In [None]:
# Check GPU
!nvidia-smi
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# Install dependencies
!pip install -q transformers datasets huggingface_hub speechbrain torchaudio phonemizer tqdm wandb
!apt-get install -y espeak-ng > /dev/null 2>&1
print("✅ Dependencies installed")

In [None]:
# Clone repository
!git clone https://github.com/nari-labs/dia.git
%cd dia/dia-multilingual

In [None]:
# Login to HuggingFace
from huggingface_hub import notebook_login
notebook_login()

In [None]:
# Download Greek datasets
# Adjust max_samples based on your time budget:
# - 5000 samples: ~2-3 hours training
# - 20000 samples: ~8-10 hours training

!python scripts/download_greek_datasets.py \
    --datasets commonvoice fleurs \
    --output_dir /content/data/el \
    --max_samples 10000

In [None]:
# Check downloaded data
import json
with open("/content/data/el/manifests/train_manifest_el.json") as f:
    manifest = json.load(f)
print(f"Training samples: {len(manifest)}")
total_hours = sum(s.get('duration', 0) for s in manifest) / 3600
print(f"Total audio: {total_hours:.1f} hours")
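Before committing GPU hours, it can help to look a little closer at the manifest than a single total. The following is a minimal sketch that assumes the manifest is a list of dicts with an optional `duration` field in seconds, as the cell above implies. The helper name `summarize_manifest` is hypothetical and is not part of the repository.

```python
def summarize_manifest(entries):
    """Summarise clip durations for a list of manifest entries (dicts).

    Assumes each entry may carry a 'duration' key in seconds; entries
    without one (or with a zero duration) are counted as missing.
    """
    durations = [e["duration"] for e in entries if e.get("duration")]
    total_s = sum(durations)
    return {
        "samples": len(entries),
        "missing_duration": len(entries) - len(durations),
        "total_hours": total_s / 3600,
        "mean_seconds": total_s / len(durations) if durations else 0.0,
        "max_seconds": max(durations, default=0.0),
    }


# Tiny in-memory example; in the notebook, pass the loaded `manifest` instead.
example = [{"duration": 3.0}, {"duration": 5.0}, {"text": "no duration"}]
stats = summarize_manifest(example)
print(stats)
```

A large `missing_duration` count would mean the total hours reported above understate the dataset, since the original cell treats missing durations as zero.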

In [None]:
# Optional: Setup Weights & Biases for monitoring
import wandb
wandb.login()

In [None]:
# Mount Google Drive to save checkpoints
from google.colab import drive
drive.mount('/content/drive')

# Create checkpoint directory in Drive
!mkdir -p "/content/drive/MyDrive/greek_tts_checkpoints"

In [None]:
# Start training!
# Adjust batch_size based on GPU:
# - T4 (16GB): batch_size=8-16
# - A100 (40GB): batch_size=32-64

!python scripts/train_greek.py \
    --manifest /content/data/el/manifests/train_manifest_el.json \
    --lang_vocab configs/lang_vocab.json \
    --output_dir "/content/drive/MyDrive/greek_tts_checkpoints" \
    --epochs 50 \
    --batch_size 16 \
    --lr 1e-4 \
    --wandb
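The batch-size guidance in the comments above can be turned into a small lookup so the value follows whichever GPU Colab assigns. This is only a sketch: the helper `suggest_batch_size` is hypothetical, it returns the lower end of each range quoted above, and the thresholds (slightly below 16 GB and 40 GB, because the memory reported to PyTorch is a little less than the nominal size) and the fallback of 4 for smaller GPUs are assumptions rather than values from the training script.

```python
def suggest_batch_size(gpu_mem_gb):
    """Return a conservative batch size for a GPU with the given memory (GiB)."""
    if gpu_mem_gb >= 38:   # A100 40GB class: guide suggests 32-64
        return 32
    if gpu_mem_gb >= 14:   # T4 16GB class: guide suggests 8-16
        return 8
    return 4               # assumption for smaller GPUs; not from the guide


# Query the attached GPU if PyTorch and CUDA are available.
try:
    import torch

    if torch.cuda.is_available():
        mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU memory: {mem_gb:.1f} GiB, "
              f"suggested batch_size: {suggest_batch_size(mem_gb)}")
except ImportError:
    print("PyTorch not installed; call suggest_batch_size() with a known size")
```

The suggested value can then be passed to `--batch_size` in the training command. If training runs out of memory anyway, halving the batch size is the usual first step.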

In [None]:
# Test inference with trained model
!python scripts/infer_greek.py \
    --model_path "/content/drive/MyDrive/greek_tts_checkpoints/greek_best.pt" \
    --text "Γεια σας, αυτή είναι μια δοκιμή" \
    --output_dir /content/samples/

In [None]:
# Play generated audio
from IPython.display import Audio
Audio("/content/samples/greek_Γεια_σας_αυτή_είναι_μια_δοκιμ.wav")
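The output filename is derived from the input text, so the hard-coded path above stops matching as soon as the text changes. One way around this, sketched below, is to pick the most recently written `.wav` in the samples directory. The helper `latest_wav` is hypothetical, and the sketch assumes the inference script writes `.wav` files directly into `/content/samples/`.

```python
from pathlib import Path


def latest_wav(directory):
    """Return the most recently modified .wav file in `directory`, or None."""
    d = Path(directory)
    if not d.is_dir():
        return None
    wavs = sorted(d.glob("*.wav"), key=lambda p: p.stat().st_mtime)
    return wavs[-1] if wavs else None


newest = latest_wav("/content/samples")
print(newest)
# In the notebook, play it with:
#   from IPython.display import Audio
#   Audio(str(newest))
```

If `newest` is `None`, the inference step either failed or wrote its output somewhere else.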