# 🎙️ Piper TTS - Complete Fine-Tuning

A complete notebook for fine-tuning Piper TTS models on Google Colab.

## 📋 Requirements:
- Google Colab with a T4 GPU (Runtime → Change runtime type → GPU)
- LJSpeech-IT dataset already uploaded to Google Drive
- ~10 GB of free space on Google Drive

## ⏱️ Estimated time:
- Setup: ~10-15 min
- Training: ~8-12 hours (1000 epochs)
- Export: ~5 min
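
The training estimate above implies a per-epoch time budget, which helps when judging whether a shorter test run fits in one session. A minimal sketch of that arithmetic, using only the 8-12 hour and 1000-epoch figures from the list (the helper name `seconds_per_epoch` is illustrative, not part of Piper):

```python
def seconds_per_epoch(total_hours: float, epochs: int) -> float:
    """Average wall-clock seconds per epoch for a run of the given length."""
    return total_hours * 3600 / epochs

# Figures taken from the estimate above: 8-12 hours for 1000 epochs
fast = seconds_per_epoch(8, 1000)
slow = seconds_per_epoch(12, 1000)
print(f"{fast:.1f}-{slow:.1f} s per epoch")
```

Real per-epoch time depends on dataset size and batch size, so treat the result as a rough planning number.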

## 🎯 Result:
- Custom ONNX model ready to use with Piper
- JSON config with the model settings


## 1️⃣ Environment Setup

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("✅ Google Drive mounted!")

In [None]:
# Check that a GPU is available
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    vram = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✅ GPU available: {gpu_name}")
    print(f"   VRAM: {vram:.1f} GB")
    if vram < 10:
        print("   ⚠️  Limited VRAM, consider reducing BATCH_SIZE")
else:
    print("❌ GPU NOT AVAILABLE!")
    print("   Go to Runtime → Change runtime type → GPU")
    raise RuntimeError("GPU required for training")
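
The VRAM warning above and the `BATCH_SIZE` hint in the hyperparameter section can be folded into one rule of thumb. The sketch below is only an illustration: `suggest_batch_size` is a hypothetical helper, the 10 GB threshold mirrors the check above, and 8 and 4 are the notebook's default and fallback batch sizes, not measured limits.

```python
def suggest_batch_size(vram_gb: float, default: int = 8, reduced: int = 4,
                       threshold_gb: float = 10.0) -> int:
    """Pick the default batch size, or the reduced one when VRAM is limited."""
    return default if vram_gb >= threshold_gb else reduced

# Example values only; pass the `vram` figure printed by the GPU check
print(suggest_batch_size(15.0))
print(suggest_batch_size(8.0))
```

If training still runs out of memory, lowering the batch size further is the usual next step.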

In [None]:
# Install system dependencies
print("📦 Installing espeak-ng...")
!apt-get update -qq
!apt-get install -qq espeak-ng

print("\n📦 Installing piper_train and dependencies...")
!pip install -q piper-phonemize piper_train

print("\n✅ Installation complete!")

## 2️⃣ Dataset Configuration

In [None]:
# ⚙️ PATH CONFIGURATION
import os
from pathlib import Path

# Dataset path on Google Drive (EDIT IF NEEDED)
DATASET_DIR = "/content/drive/MyDrive/piper_training/dataset/ljspeech_italian"
WAVS_DIR = os.path.join(DATASET_DIR, "wavs")
METADATA_FILE = os.path.join(DATASET_DIR, "metadata.csv")

# Training output path
OUTPUT_DIR = "/content/drive/MyDrive/piper_training/output"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Check that the dataset exists
if not os.path.exists(METADATA_FILE):
    raise FileNotFoundError(f"❌ metadata.csv not found in {DATASET_DIR}")

# Count audio files
wav_files = list(Path(WAVS_DIR).glob("*.wav"))
print("✅ Dataset found!")
print(f"   📁 {len(wav_files)} audio files in {WAVS_DIR}")
print(f"   📄 metadata.csv: {METADATA_FILE}")
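
The cell above only checks that `metadata.csv` exists and counts the WAV files, so a mismatch between the two can go unnoticed until training. A minimal sketch of a consistency check follows. It assumes the LJSpeech convention that the first `|`-separated column is the WAV file name without its extension; `check_dataset` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real dataset.

```python
from pathlib import Path
import tempfile

def check_dataset(metadata_path, wavs_dir):
    """Return (ids in the metadata without a WAV, WAV stems missing from the metadata)."""
    with open(metadata_path, encoding="utf-8") as f:
        ids = {line.split("|", 1)[0].strip() for line in f if line.strip()}
    # Assumption: ids are file names without extension; tolerate a trailing ".wav"
    ids = {i[:-4] if i.endswith(".wav") else i for i in ids}
    stems = {p.stem for p in Path(wavs_dir).glob("*.wav")}
    return sorted(ids - stems), sorted(stems - ids)

# Demo on a temporary dataset with one missing WAV and one unlisted WAV
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "wavs").mkdir()
    for name in ("a001", "a002", "a004"):
        (root / "wavs" / f"{name}.wav").touch()
    (root / "metadata.csv").write_text(
        "a001|Ciao.\na002|Buongiorno.\na003|Grazie.\n", encoding="utf-8"
    )
    missing_audio, missing_text = check_dataset(root / "metadata.csv", root / "wavs")

print(missing_audio)  # ids listed in metadata.csv that have no WAV file
print(missing_text)   # WAV files that metadata.csv never mentions
```

On the real dataset the call would be `check_dataset(METADATA_FILE, WAVS_DIR)`; both lists should come back empty.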

## 3️⃣ Preprocessing - Phoneme Generation

In [None]:
# Generate phonemes with espeak-ng
import csv
from piper_phonemize import phonemize_espeak
from tqdm import tqdm

print("🔄 Generating phonemes...")

# Read metadata (LJSpeech format: id|text or id|text|normalized text)
metadata = []
with open(METADATA_FILE, 'r', encoding='utf-8') as f:
    for line in f:
        parts = line.strip().split('|')
        if len(parts) >= 2:
            # Use the last column, which holds the normalized text when there are three
            filename, text = parts[0], parts[-1]
            metadata.append((filename, text))

print(f"   📊 {len(metadata)} samples to process")

# Generate Italian phonemes
phonemized_metadata = []
for filename, text in tqdm(metadata, desc="Phonemizing"):
    try:
        # phonemize_espeak returns one list of phonemes per sentence;
        # join them all so multi-sentence lines are not truncated
        sentences = phonemize_espeak(text, voice="it")
        phonemes = " ".join("".join(sentence) for sentence in sentences)
        phonemized_metadata.append((filename, text, phonemes))
    except Exception as e:
        print(f"⚠️  Error on {filename}: {e}")
        continue

# Save metadata with phonemes
phonemized_file = os.path.join(DATASET_DIR, "metadata_phonemized.csv")
with open(phonemized_file, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter='|')
    for row in phonemized_metadata:
        writer.writerow(row)

print("\n✅ Phonemes generated!")
print(f"   💾 Saved to: {phonemized_file}")
print(f"   📊 {len(phonemized_metadata)} samples processed")
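
After phonemization it can be useful to look at which symbols actually occur, since a symbol seen only once or twice often points to a typo or a foreign word in the transcript. A minimal sketch of such an inventory follows; `phoneme_inventory` is a hypothetical helper and the sample strings are made up for illustration, not real espeak-ng output.

```python
from collections import Counter

def phoneme_inventory(phoneme_strings):
    """Count every non-whitespace symbol across the given phoneme strings."""
    counts = Counter()
    for s in phoneme_strings:
        counts.update(ch for ch in s if not ch.isspace())
    return counts

# Made-up phoneme strings, for illustration only
sample = ["tʃˈao", "ɡrˈattsje"]
inv = phoneme_inventory(sample)
rare = sorted(sym for sym, n in inv.items() if n == 1)

print(inv.most_common(3))
print(rare)
```

On the real data, the input would be the third column of `metadata_phonemized.csv`, read back with `csv.reader(f, delimiter='|')`.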

## 4️⃣ Download Base Checkpoint (Fine-Tuning)

In [None]:
# Download the Italian base checkpoint for fine-tuning
import urllib.request

# Piper medium-quality Italian base checkpoint
# (Starting from an existing checkpoint speeds up training)
CHECKPOINT_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/main/it/it_IT/riccardo/medium/it_IT-riccardo-medium.ckpt"
CHECKPOINT_DIR = "/content/piper_checkpoints"
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

checkpoint_path = os.path.join(CHECKPOINT_DIR, "base_checkpoint.ckpt")

if not os.path.exists(checkpoint_path):
    print("📥 Downloading base checkpoint...")
    # Standard-library download, so no extra package needs to be installed
    urllib.request.urlretrieve(CHECKPOINT_URL, checkpoint_path)
    print("\n✅ Checkpoint downloaded!")
else:
    print("✅ Checkpoint already present")

print(f"   📄 {checkpoint_path}")
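
The cell above skips the download whenever the file already exists, so a transfer that was cut short would be reused silently. A minimal sketch of a size and checksum helper that can be run on the downloaded file follows. `sha256_of` is a hypothetical helper, and no reference hash is given here: the value to compare against would have to come from wherever the checkpoint is published.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a tiny temporary file instead of a real checkpoint
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"piper")
    demo_path = tmp.name
demo_size = os.path.getsize(demo_path)
demo_hash = sha256_of(demo_path)
os.remove(demo_path)

print(demo_size, demo_hash[:12])
```

For the real file, `os.path.getsize(checkpoint_path)` gives a quick first check: a checkpoint of only a few kilobytes is almost certainly an error page rather than model weights.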

## 5️⃣ Training Configuration

In [None]:
# ⚙️ HYPERPARAMETERS

# Training
MAX_EPOCHS = 1000        # Total epochs (reduce for a test run: 100)
BATCH_SIZE = 8           # Reduce to 4 on out-of-memory (OOM) errors
LEARNING_RATE = 1e-4     # Learning rate
VALIDATION_SPLIT = 0.1   # 10% validation

# Checkpoint
SAVE_EVERY = 100         # Save a checkpoint every N epochs
CHECKPOINT_DIR_TRAIN = os.path.join(OUTPUT_DIR, "checkpoints")
os.makedirs(CHECKPOINT_DIR_TRAIN, exist_ok=True)

# Audio
SAMPLE_RATE = 22050      # Hz
QUALITY = "medium"       # low, medium, high

print("⚙️  Training configuration:")
print(f"   Max Epochs: {MAX_EPOCHS}")
print(f"   Batch Size: {BATCH_SIZE}")
print(f"   Learning Rate: {LEARNING_RATE}")
print(f"   Validation: {VALIDATION_SPLIT*100}%")
print(f"   Quality: {QUALITY}")
print(f"   Sample Rate: {SAMPLE_RATE} Hz")
print(f"\n   💾 Checkpoints: {CHECKPOINT_DIR_TRAIN}")
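
These hyperparameters determine how many optimizer steps a run performs and how many checkpoints it writes. A minimal sketch of that arithmetic follows. The dataset size of 5000 is a made-up figure, `training_plan` is a hypothetical helper, and the sketch assumes the validation share is rounded to the nearest sample and that the last partial batch is kept; the trainer may handle both differently.

```python
import math

def training_plan(num_samples, batch_size, validation_split, max_epochs, save_every):
    """Derive step and checkpoint counts from the hyperparameters."""
    val_samples = round(num_samples * validation_split)
    train_samples = num_samples - val_samples
    steps_per_epoch = math.ceil(train_samples / batch_size)
    return {
        "train_samples": train_samples,
        "val_samples": val_samples,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * max_epochs,
        "checkpoints": max_epochs // save_every,
    }

# 5000 samples is an assumed dataset size; the other values mirror the cell above
plan = training_plan(5000, batch_size=8, validation_split=0.1,
                     max_epochs=1000, save_every=100)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Halving `BATCH_SIZE` doubles `steps_per_epoch`, which is worth keeping in mind when comparing runs by step count.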

## 6️⃣ Start Training

In [None]:
# Prepare the dataset for piper_train
# Create training_config.yaml for training

import yaml

config = {
    "dataset": {
        "metadata_file": phonemized_file,
        "audio_dir": WAVS_DIR,
        "sample_rate": SAMPLE_RATE,
        "validation_split": VALIDATION_SPLIT
    },
    "model": {
        "quality": QUALITY,
        "language": "it"
    },
    "training": {
        "max_epochs": MAX_EPOCHS,
        "batch_size": BATCH_SIZE,
        "learning_rate": LEARNING_RATE,
        "checkpoint_dir": CHECKPOINT_DIR_TRAIN,
        "save_every": SAVE_EVERY,
        "resume_from_checkpoint": checkpoint_path  # Fine-tune from the base checkpoint
    }
}

config_file = os.path.join(OUTPUT_DIR, "training_config.yaml")
with open(config_file, 'w', encoding='utf-8') as f:
    yaml.dump(config, f)

print(f"✅ Config saved: {config_file}")
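
Because a training run takes hours, it can pay to sanity-check the configuration dictionary before starting. A minimal sketch follows. The key names mirror the `config` dictionary built above, while `validate_config` and its rules are illustrative assumptions rather than checks performed by Piper.

```python
def validate_config(config):
    """Return a list of problems found; an empty list means no issue was detected."""
    problems = []
    dataset = config.get("dataset", {})
    training = config.get("training", {})

    split = dataset.get("validation_split")
    if not isinstance(split, (int, float)) or not 0 <= split < 1:
        problems.append("dataset.validation_split must be in [0, 1)")
    rate = dataset.get("sample_rate")
    if not isinstance(rate, int) or rate <= 0:
        problems.append("dataset.sample_rate must be a positive integer")
    for key in ("max_epochs", "batch_size", "save_every"):
        value = training.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"training.{key} must be a positive integer")
    lr = training.get("learning_rate")
    if not isinstance(lr, (int, float)) or lr <= 0:
        problems.append("training.learning_rate must be positive")
    return problems

# Demo with one sane and one broken configuration
good = {
    "dataset": {"sample_rate": 22050, "validation_split": 0.1},
    "training": {"max_epochs": 1000, "batch_size": 8,
                 "learning_rate": 1e-4, "save_every": 100},
}
bad = {
    "dataset": {"sample_rate": 22050, "validation_split": 1.5},
    "training": {"max_epochs": 1000, "batch_size": 0,
                 "learning_rate": 1e-4, "save_every": 100},
}
print(validate_config(good))
print(validate_config(bad))
```

In the notebook, `validate_config(config)` could run just before the file is written, raising an error if the returned list is not empty.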

In [None]:
# START TRAINING
# ⚠️ This can take 8-12 hours!

print("🚀 Starting training...")
print("⏱️  Estimated time: ~8-12 hours for 1000 epochs")
print("💡 Colab Free has a 12-hour limit → use checkpoints to resume\n")

# Piper training command
!python -m piper_train \
    --config {config_file} \
    --dataset-dir {DATASET_DIR} \
    --output-dir {CHECKPOINT_DIR_TRAIN} \
    --resume-from-checkpoint {checkpoint_path}

print("\n✅ Training completed!")

## 7️⃣ Export ONNX Model

In [None]:
# Find the latest checkpoint
import glob

# Sort by modification time: a plain name sort is lexicographic,
# so e.g. "epoch=99" would sort after "epoch=199"
checkpoints = sorted(
    glob.glob(os.path.join(CHECKPOINT_DIR_TRAIN, "*.ckpt")),
    key=os.path.getmtime,
)
if not checkpoints:
    raise FileNotFoundError("❌ No checkpoint found!")

last_checkpoint = checkpoints[-1]
print(f"✅ Latest checkpoint: {last_checkpoint}")

# Export to ONNX
EXPORT_DIR = os.path.join(OUTPUT_DIR, "final_model")
os.makedirs(EXPORT_DIR, exist_ok=True)

print("\n🔄 Exporting to ONNX...")
!python -m piper_train.export_onnx \
    {last_checkpoint} \
    {EXPORT_DIR}/model.onnx

print("\n✅ ONNX model created!")
print(f"   📄 {EXPORT_DIR}/model.onnx")

## 8️⃣ Test the Model

In [None]:
# Test audio generation
from IPython.display import Audio, display
import subprocess

# Download the Piper binary for testing
if not os.path.exists("/content/piper"):
    print("📥 Downloading Piper binary...")
    !wget -q https://github.com/rhasspy/piper/releases/latest/download/piper_linux_x86_64.tar.gz
    !tar -xzf piper_linux_x86_64.tar.gz -C /content
    print("✅ Piper installed")

# Generate test audio (the sample text stays in Italian to match the voice)
test_text = "Benvenuto al sistema di prenotazioni. Questa è una voce personalizzata creata con Piper TTS."
test_output = "/content/test_output.wav"

print("🔊 Generating test audio...")
print(f"   Text: {test_text}")

# Use piper to generate the audio
result = subprocess.run(
    ['/content/piper/piper',
     '--model', f'{EXPORT_DIR}/model.onnx',
     '--output_file', test_output],
    input=test_text.encode('utf-8'),
    capture_output=True
)

# Check the exit code too, so a stale file from an earlier run is not mistaken for success
if result.returncode == 0 and os.path.exists(test_output):
    print("\n✅ Audio generated!")
    display(Audio(test_output))
else:
    print(f"❌ Generation error: {result.stderr.decode()}")
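
Listening is the real test, but a quick look at the WAV header catches an empty file or an unexpected sample rate. A minimal sketch using the standard-library `wave` module follows. `wav_info` is a hypothetical helper, and the demo writes half a second of silence at 22050 Hz (the notebook's `SAMPLE_RATE`) instead of reading real Piper output.

```python
import os
import tempfile
import wave

def wav_info(path):
    """Read sample rate, channel count and duration from a WAV header."""
    with wave.open(path, "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        return {"sample_rate": rate, "channels": w.getnchannels(),
                "seconds": frames / rate}

# Demo: half a second of 16-bit mono silence at 22050 Hz
demo_wav = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_wav, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 11025)

info = wav_info(demo_wav)
print(info)
```

In the notebook the call would be `wav_info(test_output)`; a duration close to zero suggests that synthesis failed even though a file was written.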

## 9️⃣ Download Final Model

In [None]:
# Create a ZIP with the model and config for download
import shutil

zip_path = "/content/my_piper_model.zip"
shutil.make_archive(
    zip_path.replace('.zip', ''),
    'zip',
    EXPORT_DIR
)

print("✅ Model ready for download!")
print(f"   📦 {zip_path}")
print("\n💡 Right-click the file in the Files panel → Download")

# List what was actually packaged
# Expected: model.onnx (TTS model) and model.onnx.json (config)
print("\n📋 Contents:")
for name in sorted(os.listdir(EXPORT_DIR)):
    print(f"   - {name}")

# Show the size
size_mb = os.path.getsize(zip_path) / 1e6
print(f"\n   📊 Size: {size_mb:.1f} MB")

# Optional: automatic download
from google.colab import files
files.download(zip_path)
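
Piper needs the JSON config next to the ONNX file, so it is worth confirming that the archive holds both before downloading it. A minimal sketch with the standard-library `zipfile` module follows. `missing_from_zip` is a hypothetical helper, the required names are the two files this notebook expects, and the demo archive deliberately leaves out the config.

```python
import os
import tempfile
import zipfile

REQUIRED = ("model.onnx", "model.onnx.json")

def missing_from_zip(zip_file, required=REQUIRED):
    """Return the required file names that are absent from the archive."""
    with zipfile.ZipFile(zip_file) as zf:
        names = set(zf.namelist())
    return [name for name in required if name not in names]

# Demo: an archive that holds only the model, without its JSON config
demo_zip = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("model.onnx", b"placeholder")

missing = missing_from_zip(demo_zip)
print(missing)
```

In the notebook, `missing_from_zip(zip_path)` should return an empty list; anything else means a file has to be added to `EXPORT_DIR` before the archive is rebuilt.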

## 🎉 Done!

### Using the model locally:

```bash
# Extract the ZIP
unzip my_piper_model.zip

# Generate audio
echo "Testo di prova" | ./piper/piper \
  --model model.onnx \
  --output_file output.wav
```

### Next steps:
- Test the model with a variety of texts
- If the quality is not satisfactory: increase MAX_EPOCHS
- Compare it with the base model

### Resuming training (if interrupted):
1. Restart the runtime
2. Re-run the cells in sections 1-5
3. In section 6, set `resume_from_checkpoint` (the `checkpoint_path` variable) to the last saved checkpoint
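
Picking the last saved checkpoint by hand is error-prone, because a plain alphabetical sort of file names does not follow epoch order. A minimal sketch that selects the checkpoint with the highest epoch number follows. It assumes file names of the form `epoch=N-step=M.ckpt`; `latest_checkpoint` and the names in the demo are hypothetical.

```python
import re

def latest_checkpoint(paths):
    """Return the path with the highest epoch number, or None if none match."""
    pattern = re.compile(r"epoch=(\d+)")
    best, best_epoch = None, -1
    for path in paths:
        match = pattern.search(path)
        if match and int(match.group(1)) > best_epoch:
            best, best_epoch = path, int(match.group(1))
    return best

# Hypothetical file names, assuming an "epoch=N-step=M.ckpt" naming scheme
names = [
    "epoch=99-step=5000.ckpt",
    "epoch=199-step=10000.ckpt",
    "epoch=1099-step=55000.ckpt",
]
print(latest_checkpoint(names))
print(sorted(names)[-1])  # alphabetical order picks a different, older file
```

In the notebook, the input would be the `.ckpt` paths found under `CHECKPOINT_DIR_TRAIN`, and the result would be assigned to `checkpoint_path` before re-running section 6.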
