# Spanish Voice Cloning with Tortoise TTS

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/juanvolpe/voiceJuan/blob/main/colab_spanish_tts.ipynb)

This notebook will help you:
1. Set up the Spanish voice cloning system
2. Upload your voice samples
3. Generate Spanish speech with your voice

## Hugging Face Token Setup

This notebook needs a Hugging Face access token to download models.
Store the token in your Colab secrets under the name `HF_TOKEN`.

To add your token to Colab secrets:
1. Click the key icon 🔑 in the left sidebar to open the Secrets panel
2. Add a new secret with:
   - Name: `HF_TOKEN`
   - Value: Your Hugging Face token
3. Turn on "Notebook access" for that secret

Let's check if your token is set correctly:

In [None]:
# Get Hugging Face token from Colab secrets
import os
from google.colab import userdata

try:
    # Get token from Colab secrets
    token = userdata.get('HF_TOKEN')
    if not token:
        raise ValueError(
            "❌ HF_TOKEN not found in Colab secrets!\n"
            "Please add your Hugging Face token to Colab secrets as 'HF_TOKEN'"
        )
    
    # Set for use in the TTS system
    os.environ['HF_TOKEN'] = token
    print("✅ Found HF token in Colab secrets!")
    print("🚀 Token set successfully! Ready to proceed.")
except Exception as e:
    print(f"❌ Error: {str(e)}")
    raise  # Stop execution if no token
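
Outside Colab, `google.colab.userdata` is not available, so the check above fails at import time. The following is a minimal sketch of a fallback that reads the token from an environment mapping instead. The helper name `resolve_hf_token` is hypothetical and is not part of this repository.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN value from `env`, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of voiceJuan.
    """
    if env is None:
        env = os.environ
    # Strip stray whitespace/newlines that often sneak in when pasting a token
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise ValueError("HF_TOKEN is not set or is empty")
    return token
```

On a local machine you might export `HF_TOKEN` in your shell and then call `resolve_hf_token()` in place of `userdata.get('HF_TOKEN')`.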

In [None]:
# Clone repository and install dependencies
!git clone https://github.com/juanvolpe/voiceJuan.git
%cd voiceJuan

# Add current directory to Python path
import sys
sys.path.append('.')

print("\n📦 Installing dependencies...")
# First install TTS explicitly
!pip install -q TTS
# Then install other requirements
!pip install -q -r requirements.txt

# Verify TTS installation
print("\n🔍 Verifying TTS installation...")
!pip list | grep TTS

# Restart the runtime so the newly installed TTS package is picked up
print("\n🔄 Restarting runtime to complete setup...")
print("After the restart, continue with the next cell. Do not re-run this one, or it will restart again.")
import IPython
IPython.get_ipython().kernel.do_shutdown(True)

In [None]:
# Make sure the current directory is on the Python path
# (the restart starts a fresh process, so earlier sys.path changes are lost)
import sys
if '.' not in sys.path:
    sys.path.append('.')
print("✅ Python path set up correctly!")
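
After the restart it can be useful to confirm that the packages are importable before going further. A small sketch using only the standard library follows; `is_importable` is a hypothetical helper name, and `TTS` is the import name used by the next cell.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# "TTS" is the import name used later in this notebook (from TTS.api import TTS)
for name in ("TTS", "json"):
    status = "ok" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

Unlike `pip list | grep TTS`, this checks the interpreter that is actually running the notebook, which is the one that matters after a restart.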

In [None]:
# List available TTS models
from TTS.api import TTS

# Query the model list once and reuse it for both filters
models = TTS.list_models()

print("📋 Available TTS models:")
print("\nMultilingual models:")
for model in models:
    if "multilingual" in model and "multi-dataset" in model:
        print(f"- {model}")

print("\nSpanish models:")
for model in models:
    # Match the language segment; a bare "es" substring could match unrelated names
    if "/es/" in model:
        print(f"- {model}")
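
Coqui model names follow the pattern `<type>/<language>/<dataset>/<model>`, so the language can be read from the second path segment. The sketch below filters on that segment; `models_for_language` is a hypothetical helper, and the sample names are only illustrative input.

```python
from typing import Iterable, List


def models_for_language(model_names: Iterable[str], language: str) -> List[str]:
    """Keep names whose second path segment equals `language`."""
    selected = []
    for name in model_names:
        parts = name.split("/")
        if len(parts) >= 2 and parts[1] == language:
            selected.append(name)
    return selected


# Sample input for illustration; in the notebook, pass the model names listed above
sample_names = [
    "tts_models/es/css10/vits",
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/multilingual/multi-dataset/your_tts",
]
print(models_for_language(sample_names, "es"))
```

Calling it with `"multilingual"` instead of `"es"` selects the multilingual entries in the same way.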

## Upload Voice Samples

Please prepare your WAV files with these requirements:
- Clear Spanish speech
- WAV format, 22050 Hz sample rate
- Good quality audio (no background noise)
- 3-10 seconds per sample
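
The sample rate and duration requirements can be checked before uploading. Here is a minimal sketch using Python's standard `wave` module; `check_sample` is a hypothetical helper, and it only covers the format checks, not speech clarity or background noise.

```python
import wave
from typing import List

EXPECTED_RATE_HZ = 22050               # sample rate listed above
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0   # duration range listed above


def check_sample(path: str) -> List[str]:
    """Return a list of problems found in a WAV file (empty if it looks fine)."""
    problems = []
    # wave.open raises wave.Error for files that are not uncompressed PCM WAV
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration is {duration:.1f} s, expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
        )
    return problems
```

For example, `check_sample("my_voice.wav")` (a placeholder filename) returns an empty list when both checks pass.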

Use the "Choose Files" button below to upload your samples:

In [None]:
from google.colab import files
import os

# Create directories
!mkdir -p tortoise/voices/juan_es/samples

# Upload interface
print("📂 Please upload your WAV files...")
uploaded = files.upload()

# Save files
for filename in uploaded.keys():
    # Accept the extension in any case (.wav, .WAV)
    if filename.lower().endswith('.wav'):
        path = f'tortoise/voices/juan_es/samples/{filename}'
        with open(path, 'wb') as f:
            f.write(uploaded[filename])
        print(f'✅ Saved {filename}')
    else:
        print(f'❌ Skipped {filename} - not a WAV file')

# List all uploaded samples
print("\n📊 Uploaded voice samples:")
!ls tortoise/voices/juan_es/samples/

## Generate Speech

You are now ready to generate speech with your voice samples. When the TTS system initializes, it asks which of two options to use:
1. Use existing voice cache (faster)
2. Reprocess voice samples (choose this if you added new samples)
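
One way to decide between the two options is to compare file modification times. The sketch below is an assumption for illustration only: how `SpanishTTS` stores its voice cache is not shown in this notebook, so `cache_is_stale` and the cache path you pass to it are hypothetical.

```python
import os


def cache_is_stale(samples_dir: str, cache_path: str) -> bool:
    """Return True if the cache is missing or older than any .wav sample.

    Hypothetical helper; the real cache location used by SpanishTTS is an assumption.
    """
    if not os.path.exists(cache_path):
        return True
    cache_mtime = os.path.getmtime(cache_path)
    for name in os.listdir(samples_dir):
        if name.lower().endswith(".wav"):
            sample_mtime = os.path.getmtime(os.path.join(samples_dir, name))
            if sample_mtime > cache_mtime:
                return True
    return False
```

If it returns `True`, reprocessing (option 2) is the safer choice; otherwise the existing cache should still reflect your samples.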

Available quality presets:
- `ultra_fast`: Quick results, lower quality
- `fast`: Good balance of speed/quality
- `standard`: Better quality, slower
- `high_quality`: Best quality, slowest

Run the code below to begin:

In [None]:
from voiceJuan.spanish_tortoise import SpanishTTS
from IPython.display import Audio

# Initialize TTS
print("🎙️ Initializing TTS system...")
tts = SpanishTTS()  # Will ask about cache usage

# Get text input
text = input("✍️ Enter Spanish text: ")

# Available presets
presets = ['ultra_fast', 'fast', 'standard', 'high_quality']
print("\n⚙️ Available quality presets:")
for i, p in enumerate(presets, 1):
    print(f"{i}. {p}")

# Get preset choice
while True:
    choice = input("\n🎚️ Select quality (1-4) [default=2]: ").strip()
    if not choice:
        preset = 'fast'
        break
    try:
        idx = int(choice) - 1
        if 0 <= idx < len(presets):
            preset = presets[idx]
            break
    except ValueError:
        pass
    print("❌ Please enter a number between 1 and 4")

print(f"\n🎵 Generating speech with '{preset}' preset...")
output_file = tts.generate_speech(text, preset=preset)

print("\n🔊 Playing generated audio:")
Audio(output_file)
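
The menu logic in the cell above can be factored into a small pure function, which makes it easy to check without typing into a prompt. This is a sketch of the same rules (blank input selects the default, numbers are 1-based); `parse_preset_choice` is a hypothetical name, not something the repository provides.

```python
from typing import Optional, Sequence


def parse_preset_choice(
    choice: str, presets: Sequence[str], default: str = "fast"
) -> Optional[str]:
    """Map a 1-based menu entry to a preset name.

    Returns `default` for blank input and None for anything invalid.
    """
    choice = choice.strip()
    if not choice:
        return default
    try:
        idx = int(choice) - 1
    except ValueError:
        return None
    if 0 <= idx < len(presets):
        return presets[idx]
    return None


presets = ["ultra_fast", "fast", "standard", "high_quality"]
print(parse_preset_choice("3", presets))
```

In the interactive loop, a `None` result would correspond to printing the "Please enter a number between 1 and 4" message and asking again.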

## Download Generated Audio

Click below to save the generated audio file to your computer:

In [None]:
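# Import here so this cell does not depend on the upload cell having run
from google.colab import files
print("💾 Starting download...")
files.download(output_file)
print("✅ Download started! Check your browser's downloads.")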