# Spanish Voice Cloning with Tortoise TTS

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/juanvolpe/voiceJuan/blob/main/colab_spanish_tts.ipynb)

This notebook will help you:
1. Set up the Spanish voice cloning system
2. Upload your voice samples
3. Generate Spanish speech with your voice

## Hugging Face Token Setup

This notebook uses your Hugging Face token to download models. The token should be set in one of these ways:

1. **Colab Secrets (Recommended)**:
   - Open the Secrets panel (key icon in the left sidebar), add a secret named `HF_TOKEN`, and enable notebook access for it
   - No additional setup is needed once the secret exists

2. **Environment File (Alternative)**:
   - Only needed if not using Colab secrets
   - Create a `.env` file with `HF_TOKEN=your_token_here`

The token cell below (`get_hf_token`) checks Colab secrets first and falls back to the `.env` file.

In [None]:
# Check for HF token in Colab secrets or .env file
import os

def get_hf_token():
    """Get HF token from Colab secrets, falling back to a .env file."""
    try:
        # Try Colab secrets first; this import only works inside Colab
        from google.colab import userdata
        token = userdata.get('HF_TOKEN')
        if token:
            print("✅ Found HF token in Colab secrets!")
            return token
    except Exception:
        # Secret missing, notebook access not granted, or not running in Colab
        pass

    print("⚠️ Colab secret not found, trying .env file...")
    # Import lazily so a missing python-dotenv only matters for this fallback
    from dotenv import load_dotenv
    load_dotenv()
    token = os.getenv('HF_TOKEN')
    if not token:
        raise ValueError(
            "❌ HF_TOKEN not found in Colab secrets or .env file.\n"
            "Please add it to Colab secrets or create a .env file."
        )
    print("✅ Found HF token in .env file!")
    return token

# Set the token for use in the TTS system
try:
    os.environ['HF_TOKEN'] = get_hf_token()
    print("🚀 Token set successfully! Ready to proceed.")
except Exception as e:
    print(f"❌ Error setting token: {e}")

In [None]:
# Clone repository and install dependencies
!git clone https://github.com/juanvolpe/voiceJuan.git
%cd voiceJuan

print("\n📦 Installing dependencies...")
!pip install -r requirements.txt

print("\n✨ Setup complete! Ready to start voice cloning.")

## Upload Voice Samples

Prepare WAV files that meet these requirements:
- Clear Spanish speech
- WAV format with a 22050 Hz sample rate
- Good-quality audio with no background noise
- 3-10 seconds per sample
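
The format requirements above can be checked before uploading. The following is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files only. `check_wav` and `write_silence` are hypothetical helpers that are not part of the repository, and the demo files contain generated silence.

```python
import os
import tempfile
import wave

def check_wav(path, expected_rate=22050, min_seconds=3.0, max_seconds=10.0):
    """Return a list of problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if not min_seconds <= duration <= max_seconds:
        problems.append(
            f"duration is {duration:.1f} s, expected {min_seconds}-{max_seconds} s"
        )
    return problems

def write_silence(path, rate, seconds):
    """Write a mono 16-bit WAV of silence (demo helper only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    write_silence(good, 22050, 5)
    print(check_wav(good))  # meets both requirements

    bad = os.path.join(tmp, "bad.wav")
    write_silence(bad, 16000, 1)
    print(check_wav(bad))   # wrong sample rate and too short
```

The check covers only sample rate and duration. Speech clarity and background noise still need to be judged by ear.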

Run the cell below, then use the "Choose Files" button it displays to upload your samples:

In [None]:
from google.colab import files
import os

# Create directories
!mkdir -p tortoise/voices/juan_es/samples

# Upload interface
print("📂 Please upload your WAV files...")
uploaded = files.upload()

# Save files (the extension check is case-insensitive, so .WAV is accepted)
for filename, data in uploaded.items():
    if filename.lower().endswith('.wav'):
        path = f'tortoise/voices/juan_es/samples/{filename}'
        with open(path, 'wb') as f:
            f.write(data)
        print(f'✅ Saved {filename}')
    else:
        print(f'❌ Skipped {filename} - not a WAV file')

# List all uploaded samples
print("\n📊 Uploaded voice samples:")
!ls tortoise/voices/juan_es/samples/

## Generate Speech

With your voice samples uploaded, you are ready to generate speech. When the TTS system initializes, it asks you to choose between two options:
1. Use existing voice cache (faster)
2. Reprocess voice samples (choose this if you added new samples)
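
One way to decide between the two options is to compare file modification times. The sketch below assumes the voice cache is a single file and that samples are `.wav` files in one directory. `cache_is_stale` and the `voice_cache.pth` filename are hypothetical and are not taken from the repository, which asks the question interactively instead.

```python
import os
import tempfile
from pathlib import Path

def cache_is_stale(samples_dir, cache_file):
    """Return True if the cache is missing or older than any .wav sample."""
    cache = Path(cache_file)
    if not cache.exists():
        return True
    cache_mtime = cache.stat().st_mtime
    return any(
        wav.stat().st_mtime > cache_mtime
        for wav in Path(samples_dir).glob("*.wav")
    )

with tempfile.TemporaryDirectory() as tmp:
    samples = Path(tmp) / "samples"
    samples.mkdir()
    cache = Path(tmp) / "voice_cache.pth"  # hypothetical cache filename

    (samples / "a.wav").write_bytes(b"")
    os.utime(samples / "a.wav", (1000, 1000))
    print(cache_is_stale(samples, cache))  # no cache file yet

    cache.write_bytes(b"")
    os.utime(cache, (2000, 2000))
    print(cache_is_stale(samples, cache))  # cache is newer than the sample

    (samples / "b.wav").write_bytes(b"")
    os.utime(samples / "b.wav", (3000, 3000))
    print(cache_is_stale(samples, cache))  # a sample was added after caching
```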

Available quality presets:
- `ultra_fast`: Quick results, lower quality
- `fast`: Good balance of speed and quality (the default)
- `standard`: Better quality, slower
- `high_quality`: Best quality, slowest
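
Mapping a menu entry to one of these preset names can be isolated in a small function, which makes the selection logic testable without `input()`. This is a sketch only: `resolve_preset` is a hypothetical helper, and the default of `fast` mirrors the notebook's generation cell.

```python
PRESETS = ['ultra_fast', 'fast', 'standard', 'high_quality']

def resolve_preset(choice, default='fast'):
    """Map a menu entry ('1' to '4') to a preset name.

    Returns the default for empty input and None for invalid input.
    """
    choice = choice.strip()
    if not choice:
        return default
    try:
        index = int(choice) - 1
    except ValueError:
        return None
    if 0 <= index < len(PRESETS):
        return PRESETS[index]
    return None

for entry in ['', '3', '9', 'abc']:
    print(repr(entry), '->', resolve_preset(entry))
```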

Run the code below to begin:

In [None]:
from spanish_tortoise import SpanishTTS
from IPython.display import Audio

# Initialize TTS
print("🎙️ Initializing TTS system...")
tts = SpanishTTS()  # Will ask about cache usage

# Get text input
text = input("✍️ Enter Spanish text: ")

# Available presets
presets = ['ultra_fast', 'fast', 'standard', 'high_quality']
print("\n⚙️ Available quality presets:")
for i, p in enumerate(presets, 1):
    print(f"{i}. {p}")

# Get preset choice
while True:
    choice = input("\n🎚️ Select quality (1-4) [default=2]: ").strip()
    if not choice:
        preset = 'fast'
        break
    try:
        idx = int(choice) - 1
        if 0 <= idx < len(presets):
            preset = presets[idx]
            break
    except ValueError:
        pass
    print("❌ Please enter a number between 1 and 4")

print(f"\n🎵 Generating speech with '{preset}' preset...")
output_file = tts.generate_speech(text, preset=preset)

print("\n🔊 Playing generated audio:")
Audio(output_file)

## Download Generated Audio

Run the cell below to save the generated audio file to your computer:

In [None]:
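from google.colab import files  # re-imported so this cell can run on its own

print("💾 Starting download...")
files.download(output_file)
print("✅ Download triggered! Check your browser's downloads.")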