# Spanish Voice Cloning with Tortoise TTS

<a href="https://colab.research.google.com/github/juanvolpe/voiceJuan/blob/main/colab_spanish_tts.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This notebook will help you:
1. Set up the Spanish voice cloning system
2. Upload your voice samples
3. Generate Spanish speech with your voice

Let us begin with the setup:

In [None]:
## Hugging Face Token Setup

This notebook uses your Hugging Face token to download models. The token should be set in one of these ways:

1. **Colab Secrets (Recommended)**:
   - Add your token under the name `HF_TOKEN` in the Colab secrets panel and grant this notebook access to it
   - No additional setup is needed once the secret exists

2. Alternative: Environment File
   - Only needed if not using Colab secrets
   - Create a `.env` file with `HF_TOKEN=your_token_here`

The code will automatically check Colab secrets first.


In [None]:
# Get Hugging Face token from Colab secrets
import os
from google.colab import userdata

try:
    # Get token from Colab secrets
    token = userdata.get('HF_TOKEN')
    if not token:
        raise ValueError(
            "❌ HF_TOKEN not found in Colab secrets!\n"
            "Please add your Hugging Face token to Colab secrets as 'HF_TOKEN'"
        )
    
    # Set for use in the TTS system
    os.environ['HF_TOKEN'] = token
    print("✅ Found HF token in Colab secrets!")
    print("🚀 Token set successfully! Ready to proceed.")
except Exception as e:
    print(f"❌ Error: {str(e)}")
    raise  # Stop execution if no token


In [None]:
# Clone repository and install dependencies
!git clone https://github.com/juanvolpe/voiceJuan.git
%cd voiceJuan

print("\n📦 Installing dependencies...")
!pip install -q -r requirements.txt TTS  # -q for quiet installation

print("\n✨ Setup complete! Ready to start voice cloning.")


In [None]:
# Verify TTS installation
print("🔍 Checking TTS package...")
!pip list | grep TTS

print("\n📦 If TTS is not listed above, run this command:")
print("!pip install -q TTS")
print("Then restart the runtime and run all cells from the beginning.")
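
The same check can be done from Python instead of grepping `pip list`. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is made up for illustration, and it assumes the distribution is named `TTS`, as in the `pip install` command above.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("TTS")
if version is None:
    print("TTS is not installed; run `pip install -q TTS` and restart the runtime.")
else:
    print(f"TTS {version} is installed.")
```

Returning `None` rather than raising keeps the check usable as a simple condition in later cells.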


In [None]:
# Check for HF token in Colab secrets or .env file
import os
from google.colab import userdata
from dotenv import load_dotenv

def get_hf_token():
    """Get HF token from Colab secrets, falling back to a .env file"""
    try:
        # Try Colab secrets first
        token = userdata.get('HF_TOKEN')
    except Exception:
        # Secret is missing or this notebook has no access to it
        token = None
    if not token:
        # If not in secrets (or empty), try .env file
        load_dotenv()
        token = os.getenv('HF_TOKEN')
    if not token:
        raise ValueError(
            "HF_TOKEN not found in Colab secrets or .env file.\n"
            "Please add it to Colab secrets or create a .env file."
        )
    return token

# Set the token for use in the TTS system
os.environ['HF_TOKEN'] = get_hf_token()
print("✅ Hugging Face token found and set!")


In [None]:
## Important: Hugging Face Token Setup

Before running this notebook, you need to set up your Hugging Face token:

1. Go to [Hugging Face](https://huggingface.co/settings/tokens)
2. Create a new token (or use an existing one)
3. Click the folder icon on the left sidebar in Colab
4. Click the "Files" tab
5. Click the three dots next to "Files" and select "Mount Drive"
6. Create a new file called `.env` with:
```
HF_TOKEN=your_token_here
```

This token is needed to download the Tortoise TTS models.
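
To make the `.env` format concrete, here is a small standard-library sketch of how a `KEY=VALUE` file can be read. It is only an illustration: the notebook itself relies on `python-dotenv` (`load_dotenv`), and the helper name `parse_env_text` is hypothetical. The sketch deliberately ignores features such as `export` prefixes, multi-line values, and variable interpolation.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values

example = parse_env_text("# Hugging Face credentials\nHF_TOKEN=your_token_here\n")
print("HF_TOKEN" in example)
```

A file that contains the token without the `HF_TOKEN=` prefix would yield an empty dictionary here, which is the most common formatting mistake to look for.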


In [None]:
# Set up Hugging Face token
import os
from getpass import getpass

def setup_hf_token():
    """Create a .env file holding the Hugging Face token unless one already exists"""
    if os.path.exists('.env'):
        print("Found existing .env file")
        return

    # getpass keeps the token from being echoed into the cell output
    token = getpass("Hugging Face token (from https://huggingface.co/settings/tokens): ").strip()
    if not token:
        raise ValueError("No token entered; .env file was not created.")

    with open('.env', 'w') as f:
        f.write(f"HF_TOKEN={token}\n")
    print("\nToken saved to .env file")

setup_hf_token()


In [None]:
# Check version and setup
import os
import requests

def get_latest_commit():
    """Return the SHA of the latest commit on main, or None if the lookup fails."""
    repo_api = "https://api.github.com/repos/juanvolpe/voiceJuan/commits/main"
    try:
        response = requests.get(repo_api, timeout=10)
    except requests.RequestException:
        return None
    if response.status_code == 200:
        return response.json()['sha']
    return None

# Clone repository, unless an earlier cell already cloned it and changed into it
if os.path.basename(os.getcwd()) != 'voiceJuan':
    !git clone https://github.com/juanvolpe/voiceJuan.git
    %cd voiceJuan

# Check version
latest_commit = get_latest_commit()
if latest_commit:
    print(f"\nLatest GitHub commit: {latest_commit}")
    print("\nIf you want the latest version:")
    print("1. Runtime → Disconnect and delete runtime")
    print("2. Runtime → Run all")

# Install dependencies
!pip install -r requirements.txt
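
The cell above prints the latest commit on GitHub, but telling whether the local clone matches it also requires the SHA of the local checkout. The following is a minimal sketch, assuming `git` is available on the PATH and the working directory is the cloned repository; the helper names `get_local_commit` and `describe_version` are hypothetical and not part of the repository.

```python
import subprocess

def get_local_commit(repo_dir="."):
    """Return the SHA of HEAD in repo_dir, or None if git fails or is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()

def describe_version(local_sha, remote_sha):
    """Summarize whether the local clone matches the latest remote commit."""
    if local_sha is None or remote_sha is None:
        return "unknown"
    return "up to date" if local_sha == remote_sha else "outdated"
```

For example, `describe_version(get_local_commit(), get_latest_commit())` would report `"outdated"` when the runtime holds an older clone than the `main` branch on GitHub.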

## Upload Voice Samples

Please prepare your WAV files with these requirements:
- Clear Spanish speech
- WAV format
- Good-quality audio with little background noise
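
A clip can be checked against these requirements before it is used. The following is a minimal sketch built on Python's standard `wave` module; the helper name `describe_wav` is hypothetical. Note the limitation: `wave` reads PCM-encoded files, so a valid WAV stored in another encoding (for example floating point) may also come back as `None`.

```python
import wave

def describe_wav(path):
    """Return basic properties of a PCM WAV file, or None if it cannot be read."""
    try:
        with wave.open(path, "rb") as wav_file:
            frame_rate = wav_file.getframerate()
            return {
                "channels": wav_file.getnchannels(),
                "sample_rate_hz": frame_rate,
                "duration_s": wav_file.getnframes() / frame_rate,
            }
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, an unsupported encoding, or an unreadable path
        return None
```

Calling `describe_wav` on each sample gives a quick view of channel count, sample rate, and length, which helps spot files that were merely renamed to `.wav`.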

Use the "Choose Files" button below to upload your samples:

In [None]:
from google.colab import files
import os

# Create directories
!mkdir -p tortoise/voices/juan_es/samples

# Upload interface
print("Please upload your WAV files...")
uploaded = files.upload()

# Save files, skipping anything that is not a WAV file
for filename, content in uploaded.items():
    if filename.lower().endswith('.wav'):
        path = f'tortoise/voices/juan_es/samples/{filename}'
        with open(path, 'wb') as f:
            f.write(content)
        print(f'Saved {filename}')
    else:
        print(f'Skipped {filename} (not a .wav file)')

# List all uploaded samples
print("\nUploaded voice samples:")
!ls tortoise/voices/juan_es/samples/

## Generate Speech

Ready to generate speech with your voice samples! You will have two options:
1. Use existing voice cache (faster)
2. Reprocess voice samples (choose this if you added new samples)
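
One way to decide between the two options is to compare file modification times: if any sample is newer than the cache, the cache is out of date. The sketch below is illustrative only. How `SpanishTTS` actually stores and validates its cache is not shown in this notebook, so the function name `cache_is_stale` and the idea of a single cache file are assumptions.

```python
import os

def cache_is_stale(samples_dir, cache_path):
    """Return True if the cache is missing or older than any .wav sample."""
    if not os.path.exists(cache_path):
        return True
    cache_mtime = os.path.getmtime(cache_path)
    for name in os.listdir(samples_dir):
        if name.lower().endswith(".wav"):
            sample_path = os.path.join(samples_dir, name)
            if os.path.getmtime(sample_path) > cache_mtime:
                return True
    return False
```

When a check like this returns `True`, reprocessing the samples (option 2) is the safer choice; otherwise the existing cache (option 1) can be reused.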

Run the code below to begin:

In [None]:
from spanish_tortoise import SpanishTTS
from IPython.display import Audio

# Initialize TTS
tts = SpanishTTS()  # Will ask about cache usage

# Get text input
text = input("Enter Spanish text: ")

# Available presets
presets = ['ultra_fast', 'fast', 'standard', 'high_quality']
print("\nAvailable quality presets:")
for i, p in enumerate(presets, 1):
    print(f"{i}. {p}")

# Get preset choice
while True:
    choice = input("\nSelect quality (1-4) [default=2]: ").strip()
    if not choice:
        preset = 'fast'
        break
    try:
        idx = int(choice) - 1
        if 0 <= idx < len(presets):
            preset = presets[idx]
            break
    except ValueError:
        pass
    print("Please enter a number between 1 and 4")

# Generate speech
output_file = tts.generate_speech(text, preset=preset)

# Play the generated audio
Audio(output_file)

## Download Generated Audio

Click below to save the generated audio file to your computer:

In [None]:
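# Re-import so this cell also works if run on its own after generation
from google.colab import files

files.download(output_file)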