Chatterbox TTS Google Colab Script
==================================

A comprehensive script for text-to-speech generation and voice cloning using
Chatterbox TTS in a Google Colab environment.

- Author: Ujjwal Nova
- License: MIT
- Repository: https://github.com/UKR-PROJECTS/chatterbox-tts-colab

Features:
- Automatic dependency installation with fallbacks
- Voice cloning from audio samples
- Long text processing with chunking
- Google Drive integration
- Robust error handling
- GPU/CPU automatic detection
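
The "installation with fallbacks" idea boils down to trying a list of commands in order and stopping at the first one that succeeds. The sketch below illustrates that pattern only; `run_first_successful` is a hypothetical helper rather than part of this notebook, and the two stand-in commands replace the real `pip install` calls so the example runs anywhere.

```python
import subprocess
import sys


def run_first_successful(commands):
    """Run each command in order and return the first one that exits with code 0.

    Returns None when every command fails.
    """
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return command
        print(f"Command failed ({result.returncode}): {' '.join(command)}")
    return None


# Stand-in commands (assumptions for illustration); in the notebook these would
# be the CUDA and CPU `pip install` invocations.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing = [sys.executable, "-c", "print('ok')"]

chosen = run_first_successful([failing, passing])
print("Used fallback:", chosen is passing)
```

Passing each command as an argument list avoids `shell=True`, which the notebook needs only for commands that chain steps with `&&`.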

In [None]:
import subprocess
import sys
import os
from pathlib import Path
import torch # Import torch early to check CUDA availability

def run_command(command, description=""):
    """Run a shell command and return True on success, False on failure"""
    label = description if description else command  # Keep log messages meaningful without a description
    print(f"Running: {label}")
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"Warning: {label} failed with return code {result.returncode}")
            print(f"stderr: {result.stderr}")
            print(f"stdout: {result.stdout}")
            return False
        else:
            print(f"Success: {label}")
            return True
    except Exception as e:
        print(f"Error running command: {e}")
        return False

# 1. Install dependencies with proper error handling
print(f"Python version: {sys.version}")

# Update pip first
run_command("pip install --upgrade pip", "Upgrading pip")

# Check for CUDA availability before attempting CUDA installation
cuda_available = torch.cuda.is_available()
if cuda_available:
    print(f"CUDA is available: {torch.version.cuda}")
    # Install PyTorch with CUDA support, force reinstall in case of previous issues
    pytorch_success = run_command(
        "pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118 --force-reinstall",
        "Installing PyTorch with CUDA (force reinstall)"
    )

    if not pytorch_success:
        print("CUDA PyTorch installation failed. Trying CPU version...")
        run_command(
            "pip install torch torchaudio --force-reinstall",
            "Installing PyTorch (CPU fallback, force reinstall)"
        )
else:
    print("CUDA not available. Installing CPU version of PyTorch.")
    run_command(
        "pip install torch torchaudio --force-reinstall",
        "Installing PyTorch (CPU, force reinstall)"
    )

# Install git-lfs for handling large files
# run_command only logs a warning on failure; add stricter handling here if apt fails often
# apt-get is used because plain apt warns that its CLI is not stable for scripts
run_command("apt-get update && apt-get install -y git-lfs", "Installing git-lfs")

# Install chatterbox-tts using the official PyPI package
chatterbox_success = run_command(
    "pip install chatterbox-tts --no-cache-dir --force-reinstall",
    "Installing Chatterbox TTS (force reinstall)"
)

if not chatterbox_success:
    print("PyPI installation failed. Trying GitHub installation...")
    # Alternative: install from GitHub, but only if the clone succeeds
    clone_success = run_command(
        "git clone https://github.com/resemble-ai/chatterbox.git /tmp/chatterbox",
        "Cloning Chatterbox repository"
    )
    if clone_success:
        run_command(
            "cd /tmp/chatterbox && pip install -e .",
            "Installing Chatterbox from source"
        )

# Restart the kernel so the freshly installed packages are picked up
print("\nDependencies installed. Restarting kernel...")
get_ipython().kernel.do_shutdown(True)  # True requests a restart instead of a plain shutdown

# After the restart, the rest of the code runs in a fresh environment,
# which ensures that torchaudio is imported correctly.

# Any verification code and the subsequent sections (2-5) should live in separate
# cells so that they run only after the kernel has restarted.

Python version: 3.11.13 (main, Jun  4 2025, 08:57:29) [GCC 11.4.0]
Running: Upgrading pip
Success: Upgrading pip
CUDA is available: 11.8
Running: Installing PyTorch with CUDA (force reinstall)
Success: Installing PyTorch with CUDA (force reinstall)
Running: Installing git-lfs
Success: Installing git-lfs
Running: Installing Chatterbox TTS (force reinstall)
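
Because the install cell ends with a kernel restart, it can help to confirm in a fresh cell that the packages are importable before running the remaining sections. The following is a minimal sketch, not part of the original notebook: `check_packages` is a hypothetical helper, and the module names `torch`, `torchaudio`, and `chatterbox` are taken from the imports used in the later cells.

```python
import importlib.util


def check_packages(module_names):
    """Map each top-level module name to True if it can be located, else False.

    find_spec only looks the module up; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


# Top-level module names imported by the later cells; adjust if the notebook changes.
status = check_packages(["torch", "torchaudio", "chatterbox"])
for name, found in status.items():
    print(f"{'OK' if found else 'MISSING'}: {name}")
```

Any module reported as `MISSING` suggests that the install cell should be rerun before continuing.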


In [None]:
# 2. Setup Google Drive and create directory
import os  # Re-import: the kernel restart in the previous cell cleared earlier imports

from google.colab import drive

def setup_drive():
    """Setup Google Drive mount and create necessary directories"""
    try:
        drive.mount('/content/drive')
        drive_path = '/content/drive/MyDrive/Chatterbox'
        os.makedirs(drive_path, exist_ok=True)
        print(f"✓ Drive setup complete: {drive_path}")
        return drive_path
    except Exception as e:
        print(f"✗ Drive setup failed: {e}")
        return None

DRIVE_PATH = setup_drive()

# 3. Model loading with CPU fallback
def load_model():
    """Load the Chatterbox model, falling back to CPU if the first attempt fails"""
    import torch
    # Import outside the try block so a missing package raises a clear ImportError
    # instead of a NameError in the fallback path
    from chatterbox.tts import ChatterboxTTS

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Loading model on device: {device}")

    try:
        # from_pretrained only needs the device parameter
        model = ChatterboxTTS.from_pretrained(device=device)
        print("✓ Model loaded successfully")
        return model

    except Exception as e:
        print(f"✗ Model loading failed: {e}")
        if device == 'cpu':
            # Already on CPU, so there is nothing to fall back to
            raise
        print("Trying CPU fallback...")

        try:
            # Fallback to CPU if CUDA fails
            model = ChatterboxTTS.from_pretrained(device="cpu")
            print("✓ Model loaded successfully on CPU")
            return model
        except Exception as e2:
            print(f"✗ CPU fallback also failed: {e2}")
            raise

# 4. Helper functions
def split_into_chunks(text, max_words=100):
    """Split text into manageable chunks"""
    words = text.strip().split()
    for i in range(0, len(words), max_words):
        yield ' '.join(words[i:i+max_words])

# 5. Main execution with comprehensive error handling
def main():
    """Main execution function"""

    # Your input text
    long_text = """
    This is a test of the Chatterbox TTS system.
    I hope this works properly now with the improved error handling and correct repository.
    The model should now load from ResembleAI/chatterbox instead of the old fluffyox repository.
    """

    # Optional voice sample path
    if DRIVE_PATH:
        SAMPLE_PATH = f"{DRIVE_PATH}/my_voice_sample2.wav"
    else:
        SAMPLE_PATH = None
        print("Drive not mounted, voice cloning disabled")

    # Try to load the model
    try:
        print("Loading Chatterbox model...")
        model = load_model()
    except Exception as e:
        print(f"Failed to load model: {e}")
        print("Please check your internet connection and try again")
        return

    # Generate speech
    try:
        import torch
        import torchaudio

        wav_tensors = []
        chunks = list(split_into_chunks(long_text, max_words=50))

        print(f"Processing {len(chunks)} chunks...")

        for i, chunk in enumerate(chunks):
            print(f"Processing chunk {i+1}/{len(chunks)}: '{chunk[:50]}...'")

            try:
                # Check if voice sample exists for cloning
                if SAMPLE_PATH and os.path.exists(SAMPLE_PATH):
                    print("Using voice cloning...")
                    wav = model.generate(
                        text=chunk,
                        audio_prompt_path=SAMPLE_PATH,
                        exaggeration=0.6,
                        cfg_weight=0.5
                    )
                else:
                    if SAMPLE_PATH:
                        print(f"Voice sample not found at {SAMPLE_PATH}, using default voice")
                    wav = model.generate(chunk)

                wav_tensors.append(wav)

                # Clear GPU memory if available
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()

            except Exception as e:
                print(f"Error generating chunk {i+1}: {e}")
                continue

        # Save the final audio
        if wav_tensors and DRIVE_PATH:
            full_audio = torch.cat(wav_tensors, dim=1)
            output_file = f"{DRIVE_PATH}/generated_speech.wav"
            torchaudio.save(output_file, full_audio, model.sr)
            print(f"✓ Audio saved to: {output_file}")
        elif wav_tensors:
            print("Audio generated but no drive path available for saving")
        else:
            print("No audio was generated")

    except Exception as e:
        print(f"Error during speech generation: {e}")

# Run the main function
if __name__ == "__main__":
    main()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✓ Drive setup complete: /content/drive/MyDrive/Chatterbox
Loading Chatterbox model...
Loading model on device: cuda
✗ Model loading failed: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
Trying CPU fallback...


ve.safetensors:   0%|          | 0.00/5.70M [00:00<?, ?B/s]

t3_cfg.safetensors:   0%|          | 0.00/2.13G [00:00<?, ?B/s]

s3gen.safetensors:   0%|          | 0.00/1.06G [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/25.5k [00:00<?, ?B/s]

conds.pt:   0%|          | 0.00/107k [00:00<?, ?B/s]

  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)


loaded PerthNet (Implicit) at step 250,000
✓ Model loaded successfully on CPU
Processing 1 chunks...
Processing chunk 1/1: 'This is a test of the Chatterbox TTS system. I hop...'
Using voice cloning...


  self.gen = func(*args, **kwds)
Sampling:   0%|          | 0/1000 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
Sampling:  33%|███▎      | 330/1000 [01:31<03:06,  3.60it/s]


✓ Audio saved to: /content/drive/MyDrive/Chatterbox/generated_speech.wav
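
The `split_into_chunks` helper in the cell above cuts purely by word count, so a chunk boundary can land in the middle of a sentence. One possible alternative is to group whole sentences up to the word limit. This is a sketch under stated assumptions, not the notebook's method: `split_into_sentence_chunks` is a hypothetical name, and sentence boundaries are approximated with a simple regular expression on `.`, `!`, and `?`.

```python
import re


def split_into_sentence_chunks(text, max_words=50):
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six seven. Eight nine."
for chunk in split_into_sentence_chunks(sample, max_words=5):
    print(chunk)
```

The function returns a list rather than a generator, so it could replace the `list(split_into_chunks(...))` call directly if sentence-aligned chunks turn out to sound better.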
