<a href="https://colab.research.google.com/github/rajstories/Voice_clonning_Model/blob/main/Voice_Cloning_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎙️ Voice Cloning Notebook - Your Personal Edition

## Quick Start Guide:
1. Click **Runtime** → **Run all** (or press Ctrl+F9)
2. Wait 3-4 minutes for setup
3. A link will appear below - click it
4. Upload your voice, type text, generate! ✅

---

### 📋 What You Need:
- Your voice recording (15-30 seconds, quiet room)
- Text of what you said in the recording
- New script you want your voice to speak

### 🌟 Features:
- ✅ Works with Hindi, English, Hinglish
- ✅ Free Google Colab GPU
- ✅ No coding needed
- ✅ Download your cloned voice

---

In [None]:
# ============================================
# STEP 1: Install Packages
# ============================================
# This takes 2-3 minutes - please be patient!

print("📦 Installing required packages...")
print("⏳ This will take 2-3 minutes - please wait...\n")

!pip install -q TTS
!pip install -q gradio

print("\n✅ Installation complete!")

In [None]:
# ============================================
# STEP 2: Import Libraries
# ============================================

import torch
from TTS.api import TTS
import gradio as gr
import os
from pathlib import Path

print("✅ Libraries loaded successfully!")

In [None]:
# ============================================
# STEP 3: Load Voice Cloning Model
# ============================================
# Loading XTTS v2 - best for Hindi/English
# This takes 1-2 minutes

print("🤖 Loading XTTS v2 model...")
print("⏳ Please wait 1-2 minutes...\n")

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

print(f"\n✅ Model loaded on {device}!")
print(f"💡 GPU Available: {torch.cuda.is_available()}")

In [None]:
# ============================================
# STEP 4: Define Voice Cloning Function
# ============================================

def clone_voice(reference_audio, reference_text, generation_text, language):
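    """
    Clone the voice in `reference_audio` and speak `generation_text` with it.

    Returns (path to the generated WAV, status message); the path is None on failure.
    Note: `reference_text` is collected by the UI but is not passed to the model here.
    """

    print("🎯 Starting voice cloning...")

    # Create the output directory (and any missing parents)
    output_dir = Path("/content/outputs")
    output_dir.mkdir(parents=True, exist_ok=True)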

    # Output file path
    output_file = output_dir / "cloned_voice.wav"

    try:
        # Generate cloned voice
        tts.tts_to_file(
            text=generation_text,
            speaker_wav=reference_audio,
            language=language,
            file_path=str(output_file)
        )

        print("✅ Voice cloning successful!")
        print(f"📁 Saved to: {output_file}")

        return str(output_file), "✅ Success! Your cloned voice is ready. Download below!"

    except Exception as e:
        error_msg = f"❌ Error: {str(e)}"
        print(error_msg)
        return None, error_msg

print("✅ Voice cloning function ready!")

In [None]:
# ============================================
# STEP 5: Launch User Interface
# ============================================

print("🎨 Creating user interface...\n")

interface = gr.Interface(
    fn=clone_voice,
    inputs=[
        gr.Audio(
            type="filepath",
            label="📤 Upload Your Voice Recording (15-30 seconds)",
        ),
        gr.Textbox(
            label="📝 Reference Text (What You Said)",
            placeholder="Example: Namaste doston, mera naam Raj hai...",
            lines=3,
        ),
        gr.Textbox(
            label="🎯 Generation Text (New Content)",
            placeholder="Example: Aaj hum baat karenge AI ke baare mein...",
            lines=5,
        ),
        gr.Dropdown(
            choices=["hi", "en", "es", "fr", "de", "it", "pt", "ru", "zh-cn"],
            value="hi",
            label="🌐 Language",
        )
    ],
    outputs=[
        gr.Audio(label="🔊 Your Cloned Voice"),
        gr.Textbox(label="📊 Status")
    ],
    title="🎙️ Voice Cloning Tool",
    description="""
    ### How to Use:
    1. Upload your voice (15-30 sec, clear audio)
    2. Type what you said (use Roman script for Hindi: "Namaste" not "नमस्ते")
    3. Type the new content you want your voice to speak
    4. Select a language (hi=Hindi, en=English)
    5. Click Submit → Wait 20-30 sec → Download!

    ### Tips:
    - Record in a quiet room with no background noise
    - The reference text is not passed to the model in this notebook, so clone quality depends on the reference audio
    - For Hindi, use Roman script (better results)
    """,
    theme=gr.themes.Soft(),
)

print("🚀 Launching interface...")
print("="*50)
print("✅ READY! Click the link below:")
print("="*50)

interface.launch(share=True, debug=True)

---

## 📌 Troubleshooting Guide

### Problem: Voice doesn't sound like me
**Solution:**
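- Check the reference audio first: this notebook's `clone_voice` function does not pass the reference text to the model, so only the recording affects the result
- Use a longer recording (20-30 seconds works better)
- Re-record with clearer pronunciation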

### Problem: Hindi pronunciation is wrong
**Solution:**
- Use Roman script: "Namaste doston" ✅
- NOT Devanagari: "नमस्ते दोस्तों" ❌
- Make sure language is set to 'hi'
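
One way to catch Devanagari input before submitting is to test each character against the Devanagari Unicode block (U+0900 to U+097F). This is a minimal sketch only: `contains_devanagari` is a hypothetical helper and is not part of the notebook.

```python
def contains_devanagari(text: str) -> bool:
    """Return True if any character lies in the Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


# Roman-script Hindi passes the check; Devanagari text is flagged.
print(contains_devanagari("Namaste doston"))
print(contains_devanagari("\u0928\u092e\u0938\u094d\u0924\u0947"))  # "namaste" written in Devanagari
```

If the check returns True for your generation text, retype it in Roman script as suggested above.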

### Problem: Audio quality is poor
**Solution:**
- Record in a quieter room
- Speak closer to the mic
- Use WAV format if possible
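
If your recording is already a WAV file, you can inspect it before uploading. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV only. The helper names `wav_info` and `check_reference` are hypothetical, and the 15 and 30 second defaults are taken from the recording length this notebook recommends, not from any limit enforced by the model.

```python
import wave


def wav_info(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels


def check_reference(path, min_seconds=15.0, max_seconds=30.0):
    """Return a list of warnings; an empty list means the length is in the suggested range."""
    duration, _rate, _channels = wav_info(path)
    warnings = []
    if duration < min_seconds:
        warnings.append(f"Recording is {duration:.1f}s; the guide suggests at least {min_seconds:.0f}s.")
    elif duration > max_seconds:
        warnings.append(f"Recording is {duration:.1f}s; the guide suggests at most {max_seconds:.0f}s.")
    return warnings


# Example call (hypothetical path, assumed for illustration):
# print(check_reference("/content/my_voice.wav"))
```

An empty list means the recording length matches the guidance above; it says nothing about background noise, which you still need to judge by ear.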

### Problem: Want to generate multiple scripts
**Solution:**
- Keep same reference audio
- Just change "Generation Text"
- Submit again!
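
Note that `clone_voice` always writes to the same file, `cloned_voice.wav`, so each new generation replaces the previous one unless you download it first. If you prefer to generate several scripts in code, the loop below is one possible sketch. `generate_batch` and its `synthesize` callback are hypothetical names introduced here; in this notebook the callback would wrap the `tts.tts_to_file` call from Step 4.

```python
from pathlib import Path


def generate_batch(scripts, synthesize, output_dir="outputs", prefix="voiceover"):
    """Call synthesize(text, file_path) once per non-empty script; return the paths written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(scripts, start=1):
        text = text.strip()
        if not text:
            continue  # skip blank scripts but keep the numbering aligned with the input
        file_path = out / f"{prefix}_{index:02d}.wav"
        synthesize(text, str(file_path))
        paths.append(file_path)
    return paths


# In the notebook, the callback could look like this (assumes `tts` from Step 3
# and a `reference_audio` path you define yourself):
#
# def synthesize(text, file_path):
#     tts.tts_to_file(text=text, speaker_wav=reference_audio,
#                     language="hi", file_path=file_path)
#
# generate_batch(["Script one...", "Script two..."], synthesize, "/content/outputs")
```

Each script gets its own numbered file, so nothing is overwritten between runs within a batch.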

---

## 💡 Pro Tips for Content Creators:

1. **One-time setup:** Record ONE perfect 30-second sample → save it forever
2. **Daily workflow:** Write script → paste here → generate → download → add to video
3. **Batch processing:** Generate 5-10 voiceovers in one session
4. **Quality check:** Always listen before using in final video

---

### 🎯 Your Content Creation Workflow:

```
Write video script
      ↓
Open this Colab notebook
      ↓
Upload your voice sample (reuse same one)
      ↓
Paste script → Generate
      ↓
Download .wav file
      ↓
Import to video editor
      ↓
Upload to YouTube ✅
```

**Time saved:** 80-90% of voice recording time!

---