# 🎙️ Multilingual Transcription Generator for TTS

**Create emotional, human-level transcriptions for Text-to-Speech**

This notebook converts text into narrator-ready transcriptions with:
- 🎭 Pause markers `[PAUSE-SHORT]`, `[PAUSE-MEDIUM]`, `[PAUSE-LONG]`
- 🎨 Tone indicators `[TONE: serious/thoughtful/curious/calm]`
- 🌍 Multi-language support (Hindi & English)
- 🧹 Automatic repetition removal and validation

**Steps:**
1. 📤 Upload your text file
2. 🤖 Select AI provider (Ollama / HuggingFace) and model
3. ⚙️ Configure settings
4. 📝 Generate transcription
5. 💾 Download the result

---

## Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install -q transformers torch accelerate colorama ollama

print("✅ Dependencies installed successfully!")

## Step 2: Import Libraries and Setup

In [None]:
import os
import sys
import json
import time
import re
import warnings
from pathlib import Path
from datetime import datetime
from collections import OrderedDict

warnings.filterwarnings("ignore")

# Color support
try:
    from colorama import init, Fore, Style
    init(autoreset=True)
except ImportError:
    class Fore: RED = GREEN = YELLOW = CYAN = MAGENTA = RESET = ""
    class Style: BRIGHT = RESET_ALL = ""

# Ollama support
try:
    import ollama
    OLLAMA_AVAILABLE = True
    print(f"{Fore.GREEN}✅ Ollama package available{Style.RESET_ALL}")
except ImportError:
    OLLAMA_AVAILABLE = False
    print(f"{Fore.YELLOW}⚠️ Ollama package not available{Style.RESET_ALL}")

# HuggingFace support
try:
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    HF_AVAILABLE = True
    print(f"{Fore.GREEN}✅ HuggingFace Transformers available{Style.RESET_ALL}")
except ImportError:
    HF_AVAILABLE = False
    print(f"{Fore.YELLOW}⚠️ HuggingFace Transformers not available{Style.RESET_ALL}")

# Check GPU availability
if HF_AVAILABLE and torch.cuda.is_available():
    print(f"{Fore.GREEN}🚀 GPU Available: {torch.cuda.get_device_name(0)}{Style.RESET_ALL}")
else:
    print(f"{Fore.YELLOW}💻 Running on CPU{Style.RESET_ALL}")

print("\n✅ Setup complete!")

## Step 3: Define Narrator Prompts & Utilities

In [None]:
class ImprovedNarratorPrompts:
    """Strictly controlled prompts with better constraints."""
    
    SYSTEM_PROMPT_HINDI = """आप एक पेशेवर ऑडियोबुक कथावाचक हैं। आपका काम केवल दिए गए टेक्स्ट को प्राकृतिक आवाज़ में सुनाना है।

महत्वपूर्ण नियम:
1. मूल टेक्स्ट में जो लिखा है वही बोलें - कुछ भी नया न जोड़ें
2. कुछ भी न छोड़ें - हर शब्द महत्वपूर्ण है
3. व्याख्या न करें, सारांश न दें - बस वही पढ़ें जो लिखा है
4. केवल ये मार्कर जोड़ें: [PAUSE-SHORT], [PAUSE-MEDIUM], [PAUSE-LONG]
5. टोन मार्कर (अंग्रेजी में): [TONE: serious/thoughtful/curious/calm]
6. लंबे वाक्यों को प्राकृतिक ठहराव से तोड़ें
7. कोई अतिरिक्त विवरण, संदर्भ या स्पष्टीकरण न जोड़ें

आप केवल आवाज़ हैं। मूल शब्दों को बदलें नहीं।"""

    SYSTEM_PROMPT_ENGLISH = """You are a professional audiobook narrator. Your job is ONLY to read the text aloud naturally.

CRITICAL RULES:
1. Speak EXACTLY what's written - add NOTHING new
2. Skip NOTHING - every word matters
3. DO NOT interpret, summarize, or explain - just read what's written
4. ONLY add these markers: [PAUSE-SHORT], [PAUSE-MEDIUM], [PAUSE-LONG]
5. Tone markers (in English): [TONE: serious/thoughtful/curious/calm]
6. Break long sentences with natural pauses
7. NO additional details, context, or clarifications

You are a VOICE only. Do not change the original words."""

    NARRATION_TEMPLATE_HINDI = """नीचे दिया गया टेक्स्ट बिलकुल वैसे ही सुनाएं जैसे लिखा है। कुछ भी नया न जोड़ें।

मूल टेक्स्ट:
\"\"\"
{text}
\"\"\"

निर्देश:
- ऊपर के शब्दों को बिलकुल वैसे ही बोलें
- केवल [PAUSE-SHORT], [PAUSE-MEDIUM], [PAUSE-LONG] जोड़ें
- कोई व्याख्या, सारांश या अतिरिक्त विवरण न दें
- मूल वाक्यों को बदलें नहीं

कथन (मूल शब्दों में):"""

    NARRATION_TEMPLATE_ENGLISH = """Read the text below EXACTLY as written. Add NOTHING new.

ORIGINAL TEXT:
\"\"\"
{text}
\"\"\"

INSTRUCTIONS:
- Speak the exact words above
- ONLY add [PAUSE-SHORT], [PAUSE-MEDIUM], [PAUSE-LONG]
- NO interpretation, summary, or additional details
- DO NOT change the original sentences

NARRATION (using original words):"""

    @staticmethod
    def detect_language(text):
        """Detect if text is primarily Hindi or English."""
        hindi_chars = len(re.findall(r'[\u0900-\u097F]', text))
        english_chars = len(re.findall(r'[a-zA-Z]', text))
        total_chars = hindi_chars + english_chars
        if total_chars == 0:
            return "english"
        hindi_ratio = hindi_chars / total_chars
        return "hindi" if hindi_ratio > 0.3 else "english"


class RepetitionRemover:
    """Remove repetitive content from narration."""
    
    @staticmethod
    def remove_repetitions(text):
        """Remove repeated sentences and phrases."""
        sentences = re.split(r'(?<=[.!?।])\s+', text)
        seen = OrderedDict()
        
        for sent in sentences:
            sent = sent.strip()
            if not sent:
                continue
            key = ' '.join(sent.split()[:10]).lower()
            if key not in seen:
                seen[key] = sent
        
        return ' '.join(seen.values())
    
    @staticmethod
    def remove_meta_commentary(text, original):
        """Drop sentences that match known meta-commentary patterns.

        `original` is accepted so callers can pass the source text, but the
        filter is pattern-based and does not consult it.
        """
        meta_patterns = [
            r'यह.*?(दर्शाता|रेखांकित|स्थापित|विस्तारित).*?है',
            r'यह अध्याय.*?(उजागर|बनाता|स्पष्ट).*?है',
            r'This.*?(shows|demonstrates|establishes|highlights)',
            r'This chapter.*?(reveals|creates|clarifies)'
        ]
        
        sentences = re.split(r'(?<=[.!?।])\s+', text)
        # Keep only sentences that match none of the commentary patterns
        filtered = [
            sent for sent in sentences
            if not any(re.search(pattern, sent, re.IGNORECASE) for pattern in meta_patterns)
        ]
        
        return ' '.join(filtered)

print("✅ Narrator prompts and utilities defined!")
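The language check in `detect_language` is a simple script-ratio heuristic: it counts Devanagari code points against ASCII letters and labels the text Hindi once the Devanagari share exceeds 0.3. The standalone sketch below reproduces that rule so the threshold can be tried on sample strings; the helper names `script_ratio` and `detect` are illustrative and not part of the notebook.

```python
import re

def script_ratio(text: str) -> float:
    """Share of Devanagari code points among Devanagari + ASCII letters."""
    hindi = len(re.findall(r'[\u0900-\u097F]', text))
    latin = len(re.findall(r'[a-zA-Z]', text))
    total = hindi + latin
    return hindi / total if total else 0.0

def detect(text: str, threshold: float = 0.3) -> str:
    """Same decision rule as detect_language: Hindi above the threshold."""
    return "hindi" if script_ratio(text) > threshold else "english"

samples = [
    "Hello world",
    "नमस्ते दुनिया",
    "नमस्ते, this is a long English sentence with one Hindi word",
    "12345",  # no letters at all falls back to English
]
for s in samples:
    print(f"{detect(s):8} {script_ratio(s):.2f}  {s}")
```

Because matras and the virama are counted as separate code points, Devanagari text scores somewhat higher than a letter-for-letter comparison would suggest, which is one reason a threshold well below 0.5 can still work for mixed text.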

## Step 4: Define LLM Narrator Class

In [None]:
class ImprovedLLMNarrator:
    """Improved LLM narrator with better validation."""
    
    def __init__(self, provider="ollama", model_name=None, device="auto", language="auto"):
        self.provider = provider
        self.model_name = model_name or self._get_default_model()
        self.device = self._detect_device() if device == "auto" else device
        self.language = language
        self.model = None
        self.tokenizer = None
        self.prompts = ImprovedNarratorPrompts()
        self.repetition_remover = RepetitionRemover()
    
    def _detect_device(self):
        """Auto-detect available device."""
        try:
            if HF_AVAILABLE and torch.cuda.is_available():
                # ROCm builds of PyTorch also report through torch.cuda
                if getattr(torch.version, 'hip', None):
                    print("🔍 ROCm (AMD GPU) detected")
                else:
                    print("🔍 CUDA (NVIDIA GPU) detected")
                return "cuda"
        except Exception:
            # Any probing failure falls through to the CPU path
            pass
        print("🔍 No GPU detected, using CPU")
        return "cpu"
    
    def _get_default_model(self):
        """Get best default model based on provider."""
        if self.provider == "ollama":
            return "gemma2:9b"
        else:
            return "ai4bharat/Airavata"
    
    def load(self):
        """Load the LLM model."""
        print(f"🎭 Initializing {self.provider} narrator...")
        print(f"   Model: {self.model_name}")
        print(f"   Device: {self.device}")
        print(f"   Language: {self.language}")
        
        if self.provider == "ollama":
            if not OLLAMA_AVAILABLE:
                raise ImportError("Ollama not installed. Install: pip install ollama")
            try:
                ollama.list()
                print(f"{Fore.GREEN}✅ Ollama connection successful{Style.RESET_ALL}")
            except Exception as e:
                raise RuntimeError(f"Cannot connect to Ollama: {e}")
        
        elif self.provider == "huggingface":
            if not HF_AVAILABLE:
                raise ImportError("Transformers not installed.")
            
            print(f"{Fore.CYAN}📥 Loading HuggingFace model: {self.model_name}...{Style.RESET_ALL}")
            print("(This may take a few minutes for large models)")
            
            if self.device == "cuda":
                torch_dtype = torch.float16
                device_map = "auto"
            else:
                torch_dtype = torch.float32
                device_map = None
            
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                torch_dtype=torch_dtype,
                device_map=device_map
            )
            
            if self.device == "cpu":
                self.model = self.model.to("cpu")
            
            device_type = "GPU" if self.device == "cuda" else "CPU"
            print(f"{Fore.GREEN}✅ HuggingFace model loaded on {device_type}{Style.RESET_ALL}")
        
        return True
    
    def generate(self, prompt, system_prompt, max_tokens=2048, temperature=0.2):
        """Generate with lower temperature for faithful reproduction."""
        if self.provider == "ollama":
            return self._generate_ollama(prompt, system_prompt, max_tokens, temperature)
        else:
            return self._generate_huggingface(prompt, system_prompt, max_tokens, temperature)
    
    def _generate_ollama(self, prompt, system_prompt, max_tokens, temperature):
        """Generate using Ollama."""
        try:
            response = ollama.chat(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                options={
                    "temperature": temperature,
                    "num_predict": max_tokens,
                    "top_p": 0.85,
                    "repeat_penalty": 1.3,
                    "top_k": 40,
                }
            )
            return response["message"]["content"].strip()
        except Exception as e:
            print(f"⚠️ Ollama generation error: {e}")
            return None
    
    def _generate_huggingface(self, prompt, system_prompt, max_tokens, temperature):
        """Generate using HuggingFace."""
        try:
            if "Airavata" in self.model_name or "sarvam" in self.model_name:
                formatted_prompt = f"### System:\n{system_prompt}\n\n### User:\n{prompt}\n\n### Assistant:\n"
            else:
                formatted_prompt = f"<s>[INST] {system_prompt}\n\n{prompt} [/INST]"
            
            # Send inputs to the device the model actually lives on (matters with device_map="auto")
            inputs = self.tokenizer(formatted_prompt, return_tensors="pt").to(self.model.device)
            
            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=max_tokens,
                    temperature=temperature,
                    do_sample=True,
                    top_p=0.85,
                    top_k=40,
                    repetition_penalty=1.3
                )
            
            # Decode only the newly generated tokens so the prompt is not echoed back
            prompt_length = inputs["input_ids"].shape[1]
            response = self.tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
            
            # Safety net in case the model repeats the prompt delimiters
            if "[/INST]" in response:
                response = response.split("[/INST]")[-1].strip()
            elif "### Assistant:" in response:
                response = response.split("### Assistant:")[-1].strip()
            
            return response
        except Exception as e:
            print(f"⚠️ HuggingFace generation error: {e}")
            return None
    
    def validate_and_clean(self, original, narration):
        """Validate and clean the narration."""
        if not narration:
            return None, "Empty narration"
        
        cleaned = self.repetition_remover.remove_meta_commentary(narration, original)
        cleaned = self.repetition_remover.remove_repetitions(cleaned)
        
        original_words = set(original.lower().split())
        clean_narration = re.sub(r'\[(?:TONE|PAUSE|PRONOUNCE|EMPHASIS):[^\]]*\]', '', cleaned)
        clean_narration = re.sub(r'\[PAUSE-(?:SHORT|MEDIUM|LONG)\]', '', clean_narration)
        narration_words = set(clean_narration.lower().split())
        
        new_words = narration_words - original_words
        lang = self.prompts.detect_language(original)
        threshold = 0.6 if lang == "hindi" else 0.4
        
        if len(new_words) > len(original_words) * threshold:
            return None, f"Too many new words added ({len(new_words)} new vs {len(original_words)} original)"
        
        return cleaned, "Valid"
    
    def narrate_text(self, text, max_retries=3):
        """Convert text to narration with strict validation."""
        detected_lang = self.prompts.detect_language(text) if self.language == "auto" else self.language
        
        system_prompt = (self.prompts.SYSTEM_PROMPT_HINDI if detected_lang == "hindi" 
                        else self.prompts.SYSTEM_PROMPT_ENGLISH)
        template = (self.prompts.NARRATION_TEMPLATE_HINDI if detected_lang == "hindi" 
                   else self.prompts.NARRATION_TEMPLATE_ENGLISH)
        
        prompt = template.format(text=text)
        
        for attempt in range(max_retries):
            # Lower the temperature on each retry, but keep it positive:
            # sampling with temperature <= 0 is rejected by HuggingFace generate()
            temp = max(0.05, 0.2 - (attempt * 0.05))
            narration = self.generate(prompt, system_prompt, max_tokens=3072, temperature=temp)
            
            if not narration:
                continue
            
            cleaned, reason = self.validate_and_clean(text, narration)
            
            if cleaned:
                return cleaned, True, detected_lang
            else:
                print(f"      ⚠️ Attempt {attempt + 1} failed: {reason}")
                if attempt < max_retries - 1:
                    next_temp = max(0.05, 0.2 - ((attempt + 1) * 0.05))
                    print(f"      🔄 Retrying with temperature {next_temp:.2f}...")
        
        print("      ⚠️ All attempts failed, using minimal narration")
        return self._minimal_narration(text), False, detected_lang
    
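    def _minimal_narration(self, text):
        """Minimal fallback - just add pauses."""
        # The capture group keeps each punctuation run (plus trailing space) as its own item
        sentences = re.split(r'([.!?।]+\s+)', text)
        result = []
        for i, sent in enumerate(sentences):
            if not sent.strip():
                continue
            result.append(sent)
            # Match any run of terminators ("?!", "..."), not only a single character
            if re.fullmatch(r'[.!?।]+', sent.strip()):
                if i < len(sentences) - 1:
                    # The separator already ends in whitespace, so no leading space is needed
                    result.append("[PAUSE-SHORT] ")
        return ''.join(result)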

print("✅ LLM Narrator class defined!")
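The acceptance test in `validate_and_clean` compares vocabularies: it strips the markers, collects the set of whitespace-separated words in the narration, and rejects the output when the number of words absent from the original exceeds a fraction of the original vocabulary (0.4 for English, 0.6 for Hindi). The sketch below isolates that ratio so it can be inspected on its own. `new_word_ratio` is an illustrative name, and replacing markers with a space rather than an empty string is a small deviation from the notebook, chosen so that adjacent words never fuse.

```python
import re

# Pause and tone markers, as used in the prompts above
MARKER_RE = re.compile(r'\[(?:PAUSE-(?:SHORT|MEDIUM|LONG)|TONE:[^\]]*)\]')

def new_word_ratio(original: str, narration: str) -> float:
    """New narration words (not in the original) relative to the original vocabulary."""
    original_words = set(original.lower().split())
    narration_words = set(MARKER_RE.sub(' ', narration).lower().split())
    if not original_words:
        return 0.0
    return len(narration_words - original_words) / len(original_words)

original = "The rain stopped. Nobody moved."
faithful = "[TONE: calm] The rain stopped. [PAUSE-MEDIUM] Nobody moved."
drifting = "The rain stopped. This shows how quiet the town had become."

print(new_word_ratio(original, faithful))   # markers only, no new words
print(new_word_ratio(original, drifting))   # commentary pushes the ratio above 0.4
```

Words are compared with punctuation attached, so a narration that moves a comma counts that token as new; this is one reason the thresholds are fairly permissive.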

## Step 5: Define Text Preprocessor

In [None]:
class TextPreprocessor:
    """Preprocess text with better chapter detection."""
    
    def __init__(self):
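        # Title is matched lazily so a trailing "===" is not swallowed into it, and only
        # spaces/tabs may precede it so the next line is never mistaken for the title
        self.chapter_pattern = re.compile(
            r'^(={3,}\s*)?(Chapter|CHAPTER|अध्याय)\s+(\d+|[IVXivx]+|[०-९]+):?[ \t]*(.*?)(\s*={3,})?$',
            re.MULTILINE
        )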
    
    def split_into_chapters(self, text):
        """Split text into chapters."""
        chapters = []
        matches = list(self.chapter_pattern.finditer(text))
        
        if not matches:
            return [{
                "number": 1,
                "title": "Complete Text",
                "content": text
            }]
        
        for i, match in enumerate(matches):
            chapter_num = match.group(3)
            chapter_title = match.group(4).strip() or f"Chapter {chapter_num}"
            start_pos = match.end()
            end_pos = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            content = text[start_pos:end_pos].strip()
            
            chapters.append({
                "number": i + 1,
                "title": chapter_title,
                "content": content
            })
        
        return chapters
    
    def split_into_sentences(self, text):
        """Split into sentences (Hindi + English)."""
        sentences = re.split(r'(?<=[.!?।])\s+(?=[A-ZА-Я"\u0900-\u097F])', text)
        return [s.strip() for s in sentences if s.strip()]
    
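    def create_chunks(self, sentences, chunk_size=8, overlap=1):
        """Create smaller overlapping chunks."""
        chunks = []
        step = max(1, chunk_size - overlap)
        i = 0
        while i < len(sentences):
            chunk_sentences = sentences[i:i + chunk_size]
            chunk_text = ' '.join(chunk_sentences)
            chunks.append({
                'text': chunk_text,
                'start_idx': i,
                'end_idx': i + len(chunk_sentences)
            })
            # Stop once a chunk reaches the last sentence; otherwise the loop
            # would emit a trailing chunk made only of already-covered overlap
            if i + chunk_size >= len(sentences):
                break
            i += step
        return chunks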

print("✅ Text preprocessor defined!")
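To see how `chunk_size` and `overlap` interact, it helps to look at the sentence index ranges alone. The sketch below computes the `(start_idx, end_idx)` windows for a given sentence count, advancing by `chunk_size - overlap` each time. `chunk_windows` is an illustrative helper, and this version assumes the walk should stop as soon as a window reaches the last sentence.

```python
def chunk_windows(n_sentences: int, chunk_size: int = 8, overlap: int = 1):
    """Return (start, end) index pairs for overlapping sentence windows."""
    step = max(1, chunk_size - overlap)  # never stall, even if overlap >= chunk_size
    windows = []
    start = 0
    while start < n_sentences:
        end = min(start + chunk_size, n_sentences)
        windows.append((start, end))
        if end == n_sentences:
            break  # the last sentence is covered; nothing new to add
        start += step
    return windows

print(chunk_windows(20))        # default settings: 8 sentences, 1 shared
print(chunk_windows(5, 3, 1))
print(chunk_windows(8))         # exactly one full chunk
```

With an overlap of 1, the last sentence of one window is also the first sentence of the next, so the same sentence is sent to the model twice. That gives the model context across the boundary, but it also means the shared sentence can appear twice when chunk narrations are concatenated.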

## Step 6: Upload Your Text File 📤

In [None]:
from google.colab import files

print("📤 Please upload your text file to transcribe:")
print("(This will be converted to narrator-ready format for TTS)")
print()

uploaded = files.upload()

if uploaded:
    uploaded_filename = list(uploaded.keys())[0]
    input_text = uploaded[uploaded_filename].decode('utf-8')
    
    # Detect language
    detected_lang = ImprovedNarratorPrompts.detect_language(input_text)
    word_count = len(input_text.split())
    
    print()
    print(f"{Fore.GREEN}✅ File uploaded successfully!{Style.RESET_ALL}")
    print(f"📄 Filename: {uploaded_filename}")
    print(f"🌍 Detected language: {detected_lang.upper()}")
    print(f"📊 Word count: {word_count:,} words")
    print("📝 Preview (first 500 chars):")
    print("-" * 50)
    print(input_text[:500] + "..." if len(input_text) > 500 else input_text)
else:
    print(f"{Fore.RED}❌ No file uploaded. Please run this cell again.{Style.RESET_ALL}")

## Step 7: Configure Options 🎛️

In [None]:
import ipywidgets as widgets
from IPython.display import display, HTML

# Create configuration widgets
provider_dropdown = widgets.Dropdown(
    options=['huggingface', 'ollama'],
    value='huggingface',
    description='AI Provider:',
    style={'description_width': '120px'},
    layout=widgets.Layout(width='400px')
)

model_text = widgets.Text(
    value='Qwen/Qwen2.5-1.5B-Instruct',
    placeholder='Enter model name',
    description='Model:',
    style={'description_width': '120px'},
    layout=widgets.Layout(width='500px')
)

language_dropdown = widgets.Dropdown(
    options=[
        ('🔄 Auto-detect', 'auto'),
        ('🇮🇳 Hindi', 'hindi'),
        ('🇬🇧 English', 'english')
    ],
    value='auto',
    description='Language:',
    style={'description_width': '120px'},
    layout=widgets.Layout(width='400px')
)

chunk_size_slider = widgets.IntSlider(
    value=8,
    min=3,
    max=15,
    step=1,
    description='Chunk Size:',
    style={'description_width': '120px'},
    layout=widgets.Layout(width='400px'),
    tooltip='Sentences per chunk (smaller = better quality, slower)'
)

# Model suggestions
model_suggestions = widgets.HTML(
    value="""
    <div style='background: #f0f7ff; padding: 12px; border-radius: 8px; margin-top: 10px;'>
    <b>üìã Recommended Models for Transcription:</b><br><br>
    <b>HuggingFace (for Hindi):</b><br>
    ‚Ä¢ <code>ai4bharat/Airavata</code> (Best for Indian languages)<br>
    ‚Ä¢ <code>sarvamai/sarvam-2b-v0.5</code> (Indian LLM)<br>
    ‚Ä¢ <code>CohereForAI/aya-23-8B</code> (Multilingual)<br>
    ‚Ä¢ <code>Qwen/Qwen2.5-1.5B-Instruct</code> (Fast, general)<br>
    <br>
    <b>Ollama:</b><br>
    ‚Ä¢ <code>gemma2:9b</code> (Best for Hindi)<br>
    ‚Ä¢ <code>aya:8b</code> (Multilingual specialist)<br>
    ‚Ä¢ <code>qwen2.5:14b</code> (High quality)<br>
    ‚Ä¢ <code>llama3.1:8b</code> (Good instruction following)
    </div>
    """
)

# Display
print("🎛️ Configure Your Transcription Settings:")
print("=" * 50)
display(provider_dropdown)
display(model_text)
display(language_dropdown)
display(chunk_size_slider)
display(model_suggestions)

print("\n✅ Configure settings above, then run the next cell to generate transcription.")

## Step 8: Generate Transcription 🚀

This cell processes your text and generates the narrator-ready transcription.

In [None]:
# Get configuration
selected_provider = provider_dropdown.value
selected_model = model_text.value
selected_language = language_dropdown.value
chunk_size = chunk_size_slider.value

# Validate input
if 'input_text' not in dir() or not input_text:
    print(f"{Fore.RED}❌ No text file uploaded! Please run Step 6 first.{Style.RESET_ALL}")
else:
    print("=" * 70)
    print("🎙️ MULTILINGUAL TRANSCRIPTION GENERATOR")
    print("=" * 70)
    print(f"Provider: {selected_provider}")
    print(f"Model: {selected_model}")
    print(f"Language: {selected_language}")
    print(f"Chunk size: {chunk_size} sentences")
    print()

    # Initialize narrator
    narrator = ImprovedLLMNarrator(
        provider=selected_provider, 
        model_name=selected_model, 
        device="auto",
        language=selected_language
    )
    narrator.load()
    print()

    # Initialize preprocessor
    preprocessor = TextPreprocessor()
    
    # Detect language
    primary_lang = ImprovedNarratorPrompts.detect_language(input_text)
    print(f"🌍 Detected language: {primary_lang.upper()}")
    
    # Split into chapters
    chapters = preprocessor.split_into_chapters(input_text)
    print(f"📚 Found {len(chapters)} chapter(s)")
    print()

    # Track results
    transcription_data = {
        "metadata": {
            "source_file": uploaded_filename if 'uploaded_filename' in dir() else "uploaded_file",
            "generated_at": datetime.now().isoformat(),
            "primary_language": primary_lang,
            "total_chapters": len(chapters),
            "narrator_model": selected_model,
            "chunk_size": chunk_size
        },
        "chapters": []
    }

    total_start = time.time()
    successful = 0
    total_chunks = 0

    # Process each chapter
    for ch_idx, chapter in enumerate(chapters, 1):
        print(f"{'=' * 70}")
        print(f"📖 Chapter {ch_idx}/{len(chapters)}: {chapter['title']}")
        print(f"{'=' * 70}")
        
        sentences = preprocessor.split_into_sentences(chapter['content'])
        chunks = preprocessor.create_chunks(sentences, chunk_size=chunk_size, overlap=1)
        
        print(f"📦 Processing {len(chunks)} chunks...")
        total_chunks += len(chunks)
        
        narrated_chunks = []
        
        for c_idx, chunk in enumerate(chunks, 1):
            print(f"   🎙️ Chunk {c_idx}/{len(chunks)}... ", end="", flush=True)
            
            start_time = time.time()
            narration, is_valid, lang = narrator.narrate_text(chunk['text'])
            elapsed = time.time() - start_time
            
            if is_valid:
                successful += 1
                print(f"{Fore.GREEN}✅ [{lang}] ({elapsed:.1f}s){Style.RESET_ALL}")
            else:
                print(f"{Fore.YELLOW}⚠️ Fallback [{lang}] ({elapsed:.1f}s){Style.RESET_ALL}")
            
            narrated_chunks.append({
                "chunk_number": c_idx,
                "original_text": chunk['text'],
                "narration": narration,
                "language": lang,
                "is_valid": is_valid
            })
        
        transcription_data["chapters"].append({
            "chapter_number": ch_idx,
            "title": chapter['title'],
            "chunks": narrated_chunks
        })

    total_time = time.time() - total_start

    # Build final transcription text
    final_transcription = ""
    for chapter in transcription_data["chapters"]:
        final_transcription += f"\n{'='*70}\n"
        final_transcription += f"CHAPTER {chapter['chapter_number']}: {chapter['title']}\n"
        final_transcription += f"{'='*70}\n\n"
        for chunk in chapter['chunks']:
            final_transcription += f"{chunk['narration']}\n\n"

    # Print summary
    print()
    print(f"{'='*70}")
    print(f"{Fore.GREEN}{Style.BRIGHT}🎉 TRANSCRIPTION COMPLETE!{Style.RESET_ALL}")
    print(f"{'='*70}")
    print(f"⏱️ Total time: {total_time/60:.2f} minutes")
    print(f"🌍 Primary language: {primary_lang.upper()}")
    print(f"📚 Chapters: {len(chapters)}")
    print(f"📦 Total chunks: {total_chunks}")
    # Guard against division by zero when no chunks were produced
    success_rate = 100 * successful / total_chunks if total_chunks else 0.0
    print(f"✅ Successful: {successful}/{total_chunks} ({success_rate:.1f}%)")
    print()
    print(f"{Fore.CYAN}=== PREVIEW (first 1000 chars) ==={Style.RESET_ALL}")
    print("-" * 50)
    print(final_transcription[:1000] + "..." if len(final_transcription) > 1000 else final_transcription)

## Step 9: Download Transcription 💾

In [None]:
from google.colab import files
from datetime import datetime

if 'final_transcription' not in dir() or not final_transcription:
    print(f"{Fore.RED}❌ No transcription generated yet! Please run Step 8 first.{Style.RESET_ALL}")
else:
    # Generate output filename
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base_name = uploaded_filename.rsplit('.', 1)[0] if 'uploaded_filename' in dir() else 'document'
    output_filename = f"{base_name}_transcription_{timestamp}.txt"
    
    # Create transcription content with metadata
    transcription_content = f"""================================================================================
NARRATOR-READY TRANSCRIPTION FOR TTS
================================================================================

Source File: {uploaded_filename if 'uploaded_filename' in dir() else 'Unknown'}
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
Provider: {selected_provider}
Model: {selected_model}
Language: {primary_lang.upper()}
Chapters: {len(chapters)}
Chunks Processed: {total_chunks}
Success Rate: {(100 * successful / total_chunks if total_chunks else 0.0):.1f}%

MARKERS USED:
- [PAUSE-SHORT]: Brief pause (comma-level)
- [PAUSE-MEDIUM]: Moderate pause (sentence end)
- [PAUSE-LONG]: Extended pause (paragraph/section break)
- [TONE: X]: Emotional tone indicator

================================================================================
TRANSCRIPTION
================================================================================
{final_transcription}
================================================================================
END OF TRANSCRIPTION
================================================================================
"""
    
    # Save to file
    with open(output_filename, 'w', encoding='utf-8') as f:
        f.write(transcription_content)
    
    print(f"{Fore.GREEN}✅ Transcription saved!{Style.RESET_ALL}")
    print(f"📄 Filename: {output_filename}")
    print()
    print("📥 Starting download...")
    
    # Download the file
    files.download(output_filename)
    
    print(f"\n{Fore.GREEN}✅ Download initiated! Check your browser's downloads.{Style.RESET_ALL}")
    print("\n💡 This transcription is ready to be used with TTS systems!")

## Step 10 (Optional): Download JSON Version 📋

If you need the structured JSON format with chunk-level details:

In [None]:
if 'transcription_data' not in dir():
    print(f"{Fore.RED}❌ No transcription generated yet! Please run Step 8 first.{Style.RESET_ALL}")
else:
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base_name = uploaded_filename.rsplit('.', 1)[0] if 'uploaded_filename' in dir() else 'document'
    json_filename = f"{base_name}_transcription_{timestamp}.json"
    
    with open(json_filename, 'w', encoding='utf-8') as f:
        json.dump(transcription_data, f, ensure_ascii=False, indent=2)
    
    print(f"{Fore.GREEN}✅ JSON saved: {json_filename}{Style.RESET_ALL}")
    files.download(json_filename)
    print(f"{Fore.GREEN}✅ JSON download initiated!{Style.RESET_ALL}")
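The JSON file keeps one record per chunk, including the `is_valid` flag that marks whether the model output passed validation or the pause-only fallback was used. A minimal sketch of reading that structure back and listing the fallback chunks per chapter follows. The field names match `transcription_data` as built in Step 8; the sample data and the `fallback_report` helper are made up for illustration.

```python
import json

def fallback_report(data: dict) -> list:
    """Summarize, per chapter, which chunks fell back to minimal narration."""
    report = []
    for chapter in data["chapters"]:
        chunks = chapter["chunks"]
        report.append({
            "chapter": chapter["chapter_number"],
            "title": chapter["title"],
            "chunks": len(chunks),
            "fallback_chunks": [c["chunk_number"] for c in chunks if not c["is_valid"]],
        })
    return report

# Hypothetical data in the same shape as transcription_data
sample = {
    "metadata": {"primary_language": "english", "total_chapters": 1},
    "chapters": [{
        "chapter_number": 1,
        "title": "Complete Text",
        "chunks": [
            {"chunk_number": 1, "original_text": "A.", "narration": "A. [PAUSE-SHORT]",
             "language": "english", "is_valid": True},
            {"chunk_number": 2, "original_text": "B.", "narration": "B.",
             "language": "english", "is_valid": False},
        ],
    }],
}

# Round-trip through JSON the same way the notebook writes the file
restored = json.loads(json.dumps(sample, ensure_ascii=False))
report = fallback_report(restored)
print(report)
```

Chunks listed under `fallback_chunks` are good candidates for a second pass with a different model or a smaller chunk size.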

---

## 📖 Quick Reference

### Pause Markers:
| Marker | Duration | Usage |
|--------|----------|-------|
| `[PAUSE-SHORT]` | ~0.3s | After commas, short phrases |
| `[PAUSE-MEDIUM]` | ~0.7s | After sentences |
| `[PAUSE-LONG]` | ~1.2s | Between paragraphs, scene changes |
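Most TTS engines will not understand the bracketed markers directly, so they usually need to be translated before synthesis. As one possible approach, the sketch below maps each pause marker to an SSML `<break>` element using the approximate durations from the table above and drops tone markers. Whether your TTS system accepts SSML is an assumption, and `to_ssml` is an illustrative helper rather than part of the notebook.

```python
import re
from xml.sax.saxutils import escape

# Approximate durations from the pause marker table, in milliseconds
PAUSE_MS = {"SHORT": 300, "MEDIUM": 700, "LONG": 1200}
PAUSE_RE = re.compile(r'\[PAUSE-(SHORT|MEDIUM|LONG)\]')
TONE_RE = re.compile(r'\[TONE:[^\]]*\]')

def to_ssml(narration: str) -> str:
    """Convert pause markers to SSML breaks; tone markers are dropped."""
    text = TONE_RE.sub(' ', narration)
    # Splitting on a pattern with one capture group alternates text and marker names
    pieces = PAUSE_RE.split(text)
    out = []
    for idx, piece in enumerate(pieces):
        if idx % 2 == 1:
            out.append(f'<break time="{PAUSE_MS[piece]}ms"/>')
        else:
            piece = ' '.join(piece.split())  # collapse leftover whitespace
            if piece:
                out.append(escape(piece))    # keep &, <, > valid inside XML
    return '<speak>' + ' '.join(out) + '</speak>'

demo = "[TONE: calm] The rain stopped. [PAUSE-MEDIUM] Tom & Jerry waited, [PAUSE-SHORT] silent."
print(to_ssml(demo))
```

The durations are only the rough values from the table; tune them to the voice and speaking rate you use.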

### Tone Markers:
| Marker | Description |
|--------|-------------|
| `[TONE: serious]` | Grave, important content |
| `[TONE: thoughtful]` | Reflective, contemplative |
| `[TONE: curious]` | Questioning, wondering |
| `[TONE: calm]` | Peaceful, neutral |
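A tone marker applies to the text that follows it until the next tone marker appears. A downstream TTS step can therefore split the narration into `(tone, text)` segments and pick a voice style per segment. The sketch below shows one way to do that split; `split_by_tone` is an illustrative helper, and treating `calm` as the default for unmarked or unrecognized tones is an assumption.

```python
import re

TONES = {"serious", "thoughtful", "curious", "calm"}
TONE_RE = re.compile(r'\[TONE:\s*([^\]]+?)\s*\]')

def split_by_tone(narration: str, default: str = "calm"):
    """Split narration into (tone, text) segments at each [TONE: ...] marker."""
    segments = []
    tone = default
    pos = 0
    for match in TONE_RE.finditer(narration):
        chunk = narration[pos:match.start()].strip()
        if chunk:
            segments.append((tone, chunk))
        name = match.group(1).lower()
        tone = name if name in TONES else default  # unknown tones fall back
        pos = match.end()
    tail = narration[pos:].strip()
    if tail:
        segments.append((tone, tail))
    return segments

demo = "[TONE: serious] The verdict came. [TONE: curious] But why now?"
for tone, text in split_by_tone(demo):
    print(f"{tone:10} {text}")
```

Pause markers are left inside each segment's text, so they can still be handled by a separate step.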

### Tips:
- 🔹 Smaller chunk sizes = better quality but slower processing
- 🔹 Use `ai4bharat/Airavata` for best Hindi results
- 🔹 GPU acceleration significantly speeds up HuggingFace models
- 🔹 The transcription is optimized for emotional, human-like TTS