# 🤖 GenAI-Powered Speech-to-Speech Translator

## Powered by Groq (Llama) + Google Gemini

This notebook implements an **intelligent speech-to-speech translator** that leverages Generative AI:

| Component | Technology | Benefit |
|-----------|------------|----------|
| 🎤 Speech Recognition | **OpenAI Whisper** | Multi-language, noise-robust |
| 🧠 Translation | **Groq (Llama) / Google Gemini** | Fast, context-aware, FREE! |
| 🔊 Speech Synthesis | **Edge TTS** | Natural, expressive voices (FREE) |
| ✨ Intelligence | **LLM-powered** | Style adaptation, explanations, context memory |

### 🆓 Cost: Free on the providers' free tiers
- **Groq**: Free tier, subject to rate limits
- **Google Gemini**: Free tier, subject to rate limits
- **Whisper**: Runs locally, so there are no API charges
- **Edge TTS**: Free Microsoft voices

---

# Task 1: Environment Setup and Configuration

Install dependencies and configure API keys.

In [None]:
# Install core dependencies
!pip install -q groq google-generativeai
!pip install -q openai-whisper edge-tts pydub soundfile
!pip install -q SpeechRecognition ipywidgets nest-asyncio python-dotenv rich

In [1]:
# Import all required libraries
import os
import io
import time
import asyncio
import tempfile
import warnings
from pathlib import Path
from typing import Optional, Dict, List, Tuple
from dataclasses import dataclass
from enum import Enum

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Audio processing
import numpy as np
import soundfile as sf
from pydub import AudioSegment
import speech_recognition as sr

# Whisper for speech recognition
import whisper

# LLM backends - Groq and Google Gemini
from groq import Groq
import google.generativeai as genai

# Edge TTS for speech synthesis
import edge_tts
import nest_asyncio
nest_asyncio.apply()  # Enable nested async in Jupyter

# Jupyter widgets
import ipywidgets as widgets
from IPython.display import display, Audio, HTML, clear_output

# Environment variables
from dotenv import load_dotenv
load_dotenv()

print("✅ All libraries imported successfully!")

✅ All libraries imported successfully!


In [2]:
# Configuration - Groq and Google Gemini settings

@dataclass
class Config:
    """Configuration for the GenAI Speech Translator"""
    
    # LLM Backend Selection: 'groq' or 'google'
    LLM_BACKEND: str = 'groq'  # Change to 'google' to use Gemini
    
    # API Keys (set via environment variables or directly here)
    GROQ_API_KEY: str = os.getenv('GROQ_API_KEY', '')
    GOOGLE_API_KEY: str = os.getenv('GOOGLE_API_KEY', '')
    
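    # Groq Model Selection (Llama models)
    # Options when this notebook was written (hosted model lists change over time,
    # so check the Groq console for what is currently served):
    #   'llama-3.3-70b-versatile', 'llama-3.1-70b-versatile',
    #   'llama-3.1-8b-instant', 'mixtral-8x7b-32768', 'gemma2-9b-it'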
    GROQ_MODEL: str = 'llama-3.3-70b-versatile'  # Best quality
    # GROQ_MODEL: str = 'llama-3.1-8b-instant'   # Faster, still good
    
    # Google Gemini Model Selection
    # Options: 'gemini-2.0-flash-exp', 'gemini-1.5-flash', 'gemini-1.5-pro'
    GOOGLE_MODEL: str = 'gemini-2.0-flash-exp'  # Latest and fastest
    # GOOGLE_MODEL: str = 'gemini-1.5-pro'      # Most capable
    
    # Whisper Model Size: 'tiny', 'base', 'small', 'medium', 'large'
    WHISPER_MODEL: str = 'base'  # Good balance of speed/accuracy
    
    # TTS Backend: 'edge_tts' (recommended, free)
    TTS_BACKEND: str = 'edge_tts'

# Initialize configuration
config = Config()

print("⚙️ Configuration:")
print(f"   LLM Backend: {config.LLM_BACKEND}")
print(f"   Groq Model: {config.GROQ_MODEL}")
print(f"   Google Model: {config.GOOGLE_MODEL}")
print(f"   Whisper Model: {config.WHISPER_MODEL}")
print(f"   TTS Backend: {config.TTS_BACKEND}")

⚙️ Configuration:
   LLM Backend: groq
   Groq Model: llama-3.3-70b-versatile
   Google Model: gemini-2.0-flash-exp
   Whisper Model: base
   TTS Backend: edge_tts
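
The configuration reads both API keys from environment variables. As a minimal sketch (the helper `available_backends` and the `BACKEND_ENV_VARS` mapping are hypothetical, not part of this notebook), the following check reports which backends have a key set, which can help when choosing `LLM_BACKEND`:

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical mapping: backend name -> environment variable that Config reads.
BACKEND_ENV_VARS: Dict[str, str] = {
    "groq": "GROQ_API_KEY",
    "google": "GOOGLE_API_KEY",
}

def available_backends(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the backends whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return [
        backend
        for backend, var in BACKEND_ENV_VARS.items()
        if env.get(var, "").strip()
    ]

# Passing a plain dict keeps the example independent of the real environment.
print(available_backends({"GROQ_API_KEY": "example-key"}))  # ['groq']
```

Calling `available_backends()` with no argument inspects the real environment, including anything loaded by `load_dotenv()`.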


In [3]:
# Interactive API Key Input
# Get your FREE API keys from:
# - Groq: https://console.groq.com/keys
# - Google: https://aistudio.google.com/app/apikey

print("🔑 API Key Setup")
print("="*50)
print("Get your FREE API keys from:")
print("  • Groq: https://console.groq.com/keys")
print("  • Google: https://aistudio.google.com/app/apikey")
print("="*50)

# Backend selector
backend_selector = widgets.Dropdown(
    options=[
        ('🚀 Groq (Llama 3.3 - Ultra Fast)', 'groq'),
        ('🌟 Google Gemini (Gemini 2.0)', 'google')
    ],
    value='groq',
    description='Backend:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='350px')
)

# API Key input
api_key_input = widgets.Password(
    placeholder='Paste your API key here',
    description='API Key:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='400px')
)

# Status output
key_status = widgets.Output()

def set_api_key(btn):
    """Set the API key based on selected backend"""
    with key_status:
        key_status.clear_output()
        
        if not api_key_input.value:
            print("⚠️ Please enter an API key!")
            return
            
        if backend_selector.value == 'groq':
            config.GROQ_API_KEY = api_key_input.value
            config.LLM_BACKEND = 'groq'
            print("✅ Groq API key set!")
            print(f"   Model: {config.GROQ_MODEL}")
            
        elif backend_selector.value == 'google':
            config.GOOGLE_API_KEY = api_key_input.value
            config.LLM_BACKEND = 'google'
            # Configure Google GenAI
            genai.configure(api_key=config.GOOGLE_API_KEY)
            print("✅ Google API key set!")
            print(f"   Model: {config.GOOGLE_MODEL}")
        
        print("\n🎉 Ready to translate!")

set_key_btn = widgets.Button(
    description='✓ Set API Key',
    button_style='success',
    layout=widgets.Layout(width='150px')
)
set_key_btn.on_click(set_api_key)

display(widgets.VBox([
    backend_selector,
    api_key_input,
    set_key_btn,
    key_status
]))

🔑 API Key Setup
Get your FREE API keys from:
  • Groq: https://console.groq.com/keys
  • Google: https://aistudio.google.com/app/apikey



---
# Task 2: Define Languages and Translation Styles

Extended language support and translation style options.

In [4]:
# Language definitions with Edge TTS voice mappings

LANGUAGES = {
    'en': {'name': 'English', 'tts_voice': 'en-US-AriaNeural', 'whisper_code': 'en'},
    'es': {'name': 'Spanish', 'tts_voice': 'es-ES-ElviraNeural', 'whisper_code': 'es'},
    'fr': {'name': 'French', 'tts_voice': 'fr-FR-DeniseNeural', 'whisper_code': 'fr'},
    'de': {'name': 'German', 'tts_voice': 'de-DE-KatjaNeural', 'whisper_code': 'de'},
    'hi': {'name': 'Hindi', 'tts_voice': 'hi-IN-SwaraNeural', 'whisper_code': 'hi'},
    'zh': {'name': 'Chinese (Mandarin)', 'tts_voice': 'zh-CN-XiaoxiaoNeural', 'whisper_code': 'zh'},
    'ja': {'name': 'Japanese', 'tts_voice': 'ja-JP-NanamiNeural', 'whisper_code': 'ja'},
    'ko': {'name': 'Korean', 'tts_voice': 'ko-KR-SunHiNeural', 'whisper_code': 'ko'},
    'ar': {'name': 'Arabic', 'tts_voice': 'ar-SA-ZariyahNeural', 'whisper_code': 'ar'},
    'pt': {'name': 'Portuguese', 'tts_voice': 'pt-BR-FranciscaNeural', 'whisper_code': 'pt'},
    'ru': {'name': 'Russian', 'tts_voice': 'ru-RU-SvetlanaNeural', 'whisper_code': 'ru'},
    'it': {'name': 'Italian', 'tts_voice': 'it-IT-ElsaNeural', 'whisper_code': 'it'},
    'nl': {'name': 'Dutch', 'tts_voice': 'nl-NL-ColetteNeural', 'whisper_code': 'nl'},
    'pl': {'name': 'Polish', 'tts_voice': 'pl-PL-AgnieszkaNeural', 'whisper_code': 'pl'},
    'tr': {'name': 'Turkish', 'tts_voice': 'tr-TR-EmelNeural', 'whisper_code': 'tr'},
}

# Translation styles
class TranslationStyle(Enum):
    STANDARD = "standard"
    FORMAL = "formal"
    CASUAL = "casual"
    POETIC = "poetic"
    TECHNICAL = "technical"
    SIMPLIFIED = "simplified"

STYLE_DESCRIPTIONS = {
    TranslationStyle.STANDARD: "Accurate and natural translation",
    TranslationStyle.FORMAL: "Professional, business-appropriate language",
    TranslationStyle.CASUAL: "Friendly and conversational tone",
    TranslationStyle.POETIC: "Artistic and literary expression",
    TranslationStyle.TECHNICAL: "Precise technical terminology",
    TranslationStyle.SIMPLIFIED: "Simple, easy-to-understand language",
}

print("🌍 Supported Languages:")
for code, info in LANGUAGES.items():
    print(f"   • {info['name']} ({code})")

print(f"\n🎨 Translation Styles: {len(STYLE_DESCRIPTIONS)}")
for style, desc in STYLE_DESCRIPTIONS.items():
    print(f"   • {style.value}: {desc}")

🌍 Supported Languages:
   • English (en)
   • Spanish (es)
   • French (fr)
   • German (de)
   • Hindi (hi)
   • Chinese (Mandarin) (zh)
   • Japanese (ja)
   • Korean (ko)
   • Arabic (ar)
   • Portuguese (pt)
   • Russian (ru)
   • Italian (it)
   • Dutch (nl)
   • Polish (pl)
   • Turkish (tr)

🎨 Translation Styles: 6
   • standard: Accurate and natural translation
   • formal: Professional, business-appropriate language
   • casual: Friendly and conversational tone
   • poetic: Artistic and literary expression
   • technical: Precise technical terminology
   • simplified: Simple, easy-to-understand language
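
Whisper can detect languages that are not in the `LANGUAGES` table, so any lookup needs a fallback. The sketch below repeats a two-entry excerpt of the table so it runs on its own; the helper name `voice_for` is hypothetical, and the fallback voice mirrors the default used later in the TTS cell:

```python
from typing import Dict

# Two-entry excerpt of LANGUAGES, repeated here so the sketch is self-contained.
LANGUAGES_EXCERPT: Dict[str, Dict[str, str]] = {
    "en": {"name": "English", "tts_voice": "en-US-AriaNeural", "whisper_code": "en"},
    "es": {"name": "Spanish", "tts_voice": "es-ES-ElviraNeural", "whisper_code": "es"},
}

DEFAULT_VOICE = "en-US-AriaNeural"

def voice_for(code: str, table: Dict[str, Dict[str, str]] = LANGUAGES_EXCERPT) -> str:
    """Look up the TTS voice for a language code, falling back to the English voice."""
    return table.get(code.lower(), {}).get("tts_voice", DEFAULT_VOICE)

print(voice_for("es"))  # es-ES-ElviraNeural
print(voice_for("sv"))  # en-US-AriaNeural (Swedish is not in the excerpt)
```

A fallback keeps the pipeline running, but the English voice will mispronounce text in other languages, so adding the missing entry to the table is usually the better fix.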


---
# Task 3: Initialize Whisper Model

Load the Whisper model for speech recognition.

In [5]:
# Load Whisper model
print(f"🔄 Loading Whisper model '{config.WHISPER_MODEL}'...")
print("   (First run downloads the model - this is one-time)")

whisper_model = whisper.load_model(config.WHISPER_MODEL)
print("✅ Whisper model loaded successfully!")

🔄 Loading Whisper model 'base'...
   (First run downloads the model - this is one-time)


100%|███████████████████████████████████████| 139M/139M [00:12<00:00, 11.7MiB/s]


✅ Whisper model loaded successfully!


---
# Task 4: Create LLM Client (Groq + Google Gemini)

Unified client supporting both Groq and Google Gemini backends.

In [8]:
class LLMClient:
    """
    Unified LLM client supporting Groq (Llama) and Google Gemini.
    """
    
    def __init__(self, config: Config):
        self.config = config
        self.groq_client = None
        self.google_model = None
    
    def _ensure_client(self):
        """Initialize the appropriate client based on backend selection."""
        if self.config.LLM_BACKEND == 'groq':
            if not self.config.GROQ_API_KEY:
                raise ValueError("Groq API key not set. Please set it in Task 1.")
            if self.groq_client is None:
                self.groq_client = Groq(api_key=self.config.GROQ_API_KEY)
                
        elif self.config.LLM_BACKEND == 'google':
            if not self.config.GOOGLE_API_KEY:
                raise ValueError("Google API key not set. Please set it in Task 1.")
            if self.google_model is None:
                genai.configure(api_key=self.config.GOOGLE_API_KEY)
                self.google_model = genai.GenerativeModel(self.config.GOOGLE_MODEL)
    
    def generate(self, prompt: str, system_prompt: str = "") -> Optional[str]:
        """
        Generate a response from the LLM.
        
        Parameters:
        -----------
        prompt : str
            The user prompt/question
        system_prompt : str
            System instructions for the model
        
        Returns:
        --------
        str or None
            Generated response or None on error
        """
        try:
            self._ensure_client()
            
            if self.config.LLM_BACKEND == 'groq':
                return self._generate_groq(prompt, system_prompt)
            elif self.config.LLM_BACKEND == 'google':
                return self._generate_google(prompt, system_prompt)
            else:
                raise ValueError(f"Unknown backend: {self.config.LLM_BACKEND}")
                
        except Exception as e:
            print(f"❌ LLM Error: {e}")
            return None
    
    def _generate_groq(self, prompt: str, system_prompt: str) -> str:
        """Generate using Groq (Llama models)."""
        messages = []
        
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        
        messages.append({"role": "user", "content": prompt})
        
        response = self.groq_client.chat.completions.create(
            model=self.config.GROQ_MODEL,
            messages=messages,
            temperature=0.3,
            max_tokens=1000
        )
        
        return response.choices[0].message.content
    
    def _generate_google(self, prompt: str, system_prompt: str) -> str:
        """Generate using Google Gemini."""
        # Combine system prompt and user prompt for Gemini
        full_prompt = f"{system_prompt}\n\n{prompt}" if system_prompt else prompt
        
        response = self.google_model.generate_content(
            full_prompt,
            generation_config=genai.types.GenerationConfig(
                temperature=0.3,
                max_output_tokens=1000
            )
        )
        
        return response.text

# Initialize LLM client
llm_client = LLMClient(config)
print("✅ LLM Client initialized!")
print(f"   Backend: {config.LLM_BACKEND}")

✅ LLM Client initialized!
   Backend: groq
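
`LLMClient.generate` returns `None` on any error, including transient ones such as rate limiting on a free tier. One way to handle that is a small retry wrapper with exponential backoff. This is a sketch under assumptions: the name `generate_with_retry` and its parameters are hypothetical, and it retries on every `None` because the client does not expose the cause of the failure:

```python
import time
from typing import Callable, Optional

def generate_with_retry(
    generate: Callable[[], Optional[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `generate` until it returns a non-None result or attempts run out."""
    for attempt in range(attempts):
        result = generate()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
    return None

# Stand-in for an LLM call that fails twice before succeeding.
calls = []
def flaky() -> Optional[str]:
    calls.append(1)
    return "Hola" if len(calls) == 3 else None

print(generate_with_retry(flaky, sleep=lambda seconds: None))  # Hola
```

In the notebook this would wrap a call such as `lambda: llm_client.generate(prompt, system_prompt)`. The `sleep` parameter is injectable only so the example runs without waiting.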


In [9]:
# Test LLM connection (run after setting the API key)
def test_llm_connection():
    """Test the LLM connection."""
    print(f"🧪 Testing {config.LLM_BACKEND.upper()} connection...")
    
    # generate() catches its own exceptions and returns None on failure,
    # so check the return value instead of wrapping the call in try/except.
    response = llm_client.generate(
        "Say 'Hello! Connection successful!' in exactly those words.",
        "You are a helpful assistant. Respond briefly."
    )
    if response is None:
        print("❌ Connection failed.")
        return False
    print(f"✅ Response: {response}")
    return True

# Requires an API key to be set first:
test_llm_connection()

🧪 Testing GROQ connection...
✅ Response: Hello! Connection successful!


True

---
# Task 5: Implement Speech Recognition with Whisper

Record audio and transcribe using Whisper.

In [10]:
def record_audio(duration: int = 10, sample_rate: int = 16000) -> Optional[np.ndarray]:
    """
    Record audio from microphone.
    
    Parameters:
    -----------
    duration : int
        Maximum recording duration in seconds
    sample_rate : int
        Audio sample rate (Whisper expects 16000)
    
    Returns:
    --------
    np.ndarray or None
        Audio data as numpy array
    """
    recognizer = sr.Recognizer()
    
    try:
        with sr.Microphone(sample_rate=sample_rate) as source:
            print("🔧 Adjusting for ambient noise...")
            recognizer.adjust_for_ambient_noise(source, duration=1)
            
            print(f"🎤 Recording... (speak now, max {duration}s)")
            audio = recognizer.listen(source, timeout=5, phrase_time_limit=duration)
            
            print("✅ Recording complete!")
            
            # Convert to a float32 array for Whisper. Use the raw PCM frames:
            # get_wav_data() would prepend a WAV header that is not audio.
            raw = audio.get_raw_data(convert_rate=sample_rate, convert_width=2)
            audio_data = np.frombuffer(raw, dtype=np.int16)
            audio_float = audio_data.astype(np.float32) / 32768.0
            
            return audio_float
            
    except sr.WaitTimeoutError:
        print("⚠️ No speech detected. Please try again.")
        return None
    except Exception as e:
        print(f"❌ Recording error: {e}")
        return None


def transcribe_audio(audio_data: np.ndarray, language: Optional[str] = None) -> Dict:
    """
    Transcribe audio using Whisper.
    
    Parameters:
    -----------
    audio_data : np.ndarray
        Audio data
    language : str, optional
        Language code (if None, auto-detect)
    
    Returns:
    --------
    dict
        {'text': transcribed text, 'language': detected language}
    """
    try:
        print("🔄 Transcribing with Whisper...")
        
        options = {}
        if language:
            options['language'] = language
        
        result = whisper_model.transcribe(audio_data, **options)
        
        detected_lang = result.get('language', 'unknown')
        text = result['text'].strip()
        
        print("✅ Transcription complete!")
        print(f"   Detected language: {detected_lang}")
        
        return {'text': text, 'language': detected_lang}
        
    except Exception as e:
        print(f"❌ Transcription error: {e}")
        return {'text': None, 'language': None}

print("✅ Speech recognition functions defined!")

✅ Speech recognition functions defined!
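
Whisper takes mono float32 samples at 16 kHz, so the recorded 16-bit PCM has to be scaled into the range [-1, 1]. A WAV byte string also begins with a header that is not audio, which is why SpeechRecognition's `AudioData` offers both `get_wav_data()` and `get_raw_data()`. The sketch below uses only the standard library to show the difference between WAV bytes and raw PCM frames; the helper `pcm16_to_floats` is hypothetical:

```python
import io
import struct
import wave

def pcm16_to_floats(pcm_bytes: bytes) -> list:
    """Scale little-endian 16-bit PCM samples into floats in [-1.0, 1.0)."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return [s / 32768.0 for s in samples]

# Build a tiny mono 16 kHz WAV file in memory from four known samples.
pcm = struct.pack("<4h", 0, 16384, -32768, 32767)
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
    wav.setframerate(16000)
    wav.writeframes(pcm)
wav_bytes = buffer.getvalue()

# Reading through the wave module returns only the audio frames.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    frames = wav.readframes(wav.getnframes())

print(pcm16_to_floats(frames)[:3])  # [0.0, 0.5, -1.0]
print(len(wav_bytes) - len(frames), "header bytes would be misread as samples")
```

Decoding `wav_bytes` directly would turn the header into a short burst of spurious samples at the start of the clip, so raw frames are the safer input for the float conversion.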


---
# Task 6: Implement Intelligent Translation with LLM

Context-aware, style-adaptive translation using Groq/Gemini.

In [11]:
class IntelligentTranslator:
    """
    LLM-powered translator with:
    - Context-aware translation
    - Style adaptation
    - Cultural localization
    - Translation explanations
    - Conversation memory
    """
    
    def __init__(self, llm_client: LLMClient):
        self.llm = llm_client
        self.conversation_history: List[Dict] = []
        self.max_history = 5
    
    def _build_system_prompt(self, style: TranslationStyle) -> str:
        """Build the system prompt based on translation style."""
        
        base_prompt = """You are an expert multilingual translator with deep understanding of cultural nuances, idioms, and context. Your translations should be natural and fluent.

CRITICAL RULES:
1. ONLY output the translation - no explanations, notes, quotes, or additional text
2. Do NOT include phrases like "Here's the translation" or any preamble
3. Preserve the original meaning and intent
4. Adapt idioms to culturally equivalent expressions
5. Maintain the emotional tone of the original
"""
        
        style_prompts = {
            TranslationStyle.STANDARD: "Use natural, accurate language appropriate for general communication.",
            TranslationStyle.FORMAL: "Use formal, professional language. Avoid contractions and colloquialisms.",
            TranslationStyle.CASUAL: "Use friendly, conversational language with appropriate colloquialisms.",
            TranslationStyle.POETIC: "Use elegant, literary language with artistic expression.",
            TranslationStyle.TECHNICAL: "Use precise technical terminology. Maintain domain-specific accuracy.",
            TranslationStyle.SIMPLIFIED: "Use simple, clear language suitable for language learners.",
        }
        
        return base_prompt + "\nSTYLE: " + style_prompts.get(style, style_prompts[TranslationStyle.STANDARD])
    
    def translate(
        self,
        text: str,
        source_lang: str,
        target_lang: str,
        style: TranslationStyle = TranslationStyle.STANDARD,
        use_context: bool = True
    ) -> Dict:
        """
        Translate text using the LLM.
        
        Returns:
        --------
        dict
            {'translation': text, 'source': original, 'style': style}
        """
        if not text:
            return {'translation': None, 'source': text, 'style': style.value}
        
        # Get language names
        src_name = LANGUAGES.get(source_lang, {}).get('name', source_lang)
        tgt_name = LANGUAGES.get(target_lang, {}).get('name', target_lang)
        
        # Build context from history
        context = ""
        if use_context and self.conversation_history:
            context = "\nPrevious context for reference:\n"
            for h in self.conversation_history[-3:]:
                context += f"- \"{h['source']}\" → \"{h['translation']}\"\n"
        
        # Build translation prompt
        prompt = f"""Translate from {src_name} to {tgt_name}.
{context}
Text to translate: "{text}"

Provide ONLY the translation, nothing else:"""
        
        system_prompt = self._build_system_prompt(style)
        
        print(f"🧠 Translating with {config.LLM_BACKEND.upper()} ({style.value} style)...")
        
        # Get translation
        translation = self.llm.generate(prompt, system_prompt)
        
        if translation:
            # Clean up response
            translation = translation.strip().strip('"').strip()
            
            # Remove common prefixes that LLMs sometimes add
            prefixes_to_remove = [
                "Here's the translation:",
                "Translation:",
                "Here is the translation:",
                "The translation is:"
            ]
            for prefix in prefixes_to_remove:
                if translation.lower().startswith(prefix.lower()):
                    translation = translation[len(prefix):].strip()
            
            # Add to history
            self.conversation_history.append({
                'source': text,
                'translation': translation,
                'source_lang': source_lang,
                'target_lang': target_lang
            })
            
            # Trim history
            if len(self.conversation_history) > self.max_history:
                self.conversation_history = self.conversation_history[-self.max_history:]
            
            print("✅ Translation complete!")
        
        return {
            'translation': translation,
            'source': text,
            'style': style.value
        }
    
    def explain_translation(
        self,
        original: str,
        translation: str,
        source_lang: str,
        target_lang: str
    ) -> str:
        """Get explanation of translation choices (for learning)."""
        src_name = LANGUAGES.get(source_lang, {}).get('name', source_lang)
        tgt_name = LANGUAGES.get(target_lang, {}).get('name', target_lang)
        
        prompt = f"""Explain this translation from {src_name} to {tgt_name}:

Original: "{original}"
Translation: "{translation}"

Briefly explain:
1. Key vocabulary choices
2. Any idioms adapted
3. Grammar differences
4. Cultural adaptations (if any)

Keep it concise and educational."""
        
        system_prompt = "You are a language teacher explaining translation choices."
        return self.llm.generate(prompt, system_prompt)
    
    def clear_history(self):
        """Clear conversation history."""
        self.conversation_history = []
        print("🧹 Conversation history cleared.")

# Initialize translator
translator = IntelligentTranslator(llm_client)
print("✅ Intelligent Translator initialized!")

✅ Intelligent Translator initialized!
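
The clean-up step inside `translate` can be read in isolation as a pure function. The sketch below is a variation rather than a copy: the name `clean_translation` is hypothetical, and unlike the method above it strips quotes after removing a prefix, so output such as `Translation: "..."` is fully unwrapped:

```python
from typing import Sequence

KNOWN_PREFIXES: Sequence[str] = (
    "Here's the translation:",
    "Here is the translation:",
    "The translation is:",
    "Translation:",
)

def clean_translation(raw: str, prefixes: Sequence[str] = KNOWN_PREFIXES) -> str:
    """Strip one leading boilerplate prefix and any surrounding quotes."""
    text = raw.strip()
    for prefix in prefixes:
        # Case-insensitive match, then cut the prefix off by length.
        if text.lower().startswith(prefix.lower()):
            text = text[len(prefix):].strip()
            break
    return text.strip('"').strip()

print(clean_translation('Translation: "Hola, ¿cómo estás?"'))  # Hola, ¿cómo estás?
```

Keeping this logic in a standalone function makes it possible to test the cleaning rules without calling an LLM.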


In [12]:
# Test translation (run after setting API key)
def test_translation():
    """Test the translation system."""
    print("=" * 60)
    print(f"🧪 Testing Translation with {config.LLM_BACKEND.upper()}")
    print("=" * 60)
    
    test_text = "Hello! How are you doing today?"
    print(f"\n📝 Original: \"{test_text}\"")
    
    result = translator.translate(test_text, 'en', 'es', TranslationStyle.CASUAL)
    print(f"\n🇪🇸 Spanish (Casual): {result['translation']}")
    
    result = translator.translate(test_text, 'en', 'fr', TranslationStyle.FORMAL)
    print(f"\n🇫🇷 French (Formal): {result['translation']}")

# Uncomment after setting API key:
# test_translation()

---
# Task 7: Implement Text-to-Speech with Edge TTS

High-quality, free speech synthesis.

In [13]:
async def synthesize_speech_edge(text: str, language: str, output_file: str = "output.mp3") -> str:
    """
    Synthesize speech using Edge TTS (free, high-quality).
    """
    voice = LANGUAGES.get(language, {}).get('tts_voice', 'en-US-AriaNeural')
    
    print(f"🔊 Synthesizing speech (voice: {voice})...")
    
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_file)
    
    print(f"✅ Audio saved: {output_file}")
    return output_file


def text_to_speech(text: str, language: str) -> Optional[str]:
    """
    Convert text to speech and play in notebook.
    """
    if not text:
        print("⚠️ No text provided.")
        return None
    
    output_file = f"output_{int(time.time())}.mp3"
    
    try:
        # Run async TTS
        asyncio.get_event_loop().run_until_complete(
            synthesize_speech_edge(text, language, output_file)
        )
        
        # Display audio player
        print("\n🔊 Playing audio:")
        display(Audio(output_file, autoplay=True))
        
        return output_file
        
    except Exception as e:
        print(f"❌ TTS Error: {e}")
        return None

print("✅ Text-to-speech functions defined!")

✅ Text-to-speech functions defined!


In [14]:
# Test TTS
print("🧪 Testing Text-to-Speech...")
test_tts = text_to_speech("Hola, esta es una prueba del sistema de voz.", 'es')

🧪 Testing Text-to-Speech...
🔊 Synthesizing speech (voice: es-ES-ElviraNeural)...
✅ Audio saved: output_1768504930.mp3

🔊 Playing audio:
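
`text_to_speech` relies on `nest_asyncio` so that `run_until_complete` works inside Jupyter's already-running event loop. As an alternative sketch that does not need that patch, a coroutine can be run on a fresh event loop in a worker thread whenever a loop is already active. The helper `run_coroutine_sync` and the stand-in coroutine `fake_synthesize` are hypothetical and are not used elsewhere in this notebook:

```python
import asyncio
import threading
from typing import Any, Coroutine, Dict

def run_coroutine_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run is safe to use.
        return asyncio.run(coro)

    # A loop is already running (as in Jupyter): use a separate thread,
    # which gets its own event loop via asyncio.run.
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]

async def fake_synthesize(text: str) -> str:
    """Stand-in for an async TTS call; returns a made-up file name."""
    await asyncio.sleep(0)
    return f"output_{len(text)}.mp3"

print(run_coroutine_sync(fake_synthesize("Hola")))  # output_4.mp3
```

The calling thread blocks until the coroutine finishes, which matches how `text_to_speech` behaves today.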


---
# Task 8: Build the Interactive UI

Complete user interface with all features.

In [15]:
# Create UI widgets

# Language dropdowns
input_lang_dropdown = widgets.Dropdown(
    options=[(v['name'], k) for k, v in LANGUAGES.items()],
    value='en',
    description='Input:',
    style={'description_width': '50px'},
    layout=widgets.Layout(width='180px')
)

output_lang_dropdown = widgets.Dropdown(
    options=[(v['name'], k) for k, v in LANGUAGES.items()],
    value='es',
    description='Output:',
    style={'description_width': '50px'},
    layout=widgets.Layout(width='180px')
)

# Auto-detect checkbox
auto_detect_checkbox = widgets.Checkbox(
    value=True,
    description='Auto-detect input language',
    indent=False
)

# Style dropdown
style_dropdown = widgets.Dropdown(
    options=[(s.value.title(), s) for s in TranslationStyle],
    value=TranslationStyle.STANDARD,
    description='Style:',
    style={'description_width': '50px'},
    layout=widgets.Layout(width='180px')
)

# Explain checkbox
explain_checkbox = widgets.Checkbox(
    value=False,
    description='Explain translation',
    indent=False
)

# Buttons
record_button = widgets.Button(
    description='üé§ Record & Translate',
    button_style='success',
    layout=widgets.Layout(width='180px', height='45px')
)

clear_button = widgets.Button(
    description='üßπ Clear',
    button_style='warning',
    layout=widgets.Layout(width='100px')
)

# Text input
text_input = widgets.Textarea(
    placeholder='Or type text here...',
    layout=widgets.Layout(width='100%', height='70px')
)

translate_text_button = widgets.Button(
    description='üìù Translate Text',
    button_style='primary',
    layout=widgets.Layout(width='140px')
)

# Output area
output_area = widgets.Output(
    layout=widgets.Layout(
        border='2px solid #3498db',
        padding='15px',
        margin='10px 0',
        min_height='200px',
        max_height='400px',
        overflow='auto'
    )
)

# Status
status_html = widgets.HTML(value="<b>Status:</b> Ready")

print("✅ UI widgets created!")

✅ UI widgets created!


In [16]:
# Event handlers

def perform_translation(text: str, source_lang: str):
    """Perform the full translation pipeline."""
    target_lang = output_lang_dropdown.value
    style = style_dropdown.value
    explain = explain_checkbox.value
    
    print("=" * 60)
    print(f"🌐 GENAI TRANSLATOR ({config.LLM_BACKEND.upper()})")
    print("=" * 60)
    
    src_name = LANGUAGES.get(source_lang, {}).get('name', source_lang)
    tgt_name = LANGUAGES.get(target_lang, {}).get('name', target_lang)
    
    print(f"\n📥 Source ({src_name}): \"{text}\"")
    print(f"📤 Target: {tgt_name}")
    print(f"🎨 Style: {style.value}")
    print("-" * 60)
    
    # Translate
    result = translator.translate(text, source_lang, target_lang, style=style)
    
    if result['translation']:
        print(f"\n✨ Translation: \"{result['translation']}\"")
        print("-" * 60)
        
        # Explain if requested
        if explain:
            print("\n📚 Explanation:")
            explanation = translator.explain_translation(
                text, result['translation'], source_lang, target_lang
            )
            print(explanation)
            print("-" * 60)
        
        # Text-to-speech
        print("\n🔊 Generating speech...")
        text_to_speech(result['translation'], target_lang)
        
        status_html.value = "<b>Status:</b> ✅ Complete!"
    else:
        print("\n❌ Translation failed.")
        status_html.value = "<b>Status:</b> ❌ Failed"
    
    print("\n" + "=" * 60)


def on_record_clicked(btn):
    """Handle record button click."""
    with output_area:
        output_area.clear_output()
        status_html.value = "<b>Status:</b> 🎤 Recording..."
        
        audio_data = record_audio(duration=10)
        
        if audio_data is not None:
            status_html.value = "<b>Status:</b> 🔄 Transcribing..."
            
            input_lang = None if auto_detect_checkbox.value else input_lang_dropdown.value
            result = transcribe_audio(audio_data, language=input_lang)
            
            if result['text']:
                detected_lang = result['language']
                if auto_detect_checkbox.value:
                    print(f"\n🔍 Detected: {detected_lang}")
                
                status_html.value = "<b>Status:</b> 🧠 Translating..."
                perform_translation(result['text'], detected_lang)
            else:
                print("❌ Transcription failed.")
                status_html.value = "<b>Status:</b> ❌ Transcription failed"
        else:
            status_html.value = "<b>Status:</b> ❌ Recording failed"


def on_translate_text_clicked(btn):
    """Handle translate text button click."""
    with output_area:
        output_area.clear_output()
        
        text = text_input.value.strip()
        if not text:
            print("⚠️ Please enter text to translate.")
            return
        
        source_lang = input_lang_dropdown.value
        perform_translation(text, source_lang)


def on_clear_clicked(btn):
    """Handle clear button click."""
    with output_area:
        output_area.clear_output()
        translator.clear_history()
        text_input.value = ""
        status_html.value = "<b>Status:</b> Ready"


# Attach handlers
record_button.on_click(on_record_clicked)
translate_text_button.on_click(on_translate_text_clicked)
clear_button.on_click(on_clear_clicked)

print("✅ Event handlers attached!")

✅ Event handlers attached!
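
The handlers above chain three stages: transcribe, translate, speak. The same flow can be sketched as a function that receives each stage as a callable, so it can be exercised without a microphone, an API key, or widgets. Everything in this block is hypothetical illustration, including the name `run_pipeline` and the stand-in stages:

```python
from typing import Callable, Dict, Optional

def run_pipeline(
    audio: object,
    transcribe: Callable[[object], Dict[str, Optional[str]]],
    translate: Callable[[str, str], Optional[str]],
    speak: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Run transcribe -> translate -> speak, stopping at the first stage that fails."""
    result: Dict[str, Optional[str]] = {
        "text": None, "translation": None, "audio_file": None,
    }
    heard = transcribe(audio)
    if not heard.get("text"):
        return result                      # transcription failed
    result["text"] = heard["text"]

    translation = translate(heard["text"], heard.get("language") or "en")
    if not translation:
        return result                      # translation failed
    result["translation"] = translation

    result["audio_file"] = speak(translation)
    return result

# Stand-in stages so the sketch runs offline.
demo = run_pipeline(
    audio=b"",
    transcribe=lambda _audio: {"text": "Hello", "language": "en"},
    translate=lambda text, lang: "Hola" if text == "Hello" else None,
    speak=lambda text: "output_demo.mp3",
)
print(demo)  # {'text': 'Hello', 'translation': 'Hola', 'audio_file': 'output_demo.mp3'}
```

In the notebook, the real stages would be `transcribe_audio`, a thin wrapper around `translator.translate`, and `text_to_speech`.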


---
# Task 9: Launch the Application

Display the complete translator interface!

In [17]:
# Build and display the UI

# Title
title_html = widgets.HTML(f"""
<div style='background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
            padding: 20px; border-radius: 10px; margin-bottom: 15px;'>
    <h2 style='color: white; margin: 0;'>🤖 GenAI Speech-to-Speech Translator</h2>
    <p style='color: #e0e0e0; margin: 5px 0 0 0;'>
        Powered by Whisper + {config.LLM_BACKEND.upper()} + Edge TTS
    </p>
</div>
""")

# Language row
lang_row = widgets.HBox(
    [input_lang_dropdown, widgets.HTML(" → "), output_lang_dropdown, auto_detect_checkbox],
    layout=widgets.Layout(align_items='center', gap='10px')
)

# Style row
style_row = widgets.HBox(
    [style_dropdown, explain_checkbox],
    layout=widgets.Layout(gap='20px')
)

# Voice section
voice_section = widgets.VBox([
    widgets.HTML("<h4 style='margin: 10px 0 5px 0;'>🎤 Voice Input</h4>"),
    widgets.HBox([record_button, clear_button], layout=widgets.Layout(gap='10px'))
])

# Text section
text_section = widgets.VBox([
    widgets.HTML("<h4 style='margin: 10px 0 5px 0;'>📝 Text Input</h4>"),
    text_input,
    translate_text_button
])

# Tabs
input_tabs = widgets.Tab(children=[voice_section, text_section])
input_tabs.set_title(0, '🎤 Voice')
input_tabs.set_title(1, '📝 Text')

# Main container
main_ui = widgets.VBox([
    title_html,
    widgets.HTML("<b>🌍 Languages:</b>"),
    lang_row,
    widgets.HTML("<b>🎨 Style:</b>"),
    style_row,
    input_tabs,
    status_html,
    widgets.HTML("<b>📤 Output:</b>"),
    output_area
], layout=widgets.Layout(
    padding='20px',
    border='2px solid #764ba2',
    border_radius='15px',
    max_width='800px'
))

# Display
print("🎉 GenAI Speech-to-Speech Translator is ready!")
print("="*60)
print(f"Backend: {config.LLM_BACKEND.upper()}")
if config.LLM_BACKEND == 'groq':
    print(f"Model: {config.GROQ_MODEL}")
else:
    print(f"Model: {config.GOOGLE_MODEL}")
print("="*60)
display(main_ui)

🎉 GenAI Speech-to-Speech Translator is ready!
Backend: GROQ
Model: llama-3.3-70b-versatile



---
# Task 10: Demo and Documentation

Test the GenAI capabilities!

In [18]:
# Demo: Style comparison (run after setting API key)
def demo_styles():
    print("=" * 60)
    print("🧪 Demo: Translation Style Comparison")
    print("=" * 60)
    
    text = "I need to finish this project by tomorrow."
    print(f"\n📝 Original: \"{text}\"\n")
    
    translator.clear_history()
    
    for style in [TranslationStyle.FORMAL, TranslationStyle.CASUAL, TranslationStyle.SIMPLIFIED]:
        result = translator.translate(text, 'en', 'es', style)
        print(f"🔹 {style.value.upper()}: {result['translation']}")

# Requires an API key to be set first:
demo_styles()

🧪 Demo: Translation Style Comparison

📝 Original: "I need to finish this project by tomorrow."

🧹 Conversation history cleared.
🧠 Translating with GROQ (formal style)...
✅ Translation complete!
🔹 FORMAL: Necesito terminar este proyecto para mañana.
🧠 Translating with GROQ (casual style)...
✅ Translation complete!
🔹 CASUAL: Necesito terminar este proyecto para mañana.
🧠 Translating with GROQ (simplified style)...
✅ Translation complete!
🔹 SIMPLIFIED: Necesito terminar este proyecto para mañana.
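
All three styles returned the same sentence in this run. One plausible contributor, offered as an assumption rather than a verified cause, is conversation memory: each call appends its result to the history, so the second and third prompts already contain a translation of the same source text. The sketch below rebuilds the context block that `translate` adds to the prompt; the helper name `build_context` is hypothetical:

```python
from typing import Dict, List

def build_context(history: List[Dict[str, str]], window: int = 3) -> str:
    """Format the most recent translations the way the prompt's context block does."""
    if not history:
        return ""
    lines = [f'- "{h["source"]}" → "{h["translation"]}"' for h in history[-window:]]
    return "\nPrevious context for reference:\n" + "\n".join(lines) + "\n"

text = "I need to finish this project by tomorrow."
history = [{"source": text, "translation": "Necesito terminar este proyecto para mañana."}]

# After the first (formal) call, later prompts carry this block,
# which shows the model an existing translation of the very same sentence.
print(build_context(history))
```

Passing `use_context=False` to `translator.translate` in the style demo removes this block from the prompt, which makes the comparison between styles independent of earlier results.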


In [19]:
# Demo: Idiom handling
def demo_idioms():
    print("=" * 60)
    print("🧪 Demo: Intelligent Idiom Translation")
    print("=" * 60)
    
    idioms = [
        "It's raining cats and dogs!",
        "Break a leg!",
        "The early bird catches the worm."
    ]
    
    translator.clear_history()
    
    for idiom in idioms:
        result = translator.translate(idiom, 'en', 'es')
        print(f"\n🇺🇸 EN: \"{idiom}\"")
        print(f"🇪🇸 ES: \"{result['translation']}\"")

# Uncomment to run:
# demo_idioms()

In [20]:
# Cleanup utility
def cleanup_files():
    """Remove generated audio files."""
    import glob
    files = glob.glob('output_*.mp3')
    for f in files:
        os.remove(f)
    print(f"🧹 Cleaned up {len(files)} audio file(s).")

# Uncomment to cleanup:
# cleanup_files()

---

## 📋 Summary

### Technology Stack

| Component | Technology | Cost |
|-----------|------------|------|
| Speech Recognition | OpenAI Whisper (local) | FREE |
| Translation | Groq (Llama 3.3) or Google Gemini | FREE tier |
| Text-to-Speech | Edge TTS | FREE |

### Supported LLM Models

**Groq (Llama):**
- `llama-3.3-70b-versatile` - Best quality
- `llama-3.1-8b-instant` - Fastest
- `mixtral-8x7b-32768` - Good balance

**Google Gemini:**
- `gemini-2.0-flash-exp` - Latest, fast
- `gemini-1.5-pro` - Most capable
- `gemini-1.5-flash` - Fast & efficient

