# Azure AI Speech Services Demo

This notebook demonstrates the capabilities of Azure AI Speech Services including:
- **Text-to-Speech (TTS)**: Convert text to natural-sounding speech
- **Speech-to-Text (STT)**: Convert spoken audio to text
- **Speech Translation**: Real-time speech translation
- **Speaker Recognition**: Identify and verify speakers
- **Custom Speech Models**: Train models for specific domains

## Prerequisites
- Azure subscription
- Azure Speech resource created in Azure portal
- Python 3.8 or higher
- Audio input/output capabilities (microphone and speakers)

## 1. Setup and Package Installation

First, let's install the required Azure Speech SDK and supporting packages.

In [15]:
# Install required Azure Speech SDK and supporting packages
!pip install azure-cognitiveservices-speech azure-identity python-dotenv requests matplotlib numpy

# For audio processing and visualization
!pip install soundfile librosa IPython




[notice] A new release of pip is available: 24.0 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip







In [None]:
# Import required libraries
import os
import time
import json
import requests
from pathlib import Path
from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk
from azure.identity import DefaultAzureCredential
from IPython.display import Audio, display
import matplotlib.pyplot as plt
import numpy as np

# Load environment variables
load_dotenv()

# Setup output directory
output_dir = Path('output/speech')
output_dir.mkdir(parents=True, exist_ok=True)

print("Libraries imported successfully!")
print(f"Azure Speech SDK version: {speechsdk.__version__}")
print(f"Output directory: {output_dir.absolute()}")

Libraries imported successfully!
Azure Speech SDK version: 1.34.1


## 2. Azure Speech Service Configuration

### Option 1: Using Environment Variables (Recommended)
Set these environment variables in your system or create a `.env` file:
```
AZURE_SPEECH_KEY=your-speech-api-key
AZURE_SPEECH_REGION=your-region  # e.g., eastus, westus2
```
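
As a minimal sketch of how these two variables might be validated before any SDK call, the helper below reads them and fails early with a clear message. The names `SpeechSettings` and `load_speech_settings` are hypothetical and not part of the Speech SDK; the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical container for the two values the .env file provides."""
    key: str
    region: str


def load_speech_settings(env=None):
    """Read AZURE_SPEECH_KEY / AZURE_SPEECH_REGION and fail early if either is missing."""
    env = os.environ if env is None else env
    key = (env.get("AZURE_SPEECH_KEY") or "").strip()
    region = (env.get("AZURE_SPEECH_REGION") or "").strip()

    # Collect every missing name so the error reports all problems at once
    missing = [
        name
        for name, value in (("AZURE_SPEECH_KEY", key), ("AZURE_SPEECH_REGION", region))
        if not value
    ]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    # Region identifiers such as "eastus" are lowercase
    return SpeechSettings(key=key, region=region.lower())
```

Call `load_speech_settings()` after `load_dotenv()` so that values from the `.env` file are already in `os.environ`.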

### Option 2: Using Managed Identity (For Azure-hosted applications)
Use this option when the notebook runs on an Azure service with managed identity enabled, so that no API key needs to be stored in the environment.
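
The following is a rough sketch of keyless authentication, not a definitive recipe. It assumes the Speech resource has a custom domain, that the identity has been granted a suitable role on the resource, and that the SDK accepts an authorization token of the form `aad#<resource-id>#<access-token>`, as described in Microsoft's Entra authentication guidance for the Speech SDK. The function names and the `resource_id` argument are placeholders introduced for illustration.

```python
def build_aad_auth_token(resource_id: str, aad_token: str) -> str:
    """Combine a Speech resource ID and a Microsoft Entra access token."""
    if not resource_id or not aad_token:
        raise ValueError("resource_id and aad_token are both required")
    # Assumed format: "aad#" + resource ID + "#" + access token
    return f"aad#{resource_id}#{aad_token}"


def speech_config_from_managed_identity(resource_id: str, region: str):
    """Build a SpeechConfig without an API key (only works with a usable Azure identity)."""
    # Imported lazily so build_aad_auth_token stays usable without the SDKs installed
    import azure.cognitiveservices.speech as speechsdk
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    access_token = credential.get_token("https://cognitiveservices.azure.com/.default")
    return speechsdk.SpeechConfig(
        auth_token=build_aad_auth_token(resource_id, access_token.token),
        region=region,
    )
```

Access tokens expire, so a long-running application would need to request a fresh token and rebuild or update the configuration periodically.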

In [17]:
# Configuration for Azure Speech Service
speech_key = os.getenv('AZURE_SPEECH_KEY')
speech_region = os.getenv('AZURE_SPEECH_REGION') or "eastus"

# Create the speech configuration only when a real key is available;
# otherwise speech_config stays undefined and later cells report it.
if speech_key and speech_key != "your-speech-key-here":
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    print(f"✅ Speech service configured for region: {speech_region}")

    # Set default voice and language settings
    speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"  # Default voice
    speech_config.speech_recognition_language = "en-US"  # Default language

    print(f"🎤 Default recognition language: {speech_config.speech_recognition_language}")
    print(f"🗣️ Default synthesis voice: {speech_config.speech_synthesis_voice_name}")
else:
    print("❌ Please set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION environment variables")
    print("You can get these from your Azure Speech resource in the Azure portal")

✅ Speech service configured for region: eastus
🎤 Default recognition language: en-US
🗣️ Default synthesis voice: en-US-AriaNeural


## 3. Text-to-Speech (TTS)

Text-to-Speech converts written text into natural-sounding spoken audio. Azure offers:
- **100+ voices** in multiple languages
- **Neural voices** with human-like quality
- **SSML support** for fine-tuning speech
- **Custom voices** for brand-specific needs

In [None]:
def text_to_speech(text, voice_name=None, output_file=None):
    """
    Convert text to speech using Azure Speech Service
    
    Args:
        text (str): Text to convert to speech
        voice_name (str): Optional voice name (e.g., 'en-US-AriaNeural')
        output_file (str): Optional file name to save audio (will be saved in output/speech/)
    
    Returns:
        str: Path to the saved file (or a status message when the audio is
             played on the speakers) if successful, None otherwise
    """
    try:
        # Check if speech_config is available
        if 'speech_config' not in globals():
            print("❌ Speech configuration not available. Please run the configuration cell first.")
            return None
            
        # Create a copy of speech config for this request
        current_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
        
        if voice_name:
            current_config.speech_synthesis_voice_name = voice_name
        else:
            current_config.speech_synthesis_voice_name = speech_config.speech_synthesis_voice_name
        
        # Configure audio output
        if output_file:
            # Ensure file is saved in output/speech directory
            output_path = output_dir / output_file
            audio_config = speechsdk.audio.AudioOutputConfig(filename=str(output_path))
        else:
            audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
        
        # Create synthesizer
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=current_config, audio_config=audio_config)
        
        print(f"🎵 Converting text to speech...")
        print(f"📝 Text: '{text}'")
        print(f"🗣️ Voice: {current_config.speech_synthesis_voice_name}")
        
        # Perform synthesis
        result = synthesizer.speak_text_async(text).get()
        
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print(f"✅ Speech synthesis completed successfully!")
            if output_file:
                print(f"💾 Audio saved to: {output_path}")
                # Display audio player in notebook
                display(Audio(str(output_path)))
                return str(output_path)
            return "Success (played on speakers)"
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation = result.cancellation_details
            print(f"❌ Speech synthesis canceled: {cancellation.reason}")
            if cancellation.reason == speechsdk.CancellationReason.Error:
                print(f"Error details: {cancellation.error_details}")
        
        return None
        
    except Exception as e:
        print(f"❌ Error in text-to-speech: {e}")
        return None

# Test basic text-to-speech
test_text = "Hello! Welcome to Azure AI Speech Services. This is a demonstration of text-to-speech capabilities using neural voices."

# Convert to speech and save to file
output_file = "sample_speech.wav"
result_path = text_to_speech(test_text, output_file=output_file)
if result_path:
    print(f"\n🎉 Test completed! Audio file available at: {result_path}")

🎵 Converting text to speech...
📝 Text: 'Hello! Welcome to Azure AI Speech Services. This is a demonstration of text-to-speech capabilities using neural voices.'
🗣️ Voice: en-US-AriaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.

In [None]:
# Demonstrate different voices
voices_to_try = [
    ("en-US-AriaNeural", "Hi there! I'm Aria, a neural voice from the United States."),
    ("en-GB-SoniaNeural", "Hello! I'm Sonia, speaking with a British accent."),
    ("en-AU-NatashaNeural", "G'day! I'm Natasha from Australia."),
    ("fr-FR-DeniseNeural", "Bonjour! Je suis Denise et je parle français."),
    ("de-DE-KatjaNeural", "Hallo! Ich bin Katja und spreche Deutsch.")
]

print("🎭 Demonstrating different voices:")
voice_files = []

for voice, text in voices_to_try:
    print(f"\n🗣️ Voice: {voice}")
    # Create safer filename
    safe_voice_name = voice.replace('-', '_').replace('Neural', '')
    output_file = f"voice_demo_{safe_voice_name}.wav"
    
    result_path = text_to_speech(text, voice_name=voice, output_file=output_file)
    if result_path:
        voice_files.append(result_path)
        time.sleep(1)  # Brief pause between voices

print(f"\n✅ Voice demonstration completed! Generated {len(voice_files)} audio files in {output_dir}")

🎭 Demonstrating different voices:

🗣️ Voice: en-US-AriaNeural
🎵 Converting text to speech...
📝 Text: 'Hi there! I'm Aria, a neural voice from the United States.'
🗣️ Voice: en-US-AriaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.

🗣️ Voice: en-GB-SoniaNeural
🎵 Converting text to speech...
📝 Text: 'Hello! I'm Sonia, speaking with a British accent.'
🗣️ Voice: en-GB-SoniaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.


### Advanced Text-to-Speech with SSML

**Speech Synthesis Markup Language (SSML)** allows fine-tuning of speech output:
- Adjust speaking rate, pitch, and volume
- Add pauses and emphasis
- Control pronunciation
- Insert audio effects
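
SSML is XML, so user-supplied text containing characters such as `&` or `<` has to be escaped before it is embedded. The sketch below shows one way to assemble a single-voice document with optional rate and pitch; `build_ssml` is a hypothetical helper, not an SDK function, and it covers only the `<voice>` and `<prosody>` elements used in this notebook.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-AriaNeural", rate=None, pitch=None, lang="en-US"):
    """Return an SSML document that speaks `text` with one voice."""
    body = escape(text)  # escapes &, < and > so the markup stays well-formed

    # Only emit <prosody> when at least one attribute was requested
    prosody_attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in (("rate", rate), ("pitch", pitch))
        if value
    )
    if prosody_attrs:
        body = f"<prosody{prosody_attrs}>{body}</prosody>"

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


ssml = build_ssml("Tom & Jerry", rate="slow")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
```

Parsing the result locally, as in the last line, catches malformed markup before a request is sent to the service.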

In [None]:
def ssml_to_speech(ssml_text, output_file=None):
    """
    Convert SSML text to speech
    
    Args:
        ssml_text (str): SSML formatted text
        output_file (str): Optional file name to save audio (will be saved in output/speech/)
    
    Returns:
        str: Path to the saved file (or a status message when the audio is
             played on the speakers) if successful, None otherwise
    """
    try:
        # Check if speech_config is available
        if 'speech_config' not in globals():
            print("❌ Speech configuration not available. Please run the configuration cell first.")
            return None
            
        # Configure audio output
        if output_file:
            output_path = output_dir / output_file
            audio_config = speechsdk.audio.AudioOutputConfig(filename=str(output_path))
        else:
            audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
        
        # Create synthesizer
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        
        print(f"🎵 Converting SSML to speech...")
        print(f"📝 SSML: {ssml_text[:100]}..." if len(ssml_text) > 100 else f"📝 SSML: {ssml_text}")
        
        # Perform synthesis
        result = synthesizer.speak_ssml_async(ssml_text).get()
        
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print(f"✅ SSML synthesis completed successfully!")
            if output_file:
                print(f"💾 Audio saved to: {output_path}")
                display(Audio(str(output_path)))
                return str(output_path)
            return "Success (played on speakers)"
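        else:
            print(f"❌ SSML synthesis failed: {result.reason}")
            # Surface the underlying cause (e.g. invalid key or malformed SSML)
            if result.reason == speechsdk.ResultReason.Canceled:
                print(f"Error details: {result.cancellation_details.error_details}")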
        
        return None
        
    except Exception as e:
        print(f"❌ Error in SSML synthesis: {e}")
        return None

# SSML examples demonstrating various features
ssml_examples = {
    "Rate_and_Pitch": '''
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voice name="en-US-AriaNeural">
            <prosody rate="slow" pitch="low">This is slow and low.</prosody>
            <break time="1s"/>
            <prosody rate="fast" pitch="high">This is fast and high!</prosody>
        </voice>
    </speak>
    ''',
    
    "Emphasis_and_Pauses": '''
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voice name="en-US-AriaNeural">
            This is <emphasis level="strong">very important</emphasis>.
            <break time="2s"/>
            Let me repeat that again... <break time="1s"/>
            This is <emphasis level="strong">very important</emphasis>!
        </voice>
    </speak>
    ''',
    
    "Multiple_Voices": '''
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voice name="en-US-AriaNeural">
            Hi, I'm Aria.
        </voice>
        <voice name="en-US-DavisNeural">
            And I'm Davis. Nice to meet you!
        </voice>
    </speak>
    '''
}

# Demonstrate SSML examples
ssml_files = []
for name, ssml in ssml_examples.items():
    print(f"\n🎭 SSML Example: {name.replace('_', ' ')}")
    output_file = f"ssml_demo_{name.lower()}.wav"
    result_path = ssml_to_speech(ssml, output_file=output_file)
    if result_path:
        ssml_files.append(result_path)
        time.sleep(1)

print(f"\n✅ SSML demonstration completed! Generated {len(ssml_files)} audio files.")


🎭 SSML Example: Rate and Pitch
🎵 Converting SSML to speech...
📝 SSML: 
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voi...
❌ SSML synthesis failed: ResultReason.Canceled

🎭 SSML Example: Emphasis and Pauses
🎵 Converting SSML to speech...
📝 SSML: 
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voi...
❌ SSML synthesis failed: ResultReason.Canceled

🎭 SSML Example: Multiple Voices
🎵 Converting SSML to speech...
📝 SSML: 
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voi...


## 4. Speech-to-Text (STT)

Speech-to-Text converts spoken audio into written text. Features include:
- **Real-time transcription** from microphone or audio files
- **Batch transcription** for large audio files
- **Custom models** for domain-specific vocabulary
- **Multiple language support** with automatic detection

In [21]:
def speech_to_text_from_microphone():
    """
    Convert a single utterance from the default microphone to text
    
    recognize_once_async() returns after one utterance, which ends at the
    first pause or at the service's maximum utterance length, so no duration
    can be set here. Use continuous_speech_recognition() for longer sessions.
    
    Returns:
        str: Recognized text or None if failed
    """
    try:
        # Create recognizer with microphone
        audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
        
        print("🎤 Listening for a single utterance...")
        print("📢 Please speak now!")
        
        # Start recognition
        result = speech_recognizer.recognize_once_async().get()
        
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print(f"✅ Recognized: '{result.text}'")
            return result.text
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("❌ No speech could be recognized")
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation = result.cancellation_details
            print(f"❌ Speech recognition canceled: {cancellation.reason}")
            if cancellation.reason == speechsdk.CancellationReason.Error:
                print(f"Error details: {cancellation.error_details}")
        
        return None
        
    except Exception as e:
        print(f"❌ Error in speech recognition: {e}")
        return None

# Note: Uncomment the line below to test microphone input
# recognized_text = speech_to_text_from_microphone()
print("💡 To test microphone input, uncomment the line above and run the cell")

💡 To test microphone input, uncomment the line above and run the cell


In [None]:
def speech_to_text_from_file(audio_file_path):
    """
    Convert speech from audio file to text
    
    Args:
        audio_file_path (str): Path to audio file (can be relative or absolute)
    
    Returns:
        str: Recognized text or None if failed
    """
    try:
        # Check if speech_config is available
        if 'speech_config' not in globals():
            print("❌ Speech configuration not available. Please run the configuration cell first.")
            return None
            
        # Handle both absolute and relative paths
        audio_path = Path(audio_file_path)
        if not audio_path.is_absolute():
            # Try in output/speech directory first
            test_path = output_dir / audio_file_path
            if test_path.exists():
                audio_path = test_path
            elif not audio_path.exists():
                print(f"❌ Audio file not found: {audio_file_path}")
                print(f"   Searched in: {audio_path.absolute()} and {test_path.absolute()}")
                return None
        
        if not audio_path.exists():
            print(f"❌ Audio file not found: {audio_path.absolute()}")
            return None
        
        # Create recognizer with audio file
        audio_config = speechsdk.audio.AudioConfig(filename=str(audio_path))
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
        
        print(f"🎵 Processing audio file: {audio_path.name}")
        
        # Start recognition (single utterance: only the audio up to the first pause is transcribed)
        result = speech_recognizer.recognize_once_async().get()
        
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print(f"✅ Recognized: '{result.text}'")
            return result.text
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("❌ No speech could be recognized in the audio file")
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation = result.cancellation_details
            print(f"❌ Speech recognition canceled: {cancellation.reason}")
            if cancellation.reason == speechsdk.CancellationReason.Error:
                print(f"Error details: {cancellation.error_details}")
        
        return None
        
    except Exception as e:
        print(f"❌ Error processing audio file: {e}")
        return None

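# Test speech-to-text with previously generated audio file.
# A failed synthesis can leave an empty file behind, so check the size as well.
sample_file = "sample_speech.wav"
if (output_dir / sample_file).exists() and (output_dir / sample_file).stat().st_size > 0: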
    print("\n🔄 Testing Speech-to-Text with generated audio...")
    recognized_text = speech_to_text_from_file(sample_file)
    if recognized_text:
        print(f"📝 Original text: '{test_text}'")
        print(f"🎯 Recognized text: '{recognized_text}'")
        
        # Calculate similarity
        original_words = test_text.lower().split()
        recognized_words = recognized_text.lower().split()
        common_words = set(original_words) & set(recognized_words)
        similarity = len(common_words) / max(len(original_words), len(recognized_words)) * 100
        print(f"📊 Similarity: {similarity:.1f}%")
else:
    print("\n💡 No audio file found. Run the TTS examples first to generate sample audio.")


🔄 Testing Speech-to-Text with generated audio...
❌ Error processing audio file: Exception with error code: 
[CALL STACK BEGIN]

    > pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - recognizer_create_speech_recognizer_from_config
    - recognizer_create_speech_recognizer_from_config

[CALL STACK END]

Exception with an error code: 0x9 (SPXERR_UNEXPECTED_EOF)
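
One plausible cause of `SPXERR_UNEXPECTED_EOF` here is that the earlier synthesis failed and left behind a WAV file with no audio in it. As a hedged sketch, a pre-check with Python's standard `wave` module can reject such files before they reach the recognizer. `wav_is_usable` is a hypothetical helper, and it assumes PCM WAV input, which is what the notebook's synthesis cells are expected to produce by default.

```python
import wave
from pathlib import Path


def wav_is_usable(path) -> bool:
    """Return True if `path` is a readable WAV file containing at least one audio frame."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return False
    try:
        with wave.open(str(path), "rb") as wav_file:
            # A header-only file parses correctly but reports zero frames
            return wav_file.getnframes() > 0
    except (wave.Error, EOFError):
        # Truncated or non-WAV content
        return False
```

Calling `wav_is_usable(output_dir / sample_file)` before `speech_to_text_from_file(sample_file)` would turn the stack trace above into a simple skip.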

In [23]:
def continuous_speech_recognition(duration_seconds=30):
    """
    Perform continuous speech recognition
    
    Args:
        duration_seconds (int): How long to listen
    
    Returns:
        list: List of recognized text segments
    """
    try:
        recognized_texts = []
        
        # Create recognizer
        audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
        
        # Event handlers
        def recognized_handler(evt):
            if evt.result.text:
                print(f"🎯 Recognized: {evt.result.text}")
                recognized_texts.append(evt.result.text)
        
        def recognizing_handler(evt):
            if evt.result.text:
                print(f"⏳ Recognizing: {evt.result.text}")
        
        # Connect event handlers
        speech_recognizer.recognized.connect(recognized_handler)
        speech_recognizer.recognizing.connect(recognizing_handler)
        
        print(f"🎤 Starting continuous recognition for {duration_seconds} seconds...")
        print("📢 Speak multiple sentences!")
        
        # Start continuous recognition
        speech_recognizer.start_continuous_recognition()
        
        # Wait for the specified duration
        time.sleep(duration_seconds)
        
        # Stop recognition
        speech_recognizer.stop_continuous_recognition()
        
        print(f"\n✅ Continuous recognition completed!")
        print(f"📊 Total segments recognized: {len(recognized_texts)}")
        
        return recognized_texts
        
    except Exception as e:
        print(f"❌ Error in continuous recognition: {e}")
        return []

# Note: Uncomment to test continuous recognition
# print("\n🔄 Testing Continuous Speech Recognition...")
# segments = continuous_speech_recognition(15)  # Listen for 15 seconds
# if segments:
#     print("\n📝 All recognized segments:")
#     for i, segment in enumerate(segments, 1):
#         print(f"   {i}. {segment}")

print("💡 To test continuous recognition, uncomment the code above and run the cell")

💡 To test continuous recognition, uncomment the code above and run the cell
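
The cell above waits with a fixed `time.sleep`, which keeps listening for the full duration even if the session has already ended (for example, when the input is a file that has been fully read). A possible alternative, sketched here, is to block on a `threading.Event` that the recognizer's `session_stopped` and `canceled` events set. `run_until_stopped` is a hypothetical helper; it relies only on the `connect`, `start_continuous_recognition`, and `stop_continuous_recognition` members already used in this notebook.

```python
import threading


def run_until_stopped(recognizer, timeout_seconds=30.0):
    """Run continuous recognition until the session ends or the timeout elapses.

    Returns True if the recognizer signaled completion, False on timeout.
    """
    done = threading.Event()

    def on_stop(evt):
        # Called from an SDK worker thread; just wake up the waiting caller
        done.set()

    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    try:
        finished = done.wait(timeout=timeout_seconds)
    finally:
        # Always release the microphone or file, even if waiting was interrupted
        recognizer.stop_continuous_recognition()
    return finished
```

Connect the `recognized` handler before calling this helper so that results are collected while it waits.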


## 5. Speech Translation

Speech Translation provides real-time translation of spoken language. Features:
- **30+ source languages** supported
- **100+ target languages** for translation
- **Real-time translation** from speech to text or speech
- **Multi-target translation** to multiple languages simultaneously

In [None]:
def speech_translation_demo(source_language="en-US", target_languages=("es", "fr", "de")):
    """
    Demonstrate speech translation capabilities
    
    Args:
        source_language (str): Source language code (e.g., 'en-US')
        target_languages (list): List of target language codes (e.g., ['es', 'fr'])
    
    Returns:
        dict: Translation results
    """
    try:
        # Check if speech configuration is available
        if 'speech_key' not in globals() or not speech_key:
            print("❌ Speech configuration not available. Please run the configuration cell first.")
            return None
            
        # Create translation config
        translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, 
            region=speech_region
        )
        
        # Set source language
        translation_config.speech_recognition_language = source_language
        
        # Add target languages
        for target_lang in target_languages:
            translation_config.add_target_language(target_lang)
        
        # Create recognizer
        audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
        translation_recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config, 
            audio_config=audio_config
        )
        
        print(f"🌐 Speech Translation Setup:")
        print(f"   🎤 Source Language: {source_language}")
        print(f"   🌍 Target Languages: {', '.join(target_languages)}")
        print(f"\n📢 Please speak in {source_language}...")
        
        # Perform translation
        result = translation_recognizer.recognize_once_async().get()
        
        if result.reason == speechsdk.ResultReason.TranslatedSpeech:
            print(f"\n✅ Translation completed!")
            print(f"🎯 Original ({source_language}): '{result.text}'")
            
            translation_results = {
                'original': result.text,
                'translations': {}
            }
            
            print(f"\n🌐 Translations:")
            for target_lang in target_languages:
                if target_lang in result.translations:
                    translated_text = result.translations[target_lang]
                    print(f"   📍 {target_lang}: '{translated_text}'")
                    translation_results['translations'][target_lang] = translated_text
            
            return translation_results
            
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("❌ No speech could be recognized for translation")
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation = result.cancellation_details
            print(f"❌ Translation canceled: {cancellation.reason}")
            if cancellation.reason == speechsdk.CancellationReason.Error:
                print(f"Error details: {cancellation.error_details}")
        
        return None
        
    except Exception as e:
        print(f"❌ Error in speech translation: {e}")
        return None

# Round-trip check: synthesize each phrase, then transcribe the generated audio
def test_translation_with_generated_speech():
    """
    Generate speech in several languages and run speech-to-text on each file.
    No translation is performed; this only exercises the TTS -> STT round trip.
    """
    # Test phrases and the matching voice for each language
    test_phrases = {
        "en-US": "Hello, my name is John and I love technology.",
        "es-ES": "Hola, me llamo Juan y me encanta la tecnología.",
        "fr-FR": "Bonjour, je m'appelle Jean et j'adore la technologie."
    }
    voice_mapping = {
        "en-US": "en-US-AriaNeural",
        "es-ES": "es-ES-ElviraNeural",
        "fr-FR": "fr-FR-DeniseNeural"
    }
    
    for lang, phrase in test_phrases.items():
        print(f"\n🎭 Testing with {lang}: '{phrase}'")
        
        # Generate speech
        audio_file = f"test_{lang.lower().replace('-', '_')}.wav"
        result_path = text_to_speech(phrase, voice_name=voice_mapping[lang], output_file=audio_file)
        if result_path:
            # Recognize in the phrase's own language, then restore the default
            previous_language = speech_config.speech_recognition_language
            speech_config.speech_recognition_language = lang
            try:
                recognized = speech_to_text_from_file(audio_file)
            finally:
                speech_config.speech_recognition_language = previous_language
            if recognized:
                print(f"✅ Recognition successful: '{recognized}'")

# Note: Uncomment to test speech translation
# print("\n🌐 Testing Speech Translation...")
# translation_result = speech_translation_demo()

print("💡 To test speech translation, uncomment the code above")
print("🎵 Or test with generated speech files:")
test_translation_with_generated_speech()

💡 To test speech translation, uncomment the code above
🎵 Or test with generated speech files:

🎭 Testing with en-US: 'Hello, my name is John and I love technology.'
🎵 Converting text to speech...
📝 Text: 'Hello, my name is John and I love technology.'
🗣️ Voice: en-US-AriaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.

🎭 Testing with es-ES: 'Hola, me llamo Juan y me encanta la tecnología.'
🎵 Converting text to speech...
📝 Text: 'Hola, me llamo Juan y me encanta la tecnología.'
🗣️ Voice: es-ES-ElviraNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.


## 6. Available Voices and Languages

Let's explore the available voices and languages in Azure Speech Service.

In [25]:
def get_available_voices():
    """
    Get list of available voices from Azure Speech Service
    
    Returns:
        list: List of available voices with details
    """
    try:
        # Create synthesizer to get voices
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
        
        print("🎭 Fetching available voices...")
        
        # Get voices
        result = synthesizer.get_voices_async().get()
        
        if result.reason == speechsdk.ResultReason.VoicesListRetrieved:
            voices = result.voices
            print(f"✅ Found {len(voices)} available voices")
            
            # Group voices by language
            voices_by_language = {}
            for voice in voices:
                locale = voice.locale
                if locale not in voices_by_language:
                    voices_by_language[locale] = []
                
                voices_by_language[locale].append({
                    'name': voice.short_name,  # e.g. "en-US-AriaNeural", usable as a voice name
                    'gender': voice.gender.name,
                    'voice_type': voice.voice_type.name
                })
            
            # Display popular languages and their voices
            popular_languages = ['en-US', 'en-GB', 'es-ES', 'fr-FR', 'de-DE', 'it-IT', 'ja-JP', 'ko-KR', 'zh-CN']
            
            print("\n🌍 Popular Languages and Voices:")
            for lang in popular_languages:
                if lang in voices_by_language:
                    # Separate names avoid shadowing the SDK `voices` list above
                    lang_voices = voices_by_language[lang]
                    print(f"\n📍 {lang} ({len(lang_voices)} voices):")
                    for info in lang_voices[:5]:  # Show first 5 voices
                        print(f"   🗣️ {info['name']} ({info['gender']}, {info['voice_type']})")
                    if len(lang_voices) > 5:
                        print(f"   ... and {len(lang_voices) - 5} more")
            
            return voices_by_language
            
        else:
            print(f"❌ Failed to retrieve voices: {result.reason}")
            return None
            
    except Exception as e:
        print(f"❌ Error retrieving voices: {e}")
        return None

# Get and display available voices
available_voices = get_available_voices()

🎭 Fetching available voices...
❌ Failed to retrieve voices: ResultReason.Canceled


## 7. Performance Tips and Best Practices

### 🔧 Optimization Techniques
- **Batch Processing**: Process multiple audio files together
- **Streaming**: Use streaming for real-time applications
- **Caching**: Cache frequently used audio outputs
- **Connection Reuse**: Reuse speech service connections
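
As an illustration of the caching idea, the sketch below stores synthesized audio on disk under a key derived from the voice and the text, so repeated requests skip the service call. `cached_synthesis` and its `synthesize` argument are hypothetical: `synthesize` stands for any callable that takes `(text, voice)` and returns WAV bytes, for instance a wrapper that returns the `audio_data` of a successful synthesis result.

```python
import hashlib
from pathlib import Path


def cached_synthesis(text, voice, synthesize, cache_dir="output/speech/cache"):
    """Return (path, cache_hit) for the audio of `text` spoken by `voice`."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # The key covers both voice and text, so a different voice never reuses old audio
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    target = cache_dir / f"{digest}.wav"

    # Ignore empty files left behind by an earlier failed write
    if target.exists() and target.stat().st_size > 0:
        return target, True

    audio_bytes = synthesize(text, voice)  # only called on a cache miss
    target.write_bytes(audio_bytes)
    return target, False
```

A hash-based file name avoids problems with long or unusual text in paths; the trade-off is that cached files are not human-readable without a separate index.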

### 🛡️ Security Best Practices
- **Use Managed Identity** when possible
- **Rotate API keys** regularly
- **Monitor usage** and set up alerts
- **Secure audio data** in transit and at rest

In [None]:
import time
from datetime import datetime
from pathlib import Path

def performance_test_tts(text_samples, voice_name="en-US-AriaNeural"):
    """
    Test text-to-speech performance with multiple samples
    
    Args:
        text_samples (list): List of text samples to test
        voice_name (str): Voice to use for synthesis
    
    Returns:
        dict: Performance metrics
    """
    print("⚡ Performance Testing - Text-to-Speech")
    print(f"🎭 Voice: {voice_name}")
    print(f"📊 Samples: {len(text_samples)}")
    
    results = {
        'total_time': 0,
        'successful_conversions': 0,
        'failed_conversions': 0,
        'average_time_per_sample': 0,
        'samples_per_minute': 0,
        'generated_files': []
    }
    
    successful_time = 0.0  # Time spent on successful conversions only
    start_time = time.perf_counter()
    
    for i, text in enumerate(text_samples, 1):
        print(f"\n🔄 Processing sample {i}/{len(text_samples)}...")
        sample_start = time.perf_counter()
        
        output_file = f"perf_test_{i}.wav"
        result_path = text_to_speech(
            text, 
            voice_name=voice_name, 
            output_file=output_file
        )
        
        sample_time = time.perf_counter() - sample_start
        
        if result_path:
            results['successful_conversions'] += 1
            results['generated_files'].append(result_path)
            successful_time += sample_time
            print(f"✅ Completed in {sample_time:.2f} seconds")
        else:
            results['failed_conversions'] += 1
            print(f"❌ Failed after {sample_time:.2f} seconds")
    
    results['total_time'] = time.perf_counter() - start_time
    
    # Average over successful conversions only, so failed attempts do not skew the metric
    if results['successful_conversions'] > 0:
        results['average_time_per_sample'] = successful_time / results['successful_conversions']
        results['samples_per_minute'] = 60 / results['average_time_per_sample']
    
    # Display results
    print("\n📊 Performance Test Results:")
    print(f"   ⏱️ Total Time: {results['total_time']:.2f} seconds")
    print(f"   ✅ Successful: {results['successful_conversions']}")
    print(f"   ❌ Failed: {results['failed_conversions']}")
    print(f"   📈 Average Time/Sample: {results['average_time_per_sample']:.2f} seconds")
    print(f"   🚀 Samples/Minute: {results['samples_per_minute']:.1f}")
    
    return results

# Performance test samples
perf_test_samples = [
    "Short text for testing.",
    "This is a medium length text sample that contains more words and should take a bit longer to process.",
    "This is a much longer text sample that contains significantly more content and words, which should demonstrate how the speech synthesis performance scales with longer input text. It includes multiple sentences and various punctuation marks to test the system thoroughly.",
    "Quick test!",
    "Azure Speech Services provide powerful text-to-speech and speech-to-text capabilities."
]

# Run performance test
print("\n⚡ Running TTS Performance Test...")
perf_results = performance_test_tts(perf_test_samples)

# Show generated files
if perf_results['generated_files']:
    print("\n📁 Generated Files:")
    for file_path in perf_results['generated_files']:
        file_size = Path(file_path).stat().st_size
        print(f"   🎵 {Path(file_path).name} ({file_size:,} bytes)")


⚡ Running TTS Performance Test...
⚡ Performance Testing - Text-to-Speech
🎭 Voice: en-US-AriaNeural
📊 Samples: 5

🔄 Processing sample 1/5...
🎵 Converting text to speech...
📝 Text: 'Short text for testing.'
🗣️ Voice: en-US-AriaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.
❌ Failed after 1.03 seconds

🔄 Processing sample 2/5...
🎵 Converting text to speech...
📝 Text: 'This is a medium length text sample that contains more words and should take a bit longer to process.'
🗣️ Voice: en-US-AriaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.
❌ Failed after 1.03 seconds


## 8. Troubleshooting and Common Issues

### 🔧 Common Problems and Solutions

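1. **Authentication Issues**
   - ❌ Error: "Invalid subscription key" or "WebSocket upgrade failed: Authentication error (401)"
   - ✅ Solution: Check that your API key is valid and that the configured region matches the region of your Speech resource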

2. **Audio Issues**
   - ❌ Error: "No microphone detected"
   - ✅ Solution: Check microphone permissions and hardware

3. **Rate Limiting**
   - ❌ Error: "Too many requests"
   - ✅ Solution: Implement retry logic with exponential backoff

4. **Language Support**
   - ❌ Error: "Language not supported"
   - ✅ Solution: Check available languages and update configuration
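
The retry advice for rate limiting could look something like the sketch below. `retry_with_backoff` is a hypothetical helper, not an SDK function, and it assumes the wrapped operation returns `None` on failure, which matches how this notebook checks the result of `text_to_speech`.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], Optional[T]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `operation` until it returns a non-None value or attempts run out.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...), is
    capped at max_delay, and gets up to 10% random jitter added on top.
    """
    for attempt in range(max_attempts):
        result = operation()
        if result is not None:
            return result
        if attempt < max_attempts - 1:  # No need to wait after the final attempt
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
    return None
```

A call might look like `retry_with_backoff(lambda: text_to_speech("Hello", output_file="retry_test.wav"))`. Retrying only helps with transient failures such as throttling. An authentication error (401) will fail on every attempt until the key or region is corrected.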

In [None]:
def run_diagnostics():
    """
    Run diagnostic tests for Azure Speech Service setup
    """
    print("🔧 Running Azure Speech Service Diagnostics...")
    print("=" * 60)
    
    # Test 1: Configuration Check
    print("\n1️⃣ Configuration Check:")
    if 'speech_key' in globals() and speech_key and speech_key != "your-speech-key-here":
        print("   ✅ API Key: Configured")
    else:
        print("   ❌ API Key: Not configured")
    
    if 'speech_region' in globals() and speech_region:
        print(f"   ✅ Region: {speech_region}")
    else:
        print("   ❌ Region: Not configured")
    
    # Test 2: Output Directory
    print("\n2️⃣ Output Directory Check:")
    if output_dir.exists():
        print(f"   ✅ Output Directory: {output_dir.absolute()}")
        files_count = len(list(output_dir.glob('*.wav')))
        print(f"   📁 Audio files in directory: {files_count}")
    else:
        print("   ❌ Output Directory: Not found")
    
    # Test 3: Service Connectivity
    print("\n3️⃣ Service Connectivity:")
    try:
        if 'speech_config' in globals():
            # Try to get voices (this tests connectivity)
            synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
            result = synthesizer.get_voices_async().get()
            if result.reason == speechsdk.ResultReason.VoicesListRetrieved:
                print(f"   ✅ Connection: Success ({len(result.voices)} voices available)")
            else:
                print(f"   ❌ Connection: Failed ({result.reason})")
        else:
            print("   ❌ Connection: Speech config not available")
    except Exception as e:
        print(f"   ❌ Connection: Error - {e}")
    
    # Test 4: Basic TTS Test
    print("\n4️⃣ Basic Text-to-Speech Test:")
    test_file = "diagnostic_test.wav"
    result_path = text_to_speech(
        "This is a diagnostic test.", 
        output_file=test_file
    )
    if result_path:
        print("   ✅ TTS: Working correctly")
        file_size = Path(result_path).stat().st_size
        print(f"   📁 File size: {file_size:,} bytes")
    else:
        print("   ❌ TTS: Not working")
    
    # Test 5: Audio File Processing
    print("\n5️⃣ Audio File Processing Test:")
    if result_path and Path(result_path).exists():
        stt_result = speech_to_text_from_file(test_file)
        if stt_result:
            print("   ✅ STT: Working correctly")
            print(f"   📝 Recognized: '{stt_result}'")
        else:
            print("   ❌ STT: Not working")
    else:
        print("   ❌ Audio file not found for STT test")
    
    # Test 6: File Listing
    print("\n6️⃣ Generated Files Summary:")
    audio_files = list(output_dir.glob('*.wav'))
    if audio_files:
        print(f"   📊 Total audio files: {len(audio_files)}")
        total_size = sum(f.stat().st_size for f in audio_files)
        print(f"   💾 Total size: {total_size:,} bytes ({total_size/1024/1024:.1f} MB)")
        print("   📁 Recent files:")
        for file in sorted(audio_files, key=lambda x: x.stat().st_mtime)[-5:]:
            print(f"      🎵 {file.name}")
    else:
        print("   📁 No audio files found")
    
    print("\n" + "=" * 60)
    print("🏁 Diagnostics completed!")

# Run diagnostics
run_diagnostics()

🔧 Running Azure Speech Service Diagnostics...

1️⃣ Configuration Check:
   ✅ API Key: Configured
   ✅ Region: eastus

2️⃣ Service Connectivity:
   ❌ Connection: Failed (ResultReason.Canceled)

3️⃣ Basic Text-to-Speech Test:
🎵 Converting text to speech...
📝 Text: 'This is a diagnostic test.'
🗣️ Voice: en-US-AriaNeural
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.
   ❌ TTS: Not working

4️⃣ Audio File Processing Test:
❌ Speech synthesis canceled: CancellationReason.Error
Error details: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 by

## 9. Interactive Demo

Try these interactive features with your own content!

In [None]:
# 🎯 Interactive Speech Demo
# Customize these variables and run the cell!

# Text-to-Speech Demo
your_text = "Replace this with your own text to convert to speech!"
your_voice = "en-US-AriaNeural"  # Try: en-GB-SoniaNeural, es-ES-ElviraNeural, fr-FR-DeniseNeural

# Uncomment to test your text
# print("🎵 Converting your text to speech...")
# result = text_to_speech(your_text, voice_name=your_voice, output_file="your_speech.wav")
# if result:
#     print(f"✅ Your audio saved to: {result}")

# SSML Demo
your_ssml = '''
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        Welcome to <emphasis level="strong">Azure Speech Services</emphasis>!
        <break time="1s"/>
        <prosody rate="slow">This is amazing technology.</prosody>
    </voice>
</speak>
'''

# Uncomment to test your SSML
# print("\n🎭 Converting your SSML to speech...")
# result = ssml_to_speech(your_ssml, output_file="your_ssml_speech.wav")
# if result:
#     print(f"✅ Your SSML audio saved to: {result}")

print("💡 Customize the variables above and uncomment the code to test!")
print("🎤 Available features:")
print("   • Text-to-Speech with different voices")
print("   • SSML for advanced speech control")
print("   • Speech-to-Text from microphone")
print("   • Speech translation between languages")
print(f"\n📁 All audio files will be saved to: {output_dir.absolute()}")

# Show current audio files
audio_files = list(output_dir.glob('*.wav'))
if audio_files:
    print(f"\n🎵 Current audio files ({len(audio_files)}):")
    for file in sorted(audio_files)[-10:]:  # Show last 10 files
        file_size = file.stat().st_size
        print(f"   🎶 {file.name} ({file_size:,} bytes)")
else:
    print("\n📁 No audio files generated yet. Run the examples above to create some!")

💡 Customize the variables above and uncomment the code to test!
🎤 Available features:
   • Text-to-Speech with different voices
   • SSML for advanced speech control
   • Speech-to-Text from microphone
   • Speech translation between languages


## 10. Conclusion and Next Steps

🎉 **Congratulations!** You've successfully explored Azure AI Speech Services capabilities including:
- ✅ Text-to-Speech with neural voices
- ✅ Speech-to-Text recognition
- ✅ SSML for advanced speech control
- ✅ Speech translation
- ✅ Performance optimization
- ✅ Diagnostic tools

### 🚀 Next Steps
1. **Explore other notebooks** in this series:
   - Azure AI Language Services ✅
   - Azure AI Vision Services ➡️
   - Azure AI Document Intelligence ➡️

2. **Build real applications** using Speech Services:
   - Voice assistants and chatbots
   - Accessibility tools for visually impaired users
   - Language learning applications
   - Real-time transcription services
   - Multi-language customer support

3. **Advanced features to explore**:
   - Custom Neural Voice creation
   - Speaker Recognition and Verification
   - Conversation Transcription
   - Pronunciation Assessment
   - Audio Content Creation

### 📚 Additional Resources
- [Azure Speech Service Documentation](https://docs.microsoft.com/azure/cognitive-services/speech-service/)
- [Speech SDK Samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk)
- [Voice Gallery](https://speech.microsoft.com/portal/voicegallery)
- [SSML Reference](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup)
- [Speech Service Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/)

### 🔗 Useful Links
- [Speech Studio Portal](https://speech.microsoft.com/portal) - Test and customize speech models
- [Language Support](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support) - Complete language list
- [Voice Samples](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#neural-voices) - Listen to voice samples

**Happy coding with Azure Speech Services! 🎤🎵**