# 🔊 Converting Text into Audio with OpenAI TTS

## Learn to Generate Professional Audio for IT Support Scenarios

## 📚 Introduction

### What You'll Learn
In this notebook, you'll learn how to use OpenAI's Text-to-Speech (TTS) API to convert written text into professional audio files. You'll create real-world IT support announcements and notifications that can be used in automated phone systems, monitoring dashboards, and multi-channel communications.

### 🎯 The Business Problem

IT support teams face several communication challenges:

- **Time-consuming manual recording**: Creating voice announcements for every system update requires recording time and equipment
- **Inconsistent messaging**: Different team members may deliver the same information differently
- **24/7 availability needs**: Multi-shift teams need standardized audio communications that sound professional at any time
- **Multi-channel communication**: Teams need to deliver updates via phone systems, monitoring dashboards, and accessibility features
- **Quick response requirements**: When incidents occur, teams need to generate and deploy announcements quickly

### 💡 The Solution: Text-to-Speech

TTS technology can **quickly** convert written system updates into professional audio files, enabling:
- Consistent, high-quality voice communications
- Rapid deployment of announcements
- Easy updates without re-recording
- Professional sound quality without recording equipment

### 🔑 Key Concepts

- **Text-to-Speech (TTS)**: Technology that converts written text into spoken audio
- **Voice Synthesis**: The process of generating human-like speech from text
- **Audio Formats**: Different file types (MP3, WAV, etc.) for storing audio
- **Voice Selection**: Choosing different voices for different contexts

### 💰 Pricing Model

OpenAI's TTS API charges **per character processed**, making it cost-effective for generating announcements on-demand.
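Here is a minimal cost estimator that makes the per-character model concrete. It assumes USD rates of $15 per million characters for `tts-1` and $30 for `tts-1-hd`, the published prices at the time of writing. Check the pricing page before budgeting. `TTS_PRICE_PER_MILLION_CHARS` and `estimate_tts_cost` are names introduced here for illustration.

```python
# Assumed USD prices per 1 million input characters (verify on the OpenAI pricing page)
TTS_PRICE_PER_MILLION_CHARS = {
    "tts-1": 15.00,
    "tts-1-hd": 30.00,
}


def estimate_tts_cost(text: str, model: str = "tts-1") -> float:
    """Return the estimated cost in USD of converting `text` with `model`."""
    if model not in TTS_PRICE_PER_MILLION_CHARS:
        raise ValueError(f"No price configured for model {model!r}")
    return len(text) / 1_000_000 * TTS_PRICE_PER_MILLION_CHARS[model]


# Example: a 500-character announcement
announcement = "x" * 500
print(f"tts-1:    ${estimate_tts_cost(announcement):.4f}")              # $0.0075
print(f"tts-1-hd: ${estimate_tts_cost(announcement, 'tts-1-hd'):.4f}")  # $0.0150
```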

### 🏢 Real-World Applications in IT Support

- ✅ **Automated phone system announcements** - "Your call is important to us..."
- ✅ **Voice alerts for monitoring dashboards** - "Critical: Database server is down"
- ✅ **Accessibility features for documentation** - Making written docs available as audio
- ✅ **Multi-language support notifications** - Generate announcements in multiple languages
- ✅ **Standard operating procedure (SOP) audio guides** - Voice versions of written procedures

## 🔧 Setup

### Install Dependencies

First, let's install the OpenAI Python library:

In [None]:
!pip install -q openai

### Import Required Libraries

In [None]:
import os
from pathlib import Path
from openai import OpenAI
from IPython.display import Audio, display

### Configure OpenAI API Key

💡 **Recommended**: Store your API key in Colab secrets for security
- Go to 🔑 (left sidebar)
- Click "Add new secret"
- Name: `OPENAI_API_KEY`
- Value: Your OpenAI API key

In [None]:
# Configure OpenAI API key
# Method 1: Try to get the API key from Colab secrets (recommended)
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    print("✅ API key loaded from Colab secrets")
except Exception:
    # Method 2: Manual input (fallback when not running in Colab or the secret is missing)
    from getpass import getpass
    print("💡 To use Colab secrets: Go to 🔑 (left sidebar) → Add new secret → Name: OPENAI_API_KEY")
    OPENAI_API_KEY = getpass("Enter your OpenAI API Key: ")

# Validate the API key before using it
if not OPENAI_API_KEY or OPENAI_API_KEY.strip() == "":
    raise ValueError("❌ ERROR: No API key provided!")

# Set the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

print("✅ Authentication configured!")

# Text-generation model for the incident example (speech itself is generated with tts-1)
OPENAI_MODEL = "gpt-5-nano"  # Using gpt-5-nano for cost efficiency
print(f"🤖 Selected Model: {OPENAI_MODEL}")

### Initialize OpenAI Client

In [None]:
# Initialize the OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

print("✅ OpenAI client initialized!")

### Create Audio Files Directory

In [None]:
# Create directory for storing audio files
audio_dir = Path("/content/audio_files")
audio_dir.mkdir(parents=True, exist_ok=True)

print(f"✅ Audio directory created: {audio_dir}")

### 🎤 TTS Configuration

**Model**: We'll use `tts-1` (standard quality, cost-effective for most use cases)

**Available Voices** (this notebook uses these six; the API offers others as well):
- `alloy` - Neutral, balanced voice
- `echo` - Clear, professional voice  
- `fable` - Warm, expressive voice
- `onyx` - Deep, authoritative voice (recommended for professional IT announcements)
- `nova` - Friendly, energetic voice
- `shimmer` - Soft, calm voice

**Audio Format**: MP3 (default, widely compatible)
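If you want every cell to choose voices consistently, you could keep the recommendations above in one lookup table. This is a sketch. The message categories, `VOICE_BY_MESSAGE_TYPE`, and `pick_voice` are hypothetical, and the voice choices are subjective starting points, not API guidance.

```python
# Hypothetical mapping from message type to voice, based on the descriptions above;
# adjust it after listening to each voice yourself
VOICE_BY_MESSAGE_TYPE = {
    "incident": "onyx",          # deep, authoritative
    "maintenance": "onyx",
    "general": "alloy",          # neutral, balanced
    "welcome": "nova",           # friendly, energetic
    "accessibility": "shimmer",  # soft, calm
}

DEFAULT_VOICE = "alloy"


def pick_voice(message_type: str) -> str:
    """Return the configured voice for a message type, or the default voice."""
    return VOICE_BY_MESSAGE_TYPE.get(message_type.lower(), DEFAULT_VOICE)


print(pick_voice("incident"))  # onyx
print(pick_voice("unknown"))   # alloy
```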

## 🎓 Understanding Text-to-Speech

### How TTS Works

Text-to-Speech technology analyzes written text and generates synthetic speech that sounds natural and human-like. Conceptually, the process involves:

1. **Text Analysis**: Breaking down the text into words, sentences, and punctuation
2. **Pronunciation**: Determining how each word should be pronounced
3. **Prosody Generation**: Adding natural rhythm, intonation, and pacing
4. **Audio Synthesis**: Creating the actual audio waveform

### 📝 When Text Works Well for TTS

✅ **Good for TTS**:
- Clear, grammatically correct sentences
- Proper punctuation (helps with pacing and pauses)
- Standard words and common abbreviations
- Well-formatted text with natural flow

⚠️ **Problematic for TTS**:
- Excessive special characters (@, #, $, etc.)
- Complex technical abbreviations (might be mispronounced)
- All caps text (may sound unnatural)
- Unclear formatting or run-on sentences
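You can turn the checklist above into a rough pre-flight check and run it before you spend characters on generation. This is a heuristic sketch. It assumes the following are worth flagging: special characters, words of four or more capital letters, sentences over 25 words, and missing end punctuation. `lint_tts_text` is a hypothetical helper, and its thresholds are arbitrary.

```python
import re

# Assumed list of characters that TTS may read aloud literally or mispronounce
SPECIAL_CHARS = set("@#$%&*/\\|<>{}[]~^_")


def lint_tts_text(text: str, max_words_per_sentence: int = 25) -> list:
    """Return warnings about text that may sound poor when converted to speech."""
    warnings = []
    stripped = text.strip()

    found = sorted({ch for ch in stripped if ch in SPECIAL_CHARS})
    if found:
        warnings.append(f"Special characters: {' '.join(found)}")

    # Heuristic: 4+ consecutive capitals suggest shouting or an unexpanded acronym
    caps = re.findall(r"\b[A-Z]{4,}\b", stripped)
    if caps:
        warnings.append(f"All-caps words: {', '.join(caps)}")

    # Split into sentences after ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", stripped):
        word_count = len(sentence.split())
        if word_count > max_words_per_sentence:
            warnings.append(f"Long sentence ({word_count} words): {sentence[:40]}...")

    if stripped and stripped[-1] not in ".!?":
        warnings.append("Text does not end with sentence punctuation")
    return warnings


for warning in lint_tts_text("URGENT: DB srv down @14:30 #outage"):
    print("⚠️", warning)
```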

### 🔧 Key TTS API Parameters

- **`model`**: `tts-1` (standard quality, cost-effective)
- **`voice`**: A built-in voice such as alloy, echo, fable, onyx, nova, or shimmer
- **`input`**: The text to convert (up to 4,096 characters per request)
- **`response_format`**: Audio format - `mp3` (default), `opus`, `aac`, `flac`, `wav`, `pcm`
- **`speed`**: Optional speaking speed from `0.25` to `4.0` (default `1.0`)
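You can validate these parameters locally before sending a request. The sketch below assumes the limits documented for the speech endpoint at the time of writing:

- 4,096 input characters
- output formats `mp3`, `opus`, `aac`, `flac`, `wav`, and `pcm`
- an optional `speed` between 0.25 and 4.0

`build_speech_params` is a hypothetical helper that returns keyword arguments for the same `create` call used throughout this notebook.

```python
# Assumed limits for the speech endpoint at the time of writing (verify in the API reference)
MAX_INPUT_CHARS = 4096
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_params(text: str, voice: str = "onyx", model: str = "tts-1",
                        response_format: str = "mp3", speed: float = 1.0) -> dict:
    """Validate inputs locally and return keyword arguments for audio.speech.create()."""
    text = text.strip()
    if not text:
        raise ValueError("Input text is empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"Input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "voice": voice, "input": text,
            "response_format": response_format, "speed": speed}


params = build_speech_params("The VPN is back online.", response_format="wav", speed=0.9)
print(params)

# Usage with this notebook's client:
# with client.audio.speech.with_streaming_response.create(**params) as response:
#     response.stream_to_file(audio_dir / "vpn_online.wav")
```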

### 🎯 Simple "Hello World" Example

Let's start with a simple test to verify everything works:

In [None]:
# Simple test text
test_text = "Hello, this is a test of the text-to-speech system."

# Save to file path
test_file = audio_dir / "test.mp3"

# Generate audio and save using streaming response
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="onyx",
    input=test_text
) as response:
    response.stream_to_file(test_file)

print(f"✅ Test audio generated successfully!")
print(f"📁 File saved to: {test_file}")

# Play the audio in the notebook
print(f"\n🎧 Listen to the audio:")
display(Audio(test_file))
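This notebook uses the same streaming `with` block for every audio file, so it may help to wrap it once. This is a minimal sketch. `synthesize_to_file` is a hypothetical helper. Inferring `response_format` from the file extension is this example's own design choice, and it assumes extensions like `.mp3` or `.wav` that match the API's format names.

```python
from pathlib import Path


def synthesize_to_file(client, text: str, path, voice: str = "onyx",
                       model: str = "tts-1") -> Path:
    """Convert `text` to speech and stream the audio into `path`.

    `client` is assumed to be an `openai.OpenAI` instance (or any object with the
    same `audio.speech.with_streaming_response.create` method).
    """
    path = Path(path)
    # Assumption: the file extension matches an API format name, e.g. ".mp3" -> "mp3"
    response_format = path.suffix.lstrip(".").lower() or "mp3"
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
        response_format=response_format,
    ) as response:
        response.stream_to_file(path)
    return path


# Usage with the client and directory created earlier in this notebook:
# synthesize_to_file(client, test_text, audio_dir / "test.wav")
```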

## 📢 Basic Example: System Status Announcement

Let's create a realistic system status notification for a planned maintenance window.

### The Written Announcement

Here's our planned maintenance notification:

---

**Attention all users.** 

This is a scheduled maintenance notification for the customer database system. 

Maintenance will begin on Saturday, January 20th at 2:00 AM Eastern Time and is expected to last approximately 4 hours. 

During this time, the customer database will be unavailable. All other systems will remain operational. 

If you have any questions, please contact the IT helpdesk at extension 5500. 

Thank you for your patience.

---

In [None]:
# System status notification text
system_status_text = """
Attention all users. 

This is a scheduled maintenance notification for the customer database system. 

Maintenance will begin on Saturday, January 20th at 2:00 AM Eastern Time and is expected to last approximately 4 hours. 

During this time, the customer database will be unavailable. All other systems will remain operational. 

If you have any questions, please contact the IT helpdesk at extension 5500. 

Thank you for your patience.
""".strip()

# Display character count for cost estimation
char_count = len(system_status_text)
print(f"📊 Character count: {char_count} characters")
print(f"💰 Estimated cost: ~${(char_count / 1_000_000) * 15:.6f} USD (assuming $15 per 1M characters for tts-1)\n")

# Save to file path
status_file = audio_dir / "system_status.mp3"

# Generate audio using tts-1 with professional "onyx" voice
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="onyx",  # Professional, authoritative voice
    input=system_status_text
) as response:
    response.stream_to_file(status_file)

# Display results
file_size = status_file.stat().st_size
print("\n" + "="*50)
print("✅ System Status Announcement Generated!")
print("="*50)
print(f"\n📄 Original text:")
print(system_status_text)
print(f"\n📁 Audio file: {status_file}")
print(f"📦 File size: {file_size:,} bytes ({file_size/1024:.1f} KB)")
print(f"🎤 Voice used: onyx")
print(f"🔊 Format: MP3")

# Play the audio in the notebook
print(f"\n🎧 Listen to the audio:")
display(Audio(status_file))
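Because the announcement is just text, a new maintenance window only needs new values, not a new recording. Below is a sketch of a template function. `build_maintenance_announcement` and its parameters are hypothetical, and the example date and system are made up.

```python
def build_maintenance_announcement(system: str, start: str, duration_hours: int,
                                   extension: str) -> str:
    """Fill a maintenance-notice template (hypothetical helper) with window details."""
    unit = "hour" if duration_hours == 1 else "hours"
    paragraphs = [
        "Attention all users.",
        f"This is a scheduled maintenance notification for the {system}.",
        f"Maintenance will begin on {start} and is expected to last "
        f"approximately {duration_hours} {unit}.",
        f"During this time, the {system} will be unavailable. "
        "All other systems will remain operational.",
        f"If you have any questions, please contact the IT helpdesk at extension {extension}.",
        "Thank you for your patience.",
    ]
    # Blank lines between paragraphs, like system_status_text above
    return "\n\n".join(paragraphs)


# Example values are made up for illustration
email_notice = build_maintenance_announcement(
    system="email system",
    start="Sunday, February 4th at 1:00 AM Eastern Time",
    duration_hours=2,
    extension="5500",
)
print(email_notice)
# The result can be passed as input= to the same TTS call used above.
```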

## 🚨 Practical Example: Incident Notification

In this example, we'll use a **two-step process**:
1. **Generate professional notification text** using `gpt-5-nano`
2. **Convert that text to audio** using `tts-1`

This simulates a real-world scenario where you need to quickly create and deploy an incident announcement.

### Step 1: Generate Incident Notification Text with GPT-5-Nano

In [None]:
# Real-world incident scenario
incident_scenario = "Email service outage affecting all users, resolved after 2 hours by restarting mail server cluster"

print("🔄 Generating professional incident notification...\n")

# Generate professional notification text using gpt-5-nano
notification_response = client.responses.create(
    model=OPENAI_MODEL,
    input=[
        {
            "role": "developer",
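            "content": "You are an IT communications specialist. Create professional incident notification announcements for phone systems. Keep it concise (under 100 words), clear, and professional. Include: what happened, impact, resolution, and apology. Write plain spoken sentences only, with no Markdown, bullet points, or special characters, because the text will be read aloud by a text-to-speech system."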
        },
        {
            "role": "user",
            "content": incident_scenario
        }
    ]
)

# Extract generated notification text
incident_notification = notification_response.output_text

# Display generated text
print("="*50)
print("📢 Generated Incident Notification")
print("="*50)
print(incident_notification)
print("\n" + "="*50)
print(f"📊 Character count: {len(incident_notification)} characters")

### Step 2: Convert Text to Audio with TTS-1

In [None]:
print("\n🔊 Converting notification to audio...\n")

# Save to file path
incident_file = audio_dir / "incident_notification.mp3"

# Generate audio from the notification text
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="onyx",  # Professional voice for incident notifications
    input=incident_notification
) as response:
    response.stream_to_file(incident_file)

# Display results
file_size = incident_file.stat().st_size
print("="*50)
print("✅ Incident Notification Audio Created!")
print("="*50)
print(f"\n📁 Audio file: {incident_file}")
print(f"📦 File size: {file_size:,} bytes ({file_size/1024:.1f} KB)")
print(f"🎤 Voice: onyx")
print(f"🔊 Format: MP3")
print("\n💡 After review, this audio can be deployed to your phone system (convert it first if your system requires a specific audio format).")

# Play the audio in the notebook
print(f"\n🎧 Listen to the audio:")
display(Audio(incident_file))

## 📝 Text Formatting Best Practices for TTS

### Why Text Formatting Matters

The way you format your text has a **significant impact** on audio quality. Well-formatted text produces natural-sounding speech, while poorly formatted text can sound choppy, unclear, or robotic.

### ✅ Best Practices

1. **Use proper punctuation** - Periods, commas, and other punctuation create natural pauses
2. **Spell out abbreviations** - Avoid acronyms that might be mispronounced (e.g., "IT" vs "Information Technology")
3. **Handle numbers carefully** - Spell out numbers when clarity is needed ("two hours" vs "2 hours")
4. **Keep sentences reasonably short** - Long sentences can sound rushed or unclear
5. **Avoid special characters** - Characters like @, #, $, etc. may be mispronounced
6. **Use complete words** - Avoid text-speak abbreviations like "pls", "thx", "u"

### 🔬 Comparison Example

Let's compare BAD formatting vs GOOD formatting:

#### ❌ BAD TEXT Example

**Problem**: Excessive abbreviations, special characters, text-speak

In [None]:
# BAD formatting example
bad_text = "DB srv down @14:30. ETA 2h. Pls wait. Thx."

print("❌ BAD TEXT:")
print(bad_text)
print(f"\nCharacter count: {len(bad_text)}")

# Save to file path
bad_file = audio_dir / "bad_formatting.mp3"

# Generate audio
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="onyx",
    input=bad_text
) as response:
    response.stream_to_file(bad_file)

print(f"\n📁 Audio saved to: {bad_file}")
print("🎧 Listen to hear how abbreviations and special characters sound unclear")

# Play the audio in the notebook
print(f"\n🎧 Listen to the audio:")
display(Audio(bad_file))

#### ✅ GOOD TEXT Example

**Solution**: Clear, complete sentences with proper punctuation

In [None]:
# GOOD formatting example
good_text = "The database server is currently down as of 2:30 PM. The estimated resolution time is 2 hours. Please wait for further updates. Thank you for your patience."

print("✅ GOOD TEXT:")
print(good_text)
print(f"\nCharacter count: {len(good_text)}")

# Save to file path
good_file = audio_dir / "good_formatting.mp3"

# Generate audio
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="onyx",
    input=good_text
) as response:
    response.stream_to_file(good_file)

print(f"\n📁 Audio saved to: {good_file}")
print("🎧 Listen to hear how clear formatting produces natural-sounding speech")

# Play the audio in the notebook
print(f"\n🎧 Listen to the audio:")
display(Audio(good_file))

### 🎧 Compare the Results

Listen to both audio files:
- `bad_formatting.mp3` - Notice how abbreviations sound unclear or robotic
- `good_formatting.mp3` - Notice the natural flow and clarity

### 💡 Key Takeaway

**Spending a few extra seconds to format your text properly will dramatically improve the quality and professionalism of your audio output.**

The good example is longer (156 characters vs 42), but the audio quality and clarity are significantly better. For professional IT communications, clarity is more important than brevity.

## 🎯 Best Practices & Key Takeaways

### ✅ Best Practices Summary

1. **Write clear, grammatically correct text** - TTS works best with well-written content
2. **Use proper punctuation** - Creates natural speech rhythm and pauses
3. **Test audio before production deployment** - Always listen to generated audio before using it live
4. **Choose appropriate voice for context** - `onyx` for professional/authoritative, `nova` for friendly, etc.
5. **Keep announcements concise but informative** - Balance brevity with clarity
6. **Store audio files with descriptive names** - Use names like `system_status.mp3`, `incident_notification.mp3`
7. **Track generation costs for budgeting** - Monitor character counts to estimate API costs
8. **Spell out ambiguous abbreviations** - "Information Technology" instead of "I.T."
9. **Use complete sentences** - Avoid fragments that might sound choppy
10. **Consider your audience** - Professional tone for external communications, friendly for internal
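Practice 6 (descriptive file names) is easy to automate. This sketch assumes a `<kind>_<YYYYMMDD-HHMM>.<ext>` naming pattern, which is this example's own convention, not an OpenAI requirement. `announcement_path` is a hypothetical helper.

```python
import re
from datetime import datetime
from pathlib import Path


def announcement_path(audio_dir, kind, when=None, extension="mp3"):
    """Return a descriptive, sortable path such as incident_notification_20240120-0200.mp3."""
    when = when or datetime.now()
    # "Incident Notification" -> "incident_notification"
    slug = re.sub(r"[^a-z0-9]+", "_", kind.lower()).strip("_")
    return Path(audio_dir) / f"{slug}_{when:%Y%m%d-%H%M}.{extension}"


path = announcement_path("/content/audio_files", "Incident Notification",
                         when=datetime(2024, 1, 20, 2, 0))
print(path)  # /content/audio_files/incident_notification_20240120-0200.mp3
```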

### 🎯 When to Use TTS in IT Support

#### ✅ Excellent Use Cases

- **Standardized system notifications** - Consistent messaging across all announcements
- **Automated phone system messages** - Hold messages, menu options, status updates
- **Accessibility features** - Making documentation available as audio
- **Multi-language announcements** - Generate same message in multiple languages
- **Scheduled maintenance announcements** - Pre-planned communications
- **Training materials** - Converting written guides to audio format
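Training materials and SOPs are often longer than a single request allows. This sketch splits text at sentence boundaries, assuming the 4,096-character per-request input limit documented at the time of writing. `chunk_text_for_tts` is a hypothetical helper, and each chunk would be generated as its own audio file.

```python
import re

# Assumed per-request input limit for the speech endpoint (verify in the API docs)
MAX_INPUT_CHARS = 4096


def chunk_text_for_tts(text: str, max_chars: int = MAX_INPUT_CHARS) -> list:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split as a last resort
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate          # sentence still fits in the current chunk
        else:
            chunks.append(current)       # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sop_text = "Step one. Restart the service. " * 300   # roughly 9,300 characters
chunks = chunk_text_for_tts(sop_text)
print(len(chunks), [len(c) for c in chunks])
```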

#### ⚠️ Use with Review

- **Emergency communications** - Test thoroughly before deployment
- **Complex technical instructions** - Ensure all terms are pronounced correctly
- **Legal/compliance messages** - Verify accuracy and tone
- **Customer-facing announcements** - Quality check for professionalism

#### ❌ Not Recommended For

- **Real-time conversational systems** - Use speech-to-speech or other technologies instead
- **Highly personalized messages** - May lack the warmth of human voice
- **Situations requiring immediate human judgment** - TTS only speaks the text it is given; it does not make decisions

### 💰 Cost Optimization Tips

1. Use `tts-1` (standard quality) instead of `tts-1-hd` for most use cases
2. Keep text concise while maintaining clarity
3. Cache frequently-used announcements instead of regenerating
4. Monitor character counts before generating
5. Batch generate multiple announcements in one session (this saves time; billing is still per character)
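Tip 3 (caching) can be implemented by deriving the file name from every input that affects the audio, so identical requests reuse an existing file. In this sketch, `cached_audio_path` and `get_or_create_audio` are hypothetical helpers. `generate` is any function you supply that writes audio to the given path.

```python
import hashlib
from pathlib import Path


def cached_audio_path(cache_dir, text: str, voice: str, model: str = "tts-1",
                      response_format: str = "mp3") -> Path:
    """Derive a deterministic file path from every input that changes the audio."""
    key = "\n".join([model, voice, response_format, text.strip()])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.{response_format}"


def get_or_create_audio(cache_dir, text, voice, generate, model="tts-1",
                        response_format="mp3") -> Path:
    """Call generate(path) only if no cached file exists for these inputs."""
    path = cached_audio_path(cache_dir, text, voice, model, response_format)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        generate(path)
    return path


# Hypothetical wiring with this notebook's client; keep model/voice in generate()
# consistent with the arguments passed to get_or_create_audio():
# def generate(path):
#     with client.audio.speech.with_streaming_response.create(
#         model="tts-1", voice="onyx", input=good_text
#     ) as response:
#         response.stream_to_file(path)
#
# audio_file = get_or_create_audio(audio_dir, good_text, "onyx", generate)
```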

### 🚀 Next Steps

Now that you've learned the basics of TTS, try:
- Creating announcements for your own IT environment
- Experimenting with different voices for different message types
- Building a library of standard announcements
- Integrating TTS into your automated phone system or monitoring tools
- Generating multi-language versions of important announcements

### 📚 Additional Resources

- OpenAI TTS API Documentation: https://platform.openai.com/docs/guides/text-to-speech
- OpenAI Pricing: https://openai.com/pricing
- Audio best practices for professional communications

## 🎉 Congratulations!

You've successfully learned how to:
- ✅ Set up and configure OpenAI's TTS API
- ✅ Generate professional audio from text
- ✅ Create realistic IT support announcements
- ✅ Combine GPT-5-nano text generation with TTS for automated workflows
- ✅ Apply best practices for text formatting
- ✅ Understand when to use TTS in real-world scenarios

**You're now ready to create professional audio announcements for your IT support environment!** 🎊