A modern, interactive web application for testing OpenAI's Realtime API with real-time voice interactions. This project provides both a web-based interface and a Python implementation for bi-directional audio conversations with OpenAI's realtime speech-to-speech models.
- Real-time Voice Interaction - Live audio conversations with AI
- Audio Playback - Hear AI responses in real-time
- Live Transcription - See conversation transcripts as you speak
- Event Logging - Monitor all WebSocket events in real-time
- Modern UI - Clean, responsive design with visual feedback
- Download Transcripts - Export conversation history
- Audio Visualizer - Visual feedback during recording
- Python Implementation - Async WebSocket client using `websockets`
- PyAudio Integration - Direct audio capture and playback
- Server VAD - Automatic speech detection
- Transcription - Automatic Whisper transcription of user input
- Configurable - Customizable voice, temperature, and VAD settings
- Modern web browser (Chrome, Firefox, Edge, Safari)
- Python 3.8+ (for Python implementation)
- OpenAI API key with Realtime API access
- Microphone access
1. Clone or download this repository

   ```bash
   cd RealtimeAPI
   ```

2. Configure your API key

   Open `script.js` and update the API key:

   ```javascript
   const API_KEY = 'your-api-key-here';
   ```

3. Start a local server

   ```bash
   # Using Python
   python -m http.server 3000

   # Or using Node.js
   npx http-server -p 3000
   ```

4. Open in browser

   Navigate to `http://localhost:3000` in your web browser.

5. Start Using

   - Click Connect to establish the WebSocket connection
   - Click Start Recording to begin speaking
   - The AI will respond with audio when you finish speaking
   - View transcripts in the Conversation panel
1. Install dependencies

   ```bash
   pip install websockets pyaudio python-dotenv
   ```

2. Set up environment

   Create a `.env` file:

   ```
   OPENAI_API_KEY=your-api-key-here
   ```

3. Run the client

   ```bash
   python example.py
   ```

4. Commands

   - `a` - Record and send audio
   - `t` - Send text message
   - `q` - Quit application
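For orientation, the sketch below shows one way a client like `example.py` might open the Realtime WebSocket connection and send an initial `session.update`. It assumes `websockets<14.0` (hence `extra_headers`) and reuses the model name from the web configuration below; treat the header names and exact payload as assumptions to verify against `example.py` and the official documentation.

```python
import asyncio
import json
import os

import websockets  # pinned to websockets<14.0 per the troubleshooting notes


async def connect():
    # Model name taken from REALTIME_API_URL below; adjust if you use another model.
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-mini-2025-10-06"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # assumption: beta header expected by the Realtime API
    }
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Configure the session (same fields as the script.js session block below).
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "voice": "alloy",
                "turn_detection": {"type": "server_vad", "threshold": 0.5},
            },
        }))
        print("First server event:", json.loads(await ws.recv()).get("type"))


if __name__ == "__main__":
    asyncio.run(connect())
```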
```
RealtimeAPI/
├── index.html        # Main web interface
├── script.js         # WebSocket client & audio handling
├── style.css         # Modern UI styling
├── audio-test.html   # Audio testing utility
├── example.py        # Python implementation
└── README.md         # This file
```
In `script.js`, you can customize:

```javascript
// Model Selection
const REALTIME_API_URL = 'wss://api.openai.com/v1/realtime?model=gpt-realtime-mini-2025-10-06';

// Session Configuration
session: {
    modalities: ['text', 'audio'],
    voice: 'alloy', // alloy, echo, shimmer, ash, ballad, coral, sage, verse
    input_audio_format: 'pcm16',
    output_audio_format: 'pcm16',
    temperature: 0.8,
    max_response_output_tokens: 4096,
    turn_detection: {
        type: 'server_vad',
        threshold: 0.5, // 0.0-1.0
        prefix_padding_ms: 300,
        silence_duration_ms: 500
    }
}
```

In `example.py`, customize:
```python
client = RealtimeClient(
    instructions="Your custom instructions",
    voice="ash"  # alloy, echo, shimmer, ash, ballad, coral, sage, verse
)

# VAD Configuration
VAD_config = {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 600
}
```

- alloy - Neutral, balanced
- echo - Warm, expressive
- shimmer - Clear, articulate
- ash - Relaxed, natural
- ballad - Smooth, theatrical
- coral - Friendly, upbeat
- sage - Wise, calm
- verse - Poetic, expressive
- Format: PCM16 (16-bit Linear PCM)
- Sample Rate: 24,000 Hz
- Channels: Mono (1 channel)
- Encoding: Base64 for WebSocket transmission
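As a concrete illustration of this format, here is a hedged sketch that captures one chunk of microphone audio with PyAudio at 24 kHz mono PCM16 and base64-encodes it the way an `input_audio_buffer.append` event expects. The chunk size and variable names are illustrative, not taken from the project code.

```python
import base64
import json

import pyaudio

RATE = 24_000   # 24,000 Hz sample rate
CHUNK = 1024    # frames per read; illustrative value

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,  # PCM16 (16-bit Linear PCM)
                channels=1,              # mono
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

raw = stream.read(CHUNK)  # 16-bit PCM bytes straight from the microphone
event = {
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(raw).decode("ascii"),  # base64 for WebSocket transmission
}
payload = json.dumps(event)  # ready to send over the WebSocket connection

stream.stop_stream()
stream.close()
p.terminate()
```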
Client → Server:

- `session.update` - Configure session settings
- `input_audio_buffer.append` - Send audio chunks
- `input_audio_buffer.commit` - Finalize audio input (manual VAD)
- `response.create` - Request AI response
- `conversation.item.create` - Send text messages
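These are plain JSON messages sent over the WebSocket. Assuming an open `websockets` connection `ws` and a base64-encoded chunk `audio_b64` (see the audio sketch above), a manual (non-VAD) turn might be sent like this:

```python
import json


async def send_turn(ws, audio_b64: str):
    # Append an audio chunk to the server-side input buffer.
    await ws.send(json.dumps({
        "type": "input_audio_buffer.append",
        "audio": audio_b64,
    }))
    # Without server VAD, commit the buffer to mark the end of the user's turn...
    await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
    # ...and explicitly request a response from the model.
    await ws.send(json.dumps({"type": "response.create"}))
```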
Server → Client:

- `session.created` / `session.updated` - Session status
- `input_audio_buffer.speech_started` / `speech_stopped` - VAD events
- `conversation.item.input_audio_transcription.completed` - User speech transcription
- `response.output_audio.delta` - Audio response chunks
- `response.output_audio.done` - Audio response complete
- `response.audio_transcript.done` - AI response transcription
- `response.done` - Response complete
- `error` - Error events
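On the receive side, a dispatch loop over the event `type` is enough for a minimal client. The sketch below decodes `response.output_audio.delta` chunks back to PCM16 and writes them to a PyAudio output stream; field names such as `delta` and `transcript` are assumptions based on the event names above, and error handling is omitted.

```python
import base64
import json

import pyaudio


async def receive_events(ws):
    p = pyaudio.PyAudio()
    out = p.open(format=pyaudio.paInt16, channels=1, rate=24_000, output=True)
    try:
        async for message in ws:
            event = json.loads(message)
            etype = event.get("type")
            if etype == "response.output_audio.delta":
                # Base64 chunk -> raw PCM16 bytes -> speakers.
                out.write(base64.b64decode(event["delta"]))
            elif etype == "conversation.item.input_audio_transcription.completed":
                print("You said:", event.get("transcript"))
            elif etype == "response.audio_transcript.done":
                print("AI said:", event.get("transcript"))
            elif etype == "error":
                print("Error event:", event)
    finally:
        out.close()
        p.terminate()
```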
- Check model access: Ensure your API key has access to the Realtime API
- Verify model name: Use `gpt-realtime-mini-2025-10-06` or the latest available model
- Check console: Look for error messages in browser DevTools (F12)
- Audio permissions: Ensure browser has microphone permissions
- Speaker volume: Check system audio settings
- API Key: Verify your API key is valid and has Realtime API access
- CORS: Use a local server (not the `file://` protocol)
- Network: Check for firewall or proxy blocking WebSocket connections
- Browser: Try a different browser (Chrome recommended)
- Model not found: Update `REALTIME_API_URL` to use the latest model
- Rate limits: Check your OpenAI account usage limits
- Invalid request: Review session configuration parameters
- PyAudio installation:

  ```bash
  # Windows
  pip install pipwin
  pipwin install pyaudio

  # macOS
  brew install portaudio
  pip install pyaudio

  # Linux
  sudo apt-get install portaudio19-dev python3-pyaudio
  ```

- Websockets version: Use `websockets<14.0`

  ```bash
  pip install 'websockets<14.0'
  ```
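After installing, a quick sanity check is to ask PyAudio to list the audio devices it can see:

```python
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f"{i}: {info['name']} (inputs: {info['maxInputChannels']})")
p.terminate()
```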
Open DevTools (F12) to see detailed logs:

```
// Event logs show:
Event: response.output_audio.delta
OUTPUT Audio delta received!
Buffer now has 5 chunks (12800 bytes)
OUTPUT Audio done!
Playing...
```

- Never commit API keys to version control
- Use environment variables for API keys in production
- Rotate keys regularly following security best practices
- Monitor usage in your OpenAI dashboard
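On the Python side, the `python-dotenv` dependency installed earlier covers the environment-variable approach; a minimal pattern looks like this:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from the .env file created earlier
api_key = os.environ["OPENAI_API_KEY"]  # fails loudly if the key is missing
```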
Contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
This project is open source and available under the MIT License.
Built with:
- OpenAI Realtime API
- Web Audio API
- WebSocket API
- Modern CSS & JavaScript
Note: This is a testing and development tool. For production applications, implement proper error handling, authentication, and security measures.