A modern, interactive web application for testing OpenAI's Realtime API with real-time voice interactions. This project provides both a web-based interface and a Python implementation for bi-directional audio conversations with OpenAI's realtime speech-to-speech models.
- Real-time Voice Interaction - Live audio conversations with AI
- Audio Playback - Hear AI responses in real-time
- Live Transcription - See conversation transcripts as you speak
- Event Logging - Monitor all WebSocket events in real-time
- Modern UI - Clean, responsive design with visual feedback
- Download Transcripts - Export conversation history
- Audio Visualizer - Visual feedback during recording
- Python Implementation - Async WebSocket client using `websockets`
- PyAudio Integration - Direct audio capture and playback
- Server VAD - Automatic speech detection
- Transcription - Automatic Whisper transcription of user input
- Configurable - Customizable voice, temperature, and VAD settings
- Modern web browser (Chrome, Firefox, Edge, Safari)
- Python 3.8+ (for Python implementation)
- OpenAI API key with Realtime API access
- Microphone access
1. Clone or download this repository

   ```bash
   cd RealtimeAPI
   ```

2. Configure your API key

   Open `script.js` and update the API key:

   ```javascript
   const API_KEY = 'your-api-key-here';
   ```

3. Start a local server

   ```bash
   # Using Python
   python -m http.server 3000

   # Or using Node.js
   npx http-server -p 3000
   ```

4. Open in browser

   Navigate to `http://localhost:3000` in your web browser.

5. Start Using

   - Click Connect to establish the WebSocket connection
   - Click Start Recording to begin speaking
   - The AI will respond with audio when you finish speaking
   - View transcripts in the Conversation panel
1. Install dependencies

   ```bash
   pip install websockets pyaudio python-dotenv
   ```

2. Set up environment

   Create a `.env` file:

   ```
   OPENAI_API_KEY=your-api-key-here
   ```

3. Run the client

   ```bash
   python example.py
   ```

4. Commands

   - `a` - Record and send audio
   - `t` - Send text message
   - `q` - Quit application
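For orientation, the sketch below shows one way a client like `example.py` might open the Realtime WebSocket connection and send an initial `session.update`. It assumes `websockets<14.0` (hence `extra_headers`) and reuses the model name from the web configuration below; treat the header names and exact payload as assumptions to verify against `example.py` and the official documentation.

```python
import asyncio
import json
import os

import websockets  # pinned to websockets<14.0 per the troubleshooting notes


async def connect():
    # Model name taken from REALTIME_API_URL below; adjust if you use another model.
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-mini-2025-10-06"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # assumption: beta header expected by the Realtime API
    }
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Configure the session (same fields as the script.js session block below).
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "voice": "alloy",
                "turn_detection": {"type": "server_vad", "threshold": 0.5},
            },
        }))
        print("First server event:", json.loads(await ws.recv()).get("type"))


if __name__ == "__main__":
    asyncio.run(connect())
```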
```
RealtimeAPI/
├── index.html        # Main web interface
├── script.js         # WebSocket client & audio handling
├── style.css         # Modern UI styling
├── audio-test.html   # Audio testing utility
├── example.py        # Python implementation
└── README.md         # This file
```
In `script.js`, you can customize:

```javascript
// Model Selection
const REALTIME_API_URL = 'wss://api.openai.com/v1/realtime?model=gpt-realtime-mini-2025-10-06';

// Session Configuration
session: {
    modalities: ['text', 'audio'],
    voice: 'alloy', // alloy, echo, shimmer, ash, ballad, coral, sage, verse
    input_audio_format: 'pcm16',
    output_audio_format: 'pcm16',
    temperature: 0.8,
    max_response_output_tokens: 4096,
    turn_detection: {
        type: 'server_vad',
        threshold: 0.5, // 0.0-1.0
        prefix_padding_ms: 300,
        silence_duration_ms: 500
    }
}
```

In `example.py`, customize:
```python
client = RealtimeClient(
    instructions="Your custom instructions",
    voice="ash"  # alloy, echo, shimmer, ash, ballad, coral, sage, verse
)

# VAD Configuration
VAD_config = {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 600
}
```

- alloy - Neutral, balanced
- echo - Warm, expressive
- shimmer - Clear, articulate
- ash - Relaxed, natural
- ballad - Smooth, theatrical
- coral - Friendly, upbeat
- sage - Wise, calm
- verse - Poetic, expressive
- Format: PCM16 (16-bit Linear PCM)
- Sample Rate: 24,000 Hz
- Channels: Mono (1 channel)
- Encoding: Base64 for WebSocket transmission
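As a concrete illustration of this format, here is a hedged sketch that captures one chunk of microphone audio with PyAudio at 24 kHz mono PCM16 and base64-encodes it the way an `input_audio_buffer.append` event expects. The chunk size and variable names are illustrative, not taken from the project code.

```python
import base64
import json

import pyaudio

RATE = 24_000   # 24,000 Hz sample rate
CHUNK = 1024    # frames per read; illustrative value

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,  # PCM16 (16-bit Linear PCM)
                channels=1,              # mono
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

raw = stream.read(CHUNK)  # 16-bit PCM bytes straight from the microphone
event = {
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(raw).decode("ascii"),  # base64 for WebSocket transmission
}
payload = json.dumps(event)  # ready to send over the WebSocket connection

stream.stop_stream()
stream.close()
p.terminate()
```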
Client → Server:

- `session.update` - Configure session settings
- `input_audio_buffer.append` - Send audio chunks
- `input_audio_buffer.commit` - Finalize audio input (manual VAD)
- `response.create` - Request AI response
- `conversation.item.create` - Send text messages
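These are plain JSON messages sent over the WebSocket. Assuming an open `websockets` connection `ws` and a base64-encoded chunk `audio_b64` (see the audio sketch above), a manual (non-VAD) turn might be sent like this:

```python
import json


async def send_turn(ws, audio_b64: str):
    # Append an audio chunk to the server-side input buffer.
    await ws.send(json.dumps({
        "type": "input_audio_buffer.append",
        "audio": audio_b64,
    }))
    # Without server VAD, commit the buffer to mark the end of the user's turn...
    await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
    # ...and explicitly request a response from the model.
    await ws.send(json.dumps({"type": "response.create"}))
```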
Server → Client:

- `session.created` / `session.updated` - Session status
- `input_audio_buffer.speech_started` / `speech_stopped` - VAD events
- `conversation.item.input_audio_transcription.completed` - User speech transcription
- `response.output_audio.delta` - Audio response chunks
- `response.output_audio.done` - Audio response complete
- `response.audio_transcript.done` - AI response transcription
- `response.done` - Response complete
- `error` - Error events
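On the receive side, a dispatch loop over the event `type` is enough for a minimal client. The sketch below decodes `response.output_audio.delta` chunks back to PCM16 and writes them to a PyAudio output stream; field names such as `delta` and `transcript` are assumptions based on the event names above, and error handling is omitted.

```python
import base64
import json

import pyaudio


async def receive_events(ws):
    p = pyaudio.PyAudio()
    out = p.open(format=pyaudio.paInt16, channels=1, rate=24_000, output=True)
    try:
        async for message in ws:
            event = json.loads(message)
            etype = event.get("type")
            if etype == "response.output_audio.delta":
                # Base64 chunk -> raw PCM16 bytes -> speakers.
                out.write(base64.b64decode(event["delta"]))
            elif etype == "conversation.item.input_audio_transcription.completed":
                print("You said:", event.get("transcript"))
            elif etype == "response.audio_transcript.done":
                print("AI said:", event.get("transcript"))
            elif etype == "error":
                print("Error event:", event)
    finally:
        out.close()
        p.terminate()
```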
- Check model access: Ensure your API key has access to the Realtime API
- Verify model name: Use `gpt-realtime-mini-2025-10-06` or the latest available model
- Check console: Look for error messages in browser DevTools (F12)
- Audio permissions: Ensure browser has microphone permissions
- Speaker volume: Check system audio settings
- API Key: Verify your API key is valid and has Realtime API access
- CORS: Use a local server (not the `file://` protocol)
- Network: Check for firewall or proxy blocking WebSocket connections
- Browser: Try a different browser (Chrome recommended)
- Model not found: Update `REALTIME_API_URL` to use the latest model
- Rate limits: Check your OpenAI account usage limits
- Invalid request: Review session configuration parameters
- PyAudio installation:

  ```bash
  # Windows
  pip install pipwin
  pipwin install pyaudio

  # macOS
  brew install portaudio
  pip install pyaudio

  # Linux
  sudo apt-get install portaudio19-dev python3-pyaudio
  ```

- Websockets version: Use `websockets<14.0`

  ```bash
  pip install 'websockets<14.0'
  ```
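After installing, a quick sanity check is to ask PyAudio to list the audio devices it can see:

```python
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f"{i}: {info['name']} (inputs: {info['maxInputChannels']})")
p.terminate()
```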
Open DevTools (F12) to see detailed logs:

```
// Event logs show:
Event: response.output_audio.delta
OUTPUT Audio delta received!
Buffer now has 5 chunks (12800 bytes)
OUTPUT Audio done!
Playing...
```

- Never commit API keys to version control
- Use environment variables for API keys in production
- Rotate keys regularly following security best practices
- Monitor usage in your OpenAI dashboard
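On the Python side, the `python-dotenv` dependency installed earlier covers the environment-variable approach; a minimal pattern looks like this:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from the .env file created earlier
api_key = os.environ["OPENAI_API_KEY"]  # fails loudly if the key is missing
```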
Contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
This project is open source and available under the MIT License.
Built with:
- OpenAI Realtime API
- Web Audio API
- WebSocket API
- Modern CSS & JavaScript
Note: This is a testing and development tool. For production applications, implement proper error handling, authentication, and security measures.