OpenAI Realtime API Tester

A modern, interactive web application for testing OpenAI's Realtime API with real-time voice interactions. The project provides both a web-based interface and a Python client for bi-directional audio conversations with OpenAI's realtime speech-to-speech models.


✨ Features

Web Interface

  • πŸŽ™οΈ Real-time Voice Interaction - Live audio conversations with AI
  • πŸ”Š Audio Playback - Hear AI responses in real-time
  • πŸ“ Live Transcription - See conversation transcripts as you speak
  • πŸ“Š Event Logging - Monitor all WebSocket events in real-time
  • 🎨 Modern UI - Clean, responsive design with visual feedback
  • πŸ’Ύ Download Transcripts - Export conversation history
  • 🎚️ Audio Visualizer - Visual feedback during recording

Python Client

  • 🐍 Python Implementation - Async WebSocket client using websockets
  • 🎤 PyAudio Integration - Direct audio capture and playback
  • 🔄 Server VAD - Automatic speech detection
  • 📜 Transcription - Automatic Whisper transcription of user input
  • ⚙️ Configurable - Customizable voice, temperature, and VAD settings

🚀 Quick Start

Prerequisites

  • Modern web browser (Chrome, Firefox, Edge, Safari)
  • Python 3.8+ (for Python implementation)
  • OpenAI API key with Realtime API access
  • Microphone access

Web Application Setup

  1. Clone or download this repository

    cd RealtimeAPI
  2. Configure your API key

    Open script.js and update the API key:

    const API_KEY = 'your-api-key-here';
  3. Start a local server

    # Using Python
    python -m http.server 3000
    
    # Or using Node.js
    npx http-server -p 3000
  4. Open in browser

    Navigate to http://localhost:3000 in your web browser

  5. Start Using

    • Click Connect to establish WebSocket connection
    • Click Start Recording to begin speaking
    • The AI will respond with audio when you finish speaking
    • View transcripts in the Conversation panel

Python Client Setup

  1. Install dependencies

    pip install websockets pyaudio python-dotenv
  2. Set up environment

    Create a .env file:

    OPENAI_API_KEY=your-api-key-here
  3. Run the client

    python example.py
  4. Commands

    • a - Record and send audio
    • t - Send text message
    • q - Quit application
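The connection the client opens in step 3 can be sketched as follows. This is a hedged illustration, not the actual `example.py`: the model name mirrors this README's configuration, and the exact header names should be verified against the current API reference.

```python
# Minimal sketch of opening a Realtime API session with the
# `websockets` package this README installs. Header names and the
# model query parameter are assumptions taken from this README.
import asyncio
import json
import os


URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-mini-2025-10-06"


def auth_headers(api_key: str) -> dict:
    # Bearer token plus the realtime beta header.
    return {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",
    }


async def main() -> None:
    import websockets  # deferred so the helper above works without it

    headers = auth_headers(os.environ["OPENAI_API_KEY"])
    # websockets < 14 takes `extra_headers` (>= 14 renamed it
    # `additional_headers`), matching the version pin in Troubleshooting.
    async with websockets.connect(URL, extra_headers=headers) as ws:
        first_event = json.loads(await ws.recv())
        print(first_event["type"])  # the server opens with a session event


# asyncio.run(main())  # uncomment with OPENAI_API_KEY set
```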

πŸ“ Project Structure

RealtimeAPI/
├── index.html          # Main web interface
├── script.js           # WebSocket client & audio handling
├── style.css           # Modern UI styling
├── audio-test.html     # Audio testing utility
├── example.py          # Python implementation
└── README.md           # This file

🔧 Configuration

Web Client Configuration

In script.js, you can customize:

// Model Selection
const REALTIME_API_URL = 'wss://api.openai.com/v1/realtime?model=gpt-realtime-mini-2025-10-06';

// Session Configuration
session: {
    modalities: ['text', 'audio'],
    voice: 'alloy',                    // alloy, echo, shimmer, ash, ballad, coral, sage, verse
    input_audio_format: 'pcm16',
    output_audio_format: 'pcm16',
    temperature: 0.8,
    max_response_output_tokens: 4096,
    turn_detection: {
        type: 'server_vad',
        threshold: 0.5,                // 0.0-1.0
        prefix_padding_ms: 300,
        silence_duration_ms: 500
    }
}

Python Client Configuration

In example.py, customize:

client = RealtimeClient(
    instructions="Your custom instructions",
    voice="ash"  # alloy, echo, shimmer, ash, ballad, coral, sage, verse
)

# VAD Configuration
VAD_config = {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 600
}

🎯 Available Voices

  • alloy - Neutral, balanced
  • echo - Warm, expressive
  • shimmer - Clear, articulate
  • ash - Relaxed, natural
  • ballad - Smooth, theatrical
  • coral - Friendly, upbeat
  • sage - Wise, calm
  • verse - Poetic, expressive

πŸ› οΈ Technical Details

Audio Specifications

  • Format: PCM16 (16-bit Linear PCM)
  • Sample Rate: 24,000 Hz
  • Channels: Mono (1 channel)
  • Encoding: Base64 for WebSocket transmission
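The spec above can be exercised with a small round-trip helper: 16-bit little-endian mono samples are packed to raw PCM and base64-encoded before being placed in a WebSocket event. This is a minimal sketch using only the standard library; the function names are illustrative, not from the project.

```python
# Convert between lists of 16-bit samples and the base64 string payload
# the Realtime API expects inside audio events (PCM16, mono, 24 kHz).
import base64
import struct

SAMPLE_RATE = 24_000  # Hz, mono, 16-bit linear PCM


def encode_chunk(samples: list[int]) -> str:
    # Pack signed 16-bit little-endian samples, then base64-encode.
    pcm16 = struct.pack(f"<{len(samples)}h", *samples)
    return base64.b64encode(pcm16).decode("ascii")


def decode_chunk(payload: str) -> list[int]:
    # Reverse: base64-decode, then unpack two bytes per sample.
    pcm16 = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(pcm16) // 2}h", pcm16))
```

At 24 kHz mono PCM16, one second of audio is 48,000 bytes before base64 expansion.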

WebSocket Events Handled

Client β†’ Server:

  • session.update - Configure session settings
  • input_audio_buffer.append - Send audio chunks
  • input_audio_buffer.commit - Finalize audio input (manual VAD)
  • response.create - Request AI response
  • conversation.item.create - Send text messages
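The client → server events above are plain JSON messages. A hedged sketch of how a few of them could be built, with session fields mirroring the web client configuration shown earlier (treat the exact field set as an assumption to check against the API reference):

```python
# Build client -> server Realtime API events as JSON strings.
import json


def session_update(voice: str = "alloy") -> str:
    # session.update: configure modalities, voice, formats, and VAD.
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {"type": "server_vad", "threshold": 0.5},
        },
    })


def append_audio(b64_chunk: str) -> str:
    # input_audio_buffer.append: one base64-encoded PCM16 chunk.
    return json.dumps({"type": "input_audio_buffer.append", "audio": b64_chunk})


def user_text(text: str) -> str:
    # conversation.item.create: send a text message as the user.
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    })
```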

Server β†’ Client:

  • session.created / session.updated - Session status
  • input_audio_buffer.speech_started / speech_stopped - VAD events
  • conversation.item.input_audio_transcription.completed - User speech transcription
  • response.output_audio.delta - Audio response chunks
  • response.output_audio.done - Audio response complete
  • response.audio_transcript.done - AI response transcription
  • response.done - Response complete
  • error - Error events
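On the receiving side, a client typically dispatches on the event `type`, buffering audio deltas and collecting transcripts. The sketch below does that bookkeeping using the event names as listed above; it is illustrative, not the actual `script.js` or `example.py` logic.

```python
# Minimal dispatcher for server -> client Realtime API events.
import base64


class EventHandler:
    def __init__(self) -> None:
        self.audio = bytearray()          # decoded PCM16 response audio
        self.transcripts: list[str] = []  # completed AI transcripts

    def handle(self, event: dict) -> None:
        etype = event.get("type", "")
        if etype == "response.output_audio.delta":
            # Each delta carries a base64-encoded PCM16 chunk.
            self.audio.extend(base64.b64decode(event["delta"]))
        elif etype == "response.audio_transcript.done":
            self.transcripts.append(event["transcript"])
        elif etype == "error":
            raise RuntimeError(event.get("error"))
        # Other event types (session.*, VAD events, response.done)
        # would be logged or used to update UI state.
```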

πŸ” Troubleshooting

No Audio Output

  1. Check model access: Ensure your API key has access to the Realtime API
  2. Verify model name: Use gpt-realtime-mini-2025-10-06 or the latest available realtime model
  3. Check console: Look for error messages in browser DevTools (F12)
  4. Audio permissions: Ensure browser has microphone permissions
  5. Speaker volume: Check system audio settings

Connection Issues

  1. API Key: Verify your API key is valid and has Realtime API access
  2. CORS: Use a local server (not file:// protocol)
  3. Network: Check for firewall or proxy blocking WebSocket connections
  4. Browser: Try a different browser (Chrome recommended)

Response Failed Errors

  • Model not found: Update REALTIME_API_URL to use the latest model
  • Rate limits: Check your OpenAI account usage limits
  • Invalid request: Review session configuration parameters

Python Client Issues

  1. PyAudio installation:

    # Windows (recent PyAudio releases ship prebuilt wheels)
    pip install pyaudio
    
    # macOS
    brew install portaudio
    pip install pyaudio
    
    # Linux (Debian/Ubuntu)
    sudo apt-get install portaudio19-dev python3-pyaudio
  2. Websockets version: Use websockets<14.0

    pip install 'websockets<14.0'

📊 Browser Console Debugging

Open DevTools (F12) to see detailed logs:

// Event logs show:
📨 Event: response.output_audio.delta
🎵 OUTPUT Audio delta received!
Buffer now has 5 chunks (12800 bytes)
🎵 OUTPUT Audio done!
▶️ Playing...

πŸ” Security Notes

  • Never commit API keys to version control
  • Use environment variables for API keys in production
  • Rotate keys regularly following security best practices
  • Monitor usage in your OpenAI dashboard
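For the Python client, the first two notes amount to reading the key from the environment (the `.env` file from setup is loaded into it by python-dotenv) instead of hard-coding it. A minimal stdlib-only sketch, with an illustrative function name:

```python
# Read the API key from the environment rather than source code.
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it or add it to .env")
    return key
```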


🤝 Contributing

Contributions are welcome! Feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests
  • Improve documentation

📄 License

This project is open source and available under the MIT License.

πŸ™ Acknowledgments

Built with:


Note: This is a testing and development tool. For production applications, implement proper error handling, authentication, and security measures.
