# Prerequisites

### Before You Begin
1. Sign up for a Twilio account and get a phone number with voice capabilities
2. Have an OpenAI API key

### Configuration Parameters
- `SYSTEM_MESSAGE`: Configures the behavior and personality of the AI
- `VOICE`: Controls the AI's voice responses (options: alloy, echo, shimmer)
- `LOG_EVENT_TYPES`: Determines which events from the OpenAI API to log
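
A quick sanity check on these parameters before starting the server can save a failed call later. The sketch below is illustrative only: `validate_config` and `SUPPORTED_VOICES` are hypothetical names, and the allowed voices are simply the three listed above.

```python
import os

# Voices named in the parameter list above; treated here as the allowed set.
SUPPORTED_VOICES = {"alloy", "echo", "shimmer"}


def validate_config(api_key, voice, log_event_types):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not api_key:
        problems.append("OPENAI_API_KEY is not set")
    if voice not in SUPPORTED_VOICES:
        problems.append(f"VOICE {voice!r} is not one of {sorted(SUPPORTED_VOICES)}")
    if not all(isinstance(t, str) for t in log_event_types):
        problems.append("LOG_EVENT_TYPES must contain only strings")
    return problems


if __name__ == "__main__":
    issues = validate_config(os.getenv("OPENAI_API_KEY"), "alloy", ["error"])
    print(issues or "configuration looks fine")
```

Returning a list of problems instead of raising on the first one lets you report every misconfiguration at once.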

In [1]:
import os
import json
import base64
import asyncio
import websockets
from fastapi import FastAPI, WebSocket, Request
from fastapi.responses import HTMLResponse
from fastapi.websockets import WebSocketDisconnect
from twilio.twiml.voice_response import VoiceResponse, Connect, Say, Stream
from dotenv import load_dotenv

load_dotenv()

# Configuration
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PORT = int(os.getenv("PORT", 5050))
SYSTEM_MESSAGE = (
    "You are a helpful and bubbly AI assistant who loves to chat about "
    "anything the user is interested in and is prepared to offer them facts. "
    "You have a penchant for dad jokes, owl jokes, and rickrolling – subtly. "
    "Always stay positive, but work in a joke when appropriate."
)
VOICE = "alloy"
LOG_EVENT_TYPES = [
    "error",
    "response.content.done",
    "rate_limits.updated",
    "response.done",
    "input_audio_buffer.committed",
    "input_audio_buffer.speech_stopped",
    "input_audio_buffer.speech_started",
    "session.created",
]
SHOW_TIMING_MATH = False

app = FastAPI()

# FastAPI Routes Setup

## Route Overview
We implement two essential routes:

1. Root route (`/`)
   - Basic server health check
   - Useful for testing server status

2. Incoming call route (`/incoming-call`)
   - Handles incoming Twilio calls
   - Returns TwiML instructions
   - Controls call flow and behavior

## Implementation Details

### Route Handler
- `/incoming-call` serves as the entry point
- Processes all of Twilio's incoming call requests
- Functions as the dedicated gateway for phone traffic

### TwiML Implementation
- Twilio's XML dialect for voice handling
- Provides structured instruction set
- Essential for call flow control

### Helper Library Benefits
- Utilizes Twilio's Python Helper library
- Eliminates need for raw XML
- Reduces syntax errors
- Improves code readability

### Response Flow
1. Greets the caller, pauses for one second, and invites them to start talking
2. Connects the call to a WebSocket at `/media-stream` using `<Connect>` and `<Stream>`
3. Creates a bidirectional communication channel
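
To make the TwiML structure concrete, here is a minimal sketch that builds roughly the same document by hand with the standard library, i.e. the raw XML that the helper library spares you from writing. `build_twiml` and the host `example.ngrok.app` are hypothetical; the notebook itself uses `VoiceResponse` instead.

```python
import xml.etree.ElementTree as ET


def build_twiml(host, greeting):
    """Build a <Response> that speaks a greeting, pauses, then opens a stream."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = greeting
    ET.SubElement(response, "Pause", length="1")
    connect = ET.SubElement(response, "Connect")
    # Twilio opens a WebSocket to this URL and streams the call audio over it.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/media-stream")
    return ET.tostring(response, encoding="unicode")


# Hypothetical host, used only for illustration.
print(build_twiml("example.ngrok.app", "Please wait."))
```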

In [2]:

@app.get("/", response_class=HTMLResponse)
async def index_page():
    return {"message": "Twilio Media Stream Server is running!"}

@app.api_route("/incoming-call", methods=["GET", "POST"])
async def handle_incoming_call(request: Request):
    """Handle incoming call and return TwiML response to connect to Media Stream."""
    response = VoiceResponse()
    # <Say> punctuation to improve text-to-speech flow
    response.say("Please wait while we connect your call to the A. I. voice assistant, powered by Twilio and the Open-A.I. Realtime API")
    response.pause(length=1)
    response.say("O.K. you can start talking!")
    host = request.url.hostname
    connect = Connect()
    connect.stream(url=f'wss://{host}/media-stream')
    response.append(connect)
    return HTMLResponse(content=str(response), media_type="application/xml")


# WebSocket Implementation

## Overview
- Sets up WebSocket route for Media Streams
- Connects to both Twilio and OpenAI WebSockets
- Handles audio proxying between services

## OpenAI Realtime API Connection

### WebSocket Connection
- Establishes connection using provided endpoint
- Includes OpenAI API key and beta flag
- Handles authentication
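
As a rough sketch, the connection details can be assembled in one place before opening the socket. The endpoint, model name, and header values are the ones used in the code cell of this notebook; the helper `realtime_connection_args` is a hypothetical name introduced for illustration.

```python
from urllib.parse import urlencode

REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"


def realtime_connection_args(api_key, model="gpt-4o-realtime-preview-2024-10-01"):
    """Return the (url, headers) pair used to open the Realtime WebSocket."""
    url = f"{REALTIME_ENDPOINT}?{urlencode({'model': model})}"
    headers = {
        "Authorization": f"Bearer {api_key}",  # API key authentication
        "OpenAI-Beta": "realtime=v1",          # beta flag
    }
    return url, headers
```

Keeping the URL and headers in a small function makes it easier to swap the model name without touching the WebSocket handler.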

### Session Configuration
- Sends initial configuration
- Manages connection parameters
- Controls API behavior

## Audio Proxy System
- `receive_from_twilio`: Listens for Twilio audio data and forwards it to OpenAI
- `send_to_twilio`: Receives OpenAI events and forwards audio deltas to Twilio
- Events whose type appears in `LOG_EVENT_TYPES` are printed to the console
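
The proxy is essentially two small message translations. The sketch below isolates them as pure functions so the message shapes are easy to see; `twilio_to_openai` and `openai_to_twilio` are hypothetical helper names, while the field names match the code cell in this notebook.

```python
import json


def twilio_to_openai(raw_message):
    """Map a Twilio 'media' message to an input_audio_buffer.append event."""
    data = json.loads(raw_message)
    if data.get("event") != "media":
        return None  # 'start' and other events carry no audio
    return {"type": "input_audio_buffer.append", "audio": data["media"]["payload"]}


def openai_to_twilio(event, stream_sid):
    """Map a response.audio.delta event to a Twilio 'media' message."""
    if event.get("type") != "response.audio.delta" or not event.get("delta"):
        return None
    return {
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": event["delta"]},
    }
```

Both directions carry the audio as a base64 string, so the payload can be passed through unchanged.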

In [3]:
async def send_session_update(openai_ws):
    """Send session update to OpenAI WebSocket."""
    session_update = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": VOICE,
            "instructions": SYSTEM_MESSAGE,
            "modalities": ["text", "audio"],
            "temperature": 0.8,
        },
    }
    print("Sending session update:", json.dumps(session_update))
    await openai_ws.send(json.dumps(session_update))


@app.websocket("/media-stream")
async def handle_media_stream(websocket: WebSocket):
    """Handle WebSocket connections between Twilio and OpenAI."""
    print("Client connected")
    await websocket.accept()
    async with websockets.connect(
        'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01',
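        # Assumes the legacy websockets client API (releases before 14); newer
        # releases name this argument `additional_headers`.
        extra_headers={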
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "OpenAI-Beta": "realtime=v1"
        }
    ) as openai_ws:
        await send_session_update(openai_ws)
        stream_sid = None
        async def receive_from_twilio():
            """Receive audio data from Twilio and send it to the OpenAI Realtime API."""
            nonlocal stream_sid
            try:
                async for message in websocket.iter_text():
                    data = json.loads(message)
                    if data['event'] == 'media' and openai_ws.open:
                        audio_append = {
                            "type": "input_audio_buffer.append",
                            "audio": data['media']['payload']
                        }
                        await openai_ws.send(json.dumps(audio_append))
                    elif data['event'] == 'start':
                        stream_sid = data['start']['streamSid']
                        print(f"Incoming stream has started {stream_sid}")
            except WebSocketDisconnect:
                print("Client disconnected.")
                if openai_ws.open:
                    await openai_ws.close()
        async def send_to_twilio():
            """Receive events from the OpenAI Realtime API, send audio back to Twilio."""
            nonlocal stream_sid
            try:
                async for openai_message in openai_ws:
                    response = json.loads(openai_message)
                    if response['type'] in LOG_EVENT_TYPES:
                        print(f"Received event: {response['type']}", response)
                    if response['type'] == 'session.updated':
                        print("Session updated successfully:", response)
                    if response['type'] == 'response.audio.delta' and response.get('delta'):
                        # Audio from OpenAI
                        try:
                            audio_payload = base64.b64encode(base64.b64decode(response['delta'])).decode('utf-8')
                            audio_delta = {
                                "event": "media",
                                "streamSid": stream_sid,
                                "media": {
                                    "payload": audio_payload
                                }
                            }
                            await websocket.send_json(audio_delta)
                        except Exception as e:
                            print(f"Error processing audio data: {e}")
            except Exception as e:
                print(f"Error in send_to_twilio: {e}")

        # Run both directions concurrently while the OpenAI connection is still open.
        await asyncio.gather(receive_from_twilio(), send_to_twilio())

# Session Configuration Details

The OpenAI Realtime API session includes these key settings:

## Turn Detection
- Enables server-side Voice Activity Detection (VAD)
- Controls AI response timing

## Audio Formats
- Uses G.711 ulaw format
- Ensures Twilio compatibility
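
For intuition about what "ulaw" means, the sketch below implements the continuous mu-law companding curve, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255. This is only the idealized formula: real G.711 codecs use an 8-bit piecewise-linear approximation of it, and nothing in this notebook needs to encode audio by hand, because Twilio and OpenAI exchange already-encoded payloads.

```python
import math

MU = 255  # companding parameter used by G.711 mu-law


def mu_law_compress(x):
    """Continuous mu-law curve for a sample x in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, round(mu_law_compress(sample), 3))
```

Quiet samples are spread over a larger share of the output range than loud ones, which is why 8 bits per sample are enough for intelligible telephone speech.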

## Voice and Instructions
- Voice is set via `VOICE`
- Instructions are set via `SYSTEM_MESSAGE`, which defines the AI's personality and behavior

## Modalities
- Enables text and audio capabilities
- Supports multi-modal interaction

## Temperature Setting
- Controls response creativity
- Lower: More deterministic
- Higher: More diverse responses
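
The effect of temperature can be illustrated with a temperature-scaled softmax over a few made-up scores. This is a sketch of the general sampling idea, not a description of how the Realtime API implements it; the logits and the helper name are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
    for t in (0.5, 0.8, 1.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At a lower temperature the highest-scoring option takes most of the probability mass; at a higher temperature the distribution flattens and less likely options are sampled more often.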

In [None]:
# Note: this code already appears in the cell above. It is repeated here only for the purposes of the demo.

async def send_session_update(openai_ws):
    """Send session update to OpenAI WebSocket."""
    session_update = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": VOICE,
            "instructions": SYSTEM_MESSAGE,
            "modalities": ["text", "audio"],
            "temperature": 0.8,
        },
    }
    print("Sending session update:", json.dumps(session_update))
    await openai_ws.send(json.dumps(session_update))

# Deployment Instructions

## Local Setup with ngrok
To let Twilio reach your local server, run this command in a terminal:

```bash
ngrok http 5050
```

This creates a secure tunnel to your local development environment, allowing Twilio to communicate with your application.

### Important Notes
- Keep the ngrok process running during development
- Update your Twilio webhook URLs with the new ngrok URL when it changes
- The default port is 5050; set the `PORT` environment variable to change it, and pass the same port to ngrok
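
When the ngrok URL changes, two addresses change with it: the webhook you configure in Twilio and the WebSocket URL the server hands back in TwiML. The sketch below derives both from a forwarding URL; `twilio_urls` is a hypothetical helper and `https://abc123.ngrok.app` is a made-up address.

```python
from urllib.parse import urlparse


def twilio_urls(forwarding_url):
    """Derive the webhook and media stream URLs from an ngrok forwarding URL."""
    host = urlparse(forwarding_url).netloc
    return {
        "webhook": f"https://{host}/incoming-call",    # set this in the Twilio console
        "media_stream": f"wss://{host}/media-stream",  # returned inside the TwiML
    }


if __name__ == "__main__":
    # Made-up forwarding URL, used only for illustration.
    print(twilio_urls("https://abc123.ngrok.app"))
```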

In [None]:
# import nest_asyncio
# import uvicorn

# # Apply nest_asyncio to handle the event loop in Jupyter
# nest_asyncio.apply()

# # Mimic the `if __name__ == "__main__"` behavior
# if __name__ == "__main__":
#     uvicorn.run(app, host="0.0.0.0", port=PORT)