diff --git a/website/docs/main/compatibility-api/guides/voice/nodejs/realtime-streaming-to-openai/index.mdx b/website/docs/main/compatibility-api/guides/voice/nodejs/realtime-streaming-to-openai/index.mdx
new file mode 100644
index 00000000..00e90793
--- /dev/null
+++ b/website/docs/main/compatibility-api/guides/voice/nodejs/realtime-streaming-to-openai/index.mdx
@@ -0,0 +1,1314 @@
---
title: Integrate OpenAI Realtime API with cXML
description: Put OpenAI Speech-to-Speech models on the phone with bidirectional streaming and cXML.
slug: /compatibility-api/cxml/stream-openai-realtime
sidebar_label: Stream an OpenAI Realtime API agent
sidebar_position: 0
x-custom:
  tags:
    - product:ai
    - product:voice
    - language:nodejs
    - language:javascript
    - sdk:compatibility
---

import AddResource from '/docs/main/_common/dashboard/add-resource.mdx';
import ResourcesFyi from '/docs/main/_common/call-fabric/resources-fyi-card.mdx';
import { MdCode } from "react-icons/md";

# Stream to OpenAI Realtime API agent with cXML

Put OpenAI Speech-to-Speech models on the phone with cXML `<Stream>`.

In this guide, we will build a Node.js application that serves a
[cXML Script][cxml]
that initiates a two-way (bidirectional)
[`<Stream>`][bidir-stream]
to a Speech-to-Speech model on the OpenAI Realtime API.
When a caller initiates a call to the assigned phone number,
the SignalWire platform requests and runs the cXML script.

```mermaid
graph LR
    A[Phone call] --> B[SignalWire]
    B --> C[WebSocket]
    C --> D[Transport layer]
    D --> E[OpenAI Realtime]
    E --> D
    D --> C
    C --> B
    B --> A
```

{/* This architectural explainer is a DRAFT. It could be useful, but needs further refinement.
+ +**Audio Flow Details:** +- **Inbound**: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI +- **Outbound**: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone +- **Latency**: Typically 150-300ms end-to-end +- **Quality**: Depends on codec choice (G.711 vs PCM16) + +The key architectural components involved are: + +- **cXML server:** Our Fastify server serves dynamic cXML to the SignalWire platform. +This gives our application the ability to update the call instructions according to each request. +- **WebSocket bridge:** Enables real-time audio streaming between telephony and AI +- **AI integration:** Natural conversations with OpenAI's Realtime API +- **Function calling:** Server-side tool execution during conversations + +Here's what happens when someone calls your application: + +```mermaid +flowchart TD + A(Phone call) --> B(SignalWire platform) + B --> |Request cXML Script via webhook| C(Your server) + B --> |Bidirectional WebSocket connection| D(OpenAI API) +``` + +1. **Call arrives** at SignalWire +2. **Webhook triggers** your server endpoint +3. **WebSocket streams** audio bidirectionally +4. **AI processes** speech in real-time +5. **Responses flow back** to the caller + +*/} + +## Prerequisites + +Before you begin, ensure you have: + +- **SignalWire Space** - [Sign up free](https://signalwire.com/signup) +- **OpenAI API Key** - [Get access](https://platform.openai.com/api-keys) (requires paid account) +- **Node.js 20+** - For running the TypeScript server ([Install Node](https://nodejs.org/en/download)) +- **ngrok** or other tunneling service - For local development tunneling ([Install ngrok](https://ngrok.com/download)) +- **Docker** (optional) - For containerized deployment + +## Quickstart + + + +### Clone and install + +
+ +
Clone the example repository, enter its directory, and install the dependencies.

```bash
git clone https://github.com/signalwire/cXML-realtime-agent-stream
cd cXML-realtime-agent-stream
npm install
```
+ +
+ +} + > +View the source code on GitHub + + +
+ +
+ +### Add OpenAI credentials + +Select the **Local** or **Docker** tab below depending on where you plan to run the application. + + + + +When running the server on your local machine, store your credentials in a `.env` file. + +```bash +cp .env.example .env +``` + +Edit `.env` and add your OpenAI API key: + +```bash title=".env" +OPENAI_API_KEY=sk-your-actual-api-key-here +``` + + + + + +When running the server in production with the Docker container, store your credentials in a `secrets` folder. + +```bash +mkdir secrets +``` + +```bash +echo "sk-your-actual-api-key-here" > secrets/openai_api_key.txt +``` + + + + +### Run application + + + + +```bash +npm run build +npm start +``` + + + + + +```bash +docker-compose up --build signalwire-assistant +``` + + + + +Your AI assistant webhook is now running at `http://localhost:5050/incoming-call`. + +:::tip Health check +Make sure your server is running and the health check passes: +```bash +curl http://localhost:5050/health +# Should return: {"status":"healthy"} +``` +::: + +### Create a cXML Script + +Next, we need to tell SignalWire to request cXML from your server when a call comes in. + +
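When a call arrives, SignalWire fetches that URL and expects a small cXML document in response. The document this application serves boils down to the following (the hostname is a placeholder; the server derives the `wss://…/media-stream` URL from the request's `Host` header):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://your-domain.com/media-stream" />
  </Connect>
</Response>
```

`<Connect><Stream>` opens the bidirectional WebSocket that the rest of this guide builds on.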
+ +
+ +- Navigate to [My Resources][resources] in your Dashboard. +- Click **Create Resource**, select **Script** as the resource type, and choose `cXML`. +- Under `Handle Using`, select `External Url`. +- Set the `Primary Script URL` to your server's **webhook endpoint**. + +Select the **Local** tab below if you ran the application locally, and the **Docker** tab if you're running it with Docker. + +
+ +
+ + + +
+ +
Use ngrok to expose port 5050 on your development machine:

```bash
ngrok http 5050
```

Append `/incoming-call` to the HTTPS URL returned by ngrok:

  ```
  https://abc123.ngrok.io/incoming-call
  ```


For production environments, use your server's public URL followed by `/incoming-call`:

  ```
  https://your-domain.com/incoming-call
  ```



:::important Set routes
For this example, you **must** include `/incoming-call` at the end of your URL. This is the specific webhook endpoint our application uses to handle incoming calls.
:::

- Give the cXML Script a descriptive name, such as "AI Voice Assistant".
- Save your new Resource.

### Assign phone number or SIP address

To test your AI assistant, create a SIP address or phone number and assign it as a handler for your cXML Script Resource.

- From the [My Resources][resources] tab, select your cXML Script.
- Open the **Addresses & Phone Numbers** tab.
- Click **Add**, and select either **SIP Address** or **Phone Number**.
- Fill out any required details, and save the configuration.

### Test application

Dial the SIP address or phone number assigned to your cXML Script.
You should now be speaking to your newly created agent!
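Behind the scenes, the call audio you are now hearing travels over that WebSocket as JSON frames whose `payload` field is base64-encoded audio. A minimal Node sketch of decoding one such frame (all values here are illustrative, not captured from a real call):

```javascript
// Illustrative SignalWire "media" frame; the field names follow the
// start/media/mark/stop events used in this guide, the values are fake.
const frame = {
  event: 'media',
  streamSid: 'MZ0123456789abcdef',
  media: {
    track: 'inbound',
    payload: Buffer.from([0x7f, 0xfe, 0x80]).toString('base64'),
  },
};

// The transport layer base64-decodes each payload into raw audio bytes
// before forwarding them to OpenAI (and re-encodes on the way back).
const audioBytes = Buffer.from(frame.media.payload, 'base64');
console.log(audioBytes.length); // 3 (fake) audio bytes
```

Each real frame carries roughly 20 ms of audio, so the server handles about 50 of these per second in each direction.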
+ + + +--- + + + +## How it works + + + + + +First, your server needs to handle incoming call webhooks from SignalWire. + +**Set up the HTTP endpoint** + + + + +```typescript +import Fastify from 'fastify'; + +const app = Fastify(); + +app.post('/incoming-call', async (req, res) => { + const host = req.headers.host; + const wsUrl = `wss://${host}/media-stream`; + + // Return cXML instructions to stream audio + const cxml = ` + + + `; + + res.type('text/xml').send(cxml); +}); + +app.listen({ port: 5050, host: '0.0.0.0' }); +``` + + + + +```javascript +const Fastify = require('fastify'); + +const app = Fastify(); + +app.post('/incoming-call', async (req, res) => { + const host = req.headers.host; + const wsUrl = `wss://${host}/media-stream`; + + // Return cXML instructions to stream audio + const cxml = ` + + + `; + + res.type('text/xml').send(cxml); +}); + +app.listen({ port: 5050, host: '0.0.0.0' }); +``` + + + + +:::tip Webhook URL Format +Your webhook URL must include `/incoming-call` at the end: +- Local: `https://your-ngrok-url.ngrok.io/incoming-call` +- Production: `https://your-domain.com/incoming-call` +::: + + + + + +Next, we will create a WebSocket server to handle bidirectional audio streaming. 
**Initialize WebSocket Server**



```typescript
import websocket from '@fastify/websocket';
import { SignalWireRealtimeTransportLayer } from '../transports/SignalWireRealtimeTransportLayer.js';
import { RealtimeSession, RealtimeAgent } from '@openai/agents/realtime';
import { AGENT_CONFIG } from '../config.js';

interface SignalWireMessage {
  event: 'start' | 'media' | 'stop' | 'mark';
  media?: {
    payload: string; // Base64 encoded audio
    track?: 'inbound' | 'outbound';
  };
  start?: {
    streamSid: string;
    callSid: string;
    mediaFormat?: {
      encoding: string;
      sampleRate: number;
      channels: number;
    };
  };
}

app.register(websocket);

app.get('/media-stream', { websocket: true }, async (connection) => {
  console.log('📞 Client connected to WebSocket');

  try {
    // Create SignalWire transport layer with configured audio format
    const signalWireTransportLayer = new SignalWireRealtimeTransportLayer({
      signalWireWebSocket: connection,
      audioFormat: AGENT_CONFIG.audioFormat
    });

    // Create AI agent and session (agentConfig is defined in the
    // "Create the AI Session" step below)
    const realtimeAgent = new RealtimeAgent(agentConfig);
    const session = new RealtimeSession(realtimeAgent, {
      transport: signalWireTransportLayer,
      model: 'gpt-4o-realtime-preview'
    });

    // Connect to OpenAI Realtime API
    await session.connect({
      apiKey: process.env.OPENAI_API_KEY
    });

    // Handle session events
    session.on('agent_tool_start', (context, agent, tool, details) => {
      console.log('🔧 Tool call started:', details);
    });

  } catch (error) {
    console.error('❌ Transport initialization failed:', error);
  }
});
```



```javascript
const websocket = require('@fastify/websocket');
const { SignalWireRealtimeTransportLayer } = require('../transports/SignalWireRealtimeTransportLayer.js');
const { RealtimeSession, RealtimeAgent } = require('@openai/agents/realtime');
const { AGENT_CONFIG } = require('../config.js');

app.register(websocket);

app.get('/media-stream', { websocket: true }, async (connection) => {
  console.log('📞 Client connected to WebSocket');

  try {
    // Create SignalWire transport layer with configured audio format
    const signalWireTransportLayer = new SignalWireRealtimeTransportLayer({
      signalWireWebSocket: connection,
      audioFormat: AGENT_CONFIG.audioFormat
    });

    // Create AI agent and session (agentConfig is defined in the
    // "Create the AI Session" step below)
    const realtimeAgent = new RealtimeAgent(agentConfig);
    const session = new RealtimeSession(realtimeAgent, {
      transport: signalWireTransportLayer,
      model: 'gpt-4o-realtime-preview'
    });

    // Connect to OpenAI Realtime API
    await session.connect({
      apiKey: process.env.OPENAI_API_KEY
    });

    // Handle session events
    session.on('agent_tool_start', (context, agent, tool, details) => {
      console.log('🔧 Tool call started:', details);
    });

  } catch (error) {
    console.error('❌ Transport initialization failed:', error);
  }
});
```





The `SignalWireRealtimeTransportLayer` is the critical component that bridges SignalWire's WebSocket protocol with OpenAI's Realtime API:

```typescript
// Key features of the transport layer:
const transport = new SignalWireRealtimeTransportLayer({
  signalWireWebSocket: connection,
  audioFormat: 'g711_ulaw' // or 'pcm16'
});

// Automatic handling of:
// 1. Audio format conversion
// 2. Base64 encoding/decoding
// 3. Interruption detection
// 4. Mark event tracking
// 5. Session cleanup
```

**Session Lifecycle:**
1. **WebSocket connection** → SignalWire connects to `/media-stream`
2. **Transport creation** → Bridge between SignalWire and OpenAI
3. **AI session start** → RealtimeSession connects to OpenAI
4. **Audio streaming** → Bidirectional real-time audio
5. **Tool execution** → Function calls processed server-side
6. 
**Session cleanup** → Graceful disconnect and resource cleanup + +SignalWire sends several types of messages through the WebSocket: + +| Event | Purpose | Key data | +|-------|---------|----------| +| `start` | Connection initialized | `streamSid`, `callSid`, `mediaFormat` | +| `media` | Audio data packet (~20ms) | Base64 encoded `payload`, `track` | +| `mark` | Audio playback confirmation | `name` (for timing) | +| `stop` | Stream ending | None | + +**Key features** +- **Automatic audio format conversion** between SignalWire and OpenAI +- **Interruption handling** using `clear` events and mark tracking +- **Base64 encoding/decoding** for audio data +- **Session lifecycle management** with proper cleanup +- **Error recovery** and reconnection handling + +**Audio Format Support:** +- **Input**: G.711 μ-law (8kHz) or PCM16 (24kHz) from SignalWire +- **Output**: Matches input format automatically +- **OpenAI Integration**: Handles format negotiation transparently + + + + + +Connect your WebSocket bridge to OpenAI's Realtime API for AI processing. + +**Create the AI Session** + + + + +```typescript +import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime'; +import type { RealtimeAgentConfiguration } from '@openai/agents/realtime'; +import { SignalWireRealtimeTransportLayer } from '../transports/SignalWireRealtimeTransportLayer.js'; +import { allTools } from '../tools/index.js'; + +// Configure the AI agent +const agentConfig: RealtimeAgentConfiguration = { + name: 'SignalWire Voice Assistant', + instructions: `You are a helpful and friendly voice assistant. + Always start every conversation by greeting the caller first. + You can help with weather information, time queries, and general conversation. 
  Be concise and friendly in your responses.`,
  tools: allTools, // Weather, time, and other tools
  voice: 'alloy'
};

async function createAISession(signalWireWebSocket: WebSocket): Promise<RealtimeSession> {
  // Create transport layer that bridges SignalWire and OpenAI
  const transport = new SignalWireRealtimeTransportLayer({
    signalWireWebSocket,
    audioFormat: 'g711_ulaw' // or 'pcm16' for HD audio
  });

  // Create agent and session
  const agent = new RealtimeAgent(agentConfig);
  const session = new RealtimeSession(agent, {
    transport,
    model: 'gpt-4o-realtime-preview'
  });

  // Connect to OpenAI
  await session.connect({
    apiKey: process.env.OPENAI_API_KEY
  });

  return session;
}
```



```javascript
const { RealtimeAgent, RealtimeSession } = require('@openai/agents/realtime');
const { SignalWireRealtimeTransportLayer } = require('../transports/SignalWireRealtimeTransportLayer.js');
const { allTools } = require('../tools/index.js');

// Configure the AI agent
const agentConfig = {
  name: 'SignalWire Voice Assistant',
  instructions: `You are a helpful and friendly voice assistant.
  Always start every conversation by greeting the caller first.
  You can help with weather information, time queries, and general conversation. 
+ Be concise and friendly in your responses.`, + tools: allTools, // Weather, time, and other tools + voice: 'alloy' +}; + +async function createAISession(signalWireWebSocket) { + // Create transport layer that bridges SignalWire and OpenAI + const transport = new SignalWireRealtimeTransportLayer({ + signalWireWebSocket, + audioFormat: 'g711_ulaw' // or 'pcm16' for HD audio + }); + + // Create agent and session + const agent = new RealtimeAgent(agentConfig); + const session = new RealtimeSession(agent, { + transport, + model: 'gpt-4o-realtime-preview' + }); + + // Connect to OpenAI + await session.connect({ + apiKey: process.env.OPENAI_API_KEY + }); + + return session; +} +``` + + + + +**Send Audio Back to Caller** + +```typescript +// Audio is automatically handled by SignalWireRealtimeTransportLayer +// The transport layer manages: +// 1. Audio format conversion (g711_ulaw ↔ pcm16) +// 2. Base64 encoding/decoding +// 3. Chunk timing and interruption handling +// 4. Mark events for tracking audio playback + +// Example of session event handling: +session.on('agent_tool_start', (context, agent, tool, details) => { + console.log('🔧 Tool call started:', details); +}); + +session.on('agent_tool_end', (context, agent, tool, result, details) => { + console.log('✅ Tool call completed:', details); +}); + +session.on('error', (error) => { + console.error('❌ Session error:', error); +}); +``` + +**Environment Configuration** + +Set up your environment variables for different deployment scenarios: + + + + +Create a `.env` file in your project root: + +```bash +# Required +OPENAI_API_KEY=sk-your-actual-api-key-here + +# Optional +PORT=5050 +AUDIO_FORMAT=g711_ulaw # or 'pcm16' for HD audio +``` + + + + +For production with Docker secrets: + +```bash +# Create secrets directory +mkdir -p secrets +echo "sk-your-actual-api-key-here" > secrets/openai_api_key.txt +``` + +Environment variables in `docker-compose.yml`: +```yaml +environment: + - PORT=5050 + - AUDIO_FORMAT=pcm16 +``` 
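However the value arrives (a `.env` file locally, environment variables in Docker), the server just reads `AUDIO_FORMAT` at startup and falls back to the telephony default. A sketch of that lookup (the repo's actual config module may differ; `resolveAudioFormat` is an illustrative helper):

```javascript
// Resolve AUDIO_FORMAT with validation and a safe default.
// Illustrative helper, not the repo's actual config module.
const SUPPORTED_FORMATS = ['g711_ulaw', 'pcm16'];

function resolveAudioFormat(env) {
  const format = env.AUDIO_FORMAT ?? 'g711_ulaw'; // documented default
  if (!SUPPORTED_FORMATS.includes(format)) {
    throw new Error(`Unsupported AUDIO_FORMAT: ${format}`);
  }
  return format;
}

console.log(resolveAudioFormat(process.env)); // 'g711_ulaw' unless overridden
```

Failing fast on an unknown value is preferable to silently streaming audio in a format the other side does not expect.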
+ + + + +:::note Audio Format Options +Choose the right audio format for your use case: +- **g711_ulaw (8kHz)**: Standard telephony quality (default) +- **pcm16 (24kHz)**: High definition audio for demos +::: + + + + + +Enable your AI to execute server-side tools during conversations. + +**Define Tools** + + + + +```typescript +import { tool as realtimeTool } from '@openai/agents/realtime'; +import { z } from 'zod'; + +// Weather tool using real US National Weather Service API +const weatherTool = realtimeTool({ + name: 'get_weather', + description: 'Get current weather information for any US city', + parameters: z.object({ + location: z.string().describe('The US city or location to get weather for (include state if needed for clarity)') + }), + execute: async ({ location }) => { + try { + // Step 1: Geocoding - Convert city name to coordinates + const geocodeUrl = `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(location)}&countrycodes=us&limit=1`; + const geocodeResponse = await fetch(geocodeUrl, { + headers: { + 'User-Agent': 'SignalWire-OpenAI-Voice-Assistant/1.0.0' + } + }); + + if (!geocodeResponse.ok) { + return 'Sorry, weather information is currently unavailable.'; + } + + const geocodeData = await geocodeResponse.json(); + if (!geocodeData || geocodeData.length === 0) { + return `Sorry, I couldn't find the location "${location}". 
Please try a different city name.`; + } + + const lat = parseFloat(geocodeData[0].lat); + const lon = parseFloat(geocodeData[0].lon); + + // Step 2: Get weather from weather.gov + const pointsUrl = `https://api.weather.gov/points/${lat},${lon}`; + const pointsResponse = await fetch(pointsUrl); + const pointsData = await pointsResponse.json(); + + const forecastUrl = pointsData.properties?.forecast; + if (!forecastUrl) { + return 'Sorry, weather information is currently unavailable.'; + } + + const forecastResponse = await fetch(forecastUrl); + const forecastData = await forecastResponse.json(); + + const currentPeriod = forecastData.properties?.periods?.[0]; + if (!currentPeriod) { + return 'Sorry, weather information is currently unavailable.'; + } + + // Format response for voice + const cityName = geocodeData[0].display_name.split(',')[0]; + return `In ${cityName}, it's currently ${currentPeriod.detailedForecast.toLowerCase()}`; + + } catch (error) { + return 'Sorry, weather information is currently unavailable.'; + } + } +}); + +// Time tool example (no external API required) +const timeTool = realtimeTool({ + name: 'get_time', + description: 'Get the current time in Eastern Time', + parameters: z.object({}), // No parameters needed + execute: async () => { + try { + const now = new Date(); + const easternTime = now.toLocaleString('en-US', { + timeZone: 'America/New_York', + timeZoneName: 'short', + weekday: 'long', + year: 'numeric', + month: 'long', + day: 'numeric', + hour: 'numeric', + minute: '2-digit' + }); + return `The current time in Eastern Time is ${easternTime}.`; + } catch (error) { + return 'Sorry, time information is currently unavailable.'; + } + } +}); + +// Export all tools +export const allTools = [weatherTool, timeTool]; + +// Add to your AI agent configuration +const agentConfig = { + name: 'SignalWire Voice Assistant', + instructions: `You are a helpful and friendly voice assistant. 
+ Always start every conversation by greeting the caller first. + You can help with weather information, time queries, and general conversation. + Be concise and friendly in your responses.`, + tools: allTools, + voice: 'alloy' +}; +``` + + + + +```javascript +const { tool: realtimeTool } = require('@openai/agents/realtime'); +const { z } = require('zod'); + +// Weather tool using real US National Weather Service API +const weatherTool = realtimeTool({ + name: 'get_weather', + description: 'Get current weather information for any US city', + parameters: z.object({ + location: z.string().describe('The US city or location to get weather for (include state if needed for clarity)') + }), + execute: async ({ location }) => { + try { + // Step 1: Geocoding - Convert city name to coordinates + const geocodeUrl = `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(location)}&countrycodes=us&limit=1`; + const geocodeResponse = await fetch(geocodeUrl, { + headers: { + 'User-Agent': 'SignalWire-OpenAI-Voice-Assistant/1.0.0' + } + }); + + if (!geocodeResponse.ok) { + return 'Sorry, weather information is currently unavailable.'; + } + + const geocodeData = await geocodeResponse.json(); + if (!geocodeData || geocodeData.length === 0) { + return `Sorry, I couldn't find the location "${location}". 
Please try a different city name.`;
      }

      const lat = parseFloat(geocodeData[0].lat);
      const lon = parseFloat(geocodeData[0].lon);

      // Step 2: Get weather from weather.gov
      const pointsUrl = `https://api.weather.gov/points/${lat},${lon}`;
      const pointsResponse = await fetch(pointsUrl);
      const pointsData = await pointsResponse.json();

      const forecastUrl = pointsData.properties?.forecast;
      if (!forecastUrl) {
        return 'Sorry, weather information is currently unavailable.';
      }

      const forecastResponse = await fetch(forecastUrl);
      const forecastData = await forecastResponse.json();

      const currentPeriod = forecastData.properties?.periods?.[0];
      if (!currentPeriod) {
        return 'Sorry, weather information is currently unavailable.';
      }

      // Format response for voice
      const cityName = geocodeData[0].display_name.split(',')[0];
      return `In ${cityName}, it's currently ${currentPeriod.detailedForecast.toLowerCase()}`;

    } catch (error) {
      return 'Sorry, weather information is currently unavailable.';
    }
  }
});

// Time tool example (no external API required)
const timeTool = realtimeTool({
  name: 'get_time',
  description: 'Get the current time in Eastern Time',
  parameters: z.object({}), // No parameters needed
  execute: async () => {
    try {
      const now = new Date();
      const easternTime = now.toLocaleString('en-US', {
        timeZone: 'America/New_York',
        timeZoneName: 'short',
        weekday: 'long',
        year: 'numeric',
        month: 'long',
        day: 'numeric',
        hour: 'numeric',
        minute: '2-digit'
      });
      return `The current time in Eastern Time is ${easternTime}.`;
    } catch (error) {
      return 'Sorry, time information is currently unavailable.';
    }
  }
});

// Export all tools
const allTools = [weatherTool, timeTool];
module.exports = { allTools };

// Add to your AI agent configuration
const agentConfig = {
  name: 'SignalWire Voice Assistant',
  instructions: `You are a helpful and friendly voice assistant. 
  Always start every conversation by greeting the caller first.
  You can help with weather information, time queries, and general conversation.
  Be concise and friendly in your responses.`,
  tools: allTools,
  voice: 'alloy'
};
```



1. **User asks**: "What's the weather in New York?"
2. **AI recognizes intent**: Needs weather information
3. **Function call triggered**: `get_weather({ location: "New York" })`
4. **Server executes**: Fetches from weather API
5. **Result returned**: AI incorporates into response
6. **User hears**: "The weather in New York is 72°F and sunny."

All of this happens in real-time during the conversation.




---

## Technical Deep Dive

{/* Section reserved for future architectural explanations */}

---

### Codec Selection Guide

Choose the right audio codec for your use case:




### Configure Audio Format



```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <!-- The codec attribute selects the SignalWire-side audio format; see
         the cXML Stream technical reference for supported values -->
    <Stream url="wss://your-domain.com/media-stream" codec="..." />
  </Connect>
</Response>
```



```bash
# In your .env file
AUDIO_FORMAT=pcm16 # or g711_ulaw
```



### Advanced Configuration



The transport layer automatically handles interruptions:

```typescript
// When user interrupts AI speech:
// 1. Transport detects voice activity
// 2. Sends 'clear' event to SignalWire
// 3. Truncates OpenAI audio at last played position
// 4. Resumes with new user input

session.on('interruption', (event) => {
  console.log('🛑 User interrupted AI speech');
});
```



Mark events track audio playback timing:

```typescript
// Transport sends mark events for each audio chunk
{
  "event": "mark",
  "mark": { "name": "item123:45" }, // itemId:chunkNumber
  "streamSid": "..." 
+} + +// Used for precise interruption timing +``` + + + + +Built-in error handling and recovery: + +```typescript +session.on('error', (error) => { + console.error('Session error:', error); + // Transport automatically attempts reconnection +}); + +transport.on('*', (event) => { + if (event.type === 'transport_error') { + // Handle transport-specific errors + console.error('Transport error:', event.error); + } +}); +``` + + + + +:::tip Performance Optimization +For production deployments: +- Use **G.711 μ-law** for standard phone calls (lower latency) +- Use **PCM16** for high-fidelity demos (better quality) +- Monitor WebSocket connection stability +- Implement connection pooling for high traffic +- Track audio latency metrics +::: + +--- + +## Deployment + +### Local development + +1. **Install dependencies** + ```bash + npm install + ``` + +2. **Set up environment** + ```bash + cp .env.example .env + # Edit .env with your OpenAI API key + ``` + +3. **Start your server** + ```bash + npm run build + npm start + + # Or for development with hot reload: + npm run dev + ``` + +4. **Expose with ngrok** + ```bash + npx ngrok http 5050 + # Note the HTTPS URL (e.g., https://abc123.ngrok.io) + ``` + +5. **Configure SignalWire webhook** + - Use the ngrok HTTPS URL + `/incoming-call` + - Example: `https://abc123.ngrok.io/incoming-call` + +6. **Test your setup** + ```bash + # Check health endpoint + curl https://abc123.ngrok.io/health + + # Should return: {"status":"healthy","timestamp":"..."} + ``` + +### Production with Docker + + + + +```dockerfile +FROM node:20-alpine + +# Install system dependencies +RUN apk add --no-cache dumb-init + +WORKDIR /app + +# Copy package files +COPY package*.json ./ + +# Install dependencies +RUN npm ci --only=production && npm cache clean --force + +# Copy source code +COPY . . 
+ +# Build TypeScript +RUN npm run build + +# Create non-root user +RUN addgroup -g 1001 -S nodejs && \ + adduser -S nodeuser -u 1001 + +# Change ownership and switch to non-root user +RUN chown -R nodeuser:nodejs /app +USER nodeuser + +EXPOSE 5050 + +# Use dumb-init for proper signal handling +ENTRYPOINT ["dumb-init", "--"] +CMD ["node", "dist/index.js"] +``` + + + + +```yaml +services: + signalwire-assistant: + build: . + ports: + - "${PORT:-5050}:${PORT:-5050}" + environment: + - PORT=${PORT:-5050} + - AUDIO_FORMAT=pcm16 + secrets: + - openai_api_key + restart: unless-stopped + healthcheck: + test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:5050/health"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 10s + logging: + driver: "json-file" + options: + max-size: "10m" + max-file: "3" + +secrets: + openai_api_key: + file: ./secrets/openai_api_key.txt +``` + + + + + + +**Security & Secrets:** +- Use Docker secrets or external secret management (AWS Secrets Manager, Azure Key Vault) +- Never commit API keys to version control +- Use non-root user in Docker containers +- Implement proper CORS and rate limiting + +**Monitoring & Observability:** +- Set up health checks (`/health` endpoint included) +- Implement structured logging with correlation IDs +- Monitor WebSocket connection metrics +- Track audio latency and quality metrics +- Set up alerting for failed calls + +**Scalability & Performance:** +- Use horizontal scaling with session affinity +- Implement connection pooling for high traffic +- Consider using Redis for session state if needed +- Monitor memory usage (audio buffers can accumulate) + +**Error Handling:** +- Graceful degradation when OpenAI API is unavailable +- Retry logic with exponential backoff +- Proper WebSocket reconnection handling +- Fallback responses when tools fail + +**Development Workflow:** +```bash +# Local development with hot reload +npm run dev + +# Type checking +npm run typecheck + +# 
Production build +npm run build && npm start + +# Debug logging +DEBUG=openai-agents:* npm run dev +``` + + + +--- + +**Console Output to Look For:** +```bash +📡 Server running on http://0.0.0.0:5050 +🏥 Health check: http://0.0.0.0:5050/health +🔊 Audio format: g711_ulaw (8kHz telephony) +🎙️ Voice: alloy + +# When calls come in: +📞 Incoming call - Audio format: g711_ulaw, SignalWire codec: default +📱 Client connected to WebSocket +🔧 Tool call started: get_weather +✅ Tool call completed: get_weather +``` + +{/* Needs validation + +## Common Issues & Solutions + +### Debugging + + + + + + + + +### Troubleshooting Guide + +| Issue | Cause | Solution | +|-------|-------|----------| +| No audio from AI | Codec mismatch or transport error | Check `AUDIO_FORMAT` env var, verify SignalWire codec setting | +| High latency | Network or buffering issues | Use `g711_ulaw` for lower latency, check network | +| WebSocket disconnections | Network timeout or server overload | Implement reconnection logic, monitor server resources | +| Function calls fail | Network issues or API errors | Add retry logic, check API quotas and keys | +| "Missing OPENAI_API_KEY" | Configuration error | Verify .env file or Docker secrets setup | +| Calls not connecting | Webhook URL issues | Ensure URL is public and includes `/incoming-call` | +| Audio quality poor | Wrong codec configuration | Match audio format between SignalWire and application | +| Memory leaks | Audio buffer accumulation | Monitor memory usage, implement cleanup | +| Session errors | OpenAI API issues | Check API status, implement fallback responses | + +### Debug Checklist + +**Basic Setup:** +- [ ] Webhook URL includes `/incoming-call` endpoint +- [ ] ngrok is running and exposing port 5050 (for local dev) +- [ ] OpenAI API key is properly configured +- [ ] Node.js 20+ is installed +- [ ] All npm dependencies installed (`npm install`) + +**Configuration:** +- [ ] Audio format matches SignalWire codec setting +- [ ] Environment 
variables properly set +- [ ] Docker secrets configured (if using Docker) +- [ ] Port 5050 is available and not blocked + +**Runtime:** +- [ ] WebSocket connection establishes successfully +- [ ] Function tools are registered and accessible +- [ ] Health check endpoint responds (`/health`) +- [ ] Console logs show proper connection messages +- [ ] No error messages in server logs + +**SignalWire Integration:** +- [ ] cXML resource properly configured +- [ ] SIP address or phone number linked to resource +- [ ] Webhook URL is publicly accessible +- [ ] SignalWire project settings correct + +**Testing:** +- [ ] Can make test calls to SIP address +- [ ] Audio flows both directions +- [ ] AI responds appropriately +- [ ] Function calls (weather, time) work +- [ ] Interruptions handled gracefully + +*/} + +## Complete example + +See the GitHub repo for a complete working example, including +weather and time function usage, +error handling, +and a production Docker setup. + +} + href="https://github.com/signalwire/solutions-architecture/tree/main/code/cxml-realtime-agent-stream" +/> + +--- + +## Resources + + + } + > + Learn about SignalWire's Call Fabric platform + + } + > + Official documentation for the OpenAI Realtime API + + } + > + Complete reference for Compatibility XML + + } + > + NPM package documentation for the OpenAI Agents SDK + + + + + +[cxml]: /compatibility-api/cxml "Documentation for cXML, or Compatibility XML." +[bidir-stream]: /compatibility-api/cxml/voice/stream#bidirectional-stream "Technical reference for creating a bidirectional Stream in cXML." +[resources]: https://my.signalwire.com?page=resources "The My Resources page of your SignalWire Dashboard." +[repo]: https://github.com/signalwire/solutions-architecture/tree/main/code/cxml-realtime-agent-stream "This project's GitHub repository." +[openai-realtime-api]: https://platform.openai.com/docs/guides/realtime "The OpenAI Realtime API" \ No newline at end of file