This project simulates a real-time voice call between a human user and an OpenAI-powered conversational agent. It supports bidirectional audio streaming, dynamic prompt configuration, and tool-calling via a separate FastAPI server exposing both OpenAPI and MCP-compatible endpoints.
- 🔁 Full-duplex (bidirectional) audio using WebSockets and `pyaudio`
- 🎙️ Realtime transcription & TTS using OpenAI's Whisper + Speech APIs
- 🧠 Custom toolset integration served by a separate server (`tools_server.py`)
- 🔌 Tool use via both OpenAPI and MCP endpoints
- 🌐 Web interface via FastAPI (`app.py`) to simulate an interactive call experience
```
.
├── app.py           # Web server (FastAPI) handling audio exchange and user interaction
├── main.py          # CLI runner to test mic/speaker streaming with the OpenAI API
├── prompts.py       # System prompt and tool configuration logic
├── tools_server.py  # Standalone API server exposing OpenAPI + MCP endpoints
├── requirements.txt # Dependency list
├── .env.example     # Sample environment variables
├── static/          # Web frontend assets (JS, CSS, etc.)
└── templates/       # Web frontend templates (HTML)
```
We recommend using `uv` for fast dependency resolution.

```bash
uv venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
uv pip install -r requirements.txt
```

Create a `.env` file from the provided `.env.example`:
```
API_KEY=your_openai_key_here
WS_URL=wss://api.openai.com/v1/audio/ws  # or your realtime endpoint
```

Start this first to expose tool endpoints to the assistant:
```bash
python tools_server.py
```

Access:

- http://127.0.0.1:8888/docs – OpenAPI docs
- http://127.0.0.1:8888/.well-known/ai-plugin.json – Plugin manifest for tool calling
- MCP endpoint mounted via `fastapi-mcp`
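The plugin manifest served at `/.well-known/ai-plugin.json` is a small JSON document that points callers at the server's OpenAPI schema. A minimal sketch of how it might be built — the `build_manifest` helper and the field values are illustrative assumptions, not the project's actual code:

```python
import json

def build_manifest(base_url: str) -> dict:
    """Build a plugin manifest pointing at the server's OpenAPI schema.
    Field names follow the OpenAI plugin-manifest convention; the values
    here are placeholders, not the project's real configuration."""
    return {
        "schema_version": "v1",
        "name_for_model": "bank_tools",
        "description_for_model": "Tools exposed to the voice assistant.",
        "api": {"type": "openapi", "url": f"{base_url}/openapi.json"},
    }

# Serialize the manifest the way the endpoint would return it
print(json.dumps(build_manifest("http://127.0.0.1:8888"), indent=2))
```

The assistant only needs the `api.url` field to fetch the OpenAPI schema and discover the available operations.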
In another terminal:

```bash
python app.py
```

Access the interface at: http://127.0.0.1:8000
You can simulate mic-to-model calls via:

```bash
python main.py
```

This connects to the OpenAI realtime API, transmits microphone input, and plays the AI's audio response live.
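On the wire, realtime-style APIs typically carry raw PCM as base64 inside JSON events. A small sketch of that framing — the `input_audio_buffer.append` event name follows OpenAI's realtime convention, but the exact protocol depends on your `WS_URL` endpoint:

```python
import base64
import json

def frame_audio_chunk(pcm_bytes: bytes) -> str:
    """Wrap a raw PCM chunk in a JSON event for the realtime WebSocket.
    The event name is an assumption modeled on OpenAI's realtime protocol."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })

def unframe_audio_chunk(message: str) -> bytes:
    """Inverse: recover the raw PCM bytes from a received audio event."""
    return base64.b64decode(json.loads(message)["audio"])
```

In `main.py`, each buffer read from `pyaudio` would be passed through a framer like this before being sent over the WebSocket.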
Tools are defined in `tools_server.py` and registered via:

```json
{
  "tools": [
    {
      "type": "openapi",
      "url": "https://xyz123.ngrok.io/.well-known/ai-plugin.json"
    }
  ]
}
```

To switch to an MCP-compatible client (like `npx @modelcontextprotocol/inspector`), ensure your client config is similar to:
```json
{
  "mcp": {
    "servers": {
      "bank-mcp-server": {
        "type": "sse",
        "url": "http://127.0.0.1:8888/mcp"
      }
    }
  }
}
```

- User speaks or sends audio (Web UI or `main.py`).
- Audio is converted, encoded, and sent via WebSocket to OpenAI.
- OpenAI replies with TTS audio and optionally calls a tool.
- The result is streamed back and played to the user.
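The receive side of that flow amounts to a small event dispatcher: audio deltas go to the speaker, tool calls are executed and their result returned. A sketch, where the event names (`audio.delta`, `tool.call`) and the callback signatures are illustrative assumptions:

```python
def handle_event(event: dict, play_audio, call_tool):
    """Dispatch one server event.

    play_audio(data) -> None plays an audio chunk; call_tool(name, args)
    executes a tool and returns its result. Event names are assumptions
    for illustration, not the wire protocol's actual names."""
    if event.get("type") == "audio.delta":
        play_audio(event["audio"])
        return None
    if event.get("type") == "tool.call":
        return call_tool(event["name"], event.get("arguments", {}))
    return None  # ignore event types we don't handle
```

A real client would run this inside the WebSocket receive loop, feeding each decoded JSON message through the dispatcher.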
Key libraries:

- `fastapi`, `uvicorn`, `fastapi-mcp` – Web & MCP API
- `pyaudio`, `pydub` – Audio handling
- `websocket-client`, `socks`, `nest_asyncio` – Real-time communication
- `python-dotenv`, `pydantic` – Configuration and validation
Install and run the inspector:

```bash
npx @modelcontextprotocol/inspector
```

Paste the MCP server config, then issue tool calls via the UI to test interaction with your `tools_server.py`.
This is a local-first prototype meant to test real-time capabilities. For production, you may need a proper deployment setup (e.g., Docker, HTTPS certificates, scalable ASGI workers).
Audio support on the frontend assumes WebM and WAV interoperability between the browser and the backend.