gmh5225/Smart-Call
📞 AI Phone Call Simulator with Real-Time Audio and Tool Integration

This project simulates a real-time voice call between a human user and an OpenAI-powered conversational agent. It supports bidirectional audio streaming, dynamic prompt configuration, and tool-calling via a separate FastAPI server exposing both OpenAPI and MCP-compatible endpoints.


✨ Features

  • 🔁 Full-duplex (bidirectional) audio using WebSockets and pyaudio
  • 🎙️ Realtime transcription & TTS using OpenAI's Whisper + Speech APIs
  • 🧠 Custom toolset integration served by a separate server (tools_server.py)
  • 🔌 Tool use via both OpenAPI and MCP endpoints
  • 🌐 Web interface via FastAPI (app.py) to simulate an interactive call experience

🗂️ Project Structure

.
├── app.py           # Web server (FastAPI) handling audio exchange and user interaction
├── main.py          # CLI runner to test mic/speaker streaming with OpenAI API
├── prompts.py       # System prompt and tool configuration logic
├── tools_server.py  # Standalone API server exposing OpenAPI + MCP endpoints
├── requirements.txt # Dependency list
├── .env.example     # Sample environment variables
├── static/          # Web frontend assets (JS, CSS, etc.)
└── templates/       # Web frontend templates (HTML)

🧰 Installation

1. Create a virtual environment

We recommend using uv for fast dependency resolution.

uv venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows

2. Install dependencies

uv pip install -r requirements.txt

🔐 Environment Variables

Create a .env file from the provided .env.example:

API_KEY=your_openai_key_here
WS_URL=wss://api.openai.com/v1/audio/ws  # Or your realtime endpoint
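
The project loads these with python-dotenv (listed in requirements.txt). As a rough sketch of what that loading amounts to, here is a minimal stdlib-only version; the real `load_dotenv` handles quoting, export prefixes, and more:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader; python-dotenv's load_dotenv does this more robustly."""
    try:
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, and malformed lines
                key, _, value = line.partition("=")
                # Don't clobber variables already set in the real environment.
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # fall back to whatever is already in the environment

load_env()
API_KEY = os.environ.get("API_KEY")
WS_URL = os.environ.get("WS_URL")
```

In practice, prefer `from dotenv import load_dotenv; load_dotenv()` as the project does.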

🚀 Running the Components

1. Tool Server (OpenAPI + MCP)

Start this first to expose tool endpoints to the assistant:

python tools_server.py

Access:

  • http://127.0.0.1:8888/docs – OpenAPI Docs
  • http://127.0.0.1:8888/.well-known/ai-plugin.json – Plugin manifest for tool calling
  • MCP endpoint is mounted via fastapi-mcp

2. Web App (User Interface)

In another terminal:

python app.py

Access the interface at: http://127.0.0.1:8000

3. CLI Audio Simulator (Optional)

You can simulate mic-to-model calls via:

python main.py

This connects to the OpenAI realtime API, streams microphone input, and plays the AI's audio response live.


🔧 Tool Configuration

Tools are defined in tools_server.py and registered via:

{
  "tools": [
    {
      "type": "openapi",
      "url": "[https://xyz123.ngrok.io/.well-known/ai-plugin.json](https://xyz123.ngrok.io/.well-known/ai-plugin.json)"
    }
  ]
}

To switch to an MCP-compatible client (like npx @modelcontextprotocol/inspector), ensure your client config is similar to:

{
  "mcp": {
    "servers": {
      "bank-mcp-server": {
        "type": "sse",
        "url": "[http://127.0.0.1:8888/mcp](http://127.0.0.1:8888/mcp)"
      }
    }
  }
}

📤 Voice Interaction Flow

  1. User speaks or sends audio (Web UI or main.py).
  2. Audio is converted, encoded, and sent via WebSocket to OpenAI.
  3. OpenAI replies with TTS audio and optionally calls a tool.
  4. Result is streamed back and played to the user.

📎 Dependencies

Key libraries:

  • fastapi, uvicorn, fastapi-mcp – Web & MCP API
  • pyaudio, pydub – Audio handling
  • websocket-client, socks, nest_asyncio – Real-time communication
  • python-dotenv, pydantic – Configuration and validation

🧪 Testing with MCP Inspector

Install and run the inspector:

npx @modelcontextprotocol/inspector

Paste the MCP server config, then issue tool calls via the UI to test interaction with your tools_server.py.


📌 Notes

This is a local-first prototype meant to test real-time capabilities. For production you would need a proper deployment (e.g., Docker, HTTPS certificates, scalable ASGI workers).

The web frontend assumes the browser can record WebM audio and play back WAV.
