A voice-based chat application for interacting with an LLM, built with Fastrtc[https://github.com/freddyaboulton/fastrtc]. This is mainly a proof of concept of using Fastrtc in backend. We have an electron app and a standard issue webapp as frontend. Using local STT and TTS models in backend. Though works well on my 2017 Intel chip macbook pro.
No real "agentic" stuff yet but you can checkout CompUse[https://github.com/swairshah/CompUse] to see how to make your app do "agentic" stuff. Just replace the LLMHandler call with your own agentic handler.
This repository contains two client applications that connect to the same backend:
- Electron App - A desktop application built with Electron.
- Webapp - See the webapp directory for more details.
- Node.js (v14+)
- npm
- Python 3.11+
- Fastrtc library (
pip install fastrtc) - Anthropic API key for Claude (again, replace with whatever llm provider)
- Set up Python virtual environment:
# Create a virtual environment
uv venv
# or
# python -m venv .venv
# Activate the virtual environment
# On Windows
# .venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
# Install Python dependencies
uv pip install fastapi fastrtc anthropic uvicorn- Install Node.js dependencies:
npm install- Configure environment variables:
# Set your Anthropic API key
export ANTHROPIC_API_KEY=your_api_key- First, start the FastAPI server:
# Make sure your virtual environment is activated
# On macOS/Linux
source .venv/bin/activate
# On Windows
# .venv\Scripts\activate
# Run the server
python main.pyThe backend server will run at http://localhost:8000.
In a separate terminal, start the Electron app:
npm run devThe Electron app will connect to the locally running server.
To run the TypeScript web application:
# Navigate to the webapp directory
cd webapp
# Install dependencies (first time only)
npm install
# Start the development server
npm startThe web app will be available at http://localhost:3000 and will connect to the same backend server.
Note: You can use either the Electron app or the web app with the same backend. Both provide similar functionality.
Both client applications (Electron and web) connect to a local FastAPI server that uses Fastrtc for voice processing. The system:
- Uses dual communication channels:
- WebRTC for real-time audio streaming (microphone input and TTS output)
- WebSocket for text communication (chat transcript and session management)
- Processes voice through a pipeline of:
- Speech-to-Text (STT) to transcribe user input
- LLM processing via Claude to generate responses
- Text-to-Speech (TTS) to convert responses to audio
Fastrtc simplifies WebRTC implementation, which is traditionally complex. Fastrtc handles all the complex WebRTC setup with a simple Stream class that can be mounted on a FastAPI app. We get built-in functionality that would otherwise require custom implementation. This is done by the ReplyOnPause, check out the main.py file to see how it works.
Without Fastrtc, implementing this voice interface would require:
- Custom WebRTC signaling server implementation
- Complex peer connection and media stream handling
- Manual audio processing and turn detection
- Separate STT/TTS integration
The backend server and client applications run as separate processes, giving you flexibility to:
- Debug the server independently
- Make server-side changes without restarting the client applications
- Connect different clients to the same server
- Choose between desktop (Electron) or browser (TypeScript web app) interfaces
# From the root directory
npm run buildThis will create desktop application distributables in the dist folder.
# Navigate to the webapp directory
cd webapp
# Build for production
npm run buildThis will create production web files in the webapp/dist folder, which can be deployed to any web server.