This project implements a voice RAG (retrieval-augmented generation) agent powered by Cartesia.
Ensure you have Python 3.11 or later installed and run:
pip install -r requirements.txt

This implementation uses OpenAI's services for speech-to-text and Cartesia for speech synthesis. It is the simpler setup if you already have an OpenAI API key. You will need:
- Cartesia API key
- OpenAI API key
- LiveKit credentials
- Copy .env.example to .env and set the following environment variables:
OPENAI_API_KEY=your_openai_api_key
CARTESIA_API_KEY=your_cartesia_api_key
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

Then start the agent:

python voice_agent_openai.py start

The alternative implementation uses AssemblyAI for speech processing and Ollama (with Gemma) for language tasks.
- Install Ollama:
# For macOS
brew install ollama
# For Linux
curl -fsSL https://ollama.com/install.sh | sh
- Pull the Gemma model:
ollama pull gemma3
- Configure the environment: copy .env.example to .env and set:
CARTESIA_API_KEY=your_cartesia_api_key
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
- Start the Ollama server:
ollama serve
- In a new terminal, run the voice agent:
python voice_agent.py start
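Most first-run failures in a setup like this come from a missing key or an Ollama server that isn't running. The script below is a hypothetical preflight check and is not part of this repository. It assumes Ollama is listening on its default port (11434) and uses Ollama's /api/tags endpoint to list pulled models. The variable names come from the .env examples above. It loads .env through python-dotenv's load_dotenv() only if that package happens to be installed. Otherwise it checks the shell environment.

```python
"""Hypothetical preflight check for the voice agent (not part of this repo)."""
import json
import os
import sys
import urllib.request

# Variable names taken from this README; adjust if your .env.example differs.
OPENAI_VARIANT_VARS = [
    "OPENAI_API_KEY", "CARTESIA_API_KEY",
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
]
OLLAMA_VARIANT_VARS = [
    "CARTESIA_API_KEY", "ASSEMBLYAI_API_KEY",
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
]
# Assumption: Ollama runs locally on its default port; /api/tags lists local models.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def missing_vars(required, env):
    """Return the required names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def has_model(tags, model):
    """True if `model` (e.g. "gemma3") appears in an /api/tags response."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    # Ollama reports tagged names such as "gemma3:latest".
    return any(n == model or n.startswith(model + ":") for n in names)


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=3.0):
    """Query the local Ollama server; raises OSError (URLError) if it is down."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main(variant="ollama"):
    """Print each problem found; return 0 if none, 1 otherwise."""
    try:
        from dotenv import load_dotenv  # optional: python-dotenv
        load_dotenv()  # reads ./.env without overriding variables already set
    except ImportError:
        pass

    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11 or later is required")

    required = OLLAMA_VARIANT_VARS if variant == "ollama" else OPENAI_VARIANT_VARS
    for name in missing_vars(required, os.environ):
        problems.append(f"{name} is not set")

    if variant == "ollama":
        try:
            if not has_model(fetch_tags(), "gemma3"):
                problems.append("gemma3 not found; run: ollama pull gemma3")
        except OSError:  # URLError and timeouts are OSError subclasses
            problems.append("Ollama server not reachable; run: ollama serve")

    for problem in problems:
        print("-", problem)
    return 1 if problems else 0
```

Save it under a name of your choice (preflight.py is used here only as an example). Then run python -c "import sys, preflight; sys.exit(preflight.main('ollama'))" before python voice_agent.py start. To check the OpenAI implementation instead, pass 'openai'. Keeping the checks as small pure functions (missing_vars, has_model) makes them easy to test without a running server.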
Built by Adityeah
I build AI agents that solve real human problems, not just productivity ones.
📰 Read the full breakdown in my newsletter: Adityeah's Newsletter
🤝 Connect on LinkedIn: Aditya Chaudhari
Contributions are welcome! Please fork the repository and submit a pull request with your improvements.