ADA is a helpful AI assistant specializing in STEM fields, designed to provide concise and accurate information and assist with various tasks through voice or text interaction. ADA comes in two versions: a local version (`ada_local`) that runs primarily on your machine and an online version (`ada_online`) that utilizes cloud-based services. A separate multimodal live demo (`multimodal_live_api.py`) is also included, showcasing real-time audio and video interaction.
Recommendation: While both versions are available, `ada_online` is strongly recommended. It leverages powerful cloud-based models (Google Gemini) and services (ElevenLabs TTS) that generally offer faster, higher-quality, and more reliable responses than the local version, which depends on your hardware capabilities. The online models have also been developed and refined for longer.
- Dual Versions: Choose between running ADA locally (`ada_local`) or using online services (`ada_online`).
- Real-time Interaction: Communicate with ADA using voice (Speech-to-Text) and receive spoken responses (Text-to-Speech).
- Function Calling & Grounding: ADA can perform specific tasks by calling available functions (widgets) and use tools like Google Search to access current information. Available functions include:
  - Accessing system information (`system.info`)
  - Setting timers (`timer.set`)
  - Creating project folders (`project.create_folder`)
  - Opening the camera (`camera.open`)
  - Managing a To-Do list (`to_do_list.py`; note: not currently integrated as a callable tool in the provided main scripts)
  - Getting weather (`get_weather`)
  - Calculating travel duration (`get_travel_duration`)
- STEM Expertise: Designed to assist with engineering, math, and science queries.
- Conversational: Engages in natural language conversation.
- Multimodal Demo: Includes a script (`multimodal_live_api.py`) for live interaction combining audio and video (camera/screen).
- Python: Ensure you have Python installed (the code uses features compatible with Python 3.11+).
- Ollama (for `ada_local`): You need Ollama installed and running to serve the local LLM. Make sure you have downloaded the model specified in `ADA/ADA_Local.py` (e.g., `gemma3:4b-it-q4_K_M`). Performance heavily depends on your hardware.
- CUDA (Optional, for `ada_local` and potentially local STT/TTS models): For better performance with local models, a CUDA-compatible GPU and the necessary drivers are recommended. ADA's local components attempt to automatically detect and use the GPU if available via PyTorch.
- Microphone and Speakers: Required for voice interaction (STT/TTS). Headphones are strongly recommended to prevent echo and self-interruption.
- API Keys (for `ada_online` and `multimodal_live_api.py`): See the API Key Setup section below.
- FFmpeg (Optional, Recommended): The `RealtimeSTT` and `RealtimeTTS` libraries (or their dependencies) might rely on FFmpeg for audio processing. If you encounter audio errors (like `torchaudio` warnings in logs), installing FFmpeg and ensuring it's in your system's PATH is recommended.
- System Dependencies (e.g., `portaudio`): Libraries like `PyAudio` might require system-level libraries (like `portaudio` on Linux/macOS or specific drivers on Windows). Consult the documentation for `PyAudio` and `RealtimeTTS` (especially if using `CoquiEngine`) for specific OS requirements.
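To confirm whether PyTorch can actually see your GPU (relevant for `ada_local`), a quick check like the following works. This snippet is illustrative and not part of the repository:

```python
import torch

# Reports whether a CUDA-capable GPU is visible to PyTorch.
if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available; local models will fall back to CPU.")
```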
- Clone the Repository:

```bash
git clone https://github.com/Nlouis38/ada.git
cd ada_v1
```

- Install Dependencies:

  Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

  Install the required Python libraries:

```bash
pip install ollama websockets pyaudio RealtimeSTT RealtimeTTS torch google-generativeai opencv-python pillow mss psutil GPUtil elevenlabs python-dotenv python-weather googlemaps  # Add any other specific libraries used
```
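To verify the installation succeeded, a small check like this (a hypothetical helper, not part of the repository) can confirm that the key packages import cleanly in your virtual environment:

```python
# Hypothetical helper: confirm the main dependencies are importable.
for pkg in ("ollama", "RealtimeSTT", "RealtimeTTS", "torch", "elevenlabs", "googlemaps"):
    try:
        __import__(pkg)
        print(f"{pkg}: OK")
    except ImportError as exc:
        print(f"{pkg}: MISSING ({exc})")
```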
Both `ada_online` and `multimodal_live_api.py` require API keys for cloud services. It is highly recommended to use environment variables for security instead of hardcoding keys into the scripts.
- Create a `.env` file: In the root `ada_v1` directory, create a file named `.env`.
- Add Keys to `.env`: Open the `.env` file and add your keys in the following format:

```
# .env file
GOOGLE_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY_HERE
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_KEY_HERE
MAPS_API_KEY=YOUR_Maps_API_KEY_HERE
```
- Get the Keys:
  - Google Generative AI (Gemini API):
    - Purpose: Core LLM for `ada_online` and `multimodal_live_api.py`.
    - Get: Visit Google AI Studio, sign in, and create an API key.
  - ElevenLabs:
    - Purpose: High-quality Text-to-Speech (TTS) for `ada_online`.
    - Get: Go to ElevenLabs, log in, and find your API key in your profile/settings.
  - Google Maps:
    - Purpose: Used by the `get_travel_duration` function tool in `ada_online`.
    - Get: Go to the Google Cloud Console, create a project (or use an existing one), enable the "Directions API", and create an API key under "Credentials".
- Code Usage: The Python scripts (`ADA_Online.py`, `multimodal_live_api.py`, `tts_latency_test.py`) use `python-dotenv` to automatically load these variables from the `.env` file when the script starts.

```python
# Example from ADA_Online.py
import os

from dotenv import load_dotenv

load_dotenv()  # Loads variables from .env

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
MAPS_API_KEY = os.getenv("MAPS_API_KEY")

# ... later, use these variables ...
# self.client = genai.Client(api_key=GOOGLE_API_KEY, ...)
# or when initializing the ElevenLabsEngine / WebSocket connection
```
ADA uses real-time libraries for voice interaction:
- STT (Speech-to-Text):
  - Library: `RealtimeSTT` is used in both `ada_local` and `ada_online`.
  - Functionality: Captures audio from the default microphone, detects speech, and transcribes it to text using a backend model (e.g., Whisper `large-v3`, as specified in the configs).
- TTS (Text-to-Speech):
  - Library: `RealtimeTTS` provides the framework. Different engines handle the actual synthesis:
    - `ada_local`: Uses `RealtimeTTS`, likely with `SystemEngine` (OS default TTS) or potentially `CoquiEngine` (local neural voice, requires setup). Quality and latency depend heavily on the chosen engine and system hardware.
    - `ada_online` (Recommended): Uses `ElevenlabsEngine` via WebSockets. This typically provides very low latency and high-quality, natural-sounding voices, but requires an ElevenLabs API key and an internet connection.
    - `ada_online_noelevenlabs`: Uses `RealtimeTTS` with `SystemEngine`, offering an online LLM experience without needing an ElevenLabs key, but using the basic OS TTS voice.
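For orientation, here is a minimal, hedged sketch of how these two libraries are typically wired together; the engine choice and parameters are assumptions for illustration, not the repo's exact configuration:

```python
from RealtimeSTT import AudioToTextRecorder
from RealtimeTTS import SystemEngine, TextToAudioStream

# Transcribe one utterance from the default microphone (blocking call).
recorder = AudioToTextRecorder(model="large-v3")  # model name as in the configs
user_text = recorder.text()
print(f"You said: {user_text}")

# Speak a response using the OS default TTS engine.
stream = TextToAudioStream(SystemEngine())
stream.feed(f"You said: {user_text}")
stream.play()  # blocks until playback finishes
```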
Uses Ollama for the LLM and local engines for STT/TTS. Performance depends significantly on your CPU/GPU and RAM.
- LLM: Served locally via Ollama (e.g., `gemma3:4b-it-q4_K_M`).
- STT: `RealtimeSTT`.
- TTS: `RealtimeTTS` with `SystemEngine` or `CoquiEngine`.
- To run:

```bash
# Ensure Ollama is running with the required model pulled
python main_local.py
```
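Under the hood, the local version talks to Ollama; a minimal, hedged sketch of such a call via the `ollama` Python package (the prompt is made up for the example) looks like this:

```python
import ollama

# Send a single chat turn to a locally served model.
response = ollama.chat(
    model="gemma3:4b-it-q4_K_M",  # must already be pulled: `ollama pull ...`
    messages=[{"role": "user", "content": "Explain Ohm's law in one sentence."}],
)
print(response["message"]["content"])
```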
Uses Google Gemini (cloud) for the LLM and ElevenLabs (cloud) for TTS. Requires API keys and an internet connection. Generally faster and higher quality.
- LLM: Google Gemini (`gemini-2.0-flash-live-001` or similar).
- STT: `RealtimeSTT`.
- TTS: `RealtimeTTS` with `ElevenlabsEngine` via WebSockets.
- To run:

```bash
# Make sure .env file is set up with API keys
python main_online.py
```
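For reference, a minimal, hedged sketch of a one-shot (non-live) Gemini call with the SDK that provides `genai.Client` might look like the following; the model name here is an assumption, and the real app uses the Live API over WebSockets instead:

```python
import os

from dotenv import load_dotenv
from google import genai

load_dotenv()
client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))

# Simple non-streaming request for illustration only.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Give me one fun fact about capacitors.",
)
print(response.text)
```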
Uses Google Gemini (cloud) for the LLM and the local OS TTS. A middle ground if you want the better online LLM but don't have or don't want an ElevenLabs key.
- LLM: Google Gemini (`gemini-2.0-flash-live-001` or similar).
- STT: `RealtimeSTT`.
- TTS: `RealtimeTTS` with `SystemEngine`.
- To run:

```bash
# Make sure .env file is set up with GOOGLE_API_KEY and MAPS_API_KEY
python main_online_noelevenlabs.py
```
This script demonstrates real-time, multimodal interaction using the Gemini Live API. It streams audio from your microphone and video frames (from your camera or screen) to the Gemini model and plays back the audio response.
- Ensure dependencies are installed (see main Installation section).
- Ensure your `GOOGLE_API_KEY` is set in your `.env` file.
- Use headphones!
- With Camera:

```bash
python multimodal_live_api.py --mode camera
# or just
python multimodal_live_api.py
```

- With Screen Sharing:

```bash
python multimodal_live_api.py --mode screen
```

- Audio Only:

```bash
python multimodal_live_api.py --mode none
```
- You can type text messages in the console while the audio/video stream is running. Type 'q' and Enter to quit.
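As an aside, the screen-sharing mode relies on the `mss` and `pillow` packages from the dependency list; a minimal, hedged sketch of grabbing one frame that way (not the demo's exact code) is:

```python
import mss
from PIL import Image

# Grab a single frame of the primary monitor.
with mss.mss() as sct:
    shot = sct.grab(sct.monitors[1])  # monitors[0] is the full virtual screen
    frame = Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
    frame.save("frame.png")
    print(f"Captured {frame.size[0]}x{frame.size[1]} frame")
```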
Once `main_local.py`, `main_online.py`, or `main_online_noelevenlabs.py` is running:
- Voice Input: Speak clearly into your microphone. The STT engine will detect speech and transcribe it.
- Text Input: If you prefer typing, type your prompt into the console when it says "Enter your message:" and press Enter.
- Exit: Type `exit` and press Enter.
ADA (`ada_local` and `ada_online`) can utilize several built-in functions/tools:
- Local Widgets (`WIDGETS/` directory): Primarily used by `ada_local`.
  - `camera.py`: Opens the default camera feed. (Note: the implementation returns a string and doesn't keep the feed open.)
  - `project.py`: Creates project folders.
  - `system.py`: Provides system hardware information.
  - `timer.py`: Sets countdown timers.
  - `to_do_list.py`: Manages a simple to-do list. (Not integrated.)
- Online Tools (Gemini API): Used by the `ada_online` versions.
  - `GoogleSearch`: Accesses Google Search for current information.
  - `get_weather`: Fetches weather using `python-weather`.
  - `get_travel_duration`: Calculates travel time using `googlemaps`.
  - `CodeExecution`: Allows Gemini to generate and potentially execute code (primarily for analysis/computation, not file system interaction).
ADA decides when to call these based on your request and the model's understanding.
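To illustrate the kind of call a tool like `get_travel_duration` makes, here is a minimal, hedged sketch using the `googlemaps` client; the locations are made up for the example:

```python
import os

import googlemaps
from dotenv import load_dotenv

load_dotenv()
gmaps = googlemaps.Client(key=os.getenv("MAPS_API_KEY"))

# Ask the Directions API for a driving route and read off the duration.
routes = gmaps.directions("Boston, MA", "New York, NY", mode="driving")
if routes:
    print(routes[0]["legs"][0]["duration"]["text"])
```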
- Audio Issues (No Input/Output):
  - Ensure microphone/speakers are system defaults and not muted.
  - Check `PyAudio` dependencies (`portaudio`).
  - Ensure necessary permissions are granted for microphone access.
  - Try different audio devices if available.
  - Check for FFmpeg if errors mention audio encoding/decoding.
- API Key Errors (`ada_online`, `multimodal_live_api.py`):
  - Verify keys are correct in the `.env` file.
  - Ensure the relevant APIs (Gemini, Maps, ElevenLabs) are enabled in their respective cloud consoles.
  - Check API key quotas and billing status.
- Library Errors:
  - Ensure all dependencies from Installation are correctly installed in your active virtual environment.
  - Some libraries (e.g., `torch`, `tensorflow` used by STT/TTS backends) might have specific CPU/GPU version requirements.
- Ollama Issues (`ada_local`):
  - Confirm the Ollama service is running.
  - Verify the specified model (e.g., `gemma3:4b-it-q4_K_M`) is downloaded (`ollama pull model_name`) and accessible; see the sketch after this list.
  - Check Ollama logs for errors.
- TTS Issues:
  - If using `ElevenlabsEngine`, check your API key and internet connection.
  - If using `CoquiEngine`, ensure it's installed correctly and models are downloaded.
  - If using `SystemEngine`, ensure your OS's built-in TTS is functional. Latency might be higher.
- STT Issues:
  - Check microphone levels.
  - Ensure the `RealtimeSTT` model is appropriate for your hardware (larger models need more resources).
  - Background noise can interfere. Use headphones.
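If you suspect a model-name mismatch is behind the Ollama issues above, a quick check with the `ollama` Python package lists what is actually installed; the exact response fields may vary by package version, so this is a hedged sketch:

```python
import ollama

# Print the locally available models to compare against the name in ADA/ADA_Local.py.
print(ollama.list())
```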