This application is a conversational AI assistant named Ada, built with a Python backend (Flask, SocketIO, Google Gemini) and a React frontend. It supports text input, client-side speech-to-text (Web Speech API), text-to-speech (ElevenLabs), and webcam video frame processing, and it integrates with external APIs for weather (python_weather), maps/directions (googlemaps), and web search (googlesearch-python, aiohttp, BeautifulSoup). The frontend and backend communicate in real time over WebSockets (SocketIO).
- Backend (`backend/app.py`, `backend/ADA_Online.py`):
  - A Flask server manages HTTP requests and SocketIO connections.
  - SocketIO handles real-time bidirectional communication with the React frontend.
  - An `ADA` class instance (`ADA_Online.py`) encapsulates the core assistant logic.
  - It uses `asyncio` within a separate thread to manage asynchronous tasks like interacting with the Gemini API, handling TTS streams, and processing inputs without blocking the Flask server.
  - It connects to the Google Gemini API (`google-generativeai`) for conversational AI capabilities, configured with specific system instructions and tool functions (weather, travel duration, search).
  - It receives text input, transcribed speech, and video frames from the client via SocketIO.
  - It processes text and video frames, sending them to the Gemini API.
  - It handles function calls requested by Gemini, executing corresponding Python functions (e.g., `get_weather`, `get_travel_duration`, `get_search_results`).
  - The search function fetches URLs and then asynchronously extracts content (title, snippet, paragraph text) from those pages using `aiohttp` and `BeautifulSoup`.
  - It streams Gemini's text responses back to the client chunk by chunk via SocketIO.
  - It streams text responses to the ElevenLabs TTS API via WebSocket to generate audio.
  - Generated audio chunks (PCM) are received from ElevenLabs and streamed back to the client via SocketIO.
  - API results (weather, maps, search) are also emitted to the client via dedicated SocketIO events to update specific UI widgets.
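The backend pattern described above (an `asyncio` loop running in its own thread, plus dispatch of tool calls by name) can be sketched with the standard library alone. This is a minimal illustration, not the code in `ADA_Online.py`: the `BackgroundLoop` class, the `TOOLS` table, `dispatch_tool`, and the placeholder `get_weather` body are all hypothetical names introduced here.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio event loop in a daemon thread (illustrative only)."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Bind the loop to this thread and serve until stop() is called.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        """Schedule a coroutine from any thread; returns a concurrent.futures.Future."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        """Ask the loop to stop, wait for the thread to exit, then close the loop."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def get_weather(location):
    # Hypothetical stand-in: the real function would query a weather service.
    await asyncio.sleep(0)
    return {"location": location, "status": "placeholder"}


# Hypothetical registry mapping tool names (as a model might request them) to coroutines.
TOOLS = {"get_weather": get_weather}


async def dispatch_tool(name, args):
    """Look up a requested tool by name and await it with the given arguments."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return await handler(**args)


if __name__ == "__main__":
    background = BackgroundLoop()
    future = background.submit(dispatch_tool("get_weather", {"location": "Paris"}))
    print(future.result(timeout=5))
    background.stop()
```

Because `submit` returns a `concurrent.futures.Future`, a synchronous request handler can hand work to the loop and either wait on the result or return immediately, which is what keeps the web server thread from blocking.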
- Frontend (`frontend/src/App.jsx`, components):
  - A React application provides the user interface.
  - It establishes a SocketIO connection to the backend server.
  - It renders components for chat display (`ChatBox`), user input (`InputArea`), status messages (`StatusDisplay`), AI visualization (`AiVisualizer`), webcam feed (`WebcamFeed`), and widgets for weather (`WeatherWidget`), maps (`MapWidget`), code execution (`CodeExecutionWidget`), and search results (`SearchResultsWidget`).
  - Input:
    - Text input is sent via the `send_text_message` SocketIO event.
    - The Web Speech API is used for client-side speech recognition. Final transcripts are sent via the `send_transcribed_text` event.
    - If the webcam is enabled, video frames are captured periodically from a `<video>` element onto a `<canvas>`, converted to JPEG data URLs, and sent via the `send_video_frame` event.
  - Output:
    - Status messages and errors from the backend are displayed.
    - Text chunks received via `receive_text_chunk` are assembled and displayed in the chatbox.
    - Base64 encoded audio chunks received via `receive_audio_chunk` are queued and played back using the Web Audio API (`AudioContext`).
    - Data received via `weather_update`, `map_update`, `executable_code_received`, and `search_results_update` updates the state, causing the respective widgets to render or update.
  - The `AiVisualizer` component changes appearance based on Ada's status (idle, listening, speaking).
  - The `WebcamFeed` component handles accessing the user's camera, displaying the feed (mirrored), and capturing frames.
  - Other widgets (`Weather`, `Map`, `Code`, `Search`) are displayed conditionally when relevant data is received from the backend.
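The two binary payloads mentioned above (JPEG data URLs going to the backend, Base64 audio chunks coming back) both reduce to Base64 encoding and decoding. The following sketch shows how the Python side might handle each, assuming the data URL has the usual `data:image/jpeg;base64,...` form. The function names `decode_data_url` and `encode_audio_chunk` are hypothetical and not taken from the project.

```python
import base64


def decode_data_url(data_url):
    """Split a Base64 data URL into (mime_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    # Strip the leading 'data:' and trailing ';base64' to leave the MIME type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def encode_audio_chunk(pcm_bytes):
    """Encode raw audio bytes as an ASCII Base64 string suitable for a JSON event payload."""
    return base64.b64encode(pcm_bytes).decode("ascii")


if __name__ == "__main__":
    sample = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode("ascii")
    mime, raw = decode_data_url(sample)
    print(mime, len(raw))
```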
These instructions assume you have Git, Python 3.7+, pip, and Node.js (with npm) installed on your system.
- Clone the Repository: Open your terminal or command prompt and clone the project repository from its source (replace `<repository_url>` with the actual URL):

      git clone <repository_url>
      cd <repository_directory_name> # Navigate into the cloned project directory

- Backend Setup: Follow the steps in the Backend Setup (Python) section.
- Frontend Setup: Follow the steps in the Frontend Setup (React) section.
- Configuration: Create and populate the `.env` file as described in the Configuration section.
- Run the Application: Follow the steps in the Running the Application section.
- Navigate to Backend Directory: From the root project directory you cloned, navigate into the backend folder:

      cd backend # Or the name of your backend directory

- Create & Activate Virtual Environment: It's highly recommended to use a virtual environment to manage dependencies.

      python -m venv venv
      source venv/bin/activate # On Linux/macOS
      # OR
      # venv\Scripts\activate # On Windows Command Prompt/PowerShell

  You should see `(venv)` prefixing your terminal prompt.
- Install Dependencies: Ensure you have a `requirements.txt` file in the `backend` directory with the following content:

      # requirements.txt
      Flask
      Flask-SocketIO
      python-dotenv
      google-generativeai
      torch # Or torch-cpu if no CUDA GPU / for simpler setup
      python-weather
      googlemaps
      websockets
      googlesearch-python
      aiohttp
      beautifulsoup4
      lxml # Parser for BeautifulSoup
      requests # Often a dependency
      eventlet # Recommended async mode for Flask-SocketIO

  Install the packages using pip:

      pip install -r requirements.txt

  (Note: `torch` installation can be complex. If you encounter issues or don't have an NVIDIA GPU, consider using `torch-cpu` in `requirements.txt`. Visit the PyTorch website for specific installation instructions for your system if needed.)
- Configuration: Make sure you have created the `.env` file inside this `backend` directory as detailed in the Configuration section.
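After `pip install` finishes, a quick way to confirm that the packages landed in the active virtual environment is to check whether each one can be found by the import system. The sketch below uses only the standard library. The mapping from package names to import names is an assumption based on how these packages are commonly imported, and `is_importable` and `missing_packages` are hypothetical helper names.

```python
import importlib.util

# Assumed import names for the packages listed in requirements.txt.
IMPORT_NAMES = {
    "Flask": "flask",
    "Flask-SocketIO": "flask_socketio",
    "python-dotenv": "dotenv",
    "google-generativeai": "google.generativeai",
    "torch": "torch",
    "python-weather": "python_weather",
    "googlemaps": "googlemaps",
    "websockets": "websockets",
    "googlesearch-python": "googlesearch",
    "aiohttp": "aiohttp",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "requests": "requests",
    "eventlet": "eventlet",
}


def is_importable(module_name):
    """Return True if the import system can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no usable spec.
        return False


def missing_packages(import_names=IMPORT_NAMES):
    """List the package names whose import name cannot be located."""
    return [pkg for pkg, module in import_names.items() if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the same terminal where `(venv)` is active; a package reported as missing usually means pip installed into a different interpreter.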
- Navigate to Frontend Directory: From the root project directory you cloned, navigate into the frontend folder (if you are still in `backend`, use `cd ../frontend` instead):

      cd frontend # Or the name of your frontend directory

- Install Dependencies: This command reads the `package.json` file and installs all the necessary Node.js modules.

      npm install

  This will install React, `socket.io-client`, `react-youtube`, `prop-types`, and any other dependencies defined in `package.json`.
- Locate Backend Directory: Ensure you are in the backend directory (e.g., `cd backend` from the root project folder).
- Create `.env` file: Create a file named exactly `.env`.
- Add API Keys and Secrets: Open the `.env` file and add the following lines, replacing the placeholder values with your actual keys and desired settings:

      # --- Backend API Keys ---
      # Get from ElevenLabs website
      ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
      # Get from Google AI Studio (for Gemini Models)
      GOOGLE_API_KEY="YOUR_GOOGLE_GEMINI_API_KEY"
      # Get from Google Cloud Console (Enabled for Directions API)
      MAPS_API_KEY="YOUR_Maps_API_KEY"

      # --- Flask Server Settings ---
      # Used for session security, generate a random string
      FLASK_SECRET_KEY="a_very_strong_and_random_secret_key_please_change_me"

      # --- Frontend Settings (for Backend CORS) ---
      # Port the React frontend development server runs on
      REACT_APP_PORT="5173" # Default for Vite. Use 3000 for Create React App, or your custom port.

Important:
- Never commit your `.env` file to Git. Add `.env` to your `.gitignore` file in the backend directory.
- Ensure the `MAPS_API_KEY` corresponds to a Google Cloud project where the Directions API is enabled.
- Ensure the `GOOGLE_API_KEY` is for Google Gemini models (available via Google AI Studio).
- Generate a truly random and strong `FLASK_SECRET_KEY`.
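The points above can be checked mechanically. The sketch below generates a random value suitable for `FLASK_SECRET_KEY` using the standard `secrets` module and reports which required settings are empty or still hold a placeholder. It reads from `os.environ`, so it assumes the `.env` file has already been loaded into the environment (for example with `load_dotenv()` from `python-dotenv`). The names `REQUIRED_KEYS`, `is_unset`, `missing_settings`, and `generate_secret_key` are hypothetical.

```python
import os
import secrets

# The four secrets from the .env template; REACT_APP_PORT is left out because it has a default.
REQUIRED_KEYS = [
    "ELEVENLABS_API_KEY",
    "GOOGLE_API_KEY",
    "MAPS_API_KEY",
    "FLASK_SECRET_KEY",
]


def is_unset(value):
    """Treat empty values and the template's placeholder values as not configured."""
    stripped = value.strip()
    return (
        not stripped
        or stripped.startswith("YOUR_")
        or stripped.endswith("please_change_me")
    )


def missing_settings(env=None):
    """Return the required keys that are absent or still placeholders."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if is_unset(env.get(key, ""))]


def generate_secret_key():
    """Return 32 random bytes as a 64-character hex string."""
    return secrets.token_hex(32)


if __name__ == "__main__":
    print("Suggested FLASK_SECRET_KEY:", generate_secret_key())
    print("Still to configure:", missing_settings() or "nothing")
```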
You will need two separate terminals open: one for the backend and one for the frontend.
- Start the Backend Server:
  - Open Terminal 1.
  - Navigate to the `backend` directory.
  - Activate the Python virtual environment: `source venv/bin/activate` (or `venv\Scripts\activate` on Windows).
  - Run the Flask application:

        python app.py

  - Wait for output indicating the server is running (e.g., `* Running on http://0.0.0.0:5000` and WebSocket server started messages). Leave this terminal running.
- Start the Frontend Development Server:
  - Open Terminal 2.
  - Navigate to the `frontend` directory.
  - Run the start script (use the command appropriate for your project setup):

        npm run dev # If using Vite (likely based on main.jsx structure)
        # OR
        # npm start # If using Create React App

  - Depending on your setup, this may open the application in your default web browser automatically. If it does not, open `http://localhost:5173` yourself (or use the port specified in `REACT_APP_PORT` if configured differently).
- Use the Application:
  - Interact with the interface in your browser. Grant microphone and webcam permissions when prompted if you wish to use those features.
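If the page loads but Ada never responds, it can help to confirm that both servers are actually accepting connections. The following sketch tries a plain TCP connection to each port. Ports 5000 and 5173 are taken from the examples above and may differ in your setup, and `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("backend", 5000), ("frontend", 5173)):
        state = "up" if is_port_open(port=port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```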
- Stop the Frontend Server: Go to Terminal 2 (where the frontend is running) and press `Ctrl + C`. Confirm if prompted.
- Stop the Backend Server: Go to Terminal 1 (where the backend is running) and press `Ctrl + C`.
- Deactivate Virtual Environment (Optional): In Terminal 1, you can type `deactivate`.