An intelligent, voice-controlled virtual assistant built in Python that can learn new skills on the fly by generating its own code. It operates in both English and Spanish and features a user-friendly GUI for configuration.
- Voice Control: Interact with your PC using natural voice commands in English or Spanish.
- Self-Reprogramming: The assistant can learn new skills by generating, validating, and saving its own Python scripts.
- Multi-language Support: Fully functional UI, voice recognition, and responses in both English and Spanish.
- Core Functions:
- Open and close applications.
- Search on Google, YouTube, and Spotify.
- Control system volume and media playback (play/pause/next/previous).
- Get system status (CPU & RAM usage).
- Take screenshots.
- Persistent Memory: Remembers user-specific facts (e.g., your name, hobbies) across sessions using a local SQLite database.
- User-Friendly Configuration: A settings panel to easily change:
- Google Gemini API Key.
- UI and Voice Language.
- Text-to-Speech (TTS) voice.
- "Run on Startup" behavior for Windows.
- Special Modes: Includes dedicated modes for real-time translation and note-taking.
- System Tray Integration: Can be minimized to the system tray to run unobtrusively in the background.
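The persistent memory feature listed above could look roughly like the sketch below. It is only an illustration using the standard `sqlite3` module: the file name `memory.db`, the `facts` table, and the function names are assumptions, not the project's actual schema.

```python
import sqlite3


def open_memory(path="memory.db"):
    """Open (or create) the facts database. The file name is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def remember(conn, key, value):
    """Store a fact, replacing any earlier value saved under the same key."""
    conn.execute("INSERT OR REPLACE INTO facts (key, value) VALUES (?, ?)", (key, value))
    conn.commit()


def recall(conn, key):
    """Return the stored value, or None if the fact is unknown."""
    row = conn.execute("SELECT value FROM facts WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```

Because the data lives in a file on disk, a fact saved with `remember(conn, "name", "Ana")` in one session can be read back with `recall(conn, "name")` in the next.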
Follow these instructions to get the assistant running on your local machine.
- Python 3.9 or higher.
- `pip` (Python package installer).
- A Google Gemini API Key. You can get one for free from Google AI Studio.

- Clone the repository:
git clone https://github.com/inwoke032/Python-AI-Assistant.git
cd Python-AI-Assistant
- Create and activate a virtual environment (highly recommended):
For Windows:
python -m venv venv
.\venv\Scripts\activate
For macOS/Linux:
python3 -m venv venv
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
- Initial Configuration:
- Run the application for the first time:
python main.py
- Once the application launches, click the ⚙️ Settings button.
- In the settings window, click "Change API Key" and paste your Google Gemini API key.
- Your key will be saved locally in `config.json` and is ready for use.
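For readers curious about what saving the key to `config.json` might involve, here is a minimal sketch. The field name `gemini_api_key` and both helper functions are hypothetical; the real file layout may differ.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Return the saved settings, or an empty dict if no file exists yet."""
    config_file = Path(path)
    if not config_file.exists():
        return {}
    return json.loads(config_file.read_text(encoding="utf-8"))


def save_api_key(api_key, path="config.json"):
    """Store the key under a hypothetical 'gemini_api_key' field, keeping other settings."""
    config = load_config(path)
    config["gemini_api_key"] = api_key.strip()
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Since the key is stored in plain text on your machine, avoid committing `config.json` to version control.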
This guide will walk you through every feature, from the basics to the most advanced capabilities.
To give the assistant a command, you first need to get its attention. There are two simple methods:
- Voice Activation (Wake Word):
- Clearly say the phrase: "Hey Assistant" (for English) or "Oye Asistente" (for Spanish).
- The application's interface will indicate that it is listening. You can then state your command. This is ideal for hands-free operation.
- Manual Activation (Push-to-Talk):
- Click the "🎙️ Speak (PTT)" button on the main window.
- The button will change its state to show that the microphone is active. State your command. This is perfect for noisy environments or when you want full control over when the assistant listens.
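As a rough illustration of wake-word activation, the sketch below checks an already-transcribed phrase for the wake word and returns whatever follows it. It assumes the speech recognizer has produced a text transcript; the `WAKE_WORDS` table and `extract_command` function are hypothetical names, not the project's code.

```python
# Wake phrases taken from this guide, keyed by language code (the keys are an assumption).
WAKE_WORDS = {"en": "hey assistant", "es": "oye asistente"}


def extract_command(transcript, language="en"):
    """Return the lowercased command that follows the wake word, or None if it is absent."""
    wake = WAKE_WORDS[language]
    text = " ".join(transcript.lower().split())  # normalize case and whitespace
    index = text.find(wake)
    if index == -1:
        return None
    return text[index + len(wake):].strip(" ,.!?")
```

An empty string means the wake word was heard with no command after it, so the assistant would keep listening. With push-to-talk, the wake-word check can be skipped and the whole transcript treated as the command.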
The assistant understands natural language, so you don't need to memorize exact phrases. Here is a comprehensive guide to its abilities with varied examples.
| Capability | Description | Example Commands |
|---|---|---|
| Open Programs | Launches any application installed on your PC. | "Open Google Chrome" "Launch Spotify, please" "Run calculator" |
| Close Programs | Terminates a running application's process. | "Close notepad" "Terminate the Spotify process" |
| Perform Calculations | Solves simple mathematical operations. | "What is 125 times 8?" "Calculate 1024 divided by 16" |
| Search Google | Opens your browser to search for anything. | "Search for information about the history of computing" "Google the recipe for lasagna" |
| Search YouTube | Finds and plays a video on YouTube. | "Play a video on YouTube about outer space" "I want to watch a Python tutorial" |
| Search Spotify | Finds music in the Spotify application. | "Play music by Queen on Spotify" "Search for the album 'Midnights'" |
| System Status | Reports the current CPU and RAM usage. | "What is the system status?" "Tell me the PC's performance" |
| Take Screenshot | Saves a full-screen image to the program's folder. | "Take a screenshot" "Capture the screen" |
| Volume Control | Modifies your system's master volume. | "Turn up the volume" "Lower the volume" "Mute" |
| Media Control | Controls playback in media players. | "Pause the music" "Resume playing" "Next song" "Previous song" |
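One way the "Perform Calculations" capability could be handled safely is sketched below. It is an assumption about the approach, not the project's implementation: spoken operator words are mapped to symbols, and the result is evaluated by walking the syntax tree with the standard `ast` module rather than calling `eval`. All names here are hypothetical.

```python
import ast
import operator

# Hypothetical mapping from spoken words to arithmetic symbols.
SPOKEN_OPERATORS = {"divided by": "/", "times": "*", "plus": "+", "minus": "-"}

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def spoken_to_expression(phrase):
    """Turn a phrase such as '125 times 8' into '125 * 8'."""
    expression = phrase.lower()
    for words, symbol in SPOKEN_OPERATORS.items():
        expression = expression.replace(words, symbol)
    return expression


def _evaluate(node):
    """Evaluate only numbers and basic arithmetic; reject everything else."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def calculate(expression):
    """Parse and evaluate a plain arithmetic expression."""
    return _evaluate(ast.parse(expression, mode="eval").body)
```

For example, `calculate(spoken_to_expression("125 times 8"))` evaluates `125 * 8`, while anything other than plain arithmetic (a function call, for instance) raises `ValueError`.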
If you need the assistant to perform a task it doesn't know, you can teach it!
- Trigger Learning: Use the command "Learn to..." followed by the desired task.
- Code Generation: The assistant uses Google Gemini to write a small Python script that performs the task.
- Security Confirmation: It will show you the generated code and ask for your permission to run it. It is critical to read the code to ensure it is safe before you approve it.
- Execution and Saving: If you approve, the script will run. If it succeeds, it will be saved as a new, permanent skill.
Practical Example:
You: "Hey Assistant, learn to create a text file on the desktop named 'shopping list'."
Assistant: "Understood. I have generated a script for this task. Do you want me to execute it?" (Shows you the code.)
You: "Yes, go ahead."
Assistant: "Done! I have learned the new skill and will remember it for the future."
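The learning flow described above (validate, confirm, run, save) can be pictured with the following sketch. It is a simplified assumption of how such a pipeline might work: the `learn_skill` function, the `approve` callback, and the `skills` folder are hypothetical, and the Gemini call that produces the code is left out.

```python
import ast
import subprocess
import sys
from pathlib import Path


def learn_skill(name, code, approve, skills_dir="skills"):
    """Validate, confirm, run, and save a generated script.

    Returns the saved path, or None if a step fails or the user declines.
    """
    try:
        ast.parse(code)  # syntax check only; it does not prove the code is safe
    except SyntaxError:
        return None
    if not approve(code):  # the user reads the code and decides
        return None
    try:
        result = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, text=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    # Only scripts that ran successfully are kept as permanent skills.
    target = Path(skills_dir) / f"{name}.py"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(code, encoding="utf-8")
    return target
```

Running the script in a separate process keeps a crash from taking down the assistant, but it does not sandbox the code, which is why the manual review step matters.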
The assistant can switch its behavior for specific tasks.
Translator Mode:
- To Activate: "Activate translator mode to French" (or English, German, etc.).
- How it Works: While active, anything you say will be translated into your chosen language. The assistant will speak the translation back to you.
- To Deactivate: "Exit translator mode".
Note-Taking Mode:
- To Activate: "Take a note" or "Write a note".
- How it Works: Everything you say will be saved line-by-line into a `notes.txt` file in the program's folder, timestamped for your convenience.
- To Deactivate: "End note".
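A mode like note-taking can be thought of as a small state machine: one phrase turns it on, another turns it off, and everything in between is captured. The sketch below illustrates that idea under stated assumptions; the `NoteMode` class and the timestamp format are hypothetical, and only the trigger phrases and the `notes.txt` file name come from this guide.

```python
from datetime import datetime
from pathlib import Path


class NoteMode:
    """Tiny state machine for a note-taking mode."""

    START_PHRASES = ("take a note", "write a note")
    END_PHRASE = "end note"

    def __init__(self, notes_path="notes.txt"):
        self.notes_path = Path(notes_path)
        self.active = False

    def handle(self, utterance):
        """Process one utterance; return True if note mode consumed it."""
        text = utterance.strip()
        lowered = text.lower()
        if not self.active:
            if lowered in self.START_PHRASES:
                self.active = True
                return True
            return False  # not a note command; let normal handling continue
        if lowered == self.END_PHRASE:
            self.active = False
            return True
        # While active, append each utterance as one timestamped line.
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        with self.notes_path.open("a", encoding="utf-8") as notes_file:
            notes_file.write(f"[{stamp}] {text}\n")
        return True
```

A translator mode could follow the same pattern, with the "capture" step replaced by a translation request and a spoken reply.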
- Language: Python 3
- GUI: Tkinter
- AI Model API: Google Gemini
- Speech Recognition: `speech_recognition` library
- Text-to-Speech (TTS): `pyttsx3`
- Local Database: SQLite3
- System Automation: `pyautogui`, `psutil`
- Windows Integration: `winshell` (for "Run on Startup")
This project is licensed under the MIT License. See the LICENSE file for details.
