This is a voice assistant application that listens for voice commands, transcribes the audio using Whisper, and executes various actions based on the transcribed text. The app uses a browser automation class to interact with the OpenAI GPT-4 model for generating responses to user prompts.
- Listens for voice commands continuously
- Transcribes audio using the Whisper model
- Executes external commands based on the transcribed text
- Interacts with the OpenAI GPT-4 model for generating responses
- Displays the generated responses in a popup window
- Clone the repository:
  `git clone https://github.com/luishacm/voice-assistant-app.git`
- Install the required dependencies:
  `pip install -r requirements.txt`
- Set up the necessary configurations:
  - Set the `profile_path` variable in `browser.py` to the path of your Selenium profile directory.
  - Adjust the `seconds_to_command` and `silence_threshold` variables in `app.py` according to your preferences.
- Run the application:
  `python app.py`
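The configuration step does not say what `seconds_to_command` and `silence_threshold` measure. The sketch below shows one plausible reading, assuming `silence_threshold` is an RMS amplitude level and `seconds_to_command` is the length of the recording window in seconds. The values and the helper names (`rms`, `is_silent`, `frames_to_record`) are hypothetical and are not taken from `app.py`.

```python
import math

# Hypothetical values; the real defaults live in app.py and may differ.
seconds_to_command = 5      # assumed: length of the recording window, in seconds
silence_threshold = 0.01    # assumed: RMS level below which a chunk counts as silence


def rms(samples):
    """Return the root-mean-square amplitude of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples, threshold=silence_threshold):
    """Treat a chunk as silence when its RMS amplitude is below the threshold."""
    return rms(samples) < threshold


def frames_to_record(sample_rate, seconds=seconds_to_command):
    """Number of audio frames covered by the recording window."""
    return int(sample_rate * seconds)
```

Under this reading, raising `silence_threshold` makes the app ignore more background noise, and raising `seconds_to_command` gives you more time to finish speaking.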
- Launch the application by running `app.py`.
- The app will start listening for voice commands.
- Speak a command that includes the keyword "luma" to activate the voice assistant.
- The app will transcribe the audio and execute the corresponding action based on the transcribed text.
- If the command requires interaction with the OpenAI GPT-4 model, the app will send the prompt to the model and display the generated response in a popup window.
- To stop the application, say a command that includes the keyword "desligar" (Portuguese for "turn off") or "luma".
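The keyword handling described in the steps above could be sketched roughly as follows. This is an illustration only: the function name `route_command` and its return labels are hypothetical. It also assumes that "desligar" takes precedence over "luma" when both appear, which the steps above do not specify and which may differ from what `app.py` does.

```python
WAKE_WORD = "luma"        # activation keyword from the usage notes
STOP_WORD = "desligar"    # stop keyword from the usage notes


def route_command(transcript):
    """Classify a transcript as 'stop', 'command', or 'ignore' (hypothetical labels)."""
    words = transcript.lower().split()
    # Strip surrounding punctuation so "Luma," still matches the wake word.
    words = [w.strip(".,!?;:\"'") for w in words]
    if STOP_WORD in words:
        return "stop"
    if WAKE_WORD in words:
        return "command"
    return "ignore"
```

Matching on whole words rather than substrings avoids triggering on longer words that merely contain a keyword.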
- `app.py`: The main entry point of the application. It handles audio recording, transcription, and command execution.
- `browser.py`: Contains the `Browser` class responsible for browser automation and interaction with the OpenAI GPT-4 model.
- `external_commands.py`: Defines the `ExternalCommands` class, which contains methods for executing various external commands based on the transcribed text.
- `popup_window.py`: Implements the popup window functionality for displaying the generated responses.
- `faster_whisper`: Whisper model for audio transcription
- `sounddevice`: Audio input/output library
- `soundfile`: Audio file I/O library
- `numpy`: Numerical computing library
- `winsound`: Windows-specific sound playback library
- `selenium`: Browser automation library
- `undetected_chromedriver`: Undetected Chrome WebDriver for Selenium
- `psutil`: Process and system monitoring library
- `pygetwindow`: Library for retrieving window information
- `pyautogui`: GUI automation library
- `tkinter`: Standard Python GUI library
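As a rough illustration of how the `faster_whisper` dependency is typically called, the sketch below transcribes a file and joins the resulting segments into a single string. The helper names (`segments_to_text`, `transcribe_file`), the `"base"` model size, and the CPU/int8 settings are assumptions made for this example and may not match what `app.py` uses.

```python
from types import SimpleNamespace


def segments_to_text(segments):
    """Join the .text of each transcription segment into one string."""
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_file(path, model_size="base"):
    """Transcribe an audio file; requires faster_whisper to be installed."""
    # Imported lazily so the helper above works without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return segments_to_text(segments)


# Stand-in segments, shaped like the objects faster_whisper yields.
fake = [SimpleNamespace(text=" luma "), SimpleNamespace(text="open the browser")]
print(segments_to_text(fake))
```

The stand-in segments let you try the joining logic without downloading a model; `transcribe_file` is only needed once real audio is available.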
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.