Voice Assistant App

This is a voice assistant application that listens for voice commands, transcribes the audio using Whisper, and executes various actions based on the transcribed text. The app uses a browser automation class to interact with the OpenAI GPT-4 model for generating responses to user prompts.

Features

Listens for voice commands continuously
Transcribes audio using the Whisper model
Executes external commands based on the transcribed text
Interacts with the OpenAI GPT-4 model for generating responses
Displays the generated responses in a popup window

Installation

Clone the repository: git clone https://github.com/luishacm/voice-assistant-app.git
Install the required dependencies: pip install -r requirements.txt
Set up the necessary configurations:
- Set the profile_path variable in browser.py to the path of your Selenium profile directory.
- Adjust the seconds_to_command and silence_threshold variables in app.py according to your preferences.
Run the application: python app.py

Usage

Launch the application by running app.py.
The app will start listening for voice commands.
Speak a command that includes the keyword "luma" to activate the voice assistant.
The app will transcribe the audio and execute the corresponding action based on the transcribed text.
If the command requires interaction with the OpenAI GPT-4 model, the app will send the prompt to the model and display the generated response in a popup window.
To stop the application, say a command that includes the keyword "desligar" or "luma".

File Structure

app.py: The main entry point of the application. It handles audio recording, transcription, and command execution.
browser.py: Contains the Browser class responsible for browser automation and interaction with the OpenAI GPT-4 model.
external_commands.py: Defines the ExternalCommands class, which contains methods for executing various external commands based on the transcribed text.
popup_window.py: Implements the popup window functionality for displaying the generated responses.

Dependencies

faster_whisper: Whisper model for audio transcription
sounddevice: Audio input/output library
soundfile: Audio file I/O library
numpy: Numerical computing library
winsound: Windows-specific sound playback library
selenium: Browser automation library
undetected_chromedriver: Undetected Chrome WebDriver for Selenium
psutil: Process and system monitoring library
pygetwindow: Library for retrieving window information
pyautogui: GUI automation library
tkinter: Standard Python GUI library

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
app.py		app.py
browser.py		browser.py
external_commands.py		external_commands.py
license		license
popup_window.py		popup_window.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Voice Assistant App

Features

Installation

Usage

File Structure

Dependencies

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

luishacm/VoicePersonalAssistantLLM

Folders and files

Latest commit

History

Repository files navigation

Voice Assistant App

Features

Installation

Usage

File Structure

Dependencies

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages