This is a Python script that uses Mozilla's DeepSpeech library to perform speech-to-text transcription. It captures audio from the microphone and converts it to text using the DeepSpeech model.
The text obtained from the speech-to-text transcription is then used as input to an OpenAI GPT-3 chatbot. The chatbot generates a response based on the input text, which is then transformed into audio using the Google Text-to-Speech API.
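The flow described above (speech in, chatbot reply, speech out) can be sketched as a single function. This is only an illustrative outline: run_turn and its three callable parameters are hypothetical names, not taken from this project's code, and each callable stands in for the DeepSpeech, GPT-3, and gTTS steps respectively.

```python
from typing import Callable


def run_turn(
    transcribe: Callable[[], str],
    chat: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one listen -> reply -> speak cycle and return the reply text."""
    heard = transcribe()      # speech-to-text step (DeepSpeech in this project)
    if not heard.strip():
        return ""             # nothing recognised, so skip the chatbot call
    reply = chat(heard)       # chatbot step (GPT-3 in this project)
    speak(reply)              # text-to-speech step (gTTS in this project)
    return reply
```

Passing the three steps in as callables keeps the sketch independent of any particular library, so each stage can be swapped or stubbed out while testing.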
Make sure you have Python 3 installed on your system. Python 3.9 is the newest supported version.
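If you are unsure which interpreter you are running, a small check like the one below may help. It is a minimal sketch: python_version_supported and MAX_SUPPORTED are hypothetical names introduced here, and the upper bound is simply the 3.9 limit stated above.

```python
import sys

MAX_SUPPORTED = (3, 9)  # upper bound stated in this README (assumption: no lower bound beyond 3.0)


def python_version_supported(version=None) -> bool:
    """Return True for Python 3.x up to and including 3.9."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and (major, minor) <= MAX_SUPPORTED
```

Calling python_version_supported() with no argument checks the interpreter that is currently running.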
To run this project, you need to install the following dependencies:
pip install deepspeech==0.9.3 numpy pyaudio openai python-dotenv gtts pygame
The deepspeech library requires installation of the pre-trained models in addition to the Python package. Follow the steps below to install the deepspeech library:
- Download the pre-trained models and scorer from the DeepSpeech releases page.
- Place the contents in a folder named models in the project root.
- Modify the self.model variable in the speech_to_text.py script to point to the pre-trained model and scorer files:
self.model = deepspeech.Model("models/deepspeech-0.9.3-models.pbmm")
self.model.enableExternalScorer("models/deepspeech-0.9.3-models.scorer")
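Before starting the script, it can be useful to confirm that both files are where the snippet above expects them. The helper below is a hedged sketch using only the standard library; missing_model_files and REQUIRED_FILES are hypothetical names, and the file names are copied from the snippet above.

```python
from pathlib import Path

# File names copied from the snippet above.
REQUIRED_FILES = (
    "deepspeech-0.9.3-models.pbmm",
    "deepspeech-0.9.3-models.scorer",
)


def missing_model_files(models_dir="models"):
    """Return the required file names that are not present in models_dir."""
    folder = Path(models_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means both files were found; otherwise the returned names show what still needs to be downloaded.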
This project uses OpenAI's GPT-3 API to power the chatbot functionality. To use this feature, you'll need to sign up for an API key and set it as an environment variable. Follow these steps:
- Sign up for an API key at https://beta.openai.com/signup/.
- Create a new file in the project root called .env.
- In the .env file, add the following line with your API key:
API_KEY=your_api_key_here
- Save the .env file.
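To confirm that the key can actually be read from the file, a quick check along these lines may help. This is a standard-library stand-in for illustration only: the project lists python-dotenv as a dependency, and that package is presumably what the script itself uses to load the file. read_env_value is a hypothetical helper name.

```python
from pathlib import Path


def read_env_value(key, env_path=".env"):
    """Return the value for key in a KEY=VALUE file, or None if absent."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes; empty means unset.
            return value.strip().strip('"').strip("'") or None
    return None
```

For example, read_env_value("API_KEY") returns None when the .env file is missing or the key has no value, which is a quick way to spot a misconfigured setup before calling the API.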
To run the script, simply execute the main.py file:
python main.py
The script will capture audio from your microphone and convert it to text using the DeepSpeech model. The resulting text will be printed to the console.
- Allow the user to stop TTS audio playback manually without exiting the application
- Add other types of STT and allow the user to switch between them
If you encounter any issues while installing or running the script, make sure to check the following:
- Make sure you have installed all the required dependencies, including the specific versions mentioned above.
- Make sure you have Python 3 installed on your system (version 3.9 or earlier).
- Make sure your microphone is connected and working properly.
- If you encounter any issues with PyAudio, try running the script with administrator privileges or installing PyAudio using a different method (e.g., conda or homebrew).
- If you encounter any issues with DeepSpeech, try downloading the latest version of the pre-trained model files from the DeepSpeech releases page on GitHub and modifying the script to use the new files.
- If the audio from the microphone takes a long time to convert to text, or some of the speech is cut off, check whether another USB device, such as a webcam, is causing interference.