Real-time Speech To Text using Faster Whisper.
- Real-time Speech to Text Conversion: Converts spoken language into written text in real-time.
- Microphone Support: Utilizes the system’s default microphone for audio input.
- Background Listening: Continuously listens to audio input in the background.
- Transcription: Transcribes the recorded audio into text.
- Stopping Mechanism: Provides an option to stop the transcription process at any time.
- Retrieving Transcription: Allows retrieval of the last transcribed text.
- Thread Safety: Ensures safe concurrent execution with multiple threads.
- Logging: Logs important events and messages for debugging and tracking.
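The background-listening, retrieval, and thread-safety features listed above can be illustrated with a small sketch. This is not the project's `STT` class: `BackgroundWorker` and its `source` argument are hypothetical stand-ins, and the sketch only assumes that a worker thread writes the latest text while callers read it under a lock.

```python
import threading


class BackgroundWorker:
    """Hypothetical stand-in for a background listener (not the project's STT class)."""

    def __init__(self, source):
        self._source = iter(source)      # any iterable of text chunks (assumption)
        self._lock = threading.Lock()    # guards access to _last
        self._stop = threading.Event()   # tells the worker thread to exit
        self._last = ""
        self._thread = None

    @property
    def is_listening(self):
        # True while the worker thread is still running
        return self._thread is not None and self._thread.is_alive()

    def listen(self):
        # Start consuming the source in a daemon thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        for text in self._source:
            if self._stop.is_set():
                break
            with self._lock:
                self._last = text

    def get_last_transcription(self):
        # Read the most recent text under the same lock the writer uses
        with self._lock:
            return self._last

    def stop(self):
        # Signal the worker and wait for it to finish (call from another thread)
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

The lock keeps reads and writes of the last transcription from interleaving, and the event gives callers a way to stop the worker at any time.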
Demo: a 13-second AI-generated audio file (stt.mp4)
[0.00s -> 4.56s] A golden sunrise painted the sky, casting a warm glow on the quiet town below.
[5.44s -> 8.32s] The aroma of freshly baked bread wafted through the air.
[9.28s -> 13.52s] The town was waking up, ready to embrace a new day full of possibilities.
You said: The golden sunrise painted the sky, casting a warm glow on the quiet town below.
You said: the aroma of freshly baked bread wafted through the air.
You said: The town was waking up, ready to embrace a new day full of possibilities.
See the demo folder for the audio files and results.
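The timestamped lines in the demo output can be produced roughly as follows. This is a sketch, not the project's code: `format_segment` and `transcribe_file` are hypothetical helper names, and the defaults (`base.en`, CPU, `int8`) are assumptions. It relies on faster-whisper's `WhisperModel.transcribe`, which yields segments with `start`, `end`, and `text` attributes.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Format one segment as '[0.00s -> 4.56s] text'."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text.strip())


def transcribe_file(path, model_size="base.en", device="cpu", compute_type="int8"):
    """Transcribe an audio file and return one formatted line per segment."""
    # Imported lazily so format_segment works without faster-whisper installed
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path, language="en")
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `transcribe_file` on one of the audio files in the demo folder should return lines in the same shape as the timestamped output above.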
Install Real-time STT manually
- Python 3.7-3.9 (tested on 3.8)
- CUDA 11.8
- CUDA Toolkit 12
git clone https://github.com/rudymohammadbali/Real-time-STT.git
cd Real-time-STT
pip install -r requirements.txt
Install PyTorch (CUDA 11.8 or CPU-only build)
# CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# CPU only
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cpu
Download and install CUDA Toolkit 12 from: https://developer.nvidia.com/cuda-downloads
Check FasterWhisper for more info: https://github.com/SYSTRAN/faster-whisper
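After installing, it can help to confirm which device is usable before constructing the model. The sketch below is an assumption about how one might choose settings, not part of the project: `pick_device` is a hypothetical helper that falls back to CPU with `int8` when PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return a (device, compute_type) pair suited to the current machine."""
    try:
        import torch  # only needed for the CUDA check
    except ImportError:
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(f"device={device} compute_type={compute_type}")
```

The returned values can be passed as the `device` and `compute_type` arguments shown in the usage example.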
import time

# STT is the class provided by this repository; import it from the project's module before running.

try:
    stt = STT(model_size="base.en", device="cuda", compute_type="float16", language="en", logging_level="INFO")
    stt.listen()  # Start listening in the background
    while stt.is_listening:
        last_transcription = stt.get_last_transcription()  # Get the last transcription
        if len(last_transcription) > 0:
            print("You said: ", last_transcription)
            # If the user said 'stop', end the transcription process by calling stt.stop()
            if "stop" in last_transcription.lower():
                stt.stop()
        time.sleep(1)  # Poll once per second
except KeyboardInterrupt:
    pass
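The substring check in the example also fires on words such as "unstoppable". A possible refinement, offered only as a sketch, is to match "stop" as a whole word; `is_stop_command` is a hypothetical helper and not part of the project.

```python
import re

# Match "stop" only as a standalone word, ignoring case
STOP_PATTERN = re.compile(r"\bstop\b", re.IGNORECASE)


def is_stop_command(text: str) -> bool:
    """Return True if the transcription contains the word 'stop'."""
    return STOP_PATTERN.search(text) is not None
```

In the loop above, `if is_stop_command(last_transcription):` could replace the `"stop" in last_transcription.lower()` test.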
Contributions are always welcome!
- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features
If you want to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
If you'd like to support my ongoing efforts in sharing fantastic open-source projects, you can contribute by making a donation via PayPal.