Real-time STT

Real-time Speech To Text using Faster Whisper.

Features

Real-time Speech to Text Conversion: Converts spoken language into written text in real time.
Microphone Support: Utilizes the system’s default microphone for audio input.
Background Listening: Continuously listens to audio input in the background.
Transcription: Transcribes the recorded audio into text.
Stopping Mechanism: Provides an option to stop the transcription process at any time.
Retrieving Transcription: Allows retrieval of the last transcribed text.
Thread Safety: Ensures safe concurrent execution with multiple threads.
Logging: Logs important events and messages for debugging and tracking.
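The background-listening, retrieval, and thread-safety features can be pictured with a small sketch. This is not the project's actual implementation: `TranscriptBuffer` and `fake_worker` are hypothetical names, and the clear-on-read behaviour of `pop()` is an assumption chosen for illustration. The sketch only shows one common way to share the latest transcription between a worker thread and the main thread using a lock and a stop event.

```python
import threading


class TranscriptBuffer:
    """Hypothetical helper: holds the most recent transcription behind a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = ""

    def set(self, text):
        # Called from the background worker thread.
        with self._lock:
            self._last = text

    def pop(self):
        # Called from the main thread; returns the text once, then clears it.
        with self._lock:
            text, self._last = self._last, ""
            return text


def fake_worker(buffer, stop_event):
    # Stand-in for the real record-and-transcribe loop.
    count = 0
    while True:
        count += 1
        buffer.set(f"utterance {count}")
        # wait() returns True as soon as a stop has been requested.
        if stop_event.wait(timeout=0.01):
            break


buffer = TranscriptBuffer()
stop_event = threading.Event()
worker = threading.Thread(target=fake_worker, args=(buffer, stop_event), daemon=True)
worker.start()

stop_event.set()   # Request a stop, analogous to calling stt.stop()
worker.join()      # Wait for the background thread to finish

latest = buffer.pop()
print(latest)
```

Clearing the buffer on read means a polling loop does not print the same sentence twice; whether the real `get_last_transcription()` behaves this way is not stated in this README.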

Demo

A 13-second audio file generated by AI.

stt.mp4

Using audio file:

[0.00s -> 4.56s] A golden sunrise painted the sky, casting a warm glow on the quiet town below.

[5.44s -> 8.32s] The aroma of freshly baked bread wafted through the air.

[9.28s -> 13.52s] The town was waking up, ready to embrace a new day full of possibilities.
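Timestamped output like the lines above can be produced directly with faster-whisper. The following is a minimal sketch, not the code used for this demo: `format_segment` and `transcribe_file` are hypothetical helper names, and the model size, device, and compute type are assumptions. It relies on the `WhisperModel` class and its `transcribe()` method from faster-whisper, whose segments expose `start`, `end`, and `text`.

```python
def format_segment(start, end, text):
    # Render one segment as "[start -> end] text", with times in seconds.
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_file(path, model_size="base.en", device="cpu", compute_type="int8"):
    # Imported lazily so format_segment can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path)
    # `segments` is a generator; transcription runs as it is consumed.
    return [format_segment(s.start, s.end, s.text) for s in segments]


print(format_segment(0.0, 4.56, " A golden sunrise painted the sky."))
```

Calling `transcribe_file` with the path to an audio file returns one formatted line per segment.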

Real-time transcription:

You said:  The golden sunrise painted the sky, casting a warm glow on the quiet town below.

You said:  the aroma of freshly baked bread wafted through the air.

You said:  The town was waking up, ready to embrace a new day full of possibilities.

Check the demo folder for the audio files and results.

Installation

Install Real-time STT manually

Requirements:

Python 3.7-3.9 (tested on 3.8)
PyTorch built for CUDA 11.8 (or the CPU-only build)
CUDA Toolkit 12 (for GPU inference)

  git clone https://github.com/rudymohammadbali/Real-time-STT.git
  cd Real-time-STT
  pip install -r requirements.txt

Install PyTorch (CUDA 11.8 or CPU-only build)

# CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# CPU only
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cpu

Download and install CUDA Toolkit 12 from: https://developer.nvidia.com/cuda-downloads

See the faster-whisper repository for more information: https://github.com/SYSTRAN/faster-whisper
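After installation, it can help to check whether PyTorch sees a GPU before choosing the `device` and `compute_type` arguments. The sketch below is an assumption-laden convenience, not part of this project: `pick_device` is a hypothetical helper, and the CPU fallback of `"int8"` is a common faster-whisper choice rather than a setting documented here. Note that a positive result from PyTorch does not guarantee that faster-whisper's own backend can use the GPU.

```python
def pick_device():
    """Hypothetical helper: return a (device, compute_type) pair."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; fall back to CPU settings.
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


device, compute_type = pick_device()
print(device, compute_type)
```

The returned pair can then be passed as the `device` and `compute_type` arguments shown in the usage example.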

Usage/Examples

import time

from main import STT  # Assumption: the STT class is defined in main.py at the repository root

try:
    stt = STT(model_size="base.en", device="cuda", compute_type="float16", language="en", logging_level="INFO")
    stt.listen()  # Start listening in the background

    while stt.is_listening:
        last_transcription = stt.get_last_transcription()  # Get the last transcription
        if len(last_transcription) > 0:
            print("You said: ", last_transcription)
            # If the user said 'stop', end the transcription process by calling stt.stop()
            if "stop" in last_transcription.lower():
                stt.stop()

        time.sleep(1)  # Poll once per second

except KeyboardInterrupt:
    pass
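The substring check `"stop" in last_transcription.lower()` also matches words such as "unstoppable". If that matters, a whole-word match is one possible refinement. This is a sketch only: `is_stop_command` is a hypothetical helper and is not part of the STT class.

```python
import re

# Match "stop" only as a whole word, ignoring case.
STOP_PATTERN = re.compile(r"\bstop\b", re.IGNORECASE)


def is_stop_command(text):
    # True when the transcription contains the standalone word "stop".
    return STOP_PATTERN.search(text) is not None


print(is_stop_command("Please stop."))
print(is_stop_command("It was unstoppable."))
```

In the loop, the condition would become `if is_stop_command(last_transcription):`.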

Contributing

Contributions are always welcome! Ways to contribute include:

Reporting a bug
Discussing the current state of the code
Submitting a fix
Proposing new features

If you want to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Support

If you'd like to support my ongoing efforts in sharing fantastic open-source projects, you can contribute by making a donation via PayPal.
