A speech transcription and translation application using the Whisper AI model.
- Speech to text
- Translation of transcribed text (Speech to translated text)
- Real-time input from microphone and speaker
- Batch file processing with timestamps
- Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If no compatible GPU is available, the application will fall back to the CPU, which may be slower. For each model's requirements, check the Whisper repository directly, or hover over the model selection in the app (a tooltip shows the model info).
- Speaker input only works on Windows 8 and above.
- Download the latest release here
- Install
- Run the program
- Set user settings
- Select model
- Select mode and language
- Click the record button
- Stop recording
- (Optionally) export the result to a file
You can change the settings by clicking the settings button in the menubar of the app. Alternatively, press F2 to open the settings window, or edit the settings file manually at ./setting/setting.json
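If you prefer to edit the settings file from a script, it is plain JSON. A minimal sketch, assuming the file exists and is valid JSON (the "theme" key used in the example is purely illustrative, not a documented setting name):

```python
import json
from pathlib import Path

SETTING_PATH = Path("./setting/setting.json")

def update_setting(key, value, path=SETTING_PATH):
    """Load the settings file, change one key, and write it back."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    settings[key] = value
    path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings
```

For example, `update_setting("theme", "dark")` would rewrite the file with that one key changed (again, the key name here is just for illustration).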
Warning
As of right now (4 November 2022), PyTorch is not compatible with Python 3.11, so you cannot use Python 3.11. I tried 3.11 but it did not work, so I rolled back to Python 3.10.8.
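A quick way to check that your interpreter predates 3.11 before installing dependencies (a small sketch, not part of the project):

```python
import sys

def python_version_ok(max_exclusive=(3, 11)):
    """Return True if the running interpreter is older than the given version."""
    return sys.version_info[:2] < max_exclusive

# Example: warn early rather than hit a PyTorch install failure later.
if not python_version_ok():
    print("Python 3.11+ detected; PyTorch (as of Nov 2022) needs 3.10 or older")
```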
Note
It is recommended (but not required) to create a virtual environment. For OSes other than Windows, you can install the packages from requirements_notwindows.txt
The master branch might not always be stable, so you can check out the latest release tag to get the latest stable version.
- Create your virtual environment by running
python -m venv venv
- Activate your virtual environment by running
source venv/bin/activate
- Install all the dependencies by running
devSetup.py
located in the root directory, or install the packages yourself from requirements.txt by running
pip install -r requirements.txt
- Go to the root directory and run the script by typing
python Main.py
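The steps above can be summarized as follows (Unix shells; on Windows, activate the environment with venv\Scripts\activate instead of the source command):

```shell
# Create a virtual environment, activate it, install dependencies, and run
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt   # or: python devSetup.py
python Main.py
```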
Whisper needs ffmpeg to work. You can install it and add it to your PATH manually, or do it easily by running one of the following commands:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
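To verify that ffmpeg is installed and reachable on your PATH, a small check like this works (a sketch, not part of the project):

```python
import shutil

def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found" if ffmpeg_available() else "ffmpeg NOT found - install it and add it to PATH")
```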
Note
This process could be handled automatically by running devSetup.py
To use the GPU, you first need to uninstall torch, then go to the official PyTorch website to install the correct version of PyTorch with GPU compatibility for your system.
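After reinstalling, you can confirm that the GPU build is actually being picked up (a minimal sketch; it returns False both when torch is missing and when only the CPU build is installed):

```python
import importlib.util

def torch_cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()
```

If this returns False on a machine with a CUDA-compatible GPU, the CPU-only build of torch is most likely still installed.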
You can use pyinstaller, or auto-py-to-exe if you prefer a graphical interface.
-
If you use pyinstaller you can load the spec file by running
pyinstaller ./build.spec
to build the project. Alternatively, you can run the build command directly from the root directory like this:
pyinstaller --noconfirm --onedir --console --icon "./assets/icon.ico" --name "Speech Translate" --clean --add-data "./assets;assets/" --copy-metadata "tqdm" --copy-metadata "regex" --copy-metadata "requests" --copy-metadata "packaging" --copy-metadata "filelock" --copy-metadata "numpy" --copy-metadata "tokenizers" --add-data "./venv/Lib/site-packages/whisper/assets;whisper/assets/" "./Main.py"
This will produce an executable file in the
dist
directory. Note: replace venv in the command with your actual virtual environment path.
-
If you use auto-py-to-exe, you can load the build.json file located in the root directory. You will need to replace the dot (.) in the build.json file with the actual path of the project. This will produce an executable file in the
output
directory.
You should be able to compile it on other platforms (macOS/Linux), but I have only tested it on Windows.
This project should be compatible with Windows (preferably Windows 10 or later) and other platforms, but I have not tested it on platforms other than Windows.
Feel free to contribute to this project by forking the repository, making your changes, and submitting a pull request. You can also contribute by creating an issue if you find a bug or have a feature request. Also, feel free to give this project a star if you like it.
This project is licensed under the MIT License - see the LICENSE file for details
Check out my other similar project, Screen Translate, a screen translator / OCR tool made possible using Tesseract.