Bolt

This browser-based project lets the Whisper model transcribe incoming audio in real time.

Translation is not supported yet (it may be added later).

OpenAI's Whisper

[Blog] [Paper] [Model card] [Colab example]

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

Approach

Use the WebRTC API to transmit audio data to the backend in real time.
Extend the model so that it caches its previous outputs, which avoids duplicate computation.
Make real-time transcription happen.
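
The caching idea in the second step can be illustrated with a minimal sketch. This is not the backend's actual implementation: `CachedTranscriber`, `transcribe_fn` and `segment_len` are hypothetical names, and `transcribe_fn` stands in for a call into the model. The assumption is that audio arrives in small chunks, that a full segment is transcribed once and its text kept, and that only the unfinished tail is transcribed again on each update.

```python
from typing import Callable, List, Sequence


class CachedTranscriber:
    """Keeps the text of completed segments so they are never transcribed twice."""

    def __init__(self, transcribe_fn: Callable[[Sequence[float]], str], segment_len: int):
        self.transcribe_fn = transcribe_fn  # hypothetical stand-in for the model call
        self.segment_len = segment_len      # samples per finalized segment (assumption)
        self.finalized: List[str] = []      # cached text of completed segments
        self.pending: List[float] = []      # samples that do not yet fill a segment

    def feed(self, samples: Sequence[float]) -> str:
        """Add new samples and return the full transcript so far."""
        self.pending.extend(samples)
        # Every full segment is transcribed exactly once and then cached.
        while len(self.pending) >= self.segment_len:
            segment = self.pending[: self.segment_len]
            self.pending = self.pending[self.segment_len:]
            self.finalized.append(self.transcribe_fn(segment))
        # Only the unfinished tail is transcribed again on every call.
        tail = self.transcribe_fn(self.pending) if self.pending else ""
        parts = self.finalized + ([tail] if tail else [])
        return " ".join(parts)
```

Cutting segments at fixed sample counts is the simplest possible choice here; a real system would more likely cut at pauses so that words are not split across segments.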

Setup

Install the dependencies listed in requirements.txt with pip. ffmpeg is not needed.

# in backend
pip3 install -r requirements.txt

To start

cd frontend && npm run dev
cd ../backend && python3 main.py

TODO:

Dockerfiles for the frontend and backend.
Timestamps for individual words can also be obtained by extracting the timestamp token after each word.
Extracting a timestamp after each token doesn't produce nice results, so I don't know how this will fare with "scriptio continua" languages.
I need ideas for continuous transcription. Please suggest any approach on Twitter: @gslaller.
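
As a rough illustration of the timestamp item above: in Whisper's vocabulary the timestamp tokens are the ids at and above the tokenizer's `timestamp_begin`, and each step corresponds to 0.02 s. The sketch below works on plain integer ids so it runs without the model. `split_by_timestamps` is a hypothetical helper, and the `timestamp_begin` value in the usage note is made up for readability.

```python
from typing import List, Sequence, Tuple

TIME_PRECISION = 0.02  # seconds represented by one timestamp-token step


def split_by_timestamps(
    tokens: Sequence[int], timestamp_begin: int
) -> List[Tuple[List[int], float]]:
    """Pair each run of text-token ids with the timestamp token that follows it.

    Returns (text_token_ids, end_time_in_seconds) tuples. Timestamp tokens that
    are not preceded by text are skipped, and trailing text without a closing
    timestamp is dropped.
    """
    groups: List[Tuple[List[int], float]] = []
    current: List[int] = []
    for tok in tokens:
        if tok >= timestamp_begin:
            # A timestamp token closes the run of text tokens before it.
            if current:
                seconds = round((tok - timestamp_begin) * TIME_PRECISION, 2)
                groups.append((current, seconds))
                current = []
        else:
            current.append(tok)
    return groups
```

For example, with a made-up `timestamp_begin` of 1000, `split_by_timestamps([1000, 100, 101, 1010, 102, 1025], 1000)` yields two groups ending at 0.2 s and 0.5 s. With the real tokenizer, the text ids of each group would still have to be decoded back to a string.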

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

more information about the models
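
To make the tradeoff in the table concrete, here is a small sketch that picks the largest model fitting a given amount of VRAM. `pick_model` is a hypothetical helper, not part of Whisper or of this project. The figures are the approximate ones from the table, and the rule that an English-only request never returns `large` is a design choice made for this example.

```python
from typing import Optional

# (size, approximate required VRAM in GB, has an English-only variant),
# ordered from smallest to largest, taken from the table above.
MODELS = [
    ("tiny", 1, True),
    ("base", 1, True),
    ("small", 2, True),
    ("medium", 5, True),
    ("large", 10, False),
]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the name of the largest model that fits, or None if none fits."""
    best = None
    for size, required_gb, has_en in MODELS:
        if required_gb > vram_gb:
            continue  # does not fit in the available memory
        if english_only and not has_en:
            continue  # `large` has no English-only variant
        best = f"{size}.en" if english_only else size
    return best
```

The returned name is in the form that the `whisper` package's `load_model` accepts, for example `whisper.load_model(pick_model(4))`, assuming the package is installed.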
