OpenAI's Whisper with WebRTC & "caching" 🙃

gslaller/whisper-webrtc

 
 

Bolt

This browser-based project lets the Whisper model transcribe incoming audio in real time.

Translation is not supported yet (it may be added later).

Screenshot

OpenAI's Whisper

[Blog] [Paper] [Model card] [Colab example]

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

Approach

  1. Use the WebRTC API to transmit audio data to the backend in real time.
  2. Extend the model so that it caches its previous outputs, avoiding duplicate computation.
  3. Make real-time transcription happen.
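The caching idea in step 2 can be illustrated with a small sketch. This is not the repository's implementation: `ChunkCache` and `fake_transcribe` are hypothetical names, the model call is replaced by a placeholder, and the sketch assumes each audio chunk can be processed independently, which a real Whisper pipeline may not allow.

```python
from typing import Callable, Dict, List


class ChunkCache:
    """Memoize an expensive per-chunk computation by chunk index."""

    def __init__(self, compute: Callable[[bytes], str]) -> None:
        self._compute = compute          # hypothetical stand-in for the model call
        self._results: Dict[int, str] = {}
        self.calls = 0                   # how many times the model was really invoked

    def process(self, chunks: List[bytes]) -> List[str]:
        """Return one result per chunk, computing only chunks not seen before."""
        out = []
        for index, chunk in enumerate(chunks):
            if index not in self._results:
                self._results[index] = self._compute(chunk)
                self.calls += 1
            out.append(self._results[index])
        return out


def fake_transcribe(chunk: bytes) -> str:
    # Placeholder: a real backend would run Whisper here.
    return f"<{len(chunk)} bytes>"


cache = ChunkCache(fake_transcribe)
stream: List[bytes] = []
for incoming in (b"\x00" * 320, b"\x00" * 640, b"\x00" * 160):
    stream.append(incoming)              # a new audio chunk arrives from the browser
    transcript = cache.process(stream)   # earlier chunks are served from the cache

print(transcript)
print(cache.calls)
```

Without the cache, reprocessing the growing stream on every arrival would invoke the model 1 + 2 + 3 = 6 times; with it, each chunk is computed once.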

Setup

Install the dependencies listed in requirements.txt with pip. ffmpeg is not needed.

# in backend
pip3 install -r requirements.txt

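To start (run each command in its own terminal, from the repository root, since the frontend dev server keeps running)

cd frontend && npm run dev
cd backend && python3 main.py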

TODO:

  1. Dockerfiles for the frontend and the backend.
  2. Timestamps for individual words could also be obtained by extracting the timestamp token after each word.
  3. Reading the timestamp after each token doesn't produce nice results, so I don't know how this will fare with "scriptio continua" languages (languages written without spaces between words).
  4. I need ideas for continuous transcription. Please send any suggested methodology to Twitter: @gslaller.
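As a rough illustration of TODO item 2, the sketch below pairs each piece of text with the timestamp marker that follows it. It assumes the decoded output contains markers in the textual form `<|0.42|>` after each word; `words_with_end_times` and the sample string are hypothetical, and the model is not guaranteed to emit a marker after every word.

```python
import re
from typing import List, Tuple

# Matches timestamp markers such as "<|1.48|>" (assumed textual form).
TIMESTAMP = re.compile(r"<\|(\d+(?:\.\d+)?)\|>")


def words_with_end_times(decoded: str) -> List[Tuple[str, float]]:
    """Pair each text piece with the timestamp marker that follows it."""
    pairs = []
    position = 0
    for match in TIMESTAMP.finditer(decoded):
        text = decoded[position:match.start()].strip()
        if text:  # skip markers with no text before them, e.g. the opening one
            pairs.append((text, float(match.group(1))))
        position = match.end()
    return pairs


sample = "<|0.00|> Hello<|0.42|> world<|0.90|>"
print(words_with_end_times(sample))
```

Because the text is split at the markers and not at whitespace, the same approach would apply to languages written without spaces, provided the markers themselves are placed sensibly.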

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

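| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |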
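The table above can be turned into a small helper that picks a model for a given memory budget. This is only a sketch: `pick_model` is a hypothetical function, and the VRAM figures are the approximate values from the table, not measured limits. The returned name is the kind of string that the `openai-whisper` package accepts in `whisper.load_model(name)`.

```python
from typing import Optional

# (name, approximate required VRAM in GB), smallest first, from the table above.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits, or None."""
    best = None
    for name, needed in MODELS:
        if needed <= vram_gb:
            best = name
    if best is None:
        return None
    # "large" has no English-only variant, so it keeps its plain name.
    if english_only and best != "large":
        return best + ".en"
    return best


print(pick_model(4))
print(pick_model(6, english_only=True))
```

Falling back to the multilingual `large` model when an English-only model is requested is a design choice of this sketch; returning `medium.en` instead would be equally defensible.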

more information about the models

Languages

  • Python 86.3%
  • Svelte 8.7%
  • TypeScript 3.7%
  • Other 1.3%