A client-server setup for speech recognition using Mozilla Deep Speech

miselaytes-anton/speech-to-text-experiment

Speech to text experiment

A client-server setup for speech recognition using the Mozilla Deep Speech project.

Setup

Get the pre-trained models

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/deepspeech-0.4.1-models.tar.gz
tar xvfz deepspeech-0.4.1-models.tar.gz 

Start the app

npm i
npm run dev:server
npm run dev:client

Now you can open https://localhost:8080 in Chrome with the unsafe localhost flag enabled.

Limitations

  • Based on my tests, it only works well with very simple commands, e.g. left, right, yes, no.
  • We need to use HTTPS for audio input and web sockets. Since we use self-signed certificates, this only works in Chrome with the unsafe localhost flag enabled.
  • It seems that Deep Speech only works well when a single audio file is transcribed at a time, so using multiple clients lowers the quality of recognition.
  • We need to downsample the audio from the browser's capture rate (typically 44100 Hz) to the 16000 Hz the model expects, which lowers the quality of recognition.
  • It looks like the model only works well with native English speakers at the moment.
  • Performance is not that great.
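
To make the downsampling limitation more concrete, here is a minimal sketch of how captured audio could be reduced to 16000 Hz and converted to 16-bit PCM before it is sent to the server. This is not the code used in this repository: the function names `downsample` and `floatTo16BitPcm` are hypothetical, and linear interpolation is just one simple approach, chosen for brevity. It applies no low-pass filter, so it can introduce aliasing, which may be part of the reason recognition quality drops.

```typescript
// Hypothetical helper (not from this repository): resample mono audio with
// linear interpolation. Assumes samples are floats in the range [-1, 1].
function downsample(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate <= 0 || toRate <= 0 || toRate > fromRate) {
    throw new Error("expected 0 < toRate <= fromRate");
  }
  const ratio = fromRate / toRate;
  // Multiply before dividing so that exact rates give an exact length.
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Blend the two nearest input samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Hypothetical helper: convert floats in [-1, 1] to signed 16-bit PCM.
// Out-of-range values are clamped first.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Usage: one second of silence captured at 44100 Hz becomes 16000 samples.
const captured = new Float32Array(44100);
const pcm16k = floatTo16BitPcm(downsample(captured, 44100, 16000));
console.log(pcm16k.length);
```

In a browser, the input rate would normally be read from the `sampleRate` property of the `AudioContext` rather than hard-coded, since some devices capture at 48000 Hz instead.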
