DeepSpeech 0.6.1
General
This is the 0.6.1 release of Deep Speech, an open speech-to-text engine. In accord with semantic versioning, this version is not backwards compatible with version 0.5.1 or earlier versions, so updating requires updating both code and models. As with previous releases, this release includes the source code:
and a model
deepspeech-0.6.1-models.tar.gz (This is identical to the 0.6.0 model).
trained on American English which achieves a 7.5% word error rate on the LibriSpeech clean test corpus. Models with the ".pbmm" extension are memory mapped, which makes them much more memory efficient and faster to load. Models with the ".tflite" extension are converted for use with TFLite, have post-training quantization enabled, and are more suitable for resource-constrained environments.
We also include example audio files:
which can be used to test the engine; and checkpoint files
deepspeech-0.6.1-checkpoint.tar.gz (This is identical to the 0.6.0 checkpoint, except the missing alphabet.txt file is now included.)
which can be used as the basis for further fine-tuning.
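For readers comparing their own fine-tuned models against the 7.5% figure quoted above, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that generic metric, not the evaluation code used for this release; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A corpus-level figure is usually obtained by summing the edit distances and the reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.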
Notable changes from the previous release
DeepSpeech 0.6.1 is a patch release that addresses some minor points surfaced after the 0.6.0 release:
- Fixed a bug where silence was incorrectly transcribed as "i", "a" or (rarely) other one-letter transcriptions.
- Fixed a bug where the TFLite version of the model was exported with a mismatched forget_bias setting.
- Fixed some broken links in the documentation and the PyPI package listing.
- Build and package TFLite version of the Python package for desktop platforms as deepspeech-tflite.
- Move examples to a separate repository for easier maintenance: https://github.com/mozilla/DeepSpeech-examples.
- Remove outdated remark about the performance of DS_IntermediateDecode from the docs.
- Added third party bindings for the V programming language.
- Fixed incorrect shape handling in online augmentation code.
- Minor fixes to documentation text and CLI flag help texts.
Hyperparameters for fine-tuning
The hyperparameters used to train the model are useful for fine-tuning, so we document them here along with the hardware used: a server with 8 Quadro RTX 6000 GPUs, each with 24GB of VRAM. These hyperparameters are identical to those of the 0.6.0 release.
- train_files: Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora.
- dev_files: LibriSpeech clean dev corpus.
- test_files: LibriSpeech clean test corpus.
- train_batch_size: 128
- dev_batch_size: 128
- test_batch_size: 128
- n_hidden: 2048
- learning_rate: 0.0001
- dropout_rate: 0.20
- epoch: 75
- lm_alpha: 0.75
- lm_beta: 1.85
The weights with the best validation loss were selected at the end of 75 epochs using --noearly_stop, and the selected model was trained for 233784 steps. In addition, the training used the --use_cudnn_rnn flag.
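As an illustration of how these values might be carried into a fine-tuning run, the sketch below assembles a command line from the hyperparameters listed in this section. It is not the command used for this release. The DeepSpeech.py script name and the --checkpoint_dir flag are assumptions taken from the project's training workflow, all paths are hypothetical, and the flag names are copied verbatim from the list here, so they should be checked against the training script's help output. The epoch count is left out because it depends on the fine-tuning data.

```python
import shlex

# Values copied from the hyperparameter list in these release notes.
HYPERPARAMETERS = {
    "train_batch_size": 128,
    "dev_batch_size": 128,
    "test_batch_size": 128,
    "n_hidden": 2048,
    "learning_rate": 0.0001,
    "dropout_rate": 0.20,
    "lm_alpha": 0.75,
    "lm_beta": 1.85,
}


def fine_tune_command(checkpoint_dir, train_csv, dev_csv, test_csv, overrides=None):
    """Return an argument list for a fine-tuning run (hypothetical helper)."""
    params = dict(HYPERPARAMETERS)
    params.update(overrides or {})
    args = [
        "python3", "DeepSpeech.py",          # assumed training entry point
        "--checkpoint_dir", checkpoint_dir,  # assumed flag; unpacked release checkpoint
        "--train_files", train_csv,
        "--dev_files", dev_csv,
        "--test_files", test_csv,
    ]
    for name, value in params.items():
        args += ["--" + name, str(value)]
    # Mirrors the flag reported for the release training run.
    args.append("--use_cudnn_rnn")
    return args


# Example with hypothetical paths and a smaller batch size for a single GPU.
command = fine_tune_command(
    "deepspeech-0.6.1-checkpoint",
    "my_data/train.csv", "my_data/dev.csv", "my_data/test.csv",
    overrides={"train_batch_size": 32},
)
print(" ".join(shlex.quote(arg) for arg in command))
```

Batch sizes and the learning rate can reasonably be overridden for smaller hardware, whereas n_hidden presumably has to stay at 2048 so that the network geometry matches the released checkpoint.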
Bindings
This release also includes a Python-based command line tool, deepspeech, installed through
pip install deepspeech
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
pip install deepspeech-gpu
On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
pip install deepspeech-tflite
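Since each of the three packages is paired with a particular model format, a small helper can keep the pairing straight when scripting the command line tool. The sketch below is illustrative only: it assumes the unpacked models archive contains a file named output_graph with the relevant extension, and that the client accepts --model and --audio flags. Both should be checked against the archive contents and the output of deepspeech --help.

```python
from pathlib import Path

# Model extension matching each pip package, following the release notes:
# ".pbmm" is the memory-mapped model, ".tflite" the TFLite conversion.
MODEL_EXTENSION = {
    "deepspeech": ".pbmm",
    "deepspeech-gpu": ".pbmm",
    "deepspeech-tflite": ".tflite",
}


def build_cli_command(package, models_dir, audio_path, model_basename="output_graph"):
    """Return the argument list for one transcription run (hypothetical helper).

    model_basename is an assumed file name, not taken from the release notes.
    """
    if package not in MODEL_EXTENSION:
        raise ValueError("unknown package: " + package)
    model_path = Path(models_dir) / (model_basename + MODEL_EXTENSION[package])
    return ["deepspeech", "--model", str(model_path), "--audio", str(audio_path)]


# Example with hypothetical paths; pass the list to subprocess.run to execute it.
print(build_cli_command("deepspeech-tflite", "deepspeech-0.6.1-models", "audio/sample.wav"))
```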
The release also exposes bindings for the following languages:
- Python (Versions 2.7, 3.5, 3.6, 3.7 and 3.8) installed via
pip install deepspeech
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
pip install deepspeech-gpu
On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
pip install deepspeech-tflite
- NodeJS (Versions 4.x, 5.x, 6.x, 7.x, 8.x, 9.x, 10.x, 11.x, 12.x and 13.x) installed via
npm install deepspeech
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
npm install deepspeech-gpu
- ElectronJS versions 3.1, 4.0, 4.1, 5.0, 6.0, 7.0 and 7.1 are also supported.
- C++, which requires that the appropriate shared objects be installed from native_client.tar.xz. (See the section in the main README which describes native_client.tar.xz installation.)
- .NET, which is installed by following the instructions on the NuGet package page.
In addition, there are third party bindings that are supported by external developers, for example:
- Rust which is installed by following the instructions on the external Rust repo.
- Go which is installed by following the instructions on the external Go repo.
- V which is installed by following the instructions on the external Vlang repo.
Supported Platforms
- Windows 8.1, 10, and Server 2012 R2 64-bit (Needs at least AVX support, requires Redistributable Visual C++ 2015 Update 3 (64-bit) for runtime).
- OS X 10.10, 10.11, 10.12, 10.13, 10.14 and 10.15
- Linux x86 64 bit with a modern CPU (Needs at least AVX/FMA)
- Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
- Raspbian Buster on Raspberry Pi 3 + Raspberry Pi 4
- ARM64 built against Debian/ARMbian Buster and tested on LePotato boards
- Java Android bindings / demo app. Early preview, tested only on Pixel 2 device, TF Lite model only.
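On Linux x86 64 bit, the AVX/FMA requirement listed above can be checked before installing by looking at the CPU feature flags the kernel reports. The sketch below parses text in the format of /proc/cpuinfo and reports which required flags are absent. The helper name is hypothetical, and the check only inspects the first "flags" line, on the assumption that all cores report the same features.

```python
REQUIRED_FLAGS = {"avx", "fma"}


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required CPU flags that do not appear in the cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The value is a space-separated list such as "fpu vme ... avx fma".
            return set(required) - set(value.split())
    # No "flags" line at all: treat every required flag as missing.
    return set(required)


# Typical use on a Linux machine:
#   with open("/proc/cpuinfo") as handle:
#       print(missing_cpu_flags(handle.read()) or "AVX and FMA are available")
```

An empty result means both instruction sets are reported. This says nothing about the NVIDIA GPU requirement, which is determined by the card's Compute Capability.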
Contact/Getting Help
- FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
- Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
- IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla IRC; people there can try to answer/help.
- Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.