Pre-release

@deepspeech-automation released this Oct 13, 2018 · 8 commits to master since this release

Assets: 29
Merge pull request #1646 from lissyx/bump-v0.3.0-alpha.1

Bump to v0.3.0-alpha.1
Pre-release

@deepspeech-automation released this Oct 11, 2018 · 16 commits to master since this release

Assets: 29
Merge pull request #1641 from lissyx/bump-v0.3.0-alpha.0

Bump to v0.3.0-alpha.0
Pre-release

@deepspeech-automation released this Oct 2, 2018 · 35 commits to master since this release

Assets: 29
Merge pull request #1618 from lissyx/bump-v0.2.1-alpha.2

Bump to v0.2.1-alpha.2
Pre-release

@deepspeech-automation released this Sep 26, 2018 · 52 commits to master since this release

Assets: 29
Merge pull request #1596 from lissyx/bump-v0.2.1-alpha.1

Bump to v0.2.1-alpha.1
Pre-release

@deepspeech-automation released this Sep 26, 2018 · 56 commits to master since this release

Assets: 29
Merge pull request #1591 from lissyx/bump-v0.2.1-alpha.0

Bump to v0.2.1-alpha.0

@reuben released this Sep 18, 2018 · 70 commits to master since this release

Assets: 32

General

This is the 0.2.0 release of Deep Speech, an open speech-to-text engine. This release includes source code

v0.2.0.tar.gz

and a trained model

deepspeech-0.2.0-models.tar.gz

trained on American English, which achieves an 11% word error rate on the LibriSpeech clean test corpus (models with "rounded" in their file name have rounded weights, and those with a "*.pbmm" extension are memory mapped and much more memory efficient), and example audio

audio-0.2.0.tar.gz

which can be used to test the engine and checkpoint files

deepspeech-0.2.0-checkpoint.tar.gz

which can be used as the basis for further fine-tuning.
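
The 11% figure above is word error rate (WER): the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal illustrative sketch (not the project's evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # one substitution in four words -> 0.25
```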

Notable changes from the previous release

  • Made Deep Speech streamable, i.e. able to do inference while audio is streaming in (#1463)
  • Introduced a new streaming API; example usage in this gist (#1463)
  • Added feature caching, precomputing and caching audio features to speed up training (#1532)
  • Added a progress bar to indicate training progress (#1488)
  • Updated the Dockerfile's cuDNN version from 7.1.1 to 7.2.1 (1a7ac22)
  • Removed old training and website scripts (#1539)
  • Pre-built binaries now work with upstream TensorFlow 1.6 (c579b74)
  • Switched to LSTMBlockFusedCell (0b95ed6)
  • Added a tool to convert a graph protobuf to pbtxt (4e383ac)
  • Added a tool to find out which ops are needed by a graph (d2be00f)
  • Switched to non-positional (named) arguments everywhere (646c917)
  • Added support for Node.js 10 (#1396)
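
The streaming flow above amounts to feeding fixed-size chunks of PCM audio as they arrive. A minimal sketch of the chunking side; the Model method names in the comment are assumptions for illustration only — the linked gist has the authoritative API usage:

```python
def chunks(pcm_samples, chunk_size=320):
    """Yield fixed-size slices of a 16-bit PCM sample buffer.

    320 samples is 20 ms at a 16 kHz sample rate; the last chunk may be shorter.
    """
    for i in range(0, len(pcm_samples), chunk_size):
        yield pcm_samples[i:i + chunk_size]

# Hypothetical usage against the streaming API (method names assumed;
# see the linked gist for the real calls):
#   ctx = model.setupStream()
#   for chunk in chunks(samples):
#       model.feedAudioContent(ctx, chunk)
#   text = model.finishStream(ctx)
```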

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning. Thus, we document them here along with the hardware used, a server with 8 TitanX Pascal GPUs (12 GB of VRAM each).

  • train_files Fisher, LibriSpeech, Switchboard training corpora, as well as a pre-release snapshot of the English Common Voice training corpus.
  • dev_files LibriSpeech clean and other dev corpora, as well as a pre-release snapshot of the English Common Voice validation corpus.
  • test_files LibriSpeech clean test corpus
  • train_batch_size 24
  • dev_batch_size 48
  • test_batch_size 48
  • epoch 30
  • learning_rate 0.0001
  • display_step 0
  • validation_step 1
  • dropout_rate 0.2
  • checkpoint_step 1
  • n_hidden 2048

The weights with the best validation loss were selected at the end of the 30 epochs.
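
The flags above correspond to command-line flags of the DeepSpeech.py training script. A sketch of a fine-tuning invocation, assuming the documented flag names map directly to script flags; all file paths are placeholders you must replace with your own (this is a config fragment, not a tested command):

```shell
python -u DeepSpeech.py \
  --checkpoint_dir /path/to/deepspeech-0.2.0-checkpoint \
  --train_files /path/to/train.csv \
  --dev_files /path/to/dev.csv \
  --test_files /path/to/test.csv \
  --train_batch_size 24 \
  --dev_batch_size 48 \
  --test_batch_size 48 \
  --epoch 30 \
  --learning_rate 0.0001 \
  --dropout_rate 0.2 \
  --n_hidden 2048
```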

Bindings

This release also includes a Python-based command-line tool, deepspeech, installed through

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu

It also exposes bindings for the following languages:

  • Python (Versions 2.7, 3.4, 3.5, 3.6 and 3.7) installed via
    pip install deepspeech
    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
    pip install deepspeech-gpu
  • Node.js (Versions 4.x, 5.x, 6.x, 7.x, 8.x, 9.x and 10.x) installed via
    npm install deepspeech
    
    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
    npm install deepspeech-gpu
    
  • C++, which requires that the appropriate shared objects are installed from native_client.tar.xz. (See the section in the main README which describes native_client.tar.xz installation.)

In addition, there are third-party bindings that are supported by external developers, for example:

  • Rust, which is installed by following the instructions on the external Rust repo.
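
The engine and the deepspeech command-line tool expect 16 kHz, 16-bit, mono WAV input (such as the files in audio-0.2.0.tar.gz). A minimal stdlib sketch for loading such a file into raw samples; the commented Model call is an assumption for illustration, not the exact binding API:

```python
import array
import wave

def load_pcm(path):
    """Read a 16-bit mono WAV file into a list of int samples, plus its sample rate."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expected 16-bit audio"
        assert w.getnchannels() == 1, "expected mono audio"
        samples = array.array("h", w.readframes(w.getnframes()))
        return samples, w.getframerate()

# Hypothetical inference call (names assumed; check the binding docs for
# your installed version):
#   from deepspeech import Model
#   samples, rate = load_pcm("audio/2830-3980-0043.wav")
#   text = model.stt(samples, rate)
```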

Supported Platforms

  • OS X 10.10, 10.11, 10.12, 10.13 and 10.14
  • Linux x86 64 bit with a modern CPU (Needs at least AVX/FMA)
  • Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
  • Raspbian Stretch on Raspberry Pi 3
  • ARM64 built against Debian/ARMbian Stretch and tested on LePotato boards

Known Issues

  • Feature caching speeds up training but increases memory usage

Contact/Getting Help

  1. FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
  2. Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
  3. IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla IRC; people there can try to answer your question or help you find an answer.
  4. Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.

Contributors to 0.2.0 release

Pre-release

@deepspeech-automation released this Sep 18, 2018 · 75 commits to master since this release

Assets: 29
Merge pull request #1554 from lissyx/bump-v0.2.0-alpha.10

Bump to version 0.2.0-alpha.10

@kdavis-mozilla released this Sep 18, 2018 · 295 commits to master since this release

Assets: 25

General

This is the 0.1.1 release of Deep Speech, an open speech-to-text engine. This release includes source code

v0.1.1.tar.gz

and a model, not yet optimized for size,

deepspeech-0.1.1-models.tar.gz

trained on American English, which achieves a 5.6% word error rate on the LibriSpeech clean test corpus (note: the language model included some test data), and example audio

audio-0.1.1.tar.gz

which can be used to test the engine and checkpoint files

deepspeech-0.1.1-checkpoint.tar.gz

which can be used as the basis for further fine-tuning. Unfortunately licensing issues prevent us from releasing the text used to train the language model.

Notable changes from the previous release

  • Rust bindings were contributed by RustAudio
  • Lowered the dependency from the AVX2 instruction set to AVX (mozilla/tensorflow#46)
  • Pre-built binaries now work with upstream TensorFlow 1.4 (mozilla/tensorflow#43)
  • Switched the GPU build to CUDA 8.0 / cuDNN v6 (mozilla/tensorflow#43)
  • Added support for Node.js 7/8/9 (#1042)
  • Initializing a training run from a frozen graph (e.g. a release model) is now easier (#1149)
  • The Python package no longer holds the GIL during inference and can be used in multi-threaded Python programs (#1164)
  • The Python package now works on macOS 10.10 and 10.11 (#1065)
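
Initializing a training run from a frozen graph (#1149) lets you fine-tune starting from a release model rather than from a checkpoint. A sketch of such an invocation; the flag name below is an assumption based on the feature description, so verify it against `python DeepSpeech.py --help` for this version (this is a config fragment, not a tested command; paths are placeholders):

```shell
python -u DeepSpeech.py \
  --initialize_from_frozen_model /path/to/output_graph.pb \
  --train_files /path/to/train.csv \
  --dev_files /path/to/dev.csv \
  --test_files /path/to/test.csv \
  --learning_rate 0.0001 \
  --n_hidden 2048
```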

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning. Thus, we document them here along with the hardware used, a two-node cluster where each node has 8 TitanX Pascal GPUs.

  • train_files Fisher, LibriSpeech, and Switchboard training corpora.
  • dev_files LibriSpeech clean dev corpus
  • test_files LibriSpeech clean test corpus
  • train_batch_size 12
  • dev_batch_size 8
  • test_batch_size 8
  • epoch 13
  • learning_rate 0.0001
  • display_step 0
  • validation_step 1
  • dropout_rate 0.2367
  • default_stddev 0.046875
  • checkpoint_step 1
  • log_level 0
  • checkpoint_dir value specific to hardware setup
  • wer_log_pattern "GLOBAL LOG: logwer('${COMPUTE_ID}', '%s', '%s', %f)"
  • decoder_library_path value specific to hardware setup
  • n_hidden 2048

Bindings

This release also includes a Python-based command-line tool, deepspeech, installed through

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu

It also exposes bindings for the following languages:

  • Python (Versions 2.7, 3.4, 3.5, and 3.6) installed via
    pip install deepspeech
    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
    pip install deepspeech-gpu
  • Node.js (Versions 4.x, 5.x, 6.x, 7.x, 8.x and 9.x) installed via
    npm install deepspeech
    
    Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
    npm install deepspeech-gpu
    
  • C++, which requires that the appropriate shared objects are installed from native_client.tar.xz. (See the section in the main README which describes native_client.tar.xz installation.)

In addition, there are third-party bindings that are supported by external developers, for example:

  • Rust, which is installed by following the instructions on the external Rust repo.

Supported Platforms

  • OS X 10.10, 10.11, 10.12 and 10.13
  • Linux x86 64 bit with a modern CPU (Needs at least AVX/FMA)
  • Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
  • Raspbian Jessie on Raspberry Pi 3

Contact/Getting Help

  1. FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
  2. Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
  3. IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla IRC; people there can try to answer your question or help you find an answer.
  4. Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.

Contributors to 0.1.1 release

Aug 9, 2018
Bump to v0.2.0-alpha.9
Jul 23, 2018
Merge pull request #1470 from lissyx/alpha-8
Bump to version 0.2.0-alpha.8