Project DeepSpeech

Project DeepSpeech is an open-source Speech-To-Text engine that uses a model trained with machine learning techniques, based on Baidu's Deep Speech research paper. It uses Google's TensorFlow project to simplify the implementation.

Prerequisites

Install

Manually install Git Large File Storage, then open a terminal and run:

git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
pip install -r doc/requirements.txt
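
Before running the commands above, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch and not part of the repository: the helper name `missing_tools` is hypothetical, and it assumes Git LFS is installed as the `git-lfs` executable.

```python
import shutil


def missing_tools(tools=("git", "git-lfs", "pip")):
    """Return the entries of `tools` that cannot be found on PATH.

    The default tool names are assumptions; adjust them if your
    system uses different executable names (for example `pip3`).
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("git, git-lfs and pip were all found on PATH.")
```

If the script reports `git-lfs` as missing, install Git Large File Storage first, since the clone depends on it.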

Recommendations

If you have a capable GPU (Nvidia, at least 8 GB of VRAM), it is highly recommended to install TensorFlow with GPU support. Training will likely be significantly faster than on the CPU.

Training a model

Open a terminal, change to the directory of the DeepSpeech checkout, and run:

DeepSpeech$ ./bin/run-ldc93s1.sh

By default, the code trains on a small sample dataset called LDC93S1, which can be overfitted on a GPU in a few minutes for demonstration purposes. From here, you can change which dataset is used, how many training iterations are run, and the default values of the network parameters. Then run the script again to train the modified network.

You can also use the other utility scripts in bin/ to train on different datasets, but keep in mind that the other speech corpora are very large, on the order of tens of gigabytes, and some are not free. Downloading and preprocessing them can take a very long time, and training on them without a fast GPU (GTX 10 series recommended) takes even longer. If you run into GPU out-of-memory (OOM) errors while training, try reducing batch_size.
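
One way to apply the batch_size advice is to halve the value each time training runs out of memory. The sketch below illustrates that idea only; it is not how the DeepSpeech scripts behave. `batch_size_schedule`, `train_with_fallback`, and the `train_fn` callable are hypothetical names, and `MemoryError` stands in for whatever exception your setup raises (with TensorFlow, GPU OOM typically surfaces as `tf.errors.ResourceExhaustedError`).

```python
def batch_size_schedule(start, minimum=1):
    """Yield start, start // 2, start // 4, ... while the value is >= minimum."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    size = start
    while size >= minimum:
        yield size
        size //= 2


def train_with_fallback(train_fn, start_batch_size, oom_error=MemoryError):
    """Call train_fn(batch_size) with shrinking batch sizes until one succeeds.

    train_fn is a hypothetical callable that runs training and raises
    `oom_error` when the batch does not fit in GPU memory.
    Returns (batch_size_that_worked, result_of_train_fn).
    """
    for batch_size in batch_size_schedule(start_batch_size):
        try:
            return batch_size, train_fn(batch_size)
        except oom_error:
            print("batch_size=%d ran out of memory, retrying smaller" % batch_size)
    raise RuntimeError("training failed even with the smallest batch size")
```

Halving is a common heuristic because it reaches a workable size in a few attempts. Keep in mind that a smaller batch size can change training dynamics, so other hyperparameters may need adjusting.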

Exporting a model for serving

If the ds_export_dir environment variable is set, or the export_dir variable is set manually, a model is exported to that directory during training. If you trained without exporting, you can still export a model by setting the variable to the target directory (e.g. export_dir = os.path.join(checkpoint_dir, 'export')) and running the model exporting cell manually. If the notebook has been restarted since training, first run all the cells above the training cell so that the required variables and functions are declared and initialised, then run the export cell.
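
The lookup described above can be sketched as a small helper. This is an illustration under assumptions, not the notebook's actual code: `resolve_export_dir` is a hypothetical name, and the precedence (a manually set value wins over the ds_export_dir environment variable) is assumed rather than confirmed by the source.

```python
import os


def resolve_export_dir(explicit=None, environ=None):
    """Return the export directory, or None if export is not configured.

    explicit: a manually chosen directory (assumed to take precedence).
    environ:  mapping to read ds_export_dir from; defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    return explicit or environ.get("ds_export_dir") or None


if __name__ == "__main__":
    checkpoint_dir = "checkpoints"  # hypothetical path, for illustration only
    export_dir = resolve_export_dir()
    if export_dir is None:
        # Nothing configured: fall back to the example location from the text.
        export_dir = os.path.join(checkpoint_dir, "export")
    print("Model would be exported to:", export_dir)
```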

Refer to the corresponding README.md for information on building and running the client.

Documentation

Documentation for the project can be found here: http://deepspeech.readthedocs.io/en/latest/
