🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
pannous Merge pull request #45 from camelshang/master
Installation details and minor bug fixes of speech2text-tflearn.py
Latest commit ee48345 Jun 21, 2018

Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.

Replaces caffe-speech-recognition, see there for some background.

Update: Mozilla released DeepSpeech

They achieve good error rates. Free speech is in good hands; go there if you are an end user. For now this project is maintained for educational purposes only.

Ultimate goal

Create a decent standalone speech recognizer for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB here and 21GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

Sample spectrogram, That's what she said, too laid?

Sample spectrogram, Karen uttering 'zero' at 160 words per minute.
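
For readers new to spectrograms, here is a minimal sketch of how one can be computed from raw audio with a short-time Fourier transform. It is not the code this project uses to produce its images: the function name `spectrogram`, the frame length, the hop size and the synthetic 440 Hz tone are all assumptions chosen for illustration, and only NumPy is required.

```python
import numpy as np

FRAME_LEN = 256    # samples per analysis window (assumed value)
HOP = 128          # step between consecutive windows (assumed value)
SAMPLE_RATE = 8000 # Hz, assumed for the synthetic signal below


def spectrogram(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return a magnitude spectrogram with shape (frames, frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # The real FFT of every frame gives one spectrogram row per frame.
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic stand-in for a recording: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
# The strongest frequency bin, converted back to Hz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(spec.shape, peak_hz)
```

Each row of `spec` is one time slice and each column one frequency band; plotting the array as an image (usually on a log scale) gives pictures like the samples above.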

Installation

clone code

git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git

pyaudio

PyAudio requires PortAudio from http://www.portaudio.com/

git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
# to keep these settings, add the export lines to ~/.bashrc and reload it:
source ~/.bashrc

install pyaudio

pip install pyaudio
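
To check that the installation worked, a short script along these lines can help. It is only a sketch: the helper name `list_audio_inputs` is made up for this example, and it relies on the standard PyAudio calls `PyAudio()`, `get_device_count()`, `get_device_info_by_index()` and `terminate()`. If PyAudio cannot be imported or initialized, it returns `None` instead of raising.

```python
def list_audio_inputs():
    """Return names of input-capable devices, or None if PyAudio is unusable."""
    try:
        import pyaudio
    except (ImportError, OSError):
        # PyAudio itself or the PortAudio shared library could not be loaded.
        return None
    try:
        audio = pyaudio.PyAudio()
    except Exception:
        # PortAudio failed to initialize (for example, no audio backend).
        return None
    try:
        names = []
        for index in range(audio.get_device_count()):
            info = audio.get_device_info_by_index(index)
            if info.get("maxInputChannels", 0) > 0:
                names.append(info["name"])
        return names
    finally:
        audio.terminate()


inputs = list_audio_inputs()
if inputs is None:
    print("PyAudio is not usable; check the PortAudio paths set above.")
else:
    print("Input devices:", inputs)
```

An empty list means PyAudio loaded but found no microphone, which is enough for training but not for recording with record.py.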

Getting started

Toy examples: ./number_classifier_tflearn.py ./speaker_classifier_tflearn.py

Some less trivial architectures: ./densenet_layer.py

Later: ./train.sh ./record.py

Sample spectrogram or record.py

Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.

Fun tasks for newcomers

Extensions

Extensions to current tensorflow which are probably needed:

Even though this project is far from finished, we hope it gives you some starting points.

Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to info@pannous.com