<a href="https://colab.research.google.com/github/ladyada/notebooks/blob/master/Speech_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Speech Recognition Training Notebook




This notebook demonstrates how to train a 20 KB [Simple Audio Recognition](https://www.tensorflow.org/tutorials/sequences/audio_recognition) model for [TensorFlow Lite for Microcontrollers](https://tensorflow.org/lite/microcontrollers/overview). It produces the same model used in the [micro_speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/examples/micro_speech) example application.

The notebook is designed to be run in [Google Colaboratory](https://colab.research.google.com).

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/micro_speech/train_speech_model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/micro_speech/train_speech_model.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>




The notebook runs Python scripts to train and freeze the model, and uses the TensorFlow Lite converter to convert it for use with TensorFlow Lite for Microcontrollers.

**Training is much faster using GPU acceleration.** Before you proceed, ensure you are using a GPU runtime by going to **Runtime -> Change runtime type** and selecting **GPU**. Training 18,000 iterations will take 1.5-2 hours on a GPU runtime.

## Configure training

Customize the following `os.environ` lines to set the words to train for, the number of training steps, and the learning rate. The default values produce the same model that is used in the micro_speech example. Run the cell to set the configuration:

In [1]:
import os

# A comma-delimited list of the words you want to train for.
# The options are: yes,no,up,down,left,right,on,off,stop,go
# All other words will be used to train an "unknown" category.
os.environ["WANTED_WORDS"] = "yes,no"

# The number of steps and learning rates can be specified as comma-separated
# lists to define the rate at each stage. For example,
# TRAINING_STEPS=15000,3000 and LEARNING_RATE=0.001,0.0001
# will run 18,000 training loops in total, with a rate of 0.001 for the first
# 15,000, and 0.0001 for the final 3,000.
os.environ["TRAINING_STEPS"]="15000,3000"
os.environ["LEARNING_RATE"]="0.001,0.0001"

# Calculate the total number of steps, which is used to identify the checkpoint
# file name.
total_steps = sum(int(steps) for steps in
                  os.environ["TRAINING_STEPS"].split(","))
os.environ["TOTAL_STEPS"] = str(total_steps)

# Print the configuration to confirm it
!echo "Training these words: ${WANTED_WORDS}"
!echo "Training steps in each stage: ${TRAINING_STEPS}"
!echo "Learning rate in each stage: ${LEARNING_RATE}"
!echo "Total number of training steps: ${TOTAL_STEPS}"

Training these words: yes,no
Training steps in each stage: 15000,3000
Learning rate in each stage: 0.001,0.0001
Total number of training steps: 18000
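
The staged configuration described in the cell comments can be sketched in plain Python. This is only an illustration of how the two comma-separated lists pair up, not the code that `train.py` runs; the helper names `parse_schedule` and `learning_rate_at` are hypothetical, and the 1-based step numbering is an assumption.

```python
def parse_schedule(training_steps, learning_rates):
    """Pair each stage's step count with its learning rate."""
    steps = [int(s) for s in training_steps.split(",")]
    rates = [float(r) for r in learning_rates.split(",")]
    if len(steps) != len(rates):
        raise ValueError(
            "TRAINING_STEPS and LEARNING_RATE need the same number of entries")
    return list(zip(steps, rates))


def learning_rate_at(step, schedule):
    """Return the rate in effect at a training step (assumed 1-based)."""
    boundary = 0
    for stage_steps, rate in schedule:
        boundary += stage_steps
        if step <= boundary:
            return rate
    raise ValueError(f"step {step} is past the end of the schedule")


# Same values as the defaults set in the configuration cell.
schedule = parse_schedule("15000,3000", "0.001,0.0001")
print(schedule)
print(learning_rate_at(15000, schedule))  # last step of the first stage
print(learning_rate_at(15001, schedule))  # first step of the second stage
```

A mismatch between the two lists is rejected early here, which is a cheap check to run before committing to a multi-hour training job.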


In [2]:
import os.path
from google.colab import drive

DRIVE_STORAGE_PATH = '/content/drive/My Drive/speech-recognition'

def ensure_drive():
  if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive', force_remount=True)
    
ensure_drive()

Mounted at /content/drive


## Install dependencies

Next, we'll install a GPU build of TensorFlow, so we can use GPU acceleration for training.

In [3]:
# Replace Colab's default TensorFlow install with a pinned nightly
# build that contains the operations that are needed for training
!pip uninstall -y tensorflow tensorflow_estimator
!pip install -q tf-estimator-nightly==1.14.0.dev2019072901 tf-nightly-gpu==1.15.0.dev20190729

Uninstalling tensorflow-1.15.0:
  Successfully uninstalled tensorflow-1.15.0
Uninstalling tensorflow-estimator-1.15.1:
  Successfully uninstalled tensorflow-estimator-1.15.1

## Download TensorFlow

We'll also clone the TensorFlow repository, which contains the scripts that train and freeze the model.

In [4]:
# Clone the repository from GitHub
!git clone -q https://github.com/tensorflow/tensorflow
# Check out a commit that has been tested to work
# with the build of TensorFlow we're using
!git -c advice.detachedHead=false -C tensorflow checkout 17ce384df70

Checking out files: 100% (8496/8496), done.
HEAD is now at 17ce384df7 Share ownership of `UnboundedWorkQueue` between collective executor and executor manager.


## Create trained model

In [5]:
!python tensorflow/tensorflow/examples/speech_commands/train.py \
--model_architecture=tiny_conv --window_stride=20 --preprocess=micro \
--wanted_words=${WANTED_WORDS} --silence_percentage=25 --unknown_percentage=25 \
--quantize=1 --how_many_training_steps=${TRAINING_STEPS} \
--learning_rate=${LEARNING_RATE} --summaries_dir=/content/retrain_logs \
--data_dir=/content/speech_dataset --train_dir=/content/speech_commands_train

2019-10-23 01:11:27.448973: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-23 01:11:27.473251: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-23 01:11:27.590800: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-23 01:11:27.591696: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x15eea00 executing computations on platform CUDA. Devices:
2019-10-23 01:11:27.591734: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-10-23 01:11:27.594228: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-10-23 01:11:27.594458: I tensorflow/compiler/xla/serv
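
The `--window_stride=20` flag above determines how many feature frames each clip yields, which is where the `49` in the `--input_shapes=1,49,40,1` conversion argument comes from. The sketch below works through that arithmetic. Only the 20 ms stride and the 49 x 40 shape appear in this notebook; the 1000 ms clip length, 16 kHz sample rate, 30 ms window, and 40 feature bins are assumed script defaults, and `feature_shape` is a hypothetical helper.

```python
def feature_shape(clip_ms=1000, window_ms=30, stride_ms=20,
                  sample_rate=16000, feature_bins=40):
    """Return (frames, bins) for one clip under the assumed settings."""
    clip_samples = sample_rate * clip_ms // 1000      # 16000 samples
    window_samples = sample_rate * window_ms // 1000  # 480 samples
    stride_samples = sample_rate * stride_ms // 1000  # 320 samples
    # One frame for the first window, plus one per full stride that still fits.
    frames = 1 + (clip_samples - window_samples) // stride_samples
    return frames, feature_bins


frames, bins = feature_shape()
print(frames, bins, frames * bins)
```

Under these assumptions the result is 49 frames of 40 bins each, so a different stride would also require a different `--input_shapes` value at conversion time.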

### Optional: Visualize graph and training rate

In [7]:
%load_ext tensorboard
%tensorboard --logdir /content/retrain_logs

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


ERROR: Failed to launch TensorBoard (exited with 1).
Contents of stderr:
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 10, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/main.py", line 64, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 220, in main
    server = self._make_server()
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 301, in _make_server
    self.assets_zip_provider)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 150, in standard_tensorboard_wsgi
    flags, plugin_loaders, data_provider, assets_zip_provider, mul

In [8]:
!cp -vr /content/speech_commands_train "{DRIVE_STORAGE_PATH}/"

'/content/speech_commands_train' -> '/content/drive/My Drive/speech-recognition/'
'/content/speech_commands_train/tiny_conv.pbtxt' -> '/content/drive/My Drive/speech-recognition/tiny_conv.pbtxt'
'/content/speech_commands_train/tiny_conv_labels.txt' -> '/content/drive/My Drive/speech-recognition/tiny_conv_labels.txt'
'/content/speech_commands_train/tiny_conv.ckpt-17600.meta' -> '/content/drive/My Drive/speech-recognition/tiny_conv.ckpt-17600.meta'
'/content/speech_commands_train/tiny_conv.ckpt-17700.meta' -> '/content/drive/My Drive/speech-recognition/tiny_conv.ckpt-17700.meta'
'/content/speech_commands_train/tiny_conv.ckpt-17800.meta' -> '/content/drive/My Drive/speech-recognition/tiny_conv.ckpt-17800.meta'
'/content/speech_commands_train/tiny_conv.ckpt-17900.meta' -> '/content/drive/My Drive/speech-recognition/tiny_conv.ckpt-17900.meta'
'/content/speech_commands_train/tiny_conv.ckpt-18000.meta' -> '/content/drive/My Drive/speech-recognition/tiny_conv.ckpt-18000.meta'
'/content/speech_

## Freeze Graph

In [9]:
!python tensorflow/tensorflow/examples/speech_commands/freeze.py \
--model_architecture=tiny_conv --window_stride=20 --preprocess=micro \
--wanted_words=${WANTED_WORDS} --quantize=1 --output_file=/content/tiny_conv.pb \
--start_checkpoint=/content/speech_commands_train/tiny_conv.ckpt-${TOTAL_STEPS}

2019-10-23 05:11:59.752084: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-23 05:11:59.777363: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-23 05:11:59.864999: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-23 05:11:59.866444: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b3ca00 executing computations on platform CUDA. Devices:
2019-10-23 05:11:59.866486: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-10-23 05:11:59.869081: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-10-23 05:11:59.869510: I tensorflow/compiler/xla/serv
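
The `--start_checkpoint` argument above depends on `TOTAL_STEPS` matching the last checkpoint that training wrote. A minimal sketch of how that prefix is derived and how it could be checked before freezing follows; `checkpoint_prefix` is a hypothetical helper, and the `.meta` suffix is taken from the file names listed in the copy step's output.

```python
import os


def checkpoint_prefix(train_dir, architecture, training_steps):
    """Build the checkpoint prefix for the final training step."""
    total = sum(int(s) for s in training_steps.split(","))
    return f"{train_dir}/{architecture}.ckpt-{total}"


prefix = checkpoint_prefix("/content/speech_commands_train",
                           "tiny_conv", "15000,3000")
print(prefix)

# The prefix is not a file itself; check for one of the files written with it.
if not os.path.exists(prefix + ".meta"):
    print("No checkpoint found for this prefix; has training finished?")
```

If `TRAINING_STEPS` is changed after training, the derived prefix no longer matches any checkpoint, so re-running the configuration cell with different values before freezing is worth avoiding.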

## Convert to TFLite (may not be needed?)

In [0]:
!toco --graph_def_file="{DRIVE_STORAGE_PATH}/tiny_conv.pb" --output_file="{DRIVE_STORAGE_PATH}/tiny_conv.tflite" --input_shapes=1,49,40,1 --input_arrays=Reshape_1 --output_arrays='labels_softmax' --inference_type=QUANTIZED_UINT8 --mean_values=0 --std_dev_values=9.8077

2019-10-21 21:10:58.069629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-21 21:10:58.101409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-21 21:10:58.102185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2019-10-21 21:10:58.102478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-10-21 21:10:58.103689: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-10-21 21:10:58.104799: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.
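
The `--mean_values=0` and `--std_dev_values=9.8077` flags tell the converter how the model's `uint8` inputs map to real values, using the converter's documented relation `real = (quantized - mean) / std_dev`. The sketch below applies that relation to show the implied input range; `dequantize` and `quantize` are hypothetical helpers written for illustration, not part of the converter.

```python
def dequantize(q, mean=0.0, std_dev=9.8077):
    """Map a uint8 input value to the real value it represents."""
    return (q - mean) / std_dev


def quantize(real, mean=0.0, std_dev=9.8077):
    """Map a real value to the nearest uint8 value, clamped to 0..255."""
    q = round(real * std_dev + mean)
    return max(0, min(255, q))


print(dequantize(0))              # lower end of the representable range
print(round(dequantize(255), 3))  # upper end of the representable range
print(quantize(13.0))             # a value near the middle of the range
```

With these flags the 256 input levels cover real values from 0 to roughly 26, so feature values outside that range are clamped.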

## Spot checks

In [0]:
import os.path
if not os.path.exists('/tmp/speech_dataset'):
  !wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
  !mkdir /tmp/speech_dataset
  !tar -xzf speech_commands_v0.02.tar.gz -C /tmp/speech_dataset

ensure_drive()


for phrase in ('yes', 'no', 'right'):
  wav = !ls -1 /tmp/speech_dataset/{phrase}/*.wav | head -n 1
  wav = wav[0]

  print('')
  print(f'{phrase:10s} <' + '-' * 69)
  !cd tensorflow && python tensorflow/examples/speech_commands/label_wav.py --graph="{DRIVE_STORAGE_PATH}/tiny_conv.pb" --labels="{DRIVE_STORAGE_PATH}/speech_commands_train/tiny_conv_labels.txt" --wav={wav} 2>/dev/null



yes        <---------------------------------------------------------------------
yes (score = 0.97023)
_unknown_ (score = 0.02556)
no (score = 0.00415)

no         <---------------------------------------------------------------------
no (score = 0.49113)
yes (score = 0.36273)
_unknown_ (score = 0.14614)

right      <---------------------------------------------------------------------
_unknown_ (score = 0.84957)
no (score = 0.07522)
yes (score = 0.07522)
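
The spot-check output above is easier to compare across samples once it is parsed. The sketch below reads lines in the `label (score = x)` form shown above and flags low-confidence top predictions, such as the `no` sample. The helper names and the 0.8 threshold are hypothetical choices for illustration, not values used by `label_wav.py`.

```python
import re

# Matches lines such as "yes (score = 0.97023)".
SCORE_LINE = re.compile(r"^(?P<label>\S+) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(output):
    """Turn label_wav-style output text into a {label: score} dict."""
    scores = {}
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores[match.group("label")] = float(match.group("score"))
    return scores


def top_label(scores, threshold=0.8):
    """Return (label, score, confident) for the highest-scoring label."""
    label = max(scores, key=scores.get)
    return label, scores[label], scores[label] >= threshold


# Output copied from the "no" sample above.
no_output = """no (score = 0.49113)
yes (score = 0.36273)
_unknown_ (score = 0.14614)"""

print(top_label(parse_scores(no_output)))
```

Here the top label is correct but falls below the assumed threshold, which is the kind of case worth checking against more than one sample file.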
