# Overview

This notebook is a modified version of the example notebook from the TensorFlow Lite for Microcontrollers repository:

https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb

Many thanks to the authors for publishing this resource openly with a permissive license. 

The TinyML book by Pete Warden and Daniel Situnayake (https://www.oreilly.com/library/view/tinyml/9781492052036/) was indispensable to me in making the modifications work.

The main changes with respect to the original version are:

- a different dataset (see the other notebooks in this repository for details on pre-processing and preparation)
- changed parameters to adapt model training to the new dataset
- some adjustments to execute model conversion on a Windows 10 machine

I have left the original code intact. Text that was added or changed with respect to the original notebook (but not changed parameters!) is marked with the [ESC50] label from here on.

# Train a Simple Audio Recognition Model

This notebook demonstrates how to train a 20 kB [Simple Audio Recognition](https://www.tensorflow.org/tutorials/sequences/audio_recognition) model to recognize keywords in speech.

The model created in this notebook is used in the [micro_speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech) example for [TensorFlow Lite for MicroControllers](https://www.tensorflow.org/lite/microcontrollers/overview).

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


**Training is much faster using GPU acceleration.** Before you proceed, ensure you are using a GPU runtime by going to **Runtime -> Change runtime type** and setting **Hardware accelerator: GPU**. Training 15,000 iterations takes 1.5 to 2 hours on a GPU runtime.

## Configure Defaults

**MODIFY** the following constants for your specific use case.

In [1]:
# A comma-delimited list of the words you want to train for.
# The options are: yes,no,up,down,left,right,on,off,stop,go
# All the other words will be used to train an "unknown" label and silent
# audio data with no spoken words will be used to train a "silence" label.
WANTED_WORDS = "helicopter,siren"

# [ESC50]

# If you have your own dataset, put its path here and set the data URL to an empty string
DATASET_DIR = ""
DATA_URL = ""

# If the dataset has different parameters (clip duration, sampling rate), set them here
# Note: preliminary experiments with different clip lengths showed a rapid increase in training time
SAMPLE_RATE = "16000"
CLIP_DURATION_MS = "1000"
BACKGROUND_FREQ = 0.8
BACKGROUND_VOLUME = 0.1

# [/ESC50]

# The number of steps and learning rates can be specified as comma-separated
# lists to define the rate at each stage. For example,
# TRAINING_STEPS=12000,3000 and LEARNING_RATE=0.001,0.0001
# will run 15,000 training loops in total, with a rate of 0.001 for the first
# 12,000, and 0.0001 for the final 3,000.
TRAINING_STEPS = "12000,3000"
LEARNING_RATE = "0.001,0.0001"

# Calculate the total number of steps, which is used to identify the checkpoint
# file name.
TOTAL_STEPS = str(sum(map(int, TRAINING_STEPS.split(","))))

# Print the configuration to confirm it
print("Training these words: %s" % WANTED_WORDS)
print("Training steps in each stage: %s" % TRAINING_STEPS)
print("Learning rate in each stage: %s" % LEARNING_RATE)
print("Total number of training steps: %s" % TOTAL_STEPS)

Training these words: helicopter,siren
Training steps in each stage: 12000,3000
Learning rate in each stage: 0.001,0.0001
Total number of training steps: 15000
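
[ESC50] The staged schedule described in the comments above can be made explicit with a few lines of Python. This is a minimal sketch, not part of the original notebook or of `train.py`: the helper name `expand_schedule` is hypothetical, and it assumes the stages are applied in the order given, one learning rate per stage.

```python
def expand_schedule(training_steps, learning_rate):
    """Return (first_step, last_step, rate) for each training stage."""
    steps = [int(s) for s in training_steps.split(",")]
    rates = [float(r) for r in learning_rate.split(",")]
    if len(steps) != len(rates):
        raise ValueError(
            "TRAINING_STEPS and LEARNING_RATE need the same number of entries")
    stages = []
    first = 1
    for count, rate in zip(steps, rates):
        # Each stage covers `count` consecutive steps.
        stages.append((first, first + count - 1, rate))
        first += count
    return stages


# Same values as TRAINING_STEPS and LEARNING_RATE above.
stages = expand_schedule("12000,3000", "0.001,0.0001")
for first, last, rate in stages:
    print("steps %d-%d: learning rate %g" % (first, last, rate))
```

With the values used here, the first stage covers steps 1 to 12000 and the second covers steps 12001 to 15000, which matches the total of 15000 printed above.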


**DO NOT MODIFY** the following constants as they include filepaths used in this notebook and data that is shared during training and inference.

**_[ESC50] Using your own dataset requires commenting out the dataset directory setting in the cell below. [/ESC50]_**

In [2]:
# Calculate the percentage of 'silence' and 'unknown' training samples required
# to ensure that we have equal number of samples for each label.
number_of_labels = WANTED_WORDS.count(',') + 1
number_of_total_labels = number_of_labels + 2 # for 'silence' and 'unknown' label
equal_percentage_of_training_samples = int(100.0/(number_of_total_labels))
SILENT_PERCENTAGE = equal_percentage_of_training_samples
UNKNOWN_PERCENTAGE = equal_percentage_of_training_samples

# Constants which are shared during training and inference
PREPROCESS = 'micro'
WINDOW_STRIDE = 20
MODEL_ARCHITECTURE = 'tiny_conv' # Other options include: single_fc, conv,
                      # low_latency_conv, low_latency_svdf, tiny_embedding_conv

# Constants used during training only
VERBOSITY = 'WARN'
EVAL_STEP_INTERVAL = '1000'
SAVE_STEP_INTERVAL = '1000'

# Constants for training directories and filepaths
# [ESC50]
# comment out the DATASET_DIR assignment when using your own data
#DATASET_DIR =  'dataset/'
# [/ESC50]
LOGS_DIR = 'logs/'
TRAIN_DIR = 'train/' # for training checkpoints and other files.

# Constants for inference directories and filepaths
import os
MODELS_DIR = 'models'
if not os.path.exists(MODELS_DIR):
  os.mkdir(MODELS_DIR)
MODEL_TF = os.path.join(MODELS_DIR, 'model.pb')
MODEL_TFLITE = os.path.join(MODELS_DIR, 'model.tflite')
FLOAT_MODEL_TFLITE = os.path.join(MODELS_DIR, 'float_model.tflite')
MODEL_TFLITE_MICRO = os.path.join(MODELS_DIR, 'model.cc')
SAVED_MODEL = os.path.join(MODELS_DIR, 'saved_model')

QUANT_INPUT_MIN = 0.0
QUANT_INPUT_MAX = 26.0
QUANT_INPUT_RANGE = QUANT_INPUT_MAX - QUANT_INPUT_MIN
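
[ESC50] The percentage calculation at the top of the cell above is easy to check by hand. The following sketch repeats the same arithmetic in a standalone function (the name `label_percentages` is hypothetical) to show how the share per label changes with the number of wanted words.

```python
def label_percentages(wanted_words):
    """Return (total label count, percentage for 'silence' and 'unknown')."""
    number_of_labels = len(wanted_words.split(","))
    # Two extra labels: 'silence' and 'unknown'.
    number_of_total_labels = number_of_labels + 2
    share = int(100.0 / number_of_total_labels)
    return number_of_total_labels, share


# The two classes used in this notebook give four labels in total.
total_labels, share = label_percentages("helicopter,siren")
print("labels: %d, silence/unknown share: %d%%" % (total_labels, share))

# The ten words of the original speech dataset give twelve labels.
print(label_percentages("yes,no,up,down,left,right,on,off,stop,go"))
```

Because of the integer truncation, the share is rounded down whenever 100 is not divisible by the number of labels.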

## Setup Environment

**_[ESC50] All code cells in this section contain adaptations. The explanations are given as comments within each cell. [/ESC50]_**

Install Dependencies

In [1]:
# the magic command for choosing the TensorFlow version does not work here
#%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)

1.15.0


**DELETE** any old data from previous runs


In [4]:
# manually delete the old data (do NOT delete data dir if no download!)
#!rm -rf {DATASET_DIR} {LOGS_DIR} {TRAIN_DIR} {MODELS_DIR}

Clone the TensorFlow GitHub Repository, which contains the relevant code required to run this tutorial.

In [1]:
# done once, no need to redo
#!git clone -q --depth 1 https://github.com/tensorflow/tensorflow

Load TensorBoard to visualize the accuracy and loss as training proceeds.


In [6]:
# tensorboard not working
#%load_ext tensorboard
#%tensorboard --logdir {LOGS_DIR}

## Training

The following script downloads the dataset and begins training.

In [7]:
# [ESC50]
# added parameters for data url, sample rate, clip duration, and background frequency and volume
# [/ESC50]

!python tensorflow/tensorflow/examples/speech_commands/train.py \
--data_dir={DATASET_DIR} \
--wanted_words={WANTED_WORDS} \
--silence_percentage={SILENT_PERCENTAGE} \
--unknown_percentage={UNKNOWN_PERCENTAGE} \
--preprocess={PREPROCESS} \
--window_stride={WINDOW_STRIDE} \
--model_architecture={MODEL_ARCHITECTURE} \
--how_many_training_steps={TRAINING_STEPS} \
--learning_rate={LEARNING_RATE} \
--train_dir={TRAIN_DIR} \
--summaries_dir={LOGS_DIR} \
--verbosity={VERBOSITY} \
--eval_step_interval={EVAL_STEP_INTERVAL} \
--save_step_interval={SAVE_STEP_INTERVAL} \
--data_url={DATA_URL} \
--sample_rate={SAMPLE_RATE} \
--clip_duration_ms={CLIP_DURATION_MS} \
--background_frequency={BACKGROUND_FREQ} \
--background_volume={BACKGROUND_VOLUME}

2021-08-16 16:19:55.548209: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0816 16:19:56.208218 16072 deprecation.py:323] From C:\Users\OstermannFO\Miniconda3\envs\tinyml\lib\site-packages\tensorflow_core\python\ops\losses\losses_impl.py:121: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0816 17:05:59.441536 16072 deprecation.py:323] From C:\Users\OstermannFO\Miniconda3\envs\tinyml\lib\site-packages\tensorflow_core\python\training\saver.py:963: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future vers

## Skipping the training

If you don't want to spend an hour or two training the model from scratch, you can download pretrained checkpoints by uncommenting the lines below (removing the '#'s at the start of each line) and running them.

In [None]:
#!curl -O "https://storage.googleapis.com/download.tensorflow.org/models/tflite/speech_micro_train_2020_05_10.tgz"
#!tar xzf speech_micro_train_2020_05_10.tgz

## Generate a TensorFlow Model for Inference

Combine relevant training results (graph, weights, etc) into a single file for inference. This process is known as freezing a model and the resulting model is known as a frozen model/graph, as it cannot be further re-trained after this process.

In [8]:
# [ESC50]
# remove saved model manually if necessary
# added parameters sample rate and clip duration
# [/ESC50]

#!rm -rf {SAVED_MODEL}
!python tensorflow/tensorflow/examples/speech_commands/freeze.py \
--wanted_words=$WANTED_WORDS \
--window_stride_ms=$WINDOW_STRIDE \
--preprocess=$PREPROCESS \
--model_architecture=$MODEL_ARCHITECTURE \
--start_checkpoint=$TRAIN_DIR$MODEL_ARCHITECTURE".ckpt-"{TOTAL_STEPS} \
--save_format=saved_model \
--output_file={SAVED_MODEL} \
--sample_rate={SAMPLE_RATE} \
--clip_duration_ms={CLIP_DURATION_MS}

2021-08-16 18:26:19.027386: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
INFO:tensorflow:Restoring parameters from train/tiny_conv.ckpt-15000
I0816 18:26:19.121572 16956 saver.py:1284] Restoring parameters from train/tiny_conv.ckpt-15000
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
W0816 18:26:19.211090 16956 deprecation.py:323] From tensorflow/tensorflow/examples/speech_commands/freeze.py:235: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
W0816 18:26:19.212056 16956 deprecation.py:323] From C:\Users\OstermannFO\Miniconda3\envs\tinyml\lib\site-packages\tensorflow_core\python\framework\graph_util
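
[ESC50] The `--start_checkpoint` argument in the cell above is assembled from three notebook variables. As a small sketch (the helper name `checkpoint_prefix` is hypothetical), the same string can be built in Python to check which checkpoint the freeze step will look for before running it.

```python
def checkpoint_prefix(train_dir, model_architecture, training_steps):
    """Build the checkpoint prefix from the training configuration."""
    total_steps = sum(int(s) for s in training_steps.split(","))
    return "%s%s.ckpt-%d" % (train_dir, model_architecture, total_steps)


# Same values as TRAIN_DIR, MODEL_ARCHITECTURE and TRAINING_STEPS above.
prefix = checkpoint_prefix("train/", "tiny_conv", "12000,3000")
print(prefix)
```

The result should agree with the path in the "Restoring parameters from" log line above.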

## Generate a TensorFlow Lite Model

Convert the frozen graph into a TensorFlow Lite model, which is fully quantized for use with embedded devices.

The following cell will also print the model size, which will be under 20 kilobytes.

In [2]:
import sys
# We add this path so we can import the speech processing modules.
sys.path.append("tensorflow/tensorflow/examples/speech_commands/")
import input_data
import models
import numpy as np
import tensorflow as tf

In [10]:
SAMPLE_RATE = 16000
CLIP_DURATION_MS = 1000
WINDOW_SIZE_MS = 30.0
FEATURE_BIN_COUNT = 40
BACKGROUND_FREQUENCY = 0.8
BACKGROUND_VOLUME_RANGE = 0.1
TIME_SHIFT_MS = 100.0

# [ESC50] 
# comment out data url when using own data set
# [/ESC50]
#DATA_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz'
VALIDATION_PERCENTAGE = 10
TESTING_PERCENTAGE = 10

In [11]:
model_settings = models.prepare_model_settings(
    len(input_data.prepare_words_list(WANTED_WORDS.split(','))),
    SAMPLE_RATE, CLIP_DURATION_MS, WINDOW_SIZE_MS,
    WINDOW_STRIDE, FEATURE_BIN_COUNT, PREPROCESS)
print(model_settings)

{'desired_samples': 16000, 'window_size_samples': 480, 'window_stride_samples': 320, 'spectrogram_length': 49, 'fingerprint_width': 40, 'fingerprint_size': 1960, 'label_count': 4, 'sample_rate': 16000, 'preprocess': 'micro', 'average_window_width': -1}
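
[ESC50] The printed settings follow from the clip length, window size and window stride. The sketch below is my reading of that arithmetic, written as a standalone function with the hypothetical name `fingerprint_dimensions`; it is not a copy of `models.prepare_model_settings`, but it reproduces the values printed above for 1 s clips.

```python
def fingerprint_dimensions(sample_rate, clip_duration_ms, window_size_ms,
                           window_stride_ms, feature_bin_count):
    """Return (spectrogram_length, fingerprint_size) for the given settings."""
    desired_samples = int(sample_rate * clip_duration_ms / 1000)
    window_size_samples = int(sample_rate * window_size_ms / 1000)
    window_stride_samples = int(sample_rate * window_stride_ms / 1000)
    length_minus_window = desired_samples - window_size_samples
    if length_minus_window < 0:
        spectrogram_length = 0
    else:
        # One window at the start, plus one per full stride that still fits.
        spectrogram_length = 1 + length_minus_window // window_stride_samples
    return spectrogram_length, feature_bin_count * spectrogram_length


# 1 s clips, as used in this notebook.
one_second = fingerprint_dimensions(16000, 1000, 30.0, 20, 40)
print(one_second)

# 5 s clips, for comparison.
five_seconds = fingerprint_dimensions(16000, 5000, 30.0, 20, 40)
print(five_seconds)
```

The second value of each pair is the fingerprint size, which is the number the `reshape` call in the conversion cell further down has to match.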


In [12]:
audio_processor = input_data.AudioProcessor(
    DATA_URL, DATASET_DIR,
    SILENT_PERCENTAGE, UNKNOWN_PERCENTAGE,
    WANTED_WORDS.split(','), VALIDATION_PERCENTAGE,
    TESTING_PERCENTAGE, model_settings, LOGS_DIR)
print(audio_processor)

<input_data.AudioProcessor object at 0x00000158B3FF8F48>


In [13]:
print(tf.__version__)

1.15.0


In [14]:
# [ESC50]
# this code needs two adaptations for different datasets;
# first, the range for the representative dataset generator must be less than or equal to
# the number of samples reported in the training step above (the original value was 100);
# second, the reshape needs to be changed according to the clip length (fingerprint size in
# the model settings), with a value of 1960 for the original 1s clips and, e.g., 9960 for 5s clips
# [/ESC50]

with tf.Session() as sess:
  float_converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL)
  float_tflite_model = float_converter.convert()
  float_tflite_model_size = open(FLOAT_MODEL_TFLITE, "wb").write(float_tflite_model)
  print("Float model is %d bytes" % float_tflite_model_size)

  converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  converter.inference_input_type = tf.lite.constants.INT8
  converter.inference_output_type = tf.lite.constants.INT8
  def representative_dataset_gen():
    for i in range(50): 
      data, _ = audio_processor.get_data(1, i*1, model_settings,
                                         BACKGROUND_FREQUENCY, 
                                         BACKGROUND_VOLUME_RANGE,
                                         TIME_SHIFT_MS,
                                         'testing',
                                         sess)
      flattened_data = np.array(data.flatten(), dtype=np.float32).reshape(1, 1960) 
      yield [flattened_data]
  converter.representative_dataset = representative_dataset_gen
  tflite_model = converter.convert()
  tflite_model_size = open(MODEL_TFLITE, "wb").write(tflite_model)
  print("Quantized model is %d bytes" % tflite_model_size)


Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from models\saved_model\variables\variables
INFO:tensorflow:The given SavedModel MetaGraphDef contains SignatureDefs with the following keys: {'serving_default'}
INFO:tensorflow:input tensors info: 
INFO:tensorflow:Tensor's key in saved_model's tensor_map: input
INFO:tensorflow: tensor name: Reshape_1:0, shape: (1, 1960), type: DT_FLOAT
INFO:tensorflow:output tensors info: 
INFO:tensorflow:Tensor's key in saved_model's tensor_map: output
INFO:tensorflow: tensor name: labels_softmax:0, shape: (1, 4), type: DT_FLOAT
INFO:tensorflow:Restoring parameters from models\saved_model\variables\variables
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
Instructions for updating

## Testing the TensorFlow Lite model's accuracy

Verify that the model we've exported is still accurate, using the TF Lite Python API and our test set.

In [15]:
# Helper function to run inference
def run_tflite_inference(tflite_model_path, model_type="Float"):
  # Load test data
  np.random.seed(0) # set random seed for reproducible test results.
  with tf.Session() as sess:
    test_data, test_labels = audio_processor.get_data(
        -1, 0, model_settings, BACKGROUND_FREQUENCY, BACKGROUND_VOLUME_RANGE,
        TIME_SHIFT_MS, 'testing', sess)
  test_data = np.expand_dims(test_data, axis=1).astype(np.float32)

  # Initialize the interpreter
  interpreter = tf.lite.Interpreter(tflite_model_path)
  interpreter.allocate_tensors()

  input_details = interpreter.get_input_details()[0]
  output_details = interpreter.get_output_details()[0]

  # For quantized models, manually quantize the input data from float to integer
  if model_type == "Quantized":
    input_scale, input_zero_point = input_details["quantization"]
    test_data = test_data / input_scale + input_zero_point
    test_data = test_data.astype(input_details["dtype"])

  correct_predictions = 0
  for i in range(len(test_data)):
    interpreter.set_tensor(input_details["index"], test_data[i])
    interpreter.invoke()
    output = interpreter.get_tensor(output_details["index"])[0]
    top_prediction = output.argmax()
    correct_predictions += (top_prediction == test_labels[i])

  print('%s model accuracy is %f%% (Number of test samples=%d)' % (
      model_type, (correct_predictions * 100) / len(test_data), len(test_data)))
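
[ESC50] The manual quantization step in the helper above maps float features to integers with a scale and a zero point. The sketch below shows the same mapping in plain Python, with two deliberate differences: it rounds to the nearest integer and clamps to the int8 range, whereas the helper relies on `astype`. The function names are hypothetical, and the scale and zero point are illustrative values derived from `QUANT_INPUT_MIN` and `QUANT_INPUT_MAX`; the real values come from `input_details["quantization"]`.

```python
def quantize_to_int8(values, scale, zero_point):
    """Map floats to int8 values: q = round(x / scale + zero_point), clamped."""
    quantized = []
    for value in values:
        q = int(round(value / scale + zero_point))
        quantized.append(max(-128, min(127, q)))
    return quantized


def dequantize_from_int8(quantized, scale, zero_point):
    """Inverse mapping: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


# Assumed parameters: the 0.0 to 26.0 input range spread over the int8 range.
scale = 26.0 / 255.0
zero_point = -128

features = [0.0, 10.0, 26.0, 30.0]
quantized = quantize_to_int8(features, scale, zero_point)
print(quantized)
print(dequantize_from_int8(quantized, scale, zero_point))
```

Values outside the assumed range, such as 30.0 here, saturate at the end of the int8 range, so the round trip does not recover them.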

In [16]:
# Compute float model accuracy
run_tflite_inference(FLOAT_MODEL_TFLITE)

# Compute quantized model accuracy
run_tflite_inference(MODEL_TFLITE, model_type='Quantized')

Float model accuracy is 86.111111% (Number of test samples=72)
Quantized model accuracy is 86.111111% (Number of test samples=72)


## Generate a TensorFlow Lite for MicroControllers Model
Convert the TensorFlow Lite model into a C source file that can be loaded by TensorFlow Lite for Microcontrollers.

In [13]:
# [ESC50]
# the original commands do not work in the Windows 10 environment used here;
# therefore, all lines are commented out and need to be replaced with the following command,
# run from within the models directory with Windows PowerShell (assuming Vim is installed)
#
#  & "C:\Program Files (x86)\Vim\vim82\xxd.exe" -i model.tflite > model.cc
#
# then either do the replacement of variable names manually (just two instances) or not at all, 
# because the last step can be copying array content and model length manually into the Arduino code
# [/ESC50]


# Install xxd if it is not available
#!apt-get update && apt-get -qq install xxd
# Convert to a C source file
#!xxd -i /content/tiny_conv.tflite > /content/tiny_conv.cc
# Update variable names
#REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
#!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}

#print(MODEL_TFLITE, MODEL_TFLITE_MICRO)

models\model.tflite models\model.cc
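
[ESC50] As a platform-independent alternative to `xxd`, the C source can also be written from Python. This is a sketch under assumptions, not something I used for the results in this notebook: the function name `tflite_to_c_source` is hypothetical, the output only approximates the layout of `xxd -i`, and the target code may expect different qualifiers on the array and length variables.

```python
def tflite_to_c_source(model_bytes, array_name="g_model", bytes_per_line=12):
    """Render bytes as a C array definition plus a length variable."""
    rows = []
    for offset in range(0, len(model_bytes), bytes_per_line):
        chunk = model_bytes[offset:offset + bytes_per_line]
        rows.append("  " + ", ".join("0x%02x" % b for b in chunk))
    body = ",\n".join(rows)
    return "unsigned char %s[] = {\n%s\n};\nunsigned int %s_len = %d;\n" % (
        array_name, body, array_name, len(model_bytes))


# Demonstration with a few placeholder bytes instead of a real model file.
sample = bytes([0x1c, 0x00, 0x00, 0x00])
source = tflite_to_c_source(sample)
print(source)

# For the real model, read the file in binary mode and write the result, e.g.:
# with open(MODEL_TFLITE, "rb") as f:
#     source = tflite_to_c_source(f.read())
# with open(MODEL_TFLITE_MICRO, "w") as f:
#     f.write(source)
```

Passing the array name directly avoids the separate step of replacing variable names afterwards.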


## Deploy to a Microcontroller

Follow the instructions in the [micro_speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech) README.md for [TensorFlow Lite for MicroControllers](https://www.tensorflow.org/lite/microcontrollers/overview) to deploy this model on a specific microcontroller.

**Reference Model:** If you have not modified this notebook, you can follow the instructions as is, to deploy the model. Refer to the [`micro_speech/train/models`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/models) directory to access the models generated in this notebook.

**New Model:** If you have generated a new model to identify different words: (i) Update `kCategoryCount` and `kCategoryLabels` in [`micro_speech/micro_features/micro_model_settings.h`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/micro_features/micro_model_settings.h) and (ii) Update the values assigned to the variables defined in [`micro_speech/micro_features/model.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/micro_features/model.cc) with values displayed after running the following cell.

In [None]:
# [ESC50]
# copying and pasting with a text editor is easier and less error-prone
# [/ESC50]

# Print the C source file
#!cat {MODEL_TFLITE_MICRO}