**Training a Visual Wake Words ML Application Based on the MobileNet Architecture**

This notebook follows the instructions and code provided for training the person detection example of TensorFlow Lite for Microcontrollers:

https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/examples/person_detection/training_a_model.md

In [None]:
#mount google drive
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
#download tensorflow models, we will be using MobileNet for this application
! git clone https://github.com/tensorflow/models.git

Cloning into 'models'...
remote: Enumerating objects: 69341, done.
remote: Total 69341 (delta 0), reused 0 (delta 0), pack-reused 69341
Receiving objects: 100% (69341/69341), 577.37 MiB | 26.38 MiB/s, done.
Resolving deltas: 100% (48882/48882), done.


In [None]:
!ls 

gdrive	models	sample_data


In [None]:
#set up environment
! pip install contextlib2
import os
new_python_path = (os.environ.get("PYTHONPATH") or '') + ":models/research/slim"
%env PYTHONPATH=$new_python_path

env: PYTHONPATH=/env/python:models/research/slim


In [None]:
#download visual wake word data set
! python models/research/slim/download_and_convert_data.py \
--logtostderr \
--dataset_name=visualwakewords \
--dataset_dir=person_detection_dataset \
--foreground_class_of_interest='person' \
--small_object_area_threshold=0.005

>> Downloading train2014.zip 100.0%
Successfully downloaded train2014.zip 13510573713 bytes.
>> Downloading val2014.zip 100.0%
Successfully downloaded val2014.zip 6645013297 bytes.
>> Downloading annotations_trainval2014.zip 100.0%
Successfully downloaded annotations_trainval2014.zip 252872794 bytes.
INFO:tensorflow:Creating a labels file...
I0301 02:31:48.149631 139645597820800 download_and_convert_visualwakewords.py:130] Creating a labels file...
INFO:tensorflow:Creating train VisualWakeWords annotations...
I0301 02:31:48.156384 139645597820800 download_and_convert_visualwakewords.py:135] Creating train VisualWakeWords annotations...
INFO:tensorflow:Building annotations index...
I0301 02:32:00.051107 139645597820800 download_and_convert_visualwakewords_lib.py:115] Building annotations index...
INFO:tensorflow:702 images are missing annotations.
I0301 02:32:00.575697 139645597820800 download_and_convert_visualwakewords_lib.py:123] 702 images are missing annotations.
INFO:tensorflow:On
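
The two flags `--foreground_class_of_interest` and `--small_object_area_threshold` control how the COCO annotations are turned into binary labels. The following is a minimal sketch of that rule, not the conversion script itself: it assumes an image is labeled 1 when at least one object of the foreground class covers more than the threshold fraction of the image area. The function name `vww_label` and the annotation dictionaries are hypothetical and only serve the illustration.

```python
def vww_label(annotations, image_area,
              foreground_class="person",
              small_object_area_threshold=0.005):
    """Return 1 if any foreground object is large enough, else 0.

    Assumption: an object counts only if its area divided by the image
    area exceeds small_object_area_threshold.
    """
    for ann in annotations:
        if ann["category"] != foreground_class:
            continue
        if ann["area"] / image_area > small_object_area_threshold:
            return 1
    return 0


# Hypothetical 640x480 image
image_area = 640 * 480
annotations = [
    {"category": "person", "area": 30 * 40},   # about 0.4% of the image, too small
    {"category": "dog", "area": 300 * 200},    # large, but not the foreground class
]
print(vww_label(annotations, image_area))

# Adding a person that covers about 6.5% of the image flips the label
annotations.append({"category": "person", "area": 100 * 200})
print(vww_label(annotations, image_area))
```

With the threshold of 0.005 used above, people covering less than 0.5% of the image would be treated as background under this assumed rule.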

In [None]:
!ls

gdrive	models	person_detection_dataset  sample_data


In [None]:
!pip install tf_slim

Collecting tf_slim
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)

In [None]:
#make sure that you are using tensorflow 1 for this script (default in Google Colab is TF2)
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [None]:
#check TF version
import tensorflow
print(tensorflow.__version__)

1.15.2


In [None]:
#make sure that you are using the GPU for faster training
print(tensorflow.test.gpu_device_name())

/device:GPU:0


In [None]:
#!rm -rf person_detection_train/
#!rm -rf tensorflow_new

In [None]:
#train the model
! python models/research/slim/train_image_classifier.py \
    --alsologtostderr \
    --dataset_name=visualwakewords \
    --dataset_dir=person_detection_dataset \
    --dataset_split_name=train \
    --train_image_size=96 \
    --use_grayscale=True \
    --preprocessing_name=mobilenet_v1 \
    --model_name=mobilenet_v1_025 \
    --train_dir=gdrive/MyDrive/person_detection_train \
    --save_summaries_secs=300 \
    --learning_rate=0.045 \
    --label_smoothing=0.1 \
    --learning_rate_decay_factor=0.98 \
    --num_epochs_per_decay=2.5 \
    --moving_average_decay=0.9999 \
    --batch_size=96 \
    --max_number_of_steps=1000000

Streaming output truncated to the last 5000 lines.
INFO:tensorflow:global step 1075630: loss = 0.5581 (0.453 sec/step)
I0228 16:12:15.887226 140119110559616 learning.py:512] global step 1075630: loss = 0.5581 (0.453 sec/step)
INFO:tensorflow:global step 1075640: loss = 0.5132 (0.401 sec/step)
I0228 16:12:20.547901 140119110559616 learning.py:512] global step 1075640: loss = 0.5132 (0.401 sec/step)
INFO:tensorflow:global step 1075650: loss = 0.4573 (0.429 sec/step)
I0228 16:12:25.358636 140119110559616 learning.py:512] global step 1075650: loss = 0.4573 (0.429 sec/step)
INFO:tensorflow:global step 1075660: loss = 0.4727 (0.371 sec/step)
I0228 16:12:30.169979 140119110559616 learning.py:512] global step 1075660: loss = 0.4727 (0.371 sec/step)
INFO:tensorflow:global step 1075670: loss = 0.4677 (0.469 sec/step)
I0228 16:12:34.939930 140119110559616 learning.py:512] global step 1075670: loss = 0.4677 (0.469 sec/step)
INFO:tensorflow:global step 1075680: loss = 0.4564 (0.365 se
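
The flags `--learning_rate`, `--learning_rate_decay_factor`, `--num_epochs_per_decay` and `--batch_size` together define the learning rate schedule. The sketch below shows how they interact, assuming a staircase exponential decay (which, to my understanding, is the default decay type of the slim training script) and assuming 82783 training images in the Visual Wake Words train split. Both assumptions should be checked against the slim sources; `learning_rate_at` is a hypothetical helper.

```python
def learning_rate_at(step,
                     initial_lr=0.045,
                     decay_factor=0.98,
                     num_epochs_per_decay=2.5,
                     batch_size=96,
                     num_train_samples=82783):  # assumed size of the train split
    """Staircase exponential decay: multiply by decay_factor every decay_steps steps."""
    decay_steps = int(num_train_samples * num_epochs_per_decay / batch_size)
    return initial_lr * decay_factor ** (step // decay_steps)


# Print the assumed learning rate at a few training steps
for step in (0, 10000, 100000, 1000000):
    print(step, learning_rate_at(step))
```

Under these assumptions the rate drops by 2% roughly every 2155 steps, so it becomes very small long before step 1000000.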

In [None]:
#test network and calculate accuracy
! python models/research/slim/eval_image_classifier.py \
    --alsologtostderr \
    --dataset_name=visualwakewords \
    --dataset_dir=person_detection_dataset \
    --dataset_split_name=val \
    --eval_image_size=96 \
    --use_grayscale=True \
    --preprocessing_name=mobilenet_v1 \
    --model_name=mobilenet_v1_025 \
    --train_dir=gdrive/MyDrive/person_detection_train \
    --checkpoint_path=gdrive/MyDrive/person_detection_train/model.ckpt-1000000

Instructions for updating:
Please switch to tf.train.get_or_create_global_step
W0301 03:00:10.596404 140308424501120 deprecation.py:323] From models/research/slim/eval_image_classifier.py:98: get_or_create_global_step (from tf_slim.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
W0301 03:00:10.606663 140308424501120 deprecation.py:323] From /usr/local/lib/python3.7/dist-packages/tf_slim/data/parallel_reader.py:249: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.dat

In [None]:
#export the inference graph
! python models/research/slim/export_inference_graph.py \
    --alsologtostderr \
    --dataset_name=visualwakewords \
    --image_size=96 \
    --use_grayscale=True \
    --model_name=mobilenet_v1_025 \
    --output_file=person_detection_graph.pb

INFO:tensorflow:Scale of 0 disables regularizer.
I0228 05:52:27.758332 140488768460672 regularizers.py:99] Scale of 0 disables regularizer.
Instructions for updating:
Please use `layer.__call__` method instead.
W0228 05:52:27.759882 140488768460672 deprecation.py:323] From /usr/local/lib/python3.7/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.


In [None]:
!ls

gdrive	person_detection_dataset   person_detection_train  tensorflow
models	person_detection_graph.pb  sample_data


In [None]:
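#freeze_graph.py comes from the TensorFlow r1.15 source tree; uncomment the clone below on the first run
#! git clone -b r1.15 --single-branch https://github.com/tensorflow/tensorflow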
! python tensorflow/tensorflow/python/tools/freeze_graph.py \
--input_graph=person_detection_graph.pb \
--input_checkpoint=gdrive/MyDrive/person_detection_train/model.ckpt-1000000 \
--input_binary=true \
--output_node_names=MobilenetV1/Predictions/Reshape_1 \
--output_graph=person_detection_frozen_graph.pb

Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0228 05:55:42.767617 139770412267392 deprecation.py:323] From tensorflow/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2022-02-28 05:55:42.882013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-02-28 05:55:42.887109: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-28 05:55:42.888018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
20

In [None]:
import tensorflow.compat.v1 as tf
import io
import PIL.Image  #import the submodule explicitly; a bare "import PIL" may not load PIL.Image
import numpy as np

In [None]:
#for a full-integer TF Lite conversion (including input and output), the converter needs sample inputs to calibrate the range of float values
def representative_dataset_gen():

  record_iterator = tf.python_io.tf_record_iterator(path='person_detection_dataset/val.record-00000-of-00010')
  #yield one preprocessed image per record; everything below must stay inside the loop
  for _ in range(250):
    string_record = next(record_iterator)
    example = tf.train.Example()
    example.ParseFromString(string_record)
    image_stream = io.BytesIO(example.features.feature['image/encoded'].bytes_list.value[0])
    image = PIL.Image.open(image_stream)
    image = image.resize((96, 96))
    image = image.convert('L')  #grayscale
    array = np.array(image)
    array = np.expand_dims(array, axis=2)  #add channel dimension
    array = np.expand_dims(array, axis=0)  #add batch dimension
    array = ((array / 127.5) - 1.0).astype(np.float32)  #scale [0, 255] to [-1, 1]
    yield([array])

In [None]:
#TF Lite Conversion
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'person_detection_frozen_graph.pb',
    ['input'],
    ['MobilenetV1/Predictions/Reshape_1'])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()
open("person_detection_model.tflite", "wb").write(tflite_quant_model)

Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


300536
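
Because `inference_input_type` is set to `tf.int8`, the converted model expects quantized inputs instead of floats in [-1, 1]. The sketch below illustrates the affine quantization arithmetic in plain Python. The values of `scale` and `zero_point` are assumptions chosen for a float range of about [-1, 1]; the real values should be read from the converted model, for example via `tf.lite.Interpreter(...).get_input_details()[0]['quantization']`. `quantize_int8` is a hypothetical helper.

```python
def quantize_int8(value, scale, zero_point):
    """Affine quantization: q = round(value / scale) + zero_point, clipped to int8."""
    q = int(round(value / scale)) + zero_point
    return max(-128, min(127, q))


# Assumed quantization parameters, NOT read from the actual model
scale = 2.0 / 255.0
zero_point = -1

for pixel in (0, 64, 128, 255):
    normalized = pixel / 127.5 - 1.0  # same preprocessing as in representative_dataset_gen
    print(pixel, quantize_int8(normalized, scale, zero_point))
```

With these assumed parameters, each 8-bit pixel value `p` ends up within one quantization step of `p - 128`, which is why a simple offset is often sufficient on the device. Whether that shortcut is valid depends on the parameters actually stored in the model.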

In [None]:
#zip files for easier download
!zip -r person_detection_train.zip gdrive/MyDrive/person_detection_train

  adding: gdrive/MyDrive/person_detection_train/ (stored 0%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645233017.adb061741171 (deflated 26%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645262814.531d09240bc1 (deflated 18%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645322630.4d81f6e8465d (deflated 18%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645375041.456ff4222da9 (deflated 18%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645408611.0899877c2e5c (deflated 20%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645427836.0899877c2e5c (deflated 25%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645458453.ba5a5f5e6d7d (deflated 19%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645492561.f2c9e707745e (deflated 18%)
  adding: gdrive/MyDrive/person_detection_train/events.out.tfevents.1645544188.1139

In [None]:
from google.colab import files
files.download('person_detection_train.zip') 

In [None]:
# Install xxd if it is not available
! apt-get -qq install xxd
# Save the file as a C source file
! xxd -i person_detection_model.tflite > person_detect_model_data.cc
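
If `xxd` is not available, the same kind of C source can be generated in Python. The following is a minimal sketch that roughly mirrors the layout of `xxd -i` (an `unsigned char` array plus a length variable, with the file name turned into an identifier); it is not guaranteed to match the tool's output byte for byte. `to_c_array` is a hypothetical helper.

```python
import re


def to_c_array(data, name, bytes_per_line=12):
    """Render bytes as a C array definition, similar in layout to `xxd -i`."""
    # Replace characters that are not valid in a C identifier
    ident = re.sub(r"[^0-9A-Za-z_]", "_", name)
    lines = []
    for i in range(0, len(data), bytes_per_line):
        chunk = data[i:i + bytes_per_line]
        lines.append("  " + ", ".join("0x%02x" % b for b in chunk))
    body = ",\n".join(lines)
    return ("unsigned char %s[] = {\n%s\n};\n"
            "unsigned int %s_len = %d;\n" % (ident, body, ident, len(data)))


# Demonstration on a few placeholder bytes instead of the real model file
print(to_c_array(b"\x1c\x00\x00\x00TFL3", "person_detection_model.tflite"))
```

To convert the actual model, the file contents could be read with `open("person_detection_model.tflite", "rb").read()` and the returned string written to `person_detect_model_data.cc`.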