<a href="https://colab.research.google.com/github/iwatake2222/pico-loud_talking_detector/blob/master/01_script/training/train_micro_speech_model_talking_20210529_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!nvidia-smi

# Train a Simple Audio Recognition Model

This notebook demonstrates how to train a 20 kB [Simple Audio Recognition](https://www.tensorflow.org/tutorials/sequences/audio_recognition) model to recognize keywords in speech.

The model created in this notebook is used in the [micro_speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech) example for [TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers/overview).

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


## Prepare my dataset (iwatake2222)

### Download dataset (iwatake2222)

In [None]:
import os
import glob
import subprocess

CLIP_DURATION = 10 * 1000

DATASET_DIR =  "dataset/"
!rm -rf {DATASET_DIR} && mkdir -p {DATASET_DIR}
 
def download(file_id, file_name):
  subprocess.run(["curl", "-sc", "/tmp/cookie", f"https://drive.google.com/uc?export=download&id={file_id}"])
  cmd = ["awk", "/_warning_/ {print $NF}", "/tmp/cookie"]
  code = subprocess.run(cmd, shell=False, stdout=subprocess.PIPE, check=True).stdout.decode("utf-8").replace("\n", "")
  subprocess.run(["curl", "-Lb", "/tmp/cookie", f"https://drive.google.com/uc?export=download&confirm={code}&id={file_id}", "-o", f"{file_name}"])
 
# def extract_dataset(file_id, file_name, dataset_dir):
#   download(file_id, file_name)
#   subprocess.run(["unzip", "-o", f"{file_name}"])
#   data_path = os.path.splitext(os.path.basename(file_name))[0]
#   subprocess.run(f"cp -rf {data_path}/* {dataset_dir}/.", shell=True)
def extract_dataset(file_id, file_name, dataset_dir):
  if not os.path.exists(file_name):
    download(file_id, file_name)
  subprocess.run(["tar", "xzvf", file_name, "--strip", "1", "-C", dataset_dir])
 
''' Download my dataset '''
# AudioSet (use not_talking data only)
extract_dataset("1wLit745TX4rw_KgUJULuhZxXETC58fgd", "balanced_train_segments.tgz", DATASET_DIR)
# extract_dataset("1l5pj8DO0rreT-OimYdZAf2mJdmV_6nYu", "eval_segments.tgz", DATASET_DIR)
!rm -rf {DATASET_DIR}/ambiguous
!rm -rf {DATASET_DIR}/talking/*
 
# My data
extract_dataset("13HC_vZeXwqpz4eKZliLKomvTBuYW1-jt", "music_pops.tgz", DATASET_DIR + "not_talking")
extract_dataset("1tlkCJ2RnhapPmPeIhd09dzTD9xrovyV4", "music.tgz", DATASET_DIR + "not_talking")
extract_dataset("12_z34bHB1bR3QbTM-rzJOq8Fol1G3kpj", "yoshimoto.tgz", DATASET_DIR + "talking")
 
!rm -rf temp && mkdir temp
extract_dataset("1MhhJJxqCdd33gxgwBkuTJzABURjOsUNP", "my_jp_talk.tgz", "temp")
!find temp -name *.wav -exec mv {} temp \;
!mv temp/*.wav {DATASET_DIR}"/talking/."
!rm -rf temp
 
# WANTED_WORDS_LIST = ["talking", "not_talking", "ambiguous"]
WANTED_WORDS_LIST = ["talking", "not_talking"]
 
# ''' Download background_noise '''
# !curl -O "https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz"
# !tar xzf speech_commands_v0.02.tar.gz ./_background_noise_
# !cp -r _background_noise_ {DATASET_DIR}/.

In [None]:
import os
import glob
import random

''' Adjust the number of dataset '''
def get_file_num(directory):
  return len([name for name in os.listdir(directory) if os.path.isfile(directory + "/" + name)])

def choose_random_data(DATASET_DIR, label, data_num):
  org_path = DATASET_DIR + "/" + label + "/"
  tmp_path = DATASET_DIR + "/temp_" + label + "/"
  os.makedirs(tmp_path, exist_ok=True)
  subprocess.run(f"mv {org_path}/* {tmp_path}/.", shell=True)

  files = [r.split('/')[-1] for r in glob.glob(tmp_path + "/*.wav")]
  for i in range(data_num):
    chosen_file_name = random.choice(files)
    files.remove(chosen_file_name)
    chosen_file_path = tmp_path + chosen_file_name
    subprocess.run(f"mv {chosen_file_path} {org_path}/.", shell=True)
  # Remove the temporary directory (one call is enough)
  subprocess.run(["rm", "-rf", tmp_path])

# data_num = get_file_num(DATASET_DIR + "talking")
# print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")), str(get_file_num(DATASET_DIR + "ambiguous")))
# choose_random_data(DATASET_DIR, "not_talking", data_num)
# choose_random_data(DATASET_DIR, "ambiguous", data_num)
# print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")), str(get_file_num(DATASET_DIR + "ambiguous")))
print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")))

In [None]:
import os
import glob
import shutil
import time
import random
import librosa
import soundfile

def separate_wav(target_dir, output_dir, sampling_rate=16000, output_duration_time=5,delete_original=False):
    output_duration_sample = int(sampling_rate * output_duration_time)

    ''' Process for selected input files '''
    wav_path_list = glob.glob(target_dir + "/*.wav")
    for wav_path in wav_path_list:
        basename, ext = os.path.splitext(os.path.basename(wav_path))

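        try:
            data, sr = librosa.core.load(wav_path, sr=sampling_rate, mono=True)
        except Exception as e:
            # Skip files that cannot be decoded instead of aborting the whole run
            print(f"Skipping {wav_path}: {e}")
            continue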
        duration_sample = len(data)
        index = 0
        while (index + 0.5) * output_duration_sample <= duration_sample:
            if (index + 1) * output_duration_sample <= duration_sample:
                data_out = data[index * output_duration_sample : (index + 1) * output_duration_sample]
            else:
                data_out = data[duration_sample - output_duration_sample : duration_sample]
            output_path = output_dir + "/" + basename + "_" + f"{index:02}" + ".wav"
            soundfile.write(output_path, data_out, samplerate=sampling_rate, subtype="PCM_16")
            index += 1
        if delete_original:
            os.remove(wav_path)
        rest_data_sample = duration_sample - (index * output_duration_sample)
        # if rest_data_sample > 0:
        #     print(f"Warning: data dropped, {basename}, {str(rest_data_sample)}")

separate_wav(DATASET_DIR + "/talking/.", DATASET_DIR + "/talking/.", sampling_rate=16000, output_duration_time=CLIP_DURATION/1000, delete_original=True)
separate_wav(DATASET_DIR + "/not_talking/.", DATASET_DIR + "/not_talking/.", sampling_rate=16000, output_duration_time=CLIP_DURATION/1000, delete_original=True)

print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")))
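
The splitting rule in `separate_wav` can be checked without any audio files. The sketch below is a minimal illustration that reproduces only the index arithmetic. `segment_bounds` is a hypothetical helper that is not part of this notebook, and the `max(0, ...)` clamp is an added assumption that `separate_wav` itself does not apply.

```python
def segment_bounds(total_samples, segment_samples):
    """Return (start, end) sample ranges using the same rule as separate_wav."""
    bounds = []
    index = 0
    # Keep emitting segments while at least half a segment of audio remains
    while (index + 0.5) * segment_samples <= total_samples:
        if (index + 1) * segment_samples <= total_samples:
            start = index * segment_samples
            end = start + segment_samples
        else:
            # Last partial segment: take the final full-length window,
            # which overlaps the previous segment
            start = max(0, total_samples - segment_samples)  # clamp is an assumption
            end = total_samples
        bounds.append((start, end))
        index += 1
    return bounds

# A 25 s recording at 16 kHz, cut into 10 s clips
for start, end in segment_bounds(25 * 16000, 10 * 16000):
    print(start / 16000, end / 16000)
```

A remainder shorter than half a clip is dropped, so a 24 s recording yields two clips rather than three.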

### Add noise and background (iwatake2222)

In [None]:
# librosa and soundfile are Python packages, so install them with pip
!pip install librosa soundfile

In [None]:
import os
import glob
import shutil
import time
import random
import librosa
import soundfile

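def clear_last_sep(path):
    # str.replace returns a new string, so the result must be assigned
    path = path.replace(os.sep, '/')
    # Drop one trailing separator, if present
    if path.endswith("/"):
        path = path[:-1]
    return path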

def add_noise(target_dir, noise_dir, output_dir, signature_text="", sampling_rate=16000, process_ratio=0.3, original_volume=1.0, noise_volume=1.0):
    if signature_text == "":
        signature_text = os.path.splitext(os.path.basename(clear_last_sep(noise_dir)))[0]

    ''' Seed the RNG so that the shuffling is reproducible '''
    random.seed(1234)

    ''' Read noise data as array '''
    noise_list = []
    noise_wav_path_list = glob.glob(noise_dir + "/*.wav")
    for noise_wav_path in noise_wav_path_list:
        data, sr = librosa.core.load(noise_wav_path, sr=sampling_rate, mono=True)
        noise_list.append(data)

    ''' Process for selected input files '''
    wav_path_list = glob.glob(target_dir + "/*.wav")
    random.shuffle(wav_path_list)
    wav_path_list = wav_path_list[:int(len(wav_path_list) * process_ratio)]
    for wav_path in wav_path_list:
        basename, ext = os.path.splitext(os.path.basename(wav_path))
        output_path = output_dir + "/" + basename + "_" + signature_text + ".wav"

        data, sr = librosa.core.load(wav_path, sr=sampling_rate, mono=True)
        duration_sample = len(data)

        noise = random.choice(noise_list)
        start_sample = int(random.uniform(0, len(noise) - duration_sample - 1))
        data = data * original_volume + noise[start_sample:start_sample + duration_sample] * noise_volume

        soundfile.write(output_path, data, samplerate=sampling_rate, subtype="PCM_16")

def create_noise(noise_dir, output_dir, signature_text="", sampling_rate=16000, duration_time=10, output_number_of_file=10, noise_volume=1.0):
    if signature_text == "":
        signature_text = os.path.splitext(os.path.basename(clear_last_sep(noise_dir)))[0]

    ''' Seed the RNG so that the random choices are reproducible '''
    random.seed(1234)

    ''' Read noise data as array '''
    noise_list = []
    noise_wav_path_list = glob.glob(noise_dir + "/*.wav")
    for noise_wav_path in noise_wav_path_list:
        data, sr = librosa.core.load(noise_wav_path, sr=sampling_rate, mono=True)
        noise_list.append(data)

    duration_sample = int(duration_time * sampling_rate)
    ''' Process to create files'''
    for i in range(output_number_of_file):
        noise = random.choice(noise_list)
        start_sample = int(random.uniform(0, len(noise) - duration_sample - 1))
        data = noise[start_sample:start_sample + duration_sample] * noise_volume
        output_path = output_dir + "/" + signature_text + f"_{i:05}.wav"
        soundfile.write(output_path, data, samplerate=sampling_rate, subtype="PCM_16")
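
The mixing step in `add_noise` boils down to cropping a random window from a noise clip and adding it to the signal sample by sample. The following is a minimal sketch using plain Python lists instead of NumPy arrays. `mix_with_noise` is a hypothetical helper, and the explicit length check is an assumption added here (the notebook functions do not check that the noise clip is longer than the signal).

```python
import random

def mix_with_noise(signal, noise, signal_volume=1.0, noise_volume=1.0, rng=random):
    """Overlay a randomly positioned crop of `noise` onto `signal`."""
    if len(noise) <= len(signal):
        raise ValueError("noise clip must be longer than the signal")
    # Pick a crop start so that the crop fits entirely inside the noise clip
    start = rng.randint(0, len(noise) - len(signal) - 1)
    crop = noise[start:start + len(signal)]
    return [s * signal_volume + n * noise_volume for s, n in zip(signal, crop)]

random.seed(1234)  # fixed seed, as in the notebook, for reproducible crops
mixed = mix_with_noise([0.1, -0.2, 0.3], [0.0, 0.5, -0.5, 0.25, 0.0], noise_volume=0.5)
print(mixed)
```

Mixed samples can exceed the range [-1.0, 1.0] when both volumes are 1.0, so lowering `noise_volume` may be worth considering if clipping is audible in the written files.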

In [None]:
''' Download noise data '''
BACKGROUND_DIR = "background/"
NOISE_DIR = "noise/"
!rm -rf {BACKGROUND_DIR} && mkdir -p {BACKGROUND_DIR}
!rm -rf {NOISE_DIR} && mkdir -p {NOISE_DIR}
extract_dataset("19vFtkVp1d_e8lBBnfnD8K8JohBPcPM_H", "background.tgz", BACKGROUND_DIR)
extract_dataset("12R0jtfXr6cGWYYZaTcV2B0nJvXV8AV-F", "mic_noise.tgz", NOISE_DIR)

''' Add noise files into "not_talking" '''
# Create noise data after separating test data
# noise_num = int(get_file_num(DATASET_DIR + "not_talking") * 0.2)
# create_noise(BACKGROUND_DIR, DATASET_DIR + "not_talking", duration_time=CLIP_DURATION/1000, output_number_of_file=noise_num, noise_volume=1.0)
# create_noise(NOISE_DIR, DATASET_DIR + "not_talking", duration_time=CLIP_DURATION/1000, output_number_of_file=noise_num, noise_volume=1.0)

''' Mix noise and "talking" '''
# the order is important (mix background, then mix noise)
add_noise(DATASET_DIR + "talking", BACKGROUND_DIR, DATASET_DIR + "talking", process_ratio=0.5, original_volume=1.0, noise_volume=1.0)
add_noise(DATASET_DIR + "talking", NOISE_DIR, DATASET_DIR + "talking", process_ratio=0.5, original_volume=1.0, noise_volume=1.0)
print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")))

### Separate test data
Manually separate the test data from the training data, because the training script picks training/validation/test data at random, which causes data leakage.
WAV files that share the same filename prefix are moved together.

In [None]:
import shutil

DATASET_TEST_DIR =  "dataset_test/"
!rm -rf {DATASET_TEST_DIR} && mkdir -p {DATASET_TEST_DIR} && mkdir -p {DATASET_TEST_DIR}/talking && mkdir -p {DATASET_TEST_DIR}/not_talking
def move_test_data(src_path, dst_path, ratio):
  random.seed(984983)
  num_to_be_mv = int(len(glob.glob(src_path + "/*.wav")) * ratio)
  while num_to_be_mv > 0:
    org_file_list = glob.glob(src_path + "/*.wav")
    basename, ext = os.path.splitext(os.path.basename(random.choice(org_file_list)))
    # Check the leading characters to identify files from the same scene (use a longer prefix when the filename starts with a date)
    num_to_identify_scene = 5
    if basename[:8].isdigit():
      num_to_identify_scene = 13
    target_file_list = glob.glob(src_path + "/" + basename[:num_to_identify_scene] + "*.wav")
    for target_file in target_file_list:
      shutil.move(target_file, dst_path + "/.")
      num_to_be_mv -= 1

move_test_data(DATASET_DIR + "talking", DATASET_TEST_DIR + "talking", 0.1)
move_test_data(DATASET_DIR + "not_talking", DATASET_TEST_DIR + "not_talking", 0.1)
# move_test_data(DATASET_TEST_DIR + "talking", DATASET_DIR + "talking", 1.0)  # revert
# move_test_data(DATASET_TEST_DIR + "not_talking", DATASET_DIR + "not_talking", 1.0) # revert
print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")))
print(str(get_file_num(DATASET_TEST_DIR + "talking")), str(get_file_num(DATASET_TEST_DIR + "not_talking")))
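
The prefix rule used by `move_test_data` can be illustrated in isolation. This sketch groups file names by the same prefix length (13 characters when the name starts with an 8-digit date, otherwise 5), so that every clip from one group lands on the same side of the split. `scene_key` and `group_by_scene` are hypothetical helpers, and the file names are made up for illustration.

```python
from collections import defaultdict

def scene_key(basename):
    """Prefix that identifies the source recording, as in move_test_data."""
    # Names starting with an 8-digit date use a longer, 13-character prefix
    length = 13 if basename[:8].isdigit() else 5
    return basename[:length]

def group_by_scene(basenames):
    """Collect file names that share the same scene prefix."""
    groups = defaultdict(list)
    for name in basenames:
        groups[scene_key(name)].append(name)
    return dict(groups)

# Hypothetical file names, for illustration only
names = ["20210529_1234_00", "20210529_1234_01", "abcde_x_00", "abcde_y_03", "zzzzz_00"]
for key, members in group_by_scene(names).items():
    print(key, members)
```

Note that a 5-character prefix is coarse: unrelated files whose names happen to start with the same five characters are treated as one group.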

In [None]:
''' Add noise data to training data (this should be done after separating test data) '''
noise_num = int(get_file_num(DATASET_DIR + "not_talking") * 0.2)
create_noise(BACKGROUND_DIR, DATASET_DIR + "not_talking", duration_time=CLIP_DURATION/1000, output_number_of_file=noise_num, noise_volume=1.0)
create_noise(NOISE_DIR, DATASET_DIR + "not_talking", duration_time=CLIP_DURATION/1000, output_number_of_file=noise_num, noise_volume=1.0)
print(str(get_file_num(DATASET_DIR + "talking")), str(get_file_num(DATASET_DIR + "not_talking")))
print(str(get_file_num(DATASET_TEST_DIR + "talking")), str(get_file_num(DATASET_TEST_DIR + "not_talking")))

**Training is much faster using GPU acceleration.** Before you proceed, ensure you are using a GPU runtime by going to **Runtime -> Change runtime type** and set **Hardware accelerator: GPU**. Training 15,000 iterations will take 1.5 - 2 hours on a GPU runtime.

## Configure Defaults

**MODIFY** the following constants for your specific use case.

In [None]:
# A comma-delimited list of the words you want to train for.
# The options are: yes,no,up,down,left,right,on,off,stop,go
# All the other words will be used to train an "unknown" label and silent
# audio data with no spoken words will be used to train a "silence" label.

''' Modified by iwatake2222 ---------------------------------------------- '''
# WANTED_WORDS = "yes,no"
WANTED_WORDS = ",".join(WANTED_WORDS_LIST)
# The number of steps and learning rates can be specified as comma-separated
# lists to define the rate at each stage. For example,
# TRAINING_STEPS=12000,3000 and LEARNING_RATE=0.001,0.0001
# will run 15,000 training loops in total, with a rate of 0.001 for the first
# 12,000, and 0.0001 for the final 3,000.
''' Modified by iwatake2222 ---------------------------------------------- '''
TRAINING_STEPS = "12000,3000"
# TRAINING_STEPS = "5000,1000"
LEARNING_RATE = "0.001,0.0001"

# Calculate the total number of steps, which is used to identify the checkpoint
# file name.
TOTAL_STEPS = str(sum(map(int, TRAINING_STEPS.split(","))))

# Print the configuration to confirm it
print("Training these words: %s" % WANTED_WORDS)
print("Training steps in each stage: %s" % TRAINING_STEPS)
print("Learning rate in each stage: %s" % LEARNING_RATE)
print("Total number of training steps: %s" % TOTAL_STEPS)
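
To make the staged schedule concrete, the sketch below looks up the learning rate for a given training step from the two comma-separated strings. It is an illustration of how the stages are understood to line up, not the code in `train.py`. `learning_rate_for_step` is a hypothetical helper, and steps are assumed to be counted from 1.

```python
def learning_rate_for_step(step, training_steps, learning_rates):
    """Return the rate used at a 1-based training step."""
    steps = [int(s) for s in training_steps.split(",")]
    rates = [float(r) for r in learning_rates.split(",")]
    if len(steps) != len(rates):
        raise ValueError("both lists need the same number of stages")
    boundary = 0
    for stage_steps, rate in zip(steps, rates):
        boundary += stage_steps  # last step that belongs to this stage
        if step <= boundary:
            return rate
    raise ValueError("step is beyond the last stage")

for step in (1, 12000, 12001, 15000):
    print(step, learning_rate_for_step(step, "12000,3000", "0.001,0.0001"))
```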

**DO NOT MODIFY** the following constants as they include filepaths used in this notebook and data that is shared during training and inference.

In [None]:
# Calculate the percentage of 'silence' and 'unknown' training samples required
# to ensure that we have equal number of samples for each label.
number_of_labels = WANTED_WORDS.count(',') + 1
number_of_total_labels = number_of_labels + 2 # for 'silence' and 'unknown' label
equal_percentage_of_training_samples = int(100.0/(number_of_total_labels))
''' Modified by iwatake2222 ---------------------------------------------- '''
equal_percentage_of_training_samples = 0
SILENT_PERCENTAGE = equal_percentage_of_training_samples
UNKNOWN_PERCENTAGE = equal_percentage_of_training_samples

# Constants which are shared during training and inference
PREPROCESS = 'micro'
''' Modified by iwatake2222 ---------------------------------------------- '''
# CLIP_DURATION = 10000
WINDOW_SIZE = 30
WINDOW_STRIDE = 20
FEATURE_BIN_COUNT = 40
# MODEL_ARCHITECTURE = 'conv'
MODEL_ARCHITECTURE = 'tiny_conv' # Other options include: single_fc, conv,
                      # low_latency_conv, low_latency_svdf, tiny_embedding_conv

# Constants used during training only
VERBOSITY = 'DEBUG'
''' Modified by iwatake2222 ---------------------------------------------- '''
EVAL_STEP_INTERVAL = '1000'
SAVE_STEP_INTERVAL = '1000'
# EVAL_STEP_INTERVAL = '100'
# SAVE_STEP_INTERVAL = '100'

# Constants for training directories and filepaths
# DATASET_DIR =  'dataset/'
LOGS_DIR = 'logs/'
TRAIN_DIR = 'train/' # for training checkpoints and other files.

# Constants for inference directories and filepaths
import os
MODELS_DIR = 'models'
if not os.path.exists(MODELS_DIR):
  os.mkdir(MODELS_DIR)
MODEL_TF = os.path.join(MODELS_DIR, 'model.pb')
MODEL_TFLITE = os.path.join(MODELS_DIR, 'model.tflite')
FLOAT_MODEL_TFLITE = os.path.join(MODELS_DIR, 'float_model.tflite')
MODEL_TFLITE_MICRO = os.path.join(MODELS_DIR, 'model.cc')
SAVED_MODEL = os.path.join(MODELS_DIR, 'saved_model')

QUANT_INPUT_MIN = 0.0
QUANT_INPUT_MAX = 26.0
QUANT_INPUT_RANGE = QUANT_INPUT_MAX - QUANT_INPUT_MIN

## Setup Environment

Install Dependencies

In [None]:
%tensorflow_version 1.x
import tensorflow as tf

**DELETE** any old data from previous runs


In [None]:
''' Modified by iwatake2222 ---------------------------------------------- '''
# !rm -rf {DATASET_DIR} {LOGS_DIR} {TRAIN_DIR} {MODELS_DIR}
!rm -rf {LOGS_DIR} {TRAIN_DIR} {MODELS_DIR}

Clone the TensorFlow GitHub repository, which contains the code required to run this tutorial.

In [None]:
!git clone -q --depth 1 https://github.com/tensorflow/tensorflow

Load TensorBoard to visualize the accuracy and loss as training proceeds.


In [None]:
%load_ext tensorboard
%tensorboard --logdir {LOGS_DIR}

## Training

The following script trains the model on the dataset prepared above. Because `--data_url` is empty, no dataset is downloaded.

In [None]:
''' Modified by iwatake2222 ----------------------------------------------- '''
# !python tensorflow/tensorflow/examples/speech_commands/train.py \
# --data_dir={DATASET_DIR} \
# --wanted_words={WANTED_WORDS} \
# --silence_percentage={SILENT_PERCENTAGE} \
# --unknown_percentage={UNKNOWN_PERCENTAGE} \
# --preprocess={PREPROCESS} \
# --window_stride={WINDOW_STRIDE} \
# --model_architecture={MODEL_ARCHITECTURE} \
# --how_many_training_steps={TRAINING_STEPS} \
# --learning_rate={LEARNING_RATE} \
# --train_dir={TRAIN_DIR} \
# --summaries_dir={LOGS_DIR} \
# --verbosity={VERBOSITY} \
# --eval_step_interval={EVAL_STEP_INTERVAL} \
# --save_step_interval={SAVE_STEP_INTERVAL}
!python tensorflow/tensorflow/examples/speech_commands/train.py \
--data_url="" \
--data_dir={DATASET_DIR} \
--wanted_words={WANTED_WORDS} \
--silence_percentage={SILENT_PERCENTAGE} \
--unknown_percentage={UNKNOWN_PERCENTAGE} \
--preprocess={PREPROCESS} \
--clip_duration_ms={CLIP_DURATION} \
--window_size_ms={WINDOW_SIZE} \
--window_stride={WINDOW_STRIDE} \
--feature_bin_count={FEATURE_BIN_COUNT} \
--model_architecture={MODEL_ARCHITECTURE} \
--how_many_training_steps={TRAINING_STEPS} \
--learning_rate={LEARNING_RATE} \
--train_dir={TRAIN_DIR} \
--summaries_dir={LOGS_DIR} \
--verbosity={VERBOSITY} \
--validation_percentage=10 \
--testing_percentage=0 \
--eval_step_interval={EVAL_STEP_INTERVAL} \
--save_step_interval={SAVE_STEP_INTERVAL} 
# \
# --start_checkpoint=./train/conv.ckpt-3000

## Skipping the training

If you don't want to spend an hour or two training the model from scratch, you can download pretrained checkpoints by uncommenting the lines below (removing the '#'s at the start of each line) and running them.

In [None]:
#!curl -O "https://storage.googleapis.com/download.tensorflow.org/models/tflite/speech_micro_train_2020_05_10.tgz"
#!tar xzf speech_micro_train_2020_05_10.tgz

## Generate a TensorFlow Model for Inference

Combine relevant training results (graph, weights, etc) into a single file for inference. This process is known as freezing a model and the resulting model is known as a frozen model/graph, as it cannot be further re-trained after this process.

In [None]:
''' Modified by iwatake2222 ---------------------------------------------- '''
# !rm -rf {SAVED_MODEL}
# !python tensorflow/tensorflow/examples/speech_commands/freeze.py \
# --wanted_words=$WANTED_WORDS \
# --window_stride_ms=$WINDOW_STRIDE \
# --preprocess=$PREPROCESS \
# --model_architecture=$MODEL_ARCHITECTURE \
# --start_checkpoint=$TRAIN_DIR$MODEL_ARCHITECTURE'.ckpt-'{TOTAL_STEPS} \
# --save_format=saved_model \
# --output_file={SAVED_MODEL}
!rm -rf {SAVED_MODEL}
!python tensorflow/tensorflow/examples/speech_commands/freeze.py \
--wanted_words=$WANTED_WORDS \
--clip_duration_ms=$CLIP_DURATION \
--clip_stride_ms=$WINDOW_SIZE \
--window_size_ms=$WINDOW_SIZE \
--window_stride_ms=$WINDOW_STRIDE \
--feature_bin_count=$FEATURE_BIN_COUNT \
--preprocess=$PREPROCESS \
--model_architecture=$MODEL_ARCHITECTURE \
--start_checkpoint=$TRAIN_DIR$MODEL_ARCHITECTURE'.ckpt-'{TOTAL_STEPS} \
--save_format=saved_model \
--output_file={SAVED_MODEL}

## Generate a TensorFlow Lite Model

Convert the frozen graph into a TensorFlow Lite model, which is fully quantized for use with embedded devices.

The following cell will also print the model size, which will be under 20 kilobytes.

In [None]:
import sys
# We add this path so we can import the speech processing modules.
sys.path.append("/content/tensorflow/tensorflow/examples/speech_commands/")
import input_data
import models
import numpy as np

In [None]:
''' Modified by iwatake2222 ---------------------------------------------- '''
SAMPLE_RATE = 16000
# CLIP_DURATION_MS = 1000
# WINDOW_SIZE_MS = 30.0
# FEATURE_BIN_COUNT = 40
# BACKGROUND_FREQUENCY = 0.8
BACKGROUND_FREQUENCY = 0.0
BACKGROUND_VOLUME_RANGE = 0.1
TIME_SHIFT_MS = 100.0
 
''' Modified by iwatake2222 ---------------------------------------------- '''
# DATA_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz'
DATA_URL = ''
# VALIDATION_PERCENTAGE = 10
# TESTING_PERCENTAGE = 10
VALIDATION_PERCENTAGE = 0
# Using 100% causes an error (AudioProcessor needs at least one training sample)
TESTING_PERCENTAGE = 99

In [None]:
''' Modified by iwatake2222 ---------------------------------------------- '''
model_settings = models.prepare_model_settings(
    len(input_data.prepare_words_list(WANTED_WORDS.split(','))),
    SAMPLE_RATE, CLIP_DURATION, WINDOW_SIZE,
    WINDOW_STRIDE, FEATURE_BIN_COUNT, PREPROCESS)
# audio_processor = input_data.AudioProcessor(
#     DATA_URL, DATASET_DIR,
#     SILENT_PERCENTAGE, UNKNOWN_PERCENTAGE,
#     WANTED_WORDS.split(','), VALIDATION_PERCENTAGE,
#     TESTING_PERCENTAGE, model_settings, LOGS_DIR)
audio_processor = input_data.AudioProcessor(
    DATA_URL, DATASET_TEST_DIR,
    SILENT_PERCENTAGE, UNKNOWN_PERCENTAGE,
    WANTED_WORDS.split(','), VALIDATION_PERCENTAGE,
    TESTING_PERCENTAGE, model_settings, LOGS_DIR)

In [None]:
with tf.Session() as sess:
  float_converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL)
  float_tflite_model = float_converter.convert()
  float_tflite_model_size = open(FLOAT_MODEL_TFLITE, "wb").write(float_tflite_model)
  print("Float model is %d bytes" % float_tflite_model_size)
 
  converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  converter.inference_input_type = tf.lite.constants.INT8
  converter.inference_output_type = tf.lite.constants.INT8
  def representative_dataset_gen():
    for i in range(1000):
      data, _ = audio_processor.get_data(1, i*1, model_settings,
                                         BACKGROUND_FREQUENCY, 
                                         BACKGROUND_VOLUME_RANGE,
                                         TIME_SHIFT_MS,
                                         'testing',
                                         sess)
      ''' Modified by iwatake2222 ---------------------------------------------- '''
      # flattened_data = np.array(data.flatten(), dtype=np.float32).reshape(1, 1960)
      # fingerprint_size = number of frames x FEATURE_BIN_COUNT for the configured clip length
      flattened_data = np.array(data.flatten(), dtype=np.float32).reshape(1, model_settings['fingerprint_size'])
      yield [flattened_data]
  converter.representative_dataset = representative_dataset_gen
  tflite_model = converter.convert()
  tflite_model_size = open(MODEL_TFLITE, "wb").write(tflite_model)
  print("Quantized model is %d bytes" % tflite_model_size)
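
The input size used in the `reshape` call above follows from the clip length, window size, window stride and feature bin count. The sketch below recomputes it with plain arithmetic. It mirrors, as far as can be told, what `models.prepare_model_settings` does, but `fingerprint_size` here is a hypothetical stand-alone function written for illustration.

```python
def fingerprint_size(sample_rate, clip_ms, window_ms, stride_ms, feature_bins):
    """Number of feature values per clip (frames x bins)."""
    clip_samples = int(sample_rate * clip_ms / 1000)
    window_samples = int(sample_rate * window_ms / 1000)
    stride_samples = int(sample_rate * stride_ms / 1000)
    remaining = clip_samples - window_samples
    # One frame for the first window, then one per full stride that still fits
    frames = 0 if remaining < 0 else 1 + remaining // stride_samples
    return frames * feature_bins

print(fingerprint_size(16000, 1000, 30, 20, 40))   # 1 s clip of the upstream example
print(fingerprint_size(16000, 10000, 30, 20, 40))  # 10 s clip used in this notebook
```

With a 10 s clip the input grows from 1960 to 19960 values, which is why the hard-coded 1960 of the upstream notebook no longer fits.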

## Testing the TensorFlow Lite model's accuracy

Verify that the model we've exported is still accurate, using the TF Lite Python API and our test set.

In [None]:
# Helper function to run inference
def run_tflite_inference(tflite_model_path, model_type="Float"):
  # Load test data
  np.random.seed(0) # set random seed for reproducible test results.
  with tf.Session() as sess:
    test_data, test_labels = audio_processor.get_data(
        -1, 0, model_settings, BACKGROUND_FREQUENCY, BACKGROUND_VOLUME_RANGE,
        TIME_SHIFT_MS, 'testing', sess)
  test_data = np.expand_dims(test_data, axis=1).astype(np.float32)
 
  # Initialize the interpreter
  interpreter = tf.lite.Interpreter(tflite_model_path)
  interpreter.allocate_tensors()
 
  input_details = interpreter.get_input_details()[0]
  output_details = interpreter.get_output_details()[0]
 
  # For quantized models, manually quantize the input data from float to integer
  if model_type == "Quantized":
    input_scale, input_zero_point = input_details["quantization"]
    test_data = test_data / input_scale + input_zero_point
    test_data = test_data.astype(input_details["dtype"])
 
  correct_predictions = 0
  for i in range(len(test_data)):
    interpreter.set_tensor(input_details["index"], test_data[i])

    ''' Modified by iwatake2222 ---------------------------------------------- '''
    # To avoid "interpreter.invoke() There is at least 1 reference to internal data" error
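    try:
        interpreter.invoke()
    except Exception:
        # Recreate the interpreter; the input must be set again before retrying
        interpreter = tf.lite.Interpreter(tflite_model_path)
        interpreter.allocate_tensors()
        interpreter.set_tensor(input_details["index"], test_data[i])
        interpreter.invoke()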

    output = interpreter.get_tensor(output_details["index"])[0]
    top_prediction = output.argmax()
    correct_predictions += (top_prediction == test_labels[i])
 
  print('%s model accuracy is %f%% (Number of test samples=%d)' % (
      model_type, (correct_predictions * 100) / len(test_data), len(test_data)))
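
The manual input quantization in the helper above is an affine mapping from float values to the int8 grid. The sketch below shows the mapping and its inverse on scalar values. The `scale` and `zero_point` values are hypothetical (the real ones come from `input_details["quantization"]`), and unlike the helper, this version rounds to the nearest integer and clamps to the int8 range.

```python
def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the int8 grid: q = round(value / scale) + zero_point."""
    q = int(round(value / scale)) + zero_point
    # Clamp so that out-of-range inputs saturate instead of wrapping around
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Approximate float value represented by the quantized integer."""
    return (q - zero_point) * scale

# Hypothetical parameters, for illustration only
scale, zero_point = 0.5, -128
for x in (0.0, 1.0, 26.0, 1000.0):
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```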

In [None]:
# Compute float model accuracy
run_tflite_inference(FLOAT_MODEL_TFLITE)
 
# Compute quantized model accuracy
run_tflite_inference(MODEL_TFLITE, model_type='Quantized')

Float model accuracy is 96.961726% (Number of test samples=7603)
Quantized model accuracy is 96.988031% (Number of test samples=7603)


## Generate a TensorFlow Lite for Microcontrollers Model
Convert the TensorFlow Lite model into a C source file that can be loaded by TensorFlow Lite for Microcontrollers.

In [None]:
# Install xxd if it is not available
!apt-get update && apt-get -qq install xxd
# Convert to a C source file
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
# Update variable names
REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}
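
For readers without `xxd`, the shape of the generated C source can be reproduced in a few lines of Python. This is a rough sketch of the output layout after the `sed` rename to `g_model`, not a replacement for the commands above. `to_c_array` is a hypothetical helper and the sample bytes are made up.

```python
def to_c_array(data, name="g_model", per_line=12):
    """Format bytes as a C array definition, similar to `xxd -i` output."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    # The array is followed by a length variable named <name>_len
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")

print(to_c_array(b"TFL3\x00\x01\x02"))
```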

## Deploy to a Microcontroller

Follow the instructions in the [micro_speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech) README.md for [TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers/overview) to deploy this model on a specific microcontroller.

**Reference Model:** If you have not modified this notebook, you can follow the instructions as is, to deploy the model. Refer to the [`micro_speech/train/models`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/models) directory to access the models generated in this notebook.

**New Model:** If you have generated a new model to identify different words: (i) Update `kCategoryCount` and `kCategoryLabels` in [`micro_speech/micro_features/micro_model_settings.h`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/micro_features/micro_model_settings.h) and (ii) Update the values assigned to the variables defined in [`micro_speech/micro_features/model.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/micro_features/model.cc) with values displayed after running the following cell.

In [None]:
# Print the C source file
# !cat {MODEL_TFLITE_MICRO}

''' Modified by iwatake2222 ---------------------------------------------- '''
!tar czvf models.tgz models