https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb

This is a temporary notebook that generates a formatted string of flags for `speech_commands/train.py`.

In [1]:
import sys
import glob

In [2]:
sys.path.append("../tensorflow/tensorflow/examples/speech_commands")

In [3]:
import input_data

In [3]:
# Inspect the train/val/test splits for the Common Voice filenames
ups = glob.glob("eleven_word_dataset/extractions_deepspeech/up/*.wav")
downs = glob.glob("eleven_word_dataset/extractions_deepspeech/down/*.wav")

In [7]:
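def counts(dataset):
    """Print the train/val/test fractions and return the raw counts."""
    total = len(dataset)
    training = 0
    val = 0
    testing = 0
    for d in dataset:
        # 10% validation, 10% testing; everything else is training
        result = input_data.which_set(d, 10, 10)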
        if result == 'validation':
            val += 1
        elif result == 'testing':
            testing += 1
        else:
            training += 1
    print(training/total, val/total, testing/total)
    return (total, training, val, testing)
print(counts(ups))
print(counts(downs))

0.7906637490882568 0.1050328227571116 0.10430342815463166
(1371, 1084, 144, 143)
0.7899934167215273 0.10335747202106649 0.10664911125740618
(1519, 1200, 157, 162)
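
The fractions above land close to 80/10/10, which is what a hash-based split would produce. The sketch below is a standalone approximation of how `input_data.which_set` is understood to work in the upstream TensorFlow example; the function name `which_set_sketch`, the constant value, and the sample filenames are assumptions for illustration, not copied from this project.

```python
import hashlib
import os
import re

# Assumed to mirror MAX_NUM_WAVS_PER_CLASS in input_data.py (about 134M).
MAX_NUM_WAVS_PER_CLASS = 2**27 - 1


def which_set_sketch(filename, validation_percentage, testing_percentage):
    """Assign a file to a split based on a stable hash of its name."""
    base_name = os.path.basename(filename)
    # Drop everything from '_nohash_' onward, so that repeated takes from
    # one speaker hash identically and stay in the same split.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    digest = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    # Map the hash onto a value in roughly [0, 100].
    percentage_hash = (int(digest, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"


# Hypothetical filenames, for illustration only.
a = which_set_sketch("up/speaker123_nohash_0.wav", 10, 10)
b = which_set_sketch("up/speaker123_nohash_1.wav", 10, 10)
print(a, a == b)
```

Because the assignment depends only on the filename, a file keeps its split even when more data is added later, at the cost of the split sizes being only approximately 10%.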


In [1]:
# A comma-delimited list of the words you want to train for.
# The options are: yes,no,up,down,left,right,on,off,stop,go
# All the other words will be used to train an "unknown" label and silent
# audio data with no spoken words will be used to train a "silence" label.
WANTED_WORDS = "up,down,left,right,stop,go,off,on,yes,no"

# The number of steps and learning rates can be specified as comma-separated
# lists to define the rate at each stage. For example,
# TRAINING_STEPS=12000,3000 and LEARNING_RATE=0.001,0.0001
# will run 15,000 training loops in total, with a rate of 0.001 for the first
# 12,000, and 0.0001 for the final 3,000.
TRAINING_STEPS = "12000,3000"
LEARNING_RATE = "0.001,0.0001"

# Calculate the total number of steps, which is used to identify the checkpoint
# file name.
TOTAL_STEPS = str(sum(map(int, TRAINING_STEPS.split(","))))

# Print the configuration to confirm it
print("Training these words: %s" % WANTED_WORDS)
print("Training steps in each stage: %s" % TRAINING_STEPS)
print("Learning rate in each stage: %s" % LEARNING_RATE)
print("Total number of training steps: %s" % TOTAL_STEPS)

Training these words: up,down,left,right,stop,go,off,on,yes,no
Training steps in each stage: 12000,3000
Learning rate in each stage: 0.001,0.0001
Total number of training steps: 15000
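
To make the staged schedule concrete, here is a minimal sketch of how the two comma-separated strings could be paired up and queried per step. The helper names `parse_schedule` and `learning_rate_at` are hypothetical, and the 1-based step convention is an assumption; `train.py` may handle this differently.

```python
def parse_schedule(training_steps, learning_rate):
    """Pair each stage's step count with its learning rate."""
    steps = [int(s) for s in training_steps.split(",")]
    rates = [float(r) for r in learning_rate.split(",")]
    if len(steps) != len(rates):
        raise ValueError("TRAINING_STEPS and LEARNING_RATE need the same length")
    return list(zip(steps, rates))


def learning_rate_at(step, schedule):
    """Return the learning rate for a 1-based training step."""
    boundary = 0
    for stage_steps, rate in schedule:
        boundary += stage_steps
        if step <= boundary:
            return rate
    raise ValueError("step is past the end of the schedule")


schedule = parse_schedule("12000,3000", "0.001,0.0001")
print(schedule)
print(learning_rate_at(12000, schedule), learning_rate_at(12001, schedule))
```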


In [4]:
# Calculate the percentage of 'silence' and 'unknown' training samples required
# to ensure that we have an equal number of samples for each label.
number_of_labels = WANTED_WORDS.count(',') + 1
number_of_total_labels = number_of_labels + 2 # for the 'silence' and 'unknown' labels
equal_percentage_of_training_samples = int(100.0/(number_of_total_labels))
SILENT_PERCENTAGE = equal_percentage_of_training_samples
UNKNOWN_PERCENTAGE = equal_percentage_of_training_samples

# Constants which are shared during training and inference
PREPROCESS = 'micro'
WINDOW_STRIDE = 20
MODEL_ARCHITECTURE = 'tiny_conv' # Other options include: single_fc, conv,
                      # low_latency_conv, low_latency_svdf, tiny_embedding_conv

# Constants used during training only
VERBOSITY = 'WARN'
EVAL_STEP_INTERVAL = '1000'
SAVE_STEP_INTERVAL = '1000'

# Constants for training directories and filepaths
#DATASET_DIR =  'dataset/'
DATASET_DIR =  'eleven_word_dataset/extractions_gcloud/'
LOGS_DIR = 'logs/'
TRAIN_DIR = 'train/' # for training checkpoints and other files.

# Constants for inference directories and filepaths
import os
MODELS_DIR = 'models'
#if not os.path.exists(MODELS_DIR):
#  os.mkdir(MODELS_DIR)
MODEL_TF = os.path.join(MODELS_DIR, 'model.pb')
MODEL_TFLITE = os.path.join(MODELS_DIR, 'model.tflite')
FLOAT_MODEL_TFLITE = os.path.join(MODELS_DIR, 'float_model.tflite')
MODEL_TFLITE_MICRO = os.path.join(MODELS_DIR, 'model.cc')
SAVED_MODEL = os.path.join(MODELS_DIR, 'saved_model')

QUANT_INPUT_MIN = 0.0
QUANT_INPUT_MAX = 26.0
QUANT_INPUT_RANGE = QUANT_INPUT_MAX - QUANT_INPUT_MIN
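
As a quick check of the balancing arithmetic in the cell above, the same calculation can be wrapped in a small function and run for different word lists. The function name `balanced_percentage` is hypothetical; the formula is the one used above.

```python
def balanced_percentage(wanted_words):
    """Percentage to give 'silence' and 'unknown' so all labels are balanced."""
    number_of_labels = wanted_words.count(",") + 1
    # Two extra labels: 'silence' and 'unknown'.
    number_of_total_labels = number_of_labels + 2
    # int() truncates, so 100 / 12 = 8.33... becomes 8.
    return int(100.0 / number_of_total_labels)


for words in ("up,down,left,right,stop,go,off,on,yes,no", "up,down"):
    print(words, balanced_percentage(words))
```

Ten wanted words give 12 labels and a percentage of 8; two wanted words give 4 labels and a percentage of 25, which matches the values passed with `--wanted_words=up,down` in the evaluation command later in this notebook.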

## Original:
```
#!python ../tensorflow/tensorflow/examples/speech_commands/train.py \
!echo \
--data_dir={DATASET_DIR} \
--wanted_words={WANTED_WORDS} \
--silence_percentage={SILENT_PERCENTAGE} \
--unknown_percentage={UNKNOWN_PERCENTAGE} \
--preprocess={PREPROCESS} \
--window_stride={WINDOW_STRIDE} \
--model_architecture={MODEL_ARCHITECTURE} \
--how_many_training_steps={TRAINING_STEPS} \
--learning_rate={LEARNING_RATE} \
--train_dir={TRAIN_DIR} \
--summaries_dir={LOGS_DIR} \
--verbosity={VERBOSITY} \
--eval_step_interval={EVAL_STEP_INTERVAL} \
--save_step_interval={SAVE_STEP_INTERVAL}
```

In [6]:
#!python ../tensorflow/tensorflow/examples/speech_commands/train.py \
!echo \
--data_dir={DATASET_DIR} \
--wanted_words={WANTED_WORDS} \
--background_volume=0.0 \
--silence_percentage={SILENT_PERCENTAGE} \
--unknown_percentage={UNKNOWN_PERCENTAGE} \
--preprocess={PREPROCESS} \
--window_stride={WINDOW_STRIDE} \
--model_architecture={MODEL_ARCHITECTURE} \
--how_many_training_steps={TRAINING_STEPS} \
--learning_rate={LEARNING_RATE} \
--train_dir={TRAIN_DIR} \
--summaries_dir={LOGS_DIR} \
--verbosity={VERBOSITY} \
--eval_step_interval={EVAL_STEP_INTERVAL} \
--save_step_interval={SAVE_STEP_INTERVAL}

--data_dir=eleven_word_dataset/extractions_gcloud/ --wanted_words=up,down,left,right,stop,go,off,on,yes,no --background_volume=0.0 --silence_percentage=8 --unknown_percentage=8 --preprocess=micro --window_stride=20 --model_architecture=tiny_conv --how_many_training_steps=12000,3000 --learning_rate=0.001,0.0001 --train_dir=train/ --summaries_dir=logs/ --verbosity=WARN --eval_step_interval=1000 --save_step_interval=1000
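
The same flag string can also be built in plain Python, without relying on `!echo` and IPython's `{VARIABLE}` expansion. This is a minimal sketch: the `flags` dictionary and the `format_flags` helper are hypothetical names, and the values are copied from the configuration cells above.

```python
# Flag names and values as configured earlier in this notebook.
# Dictionaries preserve insertion order, so the flags are emitted in this order.
flags = {
    "data_dir": "eleven_word_dataset/extractions_gcloud/",
    "wanted_words": "up,down,left,right,stop,go,off,on,yes,no",
    "background_volume": "0.0",
    "silence_percentage": 8,
    "unknown_percentage": 8,
    "preprocess": "micro",
    "window_stride": 20,
    "model_architecture": "tiny_conv",
    "how_many_training_steps": "12000,3000",
    "learning_rate": "0.001,0.0001",
    "train_dir": "train/",
    "summaries_dir": "logs/",
    "verbosity": "WARN",
    "eval_step_interval": "1000",
    "save_step_interval": "1000",
}


def format_flags(flags):
    """Render a mapping as space-separated --name=value pairs."""
    return " ".join("--{}={}".format(name, value) for name, value in flags.items())


command_flags = format_flags(flags)
print(command_flags)
```

Keeping the flags in a dictionary makes it easy to drop or override a single entry (for example `background_volume`) when producing variants of the command.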


`cp -r ../speech_commands/_background_noise_ ../tinyspeech/eleven_word_dataset/extractions_deepspeech/`

From the new directory (`eleven_no_bkgd`):
```bash
eleven_no_bkgd $ python ../tensorflow/tensorflow/examples/speech_commands/train.py --data_dir=../tinyspeech/eleven_word_dataset/extractions_gcloud/ --wanted_words=up,down,left,right,stop,go,off,on,yes,no --background_volume=0.0 --silence_percentage=8 --unknown_percentage=8 --preprocess=micro --window_stride=20 --model_architecture=tiny_conv --how_many_training_steps=12000,3000 --learning_rate=0.001,0.0001 --train_dir=train/ --summaries_dir=logs/ --verbosity=WARN --eval_step_interval=1000 --save_step_interval=1000
```

```bash
python ../tensorflow/tensorflow/examples/speech_commands/train.py --data_dir=../tinyspeech/eleven_word_dataset/extractions_deepspeech/ --wanted_words=up,down,left,right,stop,go,off,on,yes,no --silence_percentage=8 --unknown_percentage=8 --preprocess=micro --window_stride=20 --model_architecture=tiny_conv --how_many_training_steps=12000,3000 --learning_rate=0.001,0.0001 --train_dir=train/ --summaries_dir=logs/ --verbosity=WARN --eval_step_interval=1000 --save_step_interval=1000
```

For cross-validation (`eval.py` ignores most of these flags):

```bash
python ../tensorflow/tensorflow/examples/speech_commands/eval.py --start_checkpoint=train/tiny_conv.ckpt-15000 --data_dir=../speech_commands/ --wanted_words=up,down,left,right,stop,go,off,on,yes,no --silence_percentage=8 --unknown_percentage=8 --preprocess=micro --window_stride=20 --model_architecture=tiny_conv --how_many_training_steps=12000,3000 --learning_rate=0.001,0.0001 --train_dir=train/ --summaries_dir=logs/ --verbosity=WARN --eval_step_interval=1000 --save_step_interval=1000

###

python ../tensorflow/tensorflow/examples/speech_commands/eval.py --start_checkpoint=train/tiny_conv.ckpt-15000 --data_dir=../speech_commands/ --wanted_words=up,down --silence_percentage=25 --unknown_percentage=25 --preprocess=micro --window_stride=20 --model_architecture=tiny_conv --how_many_training_steps=12000,3000 --learning_rate=0.001,0.0001 --train_dir=train/ --summaries_dir=logs/ --verbosity=WARN --eval_step_interval=1000 --save_step_interval=1000
```
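
The `--start_checkpoint` value used above combines the training directory, the model architecture, and the total step count, which is why the total is computed earlier in this notebook. A minimal sketch of that derivation, assuming the checkpoint naming seen in these commands; the helper name `checkpoint_path` is hypothetical.

```python
import posixpath


def checkpoint_path(train_dir, model_architecture, training_steps):
    """Build the final checkpoint path from the training configuration."""
    # The checkpoint suffix is the sum of the steps across all stages.
    total_steps = sum(int(s) for s in training_steps.split(","))
    file_name = "{}.ckpt-{}".format(model_architecture, total_steps)
    return posixpath.join(train_dir, file_name)


print(checkpoint_path("train/", "tiny_conv", "12000,3000"))
```

If `TRAINING_STEPS` changes, the checkpoint name passed to `--start_checkpoint` has to change with it, so deriving it this way avoids a stale hard-coded `15000`.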