# Automated B&Q on a trained speech model

This notebook lets you apply the B&Q compression technique to the pre-trained "yes,no" speech command recognition model.

The B&Q automation process consists of the following compression and decompression steps:

Compression:
1. Extract the original fully connected (FC) weights from the uint8 tflite file to original_weights.bin.
2. Compress the extracted FC weights and generate the B&Q binary file. The automated search looks for bin values that do not cost accuracy.
3. Create a compressed .tflite file using the B&Q binary file.

Decompression:
4. Extract the B&Q binary file from the compressed tflite file.
5. Extract the original weights from the B&Q binary file.
6. Create the original.tflite file using the original weights binary.
7. Test the accuracy of the decompressed model.
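
As a rough illustration of steps 2 and 5, the sketch below quantizes a toy uint8 array the same way the cells further down do (every weight inside a bin is replaced by that bin's midpoint) and stores the result as per-weight bin indices plus a small table of midpoints. The index-plus-table layout, the toy data, and the names `bq_compress` and `bq_decompress` are assumptions made for illustration; the notebook does not show the actual layout of the B&Q binary file. The decompressed values equal the quantized weights, not the weights before quantization.

```python
import numpy as np

def bq_compress(weights, edges):
    """Return (bin indices, midpoint table) for uint8 weights and sorted bin edges."""
    edges = np.asarray(edges, dtype=np.float64)
    # Only interior edges are needed. right=True sends a weight that equals an
    # edge to the lower bin, which is what the notebook's in-place loop does.
    indices = np.digitize(weights, edges[1:-1], right=True).astype(np.uint8)
    # Truncate midpoints to uint8, as np.uint8(...) does in the notebook.
    midpoints = ((edges[:-1] + edges[1:]) / 2).astype(np.uint8)
    return indices, midpoints

def bq_decompress(indices, midpoints):
    """Look up the midpoint for every stored bin index."""
    return midpoints[indices]

# Hypothetical toy weights and bin edges.
weights = np.array([1, 50, 100, 128, 200, 255], dtype=np.uint8)
edges = [1, 100, 128, 160, 255]

indices, midpoints = bq_compress(weights, edges)
restored = bq_decompress(indices, midpoints)
print(indices, midpoints, restored)
```

With four bins, each index needs only two bits, which is where the size reduction would come from under this assumed layout.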

The notebook loads a model file with uint8 weights, measures its accuracy, modifies the weights using B&Q, and then measures the effect on the model's accuracy.

## Software installation

To modify TensorFlow Lite files, we need the FlatBuffers and TensorFlow libraries installed.

In [1]:
%tensorflow_version 1.x
import tensorflow as tf

# Build and install the Flatbuffer compiler.
%cd /content/
!rm -rf flatbuffers
!git clone https://github.com/google/flatbuffers
%cd flatbuffers
!git checkout 37a5dee10525cc58908aff99b0aa073bf91b9ba6
!cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release
!make
!cp flatc /usr/local/bin/
%cd /content/
!rm -rf tensorflow
!git clone --depth 1 https://github.com/tensorflow/tensorflow
!flatc --python --gen-object-api tensorflow/tensorflow/lite/schema/schema_v3.fbs

TensorFlow 1.x selected.
/content
Cloning into 'flatbuffers'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 20180 (delta 23), reused 21 (delta 8), pack-reused 20105
Receiving objects: 100% (20180/20180), 12.55 MiB | 19.18 MiB/s, done.
Resolving deltas: 100% (14048/14048), done.
/content/flatbuffers
Note: checking out '37a5dee10525cc58908aff99b0aa073bf91b9ba6'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 37a5dee1 Code cleanup + updates test and readme (#6004)
-- The C compiler identification is GNU 7.5.0


## Speech Evaluation Setup

To evaluate the accuracy of the speech model we need to load a test data set and some utility classes to read the files and convert them into the right input form for the network.

The full dataset is several gigabytes in size, so it may take a few minutes to download.

In [2]:
%cd /content/

/content


In [3]:
!pip install flatbuffers
import flatbuffers
import matplotlib.pyplot as plt
import numpy as np
import pprint
import re
import sys

# This hackery allows us to import the Python files we've just generated.
sys.path.append("/content/tflite/")
import Model

sys.path.append("/content/tensorflow/tensorflow/examples/speech_commands/")
import input_data
import models

# A comma-delimited list of the words you want to train for.
# The options are: yes,no,up,down,left,right,on,off,stop,go
# All the other words will be used to train an "unknown" label and silent
# audio data with no spoken words will be used to train a "silence" label.
WANTED_WORDS = "yes,no"

# The number of steps and learning rates can be specified as comma-separated
# lists to define the rate at each stage. For example,
# TRAINING_STEPS=12000,3000 and LEARNING_RATE=0.001,0.0001
# will run 15,000 training loops in total, with a rate of 0.001 for the first
# 12,000, and 0.0001 for the final 3,000.
TRAINING_STEPS = "12000,3000"
LEARNING_RATE = "0.001,0.0001"

# Calculate the total number of steps, which is used to identify the checkpoint
# file name.
TOTAL_STEPS = str(sum(map(lambda string: int(string), TRAINING_STEPS.split(","))))

# Calculate the percentage of 'silence' and 'unknown' training samples required
# to ensure that we have equal number of samples for each label.
number_of_labels = WANTED_WORDS.count(',') + 1
number_of_total_labels = number_of_labels + 2 # for 'silence' and 'unknown' label
equal_percentage_of_training_samples = int(100.0/(number_of_total_labels))
SILENT_PERCENTAGE = equal_percentage_of_training_samples
UNKNOWN_PERCENTAGE = equal_percentage_of_training_samples

# Constants which are shared during training and inference
PREPROCESS = 'micro'
WINDOW_STRIDE = 20
MODEL_ARCHITECTURE = 'tiny_conv' # Other options include: single_fc, conv,
                      # low_latency_conv, low_latency_svdf, tiny_embedding_conv

# Constants used during training only
VERBOSITY = 'WARN'
EVAL_STEP_INTERVAL = '1000'
SAVE_STEP_INTERVAL = '1000'

# Constants for training directories and filepaths
DATASET_DIR =  'dataset/'
LOGS_DIR = 'logs/'
TRAIN_DIR = 'train/' # for training checkpoints and other files.

SAMPLE_RATE = 16000
CLIP_DURATION_MS = 1000
WINDOW_SIZE_MS = 30.0
FEATURE_BIN_COUNT = 40
BACKGROUND_FREQUENCY = 0.8
BACKGROUND_VOLUME_RANGE = 0.1
TIME_SHIFT_MS = 100.0

DATA_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz'
VALIDATION_PERCENTAGE = 10
TESTING_PERCENTAGE = 10

model_settings = models.prepare_model_settings(
    len(input_data.prepare_words_list(WANTED_WORDS.split(','))),
    SAMPLE_RATE, CLIP_DURATION_MS, WINDOW_SIZE_MS,
    WINDOW_STRIDE, FEATURE_BIN_COUNT, PREPROCESS)
audio_processor = input_data.AudioProcessor(
    DATA_URL, DATASET_DIR,
    SILENT_PERCENTAGE, UNKNOWN_PERCENTAGE,
    WANTED_WORDS.split(','), VALIDATION_PERCENTAGE,
    TESTING_PERCENTAGE, model_settings, LOGS_DIR)

>> Downloading speech_commands_v0.02.tar.gz 99.9%

## Utility Functions

To load, save, and evaluate models we need some helper functions.

In [30]:
def load_model_from_file(model_filename):
  with open(model_filename, "rb") as file:
    buffer_data = file.read()
  model_obj = Model.Model.GetRootAsModel(buffer_data, 0)
  model = Model.ModelT.InitFromObj(model_obj)
  return model

def save_model_to_file(model, model_filename):
  builder = flatbuffers.Builder(1024)
  model_offset = model.Pack(builder)
  builder.Finish(model_offset, file_identifier=b'TFL3')
  model_data = builder.Output()
  with open(model_filename, 'wb') as out_file:
    out_file.write(model_data)

def test_model_accuracy(model_filename):
  with tf.Session() as sess:
    test_data, test_labels = audio_processor.get_data(
        -1, 0, model_settings, 0, 0,
        0, 'testing', sess)

  interpreter = tf.lite.Interpreter(model_filename)
  interpreter.allocate_tensors()

  input_index = interpreter.get_input_details()[0]["index"]

  output_index = interpreter.get_output_details()[0]["index"]
  model_output = interpreter.tensor(output_index)

  correct_predictions = 0
  for i in range(len(test_data)):
    current_input = test_data[i]
    current_label = test_labels[i]
    flattened_input = np.array(current_input.flatten(), dtype=np.uint8).reshape(1, 49,40,1)
    interpreter.set_tensor(input_index, flattened_input)
    interpreter.invoke()
    top_prediction = model_output()[0].argmax()
    if top_prediction == current_label:
      correct_predictions += 1

  #print('Accuracy is %f%% (N=%d)' % ((correct_predictions * 100) / len(test_data), len(test_data)))
  return (correct_predictions * 100) / len(test_data)

## Download a trained model

This step pulls down a trained speech commands model. It consists of two files: the first is the original float model in pb format, and the second stores its weights in eight bits using post-training quantization.

## Test the uint8 model
Run the code below to measure the uint8 inference accuracy before B&Q compression.

In [31]:
inital_accuracy = test_model_accuracy('/content/model.tflite')
print("Accuracy is:", inital_accuracy)

Accuracy is: 91.42394822006473


## B&Q applied to the saved trained model
The script below runs B&Q with a growing number of bins and tests inference accuracy after each change.
To start the automation process, run the block below.

## B&Q automation using sensitivity analysis

This automation code starts B&Q with an initial 4 bins. The goal is to find a model with the least accuracy perturbation.
The sensitivity code perturbs the weights in each bin by 20% of their original value and identifies the most sensitive bin, that is, the one whose perturbation lowers accuracy the most. That bin is then split in the middle, and the bin values are updated from the new `RangeValues`. The loop repeats until the quantized model's accuracy is within 0.2 percentage points of the uint8 baseline.
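
The split step can be looked at in isolation. The following is a minimal sketch, not the notebook's own cell: the helper name `split_most_sensitive_bin` is hypothetical, and the edges and per-bin accuracies are copied from the first iteration of the run logged further down.

```python
import numpy as np

def split_most_sensitive_bin(edges, bin_accuracies):
    """Insert a new edge in the middle of the bin whose perturbation hurt accuracy most."""
    worst = int(np.argmin(bin_accuracies))
    middle = (edges[worst] + edges[worst + 1]) / 2
    # Build a new list so the caller's edges are left untouched.
    return edges[:worst + 1] + [middle] + edges[worst + 1:], worst

# Values taken from the first iteration of the logged run.
edges = [1, 97.78707476597417, 128.78125, 159.77542523402582, 255]
bin_acc = [91.10032362459548, 90.85760517799353, 91.2621359223301, 89.07766990291262]

new_edges, worst = split_most_sensitive_bin(edges, bin_acc)
print(worst, new_edges)
```

Perturbing the last bin gives the lowest accuracy here, so that bin is the one that gets split, which adds one edge and therefore one bin per iteration.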

In [37]:
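# Load the model.
model = load_model_from_file('/content/model.tflite')
max_bins = '16'
# Start below the stopping threshold so the loop below runs at least once.
curr_acc = inital_accuracy - 3

top_acc = curr_acc
top_bins = []  # bin edges of the most accurate model seen so far

# Load the initial parameters of the FC layer.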
for buffer in model.buffers:
  if buffer.data is not None and len(buffer.data) > 1024:
    original_weights = np.frombuffer(buffer.data, dtype=np.uint8)
    v2 = np.add(original_weights,0)
    v2_min = v2.min()
    v2_max = v2.max()

RangeValues = [v2_min, np.mean(v2) - np.std(v2), np.mean(v2), np.mean(v2) + np.std(v2), v2_max]
while (curr_acc <= inital_accuracy - 0.2):
  model = load_model_from_file('/content/model.tflite')
  for buffer in model.buffers:
    if buffer.data is not None and len(buffer.data) > 1024:
      original_weights = np.frombuffer(buffer.data, dtype=np.uint8)
      v2 = np.add(original_weights,0)      

      for x in range(len(RangeValues) - 1):
        indices = np.where(np.logical_and(v2>=RangeValues[x], v2<=RangeValues[x+1]))
        v2[indices] = np.uint8((RangeValues[x] + RangeValues[x+1])/2)

      buffer.data = v2

  save_model_to_file(model, '/content/speech_commands_model_modified.tflite')
  curr_acc = test_model_accuracy('/content/speech_commands_model_modified.tflite')
  print("Accuracy for range values:", curr_acc, RangeValues)

  if curr_acc >= top_acc:
    top_acc = curr_acc
    top_bins = RangeValues
    save_model_to_file(model, '/content/speech_commands_model_top_acc_model.tflite')

  #perturbation code to identify the sensitive bin
  bin_acc = []
  for times in range(len(RangeValues) - 1):
    model = load_model_from_file('/content/model.tflite')
    for buffer in model.buffers:
      if buffer.data is not None and len(buffer.data) > 1024:
        original_weights = np.frombuffer(buffer.data, dtype=np.uint8)
        v2 = np.add(original_weights,0)      

        indices = np.where(np.logical_and(v2>=RangeValues[times], v2<=RangeValues[times+1]))
        v2[indices] = np.uint8(v2[indices] + 0.2*v2[indices])

        buffer.data = v2

    save_model_to_file(model, '/content/speech_commands_model_modified.tflite')
    bin_acc.append(test_model_accuracy('/content/speech_commands_model_modified.tflite'))

  print("The most sensitive bin here is:", bin_acc, np.array(bin_acc).argmin())
  to_modify = np.array(bin_acc).argmin()
  middle_bin = (RangeValues[to_modify] + RangeValues[to_modify+1]) / 2
  RangeValues.insert(to_modify+1, middle_bin)
  print("New Range Values:", RangeValues, len(RangeValues))



Accuracy for range values: 90.45307443365695 [1, 97.78707476597417, 128.78125, 159.77542523402582, 255]
The most sensitive bin here is: [91.10032362459548, 90.85760517799353, 91.2621359223301, 89.07766990291262] 3
New Range Values: [1, 97.78707476597417, 128.78125, 159.77542523402582, 207.3877126170129, 255] 6
Accuracy for range values: 90.93851132686085 [1, 97.78707476597417, 128.78125, 159.77542523402582, 207.3877126170129, 255]
The most sensitive bin here is: [90.93851132686085, 90.85760517799353, 91.2621359223301, 90.77669902912622, 86.24595469255664] 4
New Range Values: [1, 97.78707476597417, 128.78125, 159.77542523402582, 207.3877126170129, 231.19385630850644, 255] 7
Accuracy for range values: 90.61488673139158 [1, 97.78707476597417, 128.78125, 159.77542523402582, 207.3877126170129, 231.19385630850644, 255]
The most sensitive bin here is: [91.18122977346279, 90.53398058252426, 91.18122977346279, 90.6957928802589, 90.53398058252426, 91.42394822006473] 1
New Range Values: [1, 97.78

In [42]:
print("Reported top_accuracy and inferred test accuracy :", top_acc, test_model_accuracy('/content/speech_commands_model_top_acc_model.tflite'))
print("The RangeValues are:", RangeValues)
print("total_bins", len(RangeValues) - 1)
print("The model is saved in /content/speech_commands_model_top_acc_model.tflite")


Reported top_accuracy and inferred test accuracy : 91.42394822006473 91.34304207119742
The RangeValues are: [1, 49.393537382987084, 97.78707476597417, 113.28416238298709, 128.78125, 136.52979380850644, 144.2783376170129, 159.77542523402582, 183.58156892551938, 207.3877126170129, 219.2907844627597, 231.19385630850644, 255]
total_bins 12
The model is saved in /content/speech_commands_model_top_acc_model.tflite
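
One way to sanity-check a quantized buffer is to count its distinct values, which should not exceed the number of bins. The sketch below does this on a hypothetical random uint8 array standing in for the FC weights, using the final bin edges printed above. The `bits_per_index` figure only applies if a compressed file stores bin indices rather than midpoints, which is an assumption and not something this notebook shows.

```python
import math
import numpy as np

# Final bin edges copied from the output above.
range_values = [1, 49.393537382987084, 97.78707476597417, 113.28416238298709,
                128.78125, 136.52979380850644, 144.2783376170129,
                159.77542523402582, 183.58156892551938, 207.3877126170129,
                219.2907844627597, 231.19385630850644, 255]
n_bins = len(range_values) - 1

# Hypothetical stand-in for an FC weight buffer (values 1..255).
rng = np.random.default_rng(0)
weights = rng.integers(1, 256, size=4000, dtype=np.uint8)

# Same midpoint replacement as the notebook's loop.
quantized = weights.copy()
for x in range(n_bins):
    in_bin = np.logical_and(quantized >= range_values[x],
                            quantized <= range_values[x + 1])
    quantized[in_bin] = np.uint8((range_values[x] + range_values[x + 1]) / 2)

distinct = np.unique(quantized)
# Bits needed to address one of n_bins bins (assumed index-based storage).
bits_per_index = math.ceil(math.log2(n_bins))
print(len(distinct), n_bins, bits_per_index)
```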


In [None]:
model = load_model_from_file('/content/model.tflite')
bins = '6'
for buffer in model.buffers:
  if buffer.data is not None and len(buffer.data) > 1024:
    print("buffer.data:", buffer)
    original_weights = np.frombuffer(buffer.data, dtype=np.uint8)
    v2 = np.add(original_weights,0)
    v2_min = v2.min()
    v2_max = v2.max()

    print(np.mean(v2), np.std(v2))
    
    # with open('/content/weights_to_file.bin', 'wb') as out_file:
    #   out_file.write(original_weights)
    # print(len(buffer.data))

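    # Bin edges for the 4-bin case; the 6-bin case refines these.
    base_edges = [v2_min, np.mean(v2) - np.std(v2), np.mean(v2), np.mean(v2) + np.std(v2), v2_max]
    if bins == '2':
      # Note: for uint8 weights, 0 is at or below v2_min, so almost every
      # weight falls into the second bin.
      RangeValues = [v2_min, 0, v2_max]
    elif bins == '4':
      RangeValues = base_edges
    elif bins == '6':
      # Split the first and last of the four base bins at their midpoints.
      RangeValues = [v2_min, (base_edges[0] + base_edges[1]) / 2, base_edges[1], base_edges[2], base_edges[3],
                     (base_edges[3] + base_edges[4]) / 2, v2_max]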


    for x in range(len(RangeValues) - 1):
      indices = np.where(np.logical_and(v2>=RangeValues[x], v2<=RangeValues[x+1]))
      v2[indices] = np.uint8((RangeValues[x] + RangeValues[x+1])/2)

    print(np.unique(v2))

    # print(munged_weights)
    buffer.data = v2
    print(buffer.data)
    # with open('/content/bin_data', "rb") as file:
    #   buffer_data = file.read()



save_model_to_file(model, '/content/speech_commands_model_modified.tflite')
test_model_accuracy('/content/speech_commands_model_modified.tflite')

In [None]:
model = load_model_from_file('/content/inception_v1_224_quant.tflite')

for buffer in model.buffers:
  if buffer.data is not None and len(buffer.data) > 32:
    original_weights = np.frombuffer(buffer.data, dtype=np.uint8)
    # print(original_weights)

    # This is the line where the weights are altered.
    # Try replacing it with your own version, for example:
    munged_weights = np.add(original_weights, 1)
    #munged_weights = np.round(original_weights * (1/0.02)) * 0.02
    
    # print(munged_weights)
    buffer.data = munged_weights.tobytes()

save_model_to_file(model, '/content/inception_modified.tflite')
#test_model_accuracy('/content/speech_commands_model/speech_commands_model_modified.tflite')