# Weight Quantization Explorer

This notebooks allows you to explore the effect of weight changes to the simple speech commands example model used in TensorFlow Lite Micro.

It allows you to load a model file with float weights, modify the values, and then observe the effect on the model's accuracy.

## Software installation

To be able to modify TensorFlow Lite files we need flatbuffers and TF libraries installed.

In [9]:
%tensorflow_version 2.x
import tensorflow as tf

# Build and install the Flatbuffer compiler.
%cd /content/
!rm -rf flatbuffers*
!curl -L "https://github.com/google/flatbuffers/archive/v1.12.0.zip" -o flatbuffers.zip
!unzip -q flatbuffers.zip
!mv flatbuffers-1.12.0 flatbuffers
%cd flatbuffers
#!git checkout 37a5dee10525cc58908aff99b0aa073bf91b9ba6
!cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release
!make
!cp flatc /usr/local/bin/
%cd /content/
!rm -rf tensorflow
!git clone --depth 1 https://github.com/tensorflow/tensorflow
!flatc --python --gen-object-api tensorflow/tensorflow/lite/schema/schema_v3.fbs

/content
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   124    0   124    0     0    324      0 --:--:-- --:--:-- --:--:--   323
100 1463k    0 1463k    0     0  1417k      0 --:--:--  0:00:01 --:--:-- 1417k
/content/flatbuffers
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for strtof_l
-- Looking for strtof_l - found
-- L

## Utility Functions

To load, save, and evaluate models we need some helper functions.

In [10]:
!flatc --python --gen-object-api tensorflow/tensorflow/lite/schema/schema_v3.fbs

In [11]:
!pip3 install flatbuffers
import flatbuffers
import numpy as np
import sys
# This hackery allows us to import the Python files we've just generated.
sys.path.append("/content/tflite/")
import Model

def load_model_from_file(model_filename):
  with open(model_filename, "rb") as file:
    buffer_data = file.read()
  model_obj = Model.Model.GetRootAsModel(buffer_data, 0)
  model = Model.ModelT.InitFromObj(model_obj)
  return model

def save_model_to_file(model, model_filename):
  builder = flatbuffers.Builder(1024)
  model_offset = model.Pack(builder)
  builder.Finish(model_offset, file_identifier=b'TFL3')
  model_data = builder.Output()
  with open(model_filename, 'wb') as out_file:
    out_file.write(model_data)



## Download a trained model

This pulls down a trained model from the speech commands tutorial. It includes two files, the first is the original float model and the second has weights stored in eight bits using post-training quantization.

In [12]:
!curl -O 'https://storage.googleapis.com/download.tensorflow.org/models/tflite/micro/speech_commands_model_2020_04_27.zip'
!unzip speech_commands_model_2020_04_27.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 78855  100 78855    0     0   394k      0 --:--:-- --:--:-- --:--:--  394k
Archive:  speech_commands_model_2020_04_27.zip
replace speech_commands_model/speech_commands_model_float.tflite? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

## Try your own modifications

The code below loads the float model, applies a small change to the float weights, and evaluates the accuracy of the modified model.

This version rounds the weights in a way that simulates heavy quantization, but try some different functions to see how the affect the accuracy.

In [14]:
model = load_model_from_file('/content/speech_commands_model/speech_commands_model_float.tflite')

for buffer in model.buffers:
  if buffer.data is not None and len(buffer.data) > 1024:
    original_weights = np.frombuffer(buffer.data, dtype=np.float32)
    # print(original_weights)

    # This is the line where the weights are altered.
    # Try replacing it with your own version, for example:
    # munged_weights = np.add(original_weights, 0.002)
    munged_weights = np.round(original_weights * (1/0.02)) * 0.02
    
    # print(munged_weights)
    buffer.data = munged_weights

save_model_to_file(model, '/content/speech_commands_model/speech_commands_model_modified.tflite')

Apply B&Q and generate tflite file

In [17]:
!curl -O 'https://github.com/foss-xtensa/Bin_And_Quant/tree/main/bnq_huff_encodec/bnq_huff_encodec.bin'
model = load_model_from_file('/content/speech_commands_model/speech_commands_model_int8.tflite')
for buffer in model.buffers:
  if buffer.data is not None and len(buffer.data) > 1024:
    original_weights = np.frombuffer(buffer.data, dtype=np.int8)
    print(original_weights)
    with open('/content/weights_to_file.tflite', 'wb') as out_file:
      out_file.write(original_weights)
    print(len(buffer.data))
    with open('/content/bnq_huff_encodec.bin', "rb") as file:
      buffer_data = file.read()
    print(len(buffer_data))
    buffer.data = buffer_data
save_model_to_file(model, '/content/speech_commands_model_bnq_huff_encode.tflite')
#test_model_accuracy('/content/speech_commands_model/speech_commands_model_modified_msb_uint4.tflite')

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  175k    0  175k    0     0   694k      0 --:--:-- --:--:-- --:--:--  694k
[  5  33  41 ... -13  -7 -25]
16000
179830
