# Preface

> TensorFlow Lite is not a framework for training models, but a supplementary set of tools designed to meet
> all the constraints of mobile and embedded systems. *Laurence Moroney, AI and ML for Coders*.

TensorFlow Lite (TF-Lite) can broadly be seen as two main things: **a converter** that takes your TensorFlow
model and converts it to the ``.tflite`` format, shrinking and optimizing it, and a **suite of
interpreters** for various runtimes.


<center><img width="600" src="https://drive.google.com/uc?export=view&id=1XIDqEiOl4F33y5mkDQKr93oShFM2qAnS"></center>


Note that not every operation (or “op”) in TensorFlow is presently supported in
TensorFlow Lite or the TensorFlow Lite converter. You may encounter this issue
when [converting](https://www.tensorflow.org/lite/models/convert/convert_models) models, and it’s always a good idea to check the [documentation](https://www.tensorflow.org/lite/guide/ops_compatibility) for
details. 



# Walkthrough: Creating and Converting a Model to TensorFlow Lite

We’ll begin with a step-by-step walkthrough showing how to create a simple model with TensorFlow, convert it to the TensorFlow Lite format, and then use the TensorFlow Lite interpreter.

To keep things concrete, we will train a very simple TensorFlow model that learns the relationship between two sets of numbers, which turns out to be:

$$
y = 2x - 1
$$

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

In [None]:
layer_0 = Dense(units=1, input_shape=[1])
model = Sequential([layer_0])
model.compile(optimizer='sgd', loss='mean_squared_error')

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model.fit(xs, ys, epochs=500, verbose=0)

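# Predict y for x = 10.0; pass a NumPy array of shape (1, 1), since recent Keras versions reject plain lists
print(model.predict(np.array([[10.0]])))
# get_weights() returns the layer's weights and bias
print("Here is what I learned: {}".format(layer_0.get_weights()))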
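As a sanity check on what training should converge to, the same six points can be fitted in closed form. This is a minimal sketch using NumPy's `np.polyfit`, which is not part of the original walkthrough; the names `slope`, `intercept`, and `expected_at_10` are illustrative.

```python
import numpy as np

# Same training data as in the cell above.
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# Least-squares fit of a degree-1 polynomial: ys ~ slope * xs + intercept.
slope, intercept = np.polyfit(xs, ys, deg=1)

# Value of the fitted line at x = 10, for comparison with model.predict.
expected_at_10 = slope * 10.0 + intercept

print(f"slope={slope:.4f}, intercept={intercept:.4f}, y(10)={expected_at_10:.4f}")
```

The trained ``Dense`` layer should report a weight close to 2 and a bias close to -1, and ``model.predict`` should return a value close to 19 rather than exactly 19, because SGD only approximates this solution.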

## Save the model

The TensorFlow Lite converter works on a number of [different file formats](https://www.tensorflow.org/lite/models/convert), including
SavedModel (preferred) and the Keras H5 format. For this exercise we’ll use
SavedModel. Once you have the saved model, you can convert it using the TensorFlow Lite
converter.

In [None]:
# Save the model
export_dir = 'saved_model/1'
tf.saved_model.save(model, export_dir)

## Convert and Save the Model

The TensorFlow Lite converter is in the ``tf.lite`` package. To convert a
saved model, first create a converter with the ``from_saved_model`` method, passing it the
directory containing the saved model, and then call its ``convert`` method.

In [None]:
# Convert the model.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

You can then save out the new ``.tflite`` model using ``pathlib``.

In [None]:
import pathlib
tflite_model_file = pathlib.Path('model.tflite')
tflite_model_file.write_bytes(tflite_model)

At this point, you have a ``.tflite`` file that you can use in any of the interpreter environments. Let’s use the Python-based
interpreter so you can run it in Colab. This same interpreter can be used in embedded Linux environments like a Raspberry Pi!

## Load the TFLite Model and Allocate Tensors

The next step is to **load the model into the interpreter**, **allocate tensors** that will be
used for inputting data to the model for prediction, and then read the predictions that
the model outputs. 

> This is where using ``TensorFlow Lite``, from a programmer’s perspective, greatly differs from using TensorFlow. 

With TensorFlow you can just say
``model.predict(something)`` and get the results, but because TensorFlow Lite won’t
have many of the dependencies that TensorFlow does, particularly in non-Python
environments, you now have to get a bit more low-level and deal with the input and
output tensors, formatting your data to fit them and parsing the output in a way that makes sense for your device.

First, load the model and allocate the tensors:

In [None]:
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

Then you can get the input and output details from the model, so you can begin to
understand what data format it expects, and what data format it will provide back to you:

In [None]:
from pprint import pprint
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
pprint(input_details)

**First, let’s inspect the input parameter**. Note the ``shape`` entry, which is the array ``[1,1]``. Also note the ``dtype`` entry, whose class is ``numpy.float32``. These settings dictate the shape of the input data and its format.

So, in order to format the input data, you’ll need to use code like this to define the
input array shape and type if you want to predict the ``y`` for ``x=10.0``:

In [None]:
to_predict = np.array([[10.0]], dtype=np.float32)
print(to_predict)
print(to_predict.shape)

The double brackets around the 10.0 can cause a little confusion. The mnemonic I
use for the shape ``[1,1]`` is that there is 1 list, giving us the outer set of ``[]``,
and that list contains just 1 value, which is ``[10.0]``, thus giving ``[[10.0]]``. It can also
be confusing that the shape is printed with ``dtype=int32``, whereas you’re using
``numpy.float32``. That ``dtype`` is the data type of the array holding the shape, whose dimensions are integers; it does not describe
the contents of the tensor. The element type of the tensor is given by the class, ``numpy.float32``.
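One way to avoid hand-writing the brackets is to build the input from the details themselves. The sketch below rests on stated assumptions: `example_detail` is a hand-made stand-in for one entry of `interpreter.get_input_details()` (only the `shape` and `dtype` keys are reproduced), and `format_input` is a hypothetical helper, not a TensorFlow Lite function.

```python
import numpy as np

# Stand-in for input_details[0]; a real entry contains more keys.
example_detail = {
    "shape": np.array([1, 1], dtype=np.int32),  # dimensions, stored as integers
    "dtype": np.float32,                        # element type of the tensor
}

def format_input(values, detail):
    """Return `values` as an array with the shape and dtype the tensor expects."""
    arr = np.asarray(values, dtype=detail["dtype"])
    return arr.reshape(tuple(detail["shape"]))

to_predict = format_input(10.0, example_detail)
print(to_predict, to_predict.shape, to_predict.dtype)
```

Here the integer ``dtype`` of the ``shape`` array only affects how the dimensions are stored, while the ``dtype`` key decides how the value 10.0 itself is encoded.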

The output details are very similar, and what you want to keep an eye on here is the
shape. Because it is also ``[1,1]``, you can expect the answer to be ``[[y]]``
in much the same way as the input was ``[[x]]``:

In [None]:
pprint(output_details)

## Perform the Prediction

To run the prediction, you set the input tensor to the value you want a prediction for and then invoke the interpreter:

In [None]:
interpreter.set_tensor(input_details[0]['index'], to_predict)
interpreter.invoke()

The input tensor is identified by its index, which you read from the input details. In this case
you have a very simple model with only a single input, so its details are in
``input_details[0]``, and ``input_details[0]['index']`` gives the index of that tensor inside the interpreter.
The tensor expects a shape of ``[1,1]``, as defined earlier, so you pass it the ``to_predict`` value. Then you run the model with the ``invoke``
method.

You can then read the prediction by calling ``get_tensor`` and supplying it with the details of the tensor you want to read:

In [None]:
tflite_results = interpreter.get_tensor(output_details[0]['index'])
print(tflite_results)
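The set, invoke, and get steps can be wrapped in one function so that calling the interpreter feels closer to ``model.predict``. This is a sketch that assumes a model with exactly one input and one output tensor; `tflite_predict` is a hypothetical helper name, and it uses only the interpreter methods already shown in this walkthrough.

```python
import numpy as np

def tflite_predict(interpreter, values):
    """Run one inference on a single-input, single-output TFLite interpreter.

    `interpreter` is expected to be a tf.lite.Interpreter on which
    allocate_tensors() has already been called.
    """
    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]

    # Cast and reshape the data to what the input tensor expects.
    data = np.asarray(values, dtype=input_detail["dtype"])
    data = data.reshape(tuple(input_detail["shape"]))

    # Copy the data in, run the model, and read the result back out.
    interpreter.set_tensor(input_detail["index"], data)
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])
```

With the interpreter from the cells above, ``tflite_predict(interpreter, 10.0)`` should return an array of shape ``(1, 1)`` holding a value close to 19.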

Given that this is a very simple example, let’s look at something a little more complex next—using transfer learning on a well-known image classification model, and then converting that for TensorFlow Lite. From there we’ll also be able to better explore the impacts of optimizing and quantizing the model.

## Model Visualization

Use [Netron](https://netron.app/) to visualize the model. Netron is a viewer for neural network, deep learning, and machine learning models. Just load the file ``model.tflite`` into it. Note that converting a TensorFlow model to TF-Lite does not by itself quantize it: all weights and biases remain ``float32``.
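The data types can also be checked from Python. The sketch below counts tensors by element type using the interpreter's ``get_tensor_details()`` method; `count_tensor_dtypes` is a hypothetical helper name, and the function assumes each entry carries a ``dtype`` key holding a NumPy type, as the input and output details do.

```python
from collections import Counter

import numpy as np

def count_tensor_dtypes(interpreter):
    """Count the tensors in a TFLite interpreter by element type name."""
    details = interpreter.get_tensor_details()
    return Counter(np.dtype(d["dtype"]).name for d in details)
```

For the model converted above, ``count_tensor_dtypes(interpreter)`` would be expected to be dominated by ``float32`` entries, whereas a quantized model would typically show integer types such as ``int8`` as well.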


<center><img width="600" src="https://drive.google.com/uc?export=view&id=1pasZ7O6NFK8OVxNBg_EXXGQhXl_qZPgs"></center>
