<a href="https://colab.research.google.com/github/sensiml/AnalyticStudioTutorials/blob/master/sensiml_keyworld_spotting_recognition_with_tensorflow_lite_micro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Keyword Spotting Demo with TensorFlow Lite

*Note: You must have a SensiML Account to use this tutorial. You can sign up for free at https://sensiml.com/plans/community-edition/*

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import math

### SensiML Python SDK

Install the SensiML Python SDK in the Google Colab environment by running the following cell.

In [None]:
!pip install sensiml -U

Next, we will initialize the SensiML Python SDK by running the following cell. After connecting, you will be able to use the SensiML Python SDK to manage the data in your project, create queries, build and test models, and download firmware.

*Further documentation for using the SensiML Python SDK can be found at https://sensiml.com/documentation/sensiml-python-sdk/api-methods/overview.html*

In [None]:
from sensiml import *
import sensiml.tensorflow.utils as sml_tf

Next we are going to connect to our **Keyword Spotting - Demo** Project. Run the following cell to connect to the project using the SensiML Python SDK. 

In [None]:
client = SensiML()
client.project = '<Your Project Name>'

### Pipelines

Pipelines are a key component of the SensiML workflow. Pipelines store the preprocessing, feature extraction and model building steps. When training a model, these steps are executed on the SensiML server. Once the model has been trained, the pipeline is converted to firmware code that will run on your target embedded device. For more documentation on pipelines see the [Getting Started with the SensiML Python SDK Documentation](https://sensiml.com/documentation/sensiml-python-sdk/getting-started-with-the-sensiml-python-sdk.html). To create a new empty pipeline, run the cell below.

In [None]:
client.pipeline = '<Your Pipeline name>'

## Query Your Data

To select the data you want to use in your pipeline you need to add a query step. Queries give us a way to select and filter the data we want to use in our pipeline.

Run the following cell to see the available queries in your project.

In [None]:
client.list_queries()

### Using the Create Query API

We recommend using the **Prepare Data** section of the Analytics Studio at https://app.sensiml.cloud/ to create your query. Alternatively, you can use the create_query API, as described below.

Find your segmenter ID by using the list_segmenters API.

In [None]:
client.list_segmenters()

Next, create your query using the create_query API. Replace **Your Segmenter ID** below.

In [None]:
client.create_query(
    name="Q1",
    label_column="Label",
    columns=[
        "Microphone",
    ],
    metadata_columns=["Subject"],
    segmenter=<Your Segmenter ID>,
)

View the segments in your query by running the following cell.

In [None]:
q = client.get_query("Q1")
q.statistics_segments().groupby('Label').size().plot(kind='bar')


### Assembling a pipeline 

The pipeline for this tutorial will consist of the following steps:

1.   The **Input Query** which specifies what data is being fed into the model
2.   The **Feature Generators** which specify which features should be extracted from the raw time series data
3.   The **Feature Transform** which specifies how to transform the features after extraction. In this case it is to scale them to 1 byte each
4.   The **Feature Selector** which selects the best features. In this case we are using the custom feature selector to downsample the data. 


The code in the following cell sets our initial variables, then specifies each step in the pipeline. For now, you don't need to dig into each of these steps, but just know that the end result will be a feature vector scaled to 1 byte values for each of the segments that were labeled in the Data Capture Lab. We will use these features as input to our TensorFlow model.

In [None]:
client.project.columns()

In [None]:
num_cols=12
row_size=10

client.pipeline.reset()
client.pipeline.set_input_query("Q1")

client.pipeline.add_transform("Windowing", params={"window_size": 400,
                                                "delta": 400,
                                                "train_delta": 0,
                                                "return_segment_index": False,
                                                })

client.pipeline.add_feature_generator([{'name':'MFCC', 'params':{"columns": ["Microphone"],
                                    "sample_rate": 16000,
                                    "cepstra_count": num_cols,
                                    }}])

client.pipeline.add_transform("Feature Cascade", params={"num_cascades": row_size ,
                                "slide": True,
                                })

client.pipeline.add_transform("Min Max Scale", params={"min_bound": 0,
                                "max_bound": 255,
                                "pad": 0,
                                "feature_min_max_defaults":{'minimum':-500000, 'maximum':500000.0},
                                })
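
As a rough check on what this pipeline produces: a 400-sample window at 16 kHz covers 25 ms, each window yields `num_cols` MFCC values, and cascading `row_size` windows gives a feature vector of 12 x 10 = 120 values. The sketch below reproduces that arithmetic and illustrates min-max scaling to one byte in NumPy. The function `min_max_scale_to_byte` is a hypothetical stand-in written for this illustration; it is not the SensiML implementation, which may clip or round differently.

```python
import numpy as np

num_cols = 12        # MFCC coefficients per window (cepstra_count)
row_size = 10        # windows cascaded into one feature vector (num_cascades)
window_size = 400    # samples per window
sample_rate = 16000  # Hz

window_ms = 1000 * window_size / sample_rate
feature_vector_length = num_cols * row_size
print(window_ms, feature_vector_length)


def min_max_scale_to_byte(features, feature_min=-500000.0, feature_max=500000.0):
    """Hypothetical illustration: clip to [feature_min, feature_max], then map to 0..255."""
    clipped = np.clip(np.asarray(features, dtype=np.float64), feature_min, feature_max)
    scaled = (clipped - feature_min) / (feature_max - feature_min) * 255.0
    return np.round(scaled).astype(np.uint8)


# Values outside the configured range saturate at 0 or 255.
scaled = min_max_scale_to_byte([-600000.0, -500000.0, 0.0, 500000.0])
print(scaled)
```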

To see the steps currently added to your pipeline, you can use the describe method.

In [None]:
client.pipeline.describe()

### Executing a Pipeline

At this point the pipeline has not been executed yet; we have only assembled the steps. To run the pipeline, execute the following cell. This will print out a summary of the pipeline steps that are being run, followed by the status of the pipeline execution on the server.

Once the pipeline is finished running, the results will be stored in the variable *fv_s*. A summary of the execution is stored in the *s* variable.


The pipeline will report status updates that tell you which step is currently running, how many batches (parallel steps) have been run, and how many steps are left. This process is run on the server, so if you get disconnected you can run client.pipeline.get_results() to check the status of your pipeline.

Another thing to note is that intermediate steps are cached. You can retrieve data from intermediate steps by using client.pipeline.data(<index_of_step>). If you make any changes to your pipeline, only steps after the change will need to rerun.


In [None]:
fv_s, s = client.pipeline.execute()

## SensiML Features and TensorFlow Model

Now that we have our features for this model, we will train a TensorFlow model in the Colab environment. We will start by splitting our dataset into train, validate and test groups. The SensiML Python SDK has a built-in function for performing this split. You can also pass in the validation and test sizes. By default, they are set to 10% each.

In [None]:


x_train, x_validate, x_test, y_train, y_validate, y_test,  class_map = \
    client.pipeline.features_to_tensor(fv_s, test=.1, validate=.1)
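
To make the split proportions concrete, the following sketch partitions a set of sample indices into 80% train, 10% validate and 10% test. `split_indices` is a hypothetical helper written for this illustration; it is not how `features_to_tensor` is implemented, and the SDK function may, for example, shuffle or balance classes differently.

```python
import numpy as np


def split_indices(n_samples, test=0.1, validate=0.1, seed=0):
    """Hypothetical helper: shuffle indices, then slice off test and validate sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test))
    n_validate = int(round(n_samples * validate))
    test_idx = order[:n_test]
    validate_idx = order[n_test:n_test + n_validate]
    train_idx = order[n_test + n_validate:]
    return train_idx, validate_idx, test_idx


train_idx, validate_idx, test_idx = split_indices(1000)
print(len(train_idx), len(validate_idx), len(test_idx))
```

Every index lands in exactly one of the three groups, so no sample used for validation or testing is seen during training.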

### Feature Visualization
Let's take a quick look at the features that we have created. Run the following cell to see a heatmap of the features for a single event. To switch which event you are looking at, change the event_index.

In [None]:
import seaborn as sn
event_index=50
sn.heatmap(x_train[event_index].reshape(row_size,-1).T)
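
The reshape in the heatmap call can be checked on its own. The sketch below assumes that each run of `num_cols` consecutive values in the flat feature vector comes from one window, which is the same layout the model's Reshape layer relies on later; that layout is an assumption about the Feature Cascade output rather than something verified here.

```python
import numpy as np

row_size, num_cols = 10, 12

# Stand-in for one flat feature vector of 120 values.
flat = np.arange(row_size * num_cols)

# One row per window, one column per MFCC coefficient.
image = flat.reshape(row_size, -1)

# Transposing puts the windows (time) on the horizontal axis of the heatmap.
heatmap_input = image.T
print(image.shape, heatmap_input.shape)
```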

### Data Augmentation

It is often useful to perform data augmentation to make models more robust to unseen data. One data augmentation strategy is to apply masks in the feature space. For this tutorial, we will add masks randomly across time and sensors. To implement the masking in TensorFlow we use the tf.data API. The steps in the following cell are described by:

1.   Specify variables which describe the properties of our masking.
2.   Convert our training dataset that is currently a dataframe into a tf.data object.
3.   Add a noise layer which randomly adds noise to the features.
4.   Apply a time mask which will randomly mask up to rand_max time slices.
5.   Apply a sensor mask which will randomly mask up to rand_max sensors.
6.   Apply a shuffler which will randomize our dataset after each training iteration.

For more information about tf.data.Dataset, see the documentation [here](https://www.tensorflow.org/guide/data).

In [None]:
rand_max = 1
min_noise=-10
max_noise=10
mask_value=127
batch_size =32

data  = tf.data.Dataset.from_tensor_slices((x_train, y_train))
noise_ds = data.map(lambda image, label: sml_tf.add_image_noise(image, label, rand_max, min_noise, max_noise))
freq_ds = noise_ds.map(lambda image, label: sml_tf.mask_image_row(image, label, rand_max, num_cols, mask_value))
masked_ds = freq_ds.map(lambda image, label: sml_tf.mask_image_column(image, label, row_size, rand_max, num_cols, mask_value))


# Augmented Data
shuffle_aug = masked_ds.shuffle(buffer_size=x_train.shape[0], reshuffle_each_iteration=True).batch(batch_size)

# Shuffled Data
shuffle_ds = data.shuffle(buffer_size=x_train.shape[0], reshuffle_each_iteration=True).batch(batch_size)
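
To show what noise and masking do to a single feature map, here is a minimal NumPy sketch. The functions `add_uniform_noise` and `mask_random_rows` are hypothetical illustrations of the idea; they are not the `sml_tf` helpers used above, whose exact behavior may differ.

```python
import numpy as np


def add_uniform_noise(image, rng, min_noise=-10, max_noise=10):
    """Add integer noise drawn uniformly from [min_noise, max_noise], kept in 0..255."""
    noise = rng.integers(min_noise, max_noise + 1, size=image.shape)
    return np.clip(image + noise, 0, 255)


def mask_random_rows(image, rng, rand_max=1, mask_value=127):
    """Overwrite between 0 and rand_max randomly chosen rows with mask_value."""
    masked = image.copy()
    n_rows = int(rng.integers(0, rand_max + 1))
    rows = rng.choice(image.shape[0], size=n_rows, replace=False)
    masked[rows, :] = mask_value
    return masked


rng = np.random.default_rng(0)
image = np.full((10, 12), 50, dtype=np.int64)  # 10 time slices x 12 coefficients

noisy = add_uniform_noise(image, rng)
time_masked = mask_random_rows(noisy, rng)
# Masking columns is the same operation applied to the transposed image.
fully_masked = mask_random_rows(time_masked.T, rng).T
print(fully_masked.shape)
```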

To visualize the tf.data dataset, you can use this helper function. Change the index to look at different feature maps.

In [None]:
sml_tf.visualize_tf_data(masked_ds, index=7, row_size=row_size, num_cols=num_cols)

### Specifying the TensorFlow Model

The next step is to define what our TensorFlow model looks like. For this tutorial we are going to use the TensorFlow Keras API to create a [Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN). When you are building a model to deploy on a microcontroller, it is important to remember that not all TensorFlow functions are suitable for a microcontroller. Additionally, only a subset of TensorFlow functions are available as part of TensorFlow Lite Micro. For a full list of available functions see the [all_ops_resolver.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/kernels/all_ops_resolver.cc).

For this tutorial, we use two 2D convolution layers followed by two fully connected layers to classify the keywords efficiently. Our aim is to limit the number and size of the layers in the model to only those necessary to get our desired accuracy. Often you will find that you need to make a trade-off between latency/memory usage and accuracy in order to get a model that will work well on your microcontroller.


In [None]:
optimization_metric = "accuracy"
# We'll use Keras to create a simple model architecture
from tensorflow.keras import layers
import tensorflow as tf

tf_model = tf.keras.Sequential()
tf_model.add(layers.Reshape((10, 12, 1), input_shape=(120,)))
tf_model.add(layers.Conv2D(32, (2, 6), padding="valid", activation="relu",  strides=[2,2]))
tf_model.add(layers.MaxPool2D((2, 2)))
tf_model.add(layers.Dropout(0.1))
tf_model.add(layers.Conv2D(16, (2, 2), padding="valid", activation="relu"))
#tf_model.add(layers.Dropout(0.1))
#print(tf_model.summary())
#tf_model.add(layers.MaxPool2D((1, 2), padding="valid"))
tf_model.add(layers.Flatten())
# Fully connected layers; the final layer has one output per class in class_map
tf_model.add(layers.Dense(18, activation='relu', ))
tf_model.add(layers.Dense(len(class_map.keys()), activation='sigmoid'))
# Compile the model using a standard optimizer and a loss function for classification
tf_model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=[optimization_metric])
tf_model.summary()

train_history = {'loss':[], 'val_loss':[], 'accuracy':[], 'val_accuracy':[]}
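
Because model size matters on a microcontroller, it can help to work out the layer shapes and parameter counts by hand and compare them with the output of `tf_model.summary()`. The sketch below does this in plain Python for the layers up to the first Dense layer; `conv_out` is a hypothetical helper, and the final Dense layer is left out because its size depends on the number of classes in your project.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for a 'valid' convolution or pooling."""
    return (size - kernel) // stride + 1


rows, cols = 10, 12  # input after Reshape((10, 12, 1))

# Conv2D(32, (2, 6), strides=[2, 2], padding="valid")
rows, cols = conv_out(rows, 2, 2), conv_out(cols, 6, 2)
conv1_shape = (rows, cols, 32)

# MaxPool2D((2, 2)): the stride defaults to the pool size
rows, cols = conv_out(rows, 2, 2), conv_out(cols, 2, 2)
pool_shape = (rows, cols, 32)

# Conv2D(16, (2, 2), padding="valid")
rows, cols = conv_out(rows, 2), conv_out(cols, 2)
conv2_shape = (rows, cols, 16)

flat_size = rows * cols * 16

# Parameters = kernel weights + one bias per output channel or unit.
conv1_params = 2 * 6 * 1 * 32 + 32
conv2_params = 2 * 2 * 32 * 16 + 16
dense_params = flat_size * 18 + 18

print(conv1_shape, pool_shape, conv2_shape, flat_size)
print(conv1_params, conv2_params, dense_params)
```

Most of the weights sit in the second convolution, so that layer is the first place to look if the model needs to shrink.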

### Training the TensorFlow Model

After defining the model graph, it is time to train the model. Training a neural network consists of iterating through batches of your training dataset multiple times. Each complete pass through the training set is called an epoch. For each batch of data, the loss function is computed and the weights of the layers in the network are adjusted.

The following cell loops through the training data num_iterations times, running 50 epochs each time. After each iteration, the visualizations for the loss, accuracy and confusion matrix will be updated for both the validation and training data sets. You can use this to see how the model is progressing.

Sometimes the model gets stuck in a bad state after initialization, so we include a check that calls reset_weights to reinitialize the network weights if the model is stuck in a low-accuracy state. Another option is to adjust the learning rate. If your model reaches the desired accuracy before the training completes, click the stop button at the left side of the cell; you can then continue the tutorial with the model in its current state.

In [None]:
from IPython.display import clear_output

num_iterations=10
for i in range(num_iterations):
    history = tf_model.fit( shuffle_aug, #shuffle_ds
                              epochs=50,
                              batch_size=batch_size, 
                              validation_data=(x_validate, y_validate),
                              verbose=0)
    clear_output()    

    for key in train_history:
        train_history[key].extend(history.history[key])

    sml_tf.plot_training_results(tf_model, train_history, x_train, y_train, x_validate, y_validate)

    if train_history['accuracy'][-1] <= .3 and train_history['accuracy'][-25] <= (train_history['accuracy'][-1]+5):
        print('reinitializing weights')
        sml_tf.reset_weights(tf_model)
        train_history = {'loss':[], 'val_loss':[], 'accuracy':[], 'val_accuracy':[]}
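
The reinitialization check compares the most recent training accuracy with the value from 25 epochs earlier. A standalone version of this kind of check is sketched below so that it can be tried on plain lists. `is_stuck` and its default thresholds are hypothetical choices for illustration; they are not part of the SensiML SDK and do not reproduce the exact condition in the cell above.

```python
def is_stuck(accuracy_history, low_accuracy=0.3, lookback=25, min_gain=0.05):
    """Return True if accuracy is low and has barely improved over `lookback` epochs."""
    if len(accuracy_history) < lookback:
        return False  # not enough history to judge yet
    latest = accuracy_history[-1]
    earlier = accuracy_history[-lookback]
    return latest <= low_accuracy and (latest - earlier) < min_gain


flat_run = [0.2] * 50                        # low accuracy, no improvement
improving_run = [0.1] * 25 + [0.28]          # still low, but clearly improving
healthy_run = [i / 50 for i in range(1, 46)]  # ends at 0.9

print(is_stuck(flat_run), is_stuck(improving_run), is_stuck(healthy_run))
```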

### Quantize the TensorFlow Model

Now that you have trained a neural network with TensorFlow, we are going to use the built-in tools to quantize it. Quantization allows us to reduce the model size by up to 4x by converting the network weights from 4-byte floating point values to 1-byte uint8 values. This can be done without sacrificing much in terms of accuracy. The best way to perform quantization is still an active area of research. For this tutorial, we will use the built-in methods that are provided as part of TensorFlow.

*   The ```representative_dataset_generator()``` function is necessary to provide statistical information about your dataset in order to quantize the model weights appropriately. 
*   The TFLiteConverter is used to convert a TensorFlow model into a TensorFlow Lite model. The TensorFlow Lite model is stored as a [flatbuffer](https://google.github.io/flatbuffers/) which allows us to easily store and access it on embedded systems.
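
To see where the 4x size reduction comes from, the following NumPy sketch applies a generic affine quantization to an array of float32 weights and measures the storage and the round-trip error. The functions `quantize_uint8` and `dequantize` are hypothetical illustrations of the general idea; the TensorFlow Lite converter uses its own calibration and quantization scheme, which this sketch does not reproduce.

```python
import numpy as np


def quantize_uint8(weights):
    """Map float weights onto 0..255 using a scale and zero point."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant weights: any scale works
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized representation."""
    return (q.astype(np.float32) - zero_point) * scale


weights = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)

print(weights.nbytes, q.nbytes)          # storage in bytes before and after
print(np.abs(weights - restored).max())  # worst-case round-trip error
```

The representative dataset plays a similar role for activations: it lets the converter estimate the range of values it needs to cover.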

In [None]:
def representative_dataset_generator():
    # Yield one sample at a time. Each feature vector must be a 2D float32
    # array of shape (1, n_features) that is wrapped in a list.
    for value in x_test:
        yield [np.array(value, dtype=np.float32, ndmin=2)]

# Unquantized Model
converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
tflite_model_full = converter.convert()
print("Full Model Size", len(tflite_model_full))

# Quantized Model
converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.experimental_new_converter = False
converter.representative_dataset = representative_dataset_generator
tflite_model_quant = converter.convert()

print("Quantized Model Size", len(tflite_model_quant))

As you can see, the quantized model is significantly smaller than the unquantized model. An additional benefit of quantizing the model is that TensorFlow Lite Micro is able to take advantage of specialized instructions on Cortex-M chips using the [cmsis-nn](http://www.keil.com/pack/doc/cmsis/NN/html/index.html) DSP library, which gives another large boost in performance. For more information on TensorFlow Lite for Microcontrollers, you can check out the excellent [TinyML](https://www.oreilly.com/library/view/tinyml/9781492052036/) book by Pete Warden and Daniel Situnayake.

Now that you have trained your model, it's time to upload it as the final step in your pipeline. From here you will be able to test the entire pipeline against test data as well as download the firmware which can be flashed to run locally on your embedded device. To do this we will use the **Load Model TF Micro** function. We will convert the tflite model with sml_tf.convert_tf_lite and upload it to the SensiML Cloud server.

*Note: An additional parameter that can be specified here is the threshold. We set it to 0.9 in this example. Any classification with a value less than the threshold will be returned as 0, which is reserved for an **unknown** classification.*

In [None]:
class_map_tmp = {k:v+1 for k,v in class_map.items()} #increment by 1 as 0 corresponds to unknown

client.pipeline.set_training_algorithm("Load Model TF Micro",
                                    params={"model_parameters": {'tflite': sml_tf.convert_tf_lite(tflite_model_quant)},
                                            "class_map": class_map_tmp,
                                            "estimator_type": "classification",
                                            "threshold": 0.9, # must be above this value otherwise is unknown
                                            })

client.pipeline.set_validation_method("Recall", params={})

client.pipeline.set_classifier("TF Micro", params={})

client.pipeline.set_tvo()

results, stats = client.pipeline.execute()
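
The effect of the threshold and the shifted class map can be illustrated with a few lines of NumPy. `classify_with_threshold` is a hypothetical function written for this sketch; the actual decision logic runs inside the SensiML classifier, and how it treats a score exactly equal to the threshold is an assumption here.

```python
import numpy as np


def classify_with_threshold(scores, threshold=0.9):
    """Return the 1-based class index, or 0 (unknown) if no score reaches the threshold."""
    scores = np.asarray(scores, dtype=np.float64)
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return 0  # reserved for the unknown classification
    return best + 1  # shifted by 1, as in class_map_tmp


confident = classify_with_threshold([0.02, 0.95, 0.03])
uncertain = classify_with_threshold([0.40, 0.50, 0.10])
print(confident, uncertain)
```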

### Model summary
After executing the pipeline, the cloud computes a model summary as well as a confusion matrix. The model summary gives a quick overview of the model performance so we can see what the accuracy of the quantized model was across our data set.

In [None]:
results.summarize()

### Confusion Matrix

The confusion matrix provides information not only about the accuracy, but also about what sort of misclassifications occurred. The confusion matrix is often one of the best ways to understand how your model is performing, as you can see which classes are difficult to distinguish between. The confusion matrix here also includes the Sensitivity and Positive Predictivity for each class along with the overall accuracy. If you raise the threshold value, you will notice that some of the classes start showing up as having UNK values. This corresponds to an unknown class and is useful for filtering out false positives or detecting anomalous states.

In [None]:
model = results.configurations[0].models[0]
model.confusion_matrix_stats['validation']
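
For reference, sensitivity, positive predictivity and accuracy can be computed from a raw confusion matrix as sketched below. The example matrix is made up for illustration, and the sketch assumes that rows hold the true classes, columns hold the predicted classes, and every class appears at least once in both; it does not read the structure returned by `confusion_matrix_stats`.

```python
import numpy as np


def per_class_metrics(confusion):
    """Compute sensitivity and positive predictivity per class, plus overall accuracy."""
    confusion = np.asarray(confusion, dtype=np.float64)
    true_positives = np.diag(confusion)
    sensitivity = true_positives / confusion.sum(axis=1)            # per true class
    positive_predictivity = true_positives / confusion.sum(axis=0)  # per predicted class
    accuracy = true_positives.sum() / confusion.sum()
    return sensitivity, positive_predictivity, accuracy


# Made-up two-class example: rows are true labels, columns are predictions.
example_confusion = [[45, 5],
                     [10, 40]]
sensitivity, positive_predictivity, accuracy = per_class_metrics(example_confusion)
print(sensitivity, positive_predictivity, accuracy)
```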

*Finally*, we save the model knowledge pack with a name. This tells the server to persist this model and will let us retrieve and view the metrics in the future. Models that are not saved will be deleted when the pipeline is rerun. 

In [None]:
model.knowledgepack.save("TensorFlow_model_1")

### Offline Model Validation

After saving your model, you can return to the Analytics Studio at https://app.sensiml.cloud to continue validating and generating your firmware. To test your model against any of the captured data files, select the pipeline and follow these steps:

1.   Go to the Test Model tab of the Analytics Studio.
2.   Select the model you want to test.
3.   Select any of the capture files in the Project.
4.   Click Compute Accuracy to classify that capture using the selected model.

The model will be compiled in the SensiML Cloud and the output of the classification will be returned. The graph shows the segment start and segment classified for all of the detected events.

We can also test the model through the SensiML Python SDK by selecting a capture and running it through the model's recognize_signal API, as shown in the following cells.

In [None]:
client.list_captures()

In [None]:
res,s = model.recognize_signal(capturefile="off_017c4098_nohash_2.csv")

In [None]:
res