# Droop, There it is: Smart Irrigation with Droop Detection

This notebook will use transfer learning to train a MobileNetV2-based classifier and teach a new, much smaller CNN, to match the teacher model using [knowledge distillation](https://arxiv.org/pdf/2002.03532.pdf). We will then prepare and quantize the final student model for deployment onto the [Arduino Nano 33 BLE sense](https://store.arduino.cc/usa/nano-33-ble-sense).

To start, we'll handle some imports and set variables for parameters and model assets.

In [1]:
import os
import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [2]:
MODELS_DIR = "tflite-models/"
if not os.path.exists(MODELS_DIR):
    os.mkdir(MODELS_DIR)
MODEL_TF = MODELS_DIR + "model"
MODEL_NO_QUANT_TFLITE = MODELS_DIR + "model_no_quant.tflite"
MODEL_TFLITE = MODELS_DIR + "model.tflite"
MODEL_TFLITE_MICRO = MODELS_DIR + "model.cc"

In [3]:
# PARAMETERS
batch_size = 128 
img_width = img_height = 128
t_epochs = 20 
d_epochs = 50 
d_alpha = 0.1 
temps = 1 
aug_factor = 3

Next, we create our datasets for training and deployment. The `data/` directory is expected to contain subdirectories of image samples, with directory names matching image labels. 

```
data/   
│
└───droop/
│   │   sample011.jpg
│   │   sample012.jpg
│   │
└───no-droop/
│   │   sample011.jpg
│   │   sample012.jpg
│   │
```

In [4]:
# Data Preparation
data_augmentation = tf.keras.Sequential([
  layers.experimental.preprocessing.RandomFlip("horizontal"),
  layers.experimental.preprocessing.RandomRotation(0.05),
  layers.experimental.preprocessing.RandomContrast(0.2)
])


train_ds = keras.preprocessing.image_dataset_from_directory(
  "data/",
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

test_ds = keras.preprocessing.image_dataset_from_directory(
  "data/",
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

train_ds = (
    train_ds.repeat(aug_factor).map(lambda x, y: (data_augmentation(x, training=True), y))
            )

Found 5817 files belonging to 2 classes.
Using 4654 files for training.
Found 5817 files belonging to 2 classes.
Using 1163 files for validation.


## Train

We create a custom keras model class called `Distiller` to train the student model to match the teacher model. 

This new "model" will perform a forward pass of both the teacher and student, then calculate the loss with weighting of the student_loss and distillation_loss by alpha and 1 - alpha, and finally perform the backward pass.

In [5]:
class Distiller(keras.Model):
    """## Construct `Distiller()` class

    The custom `Distiller()` class, overrides the `Model` methods `train_step`, `test_step`,
    and `compile()`. In order to use the distiller, we need:

    - A trained teacher model
    - A student model to train
    - A student loss function on the difference between student predictions and ground-truth
    - A distillation loss function, along with a `temperature`, on the difference between the
    soft student predictions and the soft teacher labels
    - An `alpha` factor to weight the student and distillation loss
    - An optimizer for the student and (optional) metrics to evaluate performance

    In the `train_step` method, we perform a forward pass of both the teacher and student,
    calculate the loss with weighting of the `student_loss` and `distillation_loss` by `alpha` and
    `1 - alpha`, respectively, and perform the backward pass. Note: only the student weights are updated,
    and therefore we only calculate the gradients for the student weights.

    In the `test_step` method, we evaluate the student model on the provided dataset.
    """
    def __init__(self, student, teacher):
        super(Distiller, self).__init__()
        self.teacher = teacher
        self.student = student

    def compile(
        self,
        optimizer,
        metrics,
        student_loss_fn,
        distillation_loss_fn,
        alpha=0.1,
        temperature=3,
        epochs=20,
    ):
        """ Configure the distiller.

        Args:
            optimizer: Keras optimizer for the student weights
            metrics: Keras metrics for evaluation
            student_loss_fn: Loss function of difference between student
                predictions and ground-truth
            distillation_loss_fn: Loss function of difference between soft
                student predictions and soft teacher predictions
            alpha: weight to student_loss_fn and 1-alpha to distillation_loss_fn
            temperature: Temperature for softening probability distributions.
                Larger temperature gives softer distributions.
        """
        super(Distiller, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature
        self.epochs = epochs

    def train_step(self, data, batch_size=32):
        x, y = data

        # Forward pass of teacher
        teacher_predictions = self.teacher(x, training=False)

        with tf.GradientTape() as tape:
            # Forward pass of student
            student_predictions = self.student(x, training=True)

            # Compute losses
            student_loss = self.student_loss_fn(y, student_predictions)
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss

        # Compute gradients
        trainable_vars = self.student.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the metrics configured in `compile()`.
        self.compiled_metrics.update_state(y, student_predictions)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update(
            {"student_xentropy": student_loss, "KL_logits": distillation_loss}
        )
        return results

    def test_step(self, data):
        # Unpack the data
        x, y = data

        # Compute predictions
        y_prediction = self.student(x, training=False)

        # Calculate the loss
        student_loss = self.student_loss_fn(y, y_prediction)

        # Update the metrics.
        self.compiled_metrics.update_state(y, y_prediction)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_xentropy": student_loss})
        return results

First, we train the teacher model. We can see that the teacher model will be comprised of 2,422,210 parameters.

In [6]:
# Teacher Model
input_tensor = layers.Input(shape=(img_height, img_width, 3))

baseModel = keras.applications.MobileNetV2(
    include_top=False,
    weights="imagenet",
    input_tensor=input_tensor,
    input_shape=None,
    pooling=None)

baseModel.trainable=False

x = layers.experimental.preprocessing.Rescaling(scale=1./127.5, offset=-1)(input_tensor)
x = baseModel.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128)(x)
predictions = layers.Dense(2)(x)
teacher = keras.models.Model(inputs=baseModel.input, outputs=predictions, name="teacher")

teacher.summary()

Model: "teacher"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 128, 128, 3) 0                                            
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 64, 64, 32)   864         input_1[0][0]                    
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 64, 64, 32)   128         Conv1[0][0]                      
__________________________________________________________________________________________________
Conv1_relu (ReLU)               (None, 64, 64, 32)   0           bn_Conv1[0][0]                   
____________________________________________________________________________________________

In [7]:
## Train the teacher
teacher_callbacks = [
    keras.callbacks.EarlyStopping(monitor="accuracy", patience=3),
    keras.callbacks.TensorBoard(log_dir="./logs/teacher/"),
]

teacher.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
)

teacher.fit(train_ds, epochs=t_epochs, callbacks=teacher_callbacks)
teacher.evaluate(test_ds)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20


[0.5360531210899353, 0.7308684587478638]

Next, we initialize and train our student model using the `Distiller` class from earlier. Notice the reduction of parameters of this model to just 6,850.

In [8]:
# Student Model

student = keras.Sequential(
    [   layers.Lambda(lambda x: tf.image.rgb_to_grayscale(x)),
        keras.layers.experimental.preprocessing.Resizing(32, 32, interpolation="bilinear"),
        tf.keras.layers.experimental.preprocessing.Rescaling(scale=1./127.5, offset=-1),
        layers.Conv2D(16, (3, 3), strides=(1, 1), padding="same", input_shape=(32,32,1)),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),
    ],
    name="student",
)

In [9]:
## Distill teacher to student
student_callbacks = [
    keras.callbacks.EarlyStopping(monitor="accuracy", patience=3),
    keras.callbacks.TensorBoard(log_dir="./logs/student/"),
]

distiller = Distiller(student=student, teacher=teacher)
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=d_alpha,
    temperature=temps,
)

distiller.fit(train_ds, epochs=d_epochs, callbacks=student_callbacks)
distiller.evaluate(test_ds)


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50


[0.7050730586051941, 0.561350405216217]

In [10]:
final_layers = distiller.student.layers[3:]
final_model = keras.Sequential(final_layers, name="distilled_droop_detection")
final_model.summary()

Model: "distilled_droop_detection"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 32, 32, 16)        160       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 32, 32, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 16, 16, 16)        2320      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 16)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 8, 16)          2320      
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)   

In [11]:
final_model.save(MODEL_TF)

INFO:tensorflow:Assets written to: tflite-models/model/assets


## Deploy

Having saved our student model, we'll prepare it for deployment by converting to a tflite model and quantizing before converting it into a C source file that can be loaded by tflite running on the Arduino.

In order to quantize, we need to feed the converter representative samples of the data the model was trained on. 

In [12]:
image_transform = tf.keras.Sequential([
    layers.Lambda(lambda x: tf.image.rgb_to_grayscale(x)),
    keras.layers.experimental.preprocessing.Resizing(32, 32, interpolation='bilinear'),
    keras.layers.experimental.preprocessing.Rescaling(scale=1./127.5, offset=-1)
])

In [13]:
sample_iter = test_ds.map(lambda x, y: (image_transform(x, training=False), y)).as_numpy_iterator()

for i in range(1):
    sample = next(sample_iter)[0]
    
print("Number of samples: {}".format(sample.shape[0]))

Number of samples: 128


In [14]:
# Convert the model to the TensorFlow Lite format without quantization
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_TF)
model_no_quant_tflite = converter.convert()

# Save the model to disk
open(MODEL_NO_QUANT_TFLITE, "wb").write(model_no_quant_tflite)

# Convert the model to the TensorFlow Lite format with quantization
def representative_dataset():
    for i in range(batch_size):
        yield([np.expand_dims(sample[i], axis=0)])
        
        
# Set the optimization flag.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Enforce integer only quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Provide a representative dataset to ensure we quantize correctly.
converter.representative_dataset = representative_dataset
model_tflite = converter.convert()

# Save the model to disk
open(MODEL_TFLITE, "wb").write(model_tflite)

12072

Now, we're ready to convert this tflite model into a C source file! 

Update the contents of the `arduino/droop_detection/droop_detection_model_data.cpp` file with the values displayed here after running the `cat` command.

In [15]:
# Install xxd if it is not available
#!apt-get update && apt-get -qq install xxd

# Convert to a C source file, i.e, a TensorFlow Lite for Microcontrollers model
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
# Update variable names
REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}

In [16]:
!cat {MODEL_TFLITE_MICRO}

unsigned char tflite_models_model_tflite[] = {
  0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x00, 0x00, 0x00, 0x00,
  0x14, 0x00, 0x20, 0x00, 0x1c, 0x00, 0x18, 0x00, 0x14, 0x00, 0x10, 0x00,
  0x0c, 0x00, 0x00, 0x00, 0x08, 0x00, 0x04, 0x00, 0x14, 0x00, 0x00, 0x00,
  0x1c, 0x00, 0x00, 0x00, 0x98, 0x00, 0x00, 0x00, 0xc8, 0x00, 0x00, 0x00,
  0x28, 0x1d, 0x00, 0x00, 0x38, 0x1d, 0x00, 0x00, 0x50, 0x2e, 0x00, 0x00,
  0x03, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
  0x44, 0xd2, 0xff, 0xff, 0x10, 0x00, 0x00, 0x00, 0x18, 0x00, 0x00, 0x00,
  0x28, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
  0x73, 0x65, 0x72, 0x76, 0x65, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x00, 0x00,
  0x73, 0x65, 0x72, 0x76, 0x69, 0x6e, 0x67, 0x5f, 0x64, 0x65, 0x66, 0x61,
  0x75, 0x6c, 0x74, 0x00, 0x01, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
  0xbc, 0xff, 0xff, 0xff, 0x11, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
  0x07, 0x00, 0x00, 0x00, 0x64, 0x65, 0x6e, 0x73, 0x

## References

* [tflite-micro](https://github.com/tensorflow/tflite-micro)
* [knowledge distillation notebook](https://keras.io/examples/vision/knowledge_distillation/)