# Project Notes:
I planned to push everything to GitLab with a requirements.txt and pull it from there, but a hardware issue with my computer forced me to move everything to Google Colaboratory. You may ignore my GitLab pushes other than this notebook, as they are not in their final form.
[This article](https://www.oreilly.com/content/compressing-and-regularizing-deep-neural-networks/) and [this webpage](https://blog.xmartlabs.com/2020/06/01/how-to-speed-up-inference-in-your-deep-learning-model/) also have very good insights and analysis.

My aim with this notebook is to try out different approaches and test which might work best. Every method that reduces the file size or the parameter count has its own advantages and disadvantages. Each technique can be examined in detail, and the best approach can be selected after testing many possibilities.


## List of ideas to try:
* Weight Sharing
* Model Pruning
* Knowledge Distillation: Non-destructive, but we have to define a learner (student) model.
* Low Rank Matrix and Tensor Decompositions: See [this](https://arxiv.org/abs/2006.06443) link.
* Quantization (Quantization Aware Training OR Post-Training Quantization): Available with TensorFlow.
* Clustering: See [this](https://www.tensorflow.org/model_optimization/guide/clustering/clustering_example) link.
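
To make the weight sharing and clustering ideas above a bit more concrete, here is a minimal pure-Python sketch: a tiny 1-D k-means groups a handful of made-up weights, so that a layer only needs a small codebook plus one index per weight. The function name `cluster_weights`, the sample weights and the choice of 4 clusters are my own illustrative assumptions; this is not the API of the TensorFlow clustering example linked above.

```python
def cluster_weights(weights, n_clusters=4, n_iters=20):
    """Toy 1-D k-means (needs n_clusters >= 2): returns (codebook, indices)."""
    lo, hi = min(weights), max(weights)
    # Spread the initial centroids evenly between the smallest and largest weight.
    codebook = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    indices = [0] * len(weights)
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every weight.
        indices = [min(range(n_clusters), key=lambda c: abs(w - codebook[c]))
                   for w in weights]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:
                codebook[c] = sum(members) / len(members)
    return codebook, indices


# Made-up weights, purely for illustration.
weights = [-0.91, -0.88, -0.32, -0.30, 0.02, 0.05, 0.61, 0.64]
codebook, indices = cluster_weights(weights)

# Every weight is replaced by its shared centroid value.
shared = [codebook[i] for i in indices]
print("codebook:", [round(c, 3) for c in codebook])
print("indices: ", indices)
```

With 4 clusters each index needs only 2 bits, which is where the size reduction would come from.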

## Notes:
* Wrote the methods so that they can prune any given h5 or TF model file.
* Further (deeper) models can be trained on top of the network given in the code segment. Deeper models have higher capacity and can be pruned even further. Clustering and quantization can compress the network even more; the result can then be converted to a TFLite model for mobile deployment.

# Install necessary dependencies: 

* "tensorflow-model-optimization" contains many algorithms for network pruning, quantization and encoding. This library is an easy way to deploy models fast; however, it is not compatible with many of the layer types that TF offers.
* 'tfcoreml' and 'coremltools' are used to convert TensorFlow models to CoreML outputs for better compatibility with the iOS environment. I could use CoreML directly, but I don't have access to the necessary tools.
* The second cell checks the device (GPU) information for further reference.

In [1]:
!pip install tensorflow-model-optimization
# Run these later as they significantly slow down the training process.
# !pip install --upgrade tfcoreml
# !pip install --upgrade coremltools

Collecting tensorflow-model-optimization
  Downloading https://files.pythonhosted.org/packages/55/38/4fd48ea1bfcb0b6e36d949025200426fe9c3a8bfae029f0973d85518fa5a/tensorflow_model_optimization-0.5.0-py2.py3-none-any.whl (172kB)

In [2]:
!nvidia-smi

Thu Jun 17 01:28:17 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Method Definitions:

All methods used at runtime are listed below:

In [3]:
# Tested on Colaboratory with GPU, Tensorflow 2.4.1
# Tested on MacBook Pro M1 with CPU/GPU, Tensorflow 2.4.0

import tensorflow as tf
# Installed via: pip install tensorflow-model-optimization
import tensorflow_model_optimization as tfmot
import numpy as np
import time


def build_model(input_shape):
    model = tf.keras.models.Sequential()

    model.add(tf.keras.layers.BatchNormalization(input_shape=input_shape))
    model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))

    model.add(tf.keras.layers.BatchNormalization())  # input_shape is only needed on the first layer
    model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))

    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))

    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(256))
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(10))
    model.add(tf.keras.layers.Activation('softmax'))

    return model


def train(x_train, y_train, x_test, y_test, lr, epochs, batch_size, savedir):
    """
    Train the model on the given dataset with the given hyperparameters
    (lr, epochs and batch_size).

    The model is automatically saved to `savedir` after training.
    """
    model = build_model(x_train.shape[1:])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'],
    )
    # print(model.summary())

    start_time = time.time()

    model.fit(
        x=x_train.astype(np.float32),
        y=y_train.astype(np.float32),
        epochs=epochs,
        validation_data=(x_test.astype(np.float32), y_test.astype(np.float32)),
        batch_size=batch_size,
    )

    end_time = time.time()
    print("Train elapsed time: {} seconds".format(end_time - start_time))

    model.save(savedir, overwrite=True)
    return model


def test(x_test, y_test, loaddir):
    """
    Load any saved model and evaluate it against the test set.
    """
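    model = tf.keras.models.load_model(loaddir)

    # Recompile so that models saved without a compiled state (for example the
    # stripped pruned model) can be evaluated. The optimizer is not used by
    # evaluate(), so the default Adam settings are fine here and the function
    # no longer depends on the global LR.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'],
    )
    # print(model.summary())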

    start_time = time.time()

    scores = model.evaluate(x_test, y_test)

    end_time = time.time()
    print("Test elapsed time: {} seconds".format(end_time - start_time))
    return scores


def pruned_train(x_train, y_train, loaddir, lr, val_split, epochs, batch_size,
                 prune_summaries,  # directory where the PruningSummaries callback writes its logs.
                 savedir):

    # Load the model:
    model = tf.keras.models.load_model(loaddir)
    num_images = x_train.shape[0] * (1 - val_split)
    end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

    # Define model for pruning. # ToDo: Tuning these values may work well; expose them as arguments of this method.
    pruning_params = {
        'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.40,
                                                                 final_sparsity=0.80,
                                                                 begin_step=0,
                                                                 end_step=end_step,
                                                                 frequency=100)
    }

    model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)

    # Pruning method requires a recompile.
    model_for_pruning.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'],
    )

    # model_for_pruning.summary()   # for debugging, uncomment if you want to inspect.

    # Train for the given number of epochs.
    callbacks = [
        tfmot.sparsity.keras.UpdatePruningStep(),
        tfmot.sparsity.keras.PruningSummaries(log_dir=prune_summaries),
    ]

    model_for_pruning.fit(x_train, y_train,
                          batch_size=batch_size, epochs=epochs, validation_split=val_split,
                          callbacks=callbacks)

    final_model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
    final_model.summary()  # for debugging; comment out if you do not want to inspect.

    final_model.save(savedir, overwrite=True)
    return final_model


def apply_custom_quantization(layer):
    """
    Helper function that quantizes all layers except for batch normalization and max pooling,
    as these are not supported by TensorFlow 2.4.

    # ToDo: Hacky ways are possible. Different models may work, but more engineering and different network
    #       implementations would be needed.
    # ToDo: ELU quantization is also not accepted by TensorFlow; ReLU is accepted.
    """
    if not isinstance(layer, tf.keras.layers.BatchNormalization):
        if not isinstance(layer, tf.keras.layers.MaxPooling2D):
            return tfmot.quantization.keras.quantize_annotate_layer(layer)
    return layer

def quantized_train(x_train, y_train, loaddir, lr, val_split, epochs, batch_size,
                    savedir):

    # Load the model:
    model = tf.keras.models.load_model(loaddir)
    
    # Proceed with quantization:
    annotated_model = tf.keras.models.clone_model(model, clone_function=apply_custom_quantization)

    q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
    # q_aware_model.summary()

    # The quantization aware model requires a recompile.
    q_aware_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'],
    )

    q_aware_model.fit(x_train, y_train,
                      batch_size=batch_size, epochs=epochs, validation_split=val_split)

    q_aware_model.summary()  # for debugging; comment out if you do not want to inspect.

    q_aware_model.save(savedir, overwrite=True)
    return q_aware_model

# ToDo: Depends on apply_custom_quantization, which has the limitations listed in its docstring.
def quantized_test(x_test, y_test, loaddir):
    """
    Load a saved quantization aware model and evaluate it against the test set.
    """
    model = tf.keras.models.load_model(loaddir)

    start_time = time.time()

    scores = model.evaluate(x_test, y_test)

    end_time = time.time()
    print("Test elapsed time: {} seconds".format(end_time - start_time))
    return scores


# Runtime Cells: 

* First, train the vanilla model; this will be the baseline.
* Second, pruning is applied to the trained vanilla model using TensorFlow's optimization library (tensorflow-model-optimization).
* Third, quantization aware training is done and knowledge distillation is tested.
* Finally, the models are converted to TFLite for mobile deployment.

## Extras:

* PyTorch implementation would use the same pipeline; analogous methods are available. 
* TensorFlow does not allow pruning of many types of layers such as MaxPooling and Batch Normalization. It allows writing custom code to prune or quantize such layers but it requires more analysis and engineering before implementation.
* This means the model can be further compressed. The cells will report their model loss and accuracy; and all models trained and optimized will be downloaded to your PC at the end of the file for your further inspection.


It would be good to try out very deep models that can achieve 95% accuracy or more on Fashion MNIST (current SotA is at 96%). Such networks could be pruned and still retain better accuracy after pruning. Another advantage is that it might be possible to reach or surpass the 90% accuracy mark even after quantization losses.

## Notes:

* Almost no difference after the pruning procedure. The regular network has a size of 18 MB; the pruned one has 6 MB. The difference in accuracy between the networks is 1% in favor of the vanilla model.

* Should check whether clustering first gives better results than pruning first and then clustering. When multiple methods are used, we can look at different orderings of the compression steps.

* Quantization aware training gives good results, but its benefit is mostly visible after encoding/compressing the network to TFLite format.

* Knowledge distillation is a good way to actually reduce the number of parameters (the other methods only mask filter weights). Fewer parameters imply faster inference. We need to aim for at least 24 FPS for real-time processing.
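
As a quick sanity check on the 24 FPS target, the per-frame time budget can be compared with a measured evaluation time. The sketch below assumes the 10,000-image Fashion MNIST test set and uses the baseline test time printed further down in this notebook; the helper name `per_image_ms` is hypothetical. Batched evaluation on a Colab GPU is only a rough proxy and says little about single-frame latency on a phone.

```python
TARGET_FPS = 24

# Time budget per frame, in milliseconds, for real-time processing.
budget_ms = 1000.0 / TARGET_FPS


def per_image_ms(elapsed_seconds, n_images):
    """Average time per image in milliseconds for a batched evaluation run."""
    return 1000.0 * elapsed_seconds / n_images


# Baseline test time reported by the notebook, over the 10,000 test images.
baseline_ms = per_image_ms(2.8691580295562744, 10_000)

print("budget per frame (ms):", round(budget_ms, 2))
print("baseline per image (ms):", round(baseline_ms, 4))
print("within budget:", baseline_ms < budget_ms)
```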

## Weight Pruning Compression:

Pruning masks the weakest (lowest-magnitude) model weights to zero so that compression algorithms can reduce the file size.
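
A minimal sketch of why zeroed weights help: long runs of identical zero bytes compress much better than dense floating-point values. Here `prune_by_magnitude` and `compressed_size` are hypothetical helpers, Gaussian random numbers stand in for a trained layer, and `zlib` stands in for whatever compression is applied to the saved model file.

```python
import random
import struct
import zlib


def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def compressed_size(weights):
    """Size in bytes of the float32-packed weights after zlib compression."""
    raw = struct.pack(f"{len(weights)}f", *weights)
    return len(zlib.compress(raw))


random.seed(0)
dense = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # stand-in for layer weights
sparse = prune_by_magnitude(dense, sparsity=0.80)

print("zeros after pruning:    ", sparse.count(0.0))
print("compressed dense bytes: ", compressed_size(dense))
print("compressed sparse bytes:", compressed_size(sparse))
```

Note that the number of stored values is unchanged; only the compressed size shrinks, which fits the observation that pruning alone does not speed up inference.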

Initial observations indicate that it does not help in inference time; but the loss in accuracy is minimal and there is a significant decrease in model size. 
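
The sparsity is not applied all at once: `pruned_train` ramps it from 40% to 80% with a polynomial decay schedule while fine-tuning. The sketch below reproduces that ramp in plain Python under some assumptions: `polynomial_decay_sparsity` is a hypothetical helper, `power=3` is what I believe to be the library default, and the `frequency` argument (how often the masks are updated) is ignored.

```python
import math


def polynomial_decay_sparsity(step, end_step, initial_sparsity=0.40,
                              final_sparsity=0.80, begin_step=0, power=3):
    """Target sparsity at a given training step, clamped outside the schedule."""
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(1.0, max(0.0, progress))
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


# Same arithmetic as in pruned_train: 60,000 images, 10% held out for
# validation, batch size 64, 2 epochs.
n_train = 60_000 - 60_000 // 10
steps_per_epoch = math.ceil(n_train / 64)
end_step = steps_per_epoch * 2

for step in (0, steps_per_epoch, end_step):
    print(step, round(polynomial_decay_sparsity(step, end_step), 4))
```

Most of the pruning happens early in the schedule, which leaves the later steps for the network to recover accuracy.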

In [4]:
import numpy as np

# from fashion_mnist import train, test, pruned_train
import tensorflow as tf

# Global Variables:
LR = 1E-3
EPOCHS = 10
BATCH_SIZE = 64


# Load the dataset:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Add a trailing unitary dimension to make a 3D multidimensional array (tensor):
# N x 28 x 28 --> N x 28 x 28 x 1
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

# Convert the labels from integers to one-hot encoding:
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Comment/uncomment the following two lines as needed:
print('Starting Training:')
model = train(x_train, y_train, x_test, y_test, LR, EPOCHS, BATCH_SIZE, 'fashion_mnist_model')
print("\n")

# Check and list scores of the regular model given with this code segment.
print('Starting Testing:')
scores = test(x_test, y_test, './fashion_mnist_model')
print("\n")
print('Model Validation Loss:', scores[0])
print('Model Validation Accuracy:', scores[1])

model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
Starting Training:
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Train elapsed time: 143.01352500915527 seconds
INFO:tensorflow:Assets written to: fashion_mnist_model/assets


Starting Testing:
Test elapsed time: 2.8691580295562744 seconds


Model Validation Loss: 0.23055659234523773
Model Validation Accuracy: 0.9192000031471252
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalizatio

In [5]:
# Model pruning using TF's optimization libraries. PyTorch port is also available.
# The support of this API is limited; but allows writing custom pruning and quantization functions.
 
# Uses 2 epochs of fine-tuning with 80% final sparsity. We can add additional callbacks and take checkpoints.
# This way we can optimize over two objectives: the best (smallest) model and the highest accuracy.
print('Starting Training:')
pruned_model = pruned_train(x_train, y_train, './fashion_mnist_model', 0.0001, 0.1, 2, BATCH_SIZE,
                            './temp', 
                            './pruned_fashion_mnist_model')
print("\n")

# Observe and evaluate results.
print('Starting Testing:')
scores = test(x_test, y_test, './pruned_fashion_mnist_model')

print('\n')
print('Pruned Validation Loss:', scores[0])
print('Pruned Validation Accuracy:', scores[1])


Starting Training:




Epoch 1/2
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
Epoch 2/2
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization (BatchNo (None, 28, 28, 1)         4         
_________________________________________________________________
conv2d (Conv2D)              (None, 28, 28, 64)        1664      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 64)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 14, 14, 64)        256       
_________________________________________________________________
conv2d_1 (Conv2D)            (

## Quantization Aware Training:

Train in a quantization aware manner, and mask the unnecessary capacity of the network with weight pruning.
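
Roughly speaking, quantization aware training simulates low-precision arithmetic in the forward pass by rounding values to an integer grid and mapping them back to floats. The sketch below shows that round trip for symmetric 8-bit quantization in plain Python. The function name `fake_quantize` and the sample weights are assumptions for illustration, and the real `tfmot` wrappers are more involved (for example, they track value ranges during training).

```python
def fake_quantize(values, num_bits=8):
    """Symmetric quantize-dequantize round trip.

    Assumes at least one value is non-zero. Returns (dequantized, codes, scale).
    """
    q_max = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    scale = max(abs(v) for v in values) / q_max  # float step between integer codes
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    dequantized = [c * scale for c in codes]
    return dequantized, codes, scale


# Made-up weights, purely for illustration.
weights = [-0.52, -0.1, 0.0, 0.03, 0.27, 0.5]
dequantized, codes, scale = fake_quantize(weights)

print("integer codes:", codes)
print("max round-trip error:", max(abs(w - d) for w, d in zip(weights, dequantized)))
print("half a quantization step:", scale / 2)
```

The round-trip error is bounded by half a quantization step, which is the error the network learns to tolerate during quantization aware training.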

In [6]:
# Quantization Aware Training:
print('Starting training.')
q_aware_model = quantized_train(x_train, y_train, './fashion_mnist_model', LR, 0.1, 5, BATCH_SIZE,
                            './qaware_fashion_mnist_model')



Starting training.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization (BatchNo (None, 28, 28, 1)         4         
_________________________________________________________________
quant_conv2d (QuantizeWrappe (None, 28, 28, 64)        1795      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
quant_dropout (QuantizeWrapp (None, 14, 14, 64)        1         
_________________________________________________________________
batch_normalization_1 (Batch (None, 14, 14, 64)        256       
_________________________________________________________________
quant_conv2d_1 (QuantizeWrap (None, 14, 14, 128)       205187    
_____________________________________________________



INFO:tensorflow:Assets written to: ./qaware_fashion_mnist_model/assets




In [7]:
# Quantization awareness slows down the training and inference times; 
# but it should not be problem after we convert to TFLite models.
print('Starting testing.')
scores = quantized_test(x_test, y_test, './qaware_fashion_mnist_model')

print('\n')
# Note: the labels below say 'Pruned', but these are the scores of the quantization aware model.
print('Pruned Validation Loss:', scores[0])
print('Pruned Validation Accuracy:', scores[1])

Starting testing.
Test elapsed time: 2.7808849811553955 seconds


Pruned Validation Loss: 0.23629344999790192
Pruned Validation Accuracy: 0.9180999994277954


In [8]:
# Prune the quantization aware network
# NOT APPLICABLE: As quantize aware layers are not prunable by TensorFlow.
# See the error output below:
# print('Starting Training:')
# pruned_model = pruned_train(x_train, y_train, './qaware_fashion_mnist_model', 0.0001, 0.1, 2, BATCH_SIZE,
#                             './temp', 
#                             './pruned_qaware_fashion_mnist_model')
# print("\n")

# # Observe and evaluate results.
# print('Starting Testing:')
# scores = test(x_test, y_test, './pruned_qaware_fashion_mnist_model')

# print('\n')
# print('Pruned Validation Loss:', scores[0])
# print('Pruned Validation Accuracy:', scores[1])


## Knowledge Distillation:

First, knowledge distillation can be used to extract a smaller model in a non-destructive way. Then weight pruning and quantization can be used to process the network further.

The good part here is that we construct a smaller network and teach it using our current model. The smaller network has fewer neurons, which means better inference time. With the addition of pruning and quantization, the model size can be compressed as well.

However, one caveat is that the learner network has to be built by us, which means we have to engineer a good (and small) learner network for our task.
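
The loss that the distiller minimizes can be sketched in a few lines of plain Python: a weighted sum of the usual cross-entropy on the hard labels and a KL divergence between temperature-softened teacher and student distributions. The function names and the example logits are made up for illustration. The sketch assumes both networks output raw logits, as in the Keras example the `Distiller` class below is taken from; the teacher and student in this notebook end with a softmax layer instead, so the numbers will not correspond one to one.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; a larger temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(y_true, teacher_logits, student_logits, alpha=0.1, temperature=10):
    """alpha * hard-label cross-entropy + (1 - alpha) * soft-label KL divergence."""
    student_probs = softmax(student_logits)
    hard_loss = -sum(t * math.log(p) for t, p in zip(y_true, student_probs) if t > 0)
    soft_loss = kl_divergence(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    return alpha * hard_loss + (1 - alpha) * soft_loss


# Made-up three-class example: the second student disagrees with the teacher.
y_true = [1, 0, 0]
teacher_logits = [2.0, 1.0, 0.1]
print("agreeing student:   ", distillation_loss(y_true, teacher_logits, [2.0, 1.0, 0.1]))
print("disagreeing student:", distillation_loss(y_true, teacher_logits, [0.1, 1.0, 2.0]))
```

The defaults `alpha=0.1` and `temperature=10` mirror the values passed to `distiller.compile` later in this notebook.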

The distiller was able to reduce the inference time to 0.88 seconds (more than 50% reduction).

The number of parameters shrank significantly, but at the cost of lower model accuracy. More engineering is needed on the student network to address this. (Accuracy dropped from 90% to 

In [9]:
# Taken from: https://keras.io/examples/vision/knowledge_distillation/#setup

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

class Distiller(keras.Model):
    def __init__(self, student, teacher):
        super(Distiller, self).__init__()
        self.teacher = teacher
        self.student = student

    def compile(
        self,
        optimizer,
        metrics,
        student_loss_fn,
        distillation_loss_fn,
        alpha=0.1,
        temperature=3,
    ):
        """ Configure the distiller.

        Args:
            optimizer: Keras optimizer for the student weights
            metrics: Keras metrics for evaluation
            student_loss_fn: Loss function of difference between student
                predictions and ground-truth
            distillation_loss_fn: Loss function of difference between soft
                student predictions and soft teacher predictions
            alpha: weight to student_loss_fn and 1-alpha to distillation_loss_fn
            temperature: Temperature for softening probability distributions.
                Larger temperature gives softer distributions.
        """
        super(Distiller, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature

    def train_step(self, data):
        # Unpack data
        x, y = data

        # Forward pass of teacher
        teacher_predictions = self.teacher(x, training=False)

        with tf.GradientTape() as tape:
            # Forward pass of student
            student_predictions = self.student(x, training=True)

            # Compute losses
            student_loss = self.student_loss_fn(y, student_predictions)
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss

        # Compute gradients
        trainable_vars = self.student.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the metrics configured in `compile()`.
        self.compiled_metrics.update_state(y, student_predictions)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update(
            {"student_loss": student_loss, "distillation_loss": distillation_loss}
        )
        return results

    def test_step(self, data):
        # Unpack the data
        x, y = data

        # Compute predictions
        y_prediction = self.student(x, training=False)

        # Calculate the loss
        student_loss = self.student_loss_fn(y, y_prediction)

        # Update the metrics.
        self.compiled_metrics.update_state(y, y_prediction)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss})
        return results


In [10]:
# Create the teacher:
def build_teacher(input_shape):
    teacher = tf.keras.models.Sequential([tf.keras.layers.BatchNormalization(input_shape=input_shape),
                                        tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
                                        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
                                        tf.keras.layers.Dropout(0.25),

                                        tf.keras.layers.BatchNormalization(),
                                        tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='relu'),
                                        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
                                        tf.keras.layers.Dropout(0.25),

                                        tf.keras.layers.BatchNormalization(),
                                        tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='relu'),
                                        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
                                        tf.keras.layers.Dropout(0.25),

                                        tf.keras.layers.Flatten(),
                                        tf.keras.layers.Dense(256),
                                        tf.keras.layers.Activation('relu'),
                                        tf.keras.layers.Dropout(0.5),
                                        tf.keras.layers.Dense(10),
                                        tf.keras.layers.Activation('softmax')],
                                        name="teacher"
                                    )
    return teacher

def build_student(input_shape):

    student = keras.Sequential(
        [   tf.keras.layers.BatchNormalization(input_shape=input_shape),
            layers.Conv2D(64, (5,5), strides=(2, 2), padding="same"),
            layers.Activation('relu'),
            layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
            tf.keras.layers.Dropout(0.25),

            tf.keras.layers.BatchNormalization(),
            layers.Conv2D(256, (5,5), strides=(2, 2), padding="same"),
            layers.Activation('relu'),
            tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
            tf.keras.layers.Dropout(0.25),
         
            layers.Flatten(),
            tf.keras.layers.Dense(256),
            tf.keras.layers.Activation('relu'),
            layers.Dense(10),
            tf.keras.layers.Activation('softmax')],
        name="student",
    )
    return student

teacher = build_teacher(x_train.shape[1:])
student = build_student(x_train.shape[1:])


In [11]:
# Compile student:
student.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'],
    )

# Train teacher as usual
teacher.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'],
    )

# Train and evaluate teacher on data.
teacher.fit(x_train, y_train, epochs=5)

start_time = time.time()

scores = teacher.evaluate(x_test, y_test)

end_time = time.time()

print('Test time elapsed:', str(end_time-start_time))

print('\n')
print('Validation Loss:', scores[0])
print('Validation Accuracy', scores[1])

teacher.save("./teacher_model", overwrite=True)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test time elapsed: 1.4647762775421143


Validation Loss: 0.27731695771217346
Validation Accuracy 0.9002000093460083
INFO:tensorflow:Assets written to: ./teacher_model/assets




In [12]:
# Initialize and compile distiller
distiller = Distiller(student=student, teacher=teacher)
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.CategoricalAccuracy()],
    student_loss_fn=keras.losses.CategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.1,
    temperature=10,
)

# Distill teacher to student
distiller.fit(x_train, y_train, epochs=3)

# Evaluate student on test dataset
distiller.evaluate(x_test, y_test)


Epoch 1/3


  '"`categorical_crossentropy` received `from_logits=True`, but '


Epoch 2/3
Epoch 3/3


[0.8980000019073486, 0.286588579416275]

In [13]:
# distiller.save("./distilled_student_model", overwrite=True)

# Mobile Deployment:

Deploy the models to TFLite and CoreML outputs.

* Choosing 8 bits, as I assume we would be running on the CPU of the device, not the GPU.
* If the GPU were to be used, 16 bit quantization would be the better option for performance.
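
To get a feeling for what the bit width means for file size, the sketch below counts the parameters of the baseline network defined in `build_model` and multiplies by the bytes per weight. The helper names are hypothetical, and the result is only a back-of-the-envelope estimate: it ignores file-format overhead, and a converter may fold or drop some parameters such as the batch normalization statistics.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


# Spatial size after three 2x2 max-pooling layers without padding: 28 -> 14 -> 7 -> 3.
side = 28
for _ in range(3):
    side //= 2

layer_params = [
    batchnorm_params(1), conv2d_params(5, 1, 64),
    batchnorm_params(64), conv2d_params(5, 64, 128),
    batchnorm_params(128), conv2d_params(5, 128, 256),
    dense_params(side * side * 256, 256), dense_params(256, 10),
]
total = sum(layer_params)

print("total parameters:", total)
for bits in (32, 16, 8):
    print(f"{bits:>2}-bit weights: ~{total * bits / 8 / 1e6:.2f} MB")
```

The first two entries (4 and 1664) match the `model.summary()` output shown in this notebook, which is a small check that the counting is consistent with the real model.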

## TFLite Deployment:

The cells below can be combined in a loop; the current way is not efficient.
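
One possible shape for such a loop is sketched below. `EXPORTS` and `export_all` are hypothetical names; the directory and file names are the ones used in the export cell of this notebook. In the notebook itself the function passed in would be `export_tflite_model` from the next cell; a stand-in is used here so that the sketch runs without TensorFlow.

```python
# (saved model directory, output .tflite file) pairs used in this notebook.
EXPORTS = [
    ("./fashion_mnist_model", "tflite_model.tflite"),
    ("./pruned_fashion_mnist_model", "tflite_pruned_model.tflite"),
    ("./qaware_fashion_mnist_model", "tflite_qaware_model.tflite"),
]


def export_all(export_fn, exports=EXPORTS):
    """Call export_fn(modeldir, tf_filename) for every pair and return the file names."""
    written = []
    for modeldir, tf_filename in exports:
        export_fn(modeldir, tf_filename)
        written.append(tf_filename)
    return written


# Stand-in exporter that only reports what would be converted.
def fake_export(modeldir, tf_filename):
    print("exporting", modeldir, "->", tf_filename)


written = export_all(fake_export)
```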

In [14]:
def export_tflite_model(modeldir, tf_filename):
    converter = tf.lite.TFLiteConverter.from_saved_model(modeldir)
    # Default optimizations apply post-training quantization to the weights.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    lite_model = converter.convert()

    # Save the model.
    with open(tf_filename, "wb") as f:
        f.write(lite_model)

    # Print the file name, not the serialized model bytes.
    print('Saved TFLite model to:', tf_filename)

In [15]:
# Export models:
export_tflite_model('./fashion_mnist_model',"tflite_model.tflite")
export_tflite_model('./pruned_fashion_mnist_model',"tflite_pruned_model.tflite")
export_tflite_model('./qaware_fashion_mnist_model',"tflite_qaware_model.tflite")





## CoreML Deployment:

I have not worked with CoreML before, but it is possible to convert TensorFlow and PyTorch models to CoreML models.

[This link](https://medium.com/maxims-passion-project/convert-a-tensorflow-model-to-core-ml-with-coremltools-8c304f1af2f6) gives a brief introduction to this for TensorFlow.

In [16]:
# Run these now that training is finished, as they significantly slow down the training process.
!pip install --upgrade tfcoreml
!pip install --upgrade coremltools

Collecting tfcoreml
  Downloading https://files.pythonhosted.org/packages/b1/0c/cfbc828342685ca78ac7bf7126670a0d3bac2f1d7c3560289a3a62b93a18/tfcoreml-2.0-py3-none-any.whl (44kB)
Collecting tensorflow<=1.14,>=1.5.0
  Downloading https://files.pythonhosted.org/packages/f4/28/96efba1a516cdacc2e2d6d081f699c001d414cc8ca3250e6d59ae657eb2b/tensorflow-1.14.0-cp37-cp37m-manylinux1_x86_64.whl (109.3MB)
Collecting coremltools>=0.8
  Downloading https://files.pythonhosted.org/packages/32/b0/14c37edf39a9b32c2c9c7aa3e27ece4ef4f5b2dd2c950102661a106520f1/coremltools-4.1-cp37-none-manylinux1_x86

Requirement already up-to-date: coremltools in /usr/local/lib/python3.7/dist-packages (4.1)


# Download files:

* Note that Google Colaboratory includes a directory called "sample_data"; this directory is also downloaded, but it is unrelated to the files we are interested in.

In [17]:
!zip -r content.zip /content/
from google.colab import files
files.download("content.zip")

  adding: content/ (stored 0%)
  adding: content/.config/ (stored 0%)
  adding: content/.config/configurations/ (stored 0%)
  adding: content/.config/configurations/config_default (deflated 15%)
  adding: content/.config/.last_survey_prompt.yaml (stored 0%)
  adding: content/.config/gce (stored 0%)
  adding: content/.config/active_config (stored 0%)
  adding: content/.config/.last_update_check.json (deflated 22%)
  adding: content/.config/.last_opt_in_prompt.yaml (stored 0%)
  adding: content/.config/logs/ (stored 0%)
  adding: content/.config/logs/2021.06.15/ (stored 0%)
  adding: content/.config/logs/2021.06.15/13.37.22.745818.log (deflated 53%)
  adding: content/.config/logs/2021.06.15/13.36.40.402408.log (deflated 91%)
  adding: content/.config/logs/2021.06.15/13.37.15.895583.log (deflated 86%)
  adding: content/.config/logs/2021.06.15/13.37.40.569743.log (deflated 53%)
  adding: content/.config/logs/2021.06.15/13.37.39.858399.log (deflated 55%)
  adding: content/.config/logs/2021.
