<h1 style="padding-top: 25px;padding-bottom: 25px;text-align: left; padding-left: 10px; background-color: #DDDDDD; 
    color: black;"> <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> AC295: Advanced Practical Data Science </h1>

## Model Compression Techniques

**Harvard University, Fall 2020**  
**Instructors**: Pavlos Protopapas  

---

**Each assignment is graded out of 5 points.  The topic for this assignment is Distillation and Pruning.**

**Due:** 11/10/2020 10:15 AM EDT

**Submit:** We won't be re-running your notebooks, so please ensure all output is visible in the notebook.

#### Learning Objectives

In this exercise you will cover the following topics:  
- Knowledge Distillation
- Distill Teacher to Student
- Model Pruning


This exercise distills a MobileNet model, pretrained on ImageNet and then trained on the vegetable images, into a smaller network. The trained MobileNet-based model is the "teacher" network. You will use distillation techniques to train a smaller, simpler network called the "student" network, with the goal of transferring what the teacher has learned into the student.

Then you will learn how to prune the weights of a model using the `tensorflow_model_optimization` package.

---

#### Installs

```python
!pip install -q tensorflow_model_optimization
```

[?25l[K     |██                              | 10kB 18.7MB/s eta 0:00:01[K     |███▉                            | 20kB 1.5MB/s eta 0:00:01[K     |█████▊                          | 30kB 2.0MB/s eta 0:00:01[K     |███████▋                        | 40kB 2.3MB/s eta 0:00:01[K     |█████████▌                      | 51kB 1.9MB/s eta 0:00:01[K     |███████████▍                    | 61kB 2.1MB/s eta 0:00:01[K     |█████████████▎                  | 71kB 2.4MB/s eta 0:00:01[K     |███████████████▏                | 81kB 2.6MB/s eta 0:00:01[K     |█████████████████               | 92kB 2.8MB/s eta 0:00:01[K     |███████████████████             | 102kB 2.7MB/s eta 0:00:01[K     |████████████████████▉           | 112kB 2.7MB/s eta 0:00:01[K     |██████████████████████▊         | 122kB 2.7MB/s eta 0:00:01[K     |████████████████████████▊       | 133kB 2.7MB/s eta 0:00:01[K     |██████████████████████████▋     | 143kB 2.7MB/s eta 0:00:01[K     |██████████████████████████

#### Imports

```python
import os
import requests
import tempfile
import zipfile
import shutil
import json
import time
import sys
import cv2
import numpy as np
import pandas as pd
from glob import glob
import subprocess
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras import layers
from tensorflow.keras import activations
from tensorflow.keras import optimizers
from tensorflow.keras import losses
from tensorflow.keras import metrics
from tensorflow.keras import initializers
from tensorflow.keras import regularizers
from tensorflow.keras.utils import to_categorical
from keras.utils.layer_utils import count_params
import tensorflow_hub as hub

import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity.keras import prune_low_magnitude

from sklearn.model_selection import train_test_split
```

## Dataset

**We will use the dataset from Exercise 4.** The dataset consists of images downloaded from Google Image search. There are 5 classes with the following labels: **'tomato', 'beetroot', 'broccoli', 'bell_pepper', 'carrot'**.  

[Link to dataset](https://github.com/shivasj/dataset-store/releases/download/v1.0/vegetables.zip)
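A minimal sketch of the download step, using only the standard library. It assumes the archive unpacks into one folder per class (e.g. `vegetables/tomato/...`); that layout and the helper names `download_and_extract` and `list_images_by_class` are assumptions for illustration, not the lecture demo's code.

```python
import os
import urllib.request
import zipfile

# URL taken from the dataset link above
DATASET_URL = "https://github.com/shivasj/dataset-store/releases/download/v1.0/vegetables.zip"
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def download_and_extract(url, dest_dir):
    """Download a zip archive into dest_dir and extract it there (no retries or caching)."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return dest_dir


def list_images_by_class(root):
    """Return (paths, labels), using each image's parent folder name as its label.

    Assumes a layout like root/.../<class_name>/<image file>.
    """
    paths, labels = [], []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(IMAGE_EXTS):
                paths.append(os.path.join(dirpath, name))
                labels.append(os.path.basename(dirpath))
    return paths, labels
```

Usage would be `paths, labels = list_images_by_class(download_and_extract(DATASET_URL, "data"))`.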

## Question 1 : Build Teacher Model (1.0 Point)

Steps to build teacher model:
- Refer to code from Lecture Demo
- Download data & Create TF Datasets 
- Build a transfer learning model to classify vegetables. If you use the TF Hub [mobilenet](https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4) with `learning_rate = 0.001` and `epochs > 30`, you should easily reach a validation accuracy of 85% or higher
- Ensure there is a plot of your training history
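Before building the `tf.data` pipelines, the file paths need integer labels and a train/validation split. The sketch below does both with NumPy; the `CLASS_NAMES` order, the 20% validation fraction and the helper names are illustrative assumptions (scikit-learn's `train_test_split`, already imported, with `stratify=` is an alternative). It also scales pixels to `[0, 1]`, the input range documented for the TF Hub MobileNetV2 feature-vector modules.

```python
import numpy as np

# Assumed label order; any fixed order works as long as it is used consistently
CLASS_NAMES = ["tomato", "beetroot", "broccoli", "bell_pepper", "carrot"]


def encode_labels(labels, class_names=CLASS_NAMES):
    """Map class-name strings to integer ids (the format SparseCategoricalCrossentropy expects)."""
    index = {name: i for i, name in enumerate(class_names)}
    return np.array([index[name] for name in labels], dtype=np.int32)


def split_train_validation(paths, label_ids, validation_fraction=0.2, seed=42):
    """Shuffle once with a fixed seed and hold out validation_fraction of the samples."""
    paths = np.asarray(paths)
    label_ids = np.asarray(label_ids)
    order = np.random.default_rng(seed).permutation(len(paths))
    n_val = int(round(len(paths) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (paths[train_idx], label_ids[train_idx]), (paths[val_idx], label_ids[val_idx])


def scale_to_unit_range(images_uint8):
    """Convert uint8 pixels in [0, 255] to float32 in [0, 1]."""
    return images_uint8.astype(np.float32) / 255.0
```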

## Question 2 : Build Smaller Student Model (1.0 Point)

Steps to build student model:
- Refer to code from Lecture Demo
- Build a very small student model to classify vegetables. Use just 2 Convolution layers with max pooling and a dense layer
- Train the student model from scratch with `learning_rate = 0.01` and `epochs = 10`
- Ensure there is a plot of your training history
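To see where a small student's parameters go, here is the parameter arithmetic for a hypothetical student: two 3x3 `Conv2D` layers with 16 and 32 filters and `'same'` padding, each followed by 2x2 max pooling, then `Flatten` and a 5-way `Dense` layer. The filter counts and the 224x224x3 input are assumptions; the input size matches the teacher because a Keras-example-style `Distiller` passes the same input batch to both models.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k*k*in*out) plus one bias per filter, as in a Conv2D with use_bias=True."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights (in*out) plus one bias per unit."""
    return (in_features + 1) * units


def student_param_count(image_size=224, channels=3, filters=(16, 32), kernel_size=3, num_classes=5):
    """Count parameters of the hypothetical student:
    [Conv2D('same') -> MaxPool2D(2)] x len(filters) -> Flatten -> Dense(num_classes).
    """
    total, size, in_ch = 0, image_size, channels
    for f in filters:
        total += conv2d_params(kernel_size, in_ch, f)
        size //= 2  # 'same' padding keeps H and W; 2x2 pooling then halves them
        in_ch = f
    total += dense_params(size * size * in_ch, num_classes)
    return total
```

Here `Flatten` feeds 56 x 56 x 32 = 100,352 features into the dense layer, so that layer holds about 99% of the 506,853 parameters. Replacing `Flatten` with `GlobalAveragePooling2D` would shrink the dense layer to (32 + 1) x 5 = 165 weights and the whole model to 5,253.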

## Question 3 : Model Distillation (1.5 Points)

Steps to distill teacher to student:
- Refer to code from Lecture Demo
- Copy the `Distiller` class over from lecture demo
- Keeping `learning_rate = 0.01` and `epochs = 10` constant, distill teacher model to student model as shown in the demo code
- You will notice there are a few new parameters when you compile the `Distiller` model:
  - **Student Loss Function**: `student_loss_fn`. Set this to `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`
  - **Distillation Loss Function**: `distillation_loss_fn`. Set this to `tf.keras.losses.KLDivergence()`
  - **Alpha**: the weight given to `student_loss_fn`; `distillation_loss_fn` receives weight `1 - alpha`, so the total loss is `alpha * student_loss + (1 - alpha) * distillation_loss`
  - **Temperature**: the temperature used to soften the probability distributions (the logits are divided by it before the softmax). Larger temperatures give softer distributions
- Try out various values for `alpha` from `[0.1,0.2,0.3,0.5,1.0]` (note that `alpha = 1.0` ignores the teacher entirely)
- Try out various values for `temperature`, e.g. `[1,5,10,15,30]`
- Plot the validation accuracy for the various values of `alpha` you tried
- Plot the validation accuracy for the various values of `temperature` you tried
- Pick the best `alpha` and `temperature` and train your final student model
- Ensure there is a plot of your training history of the final student model
- Report the **model size**, **total parameters**, and **accuracy** of your teacher model, your student model trained with distillation, and your student model trained from scratch. Feel free to use the util functions from the demo code
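A NumPy sketch of the combined loss makes the roles of `alpha` and `temperature` concrete. It mirrors the losses named above (`SparseCategoricalCrossentropy(from_logits=True)` on the hard labels, `KLDivergence` between temperature-softened teacher and student distributions), but it only illustrates the formula; it is not the lecture's `Distiller` class, and all function names are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature (shifted by the row max for stability)."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-7):
    """Batch mean of sum_i p_i * log(p_i / q_i); clipping to [eps, 1] as tf.keras KLDivergence does."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))


def sparse_cross_entropy_from_logits(labels, logits):
    """Batch mean of -log softmax(logits)[label]."""
    probs = softmax(logits)
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))


def distillation_loss(labels, student_logits, teacher_logits, alpha=0.1, temperature=10.0):
    """alpha * hard-label loss + (1 - alpha) * KL(teacher_T || student_T)."""
    student_loss = sparse_cross_entropy_from_logits(labels, student_logits)
    soft_loss = kl_divergence(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    return alpha * student_loss + (1 - alpha) * soft_loss
```

Dividing the logits by a larger `temperature` flattens both distributions, so the student also learns how the teacher ranks the incorrect classes. Hinton et al. additionally multiply the soft term by `temperature ** 2` to keep its gradient scale comparable across temperatures; check whether your copy of the `Distiller` does this before comparing temperatures.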

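One way to organise the `alpha` and `temperature` experiments is a coordinate search: sweep `alpha` at a fixed temperature, then sweep `temperature` at the best `alpha`. `coordinate_search` and the `evaluate` callable are hypothetical; `evaluate` is something you would write, and it should build a fresh student for every run so that weights do not carry over between experiments.

```python
def coordinate_search(alphas, temperatures, evaluate, initial_temperature):
    """Sweep alpha at a fixed temperature, then temperature at the best alpha.

    evaluate(alpha, temperature) -> validation accuracy is a hypothetical callable,
    e.g. one that wraps a freshly built student in the Distiller, trains it for
    10 epochs and returns the best validation accuracy from the history.
    """
    alpha_scores = {a: float(evaluate(a, initial_temperature)) for a in alphas}
    best_alpha = max(alpha_scores, key=alpha_scores.get)
    temperature_scores = {t: float(evaluate(best_alpha, t)) for t in temperatures}
    best_temperature = max(temperature_scores, key=temperature_scores.get)
    return alpha_scores, temperature_scores, best_alpha, best_temperature
```

The returned dictionaries plot directly, e.g. `plt.plot(list(alpha_scores), list(alpha_scores.values()))`.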
## Question 4 : Model Pruning (1.5 Points)

In this question you will take a model you have already trained and prune some of its weights.  

Steps to perform model pruning:  
- You will use the `tensorflow_model_optimization` package (`!pip install -q tensorflow_model_optimization`), which has already been installed at the top of the notebook
- For this problem you will use the student model trained from scratch in Question 2
- Here are some helper functions to view model weights:

```python
def check_model_weights(model):
    # Print each weight tensor's name, element count, and percentage of exact zeros
    for i, w in enumerate(model.get_weights()):
        print(model.weights[i].name, "Total:", w.size, "Zeros:", round(np.sum(w == 0) / w.size * 100, 2), "%")

def compare_model_sizes(model):
    # Save the model (without optimizer state) to a temporary .h5 file
    fd, model_file = tempfile.mkstemp(".h5")
    os.close(fd)
    tf.keras.models.save_model(model, model_file, include_optimizer=False)
    # Zip it with DEFLATE: zeroed (pruned) weights mainly shrink the compressed size
    fd, zip_file = tempfile.mkstemp(".zip")
    os.close(fd)
    with zipfile.ZipFile(zip_file, "w", compression=zipfile.ZIP_DEFLATED) as f:
        f.write(model_file)
    print("Model before zipping: %.2f KB" % (os.path.getsize(model_file) / float(1000)))
    print("Model after zipping: %.2f KB" % (os.path.getsize(zip_file) / float(1000)))
```
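Why zip at all? A saved `.h5` file stores every weight tensor densely, so a zero takes as many bytes as any other float; DEFLATE, however, compresses long runs of zero bytes well. The NumPy sketch below shows the effect on a single tensor. The random normal weights stand in for a real kernel and the random 80% zeroing stands in for pruning; both are assumptions for illustration, and `zero_percentage` reproduces the statistic `check_model_weights` prints.

```python
import zlib

import numpy as np


def zero_percentage(w):
    """Share of exactly-zero entries, in percent (as printed by check_model_weights)."""
    return round(float(np.sum(w == 0)) / w.size * 100, 2)


def compressed_size(w, level=6):
    """Bytes after DEFLATE-compressing the raw float32 buffer (ZIP_DEFLATED also uses DEFLATE)."""
    return len(zlib.compress(np.ascontiguousarray(w, dtype=np.float32).tobytes(), level))


rng = np.random.default_rng(0)
dense = rng.normal(size=(256, 256)).astype(np.float32)
sparse = dense.copy()
sparse[rng.random(sparse.shape) < 0.8] = 0.0  # zero out roughly 80% of the entries

print("zeros %:", zero_percentage(dense), "->", zero_percentage(sparse))
print("deflated bytes:", compressed_size(dense), "->", compressed_size(sparse))
```

Both arrays occupy 262,144 raw bytes; only the compressed size differs, which is why `compare_model_sizes` reports the zipped size alongside the raw one.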


- Run `check_model_weights(...)` and `compare_model_sizes(...)` on your student model from scratch

```python
# Check model before pruning
check_model_weights(...)
compare_model_sizes(...)
```

- Next you will perform model pruning. For this you will need a wrapper model that performs the pruning. You can create it by passing the student model into the `prune_low_magnitude` function as shown. This function comes from the `tensorflow_model_optimization` package
- Compile the new model, `model_for_pruning`, with the same optimizer and loss function that you used to train the student model from scratch
- Add the pruning callback `tfmot.sparsity.keras.UpdatePruningStep()`, which advances the pruning schedule during training
- Train your `model_for_pruning` model for just 2 epochs. This is enough for the wrapper model to prune the weights of the actual model
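Conceptually, magnitude pruning zeroes the weights with the smallest absolute values until a target sparsity is reached, and keeps a binary mask so they stay zero. The NumPy sketch below applies that idea to a single tensor; `magnitude_prune` is a hypothetical helper, not TFMOT's implementation, which maintains the masks inside the wrapper layers and updates them according to the pruning schedule.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero the smallest-|w| entries so that about `sparsity` of the tensor is zero.

    Returns (pruned_weights, mask). Ties at the threshold may prune slightly more.
    """
    w = np.asarray(weights)
    k = int(np.floor(sparsity * w.size))  # number of entries to remove
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    # k-th smallest magnitude acts as the cut-off; everything at or below it is removed
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask, mask
```

As far as we know, TFMOT's default wrappers for `Conv2D` and `Dense` prune only the kernels, so in `check_model_weights` you should expect the kernels to approach the target sparsity while the biases stay largely non-zero.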

```python
# Define model for pruning
# Assumes before_prune is your student model trained from scratch, and that
# train_x, batch_size, train_data and validation_data come from your earlier data setup
epochs = 2
end_step = np.ceil(len(train_x) / batch_size).astype(np.int32) * epochs
pruning_params = {
      'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
                                                               final_sparsity=0.80,
                                                               begin_step=0,
                                                               end_step=end_step)
}
model_for_pruning = prune_low_magnitude(before_prune, **pruning_params)

# Optimizer
optimizer = ...
# Loss
loss = ...
# Compile model_for_pruning
model_for_pruning.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

# Callback: UpdatePruningStep advances the pruning step every batch
callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep()
]

# Train
start_time = time.time()
training_results = model_for_pruning.fit(
        train_data,
        validation_data=validation_data,
        epochs=epochs,
        callbacks=callbacks,
        verbose=1)
execution_time = (time.time() - start_time)/60.0
print("Training execution time (mins)",execution_time)
```
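The schedule above can be checked by hand. This pure-Python sketch follows the polynomial-decay formula with what we understand to be TFMOT's default `power=3`: the target sparsity starts at `initial_sparsity` and reaches `final_sparsity` at `end_step`, where `end_step` is steps per epoch times epochs, as the notebook computes it. `PolynomialDecay` also takes a `frequency` argument (default 100) and, as far as we can tell, only updates the masks on steps where `step - begin_step` is a multiple of it; with a small dataset and 2 epochs, `end_step` may fall below 100, so the 0.80 target might never be applied. All helper names here are hypothetical.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def polynomial_decay_sparsity(step, initial_sparsity, final_sparsity, begin_step, end_step, power=3):
    """final + (initial - final) * (1 - p) ** power, with p = progress clipped to [0, 1]."""
    p = (step - begin_step) / (end_step - begin_step)
    p = min(1.0, max(0.0, p))
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - p) ** power


def mask_update_steps(begin_step, end_step, frequency):
    """Steps in [begin_step, end_step] that fall on the update grid."""
    return list(range(begin_step, end_step + 1, frequency))
```

For example, 1,000 training images with `batch_size = 32` give 32 steps per epoch and `end_step = 64`; `mask_update_steps(0, 64, 100)` is then just `[0]`, so passing a smaller `frequency` to `PolynomialDecay` and re-running `check_model_weights` is one way to confirm the final sparsity.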

- Next, recover the student model trained from scratch from the pruning wrapper using `tfmot.sparsity.keras.strip_pruning`:

```python
# Get the model back after pruning
after_prune = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
after_prune.summary()
```

- Now `after_prune` is your pruned model (of the original student model from scratch)
- Run `check_model_weights(...)` and `compare_model_sizes(...)` on your pruned model
- Compare what you see from `check_model_weights(...)` on your student model from scratch vs. pruned model
- Compare what you see from `compare_model_sizes(...)` on your student model from scratch vs. pruned model
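To compare the two `check_model_weights` printouts side by side, the per-tensor zero percentages can be tabulated from `get_weights()` snapshots. Because the pruning wrapper may share layers with the model it wraps, it is safest to take the "before" snapshot (e.g. `[w.copy() for w in before_prune.get_weights()]`) before calling `prune_low_magnitude`. `sparsity_table` is a hypothetical helper that assumes both weight lists are in the same order.

```python
import numpy as np


def sparsity_table(names, before, after):
    """Rows of (name, size, % zeros before, % zeros after) for aligned weight lists.

    names would come from [w.name for w in model.weights]; before/after are lists of
    numpy arrays such as model.get_weights() snapshots.
    """
    if not (len(names) == len(before) == len(after)):
        raise ValueError("weight lists must line up")
    rows = []
    for name, b, a in zip(names, before, after):
        if b.shape != a.shape:
            raise ValueError(f"shape mismatch for {name}: {b.shape} vs {a.shape}")
        rows.append((name, b.size,
                     round(float(np.mean(b == 0)) * 100, 2),
                     round(float(np.mean(a == 0)) * 100, 2)))
    return rows
```

Printing it with `for row in sparsity_table(names, before_weights, after_prune.get_weights()): print(*row)` gives one line per tensor.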

```python
# Check model after pruning
check_model_weights(after_prune)
compare_model_sizes(after_prune)
```