## Instructions

You are asked to complete the following files:
* **pruned_layers.py**, which prunes insignificant weight parameters to reduce DNN storage, using two methods: pruning by percentage and pruning by standard deviation.
* **train_util.py**, which includes the training process of DNNs with pruned connections.
* **quantize.py**, which applies quantization (weight sharing) to the DNN to reduce the storage of weight parameters.
* **huffman_coding.py**, which applies Huffman coding to the DNN weights to further compress them.
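
The two pruning criteria can be sketched on a plain NumPy array as follows. This is a minimal illustration, not the contents of **pruned_layers.py**: the function names, the strict `>` comparison, and the default sensitivity `s=0.25` are assumptions made for the example.

```python
import numpy as np

def prune_by_percentage(weight, q=60.0):
    """Zero out roughly the q percent of weights with the smallest magnitude."""
    threshold = np.percentile(np.abs(weight), q)
    mask = (np.abs(weight) > threshold).astype(weight.dtype)
    return weight * mask, mask

def prune_by_std(weight, s=0.25):
    """Zero out weights whose magnitude is at most s times the standard deviation."""
    threshold = s * np.std(weight)
    mask = (np.abs(weight) > threshold).astype(weight.dtype)
    return weight * mask, mask

# Toy layer standing in for a real weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)
pruned, mask = prune_by_percentage(w, q=60.0)
print("sparsity:", 1.0 - mask.mean())
```

Both functions return the mask along with the pruned weights, because the mask is needed later to keep pruned connections at zero during fine-tuning.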

You are asked to submit the following files:
* **net_before_pruning.pt**, which contains the weight parameters before pruning.
* **net_after_pruning.pt**, which contains the weight parameters after pruning.
* **net_after_quantization.pt**, which contains the weight parameters after quantization (weight sharing).
* **codebook_vgg16.npy**, which is the quantization codebook of each layer after applying quantization (weight sharing).
* **huffman_encoding.npy**, which is the encoding map of each item within the quantization codebook across the whole DNN architecture.
* **huffman_freq.npy**, which is the frequency map of each item within the quantization codebook across the whole DNN.

To ensure fair grading, the model is fixed to VGG16_half, a down-scaled version of VGG16 with a width multiplier of 0.5. See **vgg16.py** for the implementation details.
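
As a rough illustration of what a width multiplier of 0.5 means, the sketch below halves the channel counts of the standard VGG16 configuration and counts the 3x3 convolution weights per layer (biases excluded). It assumes **vgg16.py** follows the standard VGG16 layout; the names `VGG16_CFG`, `scale_cfg`, and `conv_weight_counts` are hypothetical. The resulting counts agree with the Parameter column of the summary printed after pruning.

```python
# Standard VGG16 convolutional widths; "M" marks a max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def scale_cfg(cfg, width_mult=0.5):
    """Scale every channel count by width_mult, leaving pooling markers alone."""
    return [c if c == "M" else int(c * width_mult) for c in cfg]

def conv_weight_counts(cfg, in_channels=3, kernel=3):
    """Number of weights (no bias) in each convolution of the configuration."""
    counts = []
    for c in cfg:
        if c == "M":
            continue
        counts.append(in_channels * c * kernel * kernel)
        in_channels = c
    return counts

half_cfg = scale_cfg(VGG16_CFG)
counts = conv_weight_counts(half_cfg)
print(counts[:4])  # [864, 9216, 18432, 36864]
```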

In [1]:
from vgg16 import VGG16, VGG16_half
from train_util import train, finetune_after_prune, test, train_iter
from quantize import quantize_whole_model
from huffman_coding import huffman_coding
from summary import summary
import torch
import numpy as np
from prune import prune

device = 'cuda' if torch.cuda.is_available() else 'cpu'

### Full-precision model training

In [2]:
net = VGG16_half()
net = net.to(device)

# Load pretrained weights (comment this out to train from scratch)
net.load_state_dict(torch.load("net_before_pruning.pt"))

# Uncomment to train from scratch instead of loading pretrained weights.
# Tune the hyperparameters here.
#train(net, epochs=40, batch_size=128, lr=0.008, reg=1e-2)

<All keys matched successfully>

In [3]:
# Load the best weight parameters
net.load_state_dict(torch.load("net_before_pruning.pt"))
test(net)

Files already downloaded and verified
Test Loss=0.3117, Test accuracy=0.9067


In [4]:
# print("-----Summary before pruning-----")
# summary(net)
# print("-------------------------------")

### Pruning & Finetune with pruned connections

#### Prune #1

In [5]:
# Helper: Test accuracy using percentage pruning before fine-tuning
prune(net, method='percentage', q=60)
test(net)
#summary(net)

Files already downloaded and verified
Test Loss=0.5098, Test accuracy=0.8488


In [6]:
net.load_state_dict(torch.load("net_before_pruning_iter_1.pt"))
#train_iter(net, epochs=20, batch_size=128, lr=0.008, reg=1e-3, iter_num=1)

<All keys matched successfully>

#### Prune #2

In [7]:
prune(net, method='percentage', q=60)
test(net)
#summary(net)

Files already downloaded and verified
Test Loss=0.3611, Test accuracy=0.8825


In [8]:
net.load_state_dict(torch.load("net_before_pruning_iter_2.pt"))
#train_iter(net, epochs=20, batch_size=128, lr=0.008, reg=1e-3, iter_num=2)

<All keys matched successfully>

#### Prune #3

In [9]:
prune(net, method='percentage', q=60)
test(net)
#summary(net)

Files already downloaded and verified
Test Loss=0.3390, Test accuracy=0.8907


#### Fine-tuning

In [10]:
test(net)
#summary(net)

Files already downloaded and verified
Test Loss=0.3390, Test accuracy=0.8907


In [11]:
# Load pretrained weights (comment this out to run fine-tuning instead)
net.load_state_dict(torch.load("net_after_pruning.pt"))
# Uncomment to fine-tune instead of loading pretrained weights
# finetune_after_prune(net, epochs=30, batch_size=128, lr=0.01, reg=5e-4)  # around 20 epochs

<All keys matched successfully>

In [12]:
#finetune_after_prune(net, epochs=30, batch_size=128, lr=0.01, reg=1e-4) # iter
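
Fine-tuning with pruned connections only preserves sparsity if the pruned weights stay at zero after each update. The implementation of `finetune_after_prune` is not shown here, so the following is only a sketch of one common approach, with a hypothetical helper `masked_sgd_step`: multiply the gradient (and the updated weight) by the pruning mask.

```python
import numpy as np

def masked_sgd_step(weight, grad, mask, lr=0.01):
    """One SGD step that leaves pruned (mask == 0) weights at exactly zero."""
    return (weight - lr * grad * mask) * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
# Toy mask: keep only the weights above the 60th percentile of magnitude.
mask = (np.abs(w) > np.percentile(np.abs(w), 60)).astype(w.dtype)
w = w * mask
grad = rng.normal(size=w.shape)  # stand-in for a real gradient
w_new = masked_sgd_step(w, grad, mask)
print("non-zero weights before/after:", np.count_nonzero(w), np.count_nonzero(w_new))
```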

In [13]:
test(net)

Files already downloaded and verified
Test Loss=0.3284, Test accuracy=0.9048


In [14]:
# Load the best weight parameters
net.load_state_dict(torch.load("net_after_pruning.pt"))
test(net)

Files already downloaded and verified
Test Loss=0.3284, Test accuracy=0.9048


In [15]:
print("-----Summary After pruning-----")
summary(net)
print("-------------------------------")

-----Summary After pruning-----
Layer id	Type		Parameter	Non-zero parameter	Sparsity
1		Convolutional	864		346			0.599537
2		BatchNorm	N/A		N/A			N/A
3		ReLU		N/A		N/A			N/A
4		Convolutional	9216		3687			0.599935
5		BatchNorm	N/A		N/A			N/A
6		ReLU		N/A		N/A			N/A
7		Convolutional	18432		7373			0.599989
8		BatchNorm	N/A		N/A			N/A
9		ReLU		N/A		N/A			N/A
10		Convolutional	36864		14746			0.599989
11		BatchNorm	N/A		N/A			N/A
12		ReLU		N/A		N/A			N/A
13		Convolutional	73728		29491			0.600003
14		BatchNorm	N/A		N/A			N/A
15		ReLU		N/A		N/A			N/A
16		Convolutional	147456		58983			0.599996
17		BatchNorm	N/A		N/A			N/A
18		ReLU		N/A		N/A			N/A
19		Convolutional	147456		58983			0.599996
20		BatchNorm	N/A		N/A			N/A
21		ReLU		N/A		N/A			N/A
22		Convolutional	294912		117965			0.599999
23		BatchNorm	N/A		N/A			N/A
24		ReLU		N/A		N/A			N/A
25		Convolutional	589824		235930			0.599999
26		BatchNorm	N/A		N/A			N/A
27		ReLU		N/A		N/A			N/A
28		Convolutional	589824		235930			0.599999
29		BatchNorm

### Quantization

In [16]:
net.load_state_dict(torch.load("net_after_pruning.pt"))


<All keys matched successfully>

In [17]:
test(net)

Files already downloaded and verified
Test Loss=0.3284, Test accuracy=0.9048


In [18]:
quant_bits = 4
centers = quantize_whole_model(net, quant_bits)
np.save("codebook_vgg16.npy", centers)

Complete 1 layers quantization...
Complete 2 layers quantization...
Complete 3 layers quantization...
Complete 4 layers quantization...
Complete 5 layers quantization...
Complete 6 layers quantization...
Complete 7 layers quantization...
Complete 8 layers quantization...
Complete 9 layers quantization...
Complete 10 layers quantization...
Complete 11 layers quantization...
Complete 12 layers quantization...
Complete 13 layers quantization...
Complete 14 layers quantization...
Complete 15 layers quantization...
Complete 16 layers quantization...
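
Weight sharing replaces each non-zero weight in a layer with one of `2**bits` shared values (the codebook). The sketch below uses 1-D k-means with linearly spaced initial centroids on a single NumPy array. It is an assumption about how such a step could look, not the code behind `quantize_whole_model`; the function name `quantize_weights` and the iteration count are hypothetical.

```python
import numpy as np

def quantize_weights(weight, bits=4, iters=20):
    """Cluster the non-zero weights into 2**bits shared values with 1-D k-means."""
    flat = weight.ravel()
    nonzero = flat != 0
    values = flat[nonzero]
    # Linear initialization: centroids evenly spaced over the weight range.
    centers = np.linspace(values.min(), values.max(), 2 ** bits)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each centroid to the mean of its members (empty clusters stay put).
        for k in range(centers.size):
            members = values[assign == k]
            if members.size:
                centers[k] = members.mean()
    assign = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    quantized = flat.copy()
    quantized[nonzero] = centers[assign]  # pruned zeros are left untouched
    return quantized.reshape(weight.shape), centers

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32)).astype(np.float32)
w[np.abs(w) < 0.8] = 0.0  # stand-in for a pruned layer
q, centers = quantize_weights(w, bits=4)
print(np.unique(q[q != 0]).size, "shared values in use")
```

Only the non-zero weights are clustered, so the sparsity obtained from pruning is preserved.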


In [19]:
test(net)

Files already downloaded and verified
Test Loss=0.3316, Test accuracy=0.9023


In [20]:
# b = np.load('sample_submission/codebook_vgg16.npy') #lab3/sample_submission/codebook_vgg16.npy
# print(np.shape(b))

# h = np.load('sample_submission/huffman_freq.npy')
# print(h)

### Huffman Coding

In [21]:
centers = np.load('sample_submission/codebook_vgg16.npy')
frequency_map, encoding_map, orig_bits, huffman_bits = huffman_coding(net, centers)
np.save("huffman_encoding.npy", encoding_map)
np.save("huffman_freq.npy", frequency_map)

Original storage for each parameter: 5.0000 bits
Average storage for each parameter after Huffman Coding: 2.8299 bits
Complete 1 layers for Huffman Coding...
Original storage for each parameter: 5.0000 bits
Average storage for each parameter after Huffman Coding: 2.6453 bits
Complete 2 layers for Huffman Coding...
Original storage for each parameter: 5.0000 bits
Average storage for each parameter after Huffman Coding: 2.7101 bits
Complete 3 layers for Huffman Coding...
Original storage for each parameter: 5.0000 bits
Average storage for each parameter after Huffman Coding: 2.8350 bits
Complete 4 layers for Huffman Coding...
Original storage for each parameter: 5.0000 bits
Average storage for each parameter after Huffman Coding: 2.8357 bits
Complete 5 layers for Huffman Coding...
Original storage for each parameter: 5.0000 bits
Average storage for each parameter after Huffman Coding: 2.8695 bits
Complete 6 layers for Huffman Coding...
Original storage for each parameter: 5.0000 bits
Ave
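
The frequency map and encoding map can be illustrated with a small standalone Huffman coder over a list of codebook indices. This is a sketch under stated assumptions, not the `huffman_coding` function used above: the name `huffman_code` and the toy index distribution are made up for the example.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return (frequency map, encoding map) for a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return dict(freq), {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: partial code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return dict(freq), heap[0][2]

# Toy codebook indices: index 0 is the most frequent.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
freq, enc = huffman_code(indices)
avg_bits = sum(freq[s] * len(enc[s]) for s in freq) / len(indices)
print(avg_bits)  # 1.75, versus 2 bits for a fixed-length code
```

Frequent indices receive shorter codes, which is why the average storage per parameter drops below the fixed-length figure.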

### Compression ratio

In [22]:
from summary_return_params import summary_return_params
nonzero_params, total_params = summary_return_params(net)

ratio = (nonzero_params / total_params) * (quant_bits / 32) * (huffman_bits / orig_bits)

test(net)

print("Nonzero params: ", nonzero_params)
print("Total params: ", total_params)
print("Sparsity: ", 1. - nonzero_params / total_params)
print("Quantized bits: ", quant_bits)
print("Original avg storage (bits): ", orig_bits)
print("Huffman avg storage (bits): ", huffman_bits)
print("Compression ratio: ", ratio)
print("Compression goal: 0.025")

Files already downloaded and verified
Test Loss=0.3316, Test accuracy=0.9023
Nonzero params:  1524678
Total params:  3811680
Sparsity:  0.5999984258909457
Quantized bits:  5
Original avg storage (bits):  5.0
Huffman avg storage (bits):  2.7618819978502063
Compression ratio:  0.03452366083261069
Compression goal: 0.025
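
The reported ratio can be re-derived from the printed values alone. The sketch below plugs them into the same formula as the cell above. It also estimates what fraction of weights could remain non-zero to reach the 0.025 goal, assuming the bit widths stay unchanged; that assumption is for illustration only.

```python
# Values copied from the printed output above.
nonzero_params = 1524678
total_params = 3811680
quant_bits = 5
orig_bits = 5.0
huffman_bits = 2.7618819978502063

density = nonzero_params / total_params  # fraction of weights kept
ratio = density * (quant_bits / 32) * (huffman_bits / orig_bits)
print(round(density, 4), round(ratio, 4))  # 0.4 0.0345

# Density needed to hit the goal if the bit widths stayed the same (assumption).
goal = 0.025
required_density = goal / ((quant_bits / 32) * (huffman_bits / orig_bits))
print("required density:", required_density)
```

Because `orig_bits` equals `quant_bits` in this run, the last two factors reduce to `huffman_bits / 32`.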
