<a href="https://colab.research.google.com/github/jevliu/2022-Machine-Learning-Specialization/blob/main/quantization_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np

In [2]:
# suppress scientific notation when printing arrays
np.set_printoptions(suppress=True)

In [5]:
# generate uniformly distributed random parameters
params = np.random.uniform(low=-50, high=150, size=20)

# make sure the important values are at the beginning for easier debugging
params[0] = params.max() + 1
params[1] = params.min() - 1
params[2] = 0

# round each number to the second decimal place
params = np.round(params, 2)

# print the parameters
print(params)

[149.28 -17.66   0.    61.84  -0.25 134.6  148.28  68.34  74.51  51.24
  43.13 140.25 142.08 114.53 -16.66  81.33 135.17  -8.6  112.6  103.86]


In [21]:
# define the quantization and dequantization functions according to the
# mathematical formulas
def clamp(param_q:np.ndarray, lower_bound:int, upper_bound:int)->np.ndarray:
  # clip the values in place to the range [lower_bound, upper_bound]
  param_q[param_q < lower_bound] = lower_bound
  param_q[param_q > upper_bound] = upper_bound
  return param_q

def asymmetric_quantization(params:np.ndarray,bits:int)->tuple[np.ndarray,float,float]:
  # calculate the scale and zero point
  alpha = np.max(params)
  beta = np.min(params)
  scale = (alpha-beta) / (2**bits-1)
  # the zero point is integer-valued but is kept as a NumPy float
  zero = -1*np.round(beta/scale)
  # unsigned integer range
  lower_bound, upper_bound = 0, 2**bits-1
  # quantize the parameters
  quantized = clamp(np.round(params/scale+zero),lower_bound,upper_bound).astype(np.int32)
  return quantized,scale,zero

def symmetric_quantization(params:np.ndarray,bits:int)->tuple[np.ndarray,float]:
  # calculate the scale from the largest absolute value
  alpha = np.max(np.abs(params))
  scale = alpha / (2**(bits-1)-1)
  # signed integer range
  lower_bound,upper_bound = -1*(2**(bits-1)),2**(bits-1)-1
  # quantize the parameters
  quantized = clamp(np.round(params/scale),lower_bound,upper_bound).astype(np.int32)
  return quantized,scale

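def asymmetric_dequantize(params:np.ndarray,scale:float,zero:float)->np.ndarray:
  # map the quantized integers back to real values
  return scale * (params-zero)

def symmetric_dequantize(params:np.ndarray,scale:float)->np.ndarray:
  # map the quantized integers back to real values
  return scale * params

def quantization_error(params:np.ndarray, params_q:np.ndarray)->float:
  # calculate the mean squared error (MSE) between original and dequantized values
  return np.mean((params-params_q)**2)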
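
For reference, the formulas that the functions above implement can be written out as follows. This is read directly off the code: $n$ is `bits`, $s$ is the scale and $z$ is the zero point. The symbols $x_q$ (quantized value) and $\hat{x}$ (dequantized value) are notation introduced here for illustration.

```latex
\begin{aligned}
\text{asymmetric:}\quad
  & \alpha = \max_i x_i, \qquad \beta = \min_i x_i, \qquad
    s = \frac{\alpha - \beta}{2^{n} - 1}, \qquad
    z = -\operatorname{round}\!\left(\frac{\beta}{s}\right) \\
  & x_q = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x}{s} + z\right),\; 0,\; 2^{n} - 1\right), \qquad
    \hat{x} = s\,(x_q - z) \\[4pt]
\text{symmetric:}\quad
  & \alpha = \max_i |x_i|, \qquad
    s = \frac{\alpha}{2^{n-1} - 1} \\
  & x_q = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x}{s}\right),\; -2^{n-1},\; 2^{n-1} - 1\right), \qquad
    \hat{x} = s\,x_q \\[4pt]
\text{error:}\quad
  & \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \hat{x}_i\right)^2
\end{aligned}
```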


In [22]:
(asymmetric_q, asymmetric_s, asymmetric_z) = asymmetric_quantization(params, 8)
(symmetric_q, symmetric_s) = symmetric_quantization(params, 8)
as_deq_params = asymmetric_dequantize(asymmetric_q, asymmetric_s, asymmetric_z)
sy_deq_params = symmetric_dequantize(symmetric_q, symmetric_s)

print('original parameters:\n',np.round(params,2))
print('parameters after asymmetric quantization:\n',np.round(asymmetric_q))
print(f'asymmetric_scale: {np.round(asymmetric_s,2)}, asymmetric_zero: {asymmetric_z.round(2)}')
print('parameters after symmetric quantization:\n',np.round(symmetric_q))
print(f'symmetric_scale: {symmetric_s.round(2)}')
print(f'quantization error with asymmetric: {quantization_error(params,as_deq_params).round(2)}')
print(f'quantization error with symmetric: {quantization_error(params,sy_deq_params).round(2)}')

original parameters:
 [149.28 -17.66   0.    61.84  -0.25 134.6  148.28  68.34  74.51  51.24
  43.13 140.25 142.08 114.53 -16.66  81.33 135.17  -8.6  112.6  103.86]
parameters after asymmetric quantization:
 [255   0  27 121  27 233 253 131 141 105  93 241 244 202   2 151 233  14
 199 186]
asymmetric_scale: 0.65, asymmetric_zero: 27.0
parameters after symmetric quantization:
 [127 -15   0  53   0 115 126  58  63  44  37 119 121  97 -14  69 115  -7
  96  88]
symmetric_scale: 1.18
quantization error with asymmetric: 0.04
quantization error with symmetric: 0.11
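
To see how the gap between the two schemes depends on the bit width, the comparison above can be repeated for several values of `bits`. The following is a minimal sketch, not part of the original notebook: it hard-codes the parameter values printed above, and the helper names `asymmetric_roundtrip` and `symmetric_roundtrip` are hypothetical (they fold quantization and dequantization into one step, using the same formulas as the functions defined earlier).

```python
import numpy as np

# Parameter values copied from the printed output above.
params = np.array([149.28, -17.66, 0.0, 61.84, -0.25, 134.6, 148.28, 68.34,
                   74.51, 51.24, 43.13, 140.25, 142.08, 114.53, -16.66,
                   81.33, 135.17, -8.6, 112.6, 103.86])

def asymmetric_roundtrip(x, bits):
    """Quantize to unsigned integers with a zero point, then dequantize."""
    scale = (x.max() - x.min()) / (2**bits - 1)
    zero = -np.round(x.min() / scale)
    q = np.clip(np.round(x / scale + zero), 0, 2**bits - 1)
    return scale * (q - zero)

def symmetric_roundtrip(x, bits):
    """Quantize to signed integers centred on zero, then dequantize."""
    scale = np.abs(x).max() / (2**(bits - 1) - 1)
    q = np.clip(np.round(x / scale), -(2**(bits - 1)), 2**(bits - 1) - 1)
    return scale * q

errors = {}
for bits in (4, 6, 8):
    asym_mse = float(np.mean((params - asymmetric_roundtrip(params, bits)) ** 2))
    sym_mse = float(np.mean((params - symmetric_roundtrip(params, bits)) ** 2))
    errors[bits] = (asym_mse, sym_mse)
    print(f'bits={bits}: asymmetric MSE={asym_mse:.4f}, symmetric MSE={sym_mse:.4f}')
```

For this data the values lie mostly on the positive side, so the symmetric scheme leaves much of its negative integer range unused, which is one plausible reason its error is higher.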


## Quantization range: how to choose alpha and beta

### Quantization strategy

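**Min-Max:** uses the minimum and maximum of the tensor as the range; sensitive to outliers, because a single extreme value stretches the range and coarsens the step for every other value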

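**Percentile:** sets the range to a percentile of the distribution instead of the true minimum and maximum, so only the outliers incur a large error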

**Mean-Square-Error:** chooses alpha and beta to minimize the MSE between the original and the quantized values; it is usually solved using grid search

**Cross-Entropy:** used when the values in the tensor being quantized are not equally important, for example to keep the order of the values in the softmax layer
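
The first three strategies above can be compared on a small synthetic example. This is a rough sketch under several assumptions that are not from the notebook: the data is standard-normal with one injected outlier, the bit width is 4 so the differences are easy to see, the percentile cut-offs are the 1st and 99th, and the grid search only shrinks the upper bound of the range. The helper `fake_quantize` is a hypothetical name for a quantize-then-dequantize round trip.

```python
import numpy as np

def fake_quantize(x, beta, alpha, bits):
    """Asymmetric quantize-then-dequantize of x using the range [beta, alpha]."""
    scale = (alpha - beta) / (2**bits - 1)
    zero = -np.round(beta / scale)
    q = np.clip(np.round(x / scale + zero), 0, 2**bits - 1)
    return scale * (q - zero)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

BITS = 4  # assumption: a low bit width exaggerates the effect

# Assumed synthetic data: standard-normal values plus a single outlier.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)
x[0] = 20.0

lo, hi = float(x.min()), float(x.max())

# Min-Max: the range covers every value, including the outlier.
ranges = {'min-max': (lo, hi)}

# Percentile: ignore the most extreme 1% on each side (assumed cut-offs).
ranges['percentile'] = (float(np.percentile(x, 1)), float(np.percentile(x, 99)))

# Mean-Square-Error: grid search over candidate upper bounds;
# the last candidate (fraction 1.0) corresponds to the min-max range.
fractions = np.linspace(0.1, 1.0, 91)
candidates = [(lo, lo + f * (hi - lo)) for f in fractions]
errors = [mse(x, fake_quantize(x, b, a, BITS)) for b, a in candidates]
ranges['mse grid search'] = candidates[int(np.argmin(errors))]

results = {}
for name, (beta, alpha) in ranges.items():
    x_hat = fake_quantize(x, beta, alpha, BITS)
    results[name] = {
        'mse': mse(x, x_hat),                 # error over all values
        'inlier_mse': mse(x[1:], x_hat[1:]),  # error excluding the outlier
    }
    print(f"{name:>16}: range=({beta:.2f}, {alpha:.2f}) "
          f"mse={results[name]['mse']:.4f} inlier_mse={results[name]['inlier_mse']:.4f}")
```

A real search would typically vary both bounds and use a finer grid; the one-sided grid here only keeps the sketch short.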

### Quantization granularity

## Post-Training Quantization (PTQ)

### PTQ process:

pre-trained model --> attach observers (calculate the s and z parameters using the observed data) --> calibrate --> quantized model
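
The observer and calibration steps in the process above can be illustrated with a NumPy-only sketch. The class `MinMaxObserverSketch` and its methods are hypothetical names made up for this example, and they do not reproduce the PyTorch observer API. The calibration data is assumed to be random uniform values standing in for real activations.

```python
import numpy as np

class MinMaxObserverSketch:
    """Hypothetical observer: tracks the running min/max of the data it sees."""

    def __init__(self, bits=8):
        self.bits = bits
        self.min_val = np.inf
        self.max_val = -np.inf

    def observe(self, x):
        # Update the running statistics with one calibration batch.
        self.min_val = min(self.min_val, float(np.min(x)))
        self.max_val = max(self.max_val, float(np.max(x)))

    def calculate_qparams(self):
        # Same asymmetric formulas as earlier: scale s and zero point z.
        scale = (self.max_val - self.min_val) / (2**self.bits - 1)
        zero = -np.round(self.min_val / scale)
        return scale, zero

rng = np.random.default_rng(0)
observer = MinMaxObserverSketch(bits=8)

# Calibrate: feed representative batches through the observer.
for _ in range(10):
    batch = rng.uniform(-50, 150, size=256)  # stand-in for real activations
    observer.observe(batch)

# Freeze s and z, then quantize new data with the fixed parameters.
scale, zero = observer.calculate_qparams()
x_new = rng.uniform(-40, 140, size=5)  # new data inside the calibrated range
q = np.clip(np.round(x_new / scale + zero), 0, 2**8 - 1).astype(np.int32)
x_hat = scale * (q - zero)

print('scale:', round(scale, 4), 'zero point:', zero)
print('quantized:', q)
print('max abs error:', np.abs(x_new - x_hat).max())
```

The key point is that s and z are computed once from calibration data and then stay fixed, so values outside the observed range would be clipped at inference time.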

In [25]:
import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.nn as nn
import matplotlib.pyplot as plt
from tqdm import tqdm
import os

#### Load the MNIST dataset