# Quantization Modes

# define imports
import numpy as np

from quantization.utils import plot_table

## Symmetric Quantization

In symmetric quantization, we map the floating-point range to the quantized range symmetrically around 0, so that a floating-point 0 maps exactly to a quantized 0. To do so, we take the largest absolute value in the floating-point range, i.e. $\max|x_f| = \max(-\min(x_f), \max(x_f))$, where $x_f$ denotes the floating-point values being quantized. Additionally, we choose $N_{bins} = 2^n$, where $n$ is the number of bits we want to quantize to.
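A minimal NumPy sketch of the two quantities just defined. The helper names `max_abs` and `num_bins` are hypothetical and exist only for illustration. `max_abs` follows the definition literally, taking the larger of $-\min(x_f)$ and $\max(x_f)$, which is the same as `np.max(np.abs(x_f))`.

```python
import numpy as np


def max_abs(x_f: np.ndarray) -> float:
    # Hypothetical helper: the largest magnitude is either -min(x_f)
    # (most negative value dominates) or max(x_f) (most positive dominates).
    return float(max(-np.min(x_f), np.max(x_f)))


def num_bins(n: int) -> int:
    # Hypothetical helper: number of distinct integer codes with n bits.
    return 2 ** n


x_f = np.array([-3.0, 0.5, 2.0])
print(max_abs(x_f))  # 3.0
print(num_bins(8))   # 256
```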

Example: say we want an 8-bit quantization range. Then the number of bins is $N_{bins} = 2^8 = 256$, i.e. the unsigned range $[0,255]$ or, as signed integers, $[-128,127]$. This is the "full range" around 0. In practice, however, this range is generally "restricted" to $[-127,127]$, which is exactly symmetric around 0. We can derive scale factors that map floating-point values to quantized values for both ranges:

$$
\def\arraystretch{1.5}
\begin{array}{c|c|c}
& \text{Full Range} & \text{Restricted Range} \\ \hline
\text{Quantized Range} & [-\frac{N_{bins}}{2},\frac{N_{bins}}{2}-1] & [-(\frac{N_{bins}}{2}-1),\frac{N_{bins}}{2}-1] \\ \hline
\text{8-bit Example} & [-128,127] & [-127,127] \\ \hline
\text{Scale Factor} & q_x=\frac{(2^n-1)/2}{\max|x_f|} & q_x=\frac{2^{n-1}-1}{\max|x_f|} \\
\end{array}
$$
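The table can be checked numerically. This sketch uses two hypothetical helpers, `symmetric_ranges` and `symmetric_scales`, which are not part of the original notebook. They build both integer ranges from $N_{bins}$ and evaluate both scale factors. Multiplying each scale by $\max|x_f|$ shows that the full-range factor maps the largest magnitude to 127.5, not 127.

```python
def symmetric_ranges(n: int) -> dict:
    # Hypothetical helper: integer ranges from the table, built from N_bins = 2^n.
    n_bins = 2 ** n
    full = (-(n_bins // 2), n_bins // 2 - 1)
    restricted = (-(n_bins // 2 - 1), n_bins // 2 - 1)
    return {"full": full, "restricted": restricted}


def symmetric_scales(n: int, max_abs: float) -> dict:
    # Hypothetical helper: the two scale factors q_x from the table.
    return {
        "full": ((2 ** n - 1) / 2) / max_abs,
        "restricted": (2 ** (n - 1) - 1) / max_abs,
    }


print(symmetric_ranges(8))       # {'full': (-128, 127), 'restricted': (-127, 127)}
print(symmetric_scales(8, 2.0))  # {'full': 63.75, 'restricted': 63.5}
```

With $\max|x_f| = 2$, the full-range scale sends $2 \mapsto 127.5$ while the restricted scale sends $2 \mapsto 127$.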

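Finally, we can compute the symmetric quantized value:

$$x_q = \text{round}(q_x x_f)$$

With the full range, $q_x \max|x_f| = (2^n-1)/2$ (127.5 for 8 bits), which rounds to $2^{n-1}$ (128), one past the top of the range. For that mode, $x_q$ must therefore also be clamped to $[-2^{n-1}, 2^{n-1}-1]$.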
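Here is a sketch that applies $x_q = \text{round}(q_x x_f)$ in both modes and clamps the result to the mode's integer range. `quantize_symmetric` is a hypothetical helper written for this illustration. `np.round` rounds exact halves to the nearest even integer, so 127.5 becomes 128 and 63.5 becomes 64. The clamp is what keeps the full-range result inside $[-128,127]$.

```python
import numpy as np


def quantize_symmetric(x_f: np.ndarray, n: int, restricted: bool = True) -> np.ndarray:
    # Hypothetical helper: symmetric quantization with clamping to the n-bit range.
    max_abs = np.max(np.abs(x_f))
    if restricted:
        q_x = (2 ** (n - 1) - 1) / max_abs
        lo = -(2 ** (n - 1) - 1)
    else:
        q_x = ((2 ** n - 1) / 2) / max_abs
        lo = -(2 ** (n - 1))
    hi = 2 ** (n - 1) - 1
    x_q = np.round(q_x * x_f)  # round half to even
    return np.clip(x_q, lo, hi).astype(np.int64)


x_f = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(np.round(127.5))                                         # 128.0, out of int8 range
print(quantize_symmetric(x_f, 8, restricted=False).tolist())  # [-128, -64, 0, 64, 127]
print(quantize_symmetric(x_f, 8, restricted=True).tolist())   # [-127, -64, 0, 64, 127]
```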

def quantize_symmetric_full(n: int, x_f: np.ndarray) -> np.ndarray:
    q_x = ((pow(2, n) - 1) / 2) / np.max(np.abs(x_f))
    # max|x_f| maps to (2^n - 1) / 2 (127.5 for n=8), which rounds to 2^(n-1),
    # so clamp to [-2^(n-1), 2^(n-1) - 1] to avoid overflowing the integer type
    x_q = np.clip(np.round(q_x * x_f), -pow(2, n - 1), pow(2, n - 1) - 1)
    return x_q


def quantize_symmetric_restricted(n: int, x_f: np.ndarray) -> np.ndarray:
    q_x = (pow(2, n - 1) - 1) / np.max(np.abs(x_f))
    x_q = np.round(q_x * x_f)
    return x_q


# NumPy's smallest integer dtypes (int8/uint8) are 8 bits wide,
# so we can't store anything narrower than 8-bit codes here
data = np.random.uniform(-1000, 1000, size=(5, 5))
print(f"unquantized data:\n {data}")
print(f"quantize symmetric full\n {quantize_symmetric_full(8, data).astype(np.int8)}")
print(
    f"quantize symmetric restricted\n {quantize_symmetric_restricted(8, data).astype(np.int8)}"
)
plot_table(data, "output/random-table.png", "Oranges", "black", "black")

unquantized data:
 [[ 172.71694805 -695.71811529  387.47549919 -126.40860514  462.68867062]
 [ 248.72155205  773.94617695 -405.48576577  739.61213269 -652.9702183 ]
 [-833.36536888  -84.05127764 -867.87526821  323.935187   -402.2115593 ]
 [ 547.32503623 -671.37693182 -135.44525078  920.91887197 -266.84658603]
 [ 314.45379454  227.06985558 -878.88743677  732.116133    400.80117965]]
quantize symmetric full
 [[  24  -96   54  -18   64]
 [  34  107  -56  102  -90]
 [-115  -12 -120   45  -56]
 [  76  -93  -19  127  -37]
 [  44   31 -122  101   55]]
quantize symmetric restricted
 [[  24  -96   53  -17   64]
 [  34  107  -56  102  -90]
 [-115  -12 -120   45  -55]
 [  75  -93  -19  127  -37]
 [  43   31 -121  101   55]]
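One way to compare the two modes beyond reading the integer tables is to map the quantized values back to floating point with $\hat{x}_f = x_q / q_x$ and measure the error. This is a sketch under assumptions. `quantize_dequantize` is a hypothetical helper, and the data comes from a seeded `np.random.default_rng` instead of the unseeded `np.random.uniform` above, so its numbers will not match the table printed earlier. Because rounding moves each value by at most half a step, the error should stay within $0.5 / q_x$.

```python
import numpy as np


def quantize_dequantize(n: int, x_f: np.ndarray, restricted: bool):
    # Hypothetical helper: symmetric quantize (with clamping), then map back to float.
    max_abs = np.max(np.abs(x_f))
    if restricted:
        q_x = (2 ** (n - 1) - 1) / max_abs
        lo = -(2 ** (n - 1) - 1)
    else:
        q_x = ((2 ** n - 1) / 2) / max_abs
        lo = -(2 ** (n - 1))
    hi = 2 ** (n - 1) - 1
    x_q = np.clip(np.round(q_x * x_f), lo, hi)
    x_hat = x_q / q_x  # dequantize by dividing out the scale factor
    return x_q, x_hat, q_x


# Seeded generator so the run is reproducible (the notebook cell above is unseeded).
rng = np.random.default_rng(seed=0)
data = rng.uniform(-1000, 1000, size=(5, 5))

for restricted in (False, True):
    x_q, x_hat, q_x = quantize_dequantize(8, data, restricted)
    max_err = np.max(np.abs(x_hat - data))
    mode = "restricted" if restricted else "full"
    # Rounding moves each value by at most half a quantization step, i.e. 0.5 / q_x.
    print(f"{mode}: q_x={q_x:.5f}, max error={max_err:.4f}, bound={0.5 / q_x:.4f}")
```

The full range has a slightly larger $q_x$ (it uses 127.5 instead of 127 as the numerator), so its error bound is marginally tighter. The cost is the clamp needed at the positive end.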
