# Vector quantisation of a grey scale image
This tutorial shows an example of vector quantisation applied to a grey scale image (*cameraman.tif*), where pairs of pixels are vector quantised with 8 bits per pair, amounting to 4 bits per pixel. The coding efficiency of this scheme is compared with that of a uniform scalar quantiser which spends the same number of bits (i.e. 4) on each pixel. Given that both quantisers (i.e. scalar and vector) operate at the same number of bits per pixel, we'll measure the distortion in terms of Peak-Signal-to-Noise-Ratio (PSNR) and comment on the objective and subjective visual quality. The main goal of this tutorial is to provide the reader with a practical example of vector quantisation, most notably how the generalised Lloyd-Max algorithm can be implemented. For a thorough treatment of the fundamentals of vector quantisation, the interested reader is referred to the following textbooks:
 * Allen Gersho and Robert M. Gray, "Vector Quantization and Signal Compression", Kluwer Academic Publishers, 732 pages, 1992.
 * David S. Taubman and Michael W. Marcellin, "JPEG 2000: Image compression fundamentals, standards and practice", Kluwer Academic Publishers, 773 pages, 2002.

For vector quantisation, each pair consists of two vertically adjacent pixels, i.e. the pixels at the same column of two consecutive rows. This is shown in the following figure.

<img src="vectors.png" width="400">
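
As a minimal sketch of this grouping (on a toy array rather than on *cameraman.tif*, and assuming an even number of rows), the vertical pairs can be gathered with a reshape. The variable names `toy` and `pairs` are illustrative only and are not used elsewhere in the tutorial.

```python
import numpy as np

# Toy 4x3 "image" whose pixel values equal their raster scan index
toy = np.arange(12, dtype=np.uint8).reshape(4, 3)
rows, cols = toy.shape

# Group every two consecutive rows, then store one (top, bottom) pair per column
pairs = toy.reshape(rows // 2, 2, cols).transpose(1, 0, 2).reshape(2, -1)

print(pairs)
# First pair: pixels (0, 0) and (1, 0); fourth pair: pixels (2, 0) and (3, 0)
```

Each column of `pairs` is one vector to be quantised, listed in the same raster order used by the loops later in this tutorial.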

The overall processing can be summarised with the following three main steps:
 * **Step 1**: Select a subset of vectors which will constitute the so-called Training Set (*TS*) and use it to design the reproduction levels for all vectors to be quantised (the so-called codebook).
 * **Step 2**: Derive the reproduction levels $l_i$ using the generalised Lloyd-Max algorithm over the TS found earlier.
 * **Step 3**: Perform the actual vector quantisation.

## Step 1: Selection of the training set
This step receives a grey scale image as input and returns the aforementioned Training Set (*TS*), constituted by a subset of the pixel pairs of the whole image (i.e. our vectors). More precisely, we'll subsample the grid of pixel pairs by a factor of 4 both horizontally and vertically, so that one pair out of 16 is inserted in *TS*. Note that the subsampling factor is arbitrary, but its value leads to a trade-off between coding efficiency and complexity. In fact, a large subsampling factor will speed up the design of the reproduction levels (i.e. **Step 2**) but will result in lower coding efficiency, as the levels are designed on a set of pixel pairs which may not be representative of the image statistics. Conversely, a smaller subsampling factor will increase coding efficiency, given that more pixels are included in the design of the codebook. The price to pay for this is an increase in the encoder's complexity. The following image depicts the selection of vectors to be included in the training set, while the code cell below implements such selection.

<img src="training-set.png" width="650">

In [None]:
import cv2
import numpy as np
import random as rnd
import matplotlib.pyplot as plt

# Vector quantiser bits
vq_bits = 8
vq_levels = 2**vq_bits

image = cv2.imread('../../input-data/cameraman.tif', cv2.IMREAD_UNCHANGED)
rows, cols = image.shape[0:2]

sampling_ratio, vector_height, vector_width = (4, 2, 1)
total_training_samples = (rows * cols) // (vector_height * sampling_ratio * vector_width * sampling_ratio)
training_set = np.zeros((vector_height * vector_width, total_training_samples), np.int32)

k = 0
for r in range(0, rows, vector_height * sampling_ratio):
    for c in range(0, cols, vector_width*sampling_ratio):
        training_vector = image[r:r + vector_height, c:c + vector_width]
        training_set[:, k] = training_vector.flatten()
        k += 1
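
The same selection can also be written with strided slicing, which makes it easy to see how the training set size shrinks as the subsampling factor grows. The sketch below is an illustrative alternative, not the tutorial's own code: the function name `select_training_set` is hypothetical and a random array stands in for the image.

```python
import numpy as np

def select_training_set(image, sampling_ratio=4):
    """Return a 2 x N array of vertical pixel pairs, subsampled in both directions."""
    rows, cols = image.shape
    # Top pixel of each retained pair: every (2 * sampling_ratio)-th row
    top = image[0:rows - 1:2 * sampling_ratio, 0:cols:sampling_ratio]
    # Bottom pixel of each retained pair: the row right below the top one
    bottom = image[1:rows:2 * sampling_ratio, 0:cols:sampling_ratio]
    return np.stack((top.ravel(), bottom.ravel())).astype(np.int32)

# Synthetic stand-in for the input image (assumption: 8-bit, single channel)
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

for ratio in (2, 4, 8):
    ts = select_training_set(fake_image, ratio)
    print(f"sampling ratio {ratio}: {ts.shape[1]} training vectors")
```

For a 256 x 256 input and a sampling ratio of 4 this yields 2048 vectors, the same count as `total_training_samples` in the cell above.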

## Step 2: Derivation of the reproduction levels
Once the training set is available, the reproduction levels can be derived by applying the generalised Lloyd-Max algorithm (see the references listed above for more details). The initial reproduction levels are set equal to some of the vectors belonging to the training set. It is important to note that the choice of the initial reproduction levels primarily impacts the convergence speed of the algorithm. If we had more information about the image statistics (e.g. we knew that the image has a bimodal histogram), we could reduce the number of iterations by initialising the levels around the values associated with the two peaks of the histogram. For this example we'll select the initial levels by subsampling the training set by a factor $r = \frac{|TS|}{2^{qb}}$, where $qb$ is the number of bits per vector (8 in our example) and $|\cdot|$ denotes the number of vectors in the set. Let's denote the set of initial reproduction levels as $L_{init}$: the Lloyd-Max algorithm takes $L_{init}$ as input along with the training set vectors. Its output is the set of reproduction levels $L_{final}$ which (locally) minimises the overall Mean Square Error (MSE) between the vectors in *TS* and their vector quantised counterparts.
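
A minimal sketch of this initialisation is given here on a toy training set. The helper name `initial_levels` is hypothetical, and the guard against a training set smaller than the codebook is an addition made for the sketch.

```python
import numpy as np

def initial_levels(training_set, bits):
    """Pick 2**bits regularly spaced training vectors as the initial codebook."""
    levels = 2 ** bits
    step = training_set.shape[1] // levels  # r = |TS| / 2^qb
    if step < 1:
        raise ValueError("training set smaller than the codebook")
    # Keep at most `levels` columns in case |TS| is not a multiple of 2^qb
    return training_set[:, step - 1::step][:, :levels].astype(np.float64)

# Toy training set with 20 two-dimensional vectors, quantised with 2 bits
toy_ts = np.arange(40).reshape(2, 20)
init = initial_levels(toy_ts, bits=2)
print(init)
```

With 20 vectors and 4 levels the step is 5, so the vectors at positions 4, 9, 14 and 19 become the initial reproduction levels.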

We can summarise the generalised Lloyd-Max algorithm with the following sequence of ordered steps:
 1. Set $L_{final} = L_{init}$.
 1. For each vector $v_i$ in the training set, find the reproduction level $l_i \in L_{final}$ which minimises the square error $e^2 = \lVert v_i - l_i\rVert^2$.
 1. Add the value of $e^2$ to the variable $SE$, which stores the overall square error for the current iteration.
 1. Update each level $l_j$ in $L_{final}$ as $l_j = A_j / H_j$, where $A_j$ is the sum of all training vectors for which $l_j$ was the closest reproduction level and $H_j$ is the number of such vectors, so that $l_j$ becomes the centroid of its quantisation cell. If a given $l_j$ has never been selected, replace it with a vector chosen at random from the training set.
 1. If the relative decrease of $SE$ with respect to the previous iteration is not larger than a threshold $\epsilon$, stop; otherwise reset $SE$ and go to Step 2.
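
Before looking at the full implementation, a single iteration of the steps above can be traced on toy one-dimensional data. This is only a sketch: the sample values and initial levels are made up, and an empty cell keeps its old level here instead of being re-seeded at random.

```python
import numpy as np

samples = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])  # toy training set
levels = np.array([0.0, 5.0])                            # toy L_init

# Steps 2 and 3: closest level for every sample and overall square error
distances = np.square(samples[:, None] - levels[None, :])
closest = np.argmin(distances, axis=1)
se = distances[np.arange(samples.size), closest].sum()

# Step 4: each level becomes the mean of the samples assigned to it
hits = np.bincount(closest, minlength=levels.size)
sums = np.bincount(closest, weights=samples, minlength=levels.size)
new_levels = np.where(hits > 0, sums / np.maximum(hits, 1), levels)

print(closest)     # [0 0 1 1 1 1]
print(se)          # 119.0
print(new_levels)  # [1.5 9. ]
```

After one iteration the two levels have moved from 0 and 5 to 1.5 and 9, i.e. towards the two clusters in the data.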
 
The following Python code cell implements this iterative procedure.

In [None]:
ts_sampling_ratio = total_training_samples // vq_levels
reproduction_levels = training_set[:, ts_sampling_ratio-1::ts_sampling_ratio].astype(np.float64)

last_iteration_mse = 1e6
epsilon = 1e-3
iteration = 0
delta_mse = 1.0

print("Step\tMSE\tvariation")
while delta_mse > epsilon:
    levels_accumulator = np.zeros((vector_height * vector_width, vq_levels), np.float64)
    levels_hit_cnt = np.zeros(vq_levels, np.int32)
    MSE = 0.0

    # Step 2: For each vector vi in the training set, find the reproduction level li which minimises 
    # the square error
    for i in range(total_training_samples):
        V = training_set[:, i]
        dV = np.dot(V.T, V) + np.sum(np.square(reproduction_levels), axis=0) - 2*np.dot(V.T, reproduction_levels)
        square_error = np.min(dV)
        l_start_idx = np.argmin(dV)
        levels_accumulator[:, l_start_idx] += V
        levels_hit_cnt[l_start_idx] += 1
        MSE += square_error

    MSE /= total_training_samples * vector_height * vector_width

    # Step 4: Replace each level with the centroid (accumulator / hit count) of its cell
    for i in range(vq_levels):
        if levels_hit_cnt[i]:
            reproduction_levels[:, i] = levels_accumulator[:, i] / levels_hit_cnt[i]
        else:
            # Empty cell: re-seed the level with a random training vector
            random_idx = rnd.randrange(total_training_samples)
            reproduction_levels[:, i] = training_set[:, random_idx]

    delta_mse = (last_iteration_mse - MSE) / MSE
    print(f"{iteration}\t{MSE}\t{delta_mse}")
    iteration += 1
    last_iteration_mse = MSE

It is worth noting how the selection of $l_i$ from Step 2 of the generalised Lloyd-Max algorithm is implemented in the code cell above. In principle, finding the level which minimises the square error with the current vector $v_i$ can be done by looping through all reproduction levels, computing the square error for each one and picking the level which minimises it. However, we can compact the code by noting that, for each reproduction level $l_j \in L_{final}$, what we're computing is the following:

$$
\large
e_j^2 = \lvert\lvert l_j - v_i\rvert\rvert^2 = l_j^t\cdot l_j - 2\, l_j^t\cdot v_i + v_i^t\cdot v_i,
$$

where superscript $^t$ denotes the transpose operator and $\lvert\lvert \cdot\rvert\rvert^2$ is the squared $L^2$ norm. If the reproduction levels are stored as the columns of a matrix, the first term is obtained for all levels at once by summing the squared entries of each column, and the second one by a single vector-matrix product. Given that our data are stored in **numpy** arrays, dot products and element-wise operations such as sum and subtraction are readily available, either as overloaded operators or as library functions.
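
The expansion can be checked numerically. The sketch below uses a small random codebook (an assumption made only for the check) and compares the direct computation of the squared distances with the expanded form used in the code cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
# 8 two-dimensional reproduction levels stored as columns, plus one test vector
codebook = rng.integers(0, 256, size=(2, 8)).astype(np.float64)
v = rng.integers(0, 256, size=2).astype(np.float64)

# Direct computation: one squared distance per reproduction level
direct = np.sum(np.square(codebook - v[:, None]), axis=0)

# Expanded form: ||v||^2 + ||l_j||^2 - 2 * <v, l_j> for all j at once
expanded = v @ v + np.sum(np.square(codebook), axis=0) - 2 * (v @ codebook)

print(np.allclose(direct, expanded))           # True
print(np.argmin(direct) == np.argmin(expanded))  # True
```

Since the term $v_i^t\cdot v_i$ is the same for every level, it could also be dropped when only the index of the closest level is needed.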

## Step 3: Actual vector quantisation over input image
Now that the reproduction levels have been found by the generalised Lloyd-Max algorithm, it is time to perform the actual vector quantisation. The processing is similar to what we did above when deriving the reproduction levels. This time we'll loop through all vectors of the *cameraman.tif* image and, for each vector $v_i$, select the reproduction level $l_i$ from $L_{final}$ which minimises the square error $e^2 = \lVert v_i - l_i\rVert^2$. Vector $l_i$ is then placed at the same spatial location as $v_i$ and the process moves on to the next vector.

In [None]:
reproduction_levels = np.round(reproduction_levels).astype(np.int32)
square_sum_level = np.sum(np.square(reproduction_levels), axis=0)

image_vq = np.zeros((rows, cols), np.uint8)
for r in range(0, rows, vector_height):
    for c in range(0, cols, vector_width):
        # Cast to int32 so that the dot products below do not overflow 8-bit arithmetic
        V = image[r:r + vector_height, c:c + vector_width].flatten().astype(np.int32)
        dV = np.dot(V.T, V) + square_sum_level - 2*np.dot(V.T, reproduction_levels)
        l_start_idx = np.argmin(dV)
        image_vq[r:r + vector_height, c:c + vector_width] =\
        np.reshape(reproduction_levels[:, l_start_idx], (vector_height, vector_width))
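
The double loop above can also be replaced by a single matrix product which returns the codebook index of every pair, i.e. the information an encoder would actually write to the bitstream. The following is a sketch under the assumptions of 2 x 1 vectors and an even number of rows; the function name `encode_pairs` and the toy data are hypothetical.

```python
import numpy as np

def encode_pairs(image, codebook):
    """Return, for each vertical pixel pair, the index of the closest codebook column."""
    rows, cols = image.shape
    pairs = image.reshape(rows // 2, 2, cols).transpose(1, 0, 2).reshape(2, -1)
    pairs = pairs.astype(np.int64)
    cb = codebook.astype(np.int64)
    # ||v - l||^2 up to the constant ||v||^2, which does not affect the argmin
    scores = np.sum(np.square(cb), axis=0)[None, :] - 2 * (pairs.T @ cb)
    return np.argmin(scores, axis=1).reshape(rows // 2, cols)

# Toy 4x2 image and a codebook with three levels stored as columns
toy_image = np.array([[10, 200], [12, 198], [100, 100], [101, 99]], dtype=np.uint8)
toy_codebook = np.array([[10, 200, 100], [10, 200, 100]])

print(encode_pairs(toy_image, toy_codebook))
# [[0 1]
#  [2 2]]
```

The index map has half the rows of the image, since one index is produced for every two vertically adjacent pixels.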

## Comparison with scalar quantisation
The last operation we need to perform is to quantise *cameraman.tif* with a quantiser having $qb = 4$, that is 4 bits per pixel. Over the images obtained with scalar and vector quantisation, we'll then compute the Peak-Signal-to-Noise-Ratio (PSNR) and express it in decibel [dB] according to the following formula:

$$
PSNR(I,\hat{I}) = 10\cdot\log_{10}\left(\frac{M^2}{E\left[\lvert\lvert I - \hat{I}\rvert\rvert^2\right]}\right) [dB],
$$

where $\hat{I}$ denotes the image quantised with either scalar or vector quantisation, $M$ is the maximum value allowed for image $I$, that is 255 with an 8 bit per pixel image and finally $E[\cdot]$ denotes the expectation operator.
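
A small helper implementing this formula is sketched below (the name `psnr` and the toy arrays are illustrative). The inputs are cast to floating point before the subtraction because differences computed directly on 8-bit unsigned arrays wrap around instead of becoming negative.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(np.square(diff))
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.array([[0, 255]], dtype=np.uint8)
b = np.array([[10, 245]], dtype=np.uint8)

# Every pixel differs by 10, hence the MSE is 100
print(f"PSNR = {psnr(a, b):.2f} dB")
```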

In [None]:
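# Scalar quantiser bits per pixel: same rate as the vector quantiser
sq_bits = vq_bits // (vector_height * vector_width)
Q = 256 // 2**sq_bits
# Clip the quantisation index so that exactly 2**sq_bits levels are used
image_sq = np.clip(np.round(image / Q), 0, 2**sq_bits - 1).astype(np.int32) * Q

# Cast to float before subtracting: uint8 arithmetic would wrap around
image_f = image.astype(np.float64)
mse_vq = np.mean(np.square(image_f - image_vq))
mse_sq = np.mean(np.square(image_f - image_sq))
psnr_vq = 10.0*np.log10(255.0**2 / mse_vq)
psnr_sq = 10.0*np.log10(255.0**2 / mse_sq)

plt.figure(figsize=(20,20))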
plt.subplot(1, 3, 1), plt.imshow(image, cmap='gray'), plt.title('Original image')
plt.subplot(1, 3, 2), plt.imshow(image_vq, cmap='gray'), plt.title(f"Vector quantised image (PSNR = {psnr_vq:.2f} [dB])")
plt.subplot(1, 3, 3), plt.imshow(image_sq, cmap='gray'), plt.title(f"Scalar quantised image (PSNR = {psnr_sq:.2f} [dB])");

As we may note, the image resulting from scalar quantisation shows noticeable banding artefacts. Quality is significantly better for the vector quantised image, which not only improves the PSNR by almost 4 dB but also shows fewer artefacts.

This example makes a compelling case for the use of vector quantisation: the vectors considered (i.e. pairs of adjacent image pixels) show a correlation which makes their scatter plot concentrate around the 45 degree straight line. Such a correlation is efficiently exploited by vector quantisation, whereby the generalised Lloyd-Max algorithm places the reproduction levels where the joint probability mass function is concentrated. Scalar quantisation doesn't consider this pair-based correlation, hence it spreads its reproduction levels over the whole range of values, including pairs of values which hardly ever appear in the image.
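
The correlation between the two pixels of each pair can be measured directly. The sketch below does so on a synthetic image (a smooth vertical ramp with mild noise, used here only as a stand-in for a natural image); running the same lines on *cameraman.tif* is left to the reader.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a natural image: smooth ramp plus mild noise (assumption)
ramp = np.linspace(0, 255, 64)[:, None] * np.ones((1, 64))
synthetic = np.clip(ramp + rng.normal(0, 5, ramp.shape), 0, 255)

# Top and bottom pixel of every vertical pair
top = synthetic[0::2, :].ravel()
bottom = synthetic[1::2, :].ravel()

# Pearson correlation coefficient between the two components of the vectors
rho = np.corrcoef(top, bottom)[0, 1]
print(f"correlation between top and bottom pixels: {rho:.3f}")
```

A coefficient close to 1 means that the points (top, bottom) cluster around the 45 degree line, which is where a trained codebook would place most of its levels.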

## Concluding remarks
We have presented a simple implementation of the generalised Lloyd-Max algorithm with application to image coding via vector quantisation. We have verified that vector quantisation is indeed a better alternative to scalar quantisation when the input data show some degree of correlation (or redundancy). Accordingly, if the transmitter (i.e. the encoder) can bear some additional complexity, vector quantisation can constitute an attractive alternative. It is worth noting that we did not consider applying entropy coding on top of the resulting quantisation cell indices: this would further reduce the coding rate, given that there will be some inter-symbol redundancy to exploit with a coding scheme such as run length encoding.

It is also worth mentioning that vector quantisation is sometimes referred to as palette coding, and a good example of its design for the case of screen content and RGB images is the palette mode from the H.265/HEVC (V3) and H.266/VVC standards.
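
As a rough indicator of what entropy coding could bring, one may compare the empirical zero-order entropy of the indices with the fixed number of bits spent on each of them. The sketch below uses a made-up index sequence and a hypothetical helper `empirical_entropy`; note that this measure ignores the inter-symbol redundancy that a scheme such as run length encoding would exploit.

```python
import numpy as np

def empirical_entropy(indices, alphabet_size):
    """Zero-order entropy, in bits per symbol, of an array of non-negative integer indices."""
    counts = np.bincount(np.ravel(indices), minlength=alphabet_size)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Toy index stream over a 4-level codebook (2 bits per index with a fixed-length code)
toy_indices = np.array([0, 0, 0, 0, 1, 1, 2, 3])
print(empirical_entropy(toy_indices, alphabet_size=4))  # 1.75
```

Here the skewed usage of the levels brings the entropy down to 1.75 bits per index, below the 2 bits of the fixed-length representation.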

We also provide the reader with some ideas for extending the vector quantisation scheme presented in this tutorial:
 * Consider colour images. Some design choices and aspects to address would be whether the input data are considered in the RGB or in a YCbCr colour space. The former might save complexity, since no colour transform is required, but would not allow for an effective perceptual quantisation. Another aspect is whether to treat each image plane separately or jointly. The latter might bring benefits in terms of coding efficiency.
 * Consider region based vector quantisation. Here the image is broken up into square regions and a different codebook is derived for each region. This allows for parallel encoding and decoding, along with a more content adaptive coding scheme which, in this case, would get closer to the palette mode of the H.265/HEVC and H.266/VVC standards.
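
As a starting point for the region based extension, the sketch below shows one possible way to view an image as a grid of square regions, each of which could then be given its own training set and codebook. The function name `split_into_regions` is hypothetical and the image dimensions are assumed to be multiples of the region size.

```python
import numpy as np

def split_into_regions(image, size):
    """View an image as a grid of non-overlapping size x size regions."""
    rows, cols = image.shape
    if rows % size or cols % size:
        raise ValueError("image dimensions must be multiples of the region size")
    # Result is indexed as [region_row, region_col, row_in_region, col_in_region]
    return image.reshape(rows // size, size, cols // size, size).swapaxes(1, 2)

toy = np.arange(64).reshape(8, 8)
regions = split_into_regions(toy, 4)

print(regions.shape)  # (2, 2, 4, 4)
print(np.array_equal(regions[0, 1], toy[0:4, 4:8]))  # True
```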

Finally, although we pointed to the encoder's complexity as a limiting factor for vector quantisation, we should recall that, in the case of a region-based approach, GPU implementations of the k-means algorithm (another way of optimising the codebook) can speed up compression. At the receiver side, the decoding process is simply a read from the bitstream followed by a look-up operation to write the pixels to the output buffer.
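
The look-up nature of the decoder can be illustrated in a few lines. The sketch assumes that the indices have already been parsed from the bitstream into a 2D map with one entry per vertical pixel pair, and that the codebook holds 8-bit values; the name `decode_pairs` and the toy data are hypothetical.

```python
import numpy as np

def decode_pairs(indices, codebook):
    """Rebuild an image from an index map (rows // 2 x cols) and a 2 x K codebook."""
    half_rows, cols = indices.shape
    looked_up = codebook[:, indices]  # shape: 2 x half_rows x cols
    # Interleave top and bottom pixels back into consecutive image rows
    return looked_up.transpose(1, 0, 2).reshape(2 * half_rows, cols).astype(np.uint8)

toy_codebook = np.array([[10, 200, 100], [12, 198, 101]])
toy_index_map = np.array([[0, 1], [2, 2]])

print(decode_pairs(toy_index_map, toy_codebook))
# [[ 10 200]
#  [ 12 198]
#  [100 100]
#  [101 101]]
```

No distance computation is involved at the decoder: its cost is dominated by memory accesses, which is why the complexity of vector quantisation is concentrated at the encoder.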