# Scalar uniform quantisation of random variables
This tutorial considers scalar quantisation implemented with a uniform quantiser and applied to random variables with different Probability Mass Functions (PMFs). In particular, we consider uniformly and Gaussian-distributed random variables so as to comment on the optimality of such a simple quantiser.

## Preliminary remarks
Quantisation is an irreversible operation which reduces the precision used to represent the data to be encoded. Such a precision reduction translates into fewer bits being used to transmit the information. Accordingly, the whole dynamic range associated with the input data ($X$) is divided into intervals denoted as *quantisation bins*, each having a given width. Each quantisation bin $b_i$ is also associated with its reproduction level $l_i$, which corresponds to the value used to represent all original data values belonging to $b_i$. From this description, it is easy to realise why quantisation is an irreversible process: it is a *many-to-one* mapping, hence after a value $x$ is quantised it cannot be recovered. Usually a quantiser is associated with its number of bits $qb$, which determines the number of reproduction levels, given as $2^{qb}$. If scalar (1D) quantities are presented to the quantiser as input, then we talk about *scalar quantisation* (i.e. the subject of this tutorial); if groups of samples are considered together as input to the quantiser, we talk about *vector quantisation*.

During encoding, the quantiser outputs the index $i$ of the quantisation bin $b_i$ to which each input sample belongs. The decoder receives these indexes and writes to the output the corresponding reproduction level $l_i$. The mapping $\{b_i \leftrightarrow l_i\}$ must be known at the decoder side. Working out the optimal partitioning of the input data range (i.e. the width of each $b_i$) and the associated set of $\{l_i\}$ can be a computationally intensive process, although it can provide significant gains in the overall rate-distortion performance of our coding system.

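A well-known and widely used quantiser is the so-called *uniform quantiser*, characterised by having each $b_i$ with the same width and the reproduction level $l_i$ placed at the mid-value of the quantisation bin. Denoting, with a slight abuse of notation, the lower and upper extrema of the $i$-th bin as $b_i$ and $b_{i+1}$, we have: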

$$
\large
l_i = \frac{b_i + b_{i+1}}{2}.
$$

The width of each quantisation bin is usually denoted as the quantisation step $\Delta$, given as:

$$
\large
\Delta = \frac{\max(X) - \min(X)}{2^{qb}}.
$$
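The two definitions above can be turned into an encoder/decoder pair. The following is only a minimal sketch, assuming that bin $i$ covers $[\min(X) + i\Delta, \min(X) + (i+1)\Delta)$ and that the decoder knows $\min(X)$ and $\Delta$; the helper names `uq_encode` and `uq_decode` are illustrative and not part of any library.

```python
import numpy as np

def uq_encode(x, x_min, x_max, qb):
    """Return the bin index of each sample and the step size (illustrative helper)."""
    levels = 2 ** qb
    delta = (x_max - x_min) / levels
    idx = np.floor((x - x_min) / delta).astype(int)
    # A sample equal to x_max would land in a non-existent bin, so clip it
    return np.clip(idx, 0, levels - 1), delta

def uq_decode(idx, x_min, delta):
    """Map each index to the mid-point of its bin (illustrative helper)."""
    return x_min + (idx + 0.5) * delta

x = np.array([0.0, 0.3, 0.5, 0.99, 1.0])
idx, delta = uq_encode(x, 0.0, 1.0, qb=2)
x_hat = uq_decode(idx, 0.0, delta)
print(idx, x_hat)
```

With this arrangement the reconstruction error of every sample is bounded by $\Delta/2$. The code cell further below uses a rounding-based variant, `Q * np.round(X / Q)`, which places the reproduction levels at integer multiples of the step instead.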

Using the Mean Square Error (MSE) as the distortion measure for the quantisation error ($e$) and considering the input data ($X$) to have a uniform PMF, the variance of $e$, $\sigma^2_e$, is given by:

$$
\large
\sigma^2_e = \frac{\Delta^2}{12}.
$$
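The expression above can be checked numerically. The snippet below is a rough sketch under stated assumptions: a continuous uniform input whose range is an exact multiple of the step, mid-point reconstruction, and an arbitrary step of 0.5. The seed and sample count are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5
# Uniform input over a range that is an exact multiple of delta
x = rng.uniform(-4.0, 4.0, size=200_000)

# Bin index via floor, reproduction level at the bin centre
x_hat = (np.floor(x / delta) + 0.5) * delta
e = x - x_hat

# Empirical error variance next to the theoretical value delta^2 / 12
print(np.var(e), delta ** 2 / 12)
```

The two printed values should agree up to the sampling noise of the random draw.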

If $X$ ranges in $\left[-\frac{M\Delta}{2},\frac{M\Delta}{2}\right]$, with $M = 2^{qb}$ reproduction levels, and if we consider the Signal-to-Noise-Ratio (SNR) as an alternative measure to express the reproduction quality, then we have the so-called ***six dB rule***:

$$
\large
SNR \approx 6 \cdot qb\quad[dB],
$$

that is, each bit added to increase the number of reproduction levels provides roughly a 6 dB improvement in the reconstructed quality. More details about rate distortion theory and quantisation are provided in these two good references:
 * Allen Gersho and Robert M. Gray, "Vector Quantization and Signal Compression", Kluwer Academic Publishers, 732 pages, 1992.
 * David S. Taubman and Michael W. Marcellin, "JPEG2000: Image compression fundamentals, standards and practice", Kluwer Academic Publishers, 773 pages, 2002.
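
Returning to the six dB rule stated before the references, it can be recovered in a couple of steps. The derivation below is a sketch that assumes $X$ is uniformly distributed over $\left[-\frac{M\Delta}{2},\frac{M\Delta}{2}\right]$ and takes $M = 2^{qb}$ as the number of reproduction levels, consistently with the definition of $\Delta$ given earlier:

$$
\large
\sigma^2_X = \frac{(M\Delta)^2}{12}, \qquad
SNR = 10\log_{10}\frac{\sigma^2_X}{\sigma^2_e} = 10\log_{10}M^2 = 20\cdot qb\cdot\log_{10}2 \approx 6.02\cdot qb\quad[dB].
$$

The slope is therefore slightly above 6 dB per bit, which is usually rounded to 6 dB.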

## Rate distortion performance of a uniform quantiser
We now demonstrate the rate-distortion performance of a uniform quantiser and see how closely it follows the six dB rule. We also note from the remarks above that a uniform quantiser is a low-complexity solution to quantisation. In fact, encoding reduces to a simple division by $\Delta$ followed by rounding to an integer, rather than a comparison of each input sample with the extrema of the different quantisation bins. Moreover, the only additional information required at the decoder side is $\Delta$. Accordingly, it is interesting to verify whether a uniform quantiser is able to attain the six dB rule for other PMFs too, most notably because, when a transform is used in our coding scheme, the distribution of the coefficients tends to be closer to Laplacian or Gaussian.
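
To make the complexity argument concrete, the sketch below computes the bin index in two ways: with a single division per sample, and by locating each sample among the bin extrema with `np.searchsorted`, as a generic non-uniform quantiser would have to do. The range $[0, 256)$, the three-bit quantiser and the variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
qb = 3
levels = 2 ** qb
x_min, x_max = 0.0, 256.0
delta = (x_max - x_min) / levels

x = rng.uniform(x_min, x_max, size=1000)

# (a) Uniform quantiser: one division per sample
idx_div = np.floor((x - x_min) / delta).astype(int)

# (b) Generic quantiser: search each sample among the bin extrema
edges = x_min + delta * np.arange(levels + 1)
idx_search = np.searchsorted(edges, x, side='right') - 1

# Both approaches should select the same bin for every sample
print(np.array_equal(idx_div, idx_search))
```

Approach (b) needs the full list of bin extrema, whereas approach (a) only needs $\Delta$ (and the lower end of the range), which is also all the decoder needs to know.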

The following Python code cell generates two input data sets: one with a uniform and another with a Gaussian PMF (mean 128 and variance $\sigma^2$ equal to four). Uniform quantisation is applied to both inputs and the SNR is computed on the reconstructed values. A plot of the rate-distortion performance is shown along with the straight line associated with the six dB rule.

In [None]:
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt

# Total samples
N = 1000

# Quantiser's bits
qb = np.arange(0, 8, 1)

# Generate a random variable uniformly distributed in [0, 255]
X = np.round(255*rnd.rand(N, 1)).astype(np.int32)
var_X = np.var(X)

# Generate a random Gaussian variable with mean 128 and variance 4
Xg = 2.0*rnd.randn(N, 1) + 128
var_Xg = np.var(Xg)
B = np.max(Xg) - np.min(Xg)

snr_data = np.zeros(len(qb))
snr_data_g = np.zeros(len(qb))

for i, b in enumerate(qb):
    levels = 2**b
    Q = 256.0 / float(levels)
    Qg = B / float( levels)
    Y = Q * np.round(X / Q)
    Yg = Qg*np.round(Xg / Qg)
    mse = np.mean(np.square(X - Y))
    mse_g = np.mean(np.square(Xg - Yg))
    snr_data[i] = 10*np.log10(var_X / mse)
    snr_data_g[i] = 10*np.log10(var_Xg / mse_g)

six_dB_rule = 6.0 * qb

# Plot the results and verify the 6dB rule
plt.figure(figsize=(8,8))
plt.plot(qb, snr_data, 'b-o', label='Uniform quantiser with uniform variable')
plt.xlabel('Quantiser bits', fontsize=16)
plt.ylabel('Signal-to-Noise-Ratio SNR [dB]', fontsize=16)
plt.grid()
plt.plot(qb, snr_data_g, 'k-+', label='Uniform quantiser with Gaussian variable')
plt.plot(qb, six_dB_rule, 'r-*', label='Six dB rule')
plt.legend();

As expected, the uniform quantiser applied to a uniformly distributed input provides a rate-distortion performance which follows the six dB rule. Conversely, when the input is Gaussian, the performance is offset by approximately 4 dB. Such a suboptimal performance is due to the fact that the reproduction levels are placed at the mid-point of each interval. This is the right choice for a uniform PMF, since each value in a given bin $b_i$ has an equal chance of appearing, but not for a Gaussian PMF, where in each bin some values are more likely to appear than others. Accordingly, it would make sense to place the reproduction levels around those values which are more likely to appear. The procedure which does this automatically is the subject of the next section of our tutorial.
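
The effect described above can be observed on a single bin. The sketch below draws zero-mean Gaussian samples with variance four, picks one bin on the right-hand tail (the extrema `lo` and `hi` are an arbitrary illustrative choice) and compares the mid-point with the average of the samples falling in that bin.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=2.0, size=200_000)

lo, hi = 2.0, 4.0  # one bin on the right-hand tail (illustrative choice)
in_bin = (x >= lo) & (x < hi)

midpoint = 0.5 * (lo + hi)
centroid = x[in_bin].mean()  # average of the samples inside the bin

# MSE inside the bin for the two candidate reproduction levels
mse_mid = np.mean((x[in_bin] - midpoint) ** 2)
mse_cen = np.mean((x[in_bin] - centroid) ** 2)
print(midpoint, centroid, mse_mid, mse_cen)
```

Since the Gaussian density decreases across this bin, the sample average sits closer to the lower extremum than the mid-point does, and using it as the reproduction level gives a lower MSE within the bin.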

## Towards an optimal quantiser: The Lloyd-Max algorithm
As mentioned above, we want to find a procedure which adjusts the reproduction levels to fit the underlying PMF of the data. In particular, using again the MSE as the distortion measure, one can show that the reproduction level which minimises the MSE in each quantisation bin is given by:

$$
\large
l_i = E[X|X\in b_i]= \frac{\sum_{x_i\in b_i}x_i\cdot P_X(x_i)}{\sum_{x_i\in b_i}P_X(x_i)},
$$

where $P_X$ denotes the PMF of the input $X$. The condition above is usually denoted as the *centroid condition* and, for a continuous variable, becomes:

$$
\large
l_i = E[X|X\in b_i]= \frac{\int_{x\in b_i}x\cdot f_X(x)\,dx}{\int_{x\in b_i}f_X(x)\,dx},
$$

where now $f_X$ denotes the Probability Density Function (PDF). We note that the centroid condition above requires knowledge of the partitioning of the input data into quantisation bins $b_i$. Given that we do not know beforehand what such a partitioning would look like, we can assume an initial partitioning with equal width (as for a uniform quantiser) and then compute the reproduction levels according to the centroid condition above. Once all reproduction levels have been computed, we can derive a new set of quantisation bins whereby the extrema of each bin are given by the mid-points between adjacent reproduction levels computed previously. We then compute a new set of reproduction levels and continue to iterate until convergence is reached. More precisely, the following pseudo code represents the workflow we just described in plain text:

 * k = 0
 * set $b_i^k$ equal to the bins associated with a uniform quantiser with $qb$ bits
 * apply uniform quantisation over the input data, compute the associated MSE and set it to $MSE_{old}$
 * set $\gamma$ = $\infty$
 * while $\gamma > \epsilon$:
   * compute $l_i^k$ using the centroid condition for each bin $b_i^k$
   * derive the new quantisation bins as $b_i^{k+1} = \frac{l_{i}^k + l_{i+1}^k}{2}$
   * apply the quantiser derived by these new bins and reproduction levels and compute the MSE
   * compute $\gamma = \frac{MSE_{old} - MSE}{MSE_{old}}$
   * set $MSE_{old} = MSE$
   * set $k = k + 1$

where $\epsilon$ denotes a given tolerance threshold. The iterative procedure described above is also known as the [Lloyd-Max algorithm](https://en.wikipedia.org/wiki/Lloyd%27s_algorithm). The `lloydmax` function defined below provides an implementation of the Lloyd-Max algorithm, conveniently wrapped up as a function so that we can compare its rate-distortion performance with that of the uniform quantiser analysed before.
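
Before the full implementation, it may help to trace a single iteration of the loop by hand. The sketch below does so on a toy data set with two reproduction levels; the data values, the starting range $[0, 12)$ and the helper `quantise` are illustrative assumptions, not part of the implementation that follows.

```python
import numpy as np

x = np.array([4.0, 5.0, 6.5, 11.0, 11.5])

# Start from a uniform two-level quantiser over [0, 12)
bins = np.array([0.0, 6.0, 12.0])
levels = np.array([3.0, 9.0])

def quantise(x, bins, levels):
    """Return the bin index of each sample and its reproduction level."""
    idx = np.searchsorted(bins, x, side='right') - 1
    idx = np.clip(idx, 0, levels.size - 1)
    return idx, levels[idx]

idx, xq = quantise(x, bins, levels)
mse_old = np.mean((x - xq) ** 2)

# Centroid condition: each level becomes the mean of the samples in its bin
new_levels = np.array([x[idx == i].mean() for i in range(levels.size)])

# New inner threshold: mid-point between adjacent reproduction levels
new_bins = bins.copy()
new_bins[1:-1] = 0.5 * (new_levels[:-1] + new_levels[1:])

_, xq_new = quantise(x, new_bins, new_levels)
mse_new = np.mean((x - xq_new) ** 2)
gamma = (mse_old - mse_new) / mse_old
print(new_levels, new_bins, mse_old, mse_new, gamma)
```

In this toy case the inner threshold moves from 6 to roughly 7.08, so the sample 6.5 changes bin and would pull the first reproduction level upwards at the next iteration.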

In [None]:
from typing import Tuple

from numpy.typing import NDArray

def lloydmax(X: NDArray[np.float64], qb: int) -> Tuple[NDArray[np.float64], NDArray[np.float64], NDArray[np.float64]]:
    levels = 1 << qb
    delta = (np.max(X) + 1 - np.min(X)) / levels

    # Quantisation bins
    bins = np.array([np.min(X) + float(i * delta) for i in range(levels + 1)], np.float64)

    # Reproduction levels
    rl = (bins[:levels] + bins[1:levels + 1]) / 2
    
    # Add a small increment to include the max value of X
    bins[-1] += 0.1

    # Codebook initialization with a uniform scalar quantiser
    XQ = np.zeros(X.shape)
    for i in range(rl.size):
        index = (bins[i] <= X) & (X < bins[i + 1])
        XQ[index] = rl[i]

    error = np.square(X - XQ)
    MSE_old = np.average(error)

    epsilon, variation, step = 1e-5, 1, 1

    # Lloyd-Max Iteration over all decision thresholds and reproduction levels
    bins_next, rl_next = np.zeros(bins.shape), np.zeros(rl.shape)
    while variation > epsilon:
        # Loop over all reproduction levels in order to adjust them wrt
        # centroid condition
        for i in range(levels):
            index = (bins[i] <= X) & (X < bins[i + 1])
            if not np.any(index):  # empty bin: keep the same reproduction level for the next step
                rl_next[i] = rl[i]
            else:  # centroid condition
                rl_next[i] = np.sum(X[index]) / np.sum(index)

        # New decision threshold: they are at the mid point of two
        # reproduction levels
        bins_next[1:levels] = (rl_next[:levels - 1] + rl_next[1:levels]) / 2
        bins_next[0], bins_next[-1] = bins[0], bins[-1]

        # New MSE calculation
        XQ[:] = 0
        for i in range(rl_next.size):
            index = (bins_next[i] <= X) & (X < bins_next[i + 1])
            XQ[index] = rl_next[i]

        MSE = np.average(np.square(X - XQ))
        variation = (MSE_old - MSE) / (MSE_old)

        # Swap the old variables with the new ones
        # (copies avoid aliasing the working arrays bins_next and rl_next)
        bins, rl, MSE_old = bins_next.copy(), rl_next.copy(), MSE
        step += 1

    return bins, rl, XQ

The code above contains some comments to help the reader understand the flow. We are now ready to try this non-uniform quantiser and measure its performance. The following Python code cell runs the Lloyd-Max quantiser for each of the tested quantiser bit values and computes the associated SNR.

In [None]:
snr_data_lm = np.zeros((len(qb)))

for i, b in enumerate(qb):
    _, _, xq_lm = lloydmax(Xg, b)
    mse = np.average(np.square(Xg - xq_lm))
    snr_data_lm[i] = 10 * np.log10(var_Xg / mse)

# Plot the results, including the 6dB rule
plt.figure(figsize=(8,8))
plt.plot(qb, snr_data, 'b-o', label='Uniform quantiser applied to uniform PMF')
plt.xlabel('Quantiser bits', fontsize=16)
plt.ylabel('Signal-to-Noise-Ratio SNR [dB]', fontsize=16)
plt.grid()
plt.plot(qb, snr_data_g, 'k-+', label='Uniform quantiser applied to Gaussian PMF')
plt.plot(qb, snr_data_lm, 'g-x', label='Lloyd-Max quantiser applied to Gaussian PMF')
plt.plot(qb, six_dB_rule, 'r-*', label='Six dB rule')
plt.legend();

We can observe from the graph above that the Lloyd-Max algorithm starts by providing a better SNR at low bitrates and then tends to converge to the performance of the uniform quantiser applied to a Gaussian variable. This result might be surprising at first sight, but it is actually not. In fact, as the number of quantiser bits gets higher, the quantisation bins of the Lloyd-Max quantiser get narrower and the PMF enclosed in each quantisation bin resembles a uniform one. In that case, the best the Lloyd-Max algorithm can do is to place all reproduction levels at the mid-point of the quantisation bin, which is exactly what a uniform quantiser would do. Finally, we also note that the Lloyd-Max algorithm requires sending the reproduction levels and quantisation bins to the decoder, thus some additional rate needs to be added to the bits used by the quantiser.
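
The claim that narrow bins enclose an almost uniform PMF can be illustrated with the closed-form mean of a truncated Gaussian. The sketch below assumes a zero-mean Gaussian with $\sigma = 2$ and a bin whose lower extremum is fixed at 2 while its width is halved repeatedly; the helper names and the chosen widths are illustrative.

```python
import math

def _pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_centroid(a, b, sigma):
    """Mean of a zero-mean Gaussian restricted to [a, b) (illustrative helper)."""
    alpha, beta = a / sigma, b / sigma
    return sigma * (_pdf(alpha) - _pdf(beta)) / (_cdf(beta) - _cdf(alpha))

sigma, a = 2.0, 2.0
rel_offsets = []
for width in [4.0, 2.0, 1.0, 0.5, 0.25]:
    midpoint = a + 0.5 * width
    centroid = gaussian_centroid(a, a + width, sigma)
    # Distance between mid-point and centroid, relative to the bin width
    rel_offsets.append((midpoint - centroid) / width)
    print(width, rel_offsets[-1])
```

The relative distance between the mid-point and the centroid shrinks as the bin gets narrower, which is consistent with the Lloyd-Max and uniform quantisers behaving alike at high bitrates.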

## Concluding remarks
In this short tutorial we have investigated the rate-distortion performance of two types of scalar quantiser when applied to random variables with a given probability mass function. We showed how a uniform quantiser follows the six dB rule when it is applied to a uniformly distributed random variable, but is sub-optimal in the case of a Gaussian PMF. We then introduced the Lloyd-Max algorithm, which provides a better rate-distortion performance, most notably when the number of bits allocated to the quantiser is small. The price to pay for this improved rate-distortion tradeoff is the additional complexity associated with the iterative Lloyd-Max procedure.