# A quick journey into frequency decomposition for transform coding
This short tutorial considers two of the most common frequency transforms used in image and video compression: the Wavelet decomposition and the Discrete Cosine Transform (DCT). As usual, our main aim is not to review the theoretical foundations of these transforms thoroughly; the following classical textbooks on image and video compression provide a comprehensive review of the topic:
 * David S. Taubman and Michael W. Marcellin, "JPEG2000: Image compression fundamentals, standards and practice", Kluwer Academic Publishers, 773 pages, 2002
 * Rafael Gonzalez and Richard E. Woods, "Digital image processing", 3rd Edition, Pearson, 976 pages, 2007.

Here, it is only worth recalling that both Wavelet- and Fourier-based transforms (such as the DCT) provide a frequency domain representation of the input image pixels: the former by applying filtering operations over differently scaled versions of the input, the latter by expressing the input as a weighted sum of 2D cosine waves oscillating at different frequencies.

In the following we first introduce the Wavelet decomposition by considering its simplest case, i.e. the Haar Wavelet. We then compare the coding efficiency of the Haar Wavelet and the DCT over two types of image content: natural and synthetic.

## The Haar Wavelet applied over one image
We mentioned above that the Wavelet transformation is carried out by performing filtering operations over different scaled versions of the input image. More precisely, a low- and high-pass filtering is applied along the rows and columns and all four filtering combinations are considered, leading to the following types of filtered signals:
 * Low-pass along rows and columns (LL)
 * High-pass along rows and Low-pass along columns (HL)
 * Low-pass along rows and High-pass along columns (LH)
 * High-pass along rows and columns (HH)

The generation of these four filtered signals is usually referred to as Mallat decomposition. Each of the four resulting quadrants has half the width and half the height of the original image (i.e. a quarter of its pixels): thanks to the careful choice of the filters' impulse response coefficients, every other filtered sample along each direction can be discarded without losing information.

As mentioned, we consider the Haar Wavelet, which uses the following two kernels for the low- and high-pass filters:
 * Low-Pass kernel: $h_{LP} = \frac{1}{\sqrt{2}}\cdot[1, 1]$
 * High-Pass kernel: $h_{HP} = \frac{1}{\sqrt{2}}\cdot[1, -1]$
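
To make the two kernels concrete, here is a minimal 1D sketch of one Haar analysis step with downsampling by two, followed by the matching synthesis. The helper names `haar_analysis_1d` and `haar_synthesis_1d` and the test signal are illustrative assumptions, not part of the original notebook. Each kernel is applied as an inner product over non-overlapping sample pairs, so the high-pass output is $(x_{2n} - x_{2n+1})/\sqrt{2}$, which is the sign convention used in the 2D formulas of this tutorial.

```python
import numpy as np

# Haar analysis kernels as defined in the list above
h_lp = np.array([1.0, 1.0]) / np.sqrt(2.0)
h_hp = np.array([1.0, -1.0]) / np.sqrt(2.0)


def haar_analysis_1d(signal):
    """One level of 1D Haar analysis (hypothetical helper, even length assumed)."""
    pairs = np.asarray(signal, dtype=np.float64).reshape(-1, 2)
    low = pairs @ h_lp   # (x[2n] + x[2n+1]) / sqrt(2)
    high = pairs @ h_hp  # (x[2n] - x[2n+1]) / sqrt(2)
    return low, high


def haar_synthesis_1d(low, high):
    """Inverse of haar_analysis_1d: interleave the recovered even and odd samples."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    return np.stack([even, odd], axis=1).reshape(-1)


# Arbitrary test signal, chosen only for illustration
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0])
low, high = haar_analysis_1d(x)
x_rec = haar_synthesis_1d(low, high)

print("low :", low)
print("high:", high)
print("perfect reconstruction:", np.allclose(x, x_rec))
print("energy preserved:",
      np.isclose(np.sum(x ** 2), np.sum(low ** 2) + np.sum(high ** 2)))
```

Each output has half the samples of the input, yet the input is recovered exactly. This is the 1D counterpart of discarding half of the filtered pixels.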

Cascading these two kernels in all possible combinations highlights different image details:
 * Approximation of the original image (LL resulting signal)
 * Vertical edges (HL resulting signal)
 * Horizontal edges (LH resulting signal)
 * Diagonal -45 degree edge (HH resulting signal)

We are now ready to apply the Haar Wavelet to the usual `cameraman` grey scale image. We will simplify the filtering implementation by noting that for images, all combinations of cascaded filtering can be rewritten as follows:

$$
\large
\left[
    \begin{array}{cc}
    LL & HL \\
    LH & HH
    \end{array}
\right] = \frac{1}{2} \left[
    \begin{array}{cc}
    1 & 1 \\
    1 & -1
    \end{array}
\right] \cdot \left[
    \begin{array}{cc}
    P_0 & P_1 \\
    P_2 & P_3
    \end{array}
\right] \cdot \left[
    \begin{array}{cc}
    1 & 1 \\
    1 & -1
    \end{array}
\right] = \frac{1}{2} \left[
    \begin{array}{cc}
    P_0 + P_1 + P_2 + P_3 & P_0 + P_2 - (P_1 + P_3) \\
    (P_0 - P_2) + (P_1 - P_3) & P_0 - P_2 - (P_1 - P_3)
    \end{array}
\right],
$$

where $P_i$ denotes the set of pixels associated with the following:
 * $P_0$: even rows and columns
 * $P_1$: even rows and odd columns
 * $P_2$: odd rows and even columns
 * $P_3$: odd rows and columns
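
The matrix identity above can be checked numerically. The sketch below filters along the rows and then along the columns with the $1/\sqrt{2}$ kernels, and compares the result with the closed form based on $P_0, \ldots, P_3$. The random test image, the seed and the `*_sep` variable names are assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, test data only
img = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
s = 1.0 / np.sqrt(2.0)

# Step 1: filter along the rows (combine horizontally adjacent pixels)
row_low = (img[:, 0::2] + img[:, 1::2]) * s
row_high = (img[:, 0::2] - img[:, 1::2]) * s

# Step 2: filter along the columns (combine vertically adjacent samples)
LL_sep = (row_low[0::2, :] + row_low[1::2, :]) * s
LH_sep = (row_low[0::2, :] - row_low[1::2, :]) * s
HL_sep = (row_high[0::2, :] + row_high[1::2, :]) * s
HH_sep = (row_high[0::2, :] - row_high[1::2, :]) * s

# Closed form using the four pixel sets
P0, P1 = img[0::2, 0::2], img[0::2, 1::2]
P2, P3 = img[1::2, 0::2], img[1::2, 1::2]
LL = (P0 + P1 + P2 + P3) / 2
HL = ((P0 + P2) - (P1 + P3)) / 2
LH = ((P0 - P2) + (P1 - P3)) / 2
HH = ((P0 - P2) - (P1 - P3)) / 2

for name, separable, closed in [("LL", LL_sep, LL), ("HL", HL_sep, HL),
                                ("LH", LH_sep, LH), ("HH", HH_sep, HH)]:
    print(name, "matches:", np.allclose(separable, closed))
```

The two $1/\sqrt{2}$ factors, one per direction, combine into the single $1/2$ factor of the closed form.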

The following Python code cell performs the calculation of the 2D Haar Wavelet.

In [None]:
import cv2
import numpy as np

image = cv2.imread('../input-data/cameraman.tif', cv2.IMREAD_UNCHANGED).astype(np.float64)

P0 = image[0::2, 0::2]
P1 = image[0::2, 1::2]
P2 = image[1::2, 0::2]
P3 = image[1::2, 1::2]

LL = (P0 + P1 + P2 + P3) / 2
HL = ((P0 + P2) - (P1 + P3)) / 2
LH = ((P0 - P2) + (P1 - P3)) / 2
HH = ((P0 - P2) - (P1 - P3)) / 2

Now that we've computed the Haar over the input image we can visualise the resulting filtered pixels (i.e. the transform coefficients) to appreciate how each different filtering combination highlights the image details.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 15))
coefficients = [LL, HL, LH, HH]
labels = ['LL', 'HL', 'LH', 'HH']
for i, (coeff, label) in enumerate(zip(coefficients, labels), start=1):
    plt.subplot(2, 2, i)
    plt.imshow(coeff, cmap='gray')
    plt.title('Coefficients ' + label)

As may be noted, the $LL$ coefficients provide a lower (half) resolution version of the original image. Coefficients $HL$ and $LH$ are associated with vertical and horizontal edges, respectively. In fact, the image associated with $HL$ contains the vertical edges of the tall building in the background, whilst the one associated with $LH$ retains the horizontal details (e.g. the photographer's shoulder). Finally, coefficients $HH$ are associated with diagonal details: in the bottom right quadrant of the plot, the only edges represented are those associated with (e.g.) the camera's tripod.
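
The link between sub-bands and edge orientation can also be checked on synthetic inputs, where the answer is known in advance. The sketch below is illustrative only: the helper `haar_2d` and the three $8\times8$ test images are assumptions, and the edges are deliberately placed so that they fall inside a $2\times2$ pixel group. An edge aligned with the group boundaries would produce no detail coefficient at all, since the decimated Haar transform is not shift invariant.

```python
import numpy as np


def haar_2d(img):
    """One level of the 2D Haar transform (hypothetical helper, even sizes assumed)."""
    P0, P1 = img[0::2, 0::2], img[0::2, 1::2]
    P2, P3 = img[1::2, 0::2], img[1::2, 1::2]
    LL = (P0 + P1 + P2 + P3) / 2
    HL = ((P0 + P2) - (P1 + P3)) / 2
    LH = ((P0 - P2) + (P1 - P3)) / 2
    HH = ((P0 - P2) - (P1 - P3)) / 2
    return LL, HL, LH, HH


# Vertical edge between columns 2 and 3
vertical = np.zeros((8, 8))
vertical[:, 3:] = 255.0

# Horizontal edge between rows 2 and 3
horizontal = np.zeros((8, 8))
horizontal[3:, :] = 255.0

# Diagonal edge: bright strictly above the main diagonal
diagonal = np.triu(np.full((8, 8), 255.0), k=1)

for name, img in [("vertical", vertical), ("horizontal", horizontal),
                  ("diagonal", diagonal)]:
    _, HL, LH, HH = haar_2d(img)
    print(f"{name:10s} non-zero coefficients:",
          f"HL={np.count_nonzero(HL)}, LH={np.count_nonzero(LH)}, HH={np.count_nonzero(HH)}")
```

The vertical edge only excites $HL$ and the horizontal edge only excites $LH$. The diagonal edge is the only one of the three that produces non-zero $HH$ coefficients, although it also leaks into $HL$ and $LH$.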

From this simple graphical example one can easily see why the Wavelet decomposition is also widely used as an image feature in applications such as object detection. It is also worth noting that in the example above we computed just one level of decomposition of the Haar Wavelet (or one resolution, to use the terminology of the JPEG2000 standard). Nothing prevents us from re-applying the same Haar Wavelet over the LL samples and continuing until a given level of resolution (scaling) is reached. This is what image standards such as JPEG2000 or JPEG-XS do. The rationale is to keep decomposing in order to analyse which spatial details of the input image are carried forward through the different resolutions: these are usually the most important details, which should be encoded with the best affordable quality. We will see in the next example how the Haar Wavelet compares against the DCT.
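
Re-applying the transform over the LL samples can be sketched as follows. This is a minimal illustration under stated assumptions rather than what JPEG2000 or JPEG-XS actually implement: the helper names `haar_2d` and `haar_multilevel`, the random test image and the choice of three levels are all hypothetical, and the image dimensions are assumed to be divisible by $2^{levels}$.

```python
import numpy as np


def haar_2d(img):
    """One level of the 2D Haar transform (hypothetical helper, even sizes assumed)."""
    P0, P1 = img[0::2, 0::2], img[0::2, 1::2]
    P2, P3 = img[1::2, 0::2], img[1::2, 1::2]
    LL = (P0 + P1 + P2 + P3) / 2
    HL = ((P0 + P2) - (P1 + P3)) / 2
    LH = ((P0 - P2) + (P1 - P3)) / 2
    HH = ((P0 - P2) - (P1 - P3)) / 2
    return LL, HL, LH, HH


def haar_multilevel(img, levels):
    """Apply haar_2d repeatedly over the LL band.

    Returns the final LL band and a list of (HL, LH, HH) tuples, finest level first.
    """
    details = []
    ll = np.asarray(img, dtype=np.float64)
    for _ in range(levels):
        ll, hl, lh, hh = haar_2d(ll)
        details.append((hl, lh, hh))
    return ll, details


rng = np.random.default_rng(1)  # arbitrary seed, test data only
img = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
ll, details = haar_multilevel(img, levels=3)

print("final LL shape:", ll.shape)
for level, (hl, lh, hh) in enumerate(details, start=1):
    print(f"level {level}: detail sub-band shape {hl.shape}")

# The transform is orthonormal, so the total energy is preserved across levels
energy = np.sum(ll ** 2) + sum(np.sum(band ** 2) for bands in details for band in bands)
print("energy preserved:", np.isclose(energy, np.sum(img ** 2)))
```

Each additional level halves the width and height of the LL band, so three levels over a $64\times64$ input leave an $8\times8$ approximation.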

## Simple image coding using Haar Wavelet and DCT
We now take a step forward and evaluate the coding efficiency of the Haar Wavelet and the DCT. Accordingly, we consider two simple image codecs, both splitting the input image into a grid of non-overlapping $8\times8$ blocks. Over each block, either one resolution level of the 2D Haar Wavelet or the 2D DCT is computed, depending on the codec. For both codecs, only the top left $4\times4$ quadrant of the resulting transform coefficients is retained and transmitted using Pulse Code Modulation (PCM). Given that both codecs operate at a fixed rate (due to the use of PCM), we can compare their coding efficiency in terms of the reconstructed image quality. The following Python code cell provides a function implementing the processing associated with these simple image codecs. The function returns two `numpy` 2D arrays containing the reconstructed pixels associated with both codecs.


In [None]:
import math
from typing import Any, Tuple
from nptyping import NDArray

def two_simple_image_codecs(image: NDArray[(Any, Any), np.float64], B: int = 8) -> Tuple[NDArray[(Any, Any), np.uint8], NDArray[(Any, Any), np.uint8]]:
    rows, cols = image.shape[0], image.shape[1]
    HB = B // 2

    rows_blocks_units, cols_block_units = rows // B, cols // B

    rec_haar = np.zeros((rows, cols), np.int32)
    rec_dct = np.zeros((rows, cols), np.int32)
    block_selector = np.zeros((B, B), np.float64)
    block_selector[:HB, :HB] = 1

    # Compute the DCT transformation matrix
    m1, m2 = np.meshgrid(range(B), range(B))
    normaliser = np.ones((B, B), np.float64)
    normaliser[0, ::] = 1.0 / math.sqrt(B)
    normaliser[1::, ::] = math.sqrt(2.0 / B)
    cosine_basis = np.cos(np.multiply(m2, 2.0 * m1 + 1.0)*np.pi / (2.0 * B))
    T = np.multiply(cosine_basis, normaliser)
    Tt = np.transpose(T)

    for r in range(rows_blocks_units):
        rows_sel = slice(r * B, (r + 1) * B)
        for c in range(cols_block_units):
            cols_sel = slice(c * B, (c + 1) * B)
            block = image[rows_sel, cols_sel]

            # 2D Type-II DCT
            block_dct = np.matmul(T, np.matmul(block, Tt))

            # 2D Haar with one level of decomposition
            block_haar = np.zeros((B, B), np.float64)
            P0 = block[0::2, 0::2]
            P1 = block[0::2, 1::2]
            P2 = block[1::2, 0::2]
            P3 = block[1::2, 1::2]
            block_haar[0:HB, 0:HB] = (P0 + P1 + P2 + P3) / 2  # LL
            block_haar[0:HB, HB:] = ((P0 + P2) - (P1 + P3)) / 2  # HL
            block_haar[HB:, 0:HB] = ((P0 - P2) + (P1 - P3)) / 2  # LH
            block_haar[HB:, HB:] = ((P0 - P2) - (P1 - P3)) / 2  # HH

            # Retain only the top left quadrant of the resulting coefficients
            block_dct = np.multiply(block_dct, block_selector)
            block_haar = np.multiply(block_haar, block_selector)

            # Inverse DCT
            rec_dct_block = np.matmul(Tt, np.matmul(block_dct, T))
            rec_dct[rows_sel, cols_sel] = np.clip(rec_dct_block, 0, 255).astype(np.int32)
            
            # Inverse Haar: the transform is its own inverse, so the same
            # butterfly maps the four sub-bands back to the pixel sets P0..P3
            H00 = block_haar[0:HB, 0:HB]  # LL
            H01 = block_haar[0:HB, HB:]  # HL
            H10 = block_haar[HB:, 0:HB]  # LH
            H11 = block_haar[HB:, HB:]  # HH
            R0 = (H00 + H01 + H10 + H11) / 2  # even rows, even columns
            R1 = ((H00 + H10) - (H01 + H11)) / 2  # even rows, odd columns
            R2 = ((H00 - H10) + (H01 - H11)) / 2  # odd rows, even columns
            R3 = ((H00 - H10) - (H01 - H11)) / 2  # odd rows, odd columns
            rec_haar_block = np.zeros((B, B), np.float64)
            rec_haar_block[0::2, 0::2] = R0
            rec_haar_block[0::2, 1::2] = R1
            rec_haar_block[1::2, 0::2] = R2
            rec_haar_block[1::2, 1::2] = R3
            rec_haar[rows_sel, cols_sel] = np.clip(rec_haar_block, 0, 255).astype(np.int32)

    return rec_dct.astype(np.uint8), rec_haar.astype(np.uint8)
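
A side note on the Haar branch of the function above: the retained top left quadrant is exactly the LL sub-band, so the inverse transform assigns the value $LL/2$ to all four pixels of each $2\times2$ group. In other words, the reconstruction amounts to $2\times2$ averaging followed by nearest-neighbour upsampling. The sketch below checks this equivalence on a single random $8\times8$ block. The test data and variable names are assumptions, and the clipping and integer conversion of the codec are left out.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, test data only
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

# Forward Haar, keeping LL only (HL, LH and HH are set to zero by the codec)
LL = (block[0::2, 0::2] + block[0::2, 1::2]
      + block[1::2, 0::2] + block[1::2, 1::2]) / 2

# Inverse Haar with HL = LH = HH = 0: every pixel of a 2x2 group gets LL / 2
rec = np.zeros_like(block)
rec[0::2, 0::2] = LL / 2
rec[0::2, 1::2] = LL / 2
rec[1::2, 0::2] = LL / 2
rec[1::2, 1::2] = LL / 2

# Same result via 2x2 averaging followed by nearest-neighbour upsampling
means = block.reshape(4, 2, 4, 2).mean(axis=(1, 3))
upsampled = np.repeat(np.repeat(means, 2, axis=0), 2, axis=1)

print("LL-only Haar equals average + nearest-neighbour upsampling:",
      np.allclose(rec, upsampled))
```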

We can use the function above to encode different types of content. We start with the natural image considered since the beginning of this tutorial: `cameraman`. We then show the reconstructed output to appreciate the different types of artefacts introduced by the two frequency transforms considered.

In [None]:
rec_dct, rec_haar = two_simple_image_codecs(image)
plt.figure(figsize=(25, 25))
plt.subplot(1, 3, 1), plt.imshow(image, cmap='gray'), plt.title('Original')
plt.subplot(1, 3, 2), plt.imshow(rec_dct, cmap='gray'), plt.title('DCT')
plt.subplot(1, 3, 3), plt.imshow(rec_haar, cmap='gray'), plt.title('Haar Wavelet (1 level)');


From the reconstructed images above we can see that the DCT tends to introduce more blocking than the Haar Wavelet, most notably in areas with a high level of detail (e.g. around the camera). On the other hand, the Haar Wavelet tends to blur more and to introduce ringing-like patterns around sharp edges (e.g. the camera's tripod or the man's coat). The Haar Wavelet's artefacts are easily explained by observing that the simple image coding algorithm implemented performs a nearest neighbour upsampling over the LL coefficients. This kind of upsampling is the simplest possible and inevitably leads to annoying artefacts, especially around sharp edges.

One would be tempted to conclude that, for the problem at hand, the DCT solution is the way to go. Of course one could use better Wavelet kernels (e.g. LeGall or Cohen-Daubechies-Feauveau (CDF) as in JPEG2000), but then the trade-off between complexity and coding efficiency needs to be investigated. Sticking to the simplest Wavelet implementation (Haar), we wonder whether there are cases where the Haar Wavelet can provide a better reconstructed quality than the DCT.

Accordingly, we start by recalling the common knowledge among those skilled in the art that the DCT does not cope very well with text-like sharp edges. In fact, one would expect these edges to spread over several high frequency coefficients, which are then removed by the particular type of compression algorithm used. We test this assumption by calling the `two_simple_image_codecs` function over a desktop screenshot image showing text (assumed to be a grey scale image for the sake of simplicity). The following Python code cell runs the encoding and computes the Peak-Signal-to-Noise-Ratio (PSNR) to compare the quality from an objective point of view. The cell can also save both images (once the last two lines are uncommented) so that the reader can compare them with any image viewer utility.

In [None]:
# Read the thesis image
image_sc = cv2.imread('../input-data/thesis.png', cv2.IMREAD_UNCHANGED).astype(np.float64)

# Compress
rec_dct_sc, rec_haar_sc = two_simple_image_codecs(image_sc)

# Compute the PSNR
mse_dct = np.average(np.square(rec_dct_sc.astype(np.float64) - image_sc))
mse_haar = np.average(np.square(rec_haar_sc.astype(np.float64) - image_sc))
psnr_dct = 10 * np.log10(255**2 / mse_dct)
psnr_haar = 10 * np.log10(255**2 / mse_haar)
print(f"PSNR-DCT: {psnr_dct:.2f} [dB], PSNR-HAAR: {psnr_haar:.2f} [dB]")

# Save the reconstructed images onto disk (uncomment to enable the saving)
#cv2.imwrite('reconstructed_dct.png', rec_dct_sc)
#cv2.imwrite('reconstructed_haar.png', rec_haar_sc);

We can already see from the PSNR values that, once again, the DCT did a better job than the Haar Wavelet. A quick visual inspection of the saved images confirms the PSNR difference, with the image reconstructed by the Haar Wavelet having more blurred text. We mentioned above that the DCT should not cope well with graphics content. If this has not shown up so far, it is because the coding blocks used are still fairly small ($8\times8$). So we now want to know how the DCT performs when the block size increases to (say) $64\times64$.

In [None]:
# Compress
rec_dct_sc, rec_haar_sc = two_simple_image_codecs(image_sc, 64)

# Compute the PSNR
mse_dct = np.average(np.square(rec_dct_sc.astype(np.float64) - image_sc))
mse_haar = np.average(np.square(rec_haar_sc.astype(np.float64) - image_sc))
psnr_dct = 10 * np.log10(255**2 / mse_dct)
psnr_haar = 10 * np.log10(255**2 / mse_haar)
print(f"PSNR-DCT: {psnr_dct:.2f} [dB], PSNR-HAAR: {psnr_haar:.2f} [dB]")

# Save the reconstructed images onto disk
cv2.imwrite('reconstructed_dct_64.png', rec_dct_sc)
cv2.imwrite('reconstructed_haar_64.png', rec_haar_sc);

Although there is still a difference in PSNR values (which may suggest that the DCT is still doing better), a visual inspection reveals annoying ringing artefacts around the edges in the image reconstructed by the DCT.
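
The origin of this ringing can be reproduced on a 1D step edge, which is a rough stand-in for one row crossing a text stroke. The sketch below is illustrative only: the helper `dct_matrix`, the edge position and the signal are assumptions. It keeps the lower half of the 64 DCT coefficients, which mirrors per dimension what the codec above retains, and compares the result with keeping only the Haar low-pass samples.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal Type-II DCT matrix of size n x n (hypothetical helper)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    m = np.arange(n).reshape(1, -1)  # sample index (columns)
    T = np.sqrt(2.0 / n) * np.cos(k * (2.0 * m + 1.0) * np.pi / (2.0 * n))
    T[0, :] = 1.0 / np.sqrt(n)
    return T


n = 64
T = dct_matrix(n)

# Step edge between samples 36 and 37 (arbitrary position)
step = np.zeros(n)
step[37:] = 255.0

# DCT: keep the lower half of the coefficients, then invert
coeffs = T @ step
coeffs[n // 2:] = 0.0
rec_dct_1d = T.T @ coeffs

# Haar: keep the low-pass samples only, i.e. pair averages repeated twice
rec_haar_1d = np.repeat(step.reshape(-1, 2).mean(axis=1), 2)

print("DCT  reconstruction range:", rec_dct_1d.min(), rec_dct_1d.max())
print("Haar reconstruction range:", rec_haar_1d.min(), rec_haar_1d.max())
```

The truncated DCT oscillates around the edge and overshoots the original $[0, 255]$ range, whereas the Haar reconstruction stays within it and only smears the edge over one pair of samples. In the codec the overshoot is clipped, but the oscillations within the valid range remain visible.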

In [None]:
roi = [420, 580, 120, 420]
rows_sel = slice(roi[0], roi[0] + roi[2])
cols_sel = slice(roi[1], roi[1] + roi[3])
region_dct = rec_dct_sc[rows_sel, cols_sel]
region_haar = rec_haar_sc[rows_sel, cols_sel]

plt.figure(figsize=(25, 25))
plt.subplot(1, 2, 1), plt.imshow(region_dct, cmap='gray'), plt.title('DCT')
plt.subplot(1, 2, 2), plt.imshow(region_haar, cmap='gray'), plt.title('Haar');

In this case, we can notice that the DCT starts to show some coding artefacts. One might wonder why bother with larger blocks at all. Larger blocks can help the entropy coding stage when a scheme such as run-length encoding is selected: if the entropy coding needs to be confined within the coding block, the longer the runs of zeros, the more efficient the encoder will be.
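
The effect of the block size on zero runs can be illustrated with a toy count. The sketch below rests on several assumptions: the helpers `zero_runs` and `retained_block` are hypothetical, the retained coefficients are random non-zero integers, and the scan is a plain row-by-row one rather than the zig-zag or similar scans found in practical codecs. It compares one $64\times64$ block with sixty-four $8\times8$ blocks, which contain the same number of zeros.

```python
import numpy as np


def zero_runs(symbols):
    """Return the lengths of the runs of consecutive zeros in a 1D sequence."""
    runs, current = [], 0
    for s in symbols:
        if s == 0:
            current += 1
        elif current > 0:
            runs.append(current)
            current = 0
    if current > 0:
        runs.append(current)
    return runs


def retained_block(B, rng):
    """B x B block of toy coefficients, non-zero in the top left quadrant only."""
    block = np.zeros((B, B), dtype=np.int64)
    block[:B // 2, :B // 2] = rng.integers(1, 10, size=(B // 2, B // 2))
    return block


rng = np.random.default_rng(3)  # arbitrary seed, test data only

# One 64x64 block, scanned row by row
runs_large = zero_runs(retained_block(64, rng).ravel())

# Sixty-four 8x8 blocks covering the same area, each scanned independently
runs_small = []
for _ in range(64):
    runs_small.extend(zero_runs(retained_block(8, rng).ravel()))

print("64x64 block :", len(runs_large), "runs covering", sum(runs_large), "zeros")
print("8x8 blocks  :", len(runs_small), "runs covering", sum(runs_small), "zeros")
```

The same number of zeros is described with far fewer runs when the block is large, which is what a run-length coder confined to the block can exploit.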

## Concluding remarks
We analysed the properties of the Wavelet decomposition according to the Mallat scheme, which proved useful to highlight particular features of an image. We then compared the coding efficiency of a simple transform-based image encoder using either the DCT or the Haar Wavelet. Over natural and text images the DCT-based codec provided a better quality, both visually and in terms of PSNR. The Wavelet transform starts to offer a more graceful quality degradation when large blocks are used instead.