📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/media-processing-workshop](https://github.com/mr-pylin/media-processing-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Load Images](#toc2_)    
- [Image Compression](#toc3_)    
  - [Joint Photographic Experts Group (JPEG)](#toc3_1_)    
    - [JPEG (aka JPEG-1)](#toc3_1_1_)    
      - [Encoder](#toc3_1_1_1_)    
        - [Color Space Conversion](#toc3_1_1_1_1_)    
        - [Chroma Subsampling (4:2:0)](#toc3_1_1_1_2_)    
        - [Level Shifting](#toc3_1_1_1_3_)    
        - [Block Splitting](#toc3_1_1_1_4_)    
        - [Forward Discrete Cosine Transform (DCT)](#toc3_1_1_1_5_)    
        - [Quantization](#toc3_1_1_1_6_)    
        - [Zigzag Scanning](#toc3_1_1_1_7_)    
        - [Differential Pulse-Code Modulation (DPCM)](#toc3_1_1_1_8_)    
        - [Run-Length Encoding (RLE)](#toc3_1_1_1_9_)    
        - [Entropy Coding: Huffman encoding (or arithmetic encoding)](#toc3_1_1_1_10_)    
        - [Bitstream Packaging](#toc3_1_1_1_11_)    
      - [Decoder](#toc3_1_1_2_)    
        - [Bitstream Parsing](#toc3_1_1_2_1_)    
        - [Entropy Decoding: Huffman decoding (or arithmetic decoding)](#toc3_1_1_2_2_)    
        - [Differential Pulse-Code Modulation (DPCM)](#toc3_1_1_2_3_)    
        - [Run-Length Encoding (RLE)](#toc3_1_1_2_4_)    
        - [Inverse Zigzag](#toc3_1_1_2_5_)    
        - [Dequantization](#toc3_1_1_2_6_)    
        - [Inverse Discrete Cosine Transform (IDCT)](#toc3_1_1_2_7_)    
        - [Block Joining](#toc3_1_1_2_8_)    
        - [Level Shifting](#toc3_1_1_2_9_)    
        - [Chroma Upsampling (4:4:4)](#toc3_1_1_2_10_)    
        - [Color Space Conversion](#toc3_1_1_2_11_)    
    - [JPEG 2000 (JPEG-2)](#toc3_1_2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from numpy.typing import NDArray

In [None]:
np.set_printoptions(linewidth=160)

# <a id='toc2_'></a>[Load Images](#toc0_)


In [None]:
im_1_path = "../assets/images/misc/CH02_Fig0222(b)(cameraman).jpg"
im_1 = plt.imread(fname=im_1_path)
im_1_fs = Path(im_1_path).stat().st_size

im_2_path = "../assets/images/misc/CH06_Fig0638(a)(lenna_RGB).jpg"
im_2 = plt.imread(fname=im_2_path)
im_2_fs = Path(im_2_path).stat().st_size

In [None]:
# plot
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(10, 4), layout="compressed")
axs[0].imshow(im_1, cmap="gray")
axs[0].set_title(f"{np.prod(im_1.shape) / 1024:,} KiB -> {im_1_fs / 1024:,.2f} KiB")
axs[0].axis("off")
axs[1].imshow(im_2)
axs[1].set_title(f"{np.prod(im_2.shape) / 1024:,} KiB -> {im_2_fs / 1024:,.2f} KiB")
axs[1].axis("off")
plt.show()

# <a id='toc3_'></a>[Image Compression](#toc0_)

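Image compression is achieved by **reducing redundancies** present in an image, such as coding redundancy, spatial (inter-pixel) redundancy, and detail the human eye does not perceive.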

📝 **Docs**:

- ITU-R Recommendation BT.601-7 [an international technical standard]: [itu.int/dms_pubrec/itu-r/rec/bt/r-rec-bt.601-7-201103-i!!pdf-e.pdf](http://itu.int/dms_pubrec/itu-r/rec/bt/r-rec-bt.601-7-201103-i!!pdf-e.pdf)
- Check out the [**color-space.ipynb**](./utils/color-space.ipynb) and [**compression.ipynb**](./utils/compression.ipynb) notebooks for comprehensive information on the topic of image compression.


## <a id='toc3_1_'></a>[Joint Photographic Experts Group (JPEG)](#toc0_)


### <a id='toc3_1_1_'></a>[JPEG (aka JPEG-1)](#toc0_)


#### <a id='toc3_1_1_1_'></a>[Encoder](#toc0_)


##### <a id='toc3_1_1_1_1_'></a>[Color Space Conversion](#toc0_)

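JPEG compression starts by converting the image from **RGB** to **YCbCr** color space using the full-range **ITU-R BT.601** matrix.  
The transform itself is invertible, so the only loss in this step comes from rounding the result to 8-bit integers. Separating brightness from color data is what makes the later subsampling and quantization steps efficient.  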

**Components of YCbCr in JPEG**  
- **Y (Luminance)**:  
  - Represents brightness (perceptually weighted sum of RGB).  
  - Formula:  
    $$
    Y = 0.299R + 0.587G + 0.114B
    $$  
  - Range: `[0, 255]` (no offset).  

- **Cb (Chrominance-Blue)**:  
  - Encodes the difference between blue and luma.  
  - Formula:  
    $$
    Cb = -0.168736R - 0.331264G + 0.5B + 128
    $$  
  - Range: `[0, 255]` (centered at **128**, equivalent to `[-128, 127]`).  

- **Cr (Chrominance-Red)**:  
  - Encodes the difference between red and luma.  
  - Formula:  
    $$
    Cr = 0.5R - 0.418688G - 0.081312B + 128
    $$  
  - Range: Same as Cb.  

**Matrix Form (Standard JPEG Conversion)**  
$$
\begin{bmatrix}
Y \\ Cb \\ Cr
\end{bmatrix}
=
\begin{bmatrix}
0.299 & 0.587 & 0.114 \\
-0.168736 & -0.331264 & 0.5 \\
0.5 & -0.418688 & -0.081312
\end{bmatrix}
\cdot
\begin{bmatrix}
R \\ G \\ B
\end{bmatrix}
+
\begin{bmatrix}
0 \\ 128 \\ 128
\end{bmatrix}
$$
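
As a quick sanity check of the matrix above, the following sketch converts a few individual pixels in plain Python. The helper name `rgb_pixel_to_ycbcr` is illustrative and not part of the notebook, and rounding plus clipping to 8 bits is an assumption about how the result is stored.

```python
def rgb_pixel_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to full-range YCbCr (illustrative helper)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128

    # round to the nearest integer and clip to the 8-bit range
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


samples = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "gray": (128, 128, 128),
    "red": (255, 0, 0),
}

for name, rgb in samples.items():
    print(f"{name:>5}: RGB={rgb} -> YCbCr={rgb_pixel_to_ycbcr(*rgb)}")
```

Neutral colors (white, black, gray) map to Cb = Cr = 128, which is why the chroma channels are described as centered at 128. Pure red pushes Cr to 255.5 before clipping, which shows why the clip is needed.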


In [None]:
def rgb_to_ycbcr_jpeg(rgb_img: NDArray) -> NDArray[np.uint8]:
    """
    Convert an RGB image to YCbCr color space using the JPEG BT.601 standard.

    Args:
        rgb_img (NDArray[int, int, 3]):
            RGB image with shape (H, W, 3).
            Dtype can be uint8 ([0, 255]) or float32 ([0.0, 1.0] scaled to [0, 255]).

    Returns:
        NDArray[int, int, 3]:
            YCbCr image with shape (H, W, 3), dtype uint8, values in [0, 255].
    """
    # standard JPEG conversion matrix (BT.601)
    transform = np.array(
        [
            [0.299, 0.587, 0.114],
            [-0.168736, -0.331264, 0.5],
            [0.5, -0.418688, -0.081312],
        ],
        dtype=np.float32,
    )

    bias = np.array([0, 128, 128], dtype=np.float32)

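    # floating-point input is assumed to lie in [0.0, 1.0]; scale it to [0, 255]
    rgb = rgb_img.astype(np.float32)
    if np.issubdtype(rgb_img.dtype, np.floating):
        rgb *= 255.0

    # apply transformation (the matrix product broadcasts over H and W)
    ycbcr = rgb @ transform.T + bias

    # round to nearest (astype alone truncates), then clip to [0, 255]
    return np.clip(np.round(ycbcr), 0, 255).astype(np.uint8)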

In [None]:
im_2_ycbcr = rgb_to_ycbcr_jpeg(im_2)

# split channels
Y, Cb, Cr = im_2_ycbcr[..., 0], im_2_ycbcr[..., 1], im_2_ycbcr[..., 2]

# log
print(f"Y.shape :{Y.shape} | Y.dtype :{Y.dtype} | Y.min() :{Y.min():6.02f} | Y.max() :{Y.max():6.02f}")
print(f"Cb.shape:{Cb.shape} | Cb.dtype:{Cb.dtype} | Cb.min():{Cb.min():6.02f} | Cb.max():{Cb.max():6.02f}")
print(f"Cr.shape:{Cr.shape} | Cr.dtype:{Cr.dtype} | Cr.min():{Cr.min():6.02f} | Cr.max():{Cr.max():6.02f}")

In [None]:
# plot
fig, axs = plt.subplots(1, 4, figsize=(18, 5), layout="compressed")

axs[0].imshow(im_2)
axs[0].set_title("Original RGB")

im1 = axs[1].imshow(Y, cmap="gray", vmin=0, vmax=255)
axs[1].set_title("Y (Luminance)")
plt.colorbar(im1, ax=axs[1], fraction=0.046, pad=0.04)

im2 = axs[2].imshow(Cb, cmap="coolwarm", vmin=0, vmax=255)
axs[2].set_title("Blue Chrominance (Cb)")
plt.colorbar(im2, ax=axs[2], fraction=0.046, pad=0.04)

im3 = axs[3].imshow(Cr, cmap="PiYG", vmin=0, vmax=255)
axs[3].set_title("Red Chrominance (Cr)")
plt.colorbar(im3, ax=axs[3], fraction=0.046, pad=0.04)

for ax in axs:
    ax.axis("off")
plt.show()

##### <a id='toc3_1_1_1_2_'></a>[Chroma Subsampling (4:2:0)](#toc0_)

In the **YCbCr** color space, the **Y (luminance)** channel is preserved at full resolution, while the **Cb** and **Cr (chrominance)** channels are downsampled.  
This leverages the fact that the human visual system is more sensitive to brightness than color.

The most common subsampling scheme in JPEG is **4:2:0**:
- Y: full resolution
- Cb, Cr: half resolution **in both horizontal and vertical directions**
- More details: [**compression.ipynb**](./utils/compression.ipynb)

This significantly reduces data size with little perceptible impact.

**Common 4:2:0 subsampling methods**:

- **Decimation (point sampling):**
  - Keeps only the top-left pixel of each 2×2 block.
  - It is the simplest method and the one implemented in this notebook, but it can introduce aliasing.
  
- **Averaging (mean pooling):**
  - Replaces each 2×2 block with the average of its four pixels.
  - This method is smoother and slightly more accurate; libjpeg's default downsampler averages each block in this way.
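
The two strategies can be compared on a tiny 4×4 chroma patch. This is a plain-Python sketch; `decimate_2x2` and `average_2x2` are illustrative names, and the input is assumed to have even dimensions.

```python
def decimate_2x2(img: list[list[int]]) -> list[list[int]]:
    """Keep the top-left sample of every 2x2 block."""
    return [row[::2] for row in img[::2]]


def average_2x2(img: list[list[int]]) -> list[list[int]]:
    """Replace every 2x2 block with the rounded mean of its four samples."""
    return [
        [
            round((img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4)
            for c in range(0, len(img[0]), 2)
        ]
        for r in range(0, len(img), 2)
    ]


patch = [
    [100, 104, 200, 200],
    [108, 112, 200, 200],
    [50, 50, 10, 30],
    [50, 50, 50, 70],
]

print(decimate_2x2(patch))  # [[100, 200], [50, 10]]
print(average_2x2(patch))   # [[106, 200], [50, 40]]
```

Both outputs hold a quarter of the original samples. Averaging uses all four values of each block, while decimation ignores three of them, which is visible in the bottom-right block (10 versus 40).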


In [None]:
def subsample_420(chroma_img: NDArray) -> NDArray[np.uint8]:
    """
    Perform 4:2:0 chroma subsampling by decimation using NumPy.

    Args:
        chroma_img: Chroma channel image (Cr or Cb), values in [0, 255],
                    dtype either uint8 or float32.

    Returns:
        NDArray[uint8]: Subsampled chroma image with shape (ceil(H / 2), ceil(W / 2)),
                           dtype uint8.
    """
    if chroma_img.dtype != np.uint8:
        chroma_img = np.round(chroma_img).astype(np.uint8)

    # pad to even dimensions (replicate the last row/column)
    h, w = chroma_img.shape
    if h % 2:
        chroma_img = np.pad(chroma_img, ((0, 1), (0, 0)), mode="edge")
    if w % 2:
        chroma_img = np.pad(chroma_img, ((0, 0), (0, 1)), mode="edge")

    # decimate: keep the top-left sample of each 2×2 block
    return chroma_img[::2, ::2]

In [None]:
Cb_sub = subsample_420(Cb)
Cr_sub = subsample_420(Cr)

# log
print(f"Y.shape:{Y.shape} | Cb_sub.shape:{Cb_sub.shape} | Cr_sub.shape:{Cr_sub.shape}")

In [None]:
# plot
H1, W1 = Cb.shape
H2, W2 = Cb_sub.shape
Cb_shifted_sub_pad = np.pad(Cb_sub, ((0, H1-H2), (0, W1-W2)))
Cr_shifted_sub_pad = np.pad(Cr_sub, ((0, H1-H2), (0, W1-W2)))

fig, axs = plt.subplots(1, 4, figsize=(16, 5), layout="compressed")
axs[0].imshow(Cb, cmap="coolwarm", vmin=0, vmax=255)
axs[0].set_title(f"Original Cb : {Cb.shape}")
axs[0].axis("off")
axs[1].imshow(Cb_shifted_sub_pad, cmap="coolwarm", vmin=0, vmax=255)
axs[1].set_title(f"Cb (4:2:0 Subsampled) : {Cb_sub.shape}")
axs[1].axis("off")
axs[2].imshow(Cr, cmap="PiYG", vmin=0, vmax=255)
axs[2].set_title(f"Original Cr : {Cr.shape}")
axs[2].axis("off")
axs[3].imshow(Cr_shifted_sub_pad, cmap="PiYG", vmin=0, vmax=255)
axs[3].set_title(f"Cr (4:2:0 Subsampled) : {Cr_sub.shape}")
axs[3].axis("off")
plt.show()

In [None]:
# plot
Cb_sub_up = Cb_sub.repeat(2, axis=0).repeat(2, axis=1)
Cr_sub_up = Cr_sub.repeat(2, axis=0).repeat(2, axis=1)

fig, axs = plt.subplots(1, 4, figsize=(16, 5), layout="compressed")
axs[0].imshow(Cb, cmap="coolwarm", vmin=0, vmax=255)
axs[0].set_title(f"Original Cb : {Cb.shape}")
axs[0].axis("off")
axs[1].imshow(Cb_sub_up, cmap="coolwarm", vmin=0, vmax=255)
axs[1].set_title(f"Upsampled Cb : {Cb_sub_up.shape}")
axs[1].axis("off")
axs[2].imshow(Cr, cmap="PiYG", vmin=0, vmax=255)
axs[2].set_title(f"Original Cr : {Cr.shape}")
axs[2].axis("off")
axs[3].imshow(Cr_sub_up, cmap="PiYG", vmin=0, vmax=255)
axs[3].set_title(f"Upsampled Cr : {Cr_sub_up.shape}")
axs[3].axis("off")
plt.show()

##### <a id='toc3_1_1_1_3_'></a>[Level Shifting](#toc0_)

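Before applying the **Discrete Cosine Transform (DCT)**, JPEG performs **level shifting**: it subtracts 128 from every 8-bit sample so that values lie in **[-128, 127]** instead of [0, 255].  
Centering the input **around zero** reduces the magnitude of the DC coefficient and therefore the dynamic range that the later stages must handle.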

$$
I_{\text{shifted}}(x, y) = I(x, y) - 128
$$
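
The effect of the shift can be seen from the DC term. A short derivation, assuming the orthonormal 8×8 DCT-II (the normalization used for the forward DCT in this notebook):

$$
F_{\text{shifted}}(0, 0) = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} \bigl( I(x, y) - 128 \bigr) = F(0, 0) - \frac{64 \cdot 128}{8} = F(0, 0) - 1024
$$

The AC coefficients are unchanged, because every non-constant cosine basis function sums to zero over the block. For 8-bit input, level shifting therefore only moves the DC coefficient from the range [0, 2040] to [-1024, 1016].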


In [None]:
# level shift: convert from uint8 [0, 255] to int16 [-128, 127]
Y_shifted = Y.astype(np.int16) - 128
Cb_sub_shifted = Cb_sub.astype(np.int16) - 128
Cr_sub_shifted = Cr_sub.astype(np.int16) - 128

# log
print(
    f"Y_shifted.min()     :{Y_shifted.min():7.02f} | "
    f"Y_shifted.max()     :{Y_shifted.max():7.02f} | "
    f"Y_shifted.mean()     :{Y_shifted.mean():7.02f} | "
    f"Y_shifted.std()     :{Y_shifted.std():7.02f}"
)
print(
    f"Cb_sub_shifted.min():{Cb_sub_shifted.min():7.02f} | "
    f"Cb_sub_shifted.max():{Cb_sub_shifted.max():7.02f} | "
    f"Cb_sub_shifted.mean():{Cb_sub_shifted.mean():7.02f} | "
    f"Cb_sub_shifted.std():{Cb_sub_shifted.std():7.02f}"
)
print(
    f"Cr_sub_shifted.min():{Cr_sub_shifted.min():7.02f} | "
    f"Cr_sub_shifted.max():{Cr_sub_shifted.max():7.02f} | "
    f"Cr_sub_shifted.mean():{Cr_sub_shifted.mean():7.02f} | "
    f"Cr_sub_shifted.std():{Cr_sub_shifted.std():7.02f}"
)

##### <a id='toc3_1_1_1_4_'></a>[Block Splitting](#toc0_)

Once the image is in the **YCbCr** color space (with optional chroma subsampling applied), JPEG divides each channel (**Y**, **Cb**, **Cr**) into non-overlapping **8×8 blocks**.

This step is essential because:
- The next stages (DCT, quantization, entropy coding) operate **block-wise**
- Working in small spatial regions improves compression by localizing frequency information

If the image dimensions are **not divisible by 8**, JPEG pads the image (usually by repeating the edge pixels).

> 📐 Each channel is reshaped independently into blocks of shape (H//8, W//8, 8, 8), which allows easy vectorized processing in later steps.
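
The padding and the resulting block count follow from simple integer arithmetic. The sketch below uses a hypothetical helper `block_grid`, and the sizes are chosen for illustration rather than read from the loaded images.

```python
import math


def block_grid(height: int, width: int, block_size: int = 8) -> tuple[int, int, int, int]:
    """Return (pad_h, pad_w, blocks_h, blocks_w) for a channel of the given size."""
    blocks_h = math.ceil(height / block_size)
    blocks_w = math.ceil(width / block_size)
    pad_h = blocks_h * block_size - height
    pad_w = blocks_w * block_size - width
    return pad_h, pad_w, blocks_h, blocks_w


for h, w in [(512, 512), (256, 256), (250, 301)]:
    pad_h, pad_w, bh, bw = block_grid(h, w)
    print(f"{h}x{w}: pad=({pad_h}, {pad_w}) -> {bh}x{bw} = {bh * bw} blocks")
```

A 512×512 luma channel yields 4096 blocks, while its 4:2:0 chroma channels (256×256) yield 1024 blocks each. The 250×301 case needs 6 extra rows and 3 extra columns before it can be split.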


In [None]:
def split_into_blocks(image: NDArray, block_size: int = 8) -> NDArray[np.int16]:
    """
    Split an image into non-overlapping square blocks.

    Args:
        image: Input 2D image (Y, Cb_sub, or Cr_sub), shape (H, W).
        block_size: Size of each block (default 8 for JPEG).

    Returns:
        NDArray[np.int16]: Array of shape (num_blocks, block_size, block_size),
                           with dtype int16.
    """
    if image.dtype != np.int16:
        image = image.astype(np.int16)

    # pad image if not divisible by block_size
    pad_h = (block_size - image.shape[0] % block_size) % block_size
    pad_w = (block_size - image.shape[1] % block_size) % block_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="edge")

    # carve into (H//block_size, block_size, W//block_size, block_size), then reorder to (N_blocks_h, N_blocks_w, block_size, block_size)
    h, w = padded.shape
    blocks = padded.reshape(h // block_size, block_size, w // block_size, block_size)
    blocks = blocks.transpose(0, 2, 1, 3)

    # flatten to (N_blocks, block_size, block_size)
    return blocks.reshape(-1, block_size, block_size)

In [None]:
Y_blocks = split_into_blocks(Y_shifted)
Cb_blocks = split_into_blocks(Cb_sub_shifted)
Cr_blocks = split_into_blocks(Cr_sub_shifted)

# log
print(f"Y_blocks.shape  :{Y_blocks.shape}")
print(f"Cb_blocks.shape :{Cb_blocks.shape}")
print(f"Cr_blocks.shape :{Cr_blocks.shape}")

##### <a id='toc3_1_1_1_5_'></a>[Forward Discrete Cosine Transform (DCT)](#toc0_)

Each **8×8 block** (from Y, Cb, Cr channels) undergoes the **2D Discrete Cosine Transform**.  
DCT converts spatial domain values into frequency domain coefficients.

The benefits of DCT:
- Concentrates most of the visual information in a few low-frequency components.
- Enables JPEG to efficiently discard high-frequency components (which humans perceive less).

The result is an **8×8 matrix of DCT coefficients**:
- **Top-left corner** (0,0): DC coefficient (proportional to the block's average value; with orthonormal scaling it equals 8 × the block mean)
- **Bottom-right**: High-frequency AC coefficients

> 📉 Most energy is compacted in the upper-left region, which allows aggressive quantization and compression.
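
To make the transform concrete, here is a direct (and slow) implementation of the orthonormal 2D DCT-II written from its definition. It is a sketch for small inputs only; the name `dct2_ortho` is illustrative, and a library routine such as `scipy.fft.dctn` is the practical choice.

```python
import math


def dct2_ortho(block: list[list[float]]) -> list[list[float]]:
    """Orthonormal 2D DCT-II of a square N x N block, computed from the definition."""
    n = len(block)

    def alpha(k: int) -> float:
        # normalization that makes the transform orthonormal
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out


# a flat block has no spatial variation, so only the DC coefficient is non-zero
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct2_ortho(flat)
largest_ac = max(abs(coeffs[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0))

print(round(coeffs[0][0], 6))  # 80.0, i.e. 8 x the block mean
print(largest_ac < 1e-9)       # True
```

The flat block is the extreme case of energy compaction: all 64 samples are described by a single coefficient.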


In [None]:
def apply_dct_to_blocks(blocks: NDArray) -> NDArray[np.float32]:
    """
    Apply 2D DCT (type-II, orthonormal) to each 8x8 block using scipy.fft.dctn.

    Args:
        blocks: Array of blocks, shape (N, 8, 8), dtype int16 or float32.

    Returns:
        NDArray[float32]: DCT transformed blocks, shape (N, 8, 8).
    """
    # transform the last two axes only, so all N blocks are processed in one call
    dct_blocks = sp.fft.dctn(blocks, type=2, norm="ortho", axes=(1, 2))
    return dct_blocks.astype(np.float32)

In [None]:
# apply DCT to all channels
Y_dct = apply_dct_to_blocks(Y_blocks)
Cb_dct = apply_dct_to_blocks(Cb_blocks)
Cr_dct = apply_dct_to_blocks(Cr_blocks)

# log
print(f"Y_dct.shape :{Y_dct.shape} | Y_dct.dtype :{Y_dct.dtype}")
print(f"Cb_dct.shape:{Cb_dct.shape} | Cb_dct.dtype:{Cb_dct.dtype}")
print(f"Cr_dct.shape:{Cr_dct.shape} | Cr_dct.dtype:{Cr_dct.dtype}")

##### <a id='toc3_1_1_1_6_'></a>[Quantization](#toc0_)

Quantization is the core step in JPEG that achieves **lossy compression**.

After applying the DCT to each 8×8 block in the **Y**, **Cb**, and **Cr** channels, the resulting frequency coefficients are **divided by a quantization matrix** and rounded to integers:

$$
Q_{u,v} = \text{round} \left( \frac{F_{u,v}}{Q_{\text{table}}(u,v)} \right)
$$

This reduces precision for higher-frequency components (bottom-right of the block), which are less visually important.

📌 **Notes**:
- Lower values in the quantization table → preserve more detail (less compression)
- Higher values → more aggressive compression (and loss)
- Custom tables can be used for different quality levels
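
A small numeric sketch shows where the loss comes from. The coefficient values are made up for illustration; the step sizes 16 and 99 are the first and last entries of the standard luma table, and the helper names are hypothetical.

```python
def quantize_coeff(coeff: float, step: int) -> int:
    """Quantize one DCT coefficient: divide by the step size and round."""
    return round(coeff / step)


def dequantize_coeff(level: int, step: int) -> int:
    """Approximate reconstruction performed by the decoder."""
    return level * step


# (coefficient, step size) pairs; the coefficient values are illustrative
examples = [(-415.4, 16), (30.2, 16), (30.2, 99), (-60.0, 99)]

for coeff, step in examples:
    level = quantize_coeff(coeff, step)
    recon = dequantize_coeff(level, step)
    print(f"F={coeff:7.1f}  step={step:3d}  level={level:4d}  recon={recon:5d}  error={coeff - recon:6.1f}")
```

The same coefficient (30.2) survives with a small error at step 16 but becomes zero at step 99, which is how the larger table entries discard high-frequency detail.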


In [None]:
# Luma (Y) quantization table (Quality=50)
Q_Y = np.array(
    [
        [16, 11, 10, 16, 24, 40, 51, 61],
        [12, 12, 14, 19, 26, 58, 60, 55],
        [14, 13, 16, 24, 40, 57, 69, 56],
        [14, 17, 22, 29, 51, 87, 80, 62],
        [18, 22, 37, 56, 68, 109, 103, 77],
        [24, 35, 55, 64, 81, 104, 113, 92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103, 99],
    ],
    dtype=np.uint8,
)

In [None]:
# Chroma (Cb/Cr) quantization table (Quality=50)
Q_C = np.array(
    [
        [17, 18, 24, 47, 99, 99, 99, 99],
        [18, 21, 26, 66, 99, 99, 99, 99],
        [24, 26, 56, 99, 99, 99, 99, 99],
        [47, 66, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
    ],
    dtype=np.uint8,
)

In [None]:
def scale_quant_table(base_table: NDArray, quality: int) -> NDArray[np.uint8]:
    """
    Scale the JPEG quantization table according to quality setting.

    Args:
        base_table: Base quantization table as np.ndarray.
        quality: Quality factor from 1 (lowest) to 100 (highest).

    Returns:
        NDArray[uint8]: Scaled quantization table, clipped to [1, 255].
    """
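    # clamp quality to the valid range (quality <= 0 would divide by zero)
    quality = max(1, min(100, quality))
    base_table = base_table.astype(np.float32)
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality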
    scaled_table = np.clip(np.floor((base_table * scale + 50) / 100), 1, 255)
    return scaled_table.astype(np.uint8)

In [None]:
Q_Y_50 = scale_quant_table(Q_Y, quality=50)
Q_C_50 = scale_quant_table(Q_C, quality=50)

print(f"Q_Y_50:\n{Q_Y_50}\n")
print(f"Q_C_50:\n{Q_C_50}")

In [None]:
def quantize(dct_blocks: NDArray, quantization_table: NDArray) -> NDArray[np.int16]:
    """
    Quantize DCT blocks using the quantization table.

    Args:
        dct_blocks: DCT coefficients, shape (N, 8, 8), dtype float32.
        quantization_table: Quantization table, shape (8, 8).

    Returns:
        NDArray[np.int16]: Quantized coefficients, shape (N, 8, 8).
    """
    return np.round(dct_blocks / quantization_table).astype(np.int16)


# quantize all channels
Y_quantized = quantize(Y_dct, Q_Y_50)
Cb_quantized = quantize(Cb_dct, Q_C_50)
Cr_quantized = quantize(Cr_dct, Q_C_50)

# log
print(f"Y_quantized.shape :{Y_quantized.shape} | Y_quantized.dtype :{Y_quantized.dtype}")
print(f"Cb_quantized.shape:{Cb_quantized.shape} | Cb_quantized.dtype:{Cb_quantized.dtype}")
print(f"Cr_quantized.shape:{Cr_quantized.shape} | Cr_quantized.dtype:{Cr_quantized.dtype}")

##### <a id='toc3_1_1_1_7_'></a>[Zigzag Scanning](#toc0_)

After quantization, each **8×8 DCT block** contains many zeros, especially toward the bottom-right (high frequencies).  
To efficiently encode these sparse blocks, JPEG uses **zigzag scanning** to convert the 2D block into a 1D sequence.

The **zigzag pattern**:
- Orders the coefficients from low to high frequency  
- Groups non-zero values early  
- Clusters zeros at the end → ideal for **run-length encoding**

🌀 **Zigzag Pattern (8×8):**

```text
 0  1  5  6 14 15 27 28  
 2  4  7 13 16 26 29 42  
 3  8 12 17 25 30 41 43  
 9 11 18 24 31 40 44 53  
10 19 23 32 39 45 52 54  
20 22 33 38 46 51 55 60  
21 34 37 47 50 56 59 61  
35 36 48 49 57 58 62 63
```
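
The table above can be generated rather than typed in. The sketch below walks the anti-diagonals of the block and alternates direction; `zigzag_coords` and `zigzag_index_table` are illustrative names, not functions from the notebook.

```python
def zigzag_coords(n: int = 8) -> list[tuple[int, int]]:
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    coords = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals run bottom-left -> top-right, odd ones the other way
        coords.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return coords


def zigzag_index_table(n: int = 8) -> list[list[int]]:
    """Return the table whose entry (row, col) is that coefficient's scan position."""
    table = [[0] * n for _ in range(n)]
    for position, (r, c) in enumerate(zigzag_coords(n)):
        table[r][c] = position
    return table


print(zigzag_coords()[:6])      # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(zigzag_index_table()[0])  # [0, 1, 5, 6, 14, 15, 27, 28]
```

Note the two views of the same ordering: `zigzag_coords` lists which coefficient to read at each scan step, while the index table stores the scan position of each coefficient. Converting a block to a 1D sequence needs the first view.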

In [None]:
# zig-zag scan positions for an 8x8 block (JPEG standard order):
# ZIG_ZAG_ORDER[r, c] is the position of coefficient (r, c) in the 1D sequence
ZIG_ZAG_ORDER = np.array(
    [
        [0, 1, 5, 6, 14, 15, 27, 28],
        [2, 4, 7, 13, 16, 26, 29, 42],
        [3, 8, 12, 17, 25, 30, 41, 43],
        [9, 11, 18, 24, 31, 40, 44, 53],
        [10, 19, 23, 32, 39, 45, 52, 54],
        [20, 22, 33, 38, 46, 51, 55, 60],
        [21, 34, 37, 47, 50, 56, 59, 61],
        [35, 36, 48, 49, 57, 58, 62, 63],
    ]
)

In [None]:
def zigzag_scan(quantized_blocks: NDArray) -> NDArray[np.int16]:
    """
    Convert quantized 8x8 blocks to 1D arrays using zig-zag scan order.

    Args:
        quantized_blocks: Quantized blocks, shape (N, 8, 8).

    Returns:
        NDArray[np.int16]: Zig-zag scanned coefficients, shape (N, 64).
    """
    N = quantized_blocks.shape[0]

    # ZIG_ZAG_ORDER[r, c] is the scan position of coefficient (r, c), so the
    # gather indices are its inverse permutation: argsort gives, for each scan
    # position, the row-major index of the coefficient to read
    scan_indices = np.argsort(ZIG_ZAG_ORDER.ravel())

    return quantized_blocks.reshape(N, 64)[:, scan_indices].astype(np.int16)

In [None]:
# apply to all channels
Y_zigzag = zigzag_scan(Y_quantized)
Cb_zigzag = zigzag_scan(Cb_quantized)
Cr_zigzag = zigzag_scan(Cr_quantized)

# log
print(f"Y_zigzag.shape :{Y_zigzag.shape}")
print(f"Cb_zigzag.shape:{Cb_zigzag.shape}")
print(f"Cr_zigzag.shape:{Cr_zigzag.shape}")

##### <a id='toc3_1_1_1_8_'></a>[Differential Pulse-Code Modulation (DPCM)](#toc0_)
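
The DC coefficients of neighbouring blocks tend to be similar, so JPEG stores each one as the difference from the previous block's DC value. Below is a minimal round-trip sketch on made-up DC values, written in plain Python; `dpcm_encode_list` and `dpcm_decode_list` are illustrative names and assume a non-empty input.

```python
from itertools import accumulate


def dpcm_encode_list(dc_values: list[int]) -> list[int]:
    """Store the first value as-is, then each value minus its predecessor."""
    return [dc_values[0]] + [cur - prev for prev, cur in zip(dc_values, dc_values[1:])]


def dpcm_decode_list(differences: list[int]) -> list[int]:
    """Undo DPCM with a running sum."""
    return list(accumulate(differences))


dc = [52, 55, 53, 53, 60, 58]  # illustrative DC values of consecutive blocks
diff = dpcm_encode_list(dc)

print(diff)                    # [52, 3, -2, 0, 7, -2]
print(dpcm_decode_list(diff))  # [52, 55, 53, 53, 60, 58]
```

The differences are much smaller than the original values, so they fall into lower size categories and need fewer bits in the entropy coding stage.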

In [None]:
def extract_dc_coefficients(zigzag_blocks: NDArray) -> NDArray[np.int16]:
    """
    Extract DC coefficients from zig-zag scanned blocks.

    Args:
        zigzag_blocks: Zig-zag scanned blocks, shape (N, 64).

    Returns:
        NDArray[np.int16]: DC coefficients array of shape (N,).
    """
    return zigzag_blocks[:, 0]

In [None]:
# extract DC coefficients for all channels
Y_dc = extract_dc_coefficients(Y_zigzag)
Cb_dc = extract_dc_coefficients(Cb_zigzag)
Cr_dc = extract_dc_coefficients(Cr_zigzag)

# log
print(f"Y_dc.shape :{Y_dc.shape} | Y_dc[:16] :{Y_dc[:16]}")
print(f"Cb_dc.shape:{Cb_dc.shape} | Cb_dc[:16]:{Cb_dc[:16]}")
print(f"Cr_dc.shape:{Cr_dc.shape} | Cr_dc[:16]:{Cr_dc[:16]}")

In [None]:
def dpcm_encode(dc_coeffs: NDArray) -> NDArray[np.int16]:
    """
    Differential Pulse Code Modulation (DPCM) for DC coefficients.

    Args:
        dc_coeffs: DC coefficients array, shape (N,).

    Returns:
        NDArray[np.int16]: DPCM encoded differences, shape (N,).
    """
    differences = np.zeros_like(dc_coeffs)
    differences[0] = dc_coeffs[0]
    differences[1:] = np.diff(dc_coeffs)
    return differences

In [None]:
# apply DPCM to all channels
Y_dc_diff = dpcm_encode(Y_dc)
Cb_dc_diff = dpcm_encode(Cb_dc)
Cr_dc_diff = dpcm_encode(Cr_dc)

# log
print(f"Y_dc_diff.shape :{Y_dc_diff.shape} | Y_dc_diff[:16] :{Y_dc_diff[:16]}")
print(f"Cb_dc_diff.shape:{Cb_dc_diff.shape} | Cb_dc_diff[:16]:{Cb_dc_diff[:16]}")
print(f"Cr_dc_diff.shape:{Cr_dc_diff.shape} | Cr_dc_diff[:16]:{Cr_dc_diff[:16]}")

##### <a id='toc3_1_1_1_9_'></a>[Run-Length Encoding (RLE)](#toc0_)
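
In JPEG, the 63 AC coefficients of a block are stored as (run, value) pairs, where the run counts the zeros preceding a non-zero value. A run of 16 zeros is written as the special pair (15, 0), called ZRL, and (0, 0) marks end-of-block (EOB) when only zeros remain. The sketch below is self-contained and uses the illustrative name `rle_ac` on a made-up block.

```python
def rle_ac(ac: list[int]) -> list[tuple[int, int]]:
    """Run-length encode AC coefficients as (zero_run, value) pairs."""
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1
            continue
        while run >= 16:  # one pair can describe at most 15 zeros
            pairs.append((15, 0))  # ZRL: stands for 16 zeros
            run -= 16
        pairs.append((run, value))
        run = 0
    if run > 0:  # trailing zeros collapse into one end-of-block marker
        pairs.append((0, 0))
    return pairs


# 63 AC coefficients: a few non-zero values, a long zero run, then zeros to the end
ac = [5, 0, 0, -3] + [0] * 18 + [2] + [0] * 40

print(len(ac))     # 63
print(rle_ac(ac))  # [(0, 5), (2, -3), (15, 0), (2, 2), (0, 0)]
```

Five pairs replace 63 values here, which is only possible because zigzag scanning pushed the zeros together.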


In [None]:
def rle_encode_ac(ac_coeffs: NDArray) -> list[tuple[int, int]]:
    """
    Run-Length Encode AC coefficients according to JPEG spec.

    Args:
        ac_coeffs: 1D array of AC coefficients (zig-zag order, excluding DC).

    Returns:
        List of (run_length, value) tuples; (15, 0) is a run of 16 zeros (ZRL)
        and (0, 0) indicates End-of-Block (EOB).
    """
    encoded = []
    zero_run = 0

    for coeff in ac_coeffs:
        if coeff == 0:
            zero_run += 1
        else:
            while zero_run >= 16:
                encoded.append((15, 0))  # ZRL code
                zero_run -= 16

            encoded.append((zero_run, int(coeff)))
            zero_run = 0

    if zero_run > 0 or not encoded:
        encoded.append((0, 0))  # EOB

    return encoded


def batch_rle_encode(blocks: NDArray) -> list[list[tuple[int, int]]]:
    """
    Apply RLE to a batch of AC coefficient blocks.

    Args:
        blocks: Array of zig-zag scanned blocks of shape (N, 64) or (N, 63).

    Returns:
        List of RLE encoded lists for each block.
    """
    return [rle_encode_ac(block[1:] if len(block) == 64 else block) for block in blocks]

In [None]:
# apply to all channels
Y_rle = batch_rle_encode(Y_zigzag)
Cb_rle = batch_rle_encode(Cb_zigzag)
Cr_rle = batch_rle_encode(Cr_zigzag)

# log
print(f"len(Y_rle) :{len(Y_rle)}")
print(f"len(Cb_rle):{len(Cb_rle)}")
print(f"len(Cr_rle):{len(Cr_rle)}")

##### <a id='toc3_1_1_1_10_'></a>[Entropy Coding: Huffman encoding (or arithmetic encoding)](#toc0_)
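
Baseline JPEG does not Huffman-code coefficient values directly. Each value is split into a size category (the number of bits needed for its magnitude), which is Huffman-coded, followed by that many raw amplitude bits. The sketch below shows the split; the helper names are illustrative.

```python
def size_category(value: int) -> int:
    """Number of bits needed to represent |value| (0 for value == 0)."""
    return abs(value).bit_length()


def amplitude_bits(value: int) -> str:
    """Raw bits appended after the Huffman code for the size category."""
    size = size_category(value)
    if size == 0:
        return ""  # category 0 carries no extra bits
    # negative values are offset so that they occupy the lower half of the range
    code = value if value > 0 else value + (1 << size) - 1
    return format(code, f"0{size}b")


for v in [0, 1, -1, 3, -3, 5, -7, 255]:
    print(f"value={v:4d}  category={size_category(v)}  bits='{amplitude_bits(v)}'")
```

For example, a DC difference of 5 falls in category 3 and is followed by the amplitude bits `101`, while -3 falls in category 2 with bits `00`. Only the category (and, for AC coefficients, the preceding zero run) is looked up in the Huffman tables.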


In [None]:
# standard Huffman tables (Annex K of the JPEG specification), as listed in libjpeg-turbo:
# https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/src/jstdhuff.c

In [None]:
def build_huffman_dc_dict(bits: list[int], vals: list[int]) -> dict[int, str]:
    """
    Build Huffman code dictionary for DC coefficients.

    Args:
        bits: List with counts of codes for lengths 1 to 16.
        vals: List of symbol values.

    Returns:
        Dictionary mapping symbol -> Huffman bitstring.
    """
    huffman_table = {}
    code = 0
    index = 0

    for bit_length in range(1, len(bits)):
        count = bits[bit_length]
        for _ in range(count):
            code_str = format(code, f"0{bit_length}b")
            huffman_table[code_str] = vals[index]
            code += 1
            index += 1
        code <<= 1

    return {v: k for k, v in huffman_table.items()}

bits_dc_luminance = [0, 0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
val_dc_luminance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

bits_dc_chrominance = [0, 0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
val_dc_chrominance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# build the DC Huffman tables (category -> bitstring)
HUFFMAN_DC_LUMA = build_huffman_dc_dict(bits_dc_luminance, val_dc_luminance)
HUFFMAN_DC_CHROMA = build_huffman_dc_dict(bits_dc_chrominance, val_dc_chrominance)

In [None]:
def build_huffman_ac_dict(bits: list[int], vals: list[int]) -> dict[tuple[int, int], str]:
    """
    Build Huffman code dictionary for AC coefficients.

    Args:
        bits: List with counts of codes for lengths 1 to 16.
        vals: List of symbol values (run-length/size pairs encoded as one byte).

    Returns:
        Dictionary mapping (run_length, size) tuples -> Huffman bitstrings.
    """
    code_lengths = bits[1:17]

    huffman_dict = {}
    code = 0
    pos = 0

    for length, count in enumerate(code_lengths, start=1):
        for _ in range(count):
            symbol = vals[pos]
            runlength = symbol >> 4
            size = symbol & 0xF
            code_str = format(code, f"0{length}b")
            huffman_dict[(runlength, size)] = code_str
            code += 1
            pos += 1
        code <<= 1

    return huffman_dict


bits_ac_luminance = [0, 0, 2, 1, 3, 3, 2, 4, 3, 5, 5, 4, 4, 0, 0, 1, 0x7D]
val_ac_luminance = [
    0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12,
    0x21, 0x31, 0x41, 0x06, 0x13, 0x51, 0x61, 0x07,
    0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xa1, 0x08,
    0x23, 0x42, 0xb1, 0xc1, 0x15, 0x52, 0xd1, 0xf0,
    0x24, 0x33, 0x62, 0x72, 0x82, 0x09, 0x0a, 0x16,
    0x17, 0x18, 0x19, 0x1a, 0x25, 0x26, 0x27, 0x28,
    0x29, 0x2a, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39,
    0x3a, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49,
    0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,
    0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69,
    0x6a, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79,
    0x7a, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,
    0x8a, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98,
    0x99, 0x9a, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
    0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6,
    0xb7, 0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5,
    0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xd2, 0xd3, 0xd4,
    0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe1, 0xe2,
    0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea,
    0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8,
    0xf9, 0xfa
]

bits_ac_chrominance = [0, 0, 2, 1, 2, 4, 4, 3, 4, 7, 5, 4, 4, 0, 1, 2, 0x77]
val_ac_chrominance = [
    0x00, 0x01, 0x02, 0x03, 0x11, 0x04, 0x05, 0x21,
    0x31, 0x06, 0x12, 0x41, 0x51, 0x07, 0x61, 0x71,
    0x13, 0x22, 0x32, 0x81, 0x08, 0x14, 0x42, 0x91,
    0xa1, 0xb1, 0xc1, 0x09, 0x23, 0x33, 0x52, 0xf0,
    0x15, 0x62, 0x72, 0xd1, 0x0a, 0x16, 0x24, 0x34,
    0xe1, 0x25, 0xf1, 0x17, 0x18, 0x19, 0x1a, 0x26,
    0x27, 0x28, 0x29, 0x2a, 0x35, 0x36, 0x37, 0x38,
    0x39, 0x3a, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
    0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58,
    0x59, 0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68,
    0x69, 0x6a, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78,
    0x79, 0x7a, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
    0x88, 0x89, 0x8a, 0x92, 0x93, 0x94, 0x95, 0x96,
    0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4, 0xa5,
    0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4,
    0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xc2, 0xc3,
    0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xd2,
    0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda,
    0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9,
    0xea, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8,
    0xf9, 0xfa
]

# build the AC Huffman tables ((run_length, size) -> bitstring)
HUFFMAN_AC_LUMA = build_huffman_ac_dict(bits_ac_luminance, val_ac_luminance)
HUFFMAN_AC_CHROMA = build_huffman_ac_dict(bits_ac_chrominance, val_ac_chrominance)

In [None]:
HUFFMAN_DC_LUMA = {
    # (Category, Code) pairs
    0:  "00",         # 2 bits (Category 0: 0)
    1:  "010",        # 3 bits (Category 1: -1, 1)
    2:  "011",        # 3 bits (Category 2: -3, -2, 2, 3)
    3:  "100",        # 3 bits (Category 3: -7..-4, 4..7)
    4:  "101",        # 3 bits (Category 4: -15..-8, 8..15)
    5:  "110",        # 3 bits (Category 5: -31..-16, 16..31)
    6:  "1110",       # 4 bits (Category 6: -63..-32, 32..63)
    7:  "11110",      # 5 bits (Category 7: -127..-64, 64..127)
    8:  "111110",     # 6 bits (Category 8: -255..-128, 128..255)
    9:  "1111110",    # 7 bits (Category 9: -511..-256, 256..511)
    10: "11111110",   # 8 bits (Category 10: -1023..-512, 512..1023)
    11: "111111110",  # 9 bits (Category 11: -2047..-1024, 1024..2047)
}

In [None]:
HUFFMAN_AC_LUMA = {
    # (RunLength, Category): Code
    # Run=0 (No zeros before coefficient)
    (0, 0):  "1010",  # EOB (End of Block)
    (0, 1):  "00",
    (0, 2):  "01",
    (0, 3):  "100",
    (0, 4):  "1011",
    (0, 5):  "11010",
    (0, 6):  "1111000",
    (0, 7):  "11111000",
    (0, 8):  "1111110110",
    (0, 9):  "1111111110000010",
    (0, 10): "1111111110000011",

    # Run=1 (1 zero before coefficient)
    (1, 1):  "1100",
    (1, 2):  "11011",
    (1, 3):  "1111001",
    (1, 4):  "111110110",
    (1, 5):  "11111110110",
    (1, 6):  "1111111110000100",
    (1, 7):  "1111111110000101",
    (1, 8):  "1111111110000110",
    (1, 9):  "1111111110000111",
    (1, 10): "1111111110001000",

    # Run=2 (2 zeros before coefficient)
    (2, 1):  "11100",
    (2, 2):  "11111001",
    (2, 3):  "1111110111",
    (2, 4):  "111111110100",
    (2, 5):  "1111111110001001",
    (2, 6):  "1111111110001010",
    (2, 7):  "1111111110001011",
    (2, 8):  "1111111110001100",
    (2, 9):  "1111111110001101",
    (2, 10): "1111111110001110",

    # Run=3 (3 zeros before coefficient)
    (3, 1):  "111010",
    (3, 2):  "111110111",
    (3, 3):  "111111110101",
    (3, 4):  "1111111110001111",
    (3, 5):  "1111111110010000",
    (3, 6):  "1111111110010001",
    (3, 7):  "1111111110010010",
    (3, 8):  "1111111110010011",
    (3, 9):  "1111111110010100",
    (3, 10): "1111111110010101",

    # Run=4 (4 zeros before coefficient)
    (4, 1):  "111011",
    (4, 2):  "1111111000",
    (4, 3):  "1111111110010110",
    (4, 4):  "1111111110010111",
    (4, 5):  "1111111110011000",
    (4, 6):  "1111111110011001",
    (4, 7):  "1111111110011010",
    (4, 8):  "1111111110011011",
    (4, 9):  "1111111110011100",
    (4, 10): "1111111110011101",

    # Run=5 (5 zeros before coefficient)
    (5, 1):  "1111010",
    (5, 2):  "11111110111",
    (5, 3):  "1111111110011110",
    (5, 4):  "1111111110011111",
    (5, 5):  "1111111110100000",
    (5, 6):  "1111111110100001",
    (5, 7):  "1111111110100010",
    (5, 8):  "1111111110100011",
    (5, 9):  "1111111110100100",
    (5, 10): "1111111110100101",

    # Run=6 (6 zeros before coefficient)
    (6, 1):  "1111011",
    (6, 2):  "111111110110",
    (6, 3):  "1111111110100110",
    (6, 4):  "1111111110100111",
    (6, 5):  "1111111110101000",
    (6, 6):  "1111111110101001",
    (6, 7):  "1111111110101010",
    (6, 8):  "1111111110101011",
    (6, 9):  "1111111110101100",
    (6, 10): "1111111110101101",

    # Run=7 (7 zeros before coefficient)
    (7, 1):  "11111010",
    (7, 2):  "111111110111",
    (7, 3):  "1111111110101110",
    (7, 4):  "1111111110101111",
    (7, 5):  "1111111110110000",
    (7, 6):  "1111111110110001",
    (7, 7):  "1111111110110010",
    (7, 8):  "1111111110110011",
    (7, 9):  "1111111110110100",
    (7, 10): "1111111110110101",

    # Run=8 (8 zeros before coefficient)
    (8, 1):  "111111000",
    (8, 2):  "111111111000000",
    (8, 3):  "1111111110110110",
    (8, 4):  "1111111110110111",
    (8, 5):  "1111111110111000",
    (8, 6):  "1111111110111001",
    (8, 7):  "1111111110111010",
    (8, 8):  "1111111110111011",
    (8, 9):  "1111111110111100",
    (8, 10): "1111111110111101",

    # Run=9 (9 zeros before coefficient)
    (9, 1):  "111111001",
    (9, 2):  "1111111110111110",
    (9, 3):  "1111111110111111",
    (9, 4):  "1111111111000000",
    (9, 5):  "1111111111000001",
    (9, 6):  "1111111111000010",
    (9, 7):  "1111111111000011",
    (9, 8):  "1111111111000100",
    (9, 9):  "1111111111000101",
    (9, 10): "1111111111000110",

    # Run=10 (10 zeros before coefficient)
    (10, 1):  "111111010",
    (10, 2):  "1111111111000111",
    (10, 3):  "1111111111001000",
    (10, 4):  "1111111111001001",
    (10, 5):  "1111111111001010",
    (10, 6):  "1111111111001011",
    (10, 7):  "1111111111001100",
    (10, 8):  "1111111111001101",
    (10, 9):  "1111111111001110",
    (10, 10): "1111111111001111",

    # Run=11 (11 zeros before coefficient)
    (11, 1):  "1111111001",
    (11, 2):  "1111111111010000",
    (11, 3):  "1111111111010001",
    (11, 4):  "1111111111010010",
    (11, 5):  "1111111111010011",
    (11, 6):  "1111111111010100",
    (11, 7):  "1111111111010101",
    (11, 8):  "1111111111010110",
    (11, 9):  "1111111111010111",
    (11, 10): "1111111111011000",

    # Run=12 (12 zeros before coefficient)
    (12, 1):  "1111111010",
    (12, 2):  "1111111111011001",
    (12, 3):  "1111111111011010",
    (12, 4):  "1111111111011011",
    (12, 5):  "1111111111011100",
    (12, 6):  "1111111111011101",
    (12, 7):  "1111111111011110",
    (12, 8):  "1111111111011111",
    (12, 9):  "1111111111100000",
    (12, 10): "1111111111100001",

    # Run=13 (13 zeros before coefficient)
    (13, 1):  "11111111000",
    (13, 2):  "1111111111100010",
    (13, 3):  "1111111111100011",
    (13, 4):  "1111111111100100",
    (13, 5):  "1111111111100101",
    (13, 6):  "1111111111100110",
    (13, 7):  "1111111111100111",
    (13, 8):  "1111111111101000",
    (13, 9):  "1111111111101001",
    (13, 10): "1111111111101010",

    # Run=14 (14 zeros before coefficient)
    (14, 1):  "1111111111101011",
    (14, 2):  "1111111111101100",
    (14, 3):  "1111111111101101",
    (14, 4):  "1111111111101110",
    (14, 5):  "1111111111101111",
    (14, 6):  "1111111111110000",
    (14, 7):  "1111111111110001",
    (14, 8):  "1111111111110010",
    (14, 9):  "1111111111110011",
    (14, 10): "1111111111110100",

    # Run=15 (15 zeros before coefficient)
    (15, 0):  "11111111001",  # ZRL (Zero Run Length)
    (15, 1):  "1111111111110101",
    (15, 2):  "1111111111110110",
    (15, 3):  "1111111111110111",
    (15, 4):  "1111111111111000",
    (15, 5):  "1111111111111001",
    (15, 6):  "1111111111111010",
    (15, 7):  "1111111111111011",
    (15, 8):  "1111111111111100",
    (15, 9):  "1111111111111101",
    (15, 10): "1111111111111110",
}

In [None]:
HUFFMAN_DC_CHROMA = {
    # Category: Code
    0:  "00",           #  2 bits (Category 0: 0)
    1:  "01",           #  2 bits (Category 1: -1, 1)
    2:  "10",           #  2 bits (Category 2: -3, -2, 2, 3)
    3:  "110",          #  3 bits (Category 3: -7..-4, 4..7)
    4:  "1110",         #  4 bits (Category 4: -15..-8, 8..15)
    5:  "11110",        #  5 bits (Category 5: -31..-16, 16..31)
    6:  "111110",       #  6 bits (Category 6: -63..-32, 32..63)
    7:  "1111110",      #  7 bits (Category 7: -127..-64, 64..127)
    8:  "11111110",     #  8 bits (Category 8: -255..-128, 128..255)
    9:  "111111110",    #  9 bits (Category 9: -511..-256, 256..511)
    10: "1111111110",   # 10 bits (Category 10: -1023..-512, 512..1023)
    11: "11111111110",  # 11 bits (Category 11: -2047..-1024, 1024..2047)
}

In [None]:
HUFFMAN_AC_CHROMA = {
    # (RunLength, Category): Code
    # Run=0 (No zeros before coefficient)
    (0, 0):  "00",  # EOB (End of Block)
    (0, 1):  "01",
    (0, 2):  "100",
    (0, 3):  "1010",
    (0, 4):  "11000",
    (0, 5):  "11001",
    (0, 6):  "111000",
    (0, 7):  "1111000",
    (0, 8):  "111110100",
    (0, 9):  "1111110110",
    (0, 10): "111111110100",

    # Run=1 (1 zero before coefficient)
    (1, 1):  "1011",
    (1, 2):  "111001",
    (1, 3):  "11110110",
    (1, 4):  "111110101",
    (1, 5):  "11111110110",
    (1, 6):  "111111110101",
    (1, 7):  "1111111110001000",
    (1, 8):  "1111111110001001",
    (1, 9):  "1111111110001010",
    (1, 10): "1111111110001011",

    # Run=2 (2 zeros before coefficient)
    (2, 1):  "11010",
    (2, 2):  "11110111",
    (2, 3):  "1111110111",
    (2, 4):  "111111110110",
    (2, 5):  "111111111000010",
    (2, 6):  "1111111110001100",
    (2, 7):  "1111111110001101",
    (2, 8):  "1111111110001110",
    (2, 9):  "1111111110001111",
    (2, 10): "1111111110010000",

    # Run=3 (3 zeros before coefficient)
    (3, 1):  "11011",
    (3, 2):  "11111000",
    (3, 3):  "1111111000",
    (3, 4):  "111111110111",
    (3, 5):  "1111111110010001",
    (3, 6):  "1111111110010010",
    (3, 7):  "1111111110010011",
    (3, 8):  "1111111110010100",
    (3, 9):  "1111111110010101",
    (3, 10): "1111111110010110",

    # Run=4 (4 zeros before coefficient)
    (4, 1):  "111010",
    (4, 2):  "111110110",
    (4, 3):  "1111111110010111",
    (4, 4):  "1111111110011000",
    (4, 5):  "1111111110011001",
    (4, 6):  "1111111110011010",
    (4, 7):  "1111111110011011",
    (4, 8):  "1111111110011100",
    (4, 9):  "1111111110011101",
    (4, 10): "1111111110011110",

    # Run=5 (5 zeros before coefficient)
    (5, 1):  "111011",
    (5, 2):  "1111111001",
    (5, 3):  "1111111110011111",
    (5, 4):  "1111111110100000",
    (5, 5):  "1111111110100001",
    (5, 6):  "1111111110100010",
    (5, 7):  "1111111110100011",
    (5, 8):  "1111111110100100",
    (5, 9):  "1111111110100101",
    (5, 10): "1111111110100110",

    # Run=6 (6 zeros before coefficient)
    (6, 1):  "1111001",
    (6, 2):  "11111110111",
    (6, 3):  "1111111110100111",
    (6, 4):  "1111111110101000",
    (6, 5):  "1111111110101001",
    (6, 6):  "1111111110101010",
    (6, 7):  "1111111110101011",
    (6, 8):  "1111111110101100",
    (6, 9):  "1111111110101101",
    (6, 10): "1111111110101110",

    # Run=7 (7 zeros before coefficient)
    (7, 1):  "1111010",
    (7, 2):  "11111111000",
    (7, 3):  "1111111110101111",
    (7, 4):  "1111111110110000",
    (7, 5):  "1111111110110001",
    (7, 6):  "1111111110110010",
    (7, 7):  "1111111110110011",
    (7, 8):  "1111111110110100",
    (7, 9):  "1111111110110101",
    (7, 10): "1111111110110110",

    # Run=8 (8 zeros before coefficient)
    (8, 1):  "11111001",
    (8, 2):  "1111111110110111",
    (8, 3):  "1111111110111000",
    (8, 4):  "1111111110111001",
    (8, 5):  "1111111110111010",
    (8, 6):  "1111111110111011",
    (8, 7):  "1111111110111100",
    (8, 8):  "1111111110111101",
    (8, 9):  "1111111110111110",
    (8, 10): "1111111110111111",

    # Run=9 (9 zeros before coefficient)
    (9, 1):  "111110111",
    (9, 2):  "1111111111000000",
    (9, 3):  "1111111111000001",
    (9, 4):  "1111111111000010",
    (9, 5):  "1111111111000011",
    (9, 6):  "1111111111000100",
    (9, 7):  "1111111111000101",
    (9, 8):  "1111111111000110",
    (9, 9):  "1111111111000111",
    (9, 10): "1111111111001000",

    # Run=10 (10 zeros before coefficient)
    (10, 1):  "111111000",
    (10, 2):  "1111111111001001",
    (10, 3):  "1111111111001010",
    (10, 4):  "1111111111001011",
    (10, 5):  "1111111111001100",
    (10, 6):  "1111111111001101",
    (10, 7):  "1111111111001110",
    (10, 8):  "1111111111001111",
    (10, 9):  "1111111111010000",
    (10, 10): "1111111111010001",

    # Run=11 (11 zeros before coefficient)
    (11, 1):  "111111001",
    (11, 2):  "1111111111010010",
    (11, 3):  "1111111111010011",
    (11, 4):  "1111111111010100",
    (11, 5):  "1111111111010101",
    (11, 6):  "1111111111010110",
    (11, 7):  "1111111111010111",
    (11, 8):  "1111111111011000",
    (11, 9):  "1111111111011001",
    (11, 10): "1111111111011010",

    # Run=12 (12 zeros before coefficient)
    (12, 1):  "111111010",
    (12, 2):  "1111111111011011",
    (12, 3):  "1111111111011100",
    (12, 4):  "1111111111011101",
    (12, 5):  "1111111111011110",
    (12, 6):  "1111111111011111",
    (12, 7):  "1111111111100000",
    (12, 8):  "1111111111100001",
    (12, 9):  "1111111111100010",
    (12, 10): "1111111111100011",

    # Run=13 (13 zeros before coefficient)
    (13, 1):  "11111111001",
    (13, 2):  "1111111111100100",
    (13, 3):  "1111111111100101",
    (13, 4):  "1111111111100110",
    (13, 5):  "1111111111100111",
    (13, 6):  "1111111111101000",
    (13, 7):  "1111111111101001",
    (13, 8):  "1111111111101010",
    (13, 9):  "1111111111101011",
    (13, 10): "1111111111101100",

    # Run=14 (14 zeros before coefficient)
    (14, 1):  "11111111100000",
    (14, 2):  "1111111111101101",
    (14, 3):  "1111111111101110",
    (14, 4):  "1111111111101111",
    (14, 5):  "1111111111110000",
    (14, 6):  "1111111111110001",
    (14, 7):  "1111111111110010",
    (14, 8):  "1111111111110011",
    (14, 9):  "1111111111110100",
    (14, 10): "1111111111110101",

    # Run=15 (15 zeros before coefficient)
    (15, 0):  "1111111010",  # ZRL (Zero Run Length)
    (15, 1):  "111111111000011",
    (15, 2):  "1111111111110110",
    (15, 3):  "1111111111110111",
    (15, 4):  "1111111111111000",
    (15, 5):  "1111111111111001",
    (15, 6):  "1111111111111010",
    (15, 7):  "1111111111111011",
    (15, 8):  "1111111111111100",
    (15, 9):  "1111111111111101",
    (15, 10): "1111111111111110",
}

In [None]:
def encode_dc(dc_value: int, huffman_table: dict[int, str]) -> str:
    """
    Encode a DC coefficient using Huffman coding.

    Args:
        dc_value: DC coefficient value.
        huffman_table: Huffman dictionary mapping category to code.

    Returns:
        Bitstring representing encoded DC coefficient.
    """
    if dc_value == 0:
        return huffman_table[0]

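    dc_value = int(dc_value)
    category = abs(dc_value).bit_length()
    code = huffman_table[category]

    if dc_value > 0:
        additional_bits = format(dc_value, f"0{category}b")
    else:
        # negative values use the one's-complement convention of ITU-T T.81:
        # adding (2**category - 1) keeps them in `category` bits, starting with a 0 bit,
        # so they never collide with the bits of a positive value
        additional_bits = format((1 << category) - 1 + dc_value, f"0{category}b")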

    return code + additional_bits
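
As a quick check of the category and additional-bits scheme, the sketch below follows the one's-complement convention of ITU-T T.81, where a negative value is offset by 2^category − 1 before being written. The helper name `magnitude_bits` is hypothetical and is not part of this notebook.

```python
def magnitude_bits(value: int) -> tuple[int, str]:
    """Return (category, additional bits) for a DC difference or an AC value."""
    value = int(value)
    category = abs(value).bit_length()
    if category == 0:
        return 0, ""  # zero carries no additional bits
    if value < 0:
        value += (1 << category) - 1  # offset negatives so their bits start with 0
    return category, format(value, f"0{category}b")


for v in (5, -5, 1, -1, -3):
    print(v, magnitude_bits(v))
```

Positive and negative values of the same magnitude land in the same category but get different bit patterns, which is what lets a decoder recover the sign from the first additional bit.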

In [None]:
def encode_ac(rle_pairs: list[tuple[int, int]], huffman_table: dict[tuple[int, int], str]) -> str:
    """
    Encode AC coefficients using Huffman coding.

    Args:
        rle_pairs: List of (run_length, value) pairs.
        huffman_table: Huffman dictionary mapping (run_length, size) -> code.

    Returns:
        Bitstring representing encoded AC coefficients.
    """
    bitstream = ""

    for run_length, value in rle_pairs:

        # handle End-of-Block (0,0) specially
        if run_length == 0 and value == 0:
            bitstream += huffman_table[(0, 0)]
            continue

        # handle Zero Run Length (15,0) specially
        if run_length == 15 and value == 0:
            bitstream += huffman_table[(15, 0)]
            continue

        value = int(value)
        size = abs(value).bit_length()
        bitstream += huffman_table[(run_length, size)]

        if value > 0:
            additional_bits = format(value, f"0{size}b")
        else:
            # negative values use the one's-complement convention of ITU-T T.81:
            # adding (2**size - 1) keeps them in `size` bits, starting with a 0 bit
            additional_bits = format((1 << size) - 1 + value, f"0{size}b")
        bitstream += additional_bits

    return bitstream
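
To make the (run, size) symbols concrete, here is a minimal sketch that encodes three RLE pairs by hand. It assumes a three-entry subset of the standard luminance AC table; the names `AC_LUMA_SUBSET` and `encode_pairs` are illustrative only and do not replace the functions defined in this notebook.

```python
# subset of the standard luminance AC codes (ITU-T T.81, Annex K), enough for this example
AC_LUMA_SUBSET = {
    (0, 0): "1010",   # EOB
    (0, 3): "100",    # no zeros, then a 3-bit value
    (2, 1): "11100",  # two zeros, then a 1-bit value
}


def encode_pairs(pairs, table):
    """Concatenate the Huffman code and the magnitude bits of each (run, value) pair."""
    out = []
    for run, value in pairs:
        size = abs(value).bit_length()
        out.append(table[(run, size)])
        if size:
            # negatives are offset by (2**size - 1), as in ITU-T T.81
            offset = (1 << size) - 1 if value < 0 else 0
            out.append(format(value + offset, f"0{size}b"))
    return "".join(out)


bits = encode_pairs([(0, 5), (2, -1), (0, 0)], AC_LUMA_SUBSET)
print(bits, len(bits))  # 1001011110001010 16
```

The output splits as `100|101|11100|0|1010`: the code for (0, 3), the bits of 5, the code for (2, 1), the bit of -1, and finally EOB.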

In [None]:
def pad_bitstream(bitstream: str) -> str:
    """
    Pad bitstream with '1's until byte-aligned.

    Args:
        bitstream: Bitstring to pad.

    Returns:
        Byte-aligned bitstring.
    """
    remaining_bits = len(bitstream) % 8
    if remaining_bits != 0:
        bitstream += "1" * (8 - remaining_bits)
    return bitstream


def huffman_encode_all(
    Y_dc_diff: list[int],
    Cb_dc_diff: list[int],
    Cr_dc_diff: list[int],
    Y_rle: list[list[tuple[int, int]]],
    Cb_rle: list[list[tuple[int, int]]],
    Cr_rle: list[list[tuple[int, int]]],
) -> tuple[list[str], list[str], list[str]]:
    """
    Encode DC and AC coefficients for each block in Y, Cb, and Cr channels using Huffman coding.

    Args:
        Y_dc_diff: List of DC difference values for Y channel blocks.
        Cb_dc_diff: List of DC difference values for Cb channel blocks.
        Cr_dc_diff: List of DC difference values for Cr channel blocks.
        Y_rle: List of run-length encoded AC coefficients for Y channel blocks.
        Cb_rle: List of run-length encoded AC coefficients for Cb channel blocks.
        Cr_rle: List of run-length encoded AC coefficients for Cr channel blocks.

    Returns:
        Tuple containing three lists of byte-aligned bitstreams:
            - Y channel encoded blocks
            - Cb channel encoded blocks
            - Cr channel encoded blocks
    """
    # luma (Y) channel
    y_blocks = []
    for dc, ac in zip(Y_dc_diff, Y_rle):
        dc_bits = encode_dc(dc, HUFFMAN_DC_LUMA)
        ac_bits = encode_ac(ac, HUFFMAN_AC_LUMA)
        y_blocks.append(dc_bits + ac_bits)  # Combine DC + AC per block

    # chroma (Cb) channel
    cb_blocks = []
    for dc, ac in zip(Cb_dc_diff, Cb_rle):
        dc_bits = encode_dc(dc, HUFFMAN_DC_CHROMA)
        ac_bits = encode_ac(ac, HUFFMAN_AC_CHROMA)
        cb_blocks.append(dc_bits + ac_bits)  # Combine DC + AC per block

    # chroma (Cr) channel
    cr_blocks = []
    for dc, ac in zip(Cr_dc_diff, Cr_rle):
        dc_bits = encode_dc(dc, HUFFMAN_DC_CHROMA)
        ac_bits = encode_ac(ac, HUFFMAN_AC_CHROMA)
        cr_blocks.append(dc_bits + ac_bits)  # Combine DC + AC per block

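    # pad each block's bitstream to byte alignment (a simplification for per-block inspection:
    # a standard JPEG scan pads only at the end of the scan or before a restart marker)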
    y_blocks = [pad_bitstream(block) for block in y_blocks]
    cb_blocks = [pad_bitstream(block) for block in cb_blocks]
    cr_blocks = [pad_bitstream(block) for block in cr_blocks]

    return y_blocks, cb_blocks, cr_blocks

In [None]:
y_bits, cb_bits, cr_bits = huffman_encode_all(Y_dc_diff, Cb_dc_diff, Cr_dc_diff, Y_rle, Cb_rle, Cr_rle)

# bits of the first four Y blocks plus the first Cb and Cr blocks (the 4:2:0 block ratio)
block_bits = sum(len(y_bits[i]) for i in range(4)) + len(cb_bits[0]) + len(cr_bits[0])

# total bits
total_y_bits = sum(len(y) for y in y_bits)
total_cb_bits = sum(len(cb) for cb in cb_bits)
total_cr_bits = sum(len(cr) for cr in cr_bits)
total = total_y_bits + total_cb_bits + total_cr_bits


# log
print(f"len(y_bits)  : {len(y_bits)} | y_bits[0]  : {y_bits[0]}")
print(f"len(y_bits)  : {len(y_bits)} | y_bits[1]  : {y_bits[1]}")
print(f"len(y_bits)  : {len(y_bits)} | y_bits[2]  : {y_bits[2]}")
print(f"len(y_bits)  : {len(y_bits)} | y_bits[3]  : {y_bits[3]}")
print(f"len(cb_bits) : {len(cb_bits)} | cb_bits[0] : {cb_bits[0]}")
print(f"len(cr_bits) : {len(cr_bits)} | cr_bits[0] : {cr_bits[0]}\n")
print(f"number of bits to encode the first 4 Y + 1 Cb + 1 Cr blocks = {block_bits} bits")
print(f"size of the encoded image ≈ {total / (8 * 1024):,.2f} KiB")

##### <a id='toc3_1_1_1_11_'></a>[Bitstream Packaging](#toc0_)


In [None]:
# SOI (Start of Image)
# size: 2 bytes
# purpose: signals the start of a JPEG file
soi = b"\xff\xd8"

In [None]:
# APP0 (JFIF Metadata)
# size: 16 bytes (minimum)
# purpose: contains JFIF identification, version, and pixel density
app0 = (
    b"\xff\xe0"  # APP0 marker
    + b"\x00\x10"  # Length (16 bytes)
    + b"JFIF\x00"  # Identifier
    + b"\x01\x01"  # Version (1.1)
    + b"\x00"  # Density units (0 = no units)
    + b"\x00\x01"  # X density
    + b"\x00\x01"  # Y density
    + b"\x00\x00"  # Thumbnail size (0x0)
)
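
One way to sanity-check the APP0 layout is to unpack it again with the standard-library `struct` module. The sketch below rebuilds the same bytes so it runs on its own; the format string `">H5sBBBHHBB"` is my reading of the JFIF field order (length, identifier, version, units, densities, thumbnail size), not something taken from this notebook.

```python
import struct

# same bytes as the APP0 segment above, repeated here to keep the example self-contained
app0_example = (
    b"\xff\xe0" + b"\x00\x10" + b"JFIF\x00" + b"\x01\x01"
    + b"\x00" + b"\x00\x01" + b"\x00\x01" + b"\x00\x00"
)

# big-endian fields following the 2-byte marker
fields = struct.unpack(">H5sBBBHHBB", app0_example[2:])
length, ident, major, minor, units, x_density, y_density, thumb_w, thumb_h = fields

# the length field counts itself but not the marker
print(length, len(app0_example) - 2)
print(ident, f"{major}.{minor:02d}", units, x_density, y_density, thumb_w, thumb_h)
```

The length field (16) covers everything after the marker, so the full segment occupies 18 bytes in the file.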

In [None]:
def quant_table_to_bytes(q_table_float):
    """Convert an 8x8 float32 quantization table to the 64 table bytes of a JPEG DQT segment."""
    # clip to the valid 8-bit range and convert to uint8
    q_table = np.clip(q_table_float, 1, 255).astype(np.uint8)

    # reorder the flattened table in zigzag order
    q_table_zigzag = q_table.flatten()[ZIG_ZAG_ORDER]

    # the precision/ID byte (0x00 for luma, 0x01 for chroma) is prepended by the caller
    return q_table_zigzag.tobytes()


# DQT (Quantization Tables)
# size: 67 bytes per table (usually 2 tables → 134 bytes total)
# purpose: defines quantization matrices for luma/chroma
dqt = (
    b"\xff\xdb"  # DQT marker
    + b"\x00\x43"  # Length (67 bytes)
    + b"\x00"  # Precision (0 = 8-bit) and table ID 0
    + quant_table_to_bytes(Q_Y_50)  # Luma table, 64 bytes in zigzag order
    + b"\xff\xdb"  # DQT marker
    + b"\x00\x43"  # Length (67 bytes)
    + b"\x01"  # Precision (0 = 8-bit) and table ID 1
    + quant_table_to_bytes(Q_C_50)  # Chroma table, 64 bytes in zigzag order
)

In [None]:
height, width = im_2.shape[:2]

# start of frame (SOF0 - Baseline DCT)
# size: 17 bytes
# purpose: specifies image dimensions, precision, and component info
sof0 = (
    b"\xff\xc0"  # SOF0 marker
    + b"\x00\x11"  # Length (17 bytes)
    + b"\x08"  # Precision (8 bits)
    + bytes([height >> 8, height & 0xFF])  # Height
    + bytes([width >> 8, width & 0xFF])  # Width
    + b"\x03"  # Number of components (3)
    # Y component
    + b"\x01"  # Component ID
    + b"\x22"  # Sampling factors (2x2)
    + b"\x00"  # Quantization table ID
    # Cb component
    + b"\x02"  # Component ID
    + b"\x11"  # Sampling factors (1x1)
    + b"\x01"  # Quantization table ID
    # Cr component
    + b"\x03"  # Component ID
    + b"\x11"  # Sampling factors (1x1)
    + b"\x01"  # Quantization table ID
)
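
The sampling factors in SOF0 fix the size of a minimum coded unit (MCU): with 2x2 luma sampling an MCU covers 16x16 pixels and holds four Y blocks, one Cb block, and one Cr block. The following sketch counts MCUs for an image size; the function name `mcu_grid` and the 500x333 dimensions are assumptions chosen for illustration.

```python
import math


def mcu_grid(width: int, height: int, h_max: int = 2, v_max: int = 2) -> tuple[int, int]:
    """Return (columns, rows) of MCUs for the given maximum sampling factors."""
    mcu_w, mcu_h = 8 * h_max, 8 * v_max  # 16x16 pixels for 2x2 luma sampling
    # partial MCUs at the right and bottom edges still count as full MCUs
    return math.ceil(width / mcu_w), math.ceil(height / mcu_h)


cols, rows = mcu_grid(500, 333)
n_mcu = cols * rows
print(cols, rows, n_mcu)     # 32 21 672
print(4 * n_mcu, n_mcu)      # Y blocks, and Cb (or Cr) blocks: 2688 672
```

Because edge MCUs are always complete, an encoder has to pad the image to a multiple of the MCU size before block splitting.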

In [None]:
# DHT (Huffman Tables)
# size: Variable (~200–418 bytes for default tables)
# purpose: stores Huffman coding tables for DC/AC coefficients
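
A DHT segment does not store the code strings themselves. It stores a class/ID byte, 16 counts (how many codes have length 1 to 16 bits), and then the symbols ordered by code length. The sketch below derives that layout from a code dictionary shaped like the ones in this notebook. The function name `build_dht_segment` is hypothetical, and it assumes the dictionary holds canonical Huffman codes, so that sorting by (length, code value) reproduces the symbol order.

```python
def build_dht_segment(table: dict, table_class: int, table_id: int) -> bytes:
    """Build a DHT segment from a {symbol: code string} dictionary.

    table_class: 0 for DC tables, 1 for AC tables.
    Keys are either a category (DC) or a (run, size) tuple (AC).
    """
    # canonical codes: order by code length, then by numeric code value
    entries = sorted(table.items(), key=lambda kv: (len(kv[1]), int(kv[1], 2)))

    counts = [0] * 16
    symbols = bytearray()
    for key, code in entries:
        counts[len(code) - 1] += 1
        if isinstance(key, tuple):
            run, size = key
            symbols.append((run << 4) | size)  # AC symbol: run in high nibble, size in low
        else:
            symbols.append(key)  # DC symbol: the category itself

    payload = bytes([(table_class << 4) | table_id]) + bytes(counts) + bytes(symbols)
    length = 2 + len(payload)  # the length field counts itself but not the marker
    return b"\xff\xc4" + length.to_bytes(2, "big") + payload


# standard chrominance DC codes, repeated here to keep the example self-contained
dc_chroma_codes = {
    0: "00", 1: "01", 2: "10", 3: "110", 4: "1110", 5: "11110",
    6: "111110", 7: "1111110", 8: "11111110", 9: "111111110",
    10: "1111111110", 11: "11111111110",
}

segment = build_dht_segment(dc_chroma_codes, table_class=0, table_id=1)
print(len(segment), segment[:5].hex())  # 33 ffc4001f01
```

The same call with `table_class=1` and an AC dictionary would give the AC segment; a complete file needs four such segments (or one segment that concatenates the four tables).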

In [None]:
# SOS (Start of Scan)
# size: 12 bytes
# purpose: begins the compressed scan data
sos = (
    b"\xff\xda"  # SOS marker
    + b"\x00\x0c"  # Length (12 bytes)
    + b"\x03"  # Number of components (3)
    # Y component
    + b"\x01"  # Component ID
    + b"\x00"  # DC/AC table IDs (0, 0)
    # Cb component
    + b"\x02"  # Component ID
    + b"\x11"  # DC/AC table IDs (1, 1)
    # Cr component
    + b"\x03"  # Component ID
    + b"\x11"  # DC/AC table IDs (1, 1)
    + b"\x00\x3f\x00"  # Spectral selection (0..63) and successive approximation (0)
)

In [None]:
# compressed image data
# format: interleaved MCU blocks (Y/Cb/Cr).
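
Before the scan data can be written, the concatenated bitstring has to become bytes. JPEG requires two things at this step: the last partial byte is padded with 1-bits, and every `0xFF` byte in the entropy-coded data is followed by a stuffed `0x00` so a decoder cannot mistake it for a marker. A minimal sketch, with `bits_to_scan_bytes` as a hypothetical helper name and a made-up bitstring as input:

```python
def bits_to_scan_bytes(bitstream: str) -> bytes:
    """Pack a string of '0'/'1' characters into byte-stuffed JPEG scan bytes."""
    # pad the final partial byte with 1-bits
    if len(bitstream) % 8:
        bitstream += "1" * (8 - len(bitstream) % 8)

    out = bytearray()
    for i in range(0, len(bitstream), 8):
        byte = int(bitstream[i : i + 8], 2)
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)  # stuffed byte: 0xFF inside the scan is data, not a marker
    return bytes(out)


scan = bits_to_scan_bytes("11111111" + "00000001" + "101")
print(scan.hex())  # ff0001bf
```

In this example the first byte is `0xFF`, so a `0x00` follows it, and the trailing `101` is padded to `10111111` (`0xBF`).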

In [None]:
# EOI (end of image)
eoi = b"\xff\xd9"

#### <a id='toc3_1_1_2_'></a>[Decoder](#toc0_)


##### <a id='toc3_1_1_2_1_'></a>[Bitstream Parsing](#toc0_)


##### <a id='toc3_1_1_2_2_'></a>[Entropy Decoding: Huffman decoding (or arithmetic decoding)](#toc0_)

##### <a id='toc3_1_1_2_3_'></a>[Differential Pulse-Code Modulation (DPCM)](#toc0_)


##### <a id='toc3_1_1_2_4_'></a>[Run-Length Encoding (RLE)](#toc0_)


##### <a id='toc3_1_1_2_5_'></a>[Inverse Zigzag](#toc0_)


##### <a id='toc3_1_1_2_6_'></a>[Dequantization](#toc0_)


##### <a id='toc3_1_1_2_7_'></a>[Inverse Discrete Cosine Transform (IDCT)](#toc0_)


##### <a id='toc3_1_1_2_8_'></a>[Block Joining](#toc0_)


##### <a id='toc3_1_1_2_9_'></a>[Level Shifting](#toc0_)


##### <a id='toc3_1_1_2_10_'></a>[Chroma Upsampling (4:4:4)](#toc0_)


##### <a id='toc3_1_1_2_11_'></a>[Color Space Conversion](#toc0_)


### <a id='toc3_1_2_'></a>[JPEG 2000 (JPEG-2)](#toc0_)
