# Window-based Stereo Matching

Stereo matching plays a pivotal role in 3D computer vision by computing the disparity
map between a pair of stereo images. Nowadays, traditional methods, like the one
presented in this notebook, have largely fallen out of favor in practical applications,
primarily due to the emergence of superior neural network-based algorithms.

Nevertheless, diving into the fundamentals of stereo matching offers a valuable
opportunity to grasp the foundational aspects of this field and gain insights into
potential pitfalls and challenges that persist in stereo matching algorithms. These
challenges encompass issues such as occlusion, variations in image brightness, and image
noise, which are prevalent in real-world scenarios. Familiarizing myself with these
challenges serves as a building block for developing intuition on how to tackle them
when working with more advanced neural network-based approaches. 😊

**RULES:** As usual, **`OpenCV`** is banned in this repository.

## Introduction

Stereo matching aims to determine the pixel disparity along the x-axis between two images: a left image and a right image. It assumes that the two images are aligned along the y-axis, i.e., corresponding points lie on the same image row. In reality, this assumption often does not hold, so the stereo images must first be rectified. The accuracy of rectification depends largely on the ability to locate corresponding feature points in both images. For this exercise, we assume the stereo images are already perfectly aligned along the y-axis.

**Note:** In this notebook, all discussions and equations follow the image-coordinate convention commonly used in computer vision: the x-axis points to the right and the y-axis points downward. In the implementation, however, the index order is reversed: a numpy array is indexed as `image[y, x]`, because its first axis (rows) runs downward and its second axis (columns) runs to the right.
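A quick illustration of the reversed index order. The tiny array and its values are arbitrary and only serve to show which axis is which.

```python
import numpy as np

# A 2 x 3 "image": 2 rows (height, y-axis) and 3 columns (width, x-axis).
img = np.array([[0, 1, 2],
                [3, 4, 5]])

height, width = img.shape  # numpy order: (rows, columns) = (y, x)

x, y = 2, 1          # image coordinates: x to the right, y downward
value = img[y, x]    # numpy indexing reverses the order: [row, column]
print(height, width, value)  # 2 3 5
```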

One approach within stereo matching involves framing the problem as an energy function
minimization task. This energy function comprises two fundamental components: the data
term and the smoothness term, which can be formally expressed as:

$$
E(D) = \alpha E_{data}(D) + \beta E_{smooth}(D)
$$

Here, $\alpha$ and $\beta$ weight the two terms. The data term $E_{data}(D)$ is defined as:

$$
E_{data}(D) = \sum_{(x,y)\in{I}} C(x, y, D(x, y))
$$

It sums, over all pixel coordinates $(x, y)$ in the image $I$, the cost $C(x, y, D(x, y))$ of assigning the disparity $D(x, y)$ to that pixel.

The smoothness term, on the other hand, is defined as:

$$
E_{smooth}(D) = \sum_{(p,q)\in{\epsilon}} V(d_p, d_q)
$$

This term enforces smoothness by penalizing differences between the disparities $d_p$ and $d_q$ of neighboring pixels $p$ and $q$.

Here, $\epsilon$ is the set of neighboring pixel pairs, and $V(d_p, d_q) = |d_p - d_q|$ is the $L1$ distance between their disparities. For a pixel $p$ with disparity $d_p$ in the disparity map $D$, each pair $(p, q) \in \epsilon$ contributes one term, so summing over all pairs compares $d_p$ with the disparity of every neighbor $q$ of $p$.
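A minimal sketch of both energy terms in numpy. It assumes a 4-connected neighborhood for $\epsilon$, integer disparities, and a precomputed cost volume `cost_volume[k, y, x] = C(x, y, d_min + k)`; all function and variable names are hypothetical, not taken from this notebook. Setting `beta=0` gives the data-only energy used in this exercise.

```python
import numpy as np


def data_energy(cost_volume, disp_map, d_min):
    """E_data = sum over pixels of C(x, y, D(x, y)).

    cost_volume[k, y, x] holds C(x, y, d_min + k); disp_map holds integer
    disparities in [d_min, d_min + N).
    """
    idx = (disp_map - d_min)[np.newaxis]  # index of each chosen disparity
    return np.take_along_axis(cost_volume, idx, axis=0).sum()


def smooth_energy(disp_map):
    """E_smooth = sum over 4-connected neighbor pairs (p, q) of |d_p - d_q|.

    Each unordered pair is counted once: horizontal pairs plus vertical pairs.
    """
    d = disp_map.astype(np.float64)
    horizontal = np.abs(d[:, 1:] - d[:, :-1]).sum()
    vertical = np.abs(d[1:, :] - d[:-1, :]).sum()
    return horizontal + vertical


def energy(cost_volume, disp_map, d_min, alpha=1.0, beta=1.0):
    """E(D) = alpha * E_data(D) + beta * E_smooth(D)."""
    return alpha * data_energy(cost_volume, disp_map, d_min) + beta * smooth_energy(disp_map)


# Toy 2 x 2 image with two candidate disparities (0 and 1).
cost = np.arange(8, dtype=np.float64).reshape(2, 2, 2)  # cost[k, y, x]
D = np.array([[0, 1], [1, 1]])
print(data_energy(cost, D, d_min=0), smooth_energy(D), energy(cost, D, d_min=0))  # 18.0 2.0 20.0
```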

For simplicity, we won't consider the smoothness term in this exercise. Thus, our energy
function becomes $E(D) = E_{data}(D)$.

It's important to note that we perform stereo matching in both directions, left-to-right and right-to-left. This is necessary because of depth discontinuities and occlusions: some pixels that are visible in the left image are not visible in the right image, and vice versa.
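One common way to combine the two directions is a left-right consistency check: a match is kept only if the right-to-left map points back to the same pixel, and pixels that fail (often occluded ones) are flagged. The sketch below is an illustration, not code from this notebook. It assumes the right-to-left map is computed by swapping the inputs, so its sign is inverted (as noted further down), and `tol` is a hypothetical tolerance parameter.

```python
import numpy as np


def left_right_consistency(disp_left, disp_right, tol=1):
    """Boolean mask, True where the two disparity maps agree.

    Assumes disp_left[y, x] = x_left - x_right (left-to-right matching) and
    that disp_right was computed with the inputs swapped, so its sign is
    inverted: disp_right[y, x_r] = x_r - x_left.
    """
    height, width = disp_left.shape
    ys, xs = np.mgrid[0:height, 0:width]
    # Column of the matching pixel in the right image.
    x_right = np.rint(xs - disp_left).astype(int)
    inside = (x_right >= 0) & (x_right < width)
    x_right = np.clip(x_right, 0, width - 1)
    # A consistent match points back to where it started: d_L + d_R = 0.
    round_trip = disp_left + disp_right[ys, x_right]
    return inside & (np.abs(round_trip) <= tol)


disp_left = np.array([[0, 1, 1, 1]])
disp_right = np.array([[-1, -1, -1, 0]])
mask = left_right_consistency(disp_left, disp_right, tol=0)
print(mask)  # the first pixel fails the check and would be treated as unreliable
```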

In the stereo pair shown below, the left image is captured from the left viewpoint and the right image from the right viewpoint. As a result, objects in the left image appear to the right of their counterparts in the right image. This shift comes from the horizontal offset between the two viewpoints, and it is larger for objects closer to the cameras.

Following the usual disparity-map convention, the disparity is the x-coordinate of a pixel in the left image minus the x-coordinate of the corresponding pixel in the right image:

$$
d = x_{\text{left}} - x_{\text{right}}
$$

For instance, the disparity of a pixel on the statue's eyes in the left image is positive, because that pixel lies to the right of its corresponding point in the right image.

It's important to note that if we reverse the order of the left and right images, the
sign of the disparity will be inverted.
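The sign convention can be checked on a synthetic 1-D scanline. In this sketch the "right" row is built so that `right_row[x - 3] == left_row[x]`, i.e. the true disparity is +3; a simple SAD window match (the helper `best_disparity` is hypothetical) recovers +3 and, with the images swapped, -3.

```python
import numpy as np


def best_disparity(ref_row, other_row, x, candidates, half_win=2):
    """Winner-takes-all SAD match of pixel x in ref_row against other_row.

    Candidate d compares ref_row around x with other_row around x - d,
    following disparity = x_ref - x_other.
    """
    ref = ref_row[x - half_win : x + half_win + 1]
    costs = [
        np.abs(ref - other_row[x - d - half_win : x - d + half_win + 1]).sum()
        for d in candidates
    ]
    return candidates[int(np.argmin(costs))]


rng = np.random.default_rng(0)
scene = rng.random(40)        # one hypothetical scanline of the scene
left_row = scene[0:30]
right_row = scene[3:33]       # right_row[x - 3] == left_row[x], so d = +3
candidates = list(range(-5, 6))

print(best_disparity(left_row, right_row, 15, candidates))  # 3  (left -> right)
print(best_disparity(right_row, left_row, 15, candidates))  # -3 (images swapped)
```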

<figure>
  <div style="display: flex; justify-content: space-between;">
    <div style="text-align: center;">
      <img src="./input/pair1-L.png" style="width: 80%;" alt="Left Image">
      <p><strong>Left Image</strong></p>
      <p>The statue appears more to the right in the left image compared to its position in the right image.</p>
    </div>
    <div style="text-align: center;">
      <img src="./input/pair1-R.png" style="width: 80%;" alt="Right Image">
      <p><strong>Right Image</strong></p>
      <p>The right image is captured from the right viewpoint.</p>
    </div>
  </div>
</figure>


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, feature, filters
from PIL import Image, ImageDraw
import networkx
from skimage import color
import image_processing_utils  # local helper module providing read_image()


gray_left = image_processing_utils.read_image("./input/pair0-L.png")
gray_right = image_processing_utils.read_image("./input/pair0-R.png")

statue_left = image_processing_utils.read_image("./input/pair1-L.png")
statue_right = image_processing_utils.read_image("./input/pair1-R.png")

cleaner_left = image_processing_utils.read_image("./input/pair2-L.png")
cleaner_right = image_processing_utils.read_image("./input/pair2-R.png")


statue_left_gt = io.imread("./input/pair1-D_L.png")
statue_right_gt = io.imread("./input/pair1-D_R.png")

cleaner_left_gt = io.imread("./input/pair2-D_L.png")
cleaner_right_gt = io.imread("./input/pair2-D_R.png")



In [None]:
print(gray_left.shape, gray_left.dtype)
print(statue_left.shape, statue_left.dtype, statue_left_gt.shape, statue_left_gt.dtype)

## Sum of Squared/Absolute Difference and Cosine Similarity

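Our primary objective is to find the disparity map $D$ that minimizes $E(D)$. Without the smoothness term, $E(D)$ is a sum of independent per-pixel costs, so we can minimize it pixel by pixel. The cost of assigning disparity $d$ to pixel $(x, y)$ is computed over a window centered on that pixel:

$$
C(x, y, d) = \sum_{i=-\lfloor W/2 \rfloor}^{\lfloor W/2 \rfloor} \sum_{j=-\lfloor W/2 \rfloor}^{\lfloor W/2 \rfloor}
\text{Diff}(I_{\text{left}}(x+i, y+j), I_{\text{right}}(x+i-d, y+j))
$$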

Here, $d$ ranges over a predefined set of candidate disparities, and each pixel takes the disparity with the lowest cost, $D(x, y) = \arg\min_d C(x, y, d)$. $W$ is the width and height of the square window (odd, so that it has a center pixel). The choice of $\text{Diff}$ defines the cost function: the sum of squared differences (SSD), the sum of absolute differences (SAD), or cosine similarity. In this section, we compare SSD, SAD, and cosine similarity to determine which approach works best.
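The window cost can also be computed for every pixel and every candidate at once as a cost volume, followed by the winner-takes-all argmin. The sketch below is an alternative, vectorized formulation, not the notebook's `cal_disp_map`: it uses numpy's `sliding_window_view` (NumPy 1.20 or later) and, unlike `cal_disp_map`, which clips windows at the border, it marks windows that leave the image as invalid (`+inf`). Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cost_volume(left, right, disparities, window_size=5, diff="ssd"):
    """Cost volume C[k, y, x] for d = disparities[k] (a sketch).

    C[k, y, x] sums Diff(left[y + j, x + i], right[y + j, x + i - d]) over a
    window_size x window_size window centered on (x, y). Windows that touch the
    image border or shift outside the right image are marked invalid (+inf).
    """
    assert left.shape == right.shape and window_size % 2 == 1
    height, width = left.shape
    half = window_size // 2
    volume = np.full((len(disparities), height, width), np.inf)
    for k, d in enumerate(disparities):
        assert abs(d) < width
        # shifted[y, x] = right[y, x - d]; columns without a source stay NaN.
        shifted = np.full((height, width), np.nan)
        if d >= 0:
            shifted[:, d:] = right[:, : width - d]
        else:
            shifted[:, :d] = right[:, -d:]
        diff_map = (left - shifted) ** 2 if diff == "ssd" else np.abs(left - shifted)
        # Pad with NaN so every pixel has a full window; NaN marks invalid cells.
        padded = np.pad(diff_map, half, constant_values=np.nan)
        windows = sliding_window_view(padded, (window_size, window_size))
        window_sum = windows.sum(axis=(-2, -1))  # NaN if any cell is invalid
        volume[k] = np.where(np.isnan(window_sum), np.inf, window_sum)
    return volume


def winner_takes_all(volume, disparities):
    """D(x, y) = argmin_d C(x, y, d)."""
    return np.asarray(disparities)[np.argmin(volume, axis=0)]


# Synthetic check: the right image is the left image shifted so that d = 4.
rng = np.random.default_rng(0)
scene = rng.random((20, 40))
left = scene[:, 0:30]
right = scene[:, 4:34]  # right[y, x - 4] == left[y, x]
disparities = list(range(0, 8))
disp_map = winner_takes_all(cost_volume(left, right, disparities), disparities)
print(disp_map[2:-2, 6:-2].min(), disp_map[2:-2, 6:-2].max())  # 4 4
```

Pixels for which every candidate is invalid fall back to the first disparity, because `argmin` of an all-`inf` column returns index 0.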

**SSD:**

$$
[I_{left}(x+i, y+j) - I_{right}(x+i-d, y+j)]^2
$$

**SAD:**

$$
|I_{left}(x+i, y+j) - I_{right}(x+i-d, y+j)|
$$

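**Cosine Similarity:**

Here $\mathbf{A}$ and $\mathbf{B}$ are the left and right windows flattened into vectors of $n = W^2$ pixels:

$$
\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|}
= \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \cdot \sqrt{\sum_{i=1}^{n} B_i^2}}
$$

Unlike SSD and SAD, cosine similarity is larger for better matches (at most 1), so the implementation below returns $-\cos(\theta)$ as the cost to minimize.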
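A small numeric check of how the costs react to brightness changes, which becomes relevant in the brightness experiment later in this notebook. The random 5 x 5 patch is hypothetical. Cosine similarity ignores a multiplicative gain (the angle between the vectors does not change), but neither cosine similarity nor SSD is invariant to an additive offset.

```python
import numpy as np


def ssd_cost(a, b):
    """Sum of squared differences between two windows."""
    return np.sum((a - b) ** 2)


def cos_sim(a, b):
    """Cosine similarity between two windows, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


rng = np.random.default_rng(0)
patch = rng.random((5, 5))   # hypothetical 5 x 5 window, values in [0, 1)
gain = 1.2 * patch           # same window, multiplied by a gain
offset = patch + 0.2         # same window, constant brightness offset

print(ssd_cost(patch, gain), cos_sim(patch, gain))      # SSD > 0, cosine = 1.0 (up to rounding)
print(ssd_cost(patch, offset), cos_sim(patch, offset))  # SSD = 25 * 0.2**2 = 1.0 (up to rounding), cosine < 1
```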


In [None]:
def coord_is_valid(x_min, x_max, y_min, y_max, width, height):
    """Return True if the inclusive window [x_min, x_max] x [y_min, y_max]
    lies entirely inside an image of size width x height."""
    if x_min < 0 or y_min < 0:
        return False
    if x_max >= width or y_max >= height:
        return False
    return True


def ssd(left, right):
    """Sum of Squared Differences.
    left, right: N x H x W stacks of windows, one per disparity candidate.
    Returns one cost per candidate (shape N)."""
    assert left.shape == right.shape and left.ndim == 3
    return np.sum((left - right) ** 2, axis=(1, 2))


def sad(left, right):
    """Sum of Absolute Differences.
    left, right: N x H x W stacks of windows, one per disparity candidate.
    Returns one cost per candidate (shape N)."""
    assert left.shape == right.shape and left.ndim == 3
    return np.sum(np.absolute(left - right), axis=(1, 2))


def cosine_similarity(left, right, eps=1e-12):
    """Negated cosine similarity, so that lower is better like SSD/SAD.
    left, right: N x H x W stacks of windows, one per disparity candidate.
    Returns one cost per candidate (shape N)."""
    assert left.shape == right.shape and left.ndim == 3
    dividend = np.sum(np.multiply(left, right), axis=(1, 2))
    assert dividend.shape[0] == left.shape[0]
    divisor = np.multiply(
        np.sqrt(np.sum(left**2, axis=(1, 2))),
        np.sqrt(np.sum(right**2, axis=(1, 2))),
    )
    assert divisor.shape[0] == left.shape[0]
    # An all-zero (black) window has zero norm; clamp the divisor instead of
    # failing, so such windows simply get a cost of 0.
    divisor = np.maximum(divisor, eps)
    return -1 * np.divide(dividend, divisor)



In [None]:
def cal_disp_map(
    left_img, right_img, window_size=3, disp_range=(-32, 32), cost_fun=None
):
    """Winner-takes-all disparity map for a grayscale stereo pair.

    For each pixel of the left image, compare its window with right-image
    windows shifted by every disparity in range(disp_range[0], disp_range[1])
    (upper bound excluded) and keep the disparity with the lowest cost.
    """
    assert cost_fun is not None
    assert left_img.shape == right_img.shape
    # Disparities are stored as floats; they may be negative.
    disparity_map = np.zeros_like(left_img, dtype=np.float64)

    half_win = window_size // 2  # 3->1, 5->2
    height, width = left_img.shape  # grayscale images only

    # Loop through each pixel in the left image.
    for y in range(height):
        for x in range(width):
            # Inclusive window bounds, clipped at the image border.
            y_min, y_max = max(0, y - half_win), min(height - 1, y + half_win)
            x_min, x_max = max(0, x - half_win), min(width - 1, x + half_win)
            assert coord_is_valid(
                x_min, x_max, y_min, y_max, width, height
            ), f"{x_min}, {x_max}, {y_min}, {y_max}, {width}, {height}"

            left_patch = left_img[y_min : y_max + 1, x_min : x_max + 1]
            left_patches = []
            right_patches = []
            disp_candidates = []
            # Collect the right-image window for each disparity candidate.
            for disp in range(disp_range[0], disp_range[1]):
                # Shift the right window along the x-axis by the disparity;
                # skip shifts that fall outside the right image.
                if not coord_is_valid(
                    x_min - disp, x_max - disp, y_min, y_max, width, height
                ):
                    continue
                right_patch = right_img[
                    y_min : y_max + 1, x_min - disp : x_max - disp + 1
                ]
                left_patches.append(left_patch)
                right_patches.append(right_patch)
                disp_candidates.append(disp)

            # No candidate fits inside the right image: keep disparity 0.
            if not disp_candidates:
                continue

            # Evaluate all candidates at once (N x H x W stacks).
            costs = cost_fun(
                np.stack(left_patches, axis=0), np.stack(right_patches, axis=0)
            )
            assert not np.any(np.isnan(costs))
            assert costs.size == len(disp_candidates)
            disparity_map[y, x] = disp_candidates[np.argmin(costs)]

    return disparity_map


image_data = [(ssd, "SSD"), (sad, "SAD"), (cosine_similarity, "Cosine Similarity")]

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(10, 3))
fig.suptitle("Left to right disparity map.")
for i, (cost_fun, title) in enumerate(image_data):
    disp_map = cal_disp_map(
        gray_left, gray_right, window_size=3, disp_range=(-5, 5), cost_fun=cost_fun
    )

    ax[i].set_title(f"{title}")
    im = ax[i].imshow(
        disp_map,
        cmap="plasma",
        aspect="auto",
        vmin=-5,
        vmax=5,
    )

    cbar = fig.colorbar(im, ax=ax[i], orientation="vertical")
    cbar.set_label("Disparity Value", rotation=270, labelpad=20)

fig.tight_layout()
plt.show()

**Discussion:**

This plot shows that the right image needs to be shifted 5 pixels to the right to align with the left image.

SSD and SAD produce very similar results here. Note also that the disparity map has the same size as the left and right images.

In [None]:
disp_min = 0
disp_max = 150


In [None]:
fig, ax = plt.subplots(figsize=(5, 4))
fig.suptitle("Left to right ground truth.")
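# Assumption: the ground-truth PNG stores disparity scaled by 2, hence the
# division; the title reports the min/max of the plotted (scaled) values.
gt_disp = statue_left_gt / 2
ax.set_title(f"Min:{np.min(gt_disp)}. Max:{np.max(gt_disp)}")
im = ax.imshow(
    gt_disp,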
    cmap="RdYlGn",
    aspect="auto",
    vmin=0,
    vmax=150,
)
ax.set_axis_off()
cbar = fig.colorbar(im, ax=ax, orientation="vertical")
cbar.set_label("Disparity Value", rotation=270, labelpad=20)
fig.tight_layout()
plt.show()

In [None]:
image_data = [(ssd, "SSD"), (sad, "SAD"), (cosine_similarity, "Cosine Similarity")]

fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(16, 4))
fig.suptitle("Left to right disparity map.")

gt_disp = statue_left_gt / 2  # same scaling as in the ground-truth plot above
ax[0].set_title(f"Ground truth. Min:{np.min(gt_disp)}. Max:{np.max(gt_disp)}")
ax[0].imshow(
    gt_disp,
    cmap="RdYlGn",
    aspect="auto",
    vmin=0,
    vmax=150,
)
ax[0].set_axis_off()

for i, (cost_fun, title) in enumerate(image_data):
    disp_map = cal_disp_map(
        statue_left,
        statue_right,
        window_size=5,
        disp_range=(disp_min, disp_max),
        cost_fun=cost_fun,
    )
    ax[i + 1].set_title(f"{title}. Min:{np.min(disp_map)}. Max:{np.max(disp_map)}")
    im = ax[i + 1].imshow(
        disp_map,
        cmap="RdYlGn",
        aspect="auto",
        vmin=0,
        vmax=150,
    )
    ax[i + 1].set_axis_off()

cbar = fig.colorbar(im, ax=ax[3], orientation="vertical")
cbar.set_label("Disparity Value", rotation=270, labelpad=20)

fig.tight_layout()
plt.show()
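The comparison above is visual. A common quantitative complement is the bad-pixel rate: the fraction of pixels whose estimated disparity differs from the ground truth by more than a threshold. The sketch below is hypothetical and rests on two labeled assumptions: the ground-truth PNG is single-channel and stores disparity scaled by 2 (mirroring the `statue_left_gt / 2` used in the plots), and a ground-truth value of 0 marks an unknown pixel.

```python
import numpy as np


def bad_pixel_rate(disp_map, gt_png, threshold=1.0, gt_scale=2.0):
    """Fraction of pixels whose disparity error exceeds `threshold` (a sketch).

    disp_map: H x W estimated disparities.
    gt_png:   H x W ground-truth image as read from disk, divided by gt_scale
              (assumption: the PNG stores disparity scaled by 2).
    Pixels with ground truth 0 are treated as unknown and skipped (assumption).
    """
    gt = np.asarray(gt_png, dtype=np.float64) / gt_scale
    assert gt.shape == disp_map.shape, "expects single-channel ground truth"
    valid = gt > 0
    bad = np.abs(disp_map - gt) > threshold
    return bad[valid].mean()


# Tiny hand-made example: the bottom-right estimate is off by 6, and the
# bottom-left ground truth is 0 (unknown), so it is ignored.
est = np.array([[1.0, 2.0], [3.0, 10.0]])
gt_raw = np.array([[2, 4], [0, 8]], dtype=np.uint8)  # decodes to [[1, 2], [0, 4]]
print(bad_pixel_rate(est, gt_raw))  # 1 bad pixel out of 3 known ones
```

Under those assumptions, a call such as `bad_pixel_rate(disp_map, statue_left_gt)` would score the last map computed in the cell above.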

## Adding Noise and Variation of Brightness

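Next, we perturb the left image to test how robust each cost function is. The helpers below add Gaussian noise or a uniform brightness increase; the cell after them raises the left image's brightness by 20% of its intensity range (the noise line is left commented out).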

In [None]:
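def add_noise(image, noise_mean=0, noise_std=0.05):
    """Add Gaussian noise to an image with values in [0, 1]."""
    noise = np.random.normal(loc=noise_mean, scale=noise_std, size=image.shape)
    # Clip back to the valid intensity range.
    noisy_image = np.clip(image + noise, 0, 1)
    return noisy_image


def add_variation_brightness(image, percent=0.1):
    """Brighten an image with values in [0, 1] by a constant offset equal to
    `percent` of its intensity range (max - min)."""
    image_min = np.min(image)
    image_max = np.max(image)
    image_range = image_max - image_min

    brightness_variation = image_range * percent

    # Clipping saturates pixels that would exceed 1, so bright regions do not
    # receive the full offset.
    adjusted_image = np.clip(image + brightness_variation, 0, 1)
    return adjusted_image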

In [None]:
# noise_statue_left = add_noise(statue_left, noise_mean=0.1, noise_std=0.1)
noise_statue_left = add_variation_brightness(statue_left, percent=0.2)

image_data = [statue_left, noise_statue_left]
titles = ["Original left image", "Brightened left image"]

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))
fig.suptitle("Left image before and after the brightness change.")
for i, image in enumerate(image_data):
    ax[i].set_title(titles[i])
    # Fix the display range to [0, 1]; with autoscaling, a constant offset
    # would be normalized away and both images would look the same.
    ax[i].imshow(image, cmap="gray", aspect="auto", vmin=0, vmax=1)
    ax[i].set_axis_off()

fig.tight_layout()
plt.show()




In [None]:
disp_map = cal_disp_map(noise_statue_left, statue_right, window_size=5, disp_range=(disp_min, disp_max), cost_fun=ssd)

fig, ax = plt.subplots(figsize=(5, 4))
fig.suptitle("Left to right disparity map (SSD, brightened left image).")
ax.set_title(f"Min:{np.min(disp_map)}. Max:{np.max(disp_map)}")
im = ax.imshow(
    disp_map,
    cmap="RdYlGn",
    aspect="auto",
    vmin=disp_min,
    vmax=disp_max,
)
ax.set_axis_off()
cbar = fig.colorbar(im, ax=ax, orientation="vertical")
cbar.set_label("Disparity Value", rotation=270, labelpad=20)
fig.tight_layout()
plt.show()

In [None]:
disp_map = cal_disp_map(noise_statue_left, statue_right, window_size=5, disp_range=(disp_min, disp_max), cost_fun=cosine_similarity)

fig, ax = plt.subplots(figsize=(5, 4))
fig.suptitle("Left to right disparity map (cosine similarity, brightened left image).")
ax.set_title(f"Min:{np.min(disp_map)}. Max:{np.max(disp_map)}")
im = ax.imshow(
    disp_map,
    cmap="RdYlGn",
    aspect="auto",
    vmin=disp_min,
    vmax=disp_max,
)
ax.set_axis_off()
cbar = fig.colorbar(im, ax=ax, orientation="vertical")
cbar.set_label("Disparity Value", rotation=270, labelpad=20)
fig.tight_layout()
plt.show()

## Normalized Cross Correlation

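Like SSD, SAD, and cosine similarity, normalized cross-correlation (NCC) measures how similar two 2D windows are. It is the cosine similarity of the two windows after subtracting each window's mean intensity $\bar{A}$ and $\bar{B}$:

$$
\text{NCC}(\mathbf{A}, \mathbf{B}) = \frac{\sum_{i=1}^{n} (A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i=1}^{n} (A_i - \bar{A})^2} \cdot \sqrt{\sum_{i=1}^{n} (B_i - \bar{B})^2}}
$$

Subtracting the mean removes an additive brightness offset, and the normalization removes a multiplicative gain, which makes NCC more robust than SSD or cosine similarity to the brightness variation introduced above. NCC is also a common score for 2D template matching: slide a small template over an image and pick the position with the highest NCC.

A note on the disparity range: in the first example, the disparities are mostly negative, so why does the search range include both negative and positive values? In practice, stereo pairs usually come from rectification, and disparities in rectified images can be either positive or negative. It is therefore a good habit to keep in mind that disparity can range from negative to positive values.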
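A minimal sketch of NCC as a cost function in the same `N x H x W` convention as `ssd`, `sad`, and `cosine_similarity` (negated, so lower is better), followed by the intuitive 2D template-matching example: a 5 x 5 template cut from a random image, with a gain and an offset applied, is located again by NCC. The function name `ncc`, the `eps` guard, and the demo data are assumptions for illustration.

```python
import numpy as np


def ncc(left, right, eps=1e-12):
    """Negated normalized cross-correlation (lower is better).
    left, right: N x H x W stacks of windows; returns one cost per window."""
    assert left.shape == right.shape and left.ndim == 3
    # Subtract each window's mean: removes additive brightness offsets.
    a = left - left.mean(axis=(1, 2), keepdims=True)
    b = right - right.mean(axis=(1, 2), keepdims=True)
    numerator = np.sum(a * b, axis=(1, 2))
    denominator = np.sqrt(np.sum(a**2, axis=(1, 2)) * np.sum(b**2, axis=(1, 2)))
    # eps guards flat (constant) windows, whose mean-subtracted norm is 0.
    return -numerator / np.maximum(denominator, eps)


# Template matching: find a 5 x 5 template inside a 20 x 20 image.
rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = 1.5 * image[7:12, 11:16] + 0.3  # cut at (row 7, col 11), then gain + offset

positions = [(r, c) for r in range(16) for c in range(16)]
windows = np.stack([image[r : r + 5, c : c + 5] for r, c in positions])
templates = np.broadcast_to(template, windows.shape)
costs = ncc(templates, windows)
best = positions[int(np.argmin(costs))]
print(best)  # (7, 11)
```

Because it follows the same signature, it could presumably be passed as `cost_fun=ncc` to `cal_disp_map`, for example on the brightened statue pair.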


