# Stereo Geometry

## Introduction

Stereo geometry is fundamental to 3D computer vision because it relates captured images
to the positions of the cameras that took them. In this tutorial, we calibrate a camera
by estimating the camera projection matrix, which links 3D world points to 2D image
points. We then estimate the fundamental matrix, which relates two stereo images and
lets us identify epipolar lines. Finally, we rectify (warp) the stereo images so that
their epipolar lines are parallel to the x-axis and corresponding points share the same
y-coordinate.

To start, let's familiarize ourselves with the data we will be working with:

-   `pic_a.jpg` and `pic_b.jpg`: These are the left and right images.

-   `pts2d-norm-pic_a.txt`: This file contains normalized 2D points in the left image,
    which we can use for initial verification.

-   `pts3d-norm.txt`: Normalized 3D points in world coordinates, also for initial
    verification.

-   `pts2d-pic_a.txt` and `pts2d-pic_b.txt`: Unnormalized 2D points in the left and
    right images.

-   `pts3d.txt`: Unnormalized 3D points in world coordinates.

**Important Note on Implementation of this Notebook**

-   In this notebook, we follow the implementation outlined in the book
    "**Multiple View Geometry in Computer Vision (MVG)**." As a result, there may be
    slight inconsistencies between the input data and the textbook conventions. For
    instance, the average distance from the origin of the normalized 2D points in
    `pts2d-norm-pic_a.txt` is `0.7785`, whereas MVG suggests $\sqrt{2}$. Similarly, for
    the 3D points the average distance is `1.2147`, whereas MVG suggests $\sqrt{3}$. We
    adhere to the MVG guidelines unless there are compelling reasons not to.

**RULES:** As usual, **`OpenCV`** is banned in this repository.


In [None]:
import numpy as np
from scipy.optimize import least_squares
from stereo_geometry_utils import *
import matplotlib.pyplot as plt
from skimage import io
from skimage.transform import warp, ProjectiveTransform

left_img = io.imread("./input/pic_a.jpg")
right_img = io.imread("./input/pic_b.jpg")

orig_norm_pts2d = read_keypoints("./input/pts2d-norm-pic_a.txt")
orig_norm_pts3d = read_keypoints("./input/pts3d-norm.txt")

pts2d_left = read_keypoints("./input/pts2d-pic_a.txt")
pts2d_right = read_keypoints("./input/pts2d-pic_b.txt")
pts3d = read_keypoints("./input/pts3d.txt")

## Calibration

The files `pts2d-pic_a.txt` and `pts3d.txt` contain a list of twenty 2D and 3D points
corresponding to the image `pic_a.jpg`. Our objective is to compute the projection
matrix that transforms 3D world coordinates into 2D image coordinates. Using homogeneous
coordinates, the equation is expressed as follows:

$$
\begin{bmatrix}
u \\
v \\
1 \\
\end{bmatrix}
\simeq
\begin{bmatrix}
s*u \\
s*v \\
s \\
\end{bmatrix}
=
\begin{bmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34} \\
\end{bmatrix}
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix}
$$

To find the $3 \times 4$ matrix $M$, we can either solve the homogeneous version of
these equations using Singular Value Decomposition (SVD) or set $m_{34}$ to 1 and use an
ordinary least-squares method. Note that $M$ is known only up to a scale factor.
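
The second option can be sketched as follows. This is a minimal illustration, not the method used later in this notebook: the function name `solve_projection_inhomogeneous` and the synthetic data are assumptions made for the example, and the reference matrix $M_{norm}$ quoted later in this notebook serves only as a stand-in ground truth. Fixing $m_{34} = 1$ breaks down when the true $m_{34}$ is close to zero, which is one reason to prefer the homogeneous solution.

```python
import numpy as np


def solve_projection_inhomogeneous(pts2d, pts3d):
    """Estimate M with m34 fixed to 1 by ordinary least squares.

    pts2d: (3, N) homogeneous image points [u, v, 1].
    pts3d: (4, N) homogeneous world points [X, Y, Z, 1].
    """
    n = pts3d.shape[1]
    A = np.zeros((2 * n, 11))
    # u-rows: m11 X + m12 Y + m13 Z + m14 - u (m31 X + m32 Y + m33 Z) = u
    A[0::2, 0:4] = pts3d.T
    A[0::2, 8:11] = -(pts2d[0] * pts3d[:3]).T
    # v-rows: m21 X + m22 Y + m23 Z + m24 - v (m31 X + m32 Y + m33 Z) = v
    A[1::2, 4:8] = pts3d.T
    A[1::2, 8:11] = -(pts2d[1] * pts3d[:3]).T
    b = np.empty(2 * n)
    b[0::2] = pts2d[0]
    b[1::2] = pts2d[1]
    m = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.append(m, 1.0).reshape(3, 4)


# Synthetic, noise-free check (all values below are illustrative).
rng = np.random.default_rng(0)
M_true = np.array(
    [
        [-0.4583, 0.2947, 0.0139, -0.0040],
        [0.0509, 0.0546, 0.5410, 0.0524],
        [-0.1090, -0.1784, 0.0443, -0.5968],
    ]
)
M_true = M_true / M_true[2, 3]  # rescale so that m34 = 1
pts3d_demo = np.vstack([rng.uniform(-1, 1, (3, 20)), np.ones(20)])
proj = M_true @ pts3d_demo
pts2d_demo = proj / proj[2]
M_est = solve_projection_inhomogeneous(pts2d_demo, pts3d_demo)
print(np.abs(M_est - M_true).max())
```

With noise-free correspondences the recovered matrix should match `M_true` to numerical precision.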

<figure>
  <div style="display: flex; justify-content: space-between;">
    <div style="text-align: center;">
      <img src="./input/pic_a.jpg" style="width: 80%;" alt="Left Image">
      <p><strong>Unrectified Left Image "pic_a.jpg"</strong></p>
    </div>
    <div style="text-align: center;">
      <img src="./input/pic_b.jpg" style="width: 80%;" alt="Right Image">
      <p><strong>Unrectified Right Image "pic_b.jpg"</strong></p>
    </div>
  </div>
</figure>


## Normalization

As per the guidelines in MVG, the normalization of 2D and 3D points is a critical step.
In this notebook, we adopt $\sqrt{2}$ as the average distance from the origin for 2D
points and $\sqrt{3}$ as the average distance from the origin for 3D points.
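
The helpers `normalize_image_points` and `normalize_space_points` live in `stereo_geometry_utils` and are not shown here. As a rough sketch of what such a normalization might look like (the function name `normalize_points_sketch` and its exact behavior are assumptions, and the real helpers may differ), one can translate the centroid to the origin and scale so that the mean distance equals the target:

```python
import numpy as np


def normalize_points_sketch(pts, avg_distance):
    """Normalize homogeneous points of shape (d + 1, N).

    Returns the normalized points and the (d + 1, d + 1) similarity
    transform that produced them.
    """
    d = pts.shape[0] - 1
    inhom = pts[:d] / pts[d]
    centroid = inhom.mean(axis=1, keepdims=True)
    mean_dist = np.linalg.norm(inhom - centroid, axis=0).mean()
    scale = avg_distance / mean_dist
    T = np.eye(d + 1)
    T[:d, :d] *= scale                    # isotropic scaling
    T[:d, d] = -scale * centroid[:, 0]    # move the centroid to the origin
    homog = np.vstack([inhom, np.ones(inhom.shape[1])])
    return T @ homog, T


# Illustrative data: 20 random "pixel" coordinates.
rng = np.random.default_rng(0)
demo_pts = np.vstack([rng.uniform(0, 1000, (2, 20)), np.ones(20)])
demo_norm, demo_T = normalize_points_sketch(demo_pts, np.sqrt(2))
print(np.linalg.norm(demo_norm[:2], axis=0).mean())
```

The same function works for 3D points by passing a `(4, N)` array and `np.sqrt(3)`.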


In [None]:
print(f"Before normalization:")
print(f"The unnormalized 2D point: {pts2d_left[:, 0]}")
print(f"The unnormalized 3D point: {pts3d[:, 0]}")
print("\n")
print("Applying normalization ...")
norm_pts2d_left, sim_mat_T_l = normalize_image_points(
    pts2d_left, avg_distance=np.sqrt(2)
)
norm_pts2d_right, sim_mat_T_r = normalize_image_points(
    pts2d_right, avg_distance=np.sqrt(2)
)
norm_pts3d, sim_mat_U = normalize_space_points(pts3d, avg_distance=np.sqrt(3))
print(f"The average distance to origin is:")
print(f"2D: {average_distance_to_origin(norm_pts2d_left)}")
print(f"3D: {average_distance_to_origin(norm_pts3d)}")
print("\n")
print(f"After normalization:")
inverse_pts2d = np.dot(np.linalg.inv(sim_mat_T_l), norm_pts2d_left)
inverse_pts3d = np.dot(np.linalg.inv(sim_mat_U), norm_pts3d)
np.divide(inverse_pts2d, inverse_pts2d[-1, :], out=inverse_pts2d)
np.divide(inverse_pts3d, inverse_pts3d[-1, :], out=inverse_pts3d)
print(
    "We can verify the correctness of normalization by applying the inverse of similarity transform to the normalized points."
)
print(f"The denormalized 2D point: {inverse_pts2d[:, 0]}")
print(f"The denormalized 3D point: {inverse_pts3d[:, 0]}")

## Projection Matrix Estimation

To estimate the projection matrix, we use Algorithm 7.1 from MVG, shown below. The
algorithm first computes a linear solution as an initial estimate of the projection
matrix, then refines it with a least-squares method that minimizes geometric error. The
geometric error measures the discrepancy between the 2D points obtained by projecting
the 3D points and the measured 2D points from the input data. It is computed as the sum
of squared differences in $u$ and $v$.

<img src="images/MVG-algo_7.1.png" style="width:512px;">

To validate the correctness of our code, we can use a set of "normalized points" stored
in the files `pts2d-norm-pic_a.txt` and `pts3d-norm.txt`. When we solve for $M$ using
all these points, we should obtain a matrix that is a scaled equivalent of the
following:

$$
M_{norm}
=
\begin{bmatrix}
-0.4583 &  0.2947 & 0.0139 & -0.0040 \\
 0.0509 &  0.0546 & 0.5410 &  0.0524 \\
-0.1090 & -0.1784 & 0.0443 & -0.5968 \\
\end{bmatrix}
$$

For example, given the normalized 3D point
${\begin{bmatrix}1.2323 & 1.4421 & 0.4506 & 1.0\end{bmatrix}}^T$, $M_{\text{norm}}$
projects it to ${\begin{bmatrix}u & v\end{bmatrix}}^T =
{\begin{bmatrix}0.1419 & -0.4518\end{bmatrix}}^T$. Here we convert the homogeneous 2D
point ${\begin{bmatrix}us & vs & s\end{bmatrix}}^T$ to its inhomogeneous form by
dividing by $s$, which gives the coordinates of the projected point in the image.
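
The projection in this example can be reproduced directly from the rounded entries of $M_{norm}$ quoted above. The variable names are chosen for illustration only, and because the matrix entries are rounded to four decimals, the result agrees with the quoted point only to about three decimal places.

```python
import numpy as np

M_norm = np.array(
    [
        [-0.4583, 0.2947, 0.0139, -0.0040],
        [0.0509, 0.0546, 0.5410, 0.0524],
        [-0.1090, -0.1784, 0.0443, -0.5968],
    ]
)
X = np.array([1.2323, 1.4421, 0.4506, 1.0])  # homogeneous 3D point

x_h = M_norm @ X        # homogeneous image point [u*s, v*s, s]
uv = x_h[:2] / x_h[2]   # divide by s to get [u, v]
print(uv)
```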


In [None]:
def direct_linear_transformation(pts2d: np.ndarray, pts3d: np.ndarray):
    # pts2d: (3, N), [u, v, 1]
    # pts3d: (4, N), [X, Y, Z, 1]
    assert pts2d.shape[0] == 3 and pts3d.shape[0] == 4
    assert pts2d.shape[1] == pts3d.shape[1]
    num_pts = pts2d.shape[1]
    A = np.zeros((2 * num_pts, 12))
    A[0::2, 0] = pts3d[0, :]
    A[0::2, 1] = pts3d[1, :]
    A[0::2, 2] = pts3d[2, :]
    A[0::2, 3] = 1

    A[1::2, 4] = pts3d[0, :]
    A[1::2, 5] = pts3d[1, :]
    A[1::2, 6] = pts3d[2, :]
    A[1::2, 7] = 1

    A[0::2, 8] = -1 * np.multiply(pts2d[0, :], pts3d[0, :])
    A[1::2, 8] = -1 * np.multiply(pts2d[1, :], pts3d[0, :])
    A[0::2, 9] = -1 * np.multiply(pts2d[0, :], pts3d[1, :])
    A[1::2, 9] = -1 * np.multiply(pts2d[1, :], pts3d[1, :])
    A[0::2, 10] = -1 * np.multiply(pts2d[0, :], pts3d[2, :])
    A[1::2, 10] = -1 * np.multiply(pts2d[1, :], pts3d[2, :])
    A[0::2, 11] = -1 * pts2d[0, :]
    A[1::2, 11] = -1 * pts2d[1, :]

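    # A.T @ A is symmetric, so use `eigh`, which returns real eigenvalues and
    # eigenvectors; the eigenvector of the smallest eigenvalue is the solution.
    eigenvalues, eigenvectors = np.linalg.eigh(np.dot(A.T, A))
    min_arg = np.argmin(eigenvalues)
    return eigenvectors[:, min_arg]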


def minimize_geometric_error(m, pts2d, pts3d, dist=1):
    # m: (12,) flattened projection matrix
    # pts2d: (3, N) homogeneous image points [u, v, 1]
    # pts3d: (4, N) homogeneous world points [X, Y, Z, 1]
    # dist: scale applied to the projected points (1 leaves them unchanged)
    num_points = pts3d.shape[1]
    M = np.reshape(m, (3, 4))
    assert pts2d.shape[0] == 3 and pts2d.shape[1] == num_points
    assert pts3d.shape[0] == 4 and pts3d.shape[1] == num_points

    projected_2dpts = np.dot(M, pts3d)  # (3,4) x (4,N) -> (3,N)
    projected_2dpts = dist * np.divide(projected_2dpts, projected_2dpts[2, :])
    # Return the per-coordinate residuals in u and v; `least_squares` minimizes
    # the sum of their squares.
    return (pts2d - projected_2dpts)[:2, :].ravel()  # (2N,)


def generate_projection_matrix(norm_pts2d, norm_pts3d):
    m0 = direct_linear_transformation(norm_pts2d, norm_pts3d)
    res_1 = least_squares(
        minimize_geometric_error, m0, args=(norm_pts2d, norm_pts3d)
    )
    return np.reshape(res_1.x, (3, 4)), res_1.cost


proj_m, res = generate_projection_matrix(orig_norm_pts2d, orig_norm_pts3d)

print("Using normalized 2D and 3D points given by tasks to estimate projection matrix.\n")
print(f"Estimated Projection Matrix: \n{proj_m}\n")
print(f"Normalized Projection Matrix: \n{proj_m / proj_m[2, 3]}\n")
print(f"Residual: {res}")

testing_point = np.array([1.2323, 1.4421, 0.4506, 1]).T
projected_point = np.dot(proj_m, testing_point)
print(projected_point, projected_point / projected_point[-1])

**Discussion:**

With the provided normalized 2D and 3D points, the estimated projection matrix closely
matches the reference matrix $M_{norm}$ given in the task, up to scale. Projecting the
test 3D point with the estimated matrix likewise gives a 2D point close to the reference
2D point. Together, these checks indicate that the projection matrix estimation works as
intended.


### Camera Center Estimation

Once we have the camera projection matrix, we can estimate the camera center using the
following equations. Given the projection matrix $M$:

$$
M = [\mathbf{Q}|\mathbf{b}]
$$

The camera center $C$ can be calculated as:

$$
C =
\begin{bmatrix}
    -\mathbf{Q}^{-1}\mathbf{b} \\
    1 \\
\end{bmatrix}
$$

For debugging purposes, if we use the normalized 3D points to obtain the previously
mentioned projection matrix, we would compute a camera center of:

$$
C_{norm} = {\begin{bmatrix}-1.5125 & -2.3515 & 0.2826\end{bmatrix}}^T
$$
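
As a quick sanity check, the formula can be applied to the rounded $M_{norm}$ quoted earlier. The snippet below is an illustrative sketch with its own variable names. It also computes the center a second way, as the null vector of $M$ (since $M C = 0$ in homogeneous coordinates). Because $M_{norm}$ is rounded, agreement with $C_{norm}$ is only expected to two or three decimal places.

```python
import numpy as np

M_norm = np.array(
    [
        [-0.4583, 0.2947, 0.0139, -0.0040],
        [0.0509, 0.0546, 0.5410, 0.0524],
        [-0.1090, -0.1784, 0.0443, -0.5968],
    ]
)

# Closed form: C = -Q^{-1} b
Q, b = M_norm[:, :3], M_norm[:, 3]
C = -np.linalg.solve(Q, b)

# Cross-check: the homogeneous center spans the null space of M.
C_h = np.linalg.svd(M_norm)[2][-1]
C_svd = C_h[:3] / C_h[3]

print(C, C_svd)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a small numerical-hygiene choice and gives the same result here.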


In [None]:
def get_camera_center(m):
    assert m.shape == (3, 4)
    Q = m[:, :3]
    b = m[:, 3]
    return -1 * np.dot(np.linalg.inv(Q), b)


print("The camera center derived from normalized 2D and 3D points is", get_camera_center(proj_m))

### Applying to Unnormalized 2D and 3D Points

Following the implementation of the projection matrix function and its validation with
normalized 2D and 3D points, we can extend these functions to handle unnormalized 2D and
3D points. However, one additional function, `denormalize_projection_matrix`, is
required. This function will enable the conversion of the camera matrix from normalized
coordinates to the original unnormalized coordinates. Once we have this function, we can
proceed to process our unnormalized 2D and 3D points using the following code.
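
The form of the denormalization follows from the normalization itself. Writing $T$ and $U$ for the 2D and 3D similarity transforms and using a tilde for normalized quantities (this notation is introduced here only for the derivation):

$$
\tilde{\mathbf{x}} = T\mathbf{x}, \quad
\tilde{\mathbf{X}} = U\mathbf{X}, \quad
\tilde{\mathbf{x}} \simeq \tilde{M}\tilde{\mathbf{X}}
\;\Longrightarrow\;
\mathbf{x} \simeq T^{-1}\tilde{M}U\mathbf{X},
\quad\text{so}\quad
M = T^{-1}\tilde{M}U
$$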


In [None]:
def denormalize_projection_matrix(T, P, U):
    return np.dot(np.linalg.inv(T), np.dot(np.reshape(P, (3, 4)), U))


proj_m, res = generate_projection_matrix(norm_pts2d_left, norm_pts3d)
print(
    "Using unnormalized 2D and 3D points given by tasks to estimate projection matrix.\n"
)
print(f"Estimated Projection Matrix: \n{proj_m}\n")
print(f"Normalized Projection Matrix: \n{proj_m / proj_m[2, 3]}\n")
print(f"Residual: {res}\n")

proj_m = denormalize_projection_matrix(sim_mat_T_l, proj_m, sim_mat_U)
print(
    "The camera center derived from the denormalized projection matrix is",
    get_camera_center(proj_m),
)

**Discussion:**

The projection matrix estimated from our own normalized points differs from the one
estimated from the provided normalized points. This is expected: the two point sets are
normalized differently (for example, an average 3D distance of $\sqrt{3}$ here versus
`1.2147` in the provided file), so the same camera is expressed in two different
coordinate frames. The camera center illustrates this. In our normalized frame it is
${\begin{bmatrix}-2.1526 & -3.3503 & 0.4023\end{bmatrix}}^T$. Rescaling it so that its
last component matches the reference value `0.2826` gives
${\begin{bmatrix}-1.5120 & -2.3532 & 0.2826\end{bmatrix}}^T$, which is close to the
$C_{norm}$ provided by the task. The scale factor, about 1.42, is close to the ratio
$\sqrt{3}/1.2147 \approx 1.43$ of the two normalization scales, which suggests that the
two frames differ mainly by a uniform scale.


In [None]:
proj_m, res = generate_projection_matrix(norm_pts2d_left, norm_pts3d)
tmp = get_camera_center(proj_m)
print(f"Camera Center without rescaling: {tmp}")
# 0.2826 is the last component of the reference camera center C_norm.
print("Camera Center after rescaling is: ", 0.2826 * tmp / tmp[-1])

### Filtering 2D Image Points and 3D World Points

Not every 2D-3D correspondence should be used for estimation, because real-world point
matching often contains errors. In practice, a robust method such as RANSAC can identify
and exclude mismatched points during camera matrix estimation.
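
The idea can be sketched as follows. This is a bare-bones RANSAC loop under several assumptions that are not part of this notebook: a fixed iteration count, a hand-picked inlier threshold, minimal samples of six correspondences, and an unnormalized SVD-based linear solve. All function names are hypothetical, and the data is synthetic, with the reference $M_{norm}$ from above used as a stand-in camera.

```python
import numpy as np


def dlt_projection(pts2d, pts3d):
    """Linear estimate of M from (3, N) image and (4, N) world points."""
    n = pts3d.shape[1]
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = pts3d.T
    A[1::2, 4:8] = pts3d.T
    A[0::2, 8:12] = -(pts2d[0] * pts3d).T
    A[1::2, 8:12] = -(pts2d[1] * pts3d).T
    # Right singular vector of the smallest singular value.
    return np.linalg.svd(A)[2][-1].reshape(3, 4)


def reprojection_error(M, pts2d, pts3d):
    """Per-point distance between projected and measured image points."""
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = M @ pts3d
        return np.linalg.norm(proj[:2] / proj[2] - pts2d[:2], axis=0)


def ransac_projection(pts2d, pts3d, n_iters=200, threshold=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = pts3d.shape[1]
    best_mask = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(n, size=6, replace=False)
        M = dlt_projection(pts2d[:, sample], pts3d[:, sample])
        mask = reprojection_error(M, pts2d, pts3d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # Refit on all inliers of the best model.
    return dlt_projection(pts2d[:, best_mask], pts3d[:, best_mask]), best_mask


# Synthetic demo: 30 exact correspondences, 6 of them corrupted.
rng = np.random.default_rng(1)
M_true = np.array(
    [
        [-0.4583, 0.2947, 0.0139, -0.0040],
        [0.0509, 0.0546, 0.5410, 0.0524],
        [-0.1090, -0.1784, 0.0443, -0.5968],
    ]
)
pts3d_demo = np.vstack([rng.uniform(-1, 1, (3, 30)), np.ones(30)])
proj = M_true @ pts3d_demo
pts2d_demo = proj / proj[2]
is_outlier = np.zeros(30, dtype=bool)
is_outlier[:6] = True
pts2d_demo[0, is_outlier] += 0.5  # corrupt u of the first six matches

M_ransac, inliers = ransac_projection(pts2d_demo, pts3d_demo)
print(inliers.sum(), "inliers found")
```

In a real pipeline the threshold would be set from the expected measurement noise, and the final refit would typically use the normalized, refined estimator from the previous sections rather than the plain linear solve.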


## Fundamental Matrix Estimation

To estimate the fundamental matrix, which maps points in one image to epipolar lines in
the other, we use an approach similar to the projection matrix estimation. We rely on
the corresponding point coordinates in `pts2d-pic_a.txt` and `pts2d-pic_b.txt`. Recall
that the fundamental matrix is defined by the constraint:

$$
\begin{bmatrix}
u^{\prime} & v^{\prime} & 1
\end{bmatrix}
\begin{bmatrix}
f_{11} & f_{12} & f_{13} \\
f_{21} & f_{22} & f_{23} \\
f_{31} & f_{32} & f_{33} \\
\end{bmatrix}
\begin{bmatrix}
u \\
v \\
1 \\
\end{bmatrix}
= 0
$$

Again, we follow an algorithm from MVG, here Algorithm 11.1 (the normalized 8-point
algorithm). The essential idea is very similar to estimating the projection matrix.
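
As with the projection matrix, the estimate is computed from normalized points and then denormalized. Writing $T$ and $T'$ for the similarity transforms of the left and right image points and using a tilde for normalized quantities (notation introduced here only for the derivation), substituting into the epipolar constraint gives:

$$
\tilde{\mathbf{x}}'^{T}\tilde{F}\tilde{\mathbf{x}}
= (T'\mathbf{x}')^{T}\tilde{F}(T\mathbf{x})
= \mathbf{x}'^{T}\left(T'^{T}\tilde{F}T\right)\mathbf{x}
= 0
\;\Longrightarrow\;
F = T'^{T}\tilde{F}T
$$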

<img src="images/MVG-algo_11.1.png" style="width:512px;">


In [None]:
def compose_A(pts2d_left, pts2d_right):
    num_points = pts2d_left.shape[1]
    tmp = np.zeros((num_points, 9))
    tmp[:, 0] = np.multiply(pts2d_left[0, :], pts2d_right[0, :])
    tmp[:, 1] = np.multiply(pts2d_left[1, :], pts2d_right[0, :])
    tmp[:, 2] = pts2d_right[0, :]
    tmp[:, 3] = np.multiply(pts2d_left[0, :], pts2d_right[1, :])
    tmp[:, 4] = np.multiply(pts2d_left[1, :], pts2d_right[1, :])
    tmp[:, 5] = pts2d_right[1, :]
    tmp[:, 6] = pts2d_left[0, :]
    tmp[:, 7] = pts2d_left[1, :]
    tmp[:, 8] = 1
    return tmp


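def constrain_fundamental_matrix(f):
    # Enforce rank 2 by zeroing the smallest singular value.
    u, s, vh = np.linalg.svd(f, full_matrices=True)
    D_head = np.array([[s[0], 0, 0], [0, s[1], 0], [0, 0, 0]])
    # np.linalg.svd returns V^T (here `vh`), so it is used without transposing.
    return np.dot(np.dot(u, D_head), vh)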


def denormalize_fundamental_matrix(T_prime, F, T):
    return np.dot(np.dot(T_prime.T, F), T)


def generate_fundamental_matrix(pts2d_left, pts2d_right):
    A = compose_A(pts2d_left, pts2d_right)
    # A.T @ A is symmetric, so `eigh` is used to get real eigenvalues/eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(np.dot(A.T, A))
    min_arg = np.argmin(eigenvalues)
    init_f = np.reshape(eigenvectors[:, min_arg], (3, 3))
    F = constrain_fundamental_matrix(init_f)
    return F


norm_F = generate_fundamental_matrix(norm_pts2d_left, norm_pts2d_right)
F = denormalize_fundamental_matrix(sim_mat_T_r, norm_F, sim_mat_T_l)

epi_left = draw_epipolar_line_in_image(
    left_points=pts2d_right, right_image=left_img, fundamental_matrix=F.T
)
epi_right = draw_epipolar_line_in_image(
    left_points=pts2d_left, right_image=right_img, fundamental_matrix=F
)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))
ax[0].imshow(epi_left)
ax[0].set_title("Epipolar lines in left image", fontsize=12)
ax[1].imshow(epi_right)
ax[1].set_title("Epipolar lines in right image", fontsize=12)
ax[0].axis("off")
ax[1].axis("off")
fig.tight_layout()
plt.show()

## Stereo Image Rectification

Once the fundamental matrix has been estimated, we can use it for stereo image
rectification. Image rectification resamples a pair of stereo images taken from widely
differing viewpoints to produce a pair of "matched epipolar projections." In these
projections the epipolar lines are parallel to the x-axis and match between views. As a
result, disparities between the images occur only in the x-direction, with no
y-disparity.

As previously mentioned, we will adhere to the methodology detailed in MVG Section
11.12, "Image Rectification." Given matching image points
$\mathbf{x}_{i} \leftrightarrow \mathbf{x}'_{i}$ between the left and right images, our
objective is to determine the projective transformations `H1` and `H2` to rectify the
left and right images. The steps for this procedure are summarized as follows:

1. Calculate the fundamental matrix and identify the epipoles `e1` and `e2` in the left
   and right images.

2. Select a projective transformation `H2` that maps the epipole `e2` to the point at
   infinity, ${\begin{bmatrix}1 & 0 & 0\end{bmatrix}}^T$.

3. Determine the optimal matching projective transformation `H1` that minimizes the
   least-squares distance defined as:
    $$
    \sum_i{d(H_{1}\mathbf{x}_{i}, H_{2}\mathbf{x}'_{i})^2}
    $$


### Computing Epipoles from the Fundamental Matrix

Given the fundamental matrix $F$, the epipoles are defined by the equations $Fe_{1} = 0$
and $F^{T}e_{2} = 0$. To compute an epipole, follow these steps:

1. Compute the Singular Value Decomposition of the fundamental matrix:
   $F=U\Sigma V^T$. Here, $U$ and $V$ are orthogonal matrices, and $\Sigma$ is a
   diagonal matrix containing the singular values.

2. Take $e_{1}$ as the last column of $V$, the right singular vector belonging to the
   smallest singular value. For $e_{2}$, apply the same procedure to $F^{T}$.

3. Scale $e$ so that its third homogeneous component equals 1. This assumes the epipole
   is finite, i.e. that its third component is not zero.
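
The steps above can be checked on a small synthetic example. The sketch below builds a rank-2 matrix as $F = [e_2]_\times A$ for an arbitrary invertible $A$, so that $F^{T}e_{2} = 0$ holds by construction and $e_{1} \propto A^{-1}e_{2}$. The helper names and the numbers are assumptions made for the illustration, not data from this notebook.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])


def null_vector(mat):
    """Unit right null vector: the last row of V^T returned by the SVD."""
    return np.linalg.svd(mat)[2][-1]


e2_true = np.array([3.0, 2.0, 1.0])
A = np.diag([1.0, 2.0, 3.0])
F_demo = skew(e2_true) @ A  # rank 2, with F^T e2 = 0

e1 = null_vector(F_demo)    # solves F e1 = 0
e2 = null_vector(F_demo.T)  # solves F^T e2 = 0
e1, e2 = e1 / e1[2], e2 / e2[2]
print(e1, e2)
```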


In [None]:
def compute_epipole(fundamental_matrix):
    """Compute epipole using the fundamental matrix."""
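    # np.linalg.svd returns V^T, so the last column of V is the last row of `vh`.
    u, s, vh = np.linalg.svd(fundamental_matrix)
    e = vh[-1, :]
    e /= e[2]  # scale so the third component is 1 (assumes a finite epipole)
    return e  # (3, )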

### Mapping the Epipole to Infinity

For a chosen point of interest $\mathbf{x}_0$ and an epipole $e$, the required mapping
is a composite transformation $H = GRT$. Here, $T$ is a translation that moves
$\mathbf{x}_0$ to the origin, $R$ is a rotation about the origin that takes the epipole
to a point ${\begin{bmatrix}f & 0 & 1\end{bmatrix}}^T$ on the $x$-axis, and $G$ is the
transformation that sends this point to infinity. This composite mapping is, to first
order, a rigid transformation in the neighborhood of $\mathbf{x}_0$. The code below uses
the image center as $\mathbf{x}_0$. Because $T$ shifts the whole image, the code also
applies `np.linalg.inv(T)` at the end to restore the original image coordinates, giving
$H = T^{-1}GRT$.
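
The effect of the composite mapping can be verified on made-up numbers. In the sketch below, the function name `epipole_to_infinity`, the epipole, and the center point are all assumptions for the illustration. It checks that the epipole lands on a point at infinity along the x-axis, and that the center point is left in place by $T^{-1}GRT$.

```python
import numpy as np


def epipole_to_infinity(epipole, center):
    """Build T^-1 G R T for a finite epipole (3,) and a 2D center point."""
    cx, cy = center
    T = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    ex, ey, ew = T @ epipole
    ex, ey = ex / ew, ey / ew
    theta = np.arctan2(ey, ex)
    c, s = np.cos(theta), np.sin(theta)
    # Rotation by -theta, which takes (ex, ey) onto the positive x-axis.
    R = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])
    f = np.hypot(ex, ey)  # distance of the centered epipole from the origin
    G = np.array([[1, 0, 0], [0, 1, 0], [-1 / f, 0, 1]])
    return np.linalg.inv(T) @ G @ R @ T


epipole_demo = np.array([900.0, 500.0, 1.0])
center_demo = (320.0, 240.0)
H_demo = epipole_to_infinity(epipole_demo, center_demo)

mapped_epipole = H_demo @ epipole_demo
mapped_center = H_demo @ np.array([320.0, 240.0, 1.0])
print(mapped_epipole, mapped_center)
```

The final translation $T^{-1}$ does not move the epipole away from infinity, because a translation leaves points with a zero third component unchanged.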


In [None]:
def select_homography(image, epipole):
    height, width, _ = image.shape

    # Shift epipole's coord wrt the center of the image.
    T = np.array([[1, 0, -width / 2], [0, 1, -height / 2], [0, 0, 1]])
    e_centered = np.dot(T, epipole)
    e_centered = e_centered / e_centered[2]

    e_x = e_centered[0]
    e_y = e_centered[1]
    theta = np.arctan2(e_y, e_x)
    R = np.array(
        [
            [np.cos(theta), -np.sin(theta), 0],
            [np.sin(theta), np.cos(theta), 0],
            [0, 0, 1],
        ]
    )  # Rotation matrix
    # Take the inverse because we are not rotating a point to align with
    # `e_centered` but rotating `e_centered` to align with the x-axis.
    R = np.linalg.inv(R)

    # After the rotation, e_centered has the form (f, 0, 1).
    e_centered = np.dot(R, e_centered)
    f = e_centered[0]
    # Move the epipole to infinity
    G = np.array([[1, 0, 0], [0, 1, 0], [-1 / f, 0, 1]])

    # create the overall transformation matrix
    H = np.dot(np.linalg.inv(T), np.dot(G, np.dot(R, T)))
    return H

### Minimizing the Least-Squares Distance

Once `H2` is determined, we can derive the corresponding projective transformation `H1`
that minimizes the expression:

$$
\sum_i{d(H_{1}\mathbf{x}_{i}, H_{2}\mathbf{x}'_{i})^2}
$$

For a detailed algorithm, please refer to Section 11.12.2 and Corollary 11.4 in MVG.
Note that the 3x3 submatrix $M$ of the camera matrix, computed here as $[e]_\times F$,
is singular and must be made non-singular before it is used. The code does this by
adding the matrix $e\,v^T$ with $v = {\begin{bmatrix}1 & 1 & 1\end{bmatrix}}^T$, called
`noise` below. Since $[e]_\times e = 0$, this term leaves $[e]_\times M$ unchanged (as
discussed
[here](https://dsp.stackexchange.com/questions/89480/understanding-and-resolving-singularities-in-stereo-rectification-using-multiple)):

```Python
noise = np.array([[e[0], e[0], e[0]],
                  [e[1], e[1], e[1]],
                  [e[2], e[2], e[2]]])
```
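
A small numerical check of this construction, using a synthetic rank-2 matrix rather than the notebook's data (the helper `skew` and all values are assumptions for the illustration): adding the `noise` matrix raises the rank of $M$ from 2 to 3 while leaving $[e]_\times M$ unchanged.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])


e_demo = np.array([3.0, 2.0, 1.0])
F_demo = skew(e_demo) @ np.diag([1.0, 2.0, 3.0])  # satisfies F^T e = 0

M0 = skew(e_demo) @ F_demo                   # singular
noise = np.outer(e_demo, np.ones(3))         # rows [e0 e0 e0], [e1 e1 e1], [e2 e2 e2]
M = M0 + noise                               # non-singular

rank_before = np.linalg.matrix_rank(M0, tol=1e-9)
rank_after = np.linalg.matrix_rank(M, tol=1e-9)
unchanged = np.allclose(skew(e_demo) @ M, skew(e_demo) @ M0)
print(rank_before, rank_after, unchanged)
```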


In [None]:
def minimize_distance(e, F, H, pts2d_left, pts2d_right):
    noise = np.array([[e[0], e[0], e[0]], [e[1], e[1], e[1]], [e[2], e[2], e[2]]])
    skewed_e = np.array([[0, -e[2], e[1]], [e[2], 0, -e[0]], [-e[1], e[0], 0]])
    M = np.dot(skewed_e, F) + noise  # Singular matrix to non-singular matrix
    proj_pts2d_left = np.dot(np.dot(H, M), pts2d_left)  # 3 x num_pts
    proj_pts2d_right = np.dot(H, pts2d_right)
    proj_pts2d_left /= proj_pts2d_left[2, :]  # 3 x num_pts
    proj_pts2d_right /= proj_pts2d_right[2, :]
    x_coords = proj_pts2d_right[0, :]  # (num_pts,)
    # np.linalg.lstsq((M, N) array_like, {(M,), (M, K)} array_like)
    (a, b, c) = np.linalg.lstsq(proj_pts2d_left.T, x_coords, rcond=None)[0]
    H_A = np.array([[a, b, c], [0, 1, 0], [0, 0, 1]])
    H1 = np.dot(H_A, np.dot(H, M))
    return H1

### Putting It All Together

With all the necessary components in place, we can now compute the projective
transformations `H1` and `H2`. These transformations will be applied to rectify and warp
the original left and right images. Additionally, we will project our original 2D image
points, `pts2d_left` and `pts2d_right`, onto the rectified 2D coordinate space,
resulting in `proj_pts2d_left` and `proj_pts2d_right`. Subsequently, we will follow the
previously outlined procedure to recompute the fundamental matrix using
`proj_pts2d_left` and `proj_pts2d_right`. This allows us to generate new epipolar lines
within the rectified images.

Note the different use of the transformation matrices. Projecting the 2D image points
uses `H1` and `H2` directly, whereas image warping uses their **inverses**: `warp`
expects a map from output coordinates back to input coordinates, so every output pixel
is interpolated from the source image and no "holes" appear in the rectified image.


In [None]:
def compute_matching_projective_transformation(
    f_mat, right_img, pts2d_left, pts2d_right
):
    e2 = compute_epipole(f_mat.T)
    H2 = select_homography(right_img, e2)
    H1 = minimize_distance(e2, f_mat, H2, pts2d_left, pts2d_right)
    return H1, H2


H1, H2 = compute_matching_projective_transformation(
    F, right_img, pts2d_left, pts2d_right
)
rectified_img_left = warp(left_img, ProjectiveTransform(matrix=np.linalg.inv(H1)))
rectified_img_right = warp(right_img, ProjectiveTransform(matrix=np.linalg.inv(H2)))

proj_pts2d_left = np.dot(H1, pts2d_left)
proj_pts2d_left = proj_pts2d_left / proj_pts2d_left[2, :]
proj_pts2d_right = np.dot(H2, pts2d_right)
proj_pts2d_right = proj_pts2d_right / proj_pts2d_right[2, :]

norm_pts2d_left, sim_mat_T_l = normalize_image_points(
    proj_pts2d_left, avg_distance=np.sqrt(2)
)
norm_pts2d_right, sim_mat_T_r = normalize_image_points(
    proj_pts2d_right, avg_distance=np.sqrt(2)
)

new_F = denormalize_fundamental_matrix(
    sim_mat_T_r,
    generate_fundamental_matrix(norm_pts2d_left, norm_pts2d_right),
    sim_mat_T_l,
)

rectified_img_left *= 255
rectified_img_left = rectified_img_left.astype(np.uint8)
rectified_img_right *= 255
rectified_img_right = rectified_img_right.astype(np.uint8)
epi_right = draw_epipolar_line_in_image(
    left_points=proj_pts2d_left, right_image=rectified_img_right, fundamental_matrix=new_F
)
epi_left = draw_epipolar_line_in_image(
    left_points=proj_pts2d_right,
    right_image=rectified_img_left,
    fundamental_matrix=new_F.T,
)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))
ax[0].imshow(epi_left)
ax[0].set_title("Rectified left image", fontsize=12)
ax[1].imshow(epi_right)
ax[1].set_title("Rectified right image", fontsize=12)
ax[0].axis("off")
ax[1].axis("off")
fig.tight_layout()
plt.show()

## An Alternative Approach for Transformation Matrix Calculation

In this tutorial, we follow algorithms discussed in "Multiple View Geometry in Computer
Vision (MVG)" for various stereo geometry calculations. However, it is clear that the
initial rectified image suffers from significant distortion. To address this issue, an
alternative method for rectifying stereo images with reduced distortion is explored in
the paper "Computing Rectifying Homographies for Stereo Vision" by Charles Loop and
Zhengyou Zhang.
