# COMP8510 Project-01 Part-I Camera Calibration using OpenCV
>## Group Information - Name [UWinID, StudentNumber]
>* Member-1: Jiajie Yang [yang4q, 110115897]
>* Member-2: Jilsa Chandarana [chandarj, 110105879]

>## Introduction
This program calibrates an RGB camera using OpenCV's built-in $\textbf{calibrateCamera}$ function.
The inputs are two files: one containing a list of 3D coordinates and one containing the corresponding 2D pixel coordinates.
The output is a 3x4 calibration matrix and the average error (the distance between the projected points and the original pixels).


In [None]:
import numpy as np
import cv2 as cv

>## 1 Read Input Files and Construct Matrices $\textbf{P}$ and $\textbf{Q}$
In this section, we read $n$ pairs of points, $(P_i, p_i)$, from the input files containing the 3D and 2D coordinates. \\
We construct two matrices: $\textbf{P}$, which holds the 3D coordinates of the object points in the scene coordinate system, and $\textbf{Q}$, which holds the 2D coordinates of the corresponding pixels in the image coordinate system. \\
The shape of $\textbf{P}$ is initially $\textbf{(n, 3)}$ and the shape of $\textbf{Q}$ is $\textbf{(n, 2)}$, where $\textbf{n}$ is the number of given points.  
<!-- Then, we construct the matrix $\textbf{A}\in \textbf{M}_{(2n)\times 12}(\mathbb{R})$ as follows,
\begin{align}
        \textbf{A} 
        &= \begin{pmatrix}
        \textbf{G_1} \\
        \textbf{G_2} \\
        ... \\
        \textbf{G_n}
    \end{pmatrix}, \text{where } \textbf{G}_j
        =\begin{pmatrix}
        -X_j & -Y_j & -Z_j & -1 & \vec{0}^\top & u_jX_j & u_jY_j & u_jZ_j & u_j \\
        \vec{0}^\top & -X_j & -Y_j & -Z_j & -1 & v_jX_j & v_jY_j & v_jZ_j & v_j
    \end{pmatrix} \\
        &=\begin{pmatrix}
        -X_1 & -Y_1 & -Z_1 & -1 & 0 & 0 & 0 & 0 & u_1X_1 & u_1Y_1 & u_1Z_1 & u_1 \\
        0 & 0 & 0 & 0 & -X_1 & -Y_1 & -Z_1 & -1 & v_1X_1 & v_1Y_1 & v_1Z_1 & v_1 \\
        -X_2 & -Y_2 & -Z_2 & -1 & 0 & 0 & 0 & 0 & u_2X_2 & u_2Y_2 & u_2Z_2 & u_2 \\
        0 & 0 & 0 & 0 & -X_2 & -Y_2 & -Z_2 & -1 & v_2X_2 & v_2Y_2 & v_2Z_2 & v_2 \\
        &&&&&... \\
        -X_n & -Y_n & -Z_n & -1 & 0 & 0 & 0 & 0 & u_nX_n & u_nY_n & u_nZ_n & u_n \\
        0 & 0 & 0 & 0 & -X_n & -Y_n & -Z_n & -1 & v_nX_n & v_nY_n & v_nZ_n & v_n
        \end{pmatrix}
    \end{align}
In addition, we initialize the matrix $P$ from the input 3D coordinate file, defined as the following,
\begin{align} P = \begin{pmatrix}
        X_1 & ... & X_i & ... & X_n \\
        Y_1 & ... & Y_i & ... & Y_n \\
        Z_1 & ... & Z_i & ... & Z_n \\
        1   & ... & 1   & ... & 1
    \end{pmatrix} \in M_{4\times n}(\mathbb{R})
    \end{align}
Also, we initialize the matrix $Q$ from the input 2D coordinate file, defined as the following,
\begin{align} Q = \begin{pmatrix}
        u_1 & ... & u_i & ... & u_n \\
        v_1 & ... & v_i & ... & v_n \\
        1   & ... & 1   & ... & 1
    \end{pmatrix} \in M_{3\times n}(\mathbb{R})
    \end{align}
Based on the perspective projection equation, $P$ and $Q$ are expected to have the following property,
\begin{align} \exists \lambda_1, ..., \lambda_n,
        Q = MP(\lambda_1, ..., \lambda_n)^\top
    \end{align} -->

In [None]:
# define paths of the two input files
path_2d = "2D.txt"
path_3d = "3D.txt"

# opening both the files in reading modes
with open(path_2d) as f_2d, open(path_3d) as f_3d:
    num_line_2d = int(f_2d.readline().split('\n')[0])
    num_line_3d = int(f_3d.readline().split('\n')[0])
    if num_line_2d != num_line_3d:
        raise ValueError("Number of points does NOT match in 2D.txt and 3D.txt")
    # initialize matrices P and Q for input files
    P = np.ones((num_line_3d, 3))
    Q = np.ones((num_line_3d, 2))

    # initialize Matrix A (kept for reference; not used by the OpenCV calibration below)
    A = np.zeros((num_line_3d * 2, 12))

    # fill matrices P and Q
    while num_line_2d > 0:
        # define the index of point, i
        i = num_line_3d - num_line_2d

        # define 2d coord (u,v); split() with no argument tolerates repeated spaces
        u, v = (float(tok) for tok in f_2d.readline().split()[:2])

        # fill matrix Q with u, v
        Q[i, 0] = u
        Q[i, 1] = v

        # define 3d coord (x,y,z)
        x, y, z = (float(tok) for tok in f_3d.readline().split()[:3])

        # fill matrix P with x, y; z is read but not stored
        P[i, 0] = x
        P[i, 1] = y
        # P[i, 2] = z
        # Z = 0 because calibrateCamera expects planar points (explained in Section 2)
        P[i, 2] = 0
        
        # update
        num_line_2d -= 1

#print(P)
#print(Q)
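
The reading loop above can also be written as a small reusable helper. The following is a minimal sketch, assuming the file layout implied by the code above (the first line holds the point count, and each following line holds whitespace-separated coordinates). The function name `read_points` and the sample text are hypothetical and stand in for the real `2D.txt` and `3D.txt`.

```python
import io


def read_points(stream, dim):
    """Read `dim`-dimensional points from a text stream.

    Assumed layout: the first line is the number of points, and each
    following line holds at least `dim` whitespace-separated numbers.
    """
    count = int(stream.readline().strip())
    points = []
    for _ in range(count):
        # split() with no argument ignores repeated spaces and the newline
        tokens = stream.readline().split()
        if len(tokens) < dim:
            raise ValueError(f"expected {dim} values, got {len(tokens)}")
        points.append([float(tok) for tok in tokens[:dim]])
    return points


# Hypothetical sample data standing in for 3D.txt and 2D.txt
sample_3d = io.StringIO("2\n0.0  1.0 2.0\n3.0 4.0  5.0\n")
sample_2d = io.StringIO("2\n10.5 20.5\n30.0 40.0\n")

points_3d = read_points(sample_3d, 3)
points_2d = read_points(sample_2d, 2)
if len(points_3d) != len(points_2d):
    raise ValueError("2D and 3D inputs must contain the same number of points")
print(points_3d)
print(points_2d)
```

With real files, the same helper could be called as `read_points(open(path_3d), 3)` inside a `with` block.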

>## 2 Structuring the Input

We use OpenCV's $\textbf{calibrateCamera}$ function. The point arrays it requires are as follows. \\

<ul>
  <li> ObjectPoints </li>
  <ul>
    <li> Datatype: InputArrayOfArrays of $\textit{float32}$ </li>
    <li> Expected Shape : $\textit{(n_images, n_points, 3)}$ where n_images is the number of images and n_points is the number of points per image </li>
  </ul>
  <li> ImagePoints </li>
  <ul>
    <li> Datatype: InputArrayOfArrays of $\textit{float32}$</li>
    <li> Expected Shape : $\textit{(n_images, n_points, 1, 2)}$ where n_images is the number of images and n_points is the number of points per image </li>
  </ul>
</ul>

We assume that, within a single image, the object with the checkerboard pattern lies in a plane of constant Z in the object coordinate system; this does not affect the generality of the calculation. Moreover, the function expects planar object points for each image, so we set Z=0 for all points. To distinguish the levels, we treat them as belonging to different images: in our case there are 3 unique values of Z with 9 points for each value, so we reshape $\textbf{P}$ into 3 groups of 9 points. \\
As for the image size, we were given only the points and not the images, so we assume it to be (100,100).
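
The regrouping described above can be sketched as follows. The example uses synthetic points rather than the project data and assumes, as in our input, that the points are stored level by level with 9 points per level. The names `n_levels`, `pts_per_level`, `P_demo`, and `Q_demo` are introduced only for illustration.

```python
import numpy as np

n_levels, pts_per_level = 3, 9  # assumed layout: 3 Z levels, 9 points each
n = n_levels * pts_per_level

# Synthetic stand-ins for P (n, 3) and Q (n, 2); Z is already 0 everywhere
P_demo = np.zeros((n, 3))
P_demo[:, 0] = np.arange(n) % 3          # x on a 3x3 grid
P_demo[:, 1] = (np.arange(n) // 3) % 3   # y on a 3x3 grid
Q_demo = np.arange(2 * n, dtype=float).reshape(n, 2)

# One "image" per Z level; float32 matches the dtype listed above
object_points = P_demo.reshape(n_levels, pts_per_level, 3).astype(np.float32)
image_points = Q_demo.reshape(n_levels, pts_per_level, 1, 2).astype(np.float32)

print(object_points.shape)  # (3, 9, 3)
print(image_points.shape)   # (3, 9, 1, 2)
```

Because the reshape is row-major, level `i` receives rows `9*i` to `9*(i+1) - 1`, which is the same grouping as slicing the arrays level by level.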

In [None]:
# Final Object and Image points array
objectPoints = []
imagePoints = []

# Dividing coordinates according to planes 
for i in range(3):
  P_slice = P[9*i:9*(i+1)].reshape(9, 3).astype('float32')
  Q_slice = Q[9*i:9*(i+1)].reshape(9, 1, 2).astype('float32')
  objectPoints.append(P_slice)
  imagePoints.append(Q_slice)

# Converting lists to np-array
objectPoints = np.array(objectPoints)
imagePoints = np.array(imagePoints)

# print(objectPoints.shape)
# print(imagePoints.shape)

>## 3 Function Call $\textbf{calibrateCamera}$

The function $\textbf{calibrateCamera}$ returns 5 values. 

<ul>
  <li> ret </li>
  <ld> - Overall RMS re-projection error </ld>
  <li> mtx </li>
  <ld> - Camera matrix of the intrinsic parameters </ld>
  <li> dist </li>
  <ld> - Distortion coefficients</ld>
  <li> rvecs </li>
  <ld> - Array of rotation vectors, one per image </ld>
  <li> tvecs </li>
  <ld> - Array of translation vectors, one per image </ld>
</ul>

For more details, see the [OpenCV camera calibration tutorial](https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html).

In [None]:
# Assuming image size to be (100, 100) 
img_size = (100, 100)
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objectPoints, imagePoints, img_size, None, None)

# print(mtx.shape)

>## 4 Calculating the Perspective Projection Matrix

Because we treat points with different values of Z as belonging to different images, the function returns 3 pairs of rotation and translation vectors. Using them, we can calculate 3 perspective projection matrices, one for each image.  

In [None]:
# List of projection matrices
p_mtx = []

for i in range(len(objectPoints)):
  # Function to convert rotation vector to rotation matrix
  R = cv.Rodrigues(rvecs[i])[0]
  t = tvecs[i]
  Rt = np.concatenate([R,t], axis=-1) # [R|t]
  p_mtx.append(np.matmul(mtx,Rt)) # A[R|t]
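
For reference, the conversion from a rotation vector to a rotation matrix can be reproduced in plain NumPy with Rodrigues' rotation formula, $R = I + \sin\theta\,K + (1-\cos\theta)\,K^2$, where $\theta$ is the length of the rotation vector and $K$ is the skew-symmetric matrix of its unit axis. The sketch below is illustrative only: the helper names `rodrigues` and `projection_matrix` and the example camera values are assumptions, not part of the project code or its results.

```python
import numpy as np


def rodrigues(rvec):
    """Convert a rotation vector (axis * angle) to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)  # zero-length vector means no rotation
    kx, ky, kz = rvec / theta
    skew = np.array([[0.0, -kz, ky],
                     [kz, 0.0, -kx],
                     [-ky, kx, 0.0]])  # cross-product matrix of the unit axis
    return np.eye(3) + np.sin(theta) * skew + (1.0 - np.cos(theta)) * (skew @ skew)


def projection_matrix(camera_matrix, rvec, tvec):
    """Build the 3x4 matrix A [R | t] from intrinsics and one pose."""
    R = rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(3, 1)
    return camera_matrix @ np.hstack([R, t])


# Hypothetical intrinsics and pose, for illustration only
A_demo = np.array([[100.0, 0.0, 50.0],
                   [0.0, 100.0, 50.0],
                   [0.0, 0.0, 1.0]])
R_demo = rodrigues([0.0, 0.0, np.pi / 2])  # 90 degrees about the Z-axis
M_demo = projection_matrix(A_demo, [0.0, 0.0, 0.0], [0.0, 0.0, 5.0])
print(np.round(R_demo, 6))
print(M_demo)
```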

>## 5 Coordinate Prediction and Error Measurement

As we already know, \\
$p = MP$ \\
where $\textbf{p}$ is the 2D image coordinates and $\textbf{P}$ is the 3D object coordinates (both in homogeneous form), and $\textbf{M}$ is the 3x4 projection matrix. 

So we multiply each point by $\textbf{M}$ and divide the result by its last component to get the 2D projection. We then take the Euclidean distance between each projection and the given pixel, and average these distances over all points to obtain the mean error.
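
Written out for the $i$-th point, the computation is as follows. The scale factor $\lambda_i$, the hats on the predicted pixel coordinates, and the symbol $e$ for the mean error are notation introduced here for illustration; $\textbf{M}$ stands for the projection matrix of the image that contains the point.
\begin{align}
    \lambda_i \begin{pmatrix} \hat{u}_i \\ \hat{v}_i \\ 1 \end{pmatrix}
    &= \textbf{M} \begin{pmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{pmatrix}, \\
    e &= \frac{1}{n} \sum_{i=1}^{n} \sqrt{(u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2}
\end{align}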

In [None]:
# Initialize mean error as 0
mean_error = 0

# Calculating Total number of points
total_points = objectPoints.shape[0] * objectPoints.shape[1]

# Loop through all coordinates to get the prediction and calculate its euclidean distance from original coordinates
for i in range(objectPoints.shape[0]):
  for j in range(objectPoints.shape[1]):
    current_point = np.concatenate([objectPoints[i, j, :], [1]], axis=0) # Fetching points one by one
    predicted_point = np.matmul(p_mtx[i], current_point) # Prediction by multiplication
    predicted_point = predicted_point / predicted_point[-1] # Adjusting the scale factor
    predicted_point = predicted_point[:2].astype('float32') # Type casting 

    error = cv.norm(imagePoints[i, j, 0, :], predicted_point, cv.NORM_L2)/total_points
    mean_error = mean_error + error

# mean_error = mean_error / P.shape[0]
print("Mean Error : ", mean_error)

Mean Error :  9.316497449091349e-06
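
The double loop above can also be expressed with array operations. The following is a minimal NumPy-only sketch, assuming a single 3x4 projection matrix and synthetic points; the function name `mean_reprojection_error` and all numeric values are hypothetical and unrelated to the project data.

```python
import numpy as np


def mean_reprojection_error(M, points_3d, points_2d):
    """Mean Euclidean distance between projected and observed pixels."""
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    ones = np.ones((points_3d.shape[0], 1))
    homogeneous = np.hstack([points_3d, ones])      # shape (n, 4)
    projected = homogeneous @ M.T                   # shape (n, 3)
    pixels = projected[:, :2] / projected[:, 2:3]   # divide by the scale factor
    return float(np.mean(np.linalg.norm(pixels - points_2d, axis=1)))


# Hypothetical projection matrix: focal length 100, principal point (50, 50),
# no rotation, camera 5 units away from the Z=0 plane
M_demo = np.array([[100.0, 0.0, 50.0, 250.0],
                   [0.0, 100.0, 50.0, 250.0],
                   [0.0, 0.0, 1.0, 5.0]])
pts_3d = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
pts_2d = np.array([[50.0, 50.0], [70.0, 93.0]])  # second pixel is off by 3

print(mean_reprojection_error(M_demo, pts_3d, pts_2d))  # 1.5
```

The first point projects exactly onto its pixel and the second is 3 pixels away, so the mean of the two distances is 1.5.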


>## Extra: 

We can also predict the 2D coordinates directly using OpenCV's $\textbf{projectPoints}$ function. This does not require us to explicitly calculate the projection matrix, and its output can be used to calculate the error as well. We observe that it gives an even smaller error value.

In [None]:
mean_error = 0
for i in range(len(objectPoints)):
    imgpoints2, _ = cv.projectPoints(objectPoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv.norm(imagePoints[i], imgpoints2, cv.NORM_L2)/len(imgpoints2)
    mean_error += error
print( "total error: {}".format(mean_error/len(objectPoints)) )

total error: 0.0
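
When comparing this number with the one from Section 5, it may help to note that the two cells do not compute the same quantity. Section 5 averages the per-point Euclidean distances, whereas here `cv.norm` with `NORM_L2` takes a single norm over all coordinate differences of an image before dividing by the number of points. The sketch below uses NumPy only, with hypothetical residuals, to illustrate the difference.

```python
import numpy as np

# Hypothetical residuals (observed minus projected pixels) for two points
residuals = np.array([[3.0, 4.0],
                      [3.0, 4.0]])
n_points = residuals.shape[0]

# Section 5 style: average of the per-point Euclidean distances
mean_distance = np.mean(np.linalg.norm(residuals, axis=1))

# Extra-section style: one L2 norm over all entries, divided by the point count
global_norm_over_n = np.linalg.norm(residuals) / n_points

print(mean_distance)       # 5.0
print(global_norm_over_n)  # sqrt(50) / 2, about 3.54
```

For the same residuals the second value is smaller, so a smaller number from one cell is not by itself evidence of a better fit than the other.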


>## 6 Conclusion
The calibration matrices we calculated in Section 4 are valid because the average error is very small (about $9.3 \times 10^{-6}$ pixels).