# Geometric Camera Calibration
* To map a point in the external world to an image pixel, we need two transformations: from 3D world coordinates to 3D camera coordinates (extrinsic), and then from 3D camera coordinates to the 2D pixel on the sensor (intrinsic)

## Extrinsic Parameters
### Camera Pose
* A camera has 6 degrees of freedom: 3 for translation (position) and 3 for rotation (orientation)
![](img/CameraDegFreedom.png)

### Notations
* The superscript denotes the coordinate system we are referring to
* So any point P can be located in the A coordinate system at
$$^AP = \left[
\begin{array}\\
^Ax \\
^Ay \\
^Az \\
\end{array}
\right] \iff \vec{OP} = (^Ax.\vec{i_{A}}) + (^Ay.\vec{j_{A}}) + (^Az.\vec{k_{A}})$$
  * i.e., either as the coordinates <sup>A</sup>x, etc., or as the vector OP (where O is the origin), obtained by multiplying each coordinate with the corresponding axis vector and summing
  
### Translation
* If we want to express the same point in a different coordinate system (say B) whose axes are parallel to A's, we just add the origin of A, expressed in the B coordinate system, to the <sup>A</sup>P value
$$^BP = ^B(O_{A}) + ^AP$$
OR
$$\left[
\begin{array}\\
^BP \\
1 \\
\end{array}
\right] = \left[
\begin{array}\\
I & ^BO_{A} \\
0^T & 1 \\
\end{array}
\right]\left[
\begin{array}\\
^AP \\
1 \\
\end{array}
\right]$$
Where
  * I is a 3x3 identity matrix
  * 0<sup>T</sup> is a 1x3 row vector of zeros
  * <sup>B</sup>O<sub>A</sub> is a 3x1 column vector
  * So the middle matrix is actually a 4x4 matrix
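
A minimal NumPy sketch of the homogeneous translation above. The helper name `translation_matrix` and the numeric values are made up for illustration; they are not part of the notes.

```python
import numpy as np

def translation_matrix(origin_a_in_b):
    """Build the 4x4 matrix [[I, B_O_A], [0^T, 1]]."""
    T = np.eye(4)
    T[:3, 3] = origin_a_in_b
    return T

# Hypothetical values: origin of A as seen from B, and a point given in A.
B_O_A = np.array([1.0, 2.0, 3.0])
A_P = np.array([0.5, 0.0, -1.0])

A_P_h = np.append(A_P, 1.0)                 # homogeneous form [x, y, z, 1]
B_P_h = translation_matrix(B_O_A) @ A_P_h   # 4x4 times 4x1
B_P = B_P_h[:3]                             # back to ordinary 3D coordinates
```

The matrix product gives the same result as the plain vector sum; the homogeneous form only pays off once translation has to be chained with other matrices.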

### Rotations
![](img/Rotations.png)
* The two coordinate systems share the same origin, but their axes are rotated relative to each other, as shown above
$$\vec{OP} = \left(
\begin{array}\\
i_A & j_A & k_A \\
\end{array}
\right)\left(
\begin{array}\\
^Ax \\
^Ay \\
^Az \\
\end{array}
\right) = \left(
\begin{array}\\
i_B & j_B & k_B \\
\end{array}
\right)\left(
\begin{array}\\
^Bx \\
^By \\
^Bz \\
\end{array}
\right)$$
$$^BP = (^B_AR)^AP$$
Where
$$(^B_AR) = \left[
\begin{array}\\
i_A.i_B & j_A.i_B & k_A.i_B \\
i_A.j_B & j_A.j_B & k_A.j_B \\
i_A.k_B & j_A.k_B & k_A.k_B \\
\end{array}
\right] = \left[
\begin{array}\\
^Bi_A & ^Bj_A & ^Bk_A \\
\end{array}
\right] = \left[
\begin{array}\\
(^Ai_B)^T \\
(^Aj_B)^T \\
(^Ak_B)^T \\
\end{array}
\right]$$
  * Note that this is an orthogonal matrix, so its inverse is equal to its transpose
  * Which makes sense because to go back from B to A, we get the result by just transposing
$$\left[
\begin{array}\\
^BP \\
1 \\
\end{array}
\right] = \left[
\begin{array}\\
^B_AR & 0 \\
0^T & 1 \\
\end{array}
\right]\left[
\begin{array}\\
^AP \\
1 \\
\end{array}
\right]$$
  * Note that matrix multiplication is not commutative, so the order in which rotations (and translations) are applied matters
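
The dot-product form of the rotation matrix can be checked numerically. This is a sketch under assumed values: B is taken to be A rotated by 30 degrees about the z axis, and all variable names are illustrative.

```python
import numpy as np

theta = np.deg2rad(30.0)   # assumed rotation angle
c, s = np.cos(theta), np.sin(theta)

# Unit axes of A and B, all written in A's coordinates (one axis per row).
axes_A = np.eye(3)
axes_B = np.array([[c, s, 0.0],      # i_B
                   [-s, c, 0.0],     # j_B
                   [0.0, 0.0, 1.0]])  # k_B

# Entry (row, col) is (axis `col` of A) . (axis `row` of B).
R_BA = axes_B @ axes_A.T

A_P = axes_B[0]            # the tip of i_B, given in A's coordinates
B_P = R_BA @ A_P           # the same point in B's coordinates
A_P_back = R_BA.T @ B_P    # the transpose takes us back from B to A

# A second rotation (about x) to show that the order of rotations matters.
R_z = R_BA
R_x = np.array([[1.0, 0.0, 0.0],
                [0.0, c, s],
                [0.0, -s, c]])
order_matters = not np.allclose(R_z @ R_x, R_x @ R_z)
```

Since the point sits on B's x axis, its B coordinates come out as the first unit vector, which is a quick sanity check on the row and column conventions.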

### Rigid Transformations (both translation & rotation)
* Just combine the operations
$$\left[
\begin{array}\\
^BP \\
1 \\
\end{array}
\right] = \left[
\begin{array}\\
I & ^BO_{A} \\
0^T & 1 \\
\end{array}
\right]\left[
\begin{array}\\
^B_AR & 0 \\
0^T & 1 \\
\end{array}
\right]\left[
\begin{array}\\
^AP \\
1 \\
\end{array}
\right] = \left[
\begin{array}\\
^B_AR & ^BO_{A} \\
0^T & 1 \\
\end{array}
\right]\left[
\begin{array}\\
^AP \\
1 \\
\end{array}
\right] = ^BT_A\left[
\begin{array}\\
^AP \\
1 \\
\end{array}
\right]$$
* If we want to go the reverse way, we need <sup>A</sup>T<sub>B</sub>, which is just the inverse of the matrix <sup>B</sup>T<sub>A</sub>.
  * Only the rotation matrix on its own is orthogonal; the combined matrix is not. So its inverse is not its transpose
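
A small sketch of the combined transform and its inverse. The closed form used in `invert_rigid` (rotation block transposed, translation replaced by minus the transposed rotation times the translation) follows from solving the rigid transform for <sup>A</sup>P; the function names and numbers are illustrative assumptions.

```python
import numpy as np

def rigid_transform(R, t):
    """Build the 4x4 matrix [[R, t], [0^T, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_rigid(T):
    """Invert a rigid transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    return rigid_transform(R.T, -R.T @ t)

# Hypothetical pose: 90 degree rotation about z plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])

T_BA = rigid_transform(R, t)
T_AB = invert_rigid(T_BA)

A_P_h = np.array([0.5, -0.5, 2.0, 1.0])
B_P_h = T_BA @ A_P_h
A_P_back = T_AB @ B_P_h   # round trip back to A
```

Comparing `T_BA.T @ T_BA` with the identity shows directly that the 4x4 matrix is not orthogonal even though its rotation block is.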
  
### Extrinsic Parameter Matrix
$$^C\vec{P} = {^C_WR}{^W\vec{P}} + {^C_Wt}$$
Where
* <sup>C</sup><sub>W</sub>R is the rotation matrix
* <sup>C</sup><sub>W</sub>t is the translation vector
* <sup>C</sup>P is the point in the camera coordinate system
* <sup>W</sup>P is the point in the world coordinate system

(OR)
![](img/ExtrinsicCameraParam.png)
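
As a sketch of how the extrinsic equation is used: if the camera pose is given as a rotation plus the camera center in world coordinates, the translation follows from requiring that the camera center maps to the camera origin. The pose values and names below are assumptions for illustration only.

```python
import numpy as np

# Assumed pose: R_cw is the world-to-camera rotation, cam_center_w is the
# position of the camera center in world coordinates.
R_cw = np.array([[0.0, 1.0, 0.0],
                 [-1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
cam_center_w = np.array([1.0, 2.0, 0.0])

# The camera center must land on (0, 0, 0) in camera coordinates,
# i.e. R_cw @ cam_center_w + t_cw = 0.
t_cw = -R_cw @ cam_center_w

def world_to_camera(P_w):
    """Apply C_P = R * W_P + t."""
    return R_cw @ P_w + t_cw

P_w = np.array([1.0, 2.0, 4.0])   # 4 units above the camera center along world z
P_c = world_to_camera(P_w)
```

Because this rotation leaves the z axis untouched, the point ends up straight ahead of the camera at depth 4.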

## Intrinsics
* This is the projection of a 3D point in camera coordinates onto the 2D surface of the sensor
* We already did the ideal perspective projection in 3A - Imaging (this is an ideal scenario and does not hold everywhere)
  1. We assumed that everything is scaled by f/Z. But the scaled value is not in pixel units, so we introduce a scale factor alpha to convert it to pixels.
  2. The pixels need not be square; they can have different sizes in the x and y directions. So we introduce beta for y and keep alpha for x.
  3. We assumed that the center of the camera lines up with pixel (0,0). This need not be the case, so we add offsets u0 and v0 in the two directions. These are constant for the image.
  4. There may be a slight skew between the u and v axes on the sensor (the sensor does not have perfect 90 deg corners), so we correct for it using the skew angle, theta.

so
$$u = \alpha\frac{x}{z} - \alpha\cot(\theta)\frac{y}{z} + u_0$$
$$v = \frac{\beta}{\sin(\theta)}\frac{y}{z} + v_0$$

**GOD** that's ugly.
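
The two equations can be evaluated directly. The sketch below uses made-up parameter values (800 pixel scale factors, a 640x480-style principal point, no skew); `project_pixel` is an illustrative name.

```python
import numpy as np

def project_pixel(p_cam, alpha, beta, theta, u0, v0):
    """Map a 3D point in camera coordinates to pixel coordinates (u, v)."""
    x, y, z = p_cam
    u = alpha * x / z - alpha / np.tan(theta) * y / z + u0
    v = beta / np.sin(theta) * y / z + v0
    return u, v

# Hypothetical camera: square pixels, axes at exactly 90 degrees.
u, v = project_pixel(
    p_cam=np.array([0.1, -0.05, 2.0]),
    alpha=800.0, beta=800.0,
    theta=np.pi / 2,
    u0=320.0, v0=240.0,
)
```

With theta at 90 degrees the cotangent term vanishes (up to floating-point rounding) and the sine is 1, so only the scale and offset terms remain.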

### Homogeneous Coordinates
* So we do our usual trick, we move back to homogeneous coordinates to make a beautiful matrix representation
$$\left[\begin{array}\\
z*u \\
z*v \\
z \\
\end{array}\right] = \left[\begin{array}\\
\alpha & -\alpha\cot(\theta) & u_0 & 0 \\
0 & \frac{\beta}{\sin(\theta)} & v_0 & 0 \\
0 & 0 & 1 & 0 \\
\end{array}\right]\left[\begin{array}\\
x \\
y \\
z \\
1 \\
\end{array}\right]$$
$$\vec{P'} = K \, {^C\vec{P}}$$
  * Note that we get the 2D pixel (u, v) on the camera by dividing the left side by z
* We can rewrite the notations and get a much simpler matrix for K
$$K = \left[\begin{array}\\
f & s & C_x \\
0 & af & C_y \\
0 & 0 & 1 \\
\end{array}\right]$$
Where
  * f is the focal length of the camera
  * s is the skew
  * a is the aspect ratio
  * C<sub>x</sub>, C<sub>y</sub> are the offsets
  * Note that if s, Cx and Cy are 0 and a is 1, as in a perfect universe, we get back the perspective projection equation
* We can combine this along with the extrinsic parameter matrix to form one single matrix called **M** which is the entire camera calibration matrix
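
A short sketch of the simplified K in use, with hypothetical parameter values. It also checks that zero skew, unit aspect ratio and zero offsets reduce to plain perspective projection (scaling by f/z).

```python
import numpy as np

def intrinsic_matrix(f, s, a, cx, cy):
    """Build the 3x3 matrix K from focal length, skew, aspect ratio and offsets."""
    return np.array([[f, s, cx],
                     [0.0, a * f, cy],
                     [0.0, 0.0, 1.0]])

P_cam = np.array([0.1, -0.05, 2.0])   # made-up point in camera coordinates

# Hypothetical real camera.
K = intrinsic_matrix(f=800.0, s=0.0, a=1.0, cx=320.0, cy=240.0)
zu, zv, z = K @ P_cam                 # homogeneous pixel [z*u, z*v, z]
u, v = zu / z, zv / z                 # divide by z to get the pixel

# Idealised camera: no skew, square pixels, no offsets.
K_ideal = intrinsic_matrix(f=800.0, s=0.0, a=1.0, cx=0.0, cy=0.0)
ideal = (K_ideal @ P_cam)[:2] / P_cam[2]
```

Here K is applied as a 3x3 matrix to the 3D point; padding it with a zero column and the point with a 1 gives the 3x4 form shown earlier.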

## Camera Calibration Matrix
$$\left[\begin{array}\\
z*u \\
z*v \\
z \\
\end{array}\right] = \left[\begin{array}\\
f & s & C_x \\
0 & af & C_y \\
0 & 0 & 1 \\
\end{array}\right]\left[\begin{array}\\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{array}\right]\left[\begin{array}\\
1 & 0 & 0 & ^BO_{A,x} \\
0 & 1 & 0 & ^BO_{A,y} \\
0 & 0 & 1 & ^BO_{A,z} \\
0 & 0 & 0 & 1 \\
\end{array}\right]\left[\begin{array}\\
i_A.i_B & j_A.i_B & k_A.i_B & 0 \\
i_A.j_B & j_A.j_B & k_A.j_B & 0 \\
i_A.k_B & j_A.k_B & k_A.k_B & 0 \\
0 & 0 & 0 & 1 \\
\end{array}\right]\left[\begin{array}\\
^Ax \\
^Ay \\
^Az \\
1 \\
\end{array}\right]$$
* Here A is the world coordinate system and B is the camera coordinate system
* The matrices from right to left (order of operation) are rotation, translation, projection, intrinsics, matching the order used in the rigid transformation above
* Looks kinda like a camera lens, doesn't it! :P
* We can also represent the matrix as a single **M** matrix
$$M = \left[\begin{array}\\
m_{00} & m_{01} & m_{02} & m_{03} \\
m_{10} & m_{11} & m_{12} & m_{13} \\
m_{20} & m_{21} & m_{22} & m_{23} \\
\end{array}\right]$$
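
A sketch of how M can be assembled, using the combined rigid transform (rotation block plus translation column) for the extrinsic part. All numeric values and the `project` helper are assumptions for illustration.

```python
import numpy as np

# Hypothetical intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: 90 degree rotation about z, then a shift along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])

extrinsic = np.eye(4)                 # 4x4 rigid transform [[R, t], [0^T, 1]]
extrinsic[:3, :3] = R
extrinsic[:3, 3] = t

projection = np.hstack([np.eye(3), np.zeros((3, 1))])   # 3x4 matrix [I | 0]

M = K @ projection @ extrinsic        # 3x4 camera matrix

def project(M, P_world):
    """Project a 3D world point to a pixel using M."""
    zu, zv, z = M @ np.append(P_world, 1.0)
    return np.array([zu / z, zv / z])

pixel = project(M, np.array([0.5, -0.25, 0.0]))
```

The projection matrix simply drops the last row of the rigid transform, so M is the same as K times the 3x4 block made of R and t side by side.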

## Calibration
* Note that for any given world point (X<sub>i</sub>, Y<sub>i</sub>, Z<sub>i</sub>), the corresponding pixel location is given by
$$\begin{array}\\
u_i = \frac{m_{00}X_i + m_{01}Y_i + m_{02}Z_i + m_{03}}{m_{20}X_i + m_{21}Y_i + m_{22}Z_i + m_{23}}\\
v_i = \frac{m_{10}X_i + m_{11}Y_i + m_{12}Z_i + m_{13}}{m_{20}X_i + m_{21}Y_i + m_{22}Z_i + m_{23}}\\
\end{array}$$
* To solve for all the m's, we multiply out the denominators and rewrite this as a homogeneous linear system Am = 0, with two rows per point
$$\left[\begin{array}\\
X_i & Y_i & Z_i & 1 & 0 & 0 & 0 & 0 & -u_iX_i & -u_iY_i & -u_iZ_i & -u_i \\
0 & 0 & 0 & 0 & X_i & Y_i & Z_i & 1 & -v_iX_i & -v_iY_i & -v_iZ_i & -v_i \\
\end{array}\right]\left[\begin{array}\\
m_{00} \\
m_{01} \\
m_{02} \\
m_{03} \\
m_{10} \\
m_{11} \\
m_{12} \\
m_{13} \\
m_{20} \\
m_{21} \\
m_{22} \\
m_{23} \\
\end{array}\right] = \left[\begin{array}\\
0 \\
0 \\
\end{array}\right]$$
* Some things to note
  * m can be constrained to be a unit vector, as the scale doesn't matter. If m is scaled, the scale just cancels out in the equations for u and v
  * So we try to find the unit vector m (which rules out the trivial solution m = 0) such that ||Am|| is minimized
* The answer is
**The eigenvector of A<sup>T</sup>A with the lowest eigenvalue**
  * And we need at least 6 points to find M, because the camera matrix has 11 degrees of freedom (12 entries, defined up to scale) and each point gives 2 equations, so 6 points give 12 equations
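
The procedure above can be sketched as follows. This is a minimal, noise-free illustration: the camera `M_true`, the random points and all function names are made up, and real data would normally need more care (for example normalising the coordinates first).

```python
import numpy as np

def build_design_matrix(world_pts, pixel_pts):
    """Stack the two rows per correspondence into the 2n x 12 matrix A."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, pixel_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    return np.array(rows, dtype=float)

def calibrate(world_pts, pixel_pts):
    """Return a unit-norm 3x4 M (arbitrary sign) that minimises ||A m||."""
    A = build_design_matrix(world_pts, pixel_pts)
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    return eigvecs[:, 0].reshape(3, 4)           # eigenvector of the smallest one

def project(M, world_pts):
    """Project an (n, 3) array of world points to an (n, 2) array of pixels."""
    homog = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    proj = homog @ M.T
    return proj[:, :2] / proj[:, 2:3]

# Synthetic check: a made-up camera and 8 random points in front of it.
M_true = np.array([[0.0, -2.0, 0.5, 1.5],
                   [2.0, 0.0, 0.5, 1.5],
                   [0.0, 0.0, 1.0, 3.0]])
rng = np.random.default_rng(0)
world_pts = rng.uniform(-1.0, 1.0, size=(8, 3))
pixel_pts = project(M_true, world_pts)

M_est = calibrate(world_pts, pixel_pts)
M_est = M_est / M_est[2, 3] * M_true[2, 3]   # fix the free scale for comparison
```

The recovered matrix only matches the true one after rescaling, which is the "scale doesn't matter" point in practice.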
  
### Solving for M (SVD)
* Any matrix can be decomposed into 
$$A = UDV^T$$
Where
  * U and V are orthogonal matrices
  * D is a diagonal matrix with non-negative values (the singular values) in decreasing order
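
A quick numerical check of the decomposition, using a random matrix as a hypothetical stand-in for the 2n x 12 matrix A. It also illustrates the link to the eigenvector answer: since A<sup>T</sup>A = V D<sup>2</sup> V<sup>T</sup>, the row of V<sup>T</sup> belonging to the smallest singular value is the eigenvector of A<sup>T</sup>A with the lowest eigenvalue (up to sign).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 12))   # made-up matrix, same shape as for 8 points

# Thin SVD: A = U @ diag(d) @ Vt, with d sorted in decreasing order.
U, d, Vt = np.linalg.svd(A, full_matrices=False)

m_svd = Vt[-1]                  # right singular vector of the smallest singular value

# The same direction obtained from the eigen-decomposition of A^T A.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
m_eig = eigvecs[:, 0]

alignment = abs(m_svd @ m_eig)  # 1.0 means same direction up to sign
```

Working with the SVD of A directly avoids forming A<sup>T</sup>A, which squares the condition number of the problem.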