# Chapter 4 - Camera Calibration

This article discusses how the perspective (pose) of a camera can be derived from a set of given landmarks. It explains 
how many such landmarks are needed to obtain the correct perspective and describes the different methods 
for calibrated and uncalibrated cameras. An uncalibrated camera is one whose intrinsic parameters are not accurately 
known. We also explain how camera calibration works, that is, how the intrinsics can be found and which aspects matter 
when doing so.




## Perspective from *n* Points (PnP Problem)

The goal of Perspective from *n* Points algorithms is to find the pose of the camera with respect to the world reference 
frame. This means we want to find the six degrees of freedom describing the position and orientation of the camera. 
As input we have the landmarks in the world reference frame as well as their images on the camera plane. 
In the case of an uncalibrated camera we additionally want to find the camera's intrinsic parameters, which we have 
already discussed in chapter >>>XY<<<.

Let's now find out how many points we need to get a unique solution. First we look at the case of a calibrated camera.

With only one point, for which we know the position in the world and the position on the image plane of the camera, 
there are infinitely many possible solutions. This is because the camera can be placed anywhere around that point in 
the world, at any distance from it.

![image for only one point and inf solutions]

For two points we know the actual distance between the points in the world as well as the distance between their images 
on the image plane. Because the focal length of the camera is fixed, we also know the angle between the two rays that 
connect the camera center with the two image points. The camera can therefore only be at places where both world points 
lie on these two rays, which means that the segment between the world points is seen under exactly this angle. As a 
result, the geometric figure describing all possible positions of the camera center has to be rotationally symmetric 
around the axis defined by the line through both world points. 
To find the exact figure, we now inspect the planar curve that is rotated. Within a plane containing both world points, 
the only curve from which the two world points are always seen under the same angle is a circular arc through both 
points (inscribed angle theorem). Rotating this arc around the axis gives the figure in 3D that describes the possible 
locations of the camera when two points are given. It is called a "spindle torus".
In a continuous reference frame this still results in an infinite number of possible positions, but at least we 
know that the set is bounded.

![image for two points and inf solutions]
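
The circle argument for two points can be checked numerically. The following is a minimal sketch, assuming NumPy and a 2D slice through both world points; the point coordinates, the circle offset `h` and the helper `viewing_angle` are illustrative choices, not part of the original text. It samples candidate camera centers on one arc through both points and computes the angle under which each of them sees the two points.

```python
import numpy as np

def viewing_angle(c, p1, p2):
    """Angle at the camera center c between the rays towards p1 and p2."""
    a, b = p1 - c, p2 - c
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Two world points on the x-axis (illustrative values).
p1 = np.array([-1.0, 0.0])
p2 = np.array([1.0, 0.0])

# A circle through p1 and p2: center (0, h), radius sqrt(1 + h^2).
h = 0.5
center = np.array([0.0, h])
radius = np.hypot(1.0, h)

# Parameter angles of p2 and p1 on the circle; the arc between them
# lies above the chord p1-p2.
phi_p2 = np.arcsin(-h / radius)
phi_p1 = np.pi - phi_p2
phis = np.linspace(phi_p2 + 0.1, phi_p1 - 0.1, 7)

# Candidate camera centers on that arc and the angle each one sees.
cameras = center + radius * np.column_stack([np.cos(phis), np.sin(phis)])
angles = np.array([viewing_angle(c, p1, p2) for c in cameras])

print(np.degrees(angles))  # all entries agree up to rounding
```

Every sampled position sees the two points under the same angle, which is why a known angle alone cannot single out one camera position.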

Let us now try it with three points. Again we draw lines from the points in the world reference frame through their 
images on the camera plane. Depending on where we place the camera, these lines may or may not intersect each other. 
Wherever all three lines intersect in one single point, we have found a valid position for the camera center. To derive 
how many solutions there are and how to find them, we take advantage of the law of cosines. In the situation just 
described we have three different triangles, each with two corners at world points and one corner at the camera 
center. The two edges that meet at the camera center intersect the camera plane at the image points. The law of cosines states 
that for each triangle the squared distance between the two world points ($s_i$) is equal to the sum of the squared lengths of the two edges connecting the world points with the camera center ($A_i$ and $B_i$), minus twice the product of these two lengths times the cosine of the angle at the camera center ($\theta_i$).

${s_i}^2={A_i}^2+{B_i}^2 - 2A_iB_i\cos(\theta_i)$

The distances $s_i$ are given by the points in the world reference frame, and the angles $\theta_i$ are given by the points on the image plane together with the focal length from the camera intrinsics. Each of these equations has degree two. In total we have three unknowns, since $A_2 = B_1, A_3 = B_2, A_1 = B_3$, and we also have three equations. For a system of $n$ independent polynomial equations in $n$ variables, the number of solutions is bounded from above by the product of the degrees of the equations. In our case this gives $2^3=8$. Because every term that involves the unknowns has degree two, negating all three distances of a solution yields another solution, so the solutions come in pairs of opposite sign. Since we expect the camera plane to lie between the world points and the camera center, only the positive distances are valid, which leaves at most 4 possible solutions.
So to get an unambiguous solution we need a 4th point.
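
The three equations and the sign symmetry can be illustrated with a small numerical check. This is only a sketch under stated assumptions: it uses NumPy, places the camera center at the origin, and takes three made-up world points; the function name `cosine_residuals` is hypothetical. In a real setting the cosines would come from the image points and the focal length, whereas here they are read directly off the synthetic geometry.

```python
import numpy as np

def cosine_residuals(dists, s, cos_theta):
    """Residuals of s_i^2 = A_i^2 + B_i^2 - 2 A_i B_i cos(theta_i).

    dists holds the three camera-to-point distances (d1, d2, d3).
    Triangle i uses points i and i+1 (wrapping around), so A_i = d_i
    and B_i = d_(i+1), which gives A_2 = B_1, A_3 = B_2, A_1 = B_3.
    """
    a = np.asarray(dists, dtype=float)
    b = np.roll(a, -1)
    return a**2 + b**2 - 2 * a * b * cos_theta - s**2

# Synthetic setup: camera center at the origin, three world points (made up).
points = np.array([[0.5, 0.2, 4.0],
                   [-0.7, 0.4, 5.0],
                   [0.1, -0.6, 3.5]])
next_points = np.roll(points, -1, axis=0)

d_true = np.linalg.norm(points, axis=1)            # unknowns in practice
s = np.linalg.norm(points - next_points, axis=1)   # known world distances
bearings = points / d_true[:, None]                # unit viewing rays
cos_theta = np.sum(bearings * np.roll(bearings, -1, axis=0), axis=1)

print(cosine_residuals(d_true, s, cos_theta))   # close to zero
print(cosine_residuals(-d_true, s, cos_theta))  # also close to zero
```

Both the true distances and their negation satisfy the system, which is the sign symmetry that halves the eight algebraic solutions.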

It is also worth mentioning that the system of three equations can be reformulated into a single polynomial equation of degree 4: $$G_0 + G_1x + G_2x^2 + G_3x^3 + G_4x^4 = 0$$
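
Once the coefficients of such a quartic are known, its roots can be found numerically. The sketch below assumes NumPy and uses made-up coefficients $G_0, \dots, G_4$ purely for illustration; the actual coefficients depend on the $s_i$ and $\theta_i$ and are not derived here.

```python
import numpy as np

# Hypothetical coefficients G_0..G_4, chosen only for illustration:
# (x - 1)(x - 2)(x^2 + 1) = x^4 - 3x^3 + 3x^2 - 3x + 2
G = [2.0, -3.0, 3.0, -3.0, 1.0]

# numpy.roots expects the highest-degree coefficient first.
roots = np.roots(G[::-1])

# Keep only real, positive roots as candidates for a valid solution.
is_real = np.abs(roots.imag) < 1e-9
real_positive = np.sort(roots[is_real & (roots.real > 0)].real)

print(real_positive)
```

A quartic has at most four real roots, which matches the bound of four candidate solutions found above.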

Since three points are enough to get down to 4 solutions, the PnP problem is often also referred to as P3P. As we have seen, this is not completely accurate, because one more point is needed to get an unambiguous solution. So P3+1P would be the most fitting name.



## DLT - Camera calibration

In the section about PnP we assumed that the angle between the image points, measured at the camera center, is given. To get this angle we need the focal length, which is specific to each camera, and we only know the exact focal length in the case of a calibrated camera. So what do we do when we don't know the camera's intrinsics? We have to determine them first, using the Direct Linear Transform (DLT).

So our goal is to compute the camera intrinsics K together with the extrinsics R and T. We know that they have to fulfill the projection equation which we looked at in Chapter 3. Remember that $\widetilde{p}$ denotes the image point location in homogeneous coordinates.

![Projection equation part 1](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_4/1_projection_equation.png)
![Projection equation part 2](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_4/2_projection_equation_part2.png)
*Figure 1: Projection Equation. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/03_image_formation_2.pdf)*

The matrix with the twelve entries $m_{ij}$ is called M. We can also write each row of M as ${m_i}^T$, so that

$\begin{bmatrix} m_{i1} & m_{i2} & m_{i3} & m_{i4} \end{bmatrix}= {m_i}^T$

We also denote $\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$ as $P$.

To get back to the normal pixel coordinates $u$ and $v$ we have to normalize the homogeneous coordinates, that is, divide them by their third component.

![Homogeneous to pixel coordinates part1](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_4/3_homogeneous_to_pixel.png)
*Figure 2: Conversion homogeneous to pixel coordinates. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/03_image_formation_2.pdf)*
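
The projection followed by the normalization can be written in a few lines. This is a minimal sketch assuming NumPy; the values of `K`, `R` and `T` as well as the helper name `project` are made up for illustration.

```python
import numpy as np

def project(M, P_w):
    """Project a 3D world point with the 3x4 matrix M to pixel coordinates."""
    P = np.append(P_w, 1.0)          # homogeneous world point (X, Y, Z, 1)
    p_tilde = M @ P                  # homogeneous image point
    return p_tilde[:2] / p_tilde[2]  # divide by the third component

# Illustrative camera: focal length 800 px, principal point (320, 240),
# no rotation, world origin 5 units in front of the camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([[0.0], [0.0], [5.0]])
M = K @ np.hstack([R, T])

u, v = project(M, np.array([1.0, 0.5, 0.0]))
print(u, v)  # 480.0 320.0
```

Scaling `M` by any non-zero factor leaves `u` and `v` unchanged, which is why M can only ever be determined up to scale.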

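Now we can insert the index $i$ of each point, starting with $i=1$, and swap the positions of $m$ and $P$. When we do this for all $n$ points we get a large matrix equation whose matrix has $2n$ rows and 3 block columns of width 4, so its size is $2n \times 12$. This large matrix is also called Q.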

![Matrix equation](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_4/4_matrix_equation.png)
![Matrix equation part2](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_4/5_matrix_equation_part2.png)
![Matrix equation part3](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_4/6_matrix_equation_part3.png)
*Figure 3: Q matrix. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/03_image_formation_2.pdf)*

Please note that the entries in the vector are no longer ${m_i}^T$ but ${m_i}$. They are no longer transposed, which means that the second factor is not a matrix but a column vector with 12 entries.
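
The construction of Q can be sketched as follows. The example assumes NumPy and a synthetic camera with made-up values; `build_Q` is a hypothetical helper name. Each correspondence contributes the two rows $[P^T, 0^T, -uP^T]$ and $[0^T, P^T, -vP^T]$, and the stacked rows of the true projection matrix should then lie in the null space of Q.

```python
import numpy as np

def build_Q(world_pts, pixels):
    """Stack two rows per correspondence into the 2n x 12 matrix Q."""
    rows = []
    zero = np.zeros(4)
    for (X, Y, Z), (u, v) in zip(world_pts, pixels):
        P = np.array([X, Y, Z, 1.0])
        rows.append(np.concatenate([P, zero, -u * P]))  # equation for u
        rows.append(np.concatenate([zero, P, -v * P]))  # equation for v
    return np.array(rows)

# Illustrative ground truth M = K [R | T] (made-up values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([[0.1], [-0.2], [5.0]])
M_true = K @ np.hstack([R, T])

# Six random world points and their exact projections.
rng = np.random.default_rng(0)
world_pts = rng.uniform(-1.0, 1.0, size=(6, 3))
P_h = np.hstack([world_pts, np.ones((6, 1))])
p_tilde = P_h @ M_true.T
pixels = p_tilde[:, :2] / p_tilde[:, 2:3]

Q = build_Q(world_pts, pixels)
residual = Q @ M_true.reshape(-1)   # rows of M stacked into a 12-vector

print(Q.shape)                      # (12, 12)
print(np.abs(residual).max())       # close to zero
```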

To get a unique non-trivial solution (up to scale) of this homogeneous system, the rank of Q must be one less than the number of unknowns. Since the vector M has 12 entries, Q needs to have rank 11. So the question now is how many points we need to build a Q matrix with rank 11. Each point provides 2 equations, so we need at least 5.5 points, which obviously is not possible. Therefore at least 6 points are required to calculate the camera's intrinsics. In practice one can use Singular Value Decomposition to compute the solution for the vector M.
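
A possible way to solve the system with the Singular Value Decomposition is sketched below, assuming NumPy and noise-free synthetic data; the function name `dlt` and all numeric values are illustrative. The right singular vector that belongs to the smallest singular value minimizes $\|QM\|$ under the constraint $\|M\| = 1$, so it is taken as the estimate of M.

```python
import numpy as np

def dlt(world_pts, pixels):
    """Estimate the 3x4 projection matrix M (up to scale) from n >= 6 points."""
    n = len(world_pts)
    P = np.hstack([world_pts, np.ones((n, 1))])   # n x 4 homogeneous points
    u, v = pixels[:, 0:1], pixels[:, 1:2]
    zeros = np.zeros_like(P)
    Q = np.empty((2 * n, 12))
    Q[0::2] = np.hstack([P, zeros, -u * P])       # equations for u
    Q[1::2] = np.hstack([zeros, P, -v * P])       # equations for v
    # Last row of Vt: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(Q)
    return Vt[-1].reshape(3, 4)

# Illustrative ground truth: rotation about the z-axis plus a translation.
c, s = np.cos(0.3), np.sin(0.3)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([[0.1], [-0.2], [5.0]])
M_true = K @ np.hstack([R, T])

rng = np.random.default_rng(1)
world_pts = rng.uniform(-1.0, 1.0, size=(8, 3))
p_tilde = np.hstack([world_pts, np.ones((8, 1))]) @ M_true.T
pixels = p_tilde[:, :2] / p_tilde[:, 2:3]

M_est = dlt(world_pts, pixels)

# M is only defined up to scale, so compare after fixing the scale.
M_est_n = M_est / M_est[2, 3]
M_true_n = M_true / M_true[2, 3]
print(np.abs(M_est_n - M_true_n).max())  # close to zero
```

With noisy measurements the same code returns a least-squares estimate instead of an exact null vector; using more than six points then helps to average out the noise.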

Now we know that we need at least six points to calculate the intrinsics. However, having enough points does not guarantee the correct solution. The points have to fulfill certain conditions. First, they must not all lie on one plane (coplanar). Another invalid configuration is when all points lie on a line that also goes through the center of projection; however, this case is already covered by the restriction of not being coplanar.
Another, more complex case is that the points and the center of projection must not all lie on the same twisted cubic.
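
The effect of a coplanar configuration can be observed in the rank of Q. The following sketch assumes NumPy and a synthetic camera with made-up values; `q_rank` is a hypothetical helper that counts the singular values above a relative threshold. It compares ten points in general position with the same points flattened onto the plane $Z_w = 0$.

```python
import numpy as np

def q_rank(world_pts, M, rel_tol=1e-9):
    """Numerical rank of the DLT matrix Q for points projected with M."""
    n = len(world_pts)
    P = np.hstack([world_pts, np.ones((n, 1))])
    p = P @ M.T
    u, v = p[:, 0:1] / p[:, 2:3], p[:, 1:2] / p[:, 2:3]
    zeros = np.zeros_like(P)
    Q = np.vstack([np.hstack([P, zeros, -u * P]),
                   np.hstack([zeros, P, -v * P])])
    sing = np.linalg.svd(Q, compute_uv=False)
    return int(np.sum(sing > rel_tol * sing[0]))

# Illustrative camera (made-up values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
M = K @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])

rng = np.random.default_rng(2)
general = rng.uniform(-1.0, 1.0, size=(10, 3))
coplanar = general.copy()
coplanar[:, 2] = 0.0                 # force all points onto the plane Z = 0

rank_general = q_rank(general, M)
rank_coplanar = q_rank(coplanar, M)
print(rank_general, rank_coplanar)   # the coplanar rank is below 11
```

For the coplanar points the null space of Q has more than one dimension, so M is no longer determined up to scale and the DLT result becomes arbitrary within that null space.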