# 💻 02 - Homographies

A *pinhole camera* is perhaps the simplest model of how cameras, including those used on the Duckiebot, produce images of a scene: the sensor records the level of incident light that reflects off objects in the scene and strikes it. The *ideal* pinhole camera treats the aperture as a point. Most real cameras instead pair a wider aperture, which allows more light to pass through, with a lens that focuses the reflected light. When coupled with a model of lens distortion, many cameras used in practice can still be modeled as pinhole cameras.

<figure>
  <div style="text-align:center;">
  <img src="../images/pinhole_camera_model/duckie-pinhole.png" width=400px>
  <figcaption>A visualization of a simple pinhole camera.</figcaption>
  </div>
</figure>

The *pinhole camera model* is a mathematical model that describes how an *ideal* pinhole camera projects a point in three-dimensional space onto a two-dimensional image plane, according to perspective projection. Consider a three-dimensional Cartesian reference frame with its origin at the camera (optical) center $C$ and the positive $Z$-axis pointing out of the camera. The $Z$-axis is referred to as the *principal axis*. The two-dimensional image plane is perpendicular to the principal axis and located at a distance $f$ behind the camera center (i.e., at $z=-f$), where the parameter $f$ is the focal length. To avoid dealing with inverted images, we can instead treat the image plane as lying in the positive $Z$-direction, again at a distance $f$ from the camera center (i.e., at $z=f$).

<figure>
  <div style="text-align:center;">
  <img src="../images/pinhole_camera_model/pinhole-projection-a.pdf" width=400px>
  <figcaption>A visualization of the pinhole projection geometry.</figcaption>
  </div>
</figure>

Consider a point in the reference frame of the camera, defined in terms of its Cartesian coordinates $\tilde{\mathbf{X}}_\textrm{cam} = [x \; y \; z]^\top$. We can express this point as a homogeneous 4-vector, $\mathbf{X}_\textrm{cam} = [x \; y \; z\; 1]^\top$. We can then relate $\mathbf{X}_\textrm{cam}$ to its projection onto the image, $\mathbf{x}$, expressed as a homogeneous 3-vector:

$$ 
\begin{align}
\begin{bmatrix}
fx\\
fy\\
z
\end{bmatrix} &=
\begin{bmatrix}
f & 0 & 0 & 0\\
0 & f & 0 & 0\\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}\\
\mathbf{x} &= P \mathbf{X}_\textrm{cam}
\end{align}
$$

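where $P$ is the homogeneous *camera projection matrix*. Because $\mathbf{x}$ is homogeneous, dividing by its third component gives the Cartesian image coordinates $[fx/z \; fy/z]^\top$.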
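As a small illustration of the projection above, the following is a minimal NumPy sketch rather than a reference implementation. The function name `project_pinhole` and the numeric values are hypothetical and chosen only for the example; they are not Duckiebot parameters.

```python
import numpy as np


def project_pinhole(X_cam, f):
    """Project a 3D point given in the camera frame with an ideal pinhole camera.

    X_cam: Cartesian coordinates [x, y, z] in the camera frame (z != 0).
    f:     focal length (illustrative value, same units as the coordinates).
    Returns the Cartesian image coordinates [f*x/z, f*y/z].
    """
    # Homogeneous 3x4 camera projection matrix P
    P = np.array([
        [f,   0.0, 0.0, 0.0],
        [0.0, f,   0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ])

    # Homogeneous 4-vector [x, y, z, 1]
    X_h = np.append(np.asarray(X_cam, dtype=float), 1.0)

    # Homogeneous image point [f*x, f*y, z]
    x_h = P @ X_h

    # Divide by the third component to recover Cartesian image coordinates
    return x_h[:2] / x_h[2]


# Example with made-up numbers: a point 2 units in front of the camera
image_point = project_pinhole([0.2, -0.1, 2.0], f=0.5)
print(image_point)
```

With these values, the homogeneous result is $[0.1 \; -0.05 \; 2]^\top$, so the image coordinates are $(0.05, -0.025)$.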

Thus far, we have assumed that the origin of the image coordinates is at the principal point $\textbf{p}$, which is the point at which the principal axis intersects the image plane. In practice, the origin may be located elsewhere, in which case the principal point has coordinates $[p_x \; p_y]^\top$. In addition, digital cameras may have pixels that are not square, which is commonly represented by focal lengths $(f_x, f_y)$ that differ in the two image-space directions. Together, these give rise to a more general expression for the camera matrix:

$$ 
\mathbf{x} = 
\begin{bmatrix}
f_x x + z p_x\\
f_y y + z p_y\\
z
\end{bmatrix} =
\begin{bmatrix}
f_x & 0 & p_x & 0\\
0 & f_y & p_y & 0\\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
$$

We can express this as 

$$
\mathbf{x} = K [I \; \vert \; \mathbf{0}]\mathbf{X}_\textrm{cam} \qquad 
K = 
\begin{bmatrix}
f_x & 0 & p_x\\
0 & f_y & p_y\\
0 & 0 & 1
\end{bmatrix}
$$

where $K$ is the *camera calibration matrix*, $I$ is a $3 \times 3$ identity matrix, and $\mathbf{0}$ is a three-vector of zeros.
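The factorization $\mathbf{x} = K [I \; \vert \; \mathbf{0}]\mathbf{X}_\textrm{cam}$ can be sketched in NumPy as follows. This is only an illustration: the function name `project_with_intrinsics` is hypothetical, and the intrinsic values are made up for the example rather than taken from an actual Duckiebot calibration.

```python
import numpy as np


def project_with_intrinsics(X_cam, fx, fy, px, py):
    """Project a camera-frame point using x = K [I | 0] X_cam.

    X_cam:  Cartesian coordinates [x, y, z] in the camera frame (z != 0).
    fx, fy: focal lengths along the two image directions.
    px, py: principal point coordinates in the image.
    Returns the Cartesian image coordinates [fx*x/z + px, fy*y/z + py].
    """
    # Camera calibration matrix K
    K = np.array([
        [fx,  0.0, px],
        [0.0, fy,  py],
        [0.0, 0.0, 1.0],
    ])

    # [I | 0]: 3x3 identity next to a column of zeros
    I_0 = np.hstack([np.eye(3), np.zeros((3, 1))])

    # Homogeneous 4-vector [x, y, z, 1]
    X_h = np.append(np.asarray(X_cam, dtype=float), 1.0)

    # Homogeneous image point [fx*x + z*px, fy*y + z*py, z]
    x_h = K @ I_0 @ X_h

    # Normalize by the third component
    return x_h[:2] / x_h[2]


# Example with illustrative (assumed) intrinsics
pixel = project_with_intrinsics(
    [0.2, -0.1, 2.0], fx=300.0, fy=320.0, px=320.0, py=240.0
)
print(pixel)
```

Here the homogeneous image point is $[700 \; 448 \; 2]^\top$, which normalizes to $(350, 224)$. Note that a point on the principal axis, such as $[0 \; 0 \; z]^\top$, projects to the principal point $(p_x, p_y)$ regardless of its depth.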