# Image Formation

Nina Miolane, UC Santa Barbara

<center><img src="figs/00_dip.jpeg" width=900px alt="default"/></center>


# Image Formation

Before we can analyze and manipulate images, we need to:
- establish a vocabulary for describing the geometry of a scene,
- understand the image formation process given camera parameters. 

<center><img src="figs/01_lens3.jpg" width=950px alt="default"/></center>

# Image Formation

- [Math Vocabulary: Geometric Primitives and Transformations](#sec-syllabus)

- [Image Formation: 3D to 2D Projections](#sec-ece)

- [Image Formation: Camera Parameters](#sec-ece)

<center><img src="figs/00_dip.jpeg" width=600px alt="default"/></center>


# Image Formation

- **[Math Vocabulary: Geometric Primitives and Transformations](#sec-syllabus)**

- [Image Formation: 3D to 2D Projections](#sec-ece)

- [Image Formation: Camera Parameters](#sec-ece)

<center><img src="figs/00_dip.jpeg" width=600px alt="default"/></center>


# Geometric Primitives: 2D Points

$\color{#EF5645}{\text{2D points}}$ (pixel coordinates in an image) are denoted:
- with a pair of values: $( x,y ) \in \mathbb{R}^2$ ("inhomogeneous coordinates"),
- with homogeneous coordinates, also called projective coordinates:
  - by $\bar x = (x, y, 1)$, called the augmented vector,
  - or by $(kx, ky, k)$ for any nonzero $k \in \mathbb{R}$.
 
$\color{#EF5645}{\text{Remark}}$: In homogeneous coordinates, vectors differing only by scale are equivalent. 
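The conversion between the two coordinate systems can be sketched in a few lines of Python. The function names `to_homogeneous` and `from_homogeneous` are hypothetical helpers chosen for this illustration.

```python
def to_homogeneous(point, k=1.0):
    """Return the homogeneous coordinates (k*x, k*y, k) of a 2D point, k != 0."""
    if k == 0:
        raise ValueError("k must be nonzero")
    x, y = point
    return (k * x, k * y, k)


def from_homogeneous(hpoint):
    """Recover (x, y) by dividing by the last coordinate."""
    hx, hy, hw = hpoint
    if hw == 0:
        raise ValueError("last coordinate is zero: no finite 2D point")
    return (hx / hw, hy / hw)


p = (4.0, 5.0)
for k in (1.0, 2.0, -0.5):
    h = to_homogeneous(p, k)
    # Every choice of k gives a different triple but the same 2D point.
    print(h, "->", from_homogeneous(h))
```

All three triples map back to the same point, which illustrates the remark above: vectors differing only by scale are equivalent.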

$\color{#047C91}{\text{Exercise}}$: Give examples of homogeneous coordinates for the 2D point $(2, 3)$.


# 2D Projective Space

$\color{#EF5645}{\text{The 2D projective space $P^2$}}$ is the set of equivalence classes of $\mathbb{R}^3 - \{0\}$ under the equivalence relation $\sim$ defined by:

<center>$x \sim y$ if there is a nonzero element $k$ of $\mathbb{R}$ such that $x = ky$.</center>

2D points can be seen as elements of the 2D projective space.

<center><img src="figs/01_homogeneous.png" width=400px alt="default"/></center>

# Geometric Primitives: 2D Lines

$\color{#EF5645}{\text{2D lines}}$ are represented:
- with the coefficients $a, b, c$ forming an implicit equation: $ax + by + c = 0$,
- if $b \neq 0$, by the equation: $y = -\frac{a}{b}x - \frac{c}{b}$,
- equivalently, by the point of the 2D projective space $l = (a, b, c)$. 
  - The corresponding line equation is: $\bar x \cdot l = ax + by + c = 0.$
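As a small illustration, the sketch below tests whether a point lies on a line using the dot product $\bar x \cdot l$, and solves $ax + by + c = 0$ for $y$ when $b \neq 0$. The helper names and the example line are assumptions made for this sketch.

```python
import numpy as np

# Line 2x - y + 1 = 0, i.e. y = 2x + 1, stored as l = (a, b, c).
line = np.array([2.0, -1.0, 1.0])


def is_on_line(point, line, tol=1e-9):
    """Test x_bar . l = 0 for the augmented vector x_bar = (x, y, 1)."""
    x_bar = np.append(np.asarray(point, dtype=float), 1.0)
    return bool(abs(x_bar @ line) < tol)


def slope_intercept(line):
    """Solve ax + by + c = 0 for y (requires b != 0): y = -(a/b) x - c/b."""
    a, b, c = line
    if b == 0:
        raise ValueError("vertical line: b == 0")
    return float(-a / b), float(-c / b)


print(is_on_line((1.0, 3.0), line))   # 2*1 - 3 + 1 = 0
print(is_on_line((1.0, 4.0), line))   # 2*1 - 4 + 1 = -1
print(slope_intercept(line))
```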
  
  
$\color{#047C91}{\text{Exercise}}$: Consider a 2D line $l = (a, b, c)$. 
- Given a nonzero $k$, show that $(ka, kb, kc)$ describes the same line.

# Geometric Primitives: 3D Points

$\color{#EF5645}{\text{3D points}}$ are represented:
- with inhomogeneous coordinates $( x,y,z ) \in \mathbb{R}^3$,
- with homogeneous coordinates:
  - by $\bar x =( x, y, z, 1)$ the augmented vector,
  - by $(kx, k y, k z, k)$ for any $k\in \mathbb{R}-\{0\}$, i.e. in the 3D projective space $P^3$.
  
$\color{#047C91}{\text{Example}}$: What is the 3D point represented in homogeneous coordinates by $(2, 4, 6, 2)$?

# Geometric Primitives: 3D Planes

$\color{#EF5645}{\text{3D planes}}$ are represented:
- with coefficients $a, b, c, d$ forming the implicit equation: $ax + by + cz + d = 0$,
- equivalently, by the point of the 3D projective space denoted for example with $m =( a,b,c,d )$. 
  - The corresponding plane equation is: $\bar x \cdot m = ax + by + cz + d = 0.$


# Introducing Geometric Transformations

- Points can be written in inhomogeneous $x$ or homogeneous coordinates $\bar x$.
- Geometric transformations map points $x$ to points $x'$, and can be written:
  - using coordinates $x, x'$,
  - using coordinates $\bar x, \bar  x'$,
  - using coordinates $x, \bar  x'$,
  - using coordinates $\bar x, x'$.
 
$\color{#EF5645}{\text{Remark}}$: Homogeneous/projective coordinates will be useful to describe projections.

# 2D Transformations: Translations

$\color{#EF5645}{\text{2D Translation}}$ by a vector $t \in \mathbb{R}^2$ can be written:
- $x'=x+t$,
- $x' = \begin{bmatrix} I & t \end{bmatrix}  \bar x,$
- $ \bar x' = \begin{bmatrix} I & t \\ 0^T & 1 \end{bmatrix} \bar x,$

where $I$ is the $2 \times 2$ identity matrix.

$\color{#047C91}{\text{Exercise}}$: Show the above computations.

# 2D Transformations: Rigid-Body Motions

$\color{#EF5645}{\text{2D Rigid-Body Motion}}$, also known as a 2D Euclidean transformation, is defined by a 2D rotation and a 2D translation. It can be written as:
- $x' = Rx + t$,
- $x' = \begin{bmatrix} R & t \end{bmatrix}  \bar x$,
- $ \bar x' = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \bar x,$

where  $R= \begin{bmatrix} \cos \theta & - \sin \theta \\  
\sin \theta & \cos \theta  \end{bmatrix}$ is an orthonormal rotation matrix.

$\color{#6D7D33}{\text{Properties}}$: Rigid-body motions preserve:
- distances between points,
- angles between lines (thus also parallelism).
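A minimal NumPy sketch of the homogeneous form, assuming a hypothetical helper `rigid_motion_2d` and arbitrary example values, builds the $3 \times 3$ matrix and checks numerically that a distance is preserved:

```python
import numpy as np


def rigid_motion_2d(theta, t):
    """Build the 3x3 matrix [[R, t], [0^T, 1]] acting on augmented vectors."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = t
    return T


T = rigid_motion_2d(np.pi / 2, t=[1.0, 0.0])

# Two points written as augmented vectors (x, y, 1).
p = np.array([1.0, 0.0, 1.0])
q = np.array([3.0, 4.0, 1.0])
p_new, q_new = T @ p, T @ q

# Distance between the two points before and after the motion.
d_before = np.linalg.norm(q[:2] - p[:2])
d_after = np.linalg.norm(q_new[:2] - p_new[:2])
print(p_new, d_before, d_after)
```

The last coordinate of the transformed vectors stays equal to 1, so no renormalization is needed for rigid-body motions.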

# 2D Transformations: Similarities

$\color{#EF5645}{\text{Similarity}}$, also called scaled rotation, adds a scaling $s \in \mathbb{R}_+$ to the 2D Rigid-Body Motion. It can be written as:
- $x' = sRx + t$,
- $x' 
= \begin{bmatrix} sR & t \end{bmatrix}  \bar x 
= \begin{bmatrix} a & - b &t_x \\  b & a  & t_y \end{bmatrix} \bar x, $

where we no longer require that $a^2 + b^2 = 1$. 

$\color{#6D7D33}{\text{Properties}}$: Similarities preserve:
- angles between lines (thus also parallelism).


# 2D Transformations: Affine

$\color{#EF5645}{\text{Affine transformation}}$ is written as:
- $x' = Bx + t$,
- $x' = A \bar x  = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix} \bar x,$

where $A = \begin{bmatrix} B & t \end{bmatrix}$ is an arbitrary $2 \times 3$ matrix and $B$ is its $2 \times 2$ linear part.

$\color{#6D7D33}{\text{Properties}}$: Affine transformations preserve:
- parallelism.


# 2D Transformations: Projective

$\color{#EF5645}{\text{Projective transformation}}$, also called perspective transform or homography, operates on homogeneous coordinates:
- $ \bar x' = \bar H \bar x,$
where $\bar H$ is an arbitrary invertible $3 \times 3$ matrix. 

$\color{#EF5645}{\text{Remark}}$: Note that $\bar H$ is homogeneous, i.e., it is only defined up to a scale, and that two matrices that differ only by scale are equivalent. 

$\color{#6D7D33}{\text{Properties}}$: Projective transformations preserve:
- straight lines (i.e., they remain straight after the transformation).
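The sketch below applies a homography to 2D points and checks numerically that collinear points stay collinear. The matrix `H` and the helper `apply_homography` are hypothetical choices for this illustration.

```python
import numpy as np


def apply_homography(H, points):
    """Map an (N, 2) array of points through a 3x3 matrix H."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    mapped = np.hstack([points, ones]) @ H.T   # each row is H @ x_bar
    return mapped[:, :2] / mapped[:, 2:3]      # divide by the last coordinate


# Hypothetical homography; the nonzero entries in the last row make it non-affine.
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 1.5, -2.0],
              [0.001, 0.002, 1.0]])

# Three collinear points on y = 2x + 1.
pts = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
q0, q1, q2 = apply_homography(H, pts)

# Collinearity test: the 2D cross product of (q1 - q0) and (q2 - q0) vanishes.
area = (q1[0] - q0[0]) * (q2[1] - q0[1]) - (q1[1] - q0[1]) * (q2[0] - q0[0])
print(abs(area) < 1e-9)

# Scaling H leaves the mapped points unchanged.
print(np.allclose(apply_homography(5.0 * H, pts), apply_homography(H, pts)))
```

Unlike the previous transformations, the last coordinate is generally not 1 after multiplication, hence the explicit division.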

# 2D Transformations

Mathematically, these families of transformations form Lie groups (see github.com/geomstats/geomstats).

<center><img src="figs/01_2Dtransf.png" width=820px alt="default"/></center>

$\color{#047C91}{\text{Exercise}}$: For each transformation, list the degrees of freedom.

# 3D Transformations

- Very similar to 2D transformations.

<center><img src="figs/01_3Dtransf.png" width=900px alt="default"/></center>

# 3D Transformations: Notes on 3D Rotations

Biggest difference between 2D and 3D transformations:
- Parameterization of the 3D rotation becomes complicated. 
- Several options:
  - Rotation matrix
  - Axis-angle rotation vector
  - Euler angles
  - Unit quaternions.


# 3D Rotations - Euler Angles

$\color{#EF5645}{\text{Euler Angles}}$: A rotation matrix $R$ can be formed as the product of three rotations around three cardinal axes, e.g., $x, y, z$, or $x, y, x$:
$$R 
= R_x(\alpha)R_y(\beta)R_x(\gamma)
= \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos \alpha & - \sin \alpha \\
0 & \sin \alpha & \cos \alpha 
\end{bmatrix} \begin{bmatrix}
\cos \beta & 0 & \sin \beta \\
0 & 1 & 0 \\
- \sin \beta & 0 & \cos \beta
\end{bmatrix}
 \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos \gamma & - \sin \gamma \\
0 & \sin \gamma & \cos \gamma 
\end{bmatrix}.$$

Warnings with Euler angles:
- the result depends on the order in which the transforms are applied,
- they can be defined with respect to extrinsic (fixed) or intrinsic (body-attached) axes.
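The first warning can be checked numerically. This is a sketch assuming the right-handed convention for the elementary rotations; `rot_x` and `rot_y` are hypothetical helper names.

```python
import numpy as np


def rot_x(angle):
    """Rotation about the x axis (right-handed convention assumed)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_y(angle):
    """Rotation about the y axis (right-handed convention assumed)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])


alpha, beta = np.pi / 2, np.pi / 2
R_xy = rot_x(alpha) @ rot_y(beta)
R_yx = rot_y(beta) @ rot_x(alpha)

# The two products differ: the order of the elementary rotations matters.
print(np.allclose(R_xy, R_yx))

# Both products are still valid rotations (orthonormal, determinant 1).
print(np.allclose(R_xy @ R_xy.T, np.eye(3)), np.isclose(np.linalg.det(R_xy), 1.0))
```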


# 3D Rotations - Axis Angle Rotation Vector

$\color{#EF5645}{\text{Axis Angle}}$: A 3D rotation can be represented by its rotation axis $u$ (unit vector) and its angle $\theta \in [0, \pi]$, or equivalently by a 3D vector 
$$ω = \theta u.$$

$\color{#047C91}{\text{Exercise}}$: 
- Why is the angle $\theta$ restricted to $[0, \pi]$?
- What is the 3D rotation described by $ω = \frac{\pi}{2}[1, 1, 0]$?

# 3D Rotations - Unit Quaternions

$\color{#EF5645}{\text{Unit quaternions}}$: A unit quaternion is a unit length 4-vector whose components can be written as $q =( w,x,y,z )$, with the scalar part $w$ first. For a rotation represented by its rotation axis $u$ and its angle $\theta \in [0, \pi]$, its quaternion is:
$$ q = \left(\cos \frac{\theta}{2}, \sin \frac{\theta}{2} u_x, \sin \frac{\theta}{2} u_y, \sin \frac{\theta}{2} u_z\right).$$


Antipodal (opposite sign) quaternions, $q$ and $−q$, represent the same rotation.
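A minimal sketch of the axis-angle to quaternion formula, using the scalar-first ordering of the formula above. The function name is hypothetical, and the axis is normalized inside the function as a convenience.

```python
import math


def quaternion_from_axis_angle(axis, theta):
    """Return (cos(theta/2), sin(theta/2) * u) for a rotation axis and angle."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("axis must be nonzero")
    ux, uy, uz = (a / norm for a in axis)   # normalize to a unit vector u
    half = theta / 2.0
    s = math.sin(half)
    return (math.cos(half), s * ux, s * uy, s * uz)


# Rotation of angle pi/2 about the z axis.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
print(q)

# The result has unit length.
print(math.isclose(sum(c * c for c in q), 1.0))
```

Other libraries may store the scalar part last, so the ordering should be checked before exchanging quaternions between tools.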

$\color{#047C91}{\text{Example}}$: Write the quaternion associated with the rotation of axis $\frac{1}{\sqrt{3}}(1, 1, 1)$ and angle $\pi / 6$.

# Converting Between Representations

$\color{#047C91}{\text{Exercise (at home)}}$: 
- Find conversion formulas between representations of rotations.
- Check your results [using Geomstats to convert between different representations](https://github.com/geomstats/geomstats/blob/7c03eb37f3f3392b8d70426a0a038c049c038813/geomstats/geometry/special_orthogonal.py#L872). 

# Image Formation

- [Math Vocabulary: Geometric Primitives and Transformations](#sec-syllabus)

- **[Image Formation: 3D to 2D Projections](#sec-ece)**

- [Image Formation: Camera Parameters](#sec-ece)

<center><img src="figs/00_dip.jpeg" width=600px alt="default"/></center>

# From a 3D World to 2D Images

We describe:
- how 3D primitives and transformations from the real-world...
- ...are projected into 2D primitives and transformations onto the image plane.

This will require:
- geometry of the camera model describing the 3D to 2D projection,
- intrinsic parameters of the camera.

# Pinhole Camera

<center><img src="figs/01_pinhole.png" width=900px alt="default"/></center>

Barrier:
- reduces blurring
- opening = pinhole = aperture = center of the camera

# Pinhole Camera Model

<center><img src="figs/01_pinhole2.png" width=900px alt="default"/></center>

$\color{#EF5645}{\text{Definitions/Notations:}}$
- $f$ = focal length
- $[i, j, k]$ = camera reference or coordinate system
- $\Pi'$ = image or retina plane
- $C'O$ = optical axis
- $C'$ = origin of 2D coordinates on the image plane

# Pinhole Camera Model: Projection

$\color{#EF5645}{\text{Projection}}$: Consider a 3D point $p=[x, y, z]^T$. The 3D to 2D projection defined by the pinhole camera model is:
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}f\frac{x}{z} & f\frac{y}{z}\end{bmatrix}^T.$$
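As a quick numerical illustration of this formula, here is a sketch with a hypothetical helper `project_pinhole` and arbitrary units:

```python
def project_pinhole(point, f):
    """Project a 3D point (x, y, z) in the camera frame to (f x / z, f y / z)."""
    x, y, z = point
    if z == 0:
        raise ValueError("z == 0: the projection is undefined")
    return (f * x / z, f * y / z)


f = 2.0  # focal length, arbitrary units
near = project_pinhole((1.0, 2.0, 4.0), f)
far = project_pinhole((2.0, 4.0, 8.0), f)
print(near, far)  # (0.5, 1.0) (0.5, 1.0)
```

The two 3D points lie on the same ray through the pinhole and therefore project to the same image point: depth is lost in the projection.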

$\color{#047C91}{\text{Exercise}}$: Prove the formula for the 3D to 2D projection of the pinhole camera model.

# Remark: Aperture Size

In practice, the pinhole is not a single point.

<center><img src="figs/01_aperture.png" width=600px alt="default"/></center>


<center><img src="figs/00_blackhole.jpg" width=500px alt="default"/></center>
<center>Giant aperture.</center>

- Wider aperture: more light, but blurry image. 
- Add lenses!

# Adding a Lens: Paraxial Refraction Model

<center><img src="figs/01_lens.png" width=900px alt="default"/></center>

- Lens focuses the light on the film.
- There is a specific distance at which objects are "in focus".

# Paraxial Refraction Model

<center><img src="figs/01_lens2.png" width=900px alt="default"/></center>

- Lens focuses the light on the film.
- There is a specific distance at which objects are "in focus".
- Lens focuses light rays parallel to the optical axis to the "focal point".

# Paraxial Refraction Model: Projection

$\color{#EF5645}{\text{Projection}}$: Consider a 3D point $p=[x, y, z]^T$. The 3D to 2D projection defined by the paraxial refraction model is:
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)\frac{x}{z} & (f+z_0)\frac{y}{z}\end{bmatrix}^T.$$

$\color{#047C91}{\text{Remark}}$: The proof is beyond the scope of this class.

# Beyond 3D to 2D Projections

Projections of 3D points into the image plane do not directly correspond to what we see in digital images. Digital images:
- may use a reference (coordinate) system different from the image plane's,
- are divided into discrete pixels, whereas points in the image plane are continuous,
- may add distortion coming from nonlinearities introduced by the physical sensors.

We introduce a simple model that addresses some of the points above.

# Image Formation

- [Math Vocabulary: Geometric Primitives and Transformations](#sec-syllabus)

- [Image Formation: 3D to 2D Projections](#sec-ece)

- **[Image Formation: Camera Parameters](#sec-ece)**

<center><img src="figs/00_dip.jpeg" width=600px alt="default"/></center>

# Camera Matrix Model

The camera matrix model describes how 3D world points are mapped to the digital image, by refining the paraxial refraction model.

$\color{#EF5645}{\text{Add offset parameters $c_x$ and $c_y$:}}$ Describe how image plane and digital image coordinates differ by a translation.
  - Image plane coordinates: origin C′
  - Digital image coordinates: origin at image's lower-left corner.

The projection becomes: 
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)\frac{x}{z} +c_x & (f+z_0)\frac{y}{z}+c_y\end{bmatrix}^T.$$

# Camera Matrix Model

The camera matrix model describes how 3D world points are mapped to the digital image, by refining the paraxial refraction model.


$\color{#EF5645}{\text{Add parameters $k$ and $l$ (e.g. in pixels/cm)}}$: Describe how image plane and digital image coordinates have different units:
- points in image plane: physical measurements, e.g. cm
- points in digital images are expressed in pixels


The projection becomes: 
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)k\frac{x}{z} +c_x & (f+z_0)l\frac{y}{z}+c_y\end{bmatrix}^T
= \begin{bmatrix}\alpha\frac{x}{z} +c_x & \beta\frac{y}{z}+c_y\end{bmatrix}^T.$$

# Going to Homogeneous Coordinates

The projection, written in inhomogeneous coordinates: 
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)k\frac{x}{z} +c_x & (f+z_0)l\frac{y}{z}+c_y\end{bmatrix}^T
= \begin{bmatrix}\alpha\frac{x}{z} +c_x & \beta\frac{y}{z}+c_y\end{bmatrix}^T$$
is not a linear operation.

- Linear operations are interesting, as they allow us to use linear algebra.
- We transform the nonlinear operation above into a linear operation...
  - ...by using homogeneous coordinates.

# With Homogeneous Coordinates

We write:
- $\bar p = [x, y, z, 1]$ the 3D point in the real world,
- $\bar p' = [x', y', 1]$ the 2D point in the digital image plane.

The projection, written in homogeneous coordinates, becomes:
$$\bar p' 
= \begin{bmatrix}
\alpha \frac{x}{z} + c_x \\
\beta \frac{y}{z} + c_y \\
1
\end{bmatrix}
= \begin{bmatrix}
\alpha x + c_x z \\
\beta y + c_y z\\
z
\end{bmatrix}
= 
\begin{bmatrix}
\alpha & 0 & c_x & 0 \\
0 & \beta & c_y & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x \\
y \\
z \\
1
\end{bmatrix}
= \begin{bmatrix}
\alpha & 0 & c_x & 0 \\
0 & \beta & c_y & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\bar p
= M \bar p.
$$
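The projection is now a linear operation, represented by the matrix $M$. Note that the second equality holds only up to the scale factor $z$: in homogeneous coordinates, vectors that differ by a nonzero scale represent the same point.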
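The equivalence between the linear and the nonlinear forms can be checked numerically. The intrinsic values below are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical intrinsic values, in pixels.
alpha, beta = 800.0, 820.0
c_x, c_y = 320.0, 240.0

M = np.array([[alpha, 0.0, c_x, 0.0],
              [0.0, beta, c_y, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

p = np.array([0.5, -0.25, 2.0])     # 3D point in the camera frame
p_bar = np.append(p, 1.0)           # augmented vector (x, y, z, 1)

hom = M @ p_bar                     # (alpha x + c_x z, beta y + c_y z, z)
pixel_linear = hom[:2] / hom[2]     # normalize so that the last entry is 1

# Same projection written directly in inhomogeneous coordinates.
x, y, z = p
pixel_direct = np.array([alpha * x / z + c_x, beta * y / z + c_y])

print(pixel_linear, pixel_direct)
```

The only nonlinear step left is the final division by the last coordinate, which is deferred until after all matrix products.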

# Camera Matrix: Intrinsics

$\color{#EF5645}{\text{Camera Matrix}}$: The projection is written, in homogeneous coordinates:
$$\bar p' 
= \begin{bmatrix}
\alpha & 0 & c_x & 0 \\
0 & \beta & c_y & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\bar p
= M \bar p,
$$
which we can also write:
$$\bar p' 
= 
\begin{bmatrix}
\alpha & 0 & c_x  \\
0 & \beta & c_y  \\
0 & 0 & 1 
\end{bmatrix}
\begin{bmatrix}
I_3 & 0
\end{bmatrix}
\bar p
= K  \begin{bmatrix}
I_3 & 0
\end{bmatrix}\bar p,
$$
where $I_3$ is the $3 \times 3$ identity matrix.

The matrix $K$ is referred to as the camera matrix. It contains the intrinsic camera parameters, which depend on the camera type, e.g. on its resolution.

# Camera Parameters: Extrinsics
<center><img src="figs/01_pinhole2bis.png" width=600px alt="default"/></center>

Until now, we assumed that:
- world 3D point $p$ was in the camera coordinate system $[i, j, k]$.

Now, we generalize:
- the 3D point, now denoted $p_w$, may be expressed in a different (world) coordinate system.

# Camera Extrinsics

The transformation $p_w \rightarrow p$ is a 3D rigid-body transformation, which we write in homogeneous coordinates:
$$\bar p = \begin{bmatrix}
R & T \\
0 & 1
\end{bmatrix} \bar p_w
$$

The projection to the digital image plane becomes:
$$\bar p' 
= K  \begin{bmatrix}
I_3 & 0
\end{bmatrix}\bar p
= 
K  \begin{bmatrix}
I_3 & 0
\end{bmatrix}
\begin{bmatrix}
R & T \\
0 & 1
\end{bmatrix} \bar p_w
= M \bar p_w
$$
$\color{#EF5645}{\text{Extrinsic Camera Parameters}}$: Parameters $R, T$ are known as the extrinsic camera parameters because they are external to and do not depend on the type of the camera.
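The full chain from world coordinates to pixels can be sketched as follows. All numerical values of $K$, $R$ and $T$ are hypothetical.

```python
import numpy as np

# Hypothetical intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 820.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: rotation of 90 degrees about z, then a shift along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.0, 0.0, 4.0])

# 4x4 rigid-body matrix [[R, T], [0, 1]] and 3x4 matrix [I_3 | 0].
E = np.eye(4)
E[:3, :3] = R
E[:3, 3] = T
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])

M = K @ I0 @ E                          # 3x4 projection matrix

p_w = np.array([1.0, -0.5, 0.0])        # point in world coordinates
hom = M @ np.append(p_w, 1.0)
pixel = hom[:2] / hom[2]

# Same result in two explicit steps: world -> camera, then intrinsics.
p_cam = R @ p_w + T
pixel_two_steps = (K @ p_cam)[:2] / p_cam[2]
print(M.shape, pixel, pixel_two_steps)
```

The product $\begin{bmatrix} I_3 & 0 \end{bmatrix}$ times the rigid-body matrix simply drops the last row, so $M$ can equivalently be built as $K \begin{bmatrix} R & T \end{bmatrix}$.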

# Summary

The $3 \times 4$ projection matrix $M$ has 10 degrees of freedom:
- 4 from the intrinsic camera matrix $K$, 
- 3 from extrinsic rotation $R$, 
- 3 from extrinsic translation $T$.

$\color{#EF5645}{\text{Remark}}$: We did not add distortion effects to this model.

# Application: Transformations Between Cameras


<center><img src="figs/01_homogeneous.png" width=350px alt="default"/></center>
<center>Recall: Homogeneous coordinates</center>



<center><img src="figs/01_camera_calibration.png" width=450px alt="default"/></center>


$\color{#EF5645}{\text{Homography}}$: If two cameras see points lying on a plane, the relationship between the two images of these points can be found without explicit camera calibration. This relationship between the two cameras is called a homography.

# Application

- The real world coordinate system is taken to be the first camera's system.
- $n = (a, b, c)$ is the normal to the plane $\pi$, with plane equation $ax + by + cz +1 = 0$.
- Cameras are "canonical" ($K_1 = K_2 = I_3$): only model their relative position.
- $\bar p_1 = M_1 \bar p_\pi$, the projection of $\bar p_\pi$ as a digital image through camera 1
- $\bar p_2 = M_2 \bar p_\pi$, the projection of $\bar p_\pi$ as a digital image through camera 2

Prove that:
$\bar p_2 = M_2 \begin{bmatrix} I_3 \\ -n^T \end{bmatrix} \bar p_1.$
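Before writing the proof, the identity can be sanity-checked numerically. This is only a sketch: the plane, the pose of camera 2 and the test point are hypothetical, and $n$ is stacked as a row under $I_3$ to form a $4 \times 3$ matrix.

```python
import numpy as np

n = np.array([0.0, 0.0, -0.5])                  # plane n . p + 1 = 0, i.e. z = 2
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])   # camera 1: [I_3 | 0]

# Hypothetical pose of camera 2 relative to camera 1.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([0.3, 0.0, 0.1])
M2 = np.hstack([R, T.reshape(3, 1)])            # camera 2: [R | T]

# 3x3 homography: M_2 times the 4x3 matrix made of I_3 stacked on the row -n.
H = M2 @ np.vstack([np.eye(3), -n])

p_pi = np.array([0.4, -0.7, 2.0, 1.0])          # a point on the plane (z = 2)
p1 = M1 @ p_pi
p2 = M2 @ p_pi
print(np.allclose(H @ p1, p2))

# Another representative of p1 gives the same image point after normalization.
p2_scaled = H @ (3.0 * p1)
print(np.allclose(p2_scaled[:2] / p2_scaled[2], p2[:2] / p2[2]))
```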

<center><img src="figs/01_camera_calibration.png" width=400px alt="default"/></center>

# Image Formation

- [Math Vocabulary: Geometric Primitives and Transformations](#sec-syllabus)

- [Image Formation: 3D to 2D Projections](#sec-ece)

- [Image Formation: Camera Parameters](#sec-ece)

Resource: Ch. 2 from "Computer Vision: Algorithms and Applications."

<center><img src="figs/00_dip.jpeg" width=500px alt="default"/></center>