# Image Formation

<center><img src="figs/00_dip.jpeg" width=500px alt="default"/></center>


# Image Formation

- [Geometric Primitives and Transformations](#sec-syllabus)

- [3D to 2D Projections](#sec-ece)

- [Camera Parameters](#sec-ece)

# Basic Primitives

$\color{#EF5645}{\text{2D points}}$: 2D points (pixel coordinates in an image) can be denoted using a pair of values, $(x, y) \in \mathbb{R}^2$.

2D points can also be represented using their homogeneous coordinates, where the point $x = (x, y)$ is represented:
- by $\bar x = (x, y, 1)$, called the augmented vector,
- or by $(kx, ky, k)$ for any nonzero $k \in \mathbb{R}$.

In other words, vectors that differ only by a nonzero scale factor are considered equivalent.

Homogeneous coordinates are also called projective coordinates.

$\color{#EF5645}{\text{2D projective space}}$: The 2D projective space $P^2$ is the set of equivalence classes of $\mathbb{R}^3 \setminus \{0\}$ under the equivalence relation $\sim$ defined by:
$x \sim y$ if there is a nonzero element $k$ of $\mathbb{R}$ such that $x = ky$.

2D points can be seen as elements of the 2D projective space.
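A minimal Python sketch of this equivalence, assuming points are plain tuples. The helper names `to_homogeneous`, `from_homogeneous`, and `equivalent` are hypothetical, chosen for illustration; the equivalence test relies on the fact that two nonzero 3-vectors are scalar multiples of each other exactly when their cross product is zero.

```python
def to_homogeneous(point, k=1.0):
    """Map a 2D point (x, y) to the homogeneous vector (k*x, k*y, k), k != 0."""
    if k == 0:
        raise ValueError("k must be nonzero")
    x, y = point
    return (k * x, k * y, k)


def from_homogeneous(h):
    """Recover (x, y) by dividing by the last coordinate."""
    xh, yh, wh = h
    if wh == 0:
        raise ValueError("last coordinate is zero; no inhomogeneous point")
    return (xh / wh, yh / wh)


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def equivalent(u, v, tol=1e-9):
    """For nonzero u, v: u ~ v iff u = k v for some k != 0, i.e. u x v = 0."""
    return all(abs(c) <= tol for c in cross(u, v))


a = to_homogeneous((2.0, 3.0))          # augmented vector (2.0, 3.0, 1.0)
b = to_homogeneous((2.0, 3.0), k=-4.0)  # (-8.0, -12.0, -4.0)
print(equivalent(a, b))                 # True
print(from_homogeneous(b))              # (2.0, 3.0)
```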

# Basic Primitives

$\color{#EF5645}{\text{2D lines}}$: 2D lines can also be represented using homogeneous coordinates $l = (a,b,c )$. The corresponding line equation is: 

$$\bar x \cdot l = ax + by + c = 0.$$

2D lines can be seen as elements of the 2D projective space.
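As a sketch, the line equation can be checked numerically with the augmented vector $\bar x = (x, y, 1)$. The function names are hypothetical; `line_through` uses the standard identity that the line through two points is the cross product of their augmented vectors, which these slides do not state explicitly.

```python
def on_line(point, line, tol=1e-9):
    """True if the 2D point satisfies a*x + b*y + c = 0 for line = (a, b, c)."""
    x, y = point
    a, b, c = line
    return abs(a * x + b * y + c) <= tol


def line_through(p, q):
    """Homogeneous line through two 2D points: cross product of (x, y, 1) vectors."""
    x1, y1 = p
    x2, y2 = q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


l = line_through((0.0, 0.0), (1.0, 1.0))  # (-1.0, 1.0, 0.0), i.e. the line y = x
print(on_line((3.0, 3.0), l))             # True
print(on_line((3.0, 2.0), l))             # False
scaled = tuple(5.0 * c for c in l)        # same line, different scale
print(on_line((3.0, 3.0), scaled))        # True
```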

# Basic Primitives

$\color{#EF5645}{\text{3D points}}$: Point coordinates in three dimensions can be written using inhomogeneous coordinates $x =( x,y,z ) \in \mathbb{R}^3$ or homogeneous coordinates $(\tilde x, \tilde y, \tilde z, \tilde k) \in P^3$, where $P^3$ is the 3D projective space.

It can be useful to denote a 3D point using the augmented vector $\bar x =( x,y,z, 1)$.

$\color{#EF5645}{\text{3D planes}}$: 3D planes can also be represented using homogeneous coordinates $m = (a, b, c, d)$, with the corresponding plane equation:
$$\bar x \cdot m = ax + by + cz + d = 0.$$


# 2D Transformations

$\color{#EF5645}{\text{2D Translation}}$: 2D translations can be written as $x' = x + t$, or
$$x' = \begin{bmatrix} I & t \end{bmatrix} \bar x,$$
where $I$ is the $2 \times 2$ identity matrix, or, using augmented vectors on both sides,
$$\bar x' = \begin{bmatrix} I & t \\ 0^T & 1 \end{bmatrix} \bar x.$$
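A minimal sketch of the augmented form, using plain nested lists rather than a matrix library; `translation_matrix` and `mat_vec` are hypothetical helpers. The last row $[0^T \; 1]$ keeps the output's third coordinate equal to 1, so the result is again an augmented vector.

```python
def translation_matrix(tx, ty):
    """3x3 matrix [[I, t], [0^T, 1]] acting on augmented vectors (x, y, 1)."""
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty],
            [0.0, 0.0, 1.0]]


def mat_vec(m, v):
    """Matrix-vector product for small nested-list matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


x_bar = [2.0, 3.0, 1.0]                               # augmented vector of (2, 3)
print(mat_vec(translation_matrix(5.0, -1.0), x_bar))  # [7.0, 2.0, 1.0]
```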


# 2D Transformations

$\color{#EF5645}{\text{2D Rotation + Translation}}$: This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as $x' = Rx + t$, or
$$x' = \begin{bmatrix} R & t \end{bmatrix} \bar x,$$
where
$$R = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$$
is an orthonormal rotation matrix with $RR^T = I$ and $\det R = 1$.
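To see the distance-preservation property numerically, here is a small sketch in plain Python (the helper names are hypothetical). It applies $x' = Rx + t$ to two points and compares their distances before and after.

```python
import math


def rotation_2d(theta):
    """2x2 rotation matrix R(theta) as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rigid_transform(point, theta, t):
    """Apply x' = R x + t to a 2D point."""
    R = rotation_2d(theta)
    x, y = point
    return (R[0][0] * x + R[0][1] * y + t[0],
            R[1][0] * x + R[1][1] * y + t[1])


p, q = (1.0, 0.0), (4.0, 4.0)
theta, t = math.pi / 3, (2.0, -1.0)
p2, q2 = rigid_transform(p, theta, t), rigid_transform(q, theta, t)
# Euclidean distance is preserved up to floating-point rounding (both close to 5)
print(math.dist(p, q), math.dist(p2, q2))
```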



# 2D Transformations

$\color{#EF5645}{\text{Scaled rotation}}$: Also known as the similarity transform, this transformation can be expressed as $x' = sRx + t$, where $s$ is an arbitrary scale factor, or:
$$x' 
= \begin{bmatrix} sR & t \end{bmatrix} \bar x 
= \begin{bmatrix} a & -b & t_x \\ b & a & t_y \end{bmatrix} \bar x,$$ 
where $a = s \cos \theta$, $b = s \sin \theta$, and we no longer require that $a^2 + b^2 = 1$. The similarity transform preserves angles between lines.
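A minimal sketch checking the angle-preservation claim, assuming the parameterization $a = s\cos\theta$, $b = s\sin\theta$; the function names are hypothetical. Note that $a^2 + b^2 = s^2$, which is why the constraint $a^2 + b^2 = 1$ is dropped.

```python
import math


def similarity_matrix(s, theta, tx, ty):
    """2x3 matrix [[a, -b, tx], [b, a, ty]] with a = s cos(theta), b = s sin(theta)."""
    a, b = s * math.cos(theta), s * math.sin(theta)
    return [[a, -b, tx], [b, a, ty]]


def apply_2x3(m, point):
    """Compute x' = m (x, y, 1) for a 2x3 matrix m."""
    x, y = point
    return tuple(row[0] * x + row[1] * y + row[2] for row in m)


def angle_between(u, v):
    """Angle (radians) between two nonzero 2D vectors."""
    cos_angle = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp against rounding


S = similarity_matrix(2.5, 0.4, 1.0, -3.0)
o, p, q = (apply_2x3(S, pt) for pt in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
before = angle_between((1.0, 0.0), (1.0, 1.0))  # pi/4
after = angle_between((p[0] - o[0], p[1] - o[1]), (q[0] - o[0], q[1] - o[1]))
print(before, after)  # equal up to rounding
```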


# 2D Transformations

$\color{#EF5645}{\text{Affine transformation}}$: The affine transformation is written as $x' = A \bar x$, where $A$ is an arbitrary $2 \times 3$ matrix, or:
$$x' = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix} \bar x.$$
Parallel lines remain parallel under affine transformations.
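The parallelism claim can be checked with a small sketch (hypothetical helper names, plain Python): map the endpoints of two parallel segments through $A$ and test whether the resulting direction vectors are still parallel using the 2D cross product.

```python
def apply_affine(A, point):
    """x' = A (x, y, 1) for a 2x3 affine matrix A."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])


def direction(A, segment):
    """Direction vector of a segment after applying A to both endpoints."""
    (x0, y0), (x1, y1) = (apply_affine(A, pt) for pt in segment)
    return (x1 - x0, y1 - y0)


def cross2(u, v):
    """2D cross product; zero iff u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


A = [[2.0, 1.0, 3.0],
     [0.5, -1.0, 1.0]]           # an arbitrary 2x3 matrix
seg1 = [(0.0, 0.0), (1.0, 2.0)]  # two parallel segments,
seg2 = [(3.0, 0.0), (4.0, 2.0)]  # both with direction (1, 2)
print(cross2(direction(A, seg1), direction(A, seg2)))  # 0.0: still parallel
```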


# 2D Transformations

$\color{#EF5645}{\text{Projective transformation}}$: This transformation, also known as a perspective transform or homography, operates on homogeneous coordinates,
$$\tilde x' = \tilde H \tilde x,$$
where $\tilde H$ is an arbitrary $3 \times 3$ matrix. Note that $\tilde H$ is homogeneous, i.e., it is only defined up to a scale, and two matrices that differ only by scale are equivalent. The resulting homogeneous vector $\tilde x'$ must be normalized (divided by its last coordinate) to obtain the inhomogeneous point $x'$.

Projective transformations preserve straight lines (i.e., they remain straight after the transformation).
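A minimal sketch of applying a homography (plain Python; the function names and the example matrix are hypothetical). It normalizes by the last homogeneous coordinate, checks that three collinear points stay collinear, and checks that scaling $\tilde H$ leaves the mapped points unchanged.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography and divide by the last coordinate."""
    x, y = point
    xh, yh, wh = (row[0] * x + row[1] * y + row[2] for row in H)
    if wh == 0:
        raise ValueError("point maps to infinity")
    return (xh / wh, yh / wh)


def collinear(p, q, r, tol=1e-9):
    """True if three 2D points lie on a common line."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) <= tol


H = [[1.0, 0.2, 3.0],
     [0.0, 1.0, -1.0],
     [0.1, 0.05, 1.0]]                      # an arbitrary example homography
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # collinear inputs
mapped = [apply_homography(H, pt) for pt in pts]
print(collinear(*mapped))                   # True: straight lines stay straight

# Scaling H does not change the mapped points (H is defined up to scale).
H3 = [[3.0 * h for h in row] for row in H]
print(all(abs(a - b) < 1e-12
          for m1, m3 in zip(mapped, (apply_homography(H3, pt) for pt in pts))
          for a, b in zip(m1, m3)))         # True
```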

# 2D Transformations

- Mathematically, each of these families of transformations forms a Lie group (see, e.g., Geomstats: https://github.com/geomstats/geomstats).

<center><img src="figs/01_2Dtransf.png" width=500px alt="default"/></center>

# 3D Transformations

- The set of 3D transformations is very similar to that of 2D transformations.
- We use $p$ to denote 3D points and $x$ to denote 2D points.

<center><img src="figs/01_3Dtransf.png" width=500px alt="default"/></center>

# 3D Rotations

The biggest difference between 2D and 3D coordinate transformations is that the parameterization of the 3D rotation matrix R is not as straightforward, as several different possibilities exist.


# 3D Rotations - Euler Angles

$\color{#EF5645}{\text{Euler Angles}}$: A rotation matrix can be formed as the product of three rotations around three cardinal axes, e.g., $x$, $y$, and $z$, or $x$, $y$, and $x$. This is generally a bad idea, as the result depends on the order in which the transforms are applied.
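A small sketch of the order dependence (plain Python; the helper names are hypothetical): composing the same two $90^\circ$ rotations about $x$ and $z$ in the two possible orders yields different rotation matrices.

```python
import math


def rot_x(t):
    """Rotation by angle t around the x axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(t):
    """Rotation by angle t around the z axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


quarter = math.pi / 2
xz = matmul(rot_x(quarter), rot_z(quarter))  # z-rotation applied first, then x
zx = matmul(rot_z(quarter), rot_x(quarter))  # x-rotation applied first, then z
diff = max(abs(xz[i][j] - zx[i][j]) for i in range(3) for j in range(3))
print(diff > 0.5)  # True: same angles, different order, different rotation
```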


# 3D Rotations - Axis Angle / Rotation Vector

$\color{#EF5645}{\text{Axis Angle}}$: A rotation can be represented by a unit-length rotation axis $u$ and an angle $\theta$, or equivalently by the 3D vector $\omega = \theta u$.
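To turn $\omega$ back into a rotation matrix, one standard choice (not derived in these slides) is Rodrigues' formula, $R = I + \sin\theta\,[u]_\times + (1 - \cos\theta)\,[u]_\times^2$, where $[u]_\times$ is the cross-product matrix of $u$. A minimal sketch, with a hypothetical function name and plain nested lists:

```python
import math


def axis_angle_to_matrix(omega):
    """Rotation matrix from the rotation vector omega = theta * u (Rodrigues' formula)."""
    theta = math.sqrt(sum(w * w for w in omega))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return identity  # (near-)zero rotation
    ux, uy, uz = (w / theta for w in omega)  # unit rotation axis u
    K = [[0.0, -uz, uy],
         [uz, 0.0, -ux],
         [-uy, ux, 0.0]]                     # cross-product matrix [u]_x
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]                 # [u]_x squared
    s, c = math.sin(theta), math.cos(theta)
    return [[identity[i][j] + s * K[i][j] + (1.0 - c) * K2[i][j] for j in range(3)]
            for i in range(3)]


R = axis_angle_to_matrix((0.0, 0.0, math.pi / 2))  # 90 degrees about z
v = [sum(R[i][j] * e for j, e in enumerate((1.0, 0.0, 0.0))) for i in range(3)]
print(v)  # approximately [0, 1, 0]: the x axis is mapped onto the y axis
```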



# 3D Rotations - Unit Quaternions

$\color{#EF5645}{\text{Unit quaternions}}$: The unit quaternion representation is closely related to the angle/axis representation. A unit quaternion is a unit-length 4-vector whose components can be written as $q = (x, y, z, w)$. Unit quaternions live on the unit sphere $\|q\| = 1$, and antipodal (opposite sign) quaternions, $q$ and $-q$, represent the same rotation.
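The link to the axis/angle form is $q = (\sin\frac{\theta}{2}\,u, \cos\frac{\theta}{2})$. The sketch below, with a hypothetical function name, converts a unit quaternion $q = (x, y, z, w)$ to a rotation matrix using the standard formula; every entry is quadratic in the components of $q$, which is why $q$ and $-q$ give the same rotation.

```python
import math


def quaternion_to_matrix(q):
    """Rotation matrix of the unit quaternion q = (x, y, z, w), w the scalar part."""
    norm = math.sqrt(sum(c * c for c in q))
    x, y, z, w = (c / norm for c in q)  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


theta = math.pi / 2  # 90 degrees about the z axis
q = (0.0, 0.0, math.sin(theta / 2), math.cos(theta / 2))
R_pos = quaternion_to_matrix(q)
R_neg = quaternion_to_matrix(tuple(-c for c in q))
print(R_pos == R_neg)  # True: antipodal quaternions give the same rotation
```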

# Image Formation

- [Geometric Primitives and Transformations](#sec-syllabus)

- [3D to 2D Projections](#sec-ece)

- [Camera Parameters](#sec-ece)

# 3D to 2D Projections

Now that we know how to represent 2D and 3D geometric primitives and how to transform them spatially, we need to specify how 3D primitives are projected onto the image plane. We can do this using a linear 3D to 2D projection matrix. The simplest model is orthography, which requires no division to get the final (inhomogeneous) result. The more commonly used model is perspective, since it more accurately models the behavior of real cameras.


# Pinhole Camera

<center><img src="figs/01_pinhole.png" width=400px alt="default"/></center>

The barrier:
- blocks most of the light rays, which reduces blurring;
- has a small opening, called the pinhole, aperture, or center of the camera.

# Pinhole Camera Model

<center><img src="figs/01_pinhole2.png" width=400px alt="default"/></center>

- $f$ = focal length
- $[i, j, k]$ = camera reference or coordinate system
- $\Pi'$ = image or retina plane
- CO' = optical axis


# Pinhole Camera Model: Projection

$\color{#EF5645}{\text{Projection}}$: The 3D to 2D projection defined by the pinhole camera model is, given $p=[x, y, z]^T$:
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}f\frac{x}{z} & f\frac{y}{z}\end{bmatrix}^T.$$
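A direct transcription of the projection formula, as a sketch: the function name is hypothetical, and the check $z > 0$ is an added assumption that the point lies in front of the camera. Note the division by $z$: doubling a point's depth halves its image coordinates.

```python
def pinhole_project(p, f):
    """Project p = (x, y, z) with the pinhole model: (f x / z, f y / z).

    Assumes the point is in front of the camera (z > 0).
    """
    x, y, z = p
    if z <= 0:
        raise ValueError("expected z > 0")
    return (f * x / z, f * y / z)


print(pinhole_project((2.0, 1.0, 4.0), f=0.5))  # (0.25, 0.125)
print(pinhole_project((2.0, 1.0, 8.0), f=0.5))  # (0.125, 0.0625): twice as far, half as large
```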

$\color{#047C91}{\text{Exercise}}$: Prove the formula for the 3D to 2D projection of the pinhole camera model.

# Aperture Size

In practice, the pinhole is not a single point.

<center><img src="figs/01_aperture.png" width=400px alt="default"/></center>

- Smaller aperture: sharper but darker image.
- Larger aperture: brighter but blurrier image.
- Solution: add a lens.

# Camera with a Lens

<center><img src="figs/01_lens.png" width=400px alt="default"/></center>

- lens focuses the light on the film
- there is a specific distance at which objects are "in focus"

# Paraxial Refraction Model

<center><img src="figs/01_lens2.png" width=400px alt="default"/></center>

- the lens focuses light rays parallel to the optical axis onto a single point, the focal point

# Paraxial Refraction Model: Projection

$\color{#EF5645}{\text{Projection}}$: The 3D to 2D projection defined by the paraxial refraction model is, given $p=[x, y, z]^T$:
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)\frac{x}{z} & (f+z_0)\frac{y}{z}\end{bmatrix}^T,$$
where $f$ is the focal length and $z_0$ is the distance from the focal point to the image plane.

$\color{#047C91}{\text{Remark}}$: The proof is beyond the scope of this class.

# Missing Elements

These projections of 3D points onto the image plane do not directly correspond to what we see in actual digital images:
- points in digital images are, in general, expressed in a different reference system than points in the image plane;
- digital images are divided into discrete pixels, whereas points in the image plane are continuous;
- the physical sensors can introduce nonlinearities, such as distortion, into the mapping.

To account for these differences, we introduce the camera parameters.

# Image Formation

- [Geometric Primitives and Transformations](#sec-syllabus)

- [3D to 2D Projections](#sec-ece)

- [Camera Parameters](#sec-ece)

# Camera Matrix Model

The camera matrix model describes the parameters that affect how a world point $p$ is mapped to image coordinates $p'$.

Parameters $c_x$ and $c_y$ describe the translation between image plane coordinates and digital image coordinates:
  - Image plane coordinates: origin at $C'$.
  - Digital image coordinates: origin at the image's lower-left corner.

Thus: 
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)\frac{x}{z} +c_x & (f+z_0)\frac{y}{z}+c_y\end{bmatrix}^T.$$

# Camera Matrix Model

The camera matrix model describes the parameters that affect how a world point $p$ is mapped to image coordinates $p'$.


Parameters $k$ and $l$ (e.g., in pixels/cm) account for the different units of image plane and digital image coordinates:
- Points in the image plane are expressed in physical units, e.g., cm.
- Points in digital images are expressed in pixels.

Thus: 
$$p' 
= \begin{bmatrix} x' & y' \end{bmatrix}^T 
= \begin{bmatrix}(f+z_0)k\frac{x}{z} +c_x & (f+z_0)l\frac{y}{z}+c_y\end{bmatrix}^T
= \begin{bmatrix}\alpha\frac{x}{z} +c_x & \beta\frac{y}{z}+c_y\end{bmatrix}^T,$$
where $\alpha = (f+z_0)k$ and $\beta = (f+z_0)l$.
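A sketch of the mapping to pixel coordinates (the function name and all numeric values are hypothetical examples): $\alpha = (f+z_0)k$ and $\beta = (f+z_0)l$ are computed first, then each coordinate is scaled, divided by $z$, and offset by $(c_x, c_y)$.

```python
def to_pixels(p, f, z0, k, l, cx, cy):
    """Map a camera-frame point p = (x, y, z) to digital image coordinates.

    Uses alpha = (f + z0) k and beta = (f + z0) l, with k, l the pixel densities
    (e.g. pixels/cm) and (cx, cy) the translation between the two coordinate systems.
    """
    alpha = (f + z0) * k
    beta = (f + z0) * l
    x, y, z = p
    return (alpha * x / z + cx, beta * y / z + cy)


# Hypothetical values: f + z0 = 2 cm, 50 pixels/cm, (cx, cy) = (320, 240).
print(to_pixels((0.5, -0.25, 2.0), f=1.5, z0=0.5, k=50.0, l=50.0, cx=320.0, cy=240.0))
# (345.0, 227.5)
```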

# Camera Matrix Model in Homogeneous Coords

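Using the homogeneous coordinates $\tilde p' = [x', y', 1]^T$ and $\tilde p = [x, y, z, 1]^T$, and recalling that homogeneous vectors are only defined up to scale (so multiplying by $z$ gives an equivalent vector), we get: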
$$\tilde p' 
= \begin{bmatrix}
\alpha \frac{x}{z} + c_x \\
\beta \frac{y}{z} + c_y \\
1
\end{bmatrix}
= \begin{bmatrix}
\alpha x + c_x z \\
\beta y + c_y z\\
z
\end{bmatrix}
= 
\begin{bmatrix}
\alpha & 0 & c_x & 0 \\
0 & \beta & c_y & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x \\
y \\
z \\
1
\end{bmatrix}
= \begin{bmatrix}
\alpha & 0 & c_x & 0 \\
0 & \beta & c_y & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\tilde p
= M \tilde p
$$

We can also write:
$$\tilde p' 
= M \tilde p
= 
\begin{bmatrix}
\alpha & 0 & c_x  \\
0 & \beta & c_y  \\
0 & 0 & 1 
\end{bmatrix}
\begin{bmatrix}
I & 0
\end{bmatrix}
\tilde p
= K  \begin{bmatrix}
I & 0
\end{bmatrix}\tilde p
$$

The matrix $K$ is referred to as the camera matrix.
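A sketch of the homogeneous formulation (plain Python, hypothetical names): build $K$, form $M = K[I \;\, 0]$ by appending a zero column, and recover pixel coordinates by dividing by the third homogeneous coordinate. With $\alpha = \beta = 100$ and $(c_x, c_y) = (320, 240)$ (arbitrary example values), the result matches evaluating $\alpha\frac{x}{z} + c_x$ and $\beta\frac{y}{z} + c_y$ directly.

```python
def camera_matrix_K(alpha, beta, cx, cy):
    """Intrinsic 3x3 matrix K (no skew term)."""
    return [[alpha, 0.0, cx],
            [0.0, beta, cy],
            [0.0, 0.0, 1.0]]


def projection_matrix(K):
    """M = K [I 0]: K with a zero fourth column appended."""
    return [row + [0.0] for row in K]


def project(M, p):
    """Apply M to the homogeneous point (x, y, z, 1) and divide by the last coordinate."""
    p_h = list(p) + [1.0]
    u, v, w = (sum(m * c for m, c in zip(row, p_h)) for row in M)
    return (u / w, v / w)


M = projection_matrix(camera_matrix_K(100.0, 100.0, 320.0, 240.0))
print(project(M, (0.5, -0.25, 2.0)))  # (345.0, 227.5)
```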

# Camera Parameters: Extrinsics

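- So far: the world point $p$ is expressed in the camera coordinate system $[i, j, k]$.
- Now: the point may be given as $p_w$ in a different (world) coordinate system, related to the camera coordinate system by a rotation $R$ and a translation $T$:

$$\tilde p = \begin{bmatrix}
R & T \\
0 & 1
\end{bmatrix} \tilde p_w
$$

So that:
$$\tilde p' 
= K  \begin{bmatrix}
I & 0
\end{bmatrix}\tilde p
= 
K  \begin{bmatrix}
I & 0
\end{bmatrix}
\begin{bmatrix}
R & T \\
0 & 1
\end{bmatrix} \tilde p_w
= K \begin{bmatrix}
R & T
\end{bmatrix} \tilde p_w
= M \tilde p_w,
$$
where $M = K \begin{bmatrix} R & T \end{bmatrix}$ is now the full $3 \times 4$ projection matrix. The parameters $R$ and $T$ are known as the extrinsic parameters because they are external to, and do not depend on, the camera.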
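The composition can be sketched in plain Python (hypothetical names; the values of $K$, $R$, and $T$ are arbitrary examples). Since $[I \;\, 0]\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} = [R \;\, T]$, building $M = K[R \;\, T]$ once and applying it to $(x, y, z, 1)$ gives the same pixel as first mapping the world point into camera coordinates and then applying $K$.

```python
def matmul(a, b):
    """Product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def full_projection_matrix(K, R, T):
    """M = K [R T], the 3x4 matrix equal to K [I 0] [[R, T], [0, 1]]."""
    RT = [R[i] + [T[i]] for i in range(3)]
    return matmul(K, RT)


def project(M, p_w):
    """Project a world point (x, y, z): apply M to (x, y, z, 1), then divide."""
    u, v, w = (row[0] for row in matmul(M, [[c] for c in p_w] + [[1.0]]))
    return (u / w, v / w)


K = [[100.0, 0.0, 320.0], [0.0, 100.0, 240.0], [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about the z axis
T = [0.0, 0.0, 5.0]                                         # world origin 5 units along k
M = full_projection_matrix(K, R, T)
print(project(M, (1.0, 0.5, 0.0)))  # (310.0, 260.0)
```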


# Summary

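In general, the $3 \times 4$ projection matrix $M$ has 11 degrees of freedom:
- 5 from the intrinsic camera matrix $K$ ($\alpha$, $\beta$, $c_x$, $c_y$, plus a skew parameter that the simplified $K$ above omits),
- 3 from the extrinsic rotation $R$,
- 3 from the extrinsic translation $T$.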
