This chapter discusses image formation: everything that precedes the actual image processing by a computer.

# Optics
## 1 Lenses
### Projection geometry - the camera obscura
When the diameter of the hole is sufficiently large compared to the wavelength of light, diffraction can be ignored and light may be assumed to follow a straight path. This is an example of a __pinhole camera model__ (the camera aperture is described as a point and no lenses are used to focus light). The projection geometry of a thin lens is the same.
* Place the camera hole at $O(0,0,0)$
* Let a point on the object be $P_o(X_o,Y_o,Z_o)$. (By definition, object coordinates are all positive.)
* The corresponding image point is $P_i(X_i,Y_i,-f)$, where $f$ is the distance between the image plane and the pinhole.
* The __linear magnification__ $m$ follows $$ m =-\frac{X_i}{X_o}=-\frac{Y_i}{Y_o}=\frac{f}{Z_o}. $$ Beware that image coordinates have the opposite sign of the corresponding object coordinates.
* This image formation geometry is called __perspective projection__, describing the mathematical relationship between the coordinates of a point in three-dimensional space and its projection onto the image plane of an ideal pinhole camera.

Limitation: small hole -> faint image; large hole -> unsharp image.
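
The pinhole geometry above can be sketched in a few lines of Python. This is only an illustration under the stated sign convention (pinhole at the origin, image plane at $Z=-f$); the function name `pinhole_project` and the numeric values are hypothetical.

```python
def pinhole_project(point, f):
    """Project an object point (X_o, Y_o, Z_o) through a pinhole at the origin.

    The image plane lies at Z = -f, so the image is mirrored:
    X_i = -m * X_o and Y_i = -m * Y_o with m = f / Z_o.
    """
    X_o, Y_o, Z_o = point
    if Z_o <= 0:
        raise ValueError("the object must lie in front of the pinhole (Z_o > 0)")
    m = f / Z_o  # linear magnification
    return (-m * X_o, -m * Y_o, -f), m


# Illustrative values: an object point 10 units away, f = 0.05 units.
image_point, m = pinhole_project((2.0, 1.0, 10.0), f=0.05)
print(image_point, m)
```

Doubling $Z_o$ halves $m$, which is the familiar effect that distant objects appear smaller.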

### Thin lenses
All lenses discussed in this chapter are assumed to be thin lenses.
* Place center of the lens (optical center) at $O(0,0,0)$.
* Let optical axis be Z-axis.
* The projection geometry is unchanged, but the image distance $Z_i$ is now fixed for a sharply focused image and follows __Gauss's formula__ $$ \frac{1}{f}=\frac{1}{Z_o}-\frac{1}{Z_i}.$$ Here $f$ is the focal length, the image distance at which points at infinity are focused; $Z_i$ is negative because the image forms on the opposite side of the lens from the object. Note that this $f$ differs from the $f$ used for the camera obscura.
* $f$ is determined by the __lensmaker's formula__
$$\frac{1}{f}=\left(\frac{n_l}{n_s}-1\right) \left(\frac{1}{r}+\frac{1}{r'}\right),$$
where $n_l$ and $n_s$ are the refractive indices of the lens and of the surrounding medium, and $r$ and $r'$ are the radii of curvature of the two lens surfaces, both taken to be positive.
* Magnification of a lens with fixed focal length $f$: $$m=\frac{f}{Z_o-f}.$$ Higher $f$ -> larger $m$.

Assumptions:
1. the lens surfaces are spherical on both sides.
2. the lens thickness is small compared to its radii of curvature.
3. incoming light rays make a small angle with the optical axis (paraxial approximation)
4. the refractive index of the media is the same on both sides of the lens
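
As a rough numerical check of Gauss's formula and the magnification formula, the sketch below solves $\frac{1}{f}=\frac{1}{Z_o}-\frac{1}{Z_i}$ for $Z_i$. The function name `thin_lens_image` is hypothetical, and the sketch assumes $Z_o>f$ so that a real image forms.

```python
def thin_lens_image(Z_o, f):
    """Return (Z_i, m) for an object at distance Z_o in front of a thin lens.

    Z_i follows from 1/f = 1/Z_o - 1/Z_i and is negative (image side).
    m = f / (Z_o - f) is the linear magnification.
    """
    if Z_o <= f:
        raise ValueError("this sketch only covers Z_o > f")
    Z_i = f * Z_o / (f - Z_o)
    m = f / (Z_o - f)
    return Z_i, m


# Illustrative values: a 50 mm lens focused on an object 1 m away (in metres).
Z_i, m = thin_lens_image(Z_o=1.0, f=0.05)
print(Z_i, m)
```

For very distant objects $Z_i$ approaches $-f$, in line with the definition of the focal length, and $m$ equals $-Z_i/Z_o$ as in the pinhole case.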

### The depth-of-field
For a fixed focal length $f$ and image distance $Z_i$, only objects at a distance $Z_o$ satisfying Gauss's formula are sharply focused. In practice, a range of distances $Z_o^- <Z_o <Z_o^+$ can be accepted as sharp. Objects within this range are said to be _in focus_, and $Z_o^+ - Z_o^-$ is the depth-of-field.

In other words, a point on the object is considered sharp as long as its image is a blur circle of diameter at most $b$. The depth-of-field then depends on $b$ and on the aperture diameter $d$:

$$\Delta Z_o=\Delta Z_o^+ +\Delta Z_o^-$$

$$\Delta Z_o^- = Z_o - Z_o^- = \frac{Z_o(Z_o-f)}{Z_o+\frac{fd}{b}-f}$$

Larger $d$ (bigger aperture) -> narrower depth-of-field<br>
Small DoF -> sense of small objects (think of close-ups, tilt effects)<br>
Large DoF -> sense of large objects (think of scenery pictures)
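
The effect of the aperture on the depth-of-field can be checked numerically. The sketch below uses the formula for $\Delta Z_o^-$ given above. The far-side margin $\Delta Z_o^+ = \frac{Z_o(Z_o-f)}{\frac{fd}{b}-Z_o+f}$ is not stated in the text; it is an assumption here, obtained from the same blur-circle argument. The function name and the numeric values are illustrative only.

```python
def depth_of_field(Z_o, f, d, b):
    """Return (near_margin, far_margin, total) around the focused distance Z_o.

    d: aperture diameter, b: largest acceptable blur-circle diameter.
    """
    s = Z_o - f
    near = Z_o * s / (f * d / b + s)  # Delta Z_o^-, as in the text
    # Assumption: far-side counterpart, not given in the text.
    denom_far = f * d / b - s
    far = Z_o * s / denom_far if denom_far > 0 else float("inf")
    return near, far, near + far


# Illustrative values in metres: 50 mm lens focused at 2 m, b = 30 micrometres.
wide = depth_of_field(Z_o=2.0, f=0.05, d=0.025, b=30e-6)      # large aperture
narrow = depth_of_field(Z_o=2.0, f=0.05, d=0.00625, b=30e-6)  # small aperture
print(wide)
print(narrow)
```

With these numbers the total depth-of-field for the large aperture is clearly smaller than for the small one. When the far-side denominator is not positive, everything beyond $Z_o$ is treated as acceptably sharp.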

### Aberrations
#### Geometrical aberrations
Geometrical aberrations are image distortions or degradations such as blurring. They remain small for paraxial rays, the assumption that underlies most formulas of geometrical optics.

* Radial distortion (important)

Caused by a systematic variation of the optical magnification when moving radially away from a point called the center of distortion. In practice, the center of radial distortion can be assumed to coincide with the principal point, which usually coincides with the center of the image.

Since radial distortion is non-linear, it is modeled with a Taylor expansion. Typically only even-order terms play a role, because the effect is symmetric around the center.
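
A minimal sketch of such an even-order model follows. The coefficient names `kappa1` and `kappa2`, the truncation after the fourth-order term, and the direction of the mapping (undistorted to distorted coordinates) are assumptions made for illustration; the text does not fix them.

```python
def apply_radial_distortion(u, v, kappa1, kappa2=0.0, center=(0.0, 0.0)):
    """Scale a point radially about the center of distortion.

    The scale factor is 1 + kappa1 * r^2 + kappa2 * r^4, i.e. only even
    powers of the distance r to the center appear.
    """
    du, dv = u - center[0], v - center[1]
    r2 = du * du + dv * dv
    scale = 1.0 + kappa1 * r2 + kappa2 * r2 * r2
    return center[0] + scale * du, center[1] + scale * dv


# Illustrative values: a negative kappa1 pulls points toward the center,
# and the effect grows with the distance from the center.
print(apply_radial_distortion(0.1, 0.0, kappa1=-0.1))
print(apply_radial_distortion(1.0, 0.0, kappa1=-0.1))
```

The center itself is left unchanged, and points at the same distance from the center are scaled by the same factor, which is the symmetry that rules out odd-order terms.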

* Spherical aberration

Occurs when rays parallel to the optical axis do not all converge at a single point.

* Astigmatism

#### Chromatic aberrations
Result from the wavelength dependence of the refractive index: different wavelengths are refracted differently.


# 2 Cameras
## Camera types
Vacuum tube camera -> solid-state camera (CCD (charge-coupled device) sensors and CMOS sensors, both based on the photoelectric effect)
* CCD operation (passing on charges): full frame, frame transfer, interline transfer
* CMOS operation (transistors in each pixel perform the charge-to-voltage conversion): PPS (passive pixel sensor), APS (active, with added amplifier), DPS (digital, with added ADC)
* Color images are obtained with filter mosaics.

## Models for camera projection
### Perspective projection
* Place center of projection at origin $O(0,0,0)$. 
* __Camera coordinate frame__:
The $Z_c$ axis is the optical axis of the lens, the $X_c$ axis is parallel to the rows of the image (pointing right), and the $Y_c$ axis is parallel to the columns of the image (pointing down).
* Treat the center of the lens as the center of projection.
* Introduce a __virtual image plane__ (equation $Z=f$) carrying an upright virtual image with positive coordinates, identical in size to the real image. The image plane has coordinates $(u,v)$ and is parallel to the $X_c,Y_c$ plane.

Now consider a point P in the scene with coordinates $(X_c,Y_c,Z_c) \in \mathbb{R}^3 $. 

Its image is the intersection of the image plane with the line passing through the origin and $(X_c,Y_c,Z_c)$.
Its coordinates on the image plane are
$$ u=f\frac{X_c}{Z_c}$$

$$v=f\frac{Y_c}{Z_c}$$


### Pseudo-orthographic projection
When the object spans only a small range of $Z$, $f/Z$ can be treated as a constant $k$, giving $u=kX_c$ and $v=kY_c$. This model is called pseudo-perspective or pseudo-orthographic projection.
* It amounts to a parallel projection with a scaling added to mimic the effect that the images of objects shrink with distance.
* Parallel edges of a square remain parallel under pseudo-orthographic projection.
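
The difference between the two projection models can be illustrated on a tilted square. In the sketch below the corner coordinates, the focal length and the choice of $k$ (taken as $f$ divided by the mean depth of the square) are assumptions made for the example.

```python
def perspective(point, f):
    """Perspective projection: u = f X / Z, v = f Y / Z."""
    X, Y, Z = point
    return f * X / Z, f * Y / Z


def pseudo_orthographic(point, k):
    """Pseudo-orthographic projection: f / Z is replaced by a constant k."""
    X, Y, _ = point
    return k * X, k * Y


def edge(p, q):
    """2-D vector from image point p to image point q."""
    return q[0] - p[0], q[1] - p[1]


def cross2(a, b):
    """2-D cross product; zero when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]


# A square of side 2, tilted away from the camera (illustrative values).
A, B = (-1.0, 0.0, 10.0), (1.0, 0.0, 10.0)
C, D = (1.0, 1.2, 11.6), (-1.0, 1.2, 11.6)
f = 1.0
k = f / 10.8  # assumption: reference depth = mean depth of the square

for name, proj in [("perspective", lambda p: perspective(p, f)),
                   ("pseudo-orthographic", lambda p: pseudo_orthographic(p, k))]:
    a, b, c, d = (proj(p) for p in (A, B, C, D))
    # Edges AD and BC are parallel in 3-D; check whether their images are too.
    print(name, cross2(edge(a, d), edge(b, c)))
```

Under perspective projection the images of the two receding edges converge, so their cross product is non-zero; under the pseudo-orthographic model it vanishes.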

### Projection Matrix
The perspective projection model uses the camera coordinate frame to describe both the object and the image plane. In practice, two further steps are needed:
1. Convert from the world coordinate frame to the camera coordinate frame.
2. Express the $(u,v)$ coordinates as rows and columns of the image.

#### Convert world coordinate frame to camera coordinate frame
In the world coordinate frame, the camera pose has 6 degrees of freedom, described by:
1. A point $C$ giving its location (the center of projection): 3 degrees of freedom.
2. A 3x3 rotation matrix $R\in SO(3)$ ([3D rotation group](https://en.wikipedia.org/wiki/3D_rotation_group)) indicating the orientation of the camera coordinate frame with respect to the world frame. Its 9 entries are constrained by orthonormality, leaving 3 degrees of freedom.

    * Let $\mathbf{r_i}$ be the i-th column of $R$.
    * $\mathbf{r_1}$, $\mathbf{r_2}$ and $\mathbf{r_3}$ are the __unit vectors__ of the camera coordinate frame expressed in the world coordinate frame.
    * The camera-frame coordinates $P'$ of a point $P$ given in the world coordinate frame are thus obtained by projecting the relative position $P-C$ onto those unit vectors.

$$P'=(\langle\mathbf{r_1},P-C\rangle,\langle\mathbf{r_2},P-C\rangle,\langle\mathbf{r_3},P-C\rangle)$$

$$u=f\frac{\langle\mathbf{r_1},P-C\rangle}{\langle\mathbf{r_3},P-C\rangle}$$

$$v=f\frac{\langle\mathbf{r_2},P-C\rangle}{\langle\mathbf{r_3},P-C\rangle}$$

Let $P$ be $(X,Y,Z)$, $C$ be $(C_1,C_2,C_3)$ and $r_{ij}$ be the element at position $(i,j)$ of $R$. Since $\mathbf{r_j}$ is the j-th column of $R$, we then have:

$$u=f\frac{r_{11}(X-C_1)+r_{21}(Y-C_2)+r_{31}(Z-C_3)}{r_{13}(X-C_1)+r_{23}(Y-C_2)+r_{33}(Z-C_3)}$$

$$v=f\frac{r_{12}(X-C_1)+r_{22}(Y-C_2)+r_{32}(Z-C_3)}{r_{13}(X-C_1)+r_{23}(Y-C_2)+r_{33}(Z-C_3)}$$

$C$ and $R$ are the external camera parameters; if they are known, the camera is __externally calibrated__. This means we can convert back and forth between scene coordinates (world coordinate frame) and camera-centered coordinates (camera coordinate frame).
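
The conversion between the two frames can be sketched as follows, taking the columns of $R$ as the camera axes expressed in world coordinates. The function names and the example pose are hypothetical; the rotation used here makes the camera look along the world X axis.

```python
def world_to_camera(P, C, R):
    """Camera-frame coordinates of world point P.

    Component j is the inner product of P - C with the j-th column of R.
    """
    d = [P[i] - C[i] for i in range(3)]
    return tuple(sum(R[i][j] * d[i] for i in range(3)) for j in range(3))


def camera_to_world(P_cam, C, R):
    """Inverse mapping: P = C + R * P_cam."""
    return tuple(C[i] + sum(R[i][j] * P_cam[j] for j in range(3))
                 for i in range(3))


def project(P, C, R, f):
    """Image plane coordinates (u, v) of world point P."""
    X_c, Y_c, Z_c = world_to_camera(P, C, R)
    return f * X_c / Z_c, f * Y_c / Z_c


# Example pose (assumed): columns of R are r1 = (0, 0, -1), r2 = (0, 1, 0),
# r3 = (1, 0, 0), so the optical axis r3 points along the world X axis.
R = [[0, 0, 1],
     [0, 1, 0],
     [-1, 0, 0]]
C = (1, 2, 3)
P = (6, 2, 2)

P_cam = world_to_camera(P, C, R)
print(P_cam, camera_to_world(P_cam, C, R), project(P, C, R, f=1.0))
```

Mapping a point to the camera frame and back returns the original world coordinates, which is the "back and forth" conversion that external calibration makes possible.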

#### Convert image plane coordinate u,v to pixel coordinate
Consider a digital image as a 2-D array of pixels with n rows and m columns.
A photo of width $\times$ height $(x,y)$ is represented here as an $x\times y$ array, so the first index runs along the width and the second along the height.

Be careful: in this convention the number of "rows" corresponds to the width and the number of "columns" to the height.

Here we define the five _internal camera parameters_ $x_0,y_0,k_x,k_y,s$. When they are known, the camera is __internally calibrated__. This means we can convert the observed $(x,y)$ pixel coordinates to $(u,v)$ image plane coordinates and vice versa.

* Let the principal point (image plane coordinates $(0,0)$) have pixel coordinates $(x_0,y_0)$.
* $k_x=\frac{\text{number of pixels}}{\text{unit length in horizontal direction}}=\frac{1}{\text{width of a pixel}}$ 
* $k_y=\frac{\text{number of pixels}}{\text{unit length in vertical direction}}=\frac{1}{\text{height of a pixel}}$ 
* $s$ is the skewness of a pixel, indicating the extent of deviation of a pixel shape from a rectangle. s=0 -> rectangular pixel. 
* $\frac{k_y}{k_x}$ is the aspect ratio of a pixel.

The $(x,y)$ pixel coordinates can be obtained from the $(u,v)$ image plane coordinates as
$$ \begin{cases}
x = k_x u + s v + x_0\\
y = k_y v + y_0
\end{cases}
$$
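
A small sketch of this mapping and its inverse is given below. The function names and the parameter values (200 pixels per millimetre, principal point at (320, 240)) are made up for illustration.

```python
def image_plane_to_pixel(u, v, k_x, k_y, s, x_0, y_0):
    """Map image plane coordinates (u, v) to pixel coordinates (x, y)."""
    return k_x * u + s * v + x_0, k_y * v + y_0


def pixel_to_image_plane(x, y, k_x, k_y, s, x_0, y_0):
    """Inverse mapping: solve for v first, then for u."""
    v = (y - y_0) / k_y
    u = (x - x_0 - s * v) / k_x
    return u, v


# Illustrative internal parameters: square pixels, no skew.
params = dict(k_x=200.0, k_y=200.0, s=0.0, x_0=320.0, y_0=240.0)

x, y = image_plane_to_pixel(0.5, -0.25, **params)
print(x, y)
print(pixel_to_image_plane(x, y, **params))
```

Because the system is triangular, the inverse needs no matrix machinery: $v$ follows from the second equation alone and is then substituted into the first.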

### Homogeneous coordinates and projection geometry 