**B M Nafis Fuad**

**ID: 274502**

# ELE510 Image Processing with robot vision: LAB, Exercise 2, Image Formation.

**Purpose:** *To learn about the image formation process, i.e. how images are projected from the scene to the image plane.*

The theory for this exercise can be found in chapters 2 and 3 of the text book [1]. Supplementary information can be found in chapters 1, 2 and 3 of the compendium [2]. See also the following documentation for help:
- [OpenCV](https://docs.opencv.org/4.8.0/d6/d00/tutorial_py_root.html)
- [numpy](https://numpy.org/doc/stable/)
- [matplotlib](https://matplotlib.org/stable/users/index.html)

**IMPORTANT:** Read the text carefully before starting the work. In
many cases it is necessary to do some preparations before you start the work
on the computer. Read necessary theory and answer the theoretical part
first. The theoretical and experimental parts should be solved individually.
The notebook must be approved by the lecturer or his assistant.

**Approval:**
<div class="alert alert-block alert-success">
The current notebook should be submitted on CANVAS as a single pdf file. 
</div>

<div class="alert alert-block alert-info">
    To export the notebook in PDF format, go to File -> Download as -> PDF via LaTeX (.pdf).
</div>

**Note regarding the notebook**: The theoretical questions can be answered directly in the notebook using a *Markdown* cell and LaTeX commands (if relevant). Alternatively, you can attach a scan (or an image) of the answer directly in the cell.

Possible ways to insert an image in the markdown cell:

`![image name](image_path)`

`<img src="image_path" alt="Alt text" title="Title text" />`


**Below you will find parts of the solution that are already programmed.**

<div class="alert alert-block alert-info">
    <p>You have to fill out code everywhere it is indicated with `...`</p>
    <p>The code section under `######## a)` is answering subproblem a) etc.</p>
</div>


## Problem 1

**a)** What is the meaning of the abbreviation PSF? What does the PSF specify?

**PSF** - Point Spread Function

PSF specifies the shape that a point will take on the image plane. It specifies how an ideal point source of light is transformed or spread out in the final image due to various factors in the imaging system. It characterizes the blur or spreading of light from a point source as it passes through the optical components of the imaging system, including lenses, apertures, and other elements.
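As an illustration of the description above, the following minimal sketch blurs an ideal point source with a PSF. The Gaussian shape, the kernel size and `sigma` are assumptions chosen for the example; a real PSF depends on the actual optics.

```python
import numpy as np

def gaussian_psf(size=7, sigma=1.0):
    """Return a normalized 2D Gaussian kernel (an assumed, idealized PSF)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()  # normalize so the total light is preserved

# Ideal point source: one bright pixel in an otherwise dark scene
scene = np.zeros((15, 15))
scene[7, 7] = 1.0

psf = gaussian_psf()
k = psf.shape[0]
padded = np.pad(scene, k // 2)  # zero padding around the scene

# Direct 2D convolution (the kernel is symmetric, so no flip is needed)
image = np.zeros_like(scene)
for i in range(scene.shape[0]):
    for j in range(scene.shape[1]):
        image[i, j] = np.sum(padded[i:i + k, j:j + k] * psf)

# The point has been spread out into a copy of the PSF centered on (7, 7)
print(np.allclose(image[4:11, 4:11], psf))
```

The image of a single point is the PSF itself, which is why the PSF fully characterizes the blur of a linear, shift-invariant imaging system.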

**b)** Use the imaging model shown in Figure 1. The camera has a lens with focal length $f = 40\text{mm}$ and in the image plane a CCD sensor of size $10\text{mm} \times 10\text{mm}$. The total number of pixels is $5000 \times 5000$. At a distance of $z_w = 0.5\text{m}$ from the camera center, what will be the camera's resolution in pixels per millimeter?

<img src="perspectiveProjection.jpg" alt="Alt text" title="Title text" />

**Figure 1**: Perspective projection caused by a pinhole camera. Figure 2.23 in [2].


The camera's resolution in pixels per millimeter can be calculated using the following formula:

    R0 = R1 / M

    Here,
    R0 = Camera resolution in object plane
    R1 = Camera resolution in image plane
    M = Scale factor between object plane and image plane (object size / image size)

**Calculating R1**

The number of pixels along one side of the sensor is the square root of the total pixel count, scaled by the aspect ratio.

    Here,
    Pixel count = 5000 x 5000
    Sensor Size = 10 mm x 10 mm
    So, the aspect ratio is 1:1 

    Pixels along one side of the sensor = sqrt(5000 x 5000) x 1 = 5000

This means that the camera captures 5000 pixels along each side of the sensor. To find the resolution in pixels per millimeter, we divide this number by the sensor height:

    R1 = 5000 pixels / 10 mm = 500 pixels/mm

*So, the camera's resolution at the image plane, R1 = 500 pixels per millimeter*


**Calculating M**

The scale factor between the object plane and the image plane follows from similar triangles in the pinhole model,

    M = zw / f

    Here,
    zw = Distance from camera center to the object = 0.5 m = 500 mm
    f = Focal length = 40 mm

    M = 500 mm / 40 mm = 12.5

This means that an object at this distance is 12.5 times larger than its image on the sensor, i.e. the scene is scaled down by a factor of 12.5 on the image plane. Therefore, the resolution at the object plane is 12.5 times smaller than the resolution at the image plane.


**Calculating R0**

    R0 = R1 / M = 500 pixels per millimeter / 12.5 = 40 pixels per millimeter

The camera's resolution at a distance of $z_w = 0.5\text{m}$ from the camera center is 40 pixels per millimeter.
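The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the values from the problem statement; the variable names and the cross-check via the field of view are illustrative additions.

```python
# Values from the problem statement
f_mm = 40.0              # focal length
z_w_mm = 500.0           # distance from camera center to the object plane
sensor_side_mm = 10.0    # side length of the square CCD sensor
pixels_per_side = 5000   # pixels along one side of the sensor

# Resolution at the image plane (pixels per millimeter on the sensor)
r_image = pixels_per_side / sensor_side_mm

# Scale factor between object plane and image plane (similar triangles)
m = z_w_mm / f_mm

# Resolution at the object plane (pixels per millimeter in the scene)
r_object = r_image / m

# Cross-check: the sensor side covers sensor_side_mm * m millimeters of the scene
fov_side_mm = sensor_side_mm * m
r_check = pixels_per_side / fov_side_mm

print(r_image, m, r_object, fov_side_mm, r_check)
# prints 500.0 12.5 40.0 125.0 40.0
```

Both routes give the same value, since dividing the pixel count by the covered scene width is equivalent to dividing the sensor resolution by the scale factor.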



**c)** Explain how a Bayer filter works. What is the alternative to using this type of filter in image acquisition?

A Bayer Filter works by filtering light to capture color information in a way that mimics the human eye's sensitivity to different colors. Here's how it works:

* A Bayer filter is a grid of tiny color filters placed over the pixels of an image sensor. The filters are arranged in a repeating 2 x 2 pattern containing two green, one red, and one blue filter, with the green filters on a checkerboard. Each pixel on the sensor is covered by exactly one of these filters.
* The human eye perceives color using three types of color receptors: red, green, and blue cones. Similarly, the Bayer filter uses red, green, and blue filters to mimic this color perception. The green filter is used twice as much as red and blue because our eyes are more sensitive to green light.
* When light enters the camera's sensor through the lens, each pixel captures the intensity of light for one of the three colors: red, green, or blue, depending on which filter covers it. For example, pixels with red filters record the intensity of red light.
* Since each pixel on the sensor only captures one color's information, the missing color information for each pixel is interpolated (estimated) using data from neighboring pixels. This process is known as demosaicing, and various algorithms are used to interpolate the missing color values.

An alternative to the Bayer filter is a 3CCD camera, which uses three separate image sensors, each with a dedicated filter for one of the primary colors (red, green, and blue). This eliminates the need for interpolation and can provide higher color accuracy, but it is more complex and expensive.
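The sampling and interpolation steps described for the Bayer filter can be sketched with NumPy. The RGGB layout, the helper names `bayer_mosaic` and `green_mask`, and the four-neighbor average are assumptions made for this illustration; real cameras use more elaborate demosaicing algorithms.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an RGB image through an assumed RGGB Bayer pattern.

    Each output pixel keeps only the color channel of its filter.
    """
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites on red rows
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites on blue rows
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return mosaic

def green_mask(h, w):
    """Boolean mask that is True where the RGGB pattern has a green filter."""
    mask = np.zeros((h, w), dtype=bool)
    mask[0::2, 1::2] = True
    mask[1::2, 0::2] = True
    return mask

# Uniformly colored test image: R = 0.2, G = 0.5, B = 0.8
rgb = np.zeros((4, 4, 3))
rgb[..., 0], rgb[..., 1], rgb[..., 2] = 0.2, 0.5, 0.8

mosaic = bayer_mosaic(rgb)

# Simplest possible demosaicing step: estimate the missing green value at the
# red site (2, 2) as the mean of its four green neighbors
g_est = (mosaic[1, 2] + mosaic[3, 2] + mosaic[2, 1] + mosaic[2, 3]) / 4.0

print(mosaic[:2, :2])             # one R, two G and one B sample per 2x2 cell
print(green_mask(4, 4).mean())    # half of all pixels are green
print(g_est)
```

For a uniformly colored image the interpolated green value matches the true one; near edges and fine textures such simple averaging produces color artifacts, which is what more advanced demosaicing algorithms try to avoid.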

**d)** Briefly explain the following concepts: Sampling, Quantization, Gamma Compression.

**Sampling:** Sampling is the process of converting a continuous image into a grid of discrete points or pixels. It determines the spatial resolution of the digital image by specifying how densely the image is sampled. Higher sampling rates result in more detail but also larger file sizes.

**Quantization:** Quantization involves mapping the continuous range of brightness values in an image to a limited set of discrete values. It reduces the image's data size by approximating continuous intensity levels with a fixed number of digital values (typically integers).

**Gamma Compression:** Gamma compression (or gamma correction) is a nonlinear adjustment applied to pixel values in an image. It compensates for the nonlinear response of human vision to changes in brightness. Gamma correction ensures that the displayed image appears more natural by adjusting its tonal response.
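The three concepts can be illustrated on a one-dimensional intensity ramp. This is a minimal sketch; the finely sampled ramp standing in for a continuous signal, the sampling step of 100, the gamma value of 2.2 and the 8-bit depth are all assumptions chosen for the example.

```python
import numpy as np

# Stand-in for a continuous signal: a finely resolved ramp of linear
# intensities in the range [0, 1]
fine = np.linspace(0.0, 1.0, 1001)

# Sampling: keep only every 100th value (a coarser spatial grid)
sampled = fine[::100]

# Gamma compression: nonlinear encoding with an assumed gamma of 2.2,
# which spends more of the output range on dark values
gamma = 2.2
encoded = sampled ** (1.0 / gamma)

# Quantization: map the range [0, 1] onto 256 discrete 8-bit levels
quantized = np.round(encoded * 255).astype(np.uint8)

print(len(fine), "->", len(sampled), "samples")
print(quantized)
```

After gamma compression, a linear intensity of 0.5 is stored well above the middle code value 128, so more of the 256 levels are available for the darker tones.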

## Problem 2

Assume we have captured an image with a digital camera. The image covers an area in the scene of size $1.024\text{m} \times 0.768\text{m}$ (the camera has been pointed towards a wall such that the distance is approximately constant over the whole image plane, *weak perspective*). The camera has 4096 pixels horizontally and 3072 pixels vertically. The active region on the CCD-chip is $8\text{mm} \times 6\text{mm}$. We define the spatial coordinates $(x_w,y_w)$ such that the origin is at the center of the optical axis, with the x-axis horizontal and the y-axis pointing vertically upwards. The image indices $(x,y)$ start in the upper left corner. The solutions to this problem can be found from simple geometric considerations. Make a sketch of the situation and answer the following questions:


**a)** What is the size of each sensor (one pixel) on the CCD-chip?

* Sensor Size (x, y) = 8 mm x 6 mm
* Pixel Number (M, N) = 4096 x 3072

The size of each sensor (one pixel) on the CCD-chip can be calculated as follows:

      Pixel Size (in the horizontal direction), dx = (Sensor Width, x) / (Number of Pixels Horizontally, M)

            = (8mm) / (4096 pixels) = 0.001953125 mm/pixel

      Pixel Size (in the vertical direction), dy = (Sensor Height, y) / (Number of Pixels Vertically, N)

            = (6mm) / (3072 pixels) = 0.001953125 mm/pixel

Each pixel is therefore square, with a side of about 1.95 µm.

**b)** What is the scaling coefficient between the image plane (CCD-chip) and the scene? What is the scaling coefficient between the scene coordinates and the pixels of the image?


* Field of View, FOV (X, Y) = 1.024 m × 0.768 m
* Sensor Size (x, y) = 8 mm x 6 mm
* Pixel Number (M, N) = 4096 x 3072

Scaling coefficient between the image plane (CCD-chip) and the scene can be calculated as follows:

    Horizontal Scaling Coefficient, sx = (Sensor Width, x) / (Scene Width, X) 

        = 8mm / 1.024 m = 8mm / 1024mm = 0.0078125

    Vertical Scaling Coefficient, sy = (Sensor Height, y) / (Scene Height, Y) 

        = 6mm / 0.768m = 6mm / 768mm = 0.0078125

Scaling coefficient between the scene coordinates and the pixels of the image can be calculated as follows:

    Horizontal Scaling Coefficient, alpha_x = sx / dx 

        = (0.0078125) / (0.001953125 mm/pixel) = 4 pixels per millimeter

    Vertical Scaling Coefficient, alpha_y = sy / dy
        
        = (0.0078125) / (0.001953125 mm/pixel) = 4 pixels per millimeter
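The values for parts a) and b) can be verified numerically. This is a minimal sketch with all lengths in millimeters; the variable names and the direct cross-check (pixels divided by scene size) are illustrative additions.

```python
# Parameters from the problem statement, all lengths in millimeters
sensor_mm = (8.0, 6.0)        # active region on the CCD chip (width, height)
scene_mm = (1024.0, 768.0)    # area covered in the scene (width, height)
n_pixels = (4096, 3072)       # number of pixels (horizontal, vertical)

# a) size of one pixel on the sensor, in mm per pixel
dx = sensor_mm[0] / n_pixels[0]
dy = sensor_mm[1] / n_pixels[1]

# b) scaling between the scene and the image plane (dimensionless)
sx = sensor_mm[0] / scene_mm[0]
sy = sensor_mm[1] / scene_mm[1]

# b) scaling between scene coordinates and pixels, in pixels per mm
alpha_x = sx / dx
alpha_y = sy / dy

# Cross-check: pixels divided directly by the scene size
alpha_x_direct = n_pixels[0] / scene_mm[0]
alpha_y_direct = n_pixels[1] / scene_mm[1]

print(dx, dy)            # prints 0.001953125 0.001953125
print(sx, sy)            # prints 0.0078125 0.0078125
print(alpha_x, alpha_y)  # prints 4.0 4.0
```

The sensor size cancels out in `alpha_x` and `alpha_y`, so the scene-to-pixel scaling depends only on the pixel count and the size of the imaged area.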


## Problem 3

Translation from the scene to a camera sensor can be done using a transformation matrix, $T$. 

\begin{equation}
    \left[
	\begin{array}{c}x \\ y \\ 1\end{array}\right] = 
	T\left[
	\begin{array}{ccc}
		x_w\\ y_w\\ 1
	\end{array} \right]
\end{equation}

where

\begin{equation}
	T= \left[\begin{array}{ccc} \alpha_x & 0 & x_0\\
			0 & \alpha_y & y_0\\
		0   & 0 & 1
	\end{array} \right]
\end{equation}
$\alpha_x$ and $\alpha_y$ are the scaling factors for their corresponding axes.

Write a function in Python that computes the image points using the transformation matrix, using the parameters from Problem 2. Let the input to the function be a set of $K$ scene points, given by a $2 \times K$ matrix, and the output the resulting image points also given by a $2 \times K$ matrix. The parameters defining the image sensor and field of view from the camera center to the wall can also be given as input parameters.  For simplicity, let the optical axis $(x_0,y_0)$ meet the image plane at the middle point (in pixels).

Test the function for the following input points given as a matrix:
\begin{equation}
    {\mathbf P}_{in} = \left[\begin{array}{ccccccccc} 
    0.512 & -0.512 & -0.512 & 0.512 & 0 & 0.35 & 0.35 & 0.3 & 0.7\\
    0.384 & 0.384 & -0.384 & -0.384 & 0 & 0.15 & -0.15 & -0.5 & 0\end{array}\right]
\end{equation}

<div class="alert alert-block alert-info">
Comment on the results, especially notice the two last points!
</div>

In [12]:
# Import the packages that are useful inside the definition of the weakPerspective function
import math 
import numpy as np
import matplotlib.pyplot as plt

In [13]:
# Function that takes in input:
# - FOV: field of view,
# - sensorsize: size of the sensor,
# - n_pixels: camera pixels,
# - p_scene: K input points (2xK matrix)
# and returns the resulting image points as a 2xK matrix

def weakPerspective(FOV, sensorsize, n_pixels, p_scene):
    #Calculate size of each sensor (one pixel)
    dx = sensorsize[0] / n_pixels[0]
    dy = sensorsize[1] / n_pixels[1]

    #Calculate the scaling coefficient between the image plane (CCD-chip) and the scene
    sx = sensorsize[0] / FOV[0]
    sy = sensorsize[1] / FOV[1]
    
    # Calculate the scaling factors
    alpha_x = sx / dx
    alpha_y = sy / dy

    # Calculate the center pixel
    x0 = n_pixels[0] / 2
    y0 = n_pixels[1] / 2

    # Construct the transformation matrix T
    T = np.array([[alpha_x, 0, x0],
                  [0, alpha_y, y0],
                  [0, 0, 1]])
    
    # Add a row of ones to the input points matrix
    p_scene = np.vstack((p_scene, np.ones(p_scene.shape[1])))

    # Apply the transformation matrix to the input points
    p_image = T @ p_scene
    
    # Normalize the homogeneous coordinates by dividing by the third row
    p_image = p_image / p_image[2,:]
    
    # Remove the third row and return the result
    p_image = p_image[0:2,:]
    
    return p_image    

In [14]:
# The above function is then called using the following parameters:

# Parameters
FOV = (1.024, 0.768)  # Field of view in meters (width x height)
sensorsize = (8 / 1000, 6 / 1000)  # Sensor size in meters (width x height)
n_pixels = (4096, 3072)  # Number of pixels (width x height)
p_scene_x = [0.512, -0.512, -0.512, 0.512, 0, 0.35, 0.35, 0.3, 0.7]
p_scene_y = [0.384, 0.384, -0.384, -0.384, 0, 0.15, -0.15, -0.5, 0]

In [15]:
####
# This cell is locked; it can only be executed to see the results. 
####
# Input data:
p_scene = np.array([p_scene_x, p_scene_y])

# Call to the weakPerspective() function 
pimage = weakPerspective(FOV, sensorsize, n_pixels, p_scene)

# Result: 
print(pimage)

[[4096.    0.    0. 4096. 2048. 3448. 3448. 3248. 4848.]
 [3072. 3072.    0.    0. 1536. 2136.  936. -464. 1536.]]


**Comments**

* The first four points are the corners of the field of view; they map to the corners of the sensor: (4096, 3072), (0, 3072), (0, 0) and (4096, 0).
* The fifth point is the scene origin on the optical axis; it maps to the center pixel (2048, 1536).
* The sixth and seventh points are within the FOV, so they are captured by the image sensor at (3448, 2136) and (3448, 936).
* The last two points are outside the FOV, so they are not captured by the image sensor: (3248, -464) has a negative $y$ index, and (4848, 1536) has an $x$ index beyond the 4096 pixel columns.
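The comments above can be confirmed programmatically by testing each image point against the sensor bounds. This is a minimal sketch that reuses the printed result; treating the coordinates as continuous positions with inclusive borders is an assumption (with zero-based integer indices, the values 4096 and 3072 would already lie one step past the last pixel).

```python
import numpy as np

# Image points as printed by the weakPerspective() call
p_image = np.array([[4096., 0., 0., 4096., 2048., 3448., 3448., 3248., 4848.],
                    [3072., 3072., 0., 0., 1536., 2136., 936., -464., 1536.]])
n_pixels = (4096, 3072)  # number of pixels (width, height)

# A point is on the sensor when both coordinates are within the bounds
# (assumption: borders are counted as inside)
inside = ((p_image[0] >= 0) & (p_image[0] <= n_pixels[0]) &
          (p_image[1] >= 0) & (p_image[1] <= n_pixels[1]))

for k, (x, y) in enumerate(p_image.T):
    status = "inside" if inside[k] else "outside"
    print(f"point {k + 1}: ({x:.0f}, {y:.0f}) is {status} the sensor area")
```

Only the last two points fail the test, which matches the observation that they lie outside the field of view.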



### Delivery (deadline) on CANVAS: 12-09-2021 at 23:59


## Contact
### Course teacher
Professor Kjersti Engan, room E-431,
E-mail: kjersti.engan@uis.no

### Teaching assistant
Saul Fuster Navarro, room E-401
E-mail: saul.fusternavarro@uis.no


Jorge Garcia Torres Fernandez, room E-401
E-mail: jorge.garcia-torres@uis.no


## References

[1] S. Birchfield, Image Processing and Analysis. Cengage Learning, 2016.

[2] I. Austvoll, "Machine/robot vision part I," University of Stavanger, 2018. Compendium, CANVAS.