# Geometric Transformations

So far, we have only changed the range of an image, i.e. the intensity values of its pixels. 
But we can also move pixels from one place to another.
Some general examples:

- Rotation
- Warping
- Enlarge/Reduce

 
## Mathematical Characterisation


- Translation - preserves orientation, length, angles, parallelism, straight lines.
- Rigid (Euclidean) - preserves length, angles, parallelism, straight lines.
- Similarity - preserves angles, parallelism, straight lines.
- Affine - preserves parallelism, straight lines.
- Projective - preserves straight lines.






![](images/geometricTransforsms.png)
 

## Translation
The matrix for this is 

$$\begin{bmatrix}
                1      & 0  & t_x \\
                0       & 1  & t_y  \\
                0        & 0   & 1  
\end{bmatrix} $$   

or
               $$  \begin{bmatrix}
                \textbf{I} & \textbf{t} \\
                \textbf{0}^{\top}       & 1 
            \end{bmatrix} $$

or
    $$[ \textbf{I} |  \textbf{t}  ]_{2\times3}$$

Example: take the pixel at position $(2,3)$ and translate it 5 pixels on the x-axis and 4 pixels on the y-axis.

$$  \begin{bmatrix}
                1      & 0  & 5 \\
                0       & 1  & 4  \\
                0        & 0   & 1  
            \end{bmatrix} 
            \begin{bmatrix}
                2\\
                3  \\
                1   
            \end{bmatrix} 
            = \begin{bmatrix}
                7\\
                7  \\
                1   
            \end{bmatrix} $$   

The 2D coordinate answer is (7,7).

_Note: we have turned (2,3) into (2,3,1). This is called a homogeneous coordinate. It is still a 2D coordinate, as the third coordinate is not independent of the other two coordinates. We will go into more detail on this in a later lecture; for the moment, just accept that it allows us greater freedom to carry out all the different types of geometric transformations in a similar fashion using the techniques of Linear Algebra._
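
The worked example above can be checked numerically. This is a minimal sketch using NumPy; the helper name `translation_matrix` is an illustrative choice, not something defined elsewhere in these notes.

```python
import numpy as np

def translation_matrix(tx, ty):
    """3x3 homogeneous matrix that translates by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

T = translation_matrix(5, 4)
p = np.array([2.0, 3.0, 1.0])   # the pixel (2, 3) in homogeneous form
q = T @ p                        # homogeneous result (7, 7, 1)

# The 2x3 form [I | t] gives the 2D answer directly.
q_2d = T[:2] @ p
print(q_2d)  # [7. 7.]
```

The last line uses only the top two rows of the matrix, which is the $[\textbf{I} | \textbf{t}]_{2\times3}$ form given earlier.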

##  Rigid
The matrix for this is
                $$  \begin{bmatrix}
                \cos\theta      & -\sin\theta  & t_x \\
                \sin\theta       & \cos\theta  & t_y  \\
                0        & 0   & 1  
            \end{bmatrix} $$   

or
                $$  \begin{bmatrix}
                \textbf{R}& \textbf{t} \\
                \textbf{0}^{\top}       & 1 
            \end{bmatrix} $$

or

    $$[\textbf{R} | \textbf{t}]_{2\times3}$$

$\textbf{R}$ is a rotation matrix and is an orthogonal (orthonormal) matrix, i.e. $\textbf{R}\textbf{R}^{\top} = \textbf{R}^{\top}\textbf{R} = \textbf{I}$ and $|\textbf{R}| = 1$. 
A positive $\theta$ is taken to be a rotation in the anti-clockwise direction.
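
The properties of $\textbf{R}$ can be verified numerically. The sketch below assumes NumPy; `rigid_matrix` is a hypothetical helper name and the angle and translation values are arbitrary.

```python
import numpy as np

def rigid_matrix(theta, tx, ty):
    """3x3 homogeneous rigid transform: rotate by theta (radians), then translate."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  tx],
                     [s,   c,  ty],
                     [0.0, 0.0, 1.0]])

M = rigid_matrix(np.deg2rad(30), 5, 4)
R = M[:2, :2]

# R is orthogonal with determinant +1.
is_orthogonal = np.allclose(R @ R.T, np.eye(2))
det_R = np.linalg.det(R)

# Lengths are preserved: the distance between two points is unchanged.
a = np.array([2.0, 3.0, 1.0])
b = np.array([6.0, 0.0, 1.0])
d_before = np.linalg.norm(a[:2] - b[:2])
d_after = np.linalg.norm((M @ a)[:2] - (M @ b)[:2])

# A positive angle is anti-clockwise: (1, 0) rotated by 90 degrees lands on (0, 1).
quarter_turn = rigid_matrix(np.deg2rad(90), 0, 0) @ np.array([1.0, 0.0, 1.0])
```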


[patrickJMT video on rotation of points](https://www.youtube.com/watch?v=OYuoPTRVzxY)


[Gilbert Strang - Orthogonal Matrices](https://www.youtube.com/watch?v=uNsCkP9mgRk&list=PLE7DDD91010BC51F8&index=17)
            

## Similarity
The matrix for this is 
$$  \begin{bmatrix}
                s \cos\theta      & -s \sin\theta  & t_x \\
                s \sin\theta       & s \cos\theta  & t_y  \\
                0        & 0   & 1  
            \end{bmatrix} $$   

or
$$  \begin{bmatrix}
                s\textbf{R}& \textbf{t} \\
                \textbf{0}^{\top}       & 1 
            \end{bmatrix} $$

or
$$[ \ s\textbf{R} \ | \ \textbf{t} \ ]_{2\times3}$$

Where $s$ is the scale factor.
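
As a quick check that a similarity scales lengths by $s$ but leaves angles alone, here is a small sketch assuming NumPy. The helper names `similarity_matrix` and `angle_at`, and the triangle used, are illustrative choices only.

```python
import numpy as np

def similarity_matrix(s, theta, tx, ty):
    """3x3 homogeneous similarity: scale by s, rotate by theta, translate."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn,  s * c, ty],
                     [0.0,     0.0,   1.0]])

def angle_at(a, b, c):
    """Angle in degrees at vertex a between the rays a->b and a->c."""
    u, v = b - a, c - a
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

triangle = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])  # a 3-4-5 triangle
M = similarity_matrix(2.0, np.deg2rad(40), 5, 4)

# Apply M to every vertex (one homogeneous point per row).
tri_h = np.hstack([triangle, np.ones((3, 1))])
moved = (tri_h @ M.T)[:, :2]

side_before = np.linalg.norm(triangle[1] - triangle[2])   # hypotenuse, 5
side_after = np.linalg.norm(moved[1] - moved[2])          # scaled by s = 2
angle_before = angle_at(triangle[1], triangle[0], triangle[2])
angle_after = angle_at(moved[1], moved[0], moved[2])
```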

 

## Affine
The matrix for this is 
$$  \begin{bmatrix}
               a_{11}     & a_{12}  & t_x \\
                a_{21}       & a_{22}  & t_y  \\
                0        & 0   & 1  
            \end{bmatrix} $$   

or

$$  \begin{bmatrix}
                \textbf{A}& \textbf{t} \\
                \textbf{0}^{\top}       & 1 
            \end{bmatrix} $$

or
$$[ \ \textbf{A} \ | \ \textbf{t} \ ]_{2\times3}$$




### SVD of Affine

Using Singular Value Decomposition, $\textbf{A}$ can be broken down into $\textbf{A} = \textbf{R}(\theta)\textbf{R}(-\phi)\textbf{D}\textbf{R}(\phi)$ where 
$$\textbf{D}= \begin{bmatrix}
               \sigma_1    & 0  \\
                0       & \sigma_2 
            \end{bmatrix} $$

In words, rotate by $\phi$ so as to line up the principal scaling directions (the eigenvectors of $\textbf{A}^{\top}\textbf{A}$) with the x and y axes. 
Scale by $\sigma_1$ in the x direction and by $\sigma_2$ in the y direction. Now rotate back by $-\phi$ to the original orientation and then rotate by the desired angle $\theta$.   
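
The decomposition can be recovered from NumPy's SVD. The following is a sketch under two assumptions: $\textbf{A}$ has a positive determinant (no reflection), and the helper names `rot` and `decompose_affine` are hypothetical. `np.linalg.svd` returns $\textbf{A} = \textbf{U}\textbf{D}\textbf{V}^{\top}$, so we identify $\textbf{R}(\phi) = \textbf{V}^{\top}$ and $\textbf{R}(\theta) = \textbf{U}\textbf{V}^{\top}$.

```python
import numpy as np

def rot(angle):
    """2x2 anti-clockwise rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def decompose_affine(A):
    """Return (theta, phi, sigma1, sigma2) with A = R(theta) R(-phi) D R(phi).

    Assumes det(A) > 0, so that both factors can be made proper rotations.
    """
    U, sigma, Vt = np.linalg.svd(A)
    if np.linalg.det(Vt) < 0:
        # Flip one axis in both U and Vt; the product U D Vt is unchanged.
        S = np.diag([1.0, -1.0])
        U, Vt = U @ S, S @ Vt
    phi = np.arctan2(Vt[1, 0], Vt[0, 0])            # Vt = R(phi)
    R_theta = U @ Vt                                 # U = R(theta) R(-phi)
    theta = np.arctan2(R_theta[1, 0], R_theta[0, 0])
    return theta, phi, sigma[0], sigma[1]

A = np.array([[2.0, 0.5],
              [0.3, 1.5]])
theta, phi, s1, s2 = decompose_affine(A)
A_rebuilt = rot(theta) @ rot(-phi) @ np.diag([s1, s2]) @ rot(phi)
```

Rebuilding $\textbf{A}$ from the four recovered parameters is a convenient sanity check on the decomposition.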


[See Hartley and Zisserman for more](https://tinyurl.com/ztphphpc)
  

## Projective
The matrix for this is 

$$  \begin{bmatrix}
               h_{11}     & h_{12}  & h_{13} \\
               h_{21}     & h_{22}  & h_{23}  \\
               h_{31}     & h_{32}  & h_{33}      
            \end{bmatrix} $$   

or
$$[ \ \textbf{H}  \ ]_{3\times3}$$
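
Unlike the previous cases, the bottom row of $\textbf{H}$ is not fixed at $(0, 0, 1)$, so the third homogeneous coordinate of the result is generally not 1 and we divide through by it to get back to 2D. The sketch below, assuming NumPy, applies an arbitrary made-up $\textbf{H}$ to three collinear points and checks that they remain collinear; `apply_homography` is a hypothetical helper name.

```python
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 matrix H to an (N, 2) array of points, returning (N, 2)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # divide by third coordinate

# Arbitrary example values, chosen only for illustration.
H = np.array([[1.0,   0.2,   5.0],
              [0.1,   1.0,   4.0],
              [0.001, 0.002, 1.0]])

line = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])  # three collinear points
a, b, c = apply_homography(H, line)

# The 2D cross product of (b - a) and (c - a) is zero when a, b, c are collinear.
cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
```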

 


## Going forwards or backwards?

It seems to make sense that, whatever our transformation, we would take each input pixel coordinate, transform it with the matrix to find its destination in the new image and transfer the brightness there. 
This has problems though. 
Because the transformed coordinates must be rounded to whole pixel positions (quantization), among other effects, we are not guaranteed to fill every position in the new image. 
This can leave gaps. 




 

The better plan is to start with a coordinate in the output image, determine where its pixel should come from in the input image and copy that value over. This way we get no gaps. 
To calculate where an output pixel comes from, multiply the inverse of the transformation matrix by the output coordinate vector; the result is the input coordinate vector.
The input coordinate will generally not be a whole number, so it must be rounded or interpolated.
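
The difference between the two approaches can be seen with a scale-by-two transformation. This is a minimal sketch assuming NumPy, a greyscale image stored as a 2D array with x as the column index and y as the row index, and nearest-neighbour rounding. The function names `warp_forward` and `warp_backward` are illustrative, not from any library.

```python
import numpy as np

def _grid(h, w):
    """Homogeneous (x, y, 1) coordinates of every pixel in an h-by-w image."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

def _round_xy(coords):
    """Divide by the third coordinate and round to the nearest pixel."""
    x = np.rint(coords[0] / coords[2]).astype(int)
    y = np.rint(coords[1] / coords[2]).astype(int)
    return x, y

def warp_forward(image, M, out_shape):
    """Push each input pixel to its destination; may leave gaps."""
    h_out, w_out = out_shape
    dx, dy = _round_xy(M @ _grid(*image.shape))
    valid = (dx >= 0) & (dx < w_out) & (dy >= 0) & (dy < h_out)
    out = np.zeros(out_shape, dtype=image.dtype)
    out[dy[valid], dx[valid]] = image.ravel()[valid]
    return out

def warp_backward(image, M, out_shape):
    """Pull each output pixel from its source using the inverse of M."""
    h_in, w_in = image.shape
    sx, sy = _round_xy(np.linalg.inv(M) @ _grid(*out_shape))
    valid = (sx >= 0) & (sx < w_in) & (sy >= 0) & (sy < h_in)
    out = np.zeros(out_shape[0] * out_shape[1], dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(out_shape)

image = np.arange(1, 13).reshape(3, 4)        # 3x4 image, all values non-zero
M = np.diag([2.0, 2.0, 1.0])                  # scale by 2 in x and y
forward = warp_forward(image, M, (5, 7))
backward = warp_backward(image, M, (5, 7))
filled_forward = np.count_nonzero(forward)    # only the 12 input pixels land
filled_backward = np.count_nonzero(backward)  # every output pixel is filled
```

Going forwards, only the twelve input pixels are placed and the positions between them stay empty. Going backwards, every one of the 35 output positions finds a source pixel.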

## 3D Vectors
Keep in mind the following primary goal.
We want to determine things about the real world by looking at images of the real world.
An image is a 2D representation of the world taken from a specific point of view. 
So we need to understand the world coordinate system, the image coordinate system and the transformation between the two. 
We will start by describing the world coordinate system and how objects in that system can move and change i.e. 3D transformations. 


When we get to camera views you will also need to consider relative movement, e.g. rotating the camera clockwise will result in the image of the object rotating anti-clockwise.
If we wish to describe the position of a point in the world we will need three coordinates, $(x,y,z)^{\top}$.
To describe how a 3D point changes under 3D transformations we will once again use homogeneous coordinates, changing the 3D coordinate to $(x,y,z,1)^{\top}$.

 

## Translation
The matrix for this is 

$$  \begin{bmatrix}
                1      & 0  & 0 & t_x \\
                0       & 1  & 0 &t_y  \\
                0       & 0  & 1 &t_z  \\
                0        & 0   & 0 & 1  
            \end{bmatrix} $$   

or


$$  \begin{bmatrix}
                \textbf{I} & \textbf{t} \\
                \textbf{0}^{\top}       & 1 
\end{bmatrix} $$

or

$$[\textbf{I} | \textbf{t}]_{3\times4}$$
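
The 3D case works exactly as in 2D, with one extra row and column. A minimal sketch, assuming NumPy and using a hypothetical helper `translation_matrix_3d` with arbitrary example values:

```python
import numpy as np

def translation_matrix_3d(tx, ty, tz):
    """4x4 homogeneous matrix that translates by (tx, ty, tz)."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

T = translation_matrix_3d(10, 20, 30)
p = np.array([1.0, 2.0, 3.0, 1.0])   # the point (1, 2, 3) in homogeneous form

q = T @ p            # full 4x4 form, result is still homogeneous
q_3d = T[:3] @ p     # 3x4 form [I | t], result is the plain 3D point
```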

## Rigid
The matrix for this is 
$$  \begin{bmatrix}
                r_{11}     & r_{12} & r_{13} & t_x \\
                r_{21}     & r_{22} & r_{23} & t_y \\
                r_{31}     & r_{32} & r_{33} & t_z \\
                0        & 0  & 0 & 1  
\end{bmatrix} $$   

or

$$  \begin{bmatrix}
                \textbf{R}& \textbf{t} \\
                \textbf{0}^{\top} & 1 
            \end{bmatrix} $$

or

$$[\textbf{R}|\textbf{t}]_{3\times4}$$

$\textbf{R}$ is a rotation matrix and is an orthogonal (orthonormal) matrix, i.e. $\textbf{R}\textbf{R}^{\top} = \textbf{R}^{\top}\textbf{R} = \textbf{I}$ and $|\textbf{R}| = 1$. 
Note we haven't parameterised this yet. 
That's because it is much more difficult than in the 2D case. 

Here we will require three angles (one for each axis). 
So we will leave the parameterisation for the next section.

## Similarity, scaled rotation
The matrix for this is 
$$  \begin{bmatrix}
                sr_{11}     & sr_{12} & sr_{13} & t_x \\
                sr_{21}     & sr_{22} & sr_{23} & t_y \\
                sr_{31}     & sr_{32} & sr_{33} & t_z \\
                0        & 0  & 0 & 1  
\end{bmatrix} $$   

or
$$  \begin{bmatrix}
                s\textbf{R}& \textbf{t} \\
                \textbf{0}^{\top}       & 1 
\end{bmatrix} $$

or

$$[s\textbf{R}|\textbf{t}]_{3\times4}$$

Where $s$ is the scale factor.

## Affine
The matrix for this is 
$$\begin{bmatrix}
               a_{11}     & a_{12}  & a_{13} & t_x \\
                a_{21}     & a_{22}  & a_{23} & t_y  \\
                a_{31}     & a_{32}  & a_{33} & t_z  \\
                0        & 0 & 0   & 1  
\end{bmatrix} $$   

or

$$\begin{bmatrix}
                \textbf{A}& \textbf{t} \\
                \textbf{0}^{\top}       & 1 
\end{bmatrix} $$

or

$$[\textbf{A}|\textbf{t}]_{3\times4}$$

## Projective
The matrix for this is 
$$  \begin{bmatrix}
               h_{11}     & h_{12}  & h_{13} & h_{14} \\
               h_{21}     & h_{22}  & h_{23} & h_{24}\\
               h_{31}     & h_{32}  & h_{33} & h_{34}\\
               h_{41}     & h_{42}  & h_{43} & h_{44}
\end{bmatrix} $$   

or

$$[ \ \textbf{H}  \ ]_{4\times4}$$

The projective transformation is very important to us because it relates 3D coordinates to a point of view, otherwise known as a centre of projection, i.e. where the camera was positioned when it took an image. 
We will be discussing this a lot throughout the module. 
  

## 3D rotations

2D rotations are relatively straightforward but 3D rotations are not. There are three angles involved.
The $3\times3$ rotation matrix $\textbf{R}$ has nine entries but only three degrees of freedom. 
If we try to determine these nine unknowns by the usual methods, e.g. least squares, we have a problem. 
We can certainly find a result for the nine unknowns, but how do we ensure that the resulting $\textbf{R}$ is orthogonal and has a determinant of +1?

### Separate Rotation around each axis

$$ R^x_{\alpha} = \begin{bmatrix}
                1     & 0 & 0 \\
                0     & \cos\alpha & -\sin\alpha\\
                0     & \sin\alpha & \cos\alpha
\end{bmatrix} $$   

$$ R^y_{\beta} = \begin{bmatrix}
               \cos\beta     & 0 & \sin\beta \\
                0     & 1& 0\\
                -\sin\beta    & 0 & \cos\beta
\end{bmatrix} $$   

$$ R^z_{\gamma} = \begin{bmatrix}
                \cos\gamma & -\sin\gamma & 0\\
                 \sin\gamma & \cos\gamma & 0\\
                0    & 0 & 1
\end{bmatrix} $$   

where $\alpha, \beta, \gamma$ are called Euler angles. To get any combination of rotations we multiply the appropriate matrices together. But therein lies the problem: in what order should we multiply them? The order matters because matrix multiplication is not commutative:



$R^x_{\alpha}R^y_{\beta} \neq R^y_{\beta}R^x_{\alpha}$.
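
The effect of the order is easy to see with two quarter turns. This sketch assumes NumPy and builds the axis rotations from the matrices above; the function names `rot_x` and `rot_y` are illustrative.

```python
import numpy as np

def rot_x(alpha):
    """Rotation by alpha (radians) about the x axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c,  -s],
                     [0.0, s,   c]])

def rot_y(beta):
    """Rotation by beta (radians) about the y axis."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c,   0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s,  0.0, c]])

quarter = np.deg2rad(90)
xy = rot_x(quarter) @ rot_y(quarter)   # rotate about y first, then about x
yx = rot_y(quarter) @ rot_x(quarter)   # rotate about x first, then about y

p = np.array([0.0, 0.0, 1.0])          # a point on the z axis
p_xy = xy @ p                          # ends up at (1, 0, 0)
p_yx = yx @ p                          # ends up at (0, -1, 0)
```

Both products are still valid rotation matrices; they are simply different rotations.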

## Other rotation issues
There are other issues too. 
Moving through the parameter space $(\alpha,\beta, \gamma)$ is not always smooth. 
There are strict mathematical definitions for smoothness but in practice what this means is that small changes in overall movement do not always lead to small changes in each of the separate parameters. Sometimes a small change will require a very large change in one of the Euler angles. 

  
## Euler Angles
This is not to suggest that Euler angles are of no use. 
In particular, where simple known rotations are required, perhaps between two fixed camera positions, simple rotations can be employed between the two views.
However, in the general case we would like a better method.
Without getting too deep into the mathematics of vector spaces, the space of rotation matrices is not linear under addition: the sum of two rotation matrices is not, in general, a rotation matrix. 
A common solution to this is to change to another space, which is linear, move about in that space and then move back.
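
To see why addition does not work, consider the naive idea of averaging two rotations to find one "halfway" between them. The sketch below assumes NumPy; the helper `rot_z` and the choice of a 0 and a 90 degree rotation are illustrative only.

```python
import numpy as np

def rot_z(gamma):
    """Rotation by gamma (radians) about the z axis."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

R1 = rot_z(0.0)
R2 = rot_z(np.deg2rad(90))

# Element-wise average of the two matrices.
mean = 0.5 * (R1 + R2)
mean_is_orthogonal = np.allclose(mean @ mean.T, np.eye(3))   # False
mean_det = np.linalg.det(mean)                               # 0.5, not 1

# By contrast, multiplying rotations always gives another rotation.
prod = R1 @ R2
prod_is_orthogonal = np.allclose(prod @ prod.T, np.eye(3))   # True
```

The averaged matrix shrinks vectors in the x-y plane, so it is not a rotation at all.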
  


## Smooth Path
We would also like to determine a smooth path between two views rather than just determining the full rotations that will lead from one to the other. 
For example, the path that a camera took to get from one position/orientation to another is particularly useful when the views all come from one camera that only takes an image once per sample period, and we wish to interpolate between those images.

  