# Epipolar Geometry

## Opening Assumptions

To solve this problem one step at a time we need to make some assumptions.

- We have to assume that we are viewing a static scene and that we have two different views of this scene.
- We will assume that we already have a set of point correspondences in our two views. How we obtained them is not our concern in this section.
- We assume that we know the intrinsic parameters of the camera. And we will assume that they are the same for both views.


## What we don't know
Even with these assumptions we still have a problem on our hands. 

We know a set of 2D points but we don't know the extrinsic parameters and we don't know the 3D coordinates that correspond to the points in the images.

This leads to a bit of a catch-22.
To find the extrinsic parameters (rotation and translation) we need the 3D points.
To find the 3D points we need to know the extrinsic camera parameters.

## Route to solution

- Disentangle the 3D coordinates from the camera motion (rotation/translation) algebraically.
- Remove the 3D coordinates from the algebra so that we have equations in 2D image coordinates only.
- Use these to solve for camera motion (using the 8-point algorithm).
- Now that we have the extrinsic parameters we can determine the 3D coordinates.
- The 3D coordinates are the reconstruction.
	

**Note: this is solving a neat mathematics problem. So a gentle reminder: the real world is not a neat mathematics problem.**

## Epipolar Geometry
We will continue to call a point in the 3D world $X$.

The projection of this into the first view results in 2D coordinate $x_1$ and its projection into the second view results in 2D coordinate $x_2$.

We refer to the center of projection of the two cameras as $O_1$ and $O_2$.

If we draw a line between the two centers of projection, this line will intersect the two image planes. The points of intersection are called the epipoles: $e_1$ for the intersection with the image plane of the first view and $e_2$ for the intersection with the image plane of the second view.

Note: these intersections do not have to occur inside the small rectangle of the image. The image planes are assumed to continue forever in all directions.
 

![](images/epipolarMuSh.png)
[Image Credit: Mubarak Shah UCF](https://youtu.be/1X93H_0_W5k?t=1890)
 

A triangle is formed between the 3D point $X$ and the two camera centers. This triangle is said to lie on the epipolar plane.
The intersections of the epipolar plane with the two image planes are called the epipolar lines $l_1$ and $l_2$.
There is a single epipolar plane for each 3D point $X$.

## Null Spaces

A quick linear algebra reminder. A 2D plane in a 3D space is called a subspace (as long as it passes through the origin).

We can define the plane with vectors that lie in the plane. Two independent vectors in the plane will be enough.

However we can also define the plane by a single vector that is orthogonal to it. If we stack two vectors that span the plane as the rows of a matrix, this orthogonal vector is in the null space of that matrix.

Keep this in mind as we proceed.
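As a small numerical illustration of the reminder above, here is a sketch using NumPy. The two spanning vectors are arbitrary example values, not taken from the lecture.

```python
import numpy as np

# Two independent vectors that span a plane through the origin (arbitrary values).
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, -1.0])

# The cross product is orthogonal to both, so it is a normal to the plane.
n = np.cross(u, v)

# Stack the spanning vectors as rows: the normal lies in the null space of M.
M = np.vstack([u, v])
print(M @ n)  # both entries are zero

# The SVD finds the same direction: the last row of V^T spans the null space.
_, _, Vt = np.linalg.svd(M)
n_svd = Vt[-1]
print(abs(n_svd @ n) / np.linalg.norm(n))  # close to 1: same line, up to sign
```

The SVD route is the one that matters later, because it still works when the matrix is larger and the data are noisy.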

**Ideas like null space, singular matrices, rank deficiency, zero determinant, SVD (Singular Value Decomposition) are all relevant here.**

## Simplifying the Intrinsic Parameters
Firstly let's deal with the intrinsic parameters. 

We will assume no skew in the pixels and assume they are 1:1 aspect ratio. We will assume the focal length is 1. 

This idea of setting the focal length to one is just a way of saying that all other measures will be in units of the focal length. 

So rather than giving sizes in mm, cm or meters we instead give everything in units of the focal length. We can then convert everything easily if we know the focal length. 

We will assume that the origin is in the center of the frame.

So the Camera Intrinsic parameter matrix will be 
$$K= \begin{bmatrix}
                fs_x    & 0 & O_x  \\
                0     & fs_y & O_y  \\
                0     & 0 & 1  
            \end{bmatrix} = \begin{bmatrix}
                1    & 0 & 0  \\
                0     & 1 & 0  \\
                0     & 0 & 1  
            \end{bmatrix}$$

This is the identity matrix, which allows us to leave it out entirely in our mathematical manipulation.
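In practice the known intrinsics are rarely the identity. One common way to reach the simplified setting is to multiply the pixel coordinates by $K^{-1}$ first. The sketch below assumes this approach; the values in `K` and the helper name `normalise` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def normalise(pixels, K):
    """Map Nx2 pixel coordinates to Nx3 homogeneous normalised coordinates."""
    homog = np.column_stack([pixels, np.ones(len(pixels))])
    # Solve K x = p for every point instead of forming the inverse explicitly.
    return np.linalg.solve(K, homog.T).T

pixels = np.array([[320.0, 240.0], [720.0, 240.0]])
x = normalise(pixels, K)
print(x)  # principal point -> (0, 0, 1); 400 px to its right -> (0.5, 0, 1)
```

After this step the coordinates are in units of the focal length with the origin at the principal point, which is the setting assumed in the rest of these notes.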

   

## Epipolar Constraints
So $x_1$ is the projection of the world 3D coordinate point onto the first camera image plane.
$x_1$ will be in homogeneous coordinates.

We will work with everything being relative to the first camera view. Therefore there will be zero rotation and translation for the mathematical description of the first camera view.

\begin{equation}
	\lambda_1x_1 = X
\end{equation}


For the second camera view, we must describe this in terms of camera one. 

This means a rotation and translation of the 3D point followed by the projection.
\begin{equation}
	\lambda_2x_2=RX+T
\end{equation}
Substitute the equation for the first view into the equation for the second view.

\begin{equation}
	\lambda_2x_2=R(\lambda_1x_1)+T
\end{equation}

## $T_{\times}$

Now it's convenient to eliminate the $+T$ from this equation. So what we will do is multiply both sides on the left by $T_{\times}$.

Remember how we defined this. Take a vector $T$ and make it into a skew-symmetric matrix that performs the cross product with $T$, i.e. $T_{\times}v = T \times v$ for any vector $v$.
\begin{equation}
	\lambda_2T_{\times}x_2=T_{\times}R(\lambda_1x_1)+T_{\times}T
\end{equation}

\begin{equation}
	T_{\times}T = 0
\end{equation}

So we can rewrite the equation for the second view as follows
\begin{equation}
	\lambda_2T_{\times}x_2=\lambda_1T_{\times}Rx_1
\end{equation}
Remember that $\lambda_1$ and $\lambda_2$ are simply scalar values and can be moved about more freely than vectors or matrices.
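The matrix $T_{\times}$ used above can be written out explicitly. The following is a minimal sketch in NumPy; the helper name `skew` and the test vectors are assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Return the 3x3 skew-symmetric matrix T_x with T_x @ v == np.cross(t, v)."""
    return np.array([[ 0.0,  -t[2],  t[1]],
                     [ t[2],  0.0,  -t[0]],
                     [-t[1],  t[0],  0.0 ]])

T = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 4.0, 0.25])
Tx = skew(T)

print(np.allclose(Tx @ v, np.cross(T, v)))  # True: same as the cross product
print(np.allclose(Tx @ T, 0.0))             # True: T x T = 0
print(np.allclose(Tx.T, -Tx))               # True: skew-symmetric
```

The second check is exactly the property that makes the $+T$ term vanish.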
   

## Getting rid of the $\lambda$s
Now this next step takes a bit of explaining.

Firstly be aware that $T_{\times}x_2$ is the cross product $T \times x_2$, which is a vector orthogonal to $x_2$. So if we take the dot product of $x_2$ with this vector we will get zero.
So multiply both sides on the left by $x_2^{\top}$.
\begin{equation}
	\lambda_2x_2^{\top}T_{\times}x_2=\lambda_1x_2^{\top}T_{\times}Rx_1
\end{equation}

Which is 
\begin{equation}
	0=\lambda_1x_2^{\top}T_{\times}Rx_1	
\end{equation}

That's $\lambda_2$ gone. Now divide both sides by $\lambda_1$
\begin{equation}
    0=	x_2^{\top}T_{\times}Rx_1	
\end{equation}

## The Epipolar Constraint

\begin{equation}
    x_2^{\top}T_{\times}Rx_1=0
\end{equation}

Is called the Epipolar Constraint.

It is an important result as it relates 2D coordinates without mention of 3D coordinates.

Remember our issue from earlier, the catch-22.
In order to determine camera motion we needed the 3D coordinates and in order to determine the 3D coordinates we needed the camera motion.

The epipolar constraint allows us to determine camera motion without 3D coordinates.

So from there we can work towards getting the 3D coordinates.
    

## The Essential matrix

If we take the central part of the epipolar constraint, the part that doesn't include the two 2D coordinates we have what is called the essential matrix.
\begin{equation}
	E = T_{\times}R \quad \in \mathbb{R}^{3\times3}
\end{equation}

The epipolar constraint is therefore also called the essential constraint, or the bilinear constraint since it is linear in each of $x_1$ and $x_2$.
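The constraint is easy to check numerically on synthetic data. The sketch below assumes an arbitrary example motion (`R`, `T`) and random 3D points; none of these values come from the lecture.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix T_x with T_x @ v == np.cross(t, v)."""
    return np.array([[ 0.0,  -t[2],  t[1]],
                     [ t[2],  0.0,  -t[0]],
                     [-t[1],  t[0],  0.0 ]])

rng = np.random.default_rng(0)

# Assumed example motion: 15 degrees about the y-axis plus a translation.
theta = np.deg2rad(15.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([1.0, 0.1, -0.2])

# Ten random 3D points in front of the first camera.
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(10, 3))

x1 = X / X[:, 2:3]     # lambda_1 x_1 = X
X2 = X @ R.T + T       # R X + T for every point
x2 = X2 / X2[:, 2:3]   # lambda_2 x_2 = R X + T

E = skew(T) @ R
residuals = np.einsum("ni,ij,nj->n", x2, E, x1)  # x_2^T E x_1 for each point
print(np.max(np.abs(residuals)))                 # round-off level, effectively zero
```

Note that the 3D points are used only to generate the two sets of image coordinates; the check itself involves 2D coordinates and $E$ alone.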

  
## The Epipolar Plane
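$E$ is a $3\times3$ matrix of rank 2, which means its left and right null spaces each have dimension 1.
The epipolar constraint stipulates that the three vectors $\vec{O_1X}$, $\vec{O_2O_1}$ and $\vec{O_2X}$, the sides of the triangle, lie in a single plane.

Or we can say that the three points $O_1$, $O_2$ and $X$ alone define the plane.

Now, as these three are vectors, we can define a triple product with them.

Remember that the triple product gives the volume of the parallelepiped spanned by three vectors, and three vectors that lie in one plane span a volume of zero.
Expressed in the second camera's frame, $\vec{O_2X}$ is along $x_2$, $\vec{O_2O_1}$ is $T$ and $\vec{O_1X}$ is along $Rx_1$.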
So we can say
\begin{equation}
	x^{\top}_2(T_{\times} Rx_1) = 0 
\end{equation}
   

![](images/epipolarMuSh.png)
[Image Credit: Mubarak Shah UCF](https://youtu.be/1X93H_0_W5k?t=1890)
 

## A comment on invertibility

So we mentioned that $E$ has rank 2.

Therefore it is singular and not invertible.

Imagine you are given the epipolar constraint and asked to work back to get the $\lambda$s.

Can you do so?

No, because you now have zero on the other side of the equation.

This tells you that you have lost some things along the way.
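You may remember that rotation and translation (in 3D) have 6 DoF between them. One of these, the overall scale of $T$, is lost for good: the constraint equals zero, so scaling $T$ changes nothing.

The linear method we will use does not solve for these parameters directly. It treats the nine entries of $E$ as unknowns (up to scale), so six equations are not enough.

We will need eight sets of corresponding points and hence this is called the 8-point algorithm.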

## Some properties of $E$

This is all great, but the truth is that the epipolar constraint is not really a nice identity.
$E$ is a $3\times3$ matrix. 
If we find enough point correspondences we should be able to recover $E$, but then what?

We don't really want $E$, we want $R$ and $T$ so that we can tell how the camera moved between frames. 
How do we separate out $R$ and $T$?

The space of all essential matrices is called the essential space defined as follows

\begin{equation}
	\mathcal{E} \equiv \left\{ T_{\times}R \mid R \in SO(3), T \in \mathbb{R}^3 \right\} \subset \mathbb{R}^{3\times3}
\end{equation}

    

## Some properties of $E$

A nonzero matrix $E \in \mathbb{R}^{3\times3}$ is an essential matrix if and only if $E$ has a singular value decomposition (SVD) $E=U\Sigma V^{\top}$ with

\begin{equation}
	\Sigma = \begin{bmatrix}
\sigma & 0 & 0\\
0 &\sigma & 0\\
0 & 0 & 0
\end{bmatrix}
\end{equation}

for some $\sigma>0$ and $U,V \in SO(3)$.

This is the theorem _Characterization of the Essential Matrix_, due to Huang & Faugeras (1989).
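A quick numerical check of the singular value pattern, as a sketch: build $E = T_{\times}R$ from an arbitrary example motion (the values of `R` and `T` are assumptions) and look at its singular values.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix T_x with T_x @ v == np.cross(t, v)."""
    return np.array([[ 0.0,  -t[2],  t[1]],
                     [ t[2],  0.0,  -t[0]],
                     [-t[1],  t[0],  0.0 ]])

# Assumed example motion: 30 degrees about the z-axis plus a translation.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([0.3, -1.2, 0.5])

E = skew(T) @ R
s = np.linalg.svd(E, compute_uv=False)

print(s)                  # two equal values followed by (numerically) zero
print(np.linalg.norm(T))  # the repeated singular value equals the length of T
```

The repeated value being $\|T\|$ is another view of the scale ambiguity: scaling $T$ scales $E$ and nothing else.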

This poses quite a problem: the space of essential matrices is not a linear space, so ordinary linear algebra will find us a $3\times3$ matrix, but one that is unlikely to have these required properties.

To add insult to injury here, even if we find the essential matrix, there are two possible decompositions of $R$ and $T$.

The one bit of good news is that in general only one of the decompositions makes sense, i.e. gives positive depth values. Negative depth values would be behind the camera.

## Two decompositions

From the theorem _Pose recovery from the Essential Matrix_, page 84 of An Invitation to 3D Vision by Ma, Kosecka, Soatto, Sastry.

There are two relative poses $(R,T)$ with $R\in SO(3)$ and $T\in\mathbb{R}^3$ corresponding to an essential matrix $E\in\mathcal{E}$.


For $E=U\Sigma V^{\top}$ we have:
\begin{equation}
	(T_{1\times},R_1) = (UR_{Z(+\frac{\pi}{2})}\Sigma U^{\top}, UR^{\top}_{Z(+\frac{\pi}{2})}V^{\top})
\end{equation}
\begin{equation}
	(T_{2\times},R_2) = (UR_{Z(-\frac{\pi}{2})}\Sigma U^{\top}, UR^{\top}_{Z(-\frac{\pi}{2})}V^{\top})
\end{equation}
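The two formulas above can be sketched in NumPy as follows. The function names (`decompose_essential`, `unskew`) and the example motion are assumptions for illustration. One practical detail that the formulas leave implicit: a numerical SVD may return $U$ or $V$ with determinant $-1$, so the sketch flips the sign of the third singular vector where needed, which leaves $E$ unchanged because the third singular value is zero.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def unskew(S):
    """Read the vector t back out of its skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def decompose_essential(E):
    """Return the two (R, T) pairs corresponding to an essential matrix E."""
    U, s, Vt = np.linalg.svd(E)
    # Force U and V into SO(3); E is unaffected since its third singular value is 0.
    if np.linalg.det(U) < 0:
        U[:, 2] *= -1
    if np.linalg.det(Vt) < 0:
        Vt[2, :] *= -1
    Sigma = np.diag([s[0], s[1], 0.0])
    poses = []
    for sign in (+1.0, -1.0):
        # Rotation by sign * pi/2 about the z-axis.
        Rz = np.array([[0.0, -sign, 0.0],
                       [sign,  0.0, 0.0],
                       [0.0,   0.0, 1.0]])
        T_hat = U @ Rz @ Sigma @ U.T
        R = U @ Rz.T @ Vt
        poses.append((R, unskew(T_hat)))
    return poses

# Assumed example motion: 20 degrees about the y-axis plus a translation.
theta = np.deg2rad(20.0)
R_true = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                   [ 0.0,           1.0, 0.0          ],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
T_true = np.array([1.0, 0.2, -0.3])
E = skew(T_true) @ R_true

poses = decompose_essential(E)
for R, T in poses:
    print(np.allclose(skew(T) @ R, E))  # True for both candidate poses
```

Both candidates reproduce $E$ exactly, so the matrix alone cannot tell them apart; that is where the positive depth test comes in.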

## Getting an essential matrix
As mentioned earlier, our standard linear algebra methods will recover a $3\times3$ matrix but this is unlikely to meet the stringent criteria of an essential matrix.

We have two options:

- Recover whatever $3\times3$ matrix we can from our linear methods and then project that on to the space of essential matrices. In other words use the closest essential matrix to the one we recover (whatever closest means) - Easy but lacking accuracy. 
- Optimise the epipolar constraints in the essential space $\mathcal{E}$, accurate but requires non-linear constrained optimisation which is difficult and would require a whole new toolset of skills.


We will use the first approach.
    

## The Eight Point Algorithm

So we start with the epipolar constraint.
\begin{equation}
    x_2^{\top}Ex_1=0
\end{equation}

This should hold for any matching points $x_1$ and $x_2$ in two image views.

If we have enough points (we will use eight) then we should be able to recover the unknown matrix $E$.

Strictly speaking we can get away with seven points but eight will give us a unique solution (up to scale).

This assumes the eight points meet certain criteria and have no noise. More on this later.



$E$ is a $3\times3$ matrix as follows.
\begin{equation}
	E= \begin{bmatrix}
e_{11} & e_{12} & e_{13}\\
e_{21} & e_{22} & e_{23}\\
e_{31} & e_{32} & e_{33}\\
 \end{bmatrix}
\end{equation}
    

We can stack this matrix into a single vector $\in \mathbb{R}^9$

which we will call $e_s$

$$e_s= \begin{bmatrix}
e_{11}\\
e_{21}\\
e_{31}\\
e_{12}\\
e_{22}\\
e_{32}\\
e_{13}\\
e_{23}\\
e_{33}
\end{bmatrix} $$
 
For each set of point correspondences in homogeneous coordinates we will get one linear equation in the unknown entries of $E$.

For example if $x_1 = (x,y,1)^{\top}$ and $x_2 = (x',y',1)^{\top}$ we have the linear equation

 \begin{equation}
	x'xe_{11}+x'ye_{12}+x'e_{13}+y'xe_{21}+y'ye_{22}+y'e_{23}+xe_{31}+ye_{32}+e_{33} = 0
\end{equation}

We can write this as $a^{\top}e_s=0$, which is equivalent to the epipolar constraint. The entries of $a$ are ordered to match the column-stacked $e_s$.


$$ \begin{bmatrix}
x'x & y'x & x & x'y & y'y & y & x' & y' & 1
 \end{bmatrix} \begin{bmatrix}
e_{11}\\
e_{21}\\
e_{31}\\
e_{12}\\
e_{22}\\
e_{32}\\
e_{13}\\
e_{23}\\
e_{33}
\end{bmatrix}=0$$
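The ordering of the coefficients is easy to get wrong, so here is a small check, as a sketch. With $e_s$ stacked column by column, the matching coefficient vector is the Kronecker product of the first-view point with the second-view point. The matrix `E` below is random and the two points are arbitrary; they only serve to compare the two ways of evaluating the same expression.

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.standard_normal((3, 3))   # any 3x3 matrix will do for this check
x1 = np.array([0.2, -0.4, 1.0])   # (x, y, 1) in the first view
x2 = np.array([-0.1, 0.3, 1.0])   # (x', y', 1) in the second view

# Column-stacked E: (e11, e21, e31, e12, e22, e32, e13, e23, e33).
e_s = E.flatten(order="F")

# Coefficients in the same order: (x'x, y'x, x, x'y, y'y, y, x', y', 1).
a = np.kron(x1, x2)

print(np.isclose(a @ e_s, x2 @ E @ x1))  # True
```

If $e_s$ were stacked row by row instead, the coefficient vector would be `np.kron(x2, x1)`; either convention works as long as both sides use the same one.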

 
     





 
We can put the equations for the eight point correspondences into separate rows of a matrix called $A$. Here the subscript is the index of the correspondence, not of the view, and each row is ordered to match the column-stacked $e_s$.
 \begin{equation}
	\begin{bmatrix}
x_1'x_1 & y_1'x_1 & x_1 & x_1'y_1 & y_1'y_1 & y_1 & x_1' & y_1' & 1\\
x_2'x_2 & y_2'x_2 & x_2 & x_2'y_2 & y_2'y_2 & y_2 & x_2' & y_2' & 1\\
x_3'x_3 & y_3'x_3 & x_3 & x_3'y_3 & y_3'y_3 & y_3 & x_3' & y_3' & 1\\
x_4'x_4 & y_4'x_4 & x_4 & x_4'y_4 & y_4'y_4 & y_4 & x_4' & y_4' & 1\\
x_5'x_5 & y_5'x_5 & x_5 & x_5'y_5 & y_5'y_5 & y_5 & x_5' & y_5' & 1\\
x_6'x_6 & y_6'x_6 & x_6 & x_6'y_6 & y_6'y_6 & y_6 & x_6' & y_6' & 1\\
x_7'x_7 & y_7'x_7 & x_7 & x_7'y_7 & y_7'y_7 & y_7 & x_7' & y_7' & 1\\
x_8'x_8 & y_8'x_8 & x_8 & x_8'y_8 & y_8'y_8 & y_8 & x_8' & y_8' & 1\\
 \end{bmatrix} \begin{bmatrix}
e_{11}\\
e_{21}\\
e_{31}\\
e_{12}\\
e_{22}\\
e_{32}\\
e_{13}\\
e_{23}\\
e_{33} 
 \end{bmatrix} = 0
\end{equation}

  

## Why Eight?
You will notice that there are nine unknowns in $E$ but as mentioned earlier we can only determine a solution up to a scale factor, which leaves eight. One way to fix the scale is to set $e_{33}$ to 1 before we start solving, although this breaks down if the true $e_{33}$ is zero. Another is to require $e_s$ to have unit length.

For $Ae_s = 0$ to have a nonzero solution, $A$ must be of rank at most 8.

If it is of rank 8 then the solution is unique (up to scale). $Ae_s = 0$ says that the vector $e_s$ is in the null space of $A$. So if $A$ had rank 9 then the null space would contain only the zero vector and there would be no useful solution.

For rank less than 8, e.g. 7, there is a whole 2D plane of solutions and there are methods to pick out the solution on this plane that is the best essential matrix in $\mathcal{E}$.

## Noise
**If the data (point correspondences) are not exact (they won't be exact in the real world), the rank of $A$ may be greater than 8.**

9 is the full rank as there are 9 columns.

In this case we can find the least squares solution. 

This is also the case if we use more than 8 point correspondences.

Also note that $A$ needs at least 9 rows, i.e. 9 correspondences, before it can reach rank 9.

So in the case of noise and only 8 points, we will still only have rank 8, but the null space will in general point along the wrong direction.
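The effect of noise on the rank can be seen in the singular values of $A$. The sketch below assumes synthetic data: an arbitrary example motion, 20 random points, and Gaussian noise of standard deviation $10^{-3}$ added to the second view. The helper name `build_A` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed example motion: 10 degrees about the y-axis plus a translation.
theta = np.deg2rad(10.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([1.0, -0.3, 0.2])

# Twenty random 3D points in front of both cameras, projected into each view.
X = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 10.0], size=(20, 3))
x1 = X / X[:, 2:3]
X2 = X @ R.T + T
x2 = X2 / X2[:, 2:3]

def build_A(x1, x2):
    """One row kron(x1_i, x2_i) per correspondence (column-stacked e_s)."""
    return np.stack([np.kron(p1, p2) for p1, p2 in zip(x1, x2)])

s_clean = np.linalg.svd(build_A(x1, x2), compute_uv=False)

# Perturb the image coordinates in the second view only.
x2_noisy = x2.copy()
x2_noisy[:, :2] += 1e-3 * rng.standard_normal((20, 2))
s_noisy = np.linalg.svd(build_A(x1, x2_noisy), compute_uv=False)

print(s_clean[-1] / s_clean[0])  # effectively zero: rank 8
print(s_noisy[-1] / s_noisy[0])  # small but clearly nonzero: rank 9
```

With noise there is no exact null space any more, so the best we can do is take the direction in which $\|Ae_s\|$ is smallest.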

    

## How do we find the Null space of A?
How do we find $e_s$?

Get the SVD of $A$.
\begin{equation}
	A = U\Sigma V^{\top}
\end{equation}

The solution is the column vector of $V$ that corresponds to the smallest singular value in $\Sigma$.

Most numerical libraries (MATLAB, NumPy) return the singular values in descending order.

In that case the solution $e_s$ will be the final column of $V$, which is the final row of $V^{\top}$. Be aware that NumPy's `numpy.linalg.svd` returns $V^{\top}$ rather than $V$.
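Putting the linear step together, here is a sketch on noise-free synthetic data. The motion and the eight points are arbitrary assumptions, and the recovered matrix is compared with the true $T_{\times}R$ only up to scale and sign, since that is all the null space can give us.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

rng = np.random.default_rng(2)

# Assumed example motion: 12 degrees about the y-axis plus a translation.
theta = np.deg2rad(12.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([1.0, -0.2, 0.3])

# Eight random 3D points, projected into both views.
X = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 10.0], size=(8, 3))
x1 = X / X[:, 2:3]
X2 = X @ R.T + T
x2 = X2 / X2[:, 2:3]

# One row per correspondence, ordered to match the column-stacked e_s.
A = np.stack([np.kron(p1, p2) for p1, p2 in zip(x1, x2)])

# NumPy returns V^T, so the null vector is the last ROW of Vt.
_, _, Vt = np.linalg.svd(A)
e_s = Vt[-1]
E_est = e_s.reshape(3, 3, order="F")   # undo the column stacking

# Compare with the true matrix, normalised to unit Frobenius norm, up to sign.
E_true = skew(T) @ R
E_true = E_true / np.linalg.norm(E_true)
err = min(np.linalg.norm(E_est - E_true), np.linalg.norm(E_est + E_true))
print(err)  # effectively zero for noise-free, non-degenerate points
```

The SVD returns a unit-length $e_s$, which fixes the scale without having to single out any one entry of $E$.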

## Projecting $E$ onto the essential space ($\mathcal{E}$)

As we mentioned, calculating $E$ by this manner is unlikely to get us a matrix that obeys all the constraints of the essential space.

Instead we must project onto $\mathcal{E}$.

From Theorem: _Projection onto the essential space_ page 86 An invitation to 3D Vision by Ma, Kosecka, Soatto, Sastry.


Let $E$ be the calculated matrix $\in \mathbb{R}^{3\times3}$.

Perform an SVD on $E$ such that
\begin{equation}
	E = U \begin{bmatrix}
\lambda_1 & 0 & 0\\
0 & \lambda_2 & 0\\
0 & 0 &\lambda_3
 \end{bmatrix}V^{\top}
\end{equation}

Where $\lambda_1 \geq \lambda_2 \geq \lambda_3$, i.e. the singular values are in descending order.
The closest essential matrix (we'll call it $E^*$) is given by 
 \begin{equation}
	E^* = U \begin{bmatrix}
	\sigma & 0 & 0\\
0 & \sigma & 0\\
0 & 0 &0
 \end{bmatrix}V^{\top}, \quad \text{with } \sigma = \frac{\lambda_1+\lambda_2}{2}
\end{equation}
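The projection formula above is a few lines of NumPy. This is a sketch only: the function name `project_to_essential` is hypothetical, and a random matrix stands in for the output of the linear solve.

```python
import numpy as np

def project_to_essential(E):
    """Apply the projection formula above to an arbitrary 3x3 matrix."""
    U, s, Vt = np.linalg.svd(E)        # s is sorted in descending order
    sigma = (s[0] + s[1]) / 2.0        # average the two largest singular values
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

rng = np.random.default_rng(4)
E_raw = rng.standard_normal((3, 3))    # stand-in for the linear estimate
E_star = project_to_essential(E_raw)

s_raw = np.linalg.svd(E_raw, compute_uv=False)
s_star = np.linalg.svd(E_star, compute_uv=False)
print(s_raw)   # three unrelated values
print(s_star)  # (sigma, sigma, ~0) with sigma = (s_raw[0] + s_raw[1]) / 2
```

Because the overall scale of $E$ is arbitrary anyway, some implementations simply set $\sigma = 1$; that is a design choice rather than part of the theorem.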




  



 