# Geometry from Multiple Views


## Opening Assumptions

As in the two-view case, we need to make some assumptions so that we can solve this problem one step at a time.

- We have to assume that we are viewing a static scene and that we have multiple different views of this scene.
- We will assume that we already have a set of point correspondences in our multiple views. How we did this is not our concern in this section.
- We assume that the intrinsic parameters of the camera are known, and that they are the same for all views.

## What does more views bring us?
More views allow us to have more measurements for the same number of 3D points.

This constrains our result even more than the two-view case.

In general we start by looking at the three-view case and then generalise that to the n-view case.

This three-view case can be tackled with matrices (which is how we will do it) or with the trifocal tensor.

The trifocal tensor is a generalisation of the fundamental matrix.

As with the fundamental matrix, the trifocal tensor doesn't depend on the 3D points, but only on the inter-frame camera motion.

The relationship between points and lines encoded by the trifocal tensor is called a trilinear relationship.

## The matrix view
We will not use the trifocal tensor. Instead, we use a matrix formulation that, once again, expresses the constraints imposed by multiple views as rank constraints on certain matrices.


## Pre-image and Co-image
Definition of Pre-image: _The pre-image of a point or a line in 3D is defined to be the subspace (in the camera coordinate frame) spanned by the homogeneous coordinates of the image point, or of the points of the image line, in the image plane._

Definition of Co-image: _The co-image of a point or a line in 3D is defined to be the maximal subspace orthogonal to its pre-image (in the camera coordinate frame), i.e. the orthogonal complement of the pre-image._



## Pre-image of a point


For a point, the pre-image is a 1D subspace, i.e. a line through the camera centre defined by a single vector.
In the figure, the pre-image of the point $p$ in camera 1 is the line spanned by $\vec{x_1}$.

![](images/precoimage.png)


## Pre-image of a line


For a line, the pre-image is a 2D subspace, i.e. a plane which can be defined by any two linearly independent vectors in that plane.

The plane uniquely determines the image of the line as the intersection of the plane with the image plane.
In the figure, the pre-image of the line $L$ in camera 1 is shown in dark blue.
![](images/precoimage.png)

## Co-image of a point

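Since the pre-image of a point is a 1D subspace, its co-image is a 2D subspace: the plane through the camera centre whose normal is the pre-image vector $\vec{x}$. This plane is spanned by the columns of the skew-symmetric matrix $x_{\times}$, because $x^{\top}x_{\times} = 0$.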
![](images/precoimage.png)
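Here is a quick numerical check of this. It is a minimal sketch that assumes NumPy; the helper name `skew` and the point coordinates are illustrative and do not come from the text. The columns of $x_{\times}$ are orthogonal to $x$, and they span a 2D subspace, which is the co-image of the point.

```python
import numpy as np

def skew(v):
    """Return the 3x3 matrix v_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# A homogeneous (calibrated) image point; the values are illustrative only.
x = np.array([0.2, -0.1, 1.0])
x_hat = skew(x)

# Each column of x_x is orthogonal to x, so every column lies in the co-image ...
orthogonality = x_hat.T @ x
# ... and the columns span a 2D subspace: the plane with normal x.
co_image_dim = np.linalg.matrix_rank(x_hat, tol=1e-9)
print(orthogonality, co_image_dim)
```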


## Co-image of a line


Since the pre-image of a line is a 2D subspace, i.e. a plane, its co-image is the 1D subspace spanned by the normal vector $\vec{l}$ of this plane.
These normals are shown in the diagram as $\vec{l_1}$ and $\vec{l_2}$.

![](images/precoimage.png)
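The co-image $\vec{l}$ of an image line can be obtained as the cross product of any two homogeneous image points on that line. Every other point of the line then satisfies $\vec{l}^{\top}x = 0$. The small sketch below assumes NumPy, and its point values are made up.

```python
import numpy as np

# Two homogeneous image points on the same image line (illustrative values).
x_a = np.array([0.1, 0.3, 1.0])
x_b = np.array([-0.4, 0.2, 1.0])

# The co-image l is normal to the pre-image plane spanned by x_a and x_b.
l = np.cross(x_a, x_b)

# Any other point of the image line (here an affine combination) also satisfies l^T x = 0.
x_c = 0.3 * x_a + 0.7 * x_b
residuals = [float(l @ x) for x in (x_a, x_b, x_c)]
print(l, residuals)
```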
	
	
## Pre-image of a point from multiple views


A pre-image of multiple images of a point is the largest set of 3D points that give rise to the same set of multiple images of the point.

In the figure, for the two images, $p$ is the pre-image because it is the intersection of the two lines spanned by $\vec{x_1}$ and $\vec{x_2}$.

![](images/precoimage.png)
	
 
## Pre-image of a line from multiple views


A pre-image of multiple images of a line is the largest set of 3D points that give rise to the same set of multiple images of the line.
In the figure, for the two images, $L$ is the pre-image because it is the intersection of the two planes spanned by the columns of $l_{1\times}$ and $l_{2\times}$, i.e. the planes with normals $\vec{l_1}$ and $\vec{l_2}$.

![](images/precoimage.png)
	
 
## Pre-image intersection

The pre-image of multiple images of a point or a line is the intersection of the individual pre-images:

$$\text{pre-image}(\vec{x_1},\dots, \vec{x_m}) = \text{pre-image}(\vec{x_1}) \cap \cdots \cap \text{pre-image}(\vec{x_m})$$

$$\text{pre-image}(l_1,\dots, l_m) = \text{pre-image}(l_1) \cap \cdots \cap \text{pre-image}(l_m)$$

The pre-image of multiple image lines can either be nothing (empty set), a point, a line or a plane, depending on whether or not they come from the same line in space.
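This intersection can be computed numerically. The sketch below rests on assumptions that are not in the text: NumPy, calibrated cameras with $\Pi_i = [R_i, T_i]$ (so $K = I$), and the hypothetical helpers `rot_y` and `classify_intersection`. The pre-image plane of an image line $l_i$ has world coordinates $\Pi_i^{\top}l_i \in \mathbb{R}^4$. The dimension of the null space of the stacked planes then tells us whether the common pre-image is a plane, a line, a point or empty.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis (illustrative camera motion)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def classify_intersection(planes, rel_tol=1e-9):
    """Classify the common intersection of planes given as rows pi in R^4 (pi^T X = 0).

    The intersection is the right null space of the stacked rows; its dimension
    in R^4 (3, 2, 1 or 0) corresponds to a plane, a line, a point or nothing.
    """
    A = np.vstack(planes)
    s = np.linalg.svd(A, compute_uv=False)
    rank = int(np.sum(s > rel_tol * s[0]))
    return {3: "plane", 2: "line", 1: "point", 0: "empty"}[4 - rank]

# Hypothetical calibrated cameras Pi_i = [R_i | T_i] (K = I).
cams = [np.hstack([rot_y(a), np.reshape(t, (3, 1))])
        for a, t in [(0.0, [0.0, 0.0, 0.0]), (0.2, [-1.0, 0.0, 0.0]), (-0.2, [1.0, 0.5, 0.0])]]

# A 3D line through X0 with direction V, in homogeneous coordinates.
X0 = np.array([0.0, 0.0, 5.0, 1.0])
V = np.array([1.0, 1.0, 0.5, 0.0])

# Co-image l_i of each image line; its pre-image plane in world coordinates is Pi_i^T l_i.
lines = [np.cross(P @ X0, P @ (X0 + V)) for P in cams]
planes = [P.T @ l for P, l in zip(cams, lines)]
print(classify_intersection(planes))      # line

# Replace the third image line by one that is not an image of L.
bad_l3 = lines[2] + np.array([0.0, 0.0, 0.5])
planes_bad = planes[:2] + [cams[2].T @ bad_l3]
print(classify_intersection(planes_bad))  # point
```

With three consistent images, the planes share the line $L$. If one image line is replaced by a line that is not an image of $L$, only a single common point remains.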




## Multiple-view image formation

Assume we have a moving camera. At time $t$, let $x(t)$ denote the homogeneous image coordinates of a 3D point with homogeneous world coordinates $\mathbf{X}$:

\begin{equation}
	\lambda(t)x(t) = K(t)\Pi_0g(t)\mathbf{X}
\end{equation}

where $\lambda(t)$ denotes the depth of the point, $K(t)$ denotes the intrinsic parameters and $\Pi_0 = [I, 0] \in \mathbb{R}^{3\times4}$ denotes the standard (generic) projection.

\begin{equation}
	g(t) = \begin{bmatrix}
R(t) & T(t)\\
0 & 1
\end{bmatrix} \in SE(3)
\end{equation}

which, as you will recall, denotes the rigid-body motion at time $t$, transforming world coordinates into the camera frame.
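Below is a minimal sketch of this projection model. It assumes NumPy and an intrinsic matrix whose last row is $[0, 0, 1]$, so that $\lambda$ is simply the third entry of $K\Pi_0 g\mathbf{X}$. The function name `project` and all numbers are illustrative.

```python
import numpy as np

def project(K, R, T, X):
    """Return (x, lam) with lam * x = K Pi_0 g X and x normalized so that x[2] == 1."""
    g = np.eye(4)
    g[:3, :3] = R                                    # rotation part of g in SE(3)
    g[:3, 3] = T                                     # translation part
    Pi0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # standard projection [I | 0]
    y = K @ Pi0 @ g @ X                              # this equals lam * x
    lam = y[2]                                       # depth, since the last row of K is [0, 0, 1]
    return y / lam, lam

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
x, lam = project(K, np.eye(3), np.array([0.0, 0.0, 2.0]),
                 np.array([0.1, -0.2, 3.0, 1.0]))
print(x, lam)  # x is approximately [330, 220, 1], lam is 5.0
```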

## 3D line $L$


A 3D line $L$ in homogeneous coordinates can be written as,
\begin{equation}
	L = \{\mathbf{X}|\mathbf{X}=\mathbf{X}_0 + \mu\mathbf{V}, \mu \in \mathbb{R}\} \subset \mathbb{R}^4
\end{equation}

where $\mathbf{X}_0 = [X_0,Y_0,Z_0,1]^{\top} \in \mathbb{R}^4$ are the coordinates of the base point $p_0$ and $\mathbf{V} = [V_1,V_2,V_3,0]^{\top} \in \mathbb{R}^4$ is a nonzero vector indicating the line direction.

![](images/precoimage.png)
	


## Co-image of the line $L$ over time


The pre-image of $L$ with respect to the image at time $t$ is a plane with normal $l(t)$, the co-image of $L$. The vector $l(t)$ is orthogonal to every image point $x(t)$ of the line:

\begin{equation}
	l(t)^{\top}x(t) = l(t)^{\top}K(t)\Pi_0g(t)\mathbf{X} = 0
\end{equation}

Assume we have a set of $m$ images taken at times $t_1,\dots,t_m$, and write $\lambda_i=\lambda(t_i)$, $x_i = x(t_i)$, $l_i = l(t_i)$ and $\Pi_i=K(t_i)\Pi_0g(t_i)$.
![](images/precoimage.png)
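The following is a hedged numerical check of the constraint $l(t)^{\top}K\Pi_0 g(t)\mathbf{X} = 0$. It assumes NumPy and a made-up camera $(K, R, T)$ at a single time $t$, and it computes the co-image of the line as the cross product of the images of two of its points. The residual then vanishes for every point $\mathbf{X} = \mathbf{X}_0 + \mu\mathbf{V}$ on $L$, not only for the two points used to build $l(t)$.

```python
import numpy as np

# Hypothetical camera at one time t: intrinsics K and motion g(t) = (R, T).
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.5, 0.0, 1.0])
Pi_t = K @ np.hstack([R, T[:, None]])     # K Pi_0 g(t) = K [R | T]

# The 3D line L = {X0 + mu V}.
X0 = np.array([0.0, 1.0, 4.0, 1.0])
V = np.array([1.0, 0.0, 1.0, 0.0])

# Co-image l(t): normal to the plane spanned by the images of two points of L.
l_t = np.cross(Pi_t @ X0, Pi_t @ (X0 + V))
l_t /= np.linalg.norm(l_t)                # only the direction of l(t) matters

# l(t)^T K Pi_0 g(t) X is (numerically) zero for every sampled point of L.
residuals = [float(l_t @ Pi_t @ (X0 + mu * V)) for mu in np.linspace(-3.0, 3.0, 7)]
print(max(abs(r) for r in residuals))
```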
	


## Relating images to world coordinates


We can now relate the $i^{th}$ image of a point $p$ to its world coordinates $\mathbf{X}$:

\begin{equation}
	\lambda_ix_i=\Pi_i\mathbf{X}
\end{equation}
and the $i^{th}$ co-image of a line $L$ to its world coordinates $(\mathbf{X}_0, \mathbf{V})$:
\begin{equation}
	l^{\top}_i\Pi_i\mathbf{X}_0=l_i^{\top}\Pi_i\mathbf{V}=0
\end{equation}
    
![](images/precoimage.png)
	


## Pre-images and Rank Constraints

As in the two-view case, we need to eliminate the unknown 3D coordinates of points (and lines) from the equations above if we are to solve the system.

We want equations that are in only the 2D coordinates, which we know.

Take the images of a 3D point $\mathbf{X}$ captured in multiple views, and stack their projection equations:
\begin{equation}
	\mathcal{I}\vec{\lambda} \equiv 
	\begin{bmatrix}
	x_1 & 0 & \cdots & 0 \\
    0 & x_2 & \cdots & 0\\
	\vdots & \vdots & \ddots & \vdots\\
	0 & 0 & \cdots & x_m\\
	\end{bmatrix}\begin{bmatrix}
	\lambda_1  \\
    \lambda_2 \\
	\vdots \\
	\lambda_m\\
	\end{bmatrix}=\begin{bmatrix}
	\Pi_1  \\
    \Pi_2 \\
	\vdots \\
	\Pi_m\\
	\end{bmatrix}\mathbf{X}\equiv \Pi\mathbf{X}
\end{equation}
or compactly:

\begin{equation}
	\mathcal{I}\vec{\lambda} =\Pi\mathbf{X}
\end{equation}
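To make the stacked notation concrete, here is a sketch that simulates three images of one point and checks that $\mathcal{I}\vec{\lambda} = \Pi\mathbf{X}$. It assumes NumPy, calibrated cameras ($K = I$, $\Pi_i = [R_i, T_i]$) and the hypothetical helpers `rot_y` and `image_matrix`.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis (illustrative camera motion)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def image_matrix(xs):
    """Block-diagonal image matrix I in R^{3m x m} with x_i in column i."""
    I = np.zeros((3 * len(xs), len(xs)))
    for i, x in enumerate(xs):
        I[3 * i:3 * i + 3, i] = x
    return I

# Hypothetical calibrated cameras Pi_i = [R_i | T_i] (K = I) and a point X.
Pis = [np.hstack([rot_y(a), np.reshape(t, (3, 1))])
       for a, t in [(0.0, [0.0, 0.0, 0.0]), (0.15, [-1.0, 0.0, 0.2]), (-0.1, [0.8, 0.3, 0.0])]]
X = np.array([0.5, -0.3, 6.0, 1.0])

# Simulate the images: lambda_i = third entry of Pi_i X, x_i = Pi_i X / lambda_i.
lam = np.array([(P @ X)[2] for P in Pis])
xs = [P @ X / l for P, l in zip(Pis, lam)]

I = image_matrix(xs)   # in R^{3m x m}
Pi = np.vstack(Pis)    # in R^{3m x 4}
print(np.allclose(I @ lam, Pi @ X))  # True
```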




## Pre-images and Rank Constraints

$$\mathcal{I}\vec{\lambda} =\Pi\mathbf{X}$$
where $\vec{\lambda} \in \mathbb{R}^m$ is the depth scale vector and $\Pi \in \mathbb{R}^{3m\times4}$ is the multiple-view projection matrix associated with the image matrix $\mathcal{I} \in \mathbb{R}^{3m\times m}$.

Just as in the two-view case, this equation is not yet useful on its own: everything in it except the image coordinates $x_i$ is unknown.

So our goal is to decouple the above equations into constraints that allow us to recover the camera displacements $\Pi_i$ first, and then the scene structure $\lambda_i$ and $\mathbf{X}$.

## Point Features

The vector $\mathcal{I}\vec{\lambda}$, a linear combination of the columns of $\mathcal{I}$, must lie in the (at most) 4D subspace spanned by the columns of the matrix $\Pi$.

In order to have a solution to the above equation, the columns of $\mathcal{I}$ and $\Pi$ must therefore be linearly **dependent**,
i.e. 
\begin{equation}
	N_p \equiv (\Pi,\mathcal{I}) = \begin{bmatrix}
	\Pi_1 & x_1 & 0 & \cdots & 0 \\
    \Pi_2 & 0 & x_2 &  0 & 0\\
	\vdots & \vdots & \vdots & \ddots & \vdots\\
	\Pi_m & 0 & 0 & \cdots & x_m\\
	\end{bmatrix} \in \mathbb{R}^{3m\times(m+4)}
\end{equation}

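must have a non-trivial right null space; indeed, $N_p\begin{bmatrix}\mathbf{X}\\ -\vec{\lambda}\end{bmatrix} = \Pi\mathbf{X} - \mathcal{I}\vec{\lambda} = 0$. For $m \geq 2$ (i.e. $3m \geq m+4$), full rank would be $m+4$, so linear dependence of the columns implies the rank constraint

\begin{equation}
	\text{rank}(N_p) \leq m+3
\end{equation}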
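This sketch uses simulated data to show the rank drop. Its assumptions are NumPy, calibrated cameras, and a hypothetical helper `numerical_rank` that applies a relative singular-value threshold we chose. With exact correspondences, the rank of $N_p$ is $m+3$ and $[\mathbf{X}; -\vec{\lambda}]$ is a null vector. Perturbing a single image point makes $N_p$ full rank, because no common $\mathbf{X}$ exists any more.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis (illustrative camera motion)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def image_matrix(xs):
    """Block-diagonal image matrix I in R^{3m x m} with x_i in column i."""
    I = np.zeros((3 * len(xs), len(xs)))
    for i, x in enumerate(xs):
        I[3 * i:3 * i + 3, i] = x
    return I

def numerical_rank(A, rel_tol=1e-9):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Simulated calibrated views Pi_i = [R_i | T_i] of one point X (values are made up).
Pis = [np.hstack([rot_y(a), np.reshape(t, (3, 1))])
       for a, t in [(0.0, [0.0, 0.0, 0.0]), (0.15, [-1.0, 0.0, 0.2]), (-0.1, [0.8, 0.3, 0.0])]]
X = np.array([0.5, -0.3, 6.0, 1.0])
lam = np.array([(P @ X)[2] for P in Pis])
xs = [P @ X / l for P, l in zip(Pis, lam)]
m = len(Pis)

Pi = np.vstack(Pis)
N_p = np.hstack([Pi, image_matrix(xs)])   # in R^{3m x (m+4)}
null_vec = np.concatenate([X, -lam])      # N_p [X; -lambda] = Pi X - I lambda = 0
print(numerical_rank(N_p), m + 3)         # exact correspondences: rank m + 3

# Perturbing one image point destroys the common solution: N_p becomes full rank.
xs_noisy = [xs[0] + np.array([1e-3, 0.0, 0.0])] + xs[1:]
N_noisy = np.hstack([Pi, image_matrix(xs_noisy)])
print(numerical_rank(N_noisy), m + 4)
```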


We can make a more compact formulation as follows.

First introduce the following matrix.

\begin{equation}
	\mathcal{I}^{\perp} \equiv \begin{bmatrix}
	x_{1\times} & 0 & \cdots & 0 \\
    0 & x_{2\times} & \cdots & 0\\
	\vdots & \vdots & \ddots & \vdots\\
	0 & 0 & \cdots & x_{m\times}\\
	\end{bmatrix} \in \mathbb{R}^{3m\times3m}
\end{equation}

which annihilates $\mathcal{I}$, because $x_{i\times}x_i = x_i \times x_i = 0$ for every block:

\begin{equation}
	\mathcal{I}^{\perp}\mathcal{I} = 0
\end{equation}

So we can premultiply $\mathcal{I}\vec{\lambda} =\Pi\mathbf{X}$ by $\mathcal{I}^{\perp}$ to eliminate the unknown depths and get

\begin{equation}
	\mathcal{I}^{\perp}\Pi\mathbf{X} = 0
\end{equation}



Once again the solution is defined by a null space: $\mathbf{X}$ lies in the null space of the matrix

\begin{equation}
	W_p \equiv \mathcal{I}^{\perp}\Pi = 
	\begin{bmatrix}
	x_{1\times}\Pi_1  \\
   x_{2\times}\Pi_2 \\
	\vdots \\
    x_{m\times}\Pi_m
	\end{bmatrix}  \in \mathbb{R}^{3m\times4}
\end{equation}

Since $\mathbf{X} \neq 0$ (its last coordinate is $1$), this null space must be non-trivial, so we must have

\begin{equation}
	\text{rank}(W_p) \leq 3
\end{equation}
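The null-space characterisation also gives a way to recover $\mathbf{X}$: take the right singular vector of $W_p$ associated with its smallest singular value. This is standard linear (SVD-based) triangulation. The sketch below assumes NumPy, calibrated cameras and exact simulated correspondences, and it also checks that $\mathcal{I}^{\perp}\mathcal{I} = 0$. With noisy data, the same singular vector is the unit vector that minimises $\|W_p\mathbf{X}\|$.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis (illustrative camera motion)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def skew(v):
    """3x3 matrix v_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Simulated calibrated views Pi_i = [R_i | T_i] of a point X (values are made up).
Pis = [np.hstack([rot_y(a), np.reshape(t, (3, 1))])
       for a, t in [(0.0, [0.0, 0.0, 0.0]), (0.15, [-1.0, 0.0, 0.2]), (-0.1, [0.8, 0.3, 0.0])]]
X = np.array([0.5, -0.3, 6.0, 1.0])
xs = [P @ X / (P @ X)[2] for P in Pis]
m = len(xs)

# I and I^perp: block-diagonal stacks of x_i and x_i_x.
I = np.zeros((3 * m, m))
I_perp = np.zeros((3 * m, 3 * m))
for i, x in enumerate(xs):
    I[3 * i:3 * i + 3, i] = x
    I_perp[3 * i:3 * i + 3, 3 * i:3 * i + 3] = skew(x)

# W_p = I^perp Pi; X spans its null space, so take the last right singular vector.
W_p = I_perp @ np.vstack(Pis)
_, s, Vt = np.linalg.svd(W_p)
X_est = Vt[-1] / Vt[-1][3]   # rescale so that the last coordinate is 1

print(np.allclose(I_perp @ I, 0), np.allclose(X_est, X))  # True True
print(s)  # the last singular value is numerically zero: rank(W_p) = 3
```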

## Line Features

Just as with the point features, we can use a rank constraint for lines. 

For the co-images $l_i$ of a line $L$ with base point $\mathbf{X}_0$ and direction $\mathbf{V}$, we have:

\begin{equation}
	l_i^{\top}\Pi_i\mathbf{X}_0=l_i^{\top}\Pi_i\mathbf{V} = 0
\end{equation}

Don't let the compact notation fool you: the equation does not say that $\mathbf{X}_0 = \mathbf{V}$.
Instead, it says that both $\mathbf{X}_0$ and $\mathbf{V}$ are in the null space of $l_i^{\top}\Pi_i$.


Let us construct the following matrix
\begin{equation}
	W_l \equiv \begin{bmatrix}
l_1^{\top}\Pi_1\\
l_2^{\top}\Pi_2\\
\vdots\\
l_m^{\top}\Pi_m
 \end{bmatrix} \in \mathbb{R}^{m\times4}
\end{equation}


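An $m\times4$ matrix can have rank at most $4$.
We know that there are at least two linearly independent vectors in its null space, $\mathbf{X}_0$ and $\mathbf{V}$. They are independent because the last coordinate of $\mathbf{X}_0$ is $1$, while that of $\mathbf{V}$ is $0$.

Therefore $W_l$ can have rank at most $2$:
$$\text{rank}(W_l) \leq 2$$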
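Here is a sketch of the line constraint. It assumes NumPy, calibrated cameras, and co-images computed from two projected points of $L$; all values are illustrative. With three consistent views, $W_l$ has rank $2$ and both $\mathbf{X}_0$ and $\mathbf{V}$ lie in its null space. The two-dimensional null space returned by the SVD lets us recover the line direction up to scale.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis (illustrative camera motion)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def numerical_rank(A, rel_tol=1e-9):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Simulated calibrated views Pi_i = [R_i | T_i] and a 3D line X0 + mu V.
Pis = [np.hstack([rot_y(a), np.reshape(t, (3, 1))])
       for a, t in [(0.0, [0.0, 0.0, 0.0]), (0.2, [-1.0, 0.0, 0.0]), (-0.2, [1.0, 0.5, 0.0])]]
X0 = np.array([0.0, 0.0, 5.0, 1.0])
V = np.array([1.0, 1.0, 0.5, 0.0])

# Row i of W_l is l_i^T Pi_i, where l_i is the co-image of the i-th image line.
rows = []
for P in Pis:
    l = np.cross(P @ X0, P @ (X0 + V))
    rows.append(l @ P / np.linalg.norm(l))  # the scale of l_i does not matter
W_l = np.vstack(rows)
print(numerical_rank(W_l))                       # 2
print(np.allclose(W_l @ X0, 0), np.allclose(W_l @ V, 0))

# The last two right singular vectors span {X0, V}; their combination with a zero
# last coordinate is a direction, which should be parallel to V.
_, _, Vt = np.linalg.svd(W_l)
N = Vt[2:].T
d = N @ np.array([N[3, 1], -N[3, 0]])
print(np.allclose(np.cross(d[:3], V[:3]), 0))    # True
```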
So the question is: how many images of the line do we need?

Well, if we had only two images, then $W_l \in \mathbb{R}^{2\times4}$ would have rank at most $2$ automatically, whatever the two image lines are.

But this would not constrain anything.



This would simply be stating that any two planes in space must meet each other in a line... somewhere.

It wouldn't uniquely identify our line.

Instead, if we have three or more planes and they all meet in the same line, then we have a unique identification.

This is why line correspondences impose no constraint in the two-view case, but become useful in the multiple-view case.
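This counting argument can be checked numerically. The sketch assumes NumPy and calibrated cameras, and it draws the image lines at random so that they do not come from any common 3D line. Two views always give $\text{rank}(W_l) = 2$, so they impose no constraint. Adding a third random view generically raises the rank to $3$, which reveals that the lines are inconsistent.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis (illustrative camera motion)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def numerical_rank(A, rel_tol=1e-9):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical calibrated cameras Pi_i = [R_i | T_i].
Pis = [np.hstack([rot_y(a), np.reshape(t, (3, 1))])
       for a, t in [(0.0, [0.0, 0.0, 0.0]), (0.2, [-1.0, 0.0, 0.0]), (-0.2, [1.0, 0.5, 0.0])]]

# Image lines drawn at random: they do NOT come from any common 3D line.
rng = np.random.default_rng(0)
random_lines = [rng.normal(size=3) for _ in Pis]

W_two = np.vstack([l @ P for l, P in zip(random_lines[:2], Pis[:2])])
W_three = np.vstack([l @ P for l, P in zip(random_lines, Pis)])
print(numerical_rank(W_two))    # generically 2: any two planes meet in a line
print(numerical_rank(W_three))  # generically 3: the third view exposes the inconsistency
```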



## Only two views of a line


![](images/LinesInTwoViews.png)
Ambiguous reconstruction with only two views of a line

## Degenerate three-view of a line


![](images/Line3ViewDegenerate.png)
	
 


## Consistent three-view of a line


![](images/Line3ViewGood.png)
	
 