!["HCI Banner Logos for ATU Sligo, the HCI Human capital initiative and Higher Education 4.0"](images/HCIBanner.png)

# Geometry from Multiple Views


## Opening Assumptions

As with the two-view case, we need to make some assumptions so that we can solve this problem one step at a time.

- We assume that we are viewing a static scene and that we have multiple different views of it.
- We assume that we already have a set of point correspondences across our views. How we obtained them is not our concern in this section.
- We assume that we know the intrinsic parameters of the camera and that they are the same for all views.

## What do more views bring us?
More views allow us to have more measurements for the same number of 3D points.

This constrains our result even more than the two-view case.

In general we start by looking at the three-view case and then generalise that to the n-view case.

This three-view case can be tackled with matrices (which is how we will do it) or with the trifocal tensor.

The trifocal tensor is a generalisation of the fundamental matrix.

As with the fundamental matrix, the trifocal tensor doesn't depend on the 3D points, but only on the inter-frame camera motion.

The relationship between points and lines encoded by the trifocal tensor is called a trilinear relationship.

## The matrix view
We will not use the trifocal tensor but instead use a matrix notation which once again will make use of rank constraints which are imposed by the constraints from the multiple views.


# Modify the basic epipolar program to demonstrate pre-image and co-image 

## Pre-image and Co-image
Definition of Pre-image: _The pre-image of a point or a line in 3D is defined by the subspace (in the camera coordinate frame) spanned by the homogeneous coordinates of the image point or points of the line in the plane._

Definition of Co-image: _The co-image of a point or a line in 3D is defined to be the maximal subspace orthogonal to its pre-image (in the camera coordinate frame), i.e. its orthogonal complement._



## Pre-image of a point


For a point, the pre-image is a 1D subspace, i.e. a line defined by a vector.
In the figure the pre-image of the point $p$ in camera 1 is $\vec{x_1}$.

![](images/precoimage.png)


## Pre-image of a line


For a line, the pre-image is a 2D subspace, i.e. a plane which can be defined by any two linearly independent vectors in that plane.

The plane uniquely determines the image of the line in the image plane as the intersection of the plane with the image plane.
In the figure the pre-image of the line $L$ in camera 1 is the dark blue line.
	

## Co-image of a point

As the pre-image of a point is a 1D subspace then its co-image is a plane with the pre-image vector being the normal to that plane.



## Co-image of a line


As the pre-image of a line is a 2D subspace, i.e. a plane, then its co-image is a vector that is normal to this plane.
Shown in the diagram as $\vec{l_1}$ and $\vec{l_2}$.
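The definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy and two illustrative image points `x_a` and `x_b` (made-up values, not taken from the figure). The helper `hat` is a hypothetical name for the skew-symmetric matrix written $x_{\times}$ in this section.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Two image points in homogeneous coordinates (illustrative values only)
x_a = np.array([0.2, -0.1, 1.0])
x_b = np.array([-0.3, 0.4, 1.0])

# Point: the pre-image is span(x_a), a 1D subspace.
# Its co-image is the plane orthogonal to x_a, spanned by the columns of hat(x_a).
co_image_point = hat(x_a)
point_residual = co_image_point @ x_a               # x_a cross x_a = 0
co_image_dim = np.linalg.matrix_rank(co_image_point)

# Line through x_a and x_b: the pre-image is the plane span(x_a, x_b).
# Its co-image is the normal vector of that plane.
l = np.cross(x_a, x_b)
line_residuals = np.array([l @ x_a, l @ x_b])

print("hat(x_a) @ x_a      :", point_residual)
print("dim co-image (point):", co_image_dim)
print("l . x_a, l . x_b    :", line_residuals)
```

The co-image of the point has dimension 2 (a plane) and the co-image of the line has dimension 1 (a vector), matching the descriptions above.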


	
	
## Pre-image of a point from multiple views


A pre-image of multiple images of a point is the largest set of 3D points that give rise to the same set of multiple images of the point.

In the figure for the two images, $p$ is the pre-image as it is the intersection of the two lines spanned by $\vec{x_1}$ and $\vec{x_2}$.


	
 
## Pre-image of a line from multiple views


A pre-image of multiple images of a line is the largest set of 3D points that give rise to the same set of multiple images of the line.
In the figure for the two images, $L$ is the pre-image as it is the intersection of the two planes $l_{1\times}$ and $l_{2\times}$.

 
## Pre-image intersection

The pre-image in multiple images of points and lines can be defined by the intersection.
pre-image $(\vec{x_1},\dots, \vec{x_m})$ = pre-image$(\vec{x_1}) \cap \cdots \cap$ pre-image$(\vec{x_m})$

pre-image $(l_1,\dots, l_m)$ = pre-image$(l_1) \cap \cdots \cap$ pre-image$(l_m)$  

The pre-image of multiple image lines can either be nothing (empty set), a point, a line or a plane, depending on whether or not they come from the same line in space.




Assume we have a moving camera. At time $t$, let $x(t)$ denote the image coordinates of a 3D point $\mathbf{X}$, both in homogeneous coordinates.

\begin{equation}
	\lambda(t)x(t) = K(t)\Pi_0g(t)\mathbf{X}
\end{equation}

where $\lambda(t)$ denotes the depth of the point, $K(t)$ denotes the intrinsic parameters, $\Pi_0$ denotes the generic projection and

\begin{equation}
	g(t) = \begin{bmatrix}
R(t) & T(t)\\
0 & 1
\end{bmatrix} \in SE(3)
\end{equation}

which, as you will recall, denotes the rigid-body motion at time $t$.
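As a minimal sketch of the projection equation $\lambda(t)x(t) = K(t)\Pi_0g(t)\mathbf{X}$ at a single time instant, assuming NumPy, a calibrated camera ($K = I$) and made-up values for the motion and the world point:

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

K = np.eye(3)                                     # intrinsics (assumed identity)
Pi0 = np.hstack([np.eye(3), np.zeros((3, 1))])    # generic projection [I | 0]

# Rigid-body motion g = [[R, T], [0, 1]] in SE(3); values are illustrative
R, T = rot_y(10.0), np.array([0.2, 0.0, 0.1])
g = np.eye(4)
g[:3, :3], g[:3, 3] = R, T

X = np.array([0.5, -0.2, 3.0, 1.0])               # homogeneous world point

lam_x = K @ Pi0 @ g @ X                           # this is lambda * x
lam = lam_x[2]                                    # depth of the point
x = lam_x / lam                                   # homogeneous image point

print("depth lambda :", lam)
print("image point x:", x)
```

Dividing by the third coordinate recovers both the depth $\lambda$ and the normalised image point $x$ from the single product.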

## 3D line $L$


A 3D line $L$ in homogeneous coordinates can be written as,
\begin{equation}
	L = \{\mathbf{X}|\mathbf{X}=\mathbf{X}_0 + \mu\mathbf{V}, \mu \in \mathbb{R}\} \subset \mathbb{R}^4
\end{equation}

Where $\mathbf{X}_0 = [X_0,Y_0,Z_0,1]^{\top} \in \mathbb{R}^4$ are the coordinates of the base point $p_0$ and 

$\mathbf{V} = [V_1,V_2,V_3,0]^{\top} \in \mathbb{R}^4$ is a nonzero 

vector indicating the line direction.

![](images/precoimage.png)

The pre-image of $L$ w.r.t. the image at time $t$ is a plane with normal $l(t)$. The vector $l(t)$ is orthogonal to the image $x(t)$ of every point on the line:

\begin{equation}
	l(t)^{\top}x(t) = l(t)^{\top}K(t)\Pi_0g(t)\mathbf{X} = 0
\end{equation}

Assume we have a set of $m$ images at times $t_1,\dots,t_m$, where

- $\lambda_i=\lambda(t_i)$,
- $x_i = x(t_i)$,
- $l_i = l(t_i)$,
- $\Pi_i=K(t_i)\Pi_0g(t_i)$.


We can now relate the $i^{th}$ image of a point $p$ to its world coordinates $\mathbf{X}$:

\begin{equation}
	\lambda_ix_i=\Pi_i\mathbf{X}
\end{equation}

and the $i^{th}$ co-image of a line $L$ to its world coordinates 
$(\mathbf{X}_0, \mathbf{V})$:
\begin{equation}
	l^{\top}_i\Pi_i\mathbf{X}_0=l_i^{\top}\Pi_i\mathbf{V}=0	
\end{equation}
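The line equation can be verified with a short sketch, assuming NumPy, a calibrated camera and illustrative values for $\Pi_i$, $\mathbf{X}_0$ and $\mathbf{V}$. The co-image $l_i$ is built here as the cross product of the images of two points on the line, which is one convenient construction and not the only one.

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Pi_i = [R | T] with K = I (illustrative motion)
Pi = np.hstack([rot_y(10.0), np.array([[0.2], [0.0], [0.1]])])

X0 = np.array([0.0, 0.5, 4.0, 1.0])   # base point of the line
V = np.array([1.0, 0.2, 0.3, 0.0])    # direction, last coordinate is 0

# Images (up to scale) of two distinct points on the line
x_a = Pi @ X0
x_b = Pi @ (X0 + 1.5 * V)

# Co-image of the line: normal of the plane spanned by x_a and x_b
l = np.cross(x_a, x_b)

residuals = np.array([l @ Pi @ X0, l @ Pi @ V])
print("l^T Pi X0, l^T Pi V:", residuals)
```

Both residuals vanish up to rounding, so $\mathbf{X}_0$ and $\mathbf{V}$ each satisfy the constraint separately.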
    

	


## Pre-images and Rank Constraints

As we did in the two-view case, we need to remove the 3D coordinates of points (and lines) from the equations above if we are to solve the system.

We want equations that are in only the 2D coordinates, which we know.

Stacking the images of a 3D point $\mathbf{X}$ captured in multiple views gives:
\begin{equation}
	\mathcal{I}\vec{\lambda} \equiv 
	\begin{bmatrix}
	x_1 & 0 & \cdots & 0 \\
    0 & x_2 & \cdots & 0\\
	\vdots & \vdots & \ddots & \vdots\\
	0 & 0 & \cdots & x_m\\
	\end{bmatrix}\begin{bmatrix}
	\lambda_1  \\
    \lambda_2 \\
	\vdots \\
	\lambda_m\\
	\end{bmatrix}=\begin{bmatrix}
	\Pi_1  \\
    \Pi_2 \\
	\vdots \\
	\Pi_m\\
	\end{bmatrix}\mathbf{X}\equiv \Pi\mathbf{X}
\end{equation}
or compactly:

\begin{equation}
	\mathcal{I}\vec{\lambda} =\Pi\mathbf{X}
\end{equation}




where $\vec{\lambda} \in \mathbb{R}^m$ is the depth scale vector, and $\Pi \in \mathbb{R}^{3m\times4}$ is the multiple-view projection matrix associated with the image matrix $\mathcal{I} \in \mathbb{R}^{3m\times m}$

Just as with the two-view case, this equation is not of use yet as everything in it is unknown apart from the 2D coordinates.

So our goal is to decouple the above equations into constraints which allow us to separately recover the camera displacements $\Pi_i$ first and then the scene structure $\lambda_i$ and $\mathbf{X}$.
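To make the shapes concrete, here is a minimal sketch that builds $\mathcal{I}$, $\vec{\lambda}$ and $\Pi$ for three views and checks $\mathcal{I}\vec{\lambda} = \Pi\mathbf{X}$. It assumes NumPy, calibrated cameras, and a made-up scene. In a real problem only the $x_i$ would be known.

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical three-view setup with K = I (values are illustrative)
Pis = [np.hstack([np.eye(3), np.zeros((3, 1))]),
       np.hstack([rot_y(10.0), np.array([[-0.5], [0.0], [0.1]])]),
       np.hstack([rot_y(-8.0), np.array([[0.4], [0.2], [0.0]])])]
X = np.array([0.3, -0.2, 4.0, 1.0])
m = len(Pis)

lam = np.array([(P @ X)[2] for P in Pis])    # depth scale vector
xs = [(P @ X) / (P @ X)[2] for P in Pis]     # image points x_i

# Image matrix: x_i sits in block row i, column i
I_mat = np.zeros((3 * m, m))
for i, x in enumerate(xs):
    I_mat[3 * i:3 * i + 3, i] = x

Pi = np.vstack(Pis)                          # multiple-view projection matrix

print("I shape :", I_mat.shape)
print("Pi shape:", Pi.shape)
print("I @ lambda == Pi @ X:", np.allclose(I_mat @ lam, Pi @ X))
```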




import math
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import matplotlib.patches as patches
from matplotlib.gridspec import GridSpec

camera_coords = np.array(np.zeros([3,2,2]))
epipole_coords = np.array(np.zeros([2,2]))
lambda1 = 0
lambda2 = 0
x1 = 0
x2 = 0
num_points = 3



def update_2d_plots(fig,gs, camera_coords, epipole_coords, ep):
    
    #fig, axs = plt.subplots(1, 2, figsize=(10, 5))
    #axs = [fig.add_subplot(1, 3, 2), fig.add_subplot(1, 3, 3)]
    axs = [fig.add_subplot(gs[0,3]),fig.add_subplot(gs[1,3])]
    # Clear existing plots
    axs[0].cla()
    axs[1].cla()
    
    # Update and configure the first plot (Camera 1 View)
    axs[0].set_title('Camera 1 View')
    axs[0].set_xlim(-1, 1)
    axs[0].set_ylim(-0.5, 0.5)
    rect1 = patches.Rectangle((-0.6, -0.4), 1.2, 0.8, color=(0, 1, 1, 0.5))  # Cyan color
    axs[0].add_patch(rect1)
    axs[0].plot([camera_coords[0][0][0], camera_coords[2][0][0]], [camera_coords[0][0][1], camera_coords[2][0][1]], color='blue')
    for i in range(0,num_points):
        axs[0].scatter(camera_coords[i][0][0], camera_coords[i][0][1], color='magenta')
        axs[0].text(camera_coords[i][0][0], camera_coords[i][0][1], f"x{i}", color='black')
        
        if ep:
            axs[0].scatter(epipole_coords[0][0], epipole_coords[0][1], color='black', marker='x')
            axs[0].plot([camera_coords[i][0][0], epipole_coords[0][0]], [camera_coords[i][0][1], epipole_coords[0][1]], color='green')
            


        # Update and configure the second plot (Camera 2 View)
    axs[1].set_title('Camera 2 View')
    axs[1].set_xlim(-1, 1)
    axs[1].set_ylim(-0.5, 0.5)
    rect2 = patches.Rectangle((-0.6, -0.4), 1.2, 0.8, color=(1, 1, 0, 0.2))  # Yellow color
    axs[1].add_patch(rect2)
    axs[1].plot([camera_coords[0][1][0], camera_coords[2][1][0]], [camera_coords[0][1][1], camera_coords[2][1][1]], color='red')
    for i in range(0,num_points):
        axs[1].scatter(camera_coords[i][1][0], camera_coords[i][1][1], color='green')
        axs[1].text(camera_coords[i][1][0], camera_coords[i][1][1], f"x{i}\'", color='black')
        
        if ep:
            axs[1].plot([camera_coords[i][1][0], epipole_coords[1][0]], [camera_coords[i][1][1], epipole_coords[1][1]], color='magenta')
            axs[1].scatter(epipole_coords[1][0], epipole_coords[1][1], color='black', marker='x')
    # Redraw the plots
    plt.draw()


# Define a function to update the plot with both elevation and azimuth angles
def update_plot(elev_angle, azim_angle, roll_angle, FL2, Alpha, Beta, Gamma, tx, ty, tz, ep):
    # Create a new matplotlib figure and axis
    fig = plt.figure(figsize=(20, 6))
    gs = GridSpec(2, 4, figure=fig)
    #fig = plt.figure(figsize=(30, 10))
    #ax = fig.add_subplot(111, projection='3d')
    ax = fig.add_subplot(gs[0:4,:], projection='3d')
    # Set axes labels and limits
    ax.set_xlabel('X axis')
    ax.set_ylabel('Y axis')
    ax.set_zlabel('Z axis')
    ax.set_xlim([-2, 2])
    ax.set_ylim([-2, 2])
    ax.set_zlim([0, 4])
    global E_mat
    global rot
    global T
    global world_coord


    # Second camera array
    K=np.array([[1,0,0,0],
               [0,1,0,0],
               [0,0,1,0],
               [0,0,0,1]])
    
    T = np.array([[1,0,0,tx],
                   [0,1,0,ty],
                   [0,0,1,tz],
                   [0,0,0,1]])
    
    # Individual Euler angle matrices
    alphaRot = np.array([[1,0,0,0],
       [0,math.cos(math.pi*Alpha/180),-math.sin(math.pi*Alpha/180),0],
       [0,math.sin(math.pi*Alpha/180),math.cos(math.pi*Alpha/180),0],
       [0,0,0,1]])
    betaRot = np.array([[math.cos(math.pi*Beta/180),0,math.sin(math.pi*Beta/180),0],
       [0,1,0,0],
       [-math.sin(math.pi*Beta/180),0,math.cos(math.pi*Beta/180),0],
       [0,0,0,1]])
    gammaRot = np.array([
       [math.cos(math.pi*Gamma/180),-math.sin(math.pi*Gamma/180),0,0],
       [math.sin(math.pi*Gamma/180),math.cos(math.pi*Gamma/180),0,0],
        [0,0,1,0],
       [0,0,0,1]])
    # Full rotation matrix but keep in mind that changing the order will change the rotation.
    rot = alphaRot @ betaRot @ gammaRot
    
    # Camera two focal length only.
    K_FL = ([[FL2,0,0,0],
       [0,FL2,0,0],
       [0,0,FL2,0],
       [0,0,0,1]])
    
    '''Special matrix for the applying the focal length to the z-axis only 
    This is used to move the image sensor with the focal length but not resize the sensor
    '''
    K_plane = ([[1,0,0,0],
               [0,1,0,0],
               [0,0 ,FL2,0],
               [0,0 ,0,1]])
    
    '''K_NF is the camera two matrix but without the focal length
    This is to all the red, green and blue axes for camera two 
    to be the same size as for camera one. So this matrix is to help 
    with the visualisation only'''
    K_NF = K @ T @ rot 
    
    '''K_z is for the visualisation only. It allows the camera two frame to be shown in the correct
    position without re-sizing the frame. Note, as we are only affecting the z-axis, ordering matters here.
    You must do the rotation and translation first and only then extend the z-axis or otherwise you will rotate
    and translate what you did to the z-axis and point it in another direction'''
   
    K_z = K @ T  @ rot @ K_plane 
   
    '''This is the full camera two matrix (relative to camera one). The focal length is in multiples 
    of the first camera focal length. Hence the first camera focal lenght is fixed at 1 and therefore all 
    coordinates are in units of the focal length of camera one'''
    
    K = K  @ T  @  rot @  K_FL  
    
       
    # Plotting the axes for the two cameras
    axes = np.array([[[-.1, 0, 0],[.1, 0, 0]],
            [[0, -.1, 0], [0, .1, 0]],
            [[0, 0, 0], [0, 0, 0.5]]])
               
       
    axes_cam_2 = axes.reshape(6,3)
    axes_cam_2 = np.hstack([axes_cam_2, np.ones((6, 1))])
    axes_cam_2 = K_NF @ axes_cam_2.transpose()
    axes_cam_2 = axes_cam_2.transpose() 
    # Remove the last column
    axes_cam_2 = axes_cam_2[:, :-1]
    axes_cam_2 = axes_cam_2.reshape(3,2,3)
    colors = ['r', 'g', 'b']  # Colors for each axis
    for i in range(0, 3):
        ax.plot([axes_cam_2[i][0][0], axes_cam_2[i][1][0]],  # X coordinates
            [axes_cam_2[i][0][1], axes_cam_2[i][1][1]],  # Y coordinates
            [axes_cam_2[i][0][2], axes_cam_2[i][1][2]],  # Z coordinates
            color=colors[i]) 
        
        ax.plot([axes[i][0][0], axes[i][1][0]],  # X coordinates
            [axes[i][0][1], axes[i][1][1]],  # Y coordinates
            [axes[i][0][2], axes[i][1][2]],  # Z coordinates
            color=colors[i])   
    
    intersection_point = np.zeros([3,3])
    cam_1_coord = np.zeros([3,2])
    cam_2_coord = np.zeros([3,2])
    intersection_point_imageP2 = np.zeros([3,4])
    points = np.zeros([3,3,3])
    pre_im_points1 = np.zeros([3,3,3])
    pre_im_points2 = np.zeros([3,3,3])
    world_coord = np.array([[-1.0,  0.0,2.5],
                            [-0.5, 0.4, 2.5],
                            [0.0, 0.8, 2.5],
                            ]) 
    
    
    
    for i in range(0,num_points):
        # adding the world coordinate point
        
        ax.scatter(*world_coord[i], color='black')
        ax.text(world_coord[i][0], world_coord[i][1], world_coord[i][2], f"X{i}", color='black')

        # Drawing a line from the origin to the  World coordinate point
        ax.plot([0, world_coord[i][0]], [0, world_coord[i][1]], [0, world_coord[i][2]], color='magenta')

        # Creating the camera 1 image plane: normal to the z-axis at Z = 1
        x = np.linspace(-.6, .6, 10)
        y = np.linspace(-.4, .4, 10)
        X, Y = np.meshgrid(x, y)
        Z = np.ones_like(X)  # Plane centered at Z=focal length
        image_plane1 = np.array([X,Y,Z, np.ones_like(X)])



        camera_2_center = K_NF @ np.array([0,0,0,1])
        # Drawing a line from the camera 2 center to the point
        ax.plot([camera_2_center[0], world_coord[i][0]], [camera_2_center[1], world_coord[i][1]], [camera_2_center[2], world_coord[i][2]], color='green')

        # Adding the plane with transparency
        if i == 0:
            ax.plot_surface(image_plane1[0], image_plane1[1], image_plane1[2], color='cyan', alpha=0.5)

            # This reshapes image_plane1 for matrix multiplication by our camera 2 matrix
            image_plane2 = K_z @ image_plane1.reshape(4,-1) 

            # Reshaping back to original shape
            image_plane2 = image_plane2.reshape(image_plane1.shape) 
            ax.plot_surface(image_plane2[0], image_plane2[1], image_plane2[2], color='yellow', alpha=0.5)

        # The intersection point where the magenta line intersects the image_plane1 Z = 1
        intersection_point[i] = (world_coord[i][0]/world_coord[i][2], world_coord[i][1]/world_coord[i][2], world_coord[i][2]/world_coord[i][2])
        cam_1_coord[i] = np.array(intersection_point[i][:2])


        
        ax.scatter(*intersection_point[i], color='magenta')
        
        world_hom = np.array([world_coord[i][0],world_coord[i][1],world_coord[i][2],1])
        try:
            K_inv = np.linalg.inv(K)
        except np.linalg.LinAlgError:
            print("The matrix is not invertible.")

        temp_world = K_inv @ world_hom
        if i == 0:
            temp_world0=temp_world
        intersection_point_imageP2[i] = np.array([FL2*temp_world[0]/temp_world[2], 
                                      FL2*temp_world[1]/temp_world[2], 
                                      FL2*temp_world[2]/temp_world[2],1])

        x2 = intersection_point_imageP2[i][:3]
        cam_2_coord[i] = intersection_point_imageP2[i][:2]

        intersection_point_imageP2[i] = K_NF @ intersection_point_imageP2[i]
        pt = (intersection_point_imageP2[i][0],intersection_point_imageP2[i][1],intersection_point_imageP2[i][2])
        ax.scatter(*pt, color='green')

        # draw line between camera centers
        ax.plot([0, camera_2_center[0]], [0, camera_2_center[1]], [0, camera_2_center[2]], color='cyan')

        points[i] = np.array([[0, 0, 0],  # Origin - camera 1 center
                           [world_coord[i][0], world_coord[i][1], world_coord[i][2]],  # World coordinate
                           [camera_2_center[0], camera_2_center[1], camera_2_center[2]]])  # Camera 2 center

        # Shade in the Epipolar plane
        epipoloar_plane = Poly3DCollection([points[i]])
        epipoloar_plane.set_color('grey')
        epipoloar_plane.set_alpha(0.2)  # Adjust transparency here
        ax.add_collection3d(epipoloar_plane)

        pre_im_points1[i] = np.array([[0, 0, 0],  # Origin - camera 1 center
                           [world_coord[0][0], world_coord[0][1], world_coord[0][2]],  # World coordinate 0
                           [world_coord[2][0], world_coord[2][1], world_coord[2][2]]])  # World coordinate 2

        # Shade in the pre-image plane of the line as seen from camera 1
        pre_image_plane1 = Poly3DCollection([pre_im_points1[i]])
        pre_image_plane1.set_color('blue')
        pre_image_plane1.set_alpha(0.1)  # Adjust transparency here
        ax.add_collection3d(pre_image_plane1)

        pre_im_points2[i] = np.array([[camera_2_center[0], camera_2_center[1], camera_2_center[2]],  # Camera 2 center
                           [world_coord[0][0], world_coord[0][1], world_coord[0][2]],  # World coordinate 0
                           [world_coord[2][0], world_coord[2][1], world_coord[2][2]]])  # World coordinate 2

        # Shade in the pre-image plane of the line as seen from camera 2
        pre_image_plane2 = Poly3DCollection([pre_im_points2[i]])
        pre_image_plane2.set_color('red')
        pre_image_plane2.set_alpha(0.1)  # Adjust transparency here
        ax.add_collection3d(pre_image_plane2)
        
        
        if i == 0:
            # Show the epipole for camera 1
            cam_1_epipole = (camera_2_center[0]/camera_2_center[2], 
                             camera_2_center[1]/camera_2_center[2], 
                             camera_2_center[2]/camera_2_center[2])
            ax.scatter(*cam_1_epipole, color='black', marker='x')
            epipole_coords[0] = np.array([cam_1_epipole[0], cam_1_epipole[1]])

            # Show the epipole for camera 2
            cam_2_view_origin = K_inv @ np.array([0,0,0,1])
            cam_2_epipole = (cam_2_view_origin[0]/cam_2_view_origin[2], 
                             cam_2_view_origin[1]/cam_2_view_origin[2],
                             cam_2_view_origin[2]/cam_2_view_origin[2], 1)
            epipole_coords[1] = np.array([cam_2_epipole[0], cam_2_epipole[1]])
            cam_2_epipole = K @ cam_2_epipole
            ax.scatter(*cam_2_epipole[:3], color='black', marker='x')

            # Adjust view
            ax.view_init(elev=elev_angle, azim=azim_angle, roll=roll_angle)

        #show view in camera 1    
        cam_1_coord[i] = np.array([world_coord[i][0]/world_coord[i][2], world_coord[i][1]/world_coord[i][2]])


        camera_coords[i][0] = cam_1_coord[i]
        camera_coords[i][1] = cam_2_coord[i]

    
    
    update_2d_plots(fig,gs, camera_coords, epipole_coords, ep)
    
    lambda1 = world_coord[0][2]#math.sqrt((world_coord[0]**2)+(world_coord[1]**2)+(world_coord[2]**2))
    lambda2 = FL2*temp_world0[2]# math.sqrt((world_coord[0]-camera_2_center[0])**2+(world_coord[1]-camera_2_center[1])**2+(world_coord[2]-camera_2_center[2])**2)
    print(f'lambda1:{lambda1}')
    print(f'lambda2:{lambda2}')
    
    x0 = np.append(camera_coords[0][0], 1)
    
    x0p = np.append(camera_coords[0][1], 1)
    print(f'x0:{x0}')
    #np.append(cam_2_coord, 1)
    print(f'x0\':{x0p}')
    Tx = np.array([[0, -tz, ty],
                   [tz, 0, -tx],
                   [-ty, tx, 0]])
    
    R = np.array(rot[:3,:3])
    print(f'x0TxRx0\':{ x0 @ Tx @ R @ x0p}')
    print(f'lambda1*x0:{lambda1*x0}')
    R_inv = np.linalg.inv(R)
    print(f'R^-1(lambda1*x0 - T):{R_inv @ (lambda1*x0 -np.array([tx, ty,tz])) }' )
    print(f'lambda2*x0\':{lambda2*x0p}')
    
    E_mat= Tx @ R
    
    
    print(f'Determinant of E=TxR: {np.linalg.det(E_mat)}')
    # Show the plot
    plt.show()
    



elev_slider = widgets.IntSlider(min=-180, max=180, step=1, value=0, description='Elevation')
azim_slider = widgets.IntSlider(min=-180, max=180, step=1, value=90, description='Azimuth')
roll_slider = widgets.IntSlider(min=-180, max=180, step=1, value=-90, description='Roll')



FL_slider2 = widgets.FloatSlider(min=0.1, max=3, step=0.1, value=1.0, description='Cam 2 Focal')

alpha_slider = widgets.IntSlider(min=-180, max=180, step=1, value=0, description='Cam2 Alpha')
beta_slider = widgets.IntSlider(min=-180, max=180, step=1, value=-150, description='Cam2 Beta')
gamma_slider = widgets.IntSlider(min=-180, max=180, step=1, value=-70, description='Cam2 Gamma')


tx_slider = widgets.FloatSlider(min=-2.0, max=2.0, step=0.1, value=1, description='Tx')
ty_slider = widgets.FloatSlider(min=-2.0, max=2.0, step=0.1, value=1, description='Ty')
tz_slider = widgets.FloatSlider(min=0.0, max=5.0, step=0.1, value=4, description='Tz')

# Group sliders into two columns
left_box = widgets.VBox([elev_slider, azim_slider, roll_slider])
right_box = widgets.VBox([ FL_slider2, alpha_slider, beta_slider, gamma_slider, tx_slider, ty_slider, tz_slider])

epipoles_checkbox = widgets.Checkbox(value=False, description='Show Epipoles in 2D',disabled=False)

# Combine the two columns into a single horizontal layout
ui = widgets.HBox([left_box,  right_box])


# Interactive widget
out = widgets.interactive_output(update_plot, {'elev_angle': elev_slider, 'azim_angle': azim_slider, 
                                               'roll_angle': roll_slider, 
                                                'FL2': FL_slider2, 
                                               'Alpha': alpha_slider, 'Beta': beta_slider, 'Gamma': gamma_slider,
                                              'tx': tx_slider, 'ty': ty_slider, 'tz': tz_slider, 'ep': epipoles_checkbox})

sliders_box = widgets.VBox([elev_slider, azim_slider, roll_slider, 
                            FL_slider2, alpha_slider, beta_slider, 
                            gamma_slider, tx_slider, ty_slider, tz_slider, epipoles_checkbox])
ui = widgets.HBox([sliders_box, out])


# Display the UI and the output widget
display(ui)


## Point Features

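The equation $\mathcal{I}\vec{\lambda} =\Pi\mathbf{X}$ says that the vector $\mathcal{I}\vec{\lambda}$, a linear combination of the columns of $\mathcal{I}$, lies in the space spanned by the four columns of the matrix $\Pi$.

In order to have a solution to the above equation, the columns of $\mathcal{I}$ and $\Pi$ must therefore be linearly **dependent**,
i.e. the matrix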
\begin{equation}
	N_p \equiv (\Pi,\mathcal{I}) = \begin{bmatrix}
	\Pi_1 & x_1 & 0 & \cdots & 0 \\
    \Pi_2 & 0 & x_2 & \cdots & 0\\
	\vdots & \vdots & \vdots & \ddots & \vdots\\
	\Pi_m & 0 & 0 & \cdots & x_m\\
	\end{bmatrix} \in \mathbb{R}^{3m\times(m+4)}
\end{equation}

must have a non-trivial right null space. For $m \geq 2$ (i.e. $3m \geq m+4$), full rank would be $m+4$. Linear dependence of the columns therefore implies the rank constraint

\begin{equation}
	\text{rank}(N_p) \leq m+3
\end{equation}
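The rank constraint on $N_p$ can be observed directly. This is a sketch under the same assumptions as before (NumPy, calibrated cameras, a made-up three-view scene). It also shows that $(\mathbf{X}, -\vec{\lambda})$ is a right null vector, which is where the rank deficiency comes from.

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical three-view setup with K = I (values are illustrative)
Pis = [np.hstack([np.eye(3), np.zeros((3, 1))]),
       np.hstack([rot_y(10.0), np.array([[-0.5], [0.0], [0.1]])]),
       np.hstack([rot_y(-8.0), np.array([[0.4], [0.2], [0.0]])])]
X = np.array([0.3, -0.2, 4.0, 1.0])
m = len(Pis)

lam = np.array([(P @ X)[2] for P in Pis])
xs = [(P @ X) / (P @ X)[2] for P in Pis]

I_mat = np.zeros((3 * m, m))
for i, x in enumerate(xs):
    I_mat[3 * i:3 * i + 3, i] = x

# N_p = (Pi, I) has 3m rows and m + 4 columns
N_p = np.hstack([np.vstack(Pis), I_mat])

# An explicit tolerance is used so rounding noise is not counted as rank
rank_Np = np.linalg.matrix_rank(N_p, tol=1e-9)

# Pi X - I lambda = 0, so (X, -lambda) lies in the right null space
null_vec = np.concatenate([X, -lam])

print("N_p shape:", N_p.shape)
print("rank(N_p):", rank_Np, " m + 3 =", m + 3)
print("N_p @ (X, -lambda):", N_p @ null_vec)
```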


We can make a more compact formulation as follows.

First introduce the following matrix.

\begin{equation}
	\mathcal{I}^{\perp} \equiv \begin{bmatrix}
	x_{1\times} & 0 & \cdots & 0 \\
    0 & x_{2\times} & \cdots & 0\\
	\vdots & \vdots & \ddots & \vdots\\
	0 & 0 & \cdots & x_{m\times}\\
	\end{bmatrix} \in \mathbb{R}^{3m\times3m}
\end{equation}

which annihilates $\mathcal{I}$, because $x_{i\times}x_i = x_i \times x_i = 0$ for every block:

\begin{equation}
	\mathcal{I}^{\perp}\mathcal{I} = 0
\end{equation}

So we can pre-multiply $\mathcal{I}\vec{\lambda} =\Pi\mathbf{X}$ by $\mathcal{I}^{\perp}$ to get

\begin{equation}
	\mathcal{I}^{\perp}\Pi\mathbf{X} = 0
\end{equation}



Once again we see a solution defined by a null space,
i.e. $\mathbf{X}$ is in the null space of the matrix

\begin{equation}
	W_p \equiv \mathcal{I}^{\perp}\Pi = 
	\begin{bmatrix}
	x_{1\times}\Pi_1  \\
   x_{2\times}\Pi_2 \\
	\vdots \\
    x_{m\times}\Pi_m
	\end{bmatrix}  \in \mathbb{R}^{3m\times4}
\end{equation}

To have a non-trivial solution, we must have 

\begin{equation}
	\text{rank}(W_p) \leq 3
\end{equation}
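If the projection matrices were known, the null space of $W_p$ would give the 3D point directly. The sketch below assumes NumPy, calibrated cameras and a made-up noise-free scene, and recovers $\mathbf{X}$ from the right singular vector belonging to the smallest singular value. The helper `hat` is a hypothetical name for $x_{\times}$.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical three-view setup with K = I (values are illustrative)
Pis = [np.hstack([np.eye(3), np.zeros((3, 1))]),
       np.hstack([rot_y(10.0), np.array([[-0.5], [0.0], [0.1]])]),
       np.hstack([rot_y(-8.0), np.array([[0.4], [0.2], [0.0]])])]
X_true = np.array([0.3, -0.2, 4.0, 1.0])
xs = [(P @ X_true) / (P @ X_true)[2] for P in Pis]

# W_p stacks the blocks hat(x_i) @ Pi_i, giving a 3m x 4 matrix
W_p = np.vstack([hat(x) @ P for x, P in zip(xs, Pis)])
rank_Wp = np.linalg.matrix_rank(W_p, tol=1e-9)

# The last row of Vt spans the (one-dimensional) null space
_, sing_vals, Vt = np.linalg.svd(W_p)
X_est = Vt[-1] / Vt[-1][3]        # fix the scale so the last entry is 1

print("rank(W_p):", rank_Wp)
print("recovered X:", X_est)
```

With noisy image points the smallest singular value is no longer exactly zero, and the same singular vector gives a least-squares estimate instead.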

## Multi-view Matrix of a point

Beware! These matrices will begin to look very scary.

DON'T PANIC!

First, we want to rewrite the rank constraints in a more compact way so that it is more transparent what we are solving for.
If we have $m$ images of a point we will show the first as a projection of the world coordinates and then show each of the others as a rotation and translation from the first.

We are assuming that these are calibrated cameras, so $K_i=I$:

\begin{equation}
	\Pi_1=[I, 0], \Pi_2=[R_2, T_2],\dots, \Pi_m=[R_m,T_m] \in \mathbb{R}^{3\times4}
\end{equation}

As before we define the matrix $W_p$ as:

\begin{equation}
	W_p \equiv \mathcal{I}^{\perp}\Pi = 
	\begin{bmatrix}
	x_{1\times}\Pi_1  \\
   x_{2\times}\Pi_2 \\
	\vdots \\
    x_{m\times}\Pi_m
	\end{bmatrix}  \in \mathbb{R}^{3m\times4}
\end{equation}

The rank of the matrix $W_p$ is not affected if we multiply by a full-rank matrix $D_p \in \mathbb{R}^{4\times5}$ as follows:


\begin{equation}
	W_pD_p=\begin{bmatrix}
	x_{1\times}\Pi_1  \\
   x_{2\times}\Pi_2 \\
	\vdots \\
    x_{m\times}\Pi_m
	\end{bmatrix} 
	\begin{bmatrix}
    x_{1\times} & x_1 & 0 \\
    0 & 0 & 1
 \end{bmatrix}= 
 \begin{bmatrix}
    x_{1\times}x_{1\times} & 0 & 0\\
    x_{2\times}R_2x_{1\times} & x_{2\times}R_2x_1 & x_{2\times}T_2\\
    x_{3\times}R_3x_{1\times} & x_{3\times}R_3x_1 & x_{3\times}T_3\\
    \vdots & \vdots & \vdots\\
    x_{m\times}R_mx_{1\times} & x_{m\times}R_mx_1 & x_{m\times}T_m\\
 \end{bmatrix}
\end{equation}





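The block $x_{1\times}x_{1\times}$ has rank 2, the rest of its block row is zero, and the first block column below it contributes nothing new because $x_{i\times}R_ix_{1\times}x_1 = 0$. Hence rank$(W_p) = $ rank$(W_pD_p) = 2 + $ rank$(M_p)$, so rank$(W_p)\leq3$ if and only if the submatrix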
\begin{equation}
	M_p \equiv \begin{bmatrix}
 x_{2\times}R_2x_1 & x_{2\times}T_2\\
     x_{3\times}R_3x_1 & x_{3\times}T_3\\
     \vdots & \vdots\\
    x_{m\times}R_mx_1 & x_{m\times}T_m\\
 \end{bmatrix} \in \mathbb{R}^{3(m-1)\times2}
\end{equation}

has rank$(M_p)\leq 1$.

$M_p$ is called the multiple-view matrix of a point $p$.

It relates the image $x_1$ in the first view to the co-images $x_{i\times}$ in the remaining views.
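A minimal sketch of the multiple-view matrix, assuming NumPy, calibrated cameras with $\Pi_1=[I,0]$ and a made-up scene. It checks that rank$(M_p)=1$ for a consistent set of images and that $(\lambda_1, 1)^{\top}$ lies in the null space of $M_p$. The helper `hat` is a hypothetical name for $x_{\times}$.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Motions of views 2 and 3 relative to view 1 (illustrative values)
Rs = [rot_y(10.0), rot_y(-8.0)]
Ts = [np.array([-0.5, 0.0, 0.1]), np.array([0.4, 0.2, 0.0])]

X = np.array([0.3, -0.2, 4.0])        # point in the frame of view 1
lam1 = X[2]                           # depth in view 1
x1 = X / lam1
xs = [(R @ X + T) / (R @ X + T)[2] for R, T in zip(Rs, Ts)]

# One 3 x 2 block [hat(x_i) R_i x_1, hat(x_i) T_i] per view i = 2..m
M_p = np.vstack([np.column_stack([hat(x) @ R @ x1, hat(x) @ T])
                 for x, R, T in zip(xs, Rs, Ts)])

rank_Mp = np.linalg.matrix_rank(M_p, tol=1e-9)
residual = M_p @ np.array([lam1, 1.0])

print("M_p shape:", M_p.shape)
print("rank(M_p):", rank_Mp)
print("M_p @ (lambda_1, 1):", residual)
```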

## Analysis: Multi-View Constraint

For any non-zero vectors $a_i, b_i \in \mathbb{R}^3, i=1,2,\dots, n$, the matrix

\begin{equation}
\begin{bmatrix}
 a_1 & b_1\\
     a_2 & b_2\\
     \vdots & \vdots\\
    a_n & b_n \\
 \end{bmatrix} \in \mathbb{R}^{3n\times2}	
\end{equation}

is rank-deficient if and only if $a_ib^{\top}_j - b_ia^{\top}_j = 0$ for all $i,j=1,\dots,n$. 
Applied to the rank constraint on $M_p$ we get:

\begin{equation}
	x_{i\times}R_ix_1(x_{j\times}T_j)^{\top} - x_{i\times}T_i(x_{j\times}R_jx_1)^{\top} = 0 
\end{equation}

## The Tri-linear Constraint

This gives us the tri-linear constraint
\begin{equation}
	x_{i\times}(T_ix_1^{\top}R_j^{\top} - R_ix_1T^{\top}_j)x_{j\times} = 0 
\end{equation}

This is a matrix equation giving $3\times3 = 9$ scalar tri-linear equations, only four of which are linearly independent.
From the equations 

$$x_{i\times}R_ix_1(x_{j\times}T_j)^{\top} - x_{i\times}T_i(x_{j\times}R_jx_1)^{\top} = 0 \quad  \forall i,j$$

We see that as long as the entries in $x_{j\times}T_j$ and $x_{j\times}R_jx_1$ are non-zero, it follows from the above, that the two vectors $x_{i\times}R_ix_1$ and $x_{i\times}T_i$ are linearly dependent.  

In other words: Except for degeneracies, the bi-linear (epipolar) constraints relating two views are already contained in the tri-linear constraints obtained for the multi-view scenario.

Note that the equivalence between the bi-linear and tri-linear constraints on the one hand and the condition that rank$(M_p)\leq 1$ on the other only holds if the vectors in $M_p$ are non-zero.

In certain degenerate cases this is not fulfilled.

An example of a rare degenerate case is that the point $p$ lies on the line through the optical centers $o_1$ and $o_j$.
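The tri-linear constraint can be evaluated numerically for views $1$, $i=2$ and $j=3$. This is a sketch assuming NumPy, calibrated cameras and a made-up scene. It also evaluates the same expression after shifting the third image point, to show that a wrong correspondence violates the constraint. The helper `hat` is a hypothetical name for $x_{\times}$.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def trilinear(x1, xi, xj, Ri, Ti, Rj, Tj):
    """Evaluate hat(x_i) (T_i x_1^T R_j^T - R_i x_1 T_j^T) hat(x_j)."""
    middle = np.outer(Ti, Rj @ x1) - np.outer(Ri @ x1, Tj)
    return hat(xi) @ middle @ hat(xj)

# Illustrative motions of views 2 and 3 relative to view 1
R2, T2 = rot_y(10.0), np.array([-0.5, 0.0, 0.1])
R3, T3 = rot_y(-8.0), np.array([0.4, 0.2, 0.0])

X = np.array([0.3, -0.2, 4.0])
x1 = X / X[2]
x2 = (R2 @ X + T2) / (R2 @ X + T2)[2]
x3 = (R3 @ X + T3) / (R3 @ X + T3)[2]

C = trilinear(x1, x2, x3, R2, T2, R3, T3)

# Shift the third image point so it no longer matches the same 3D point
x3_bad = x3 + np.array([0.1, 0.05, 0.0])
C_bad = trilinear(x1, x2, x3_bad, R2, T2, R3, T3)

print("max |entry|, consistent  :", np.abs(C).max())
print("max |entry|, inconsistent:", np.abs(C_bad).max())
```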

## AV Degenerate Cases


![](images/degenerateFOE.png)
Vehicle degenerate case
 

## Uniqueness of Pre-image

Using the multiple-view matrix we obtain a more general and simpler characterization regarding the uniqueness of the pre-image.

Given $m$ vectors representing the $m$ images of a point in $m$ views, they correspond to the same point in the 3D space if the rank of the $M_p$ matrix relative to any of the camera frames is one.

If the rank is zero, the point is determined only up to the line on which all the camera centers must lie.
In summary we get the following:


- rank$(M_p)=2 \rightarrow{}$ no point correspondence and empty pre-image.
- rank$(M_p)=1 \rightarrow{}$ point correspondence and unique pre-image. 
- rank$(M_p)=0 \rightarrow{}$ point correspondence and pre-image not unique.
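The three cases can be reproduced with a small sketch, assuming NumPy and calibrated cameras. The scenes are made up: a consistent set of images, the same set with one image point shifted, and a degenerate set where the point and all camera centers lie on the z-axis. The names `hat` and `multiview_matrix` are hypothetical helpers.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def multiview_matrix(x1, xs, Rs, Ts):
    """Stack the blocks [hat(x_i) R_i x_1, hat(x_i) T_i] for views 2..m."""
    return np.vstack([np.column_stack([hat(x) @ R @ x1, hat(x) @ T])
                      for x, R, T in zip(xs, Rs, Ts)])

def project(X, R, T):
    """Normalised image of the 3D point X under the motion (R, T)."""
    p = R @ X + T
    return p / p[2]

rank = lambda M: np.linalg.matrix_rank(M, tol=1e-9)

# Case 1: consistent images of one point (illustrative values)
Rs = [rot_y(10.0), rot_y(-8.0)]
Ts = [np.array([-0.5, 0.0, 0.1]), np.array([0.4, 0.2, 0.0])]
X = np.array([0.3, -0.2, 4.0])
x1 = X / X[2]
xs = [project(X, R, T) for R, T in zip(Rs, Ts)]
rank_consistent = rank(multiview_matrix(x1, xs, Rs, Ts))

# Case 2: the third image point is shifted, so no common 3D point exists
xs_bad = [xs[0], xs[1] + np.array([0.1, 0.05, 0.0])]
rank_mismatch = rank(multiview_matrix(x1, xs_bad, Rs, Ts))

# Case 3: point and all camera centers on the z-axis (degenerate)
Rs_deg = [np.eye(3), np.eye(3)]
Ts_deg = [np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, -2.0])]
X_deg = np.array([0.0, 0.0, 5.0])
x1_deg = X_deg / X_deg[2]
xs_deg = [project(X_deg, R, T) for R, T in zip(Rs_deg, Ts_deg)]
rank_degenerate = rank(multiview_matrix(x1_deg, xs_deg, Rs_deg, Ts_deg))

print("consistent :", rank_consistent)
print("mismatched :", rank_mismatch)
print("degenerate :", rank_degenerate)
```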

## Let's take stock

What we have shown for points, and will show for lines, is that we can construct a matrix with a rank deficiency.

Our solution lives in the null space of the matrix.

If we construct a matrix with just the right rank then the null space will have the same dimension as our solution.

And that would mean a unique solution. 

At least up to scale.

So our next move is to actually find the null space.

## Multi-view Factorisation

In the two-view case we tried to separate the structure (3D points) from the motion (rotation and translation).

We will try the same here, but beware that once again noise can cause these algorithms to be sub-optimal.

Suppose we have $m$ images of $n$ points $p^j$ and we want to estimate the unknown projection matrix $\Pi$. The condition rank$(M_p) \leq 1$ states that the two columns of $M_p$ are linearly dependent.


For the $j^{\text{th}}$ point $p^j$ this implies

\begin{equation}
	\begin{bmatrix}
x_{2\times}^jR_2x_1^j \\
x_{3\times}^jR_3x_1^j \\
\vdots\\
x_{m\times}^jR_mx_1^j \\
 \end{bmatrix} + \alpha^j\begin{bmatrix}
x_{2\times}^jT_2 \\
x_{3\times}^jT_3 \\
\vdots\\
x_{m\times}^jT_m \\
 \end{bmatrix} = 0 \in \mathbb{R}^{3(m-1)\times1}
\end{equation}


for some $\alpha^j \in \mathbb{R}$, $j=1,\dots,n$.

Each block row in the above equation can be obtained by taking $\lambda^j_ix^j_i=\lambda^j_1R_ix^j_1+T_i$, multiplying by $x^j_{i\times}$ and dividing by $\lambda^j_1$:


\begin{equation}
	x^j_{i\times}R_ix^j_1 + x^j_{i\times}T_i/\lambda^j_1=0
\end{equation}

Therefore, $\alpha^j=1/\lambda^j_1$ is nothing but the inverse of the depth of point $p^j$ w.r.t. the first frame.




## If we know the structure
If we know the depths of the points, i.e. their inverses $\alpha^j$, then the equation above is linear in the camera motion parameters $R_i$ and $T_i$.

Stacking the columns of $R_i$ into $R^s_i=[r_{11},r_{21},r_{31},r_{12},r_{22},r_{32},r_{13},r_{23},r_{33}]^{\top}\in \mathbb{R}^9$ and keeping $T_i\in \mathbb{R}^3$, we obtain the linear system

\begin{equation}
	P_i \begin{bmatrix}
R^s_i\\
T_i
 \end{bmatrix} = \begin{bmatrix}
{x^1_1}^{\top} \otimes x^1_{i\times} & \alpha^1x^1_{i\times}\\
{x^2_1}^{\top} \otimes x^2_{i\times} & \alpha^2x^2_{i\times}\\
\vdots & \vdots\\
{x^n_1}^{\top} \otimes x^n_{i\times} & \alpha^nx^n_{i\times}\\
 \end{bmatrix}\begin{bmatrix}
R^s_i\\
T_i
 \end{bmatrix}=0 \in \mathbb{R}^{3n}
\end{equation}



The matrix $P_i\in \mathbb{R}^{3n\times12}$ has rank 11 if more than 6 points in general position are used.

If that is the case, the null space of $P_i$ is one-dimensional and the projection matrix $\Pi_i=(R_i,T_i)$ is given up to a scale factor.

In practice you should use more than 6 points. With noisy data $P_i$ will then have full rank, and the solution is taken as the right singular vector of $P_i$ associated with its smallest singular value.
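The following sketch recovers $(R_i, T_i)$ from known inverse depths for a made-up, noise-free scene of eight random points, assuming NumPy and calibrated cameras. The helpers `hat` and `rodrigues` are hypothetical names. Fixing the scale through $\det(R_i)=1$ is one possible choice made for this illustration, and with noisy data the recovered matrix would additionally need to be projected onto a proper rotation.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(w):
    """Rotation matrix for a non-zero axis-angle vector w."""
    theta = np.linalg.norm(w)
    Kw = hat(w / theta)
    return np.eye(3) + np.sin(theta) * Kw + (1.0 - np.cos(theta)) * Kw @ Kw

# Made-up scene: n points in front of the first camera
rng = np.random.default_rng(0)
n = 8
pts = np.column_stack([rng.uniform(-1, 1, n),
                       rng.uniform(-1, 1, n),
                       rng.uniform(3, 6, n)])

# Ground-truth motion of view i relative to view 1 (illustrative)
R_true = rodrigues(np.array([0.05, 0.2, -0.1]))
T_true = np.array([-0.5, 0.1, 0.2])

x1s = pts / pts[:, 2:3]                 # images in view 1
alphas = 1.0 / pts[:, 2]                # known inverse depths
cam = pts @ R_true.T + T_true           # points in the frame of view i
xis = cam / cam[:, 2:3]                 # images in view i

# One 3 x 12 block [x_1^T kron hat(x_i), alpha * hat(x_i)] per point
P = np.vstack([np.hstack([np.kron(x1[None, :], hat(xi)), a * hat(xi)])
               for x1, xi, a in zip(x1s, xis, alphas)])

# Null vector = right singular vector of the smallest singular value
_, sing_vals, Vt = np.linalg.svd(P)
v = Vt[-1]
R_raw = v[:9].reshape(3, 3, order="F")  # undo the column stacking
T_raw = v[9:]

# Remove the unknown scale (and sign) by enforcing det(R) = 1
scale = np.cbrt(np.linalg.det(R_raw))
R_est, T_est = R_raw / scale, T_raw / scale

print("P shape:", P.shape)
print("max |R_est - R_true|:", np.abs(R_est - R_true).max())
print("max |T_est - T_true|:", np.abs(T_est - T_true).max())
```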


## If we know the motion
If we know $\Pi_i = (R_i, T_i), i=1,\dots,m$, then we can estimate the structure, i.e. $\alpha^j, j=1,\dots,n$.

The least squares solution to equation

\begin{equation}
	x^j_{i\times}R_ix^j_1 + x^j_{i\times}T_i/\lambda^j_1=0
\end{equation} 

is given by:

\begin{equation}
	\alpha^j=-\frac{\sum^m_{i=2}(x^j_{i\times}T_i)^{\top}x_{i\times}^jR_ix^j_1}{\sum^m_{i=2}||x_{i\times}^jT_i||^2}, j=1,\dots,n.
\end{equation}
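A minimal sketch of this closed-form estimate, assuming NumPy, calibrated cameras, known motions and a made-up noise-free scene with three points in three views. The names `hat` and `inverse_depth` are hypothetical helpers.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix of v, so that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def inverse_depth(x1, xs, Rs, Ts):
    """Least-squares alpha for one point from its images in views 2..m."""
    num = sum((hat(x) @ T) @ (hat(x) @ R @ x1) for x, R, T in zip(xs, Rs, Ts))
    den = sum(np.linalg.norm(hat(x) @ T) ** 2 for x, T in zip(xs, Ts))
    return -num / den

# Known motions of views 2 and 3 relative to view 1 (illustrative)
Rs = [rot_y(10.0), rot_y(-8.0)]
Ts = [np.array([-0.5, 0.0, 0.1]), np.array([0.4, 0.2, 0.0])]

# Made-up 3D points in the frame of view 1
points = np.array([[0.3, -0.2, 4.0],
                   [-0.8, 0.5, 3.0],
                   [1.0, 0.1, 6.0]])

alphas = []
for X in points:
    x1 = X / X[2]
    xs = [(R @ X + T) / (R @ X + T)[2] for R, T in zip(Rs, Ts)]
    alphas.append(inverse_depth(x1, xs, Rs, Ts))
alphas = np.array(alphas)

print("estimated alpha  :", alphas)
print("true 1 / lambda_1:", 1.0 / points[:, 2])
```

Because every view from $2$ to $m$ contributes to both sums, all frames are used for each depth estimate.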

## But we don't know....
Ok, so unfortunately we don't know the structure or the motion and, unlike the two-view case, we can't easily separate them.

However if we can estimate either or both we can iteratively estimate each in turn while keeping the other fixed.

These estimates could come from other sensors as part of a sensor fusion setup.

Or we could apply the 8-point algorithm to the first two images to obtain an estimate of the structure parameters $\alpha^j$.

While the equation for $\Pi_i$ makes use of the two frames 1 and $i$ only, the structure parameter estimation takes into account all frames. 

## Line Features

Just as with the point features, we can use a rank constraint for lines. 

For the co-images $l_i$ of a line $L$ spanned by a base point $\mathbf{X}_0$ and a direction $\mathbf{V}$ we have:

\begin{equation}
	l_i^{\top}\Pi_i\mathbf{X}_0=l_i^{\top}\Pi_i\mathbf{V} = 0
\end{equation}

Don't let the subtlety of the above equation fool you. It does not say that $\mathbf{X}_0 = \mathbf{V}$; in general $\mathbf{X}_0 \neq \mathbf{V}$.
Instead, it is saying that both $\mathbf{X}_0$ and $\mathbf{V}$ are in the null space of $l_i^{\top}\Pi_i$.


Let us construct the following matrix
\begin{equation}
	W_l \equiv \begin{bmatrix}
l_1^{\top}\Pi_1\\
l_2^{\top}\Pi_2\\
\vdots\\
l_m^{\top}\Pi_m
 \end{bmatrix} \in \mathbb{R}^{m\times4}
\end{equation}


An $m\times4$ matrix can have rank of at most $4$.
We know that there are at least two linearly independent vectors living in the null space, namely $\mathbf{X}_0$ and $\mathbf{V}$.

Therefore $W_l$ can have rank of at most $2$.
So the question is, how many views of the line do we need?

Well, if we had only two views then $W_l \in \mathbb{R}^{2\times4}$, and rank$(W_l)\leq 2$ would hold automatically.

But, this wouldn't be a solution.



This would simply be stating that any two planes in a space must meet each other in a line... somewhere.

It wouldn't uniquely identify our line.

Instead, if we have three or more planes and they all meet in the same line, then we have a unique identification.

This is why lines don't appear in the two-view case but become useful in the multi-view case.
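The argument can be illustrated numerically. The sketch below assumes NumPy, calibrated cameras and a made-up line seen in three views. It checks that rank$(W_l)=2$ for consistent co-images, that an inconsistent third co-image raises the rank to 3, and that any two views alone always give rank 2 and so cannot detect the inconsistency.

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y-axis by an angle given in degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical three-view setup with K = I (values are illustrative)
Pis = [np.hstack([np.eye(3), np.zeros((3, 1))]),
       np.hstack([rot_y(10.0), np.array([[-0.5], [0.0], [0.1]])]),
       np.hstack([rot_y(-8.0), np.array([[0.4], [0.2], [0.0]])])]

X0 = np.array([0.0, 0.5, 4.0, 1.0])   # base point of the line
V = np.array([1.0, 0.2, 0.3, 0.0])    # direction of the line

# Co-image in each view: normal of the plane through two imaged points
ls = [np.cross(P @ X0, P @ (X0 + V)) for P in Pis]

rank = lambda M: np.linalg.matrix_rank(M, tol=1e-9)

# Consistent case: every row l_i^T Pi_i annihilates both X0 and V
W_l = np.vstack([l @ P for l, P in zip(ls, Pis)])
rank_consistent = rank(W_l)

# Inconsistent case: the third co-image is perturbed
l3_bad = ls[2] + np.array([0.5, 0.0, 0.0])
W_bad = np.vstack([ls[0] @ Pis[0], ls[1] @ Pis[1], l3_bad @ Pis[2]])
rank_inconsistent = rank(W_bad)

# Two views only: rank 2 even with the perturbed co-image
W_two = np.vstack([ls[0] @ Pis[0], l3_bad @ Pis[2]])
rank_two_views = rank(W_two)

print("three consistent views :", rank_consistent)
print("perturbed third view   :", rank_inconsistent)
print("two views (one perturbed):", rank_two_views)
print("W_l @ X0:", W_l @ X0, " W_l @ V:", W_l @ V)
```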



## Only two views of a line


![](images/LinesInTwoViews.png)
Ambiguous reconstruction with only two views of a line

## Degenerate three-view of a line


![](images/Line3ViewDegenerate.png)
	
 


## Consistent three-view of a line


![](images/Line3ViewGood.png)
	
 

!["HigherEd 4.0 is funded by the Human Capital Initiative Pillar 3. HCI Pillar 3 supports projects to enhance the innovation and agility in response to future skills needs"](images/HCIFunding.png)