A simple implementation of classical Structure From Motion
This project was done in collaboration with Dushyant Patil and under the guidance of Prof. Nitin J. Sanket
Structure from Motion (SfM) refers to reconstructing a 3D scene while simultaneously estimating the poses of a monocular camera w.r.t. that scene.
SfM makes it possible to simultaneously determine the camera poses and the 3D structure of a scene by combining 2D images taken from various perspectives. SfM approaches can create a dense point cloud from a series of partially overlapping photographs, even ones taken with a cheap digital camera.
It is used extensively in the geosciences to produce hyperscale landform models in remote or rugged environments, where terrestrial laser scanning is limited by equipment portability and airborne laser scanning is limited by terrain roughness, which causes loss of data and image foreshortening.
SfM is also used for cultural heritage sites: to properly assess their condition, to plan and estimate maintenance efforts and costs, and to support their control and restoration.
We were given 5 distortion-corrected images of Unity Hall on the WPI campus, resized from 4000 x 3000 px to 800 x 600 px, the camera intrinsic matrix K, and SIFT keypoints and descriptors for each image.
The fundamental matrix represents the epipolar geometry between 2 images. The fundamental matrix $F$ between 2 cameras with corresponding image points $x$ and $x'$ of the same world point $X$ is governed by the epipolar constraint:

$$x'^T F x = 0$$
Let the fundamental matrix be defined as:

$$F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$$

To find the fundamental matrix between 2 cameras from 2 images of the same scene, we need a minimum of 8 point matches. Writing the epipolar constraint for each correspondence $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ gives a homogeneous linear system $A f = 0$, where $f$ stacks the 9 elements of $F$ and each of the 8 or more correspondences contributes one row of $A$:

$$\begin{bmatrix} x'_i x_i & x'_i y_i & x'_i & y'_i x_i & y'_i y_i & y'_i & x_i & y_i & 1 \end{bmatrix}$$

We solve for $f$ via the SVD of $A$, taking the right singular vector corresponding to the smallest singular value.
The fundamental matrix derived by solving the above system should have rank 2 for perfect matches. Due to noise in the image data, however, it generally comes out with full rank 3; it is projected back to rank 2 by taking the SVD of $F$ and setting its smallest singular value to zero.
To obtain a better estimate of the fundamental matrix and to remove outliers among the initial matches (found here with SIFT descriptors), we run RANSAC. The $F$ matrix with the maximum number of inliers, together with those inliers, is used for the subsequent pose estimation and 3D reconstruction.
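As a concrete sketch, the eight-point estimate with rank-2 enforcement and the RANSAC loop described above might look as follows in NumPy (the function names, algebraic error measure, threshold, and iteration count are our own choices, not taken from the project code):

```python
import numpy as np

def estimate_fundamental(pts1, pts2):
    """Eight-point estimate of F from N >= 8 correspondences (N x 2 arrays)."""
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    # Each row encodes the epipolar constraint x2^T F x1 = 0.
    A = np.stack([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones_like(x1)], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def ransac_fundamental(pts1, pts2, iters=1000, thresh=1e-3, seed=0):
    """Keep the F (and inlier mask) with the most inliers under an
    algebraic epipolar-error threshold."""
    rng = np.random.default_rng(seed)
    n = len(pts1)
    h1 = np.hstack([pts1, np.ones((n, 1))])
    h2 = np.hstack([pts2, np.ones((n, 1))])
    best_F, best_in = None, np.zeros(n, bool)
    for _ in range(iters):
        idx = rng.choice(n, 8, replace=False)
        F = estimate_fundamental(pts1[idx], pts2[idx])
        err = np.abs(np.sum((h2 @ F) * h1, axis=1))
        inliers = err < thresh
        if inliers.sum() > best_in.sum():
            best_F, best_in = F, inliers
    return best_F, best_in
```

In practice one would normalize the coordinates before the eight-point solve and use a geometric (pixel-space) error with a matching threshold; the algebraic error above is only the simplest choice.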
The essential matrix between 2 cameras also represents the epipolar geometry between 2 images of the same scene, but in normalized image coordinates, i.e., with the camera intrinsics factored out: it is the specialization of the fundamental matrix to calibrated cameras. With the same intrinsic matrix $K$ for both cameras, the relation between the fundamental matrix and the essential matrix is:

$$E = K^T F K$$
We can find the relative pose of the second camera with respect to the first camera with the help of the essential matrix. The underlying assumption in this pose estimation is that the first camera sits at the origin:

$$P_1 = K[I_{3\times3} \mid 0]$$
Here $E$ is decomposed via its SVD, $E = U\,\mathrm{diag}(1,1,0)\,V^T$, together with the orthogonal matrix

$$W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

chosen so that $\mathrm{diag}(1,1,0)\,W$ is skew symmetric. A camera pose is written as $P = KR[I_{3\times3} \mid -C]$. This decomposition yields 4 ambiguous poses of the second camera:

1. $C_1 = U(:,3), \quad R_1 = UWV^T$
2. $C_2 = -U(:,3), \quad R_2 = UWV^T$
3. $C_3 = U(:,3), \quad R_3 = UW^TV^T$
4. $C_4 = -U(:,3), \quad R_4 = UW^TV^T$

If $\det(R) = -1$ for a candidate, the pose is corrected by negating both the rotation and the camera center: $R \to -R$, $C \to -C$.
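A minimal NumPy sketch of this four-pose decomposition, including the determinant correction (the helper name is ours):

```python
import numpy as np

def extract_camera_poses(E):
    """Return the four ambiguous (C, R) candidates from an essential matrix,
    correcting any det(R) = -1 reflection by negating both C and R."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    poses = []
    for R0 in (U @ W @ Vt, U @ W.T @ Vt):   # two rotation candidates
        for C0 in (U[:, 2], -U[:, 2]):      # two translation signs
            if np.linalg.det(R0) < 0:
                R, C = -R0, -C0
            else:
                R, C = R0, C0
            poses.append((C, R))
    return poses
```

All four returned rotations are proper (determinant +1); the cheirality check described later selects the physically valid pair.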
Linear triangulation is used to estimate the world points from the camera poses and the matched image points. Let the image points of the same world point $X$ in the 2 cameras viewing the scene be $x$ and $x'$, with projection matrices $P$ and $P'$:

$$x \sim P\tilde{X}, \qquad x' \sim P'\tilde{X}$$

Using the above equations, the cross product between the vector $x$ and the projected point $P\tilde{X}$ must vanish, since the two are parallel:

$$x \times (P\tilde{X}) = 0, \qquad x' \times (P'\tilde{X}) = 0$$

Here, each cross product contributes 2 independent linear equations in $\tilde{X}$, so the 2 views give a homogeneous system $A\tilde{X} = 0$ with $A \in \mathbb{R}^{4\times4}$. By solving for $\tilde{X}$ as the right singular vector of $A$ with the smallest singular value and dividing through by its last coordinate, we obtain the world point.
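The steps above can be sketched for a single correspondence as follows (function name ours):

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """DLT triangulation of one match: two rows of x × (P X) = 0 per view,
    solved by SVD and dehomogenized."""
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # null vector of A, homogeneous world point
    return X[:3] / X[3]
```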
The ambiguity in the correct pose of the second camera is resolved using the cheirality check. For each candidate pose of camera 2 (together with the camera 1 pose), the world points are computed by linear triangulation as explained above. For a world point $X$ and camera center $C$, the depth positivity / cheirality condition multiplies the difference between the world point and the camera center by the third row of the rotation matrix for the given pose, $r_3$:

$$r_3 (X - C) > 0$$

The candidate pose for which the largest number of points satisfies this condition in both cameras is selected as the true pose.
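The check itself is a one-liner; a sketch (helper name ours):

```python
import numpy as np

def count_front(C, R, Xs):
    """Count world points satisfying the cheirality condition
    r3 · (X - C) > 0, i.e. positive depth in the camera with center C
    and rotation R. Xs is an (N, 3) array of triangulated points."""
    return int(np.sum((Xs - C) @ R[2] > 0))
```

Among the four pose candidates, the one maximizing this count (evaluated for both cameras) is kept.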
Nonlinear triangulation is performed to refine the output of linear triangulation between camera 1 and the true pose of camera 2. It treats the world points found by linear triangulation as the initial guess and minimizes the reprojection error under the pinhole camera model:

$$\min_{X} \sum_{j=1}^{2} \left( u_j - \frac{P_j^{1T}\tilde{X}}{P_j^{3T}\tilde{X}} \right)^2 + \left( v_j - \frac{P_j^{2T}\tilde{X}}{P_j^{3T}\tilde{X}} \right)^2$$

where $P_j^{iT}$ denotes the $i$-th row of camera $j$'s projection matrix, $(u_j, v_j)$ is the observed image point, and $\tilde{X}$ is the homogeneous world point.
The error minimization is done with a nonlinear least-squares solver from the scipy.optimize library.
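A per-point refinement using scipy.optimize.least_squares might look like this (names and structure are our own, not the project's):

```python
import numpy as np
from scipy.optimize import least_squares

def refine_point(X0, Ps, obs):
    """Refine one 3D point by minimizing its reprojection error in all views.
    X0: initial 3-vector (from linear triangulation);
    Ps: list of 3x4 projection matrices; obs: list of observed (u, v)."""
    def residuals(X):
        Xh = np.append(X, 1.0)
        r = []
        for P, (u, v) in zip(Ps, obs):
            p = P @ Xh                       # project with pinhole model
            r += [u - p[0] / p[2], v - p[1] / p[2]]
        return np.array(r)
    return least_squares(residuals, X0).x
```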
The objective of the Perspective-n-Point (PnP) problem is to find the poses of cameras 3, 4, and 5 based on the camera 1 and camera 2 poses and the image points. The first step in solving the PnP problem is to create the required input data from images 1 and 3. Using RANSAC on the fundamental matrix, the inliers between images 1 and 3 are found. Intersecting [the image 1 inliers from RANSAC between images 1 and 3] with [the image 1 inliers from RANSAC between images 1 and 2] gives the common inliers in image 1. This yields a set of already-triangulated world points with known projections in image 3. We also get a set of unique image points that represent matches only between images 1 and 3 and are not shared with other images.
Using these image points and world points, the camera 3 pose can be found by solving the following linear equations: for each correspondence $x = (u, v) \leftrightarrow X$, the projection $x \sim P\tilde{X}$ with $P = K[R \mid t]$ gives 2 equations,

$$\begin{bmatrix} \tilde{X}^T & 0^T & -u\tilde{X}^T \\ 0^T & \tilde{X}^T & -v\tilde{X}^T \end{bmatrix} p = 0$$

where $p \in \mathbb{R}^{12}$ stacks the rows of $P$. As each correspondence gives 2 equations and $P$ has 11 degrees of freedom, a minimum of 6 correspondences is required. Let $P$ be the right singular vector of the stacked system with the smallest singular value, reshaped to $3 \times 4$. The rotation is then recovered from the first 3 columns of $K^{-1}P$, re-orthonormalized via an SVD (the linear solution ignores the orthonormality of $R$), and the translation from the last column after removing the unknown scale.
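A linear-PnP sketch under these assumptions (function name and structure are ours):

```python
import numpy as np

def linear_pnp(K, pts2d, pts3d):
    """Linear PnP: solve Ap = 0 for the 3x4 projection matrix P from
    n >= 6 2D-3D correspondences, then recover R and the center C."""
    rows = []
    for (u, v), X in zip(pts2d, pts3d):
        Xh = np.append(X, 1.0)
        Z = np.zeros(4)
        rows.append(np.concatenate([Xh, Z, -u * Xh]))
        rows.append(np.concatenate([Z, Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    P = Vt[-1].reshape(3, 4)
    M = np.linalg.inv(K) @ P           # = s [R | t] up to scale s
    U, S, Vt2 = np.linalg.svd(M[:, :3])
    R = U @ Vt2                        # nearest rotation (orthonormality cleanup)
    t = M[:, 3] / S[0]                 # undo the unknown scale
    if np.linalg.det(R) < 0:           # fix the sign ambiguity of s
        R, t = -R, -t
    return R, -R.T @ t                 # rotation and camera center C
```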
To get a better estimate of the pose against the noise present in the input data, we run RANSAC over the linear PnP solution. RANSAC keeps the pose with the maximum number of inliers, where a correspondence counts as an inlier if its reprojection error falls below a threshold.
Nonlinear optimization is then used to minimize the reprojection error for image 3 onward:

$$\min_{C, R} \sum_{j} \left( u_j - \frac{P^{1T}\tilde{X}_j}{P^{3T}\tilde{X}_j} \right)^2 + \left( v_j - \frac{P^{2T}\tilde{X}_j}{P^{3T}\tilde{X}_j} \right)^2$$

where $P = KR[I_{3\times3} \mid -C]$ and the sum runs over all 2D-3D inlier correspondences.
The optimized poses for cameras 3, 4, 5, etc. are further used to find the world points corresponding to images 3, 4, 5, etc. These world point estimates are again computed using linear triangulation followed by nonlinear triangulation.
After computing all the camera poses and 3D points, bundle adjustment is used to jointly refine the camera pose and world point estimates of all cameras. To improve the speed of the implementation, a visibility matrix $V$ is used so that the reprojection error is minimized only over the 3D points visible in each particular camera. The optimization problem can be written as:

$$\min_{\{C_i, R_i\},\{X_j\}} \sum_{i=1}^{I} \sum_{j=1}^{J} V_{ij}\left[ \left( u_{ij} - \frac{P_i^{1T}\tilde{X}_j}{P_i^{3T}\tilde{X}_j} \right)^2 + \left( v_{ij} - \frac{P_i^{2T}\tilde{X}_j}{P_i^{3T}\tilde{X}_j} \right)^2 \right]$$

where $V_{ij} = 1$ if point $j$ is visible in camera $i$ and 0 otherwise.
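A sketch of the bundle-adjustment residual with a visibility mask, again minimized with scipy.optimize.least_squares (the parameterization below, axis-angle rotation plus camera center, is one possible choice, not necessarily the project's):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def ba_residuals(params, n_cams, n_pts, K, vis, obs):
    """Reprojection residuals over all visible (camera, point) pairs.
    params packs one (rotvec, center) pair per camera followed by all
    3D points; vis[i, j] is True if point j is seen by camera i;
    obs is an (n_cams, n_pts, 2) array of observed (u, v)."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for i in range(n_cams):
        R = Rotation.from_rotvec(cams[i, :3]).as_matrix()
        C = cams[i, 3:]
        for j in range(n_pts):
            if not vis[i, j]:
                continue          # visibility matrix: skip unseen points
            p = K @ R @ (pts[j] - C)
            res += [obs[i, j, 0] - p[0] / p[2],
                    obs[i, j, 1] - p[1] / p[2]]
    return np.array(res)

# minimized with, e.g.:
# least_squares(ba_residuals, x0, args=(n_cams, n_pts, K, vis, obs))
```

In a real implementation one would also pass the sparsity pattern of the Jacobian (via `jac_sparsity`) so the solver exploits the block structure of the problem.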
- Python 3
- NumPy
- OpenCV
- SciPy
- Make sure the path to the input images & output folder on your device is updated in location.txt .
- Run Wrapper.py with the following command.
python3 Wrapper.py