Creating a depth map and 3D point cloud of stereo image pairs from the Middlebury Stereo Dataset (https://vision.middlebury.edu/stereo/data/)
Stereoscopic vision in humans is the ability to extract depth information from the perception of a scene from two different vantage points, one from each eye. The visual cortex processes the shift in the horizontal positions of objects between the views from each eye, giving us depth perception. We can replicate this with cameras and some computational processing: given pictures of the same object or scene from two different vantage points, we can compute how far parts of the scene have shifted between them and infer depth. The underlying principle is that objects closer to the camera shift more from one image to the next, while objects further from the camera shift less.
In stereo_rectification.py you will see the process of rectifying two uncalibrated images of a scene into a stereo image pair. (We call two images whose camera planes differ by a purely horizontal shift a stereo image pair.) This process involves:
- Feature detection and matching between two images
- Fundamental Matrix inference using the 7-point algorithm to identify epipolar lines
- Warping each image in the pair with the homography needed to make its epipolar lines horizontal, composed with the translation needed to bring the images into positive x-y space
- Voilà! Stereo image pair accomplished
Note: this image pair may not be the best for 3D reconstruction and depth map creation; calibrated camera data like that provided by the Middlebury Stereo Dataset gives much nicer 3D reconstruction results!
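For reference, a minimal sketch of this kind of rectification pipeline with OpenCV might look like the following. It assumes SIFT features and a RANSAC fundamental matrix estimate (OpenCV's cv2.FM_7POINT flag implements the 7-point algorithm but expects exactly 7 correspondences); the file names are placeholders, and stereo_rectification.py may differ in its details.

```python
import cv2
import numpy as np

# Placeholder file names; substitute your own image pair.
img_left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img_right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# 1. Feature detection and matching (SIFT + brute-force matcher, Lowe's ratio test)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img_left, None)
kp2, des2 = sift.detectAndCompute(img_right, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2. Fundamental matrix (RANSAC here for robustness with many correspondences)
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
pts1, pts2 = pts1[mask.ravel() == 1], pts2[mask.ravel() == 1]

# 3. Homographies that make the epipolar lines horizontal, then warp each image
h, w = img_left.shape
_, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))
rect_left = cv2.warpPerspective(img_left, H1, (w, h))
rect_right = cv2.warpPerspective(img_right, H2, (w, h))
```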
The process of creating a disparity map from the stereo pair can be found in disparity_map.py:
- Use block matching to identify matching blocks in the stereo image pair and measure the horizontal (columnwise) distance between the center pixels of the matched blocks. Store this value in your disparity map. To score how well blocks match, use a distance metric like the Sum of Absolute Differences (see the sketch after this list)
- Create disparity maps for many colorspaces and average them to reduce possible inaccuracies
- Use the formula depth = (f * t) / disparity, where f is the focal length and t is the baseline (the distance between camera centers), to convert disparity to depth
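Here is a minimal sketch of SAD block matching and the disparity-to-depth conversion described above. The block size, search range, and helper names are illustrative assumptions, not the exact implementation in disparity_map.py.

```python
import numpy as np

def disparity_sad(left, right, block=9, max_disp=64):
    """Brute-force SAD block matching over two grayscale images of equal shape."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    left = left.astype(np.int32)
    right = right.astype(np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # Search along the same row (epipolar line) in the right image
            for d in range(min(max_disp, x - half) + 1):
                patch_r = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(patch_l - patch_r).sum()  # Sum of Absolute Differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

def depth_from_disparity(disp, f, t):
    """depth = (f * t) / disparity; f is the focal length (pixels), t the baseline."""
    return np.where(disp > 0, f * t / np.maximum(disp, 1e-6), 0.0)
```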
The process of cleaning up the disparity map can be found in clean_up_disparity.py:
- Combine (stack) the edges of your disparity map with the edges of one image from your stereo pair
- Traverse the edge image with varying window sizes
- At the largest window size containing no edges, store the median of the values in your depth map falling under this window in your filtered depth map
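A rough sketch of this edge-aware median filtering follows, assuming Canny edge detection, a fixed set of window sizes, and a grayscale uint8 reference image; clean_up_disparity.py may differ in all of these.

```python
import cv2
import numpy as np

def clean_up(depth_map, reference_img, windows=(21, 15, 11, 7, 5)):
    """Edge-aware median filtering of a depth (or disparity) map."""
    # Combine the edges of the depth map with the edges of one stereo image
    edges = cv2.Canny(cv2.convertScaleAbs(depth_map), 50, 150) | \
            cv2.Canny(reference_img, 50, 150)

    h, w = depth_map.shape
    filtered = depth_map.copy()
    for y in range(h):
        for x in range(w):
            # Try window sizes from largest to smallest; the largest window
            # around (y, x) containing no edges supplies the median value.
            for win in windows:
                half = win // 2
                y0, y1 = max(0, y - half), min(h, y + half + 1)
                x0, x1 = max(0, x - half), min(w, x + half + 1)
                if not edges[y0:y1, x0:x1].any():
                    filtered[y, x] = np.median(depth_map[y0:y1, x0:x1])
                    break
    return filtered
```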
The process of turning the depth map into a point cloud can be found in disparity_map.py, specifically in the display_depth_map function:
- Create an RGBD image from one of the two images forming your stereo pair and the depth map you have created.
  In Open3D 0.7.0 this is
  rgbd = o3d.geometry.create_rgbd_image_from_color_and_depth(img, depth)
  In Open3D 0.8.0 this is
  rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(img, depth)
- Create an object storing camera intrinsic values with PinholeCameraIntrinsic, using the camera focal length and center.
- Create a point cloud from the RGBD image and the PinholeCameraIntrinsic object.
  In Open3D 0.7.0 this is
  pcd_from_depth_map = o3d.geometry.create_point_cloud_from_rgbd_image(rgbd, o3d_pinhole)
  In Open3D 0.8.0 this is
  pcd_from_depth_map = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, o3d_pinhole)
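Putting these steps together, a minimal end-to-end sketch for Open3D 0.8.0 could look like the following (use the 0.7.0 function names above if that is your installed version). The image size, focal length, principal point, and depth scale here are stand-in assumptions, not values from the repository.

```python
import numpy as np
import open3d as o3d

# Stand-in data so the sketch runs on its own; replace with one image from
# your stereo pair and the depth map you created.
h, w = 480, 640
color_np = np.zeros((h, w, 3), dtype=np.uint8)
depth_np = np.full((h, w), 1.0, dtype=np.float32)

color = o3d.geometry.Image(color_np)
depth = o3d.geometry.Image(depth_np)

# Open3D 0.8.0 API; depth_scale/depth_trunc depend on the units of your depth map
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, depth_scale=1.0, depth_trunc=1000.0,
    convert_rgb_to_intensity=False)

# Camera intrinsics: focal length and principal point. Illustrative values;
# take f from the dataset calibration and, if the principal point is not
# given, assume it is the image center.
fx = fy = 3740.0
cx, cy = w / 2.0, h / 2.0
o3d_pinhole = o3d.camera.PinholeCameraIntrinsic(w, h, fx, fy, cx, cy)

pcd_from_depth_map = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, o3d_pinhole)
o3d.visualization.draw_geometries([pcd_from_depth_map])  # see the Mac note below
```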
Some notes:
- If you see any Open3D errors when creating an RGBD image or point cloud, check your version and use the line of code required for your installed version.
- o3d.visualization.draw_geometries does not work on Mac; I have only confirmed it works on Debian Linux.