3D scanning and camera tracking using a depth camera
This project demonstrates how to use an Intel® RealSense™ camera to create a full 3D model of an object by moving the depth camera around it. It estimates the movement of the camera from the depth data alone, without any additional motion sensor, and then combines the data into a single model.
How it works
The project consists of three main parts: the motion estimation, the model creation, and the rendering. Almost everything is performed on the GPU by using WebGL shaders.
The motion estimation algorithm
This stage of the demo uses the ICP (Iterative Closest Point) algorithm to estimate the movement of the camera from the depth data alone, without any motion sensor (a problem known as SLAM). It is inspired by the paper KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera, which describes a similar version of the algorithm optimized for GPUs. Thanks to this design, the demo can process frames in real time even on a relatively weak GPU.
The images below show how two frames of data (that were artificially created) get aligned over the course of 10 steps of the ICP algorithm.
The principle is similar to linear regression. In linear regression, you are trying to fit a line through a noisy set of points while minimizing the error. With the ICP algorithm, we are trying to find a motion that matches two pointclouds together as well as possible, assuming 6DOF (six degrees of freedom). If we had exact information about which point in one pointcloud corresponds to which point in the other, this would be relatively easy. To some degree, this could be achieved by recognizing features of a scene (e.g. the corners of a table) and deciding that they match up, but this approach is computationally intensive and difficult to implement. A simpler approach is to declare that whatever point is closest is the corresponding point. The closest point could be found by a brute-force search or by using a k-d tree, but this project uses a heuristic that is described in the source. It's not as exact as a k-d tree, but it has linear time complexity for each point and is very well suited for the GPU.
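To make the match-then-minimize loop concrete, here is a deliberately simplified CPU sketch (not the demo's GPU code, and the names are illustrative): it estimates only a 2D translation instead of a full 6DOF rigid motion, and finds the closest point by brute force.

```javascript
// Translation-only ICP in 2D: pair each source point with its nearest
// target point, then apply the translation that minimizes the mean error.
function nearest(p, cloud) {
  let best = cloud[0], bestD = Infinity;
  for (const q of cloud) {
    const d = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2;
    if (d < bestD) { bestD = d; best = q; }
  }
  return best;
}

function icpTranslation(source, target, steps = 10) {
  let tx = 0, ty = 0;
  for (let i = 0; i < steps; i++) {
    // 1. Match: pair each moved source point with its closest target point.
    // 2. Minimize: for pure translation, the optimal update is the mean residual.
    let dx = 0, dy = 0;
    for (const p of source) {
      const moved = [p[0] + tx, p[1] + ty];
      const q = nearest(moved, target);
      dx += q[0] - moved[0];
      dy += q[1] - moved[1];
    }
    tx += dx / source.length;
    ty += dy / source.length;
  }
  return [tx, ty];
}
```

With well-separated points, the closest-point pairing quickly becomes the true correspondence and the estimate converges, just as in the aligned frames pictured above; with ambiguous, noisy clouds it needs more iterations, which is why the demo runs several ICP steps per frame.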
This is the most complex part of the project, consisting of three different shaders that are run several times per frame of data. The documentation is in the shaders themselves. A much simpler CPU implementation is in the file movement_cpu.js.
Since WebGL 2.0 doesn't have compute shaders, the calculations are done in fragment shaders that take a texture with floating-point data as input and render their output into another floating-point texture.
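The data flow of that pattern can be sketched on the CPU (this is an illustration of the technique, not the demo's actual API): a "texture" is a flat float array, a "fragment shader" is a per-texel function, and two textures are swapped ("ping-pong") between passes because a pass cannot read and write the same texture.

```javascript
// One GPGPU-style pass: run the per-texel function over the input
// "texture" and write the results into the output "texture".
function runPass(shader, input, output, width, height) {
  for (let y = 0; y < height; y++)
    for (let x = 0; x < width; x++)
      output[y * width + x] = shader(x, y, input, width, height);
}

// Example per-texel function: a 3x3 box blur, standing in for one of the
// demo's real passes.
const blur = (x, y, tex, w, h) => {
  let sum = 0, n = 0;
  for (let dy = -1; dy <= 1; dy++)
    for (let dx = -1; dx <= 1; dx++) {
      const nx = x + dx, ny = y + dy;
      if (nx >= 0 && nx < w && ny >= 0 && ny < h) { sum += tex[ny * w + nx]; n++; }
    }
  return sum / n;
};

let ping = new Float32Array([0, 0, 0, 0, 9, 0, 0, 0, 0]); // 3x3 "texture"
let pong = new Float32Array(9);
runPass(blur, ping, pong, 3, 3);
[ping, pong] = [pong, ping]; // swap, so the next pass reads this result
```

On the GPU, `runPass` corresponds to drawing a full-screen quad into a framebuffer backed by a floating-point texture, with the per-texel function written as a GLSL fragment shader.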
The model creation

If memory and bandwidth were free, we could simply store all the pointclouds and render them together. However, this would not only be very inefficient (we would have millions of points after just a few minutes of recording), it would also look very noisy. A better solution is to create a volumetric model. You can imagine it as a 3D grid where we set a voxel (volumetric pixel) to 1 if a point lies within it. This would still be inefficient and noisy, and it would also look too much like Minecraft. An even better way is to create the volumetric model using a signed distance function: instead of storing 1 or 0 in a voxel, we store the distance from the center of the voxel to the object surface. This method is described in the paper A Volumetric Method for Building Complex Models from Range Images.
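A minimal sketch of how a single voxel of such a signed-distance volume is updated, following the idea from the paper (the names and the truncation constant here are illustrative, not the demo's actual values): each voxel keeps a truncated signed distance plus a weight, and new depth measurements are merged by a weighted running average, which averages out sensor noise instead of accumulating it.

```javascript
// Distances are clamped to +-TRUNCATION (in meters), since far from the
// surface only the sign matters.
const TRUNCATION = 0.05;

function updateVoxel(voxel, voxelDepth, measuredDepth) {
  // Signed distance from the voxel to the measured surface along the
  // camera ray: positive in front of the surface, negative behind it.
  let sdf = measuredDepth - voxelDepth;
  if (sdf < -TRUNCATION) return voxel; // far behind the surface: no update
  sdf = Math.min(sdf, TRUNCATION);
  const w = 1; // a real system can weight by measurement confidence
  return {
    dist: (voxel.dist * voxel.weight + sdf * w) / (voxel.weight + w),
    weight: voxel.weight + w,
  };
}

let v = { dist: 0, weight: 0 };
v = updateVoxel(v, 1.0, 1.02); // voxel 2 cm in front of the measured surface
v = updateVoxel(v, 1.0, 1.04); // noisy second measurement of the same surface
// v.dist now holds the averaged signed distance, v.weight the sample count
```

The object surface is wherever the stored distance crosses zero, which is exactly what the renderer later looks for.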
The demo uses a 3D texture to store the volumetric model. The details of the model creation are documented in the shader source.
The rendering

This stage is the simplest and is described more closely in the file shaders/renderer-fshader.js. It uses the raymarching algorithm (a simpler and faster version of raytracing) to render the volumetric model, and then applies Phong lighting to it.
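The core of raymarching can be sketched on the CPU (a stand-in for the GPU shader, with an analytic sphere SDF instead of the demo's 3D texture): each ray repeatedly steps forward by the signed distance the volume reports, which is always a safe step size, until it reaches the surface.

```javascript
// Signed distance to a unit sphere at the origin (stands in for sampling
// the demo's signed-distance volume).
const sphereSDF = (p) => Math.hypot(p[0], p[1], p[2]) - 1;

function raymarch(origin, dir, sdf, maxSteps = 64) {
  let t = 0;
  for (let i = 0; i < maxSteps; i++) {
    const p = [origin[0] + dir[0] * t,
               origin[1] + dir[1] * t,
               origin[2] + dir[2] * t];
    const d = sdf(p);
    if (d < 1e-4) return p; // distance ~0: we are on the surface
    t += d;                 // safe to advance by the reported distance
  }
  return null; // ray missed the object
}

// Diffuse (Lambertian) term of Phong lighting at the hit point.
function diffuse(hit, normal, lightPos) {
  const l = [lightPos[0] - hit[0], lightPos[1] - hit[1], lightPos[2] - hit[2]];
  const len = Math.hypot(l[0], l[1], l[2]);
  const dot = (normal[0] * l[0] + normal[1] * l[1] + normal[2] * l[2]) / len;
  return Math.max(0, dot);
}

const hit = raymarch([0, 0, -3], [0, 0, 1], sphereSDF);
// For a unit sphere at the origin, the surface normal equals the hit point.
```

In the real shader, the normal is instead estimated from the gradient of the signed-distance volume, sampled around the hit point.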
How to use

The project works on Windows, Linux and ChromeOS with the Intel® RealSense™ SR300 (and related cameras such as the Razer Stargazer or Creative BlasterX Senz3D) and R200 3D cameras.
1. To make sure your system supports the camera, follow the installation guide in librealsense.
2. Connect the camera.
3. Go to the demo page.
To run the code locally, give Chromium the parameter
--use-fake-ui-for-media-stream, so that it doesn't ask you for camera
permissions, which are remembered only for https pages.
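As a concrete example of a local run (the static server and port here are illustrative, not part of the project):

```shell
# Serve the project directory over plain http (any static server works),
# then launch Chromium with the flag so it doesn't prompt for camera access.
python3 -m http.server 8000 &
chromium --use-fake-ui-for-media-stream http://localhost:8000
```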
Intel and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.