
3D Recognition of a Hand


Problem Description

The servos and motors that control a robot's arm and hand often do not move exactly to where they are commanded. If the hand's position could be found in the scene, its location could be used to correct the error. Some robots include cameras or expensive feedback sensors on their hands to aid in correction, and there is no shortage of robots that simply use incredibly expensive but precise servos. Many robots that cannot afford these components instead carry easily recognizable patterns or colors on their hands to aid in segmentation using 2D methods. However, that approach requires special code to be written for each unique hand.

Possible Solution

Yet there is a possibility of using a 3D point cloud approach to provide visual feedback of the arm's location.

  • Performing 3D model scans of objects has become easy enough that a CAD model of a hand can be obtained in less than a minute with a proper scan environment.
  • This 3D CAD model can then be used to generate a database of point cloud models of the object from different perspectives.
  • For each perspective of the hand, one or more corresponding global/local descriptors (numerical representations of a point cloud cluster's geometric features, or of a point's relationship with nearby points) can be saved.
  • Search structures such as a KD-tree and HDF5 data can be pre-processed for the database, allowing quick searches of the database during real-time matching of a segmented object's global descriptors (a sketch of this pre-processing appears below).

This means it might be possible to segment an arm out of a point cloud scene and recognize it in real time.
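As a rough illustration of the pre-processing mentioned in the list above, the sketch below follows the approach used in PCL's cluster recognition tutorials: the training descriptors (assumed here to be 308-bin VFH/CVFH histograms) are saved to an HDF5 file and a FLANN search index is built once, offline, so the recognition code only has to load it at start-up. The file names and index parameters are illustrative, not a fixed interface.

#include <flann/flann.h>
#include <flann/io/hdf5.h>

int main ()
{
  const int n_views = 42;         // one descriptor per rendered training view
  const int histogram_size = 308; // VFH/CVFH signature length

  // One row per training view, one column per histogram bin.
  flann::Matrix<float> data (new float[n_views * histogram_size],
                             n_views, histogram_size);
  // ... fill `data` with the histograms loaded from the descriptor files ...

  // Persist the raw descriptors so they can be reloaded without re-training.
  flann::save_to_file (data, "training_data.h5", "training_data");

  // Build the search structure once and save it for the real-time node.
  flann::Index<flann::ChiSquareDistance<float> > index (data, flann::KDTreeIndexParams (4));
  index.buildIndex ();
  index.save ("kdtree.idx");

  delete[] data.ptr ();
  return 0;
}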

Assumptions

So far, this is considered a viable solution only under the following circumstances:

  • The hand to segment will stay in a rigid position throughout the pick and place process (i.e., the hand will not change shape to something outside of the database during tracking).
  • The hand to segment can be seen well by depth cameras. That is, it is not black, transparent, or reflective. Some robot hands may have to be painstakingly painted a matte, non-black color to be seen by depth sense cameras.
  • The hand to segment will only be tracked when it can be segmented out. For example, if using prism-based object segmentation, the hand must be above the segmented plane within a certain distance (the height of the prism); a sketch of this segmentation appears after this list.
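For reference, the prism-based segmentation mentioned in the last assumption can be done in PCL with ExtractPolygonalPrismData. The sketch below assumes the supporting plane and its convex hull have already been found (for example with SACSegmentation and ConvexHull); the height limits are placeholder values, not tuned numbers.

#include <pcl/point_types.h>
#include <pcl/segmentation/extract_polygonal_prism_data.h>
#include <pcl/filters/extract_indices.h>

// scene      : the full point cloud from the depth camera
// plane_hull : convex hull of the supporting plane, computed beforehand
void extractAbovePlane (const pcl::PointCloud<pcl::PointXYZ>::ConstPtr &scene,
                        const pcl::PointCloud<pcl::PointXYZ>::ConstPtr &plane_hull,
                        pcl::PointCloud<pcl::PointXYZ> &objects)
{
  // Keep only the points inside the prism extruded from the plane's hull.
  pcl::ExtractPolygonalPrismData<pcl::PointXYZ> prism;
  prism.setInputCloud (scene);
  prism.setInputPlanarHull (plane_hull);
  prism.setHeightLimits (0.02, 0.5);   // metres above the plane (placeholder values)

  pcl::PointIndices::Ptr indices (new pcl::PointIndices);
  prism.segment (*indices);

  // The hand must fall inside this region for tracking to work.
  pcl::ExtractIndices<pcl::PointXYZ> extract;
  extract.setInputCloud (scene);
  extract.setIndices (indices);
  extract.filter (objects);
}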

Generating a 3D CAD Model (.ply) of a Hand

In order to use our hand recognition package you must have a 3D model of your robot's hand. Beyond making the model yourself, you have the option of taking a 3D scan of the hand. To do this you must:

  • Have a depth sense camera with high enough resolution to capture a decent 3D scan of your robot's hand.
  • Have some software capable of capturing a 3D scan of the hand.

Recommended software:

  • Realsense R200 SDK 3D Scanner. This obviously only works for the Intel Realsense cameras.
  • PCL In-hand scanner for small objects. Compatible with OpenNI cameras only as of April 2016. Attempts have been made to adapt it to other depth sense cameras (see ROS In-hand Scanner), but please note that our team member James Schiffer was unable to get it working with a Realsense R200 camera using PCL 1.8 RC2. Trying it with a newer build of PCL may be worthwhile, since the app was not included in the PCL 1.8 RC2 build.

Not recommended software:

  • Kscan3D. Very painful scan process. There is no shortage of newer, better-designed software available for this application.

Tests done with the Kinect V1 and an InMoov robotic hand found that the Kinect's resolution was insufficient to capture a decent model of the hand. The Intel Realsense R200 worked well.

Setup for taking the 3D scan can be troublesome and finicky. Workarounds for lower quality 3D scans are being investigated (see Design Choices).

Training

(In development)

Now that you have the 3D scan of the hand in a PLY file, you need to perform some preprocessing so that the 3D Object Detection ROS package can perform in real time.

To do this, first convert the .ply file to a .pcd file using the following command (requires PCL):

pcl_mesh2pcd <input.ply> <output.pcd>

Then build and run the 3D object training software provided (still in development) with the path to the .pcd file of the scanned hand. The resulting descriptor files will be placed into a folder inside the build directory.

To generate these descriptor files, the training software constructs a tessellated icosahedron (a sphere approximated by triangles) and renders a view of the hand from each vertex of the sphere. For each of the 42 resulting views, descriptors are generated and saved to a folder inside the program's build folder.
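As a rough sketch of that step: PCL ships a RenderViewsTesselatedSphere helper (in pcl/apps) that renders a mesh from the vertices of a tessellated icosahedron. The calls below reflect our understanding of its API and may need adjusting for your PCL version; loading the mesh through VTK's PLY reader is just one possible route.

#include <vector>
#include <pcl/point_types.h>
#include <pcl/apps/render_views_tesselated_sphere.h>
#include <vtkPLYReader.h>
#include <vtkSmartPointer.h>

int main ()
{
  // Load the scanned hand mesh.
  vtkSmartPointer<vtkPLYReader> reader = vtkSmartPointer<vtkPLYReader>::New ();
  reader->SetFileName ("hand.ply");
  reader->Update ();

  // Render a partial view from every vertex of a level-1 tessellated
  // icosahedron (42 vertices -> 42 views).
  pcl::apps::RenderViewsTesselatedSphere render_views;
  render_views.addModelFromPolyData (reader->GetOutput ());
  render_views.setResolution (150);    // size of the synthetic depth image
  render_views.setTesselationLevel (1);
  render_views.generateViews ();

  std::vector<pcl::PointCloud<pcl::PointXYZ>::Ptr> views;
  std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f> > poses;
  render_views.getViews (views);
  render_views.getPoses (poses);

  // Each cloud in `views` is then fed to descriptor estimation and written
  // out as one file in the descriptor folder.
  return 0;
}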

The path to this build folder must be passed, at launch, to the ROS package that performs the detection of the hand.

Using the ROS Package

(Placeholder, in progress)

Design Choices

Please see source [1] for the general theory.

3D Object Detection Using Global or Local Descriptors

We are still investigating whether to use local or global descriptors for object detection; there are advantages and disadvantages to each. Here is what we assume about the situation of tracking a robot hand during a pick and place operation:

  • The point cloud cluster representing the arm in the scene may be partial, i.e., missing points that are present in the 3D scan of the hand.
  • The scan of the hand may contain errors or missing data.

Therefore, we need to choose a descriptor that is optimized for handling missing or additional data in the point cloud cluster and can perform in real time.

Local descriptors look promising because they focus on single points, and will therefore work well with occlusions (partial / missing points) and completely ignore additional points. However, many local descriptors like PFH or FPFH require normals to be calculated for the entire scene.
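For context, this is roughly what a local descriptor pipeline looks like in PCL: normals, and then the FPFH features themselves, are computed point by point over the whole input cloud, which is where the per-scene cost comes from. The search radii are placeholder values.

#include <pcl/point_types.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/fpfh.h>
#include <pcl/search/kdtree.h>

// Compute FPFH local descriptors for every point in `cloud`.
void computeFPFH (const pcl::PointCloud<pcl::PointXYZ>::ConstPtr &cloud,
                  pcl::PointCloud<pcl::FPFHSignature33> &descriptors)
{
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree (new pcl::search::KdTree<pcl::PointXYZ>);

  // Normals have to be estimated first, for the whole cloud.
  pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
  ne.setInputCloud (cloud);
  ne.setSearchMethod (tree);
  ne.setRadiusSearch (0.01);   // placeholder radius, in metres
  pcl::PointCloud<pcl::Normal>::Ptr normals (new pcl::PointCloud<pcl::Normal>);
  ne.compute (*normals);

  // One 33-bin FPFH histogram is produced per input point.
  pcl::FPFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::FPFHSignature33> fpfh;
  fpfh.setInputCloud (cloud);
  fpfh.setInputNormals (normals);
  fpfh.setSearchMethod (tree);
  fpfh.setRadiusSearch (0.02); // must be larger than the normal estimation radius
  fpfh.compute (descriptors);
}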

There are some global descriptors capable of dealing with occlusion, for example the Clustered Viewpoint Feature Histogram (CVFH) and the Camera Roll Histogram (CRH). Global descriptors are also stated to be faster than local descriptors [3], so our real-time performance requirement may force our hand in this decision.
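For comparison with the local pipeline above, a global descriptor such as CVFH produces a handful of histograms per segmented cluster rather than one per point, which is what makes it attractive for real-time matching. A minimal sketch, assuming the cluster has already been segmented out of the scene and using placeholder thresholds:

#include <cmath>
#include <pcl/point_types.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/cvfh.h>
#include <pcl/search/kdtree.h>

// Compute CVFH global descriptor(s) for one segmented cluster.
void computeCVFH (const pcl::PointCloud<pcl::PointXYZ>::ConstPtr &cluster,
                  pcl::PointCloud<pcl::VFHSignature308> &descriptors)
{
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree (new pcl::search::KdTree<pcl::PointXYZ>);

  // Normals are only needed for the cluster, not the whole scene.
  pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
  ne.setInputCloud (cluster);
  ne.setSearchMethod (tree);
  ne.setRadiusSearch (0.01);   // placeholder radius, in metres
  pcl::PointCloud<pcl::Normal>::Ptr normals (new pcl::PointCloud<pcl::Normal>);
  ne.compute (*normals);

  // CVFH splits the cluster into smooth regions and emits one
  // 308-bin histogram per region, which helps with partial views.
  pcl::CVFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::VFHSignature308> cvfh;
  cvfh.setInputCloud (cluster);
  cvfh.setInputNormals (normals);
  cvfh.setSearchMethod (tree);
  cvfh.setEPSAngleThreshold (5.0f / 180.0f * static_cast<float> (M_PI)); // placeholder
  cvfh.setCurvatureThreshold (1.0f);                                     // placeholder
  cvfh.setNormalizeBins (false);
  cvfh.compute (descriptors);
}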

When comparing global vs local descriptors for object recognition, researchers made the following statement:

Finally, we would like to mention that a direct, fair, and accurate comparison between global and local features/pipelines is difficult due to their different characteristics and assumptions. However, it is clear that in scenarios where segmentation becomes challenging or occluded objects need to be recognized, global features should be avoided. Conversely, in controlled environments where speed becomes an important factor, global features might represent a good choice. [3]

Sources:

[1] PhD-3D-Object-Tracking

[2] CAD-model recognition and 6DOF pose estimation using 3D cues

[3] Three-Dimensional Object Recognition and 6 DoF Pose Estimation