ROS package that generates hierarchical CNN features from RGB-D sensors
Latest commit 5f89180 Feb 7, 2017 @goolygu make sure box is int

ROS Deep Vision package

This is a ROS package that generates hierarchical CNN features and their locations based on RGB-D inputs. Hierarchical CNN features represent meaningful properties of object parts and can be localized to support manipulation. Read the arXiv paper Associating Grasping with Convolutional Neural Network Features for more details. This repository was originally forked from the Deep Visualization Toolbox made by Yosinski, which I found extremely useful in understanding CNNs.


The current version assumes objects are placed on the ground or a table top so that the image can be cropped into square images centered on objects as CNN inputs. Only the top N largest clusters are handled. This package is set up to handle RGB-D camera inputs with resolution 640x480, such as the Kinect and Asus Xtion.
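The two ideas above (keeping only the N largest clusters and cropping a square around each object) can be sketched as follows. This is a minimal illustration, not the package's actual code: the function names, the representation of a cluster as a list of points, and the clamping strategy are all assumptions made for this example.

```python
def top_n_clusters(clusters, n):
    """Return the n clusters with the most points, largest first.

    `clusters` is assumed to be a list of point lists (hypothetical format).
    """
    return sorted(clusters, key=len, reverse=True)[:n]


def square_crop_box(center_x, center_y, side, width=640, height=480):
    """Return an integer (x_min, y_min, x_max, y_max) square crop box.

    The box is centered on (center_x, center_y) when possible. The side is
    capped at the smaller image dimension, and the box is shifted so that it
    stays inside the width x height image.
    """
    side = int(min(side, width, height))
    x_min = int(round(center_x - side / 2.0))
    y_min = int(round(center_y - side / 2.0))
    # Shift the box back inside the image if it sticks out on any edge.
    x_min = max(0, min(x_min, width - side))
    y_min = max(0, min(y_min, height - side))
    return x_min, y_min, x_min + side, y_min + side
```

For example, `square_crop_box(320, 240, 200)` gives a 200x200 box around the image center, while a center near an image corner yields a box pushed back inside the 640x480 frame.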


This package requires a specific branch of caffe and several different libraries listed below. This guide also assumes ROS (Version >= hydro) is already installed.

Step 0: Install caffe

Get the master branch of caffe to compile on your machine. If you've never used Caffe before, it can take a bit of time to get all the required libraries in place. Fortunately, the installation process is well documented.

In addition to running on a GPU with CUDA, it is highly recommended to install the cuDNN library to speed up the computation. Remember to set USE_CUDNN := 1 in Caffe's Makefile.config before compiling if cuDNN is installed.

After installing CUDA and caffe, make sure that the following environment variables are set correctly:

export PATH=/usr/local/cuda-7.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=$PYTHONPATH:{PATH}/caffe/python
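A quick way to sanity-check these variables from Python is sketched below. The helper name `missing_env_entries` is hypothetical, and the string checks (a PATH entry containing "cuda" and ending in "bin", a PYTHONPATH entry ending in "caffe/python") are rough heuristics assumed for this example; adjust them to match your install locations.

```python
import os


def missing_env_entries(env=None):
    """Return a list of problems found in the CUDA/caffe environment variables.

    `env` defaults to os.environ; pass a dict to check other settings.
    """
    env = os.environ if env is None else env
    problems = []

    path_dirs = env.get("PATH", "").split(os.pathsep)
    if not any("cuda" in d and d.rstrip("/").endswith("bin") for d in path_dirs):
        problems.append("PATH has no CUDA bin directory")

    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if not any("cuda" in d for d in lib_dirs):
        problems.append("LD_LIBRARY_PATH has no CUDA lib directory")

    py_dirs = env.get("PYTHONPATH", "").split(os.pathsep)
    if not any(d.rstrip("/").endswith("caffe/python") for d in py_dirs):
        problems.append("PYTHONPATH has no caffe/python directory")

    return problems


if __name__ == "__main__":
    for problem in missing_env_entries():
        print(problem)
```

An empty list means all three variables look plausible; it does not guarantee that caffe itself imports correctly.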

Step 1: Compile the ros branch of caffe

To use this package, you'll need a slightly modified branch of caffe instead of the master branch. Getting the branch and switching to it is easy.

If you are using cudnn version 5, then starting from your caffe directory, run:

$ git remote add goolygu
$ git fetch --all
$ git checkout --track -b ros-cudnn5 goolygu/ros-cudnn5
$ make clean
$ make -j
$ make -j pycaffe

If you are using cudnn version 4, then starting from your caffe directory, run:

$ git remote add goolygu
$ git fetch --all
$ git checkout --track -b ros goolygu/ros
$ make clean
$ make -j
$ make -j pycaffe

If you are not using cudnn, both versions should work.
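The branch choice described in this step can be summarized in a few lines. The function below is only an illustrative helper (its name is made up for this example); the branch names are the ones used in the commands above.

```python
def caffe_branch_for_cudnn(major_version=None):
    """Map a cuDNN major version to the caffe branch named in this guide.

    Pass None when cuDNN is not installed.
    """
    branches = {5: "ros-cudnn5", 4: "ros"}
    if major_version is None:
        # Without cuDNN, either branch should work; "ros" is picked arbitrarily.
        return "ros"
    if major_version not in branches:
        raise ValueError("this guide only covers cuDNN versions 4 and 5")
    return branches[major_version]
```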

Step 2: Install the required Python libraries if you haven't already

$ sudo apt-get install python-opencv

Install pip if you haven't already.

$ sudo pip install scipy
$ sudo pip install scikit-learn
$ sudo pip install scikit-image

Download python-pcl and install from local directory.

$ sudo pip install -e ./python-pcl/

Step 3: Download and configure ros-deep-vision package

Download the package and place it along with other ROS packages in the catkin workspace.

$ git clone
$ roscd ros_deep_vision

Modify ./src/ so the caffevis_caffe_root variable points to the directory where you've compiled caffe in Step 1.
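Before editing the setting, it can help to confirm that the directory you intend to use really is a caffe checkout. The check below is a small sketch that relies only on caffe's standard layout, where the Python package lives under python/caffe; the function name is hypothetical and is not part of this package.

```python
import os


def looks_like_caffe_root(caffe_root):
    """Return True if caffe_root contains the python/caffe package directory."""
    return os.path.isdir(os.path.join(caffe_root, "python", "caffe"))
```

If this returns False for the path you plan to assign to caffevis_caffe_root, the path probably points at the wrong directory.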

Download the example model weights and corresponding top-9 visualizations made by Yosinski (downloads a 230MB model and 1.1GB of jpgs to show as visualization):

$ cd models/caffenet-yos/
$ ./

Step 4: Install the required ROS packages if you haven't already

$ sudo apt-get install ros-{rosversion}-openni2-launch

I would recommend modifying the depth registration option in "openni2_launch/launch/openni2.launch" if the point cloud color has an offset and your hardware supports depth registration.

Step 5: Run the package

Make sure your RGB-D camera is connected. Start the RGB-D camera:

$ roslaunch openni2_launch openni2.launch

Start the input server that does point cloud segmentation:

$ roslaunch ros_deep_vision input_server.launch

Start rviz:

$ roslaunch ros_deep_vision rviz.launch

You should be able to see the point cloud in rviz. Start the cnn state manager that generates the features:

$ roslaunch ros_deep_vision cnn_state_manager.launch

Enter r in the cnn_state_manager to run. When it finishes running, the detected features should show up in rviz, similar to the following image. The yellow, cyan, and magenta dots represent conv-5, conv-4, and conv-3 hierarchical CNN features, respectively. Set self.max_clusters = 3 in to a higher number if more than 3 objects are in the scene.

[Image: detected features shown in rviz]

Set self.data_monster.show_backprop = True in if you want to visualize the targeted backpropagation result for each hierarchical CNN feature, as in the image below. The blue dots are the feature locations based on the average response locations. Note that this may open many image windows; set self.max_clusters = 1 so that only the largest object is handled. You can also change the following settings:

elif case == "cnn_features":
    self.conv5_top = 10
    self.conv4_top = 5
    self.conv3_top = 2
    self.conv2_top = 0

in to modify the number of features extracted. Note that in this case there will be 10 conv5, 50 conv4, and 100 conv3 hierarchical CNN features.
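The counts quoted above follow from multiplying the per-layer settings down the hierarchy. The sketch below assumes that each feature kept at one layer expands into the top-k features of the next lower layer, which is consistent with the numbers given; the function name is made up for this example and is not part of the package.

```python
def hierarchical_feature_counts(tops):
    """Return the number of hierarchical features per layer.

    `tops` lists the per-layer top-k settings from the highest layer down,
    e.g. [conv5_top, conv4_top, conv3_top, conv2_top].
    """
    counts = []
    total = 1
    for k in tops:
        # Each feature from the layer above branches into k features here.
        total *= k
        counts.append(total)
    return counts


print(hierarchical_feature_counts([10, 5, 2, 0]))  # [10, 50, 100, 0]
```

With conv2_top set to 0, no conv2 features are extracted.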

[Image: targeted backpropagation results for the hierarchical CNN features]